Protocol

This is the internals reference for XChat, reverse-engineered from the chat.x.com client. You do not need any of it to use client.xchat (start at Setup & identity); it is here if you want to understand how XChat works or build something lower-level. Everything below is descriptive, not API. Use it mostly for feeding it to your clankers.

Architecture

XChat is X's end-to-end encrypted chat, served from chat.x.com, a separate system from the legacy /1.1/dm API. The web client is Kotlin Multiplatform compiled to JavaScript; local state is an encrypted SQLite database over OPFS; key backup is a wasm-bindgen Juicebox SDK; calls are WebRTC against a Janus SFU.

| Surface | Endpoint |
| --- | --- |
| Realtime | wss://chat-ws.x.com/ws?token=<JWT> |
| GraphQL | https://api.x.com/graphql/<id>/<OperationName> |
| History (gRPC-web) | https://api.x.com/xai.chat_service.v1.ChatService/GetMessageEventsPage |
| Encrypted media | https://ton.x.com/i/ton/data/xchat_media/ |
| Media upload (public) | https://upload.x.com/1.1/media/upload.json |
| Key-backup realms | realm-b.x.com, realm-east1.x.com, realm-west1.x.com |
| Group invite link | https://x.com/i/chat/group_join/<token> |

Wire format

Every binary payload (WebSocket frames and the base64 blobs in GraphQL such as encoded_message_create_event) is Apache Thrift TBinaryProtocol, schema com.x.dmv2.thriftjava.*. It is not protobuf, although the field-numbered tables below look similar. Encoding it as protobuf yields a StratoThriftDeserializationException.

TBinaryProtocol, by hand:

  • Struct: a sequence of fields, terminated by a 0x00 stop byte.
  • Field header: [type (1 byte)] [field id (2 bytes, big-endian)], then the value.
  • Types: 2 bool, 8 i32, 10 i64, 11 string/binary, 12 struct, 15 list.
  • string / binary: [length (4-byte big-endian)] [bytes]. Booleans inline as one byte; i32/i64 are fixed-width big-endian.
  • list: [element type (1 byte)] [count (4-byte big-endian)] [elements].
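The rules above are small enough to encode by hand. The following is a minimal sketch of an encoder covering only the types listed; the helper names (field, enc_struct, and so on) are illustrative and not part of any XChat or emusks API, and the example struct is made up rather than a real com.x.dmv2 type.

```python
import struct

# Type codes from the list above.
T_BOOL, T_I32, T_I64, T_STRING, T_STRUCT, T_LIST = 2, 8, 10, 11, 12, 15
STOP = b"\x00"

def field(ftype: int, fid: int, value: bytes) -> bytes:
    """Field header (1-byte type, 2-byte big-endian id) followed by the value."""
    return struct.pack(">bh", ftype, fid) + value

def enc_bool(v: bool) -> bytes:
    return b"\x01" if v else b"\x00"

def enc_i32(v: int) -> bytes:
    return struct.pack(">i", v)

def enc_i64(v: int) -> bytes:
    return struct.pack(">q", v)

def enc_binary(v: bytes) -> bytes:
    """4-byte big-endian length, then the raw bytes."""
    return struct.pack(">i", len(v)) + v

def enc_string(v: str) -> bytes:
    return enc_binary(v.encode("utf-8"))

def enc_struct(*fields: bytes) -> bytes:
    """Concatenated fields terminated by the 0x00 stop byte."""
    return b"".join(fields) + STOP

def enc_list(etype: int, items: list[bytes]) -> bytes:
    """Element type, 4-byte big-endian count, then the already-encoded elements."""
    return struct.pack(">bi", etype, len(items)) + b"".join(items)

# Hypothetical struct { 1: string, 2: i64, 3: bool }, for illustration only.
example = enc_struct(
    field(T_STRING, 1, enc_string("hi")),
    field(T_I64, 2, enc_i64(7)),
    field(T_BOOL, 3, enc_bool(True)),
)
```

A nested struct is passed as field(T_STRUCT, id, enc_struct(...)), which is also how a union arm would be written if unions follow the usual Thrift convention of a struct with one field set.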

Transport

WebSocket

wss://chat-ws.x.com/ws?token=<JWT>. Authentication is the ?token= query parameter only: no subprotocol, headers, or cookies on the browser socket. The token is a short-lived JWT from GenerateXChatTokenMutation (client.xchat.token()), cached and re-validated by its exp claim. binaryType is arraybuffer; there is no application-level hello frame. The frame boundary is the message boundary (no opcode or length prefix). Heartbeat: an empty KeepAliveInstruction every 30s. Reconnect: exponential 2^n backoff, give up after 11 attempts; close 1011 (unauthorized) is treated as a 401 and refetches the token.
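Two pieces of this are easy to get wrong: deciding when a cached token is still usable, and the reconnect schedule. The sketch below shows one way to do both. It reads exp without verifying the JWT signature, and the 30-second safety margin, the 1-second base delay, and starting the exponent at 0 are assumptions; the source only states "2^n backoff" and "11 attempts".

```python
import base64
import json
import time
from typing import Optional

MAX_ATTEMPTS = 11  # from the text: give up after 11 attempts

def jwt_exp(token: str) -> int:
    """Read the exp claim from a JWT payload (no signature verification)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return int(json.loads(base64.urlsafe_b64decode(payload_b64))["exp"])

def token_is_fresh(token: str, now: Optional[float] = None, margin_s: int = 30) -> bool:
    """True while the token has more than margin_s seconds left (margin is an assumption)."""
    now = time.time() if now is None else now
    return jwt_exp(token) - margin_s > now

def backoff_delays(base_s: float = 1.0) -> list[float]:
    """Delay before each reconnect attempt: base * 2^n for n = 0..10 (base unit assumed)."""
    return [base_s * 2 ** n for n in range(MAX_ATTEMPTS)]
```

A client would refetch the token whenever token_is_fresh returns False, and also when the socket closes with the unauthorized code described above.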

GraphQL

POST https://api.x.com/graphql/<operationId>/<OperationName> with Bearer + ct0 cookie auth (the same session as the rest of emusks). The body is { operationName, variables, query, queryId } (the client sends the full query document, so a persisted-query hash isn't required). Every field takes a safety_level: argument (commonly XChat). A bare POST /graphql 404s; the /<id>/<name> path is required.
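As a rough illustration of the request shape, the sketch below builds the URL and JSON body for one operation. It assumes queryId carries the same value as the path's operation id, which the source does not state explicitly; the query document and variables are placeholders, and the auth headers are left out.

```python
import json

GRAPHQL_BASE = "https://api.x.com/graphql"

def build_graphql_request(operation_id: str, operation_name: str,
                          query: str, variables: dict) -> tuple[str, str]:
    """Return (url, json_body) for one GraphQL POST."""
    url = f"{GRAPHQL_BASE}/{operation_id}/{operation_name}"
    body = {
        "operationName": operation_name,
        "variables": variables,
        "query": query,            # full query document, per the text
        "queryId": operation_id,   # assumption: same id as in the path
    }
    return url, json.dumps(body)

# Placeholder query text and empty variables, for illustration only.
url, body = build_graphql_request(
    "TWRPP7gnKwV_R8-tE-Dd3Q",
    "SendMessageCreateMutation",
    "mutation SendMessageCreateMutation { placeholder }",
    {},
)
```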

History backfill

POST .../ChatService/GetMessageEventsPage over application/grpc-web+proto with Authorization: Bearer <jwt>, for paged history beyond what the socket streams. The same data is reachable via the GetMessageEventsPageQuery GraphQL op.
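The source does not describe the body framing, but application/grpc-web+proto normally wraps each message in a 5-byte prefix: one flag byte and a 4-byte big-endian length, with trailers sent as a frame whose flag has the high bit set. A minimal sketch of that standard framing follows; the request message schema is not documented here, so the payload is treated as opaque bytes.

```python
import struct

TRAILER_FLAG = 0x80  # gRPC-web marks the trailers frame with the high bit

def grpc_web_frame(message: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the flag byte and its big-endian length."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message

def grpc_web_unframe(data: bytes) -> list[tuple[int, bytes]]:
    """Split a response body into (flags, payload) frames."""
    frames = []
    pos = 0
    while pos < len(data):
        flags, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        frames.append((flags, data[pos:pos + length]))
        pos += length
    return frames
```

Frames whose flags include TRAILER_FLAG carry the grpc-status trailers as text rather than a message.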

Keys

Every device holds two P-256 keypairs (all asymmetric crypto is P-256 over WebCrypto; there is no Ed25519):

| Key | Usage | Purpose |
| --- | --- | --- |
| Identity | ECDH (deriveBits) | wrapping/unwrapping conversation keys |
| Signing | ECDSA (sign/verify) | authenticating message events and admin actions |

Both are versioned together (NumericString version). Public keys are exportKey("spki") base64 (91 bytes, starting MFkwEwYHKoZIzj0CAQYI). ECDSA signatures are raw r || s (64 bytes).
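The 91-byte figure and the MFkwEwYHKoZIzj0CAQYI prefix follow from the fixed DER header that precedes every uncompressed P-256 point. The sketch below shows the byte layout only (no curve arithmetic): it wraps a raw 65-byte point in that header and splits a raw signature into its two integers. The function names are illustrative.

```python
import base64

# Fixed 26-byte SubjectPublicKeyInfo header for an uncompressed P-256 key.
P256_SPKI_HEADER = bytes.fromhex(
    "3059301306072a8648ce3d020106082a8648ce3d030107034200"
)

def raw_point_to_spki_b64(raw65: bytes) -> str:
    """Wrap a raw uncompressed point (0x04 || X || Y) as base64 SPKI."""
    if len(raw65) != 65 or raw65[0] != 0x04:
        raise ValueError("expected a 65-byte uncompressed P-256 point")
    return base64.b64encode(P256_SPKI_HEADER + raw65).decode("ascii")

def split_signature(sig: bytes) -> tuple[int, int]:
    """Split a raw 64-byte r || s signature into the integers (r, s)."""
    if len(sig) != 64:
        raise ValueError("expected a 64-byte raw signature")
    return int.from_bytes(sig[:32], "big"), int.from_bytes(sig[32:], "big")
```

Libraries that emit DER-encoded ECDSA signatures would need converting to this raw form before use.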

Registration

AddXChatPublicKeyMutation publishes XChatPublicKeyInput { public_key, signing_public_key, identity_public_key_signature, registration_method } with generate_version: true (the server assigns the version). identity_public_key_signature = ECDSA-P256-SHA256 over the SPKI bytes of the identity key, signed with the signing key. registration_method is CustomPin, ManagedPin, or SelfCustody.

PIN backup (Juicebox)

The private keys are backed up to PIN-protected realms using Juicebox: register/recover the secret hardened by a numeric PIN (Argon2/OPRF, Standard2019), split across realms with a 20-guess limit. Thresholds: register to all 3, recover with any 2. The realm crypto lives in juicebox-sdk_bg-*.wasm. The secret is the two P-256 private scalars (identityScalar(32) || signingScalar(32)). Realm config + per-realm bearer tokens come from the KeyStoreTokenMap (the token_map returned by AddXChatPublicKey/GetPublicKeys); the wasm requests each realm's token through a JuiceboxGetAuthToken(realmId) callback, where realmId is the realm id bytes (hex-encoded in the token map).
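The backed-up secret is a plain concatenation, so packing and unpacking it is a matter of slicing. A small sketch, with illustrative function names; the Juicebox register/recover calls themselves are not shown.

```python
SCALAR_LEN = 32  # each P-256 private scalar is 32 bytes

def pack_backup_secret(identity_scalar: bytes, signing_scalar: bytes) -> bytes:
    """identityScalar(32) || signingScalar(32), as described above."""
    if len(identity_scalar) != SCALAR_LEN or len(signing_scalar) != SCALAR_LEN:
        raise ValueError("both scalars must be 32 bytes")
    return identity_scalar + signing_scalar

def unpack_backup_secret(secret: bytes) -> tuple[bytes, bytes]:
    """Split a recovered 64-byte secret back into (identity, signing) scalars."""
    if len(secret) != 2 * SCALAR_LEN:
        raise ValueError("expected a 64-byte secret")
    return secret[:SCALAR_LEN], secret[SCALAR_LEN:]

def realm_id_from_hex(hex_id: str) -> bytes:
    """Token-map keys are hex; the auth-token callback receives the raw id bytes."""
    return bytes.fromhex(hex_id)
```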

emusks' createIdentity({ pin }) performs this Juicebox register by default (using a bundled build of the SDK wasm), exactly like the app; { selfCustody: true } skips the backup.

Encryption

Conversation key (cKey)

Each conversation has one symmetric key, the cKey, identified by a conversation_key_version. Message bodies are encrypted with it; the cKey itself is wrapped to each member.

Wrapping the cKey (P-256 ECIES)

Per recipient, the wrapped key is built as:

  1. Generate an ephemeral P-256 ECDH keypair.
  2. Z = deriveBits(ephemeralPriv, recipientIdentityPub, 256) (32 bytes).
  3. out = SHA-256(Z || 0x00000001 || ephemeralRawPub65) (ConcatKDF, single block).
  4. aesKey = out[0:16] (AES-128), iv = out[16:32].
  5. ct = AES-GCM(aesKey, iv, cKey) (no AAD, 128-bit tag).
  6. encrypted_conversation_key = base64(ephemeralRawPub65 || ct) (113 bytes for a 32-byte cKey). The IV is re-derived on unwrap, not transmitted.

Each wrapped copy is sent as ApiConversationParticipantKeyInput { user_id, encrypted_conversation_key, public_key_version }.
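Steps 3, 4, and 6 need nothing beyond a hash and slicing, so they can be sketched with the standard library. The ECDH in step 2 and the AES-GCM in step 5 need a crypto library and are left out; Z is taken as an input here. Function names are illustrative.

```python
import base64
import hashlib

def derive_wrap_key(z: bytes, ephemeral_raw_pub: bytes) -> tuple[bytes, bytes]:
    """Single-block ConcatKDF: SHA-256(Z || 0x00000001 || ephemeralRawPub65).

    Returns (aes_key, iv): the first 16 bytes and the last 16 bytes of the digest.
    """
    if len(z) != 32 or len(ephemeral_raw_pub) != 65:
        raise ValueError("expected 32-byte Z and 65-byte raw public key")
    out = hashlib.sha256(z + (1).to_bytes(4, "big") + ephemeral_raw_pub).digest()
    return out[:16], out[16:32]

def split_wrapped_key(encrypted_conversation_key_b64: str) -> tuple[bytes, bytes]:
    """Split base64(ephemeralRawPub65 || ct) into (ephemeral_pub, ciphertext_with_tag)."""
    blob = base64.b64decode(encrypted_conversation_key_b64)
    if len(blob) <= 65:
        raise ValueError("wrapped key too short")
    return blob[:65], blob[65:]
```

On unwrap, the recipient runs ECDH with its identity private key and the ephemeral public key from split_wrapped_key, then calls derive_wrap_key to recover the same key and IV.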

Message body cipher

libsodium crypto_secretbox_easy (XSalsa20-Poly1305) under the cKey: frame = nonce(24) || crypto_secretbox_easy(plaintext, nonce, cKey). The plaintext is the serialized MessageEntryHolder.
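Before decrypting, a received frame has to be split back into its nonce and boxed ciphertext. A minimal sketch of that split, using libsodium's constant sizes; the decryption call itself depends on which libsodium binding is used and is not shown.

```python
NONCE_LEN = 24  # crypto_secretbox_NONCEBYTES
MAC_LEN = 16    # crypto_secretbox_MACBYTES (Poly1305 tag inside the boxed output)

def split_frame(frame: bytes) -> tuple[bytes, bytes]:
    """Split nonce(24) || secretbox output into (nonce, boxed_ciphertext)."""
    if len(frame) < NONCE_LEN + MAC_LEN:
        raise ValueError("frame shorter than nonce plus MAC")
    return frame[:NONCE_LEN], frame[NONCE_LEN:]

def plaintext_len(frame: bytes) -> int:
    """Length of the serialized MessageEntryHolder carried by this frame."""
    _, boxed = split_frame(frame)
    return len(boxed) - MAC_LEN
```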

Media cipher

crypto_secretstream_xchacha20poly1305, chunked at 1024-byte plaintext blocks, keyed by the cKey. The 24-byte stream header is prepended to the ciphertext in TON storage. The key is never sent with the attachment; only media_hash_key travels in the message.
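The stored size of an encrypted attachment can be predicted from the chunking. The sketch below assumes each 1024-byte plaintext block is pushed as one secretstream message (which adds libsodium's 17-byte per-message overhead) and that an empty file still produces one final message; neither detail is stated in the source.

```python
HEADER_LEN = 24   # crypto_secretstream_xchacha20poly1305_HEADERBYTES
ABYTES = 17       # crypto_secretstream_xchacha20poly1305_ABYTES, per message
CHUNK = 1024      # plaintext block size from the text

def encrypted_media_size(plaintext_len: int) -> int:
    """Bytes stored in TON: header plus every chunk with its per-message overhead."""
    chunks = max(1, -(-plaintext_len // CHUNK))  # ceiling division, at least one
    return HEADER_LEN + plaintext_len + chunks * ABYTES

def split_stored_media(blob: bytes) -> tuple[bytes, list[bytes]]:
    """Split stored bytes into (stream_header, ciphertext_chunks)."""
    header, body = blob[:HEADER_LEN], blob[HEADER_LEN:]
    step = CHUNK + ABYTES  # a full ciphertext chunk is 1041 bytes
    return header, [body[i:i + step] for i in range(0, len(body), step)]
```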

Event signatures

Every message event is ECDSA-P256-SHA256 signed by the sender's signing key over a canonical comma-joined string. The live client uses signature_version = 7:

payload = ["MessageCreateEvent", conversation_token, senderUserId, conversationId, conversation_key_version, base64url(frame)].join(",")

conversation_token is the conversation's server-issued JWT (from MessageEvent field 5; empty only for the very first message of a brand-new conversation). Getting the version, the conversation_token, or the struct shape below wrong makes the server accept the call but silently drop the message (it never reaches the recipient). The signature is wrapped in a Thrift MessageEventSignature struct (with signing_public_key set, and a key-info list of just the sender), base64'd into encoded_message_event_signature:

| Field | Name |
| --- | --- |
| 1 | signature (base64url of raw r‖s) |
| 2 | public_key_version |
| 3 | signature_version |
| 4 | signing_public_key |
| 5 | message_signing_key_info_list (list of MessageSigningKeyInfo { member_id, public_key_version, signing_public_key }) |

The GraphQL mirror is XChatMessageEventSignatureInput. Admin/mutation actions (delete-for-everyone, mute, TTL, group ops) require an ActionSignatureInput of the same shape over an action-specific canonical string.
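For the signature_version = 7 payload, the string to sign can be assembled as in the sketch below. It assumes base64url without padding and ids rendered as decimal strings; the source specifies neither, and since a mismatch is dropped silently, both should be checked against a captured message. The ECDSA signing step is not shown.

```python
import base64

def b64url(data: bytes) -> str:
    """base64url without '=' padding (padding behaviour is an assumption)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def signing_payload_v7(conversation_token: str, sender_user_id, conversation_id,
                       conversation_key_version, frame: bytes) -> str:
    """Canonical comma-joined string for a MessageCreateEvent."""
    return ",".join([
        "MessageCreateEvent",
        conversation_token,          # empty string for the first message of a new conversation
        str(sender_user_id),
        str(conversation_id),
        str(conversation_key_version),
        b64url(frame),               # frame = nonce || secretbox ciphertext
    ])

# Hypothetical values, for illustration only.
payload = signing_payload_v7("tok", 1, "1-2", 123, b"\x00\x01\x02")
```

The bytes passed to ECDSA-P256-SHA256 would then be payload.encode("utf-8"), assuming UTF-8 encoding.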

Groups

Group conversations derive the cKey from a TreeKEM/ART ratcheting tree (GroupKeysMgr), not sender keys. Membership changes rotate the root and re-encrypt node secrets to the affected members; the change rides on ConversationKeyChangeEvent { conversation_key_version, conversation_participant_keys, ratchet_tree_change, for_key_rotation }.

Franking & Grok

Each message carries FrankingData { franking_tag, encrypted_nonce, encrypted_media_hashes } so abuse reports verify against a message the server can't read (ReportFrankedMessageMutation). Messages to Grok are plaintext (XChatSendGrokMessagePlaintextMutation), outside the E2E boundary.

Events & messages

WebSocket envelope

Top-level Message union: 1 messageEvent, 2 messageInstruction, 3 batchedMessageEvents.

MessageEvent: 1 sequence_id, 2 message_id, 3 sender_id, 4 conversation_id, 5 conversation_token, 6 created_at_msec, 7 detail, 8 relay_source, 9 message_event_signature, 10 previous_sequence_id, 11 is_trusted.

MessageEventDetail union: 1 messageCreateEvent, 3 conversationKeyChangeEvent, 4 groupChangeEvent, 5 messageFailureEvent, 6 messageTypingEvent, 7 messageDeleteEvent, 8 conversationDeleteEvent, 9 conversationMetadataChangeEvent, 10 grokSearchResponseEvent, 12 markConversationReadEvent, 13 markConversationUnreadEvent, 14 memberAccountDeleteEvent, 15 grokMessageEvent, 16 grokResponseEvent.

MessageInstruction union: 1 pullMessages, 2 keepAlive, 3 pullMessagesFinished, 4 pinReminder, 5 switchToHybridPull, 6 displayTemporaryPasscode, 7 deviceEnrollment.
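To route an incoming frame, a client only needs to know which union arm is set, not the full schema. The sketch below walks TBinaryProtocol field headers and skips values generically. It assumes unions are encoded as structs with a single field set (the usual Thrift convention) and handles only the type codes listed under Wire format.

```python
import struct

T_BOOL, T_I32, T_I64, T_STRING, T_STRUCT, T_LIST = 2, 8, 10, 11, 12, 15
FIXED = {T_BOOL: 1, T_I32: 4, T_I64: 8}
MESSAGE_ARMS = {1: "messageEvent", 2: "messageInstruction", 3: "batchedMessageEvents"}
INSTRUCTION_ARMS = {1: "pullMessages", 2: "keepAlive", 3: "pullMessagesFinished",
                    4: "pinReminder", 5: "switchToHybridPull",
                    6: "displayTemporaryPasscode", 7: "deviceEnrollment"}

def skip(buf: bytes, pos: int, ttype: int) -> int:
    """Return the offset just past one value of the given type."""
    if ttype in FIXED:
        return pos + FIXED[ttype]
    if ttype == T_STRING:
        (n,) = struct.unpack_from(">i", buf, pos)
        return pos + 4 + n
    if ttype == T_STRUCT:
        while True:
            ftype = buf[pos]
            pos += 1
            if ftype == 0:           # stop byte ends the struct
                return pos
            pos = skip(buf, pos + 2, ftype)  # +2 skips the field id
    if ttype == T_LIST:
        etype, count = struct.unpack_from(">bi", buf, pos)
        pos += 5
        for _ in range(count):
            pos = skip(buf, pos, etype)
        return pos
    raise ValueError(f"unhandled thrift type {ttype}")

def read_fields(buf: bytes, pos: int = 0) -> dict:
    """Map field id -> (type, value_start, value_end) for the struct at pos."""
    fields = {}
    while True:
        ftype = buf[pos]
        pos += 1
        if ftype == 0:
            return fields
        (fid,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        end = skip(buf, pos, ftype)
        fields[fid] = (ftype, pos, end)
        pos = end

def classify(frame: bytes) -> str:
    """Name the top-level arm, and the instruction arm when there is one."""
    fields = read_fields(frame)
    if not fields:
        raise ValueError("empty union")
    fid, (ftype, start, _) = next(iter(fields.items()))
    name = MESSAGE_ARMS.get(fid, f"unknown({fid})")
    if name == "messageInstruction" and ftype == T_STRUCT:
        inner = read_fields(frame, start)
        if inner:
            iid = next(iter(inner))
            return name + "." + INSTRUCTION_ARMS.get(iid, f"unknown({iid})")
    return name
```

Under these assumptions, a heartbeat carrying an empty KeepAliveInstruction would be the nine bytes 0c00020c0002000000, which classify reports as messageInstruction.keepAlive.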

Sending: MessageCreateEvent

The encrypted body goes in MessageCreateEvent: 100 contents (the nonce‖ciphertext), 101 conversation_key_version, 102 should_notify, 103 ttl_msec, 104 delivered_at_msec, 105 is_pending_public_key, 106 priority, 107 additional_action_list, 108 franking_data, 109 is_message_request. Base64'd into encoded_message_create_event, sent via SendMessageCreateMutation.

Content variants

The plaintext is a MessageEntryHolder { 1: MessageEntryContents }, where MessageEntryContents is a union naming the action:

| Field | Variant | Payload |
| --- | --- | --- |
| 1 | message | MessageContents |
| 2 | reaction_add | MessageReactionAdd { message_sequence_id, emoji, message_attachment_id? } |
| 3 | reaction_remove | same shape as reaction_add |
| 4 | message_edit | MessageEdit { message_sequence_id, updated_text, entities } |
| 5 | mark_conversation_read | MarkConversationRead { seen_until_sequence_id, seen_at_millis } |
| 6 | mark_conversation_unread | MarkConversationUnread { seen_until_sequence_id } |
| 7 / 8 | pin_conversation / unpin_conversation | { conversation_id } |
| 9 | screen_capture_detected | { type } (DmScreenCaptureType: Unknown/Screenshot/Recording) |
| 10 / 11 / 16 | av_call_ended / av_call_missed / av_call_started | call lifecycle |
| 12 | draft_message | draft sync |
| 13 | accept_message_request | empty |
| 14 | nickname_message | { user_id (int64), nickname_text } |
| 15 | set_verified_status | |

MessageContents: 1 message_text, 2 entities, 3 attachments, 4 replying_to_preview, 6 forwarded_message, 7 sent_from, 8 quick_reply, 9 ctas, 10 additional_fields. RichTextEntity { start_index, end_index, content } where content is an empty type marker (hashtag=1, cashtag=2, mention=3, url=4, email=5, address=6, phoneNumber=7).

Media & attachments

MessageAttachment union: 1 media, 2 post, 3 url, 4 unified_card, 5 money, 6 jetfuel. MediaAttachment: 1 media_hash_key, 2 dimensions, 3 type, 4 duration_millis, 5 filesize_bytes, 6 filename, 7 attachment_id, 8 legacy_media_url_https, 9 legacy_media_preview_url, 10 grok_tag. MediaType: IMAGE=1, GIF=2, VIDEO=3, AUDIO=4, FILE=5, SVG=6.

Upload is InitializeXChatMediaUploadMutation → push the encrypted stream (private: resumable PUT to ton.x.com/.../xchat_media/<media_hash_key>/...?concurrent=true&resumeId=..&partNumber=..; public: the upload.x.com/1.1/media/upload.json command flow) → FinalizeXChatMediaUploadMutation, then attach a MediaAttachment to a message. GIFs are Giphy bytes (GifSearchQuery) downloaded and re-uploaded as MediaType.GIF. Link cards resolve via GetCardPreviewOrJetfuelFromUrlQuery.

Groups

InitializeGroupConversationMutation allocates a conversation id, then CreateGroupConversationMutation creates it with admin_user_ids, member_user_ids, conversation_key_version, conversation_participant_keys, base64_encoded_key_rotation, and action_signatures. Membership changes (AddGroupMembersMutation, RemoveFromGroupMutation, AddAsAdminMutation, …) re-key, so they carry the same three key fields. Invite links: EnableGroupInviteMutation → XChatGroupInviteDetails { invite_url, token, … }; join via GroupInviteDetailsQuery + RequestToJoinGroupMutation.

Calls

Audio/video is real WebRTC over Periscope infrastructure. In the official web client this all lives behind the useAvcallingSetup hook (the useAvcallingSetup-*.js bundle: ProxseeApi, GuestServiceApi, the Janus client, setupPeerConnection, the AvCallE2eePipelines). The high-level API is in Audio & video calls; this is what runs underneath.

Two transports

There are two signaling paths:

  • 1:1 calls use a Periscope P2P mesh. The caller creates a p2p/broadcast, and both sides exchange WebRTC OFFER / ANSWER / CANDIDATE envelopes over the guest-service relay (POST guest-cf.pscp.tv/api/v1/signaling/send plus a cursor-based long-poll signaling/receive). The caller is the impolite offerer; on track setup each side sends a MEDIA_STATUS signal, and the host publishes then offers when it sees one. Candidates are trickled as { id: sdpMid, label: sdpMLineIndex, candidate }.
  • Group calls use a Janus SFU (VideoRoom plugin). proxsee.pscp.tv/api/v2/createBroadcast returns the Janus gateway URL plus a JWT credential. The host creates a session, attaches two videoroom handles (publisher + subscriber), creates the room (audiocodec: opus, videocodec: h264, h264_profile: 42e01f, dummy_publisher: true), joins as publisher, and publishes a JSEP offer; publishBroadcast then ties the Janus publisher/handle/session ids to the broadcast to go live. Participants join via getAudiospace → audiospace/join → audiospace/stream/negotiate, then subscribe to publisher feeds. Janus messages are POST {gw}/{session}/{handle} with { janus: "message", body: { room, periscope_user_id, ... }, jsep? }, the JWT in the Authorization header (no Bearer); events arrive via a ?maxev=1 long-poll.
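The two message shapes above can be sketched as plain builders. The transaction field is not mentioned in the source; it is added here because the stock Janus HTTP API expects one on every request. The gateway URL, ids, and body values in the example are hypothetical.

```python
import json
import uuid
from typing import Optional

def candidate_envelope(sdp_mid: str, sdp_mline_index: int, candidate: str) -> dict:
    """Trickled ICE candidate for the 1:1 path: { id, label, candidate }."""
    return {"id": sdp_mid, "label": sdp_mline_index, "candidate": candidate}

def janus_message(gateway: str, session_id: int, handle_id: int, jwt: str,
                  body: dict, jsep: Optional[dict] = None) -> tuple[str, dict, str]:
    """Return (url, headers, json_payload) for POST {gw}/{session}/{handle}."""
    url = f"{gateway.rstrip('/')}/{session_id}/{handle_id}"
    payload = {
        "janus": "message",
        "transaction": uuid.uuid4().hex,  # assumption: standard Janus request id
        "body": body,
    }
    if jsep is not None:
        payload["jsep"] = jsep
    headers = {
        "Authorization": jwt,  # raw JWT, no "Bearer " prefix, per the text
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(payload)

# Hypothetical gateway and ids, for illustration only.
url, headers, payload = janus_message(
    "https://gw.example/janus/", 111, 222, "jwt-value",
    {"room": 1234, "periscope_user_id": "u1"},
)
```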

Bootstrap

Both paths start the same way: GraphQL useDirectCallSetupQuery (zCYojd6h_gVXYjFlaAk4bA) returns authenticate_periscope (a JWT) → POST proxsee.pscp.tv/api/v2/loginTwitterToken for a Periscope cookie → authorizeToken { service: "guest" } for the guest-service bearer → turnServers (turn-p2p.pscp.tv:3478/udp, turns:443/tcp) for ICE. This whole bootstrap is verified working with a normal account session.

Gateway architecture & codecs

Two practical findings from reimplementing this headless:

  • The group gateway (gw-prod-*.pscp.tv) is the Twitter Spaces mixing backend, not a plain peer VideoRoom. Each session gets an isolated room view (its own publisher plus a "Dummy publisher" placeholder); participants don't see each other's Janus feeds directly, so peer media is brokered by the Spaces backend rather than exchanged feed-to-feed. The host reliably connects to the SFU and goes live; full cross-participant media is the Spaces layer's job.
  • The app pins H.264 (videocodec: h264, profile 42e01f) and Opus. Headless WebRTC engines like @roamhq/wrtc only do VP8/VP9/AV1, so video interop with the official client needs an H.264-capable engine; between two emusks clients you can create the room as VP8 (videoCodec option on startGroupCall).

Lifecycle in the message channel

Call lifecycle also surfaces in the conversation as content variants (see Content variants): AVCallStarted { is_audio_only, broadcast_id }, AVCallEnded { sent_at_millis, duration_seconds, is_audio_only, broadcast_id }, AVCallMissed { sent_at_millis, is_audio_only }. Incoming calls are also pushed over LivePipeline (/avcall/create/{userId}).

Media encryption

Media frames go through WebRTC encoded-insertable-streams. When enabled it is AES-GCM-256 with a key derived (HKDF) from conversation key material; the OFFER/ANSWER advertise an encryption_info { fingerprint, version } and both sides verify the fingerprint match. Passthrough (no media encryption) is a first-class fallback and is what runs when no encryption_info is exchanged.

Permissions

DmAvPermissionsQuery (client.xchat.callPermissions()) reports { can_dm, error_code } per recipient; settings live in av_settings (UpdateDmSettingsMutation). Note that 1:1 calling is additionally gated server-side by X's rollout.

GraphQL operations

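POST https://api.x.com/graphql/<id>/<OperationName>. The id is the x-apollo-operation-id. Because the full query document is sent in the body, the server does not need the id to resolve a persisted query, but the /<id>/<OperationName> path must still be present.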

| Operation | Operation id | Backend field |
| --- | --- | --- |
| SendMessageCreateMutation | TWRPP7gnKwV_R8-tE-Dd3Q | xchat_send_create_message_event |
| SendMessageEventMutation | G7WwJGKvTBVb-BXhZNSVMw | xchat_send_message_event |
| DeleteMessageMutation | 4gsDQKEmYkOtvsSIpHXdQA | xchat_delete_messages |
| AddEncryptedConversationKeysMutation | 4V1KC8ue2tHHvRuIzeczdg | xchat_add_encrypted_conversation_key |
| AddXChatPublicKeyMutation | CQsk6GRuWAVabyXqqEG1sA | user_add_public_key |
| DeleteXChatPublicKeyMutation | W5iiIL1MVw4vomq-zLPHUQ | user_delete_public_key |
| GetPublicKeys | GJQbOZALDO5D3Zp2IZhH6w | user_results_by_rest_ids |
| GenerateXChatTokenMutation | Qh3fZRjPPtPoHYR_2sCZsA | user_get_x_chat_auth_token |
| MuteConversationMutation | 6iDsxSkhGLvdiJpqtAtzTQ | xchat_mute_conversation |
| UnmuteConversationMutation | _f8wd8RlQCCysv8yMKeiaw | xchat_unmute_conversation |
| UpdateConversationTTLMutation | Gu3kCEwNN2V-Az8NDk30Zg | xchat_update_conversation_message_duration |
| RemoveConversationTTLMutation | EqSXvxskUyw99ARuIbhYlg | xchat_remove_conversation_message_duration |
| AcceptMessageRequestMutation | 4YtAUhUwROL6ejia63Lj6Q | user_add_trust |
| GetInboxPageRequestQuery | dVXHY3CBFIw_Gi6eaAum-w | get_inbox_page |
| GetConversationPageQuery | IVlXls9JTnbgQ1gxsGAfJA | get_conversation_page |
| GetMessageEventsPageQuery | OaSNyAhxUZ9AaW2z9cC26A | get_message_events_page |
| GetMessageRequestsPageQuery | B4ibdNFzMv5MBhhxk3CyKw | get_message_requests_page |
| DmAvPermissionsQuery | kfX5AHDKZrivyHwCaz68mQ | get_av_permissions |
| GifSearchQuery | ciUL4BnRPKal2uL1fL2aHw | gif_search_slice |
| InitializeXChatMediaUploadMutation | vTsSDEpF4eVYbR-waSl37g | xchat_initialize_media_upload |
| FinalizeXChatMediaUploadMutation | P1CLOMdiMe9ii1MdIJbhcQ | xchat_finalize_media_upload |
| CreateGroupConversationMutation | dKl4aC-sBqQWgRhkQXV2wg | xchat_create_group |
| EnableGroupInviteMutation | WlxTMdzK_uh-miHVHXv15g | xchat_enable_group_invite |
| RequestToJoinGroupMutation | cV0VjzT5UDJW3cbcvALYOg | xchat_request_join_group |

The full set of ~67 operations is available through client.xchat.gql(name, variables).

not affiliated with X Corp.