Protocol
This is the internals reference for XChat, reverse-engineered from the chat.x.com client. You do not need any of it to use client.xchat (start at Setup & identity); it is here if you want to understand how XChat works or build something lower-level. Everything below is descriptive, not API. Use it mostly for feeding it to your clankers.
Architecture
XChat is X's end-to-end encrypted chat, served from chat.x.com, a separate system from the legacy /1.1/dm API. The web client is Kotlin Multiplatform compiled to JavaScript; local state is an encrypted SQLite database over OPFS; key backup is a wasm-bindgen Juicebox SDK; calls are WebRTC against a Janus SFU.
| Surface | Endpoint |
|---|---|
| Realtime | wss://chat-ws.x.com/ws?token=<JWT> |
| GraphQL | https://api.x.com/graphql/<id>/<OperationName> |
| History (gRPC-web) | https://api.x.com/xai.chat_service.v1.ChatService/GetMessageEventsPage |
| Encrypted media | https://ton.x.com/i/ton/data/xchat_media/ |
| Media upload (public) | https://upload.x.com/1.1/media/upload.json |
| Key-backup realms | realm-b.x.com, realm-east1.x.com, realm-west1.x.com |
| Group invite link | https://x.com/i/chat/group_join/<token> |
Wire format
Every binary payload (WebSocket frames and the base64 blobs in GraphQL such as encoded_message_create_event) is Apache Thrift TBinaryProtocol, schema com.x.dmv2.thriftjava.*. It is not protobuf, although the field-numbered tables below look similar. Encoding it as protobuf yields a StratoThriftDeserializationException.
TBinaryProtocol, by hand:
- Struct: a sequence of fields, terminated by a 0x00 stop byte.
- Field header: [type (1 byte)] [field id (2 bytes, big-endian)], then the value.
- Types: 2 bool, 8 i32, 10 i64, 11 string/binary, 12 struct, 15 list.
- string / binary: [length (4-byte big-endian)] [bytes]. Booleans inline as one byte; i32/i64 are fixed-width big-endian.
- list: [element type (1 byte)] [count (4-byte big-endian)] [elements].
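The rules above are small enough to write out by hand. The following is a minimal sketch of an encoder for just the listed types; the function names and the (field_id, type, value) tuple layout are illustrative choices, not anything taken from the client.

```python
import struct

# Thrift type ids, as listed above.
T_BOOL, T_I32, T_I64, T_STRING, T_STRUCT, T_LIST = 2, 8, 10, 11, 12, 15


def encode_value(ttype, value):
    """Encode one value without its field header."""
    if ttype == T_BOOL:
        return b"\x01" if value else b"\x00"
    if ttype == T_I32:
        return struct.pack(">i", value)
    if ttype == T_I64:
        return struct.pack(">q", value)
    if ttype == T_STRING:
        data = value.encode("utf-8") if isinstance(value, str) else bytes(value)
        return struct.pack(">i", len(data)) + data
    if ttype == T_STRUCT:
        return encode_struct(value)
    if ttype == T_LIST:
        elem_type, items = value  # lists are passed as (element type, items)
        body = b"".join(encode_value(elem_type, item) for item in items)
        return struct.pack(">bi", elem_type, len(items)) + body
    raise ValueError(f"unsupported thrift type {ttype}")


def encode_struct(fields):
    """Encode an iterable of (field_id, ttype, value) tuples as a struct."""
    out = bytearray()
    for field_id, ttype, value in fields:
        out += struct.pack(">bh", ttype, field_id)  # type byte, then i16 field id
        out += encode_value(ttype, value)
    out.append(0x00)  # stop byte
    return bytes(out)
```

For example, encode_struct([(1, T_I32, 5)]) gives the eight bytes 08 00 01 00 00 00 05 00: the type, the field id, the value, and the stop byte.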
Transport
WebSocket
The socket is wss://chat-ws.x.com/ws?token=<JWT>. Authentication is the ?token= query parameter only: the browser socket uses no subprotocol, headers, or cookies. The token is a short-lived JWT from GenerateXChatTokenMutation (client.xchat.token()), cached and re-validated by its exp claim. binaryType is arraybuffer; there is no application-level hello frame. The frame boundary is the message boundary (no opcode or length prefix). Heartbeat: an empty KeepAliveInstruction every 30s. Reconnect: exponential 2^n backoff, giving up after 11 attempts; close code 1011 (nominally "internal error" in RFC 6455, used here for unauthorized) is treated as a 401 and triggers a token refetch.
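A rough sketch of the two bookkeeping pieces described here: re-validating a cached token by its exp claim, and the reconnect schedule. The 30-second safety margin, the 1-second base delay, and starting the exponent at 0 are assumptions made for illustration; the source only states "2^n" and "11 attempts".

```python
import base64
import json
import time


def jwt_expiry(token):
    """Read the exp claim (seconds since epoch) without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))["exp"]


def token_is_fresh(token, now=None, skew_seconds=30):
    """True while the cached token is still usable (skew_seconds is an assumed margin)."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - skew_seconds > now


def backoff_delays(max_attempts=11, base_seconds=1.0):
    """Delay before each reconnect attempt: base * 2^n for n = 0..max_attempts-1."""
    return [base_seconds * 2 ** n for n in range(max_attempts)]
```

A client would refetch the token whenever token_is_fresh returns False or the socket closes with 1011, and stop reconnecting once the delay list is exhausted.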
GraphQL
POST https://api.x.com/graphql/<operationId>/<OperationName> with Bearer + ct0 cookie auth (the same session as the rest of emusks). The body is { operationName, variables, query, queryId } (the client sends the full query document, so a persisted-query hash isn't required). Every field takes a safety_level: argument (commonly XChat). A bare POST /graphql 404s; the /<id>/<name> path is required.
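As a sketch, the request shape can be assembled like this. Reusing the operation id as queryId is an assumption, the query string is a placeholder rather than a real XChat document, and authentication (Bearer plus the ct0 cookie) is left to whatever HTTP session sends the request.

```python
import json


def build_graphql_request(operation_id, operation_name, query, variables):
    """Return (url, json_body) for one GraphQL operation."""
    url = f"https://api.x.com/graphql/{operation_id}/{operation_name}"
    body = {
        "operationName": operation_name,
        "variables": variables,
        "query": query,
        "queryId": operation_id,  # assumption: same value as the id in the path
    }
    return url, json.dumps(body)


# Placeholder query text, for illustration only.
url, body = build_graphql_request(
    "Qh3fZRjPPtPoHYR_2sCZsA",
    "GenerateXChatTokenMutation",
    "mutation GenerateXChatTokenMutation { __typename }",
    {},
)
```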
History backfill
POST .../ChatService/GetMessageEventsPage over application/grpc-web+proto with Authorization: Bearer <jwt>, for paged history beyond what the socket streams. The same data is reachable via the GetMessageEventsPageQuery GraphQL op.
Keys
Every device holds two P-256 keypairs (all asymmetric crypto is P-256 over WebCrypto; there is no Ed25519):
| Key | Usage | Purpose |
|---|---|---|
| Identity | ECDH (deriveBits) | wrapping/unwrapping conversation keys |
| Signing | ECDSA (sign/verify) | authenticating message events and admin actions |
Both are versioned together (NumericString version). Public keys are the base64 of the exportKey("spki") output (91 bytes of DER before encoding; the base64 starts MFkwEwYHKoZIzj0CAQYI). ECDSA signatures are raw r || s (64 bytes), not DER.
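Outside WebCrypto, many crypto libraries emit DER signatures and expect raw points, so two small conversions tend to be needed. The sketch below checks the fixed P-256 SPKI header and converts a DER ECDSA signature to raw r || s; the helper names are illustrative.

```python
import base64

# Fixed 26-byte DER header of a P-256 SubjectPublicKeyInfo; the 65-byte
# uncompressed point (0x04 || X || Y) follows it, for 91 bytes in total.
P256_SPKI_HEADER = bytes.fromhex("3059301306072a8648ce3d020106082a8648ce3d030107034200")


def raw_point_from_spki_b64(spki_b64):
    """Return the 65-byte uncompressed point from a base64 P-256 SPKI key."""
    der = base64.b64decode(spki_b64)
    if len(der) != 91 or not der.startswith(P256_SPKI_HEADER):
        raise ValueError("not a P-256 SPKI public key")
    point = der[len(P256_SPKI_HEADER):]
    if point[0] != 0x04:
        raise ValueError("expected an uncompressed point")
    return point


def der_signature_to_raw(der):
    """Convert a DER ECDSA signature (SEQUENCE of two INTEGERs) to raw r || s."""
    if der[0] != 0x30 or der[1] != len(der) - 2:
        raise ValueError("bad DER sequence")
    pos, parts = 2, []
    for _ in range(2):
        if der[pos] != 0x02:
            raise ValueError("expected INTEGER")
        length = der[pos + 1]
        value = int.from_bytes(der[pos + 2 : pos + 2 + length], "big")
        parts.append(value.to_bytes(32, "big"))  # left-pad each integer to 32 bytes
        pos += 2 + length
    return b"".join(parts)
```

The 65-byte point returned here is the same raw form that appears at the front of a wrapped conversation key.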
Registration
AddXChatPublicKeyMutation publishes XChatPublicKeyInput { public_key, signing_public_key, identity_public_key_signature, registration_method } with generate_version: true (the server assigns the version). identity_public_key_signature = ECDSA-P256-SHA256 over the SPKI bytes of the identity key, signed with the signing key. registration_method is CustomPin, ManagedPin, or SelfCustody.
PIN backup (Juicebox)
The private keys are backed up to PIN-protected realms using Juicebox: register/recover the secret hardened by a numeric PIN (Argon2/OPRF, Standard2019), split across realms with a 20-guess limit. Thresholds: register to all 3, recover with any 2. The realm crypto lives in juicebox-sdk_bg-*.wasm. The secret is the two P-256 private scalars (identityScalar(32) || signingScalar(32)). Realm config + per-realm bearer tokens come from the KeyStoreTokenMap (the token_map returned by AddXChatPublicKey/GetPublicKeys); the wasm requests each realm's token through a JuiceboxGetAuthToken(realmId) callback, where realmId is the realm id bytes (hex-encoded in the token map).
emusks' createIdentity({ pin }) performs this Juicebox register by default (using a bundled build of the SDK wasm), exactly like the app; { selfCustody: true } skips the backup.
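The backed-up secret is just the two scalars concatenated, so packing and unpacking it is a pair of slices. A minimal sketch with hypothetical helper names (the Juicebox register/recover calls themselves are not shown):

```python
def pack_backup_secret(identity_scalar, signing_scalar):
    """Build the 64-byte secret: identityScalar(32) || signingScalar(32)."""
    if len(identity_scalar) != 32 or len(signing_scalar) != 32:
        raise ValueError("each P-256 private scalar must be 32 bytes")
    return identity_scalar + signing_scalar


def unpack_backup_secret(secret):
    """Split a recovered secret back into (identity_scalar, signing_scalar)."""
    if len(secret) != 64:
        raise ValueError("expected a 64-byte secret")
    return secret[:32], secret[32:]


def realm_id_bytes(realm_id_hex):
    """Token-map keys are hex; the auth-token callback receives the raw id bytes."""
    return bytes.fromhex(realm_id_hex)
```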
Encryption
Conversation key (cKey)
Each conversation has one symmetric key, the cKey, identified by a conversation_key_version. Message bodies are encrypted with it; the cKey itself is wrapped to each member.
Wrapping the cKey (P-256 ECIES)
Per recipient, the wrapped key is built as:
- Generate an ephemeral P-256 ECDH keypair.
- Z = deriveBits(ephemeralPriv, recipientIdentityPub, 256) (32 bytes).
- out = SHA-256(Z || 0x00000001 || ephemeralRawPub65) (ConcatKDF, single block).
- aesKey = out[0:16] (AES-128), iv = out[16:32].
- ct = AES-GCM(aesKey, iv, cKey) (no AAD, 128-bit tag).
- encrypted_conversation_key = base64(ephemeralRawPub65 || ct) (113 bytes for a 32-byte cKey). The IV is re-derived on unwrap, not transmitted.
Each wrapped copy is sent as ApiConversationParticipantKeyInput { user_id, encrypted_conversation_key, public_key_version }.
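The key-derivation and framing steps can be sketched with the standard library alone; the ECDH and AES-GCM steps need a crypto library and are left out. Note that the counter follows Z in the hash input, which matches the ANSI X9.63 ordering rather than the NIST SP 800-56A one. Function names are illustrative.

```python
import base64
import hashlib


def derive_wrap_key(shared_secret, ephemeral_pub_raw):
    """Derive (aes_key, iv) from the ECDH output Z and the ephemeral public key."""
    if len(shared_secret) != 32 or len(ephemeral_pub_raw) != 65:
        raise ValueError("expected a 32-byte Z and a 65-byte raw public key")
    counter = (1).to_bytes(4, "big")  # 0x00000001, single KDF block
    out = hashlib.sha256(shared_secret + counter + ephemeral_pub_raw).digest()
    return out[:16], out[16:32]


def split_wrapped_key(encrypted_conversation_key_b64):
    """Split a wrapped key into (ephemeral_pub_raw, ciphertext_with_tag)."""
    blob = base64.b64decode(encrypted_conversation_key_b64)
    if len(blob) < 65 + 16:
        raise ValueError("wrapped key too short")
    return blob[:65], blob[65:]
```

On unwrap, the recipient splits the blob, runs ECDH with its identity private key against the ephemeral public key, re-derives the same key and IV, and decrypts the remaining 48 bytes (32-byte cKey plus 16-byte tag).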
Message body cipher
libsodium crypto_secretbox_easy (XSalsa20-Poly1305) under the cKey: frame = nonce(24) || crypto_secretbox_easy(plaintext, nonce, cKey). The plaintext is the serialized MessageEntryHolder.
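Since crypto_secretbox_easy prepends a 16-byte Poly1305 tag to the ciphertext, a frame is always 40 bytes longer than its plaintext. A small sketch of the framing arithmetic (decryption itself needs libsodium bindings and is not shown):

```python
NONCE_BYTES = 24  # crypto_secretbox_NONCEBYTES
MAC_BYTES = 16    # crypto_secretbox_MACBYTES


def split_frame(frame):
    """Split nonce(24) || box into (nonce, box)."""
    if len(frame) < NONCE_BYTES + MAC_BYTES:
        raise ValueError("frame too short to hold a nonce and a tag")
    return frame[:NONCE_BYTES], frame[NONCE_BYTES:]


def plaintext_length(frame):
    """Length of the serialized MessageEntryHolder inside the frame."""
    return len(frame) - NONCE_BYTES - MAC_BYTES
```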
Media cipher
crypto_secretstream_xchacha20poly1305, chunked at 1024-byte plaintext blocks, keyed by the cKey. The 24-byte stream header is prepended to the ciphertext in TON storage. The key is never sent with the attachment; only media_hash_key travels in the message.
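The stored size follows from libsodium's constants: a 24-byte header, plus 17 bytes of overhead per secretstream message. The sketch below assumes each 1024-byte plaintext block maps to exactly one secretstream message and that an empty file still produces one (empty) final message; neither detail is confirmed by the source.

```python
CHUNK_BYTES = 1024   # plaintext block size stated above
HEADER_BYTES = 24    # crypto_secretstream_xchacha20poly1305_HEADERBYTES
A_BYTES = 17         # crypto_secretstream_xchacha20poly1305_ABYTES


def encrypted_media_size(plaintext_len):
    """Size of header || ciphertext for a plaintext of the given length."""
    chunks = max(1, -(-plaintext_len // CHUNK_BYTES))  # ceiling division
    return HEADER_BYTES + plaintext_len + chunks * A_BYTES


def split_media_blob(blob):
    """Split a stored blob into (header, list of ciphertext chunks)."""
    header, body = blob[:HEADER_BYTES], blob[HEADER_BYTES:]
    step = CHUNK_BYTES + A_BYTES  # 1041 bytes per full ciphertext chunk
    return header, [body[i : i + step] for i in range(0, len(body), step)]
```

Each returned chunk would then be passed to a secretstream pull call in order, with the state initialized from the header and the cKey.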
Event signatures
Every message event is ECDSA-P256-SHA256 signed by the sender's signing key over a canonical comma-joined string. The live client uses signature_version = 7:
payload = ["MessageCreateEvent", conversation_token, senderUserId, conversationId, conversation_key_version, base64url(frame)].join(",")
conversation_token is the conversation's server-issued JWT (from MessageEvent field 5; empty only for the very first message of a brand-new conversation). Getting the version, the conversation_token, or the struct shape below wrong makes the server accept the call but silently drop the message (it never reaches the recipient). The signature is wrapped in a Thrift MessageEventSignature struct (with signing_public_key set, and a key-info list of just the sender), base64'd into encoded_message_event_signature:
| Field | Name |
|---|---|
| 1 | signature (base64url of raw r‖s) |
| 2 | public_key_version |
| 3 | signature_version |
| 4 | signing_public_key |
| 5 | message_signing_key_info_list (list of MessageSigningKeyInfo { member_id, public_key_version, signing_public_key }) |
The GraphQL mirror is XChatMessageEventSignatureInput. Admin/mutation actions (delete-for-everyone, mute, TTL, group ops) require an ActionSignatureInput of the same shape over an action-specific canonical string.
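The canonical string for signature_version 7 can be sketched as below. Whether base64url here keeps or strips "=" padding is not stated in the source, so strip_padding is an explicit assumption to verify against a captured message; the ECDSA signing step itself is left to a crypto library.

```python
import base64


def b64url(data, strip_padding=True):
    """URL-safe base64; padding handling is an assumption, see above."""
    text = base64.urlsafe_b64encode(data).decode("ascii")
    return text.rstrip("=") if strip_padding else text


def message_create_signing_payload(conversation_token, sender_user_id,
                                   conversation_id, conversation_key_version,
                                   frame, strip_padding=True):
    """Build the comma-joined bytes that get ECDSA-P256-SHA256 signed."""
    parts = [
        "MessageCreateEvent",
        conversation_token,  # empty string for the first message of a new conversation
        str(sender_user_id),
        str(conversation_id),
        str(conversation_key_version),
        b64url(frame, strip_padding),
    ]
    return ",".join(parts).encode("utf-8")
```

An empty conversation_token leaves two adjacent commas in the payload, which is expected for the first message of a new conversation.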
Groups
Group conversations derive the cKey from a TreeKEM/ART ratcheting tree (GroupKeysMgr), not sender keys. Membership changes rotate the root and re-encrypt node secrets to the affected members; the change rides on ConversationKeyChangeEvent { conversation_key_version, conversation_participant_keys, ratchet_tree_change, for_key_rotation }.
Franking & Grok
Each message carries FrankingData { franking_tag, encrypted_nonce, encrypted_media_hashes } so abuse reports verify against a message the server can't read (ReportFrankedMessageMutation). Messages to Grok are plaintext (XChatSendGrokMessagePlaintextMutation), outside the E2E boundary.
Events & messages
WebSocket envelope
Top-level Message union: 1 messageEvent, 2 messageInstruction, 3 batchedMessageEvents.
MessageEvent: 1 sequence_id, 2 message_id, 3 sender_id, 4 conversation_id, 5 conversation_token, 6 created_at_msec, 7 detail, 8 relay_source, 9 message_event_signature, 10 previous_sequence_id, 11 is_trusted.
MessageEventDetail union: 1 messageCreateEvent, 3 conversationKeyChangeEvent, 4 groupChangeEvent, 5 messageFailureEvent, 6 messageTypingEvent, 7 messageDeleteEvent, 8 conversationDeleteEvent, 9 conversationMetadataChangeEvent, 10 grokSearchResponseEvent, 12 markConversationReadEvent, 13 markConversationUnreadEvent, 14 memberAccountDeleteEvent, 15 grokMessageEvent, 16 grokResponseEvent.
MessageInstruction union: 1 pullMessages, 2 keepAlive, 3 pullMessagesFinished, 4 pinReminder, 5 switchToHybridPull, 6 displayTemporaryPasscode, 7 deviceEnrollment.
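Because a Thrift union is a struct with exactly one field set, a frame can be classified by reading the first field header at each level. A minimal sketch, assuming union members are struct-typed fields (type 12) so the nested value starts right after the 3-byte header:

```python
import struct

T_STOP = 0

MESSAGE = {1: "messageEvent", 2: "messageInstruction", 3: "batchedMessageEvents"}
MESSAGE_INSTRUCTION = {
    1: "pullMessages", 2: "keepAlive", 3: "pullMessagesFinished",
    4: "pinReminder", 5: "switchToHybridPull",
    6: "displayTemporaryPasscode", 7: "deviceEnrollment",
}


def peek_union(buf, names, offset=0):
    """Name the field set in the union at offset; return (name, value_offset)."""
    if buf[offset] == T_STOP:
        raise ValueError("empty union")
    (field_id,) = struct.unpack_from(">h", buf, offset + 1)
    return names.get(field_id, f"unknown({field_id})"), offset + 3


def classify_frame(frame):
    """Return e.g. 'messageEvent' or 'messageInstruction.keepAlive'."""
    outer, inner_offset = peek_union(frame, MESSAGE)
    if outer != "messageInstruction":
        return outer
    inner, _ = peek_union(frame, MESSAGE_INSTRUCTION, inner_offset)
    return f"{outer}.{inner}"
```

Under that assumption, a heartbeat would be the nine bytes 0c 00 02 0c 00 02 00 00 00: two nested field headers followed by three stop bytes.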
Sending: MessageCreateEvent
The encrypted body goes in MessageCreateEvent: 100 contents (the nonce‖ciphertext), 101 conversation_key_version, 102 should_notify, 103 ttl_msec, 104 delivered_at_msec, 105 is_pending_public_key, 106 priority, 107 additional_action_list, 108 franking_data, 109 is_message_request. Base64'd into encoded_message_create_event, sent via SendMessageCreateMutation.
Content variants
The plaintext is a MessageEntryHolder { 1: MessageEntryContents }, where MessageEntryContents is a union naming the action:
| Field | Variant | Payload |
|---|---|---|
| 1 | message | MessageContents |
| 2 | reaction_add | MessageReactionAdd { message_sequence_id, emoji, message_attachment_id? } |
| 3 | reaction_remove | same shape as reaction_add |
| 4 | message_edit | MessageEdit { message_sequence_id, updated_text, entities } |
| 5 | mark_conversation_read | MarkConversationRead { seen_until_sequence_id, seen_at_millis } |
| 6 | mark_conversation_unread | MarkConversationUnread { seen_until_sequence_id } |
| 7 / 8 | pin_conversation / unpin_conversation | { conversation_id } |
| 9 | screen_capture_detected | { type } (DmScreenCaptureType: Unknown/Screenshot/Recording) |
| 10 / 11 / 16 | av_call_ended / av_call_missed / av_call_started | call lifecycle |
| 12 | draft_message | draft sync |
| 13 | accept_message_request | empty |
| 14 | nickname_message | { user_id (int64), nickname_text } |
| 15 | set_verified_status | |
MessageContents: 1 message_text, 2 entities, 3 attachments, 4 replying_to_preview, 6 forwarded_message, 7 sent_from, 8 quick_reply, 9 ctas, 10 additional_fields. RichTextEntity { start_index, end_index, content } where content is an empty type marker (hashtag=1, cashtag=2, mention=3, url=4, email=5, address=6, phoneNumber=7).
Media & attachments
MessageAttachment union: 1 media, 2 post, 3 url, 4 unified_card, 5 money, 6 jetfuel. MediaAttachment: 1 media_hash_key, 2 dimensions, 3 type, 4 duration_millis, 5 filesize_bytes, 6 filename, 7 attachment_id, 8 legacy_media_url_https, 9 legacy_media_preview_url, 10 grok_tag. MediaType: IMAGE=1, GIF=2, VIDEO=3, AUDIO=4, FILE=5, SVG=6.
Upload is InitializeXChatMediaUploadMutation → push the encrypted stream (private: resumable PUT to ton.x.com/.../xchat_media/<media_hash_key>/...?concurrent=true&resumeId=..&partNumber=..; public: the upload.x.com/1.1/media/upload.json command flow) → FinalizeXChatMediaUploadMutation, then attach a MediaAttachment to a message. GIFs are Giphy bytes (GifSearchQuery) downloaded and re-uploaded as MediaType.GIF. Link cards resolve via GetCardPreviewOrJetfuelFromUrlQuery.
Groups
InitializeGroupConversationMutation allocates a conversation id, then CreateGroupConversationMutation creates it with admin_user_ids, member_user_ids, conversation_key_version, conversation_participant_keys, base64_encoded_key_rotation, and action_signatures. Membership changes (AddGroupMembersMutation, RemoveFromGroupMutation, AddAsAdminMutation, …) re-key, so they carry the same three key fields (conversation_key_version, conversation_participant_keys, base64_encoded_key_rotation). Invite links: EnableGroupInviteMutation → XChatGroupInviteDetails { invite_url, token, … }; join via GroupInviteDetailsQuery + RequestToJoinGroupMutation.
Calls
Audio/video is real WebRTC over Periscope infrastructure. In the official web client this all lives behind the useAvcallingSetup hook (the useAvcallingSetup-*.js bundle: ProxseeApi, GuestServiceApi, the Janus client, setupPeerConnection, the AvCallE2eePipelines). The high-level API is in Audio & video calls; this is what runs underneath.
Two transports
There are two signaling paths:
- 1:1 calls use a Periscope P2P mesh. The caller creates a p2p/broadcast, and both sides exchange WebRTC OFFER/ANSWER/CANDIDATE envelopes over the guest-service relay (POST guest-cf.pscp.tv/api/v1/signaling/send plus a cursor-based long-poll signaling/receive). The caller is the impolite offerer; on track setup each side sends a MEDIA_STATUS signal, and the host publishes then offers when it sees one. Candidates are trickled as { id: sdpMid, label: sdpMLineIndex, candidate }.
- Group calls use a Janus SFU (VideoRoom plugin). proxsee.pscp.tv/api/v2/createBroadcast returns the Janus gateway URL plus a JWT credential. The host creates a session, attaches two videoroom handles (publisher + subscriber), creates the room (audiocodec: opus, videocodec: h264, h264_profile: 42e01f, dummy_publisher: true), joins as publisher, and publishes a JSEP offer; publishBroadcast then ties the Janus publisher/handle/session ids to the broadcast to go live. Participants join via getAudiospace → audiospace/join → audiospace/stream/negotiate, then subscribe to publisher feeds. Janus messages are POST {gw}/{session}/{handle} with { janus: "message", body: { room, periscope_user_id, ... }, jsep? }, the JWT in the Authorization header (no Bearer); events arrive via a ?maxev=1 long-poll.
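The two envelope shapes above can be sketched as plain request builders. The transaction field and the JSON Content-Type header are assumptions (stock Janus expects a transaction id on every request; the source does not list one), and the gateway URL in the usage below is a placeholder.

```python
import json
import uuid


def trickle_candidate(sdp_mid, sdp_mline_index, candidate):
    """Map a WebRTC ICE candidate onto the 1:1 signaling shape."""
    return {"id": sdp_mid, "label": sdp_mline_index, "candidate": candidate}


def janus_message_request(gateway_url, session_id, handle_id, jwt, body, jsep=None):
    """Return (url, headers, json_payload) for one Janus plugin message."""
    payload = {
        "janus": "message",
        "transaction": uuid.uuid4().hex,  # assumption: required by stock Janus
        "body": body,
    }
    if jsep is not None:
        payload["jsep"] = jsep
    url = f"{gateway_url.rstrip('/')}/{session_id}/{handle_id}"
    headers = {
        "Authorization": jwt,  # raw JWT, no "Bearer " prefix
        "Content-Type": "application/json",  # assumption
    }
    return url, headers, json.dumps(payload)
```

For instance, janus_message_request("https://gw.example.invalid/janus/", 1, 2, "tok", {"room": "r"}) targets https://gw.example.invalid/janus/1/2.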
Bootstrap
Both paths start the same way: GraphQL useDirectCallSetupQuery (zCYojd6h_gVXYjFlaAk4bA) returns authenticate_periscope (a JWT) → POST proxsee.pscp.tv/api/v2/loginTwitterToken for a Periscope cookie → authorizeToken { service: "guest" } for the guest-service bearer → turnServers (turn-p2p.pscp.tv:3478/udp, turns:443/tcp) for ICE. This whole bootstrap is verified working with a normal account session.
Gateway architecture & codecs
Two practical findings from reimplementing this headless:
- The group gateway (gw-prod-*.pscp.tv) is the Twitter Spaces mixing backend, not a plain peer VideoRoom. Each session gets an isolated room view (its own publisher plus a "Dummy publisher" placeholder); participants don't see each other's Janus feeds directly, so peer media is brokered by the Spaces backend rather than exchanged feed-to-feed. The host reliably connects to the SFU and goes live; full cross-participant media is the Spaces layer's job.
- The app pins H.264 (videocodec: h264, profile 42e01f) and Opus. Headless WebRTC engines like @roamhq/wrtc only do VP8/VP9/AV1, so video interop with the official client needs an H.264-capable engine; between two emusks clients you can create the room as VP8 (videoCodec option on startGroupCall).
Lifecycle in the message channel
Call lifecycle also surfaces in the conversation as content variants (see Content variants): AVCallStarted { is_audio_only, broadcast_id }, AVCallEnded { sent_at_millis, duration_seconds, is_audio_only, broadcast_id }, AVCallMissed { sent_at_millis, is_audio_only }. Incoming calls are also pushed over LivePipeline (/avcall/create/{userId}).
Media encryption
Media frames go through WebRTC encoded-insertable-streams. When enabled it is AES-GCM-256 with a key derived (HKDF) from conversation key material; the OFFER/ANSWER advertise an encryption_info { fingerprint, version } and both sides verify the fingerprint match. Passthrough (no media encryption) is a first-class fallback and is what runs when no encryption_info is exchanged.
Permissions
DmAvPermissionsQuery (client.xchat.callPermissions()) reports { can_dm, error_code } per recipient; settings live in av_settings (UpdateDmSettingsMutation). Note that 1:1 calling is additionally gated server-side by X's rollout.
GraphQL operations
POST https://api.x.com/graphql/<id>/<OperationName>. The id is the x-apollo-operation-id. Because the full query is sent in the body, a persisted-query lookup by id isn't strictly required, but the /<id>/<name> path itself must still be present.
| Operation | Operation id | Backend field |
|---|---|---|
| SendMessageCreateMutation | TWRPP7gnKwV_R8-tE-Dd3Q | xchat_send_create_message_event |
| SendMessageEventMutation | G7WwJGKvTBVb-BXhZNSVMw | xchat_send_message_event |
| DeleteMessageMutation | 4gsDQKEmYkOtvsSIpHXdQA | xchat_delete_messages |
| AddEncryptedConversationKeysMutation | 4V1KC8ue2tHHvRuIzeczdg | xchat_add_encrypted_conversation_key |
| AddXChatPublicKeyMutation | CQsk6GRuWAVabyXqqEG1sA | user_add_public_key |
| DeleteXChatPublicKeyMutation | W5iiIL1MVw4vomq-zLPHUQ | user_delete_public_key |
| GetPublicKeys | GJQbOZALDO5D3Zp2IZhH6w | user_results_by_rest_ids |
| GenerateXChatTokenMutation | Qh3fZRjPPtPoHYR_2sCZsA | user_get_x_chat_auth_token |
| MuteConversationMutation | 6iDsxSkhGLvdiJpqtAtzTQ | xchat_mute_conversation |
| UnmuteConversationMutation | _f8wd8RlQCCysv8yMKeiaw | xchat_unmute_conversation |
| UpdateConversationTTLMutation | Gu3kCEwNN2V-Az8NDk30Zg | xchat_update_conversation_message_duration |
| RemoveConversationTTLMutation | EqSXvxskUyw99ARuIbhYlg | xchat_remove_conversation_message_duration |
| AcceptMessageRequestMutation | 4YtAUhUwROL6ejia63Lj6Q | user_add_trust |
| GetInboxPageRequestQuery | dVXHY3CBFIw_Gi6eaAum-w | get_inbox_page |
| GetConversationPageQuery | IVlXls9JTnbgQ1gxsGAfJA | get_conversation_page |
| GetMessageEventsPageQuery | OaSNyAhxUZ9AaW2z9cC26A | get_message_events_page |
| GetMessageRequestsPageQuery | B4ibdNFzMv5MBhhxk3CyKw | get_message_requests_page |
| DmAvPermissionsQuery | kfX5AHDKZrivyHwCaz68mQ | get_av_permissions |
| GifSearchQuery | ciUL4BnRPKal2uL1fL2aHw | gif_search_slice |
| InitializeXChatMediaUploadMutation | vTsSDEpF4eVYbR-waSl37g | xchat_initialize_media_upload |
| FinalizeXChatMediaUploadMutation | P1CLOMdiMe9ii1MdIJbhcQ | xchat_finalize_media_upload |
| CreateGroupConversationMutation | dKl4aC-sBqQWgRhkQXV2wg | xchat_create_group |
| EnableGroupInviteMutation | WlxTMdzK_uh-miHVHXv15g | xchat_enable_group_invite |
| RequestToJoinGroupMutation | cV0VjzT5UDJW3cbcvALYOg | xchat_request_join_group |
The full set of ~67 operations is available through client.xchat.gql(name, variables).