Agent Internals
This explanation focuses on the inner workings of Fishjam Agents. If you are looking for a tutorial that shows how to implement an agent, then you may find the Agent Introduction more useful.
What is an Agent?
An agent is a piece of software that allows your backend server to participate in a Fishjam room, similar to how the Fishjam client SDKs allow your client-side application to participate in a Fishjam room. Agents can be used to implement features such as real-time audio transcription, audio recording, real-time content moderation and more.
You can simply think of an agent as a peer running within your backend application.
Agent lifecycle
There are four steps to the lifecycle of a Fishjam Agent:
- The agent is created in the room roomId. Fishjam generates a token that the agent will use to connect to the room.
- The agent connects to the room roomId via a WebSocket connection.
- The agent receives audio from subscribed peers in the room roomId. Subscriptions are explained in the Peer subscriptions section.
- The agent disconnects from the room roomId by closing the WebSocket connection.
We describe the steps of an agent's lifecycle in detail below.
The Fishjam server SDKs provide a simple way to integrate with the APIs described below.
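The lifecycle above, together with the reconnect allowed in Step 4, can be read as a small state machine. The following is a minimal sketch for illustration only: the names AgentState, ALLOWED_TRANSITIONS and next_state are hypothetical and are not part of any Fishjam SDK.

```python
from enum import Enum


class AgentState(Enum):
    CREATED = "created"            # step 1: agent exists and a token was issued
    CONNECTED = "connected"        # steps 2-3: WebSocket open, audio may arrive
    DISCONNECTED = "disconnected"  # step 4: WebSocket closed


# Moves permitted by the lifecycle; a disconnected agent may reconnect
# with the same token, so DISCONNECTED leads back to CONNECTED.
ALLOWED_TRANSITIONS = {
    AgentState.CREATED: {AgentState.CONNECTED},
    AgentState.CONNECTED: {AgentState.DISCONNECTED},
    AgentState.DISCONNECTED: {AgentState.CONNECTED},
}


def next_state(current: AgentState, target: AgentState) -> AgentState:
    """Return target if the lifecycle permits the move, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

A backend could keep one such state per agent and reject, for example, an attempt to close a connection that was never opened.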
Step 1. Creating an Agent in Fishjam
Agents are created via the /room/{roomId}/peer REST API endpoint, by setting the type option to "agent".
An example request with cURL would look like:
curl -XPOST -H "Authorization: Bearer $FISHJAM_MANAGEMENT_TOKEN" \
  "https://fishjam.io/api/v1/room/$ROOM_ID/peer" \
  --json '{"type": "agent"}'
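The same request could be prepared from backend code. The sketch below uses only the Python standard library and mirrors the cURL call: the function name build_create_agent_request and the placeholder room ID and token are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_create_agent_request(
    base_url: str, room_id: str, management_token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an agent peer."""
    return urllib.request.Request(
        url=f"{base_url}/room/{room_id}/peer",
        data=json.dumps({"type": "agent"}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {management_token}",
            # curl's --json flag sets both of these headers.
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own room ID and management token.
request = build_create_agent_request(
    "https://fishjam.io/api/v1", "my-room", "my-token"
)
```

Passing the result to urllib.request.urlopen would perform the call. The shape of the response is not described here, so parsing it is left out.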
Step 2. Connecting to the Room
Agents connect to Fishjam by initiating a WebSocket connection on the /socket/agent/websocket endpoint.
Once connected, the agent must authenticate with Fishjam by sending an AgentRequest.AuthRequest protobuf message.
An example of initiating a connection using websocat and Buf CLI would look like:
# Get the Fishjam protobuf definitions
git clone https://github.com/fishjam-cloud/protos.git

echo "{\"authRequest\": {\"token\": \"$AGENT_TOKEN\"}}" \
  | buf convert protos/fishjam/agent_notifications.proto \
    --type="fishjam.AgentRequest" --from="-#format=json" \
  | websocat -vbn wss://fishjam.io/api/v1/connect/$FISHJAM_ID/socket/agent/websocket
In a backend application, we suggest using the Protobuf compiler to generate utilities for creating protobuf messages from code.
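As a small illustration of the two inputs used by the command above, the following sketch builds the WebSocket URL and the JSON form of the authentication request. The helper names agent_websocket_url and auth_request_json are hypothetical. The JSON string is only the input that the command above feeds to buf convert; it still has to be encoded as a binary AgentRequest protobuf message before being sent over the socket.

```python
import json


def agent_websocket_url(fishjam_id: str, host: str = "fishjam.io") -> str:
    """Return the agent WebSocket URL for the given Fishjam ID."""
    return f"wss://{host}/api/v1/connect/{fishjam_id}/socket/agent/websocket"


def auth_request_json(agent_token: str) -> str:
    """Return the JSON form of AgentRequest.AuthRequest.

    This is not the wire format: it must be converted to a binary
    protobuf message (for example with generated protobuf code).
    """
    return json.dumps({"authRequest": {"token": agent_token}})
```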
Step 3. Receiving Audio
Once connected, an agent will receive chunks of audio as AgentRequest.TrackData protobuf messages.
Step 4. Disconnecting from the Room
An agent disconnects from the room by closing the WebSocket connection created in Step 2. It may reconnect using the same token it provided in Step 2.
Peer subscriptions
By default, agents won't receive peers' audio streams. This is by design and aims to prevent unnecessary resource usage by the agents.
For an agent to start receiving a peer's audio, the peer must be created with the subscribe option set.
The subscribe option contains parameters describing the desired output format of the peer's audio stream. This output format will be sent to connected agents.
Output format
Currently, only 16-bit PCM (raw) audio output is supported.
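To make the format concrete, here is a minimal sketch of turning one raw 16-bit PCM chunk into integer samples. It assumes little-endian byte order, which this document does not state, and the function name decode_pcm16 is hypothetical.

```python
import struct


def decode_pcm16(chunk: bytes) -> tuple:
    """Decode raw 16-bit PCM bytes into signed integer samples.

    Assumes little-endian byte order (an assumption, not confirmed here).
    """
    if len(chunk) % 2 != 0:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    sample_count = len(chunk) // 2
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.unpack(f"<{sample_count}h", chunk)
```

Each sample falls in the range -32768 to 32767, which is the form that many speech and audio analysis libraries accept directly.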
Output sample rate
The output sample rate can be either 16 kHz or 24 kHz.
The above values aim to be compatible with popular real-time AI audio APIs. If you require a different output format, then make sure to contact us at contact@fishjam.io.
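Given the format and sample rates above, the duration of a chunk follows from its byte length, and a sequence of chunks can be wrapped in a WAV container for recording. The sketch below uses only the Python standard library. It assumes mono, little-endian audio, neither of which is stated in this document, and the function names are hypothetical.

```python
import io
import wave

SUPPORTED_SAMPLE_RATES = (16_000, 24_000)
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def chunk_duration_seconds(chunk: bytes, sample_rate: int, channels: int = 1) -> float:
    """Return how many seconds of audio a raw PCM chunk holds."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return len(chunk) / (BYTES_PER_SAMPLE * channels * sample_rate)


def pcm_chunks_to_wav(chunks, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw PCM chunks in a WAV container and return the file bytes.

    Assumes mono audio by default (an assumption, not confirmed here).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()
```

For example, at 16 kHz one second of mono audio occupies 2 × 16000 = 32000 bytes, so a 32000-byte chunk has a duration of one second.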