Version: 0.24.0

Agent Internals

info

This explanation focuses on the inner workings of Fishjam Agents. If you are looking for a tutorial that shows how to implement an agent, then you may find the Agent Introduction more useful.

What is an Agent?

An agent is a piece of software that allows your backend server to participate in a Fishjam room, similar to how the Fishjam client SDKs allow your client-side application to participate in a Fishjam room. Agents can be used to implement features such as real-time audio transcription, real-time content moderation, conversations with AI agents, and more.

You can simply think of an agent as a peer running within your backend application.

Agent lifecycle

There are five steps to the lifecycle of a Fishjam Agent:

The agent is created in the room roomId. Fishjam generates a token that the agent will use to connect to the room.
The agent connects to the room roomId via a WebSocket connection.
The agent receives audio from subscribed peers in the room roomId. Subscriptions are explained in the Peer subscriptions section.
The agent can send audio to the room roomId by creating an audio track.
The agent disconnects from the room roomId by closing the WebSocket connection.

We describe the steps of an agent's lifecycle in detail below.

tip

The Fishjam server SDKs provide a simple way to integrate with the APIs described below.

Step 1. Creating an Agent in Fishjam

Agents are created via the /room/{roomId}/peer REST API endpoint, by setting the type option to "agent". An example request with cURL would look like:

curl -XPOST -H "Authorization: Bearer $FISHJAM_MANAGEMENT_TOKEN" \
"https://fishjam.io/api/v1/room/$ROOM_ID/peer" \
--json '{"type": "agent"}'
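For a backend written in Python, the same request can be sketched with the standard library alone. This is a minimal sketch, not the SDK's implementation: the function name `build_create_agent_request` is hypothetical, the base URL is copied from the cURL example above, and the shape of the response is not described here, so the sketch does not parse it.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the same base URL as in the cURL example above.
FISHJAM_API_BASE = "https://fishjam.io/api/v1"


def build_create_agent_request(room_id: str, management_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an agent peer."""
    body = json.dumps({"type": "agent"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{FISHJAM_API_BASE}/room/{quote(room_id, safe='')}/peer",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {management_token}",
            # curl's --json flag sets both of these headers as well.
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Sending the request (requires network access and a valid management token):
#   with urllib.request.urlopen(build_create_agent_request(room_id, token)) as resp:
#       print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or unit-test the request without contacting Fishjam.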

Step 2. Connecting to the Room

Agents connect to Fishjam by opening a WebSocket connection to the /socket/agent/websocket endpoint. All communication over the WebSocket between the agent and Fishjam uses Protobuf messages. See the Fishjam Protobuf reference for more information.
Once connected, the agent must authenticate with Fishjam by sending an AgentRequest.AuthRequest protobuf message. An example of initiating a connection using websocat and the Buf CLI would look like:

# Get the Fishjam protobuf definitions
git clone https://github.com/fishjam-cloud/protos.git

echo "{\"authRequest\": {\"token\": \"$AGENT_TOKEN\"}}" \
| buf convert protos/fishjam/agent_notifications.proto \
   --type="fishjam.AgentRequest" --from="-#format=json" \
| websocat -vbn wss://fishjam.io/api/v1/connect/$FISHJAM_ID/socket/agent/websocket

In a backend application, we suggest using the Protobuf compiler to generate utilities for creating protobuf messages from code.
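The two inputs to the shell pipeline above, the WebSocket URL and the JSON form of the authentication message, can be built in Python as sketched below. The helper names `agent_websocket_url` and `auth_request_json` are hypothetical. The JSON string is only the text representation that `buf convert` reads; it still has to be encoded as a binary protobuf message (for example with classes generated by the Protobuf compiler) before it is sent over the socket.

```python
import json
from urllib.parse import quote


def agent_websocket_url(fishjam_id: str) -> str:
    """Return the agent WebSocket URL used in the websocat example."""
    return (
        "wss://fishjam.io/api/v1/connect/"
        f"{quote(fishjam_id, safe='')}/socket/agent/websocket"
    )


def auth_request_json(agent_token: str) -> str:
    """Return the JSON form of an AgentRequest carrying an AuthRequest.

    json.dumps escapes quotes and backslashes in the token, which the
    hand-written echo string in the shell example does not.
    """
    return json.dumps({"authRequest": {"token": agent_token}})
```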

Step 3. Receiving Audio

Once connected, an agent receives chunks of audio as AgentRequest.TrackData protobuf messages.

Step 4. Sending Audio

All further steps are done via the WebSocket connection created in Step 2.

To send audio from an agent, you need to create an audio track first. This is done by sending an AgentRequest.CreateTrack protobuf message. Once the track is created, you can send audio to it using AgentRequest.TrackData protobuf messages.
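Because audio is sent as a sequence of TrackData messages, a backend usually has to split a longer buffer into chunks first. The sketch below shows one way to do that. It rests on assumptions this page does not confirm: the outgoing audio is taken to be mono 16-bit PCM, and the 20 ms chunk length is an arbitrary illustrative choice, not a Fishjam requirement. The function name `split_pcm16` is hypothetical.

```python
def split_pcm16(pcm: bytes, sample_rate: int, chunk_ms: int = 20) -> list[bytes]:
    """Split mono 16-bit PCM audio into chunks of about chunk_ms milliseconds.

    Assumptions: one channel, two bytes per sample. The final chunk may be
    shorter than the others when the buffer length is not a multiple of the
    chunk size.
    """
    bytes_per_sample = 2
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("sample_rate and chunk_ms must yield a non-empty chunk")
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

Each returned chunk would then become the payload of one TrackData message for the track created earlier.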

Step 5. Disconnecting from the Room

An agent disconnects from the room by closing the WebSocket connection created in Step 2. It may reconnect using the same token it provided in Step 2.

Remember to disconnect your agents!

It's important to disconnect agents, because every connected agent generates usage just like a regular peer.
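One way to make sure an agent always disconnects, even when the processing code raises an exception, is to wrap the connection in a context manager. The sketch below is illustrative only: `agent_session` and `FakeAgentConnection` are hypothetical names, and the only thing assumed about the real connection object is that it has a `close()` method that closes the WebSocket.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class Closable(Protocol):
    def close(self) -> None: ...


@contextmanager
def agent_session(connect: Callable[[], Closable]) -> Iterator[Closable]:
    """Open a connection and guarantee it is closed when the block exits."""
    connection = connect()
    try:
        yield connection
    finally:
        # Runs on normal exit and on exceptions, so the agent stops
        # generating usage either way.
        connection.close()


class FakeAgentConnection:
    """Stand-in for a real WebSocket connection, used only for illustration."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


# Usage:
#   with agent_session(FakeAgentConnection) as connection:
#       ...  # receive and process audio here
```

For a connection object that is already open, the standard library's `contextlib.closing` gives the same guarantee.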

Peer subscriptions

By default, agents won't receive peers' audio streams. This is by design and aims to prevent unnecessary resource usage by the agents.

For an agent to start receiving a peer's audio, the peer must be created with the subscribe option set.

The subscribe option contains parameters describing the desired output format of the peer's audio stream. This output format will be sent to connected agents.

Output format

Currently, only 16-bit PCM (raw) audio output is supported.

Output sample rate

The output can have a sample rate of either 16 kHz or 24 kHz.

The above values aim to be compatible with popular real-time AI audio APIs. If you require a different output format, then make sure to contact us at contact@fishjam.io.
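To make the format concrete, the sketch below computes how much audio a received chunk contains and decodes it into integer samples. The channel count and byte order are not stated on this page, so the code assumes mono, little-endian samples; check both against the Protobuf reference before relying on them. The function names are hypothetical.

```python
import struct

SUPPORTED_SAMPLE_RATES = (16_000, 24_000)  # Hz, as listed above
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm16_duration_seconds(num_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Return the duration of a 16-bit PCM buffer (channels=1 is an assumption)."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return num_bytes / (sample_rate * channels * BYTES_PER_SAMPLE)


def decode_pcm16(chunk: bytes) -> tuple[int, ...]:
    """Decode a chunk into signed 16-bit samples (little-endian is an assumption)."""
    if len(chunk) % BYTES_PER_SAMPLE != 0:
        raise ValueError("chunk length must be a multiple of 2 bytes")
    return struct.unpack(f"<{len(chunk) // BYTES_PER_SAMPLE}h", chunk)
```

Under these assumptions, one second of audio occupies 32,000 bytes at 16 kHz and 48,000 bytes at 24 kHz.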
