
Fishjam Agent Introduction

tip

We recommend going through the steps in the Backend Quick Start before trying Fishjam agents, as you will need a working backend server to use them.

This page gives an introduction to Fishjam agents and how to use them. You can learn more about how Agents work on the Agent Internals page.

What is an Agent?

An agent is a piece of software that allows your backend server to participate in a Fishjam room, similar to how the Fishjam client SDKs allow your client-side application to participate in a Fishjam room. Agents can be used to implement features such as real-time audio transcription, real-time content moderation, conversations with AI agents, and more.

You can simply think of an agent as a peer running within your backend application.

Writing an Agent

In this section, we show how to implement an agent using the Fishjam server SDKs. If you are not using the SDKs, see the Agent Internals page to learn how to integrate with Fishjam agents.

Prerequisites

Before we create the agent itself, we need to create a room, as agents are scoped to rooms. We will also create a peer so that the agent has someone to listen and talk to.

TypeScript
Python

import { FishjamClient } from '@fishjam-cloud/js-server-sdk';

const fishjamClient = new FishjamClient({ fishjamId, managementToken });
const room = await fishjamClient.createRoom();
const peer = await fishjamClient.createPeer(room.id);

from fishjam import FishjamClient

fishjam_client = FishjamClient(fishjam_id, management_token)
room = fishjam_client.create_room()
peer = fishjam_client.create_peer(room.id)

Creating a listening Agent

If you are using the server SDKs, creating an agent and defining its behavior takes only a few lines. By default, agents receive all peers' audio streams. In practice, you will likely want to use the Selective Subscriptions API for fine-grained control over which peers and tracks the agent receives audio from.

TypeScript
Python


import type { AgentCallbacks, PeerOptions } from '@fishjam-cloud/js-server-sdk';

const agentOptions = {
  subscribeMode: 'auto',
  output: { audioFormat: 'pcm16', audioSampleRate: 16000 }
} satisfies PeerOptions;

const agentCallbacks = {
  onError: console.error,
  onClose: (code, reason) => console.log('Agent closed', code, reason)
} satisfies AgentCallbacks;

const { agent } = await fishjamClient.createAgent(room.id, agentOptions, agentCallbacks);

// Register a callback for incoming audio data
agent.on('trackData', ({ track, peerId, data }) => {
  // process the incoming data
});

from fishjam import AgentOptions, FishjamClient  # AgentOptions assumed to be a top-level export
from fishjam.agent import AgentResponseTrackData

fishjam_client = FishjamClient(fishjam_id, management_token)

# `room` is the room created in the Prerequisites step
agent_options = AgentOptions(subscribe_mode="auto")
agent = fishjam_client.create_agent(room.id, agent_options)

# the agent will disconnect once you exit the context
async with agent.connect() as session:
    async for track_data in session.receive():
        # process the incoming data
        pass
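As a rough illustration of what "process the incoming data" might involve, the sketch below decodes a raw PCM16 chunk into samples and computes its duration and RMS level. It assumes the payload arrives as `bytes` holding little-endian, mono, 16-bit audio at the 16000 Hz sample rate used in the TypeScript example. The helpers `decode_pcm16`, `chunk_duration_ms` and `rms_level` are hypothetical and not part of the Fishjam SDK.

```python
import math
import struct

SAMPLE_RATE = 16_000  # Hz, assumed to match the agent's output sample rate


def decode_pcm16(chunk: bytes) -> list[int]:
    """Unpack little-endian signed 16-bit samples (assumed byte layout)."""
    usable = len(chunk) - (len(chunk) % 2)  # ignore a trailing odd byte
    return list(struct.unpack(f"<{usable // 2}h", chunk[:usable]))


def chunk_duration_ms(chunk: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a mono PCM16 chunk in milliseconds."""
    return (len(chunk) // 2) * 1000 / sample_rate


def rms_level(samples: list[int]) -> float:
    """Root-mean-square amplitude, a crude loudness signal."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A value like the RMS level could, for instance, be used to skip silent chunks before forwarding audio to a transcription service.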

Making the Agent speak

Apart from just listening, agents can also send audio data to peers.
Let's assume that in the previous section we forwarded the peer's audio to an audio chatbot. Now the chatbot returns responses, and we want to play them back to the peer.

tip

You can interrupt the currently played audio chunk. See the example below.

TypeScript
Python

import { type AudioCodecParameters } from '@fishjam-cloud/js-server-sdk';

const codecParameters = {
  encoding: 'pcm16',
  sampleRate: 16000,
  channels: 1,
} satisfies AudioCodecParameters;

const agentTrack = agent.createTrack(codecParameters);

// that's a dummy chatbot,
// you can bring your audio from anywhere
chatbot.on('response', (response: Uint8Array) => {
  agent.sendData(agentTrack.id, response);

  // you're able to interrupt the currently played audio chunk
  agent.interruptTrack(agentTrack.id);
});

from fishjam import AgentOptions, FishjamClient  # AgentOptions assumed to be a top-level export
from fishjam.agent import AgentResponseTrackData, OutgoingAudioTrackOptions, TrackEncoding

fishjam_client = FishjamClient(fishjam_id, management_token)

agent_options = AgentOptions(subscribe_mode="auto")

# `room` is the room created in the Prerequisites step
agent = fishjam_client.create_agent(room.id, agent_options)

track_options = OutgoingAudioTrackOptions(encoding=TrackEncoding.PCM16)

# the agent will disconnect once you exit the context
async with agent.connect() as session:
    outgoing_track = session.create_track(track_options)

    # that's a dummy chatbot
    async for chatbot_data in chatbot.receive():
        outgoing_track.send_chunk(chatbot_data)

    # you're able to interrupt the currently played audio chunk
    outgoing_track.interrupt()
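If you don't have a chatbot at hand, any source of raw audio will do for testing. The sketch below synthesizes a sine tone as PCM16 bytes and splits it into fixed-size chunks. It assumes mono, little-endian PCM16 at 16 kHz. `sine_pcm16` and `split_chunks` are hypothetical helpers, not part of the Fishjam SDK, and the 20 ms chunk size is an arbitrary choice.

```python
import math
import struct
from collections.abc import Iterator

SAMPLE_RATE = 16_000  # Hz, assumed to match the outgoing track


def sine_pcm16(freq_hz: float, duration_s: float,
               sample_rate: int = SAMPLE_RATE, amplitude: float = 0.3) -> bytes:
    """Generate a mono sine tone as little-endian signed 16-bit samples."""
    n = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)
    samples = (int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
               for i in range(n))
    return struct.pack(f"<{n}h", *samples)


def split_chunks(audio: bytes, chunk_ms: int = 20,
                 sample_rate: int = SAMPLE_RATE) -> Iterator[bytes]:
    """Yield consecutive chunks of chunk_ms milliseconds (last may be shorter)."""
    step = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(audio), step):
        yield audio[start:start + step]
```

Inside the session, each chunk from `split_chunks(sine_pcm16(440, 0.5))` could then be passed to `outgoing_track.send_chunk`, one at a time.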

Disconnecting

After you're done using an agent, you can disconnect it from the room.

TypeScript
Python

agent.disconnect();

# the agent will disconnect once you exit the context
async with agent.connect() as session:
    pass

Remember to disconnect your agents!

It's important to disconnect agents because every connected agent generates usage, just like a regular peer.
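The context-manager style shown above is convenient because cleanup also runs when processing fails partway through. The sketch below illustrates that general Python pattern with a stand-in class. `FakeAgent` is hypothetical and only tracks a connection flag; it makes no claim about how the Fishjam SDK is implemented.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAgent:
    """Stand-in for an SDK agent; only tracks connection state."""

    def __init__(self) -> None:
        self.connected = False

    @asynccontextmanager
    async def connect(self):
        self.connected = True
        try:
            yield self
        finally:
            # runs on normal exit and when the body raises
            self.connected = False


async def run_and_fail(agent: FakeAgent) -> None:
    async with agent.connect():
        raise RuntimeError("processing failed")


agent = FakeAgent()
try:
    asyncio.run(run_and_fail(agent))
except RuntimeError:
    pass  # the error still propagates, but cleanup has already run

print(agent.connected)  # False
```

If you disconnect manually instead, placing the call in a `finally` block gives the same guarantee.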

Pricing

Agents are billed as if they were normal peers.

For example, a room with 2 peers and 1 agent connected will be billed as if there were 3 peers connected to the room. Exact pricing values can be found on our pricing page.
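The counting rule can be written down in a couple of lines. This is a minimal sketch: `billed_connections` is a hypothetical helper, not part of any Fishjam API, and it says nothing about actual prices.

```python
def billed_connections(peers: int, agents: int) -> int:
    """Agents count towards usage exactly like peers."""
    if peers < 0 or agents < 0:
        raise ValueError("counts must be non-negative")
    return peers + agents


print(billed_connections(peers=2, agents=1))  # 3
```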

Using agents without our SDKs

If you are using Fishjam's REST API directly, check out the Agent Internals page.
