Fishjam Agent Introduction
We recommend going through the steps in the Backend Quick Start before trying Fishjam agents, as you will need a working backend server to use them.
This page gives an introduction to Fishjam agents and how to use them. You can learn more about how Agents work on the Agent Internals page.
What is an Agent?
An agent is a piece of software that allows your backend server to participate in a Fishjam room, similar to how the Fishjam client SDKs allow your client-side application to participate in a Fishjam room. Agents can be used to implement features such as real-time audio transcription, real-time content moderation, conversations with AI agents, and more.
You can simply think of an agent as a peer running within your backend application.
Writing an Agent
In this section we show how to implement an agent using the Fishjam server SDKs. If you are not using the SDKs, see the Agent Internals page to learn how to integrate with Fishjam agents.
Prerequisites
Before creating the agent itself, we need to create a room, because agents are scoped to rooms. We will also create a peer so that the agent has someone to listen and talk to.
- TypeScript

    import { FishjamClient } from '@fishjam-cloud/js-server-sdk';

    const fishjamClient = new FishjamClient({ fishjamId, managementToken });

    const room = await fishjamClient.createRoom();
    const peer = await fishjamClient.createPeer(room.id);

- Python

    from fishjam import FishjamClient

    fishjam_client = FishjamClient(fishjam_id, management_token)

    room = fishjam_client.create_room()
    peer = fishjam_client.create_peer(room.id)
Creating a listening Agent
If you are using the server SDKs, then creating an agent and defining its behavior is very simple. By default, agents receive all peers' audio streams. However, it's likely that in your scenario you'll want to use the Selective Subscriptions API for fine-grained control over which peers/tracks they should receive audio from.
- TypeScript

    import type { AgentCallbacks, PeerOptions } from '@fishjam-cloud/js-server-sdk';

    const agentOptions = {
      subscribeMode: 'auto',
      output: { audioFormat: 'pcm16', audioSampleRate: 16000 },
    } satisfies PeerOptions;

    const agentCallbacks = {
      onError: console.error,
      onClose: (code, reason) => console.log('Agent closed', code, reason),
    } satisfies AgentCallbacks;

    const { agent } = await fishjamClient.createAgent(room.id, agentOptions, agentCallbacks);

    // Register a callback for incoming audio data
    agent.on('trackData', ({ track, peerId, data }) => {
      // process the incoming data
    });

- Python

    from fishjam import AgentOptions, FishjamClient
    from fishjam.agent import AgentResponseTrackData

    fishjam_client = FishjamClient(fishjam_id, management_token)

    agent_options = AgentOptions(subscribe_mode="auto")
    agent = fishjam_client.create_agent(room.id, agent_options)

    # the agent will disconnect once you exit the context
    async with agent.connect() as session:
        async for track_data in session.receive():
            # process the incoming data
            pass
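As one illustration of what "process the incoming data" might look like, the sketch below decodes a raw audio chunk and computes its duration and RMS level. It assumes the chunk is mono, little-endian, signed 16-bit PCM at 16 kHz, matching the `pcm16` / `16000` output settings in the TypeScript example. The byte order and the helper name `pcm16_stats` are assumptions made for illustration and are not part of the Fishjam SDK.

```python
import math
import struct


def pcm16_stats(data: bytes, sample_rate: int = 16000) -> tuple[float, float]:
    """Return (duration_seconds, rms) for a mono PCM16 chunk.

    Assumes little-endian signed 16-bit samples; a trailing odd byte is ignored.
    """
    sample_count = len(data) // 2
    if sample_count == 0:
        return 0.0, 0.0
    samples = struct.unpack(f"<{sample_count}h", data[: sample_count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / sample_count)
    return sample_count / sample_rate, rms


# 16000 silent samples (32000 zero bytes) correspond to one second of audio
duration, rms = pcm16_stats(bytes(32000))
```

A level measure like this could, for example, feed a simple voice-activity check before audio is forwarded to a transcription service.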
Making the Agent speak
Apart from just listening, agents can also send audio data to peers.
Suppose that in the previous section we forwarded the peer's audio to an audio chatbot.
Now the chatbot returns responses, and we want to play them back to the peer.
You can also interrupt the audio chunk that is currently being played, as the example below shows.
- TypeScript

    import { type AudioCodecParameters } from '@fishjam-cloud/js-server-sdk';

    const codecParameters = {
      encoding: 'pcm16',
      sampleRate: 16000,
      channels: 1,
    } satisfies AudioCodecParameters;

    const agentTrack = agent.createTrack(codecParameters);

    // that's a dummy chatbot,
    // you can bring your audio from anywhere
    chatbot.on('response', (response: Uint8Array) => {
      agent.sendData(agentTrack.id, response);

      // you're able to interrupt the currently played audio chunk
      agent.interruptTrack(agentTrack.id);
    });

- Python

    from fishjam import AgentOptions, FishjamClient
    from fishjam.agent import AgentResponseTrackData, OutgoingAudioTrackOptions, TrackEncoding

    fishjam_client = FishjamClient(fishjam_id, management_token)

    agent_options = AgentOptions(subscribe_mode="auto")
    agent = fishjam_client.create_agent(room.id, agent_options)

    track_options = OutgoingAudioTrackOptions(encoding=TrackEncoding.PCM16)

    # the agent will disconnect once you exit the context
    async with agent.connect() as session:
        outgoing_track = session.create_track(track_options)

        # that's a dummy chatbot
        async for chatbot_data in chatbot.receive():
            outgoing_track.send_chunk(chatbot_data)

            # you're able to interrupt the currently played audio chunk
            outgoing_track.interrupt()
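If your audio source produces one long buffer, one possible way to prepare it before handing it to the track is to split it into short, fixed-duration chunks. The sketch below does this for PCM16 audio. The helper name `split_pcm16` and the 20 ms default are hypothetical choices for illustration; this page does not state whether the SDK expects any particular chunk size.

```python
from typing import Iterator


def split_pcm16(
    data: bytes,
    sample_rate: int = 16000,
    channels: int = 1,
    chunk_ms: int = 20,
) -> Iterator[bytes]:
    """Yield consecutive chunks of `chunk_ms` milliseconds of PCM16 audio.

    The last chunk may be shorter than the others.
    """
    bytes_per_frame = 2 * channels  # 2 bytes per 16-bit sample
    chunk_size = sample_rate * chunk_ms // 1000 * bytes_per_frame
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(data), chunk_size):
        yield data[start : start + chunk_size]


# One second of 16 kHz mono audio split into 20 ms chunks of 640 bytes each
chunks = list(split_pcm16(bytes(32000)))
```

Each yielded chunk could then be passed to whatever send call your SDK provides, in place of the single large buffer.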
Disconnecting
After you're done using an agent, you can disconnect it from the room.
- TypeScript

    agent.disconnect();

- Python

    # the agent will disconnect once you exit the context
    async with agent.connect() as session:
        pass
It's important to disconnect agents, because every connected agent generates usage just like a regular peer.
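The "disconnect once you exit the context" behavior in the Python example follows the usual async context manager pattern. The sketch below illustrates that pattern with a hypothetical `FakeAgent` class that only tracks connection state; it is not the SDK's implementation. It shows that the cleanup step runs even when the body raises.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAgent:
    """Hypothetical stand-in for an SDK agent; it only tracks connection state."""

    def __init__(self) -> None:
        self.connected = False

    @asynccontextmanager
    async def connect(self):
        self.connected = True
        try:
            yield self
        finally:
            # Runs on normal exit and when the body raises
            self.connected = False


async def main() -> bool:
    agent = FakeAgent()
    try:
        async with agent.connect() as session:
            assert session.connected
            raise RuntimeError("processing failed")
    except RuntimeError:
        pass
    return agent.connected


still_connected = asyncio.run(main())
```

With this pattern, an error inside the `async with` block does not leave a connection open and generating usage.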
Pricing
Agents are billed as if they were normal peers.
For example, a room with 2 peers and 1 agent connected will be billed as if there were 3 peers connected to the room. Exact pricing values can be found on our pricing page.
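The billing rule above amounts to simple addition. A minimal sketch, using a hypothetical helper name `billed_peer_count` and covering only the counting rule, not actual prices:

```python
def billed_peer_count(peers: int, agents: int) -> int:
    """Number of connections billed as peers: every agent counts as one peer."""
    if peers < 0 or agents < 0:
        raise ValueError("counts must be non-negative")
    return peers + agents


# The example from the text: 2 peers and 1 agent are billed as 3 peers
billed = billed_peer_count(peers=2, agents=1)
```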
Using agents without our SDKs
If you are using Fishjam's REST API directly, see the Agent Internals page.
See also
Learn more about how agents work:
Writing a backend server with Fishjam: