Gemini Live Integration
This tutorial requires a working Fishjam backend. If you haven't set one up yet, please check the Backend Quick Start.
This guide demonstrates how to build a real-time speech-to-speech agent using Fishjam and Google's Multimodal Live API. By connecting these two services, you can create a low-latency voice assistant that not only listens to peers in a room and responds with a natural-sounding voice, but can also provide a real-time transcription of the conversation.
Overview
The implementation acts as a bridge between two real-time streams:
- Fishjam ➡️ Gemini: The Agent receives audio from the room and forwards it to Google GenAI.
- Gemini ➡️ Fishjam: The Agent receives audio generated by Gemini and plays it back into the room.
To ensure these streams connect without audio glitches (garbled voice, wrong pitch), the audio sample rates must match between services.
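For reference, Gemini Live consumes 16 kHz, 16-bit mono PCM and produces 24 kHz PCM; the SDK presets used later in this guide encode these values for you. The constants below are purely illustrative (they are not exported by either SDK) and only spell out the numbers involved.

```typescript
// Illustrative only — these constants are not exported by either SDK.
// Audio sent from Fishjam to Gemini must be 16 kHz, 16-bit, mono PCM,
// while audio returned by Gemini is 24 kHz PCM.
const GEMINI_INPUT_SAMPLE_RATE_HZ = 16_000;
const GEMINI_OUTPUT_SAMPLE_RATE_HZ = 24_000;
```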
Prerequisites
You will need:
- Fishjam Server Credentials: `fishjamId` and `managementToken`. You can get them at fishjam.io/app.
- Google Gemini API Key: Obtainable from Google AI Studio.
Installation
The Google integration is optional, so its dependencies are not installed by default. Install them for the SDK you are using.
TypeScript:

First, ensure you have the Google GenAI SDK installed alongside Fishjam.

```bash
npm install @fishjam-cloud/js-server-sdk @google/genai
```

Python:

Install Fishjam with the gemini extra to pull in the necessary libraries.

```bash
pip install "fishjam-server-sdk[gemini]"
```
Implementation
Step 1: Initialize Clients
We provide a helper factory to initialize the Google Client.
TypeScript:

```typescript
import { FishjamClient } from '@fishjam-cloud/js-server-sdk';
import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const fishjamClient = new FishjamClient({
  fishjamId: process.env.FISHJAM_ID!,
  managementToken: process.env.FISHJAM_TOKEN!,
});

const genAi = GeminiIntegration.createClient({
  // Pass standard Google client options here
  apiKey: process.env.GOOGLE_API_KEY!,
});
```

Python:

```python
import os

from fishjam import FishjamClient
from fishjam.integrations.gemini import GeminiIntegration

fishjam_client = FishjamClient(
    fishjam_id=os.environ["FISHJAM_ID"],
    management_token=os.environ["FISHJAM_TOKEN"],
)

# Pass standard Google client kwargs here
gen_ai = GeminiIntegration.create_client(api_key=os.environ["GOOGLE_API_KEY"])
```
Step 2: Configure the Agent
Create a Fishjam agent configured to match the audio format that the Google client expects (16kHz and 24kHz on Gemini input and output, respectively).
TypeScript:

```typescript
import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const room = await fishjamClient.createRoom();

const { agent } = await fishjamClient.createAgent(room.id, {
  subscribeMode: 'auto',
  // Use our preset to match the required audio format (16kHz)
  output: GeminiIntegration.geminiInputAudioSettings,
});
```
Python:

```python
from fishjam.agent import AgentOptions
from fishjam.integrations.gemini import GeminiIntegration

room = fishjam_client.create_room()

# Use our preset to match the required audio format (16kHz)
agent_options = AgentOptions(output=GeminiIntegration.GEMINI_INPUT_AUDIO_SETTINGS)
agent = fishjam_client.create_agent(room.id, agent_options)
```
Step 3: Connect the Streams
Fishjam handles raw bytes, while the Google GenAI TypeScript SDK expects Base64 strings (the Python SDK accepts raw bytes directly). Ensure you convert between them correctly as shown below.
TypeScript:

Now we set up the callbacks. We need to forward incoming Fishjam audio to Google, and forward incoming Google audio to Fishjam.
```typescript
import { Modality } from '@google/genai';
import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const GEMINI_MODEL = 'gemini-2.5-flash-native-audio-preview-12-2025';

// Use our preset to match the required audio format (24kHz)
const agentTrack = agent.createTrack(GeminiIntegration.geminiOutputAudioSettings);

const session = await genAi.live.connect({
  model: GEMINI_MODEL,
  config: { responseModalities: [Modality.AUDIO] },
  callbacks: {
    // Google -> Fishjam
    onmessage: (msg) => {
      if (msg.data) {
        const pcmData = Buffer.from(msg.data, 'base64');
        agent.sendData(agentTrack.id, pcmData);
      }
      if (msg.serverContent?.interrupted) {
        console.log('Agent was interrupted by user.');
        // Clears the buffer on the Fishjam media server
        agent.interruptTrack(agentTrack.id);
      }
    },
  },
});

// Fishjam -> Google
agent.on('trackData', ({ data }) => {
  session.sendRealtimeInput({
    audio: {
      mimeType: GeminiIntegration.inputMimeType,
      data: Buffer.from(data).toString('base64'),
    },
  });
});
```
Python:

Now we connect the WebSocket loops. We need to forward incoming Fishjam audio to Google, and forward incoming Google audio to Fishjam.
```python
import asyncio

from fishjam.integrations.gemini import GeminiIntegration
from google.genai.types import Blob, Modality

GEMINI_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"

async with agent.connect() as fishjam_session:
    # Use our preset to match the required audio format (24kHz)
    outgoing_track = await fishjam_session.add_track(GeminiIntegration.GEMINI_OUTPUT_AUDIO_SETTINGS)

    async with gen_ai.aio.live.connect(
        model=GEMINI_MODEL, config={"response_modalities": [Modality.AUDIO]}
    ) as gemini_session:

        # Fishjam -> Google
        async def forward_audio_to_gemini():
            async for track_data in fishjam_session.receive():
                await gemini_session.send_realtime_input(audio=Blob(
                    mime_type=GeminiIntegration.GEMINI_AUDIO_MIME_TYPE,
                    data=track_data.data,
                ))

        # Google -> Fishjam
        async def forward_audio_to_fishjam():
            async for msg in gemini_session.receive():
                server_content = msg.server_content
                if server_content is None:
                    continue

                if server_content.interrupted:
                    await outgoing_track.interrupt()

                if server_content.model_turn and server_content.model_turn.parts:
                    for part in server_content.model_turn.parts:
                        if part.inline_data and part.inline_data.data:
                            await outgoing_track.send_chunk(part.inline_data.data)

        # Run both loops concurrently
        await asyncio.gather(
            forward_audio_to_gemini(),
            forward_audio_to_fishjam(),
        )
```
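Finally, someone has to be in the room for the agent to talk to. The TypeScript sketch below shows one way to issue a peer token from the backend; it assumes the server SDK's createPeer helper and its return shape, so verify the names against your SDK version.

```typescript
// Sketch only: create a peer so a human participant can join the room and
// talk to the agent. The createPeer call and its return shape are assumptions —
// check them against your SDK version.
const { peerToken } = await fishjamClient.createPeer(room.id);

// Hand the token to your client application (e.g. via your own HTTP endpoint);
// the Fishjam client SDK uses it to join the room and stream microphone audio.
console.log('Peer token for the frontend:', peerToken);
```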