Version: 0.23.0

Gemini Live Integration

info

This tutorial requires a working Fishjam backend. If you haven't set one up yet, please check the Backend Quick Start.

This guide demonstrates how to build a real-time speech-to-speech agent using Fishjam and Google's Multimodal Live API. By connecting these two services, you can create a low-latency voice assistant that listens to peers in a room, responds in a natural voice, and provides a real-time transcription of the conversation.

Overview

The implementation acts as a bridge between two real-time streams:

  1. Fishjam ➡️ Gemini: The Agent receives audio from the room and forwards it to Google GenAI.
  2. Gemini ➡️ Fishjam: The Agent receives audio generated by Gemini and plays it back into the room.

To ensure these streams connect without audio glitches (garbled voice, wrong pitch), the audio sample rates must match between services.
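
Concretely, the Gemini Live API consumes 16-bit PCM audio at 16 kHz and produces audio at 24 kHz. The constants below are purely illustrative; the SDK presets used later in this guide carry these values for you.

// Illustrative only - GeminiIntegration's presets (shown later) encapsulate this.
const GEMINI_INPUT_SAMPLE_RATE_HZ = 16_000;  // audio sent from Fishjam to Gemini
const GEMINI_OUTPUT_SAMPLE_RATE_HZ = 24_000; // audio generated by Gemini for the room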

Prerequisites

You will need:

  • Fishjam Server Credentials: fishjamId and managementToken. You can get them at fishjam.io/app.
  • Google Gemini API Key: Obtainable from Google AI Studio.
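
The code samples in this guide read these credentials from environment variables (FISHJAM_ID, FISHJAM_TOKEN, and GOOGLE_API_KEY). Below is a minimal sketch of loading and validating them; the helper name is illustrative.

// Minimal sketch: fail fast if a required credential is missing.
// Variable names match the code samples below; the helper is illustrative.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

const fishjamId = requireEnv('FISHJAM_ID');
const managementToken = requireEnv('FISHJAM_TOKEN');
const googleApiKey = requireEnv('GOOGLE_API_KEY');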

Installation

Since the Google integration is optional, its dependencies need to be installed separately.

Ensure you have the Google GenAI SDK installed alongside the Fishjam server SDK:

npm install @fishjam-cloud/js-server-sdk @google/genai

Implementation

Step 1: Initialize Clients

We provide a helper factory to initialize the Google GenAI client.

import { FishjamClient } from '@fishjam-cloud/js-server-sdk';
import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const fishjamClient = new FishjamClient({
  fishjamId: process.env.FISHJAM_ID!,
  managementToken: process.env.FISHJAM_TOKEN!,
});

const genAi = GeminiIntegration.createClient({
  // Pass standard Google client options here
  apiKey: process.env.GOOGLE_API_KEY!,
});

Step 2: Configure the Agent

Create a Fishjam agent configured to match the audio format that the Google client expects (16kHz and 24kHz on Gemini input and output, respectively).

import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const room = await fishjamClient.createRoom();

const { agent } = await fishjamClient.createAgent(room.id, {
  subscribeMode: 'auto',
  // Use our preset to match the required audio format (16kHz)
  output: GeminiIntegration.geminiInputAudioSettings,
});

Step 3: Connect the Streams

Encoding

Fishjam handles raw bytes, while the Google GenAI SDK expects Base64-encoded strings. Ensure you convert between them correctly, as shown below.
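
Concretely, both directions of the conversion use Node's Buffer, exactly as the full listing below does:

// Stand-in for a chunk of raw PCM audio received from Fishjam.
const rawPcmBytes = new Uint8Array([0, 1, 2, 3]);

// Fishjam -> Gemini: raw bytes to a Base64 string
const base64Audio = Buffer.from(rawPcmBytes).toString('base64');

// Gemini -> Fishjam: Base64 string back to raw bytes
const pcmBytes = Buffer.from(base64Audio, 'base64');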

Now we set up the callbacks. We need to forward incoming Fishjam audio to Google, and incoming Google audio back to Fishjam.

import { Modality } from '@google/genai';
import GeminiIntegration from '@fishjam-cloud/js-server-sdk/gemini';

const GEMINI_MODEL = 'gemini-2.5-flash-native-audio-preview-12-2025';

// Use our preset to match the required audio format (24kHz)
const agentTrack = agent.createTrack(GeminiIntegration.geminiOutputAudioSettings);

const session = await genAi.live.connect({
  model: GEMINI_MODEL,
  config: { responseModalities: [Modality.AUDIO] },
  callbacks: {
    // Google -> Fishjam
    onmessage: (msg) => {
      if (msg.data) {
        const pcmData = Buffer.from(msg.data, 'base64');
        agent.sendData(agentTrack.id, pcmData);
      }
      if (msg.serverContent?.interrupted) {
        console.log('Agent was interrupted by user.');
        // Clears the buffer on the Fishjam media server
        agent.interruptTrack(agentTrack.id);
      }
    },
  },
});

// Fishjam -> Google
agent.on('trackData', ({ data }) => {
  session.sendRealtimeInput({
    audio: {
      mimeType: GeminiIntegration.inputMimeType,
      data: Buffer.from(data).toString('base64'),
    },
  });
});
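
Finally, when the conversation ends you will typically want to tear both sides down. The sketch below reuses the variables from the previous snippets; session.close() and fishjamClient.deleteRoom() are assumptions about the respective SDKs, so check their references for the exact cleanup methods.

// Rough teardown sketch reusing `session`, `fishjamClient`, and `room` from above.
// Method names are assumptions - verify them against the SDK references.
async function shutdown() {
  // Stop the Gemini Live session so no further audio is generated.
  session.close();

  // Removing the room also disconnects the agent on the Fishjam side.
  await fishjamClient.deleteRoom(room.id);
}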