--- Summary:
- Pro tip: copy this article into your coding agent; it can perform many of these steps for you.
- Real-time translation has been shown in product demos for many years.
- However, with the Scribe v2 Realtime model, it is now fast and accurate enough to make this viable for real-world use cases.
- Using Scribe v2 Realtime and the Chrome Translator API, I built a demo that translates speech in real time and displays the committed conversation history.
--- Full Article:
Pro tip: copy this article into your coding agent; it can perform many of these steps for you.
Real-time translation has been shown in product demos for many years. However, with the Scribe v2 Realtime model, it is now fast and accurate enough to make this viable for real-world use cases.
Using Scribe v2 Realtime and the Chrome Translator API, I built a demo that translates speech in real time and displays the committed conversation history.
Demo screenshot
Translation flow
The audio input and original-language transcription are handled by ElevenLabs.
The translation itself could be handled by any AI API, but we chose the Chrome Translator API because it is built into the browser and fast.
- ElevenLabs API Key
- Google Chrome
- Node.js Server
- React Frontend
You can also use an SSR web framework like Next.js or Astro.
Install the ElevenLabs SDK.
```bash
npm install @elevenlabs/react @elevenlabs/elevenlabs-js
```
Create a server-side endpoint to generate a token.
```typescript
// Node.js server; this example uses Express
import express from "express";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const app = express();
const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

app.get("/scribe-token", yourAuthMiddleware, async (req, res) => {
  const token = await elevenlabs.tokens.singleUse.create("realtime_scribe");
  res.json(token);
});
```
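On the client, you need a small helper that fetches a single-use token from this endpoint. The sketch below is a hypothetical implementation (the function names and the response handling are assumptions, not part of the SDK); adjust `extractToken` to the exact shape of the token object your endpoint returns.

```typescript
// Hypothetical helper: pull the token string out of the endpoint's JSON
// response. The response shape is an assumption; adapt it to what the
// ElevenLabs SDK actually returns from your /scribe-token route.
function extractToken(body: unknown): string {
  if (typeof body === "string") return body;
  const maybe = body as { token?: string };
  if (maybe && typeof maybe.token === "string") return maybe.token;
  throw new Error("Unexpected token response shape");
}

// Hypothetical client-side fetcher matching the /scribe-token route above.
async function fetchTokenFromServer(): Promise<string> {
  const res = await fetch("/scribe-token");
  if (!res.ok) {
    throw new Error(`Token request failed: ${res.status}`);
  }
  return extractToken(await res.json());
}
```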
Use the useScribe hook from the ElevenLabs SDK to create a Scribe instance. Set the commit strategy to VAD (Voice Activity Detection), which detects when you stop talking and commits the transcription at that point.
Scribe provides two types of transcriptions: partial and committed. Partial transcriptions are returned in real time as you speak, while committed transcripts are finalized once VAD detects that the speaker has stopped talking. Committed transcripts have higher accuracy because the model has the full segment for context.
Then you can connect your microphone to Scribe to start transcribing. You will find the results in the partialTranscript and committedTranscripts properties.
```typescript
import { CommitStrategy, useScribe } from "@elevenlabs/react";

function MyComponent() {
  const scribe = useScribe({
    commitStrategy: CommitStrategy.VAD,
    modelId: "scribe_v2_realtime",
  });

  const handleStart = async () => {
    // Fetch a single-use token from the server
    const token = await fetchTokenFromServer();
    await scribe.connect({
      token,
      microphone: {
        echoCancellation: true,
        noiseSuppression: true,
      },
    });
  };

  return (
    <div>
      <button onClick={handleStart} disabled={scribe.isConnected}>
        Start Recording
      </button>
      <button onClick={scribe.disconnect} disabled={!scribe.isConnected}>
        Stop
      </button>
      {scribe.partialTranscript && <p>Live: {scribe.partialTranscript}</p>}
      <div>
        {scribe.committedTranscripts.map((t) => (
          <p key={t.id}>{t.text}</p>
        ))}
      </div>
    </div>
  );
}
```
The Chrome Translator API is built directly into the browser, and we can access it via globalThis. Depending on the language pair you are translating, Chrome may first need to download the corresponding language pack.
```typescript
async function createTranslator(sourceLanguage, targetLanguage, onDownloadProgress) {
  const api = globalThis.Translator ?? null;
  if (!api) {
    throw new Error('Chrome Translator API is not available');
  }

  const availability = await api.availability({
    sourceLanguage,
    targetLanguage,
  });
  if (availability === 'unsupported' || availability === 'unavailable') {
    throw new Error(`Translation ${sourceLanguage}→${targetLanguage} not supported`);
  }

  const translator = await api.create({
    sourceLanguage,
    targetLanguage,
    monitor: onDownloadProgress
      ? (monitor) => {
          monitor.addEventListener('downloadprogress', onDownloadProgress);
        }
      : undefined,
  });

  await translator.ready;
  return translator;
}
```
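As a usage sketch: the helper above is assumed in scope (it is only re-declared here for self-containment), and `formatProgress` is a hypothetical formatter for the `downloadprogress` event, whose `loaded` value is a fraction between 0 and 1. The call itself only runs in Chrome with the Translator API available.

```typescript
// Declaration of the createTranslator helper defined above, so this
// snippet is self-contained.
declare function createTranslator(
  sourceLanguage: string,
  targetLanguage: string,
  onDownloadProgress?: (e: { loaded: number }) => void
): Promise<{ translate(text: string): Promise<string> }>;

// Hypothetical progress formatter: `loaded` is a fraction from 0 to 1.
function formatProgress(loaded: number): string {
  return `Downloading language pack: ${Math.round(loaded * 100)}%`;
}

// Usage sketch — only works in Chrome with the Translator API available:
async function translateGreeting(): Promise<string> {
  const translator = await createTranslator("en", "es", (e) =>
    console.log(formatProgress(e.loaded))
  );
  return translator.translate("Hello!");
}
```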
Once the language pack is downloaded and the translator is ready, you can use it to translate your partial and/or committed transcriptions.
```typescript
import { useEffect, useRef, useState } from 'react';
import { CommitStrategy, useScribe } from '@elevenlabs/react';

export function Demo({ inputLanguage, targetLanguage }: { inputLanguage: string; targetLanguage: string }) {
  const scribe = useScribe({ commitStrategy: CommitStrategy.VAD, modelId: 'scribe_v2_realtime' });
  const translatorRef = useRef<any>(null);
  const [liveTranslatedPartial, setLiveTranslatedPartial] = useState('');
  const [translatedById, setTranslatedById] = useState<Record<string, string>>({});

  // Create a translator for the selected language pair (helper from above)
  useEffect(() => {
    createTranslator(inputLanguage, targetLanguage).then((t) => {
      translatorRef.current = t;
    });
  }, [inputLanguage, targetLanguage]);

  // partialTranscript -> liveTranslatedPartial (debounced)
  useEffect(() => {
    const t = translatorRef.current;
    const src = scribe.partialTranscript.trim();
    if (!t || !src) return void setLiveTranslatedPartial('');
    const id = setTimeout(async () => setLiveTranslatedPartial(await t.translate(src)), 220);
    return () => clearTimeout(id);
  }, [scribe.partialTranscript]);

  // committedTranscripts -> translatedById
  useEffect(() => {
    const t = translatorRef.current;
    if (!t) return;
    const next = scribe.committedTranscripts.find((s) => s.text.trim() && !translatedById[s.id]);
    if (!next) return;
    t.translate(next.text).then((txt: string) => setTranslatedById((p) => ({ ...p, [next.id]: txt })));
  }, [scribe.committedTranscripts, translatedById]);

  return null; // render however you like
}
```
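Since the component above leaves rendering open, one option is a small pure helper that merges the committed translations with the live partial line into the list of lines the UI renders. This is a hypothetical sketch, not part of the SDK:

```typescript
type CommittedSegment = { id: string; text: string };

// Hypothetical display helper: each committed segment shows its translation
// once available (falling back to the original text while translation is
// pending), and the live partial translation is appended last.
function buildDisplayLines(
  committed: CommittedSegment[],
  translatedById: Record<string, string>,
  liveTranslatedPartial: string
): string[] {
  const lines = committed.map((c) => translatedById[c.id] ?? c.text);
  if (liveTranslatedPartial.trim()) lines.push(liveTranslatedPartial);
  return lines;
}
```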
Then you can dress this up in a nice UI and translate between any of the language pairs Chrome supports!
You can find the source code here: