
How Transcription Works

Ospri Brain uses a two-stage transcription process:

Stage 1: Real-Time Streaming

During the meeting, the bot streams live transcript segments as they’re spoken. These appear in real time and are useful for monitoring, but they may be less accurate than the final transcript.

Stage 2: Final Transcript

After the meeting ends, the full recording is processed through Deepgram (or Recall.ai’s built-in transcription) to generate a high-quality final transcript. This replaces the real-time segments with a more accurate version.
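The hand-off between the two stages can be sketched in Python as follows. This is a minimal illustration, not Ospri Brain's implementation: the `Segment` fields and the `apply_final_transcript` helper are hypothetical, and the sketch assumes the final transcript replaces the real-time segments wholesale once it exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical shape of one transcript segment; field names are illustrative.
    speaker_name: str
    start_time: float  # seconds from the start of the meeting
    text: str
    source: str  # "realtime" or "final"


def apply_final_transcript(realtime: list[Segment], final: list[Segment]) -> list[Segment]:
    """Return the segments to display for a meeting, in chronological order.

    Assumption: once any final segments exist they replace the real-time
    segments entirely; until then the real-time segments are shown.
    """
    chosen = final if final else realtime
    return sorted(chosen, key=lambda seg: seg.start_time)
```

Replacing the real-time segments outright, rather than merging the two lists, keeps the final transcript as the single authoritative version once it is ready.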

Speaker Diarization

Ospri uses Perfect Diarization: when individual audio streams are available (e.g., on Microsoft Teams), each speaker is identified from their own stream. This means:
  • Each speaker gets a consistent label throughout the transcript
  • Speakers stay distinct even in fast-paced conversations
  • Names are matched to calendar attendees when possible
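Matching speaker labels to calendar attendees could work along the lines below. The sketch is hypothetical: the article does not say how Ospri performs the match, so `match_speakers_to_attendees` and its 0.6 similarity cutoff are assumptions, and it uses fuzzy string matching from Python's standard `difflib` module in place of whatever method the product actually uses.

```python
import difflib


def match_speakers_to_attendees(
    speaker_labels: list[str], attendees: list[str], cutoff: float = 0.6
) -> dict[str, str]:
    """Map each speaker label to the most similar calendar attendee name.

    Labels with no attendee at or above the similarity cutoff keep their
    original label. Each attendee is assigned to at most one label.
    """
    # Compare case-insensitively but return the attendee's original spelling.
    remaining = {name.lower(): name for name in attendees}
    mapping: dict[str, str] = {}
    for label in speaker_labels:
        best = difflib.get_close_matches(label.lower(), list(remaining), n=1, cutoff=cutoff)
        # Remove a matched attendee so no one is assigned twice.
        mapping[label] = remaining.pop(best[0]) if best else label
    return mapping
```

For example, with attendees `["Dana Lee", "Sam Patel"]`, the labels `"dana lee"` and `"Sam P"` resolve to `"Dana Lee"` and `"Sam Patel"`, while a generic label such as `"Speaker 3"` stays unchanged because no attendee name is similar enough.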

Viewing the Transcript

  1. Click any meeting in the Recordings tab
  2. Go to the Transcript tab
  3. The transcript is displayed chronologically with:
    • Speaker name (bold)
    • Timestamp (clickable if video is available)
    • Spoken text
[Screenshot: the Transcript tab on a meeting detail page, showing several transcript entries with different speaker names and timestamps]
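Each transcript entry can be modeled roughly as below. The `TranscriptRow` type, the `(speaker, start_seconds, text)` input shape, and the `build_rows` and `format_timestamp` helpers are hypothetical. The sketch only shows chronological ordering, a readable timestamp label, and a seek offset that is set only when video is available, mirroring the clickable timestamps described above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class TranscriptRow:
    speaker: str              # rendered in bold in the UI
    timestamp: str            # display label, e.g. "1:05"
    seek_to: Optional[float]  # playback offset in seconds; None when there is no video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format an offset as M:SS, or as H:MM:SS once it reaches an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_rows(
    segments: Iterable[Tuple[str, float, str]], has_video: bool
) -> list[TranscriptRow]:
    """Turn (speaker, start_seconds, text) tuples into rows in chronological order."""
    return [
        TranscriptRow(
            speaker=speaker,
            timestamp=format_timestamp(start),
            seek_to=start if has_video else None,  # clickable only when video exists
            text=text,
        )
        for speaker, start, text in sorted(segments, key=lambda seg: seg[1])
    ]
```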

Transcript Status

Status        Meaning
Pending       The meeting has ended, but the transcript hasn’t been generated yet
In Progress   The transcript is currently being generated
Done          The full transcript is available
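The three statuses map naturally onto a small enum. This sketch assumes a forward-only lifecycle (Pending, then In Progress, then Done); the table does not describe failure or retry states, and `TranscriptStatus` and `advance` are hypothetical names.

```python
from enum import Enum


class TranscriptStatus(Enum):
    PENDING = "Pending"          # meeting ended, transcript not generated yet
    IN_PROGRESS = "In Progress"  # transcript is being generated
    DONE = "Done"                # full transcript is available


# Assumed forward-only lifecycle: Pending -> In Progress -> Done.
_NEXT = {
    TranscriptStatus.PENDING: TranscriptStatus.IN_PROGRESS,
    TranscriptStatus.IN_PROGRESS: TranscriptStatus.DONE,
}


def advance(status: TranscriptStatus) -> TranscriptStatus:
    """Return the next status, or raise ValueError if the status is terminal."""
    if status not in _NEXT:
        raise ValueError(f"{status.value} is a terminal status")
    return _NEXT[status]
```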

Accuracy Notes

  • Final transcripts are generated with settings that prioritize accuracy
  • Speaker identification uses separate audio streams when the platform supports it
  • For platforms without separate streams, AI-based speaker separation is used
  • Medical and scientific terminology is generally handled well but may occasionally need manual correction

Transcript Deduplication

The system automatically prevents duplicate transcript segments. A unique index on (meeting_id, speaker_name, start_time) ensures that each spoken segment is stored only once, even if the real-time and final transcription processes overlap.
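The unique index can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `transcript_segments`, its `content` column, and the `create_schema` and `store_segment` helpers are hypothetical, and the article does not say whether a duplicate is skipped or overwrites the stored row. Here the later write wins, using SQLite's upsert syntax (`ON CONFLICT ... DO UPDATE`, available since SQLite 3.24), so a final-transcript segment replaces a real-time one with the same key.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the segments table and the deduplicating unique index."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcript_segments (
               meeting_id   TEXT NOT NULL,
               speaker_name TEXT NOT NULL,
               start_time   REAL NOT NULL,  -- seconds from meeting start
               content      TEXT NOT NULL
           )"""
    )
    # One row per (meeting, speaker, start time), as described above.
    conn.execute(
        """CREATE UNIQUE INDEX IF NOT EXISTS uq_transcript_segment
           ON transcript_segments (meeting_id, speaker_name, start_time)"""
    )


def store_segment(
    conn: sqlite3.Connection,
    meeting_id: str,
    speaker_name: str,
    start_time: float,
    content: str,
) -> None:
    """Insert a segment; if its key already exists, overwrite the stored text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO transcript_segments
                   (meeting_id, speaker_name, start_time, content)
               VALUES (?, ?, ?, ?)
               ON CONFLICT (meeting_id, speaker_name, start_time)
               DO UPDATE SET content = excluded.content""",
            (meeting_id, speaker_name, start_time, content),
        )
```

To keep the first copy instead, replace the upsert with `INSERT OR IGNORE`. Either way, the index treats two segments as duplicates only when `meeting_id`, `speaker_name`, and `start_time` all match exactly.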