Meeting Transcription & Summaries
People kept asking the same thing after every meeting: who agreed to what? This pipeline answers it. You drop in a recording, and it returns a tidy summary — decisions, owners, and next steps — instead of a raw wall of text.
The problem
Meeting notes are either skipped or written by whoever was least busy, which means they’re inconsistent and often wrong. The useful signal — this person committed to this thing by this date — gets lost. I wanted that captured automatically and in a shape people would actually read.
The pipeline
flowchart LR
    A[Audio file] --> B[Whisper<br/>transcribe]
    B --> C[Diarize<br/>who spoke]
    C --> D[LLM<br/>summarize]
    D --> E[Structured notes<br/>decisions · actions]
Four stages, each one swappable on its own.
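One way to get that swappability, as a minimal sketch rather than the actual code: treat every stage as a plain function that takes the shared state and returns it. The names here (`MeetingState`, `run_pipeline`, and the `fake_*` stages) are hypothetical stand-ins, and the stage bodies are stubs so the example runs without any models installed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MeetingState:
    """Data handed from one stage to the next (hypothetical shape)."""
    audio_path: str
    segments: list = field(default_factory=list)  # transcript pieces
    notes: dict = field(default_factory=dict)     # final structured notes


# A stage is any callable that takes the state and returns the state.
Stage = Callable[[MeetingState], MeetingState]


def run_pipeline(state: MeetingState, stages: list) -> MeetingState:
    """Run the stages in order; swapping one means swapping a list entry."""
    for stage in stages:
        state = stage(state)
    return state


# Stub stages standing in for the real transcription, diarization and LLM calls.
def fake_transcribe(state: MeetingState) -> MeetingState:
    state.segments = [
        {"start": 0.0, "end": 2.0, "text": "I will send the report by Friday."}
    ]
    return state


def fake_diarize(state: MeetingState) -> MeetingState:
    for seg in state.segments:
        seg["speaker"] = "SPEAKER_00"
    return state


def fake_summarize(state: MeetingState) -> MeetingState:
    state.notes = {
        "actions": [f'{s["speaker"]}: {s["text"]}' for s in state.segments]
    }
    return state


result = run_pipeline(
    MeetingState("meeting.wav"),
    [fake_transcribe, fake_diarize, fake_summarize],
)
print(result.notes)
```

Replacing `fake_diarize` with a better implementation changes one entry in the list and nothing else, as long as the new function keeps the same input and output shape.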
Notes from building it
- Transcription is the easy part now — Whisper handles it well.
- Diarization (separating speakers) is where most of the mess lives, and it matters because “Sohail will do X” is useless if you can’t tell who Sohail is.
- Summarization only works if you give the model structure to fill in. Asking for “a summary” gives mush; asking for decisions, owners, deadlines gives something usable.
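The last point can be sketched as follows. This is an illustration under assumptions, not the prompt the pipeline actually uses: the JSON field names, `build_prompt`, `parse_notes`, and the sample model reply are all made up for the example, and no real LLM is called.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    owner: str
    task: str
    deadline: Optional[str]  # None when no date was mentioned


# Hypothetical schema hint: the model fills in fields instead of free prose.
SCHEMA_HINT = (
    'Return only JSON of the form '
    '{"decisions": ["..."], '
    '"actions": [{"owner": "...", "task": "...", "deadline": "... or null"}]}'
)


def build_prompt(transcript: str) -> str:
    """Combine the instructions, the schema hint and the labeled transcript."""
    return (
        "Extract the decisions and action items from this meeting.\n"
        + SCHEMA_HINT
        + "\n\nTranscript:\n"
        + transcript
    )


def parse_notes(raw: str):
    """Turn the model's JSON reply into a list of decisions and ActionItems."""
    data = json.loads(raw)
    decisions = [str(d) for d in data.get("decisions", [])]
    actions = [
        ActionItem(owner=a["owner"], task=a["task"], deadline=a.get("deadline"))
        for a in data.get("actions", [])
    ]
    return decisions, actions


# Made-up reply, standing in for what a model might return.
sample_reply = json.dumps({
    "decisions": ["Ship the beta next sprint"],
    "actions": [
        {"owner": "Sohail", "task": "Write the release notes", "deadline": None}
    ],
})

prompt = build_prompt("Sohail: I'll write the release notes.")
decisions, actions = parse_notes(sample_reply)
print(decisions, actions)
```

Parsing the reply into typed objects also gives a natural place to reject malformed output and retry, which a free-text summary does not.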
Stack
Python, Whisper (ASR), speaker diarization, a self-hosted LLM for summarization.
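Where the transcription and diarization pieces meet, each transcript segment needs a speaker label. A common approach, shown here as a sketch and not necessarily what this pipeline does, is to give each segment the speaker whose turn overlaps it the most in time. The dictionary shapes are assumptions (timestamps in seconds, loosely modeled on typical ASR segment output), and `assign_speakers` is a hypothetical helper.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list, turns: list) -> list:
    """Label each transcript segment with the speaker turn it overlaps most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the input list is left untouched.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Illustrative data only: two transcript segments and two speaker turns.
segments = [
    {"start": 0.0, "end": 3.0, "text": "I'll draft the proposal."},
    {"start": 3.0, "end": 5.5, "text": "Great, I'll review it."},
]
turns = [
    {"start": 0.0, "end": 3.2, "speaker": "SPEAKER_00"},
    {"start": 3.2, "end": 6.0, "speaker": "SPEAKER_01"},
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

Segments that overlap no turn stay `UNKNOWN`, which is easier to spot and fix downstream than a confident wrong name.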
Result
Keeping the four stages independent meant I could improve diarization later without touching anything else — the kind of boring modularity that pays off every time you need to change one piece.