Meeting Transcription & Summaries

People kept asking the same thing after every meeting: who agreed to what? This pipeline answers it. You drop in a recording, and it returns a tidy summary — decisions, owners, and next steps — instead of a raw wall of text.

The problem

Meeting notes are either skipped or written by whoever was least busy, which means they’re inconsistent and often wrong. The useful signal — this person committed to this thing by this date — gets lost. I wanted that captured automatically and in a shape people would actually read.

The pipeline

flowchart LR
A[Audio file] --> B[Whisper<br/>transcribe]
B --> C[Diarize<br/>who spoke]
C --> D[LLM<br/>summarize]
D --> E[Structured notes<br/>decisions · actions]

Four stages, each one swappable on its own.
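One way to keep the stages swappable is to treat each as a plain callable with a fixed input and output shape. The sketch below is illustrative only, not the project's actual code: `Segment`, `Pipeline`, and the `stub_*` functions are hypothetical names, and the stubs stand in for the real Whisper, diarization, and LLM calls. It also assumes the fourth stage is the formatting of structured notes.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass
class Segment:
    start: float          # seconds from the start of the recording
    end: float
    text: str
    speaker: str = "unknown"


@dataclass
class Pipeline:
    # Each stage is any callable with the right shape, so one can be
    # replaced without the others noticing.
    transcribe: Callable[[str], list[Segment]]
    diarize: Callable[[str, list[Segment]], list[Segment]]
    summarize: Callable[[list[Segment]], dict]
    format_notes: Callable[[dict], str]

    def run(self, audio_path: str) -> str:
        segments = self.transcribe(audio_path)
        segments = self.diarize(audio_path, segments)
        summary = self.summarize(segments)
        return self.format_notes(summary)


# Stub stages (assumptions for illustration; real ones would call models).
def stub_transcribe(audio_path: str) -> list[Segment]:
    return [Segment(0.0, 2.0, "I'll send the budget by Friday.")]


def stub_diarize(audio_path: str, segments: list[Segment]) -> list[Segment]:
    return [replace(s, speaker="Speaker 1") for s in segments]


def stub_summarize(segments: list[Segment]) -> dict:
    return {"actions": [f"{s.speaker}: {s.text}" for s in segments]}


def stub_format(summary: dict) -> str:
    return "\n".join(f"- {action}" for action in summary["actions"])


pipeline = Pipeline(stub_transcribe, stub_diarize, stub_summarize, stub_format)
notes = pipeline.run("meeting.wav")


# Swapping only the diarizer leaves the other three stages untouched.
def named_diarize(audio_path: str, segments: list[Segment]) -> list[Segment]:
    return [replace(s, speaker="Sohail") for s in segments]


improved = replace(pipeline, diarize=named_diarize)
improved_notes = improved.run("meeting.wav")
```

Because the stages only agree on data shapes, a better diarizer can be dropped in with a one-line change, which is the property the rest of this write-up relies on.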

Notes from building it

Transcription is the easy part now — Whisper handles it well.
Diarization (separating speakers) is where most of the mess lives, and it matters because “I’ll do X” is useless if you can’t tell that it was Sohail who said it.
Summarization only works if you give the model structure to fill in. Asking for “a summary” gives mush; asking for decisions, owners, and deadlines gives something usable.
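The last two notes can be sketched together. The code below is an assumption-laden illustration, not the project's implementation: it supposes the transcriber yields `(start, end, text)` segments and the diarizer yields `(start, end, speaker)` turns, labels each segment with the turn that overlaps it most, and then builds a prompt that asks for fixed fields rather than a free-form summary. `assign_speakers`, `build_prompt`, `parse_summary`, and `SCHEMA` are hypothetical names, and the JSON shape is just one plausible choice.

```python
import json


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each (start, end, text) segment with the best-overlapping turn."""
    labelled = []
    for start, end, text in segments:
        best = max(turns, key=lambda t: overlap(start, end, t[0], t[1]), default=None)
        if best is None or overlap(start, end, best[0], best[1]) == 0.0:
            speaker = "unknown"   # no turn covers this segment at all
        else:
            speaker = best[2]
        labelled.append((speaker, text))
    return labelled


# The structure the model is asked to fill in (an assumed shape).
SCHEMA = {
    "decisions": ["string"],
    "actions": [{"owner": "string", "task": "string", "deadline": "string or null"}],
}


def build_prompt(labelled):
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in labelled)
    return (
        "Extract the meeting outcomes from the transcript below.\n"
        "Reply with JSON only, matching this shape:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n"
        "Use null for a deadline that was not stated.\n\n"
        f"Transcript:\n{transcript}"
    )


def parse_summary(reply):
    """Parse the model's reply and reject it if a top-level field is missing."""
    data = json.loads(reply)
    missing = [key for key in SCHEMA if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


segments = [(0.0, 3.0, "I'll send the budget by Friday."), (3.2, 5.0, "Great, approved.")]
turns = [(0.0, 3.1, "SPEAKER_00"), (3.1, 6.0, "SPEAKER_01")]
labelled = assign_speakers(segments, turns)
prompt = build_prompt(labelled)
```

The prompt string would then go to whichever LLM is in use, and `parse_summary` gives a cheap check that the reply actually has the requested fields before anything downstream depends on them.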

Stack

Python, Whisper (ASR), speaker diarization, a self-hosted LLM for summarization.

Result

Keeping the four stages independent meant I could improve diarization later without touching anything else — the kind of boring modularity that pays off every time you need to change one piece.