Local audio transcriptions with whisper-cpp

If you want to do audio transcriptions on your machine, here is a bunch of scripts that show how to do it with whisper-cpp.

I found the transcription results to be better than what youtube et al generate automatically. On my M1 MacBook Air the transcription is mostly faster than real time.

install.sh:

#!/bin/bash
brew install whisper-cpp ffmpeg wget

download-model.sh:

#!/bin/bash
wget https://ggml.ggerganov.com/ggml-model-whisper-large-q5_0.bin

transcode-to-wave.sh:

#!/bin/bash

# Check if an argument was provided
if [ -z "$1" ]; then
    echo "Usage: $0 <input_file>"
    exit 1
fi

# Extract filename without extension
input_file="$1"
base_name="$(basename "$input_file" | sed -E 's/[^a-zA-Z0-9]//g')"
output_file="${base_name}.wav"

# Run FFmpeg command
ffmpeg -i "$input_file" -ac 1 -ar 16000 "$output_file"

whisper-transcribe.sh:

#!/bin/bash

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <wav_file> <language>"
    exit 1
fi

WAV_FILE="$1"
LANGUAGE="$2"

if [ ! -f "$WAV_FILE" ]; then
    echo "Error: File '$WAV_FILE' not found!"
    exit 1
fi

MODEL="ggml-model-whisper-large-q5_0.bin"

echo "Transcribing '$WAV_FILE' to language '$LANGUAGE'..."
whisper-cli \
    -l "$LANGUAGE" \
    -m "$MODEL" \
    --output-txt \
    --output-vtt \
    --output-srt \
    --output-lrc \
    --output-words \
    --output-csv \
    --output-json \
    --output-json-full \
    "$WAV_FILE"

echo "Transcription completed!"