Project 1: Running OpenAI's Whisper model on a local computer
This script transcribes audio to text using OpenAI's open-source Whisper model: https://github.com/openai/whisper. My gaming computer has an NVIDIA RTX 3080 GPU, which lets me run Whisper locally instead of paying for the OpenAI API (which can get expensive). A while ago I wrote a script that records a live stream called "La Hora del Té." The script I've been working on now cuts out the commercial segments from the recording and writes a new audio file; once that file is created, it uses the Whisper model to transcribe the audio to text. The goal is to build a GPT with knowledge from roughly 10 years' worth of transcribed live streams. Unfortunately, I didn't document all the steps I took to make this work; I just pulled an all-nighter until it worked.
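The commercial-cutting step can be sketched with FFmpeg driven from Python. This is a minimal sketch, not my original script: the file names and the start/end timestamps below are hypothetical examples, and in practice you'd look up the real commercial-break times for each recording.

```python
import subprocess

def cut_segment(src, start, end, dst):
    """Build an ffmpeg command that copies one non-commercial
    segment [start, end] (in seconds) of src into dst.

    -c copy avoids re-encoding, so the cut is fast and lossless
    (it snaps to the nearest keyframe for most formats).
    """
    cmd = [
        "ffmpeg", "-y",
        "-ss", str(start), "-to", str(end),
        "-i", src,
        "-c", "copy",
        dst,
    ]
    return cmd  # execute with subprocess.run(cmd, check=True)

# Hypothetical: keep two show segments surrounding one commercial break.
commands = [
    cut_segment("la_hora_del_te.mp3", 0, 900, "part1.mp3"),
    cut_segment("la_hora_del_te.mp3", 1080, 3600, "part2.mp3"),
]
```

The resulting segment files can then be concatenated (e.g. with FFmpeg's concat demuxer) into the single ad-free file that gets handed to Whisper.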
Here are the steps I remember following:
- Install PyTorch with CUDA support for NVIDIA GPUs:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
- This works with NVIDIA GPUs. You can also transcribe on the CPU, but it takes longer, and a different (CPU-only) build of PyTorch needs to be installed.
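A quick way to confirm the GPU build is working is to ask PyTorch which device it sees. This small check is a sketch of my own; it falls back to "cpu" so it runs safely even on machines without torch or a CUDA GPU.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, else "cpu".

    The import is wrapped in try/except so the check also works on
    machines where torch isn't installed at all.
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

print(pick_device())
```

If this prints "cpu" on a machine with an NVIDIA GPU, the CUDA-enabled wheel from the command above probably wasn't the one that got installed.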
- Install Whisper:
pip install git+https://github.com/openai/whisper.git
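Once installed, transcription itself is only a few lines. This is a minimal sketch of the Whisper Python API, not my original script: the "base" model size, the device choice, and the file name are illustrative assumptions (larger models like "medium" or "large" are slower but more accurate).

```python
def transcribe(audio_path, model_name="base", device="cuda"):
    """Transcribe one audio file with Whisper and return the text.

    The import is deferred so this sketch can be loaded even where
    whisper isn't installed; model_name and device are illustrative.
    """
    import whisper
    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(audio_path)
    return result["text"]

# Hypothetical usage on an ad-free recording:
# text = transcribe("la_hora_del_te_no_ads.mp3")
```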
- Download and install Chocolatey
- Download and install FFmpeg
- Download the NVIDIA CUDA Toolkit
- Add FFmpeg to the system path. If this doesn't work, you may need to copy the FFmpeg binaries to a directory like C:\ffmpeg\bin and then point the path within the script to that location.
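Pointing the script at the FFmpeg directory can be done by prepending it to PATH at the top of the script, before Whisper or subprocess calls need ffmpeg. A minimal sketch, assuming the C:\ffmpeg\bin location mentioned above:

```python
import os

# Assumed FFmpeg install location from the step above; adjust if yours differs.
FFMPEG_DIR = r"C:\ffmpeg\bin"

# Prepend so this copy of ffmpeg wins over anything else on PATH
# for this process and its children (e.g. subprocess calls).
os.environ["PATH"] = FFMPEG_DIR + os.pathsep + os.environ.get("PATH", "")
```

This only changes PATH for the running process, which is often easier than editing the system-wide environment variable on Windows.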