Luis Mejía - All things data.

My Projects

Project 1: OpenAI Whisper model run on a local computer.

This script transcribes audio to text using OpenAI's open-source Whisper model: https://github.com/openai/whisper. My gaming computer has an NVIDIA RTX 3080 GPU, which lets me run Whisper locally instead of paying for the OpenAI API (which can be expensive). A while ago I wrote a script that records a live stream called "La Hora del Té." The new script cuts the commercial segments out of each recording, writes the result to a new file, and then uses Whisper to transcribe that file to text. The goal is to create a GPT with knowledge from about 10 years' worth of transcribed live streams. Unfortunately, I didn't document all the steps I took to make this work; I just pulled an all-nighter until it worked.
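The commercial-cutting step could look roughly like the sketch below. It is not my original script: it assumes the commercial breaks are already known as (start, end) times in seconds, and it uses FFmpeg's stream copy and concat demuxer to build the new file. The function names, the part file names, and the working directory are all made up for illustration.

```python
import subprocess
from pathlib import Path


def keep_segments(duration, commercials):
    """Return the (start, end) spans, in seconds, left after removing commercials."""
    kept = []
    cursor = 0.0
    for start, end in sorted(commercials):
        # Clamp each break to the bounds of the recording.
        start = min(max(start, 0.0), duration)
        end = min(end, duration)
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


def ffmpeg_cut_command(src, dst, start, end):
    """Build an FFmpeg command that copies src[start:end] into dst without re-encoding."""
    return ["ffmpeg", "-y", "-i", str(src), "-ss", str(start), "-to", str(end),
            "-c", "copy", str(dst)]


def remove_commercials(src, dst, duration, commercials, workdir="parts"):
    """Cut out the commercial breaks and join what remains into dst (hypothetical helper)."""
    work = Path(workdir)
    work.mkdir(exist_ok=True)
    suffix = Path(src).suffix
    parts = []
    for i, (start, end) in enumerate(keep_segments(duration, commercials)):
        part = work / f"part_{i:03d}{suffix}"
        subprocess.run(ffmpeg_cut_command(src, part, start, end), check=True)
        parts.append(part)
    # The concat demuxer reads a text file with one "file '<path>'" line per part.
    list_file = work / "parts.txt"
    list_file.write_text(
        "".join(f"file '{p.resolve().as_posix()}'\n" for p in parts), encoding="utf-8"
    )
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file),
                    "-c", "copy", str(dst)], check=True)
```

For example, remove_commercials("show.mp3", "show_clean.mp3", 7200, [(600, 780), (3600, 3840)]) would drop two breaks from a two-hour recording. Because the audio is copied rather than re-encoded, the cuts land on packet boundaries and may be off by a fraction of a second.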

Here are the steps I remember following:

  • Install PyTorch with CUDA support for NVIDIA GPUs: pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
    • This build works with NVIDIA GPUs. You can also transcribe on your computer's CPU, but it takes longer, and a different (CPU-only) build of PyTorch needs to be installed.
  • Install Whisper: pip install git+https://github.com/openai/whisper.git
  • Download and install Chocolatey
  • Download and install FFmpeg
  • Download the NVIDIA CUDA Toolkit
  • Add FFmpeg to the system path. If this doesn't work, you may need to copy the files to a directory like C:\ffmpeg\bin and then point the path within the script to this location.
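Once everything above is installed, the transcription itself is only a few lines. This is a minimal sketch rather than my exact script: the "medium" model size, the language argument, and the helper names are assumptions, and the transcript is simply written next to the audio file.

```python
from pathlib import Path


def pick_device():
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcript_path(audio_path):
    """Put the transcript next to the audio file, with a .txt extension."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_file(audio_path, model_name="medium", language=None):
    """Transcribe one audio file with Whisper and save the text (hypothetical helper)."""
    import whisper  # imported here so the helpers above work without Whisper installed

    model = whisper.load_model(model_name, device=pick_device())
    # language=None lets Whisper detect the language; "es" would force Spanish.
    result = model.transcribe(str(audio_path), language=language)
    out = transcript_path(audio_path)
    out.write_text(result["text"].strip(), encoding="utf-8")
    return out
```

If pick_device() returns "cpu" on a machine with an NVIDIA card, the usual cause is that the CPU-only PyTorch build got installed instead of the CUDA one.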

Project 2: Clash of Clans and Discord APIs

This script will retrieve Clan War information for each clan where I play with friends. It will identify players who are missing attacks, and a separate script will tag them on Discord, reminding them about the war attacks that need to be completed and how many hours are left in the war.
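A rough sketch of the missing-attack check follows. It is not the actual script. It assumes the Clash of Clans "current war" response has clan.members entries with name and an optional attacks list, plus attacksPerMember and an endTime such as 20240101T120000.000Z. The player-to-Discord-ID mapping and the function names are hypothetical, and the API token is one you would create in the Clash of Clans developer portal.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def fetch_current_war(clan_tag, token):
    """Download the current war for one clan (endpoint and auth scheme are assumptions)."""
    url = ("https://api.clashofclans.com/v1/clans/"
           f"{urllib.parse.quote(clan_tag, safe='')}/currentwar")
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def missing_attackers(war):
    """Return (player name, attacks remaining) for members who still have attacks left."""
    per_member = war.get("attacksPerMember", 2)
    missing = []
    for member in war["clan"]["members"]:
        used = len(member.get("attacks", []))  # key is absent until the first attack
        if used < per_member:
            missing.append((member["name"], per_member - used))
    return missing


def hours_left(end_time, now=None):
    """Hours until the war ends, given an endTime like 20240101T120000.000Z."""
    end = datetime.strptime(end_time, "%Y%m%dT%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((end - now).total_seconds() / 3600, 0.0)


def build_reminder(missing, hours, discord_ids):
    """Build the reminder text; <@id> is how Discord renders a user mention."""
    lines = [f"War ends in {hours:.1f} hours. Attacks still missing:"]
    for name, remaining in missing:
        mention = f"<@{discord_ids[name]}>" if name in discord_ids else name
        lines.append(f"{mention}: {remaining} attack(s) left")
    return "\n".join(lines)
```

The text from build_reminder can then be posted by the separate Discord script, for example through a bot or a channel webhook.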

Project 3: Live stream recording.

A script I wrote to record a popular live show from Honduras called "La Hora del Té." The recordings are then uploaded to the show's official app so fans worldwide can listen to them.
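One way to record a stream like this is to let FFmpeg save it for a fixed amount of time. The sketch below only builds the command and is an assumption about the approach, not the script I actually run: the stream URL, the output folder, and the file naming are placeholders, and "-c copy" assumes the stream is already MP3 audio.

```python
import subprocess
from datetime import datetime


def recording_command(stream_url, started_at, duration_seconds, out_dir="recordings"):
    """Build an FFmpeg command that saves duration_seconds of the stream to a dated file."""
    out_file = f"{out_dir}/la_hora_del_te_{started_at:%Y%m%d_%H%M}.mp3"
    return ["ffmpeg", "-y", "-i", stream_url, "-t", str(duration_seconds),
            "-c", "copy", out_file]


def record(stream_url, duration_seconds):
    """Run the recording now; the output folder must already exist (hypothetical helper)."""
    command = recording_command(stream_url, datetime.now(), duration_seconds)
    subprocess.run(command, check=True)
    return command[-1]
```

If the stream uses a different codec, dropping "-c", "copy" from the command makes FFmpeg re-encode the audio to match the .mp3 extension.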

Project 4: Clash of Clans - Clan War Stats.

This project captures data from the API of my favorite mobile game, Clash of Clans. The stats are then loaded into different tables in my BigQuery project.
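As an illustration of the load step, the sketch below flattens one war response into rows and streams them into a table with the google-cloud-bigquery client. The column names, the table ID format, and the response fields used (tag, name, attacks, stars, endTime) are assumptions for the example, not my real schema.

```python
def war_rows(war, clan_tag):
    """Flatten one war response into one row per clan member."""
    rows = []
    for member in war["clan"]["members"]:
        attacks = member.get("attacks", [])  # absent when the member has not attacked
        rows.append({
            "clan_tag": clan_tag,
            "war_end_time": war["endTime"],
            "player_tag": member["tag"],
            "player_name": member["name"],
            "attacks_used": len(attacks),
            "stars": sum(attack.get("stars", 0) for attack in attacks),
        })
    return rows


def load_rows(table_id, rows):
    """Stream rows into an existing BigQuery table, e.g. "my-project.coc.war_stats"."""
    from google.cloud import bigquery  # imported here so war_rows works without the client

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
```

Keeping the flattening separate from the load makes it easy to check the rows locally before anything is sent to BigQuery.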

Project 5: Exploring the Seaborn library.

Exploring data visualization in Python with the Seaborn library. It's okay for doing everything within Python, but I will stick to Tableau instead.