Making knowledge easier to access is the focus of this site. We normally use Otter.ai to create transcripts and clickable, immersive text that helps our readers read, digest, and, most importantly, learn from the massive amount of information on this site and from the medical professionals, scientists, and lived-experience professionals who produce content in the form of podcasts, audio, and video.
It has always been the goal to integrate this technology ourselves, for ease of use, control of visual aesthetics, cost, and more. I currently cover all of the site's costs and donate my time, so any way to create and utilize tools that aid in that can go a long way.
This post serves as a testing ground for some new tools I have found: specifically, OpenAI's Whisper and the WebVTT caption format.
Utilizing this Colab notebook, I have found an alternative to the self-hosted Whisper Web Docker program I was using. Being able to use free GPU power from Google for transcripts is going to enable more translations as well as audio playback, and as AI grows, likely text-to-speech too.
Tools:
DeepGram's Google Colab Python Notebook: https://colab.research.google.com/github/deepgram-devs/try-whisper-in-google-collab/blob/main/try_whisper_in_three_easy_steps.ipynb
Huberman Lab Podcast:
HTML5 Player with Clickable Transcript project:
AblePlayer
To get started, we are going to use Huberman Lab Podcast #87, Dr. Erich Jarvis: The Neuroscience of Speech, Language & Music.
Getting a WebVTT Transcript Using Python and Google Colab:
First, we will set up the notebook and run the imports. We need to modify the code to point to that specific podcast:
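Here is a minimal sketch of what that setup cell might look like. The installs, variable names, and placeholder URL are my adaptation rather than the DeepGram notebook verbatim; swap in the episode's actual YouTube link:

```python
# Install Whisper and a YouTube downloader into the Colab runtime.
# (Sketch of the setup cell; the DeepGram notebook's exact installs may differ.)
!pip install -q git+https://github.com/openai/whisper.git
!pip install -q yt-dlp

import whisper
import yt_dlp

# Point the run at the specific episode (placeholder URL; use the real link).
YOUTUBE_URL = "https://www.youtube.com/watch?v=EPISODE_87_VIDEO_ID"
AUDIO_FILE = "episode_87.mp3"
```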
Next, pull in the audio from YouTube and run Whisper on it to create the VTT file (renamed to .txt to make Wix happy):
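Below is a hedged sketch of that step, assuming the yt-dlp setup above. Whisper's transcribe() returns timestamped segments, which we write out by hand in WebVTT's cue format; the file names and the "base" model choice are mine, not the notebook's:

```python
import shutil

# Download the episode's audio track as an MP3 (uses ffmpeg, which
# Colab provides by default).
ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "episode_87.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([YOUTUBE_URL])

# Transcribe on the free Colab GPU; larger models are more accurate but slower.
model = whisper.load_model("base")
result = model.transcribe(AUDIO_FILE)

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

# Write each Whisper segment as a WebVTT cue.
with open("episode_87.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for seg in result["segments"]:
        f.write(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")

# Wix won't accept a .vtt upload, so keep a .txt copy of the same contents.
shutil.copy("episode_87.vtt", "episode_87_vtt.txt")
```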
Now that we have an audio file and a VTT file, we can start to work on the player.
First, let's see how we can integrate this project into a Wix blog post by embedding the HTML code directly:
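Here is a rough sketch of what that embedded snippet could look like, using AblePlayer's data-able-player and data-transcript-div attributes to generate the clickable, click-to-seek transcript. The CDN paths, version numbers, and media URLs are assumptions; check the AblePlayer documentation for its current dependencies (jQuery and js-cookie):

```html
<!-- Sketch of an AblePlayer embed for a Wix "Embed HTML" block.
     CDN paths, versions, and media URLs are assumptions to adapt. -->
<!DOCTYPE html>
<html lang="en">
<head>
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/npm/ableplayer@4/build/ableplayer.min.css">
</head>
<body>
  <audio id="podcast-87" data-able-player data-transcript-div="transcript" preload="auto">
    <source type="audio/mpeg" src="https://example.com/episode_87.mp3">
    <!-- The .txt rename from the Colab step still contains valid WebVTT -->
    <track kind="captions" src="https://example.com/episode_87_vtt.txt"
           srclang="en" label="English">
  </audio>
  <!-- AblePlayer renders the interactive transcript in this div -->
  <div id="transcript"></div>

  <!-- AblePlayer depends on jQuery and js-cookie -->
  <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/js-cookie@3/dist/js.cookie.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/ableplayer@4/build/ableplayer.min.js"></script>
</body>
</html>
```

One thing to keep in mind: Wix renders an "Embed HTML" element inside a sandboxed iframe, so every asset URL in the snippet needs to be absolute.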