Written on 04 Dec 2022
Senators, subtitles, and speech to text AI
Can we have better California legislative hearing transcripts thanks to some free AI tools?
The California Senate provides audio and video recordings of their hearings along with subtitles for the video, as required by law.
Cool. But how good are the subtitles?
Not great. Sure, they can help you understand a bunch of what was said in a hearing, but they're not even close to sufficient if you are deaf or rely heavily on closed captioning to follow along with the activity of your representative government.
In my view there are three major problems:
- There are some minor spelling mistakes and some larger grammar mistakes that introduce confusion or at least distract.
- They're displayed in really short phrases of just a few words, which can make it difficult to follow complex thoughts in a legislative hearing.
- Frustratingly, the captions are in all caps, which makes reading difficult and makes it hard to spot commonly used acronyms.
Ok so they leave a lot to be desired. What can we do beyond manually typing out each hearing? I hate to bring it up but can a robot have this job? Can AI help here?
Frankly, I find the quality of contemporary artificial intelligence absolutely terrifying. But it's here and there's not much I can do about it. Maybe we can at least get better legislative hearing transcripts?
There's a class of tools called "speech to text" that take audio/video of human language and generate a text version. Duh. A company called OpenAI released a speech to text tool named Whisper and it's been getting rave reviews across my Twitter/Mastodon feeds. The idea is that you give it a video and out comes the transcript. Seems perfect for this experiment!
So here's what we're gonna do: run a video of a hearing through Whisper and then see how the generated closed captioning compares to the set offered by the state.
I selected a recent California Senate hearing concerning an issue near and dear to my heart, the state's campaign finance disclosure system called CAL-ACCESS.
Some real quick background on the hearing:
- The system was originally deployed in 2000 and in 2017 the legislature passed a bill to build a replacement system called the "CAL-ACCESS Replacement System" or "CARS". Cars... yuck.
- So far the state's spent at least $30 million and has almost nothing to show for it. Things were apparently so bad that current Secretary of State Weber scrapped the project and started over in June 2021.
- As of the hearing, CARS is not expected to be available to the public until at least 2026.
Hence, the oversight hearing - which I attempted to live tweet.
Anyway, I fed the video to the robot and it generated captions. This part took a long time, like over 24 hours on my Intel-chip MacBook Pro.
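If you want to try this at home, here's a rough sketch of what that step could look like with the open source `openai-whisper` Python package. To be clear, this is not necessarily the exact setup used for this post: the "base" model size, the file names, and the helper function names are all assumptions made up for illustration. The package also needs ffmpeg installed to read video files.

```python
def format_timestamp(seconds):
    """Turn a number of seconds into a WebVTT timestamp like 00:01:02.500."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments):
    """Build a WebVTT caption file from Whisper-style segments.

    Each segment is a dict with "start" and "end" (in seconds) and "text".
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def transcribe_to_vtt(video_path, vtt_path, model_name="base"):
    """Run Whisper on a video and save the captions (hypothetical helper)."""
    # Imported here so the formatting helpers above work without Whisper installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)  # "base" is an assumed model size
    result = model.transcribe(video_path)   # returns a dict with a "segments" list
    with open(vtt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_vtt(result["segments"]))


# Example call, with made-up file names:
# transcribe_to_vtt("hearing.mp4", "hearing.whisper.vtt")
```

Larger Whisper models tend to be more accurate but slower, so on older hardware the model size is probably the main knob to play with.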
And the result was surprising! Surprisingly good, that is. You can check it out for yourself below.
In fact please do because I'm not a person who regularly uses assistive technology so I'm unclear if these new captions will be better in all circumstances. If you do frequently use assistive technology I'd love to hear your take!
Here's the hearing. Toggle back and forth between the closed captioning sources to see the difference:
The main difference I see is that the Whisper-generated captions are much easier to read. They're not in all caps, and instead of optimizing for a consistent number of words on the screen at once, the new captions seem to be more sentence based. This makes following the back and forth of a hearing much easier.
Additionally, names, titles, and acronyms seem to be more accurately captured and use normal capitalization, so they're nicer on the eyes.
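Those are eyeball impressions. One way to put rough numbers on them would be to count the average words per caption and the share of captions that are in all caps for each file. Here's a minimal sketch, assuming both caption sets are available as WebVTT text. The function names are made up for illustration, and I haven't verified what format the state actually publishes its captions in.

```python
def cue_texts(vtt):
    """Pull the text of each caption cue out of a WebVTT string."""
    texts = []
    blocks = vtt.replace("\r\n", "\n").strip().split("\n\n")
    for block in blocks:
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # the timing line; the caption text follows it
                text = " ".join(lines[i + 1:]).strip()
                if text:
                    texts.append(text)
                break
    return texts


def caption_stats(vtt):
    """Summarize how long the cues are and how many are in all caps."""
    texts = cue_texts(vtt)
    if not texts:
        return {"cues": 0, "avg_words": 0.0, "all_caps_share": 0.0}
    total_words = sum(len(t.split()) for t in texts)
    # str.isupper() is True when every cased letter in the cue is uppercase
    all_caps = sum(1 for t in texts if t.isupper())
    return {
        "cues": len(texts),
        "avg_words": total_words / len(texts),
        "all_caps_share": all_caps / len(texts),
    }
```

Running `caption_stats` on each file's contents would give a quick side-by-side, though it says nothing about accuracy. That still takes a human reading along.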
It's not perfect but it's pretty good. And sometimes the mistakes inspire a chuckle or two. For example, an attorney for the Fair Political Practices Commission (the main campaign cash cop in California) gave testimony and the AI wrote the agency name as "Fair Political Praxis Commission".
I'm excited that we might be entering a period where cheap, reliable, high quality transcripts are the norm. That means more people can participate in monitoring government. It also means that we'll be able to more quickly search through audio or video sources in order to conduct research or fact check. Yay!
Do you have any experience working with speech to text tools? Are you a person who uses screen readers and other assistive tools, and do you have thoughts about the differences between the two sets of captions? I'd love to hear about it! My email is on the home page.