I recently heard about a fantastic app called Aiko, available for both macOS and iOS, which leverages AI technology to transcribe audio. A few of the things that set Aiko apart from similar solutions:
- It’s free, totally free.
- Audio can be dictated directly into the app, or a pre-recorded file can be imported. I’m particularly excited about this second piece.
- Everything happens on the end-user’s device; nothing is sent to the cloud.
- Multiple languages are supported, and we’re talking a lot of languages: 100, according to Aiko’s home page.
I was excited to test out this fascinating technology, so to really put it through its paces in a sub-optimal recording environment, I decided to record some audio using my Apple Watch while standing outside with lots of traffic and other background noise. What follows is the unedited output of my little experiment. I’m also adding the actual recorded audio so that you can get a sense of the crummy audio I gave Aiko to work with.
Hello, and thanks for joining me today.
I’m playing with an app called AIKO.
It’s an app that leverages Whisper, which is a technology made by OpenAI, the folks that brought us ChatGPT.
Now unless you’ve been living under a rock for the past couple of months, I’m sure you’ve heard quite a lot about ChatGPT and the fascinating possibilities it opens up to us.
Anyway, Whisper, and on top of that this AIKO app, allow transcription of audio.
The interesting thing about it is that you can record directly in the AIKO app, or you can import audio, say from a file that was pre-recorded.
For example, you might have a pre-recorded audio file of a lecture or a class.
You would be able to import it into this AIKO app, transcription would happen, and then you would have the output as text.
For my test today, I’m standing outside in front of my house recording on my Apple Watch with traffic going by.
And the reason I’m doing this is because I wanted to come up with a very sub-optimal recording environment, just to better understand how the technology would deal with audio recorded in such an environment.
I’m also trying to speak as naturally as I can without saying words like um and uh, things that I think often get said when speaking.
The interesting thing about AIKO and the way that it transcribes audio is that it supposedly is able to insert punctuation correctly.
I’m not sure if it does anything about paragraphs or not, but as the speaker, I don’t have any way of controlling format.
Once you run a file or recording through AIKO, the output is rendered as text.
However, there are a few things you can do with it.
First, you can of course copy the text into some other application.
The other thing that you can do is have the text be timestamped.
The reason that this can be handy is that you can use that then to create files that can be used as closed captioning for videos.
Anyway, it is kind of loud out here, and so I will go back inside.
I also didn’t want to make this too long because I’m not sure if it’ll work at all or how accurate it’ll be, but my plan is to post this to the blog without editing it.
Stop, stop, stop.
Aiko-generated transcription from my Apple Watch recording.
One final note: the dictation ends with the words “stop stop”. I didn’t actually speak those words, but because I have VoiceOver activated on my Apple Watch, they were picked up in the recording as I located and activated the stop button.
This is definitely incredible technology, and the price certainly can’t be beat. From an accessibility perspective, I found Aiko to be extremely accessible with VoiceOver on both Mac and iOS, and since it is a native app using native controls, I feel confident that it will work with other assistive technologies as well.
You can find more information about Aiko, including FAQs, links to app store pages and more here.
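For anyone curious how timestamped text can turn into closed captions, here is a minimal Python sketch that writes segments out in the widely used SubRip (SRT) caption format. It is only an illustration: the function names and the segment timings are hypothetical, and it assumes the timestamped text is available as (start, end, text) entries measured in seconds, which may not match how Aiko itself exports timestamps.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn (start, end, text) tuples into the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each caption is a counter, a time range, and the caption text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates one caption from the next.
    return "\n".join(blocks)


# Hypothetical timings; the text comes from the transcript above.
segments = [
    (0.0, 2.5, "Hello, and thanks for joining me today."),
    (2.5, 5.0, "I'm playing with an app called AIKO."),
]
print(segments_to_srt(segments))
```

Saving the printed text to a file with an .srt extension gives a caption file that many video players and video hosting sites can load alongside a video.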