I recently heard about a fantastic app called Aiko, available for both macOS and iOS, which leverages AI technology to transcribe audio. Here is some of what sets Aiko apart from similar solutions:
- It’s free, totally free.
- Audio can be dictated directly into the app, or a pre-recorded file can be imported. I’m particularly excited about this second piece.
- Everything happens on the end user’s device; nothing is sent to the cloud.
- Multiple languages are supported, and we’re talking a lot of languages: 100, according to Aiko’s home page.
I was excited to test out this fascinating technology, so to really put it through its paces in a sub-optimal recording environment, I decided to record some audio using my Apple Watch while standing outside with lots of traffic and other background noise. What follows is the unedited output of my little experiment. I’m also adding the actual recorded audio, so that you can get a sense of the crummy audio I gave Aiko to work with.
Hello, and thanks for joining me today.
I’m playing with an app called AIKO.
It’s an app that leverages Whisper, which is a technology made by OpenAI, the folks that brought us ChatGPT.
Now unless you’ve been living under a rock for the past couple of months, I’m sure you’ve heard quite a lot about ChatGPT and the fascinating possibilities it opens up to us.
Anyway, Whisper, and on top of that this AIKO app, allow transcription of audio.
The interesting thing about it is that you can record directly in the AIKO app, or you can import audio, say from a file that was pre-recorded.
For example, you might have a pre-recorded audio file of a lecture or a class.
You would be able to import it into this AIKO app, transcription would happen, and then you would have the output as text.
For my test today, I’m standing outside in front of my house recording on my Apple Watch with traffic going by.
And the reason I’m doing this is because I wanted to come up with a very sub-optimal recording environment, just to better understand how the technology would deal with audio recorded in such an environment.
I’m also trying to speak as naturally as I can without saying words like um and uh, things that I think often get said when speaking.
The interesting thing about AIKO and the way that it transcribes audio is that it supposedly is able to insert punctuation correctly.
I’m not sure if it does anything about paragraphs or not, but as the speaker, I don’t have any way of controlling format.
Once you run a file or recording through AIKO, the output is rendered as text.
However, there are a few things you can do with it.
First, you can of course copy the text into some other application.
The other thing that you can do is have the text be timestamped.
The reason that this can be handy is that you can use that then to create files that can be used as closed captioning for videos.
Anyway, it is kind of loud out here, and so I will go back inside.
I also didn’t want to make this too long because I’m not sure if it’ll work at all or how accurate it’ll be, but my plan is to post this to the blog without editing it.
Stop, stop, stop.
Aiko-generated transcription from my Apple Watch recording.
One final note: the dictation ends with the words “stop stop”. I didn’t actually speak those words, but because I have VoiceOver activated on my Apple Watch, they were picked up in the recording as I located and activated the stop button.
This is definitely incredible technology, and the price certainly can’t be beat. From an accessibility perspective, I found Aiko to be extremely accessible with VoiceOver on both Mac and iOS, and since it is a native app using native controls, I feel confident that it will work with other assistive technologies as well.
You can find more information about Aiko, including FAQs, links to app store pages and more here.
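The transcript mentions that timestamped output can be turned into closed-caption files for videos. As a rough illustration of that idea, here is a minimal Python sketch that writes timestamped segments in the common SubRip (SRT) caption layout. It does not reflect how Aiko itself exports captions; the `(start, end, text)` segment shape, the function names, and the timings in the example are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
# This is an assumption for illustration, not Aiko's actual export format.
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SRT caption text: a counter, a time range, then the caption."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}")
    # Caption blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Illustrative timings only; they were not measured from the recording.
example = [
    (0.0, 2.5, "Hello, and thanks for joining me today."),
    (2.5, 5.0, "I'm playing with an app called AIKO."),
]

if __name__ == "__main__":
    print(to_srt(example))
```

Saving the returned text to a file with a `.srt` extension should give something most video players can load as captions, assuming the segment times line up with the video.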
4 replies on “Playing around with Aiko, an amazing, accessible transcription app for Mac and iOS”
I was just looking for an iOS app that implemented Whisper, so your post came right on time. The fact that it also works in macOS is definitely a plus. Once again, thanks so much for sharing! By the way, one amazing thing about the Whisper technology is not only that it supports multiple languages, but that you can talk to it in any of the supported languages in the same file and it has no problem understanding them. In fact, it understands me if I mix English and Spanish in the same sentence. The power of the transformers! This is much more than just transcription. This is text generation!
Thanks so much, I’m glad you like the post. I think it is incredible that it can detect multiple languages within the same file. There’s also an option to translate to English, so it should be possible to have it transcribe Spanish audio and render the text as translated English. I was really impressed that it was able to filter out the background noise in my audio. Also, there’s a tip on the Aiko home page describing how to leverage ChatGPT to separate the text by paragraph instead of sentence. Of course, then it would also be possible to use ChatGPT to fix any of the punctuation mistakes. So, so many possibilities.
We then potentially have an excellent meeting summarizer. We could write a prompt for ChatGPT instructing it to create a summary with action items and takeaways, and tell it that we want it formatted with headings and lists, or however we wanted. It is limited to 4000 tokens right now, but when they expand to 32k tokens, possibly later in the year… no more notetaking!!!
I love the idea of a meeting summarizer. It’s a great example of how easy it is becoming to dream a thing and then just go and build the thing.
I haven’t been this excited about the possibilities of technology in a long time. I also read an interesting article recently claiming that “prompt writer” is shaping up to be an actual career. And to think, this is just the very beginning of this revolution.