Recently I have been working on something like 10 different things, and I came to the conclusion that I need to actually sit down and focus on one of them. My creativity is wonderful, but it runs rampant all over the place. Often I have 10 different things that all demand my attention, and the issue is that my attention can only go to one thing at a time. There can be creative juices flowing for 10 different things, but I can only be actively doing one. Which is all well and good, but right now I feel like this is the project for me, so this is the one I have decided to do.
What project is that? Well, I am not quite sure how to describe it. Right now I think it would be considered a “neural network”. Kind of like artificial intelligence, but not really? I am not sure, to be quite honest. I just know that I have an idea that requires a lot of programming and downtime. Let me explain what I am trying to do in more depth.
I just watched a video about Automatic Speech Recognition, or ASR for short. I’ve been wanting to do this for quite a while because it is incredibly interesting to me, and really it is just the first part of what I would eventually like to do. The point is that there are a bunch of ways to detect speech, and the wonderful presenter gave an overview of how it all works. So I have decided to venture into this wonderful world of speech recognition. Yes, it will be rather complex, and to be honest I have no idea yet where I will start or what exactly I am doing. But that is alright. I have figured out my rough parameters, and the rest of today will be spent on research!
Here are the rough parameters for my ASR system. They are the goals I hope to achieve!
- I want my ASR system to be written in Python!
- Python makes everything easier 🙂
- I want the system to work fully offline; no online libraries please!
- This may complicate things a lot, but that is quite alright. I have a computer that does not connect to the internet, and I would prefer that my project can run offline so that everything it needs to compute can be done on that machine. That computer can run 24/7 without breaking a sweat. Also, it's not in my bedroom haha
- Oh, we want it to use free libraries. Nothing like an expensive Google library. Yes, I might have to figure out how to make my own libraries for some things lol
- A lot of ASR systems generate live text, meaning they transcribe in real time. I don’t really care about that right now, maybe eventually. I am going for precision, not speed, which is why it is running on another computer.
- To follow up with that last requirement, we don’t want it to take days to generate text from speech, but we are ok if it takes a few hours. Granted, it depends on how much speech we give it.
- Oh yeah, we want it to be recursive!
- What does this mean? It means that if it messes something up we can tell it that it got it wrong so it can improve itself and try again.
- I thought this might be called an End-to-End ASR system, but that term really means a single model that goes straight from audio to text, and those have their own problems. What I am after is more of a feedback loop, and I prefer it because I will basically be directing it by myself. I intend for it to get used to only my voice.
- I want to be able to give it an MP3/WAV file and just let it go to town with that. No visuals, just audio.
- Finally, because we are generating text from speech, I would like it to generate two different types of text files:
- One type is purely just what text was said as accurately as possible
- The other type we want it to generate is something like a subtitle file, but with the timecodes exactly aligned for each word. I have a really funny plan for this if I am being honest
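To make those two output types a bit more concrete, here is a minimal sketch of how they could be written out. It assumes the recognizer (which does not exist yet) eventually hands back a list of words with start and end times in seconds. The `TimedWord` class and the function names are made up for illustration, and SRT is just one subtitle format that would fit; nothing here does any actual speech recognition.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TimedWord:
    """One recognized word with its start/end time in seconds (hypothetical structure)."""
    text: str
    start: float
    end: float


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_plain_text(words: list[TimedWord]) -> str:
    """Output type 1: just the words, in order, separated by spaces."""
    return " ".join(w.text for w in words)


def to_word_srt(words: list[TimedWord]) -> str:
    """Output type 2: an SRT-style subtitle string with one cue per word."""
    cues = []
    for index, w in enumerate(words, start=1):
        timing = f"{format_srt_time(w.start)} --> {format_srt_time(w.end)}"
        cues.append(f"{index}\n{timing}\n{w.text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


def write_outputs(words: list[TimedWord], stem: str) -> None:
    """Write both files next to each other, e.g. stem.txt and stem.srt."""
    Path(f"{stem}.txt").write_text(to_plain_text(words), encoding="utf-8")
    Path(f"{stem}.srt").write_text(to_word_srt(words), encoding="utf-8")


if __name__ == "__main__":
    # Hand-written sample data standing in for real recognizer output.
    sample = [TimedWord("hello", 0.0, 0.4), TimedWord("world", 0.5, 0.9)]
    print(to_plain_text(sample))
    print(to_word_srt(sample))
```

Keeping the timed words as the one source of truth means both files always agree with each other, and the plain transcript is just the subtitle data with the timing thrown away.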
So yeah, those are my hopes and dreams for this project. Now that I have started documenting my journey with it, I am more motivated to work on it haha! I will be posting an update on my research/coding progress, hopefully later tonight!
Thank you for reading, hopefully you will go and read more of my stuff!
-Ben