Lately I have been researching an AI idea of mine, and I keep running into the same problem: I end up wanting, or needing, to build something much lower-level than I originally intended. The first thing I wanted was to give the AI a text recognition feature so that I could hand it images to look at and interpret. OK, no big deal. HAHAHA! Was I ever wrong. Text recognition is a very intensive process, and it is not always accurate.
So the next idea was, “OK, let’s make it more accurate,” and the way to do that was to give it a dictionary to work with. A dictionary is dope, but a plain one is horribly inefficient for this application. So now the idea is to make my own dictionary first. There are many different ways to make a dictionary. Since I only want one that works with text, I can ignore phonetics and just use written syllables. No big deal, right? Well, as far as I can find, nobody knows how many different written syllables there are in the English language. Say what??? How do we not already know this?
So now I guess the idea is to make a written syllable list. At first I was just going to add all of them to a text document, but then I ran into the issue of ordering them, and also a lot of syllables are made up of other syllables. Geez! Well, textually they are made up of other syllables. So now it’s going to be a more complex endeavor. I really want to do it, but it is definitely going to take a lot of time. I think I want to automate it, but that adds another step to the process.
So right now, I have no idea what I am doing or where I am going. The amount of information required to catalogue the English language is crazy. Currently I am thinking of making a “most common” syllable index and a “unique” syllable index. Most words can be built from common syllables, but some words contain uncommon syllables or consist of a single syllable.
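To make the two-index idea a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `split_written_syllables` is a made-up, very crude splitter (consonants plus the vowels that follow them), not a real syllabification algorithm, and the `common_threshold` cutoff and the three-word sample list are arbitrary.

```python
import re
from collections import Counter

# Crude assumption: a written "syllable" is a run of non-vowel characters
# followed by a run of vowels (y counts as a vowel here).
_CHUNK = re.compile(r"[^aeiouy]*[aeiouy]+")

def split_written_syllables(word):
    """Split a word into rough written chunks. Hypothetical helper."""
    word = word.lower()
    if not word:
        return []
    chunks = _CHUNK.findall(word)
    if not chunks:
        return [word]  # no vowels at all, keep the word whole
    # Anything left after the last vowel run is trailing consonants;
    # attach it to the final chunk.
    consumed = sum(len(c) for c in chunks)
    chunks[-1] += word[consumed:]
    return chunks

def build_indexes(words, common_threshold=2):
    """Return (common, unique) sets of chunks based on how often each occurs."""
    counts = Counter(s for w in words for s in split_written_syllables(w))
    common = {s for s, n in counts.items() if n >= common_threshold}
    unique = set(counts) - common
    return common, unique

def contained_syllables(syllables):
    """Map each chunk to the other chunks that appear inside it textually."""
    return {
        s: sorted(t for t in syllables if t != s and t in s)
        for s in syllables
    }

common, unique = build_indexes(["banana", "nana", "cat"])
```

With that tiny sample, `na` lands in the common index while `ba` and `cat` land in the unique one. `contained_syllables` is one possible way to surface the "syllables made up of other syllables" problem, although a real list would need a much better splitter than this.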
Then there’s the idea of breaking everything down mathematically in a way that makes sense and is more efficient for storage. If I want to create a dictionary list, it is going to take way more time than I had imagined. Honestly, it reminds me of the nested functions from my algebra class, which could be a cool way to integrate everything. But hey, who knows! I will be slowly but surely working on this.
This might sound like a crazy idea, but having all of the different syllables categorized for quick reference might be more beneficial than storing, in plain text, the 171,476 words in current use listed in the Oxford English Dictionary. There are roughly 15,000 phonetic syllables, so my guess is that there are about that many written ones. Building the dictionary this way might not be the most efficient route, but I am confident the result will be very efficient once it exists. I’ve got some cool ideas for compression and encoding that I am pretty damn sure will work. By compressing and encoding the dictionary with syllables, I think I can store it very space-efficiently. The encoding and decoding programs themselves might take up more space, but that is alright.
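As a minimal sketch of what syllable-based encoding could look like (not the actual scheme, which is still just an idea): give every distinct written syllable a small integer ID and store each word as a list of IDs. If there really are around 15,000 written syllables, every ID fits in 2 bytes, since 2 bytes can hold values up to 65,535. The function names and the pre-split sample words below are assumptions for illustration.

```python
def build_codebook(split_words):
    """Assign an integer ID to every distinct syllable, in first-seen order."""
    codebook = {}
    for syllables in split_words:
        for s in syllables:
            codebook.setdefault(s, len(codebook))
    return codebook

def encode(syllables, codebook):
    """Turn a list of syllables into a list of integer IDs."""
    return [codebook[s] for s in syllables]

def decode(ids, codebook):
    """Turn a list of IDs back into the written word."""
    reverse = {i: s for s, i in codebook.items()}
    return "".join(reverse[i] for i in ids)

def pack_ids(ids):
    """Store each ID as 2 big-endian bytes (assumes every ID is below 65,536)."""
    return b"".join(i.to_bytes(2, "big") for i in ids)

def unpack_ids(data):
    """Read 2-byte big-endian IDs back out of packed bytes."""
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]

# Sample input: words that have already been split into written syllables.
split_words = [["ba", "na", "na"], ["cat"]]
codebook = build_codebook(split_words)
packed = pack_ids(encode(["ba", "na", "na"], codebook))
word = decode(unpack_ids(packed), codebook)
```

Here `word` comes back as `banana`. One caveat with fixed 2-byte IDs: a two-letter syllable costs the same 2 bytes as plain ASCII text, so any saving would have to come from longer syllables or from a smarter encoding on top of the IDs.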
Thanks for entertaining my thoughts.
-Ben