The Role of Artificial Intelligence and Machine Learning in Speech Recognition

For a long time, people have wanted to be able to talk to machines. Ever since the first computers were built, scientists and engineers have tried to add speech recognition to them. In 1962, IBM introduced Shoebox, a speech recognition machine that could perform simple arithmetic. This innovative device recognized and responded to 16 spoken words, including the ten digits from “0” through “9.” When a number and command words such as “plus,” “minus,” and “total” were spoken, Shoebox instructed an adding machine to calculate and print answers to simple arithmetic problems. Shoebox was operated by speaking into a microphone, which converted voice sounds into electrical impulses. A measuring circuit classified these impulses according to various types of sounds and activated the attached adding machine through a relay system.

Over time this technology developed, and today many of us routinely interact with our computers by voice. The most popular voice assistants today are Amazon’s Alexa, Apple’s Siri, Google Assistant, and Microsoft’s Cortana. These assistants can perform tasks or services for an individual based on commands or questions. They interpret human speech and respond with synthesized voices. Users can ask their assistants questions, control home automation devices and media playback by voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. The more we use these voice-driven devices, the more we depend on artificial intelligence (AI) and machine learning.

Artificial intelligence (AI)


When you mention artificial intelligence (AI), many people might think you are talking about science fiction, even though AI is deeply embedded in our everyday lives and has been for decades. It is true, though, that it was science fiction that first familiarized the public with artificially intelligent, human-like robots at the beginning of the 20th century. In the 1950s, AI increasingly became a focus of interest for scientists and philosophers. At that time, the young British mathematician Alan Turing suggested that there was no reason why machines could not, just like humans, use available information to solve problems and make decisions. However, early computers lacked a key prerequisite for intelligence: they could not store commands, only execute them. Still, it was Alan Turing who established the fundamental goal and vision of artificial intelligence.

John McCarthy, who coined the term “artificial intelligence,” is widely recognized as the father of AI. He defined it as “the science and engineering of making intelligent machines.” The term was introduced with a 1956 conference at Dartmouth College, which marked the beginning of AI research as a field. From then on, AI flourished.

In the modern world, artificial intelligence is ubiquitous. It has become more popular thanks to growing data volumes, more advanced algorithms, and improvements in computing power and storage. AI is mostly applied to intellectual tasks: we use it for translation; object, face, and speech recognition; topic detection; medical image analysis; natural language processing; social network filtering; chess playing; and more.

Machine learning

Machine learning is an application of artificial intelligence that refers to systems able to improve from their own experience. The key requirement is that the system learns to recognize patterns. To do that, it has to be trained: the algorithm is fed large amounts of data until it is able to identify patterns in them. The goal is to allow computers to learn automatically, without human intervention or assistance.
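To make the idea of “training on data to recognize patterns” concrete, here is a minimal sketch of one of the simplest learning methods, a nearest-centroid classifier. It is only an illustration, not how production speech systems work: the two-number feature vectors and the “yes”/“no” labels are made-up toy values standing in for measurements of short sound clips, and `train` and `predict` are hypothetical names.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train(examples):
    """Learn one centroid (the average feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose learned centroid lies closest to the new input."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical toy data: made-up two-number "features" of short sound clips,
# each labeled with the word it is supposed to represent.
training_data = [
    ([1.0, 1.2], "yes"),
    ([0.9, 1.0], "yes"),
    ([3.0, 3.1], "no"),
    ([3.2, 2.9], "no"),
]

model = train(training_data)
print(predict(model, [1.1, 0.9]))  # yes
print(predict(model, [2.8, 3.3]))  # no
```

Nobody writes a rule saying what a “yes” looks like; the pattern comes entirely from the labeled examples, and adding more examples moves each centroid closer to the true average for its word.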

When talking about machine learning, it is important to mention deep learning. One of the main tools used in deep learning is the artificial neural network: an algorithm inspired by the structure and function of the brain, even though such networks tend to be static and symbolic rather than plastic and analog like the biological brain. Deep learning, then, is a specialized form of machine learning based on artificial neural networks. Its goal is to replicate the way humans learn, which makes it a powerful tool for finding patterns far too numerous for a programmer to teach a machine by hand.

In the past couple of years there has been much talk about driverless cars and how they could change our lives. Deep learning is key here, because it enables a car to distinguish a pedestrian from a fire hydrant or to recognize a red light, which could help reduce accidents. Deep learning also plays a major role in voice control for devices like tablets, phones, fridges, and TVs. E-commerce companies often use artificial neural networks as a filtering system that tries to predict and show the items a user would like to buy. Deep learning is also used in the medical field, where it helps cancer researchers detect cancer cells automatically, which represents major progress in the fight against cancer.
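The layered structure of an artificial neural network can be sketched in a few lines. In this minimal example each “neuron” weighs its inputs, adds a bias, and fires (outputs 1) only if the result is positive. The weights are hand-picked purely for illustration; in real deep learning they are learned from data, and networks have many more layers and neurons. The example shows why layers matter: no single neuron of this kind can compute exclusive OR (XOR), but two stacked layers can.

```python
def step(z):
    """Threshold activation: the neuron fires (1) only if its input is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron weighs every input, adds a bias, applies step()."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def tiny_network(x1, x2):
    # Hidden layer (hand-picked weights): neuron 1 acts like OR, neuron 2 acts like AND.
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer: fires when OR is true but AND is not, i.e. exclusive OR.
    (output,) = layer(hidden, weights=[[1, -1]], biases=[-0.5])
    return output


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```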

Speech recognition

Speech recognition technology identifies words and phrases in spoken language and converts them into a machine-readable format. While some programs can identify only a limited number of phrases, more sophisticated speech recognition programs can decipher natural speech.
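What “a limited number of phrases” means can be sketched with a Shoebox-style program. This example skips the hard acoustic part entirely and assumes the spoken words have already been transcribed; the vocabulary and the `recognize` and `evaluate` functions are hypothetical, loosely modeled on the digits and command words Shoebox understood. Anything outside the fixed vocabulary is simply rejected.

```python
# Hypothetical fixed vocabulary, loosely modeled on IBM's Shoebox:
# ten digit words plus three command words.
DIGITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
COMMANDS = {"plus", "minus", "total"}


def recognize(words):
    """Accept only words in the fixed vocabulary; reject everything else."""
    accepted = []
    for word in words:
        word = word.lower()
        if word not in DIGITS and word not in COMMANDS:
            raise ValueError(f"not in vocabulary: {word!r}")
        accepted.append(word)
    return accepted


def evaluate(words):
    """Run Shoebox-style arithmetic on an already-transcribed word sequence."""
    total, current, sign = 0, 0, 1
    for word in recognize(words):
        if word in DIGITS:
            current = current * 10 + DIGITS[word]  # digits build multi-digit numbers
            continue
        total += sign * current  # a command word closes the number just spoken
        current = 0
        if word == "total":
            return total
        sign = 1 if word == "plus" else -1
    return total + sign * current  # no "total" spoken: finish the last number


print(evaluate("five plus three total".split()))      # 8
print(evaluate("one two minus seven total".split()))  # 5
```

A program that deciphers natural speech cannot rely on a short, fixed list like this; it has to cope with an open vocabulary, which is where the machine learning methods described earlier come in.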

Are there obstacles to overcome?

Convenient as it is, speech recognition technology does not always work smoothly, and it still has issues to work through as it continues to develop. Problems that may arise include poor recording quality, background noise that makes the speaker hard to understand, and a speaker with a really strong accent or dialect (have you ever heard the Geordie dialect?).
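Background noise is commonly quantified as the signal-to-noise ratio (SNR), measured in decibels as 10 * log10(P_signal / P_noise), where P is the average power of each signal. The lower the SNR, the harder it is to pick out the speaker. The sketch below assumes speech and noise are available as separate lists of samples, which is rarely the case in a real recording where they arrive mixed; the sample values and function names are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a signal: the mean of its squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical toy samples: a loud "speech" wave and two background hums.
speech = [0.5, -0.5, 0.5, -0.5]
hum = [0.05, -0.05, 0.05, -0.05]
louder_hum = [0.25, -0.25, 0.25, -0.25]

print(round(snr_db(speech, hum), 1))         # 20.0
print(round(snr_db(speech, louder_hum), 1))  # 6.0
```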

Speech recognition has developed quite a lot, but it is still far from perfect. It is not only about words: machines still cannot do many things that humans can, such as reading body language or recognizing a sarcastic tone of voice. People also often do not pronounce every word properly, and they tend to shorten some words. For example, when speaking fast and informally, native English speakers often pronounce “going to” as “gonna.” All of this creates obstacles that machines are trying to overcome, and there is still a long way to go. It is important to highlight, however, that as more and more data is fed to these algorithms, the challenges seem to decrease. The future of automated speech recognition looks bright.
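One common way to measure how far a recognizer is from perfect is the word error rate (WER): the number of substituted, deleted, and inserted words needed to turn the system’s output into a correct reference transcript, divided by the number of words in the reference. The minimal sketch below computes it with a word-level edit distance; the function name and example sentences are illustrative assumptions. It shows how the “gonna” example counts against a system when the reference spells out “going to,” which is one reason evaluations may normalize such variants before scoring.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


reference = "i am going to call you"
hypothesis = "i am gonna call you"
print(round(word_error_rate(reference, hypothesis), 3))  # 0.333
print(word_error_rate(reference, reference))             # 0.0
```

Here “going” is substituted by “gonna” and “to” is deleted, so two errors over six reference words give a WER of one third.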

Voice-powered user interfaces are becoming increasingly available and popular in households. They might even become THE next platform in technology.

Gglot offers automated speech recognition in the form of automated transcription services: we convert speech to text. Our service is simple to use, it won’t cost you much, and your transcript will be ready quickly!