What exactly is Speech Recognition?

Speech recognition

What you need to know about speech recognition

Note that speech recognition isn’t the same as voice recognition, even though people sometimes use the two terms interchangeably. Speech recognition works out what was said and converts it into text or commands. Voice recognition identifies the person who is speaking; it does not capture what was said.

A short history of speech recognition and related technology

In this article, we will briefly explain the history and technology behind the rise of speech recognition.

Ever since the dawn of the digital age, people have wanted to communicate with machines. After the first digital computers were invented, numerous scientists and engineers tried in various ways to build speech recognition into them. A crucial year was 1962, when IBM revealed Shoebox, a basic speech recognition machine that could do simple math calculations. When the user spoke into a microphone, the machine could recognize 16 words: the ten digits and six control words like “plus” or “minus”. Over time the technology developed, and today interacting with computers by voice is a very common feature. Well-known voice assistants such as Siri and Alexa are built on speech recognition. It is important to note that these voice-driven devices depend on artificial intelligence (AI) and machine learning.

An important name in the development of AI is John McCarthy, who coined the very term “artificial intelligence”. McCarthy defined AI as “the science and engineering of making intelligent machines”. The term was introduced in connection with a seminal conference at Dartmouth College in 1956. From then on, AI research started to develop quickly.

Today, artificial intelligence in its various forms is present everywhere. It has grown to mass adoption, mainly due to the increase in the overall volume of data exchanged worldwide every day, more advanced algorithms, and improvements in storage and computing power. AI is used for many purposes, for example translation, transcription, speech, face and object recognition, analysis of medical images, natural language processing, various social network filters and so on. Remember the 1997 chess match in which IBM's Deep Blue chess computer defeated grandmaster Garry Kasparov?

Machine learning is a very important branch of artificial intelligence. In short, it refers to any system that can learn and improve from its own experience, meaning the data it has already seen. This works through pattern recognition. To do that, the system first has to be trained: its algorithm receives large amounts of data as input and gradually becomes able to identify patterns in that data. The end goal of this process is to enable computer systems to learn independently, with as little human intervention or assistance as possible.
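The training idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the function names `train` and `predict`, the two features (duration and loudness) and the sample values are all made up for the example. The "pattern" the system learns is simply the average point of each labeled group, and a new input is assigned to the nearest one.

```python
from collections import defaultdict


def train(examples):
    """Learn one average point (centroid) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose learned centroid is closest to the new input."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Made-up training data: (duration in seconds, average loudness) -> label
training_data = [
    ((0.2, 0.9), "clap"),
    ((0.3, 0.8), "clap"),
    ((1.5, 0.4), "speech"),
    ((1.8, 0.5), "speech"),
]

model = train(training_data)
print(predict(model, (0.25, 0.85)))  # clap
print(predict(model, (1.6, 0.45)))   # speech
```

Real systems use far more data and far richer models, but the principle is the same: nobody writes the rule "short and loud means clap" by hand; it falls out of the examples.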

Another thing that is very important to mention alongside machine learning is deep learning. One of the most important tools in deep learning is the so-called artificial neural network. These networks are advanced algorithms, loosely modeled on the structure and function of the human brain. However, they are largely static and digital, unlike the biological brain, which is plastic and more analog. In short, deep learning is a specialized form of machine learning, primarily based on artificial neural networks. The goal of deep learning is to closely replicate human learning processes. Deep learning technology is very useful, and it plays an important role in various devices that are controlled by voice: tablets, TVs, smartphones, fridges and so on. Artificial neural networks are also used in recommendation systems, a kind of filtering system that aims to predict the items a user is likely to buy in the future. Deep learning is also widely used in the medical field. It is valuable to cancer researchers, because it helps to detect cancer cells automatically.
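To give a feel for what an artificial neural network is made of, here is a small Python sketch of a single artificial neuron. It is an illustration under simplifying assumptions, not a deep learning system: real networks stack many such units in layers and train them differently. The names `train_neuron` and `fire` are invented for this example, and the training method is the classic perceptron rule, used here to learn the logical OR of two inputs.

```python
def step(x):
    """Activation: the neuron fires (1) only if its weighted input is positive."""
    return 1 if x > 0 else 0


def fire(weights, bias, inputs):
    """Compute the neuron's output for one input pair."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=10, learning_rate=1.0):
    """Adjust weights and bias with the perceptron rule whenever the output is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - fire(weights, bias, inputs)
            weights = [w + learning_rate * error * i for w, i in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical OR: ((input_a, input_b), expected_output)
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_neuron(or_samples)
print([fire(weights, bias, inputs) for inputs, _ in or_samples])  # [0, 1, 1, 1]
```

The neuron starts out knowing nothing (all weights are zero) and corrects itself a little after every mistake, which is the basic loop that much larger networks also follow.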

When to use speech recognition?

Maybe you are now curious about how all of this works. Well, for it to work, the device needs a sensor such as a microphone, and the software has to capture the sound waves of the spoken words, analyze them and convert them to a digital format. The digital information is then compared with other information stored in a repository of words and expressions. When there is a match, the software can recognize the command and act accordingly.
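The steps above can be sketched roughly in Python. Treat this as a toy model under stated assumptions rather than a real recognizer: `digitize`, `recognize`, the `COMMAND_REPOSITORY` contents, the three-number "feature vectors" and the distance threshold are all hypothetical, and the hard part (turning digitized audio into useful features) is skipped entirely.

```python
import math


def digitize(wave, levels=256):
    """Convert analog amplitudes in [-1.0, 1.0] to integers in [0, levels - 1]."""
    return [min(levels - 1, max(0, int((a + 1.0) / 2.0 * levels))) for a in wave]


# Hypothetical repository: each known command is stored as a feature vector.
COMMAND_REPOSITORY = {
    "lights on": [0.9, 0.1, 0.3],
    "lights off": [0.1, 0.9, 0.3],
    "play music": [0.2, 0.2, 0.9],
}


def recognize(features, repository=COMMAND_REPOSITORY, max_distance=0.5):
    """Return the closest stored command, or None if nothing is close enough."""
    best_command = min(repository, key=lambda cmd: math.dist(repository[cmd], features))
    if math.dist(repository[best_command], features) <= max_distance:
        return best_command
    return None


# Step 1: the sound wave becomes numbers.
print(digitize([-1.0, 0.0, 1.0]))        # [0, 128, 255]

# Step 2: features of the utterance are compared with the repository.
print(recognize([0.85, 0.15, 0.25]))     # lights on
print(recognize([0.5, 0.5, 0.5]))        # None
```

The threshold matters: without it, the software would always pick some command, even for a cough or background noise. Returning `None` lets it stay quiet when nothing in the repository is a good match.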