“Everything started with man producing the first complex machines, which were automata (or self-moving machines), by means of which he attempted to simulate nature and domesticate natural forces.”
A point to highlight is the word “simulate”: that is where the birth of AI was triggered.
Opening our eyes today, we stand far beyond what even science fiction writers expected of technology. Few would have thought that a “human brain” would one day play chess against a “non-living thing” (informally, a box that can be kept in your bag) and even lose to it.
And here I am to talk about a marvel in human history: “speech recognition”.
When you call most large companies, an automated voice recording answers and instructs you. Often you can just speak certain words (again, as instructed by a recording) to get what you need. The system that makes this possible is a type of speech recognition program -- an automated phone system.
Now it has even entered your computer. Surprised? There is reason to be. Have you ever imagined that a “thing” made of plastic and wires could really identify a person's voice and enter text on its own? Yet another example of engineering encouraging laziness.
It allows users to dictate to their computer and have their words converted to text in a word processing or email document. You can access function commands, such as opening files and accessing menus, with voice instructions.
People with disabilities that prevent them from typing have also adopted speech-recognition systems.
These systems work best in a business environment where a small number of users will work with the program. The accuracy rate will fall drastically with any other user. I don't want to go into the depths of how it works, but here are the basics.
To convert speech to on-screen text or a computer command, a computer has to go through several complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples the sound by taking precise measurements of the wave at frequent intervals. The system also works with the sound in different bands of frequency. People don't always speak at the same speed, so the sound must be adjusted to match the speed of the template sound samples already stored in the system's memory.
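The sampling step described above can be sketched in a few lines. This is only an illustration, assuming a pure 440 Hz tone in place of real speech, an 8,000 Hz sample rate, and 16-bit levels; the function name `sample_and_quantize` is made up for this example and is not part of any speech recognition product.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, duration_s, bits):
    """Measure a sine wave at regular intervals and round each
    measurement to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16 bits
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # time of the n-th measurement
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # stand-in "analog" value in [-1, 1]
        samples.append(round(amplitude * max_level))     # digital value
    return samples

# Assumed values: a 440 Hz tone, 8000 samples per second, 10 ms of sound.
digital = sample_and_quantize(freq_hz=440, sample_rate_hz=8000,
                              duration_s=0.01, bits=16)
print(len(digital), "samples, first five:", digital[:5])
```

A higher sample rate means more measurements per second, and more bits mean finer steps between levels; both cost more memory and processing.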
Now the most difficult part. The program examines phonemes (the smallest units of sound in a language) in the context of the other phonemes around them. It runs the contextual phoneme plot through a complex statistical model and compares it to a large library of known words, phrases and sentences. The program then determines what the user was probably saying and either outputs it as text or issues a computer command.
But it is not as easy as it sounds. Imagine someone from Boston saying the word "barn." He wouldn't pronounce the "r" at all, and the word comes out rhyming with "John." Or consider the sentence, "I'm going to see the ocean." Most people don't enunciate their words very carefully. The result might come out as "I'm goin' da see tha ocean," with words such as "I'm goin'" and "the ocean" run together. Rules-based systems were unsuccessful because they couldn't handle these variations. This also explains why earlier systems could not handle continuous speech -- you had to speak each word separately, with a brief pause between them.
Today's speech recognition systems use powerful and complicated statistical modeling systems.
In a Markov model, each phoneme is like a link in a chain, and the completed chain is a word. During this process, the program assigns a probability score to each phoneme, based on its built-in dictionary and user training.
The system also has to figure out where each word stops and starts. The classic example is the phrase "recognize speech," which sounds a lot like "wreck a nice beach" when you say it very quickly. The program has to analyze the phonemes using the phrase that came before it in order to get it right. Here's a breakdown of the two phrases:
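To make the chain idea concrete, here is a minimal sketch of scoring a word as a chain of phonemes. It assumes a simple first-order chain where each phoneme's score depends only on the one before it; the probabilities in `TRANSITIONS` are invented for illustration and do not come from any real dictionary or trained system.

```python
import math

# Hypothetical probabilities P(next phoneme | current phoneme).
# "<s>" marks the start of a word. All numbers are made up.
TRANSITIONS = {
    ("<s>", "b"): 0.6, ("<s>", "p"): 0.4,
    ("b", "iy"): 0.5, ("p", "iy"): 0.3,
    ("iy", "ch"): 0.7,
}

def chain_log_prob(phonemes, transitions, floor=1e-6):
    """Add up log-probabilities link by link; unseen links get a small floor."""
    log_prob = 0.0
    previous = "<s>"
    for phoneme in phonemes:
        log_prob += math.log(transitions.get((previous, phoneme), floor))
        previous = phoneme
    return log_prob

beach = chain_log_prob(["b", "iy", "ch"], TRANSITIONS)
peach = chain_log_prob(["p", "iy", "ch"], TRANSITIONS)
print("beach" if beach > peach else "peach", "is the more likely chain")
```

Logs are added instead of multiplying raw probabilities so that long chains do not shrink toward zero. Real recognizers use far richer models than this toy chain.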
r eh k ao g n ay z s p iy ch
"recognize speech"
r eh k ay n ay s b iy ch
"wreck a nice beach"
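How close the two breakdowns are can be counted with an edit distance over the phoneme lists. This is only a rough sketch of "sounds alike", assuming every insertion, deletion and substitution costs the same; the function name is made up for this example, and real systems compare sounds statistically, not symbol by symbol.

```python
def phoneme_edit_distance(a, b):
    """Levenshtein distance between two lists of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # add a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

recognize_speech = "r eh k ao g n ay z s p iy ch".split()
wreck_a_nice_beach = "r eh k ay n ay s b iy ch".split()

distance = phoneme_edit_distance(recognize_speech, wreck_a_nice_beach)
print(distance, "edits separate the two phrases out of",
      len(recognize_speech), "phonemes")
```

Only a handful of phonemes differ, which is why the surrounding words are needed to tell the phrases apart.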
There is some art in how one selects, compiles and prepares the training data for "digestion" by the system and how the system models are "tuned" to a particular application.
Why is this so complicated? If a program has a vocabulary of 60,000 words (common in today's programs), a sequence of three words could be any of 216 trillion possibilities. Obviously, even the most powerful computer can't search through all of them without some help. And that is where a boundary line is drawn between “life” and “non-life”.
A computer has to work through so many steps to do what the brain does in an instant, and the brain in fact also interprets the meaning of the words spoken.
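The 216 trillion figure is simply the vocabulary size raised to the number of words in the sequence, which a couple of lines can check:

```python
vocabulary_size = 60_000   # words the program knows
sequence_length = 3        # words in a row

# Each position can hold any word, so the counts multiply.
possibilities = vocabulary_size ** sequence_length
print(f"{possibilities:,} possible three-word sequences")
print(possibilities // 10**12, "trillion")
```

Adding a fourth word multiplies the count by another 60,000, which is why the search has to be narrowed down instead of checked exhaustively.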
No speech recognition system is 100 percent perfect; several factors can reduce accuracy.
The program needs to "hear" the words spoken distinctly, and any extra noise introduced into the sound can change how the system understands the word. The noise can come from a number of sources, including loud background noise. Users should work in a quiet room with a quality, well-positioned microphone. Here again it draws a boundary on usage: it “should be used in a very quiet setting.” Low-quality sound cards can also introduce hum or hiss into the signal.
Running the statistical models needed for speech recognition requires the computer's processor to do a lot of heavy work. One reason for this is the need to remember each stage of the word-recognition search in case the system needs to backtrack to come up with the right word.
Homonyms--a big problem
Homonyms (strictly speaking, homophones) are two words that are spelled differently and have different meanings but sound the same. "There" and "their," "air" and "heir," "be" and "bee" are all examples. There is no way for a speech recognition program to tell the difference between these words based on sound alone.
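Since sound alone cannot settle it, a program has to lean on the words around the ambiguous one. The sketch below shows one simple way this could work, assuming the program keeps counts of which word tends to follow which; the counts in `BIGRAM_COUNTS` and the function name are invented for illustration and are not taken from any real system.

```python
# Hypothetical counts of how often a word follows the previous word.
BIGRAM_COUNTS = {
    ("over", "there"): 50, ("over", "their"): 1,
    ("lost", "their"): 40, ("lost", "there"): 2,
}

def pick_homophone(previous_word, candidates, counts):
    """Choose the candidate seen most often after the previous word."""
    return max(candidates, key=lambda word: counts.get((previous_word, word), 0))

print(pick_homophone("over", ["there", "their"], BIGRAM_COUNTS))  # after "over"
print(pick_homophone("lost", ["there", "their"], BIGRAM_COUNTS))  # after "lost"
```

With only one word of context this still guesses wrong often; it is meant to show the idea, not a working solution.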
Way in the future..
Only in the 1990s did computers powerful enough to handle speech recognition become available to the average consumer. Current research could lead to technologies that are, for now, more familiar from an episode of "Star Trek." The Defense Advanced Research Projects Agency (DARPA) has three teams of researchers working on Global Autonomous Language Exploitation (GALE). It hopes to create software that can instantly translate between two languages with at least 90 percent accuracy. DARPA is also funding an R&D effort called TRANSTAC to enable soldiers to communicate more effectively with civilian populations in non-English-speaking countries; the technology will undoubtedly spin off into civilian applications, including a universal translator.
The following video shows both the usability and non usability of speech recognition
Although it is a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence. We can talk to our computers today. In 25 years, they may very well talk back.
In the end, I would say that there is a gap between life and non-life, but this gap is shrinking day by day, and there may or may not come a day when it vanishes. Whatever happens, there is a difference between the living and the non-living. I hope we can see a time when both carry the "same meaning".
-----B.Vamshi
EE09B104