Smart listeners and smooth talkers
16 November 2011 - CAMBRIDGE

Sounds recognised from an audio recording. Credit: Xunying Liu
Human-like performance in speech technology could be just around the corner, thanks to a new research project that links three UK universities.
Human conversation is rich and it’s messy. When we communicate, we constantly adjust to those around us and to the environment we’re in; we leave words out because the context provides meaning; we rush or hesitate, or change direction; we overlap with other speakers; and, crucially, we’re expressive.
No wonder then that it’s proved so challenging to build machines that interact with people naturally, with human-like performance and behaviour.
Nevertheless, there have been remarkable advances in speech-to-text technologies and speech synthesisers over recent decades. Current devices speed up the transcription of dictation, add automatic captions to video clips, enable automated ticket booking and improve the quality of life for those requiring assistive technology.
However, today’s speech technology is limited by its inability to acquire knowledge about people or situations, to adapt, to learn from mistakes, to generalise and to sound naturally expressive. "To make the technology more usable and natural, and open up a wide range of new applications, requires field-changing research," explained Professor Phil Woodland of Cambridge’s Department of Engineering.
Along with scientists at the Universities of Edinburgh and Sheffield, Professor Woodland and colleagues Drs Mark Gales and Bill Byrne have begun a five-year, £6.2 million project funded by the Engineering and Physical Sciences Research Council to provide the foundations of a new generation of speech technology.
Complex pattern matching
Speech technology systems are based on powerful techniques that are capable of learning statistical models known as Hidden Markov Models (HMMs). Trained on large quantities of real speech data, HMMs model the relationship between the basic speech sounds of a language and how these are realised in audio waveforms.
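To make the idea of an HMM concrete, the sketch below writes one down as three probability tables (where the model starts, how it moves between sound states, and what each state emits) and then samples a short sequence from it. Everything here is an illustrative assumption: the two states, the discrete "hiss" and "tone" symbols, the probabilities and the helper names `draw` and `sample_hmm` are invented for the example. Real recognisers work with continuous acoustic features and far larger models, and in practice the tables are estimated from speech data rather than written by hand.

```python
import random

# Toy HMM: the states, symbols and probabilities are made up for illustration.
START = {"s": 0.8, "iy": 0.2}                      # P(first state)
TRANS = {"s": {"s": 0.6, "iy": 0.4},               # P(next state | state)
         "iy": {"s": 0.1, "iy": 0.9}}
EMIT = {"s": {"hiss": 0.9, "tone": 0.1},           # P(acoustic symbol | state)
        "iy": {"hiss": 0.2, "tone": 0.8}}


def draw(distribution, rng):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes = list(distribution)
    weights = list(distribution.values())
    return rng.choices(outcomes, weights=weights)[0]


def sample_hmm(start, trans, emit, length, rng):
    """Generate `length` hidden states and the symbols they emit."""
    states, symbols = [], []
    state = draw(start, rng)
    for _ in range(length):
        states.append(state)
        symbols.append(draw(emit[state], rng))  # what the state "sounds like"
        state = draw(trans[state], rng)         # move to the next sound state
    return states, symbols


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
states, symbols = sample_hmm(START, TRANS, EMIT, 6, rng)
print(states)
print(symbols)
```

Only the emitted symbols would be observable in a recording; the state sequence is the "hidden" part that a recogniser has to infer.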
It’s a complex undertaking. For speech recognition, the system must work with a continuous stream of acoustic data, with few or no pauses between individual words. To determine where each word starts and stops, HMMs attempt to match the pattern of successive sounds (or phonemes) against the system’s built-in dictionary, scoring how probable each candidate sound is as the continuation of the sounds already heard until a word is complete. The system then takes into account the structure of the language, in particular which word sequences are more likely than others.
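The matching step described above is commonly carried out with the Viterbi algorithm, which finds the single most probable sequence of hidden states for a stream of observations. The article does not say which decoder the project uses, so the following is a minimal sketch on a toy model. The states, symbols, probabilities and the helper `log_table` are assumptions made for illustration, and the dictionary and language-model stages are left out.

```python
import math


def log_table(table):
    """Convert a (possibly nested) probability table to log-probabilities."""
    return {key: log_table(value) if isinstance(value, dict) else math.log(value)
            for key, value in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most probable path."""
    # best[s] = score of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # choose the predecessor giving the highest score on entering s
            prev = max(states, key=lambda p: prev_best[p] + log_trans[p][s])
            best[s] = prev_best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        backpointers.append(pointers)
    # trace back from the best final state to recover the whole path
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Toy model: two sound states and two discretised acoustic symbols (made up).
STATES = ["s", "iy"]
LOG_START = log_table({"s": 0.8, "iy": 0.2})
LOG_TRANS = log_table({"s": {"s": 0.6, "iy": 0.4}, "iy": {"s": 0.1, "iy": 0.9}})
LOG_EMIT = log_table({"s": {"hiss": 0.9, "tone": 0.1},
                      "iy": {"hiss": 0.2, "tone": 0.8}})

score, path = viterbi(["hiss", "hiss", "tone", "tone"],
                      STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)  # ['s', 's', 'iy', 'iy']
```

Log-probabilities are used so that long sequences do not underflow. The point at which the path switches from one state to the next is the model's estimate of where one sound ends and the following one begins.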
Adapt, train and talk
A key focus for the new project is to build systems that are adaptive, enabling them to acclimatise automatically to particular speakers and learn from their mistakes. Ultimately, the new systems will be able to make sense of challenging audio clips, efficiently detecting who spoke what, when and how.
Unsupervised training is also crucial, as Professor Woodland explained: "Systems are currently pre-trained with the sort of data they are trying to recognise - so a dictation system is trained with dictation data - but this is a significant commercial barrier as each new application requires specific types of data. Our approach is to build systems that are trained on a very wide range of data types and enable detailed system adaptation to the particular situation of interest. To access and structure the data, without needing manual transcripts, we are developing approaches that allow the system to train itself from a large quantity of unlabelled speech data."
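One general pattern for letting a system train itself on unlabelled data is self-training: a model built from a small labelled seed set labels new data, keeps only the guesses it is confident about, and is retrained on the enlarged set. The article does not describe the project's actual method, so the sketch below shows only the loop, on a deliberately tiny stand-in problem. A one-dimensional nearest-centroid classifier replaces the speech recogniser, and the data, labels, threshold and function names are all hypothetical.

```python
def centroids(labelled):
    """Mean feature value per label, from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in labelled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Return (label, margin): the nearest centroid and its lead over the runner-up."""
    ranked = sorted(model, key=lambda label: abs(value - model[label]))
    margin = abs(value - model[ranked[1]]) - abs(value - model[ranked[0]])
    return ranked[0], margin


def self_train(seed, unlabelled, min_margin=1.0, rounds=5):
    """Grow the training set with confident pseudo-labels; needs >= 2 labels in seed."""
    labelled, pool = list(seed), list(unlabelled)
    for _ in range(rounds):
        model = centroids(labelled)
        confident, remaining = [], []
        for value in pool:
            label, margin = predict(model, value)
            # keep a pseudo-label only when the model is clearly decided
            (confident if margin >= min_margin else remaining).append((value, label))
        if not confident:
            break  # nothing new was learned, so stop early
        labelled.extend(confident)
        pool = [value for value, _ in remaining]
    return centroids(labelled), pool


# Hypothetical data: two hand-labelled examples and seven unlabelled ones.
seed = [(1.0, "quiet"), (9.0, "loud")]
unlabelled = [0.5, 2.0, 4.0, 8.0, 10.0, 5.0, 6.5]
model, leftover = self_train(seed, unlabelled)
print(model)     # centroids after absorbing the confident pseudo-labels
print(leftover)  # examples the model never became confident about
```

The confidence threshold is the key design choice: set too low, the model reinforces its own mistakes; set too high, it never uses the unlabelled data.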
"One very interesting aspect of the work is that the fundamental HMMs are also generators of speech, and so the adaptive technology underlying speech recognition is also being applied to the development of personalised speech synthesis systems," added Professor Woodland. New systems will take into account expressiveness and intention in speech, enabling devices to be built that respond to an individual’s voice, vocabulary, accent and expressions.
The three university teams have already made considerable contributions to the field and many techniques used in current speech recognition systems were developed by the engineers involved in the new project. The new programme grant enables them to take a wider vision and to work with companies that are interested in how speech technology could transform our lives at home and at work. Applications already planned include a personalised voice-controlled device to help the elderly to interact with control systems in the home, and a portable device to enable users to create a searchable text version of any audio they encounter in their everyday lives.
This work is licensed under a Creative Commons Licence. If you use this content on your site please link back to this page.