Tara Sainath

Last updated May 02, 2024

Tara N. Sainath is an American computer scientist whose research involves deep learning applied to speech recognition. She is a principal research scientist at Google Research.

Education and career

Sainath studied electrical engineering and computer science at the Massachusetts Institute of Technology, where she received a bachelor's degree, a master's degree in 2005, and a Ph.D. in 2009. Her master's thesis was Acoustic Landmark Detection and Segmentation using the McAulay-Quatieri Sinusoidal Model, supervised by Timothy Hazen,^[1] and her doctoral dissertation was Applications of Broad Class Knowledge for Noise Robust Speech Recognition, supervised by Victor Zue.^[2]^[3]

She worked for IBM Research at the Thomas J. Watson Research Center before moving to Google Research.^[4]

Recognition

Sainath was elected both as an IEEE Fellow and as a fellow of the International Speech Communication Association in 2022, in both cases "for contributions to deep learning for automatic speech recognition".^[5]^[6]

References

[1] Sainath, Tara N. (2005), Acoustic Landmark Detection and Segmentation using the McAulay-Quatieri Sinusoidal Model (PDF), Massachusetts Institute of Technology, retrieved 2023-04-21
[2] Sainath, Tara N. (2009), Applications of Broad Class Knowledge for Noise Robust Speech Recognition (PDF), Massachusetts Institute of Technology, retrieved 2023-04-21
[3] Tara Sainath at the Mathematics Genealogy Project
[4] Tara Sainath, Google Research, retrieved 2023-04-21
[5] 2022 Newly Elevated Fellows (PDF), IEEE, retrieved 2023-04-21
[6] Wellekens, Chris (May 9, 2022), "ISCA Fellows announced", ISCApad, no. 287, International Speech Communication Association

External links

Home page
Tara Sainath publications indexed by Google Scholar

