Understanding the Fundamentals of Speech Recognition

**Fundamentals of Speech Recognition E6998** is a comprehensive course taught by Prof. Homayoon Beigi that covers automatic speech recognition from theory to practice. The course is based on Prof. Beigi's textbook, "Fundamentals of Speaker Recognition," published by Springer in 2011. Grading has four components: a 2-page extended abstract (15%), a 5-minute presentation (5%), a final project (60%), and homework (20%). The final project includes a 6-page IEEE conference-style paper (45%) and a 5-minute presentation (5%).

The course covers the anatomy of speech, signal representation, phonetics and phonology, signal processing, feature extraction, probability theory, information theory, decision theory, parameter estimation, clustering, learning, transformation, hidden Markov modeling, language modeling, and several neural network architectures (TDNN, LSTM, RNN, CNN). Students also complete hands-on projects in Kaldi to build a working speech recognition engine.

Research projects are individual and may focus on areas such as large-vocabulary speech recognition, keyword and hotword recognition, speaker recognition, emotion detection, and sequence-to-sequence modeling. Lectures run over 12 weeks and cover each topic in detail; slides are provided to students each week.
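The four top-level grading weights listed above (15%, 5%, 60%, 20%) sum to 100%, so a course score can be read as a weighted sum of component scores. The sketch below illustrates that arithmetic only. It assumes each component is scored on a 0-100 scale and treats the four top-level weights as the complete breakdown; the component keys, the `course_score` helper, and the example scores are hypothetical and are not part of the course materials.

```python
# Minimal sketch: combine component scores into one course score using the
# four top-level weights from the course description (they sum to 100%).
# Component keys and example scores are hypothetical, for illustration only.

WEIGHTS = {
    "extended_abstract": 0.15,  # 2-page extended abstract
    "presentation": 0.05,       # 5-minute presentation
    "final_project": 0.60,      # final project
    "homework": 0.20,           # homework
}


def course_score(scores: dict[str, float]) -> float:
    """Return the weighted course score (0-100) from per-component scores (0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum: each component contributes weight * score.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    example = {
        "extended_abstract": 90.0,
        "presentation": 80.0,
        "final_project": 85.0,
        "homework": 95.0,
    }
    print(round(course_score(example), 2))  # prints 87.5
```

Because the final project carries 60% of the weight, a change of a few points there moves the course score more than the same change in any other component.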

Fundamentals of Speech Recognition | Prof. Homayoon Beigi