This course, Fundamentals of Speech Recognition (E6998), covers both the theory and the practice of automatic speech recognition. The instructor is Prof. Homayoon Beigi, and the primary textbook is his "Fundamentals of Speaker Recognition", published by Springer. Coursework consists of homework, a midterm proposal, and a final project, weighted at 20%, 20%, and 60% of the grade respectively. The final project deliverables are a 6-page conference-style paper, code and results, and a 5-minute presentation.
The course covers the anatomy of speech, signal representation, phonetics, signal processing, probability theory, information theory, decision theory, parameter estimation, hidden Markov models, neural networks, and language modeling. It runs for 12 weeks, with lecture slides provided each week. In hands-on projects, students use the Kaldi speech recognition software to implement a speech recognition engine. Research projects focus on areas such as large vocabulary speech recognition, keyword detection, speaker recognition, and emotion detection. The course aims to equip students with the knowledge and skills needed to design and implement speech recognition systems.