Deep Learning and Data Collection in Speech Recognition for Individuals with Complex Congenital Disorders
Complex congenital disorders often result in speech and motor skill impairments, posing communication challenges. Existing non-English speech recognition tools struggle with non-standard speech patterns, compounded by a lack of large training datasets. This project aims to create a personalized framework for training German speech recognition models, catering to the unique needs of individuals with congenital disorders. You will learn to collect data and apply machine learning or deep learning models.
Keywords: Machine Learning, Deep Learning, Speech Recognition, Natural Language Processing
**Motivation**
For individuals, especially children, with complex congenital disorders such as cerebral palsy, muscular dystrophy, developmental coordination disorder, or Apert syndrome, fine motor skills and speech are frequently impaired to varying degrees. Although many of these disabilities have little to no impact on cognitive abilities, affected individuals often encounter significant communication challenges, both spoken and written. These difficulties can hinder their ability to participate fully in society, leading to various disadvantages.
Commonly used speech recognition tools such as those from Google, Apple, or Dragon often struggle to adapt to and recognize speech that deviates from the norm, rendering them unsuitable for individuals with speech impairments. This is especially true for languages other than English. In this project, our primary objective is to develop a framework that streamlines and personalizes the training and fine-tuning of speech recognition models in German, thereby addressing the unique communication needs of these individuals.
**Project 1: Early layer fine-tuning**
One possible approach to dealing with this non-normative speech is early-layer fine-tuning, which has been shown to work well [1, 2]. Early-layer fine-tuning has the advantage that no extensive training dataset for the base model is needed.
**Goal of the Master's Thesis**
The goal of the Master's Thesis is to replicate the published work [1,2] by utilizing early-layer fine-tuning of an existing German speech recognition model (Open Source model: bofenghuang/whisper-large-v2-cv11-german · Hugging Face) with data from a speech-impaired child. The Master's Thesis includes:
- Implement the model and test it on normal speech.
- Collect a dataset featuring speech-impaired children (including tasks such as obtaining consent forms, data collection, preprocessing, and data annotation).
- Fine-tune the model [2] using the newly acquired dataset and validate the model's robustness.
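The early-layer fine-tuning pattern from [1, 2] boils down to freezing most of a pretrained network and updating only its first layers. A minimal PyTorch sketch of that freezing pattern, using a toy linear stack as a stand-in for the actual encoder (the Whisper checkpoint named above would be loaded via Hugging Face `transformers`; the layer count and choice of two trainable layers here are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained speech encoder; in the thesis this would
# be the German Whisper checkpoint loaded from Hugging Face.
encoder = nn.Sequential(*[nn.Linear(64, 64) for _ in range(8)])

def freeze_all_but_early_layers(model: nn.Sequential, n_trainable: int) -> None:
    """Freeze every layer except the first `n_trainable` ones."""
    for idx, layer in enumerate(model):
        requires_grad = idx < n_trainable
        for param in layer.parameters():
            param.requires_grad = requires_grad

freeze_all_but_early_layers(encoder, n_trainable=2)

# Only the early layers' parameters are passed to the optimizer, so the
# small disordered-speech dataset only has to adapt a few tensors.
trainable = [p for p in encoder.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
print(f"trainable tensors: {len(trainable)} of {len(list(encoder.parameters()))}")
```

The same `requires_grad` toggling applies unchanged to a real Whisper model; only the way of addressing its named submodules differs.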
**Project 2: Training-Data Augmentation**
This project revolves around an exploratory idea, primarily focused on training an initial model with limited data. Training such a model from scratch is believed to enhance its robustness, especially when dealing with speech that deviates significantly from the norm. The primary constraint is the availability of training data. Collecting speech samples from individuals with certain speech variations presents significant challenges. This is primarily attributed to two key factors: the scarcity of individuals affected by these specific speech disorders (often referred to as 'rare diseases'), and the increased effort required from individuals to provide speech samples. To address this challenge, our approach is to collect a limited amount of data and model the variations in speech. This model will then be used to transform an existing, pre-annotated corpus, enabling us to train a model from scratch.
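In its simplest form, the corpus-transformation step above could be a signal-level perturbation applied to every utterance of the pre-annotated corpus. A NumPy sketch, assuming a uniform time-stretch as a deliberately crude stand-in for the learned speech-variation model (the corpus dictionary and sampling rate are illustrative):

```python
import numpy as np

def time_stretch(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono signal by `factor` via linear interpolation
    (factor > 1 slows the speech down, factor < 1 speeds it up)."""
    n_out = int(round(len(signal) * factor))
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

# Transform a dummy corpus; the existing annotations stay valid because
# the perturbation changes timing, not linguistic content.
rng = np.random.default_rng(0)
corpus = {"utt1": rng.standard_normal(16000)}  # 1 s of audio at 16 kHz
augmented = {uid: time_stretch(x, factor=1.3) for uid, x in corpus.items()}
print(len(augmented["utt1"]))  # 20800 samples, i.e. ~1.3 s
```

A model fitted to the collected speech samples would replace the fixed `factor` with utterance- and phoneme-dependent transformations, but the pipeline shape (transform annotated corpus, then train from scratch) stays the same.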
**Goal of the Master Thesis**
The goal of this project is to analyze speech variations and modify an existing German speech corpus so that it can serve as the foundation for training a new model. The Master's Thesis includes:
- Collect a dataset featuring a speech-impaired child, including tasks such as drafting consent forms, data collection, preprocessing, and data annotation.
- Analyze speech variations and adapt an existing corpus to capture these variations in speech.
- Train a new model and validate its robustness.
**Requirements**
This project has the scope of an MSc thesis project (for a semester project, we can discuss partial work).
Our ideal master's student for this project:
- Is self-driven and highly engaged.
- Possesses Python coding skills.
- Is proficient in the German language (ideally).
- Has some knowledge of neural networks.
**References:**
1. Green et al., "Automatic Speech Recognition of Disordered Speech: Personalized Models Outperforming Human Listeners on Short Phrases," Interspeech 2021. https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
2. J. Tobin and K. Tomanek, "Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 6637-6641, doi: 10.1109/ICASSP43922.2022.9747516.
**Interesting Publication:**
3. Murero et al. “Artificial Intelligence for Severe Speech Impairment: Innovative approaches to AAC and Communication”, https://ceur-ws.org/Vol-2730/paper31.pdf
**If you are interested:**
Please reach out to us via email and attach your CV, a transcript of records, and a short note on why you would like to work with us. We look forward to hearing from you!
Institute of Neuroinformatics, ETHZ, UZH: Roman Boehringer (roman@ethz.ch)
Institute of Neuroinformatics, ETHZ, UZH: Pehuen Moure (pehuen@ini.ethz.ch)
ETH AI Center, Institute of Neuroinformatics: Anh Duong Vo (anhduong.vo@ai.ethz.ch)