ASR exploratory task:
Compare the performance of several ASR models on children's speech.
Compute the word error rate (WER) for each model and identify the phrases
that each model tends to get wrong. Some models to consider and test:
Wav2Vec2-Conformer
Wav2Vec2
Whisper
NeMo ASR (speech-to-text, from NVIDIA)
Paraformer
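As a starting point for the scoring step, here is a minimal sketch of WER computed as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function names, the normalization rules (lowercasing, dropping punctuation except apostrophes) and the example sentences are all assumptions made for illustration; they are not part of the task description. Whichever model you test, you would pass its transcripts in as the hypothesis strings.

```python
from typing import List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation except apostrophes, split into words.

    This normalization is an assumption; pick rules that suit your transcripts.
    """
    kept = "".join(
        ch for ch in text.lower() if ch.isalnum() or ch.isspace() or ch == "'"
    )
    return kept.split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words, keeping only one previous row."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words.

    Each pair is (reference transcript, model hypothesis).
    """
    errors = 0
    words = 0
    for ref, hyp in pairs:
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        errors += word_edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words if words else 0.0


def worst_utterances(
    pairs: List[Tuple[str, str]], top_k: int = 5
) -> List[Tuple[float, str, str]]:
    """Rank utterances by their own WER, highest first."""
    scored = []
    for ref, hyp in pairs:
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        if ref_words:  # skip empty references to avoid dividing by zero
            wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
            scored.append((wer, ref, hyp))
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


if __name__ == "__main__":
    # Made-up example pairs: (reference, hypothesis from some model).
    example = [
        ("The cat sat on the mat.", "the cat sat on a mat"),
        ("I like ice cream", "I like I scream"),
    ]
    print("corpus WER:", corpus_wer(example))
    for wer, ref, hyp in worst_utterances(example, top_k=1):
        print("hardest utterance:", ref, "->", hyp)
```

Running this once per model on the same set of recordings gives comparable WER numbers, and the ranked list is one simple way to spot the phrases a model struggles with. The jiwer package offers a ready-made `wer` function if you would rather not maintain the edit-distance code yourself.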
For a better understanding of how today's deep learning models handle
audio, you might also want to read:
https://huggingface.co/learn/audio-course/en/chapter3/ctc
https://huggingface.co/learn/audio-course/en/chapter3/seq2seq