Tuesday, July 16, 2024

SLP software

ASR exploratory task:

Compare several models’ performance on children’s speech: compute each
model’s word error rate (WER) and identify the phrases that each model
struggles with. Some models to consider (and test):

Wav2vec2-conformer

Wav2vec2

Whisper

NeMo ASR speech-to-text (STT) models (from NVIDIA)

Paraformer
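As a starting point for the WER part of the task, here is a minimal pure-Python sketch. WER is the word-level edit distance between a reference transcript and a model's hypothesis, (S + D + I) / N, where S, D and I count substituted, deleted and inserted words and N is the number of reference words. The sketch also tallies which words each model gets wrong, which is one way to start spotting problem phrases. The helper names (`normalize`, `align`, `wer`, `evaluate`) are hypothetical, and the simple lowercase/strip-punctuation normalizer is an assumption; real evaluations often use a model-specific text normalizer, which can change WER noticeably.

```python
from __future__ import annotations

import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation (keeping apostrophes) and split into words.

    ASSUMPTION: this simple normalizer is illustrative only.
    """
    text = text.lower().replace("\u2019", "'")   # curly apostrophe -> straight
    text = re.sub(r"[^\w\s']", " ", text)         # punctuation -> space
    return text.split()


def align(ref: list[str], hyp: list[str]) -> list[tuple[str, str | None, str | None]]:
    """Levenshtein alignment of two word sequences.

    Returns (op, ref_word, hyp_word) tuples with op in {"ok", "sub", "del", "ins"}.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops: list[tuple[str, str | None, str | None]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(reference: str, hypothesis: str) -> float:
    """Utterance-level WER = (S + D + I) / N."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    errors = sum(op != "ok" for op, _, _ in align(ref, hyp))
    return errors / len(ref)


def evaluate(pairs: list[tuple[str, str]]) -> tuple[float, Counter]:
    """Corpus-level WER plus a tally of word-level errors for one model.

    pairs: (reference, hypothesis) transcripts for the same test set.
    """
    total_errors = total_words = 0
    confusions: Counter = Counter()
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        for op, r, h in align(ref, hyp):
            if op != "ok":
                total_errors += 1
                confusions[(op, r, h)] += 1
        total_words += len(ref)
    return total_errors / total_words, confusions
```

For example, `wer("The cat sat on the mat.", "the cat sit on mat")` is 2/6 (one substitution, one deletion). Running `evaluate` separately on each model's transcripts of the same children's speech set, then comparing `confusions.most_common()`, gives a first look at where each model fails; grouping runs of consecutive errors would extend this from words to phrases. Corpus WER is computed as total errors over total reference words rather than as an average of per-utterance WERs, so long utterances are not under-weighted. The established `jiwer` package (`jiwer.wer(reference, hypothesis)`) is a reasonable alternative once the normalization choices are settled.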

You might also want to read (for a better understanding of how today’s
deep learning models deal with audio):

https://huggingface.co/learn/audio-course/en/chapter3/ctc

https://huggingface.co/learn/audio-course/en/chapter3/seq2seq
