Thursday, September 11, 2025

lec 3, GAN

 == pre-class to do: 

post video of lec 2 VAE. 

calendar email invitation: 

homework assignment, data camp, 

paper selection, high quality, primary research paper. 

potential project (agentic bioinformatics analysis, agentic lab report?, pretraining of transformer, word embedding)

socrative questions (questions on contents from last lecture): TF on VAE

update Canvas course materials, update learning objectives. assignments as needed:

Test-run code: skip. 

Kindle book; using iPad to highlight key points. 


== In-class to do: 

clean up desktop space, calendars, 

ZOOM, live transcript (start video recording). 

Socrative sign in, review VAE


== summary, review VAE

GAN, principle in pdf, then kindle textbook, 

breakout rooms, 


Meeting summary 

Quick recap

The meeting began with a review session on variational autoencoders, where students demonstrated good understanding of key concepts including the variational loss function and reparameterization trick. The discussion then moved to Generative Adversarial Networks (GANs), covering their fundamental components, mathematical framework, and training processes, including the challenges and advancements in model training. The latter part of the meeting focused on practical aspects, including the implementation of GANs for image generation, the use of Google Cloud Platform resources like Vertex AI for machine learning applications, and guidelines for course presentations and storage of work.

Next steps

  • No action items or next steps were identified; the material reviewed was an educational presentation about GANs, with no assignments given.

Summary

Variational Autoencoder Review Session

Hong led a review session on variational autoencoders, confirming that the encoder maps input data to a single latent vector with randomness introduced through auxiliary parameters. Students demonstrated good understanding of concepts like the variational loss function, which includes both reconstruction loss and a regularization term (KL divergence), and the reparameterization trick that allows backpropagation through sampling steps. Hong noted that while some students hadn't signed in, there were 9 confirmed participants, and mentioned that AI meeting note-taking tools were being used by many attendees. The session concluded with a brief mention of moving on to Generative Adversarial Networks in the next lecture.
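A minimal sketch of the reparameterization trick and the KL regularization term reviewed above, in plain NumPy (variable names like z_mean/z_log_var are illustrative, not the notebook's exact code):

# Reparameterization: z = mu + sigma * eps, eps ~ N(0, I), so gradients can
# flow through mu and log_var while the sampling step stays stochastic.
import numpy as np

def sample_z(z_mean, z_log_var, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(z_mean.shape)          # auxiliary noise
    return z_mean + np.exp(0.5 * z_log_var) * eps    # differentiable w.r.t. mean / log-variance

def kl_divergence(z_mean, z_log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) per sample: the regularization part of the variational loss.
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)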

Understanding Generative Adversarial Networks

Hong explained the concept of Generative Adversarial Networks (GANs), which involve a discriminator and a generator. The discriminator aims to distinguish between real and fake data, while the generator creates synthetic data to fool the discriminator. The goal is to reach an equilibrium where the discriminator cannot reliably identify fake data, achieving a 50-50 chance of correct classification. Hong also described the mathematical framework of the value function that guides the training process, highlighting the adversarial nature of the optimization procedure.
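For reference, the standard GAN value function from Goodfellow et al. (2014), written in LaTeX (textbook notation, not necessarily the exact symbols used on the slides):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

For a fixed generator, the optimal discriminator is D^*(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_g(x)); at the global equilibrium p_g = p_{\mathrm{data}}, so D^*(x) = 1/2 everywhere, which is the 50-50 classification rate mentioned above.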

Binary Classifier Loss Function Overview

Hong explained the mathematical foundation of a binary classifier using cross entropy loss, describing how the value function can be expressed in terms of Kullback-Leibler divergence and Jensen-Shannon divergence between real data and generated distributions. He outlined the training process as a two-step procedure: first maximizing the discriminator using the full loss function, and then minimizing the generator using a simplified version of the loss.
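A minimal sketch of this two-step alternating update in TensorFlow/Keras (assumes a discriminator that ends in a sigmoid; optimizer and architecture details are illustrative, not the code used in class):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()  # cross-entropy loss on discriminator probabilities

def train_step(real_images, generator, discriminator, g_opt, d_opt, latent_dim=100):
    batch = tf.shape(real_images)[0]
    z = tf.random.normal((batch, latent_dim))

    # Step 1: update the discriminator with the full loss (real labeled 1, fake labeled 0).
    with tf.GradientTape() as tape:
        fake_images = generator(z, training=True)
        real_preds = discriminator(real_images, training=True)
        fake_preds = discriminator(fake_images, training=True)
        d_loss = bce(tf.ones_like(real_preds), real_preds) + \
                 bce(tf.zeros_like(fake_preds), fake_preds)
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Step 2: update the generator with the simplified (non-saturating) loss,
    # i.e. push the discriminator toward calling generated samples "real".
    with tf.GradientTape() as tape:
        fake_preds = discriminator(generator(z, training=True), training=True)
        g_loss = bce(tf.ones_like(fake_preds), fake_preds)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss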

Advancements in Generative Model Training

Hong discussed the challenges and advancements in training generative models, focusing on the WGAN with gradient penalty (WGAN-GP) as the current state-of-the-art method. He explained the technical details of the WGAN, including its use of the Earth mover's (Wasserstein) distance and the epsilon parameter that randomly interpolates between real and fake samples when computing the gradient penalty. Hong also walked through a practical implementation of the WGAN-GP on the task of telling real bricks from generated ones, demonstrated with a dataset of Lego brick images.
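A minimal sketch of the gradient-penalty term described above, in TensorFlow (eps is the random interpolation coefficient; the penalty pushes the critic's gradient norm toward 1, per the WGAN-GP paper, and is not the exact class notebook code):

import tensorflow as tf

def gradient_penalty(critic, real_images, fake_images):
    batch = tf.shape(real_images)[0]
    # epsilon mixes each real image with a fake one at a random ratio
    eps = tf.random.uniform((batch, 1, 1, 1), 0.0, 1.0)
    interpolated = eps * real_images + (1.0 - eps) * fake_images
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = critic(interpolated, training=True)
    grads = tape.gradient(scores, interpolated)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
    return tf.reduce_mean((norm - 1.0) ** 2)   # penalize deviation from unit gradient norm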

Image Generation Model Architecture Overview

Hong explained the structure of a discriminator and generator model for image generation, noting that the discriminator is a convolutional neural network with a sigmoid output for binary classification, while the generator resembles the decoder of a variational autoencoder. Hong outlined the training process, which involves computing binary cross-entropy loss for both the discriminator and generator, and mentioned that the optimizer is specified elsewhere in the code. The discussion touched on the technical details of image upsampling methods and the inclusion of noise in the loss function to improve model performance.
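A minimal Keras sketch matching that description: a convolutional discriminator ending in a sigmoid, and a decoder-style generator that upsamples a latent vector back to image size (the 64x64 grayscale shape and layer sizes are illustrative, not the notebook's exact architecture):

from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100  # size of the random noise vector fed to the generator

# Discriminator: plain CNN ending in a sigmoid "real vs. fake" probability.
discriminator = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Conv2D(64, 4, strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Conv2D(128, 4, strides=2, padding="same"),
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])

# Generator: decoder-like stack that upsamples the latent vector to an image.
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(8 * 8 * 128),
    layers.Reshape((8, 8, 128)),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),  # 16x16
    layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),   # 32x32
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),    # 64x64
])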

Enhancing GANs with Gradient Penalty

Hong discussed the implementation and effectiveness of a generative adversarial network (GAN) with a gradient penalty (GP) for image generation. He explained how the GP is calculated and its role in improving the quality of generated images compared to traditional GANs. Hong also introduced the concept of conditional GANs, which concatenate label information to the input, and showed that this simple modification can significantly enhance performance.
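A minimal sketch of the conditional-GAN modification mentioned above: one-hot labels are concatenated to the generator's latent vector and appended to the discriminator input as extra channels (class count and static image shapes are placeholder assumptions):

import tensorflow as tf

def concat_label_to_latent(z, labels, num_classes=10):
    # Generator side: append a one-hot label to the latent vector.
    one_hot = tf.one_hot(labels, num_classes, dtype=z.dtype)
    return tf.concat([z, one_hot], axis=-1)

def concat_label_to_image(images, labels, num_classes=10):
    # Discriminator side: broadcast the one-hot label into extra image channels
    # (assumes statically known height/width).
    h, w = images.shape[1], images.shape[2]
    one_hot = tf.one_hot(labels, num_classes, dtype=images.dtype)
    label_maps = tf.tile(one_hot[:, None, None, :], [1, h, w, 1])
    return tf.concat([images, label_maps], axis=-1)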

Generative AI and Cloud Platforms

Hong discussed the evolution of generative AI methods, noting that while the generative adversarial network (GAN) approach was a significant milestone in 2014, the field has since shifted with the advent of agentic AI, which allows for more specialized and sophisticated critiques. Hong also addressed the use of Google Cloud Platform (GCP) and Vertex AI for students in the class, explaining that while GCP provides a range of industrial-level AI tools, the Vertex AI environment is still in its early stages and may require further development. Evan pointed out that the current GCP course focuses mainly on knowledge checks rather than practical use, and Hamza inquired about the speed and capabilities of Vertex AI compared to ODU's supercomputers, to which Hong clarified that the platforms serve different purposes and are not directly comparable.

Vertex AI Service Overview

Terry demonstrated how to access and use Vertex AI, a Google Cloud service for machine learning and AI applications. He explained the difference between on-premises clusters and cloud resources, emphasizing that Vertex AI provides a managed service for model development, training, and deployment. Terry showed the class how to log into Google Cloud using their ODU student accounts and navigate the Vertex AI interface, highlighting key features like the model garden, Vertex AI studio, notebooks, and deployment options.

GCP Resources and Presentation Guidelines

The meeting focused on discussing the use of Google Cloud Platform (GCP) resources for the course, particularly Vertex AI and storage solutions. Terry explained that a shared project exists for the class, but students should be cautious about deleting each other's work. He demonstrated how to use buckets for storage and recommended copying important data to Git if needed. The group discussed potential future changes to permissions and the possibility of creating individual projects for each student. Hong clarified that presentations should be individual, not group projects, and explained the format and content expectations for presentations. The class was reminded to save their work before the semester ends, as resources may be deleted afterward.
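A minimal sketch (Python, google-cloud-storage client) of pulling a file out of a class bucket before resources are deleted; the bucket and object names are placeholders, not the actual class project resources:

from google.cloud import storage   # pip install google-cloud-storage

client = storage.Client()                          # uses the logged-in GCP credentials
bucket = client.bucket("my-class-bucket")          # placeholder bucket name
blob = bucket.blob("results/model_weights.h5")     # placeholder object path
blob.download_to_filename("model_weights.h5")      # local copy, e.g. to commit to Git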


Wednesday, September 3, 2025

lec 2, gAI, VAE

  == pre-class to do: 

post video of lec 1.  done

calendar email invitation: done 

homework assignment, data camp, 

paper selection:  

potential project (agentic bioinformatics analysis, agentic lab report?, pretraining of transformer, word embedding)

socrative questions (questions on contents from last lecture ): 

update Canvas course materials, update learning objectives. assignments as needed:

Test-run code: skip. 


Kindle book; using iPad to highlight key points. 


== In-class to do: 

clean up desktop space, calendars, 

ZOOM, live transcript (start video recording). 

Socrative sign in 

=> go over assignments, video, datacamp

=> Kingma and Welling, 2013, arXiv

=> hqin's proof work

=> further reading, kingma 2019 tutorial

=> play student videos, setup random breakout rooms to discuss presentation papers


Monday, September 1, 2025

MLCB25 Machine Learning for Computational Biology, Manolis Kellis

 

MIT Course announcement: Machine Learning for Computational Biology #MLCB25
Fall'24 Lecture Videos: https://lnkd.in/efSvp7hY
Fall'24 Lecture Notes: https://lnkd.in/eWBAxQHk
(a) Genomes: Statistical genomics, gene regulation, genome language models, chromatin structure, 3D genome topology, epigenomics, regulatory networks.
(b) Proteins: Protein language models, structure and folding, protein design, cryo-EM, AlphaFold2, transformers, multimodal joint representation learning.
(c) Therapeutics: Chemical landscapes, small-molecule representation, docking, structure-function embeddings, agentic drug discovery, disease circuitry, and target identification.
(d) Patients: Electronic health records, medical genomics, genetic variation, comparative genomics, evolutionary evidence, patient latent representation, AI-driven systems biology.
Foundations and frontiers of computational biology, combining theory with practice. Generative AI, foundation models, machine learning, algorithm design, influential problems and techniques, analysis of large-scale biological datasets, applications to human disease and drug discovery.
First Lecture: Thu Sept 4 at 1pm in 32-144
With: Prof. Manolis Kellis, Prof. Eric Alm, TAs: Ananth Shyamal, Shitong Luo
Course website: https://lnkd.in/eemavz6J

Friday, August 29, 2025

funding acknowledgement fall 2025

 for AI works

HQ thanks USA NSF awards 2525493 and 2200138, a Catalyst Award from the USA National Academy of Medicine, and internal support from Old Dominion University.

An a2 Pilot Award from the University of Pennsylvania-Penn Artificial Intelligence and Technology Collaboratory for Healthy Aging (PennAITech), with the NIH award P30AG073105.

 

The award is provided by the University of Pennsylvania-Penn Artificial Intelligence and Technology Collaboratory for Healthy Aging (PennAITech), an initiative funded by the National Institute on Aging (NIA) of the National Institutes of Health (Grant No. P30AG073105).

 


Thursday, August 28, 2025

lecture 1, gAI

 == pre-class to do: 

calendar email invitation: 

syllabus update

socrative questions (questions on contents from last lecture ): 

update Canvas course materials, update learning objectives. assignments as needed:

Test-run code: skip. 


Kindle book; using iPad to highlight key points. 


== In-class to do: 

clean up desktop space, calendars, 

ZOOM, live transcript (start video recording). 

Socrative sign in 

== summary, went over ch1, touched ch2. 


Friday, August 15, 2025

ASR overview 2025

 

What Is Audio Speech Recognition?

Audio speech recognition, also called automatic speech recognition (ASR), is a technology that enables computers and devices to understand and process human speech by converting spoken language into text or actionable commands. Fundamentally, ASR captures audio input (via a microphone), digitizes the sound waves, and then processes them through algorithms to recognize phonemes (basic units of sound), assemble them into words, and produce a transcript or trigger specific tasks.twilio+1

Core components and steps include (a minimal end-to-end sketch follows this list):

  • Audio capture and preprocessing: Microphones convert voice vibrations into electrical and then digital signals; preprocessing enhances the speech and reduces noise.

  • Acoustic modeling: Maps the digitized signal to phonemes.

  • Language modeling: Predicts word sequences using statistical information and context.

  • Decoding: Converts the identified phonemes and language models into coherent, context-accurate text.kardome+1
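A minimal end-to-end sketch of these steps using the open-source Whisper package, which bundles audio preprocessing, acoustic modeling, language modeling, and decoding in one call (the audio filename is a placeholder):

import whisper   # pip install openai-whisper

model = whisper.load_model("base")             # pretrained acoustic + language model
result = model.transcribe("lecture_clip.wav")  # preprocessing and decoding handled internally
print(result["text"])                          # the decoded transcript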

Main Challenges in Speech Recognition

1. Background Noise: Ambient sounds such as traffic, appliances, or other voices can blur the spoken signal, significantly impacting ASR accuracy. While noise suppression exists, it is not perfect, especially in complex real-world environments.milvus+2

2. Accents, Dialects, and Pronunciation Variability: Regional accents, dialects, slang, and non-native pronunciation introduce significant variability, making recognition more difficult if the system isn’t trained on diverse data. Homophones and contextual ambiguities also increase complexity.atltranslate+1

3. Speech Speed and Volume Fluctuations: Variations in how quickly or slowly people speak, as well as changes in loudness, challenge systems optimized for 'average' speech patterns.waywithwords

4. Contextual Understanding: Disambiguating meaning in homophones or similar-sounding words requires context-aware models, which add computational and design complexity.milvus+1

5. Computational Efficiency and Real-Time Processing: Processing long audio streams or interactive tasks with minimal delay demands significant computing resources, balancing accuracy and responsiveness, particularly on mobile or 'edge' devices.milvus

6. Speaker Identification in Multi-Speaker Scenarios: Recognizing who is speaking and tracking speakers accurately is difficult, making transcription and command targeting less reliable in group settings.atltranslate

State of the Art (2025)

a. Neural Network Architectures: Modern ASR models are built on advanced machine learning, particularly neural networks such as transformers, recurrent neural networks, and state-space models. These models excel at mapping speech to text even in challenging acoustic environments.arxiv+1

b. Samba-ASR: The new Samba-ASR model uses a novel state-space architecture, replacing traditional transformers for improved computational efficiency and accuracy. It sets new benchmarks with remarkably low Word Error Rates (WER): as low as 1.17% on LibriSpeech Clean, outperforming previous state-of-the-art models. It is both faster and more adaptable across various languages, domains, and speaking styles.arxiv

c. OpenAI's gpt-4o-transcribe: Recent models like gpt-4o-transcribe improve on earlier whisper-based solutions in accuracy and reliability, especially for diverse accents, noisy environments, and fast or variable speech. These models use reinforcement learning and large-scale, diverse datasets to achieve high performance.openai

d. Multilingual and Accent Robustness: New benchmarks such as ML-SUPERB push models to handle over 150 languages and hundreds of accents, reflecting major progress toward more inclusive, accessible ASR. Models are evaluated on global linguistic diversity and are robust to different speech patterns and background conditions.interspeech2025

In summary, audio speech recognition has evolved into a highly capable, AI-driven field but still wrestles with real-world variability, noise, and linguistic diversity. Today’s best models—like Samba-ASR and GPT-4o—achieve impressively low error rates and operate efficiently, but ongoing research emphasizes even broader language coverage, context awareness, and noise robustness.ibm+2
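For context on the WER figures quoted above: WER counts substitutions, deletions, and insertions against the number of reference words. A quick worked example with the jiwer package (the reference/hypothesis strings are made up):

import jiwer   # pip install jiwer

reference  = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(jiwer.wer(reference, hypothesis))   # (S + D + I) / N = 2 substitutions / 9 words ≈ 0.22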

  1. https://www.twilio.com/en-us/blog/insights/ai/what-is-speech-recognition
  2. https://opencv.org/blog/applications-of-speech-recognition/
  3. https://www.kardome.com/blog-posts/difference-speech-and-voice-recognition
  4. https://milvus.io/ai-quick-reference/what-are-common-issues-faced-by-speech-recognition-systems
  5. https://waywithwords.net/resource/challenges-in-speech-data-processing/
  6. https://www.atltranslate.com/ai/blog/automatic-speech-recognition-challenges
  7. https://arxiv.org/html/2501.02832v1
  8. https://openai.com/index/introducing-our-next-generation-audio-models/
  9. https://www.interspeech2025.org/challenges
  10. https://www.ibm.com/think/topics/speech-recognition
  11. https://en.wikipedia.org/wiki/Speech_recognition
  12. https://developer.nvidia.com/blog/essential-guide-to-automatic-speech-recognition-technology/

Sunday, August 10, 2025

courses: deep generative learning and generative AI.

Here are some notable sources of course materials on deep generative learning and generative AI:

  • Stanford University: The course "CS236: Deep Generative Models" by Prof. Stefano Ermon provides detailed lecture videos and slides focused on the foundations, challenges, and applications of generative models in image, text, video, medicine, robotics, and more. The course website with slides is available at https://deepgenerativemodels.github.io/ and videos are on YouTube.youtube

  • Cornell University: The course "CS 6785: Deep Generative Models," taught by Prof. Volodymyr Kuleshov, offers an introduction to deep generative models, recent advances, algorithms, and applications including NLP and biology. The lectures are recorded and available on YouTube.youtube

  • MIT (Massachusetts Institute of Technology): The "Introduction to Deep Learning 6.S191" program covers deep learning basics along with generative AI applications in media, vision, NLP, and biology. All lecture slides, labs, and code are open-sourced and free to use, accessible at https://introtodeeplearning.com/ with lecture videos like "Generative AI for Media" by Google’s Doug Eck on YouTube.introtodeeplearningyoutube

  • Harvard University: There are presentations specifically on generative AI’s role in education, including outlines for PowerPoint slides targeting its teaching and learning impact, available in PDF form (e.g., from Harvard AI Sandbox materials).hcsra.sph.harvard

  • University of Virginia (UVA): UVA SEAS offers collections of slides on the technical foundations of generative AI with practical uses in engineering design and analysis.teaching.virginia

  • Other resources:

    • NVIDIA Deep Learning Institute has a teaching kit for generative AI with lecture slides, labs, and Jupyter notebooks focused on GPU-accelerated generative AI development.developer.nvidia

    • Stony Brook University provides teaching resources on generative AI including PowerPoint slides for educators.stonybrook

If you want ready-to-use lecture slides or full course materials, the Stanford CS236, Cornell CS6785, and MIT 6.S191 courses are among the most comprehensive and authoritative sources from major universities. Their materials are typically publicly available online for educational use.


  1. https://hcsra.sph.harvard.edu/sites/projects.iq.harvard.edu/files/hcsra/files/presentation_on_ai1.pdf
  2. https://www.youtube.com/watch?v=XZ0PMRWXBEU
  3. https://blog.uwgb.edu/catl/files/2023/02/Introduction-to-Generative-AI-CATL-Presentation-Slides.pdf
  4. https://www.youtube.com/watch?v=IZgvgLy1wyg
  5. https://teaching.virginia.edu/collections/uva-seas-resources-teaching-genai-use-for-engineering-design-and-analysis/272
  6. https://developer.nvidia.com/blog/nvidia-deep-learning-institute-releases-new-generative-ai-teaching-kit/
  7. https://www.sdccd.edu/docs/IIE/ProfessionalDevelopment/Presentations/10252024_AI-Demystified-Intro-to-Generative-AI.pdf
  8. https://introtodeeplearning.com
  9. https://www.stonybrook.edu/celt/teaching-resources/aibot.php
  10. https://www.youtube.com/watch?v=P7Hkh2zOGQ0

Thursday, August 7, 2025

medical AI



Artificial Intelligence in Medical Education: The 2025 IACAI Vision and Integration Frameworks

 

https://www.medbiq.org/initiatives/international-advisory-committee-artificial-intelligence


Sunday, August 3, 2025

asr whisper wahab

Test-ran the old slurm job and it worked. 

git clone to wahab


[hqin@wahab-01 fairASR25]$ pwd
/home/hqin/github/fairASR25





speech-to-text corpora, accent

Here are several publicly available speech-to-text corpora, one of which is Meta FAIR's fairness-oriented dataset (a loading sketch follows the list):

  • LibriSpeech ASR Corpus
    A corpus of roughly 1,000 hours of 16 kHz read English speech, derived from LibriVox audiobooks, carefully segmented and aligned. Released under a CC BY 4.0 license. (openslr.org)

  • Multilingual LibriSpeech (MLS)
    A large-scale ASR dataset by Facebook AI Research (Meta), comprising ∼50,000 hours of public-domain audiobooks across eight languages (English, German, Dutch, French, Spanish, Italian, Portuguese, Polish). (Meta AI, voxforge.org)

  • Mozilla Common Voice
    A crowdsourced, multilingual speech corpus with millions of volunteer-recorded, validated sentences and transcriptions, released under CC0 (public domain). (Wikipedia)

  • TED-LIUM v3
    An English ASR corpus of 452 hours of TED talk recordings with aligned transcripts, freely available for research. (openslr.org)

  • VoxForge
    A community-collected GPL-licensed speech corpus in multiple languages, built to support open-source ASR engines (e.g., CMU Sphinx, Julius). (voxforge.org)

  • Fair-Speech Dataset (Meta FAIR)
    A fairness-oriented evaluation set containing 26,471 utterances from 593 U.S. speakers, designed to benchmark bias and robustness in speech recognition. (Meta AI, Meta AI)

  • GigaSpeech
    A multi-domain English ASR corpus featuring 10,000 hours of high-quality transcribed audio (plus 40,000 hours of additional audio for semi-/unsupervised research).

  • VoxPopuli
    Contains over 1 million hours of unlabeled multilingual speech and 1.8k hours of transcribed speeches in 16 languages (with aligned interpretation pairs), for representation learning and semi-supervised ASR. (arxiv.org)
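Most of these corpora can be pulled programmatically; for example, a minimal torchaudio sketch for LibriSpeech (the root directory and split are placeholder choices):

import torchaudio   # pip install torchaudio

# Downloads and indexes the 100-hour clean training split of LibriSpeech (~6 GB).
dataset = torchaudio.datasets.LIBRISPEECH("./data", url="train-clean-100", download=True)
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(sample_rate, transcript)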


Here are several publicly available speech-to-text corpora that include regional and non-native accents—many of which you can filter or mine for Southern Chinese (e.g., Cantonese-influenced) accent patterns (such as /s/ vs /ʃ/ or –ing vs –in):

  • Speech Accent Archive

    A growing, global collection of ~2,500 English recordings of the same Harvard paragraph, each with narrow phonetic transcription and speaker metadata (including L1 and region). You can browse by “Chinese” and then drill down to Cantonese vs. other dialect regions. (ResearchGate, accent.gmu.edu)

  • L2-ARCTIC

    A corpus of non-native English speech from ten Mandarin (plus Hindi, Korean, Spanish, Arabic) speakers reading CMU ARCTIC prompts. It includes orthographic transcripts, forced-aligned phonetic annotations, and expert mispronunciation tags. (psi.engr.tamu.edu)

  • CSLU Foreign-Accented English (Release 1.2)

    ~4,925 telephone-quality utterances by speakers of various L1s (including Chinese), with transcript, speaker background, and perceptual accent ratings. (borealisdata.ca)

  • speechocean762

    5,000 English utterances from 250 non-native speakers (half children), each annotated at the sentence, word, and phoneme level. Designed for pronunciation assessment, freely downloadable via OpenSLR. (arXiv)

  • ShefCE: Cantonese-English Bilingual Corpus

    Audio & transcripts from 31 Hong Kong L2 English learners reading parallel Cantonese and English texts—ideal for studying Cantonese-influenced English phonetics. (orda.shef.ac.uk)

  • Sell-Corpus: Multi-Accented Chinese English Speech

    First open-source English speech corpus covering seven major Chinese dialect regions (including Southern dialects), with recordings & transcripts for accent variation research. (sigport.org)

  • Mozilla Common Voice

    Crowdsourced, multilingual speech data (CC0) with per-speaker accent tags—you can filter English recordings by “Chinese (Hong Kong)” or “Chinese (Mainland)” to get regional accent samples. (Wikipedia)

  • ICNALE Spoken Monologues

    4,400 60-second monologues (~73 h) by 1,100 Asian learners (incl. Mainland China, Hong Kong, Taiwan), with transcripts—useful for comparing Southern vs. Northern Chinese L1 influence on English pronunciation. (language.sakura.ne.jp, language.sakura.ne.jp)

  • International Dialects of English Archive (IDEA)

    Free archive of scripted & unscripted English dialect samples worldwide. Browse the “Asia → China” section to find Cantonese- and Mandarin-accented speakers, all with transcripts. (Wikipedia)

Each of these datasets provides aligned audio and text (and often phonetic detail) that you can mine to analyze pronunciation patterns—like the s/ʃ or –ing/–in contrasts—among Southern Chinese speakers learning or using English.
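A hedged sketch of that kind of filtering with the Hugging Face datasets library; the Common Voice dataset id, version, and accent field name vary by release (and newer releases are gated behind authentication), so treat these identifiers as placeholders:

from datasets import load_dataset   # pip install datasets

# Placeholder dataset id/config; adjust to the Common Voice release you have access to.
cv = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="validation")

def is_chinese_accent(example):
    # Field may be "accents" or "accent" depending on the release.
    accent = str(example.get("accents") or example.get("accent") or "")
    return ("China" in accent) or ("Hong Kong" in accent)

cantonese_like = cv.filter(is_chinese_accent)
print(len(cantonese_like))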

Real-time Out-of-distribution Detection in Learning-Enabled Cyber-Physical Systems



Here are the details: