Children’s Speech Recognition Challenge
https://kidsasr.drivendata.org/
https://github.com/hongqin/goodnight-moon
https://www.drivendata.org/competitions/298/literacy-screening/
This site serves as my notebook and as a way to communicate effectively with my students and collaborators. Every now and then, a post may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
ChatGPT repeatedly makes the same mistake with the SHAP summary plot.
To select a given class label, the class index should go in the 3rd position: shap_vals[:,:,idx],
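A minimal sketch of the indexing point, using only NumPy so it runs without a trained model. It assumes the SHAP values for a multiclass model arrive as a single 3-D array of shape (n_samples, n_features, n_classes), which is what recent SHAP releases return; older releases may return a list with one 2-D array per class instead. The random array, the `class_slice` helper, and the feature matrix `X` in the final comment are hypothetical stand-ins.

```python
import numpy as np

# Stand-in for multiclass SHAP values (assumption: the array is laid out as
# (n_samples, n_features, n_classes), as in recent SHAP releases).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 5, 4, 3
shap_vals = rng.normal(size=(n_samples, n_features, n_classes))


def class_slice(values, idx):
    """Return the (n_samples, n_features) matrix for class `idx`."""
    values = np.asarray(values)
    if values.ndim != 3:
        raise ValueError("expected a 3-D array (samples, features, classes)")
    return values[:, :, idx]


idx = 1
per_class = class_slice(shap_vals, idx)
print(per_class.shape)  # (5, 4): one row per sample, one column per feature

# The common mistake: indexing the first axis picks one SAMPLE, not one class.
wrong = shap_vals[idx]
print(wrong.shape)  # (4, 3): features x classes for a single sample

# With shap installed, the per-class matrix is what would be plotted, e.g.
# shap.summary_plot(per_class, X)  # X: hypothetical (n_samples, n_features) data
```

Checking the shape before plotting is a cheap way to catch the wrong-axis slice, since the summary plot expects one row per sample and one column per feature.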
Symposium below or share with interested colleagues.
National Survey of Family Growth (NSFG), a national survey covering family life and pregnancy
https://www.cdc.gov/nchs/nsfg/index.htm
Todo: request access to the restricted-access variables.
August 23 - Dec 12, 2025, Thursday 6:00 pm - 8:40 pm.
Type | Time | Days | Where | Date Range | Schedule Type | Instructors |
---|---|---|---|---|---|---|
Scheduled In-Class Meetings | 6:00 pm - 8:40 pm | R | ENGINEERING & COMP SCI BLDG 2120 | Aug 23, 2025 - Dec 12, 2025 | LECTURE | |
https://depts.washington.edu/uwruca/ruca-approx.php?utm_source=chatgpt.com
All of Us survey, data codebooks
https://docs.google.com/spreadsheets/d/1pODkE2bFN-kmVtYp89rtrJg7oXck4Fsex58237x47mA/edit?usp=sharing
The paper "A model for the assembly map of bordism-invariant functors" by Levin, Nocera, and Saunier (2025) develops advanced categorical frameworks for algebraic topology, particularly through oplax colimits of stable/hermitian/Poincaré categories and bordism-invariant functors [1][2][3]. While not directly addressing machine learning (ML) or large language models (LLMs), its contributions could indirectly influence these fields through three key pathways:
The paper's formalization of oplax colimits and Poincaré-Verdier localizing invariants [1][3] provides new mathematical tools for structuring compositional systems. This could advance:
Model Architecture Design: By abstracting relationships between components (e.g., neural network layers) as bordism-invariant functors, enabling more rigorous analysis of model behavior under transformations [5].
Geometric Deep Learning: Topological invariants and assembly maps could refine methods for learning on non-Euclidean data (e.g., graphs, manifolds) by encoding persistence of features under deformations [5].
The bordism-invariance concept—where structures remain unchanged under continuous deformations—offers a mathematical foundation for invariance principles in ML:
Data Augmentation: Formalizing "bordism equivalence" could guide the design of augmentation strategies that preserve semantic content (e.g., image rotations as "topological bordisms") [5].
Robust Feature Extraction: Kernels of Verdier projections [1][3] might model noise subspaces to exclude during feature learning, improving adversarial robustness.
The paper’s explicit decomposition of complex functors (e.g., Shaneson splittings with twists [1][3]) parallels challenges in LLM-based reasoning:
Program Invariant Prediction: LLMs that infer program invariants [6] could adopt categorical decompositions to handle twisted or hierarchical constraints (e.g., loop invariants in code).
Categorical Data Embeddings: LLM-generated numerical representations of categorical data [4] might leverage bordism-invariance to ensure embeddings respect equivalence classes (e.g., "color" as a deformation-invariant attribute).
The work is highly theoretical, with no direct ML/LLM applications in the paper. Bridging this gap requires:
Translating topological bordisms into data-augmentation pipelines.
Implementing Poincaré-Verdier invariants as regularization terms in loss functions.
Extending LLM-based invariant predictors [6] to handle categorical assembly maps.
While speculative, these connections highlight how advanced category theory could enrich ML’s theoretical foundations and LLMs’ reasoning capabilities.
In Mendeley, an article with an unspecified type is often listed as "preprint".
To fix this, just change the document type to 'journal article', 'conference proceedings', or another appropriate type.
Beyond Attention: Toward Machines with Intrinsic Higher Mental States
https://techxplore.com/news/2025-05-architecture-emulates-higher-human-mental.html#google_vignette
USDA
https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes?utm_source=chatgpt.com
Based on our discussion, I looked further into the ‘360’ zip code region and found that it contains the 14 counties below:
counties = [ "Autauga County", "Barbour County", "Bullock County", "Butler County",
"Chilton County", "Coosa County", "Covington County", "Crenshaw County",
"Elmore County", "Lowndes County", "Macon County", "Montgomery County",
"Pike County", "Tallapoosa County"]
Among them are “Barbour”, “Bullock”, and “Macon” counties.
So I am not sure how useful the three-digit zip code ‘360’ would be for the TU MCH project.
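The overlap noted above can be checked with a short sketch. The `counties_360` list is copied from the note; the `targets` list is an assumption (the note only names these three counties without saying why they matter), and the variable names are illustrative.

```python
# Counties in the '360' three-digit zip region, as listed in the note.
counties_360 = [
    "Autauga County", "Barbour County", "Bullock County", "Butler County",
    "Chilton County", "Coosa County", "Covington County", "Crenshaw County",
    "Elmore County", "Lowndes County", "Macon County", "Montgomery County",
    "Pike County", "Tallapoosa County",
]

# Assumption: these are the counties of interest for the project.
targets = ["Barbour County", "Bullock County", "Macon County"]

# Keep the original ordering while filtering to the counties of interest.
matched = [county for county in counties_360 if county in targets]
share = len(matched) / len(counties_360)

print(matched)
print(f"{len(matched)} of {len(counties_360)} counties ({share:.0%})")
```

Under that assumption, only 3 of the 14 counties in the region are of interest, which is one way to quantify how coarse a zip3 filter is.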
https://pastebin.com/uBcFUXCA
All of Us research platform: data exploration for rural health research.
Zip3, observational table
Workspaces > Beginner Intro to AoU Data and the Workbench > Analysis >
In ‘test-state_data_v060523’, I found a sample on state, county, and zip3.