Sunday, July 13, 2025

Goodnight Moon audio data and labels?


Children’s Speech Recognition Challenge

https://kidsasr.drivendata.org/

https://github.com/hongqin/goodnight-moon


https://www.drivendata.org/competitions/298/literacy-screening/


Saturday, July 12, 2025

SHAP summary plot by ChatGPT

 ChatGPT repeatedly makes the same mistake with the SHAP summary plot.


For a given class label, when the SHAP values come as a (samples, features, classes) array, the class index goes in the 3rd position: shap_vals[:,:,idx].
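A minimal sketch of the per-class slicing. It assumes `shap_vals` is either a 3-D array shaped (n_samples, n_features, n_classes), which is what recent SHAP releases typically return for multiclass tree models, or an older-style list with one array per class. The helper name `shap_for_class` is hypothetical and the values are random toy numbers.

```python
import numpy as np


def shap_for_class(shap_vals, idx):
    """Return the (n_samples, n_features) SHAP matrix for class `idx`.

    Handles two layouts (which one you get depends on the SHAP version, an assumption here):
    - a 3-D array shaped (n_samples, n_features, n_classes): the class goes in the 3rd position
    - an older-style list with one (n_samples, n_features) array per class
    """
    if isinstance(shap_vals, list):
        return np.asarray(shap_vals[idx])
    arr = np.asarray(shap_vals)
    if arr.ndim != 3:
        raise ValueError("expected shape (n_samples, n_features, n_classes)")
    return arr[:, :, idx]


# Toy values for illustration only: 5 samples, 4 features, 3 classes
rng = np.random.default_rng(0)
shap_vals = rng.normal(size=(5, 4, 3))

class_1 = shap_for_class(shap_vals, 1)
print(class_1.shape)        # (5, 4): one row per sample, one column per feature
print(shap_vals[1].shape)   # (4, 3): the common mistake selects sample 1, not class 1

# With shap installed and X the explained feature matrix, the per-class plot would be:
# shap.summary_plot(class_1, X)
```

The list branch keeps the helper usable on outputs saved from older SHAP versions.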


Wednesday, July 2, 2025

National family survey of pregnancy (National Survey of Family Growth, NSFG)


https://www.cdc.gov/nchs/nsfg/index.htm


TODO: request access to the restricted-access variables.


Saturday, June 28, 2025

Fall 2025 schedule

Aug 23 - Dec 12, 2025, Thursdays, 6:00 pm - 8:40 pm.

Scheduled Meeting Times
Type: Scheduled In-Class Meetings
Time: 6:00 pm - 8:40 pm
Days: R (Thursday)
Where: ENGINEERING & COMP SCI BLDG 2120
Date Range: Aug 23, 2025 - Dec 12, 2025
Schedule Type: LECTURE
Instructors: (none listed)

Tuesday, June 24, 2025

ZIP Code RUCA Approximation

 

https://depts.washington.edu/uwruca/ruca-approx.php
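A minimal sketch of attaching ZIP-level RUCA codes to patient records, assuming a ZIP-to-RUCA lookup table is already available (for example, one built from the approximation on the page above). The column names `zip`, `ruca`, `person_id` and all rows are hypothetical toy values, not real RUCA assignments. The main pitfall it guards against is ZIP codes parsed as integers, which silently drops leading zeros (02139 becomes 2139) and breaks the join.

```python
import pandas as pd


def normalize_zip(s: pd.Series) -> pd.Series:
    """Return 5-character ZIP strings, restoring leading zeros lost to int parsing."""
    return (
        s.astype(str)
        .str.strip()
        .str.split("-").str[0]   # drop a ZIP+4 suffix if present
        .str.zfill(5)
    )


# Hypothetical lookup table (toy values, NOT real RUCA assignments)
ruca = pd.DataFrame({"zip": ["02139", "36083", "36089"], "ruca": [1.0, 7.0, 10.0]})

# Hypothetical patient records with ZIPs stored inconsistently
patients = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "zip": [2139, "36083", "36089-1234", "99999"],
})

patients["zip"] = normalize_zip(patients["zip"])
ruca["zip"] = normalize_zip(ruca["zip"])

# Left join keeps every patient; validate checks the lookup has one row per ZIP
merged = patients.merge(ruca, on="zip", how="left", validate="many_to_one")
print(merged)
print(merged["ruca"].isna().sum(), "ZIP(s) without a RUCA code")
```

Counting unmatched ZIPs after the join is a cheap check that the normalization worked.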


All of Us survey, data codebooks


https://docs.google.com/spreadsheets/d/1pODkE2bFN-kmVtYp89rtrJg7oXck4Fsex58237x47mA/edit?usp=sharing


Friday, June 20, 2025

A model for the assembly map of bordism-invariant functors

 The paper "A model for the assembly map of bordism-invariant functors" by Levin, Nocera, and Saunier (2025) develops advanced categorical frameworks for algebraic topology, particularly through oplax colimits of stable/hermitian/Poincaré categories and bordism-invariant functors [1][2][3]. While not directly addressing machine learning (ML) or large language models (LLMs), its contributions could indirectly influence these fields through three key pathways:

1. Enhanced Categorical Frameworks for ML

The paper's formalization of oplax colimits and Poincaré-Verdier localizing invariants [1][3] provides new mathematical tools for structuring compositional systems. This could advance:

  • Model Architecture Design: Abstracting relationships between components (e.g., neural network layers) as bordism-invariant functors could enable more rigorous analysis of model behavior under transformations [5].

  • Geometric Deep Learning: Topological invariants and assembly maps could refine methods for learning on non-Euclidean data (e.g., graphs, manifolds) by encoding persistence of features under deformations [5].

2. Invariance Learning and Equivalence

The bordism-invariance concept, roughly that a functor cannot distinguish objects related by a bordism (in the spirit of invariance under continuous deformation), offers a possible mathematical foundation for invariance principles in ML:

  • Data Augmentation: Formalizing "bordism equivalence" could guide the design of augmentation strategies that preserve semantic content (e.g., image rotations as "topological bordisms") [5].

  • Robust Feature Extraction: Kernels of Verdier projections [1][3] might model noise subspaces to exclude during feature learning, potentially improving adversarial robustness.

3. LLMs for Structured Reasoning

The paper’s explicit decomposition of complex functors (e.g., Shaneson splittings with twists [1][3]) parallels challenges in LLM-based reasoning:

  • Program Invariant Prediction: LLMs that infer program invariants [6] could adopt categorical decompositions to handle twisted or hierarchical constraints (e.g., loop invariants in code).

  • Categorical Data Embeddings: LLM-generated numerical representations of categorical data [4] might leverage bordism-invariance to ensure embeddings respect equivalence classes (e.g., "color" as a deformation-invariant attribute).

Limitations and Future Directions

The work is highly theoretical, with no direct ML/LLM applications in the paper. Bridging this gap requires:

  • Translating topological bordisms into data-augmentation pipelines.

  • Implementing Poincaré-Verdier invariants as regularization terms in loss functions.

  • Extending LLM-based invariant predictors [6] to handle categorical assembly maps.

While speculative, these connections highlight how advanced category theory could enrich ML’s theoretical foundations and LLMs’ reasoning capabilities.

  1. https://arxiv.org/abs/2506.05238
  2. https://arxiv.org/pdf/2506.05238.pdf
  3. https://www.arxiv.org/pdf/2506.05238.pdf
  4. https://pubmed.ncbi.nlm.nih.gov/39348252/
  5. https://www.aimodels.fyi/papers/arxiv/category-theoretical-topos-theoretical-frameworks-machine-learning
  6. https://openreview.net/pdf?id=mXv2aVqUGG
  7. https://x.com/CTpreprintBot
  8. https://keik.org/profile/mathat-bot.bsky.social
  9. https://www.alphaxiv.org/abs/2506.05238
  10. https://publications.mfo.de/bitstream/handle/mfo/4263/OWR_2024_47.pdf?sequence=1&isAllowed=y
  11. https://x.com/CTpreprintBot/status/1930943445977518380
  12. https://www.themoonlight.io/en/review/a-model-for-the-assembly-map-of-bordism-invariant-functors
  13. https://library.slmath.org/books/Book69/files/wholebook.pdf
  14. https://www.reed.edu/math-stats/thesis.html
  15. https://math.mit.edu/events/talbot/2020/syllabus2020.pdf
  16. https://webhomes.maths.ed.ac.uk/~v1ranick/papers/quinnass.pdf
  17. https://msp.org/agt/2009/9-4/agt-v9-n4-p16-s.pdf
  18. https://webhomes.maths.ed.ac.uk/~v1ranick/papers/owsem.pdf

Friday, June 6, 2025

Mendeley preprint error

 In Mendeley, an article whose document type is unspecified is often listed as a "preprint".

To fix this, change the document type to 'Journal Article', 'Conference Proceedings', or another appropriate type.

Thursday, May 29, 2025

Beyond Attention: Toward Machines with Intrinsic Higher Mental States

 

https://techxplore.com/news/2025-05-architecture-emulates-higher-human-mental.html


Wednesday, May 28, 2025

USDA Rural-Urban Commuting Area Codes

https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes
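Primary RUCA codes run from 1 to 10: 1-3 are metropolitan, 4-6 micropolitan, 7-9 small town, and 10 rural. A minimal sketch of collapsing codes into those four groups; secondary codes (e.g., 4.1) are truncated to their primary code, anything outside 1-10 is treated as "unknown", and the function name `ruca_category` is hypothetical. Projects often use other groupings (such as a two-level urban/rural split), so the cut points should follow the scheme the study adopts.

```python
import math


def ruca_category(code):
    """Map a primary or secondary RUCA code (e.g. 4 or 4.1) to a four-level category.

    Missing, non-numeric, or out-of-range values return "unknown".
    """
    try:
        value = float(code)
    except (TypeError, ValueError):
        return "unknown"
    if not math.isfinite(value):
        return "unknown"
    primary = int(value)  # 4.1 -> 4, 10.3 -> 10
    if 1 <= primary <= 3:
        return "metropolitan"
    if 4 <= primary <= 6:
        return "micropolitan"
    if 7 <= primary <= 9:
        return "small town"
    if primary == 10:
        return "rural"
    return "unknown"


for code in [1, 2.1, 4.1, 7, 10.3, 99, None]:
    print(code, "->", ruca_category(code))
```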


three-digit ZIP code 360 in Alabama

 Based on our discussion, I looked further into the ‘360’ three-digit ZIP code region and found that it contains the 14 counties below:

 

counties = [
    "Autauga County", "Barbour County", "Bullock County", "Butler County",
    "Chilton County", "Coosa County", "Covington County", "Crenshaw County",
    "Elmore County", "Lowndes County", "Macon County", "Montgomery County",
    "Pike County", "Tallapoosa County",
]

 

Among them are “Barbour”, “Bullock”, and “Macon” counties.

So I am not sure how useful the three-digit ZIP code ‘360’ will be for the TU MCH project: it mixes these three counties with eleven others, so a ZIP3-level analysis cannot isolate them.
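The overlap between the ‘360’ region and the counties of interest can be checked directly from the county list in this note. A small sketch; the variable names are hypothetical, and the share counts counties only, so a population-weighted share would be a better measure of how much of a ZIP3-level signal comes from Barbour, Bullock, and Macon.

```python
# Counties in the '360' ZIP3 region, as listed in this note
counties_360 = [
    "Autauga County", "Barbour County", "Bullock County", "Butler County",
    "Chilton County", "Coosa County", "Covington County", "Crenshaw County",
    "Elmore County", "Lowndes County", "Macon County", "Montgomery County",
    "Pike County", "Tallapoosa County",
]

# Counties of interest mentioned in the note
targets = {"Barbour County", "Bullock County", "Macon County"}

overlap = sorted(targets & set(counties_360))
share = len(overlap) / len(counties_360)
print(f"{len(overlap)} of {len(counties_360)} counties ({share:.0%}):", overlap)
# prints: 3 of 14 counties (21%): ['Barbour County', 'Bullock County', 'Macon County']
```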



pastebin

 https://pastebin.com/uBcFUXCA

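import os
import pandas as pd

# BigQuery dataset of the current All of Us CDR, set in the Workbench environment
# (equivalent to the notebook magic `%env WORKSPACE_CDR`)
dataset = os.environ["WORKSPACE_CDR"]

# Map each participant's state-of-residence concept to its name, e.g. 'PII State: ...'
query = """
SELECT
    p.person_id AS person_id,
    c.concept_name AS state_name
FROM `{dataset}.person` AS p
LEFT JOIN `{dataset}.concept` AS c
    ON p.state_of_residence_concept_id = c.concept_id
WHERE c.concept_name LIKE 'PII State: %'
"""

state_df = pd.read_gbq(query.format(dataset=dataset), dialect="standard")

# Strip the 'PII State: ' prefix so only the state remains
state_df["state_of_residence"] = state_df["state_name"].str.replace("PII State: ", "", regex=False)

print(state_df.head())
print(state_df.shape)

# homeless_respiratory_status: cohort DataFrame with a person_id column, built in an earlier cell
homeless_state_df = state_df[state_df["person_id"].isin(homeless_respiratory_status["person_id"])]
homeless_state_df["state_of_residence"].value_counts()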


All of Us Research Platform: data exploration for rural health research.

ZIP3: in the observation table.


In Workspaces > Beginner Intro to AoU Data and the Workbench > Analysis > ‘test-state_data_v060523’, I found examples for state, county, and ZIP3.
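When a data source has full five-digit ZIP codes, ZIP3 regions such as ‘360’ can be derived from them. A minimal pandas sketch, assuming a DataFrame with hypothetical columns `person_id` and `zip` filled with toy values; zero-padding comes first so that ZIPs stored as integers keep their leading zeros.

```python
import pandas as pd

# Hypothetical records; ZIPs may arrive as ints (leading zeros lost) or strings
df = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5],
    "zip": ["36083", 36089, "36117", "02139", 2138],
})

# Restore leading zeros, then keep the first three digits as the ZIP3
df["zip5"] = df["zip"].astype(str).str.zfill(5)
df["zip3"] = df["zip5"].str[:3]

counts = df["zip3"].value_counts().sort_index()
print(counts)
print("participants in ZIP3 360:", int((df["zip3"] == "360").sum()))
```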