Wednesday, July 2, 2025

National Survey of Family Growth (NSFG)

  National Survey of Family Growth (NSFG)

https://www.cdc.gov/nchs/nsfg/index.htm


TODO: request access to the restricted-use variables.


Saturday, June 28, 2025

Fall 2025 schedule

August 23 - December 12, 2025, Thursdays 6:00 pm - 8:40 pm.

Scheduled Meeting Times
Type: Scheduled In-Class Meetings
Time: 6:00 pm - 8:40 pm
Days: R (Thursday)
Where: ENGINEERING & COMP SCI BLDG 2120
Date Range: Aug 23, 2025 - Dec 12, 2025
Schedule Type: LECTURE

Tuesday, June 24, 2025

ZIP Code RUCA Approximation

 

https://depts.washington.edu/uwruca/ruca-approx.php?utm_source=chatgpt.com


All of Us survey, data codebooks

 All of Us survey, data codebooks

https://docs.google.com/spreadsheets/d/1pODkE2bFN-kmVtYp89rtrJg7oXck4Fsex58237x47mA/edit?usp=sharing


Friday, June 20, 2025

A model for the assembly map of bordism-invariant functors

 The paper "A model for the assembly map of bordism-invariant functors" by Levin, Nocera, and Saunier (2025) develops advanced categorical frameworks for algebraic topology, particularly through oplax colimits of stable/hermitian/Poincaré categories and bordism-invariant functors123. While not directly addressing machine learning (ML) or large language models (LLMs), its contributions could indirectly influence these fields through three key pathways:

1. Enhanced Categorical Frameworks for ML

The paper's formalization of oplax colimits and Poincaré-Verdier localizing invariants [1][3] provides new mathematical tools for structuring compositional systems. This could advance:

  • Model Architecture Design: By abstracting relationships between components (e.g., neural network layers) as bordism-invariant functors, enabling more rigorous analysis of model behavior under transformations [5].

  • Geometric Deep Learning: Topological invariants and assembly maps could refine methods for learning on non-Euclidean data (e.g., graphs, manifolds) by encoding persistence of features under deformations [5].

2. Invariance Learning and Equivalence

The bordism-invariance concept—where structures remain unchanged under continuous deformations—offers a mathematical foundation for invariance principles in ML:

  • Data Augmentation: Formalizing "bordism equivalence" could guide the design of augmentation strategies that preserve semantic content (e.g., image rotations as "topological bordisms") [5].

  • Robust Feature Extraction: Kernels of Verdier projections [1][3] might model noise subspaces to exclude during feature learning, improving adversarial robustness.

3. LLMs for Structured Reasoning

The paper’s explicit decomposition of complex functors (e.g., Shaneson splittings with twists [1][3]) parallels challenges in LLM-based reasoning:

  • Program Invariant Prediction: LLMs that infer program invariants [6] could adopt categorical decompositions to handle twisted or hierarchical constraints (e.g., loop invariants in code).

  • Categorical Data Embeddings: LLM-generated numerical representations of categorical data [4] might leverage bordism-invariance to ensure embeddings respect equivalence classes (e.g., "color" as a deformation-invariant attribute).

Limitations and Future Directions

The work is highly theoretical, with no direct ML/LLM applications in the paper. Bridging this gap requires:

  • Translating topological bordisms into data-augmentation pipelines.

  • Implementing Poincaré-Verdier invariants as regularization terms in loss functions.

  • Extending LLM-based invariant predictors [6] to handle categorical assembly maps.

While speculative, these connections highlight how advanced category theory could enrich ML’s theoretical foundations and LLMs’ reasoning capabilities.

  1. https://arxiv.org/abs/2506.05238
  2. https://arxiv.org/pdf/2506.05238.pdf
  3. https://www.arxiv.org/pdf/2506.05238.pdf
  4. https://pubmed.ncbi.nlm.nih.gov/39348252/
  5. https://www.aimodels.fyi/papers/arxiv/category-theoretical-topos-theoretical-frameworks-machine-learning
  6. https://openreview.net/pdf?id=mXv2aVqUGG
  7. https://x.com/CTpreprintBot
  8. https://keik.org/profile/mathat-bot.bsky.social
  9. https://www.alphaxiv.org/abs/2506.05238
  10. https://publications.mfo.de/bitstream/handle/mfo/4263/OWR_2024_47.pdf?sequence=1&isAllowed=y
  11. https://x.com/CTpreprintBot/status/1930943445977518380
  12. https://www.themoonlight.io/en/review/a-model-for-the-assembly-map-of-bordism-invariant-functors
  13. https://library.slmath.org/books/Book69/files/wholebook.pdf
  14. https://www.reed.edu/math-stats/thesis.html
  15. https://math.mit.edu/events/talbot/2020/syllabus2020.pdf
  16. https://webhomes.maths.ed.ac.uk/~v1ranick/papers/quinnass.pdf
  17. https://msp.org/agt/2009/9-4/agt-v9-n4-p16-s.pdf
  18. https://webhomes.maths.ed.ac.uk/~v1ranick/papers/owsem.pdf

Friday, June 6, 2025

Mendeley preprint error

In Mendeley, if an article has an unspecified type, it often lists it as "preprint".

To fix this, just change the document type to 'journal article', 'conference proceedings', or another appropriate type.

Thursday, May 29, 2025

Beyond Attention: Toward Machines with Intrinsic Higher Mental States

 

Beyond Attention: Toward Machines with Intrinsic Higher Mental States

https://techxplore.com/news/2025-05-architecture-emulates-higher-human-mental.html#google_vignette


Wednesday, May 28, 2025

USDA Rural-Urban Commuting Area Codes

 USDA

Rural-Urban Commuting Area Codes

https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes?utm_source=chatgpt.com


Three-digit ZIP code '360' in Alabama

Based on our discussion, I have looked further into the ‘360’ ZIP code region and found that it contains the 14 counties below:

 

counties = [
    "Autauga County", "Barbour County", "Bullock County", "Butler County",
    "Chilton County", "Coosa County", "Covington County", "Crenshaw County",
    "Elmore County", "Lowndes County", "Macon County", "Montgomery County",
    "Pike County", "Tallapoosa County"
]

 

Among them are “Barbour”, “Bullock”, and “Macon” counties (a quick overlap check is sketched below).
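A minimal Python sketch of that overlap check; the counties_of_interest list is my own assumption for illustration, using the three counties named above:

counties = [
    "Autauga County", "Barbour County", "Bullock County", "Butler County",
    "Chilton County", "Coosa County", "Covington County", "Crenshaw County",
    "Elmore County", "Lowndes County", "Macon County", "Montgomery County",
    "Pike County", "Tallapoosa County"
]

# hypothetical counties of interest for the project (assumed, for illustration only)
counties_of_interest = ["Barbour County", "Bullock County", "Macon County"]

# which counties of interest fall inside the ZIP3 '360' region?
overlap = [c for c in counties_of_interest if c in counties]
print(overlap)   # expected: all three counties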

 

So I am not sure how relevant the three-digit ZIP code ‘360’ might be to the TU MCH project.



Pastebin

 https://pastebin.com/uBcFUXCA

import pandas as pd

# BigQuery dataset name for the current All of Us CDR workspace (Jupyter env magic)
dataset = %env WORKSPACE_CDR

# pull each participant's state of residence by joining person to the concept table
query = """
SELECT
    p.person_id AS person_id,
    c.concept_name AS state_name
FROM `{dataset}.person` AS p
LEFT JOIN `{dataset}.concept` AS c
    ON p.state_of_residence_concept_id = c.concept_id
WHERE c.concept_name LIKE 'PII State: %'
"""

state_df = pd.read_gbq(query.format(dataset=dataset))

# strip the 'PII State: ' prefix so only the state name remains
state_df['state_of_residence'] = state_df['state_name'].str.replace('PII State: ', '')

state_df.head()
state_df.shape

# restrict to the homeless/respiratory cohort built earlier in the notebook
homeless_state_df = state_df[state_df['person_id'].isin(homeless_respiratory_status['person_id'])]
homeless_state_df['state_of_residence'].value_counts()


All of Us research platform, data exploration for rural health research.

 All of Us research platform, data exploration for rural health research.

ZIP3, observation table


Workspaces > Beginner Intro to AoU Data and the Workbench > Analysis >

In ‘test-state_data_v060523’, I found sample code for state, county, and ZIP3.

 

 

Monday, May 26, 2025

ODU CS and DSC courses taught by Hong Qin

 

https://catalog.odu.edu/courses/cs/#graduatecoursestext

https://catalog.odu.edu/courses/dasc/


CS 781  AI for Health Sciences  (3 Credit Hours)  

This course explores the application of AI in health sciences, focusing on machine learning, NLP, computer vision, generative AI techniques for diagnostics, treatment planning, patient monitoring, and biomedical research. It covers precision medicine, ethical AI, and the integration of AI into practice. Students will gain a deep understanding and practical skills to develop innovative AI solutions that address real-world challenges in health sciences.

Prerequisites: Prior programming experience  
CS 782  Generative AI  (3 Credit Hours)  

This course provides a deep dive into the foundations and current advancements in generative AI. It covers key concepts such as transformer models, GANs, VAEs, LLMs, and their applications across various fields, emphasizing both theory and hands-on learning, including ethical considerations such as fairness and bias mitigation. Students will develop a comprehensive understanding of generative AI and gain practical experience.

Prerequisites: Prior programming experience  

CS 881  AI for Health Sciences  (3 Credit Hours)  

This course explores the application of AI in health sciences, focusing on machine learning, NLP, computer vision, generative AI techniques for diagnostics, treatment planning, patient monitoring, and biomedical research. It covers precision medicine, ethical AI, and the integration of AI into practice. Students will gain a deep understanding and practical skills to develop innovative AI solutions that address real-world challenges in health sciences.

Prerequisites: Prior programming experience  
CS 882  Generative AI  (3 Credit Hours)  

This course provides a deep dive into the foundations and current advancements in generative AI. It covers key concepts such as transformer models, GANs, VAEs, LLMs, and their applications across various fields, emphasizing both theory and hands-on learning, including ethical considerations such as fairness and bias mitigation. Students will develop a comprehensive understanding of generative AI and gain practical experience.

Prerequisites: Prior programming experience  

DASC 781  AI for Health Sciences  (3 Credit Hours)  

This course explores the application of AI in health sciences, focusing on machine learning, NLP, computer vision, generative AI techniques for diagnostics, treatment planning, patient monitoring, and biomedical research. It covers precision medicine, ethical AI, and the integration of AI into practice. Students will gain a deep understanding and practical skills to develop innovative AI solutions that address real-world challenges in health sciences.

Prerequisites: Prior programming experience  
DASC 782  Generative AI  (3 Credit Hours)  

This course provides a deep dive into the foundations and current advancements in generative AI. It covers key concepts such as transformer models, GANs, VAEs, LLMs, and their applications across various fields, emphasizing both theory and hands-on learning, including ethical considerations such as fairness and bias mitigation. Students will develop a comprehensive understanding of generative AI and gain practical experience.

Prerequisites: Prior programming experience  

DASC 881  AI for Health Sciences  (3 Credit Hours)  

This course explores the application of AI in health sciences, focusing on machine learning, NLP, computer vision, generative AI techniques for diagnostics, treatment planning, patient monitoring, and biomedical research. It covers precision medicine, ethical AI, and the integration of AI into practice. Students will gain a deep understanding and practical skills to develop innovative AI solutions that address real-world challenges in health sciences.

Prerequisites: Prior programming experience  
DASC 882  Generative AI  (3 Credit Hours)  

This course provides a deep dive into the foundations and current advancements in generative AI. It covers key concepts such as transformer models, GANs, VAEs, LLMs, and their applications across various fields, emphasizing both theory and hands-on learning, including ethical considerations such as fairness and bias mitigation. Students will develop a comprehensive understanding of generative AI and gain practical experience.

Prerequisites: Prior programming experience  


Saturday, May 24, 2025

logit, sigmoid function, and probability

A step-by-step walk-through of how the sigmoid function turns a logit into a probability, with both math and plain English at each step.


✅ Step 1: What is a logit?

The logit of a probability pp is defined as:

\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)

This is called the log-odds, because it’s the natural log of the odds.


✅ Step 2: What if we are given the logit and want to get the probability?

Let’s say we are given a number x, and we know it’s a logit.

That means:

x = \log\left(\frac{p}{1 - p}\right)

We want to solve for p, the probability.


✅ Step 3: Exponentiate both sides

To get rid of the logarithm, we exponentiate both sides:

e^x = \frac{p}{1 - p}

This tells us the odds in terms of x.


✅ Step 4: Solve for p

Now we solve this equation for p.

Start with:

e^x = \frac{p}{1 - p}

Multiply both sides by 1 - p:

e^x (1 - p) = p

Distribute the left-hand side:

e^x - e^x p = p

Now get all the p's on one side:

e^x = p + e^x p

Factor out p on the right-hand side:

e^x = p(1 + e^x)

Now divide both sides by 1 + e^x:

p = \frac{e^x}{1 + e^x}

✅ Step 5: Simplify into the sigmoid form

We now simplify that expression:

p = \frac{e^x}{1 + e^x} = \frac{1}{\frac{1 + e^x}{e^x}} = \frac{1}{1 + e^{-x}}

That’s the sigmoid function:

p = \sigma(x) = \frac{1}{1 + e^{-x}}

✅ Final Result

So if you’re given a logit value x, the sigmoid function:

\sigma(x) = \frac{1}{1 + e^{-x}}

turns it into a probability p \in (0, 1).
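As a quick numerical sanity check (not part of the original derivation), here is a minimal Python sketch of the logit/sigmoid round trip:

import math

def logit(p):
    # log-odds of a probability p in (0, 1)
    return math.log(p / (1 - p))

def sigmoid(x):
    # inverse of the logit: maps any real x back into (0, 1)
    return 1 / (1 + math.exp(-x))

p = 0.8
x = logit(p)            # about 1.386
print(x, sigmoid(x))    # sigmoid(x) recovers 0.8 (up to floating-point rounding)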


🔁 Why this is useful

  • In logistic regression, we first compute a linear score (the logit):

    z = w^T x + b
  • Then use the sigmoid to convert that score into a probability of class 1:

    p = \sigma(z) = \frac{1}{1 + e^{-z}}

This way, the model’s output is interpretable as a probability — suitable for binary classification.
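As a small illustration of that last point, here is a sketch of the logistic-regression scoring step; the weights, bias, and feature values below are made up for the example:

import math

w = [0.5, -1.2, 2.0]   # made-up weights, for illustration only
b = 0.1                # made-up bias
x = [1.0, 0.3, 0.7]    # made-up feature vector

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear score (the logit)
p = 1 / (1 + math.exp(-z))                     # sigmoid turns the logit into a probability

print(z, p)   # p is the predicted probability of class 1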
