due March 17, 2023
https://grants.nih.gov/grants/guide/notice-files/NOT-AG-22-040.html
https://grants.nih.gov/grants/guide/pa-files/PAS-19-391.html
https://grants.nih.gov/grants/guide/pa-files/PAS-19-393.html
This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
https://towardsdatascience.com/rip-correlation-introducing-the-predictive-power-score-3d90808b9598
list of undergraduate research journals
https://www.cur.org/engage/undergraduate/journals/listing/
https://undergraduateresearch.virginia.edu/present-and-publish/undergraduate-symposia
Advanced Journal of Graduate Research
Advanced Journal of Graduate Research (ISSN:2456-7108) is a refereed journal dedicated to publishing research work carried out by Bachelor/Master Degree students under the supervision of a faculty member. Normally research work carried out as a part of the undergraduate course or graduate course in the form of final year thesis (course project) will be considered in this specific graduate journal. Any mentored student may submit articles related to all area of Science and Technology including Life Science, Computer Science, Mathematics, Environmental Science, Earth Science, Agriculture Science, Medical Science, Chemical Science, Physical Science. This journal accepts original research article, review article and survey article. Normal publication is free in this journal with open access availability of published article.
Erik Brynjolfsson, promise and peril of human-like AI, Turing trap
Swati Gupta: AI4OPT
Beth Plale: AI accountability
Aaron Smith, ethics of AI in agriculture
Wendell Wallach
AI in the wild,
Kris Hauser, AI FARMS, open-world AI
AI-EDGE, Kaushik Chowdhury,
Machine learning for inverse problems, Alex Dimakis
AI for weather and coastal forecasting, Philippe Tissot,
Kathleen Fisher, an analytical framework for AI
Jeff Krolik
AI for social good, Michael Littman
Jim Donlon, NSF PO, Steven Thompson, AIVO/SAIL
Steve Brown, https://youtu.be/1Re9DX7cFRI
t-SNE dimension reduction
https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1
Electron Microscopy Data Bank (EMDB)
https://www.ebi.ac.uk/emdb/
New Investigators to Promote Workforce Diversity in Genomics, Bioinformatics, or Bioengineering and Biomedical Imaging Research (R01 Clinical Trial Optional)
https://grants.nih.gov/grants/guide/rfa-files/RFA-HG-21-041.html#_Section_III._Eligibility
How to do a research presentation
Qin AI 101 slides
CoLab tutorial on DNA.
go over student projects, poll about which project to discuss.
go over an example research paper
https://www.nature.com/articles/s42256-022-00536-x
SFS
deep learning prediction
final project
final presentation
State Information
I am sending you the data dictionary of the PULSE data. Once you open the file, you should be able to find a variable, named "EST_ST." As you can see, this variable indicates which state each of the respondents was from.
one-hot encoding
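A minimal sketch of one-hot encoding a state variable such as EST_ST, in plain Python. The sample values below are made up for illustration (they are assumed to be state codes stored as strings) and are not taken from the PULSE file; the function name `one_hot` is also just an illustration.

```python
def one_hot(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is "hot" per respondent
        rows.append(row)
    return categories, rows

# Hypothetical EST_ST values, one per respondent (assumption, not real data).
est_st = ["47", "13", "47", "01"]
categories, encoded = one_hot(est_st)
for state, row in zip(est_st, encoded):
    print(state, row)
```

With pandas, `pandas.get_dummies` does the same job on a DataFrame column.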
project presentations.
Ruth L. Kirschstein National Research Service Award (NRSA) Individual Predoctoral Fellowship to Promote Diversity in Health-Related Research (Parent F31-Diversity)
https://grants.nih.gov/grants/guide/pa-files/PA-21-052.html#_Section_II._Award
https://www.nia.nih.gov/research/training/f31-individual-fellowships-phd-students
NIA supports three F31 awards, one open to all PhD Students conducting aging research (PA-21-051), one open to students from underrepresented backgrounds conducting aging research (PA-21-052), and one specifically for students from underrepresented backgrounds conducting research on Alzheimer’s Disease and related dementias (PAR-21-218). All F31 recipients must be U.S. citizens or permanent residents at the time of award. View other NIA fellowships available to graduate students.
AD/ADRD
https://grants.nih.gov/grants/guide/pa-files/PAR-21-218.html
independence
separation
sufficiency
group fairness definitions, confusion matrix
Bias mitigation methods: preprocessing; reweighting; inprocessing; adversarial debiasing; postprocessing; reject option based classification
https://en.wikipedia.org/wiki/Fairness_(machine_learning)
https://en.wikipedia.org/wiki/Algorithmic_bias
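A small sketch of how the three group fairness criteria above can be read off per-group confusion matrices. The mapping used here is the common one: independence compares selection rates, separation compares TPR and FPR, and sufficiency compares PPV (NPV is left out for brevity). The labels, predictions, and group names are toy values made up for illustration.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group confusion-matrix rates for binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fn"]        # actual positives
        neg = c["fp"] + c["tn"]        # actual negatives
        pred_pos = c["tp"] + c["fp"]   # predicted positives
        rates[g] = {
            "selection_rate": pred_pos / n,                     # independence
            "tpr": c["tp"] / pos if pos else None,              # separation
            "fpr": c["fp"] / neg if neg else None,              # separation
            "ppv": c["tp"] / pred_pos if pred_pos else None,    # sufficiency
        }
    return rates

# Toy data (assumption): two groups of four samples each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
rates = group_rates(y_true, y_pred, groups)
for g, r in rates.items():
    print(g, r)
```

Large gaps between groups in any of these rates point to a violation of the corresponding criterion; in general the three cannot all hold at once unless base rates are equal or the classifier is perfect.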
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
How to identify mutations or variations?
https://www.ebi.ac.uk/Tools/msa/clustalo/
Use AliView to explain the FASTA alignment file (gaps)
https://github.com/hongqin/Python-CoLab-bootcamp/blob/master/align2snv.ipynb
upload 0613 files to CoLab. This is too slow. So, I switched to run jupyter-notebook on my laptop
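A rough sketch of the idea behind calling SNVs from an aligned FASTA file: walk the alignment column by column, skip gap columns, and report mismatches in reference coordinates. This is not the code in align2snv.ipynb; the function names and the toy alignment are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; gaps ('-') are kept as-is."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}

def call_snvs(ref_aln, query_aln):
    """Return [(ref_pos, ref_base, query_base)], 1-based ungapped reference positions."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    snvs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1          # only reference bases advance the coordinate
        if r == "-" or q == "-":
            continue              # insertion or deletion, not an SNV
        if r != q and r in "ACGT" and q in "ACGT":
            snvs.append((ref_pos, r, q))
    return snvs

# Toy alignment (assumption): one insertion and one substitution in sample1.
aln_text = """>ref
ATG-CCGTA
>sample1
ATGACCATA
"""
aln = parse_fasta(aln_text)
snvs = call_snvs(aln["ref"], aln["sample1"])
print(snvs)
```

Ambiguous bases such as N are ignored here, which is one possible design choice for noisy GISAID sequences.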
* UTC Course Evaluation
* student presentation.
https://www.utc.edu/enrollment-management-and-student-affairs/registrar/individual-studies-contracts
https://www.nature.com/articles/s43588-020-00009-4
zoom recording, live transcript
two student presentations.
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
https://github.com/hongqin/Python-CoLab-bootcamp/blob/master/notebooks/Chapter_9-v2_Introduction_to_Biopython.ipynb
D614G mutation annotation
TODO:
FASTA format (GISAID)
HQ thanks the USA NSF award 1761839 and #2200138, a catalyst award from the USA National Academy of Medicine, and the support from the Office of Vice Chancellor of Research at the University of Tennessee at Chattanooga.
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
https://github.com/hongqin/Python-CoLab-bootcamp/blob/master/notebooks/Chapter_9-v2_Introduction_to_Biopython.ipynb
retrieve a CDS? spike gene example.
TODO:
D614G mutation annotation
negative sense
FASTA sequence
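For the "retrieve a CDS" and "negative sense" items, a minimal sketch of pulling a feature out of a genome string by 1-based inclusive coordinates (the GFF/GenBank convention), with reverse complement for minus-strand features. SARS-CoV-2 is a positive-sense RNA virus, so its genes would use strand "+"; the minus branch is only there to illustrate the other case. The toy genome and function names are made up.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_feature(genome, start, end, strand="+"):
    """Return genome[start..end] (1-based, inclusive); reverse-complement if strand is '-'."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("coordinates out of range")
    sub = genome[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub

toy_genome = "CCATGAAATAGGG"  # hypothetical sequence, not a real genome
plus_cds = extract_feature(toy_genome, 3, 11, "+")
minus_feature = extract_feature(toy_genome, 1, 5, "-")
print(plus_cds, minus_feature)
```

Biopython offers the same operation through `SeqFeature.extract`, which is what the Chapter 9 notebook would more likely use.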
CCF journal and conference ranking
https://www.ccf.org.cn/Academic_Evaluation/By_category/
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
review of bioinformatic basic concepts
+Basic biology concept and data,
Central Dogma: DNA (ATCG), RNA (AUCG), proteins (20 AA). Translation. Transcription. Replication. Genetic code table.
Genes. Genomes.
Genome sequences. Expression levels.
+biopython modules
genbank file format; fasta file format
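A compact sketch of transcription and translation with the standard genetic code, in plain Python so it runs without Biopython. The codon table is built from the usual TCAG ordering; the example sequence is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")

def translate(dna, to_stop=True):
    """Translate coding-strand DNA in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

toy_cds = "ATGGATTAA"  # hypothetical: Met, Asp, stop
print(transcribe(toy_cds), translate(toy_cds))
```

In Biopython, `Seq.transcribe()` and `Seq.translate()` cover the same ground.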
Breakout on final projects.
TODO:
retrieve a CDS?
D614G mutation annotation
https://github.com/QinLab/aln2snv-2020format/blob/a670b48b00dffac244dc6a63bcd12cf6439c73fc/annotate_gwas_with_GFF.ipynb
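A sketch of the coordinate arithmetic behind annotating D614G: map an amino-acid position in a CDS to the genome positions of its codon. The spike CDS start of 21563 is an assumption based on the Wuhan-Hu-1 reference annotation and should be checked against the GFF actually used; the function names are made up, and this is not the code in the linked notebook.

```python
import re

# Assumption: 1-based start of the spike (S) CDS in Wuhan-Hu-1; verify against the GFF.
SPIKE_CDS_START = 21563

def parse_aa_mutation(label):
    """Split a label such as 'D614G' into (ref_aa, position, alt_aa)."""
    m = re.fullmatch(r"([A-Z*])(\d+)([A-Z*])", label)
    if not m:
        raise ValueError(f"not an amino-acid substitution: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)

def codon_genome_range(aa_pos, cds_start):
    """Genome coordinates (1-based, inclusive) of the codon for residue aa_pos."""
    first = cds_start + 3 * (aa_pos - 1)
    return first, first + 2

ref_aa, aa_pos, alt_aa = parse_aa_mutation("D614G")
codon_start, codon_end = codon_genome_range(aa_pos, SPIKE_CDS_START)
print(ref_aa, aa_pos, alt_aa, codon_start, codon_end)
```

Under that assumption the codon spans 23402 to 23404, which is consistent with the commonly reported A23403G nucleotide change for D614G.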
https://www.fluke.com/en-us/learn/blog/power-quality/single-phase-vs-three-phase-power
In electricity, the phase refers to the distribution of a load. What is the difference between single-phase and three-phase power supplies? Single-phase power is a two-wire alternating current (ac) power circuit. Typically, there is one power wire—the phase wire—and one neutral wire, with current flowing between the power wire (through the load) and the neutral wire. Three-phase power is a three-wire ac power circuit with each phase ac signal 120 electrical degrees apart.
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
+Basic biology concept and data,
Central Dogma: DNA (ATCG), RNA (AUCG), proteins (20 AA). Translation. Transcription. Replication. Genetic code table.
Genes. Genomes.
Genome sequences. Expression levels.
+biopython modules
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concepts or skills that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
Pandas on covid
GoogleDrive link in CoLab
final project milestones
how to duplicate code workbook
https://unite.nih.gov/workspace/slate/documents/test-vector-duplication
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concept or skill that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in:
midterm-review (volunteer to share their videos? )
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concept or skill that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
PETs prize challenge
Python lec 3. Numpy (numpy backend is C).
Go over CoLab and GoogleDrive demo
Midterm exam peer-review
Data overview
Code skeleton
https://github.com/drivendataorg/pets-prize-challenge-runtime
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concept or skill that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Conda environment.
Python lec 2. Data types.
MPC and data privacy
note:
threshold cryptography is often superior to blockchain in many situations, because blockchains are often used as a way of encryption?!
https://en.wikipedia.org/wiki/Threshold_cryptosystem
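To make the threshold idea concrete, here is a sketch of Shamir secret sharing, a common building block of threshold cryptosystems: any `threshold` shares recover the secret, while fewer shares reveal nothing about it. This is a teaching toy under my own assumptions (field size, function names), not something taken from the talks listed here.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo this

def make_shares(secret, threshold, n_shares):
    """Split `secret` into n_shares points on a random degree-(threshold-1) polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):     # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 (needs Python 3.8+ for modular inverse via pow)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, threshold=3, n_shares=5)
print(recover_secret(shares[:3]), recover_secret(shares[2:]))
```

Any three of the five shares give back the same secret, which is the "k of n" property that threshold schemes rely on.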
+ joshua baron, DARPA
+ kurt nielsen, partisia blockchain
+ dan boneh, Stanford, https://crypto.stanford.edu/~dabo/pubs/pubs.html
+ mariana raykova, google/columbia
zero knowledge authentication, distributed prover
https://zenodo.org/record/4743249#.Yzx0pOzMLzf
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what do you think are the most important concept or skill that we have discussed today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code:
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Conda environment.
Show the hidden file .RData. A student showed that RStudio can disable loading of the workspace.
Python 1a. Quick start, Jupyter-notebook,
Adobe image highlights are incompatible with Apple Preview (they show as a yellow area that blocks the original view).
An Adobe filled box works.
NCBI uses wuhan-hu-1
GISAID uses wuhan-hu-4.
The two sequences only differ in the polyA tails. Qin verified this using MEGA alignment.
So, wuhan-hu-1 and wuhan-hu-4 share the same gene coordinates, same GFF
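A quick way to script this kind of check (Qin did it with a MEGA alignment) is to strip trailing A's from both sequences and compare what remains. The sequences below are short made-up stand-ins, not the real genomes, and the function names are illustrative.

```python
def strip_poly_a(seq):
    """Remove the trailing run of A's (the poly-A tail) from a sequence."""
    return seq.upper().rstrip("A")

def same_except_poly_a(seq1, seq2):
    """True if the two sequences are identical once trailing A's are removed."""
    return strip_poly_a(seq1) == strip_poly_a(seq2)

# Toy stand-ins (assumption): same body, different tail lengths.
hu1_like = "ATGCCGTAAAAAA"
hu4_like = "ATGCCGTAAA"
print(same_except_poly_a(hu1_like, hu4_like))
```

This only works when the sequences start at the same position; a difference at the 5' end would need a real alignment.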
mentorship versus advisement
mentorship means the person as a whole.
advisement is just the academic advisement.
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what topics or question should we discuss today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: not today
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
go over final topics in the shared spread sheets
set up breakout room for students to discuss 2022 midterm project 1 and 2.
midterm project
final project
machine learning model attribution challenge
https://mlmac.io/#submission
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q what topics or question should we discuss today?
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: not today
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Go over screencast effort in rubric.
set up breakout room for students to discuss 2022 midterm project 1 and 2.
breakout room discussion on final project topic and coding problems. When breakout rooms are assigned randomly, some rooms end up with all quiet personalities, and it becomes really awkward. When people are free to join any room on their own, a third of the class joined the first room.
4 breakout room : 18:17 -->
TODO: Worksheet on final project topic. what references (background on importance)
https://data.cdc.gov/NCHS/Provisional-COVID-19-Deaths-by-Sex-and-Age/9bhg-hcku
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q: what concept and skills do you think are useful today? do you find breakout room discussion helpful? questions on other videos.
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: not today
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Socrative
set up breakout room for students to discuss final project topics
breakout room discussion on final project topic and coding problems. When breakout rooms are assigned randomly, some rooms end up with all quiet personalities, and it becomes really awkward. When people are free to join any room on their own, a third of the class joined the first room.
TODO: Worksheet on final project topic. what references (background on importance)
state location
https://aousupporthelp.zendesk.com/hc/en-us/articles/4583333207956-Accessing-geolocation-data-
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q: what concept and skills do you think are useful today? do you find breakout room discussion helpful? questions on other videos.
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: not today
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Go over screencast effort in rubric.
set up breakout room for students to discuss 2022 midterm project 1 and 2.
breakout room discussion on final project topic and coding problems. When breakout rooms are assigned randomly, some rooms end up with all quiet personalities, and it becomes really awkward. When people are free to join any room on their own, a third of the class joined the first room.
TODO: Worksheet on final project topic. what references (background on importance)
https://ohdsi.github.io/CommonDataModel/cdm60.html#Clinical_Data_Tables
Clinical data tables
https://aousupporthelp.zendesk.com/hc/en-us/categories/5942702296468-Working-with-Data
user support hub: submit ticket for questions.
== pre-class to do:
add 2021 midterm project 2 videos. DONE
socrative questions (questions on contents from last lecture ): Q: what concept and skills do you think are useful today? Many picked normalization.
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: not today
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
set up breakout room for students to discuss midterm project 2.
midterm global regions partitions among students.
Flip classroom: assign previous lectures on input, output, loops, simple stats.
observation:
one student said that they feel nervous talking while being video recorded.
some breakout room discussions last longer than others.
students asked how much screencast should be expected. I revised rubric to make this part 10pt. (Need to discuss this next time)
== pre-class to do:
socrative questions (questions on contents from last lecture ): Q: what concept and skills do you think are useful today? Many picked normalization.
update Canvas course materials, update learning objectives. assignments as needed:
Test-run code: Rmd -> HTML report with content.
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Review Chapter 2
R-COVID19 Chapter 3. Google Mobility.
zoom, turn live caption on, record
== pre-class to do:
calendar email invitation: done
socrative questions (questions on contents from last lecture ):
update Canvas course materials, update learning objectives. assignments as needed: done
Test-run code: Rmd -> HTML report with content. done
* GitHub had a connection problem today, possibly due to Hurricane Ida?! So, we had to download the JHU data manually. *
== In-class to do:
clean up desktop space, calendars,
ZOOM, live transcript (start video recording).
Socrative sign in
Review Chapter 2
R-COVID19 Chapter 3. Google Mobility.
https://github.com/hrbrmstr/cdcfluview
defense against inference attack
https://arxiv.org/pdf/1806.01246.pdf
dropout in deep neural networks
model stacking: a major class of ensemble learning
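For the dropout note above, a minimal sketch of "inverted" dropout on a list of activations: during training each unit is zeroed with probability `p_drop` and the survivors are scaled up so the expected value is unchanged; at inference nothing is dropped. This is a plain-Python toy with made-up names, not how any particular framework implements it.

```python
import random

def dropout(activations, p_drop, training=True, rng=None):
    """Inverted dropout: zero each value with prob p_drop, scale the rest by 1/(1-p_drop)."""
    if not 0 <= p_drop < 1:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0:
        return list(activations)       # inference: pass through unchanged
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

train_out = dropout([1.0] * 1000, p_drop=0.5, rng=random.Random(0))
eval_out = dropout([1.0, 2.0], p_drop=0.5, training=False)
print(sum(train_out) / len(train_out), eval_out)
```

The training-time mean stays close to 1.0, which is the point of the rescaling.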
zoom, turn live caption on, record
The 1 min summary? All of Us data are made available in a Curated Data Repository (CDR). Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health data is given a unique identifier called a concept_id and organized into specific tables according to our Common Data Model. You can use these concept_ids to query the CDR and pull data on specific health topics relevant to your analysis. See the Researcher Workbench Support Hub section on Learning the Basics of the All of Us Dataset for more information.
What are concept sets?
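To illustrate querying by concept_id, here is a toy sketch with an in-memory SQLite table shaped like the OMOP `condition_occurrence` table. The real CDR is queried through the Researcher Workbench, not SQLite, and the concept_id values and rows below are placeholders I made up for illustration. Passing a list of concept_ids is meant only as a loose analogy to a concept set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER)"
)
# Placeholder rows (assumption): (person_id, condition_concept_id).
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    [(1, 201826), (2, 320128), (3, 201826), (3, 320128)],
)

def persons_with_concepts(conn, concept_ids):
    """Return sorted person_ids having at least one of the given concept_ids."""
    placeholders = ",".join("?" for _ in concept_ids)
    sql = (
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({placeholders}) ORDER BY person_id"
    )
    return [row[0] for row in conn.execute(sql, list(concept_ids))]

print(persons_with_concepts(conn, [201826]))
print(persons_with_concepts(conn, [201826, 320128]))
```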
https://en.wikipedia.org/wiki/Burrows%E2%80%93Abadi%E2%80%93Needham_logic#:~:text=Burrows%E2%80%93Abadi%E2%80%93Needham%20logic%20(,secured%20against%20eavesdropping%2C%20or%20both.
Reference: https://www.drivendata.org/competitions/98/nist-federated-learning-1/rules/
privacy-preserving federated learning (PPFL) solutions
democracy-affirming technologies.
the global federated model is trained, the parameters related to the local models could be used to learn about the sensitive information contained in the training data of each client. Similarly, the released global model could also be used to infer sensitive information about the training datasets used.
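One input-privacy idea for the local-parameter leak described above is secure aggregation with pairwise masks: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked updates but their sum is exact. The sketch below simulates all clients in one process with integer updates; in a real protocol each pair would derive its mask privately (for example through key agreement), and dropouts would need handling. Names and the modulus are my own assumptions. It does nothing about leakage from the released global model.

```python
import secrets

MODULUS = 2**32  # updates are treated as integers modulo this value

def mask_updates(updates):
    """Return masked copies of the client updates; pairwise masks cancel in the sum."""
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [secrets.randbelow(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + mask[k]) % MODULUS  # client i adds
                masked[j][k] = (masked[j][k] - mask[k]) % MODULUS  # client j subtracts
    return masked

def aggregate(masked):
    """Server side: sum the masked vectors coordinate-wise."""
    dim = len(masked[0])
    return [sum(v[k] for v in masked) % MODULUS for k in range(dim)]

# Toy integer "model updates" from three clients (assumption).
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = mask_updates(updates)
total = aggregate(masked)
print(total)
```

Real-valued model updates would first be quantized to integers before masking.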
1.4 GOALS AND OBJECTIVES:
Organizers seek to mature federated learning approaches and build trust in adoption by accelerating the development of efficient PPFL solutions that leverage a combination of input and output privacy techniques to:
Phase 1: Concept Paper. Blue Team Participants will produce a technical white paper (“Concept Paper” or “White Paper”) setting out their proposed solution approach. Technical papers will be evaluated by a panel of judges across a set of weighted criteria. Participants will be eligible to win prizes awarded to the top technical papers, ranked by points awarded.
As you propose your technical solutions, be prepared to clearly describe the technical approaches and sketch out proof of or justification for privacy guarantees. Participants should consider a broad range of privacy threats during the model training and model use phases and consider technical and process aspects including but not limited to cryptographic and non-cryptographic methods, and protection needed within the deployment environment.
Successful technical approaches and proofs of privacy guarantees will include the design of any algorithms, protocols, etc. utilized, as well as formal or informal arguments of how the solution will provide privacy guarantees. Participants will clearly list any additional privacy issues specific to the technological approaches used and justify initial enhancements or novelties compared to the current state-of-the-art. Participant submissions must describe how the solution will cater to the types of data provided to participants and how generalizable the solution is to multiple domains. Expected efficiency/scalability of improvements, privacy vs. utility trade off should be articulated, if possible, at this conceptual stage.
Q: what is the definition of privacy guarantee?
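One widely used formal answer, though the challenge rules may intend something broader, is epsilon-differential privacy: a randomized mechanism M is epsilon-DP if for any two datasets D and D' differing in one record and any output set S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. A minimal sketch of the Laplace mechanism for a counting query follows; the function name and data are made up, and naive floating-point noise like this is a teaching toy rather than a hardened implementation.

```python
import random

def private_count(values, predicate, epsilon, rng=None):
    """Noisy count satisfying epsilon-DP in the idealized (real-number) model."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon  # a count changes by at most 1 when one record changes
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

ages = [34, 71, 68, 25, 80]  # hypothetical records
rng = random.Random(42)
noisy = [private_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng) for _ in range(2000)]
print(sum(noisy) / len(noisy))
```

Smaller epsilon means more noise and a stronger guarantee, which is the privacy versus utility trade-off the rules mention.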
a one-page abstract and a Concept Paper.
Abstract: The one-page abstract must include a title and a brief description of the proposed solution, including the proposed privacy mechanisms and architecture of the federated model. The description should also describe the proposed machine learning model and expected results with regard to accuracy. Successful abstracts will outline how solutions will achieve privacy while minimizing loss to accuracy, a proposed solution, and the anticipated results, as more fully described on the Challenge Website. Abstracts must be submitted by following the instructions on the Challenge Website. Abstracts will be screened by the DrivenData and Organizers’ staff for contest eligibility and used to ensure the composition of the judging panel’s expertise aligns to proposed solutions that will be evaluated throughout the course of the Challenge. Feedback will not be provided.
Concept Paper: The Concept Paper should conceptualize solutions that describe the technical approaches and lay out the proof of privacy guarantees that solve a set of predictive or analytic tasks that support the use cases. Successful Concept Papers will incorporate the originally submitted abstract and be no more than ten pages in length. References will not count towards page length. Participant submissions shall:
I submitted the travel reimbursement request for the Philadelphia ICIBM meeting in CONCUR. When I uploaded the hotel receipt, there were errors for the allowance. I then created an Itinerary to add per diem, and the allowance error went away.
So, next time, I should try creating the Itinerary first, and then upload the hotel receipt.
zoom, turn live caption on, record
https://ai.facebook.com/blog/crypten-a-new-research-tool-for-secure-machine-learning-with-pytorch/
https://crypten.readthedocs.io/en/latest/
homomorphic encryption:
Enc(m1) + Enc(m2) = Enc( m1 + m2)
Enc(m1) x Enc(m2) = Enc( m1 x m2)
So, an untrusted entity can compute addition or multiplication without decryption.
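A toy sketch of both identities using two classic partially homomorphic schemes with tiny, insecure parameters: textbook RSA for multiplication and Paillier for addition. Note that in Paillier the ciphertext operation that produces plaintext addition is multiplication modulo n squared, so the "+" and "x" in the notes are best read as abstract operations on ciphertexts. The primes and function names are my own choices for illustration; fully homomorphic schemes support both operations at once.

```python
import math
import secrets

P, Q = 61, 53            # toy primes, far too small for real use
N = P * Q                # 3233
N2 = N * N

# Textbook RSA (multiplicatively homomorphic): Enc(m) = m^e mod n.
RSA_E, RSA_D = 17, 2753  # 17 * 2753 = 1 mod (P-1)(Q-1)

def rsa_enc(m):
    return pow(m, RSA_E, N)

def rsa_dec(c):
    return pow(c, RSA_D, N)

# Paillier with g = n + 1 (additively homomorphic).
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # needs Python 3.8+

def paillier_enc(m):
    while True:
        r = secrets.randbelow(N - 1) + 1           # random r in [1, n-1]
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def paillier_dec(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

# An untrusted party combines ciphertexts without ever decrypting them.
product_ct = (rsa_enc(7) * rsa_enc(3)) % N
sum_ct = (paillier_enc(20) * paillier_enc(22)) % N2
print(rsa_dec(product_ct), paillier_dec(sum_ct))
```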
https://en.wikipedia.org/wiki/Homomorphic_encryption
Fully homomorphic encryption (FHE)
From Wikipedia:
In 2016, Cheon, Kim, Kim and Song (CKKS)[35] proposed an approximate homomorphic encryption scheme that supports a special kind of fixed-point arithmetic that is commonly referred to as block floating point arithmetic. The CKKS scheme includes an efficient rescaling operation that scales down an encrypted message after a multiplication. For comparison, such rescaling requires bootstrapping in the BGV and BFV schemes. The rescaling operation makes CKKS scheme the most efficient method for evaluating polynomial approximations, and is the preferred approach for implementing privacy-preserving machine learning applications. The scheme introduces several approximation errors, both nondeterministic and deterministic, that require special handling in practice.[36]
A 2020 article by Baiyu Li and Daniele Micciancio discusses passive attacks against CKKS, suggesting that the standard IND-CPA definition may not be sufficient in scenarios where decryption results are shared.[37] The authors apply the attack to four modern homomorphic encryption libraries (HEAAN, SEAL, HElib and PALISADE) and report that it is possible to recover the secret key from decryption results in several parameter configurations. The authors also propose mitigation strategies for these attacks, and include a Responsible Disclosure in the paper suggesting that the homomorphic encryption libraries already implemented mitigations for the attacks before the article became publicly available. Further information on the mitigation strategies implemented in the homomorphic encryption libraries has also been published.[38][39]