Journal of Emerging Investigators (high school research publication)
https://careernavigator.gradeducation.hms.harvard.edu/journal-emerging-investigators-jei
Frontiers for Young Minds
https://kids.frontiersin.org/article/10.3389/frym.2020.566235
This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
guest login to check screen, audio
introduce myself, recording locally
anonymous survey, zoom,
github readme, overview, link to chat.
went overtime.
Colab needs a Gmail account.
Git with an SSH key only works with an SSH-style remote URL (git@...), not an https URL.
One way is to set the remote URL to the SSH form:
https://docs.github.com/en/github/using-git/changing-a-remotes-url
$ git remote set-url origin git@github.com:USERNAME/REPOSITORY.git
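The https and SSH forms of a remote URL differ only in their prefix and separator. Here is a minimal sketch of the conversion, assuming a GitHub-style https URL. `https_to_ssh` is a hypothetical helper written for this note, not part of git.

```python
import re


def https_to_ssh(url):
    """Turn https://HOST/USER/REPO.git into git@HOST:USER/REPO.git."""
    match = re.fullmatch(r"https://([^/]+)/(.+)", url)
    if match is None:
        raise ValueError(f"not an https remote URL: {url!r}")
    host, path = match.groups()
    # SSH-style remotes separate host and path with a colon.
    return f"git@{host}:{path}"


print(https_to_ssh("https://github.com/USERNAME/REPOSITORY.git"))
```

The printed string is what would be passed to `git remote set-url origin`.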
tutor service announcement
https://new.utc.edu/engineering-and-computer-science/center-for-student-success/student-services/peer-peer-tutoring
breakout rooms
Discussion topics on AGILE software design:
Breakout room group report.
An interesting/funny story about a time when software did not function as expected?
I then spent 5 minutes going over the AGILE slides.
The following content was not discussed.
Breakout room discussion on Git, GitHub
GitHub repo: create, readme, edit, demo
“Tell me about some of the most difficult problems you worked on and how you solved them.”
https://www.cnbc.com/2021/01/26/elon-musk-favorite-job-interview-question-to-ask-to-spot-a-liar-science-says-it-actually-works.html
http://projectbeak.org/
Reference:
https://physics.stackexchange.com/questions/53005/how-to-get-complex-exponential-form-of-wave-equation-out-of-sinusoidal-form
Graph convolutional networks (GCN) use the Laplacian matrix = degree matrix - adjacency matrix
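A minimal sketch of the Laplacian definition above, in plain Python lists. The 3-node graph is a made-up example and `graph_laplacian` is a hypothetical helper; a real GCN implementation would typically use a normalized Laplacian and a numerical library.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for an undirected graph given as an adjacency matrix."""
    n = len(adjacency)
    # The degree of node i is the sum of row i of the adjacency matrix.
    degree = [sum(row) for row in adjacency]
    return [
        [(degree[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Made-up example: node 0 is connected to nodes 1 and 2.
A = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
L = graph_laplacian(A)
for row in L:
    print(row)
```

Each row of the Laplacian sums to zero, which is a quick sanity check on the construction.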
https://towardsdatascience.com/graph-convolutional-networks-deep-99d7fee5706f
Advancing data curation, book (dissertation), submitted by Larysa Visengeriyeva
https://www.depositonce.tu-berlin.de/bitstream/11303/10811/4/visengeriyeva_larysa.pdf
https://www.cloudemulator.net/?fbclid=IwAR1HzQRjGqemAilVEMwnYGeDHJBra72H5qKwCfp1mcaZufZEPhavX-Wyo8s
COVID19 hospitalization data USA
https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility
https://healthdata.gov/sites/default/files/reported_hospital_capacity_admissions_facility_level_weekly_average_timeseries_20210117.csv
Hospital meta information:
https://healthdata.gov/dataset/covid-19-hospital-data-coverage-report
https://healthdata.gov/sites/default/files/20210119_Hospital%20Data%20Coverage%20Report.xlsx
Reference: https://sites.google.com/view/hamiltoncounty-tn-covid19/data?authuser=0
Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2599-6
In general, agreement among the tools in calling DE genes is not high. There is a trade-off between true-positive rates and the precision of calling DE genes. Methods with higher true positive rates tend to show low precision due to their introducing false positives, whereas methods with high precision show low true positive rates due to identifying few DE genes. We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data. Data multimodality and abundance of zero read counts are the main characteristics of scRNAseq data, which play important roles in the performance of differential gene expression analysis methods and need to be considered in terms of the development of new methods.
A few studies have compared differential expression analysis methods for scRNAseq data. Jaakkola et al. [40] compared five statistical analysis methods for scRNAseq data, three of which are for bulk RNAseq data analysis. Miao et al. [41] evaluated 14 differential expression analysis tools, three of which are newly developed for scRNAseq data and 11 of which are old methods for bulk RNAseq data. A recent comparison study [42] assessed six differential expression analysis tools, four of which were developed for scRNAseq and two of which were designed for bulk RNAseq. In this study, we consider all differential gene expression analysis tools that have been developed for scRNAseq data as of October 2018 (SCDE [21], MAST [29], scDD [39], D3E [33], Monocle2 [38], SINCERA [34], DEsingle [36], and SigEMD [37]). We also consider differential gene expression analysis tools that are designed for heterogeneous expression data (EMDomics [31]) and are commonly used for bulk RNAseq data (edgeR [4], DESeq2 [43]).
As of October 2018, we have identified eight software tools for differential expression analysis of scRNAseq data, which are designed specifically for such data [21, 29, 30, 33, 34, 36,37,38] (SCDE, MAST, scDD, D3E, Monocle2, SINCERA, DEsingle, and SigEMD).
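The trade-off between true-positive rate and precision described in the excerpt above can be made concrete with a small sketch. The counts below are made-up illustrative numbers, not results from the paper, and the function names are hypothetical.

```python
def true_positive_rate(tp, fn):
    """Fraction of truly DE genes that the method calls DE."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of called DE genes that are truly DE."""
    return tp / (tp + fp)


# Made-up scenario with 100 truly DE genes.
# A liberal method calls many genes: high TPR, but many false positives.
liberal = {"tp": 90, "fp": 60, "fn": 10}
# A conservative method calls few genes: high precision, but low TPR.
conservative = {"tp": 40, "fp": 2, "fn": 60}

for name, c in [("liberal", liberal), ("conservative", conservative)]:
    tpr = true_positive_rate(c["tp"], c["fn"])
    prec = precision(c["tp"], c["fp"])
    print(f"{name}: TPR={tpr:.2f}, precision={prec:.2f}")
```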
Greetings,
The 2021 ReSEARCH Dialogues Conference is going virtual!
ReSEARCH Dialogues is an annual, campus-wide academic conference celebrating research and creative activities happening on campus and in the Chattanooga community.
Presenters represent nearly all UTC disciplines, centers, programs and include:
RD 2021 will be held during the first-ever UTC Research and Creative Activities Week. The virtual conference format will feature online presentations available on demand, daily, live webcasts including panels and talks, and live Q&A sessions with conference presenters. Learn more about UTC Research and Creative Activities Week and submit an event HERE.
RD 2021 Conference Details
WHEN: Monday, April 12 – Thursday, April 15, 2021.
WHERE: Virtual via the Symposium by ForagerOne online conference platform. Live webcasts will be scheduled each day of the conference. 
Ready to Register? Visit the RD website.
Other Ways to Participate in RD 2021
Questions? Contact the RD 2021 Team.
Sincerely,
RD 2021 Team
https://api.covidtracking.com/v1/states/current.json
https://hr.tennessee.edu/pay/compensation-project/
https://hr.tennessee.edu/job-families/research/researcher/
https://hr.tennessee.edu/pay/market-ranges/
Researcher 2, MR09, mid-point salary $59.3K
Typically requires an advanced degree in a relevant field and two years of relevant experience, or an equivalent combination of education, training, and experience.
Market Range: MR09
start Zoom recording (I forgot to record this)
Socrative questions
=> go over self-video presentation again
=>Anaconda: local installation
=>Jupyter notebook: code blocks, markdown blocks, run, kernel, "!", "%"
=>Google Colab: free cloud, code blocks, text blocks, table of contents, run code, file uploading, linking Google Drive, download ipynb files
=>Google Cloud Platform (needs a subscription)
=>Breakout room discussion:
Anaconda installation, Jupyter notebook, Colab

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0162333
slides, introduce myself
Socrative, Room HongQin
syllabus
video requirement
video submission with hyper-link. Examples of past student submissions.
sample student videos
Colab; COVID19 data analysis
https://github.com/hongqin/python-covid19-analysis-sandbox/blob/master/PD_demo_jhu_covid19.ipynb
why python
x Email list to calendar invitation
import sys
# Local Anaconda site-packages path; adjust for your own installation.
sys.path.append("/opt/anaconda3/lib/python3.7/site-packages")
import requests
from bs4 import BeautifulSoup as soup
from urllib import parse
import os
import re
import pandas as pd
# Form responses exported to Excel; one column holds the links to the uploaded reports.
tb = pd.read_excel('Submission of Report _ Lesson plan, online-R-coding bootcamp Dec 2020 (Responses).xlsx')
doc_urls = tb['Please upload your report (for students) or lesson plans (for teachers) in PDF format']
type(doc_urls)
doc_urls[1]
# Try to download the second submission and save it locally.
content = requests.get(doc_urls[1])
with open("test.pdf", "wb") as pdf:
    pdf.write(content.content)
dir(content)
# The generated PDF cannot be opened: the downloaded file is an HTML page, not the PDF itself.
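One way to catch this problem early is to check the downloaded bytes before saving them. This is a minimal sketch, assuming the response body is already in memory as bytes (for example, the `content.content` value above). `looks_like_pdf` and `save_if_pdf` are hypothetical helpers, and the check relies only on the fact that PDF files start with the `%PDF-` signature.

```python
def looks_like_pdf(data):
    """Return True if the bytes start with the PDF file signature."""
    return data.startswith(b"%PDF-")


def save_if_pdf(data, path):
    """Write data to path only when it looks like a PDF; report success."""
    if not looks_like_pdf(data):
        return False
    with open(path, "wb") as handle:
        handle.write(data)
    return True


# An HTML page (such as a login or viewer page) fails the check.
print(looks_like_pdf(b"<!DOCTYPE html><html>...</html>"))
print(looks_like_pdf(b"%PDF-1.4\n"))
```

A `False` result for a submission link suggests the URL points to a web page rather than the raw file, so the link needs to be converted to a direct-download form first.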
https://stemangiola.github.io/bioc_2020_tidytranscriptomics/articles/tidytranscriptomics.html
https://www.newser.com/story/301012/identical-twins-not-as-identical-as-we-thought.html
"On average, identical twins have 5.2 of these early genetic differences, the researchers found. But about 15% of identical twin pairs have more genetic differences, some of them up to 100, said Stefansson"
Comparison of Multiple Displacement Amplification (MDA) and Multiple Annealing and Looping-Based Amplification Cycles (MALBAC) in Single-Cell Sequencing
Using sequencing data from single sperms, we quantitatively compare two prevailing amplification methods that extensively applied in single-cell sequencing, multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Our results show that MALBAC, as a combination of modified MDA and tweaked PCR, has a higher level of uniformity, specificity and reproducibility.
This paper argues that MDA has serious amplification bias.
Khan, 2012, MSB, allele specific protein expression in a diploid yeast hybrid by LC-MS
https://www.embopress.org/doi/epdf/10.1038/msb.2012.34
This paper used LC-MS to study proteins from the yeast hybrid S. cerevisiae x S. bayanus. Likely, polymorphism within the same species causes too few changes at the protein level.
https://www.embopress.org/doi/epdf/10.1038/msb.2009.31
genome wide allele and strand-specific expression in yeast. Gagneur 2009. MSB.
Tiling array. 371 (13%) of the transcripts have a >1.5-fold difference. So the total number of transcripts is about 2854 in this paper (371 / 0.13).
In Dang's CR and NR data sets, there are over 3K transcripts.
DNA: 0.034 pg/diploid cell, 0.017 pg/ haploid cell
RNA: 1.9 pg /diploid cell, 1.2 pg / haploid cell
Protein: 8 pg/ diploid cell, 6 pg/ haploid cell
Range: see the table at the link below (pg/cell)
https://bionumbers.hms.harvard.edu/bionumber.aspx?id=105079&ver=5
Reference: Sherman, Getting started with yeast. Methods Enzymol. 2002.
So, if Illumina requires 1 ng for sequencing, the DNA of a single haploid cell needs to be amplified by:
1 ng / 0.017 pg = 1000 pg / 0.017 pg ~ 1000 * 1000 / 17 ~ 59,000 X amplification.
2**15 = 32,768 and 2**16 = 65,536, so about 16 rounds of PCR amplification (assuming each round doubles the DNA).
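The amplification estimate above can be checked with a few lines of Python. This is a minimal sketch, assuming every PCR round exactly doubles the DNA, which is an idealization. The function names are hypothetical, and the per-cell masses are the ones listed above.

```python
import math


def fold_amplification(required_pg, per_cell_pg):
    """Fold amplification needed to reach the required DNA mass."""
    return required_pg / per_cell_pg


def doubling_rounds(fold):
    """Smallest number of perfect doublings that reaches the given fold."""
    return math.ceil(math.log2(fold))


REQUIRED_PG = 1000.0  # 1 ng expressed in pg
for label, per_cell_pg in [("haploid", 0.017), ("diploid", 0.034)]:
    fold = fold_amplification(REQUIRED_PG, per_cell_pg)
    print(f"{label}: {fold:,.0f}-fold, {doubling_rounds(fold)} rounds")
```

A diploid cell starts with twice the DNA, so it needs one doubling fewer than a haploid cell.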
https://new.utc.edu/research/graduate-school/student-resources/graduate-assistantships#SACS
https://en.wikipedia.org/wiki/H3K36me3#:~:text=H3K36me3%20is%20an%20epigenetic%20modification,have%20many%20important%20biological%20processes.
single cell DNA seq
Nature Communications, 2019, Luquette, .. Peter Park, Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance.
somatic SNV (single nucleotide variation).
variant allele fraction (VAF), the fraction of sequencing reads supporting the variant allele at a heterozygous site.
Qin: VAF analysis of loss of heterozygosity during aging can be done with young cells as the background. The young cells provide a reference distribution of VAF at allelic positions genome-wide, against which the VAF of aging cells is compared. If loss of heterozygosity occurs at a locus, any remaining VAF signal at that locus can only be caused by amplification artifacts, which are expected to be 'random' and have a very small probability of overlapping with natural variation. The sequence difference between S288c and RM is 0.5-1%.
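As a small illustration of the VAF definition, here is a minimal sketch that computes it from read counts. The counts are made-up and `variant_allele_fraction` is a hypothetical helper. At a heterozygous site with balanced amplification the VAF is expected to be near 0.5.

```python
def variant_allele_fraction(variant_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


# Made-up read counts: (variant reads, total reads) at three sites.
sites = {"balanced": (20, 40), "imbalanced": (12, 40), "lost": (0, 40)}
for name, (variant, total) in sites.items():
    print(name, variant_allele_fraction(variant, total))
```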
Luquette19 proposed a genome-location spatial model for Amplification Balance to evaluate VAF. Luquette used nearby known SNP VAF to show the allele imbalance. Luquette19 used a 'smooth curve' to model the spatial distribution of allele imbalance on chromosomes.
https://en.wikipedia.org/wiki/Weighted_correlation_network_analysis
anterior (ventral), posterior (dorsal), superior (cephalic), inferior (caudal), distal, medial, lateral
https://courses.lumenlearning.com/ap1x94x1/chapter/anatomical-orientation-and-directions/