Sunday, January 31, 2021

Journal of Emerging Investigators, Frontiers for Young Minds

 Journal of Emerging Investigators (high school research publication)

https://careernavigator.gradeducation.hms.harvard.edu/journal-emerging-investigators-jei


Frontiers for Young Minds

https://kids.frontiersin.org/article/10.3389/frym.2020.566235




bootcamp, data science crash course

guest login to check screen, audio 

introduce myself, recording locally

anonymous survey, Zoom

GitHub README, overview, link to chat.

went overtime.

Colab needs a Gmail (Google) account.




Thursday, January 28, 2021

git with ssh key

 Git with an SSH key only works with the SSH (git@) remote URL, not the HTTPS URL.

One fix is to switch the remote URL to the SSH form:

https://docs.github.com/en/github/using-git/changing-a-remotes-url 

$ git remote set-url origin git@github.com:USERNAME/REPOSITORY.git
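When a repository was cloned over HTTPS, the SSH remote can be derived from the existing URL. The helper below is a minimal sketch that assumes a plain github.com remote of the form https://github.com/OWNER/REPO(.git); the function name `https_to_ssh` is hypothetical.

```python
import re


def https_to_ssh(url: str) -> str:
    """Rewrite https://github.com/OWNER/REPO(.git) as git@github.com:OWNER/REPO.git."""
    m = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if m is None:
        raise ValueError(f"not a GitHub HTTPS remote: {url}")
    owner, repo = m.groups()
    return f"git@github.com:{owner}/{repo}.git"


print(https_to_ssh("https://github.com/USERNAME/REPOSITORY.git"))
# git@github.com:USERNAME/REPOSITORY.git
```

In practice the input would be the output of `git remote get-url origin`, and the result is what goes after `git remote set-url origin`.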

cpsc2100, agile

tutor service announcement

https://new.utc.edu/engineering-and-computer-science/center-for-student-success/student-services/peer-peer-tutoring 


breakout rooms

Discussion topics on AGILE software design: 

  • What do you expect from excellent software?
  • When you noticed problematic issues in software, apps, or websites, have you tried to come up with potential solutions?
  • What ways of developing software, apps, or websites do you think are effective?
  • Describe the basic idea of AGILE software development.
  • Describe the 12 principles of AGILE in your own words.

Breakout room group report. 

Any interesting or funny stories about a time software did not function as expected?

I then spent 5 minutes going over the AGILE slides. 


The following content was not discussed. 

Breakout room discussion on Git, GitHub

  • What are Git and GitHub? 
  • What does "commit" do? 
  • What does "push" do? 
  • What does "pull" do? 


 GitHub repo: create, README, edit, demo





Tuesday, January 26, 2021

Strava global heatmap

 

https://www.strava.com/heatmap#4.00/77.11328/28.80324/hot/all




interview questions

 “Tell me about some of the most difficult problems you worked on and how you solved them.”

https://www.cnbc.com/2021/01/26/elon-musk-favorite-job-interview-question-to-ask-to-spot-a-liar-science-says-it-actually-works.html



Sunday, January 24, 2021

advancing data curation, book (dissertation), "vorgelegt von" (submitted by) Larysa Visengeriyeva


https://www.depositonce.tu-berlin.de/bitstream/11303/10811/4/visengeriyeva_larysa.pdf


COVID-19 B.1.1.7 strain (Alpha variant)

 
https://en.wikipedia.org/wiki/SARS-CoV-2_Alpha_variant

Transmission of SARS-CoV-2 lineage B.1.1.7 in England: insights from linking epidemiological and genetic data
https://virological.org/t/transmission-of-sars-cov-2-lineage-b-1-1-7-in-england-insights-from-linking-epidemiological-and-genetic-data/576

Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern 202012/01 in England
https://www.medrxiv.org/content/10.1101/2020.12.24.20248822v1

One study showed that N501Y had only a weak transmission advantage on its own, rising rapidly only when coupled with the suite of mutations observed in B.1.1.7.

Early empirical assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020
https://www.medrxiv.org/content/10.1101/2020.12.20.20248581v2

Redfinger cloud emulators

 

https://www.cloudemulator.net/


Friday, January 22, 2021

comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data


https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2599-6

In general, agreement among the tools in calling DE genes is not high. There is a trade-off between true-positive rates and the precision of calling DE genes. Methods with higher true positive rates tend to show low precision due to their introducing false positives, whereas methods with high precision show low true positive rates due to identifying few DE genes. We observed that current methods designed for scRNAseq data do not tend to show better performance compared to methods designed for bulk RNAseq data. Data multimodality and abundance of zero read counts are the main characteristics of scRNAseq data, which play important roles in the performance of differential gene expression analysis methods and need to be considered in terms of the development of new methods.
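The trade-off described above can be made concrete by scoring each tool's called DE genes against a known truth set (as in simulated data). This is a minimal sketch; the gene names and the two "tools" below are made-up toy data, not results from the paper.

```python
def tpr_and_precision(called, truth):
    """True-positive rate (recall) and precision of a set of called DE genes."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)                       # correctly called DE genes
    tpr = tp / len(truth) if truth else 0.0        # fraction of true DE genes recovered
    precision = tp / len(called) if called else 0.0  # fraction of calls that are real
    return tpr, precision


truth = {"g1", "g2", "g3", "g4"}
liberal = {"g1", "g2", "g3", "g5", "g6", "g7"}   # calls many genes
conservative = {"g1"}                           # calls very few genes

print(tpr_and_precision(liberal, truth))       # (0.75, 0.5)
print(tpr_and_precision(conservative, truth))  # (0.25, 1.0)
```

The liberal caller recovers more true DE genes but pays for it with false positives; the conservative caller is precise but misses most of them.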


A few studies have compared differential expression analysis methods for scRNAseq data. Jaakkola et al. [40] compared five statistical analysis methods for scRNAseq data, three of which are for bulk RNAseq data analysis. Miao et al. [41] evaluated 14 differential expression analysis tools, three of which are newly developed for scRNAseq data and 11 of which are old methods for bulk RNAseq data. A recent comparison study [42] assessed six differential expression analysis tools, four of which were developed for scRNAseq and two of which were designed for bulk RNAseq. In this study, we consider all differential gene expression analysis tools that have been developed for scRNAseq data as of October 2018 (SCDE [21], MAST [29], scDD [39], D3E [33], Monocle2 [38], SINCERA [34], DEsingle [36], and SigEMD [37]). We also consider differential gene expression analysis tools that are designed for heterogeneous expression data (EMDomics [31]) and are commonly used for bulk RNAseq data (edgeR [4], DESeq2 [43]).

As of October 2018, we have identified eight software tools for differential expression analysis of scRNAseq data, which are designed specifically for such data [21, 29, 30, 33, 34, 36, 37, 38] (SCDE, MAST, scDD, D3E, Monocle2, SINCERA, DEsingle, and SigEMD). 



2021 ReSEARCH Dialogues Conference

 Greetings,  

 

The 2021 ReSEARCH Dialogues Conference is going virtual! 

ReSEARCH Dialogues is an annual, campus-wide academic conference celebrating research and creative activities happening on campus and in the Chattanooga community. 

Presenters represent nearly all UTC disciplines, centers, programs and include: 


  • Undergraduate and graduate students, faculty, and staff 
  • Chattanooga community members 
  • Local high school students 
  • Local community college students  

 

RD 2021 will be held during the first-ever UTC Research and Creative Activities Week. The virtual conference format will feature online presentations available on demand, daily, live webcasts including panels and talks, and live Q&A sessions with conference presenters. Learn more about UTC Research and Creative Activities Week and submit an event HERE

 

RD 2021 Conference Details 

WHEN: Monday, April 12 – Thursday, April 15, 2021.
WHERE: Virtual via the Symposium by ForagerOne online conference platform. Live webcasts will be scheduled each day of the conference. 


Ready to Register? Visit the RD website


Other Ways to Participate in RD 2021  

  • Attend live conference webcast events. 
  • Encourage colleagues and students to present at RD 2021, and attend live conference webcasts.  
  • Publicize RD 2021 and UTC Research and Creative Activities Week within your department, college, or unit.  
  • Participate in UTC Research and Creative Activities Week. Submit an event HERE
  • Promote the conference on social media. 
  • Volunteer at RD 2021. To volunteer sign up HERE or contact researchdialogues@utc.edu.  

 

Questions? Contact the RD 2021 Team.  

 

Sincerely, 

RD 2021 Team 

researchdialogues@utc.edu 

COVID-19 hospitalization data

 

https://api.covidtracking.com/v1/states/current.json
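A minimal sketch for pulling current hospitalizations per state from this endpoint. It assumes the response is a JSON list of per-state records with `state` and `hospitalizedCurrently` fields; those field names are an assumption about the v1 schema, and availability of the endpoint is not guaranteed. The sample records below are made up.

```python
import json
from urllib.request import urlopen

API_URL = "https://api.covidtracking.com/v1/states/current.json"


def fetch_current(url=API_URL):
    """Download and parse the JSON list of current per-state records."""
    with urlopen(url) as resp:
        return json.load(resp)


def hospitalized_by_state(records):
    """Map state -> currently hospitalized, skipping states with no value.
    Field names ("state", "hospitalizedCurrently") are assumed, not verified."""
    return {
        r["state"]: r["hospitalizedCurrently"]
        for r in records
        if r.get("hospitalizedCurrently") is not None
    }


# Offline example with hypothetical records; use fetch_current() for live data.
sample = [
    {"state": "TN", "hospitalizedCurrently": 100},
    {"state": "GA", "hospitalizedCurrently": None},
]
print(hospitalized_by_state(sample))  # {'TN': 100}
```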


Thursday, January 21, 2021

UT salary database

 

https://data.tennessee.edu/salary-database/


UT compensation project, 2020

 

https://hr.tennessee.edu/pay/compensation-project/

https://hr.tennessee.edu/job-families/research/researcher/


https://hr.tennessee.edu/pay/market-ranges/

Researcher 2, MR09, mid-point salary $59.3K 

RESEARCHER 2

  • Provides professional contributions to research programs and projects requiring specialized knowledge and experience.
  • Independently performs components of a research program with general guidance by a senior member of the research team or principal investigator.
  • Performs complex research-related work assignments.
  • Solves a range of straightforward problems that may include proposing and implementing procedural and design modifications.
  • Makes independent judgments and decisions, including theoretical approaches, design of experiments and conclusions.

 

Education/Experience

Typically requires an advanced degree in a relevant field and two years of relevant experience, or an equivalent combination of education, training, and experience.

 

Market Range: MR09

Wednesday, January 20, 2021

cpsc2100 day 2, Jupyter notebook quick start, Colab

Start Zoom recording (I forgot to record this)

Socrative questions

=> go over self-video presentation again

=> Anaconda: local installation

=> Jupyter Notebook: code blocks, Markdown blocks, running cells, kernel, "!" (shell commands), "%" (magic commands)

=> Google Colab: free cloud, code blocks, text blocks, table of contents, running code, file uploading, linking Google Drive, downloading .ipynb files

=> Google Cloud Platform (needs a subscription)

=> Breakout room discussion:

 Anaconda installation, Jupyter Notebook, Colab




Friday, January 15, 2021

cpsc2100 day 1

slides, introduce myself 

Socrative, Room HongQin

syllabus

video requirement

video submission with hyperlink. Examples of past student submissions. 

sample student videos

Colab: COVID-19 data analysis

https://github.com/hongqin/python-covid19-analysis-sandbox/blob/master/PD_demo_jhu_covid19.ipynb 
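A typical first step in this kind of analysis is reshaping cumulative case counts and turning them into daily new cases. The sketch below assumes the wide JHU CSSE time-series layout (one row per region, one column per date such as "1/22/20"); it is not taken from the linked notebook, and the numbers are made up.

```python
import pandas as pd

# Toy table in the wide JHU CSSE time-series layout (hypothetical numbers)
wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["A", "B"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
    "1/24/20": [6, 2],
})

# Wide -> long: one row per (region, date) with the cumulative count
tidy = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="date",
    value_name="confirmed",
)
tidy["date"] = pd.to_datetime(tidy["date"], format="%m/%d/%y")

# Sum provinces into countries (rows = dates, columns = countries)
by_country = tidy.groupby(["Country/Region", "date"])["confirmed"].sum().unstack(0)

# Day-over-day difference of cumulative counts = new cases (first day is NaN)
new_cases = by_country.diff()
print(new_cases)
```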

why python

x Email list to calendar invitation

Tuesday, January 12, 2021

Tried to download PDFs from Google Drive (unsuccessful attempt)

 

import requests
import pandas as pd

# Form responses exported to Excel; one column holds the Google Drive links to the uploaded PDFs
tb = pd.read_excel('Submission of Report _ Lesson plan, online-R-coding bootcamp Dec 2020 (Responses).xlsx')

doc_urls = tb['Please upload your report (for students) or lesson plans (for teachers) in PDF format']
print(type(doc_urls), doc_urls[1])

# Download the second submission and write the raw response body to disk
content = requests.get(doc_urls[1])
content.raise_for_status()
print(content.headers.get('Content-Type'))  # shows what was actually downloaded

with open("test.pdf", "wb") as pdf:
    pdf.write(content.content)

#The generated pdf cannot be opened. It is an HTML file:
#the Drive sharing link returns a web page, not the PDF itself.
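One commonly used workaround is to rewrite the sharing link into Drive's `uc?export=download&id=...` form and to check the PDF magic bytes (`%PDF`) before saving. This is a sketch under assumptions: it only works when the file is link-shared (files uploaded through a form are often private, which would need an authenticated route such as the Drive API), and large files may still return a confirmation page. The function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def drive_direct_url(share_url: str) -> str:
    """Turn a Drive link (open?id=ID or /file/d/ID/...) into a uc?export=download link."""
    parsed = urlparse(share_url)
    file_id = parse_qs(parsed.query).get("id", [None])[0]
    if file_id is None:
        m = re.search(r"/file/d/([^/]+)", parsed.path)
        file_id = m.group(1) if m else None
    if file_id is None:
        raise ValueError(f"no Drive file id found in {share_url}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"


def download_pdf(share_url: str, path: str) -> None:
    """Fetch the file and refuse to save it unless it really is a PDF."""
    with urlopen(drive_direct_url(share_url)) as resp:
        data = resp.read()
    if not data.startswith(b"%PDF"):
        raise ValueError("response is not a PDF (probably an HTML page)")
    with open(path, "wb") as fh:
        fh.write(data)
```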


Sunday, January 10, 2021

A Tidy Transcriptomics introduction to RNA-Seq analyses

 



https://stemangiola.github.io/bioc_2020_tidytranscriptomics/articles/tidytranscriptomics.html


Sean Davis, Orchestra cancer data science

 

http://app.orchestra.cancerdatasci.org/1#


Friday, January 8, 2021

Thursday, January 7, 2021

identical twins: 5.2 early genetic differences on average

 

https://www.newser.com/story/301012/identical-twins-not-as-identical-as-we-thought.html

"On average, identical twins have 5.2 of these early genetic differences, the researchers found. But about 15% of identical twin pairs have more genetic differences, some of them up to 100, said Stefansson"


Wednesday, January 6, 2021

scDNA-seq

Comparison of Multiple Displacement Amplification (MDA) and Multiple Annealing and Looping-Based Amplification Cycles (MALBAC) in Single-Cell Sequencing

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114520

 Using sequencing data from single sperm cells, we quantitatively compare two prevailing amplification methods that are extensively applied in single-cell sequencing, multiple displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Our results show that MALBAC, as a combination of modified MDA and tweaked PCR, has a higher level of uniformity, specificity and reproducibility.


This paper argues that MDA has a serious amplification bias. 



allele specific protein expression in a diploid yeast hybrid by LC-MS

 Khan et al., 2012, MSB, allele-specific protein expression in a diploid yeast hybrid by LC-MS

https://www.embopress.org/doi/epdf/10.1038/msb.2012.34 

This paper used LC-MS to study proteins from the yeast hybrid S. cerevisiae × S. bayanus. Likely, polymorphisms within the same species cause too few changes at the protein level to be detected this way. 

COVID racial data tracker


https://covidtracking.com/race/dashboard


yeast allelic differential expression

 https://www.embopress.org/doi/epdf/10.1038/msb.2009.31

Genome-wide allele- and strand-specific expression in yeast. Gagneur et al., 2009, MSB. 

Tiling array. 371 (13%) of the transcripts show a >1.5-fold difference, so the total number of transcripts in this paper is about 371 / 0.13 ≈ 2854. 
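Because 13% is a rounded percentage, the back-calculated total is only approximate. A small sketch of the arithmetic, assuming the percentage was rounded to the nearest whole number:

```python
def implied_total(count, percent, half_width=0.5):
    """Total implied by `count` being `percent`% of it. The bounds assume the
    percentage was rounded to the nearest whole number (+/- half_width)."""
    point = count / (percent / 100)
    low = count / ((percent + half_width) / 100)
    high = count / ((percent - half_width) / 100)
    return point, low, high


point, low, high = implied_total(371, 13)
print(f"about {point:.0f} transcripts (roughly {low:.0f} to {high:.0f})")
# about 2854 transcripts (roughly 2748 to 2968)
```

Either way the total stays below the 3K+ transcripts in the CR and NR data sets.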

In Dang's CR and NR data sets, there are over 3K transcripts. 





Open media forensics challenge, NIST

 

https://mfc.nist.gov/


Tuesday, January 5, 2021

yeast cell DNA, RNA, protein

 DNA: 0.034 pg/diploid cell, 0.017 pg/haploid cell

RNA: 1.9 pg/diploid cell, 1.2 pg/haploid cell

Protein: 8 pg/diploid cell, 6 pg/haploid cell


https://bionumbers.hms.harvard.edu/bionumber.aspx?id=105079&ver=5

Reference: Sherman, Getting started with yeast. Methods Enzymol. 2002. 

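So, if Illumina requires 1 ng of DNA for sequencing, the DNA of a single haploid cell needs to be amplified by:

  1 ng / 0.017 pg = 1000 pg / 0.017 pg ≈ 58,800 (about 1000 × 1000 / 20 = 50,000 if 0.017 pg is rounded to 0.02 pg = 20 fg), i.e., roughly 50,000X amplification. 

2^15 = 32,768 and 2^16 = 65,536 (log2 of 58,800 ≈ 15.8), so about 16 rounds of PCR amplification are needed, assuming perfect doubling in every cycle. 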
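The same estimate as a small calculation: fold amplification is target mass over per-cell mass, and the number of cycles is the base-2 logarithm rounded up. This assumes ideal doubling per cycle, which real whole-genome amplification does not achieve; the bacterial 1 fg figure is the one quoted from de Bourcy et al. below.

```python
import math


def amplification_needed(target_ng, per_cell_pg):
    """Fold amplification and ideal doubling cycles to reach target_ng from one cell."""
    fold = (target_ng * 1000) / per_cell_pg   # 1 ng = 1000 pg
    cycles = math.ceil(math.log2(fold))       # assumes perfect doubling each cycle
    return fold, cycles


for label, pg in [("haploid yeast", 0.017), ("diploid yeast", 0.034), ("bacterial cell (1 fg)", 0.001)]:
    fold, cycles = amplification_needed(1, pg)
    print(f"{label}: {fold:,.0f}-fold, {cycles} cycles")
# haploid yeast: 58,824-fold, 16 cycles
# diploid yeast: 29,412-fold, 15 cycles
# bacterial cell (1 fg): 1,000,000-fold, 20 cycles
```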






UTC graduate assistantship

 

https://new.utc.edu/research/graduate-school/student-resources/graduate-assistantships#SACS


H3K36me3

 

https://en.wikipedia.org/wiki/H3K36me3


single cell DNA reagent kits, library preparation

Solid Phase Reversible Immobilization (SPRI) beads are used to optimize the DNA size range for library preparation. see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6944320/



This paper sheared all single-cell and bulk DNA: "A Quantitative Comparison of Single-Cell Whole Genome Amplification Methods", de Bourcy et al., 2014, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105585 . This is an older paper, though. It states that Illumina sequencing requires 1 ng of DNA. A single bacterial cell has about 1 fg (femtogram) of DNA, so about 1E6-fold amplification is needed. 

 
https://support.10xgenomics.com/single-cell-dna/library-prep/doc/user-guide-chromium-single-cell-dna-reagent-kits-user-guide
https://en.wikipedia.org/wiki/Single_cell_sequencing#Single-cell_genome_(DNA)_sequencing





Monday, January 4, 2021

hash and dictionary in R

 

https://stackoverflow.com/questions/7818970/is-there-a-dictionary-functionality-in-r/44570412#44570412

https://cran.r-project.org/web/packages/collections/index.html


scDNA-seq and LOH monitoring in yeast aging

 single cell DNA seq

Nature Communications, 2019, Luquette, .. Peter Park, Identification of somatic mutations in single-cell DNA-seq using a spatial model of allelic imbalance. 

Somatic SNV (single nucleotide variant). 

Variant allele fraction (VAF): the fraction of sequencing reads supporting a heterozygous variant. 

Qin: Monitoring loss of heterozygosity (LOH) during aging with VAF can use young cells as the background. The young cells provide a reference distribution of VAF at allelic (heterozygous) positions genome-wide, and the VAF of aging cells is compared against it. If LOH occurs at a locus, any remaining VAF signal for the lost allele at that locus can only be caused by amplification artifacts, which are expected to be 'random' and have a very small probability of overlapping with the natural variants. The sequence divergence between S288c and RM is 0.5-1%. 
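One simple way to implement the young-cell background comparison is an empirical p-value: fold each VAF onto [0, 0.5] (so losing either allele gives a value near 0) and ask what share of young-cell values are at least as low. This is a sketch of the idea, not a method from the source; the read counts, the 0.1 cutoff, and the ten background values are hypothetical (a real background would have thousands of genome-wide sites).

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction: share of reads supporting the alternate allele."""
    if total_reads <= 0:
        raise ValueError("site has no coverage")
    return alt_reads / total_reads


def minor_fraction(v):
    """Fold VAF onto [0, 0.5] so loss of either allele looks like a value near 0."""
    return min(v, 1 - v)


def empirical_pvalue(observed, background):
    """Share of background minor-allele fractions at least as low as `observed`.
    The +1 terms keep the estimate away from exactly 0."""
    hits = sum(1 for b in background if b <= observed)
    return (hits + 1) / (len(background) + 1)


# Hypothetical young-cell VAFs at heterozygous sites (the background distribution)
young = [minor_fraction(v) for v in
         [0.45, 0.52, 0.38, 0.60, 0.30, 0.55, 0.48, 0.41, 0.65, 0.50]]

# Hypothetical aged-cell read counts at two heterozygous sites: (alt reads, total reads)
for alt, total in [(1, 40), (18, 40)]:
    p = empirical_pvalue(minor_fraction(vaf(alt, total)), young)
    call = "possible LOH" if p < 0.1 else "heterozygous"
    print(f"VAF={alt / total:.3f}  p={p:.3f}  {call}")
```

Because single-site dropout can mimic LOH, a real analysis would look for runs of adjacent low-p sites rather than isolated ones.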

Luquette19 proposed a spatial (genome-position) model of amplification balance to evaluate VAF. Luquette used the VAF of nearby known SNPs to estimate the local allelic imbalance, and modeled the spatial distribution of allelic imbalance along chromosomes as a 'smooth curve'.
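To illustrate the "smooth curve" idea, the sketch below swaps in a simple Gaussian-kernel (Nadaraya-Watson) smoother that estimates local allele balance at any position from the VAFs of nearby known heterozygous SNPs. This is not Luquette et al.'s model, whose spatial model is more elaborate; the positions, VAFs, and 5 kb bandwidth are hypothetical.

```python
import math


def smooth_allele_balance(positions, vafs, query, bandwidth=10_000):
    """Gaussian-kernel weighted average of nearby SNP VAFs at genomic position `query`."""
    weights = [math.exp(-0.5 * ((p - query) / bandwidth) ** 2) for p in positions]
    total = sum(weights)
    if total == 0:
        return float("nan")  # no SNPs close enough to inform the estimate
    return sum(w * v for w, v in zip(weights, vafs)) / total


positions = [1_000, 5_000, 9_000, 50_000, 60_000]  # hypothetical germline het SNP coordinates
vafs = [0.70, 0.72, 0.68, 0.50, 0.48]             # their VAFs in one amplified cell

for q in (6_000, 55_000):
    print(q, round(smooth_allele_balance(positions, vafs, q, bandwidth=5_000), 3))
# 6000 0.701
# 55000 0.49
```

A candidate somatic SNV with VAF near 0.7 at position 6,000 would then be consistent with local amplification imbalance rather than clear evidence of a true variant.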