This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Saturday, November 28, 2020
TBI and FBI background checks
Friday, November 27, 2020
graph neural networks
"Graph neural network based methods such as GraphSAGE (Hamilton et al., 2017a) typically define a unique computational
graph for each node, allowing it to perform efficient information aggregation for nodes with different degrees."
Graph Attention Network (GAT) (Veličković et al., 2017) utilizes a self-attention mechanism in the information aggregation process. Motivated by these properties, we propose our method Hyper-SAGNN based on the self-attention mechanism within each tuple to learn the function f.
https://www.kdnuggets.com/2019/08/neighbours-machine-learning-graphs.html
Graph Convolutional Network (GCN) [Kipf2016]
"The normalised adjacency matrix encodes the graph structure and upon multiplication with the design matrix effectively smooths a node’s feature vector based on those of its immediate neighbours in the graph. A’ is normalised such that each neighbouring node’s contribution is proportional to how connected that node is in the graph."
"The layer definition is completed by the application of an element-wise non-linear function, e.g., ReLu, to A’FW+b. The output matrix Z of this layer can be used as input to another GCN layer or any other type of neural network layer, allowing the creation of deep neural architectures able to learn a complex hierarchy of node features needed for the downstream node classification task."
"Training a 2-layer GCN model (done in this script using our open-source Python library StellarGraph) with 32 output units per layer on the Cora dataset with just 140 training node labels seen by the model results in a considerable boost in classification accuracy when compared to the baseline 2-layer MLP. Accuracy on predicting the subject of a hold-out test set of papers increases to approximately 81% — an improvement of 21% over the MLP that only uses the BoW node features and ignores citation relationships between the papers. This clearly demonstrates that at least for some datasets utilising relationship information in the data can significantly boost performance in a predictive task."
Reference:
Kipf, T. N., & Welling, M. (2016). “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907.
van Dorp et al. 2020, Nature Communications: no evidence for increased transmissibility from recurrent mutations in SARS-CoV-2.
The authors proposed a sister-clade comparison method to detect changes in transmissibility. The paper claims the method is unbiased, based on simulation and random permutation.
In van Dorp 2020, D614G is in linkage disequilibrium with 3 other SNPs, and the authors claim that no evidence supports D614G being associated with higher transmissibility.
Potential caveats include low genetic diversity.
HQ thinks the authors did not address the methodological problem of assuming a bifurcating phylogeny. What if it is a phylogenetic network? The authors acknowledged that bifurcations and clades often have low support.
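For reference, the linkage disequilibrium mentioned above is usually quantified with the r² statistic between two biallelic sites. A minimal sketch on toy phased haplotypes (pure Python, not the paper's pipeline):

```python
def ld_r2(hap_a, hap_b):
    """r^2 linkage disequilibrium between two biallelic sites,
    given phased haplotypes coded 0/1."""
    n = len(hap_a)
    pa = sum(hap_a) / n                 # allele frequency at site A
    pb = sum(hap_b) / n                 # allele frequency at site B
    pab = sum(a and b for a, b in zip(hap_a, hap_b)) / n
    D = pab - pa * pb                   # disequilibrium coefficient
    return D * D / (pa * (1 - pa) * pb * (1 - pb))

# perfectly linked sites give r^2 = 1
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

High r² among D614G and its linked SNPs is what makes it hard to attribute any transmissibility signal to D614G alone.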
Thursday, November 26, 2020
single cell DNA sequencing (genome sequencing)
=>QIAGEN single cell DNA library prep
https://www.qiagen.com/us/products/next-generation-sequencing/library-preparation/qiaseq-fx-single-cell-dna-library-kit/?cmpid=PC_GEN_single-cell-analysis-sales_0620_SEA_GA&clear=true#orderinginformation
The QIAseq FX Single Cell DNA Library kit provides a complete solution for whole genome sequencing from isolated single animal or bacterial cells or low amounts of genomic DNA. The kit includes all reagents required for cell lysis, whole genome amplification, enzymatic DNA fragmentation and PCR-free NGS library preparation. The kit provides comprehensive genome coverage and exceptional sequence fidelity, reducing false positives and minimizing drop-outs. The kit is ideally suited to the analysis of aneuploidy, copy number variation and sequence variation in single cells, or for whole genome sequencing from rare samples.
Wednesday, November 25, 2020
RM11-a and S288C differ by 0.5-1%
"sequence divergence between RM11 and S288C is estimated to be 0.5-1%, approaching that between human and chimp. This sequence variation is distributed throughout the genome, confirming that RM11 shares no recent history with S288C."
W303 is a relative of S288C
Open Biol. 2012 Aug; 2(8): 120093.
The Saccharomyces cerevisiae W303-K6001 cross-platform genome sequence: insights into ancestry and physiology of a laboratory mutt
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438534/
The consensus W303-K6001 genome differs in 8133 positions from S288c, predicting altered amino acid sequences in 799 proteins, including factors of ageing and stress resistance. 85.4% of the W303-K6001 genome is virtually identical (≤0.5 variations per kb) to S288c, and thus originates in the same ancestor.
Several of these clusters are shared with Σ1278B, another widely used S288c-related model, indicating that these strains share a second ancestor. Thus, the W303-K6001 genome pictures details of complex genetic relationships between the model strains that date back to the early days of experimental yeast genetics.
RM11-a
BY4742 alpha
Tuesday, November 24, 2020
scRNA tutorial
https://hbctraining.github.io/scRNA-seq_online/lessons/04_SC_quality_control.html
Cell-level filtering (for Human cells?)
Now that we have visualized the various metrics, we can decide on the thresholds to apply which will result in the removal of low quality cells. Often the recommendations mentioned earlier are a rough guideline, and the specific experiment needs to inform the exact thresholds chosen. We will use the following thresholds:
- nUMI > 500
- nGene > 250
- log10GenesPerUMI > 0.8
- mitoRatio < 0.2
# Filter out low quality cells using selected thresholds - these will change with experiment
filtered_seurat <- subset(x = merged_seurat,
                          subset = (nUMI >= 500) &
                                   (nGene >= 250) &
                                   (log10GenesPerUMI > 0.80) &
                                   (mitoRatio < 0.20))
jackson20 elife Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments
11 different TF deletion strains
a digital expression matrix was provided in the supporting doc.
From scRNA counts per gene to gene regulatory network, the authors used the Inferelator, a regression-based method.
A gold-standard prior TF-network was cited from Tchourine et al. 2018, which includes 1403 signed (-1, 0, 1) interactions in a 998 genes by 98 transcription factors regulatory matrix.
Another, unsigned, gene network cited from Teixeira et al. 2018 includes 11486 interactions across 3912 genes and 152 TFs.
A multi-task fit was used to infer a gene network across the 11 conditions.
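The core idea of regression-based inference, regress each gene's expression on TF activities and keep strong coefficients as signed edges, can be sketched with plain least squares. This is a toy stand-in for the Inferelator's actual regularised, multi-task fit; all names and the threshold are illustrative assumptions.

```python
import numpy as np

def infer_network(expr, tf_activity, threshold=0.5):
    """Toy regression-based GRN inference: for each gene, fit
    expression ~ TF activities by least squares and keep
    coefficients with |coef| > threshold as signed edges.
    expr: samples x genes; tf_activity: samples x TFs.
    Returns a genes x TFs matrix of -1/0/+1 edge signs."""
    coefs, *_ = np.linalg.lstsq(tf_activity, expr, rcond=None)
    coefs = coefs.T                                  # genes x TFs
    network = np.sign(coefs) * (np.abs(coefs) > threshold)
    return network.astype(int)

rng = np.random.default_rng(0)
tfs = rng.normal(size=(50, 3))          # 50 samples, 3 TFs
expr = np.column_stack([
    2.0 * tfs[:, 0],                    # gene 0 activated by TF 0
    -1.5 * tfs[:, 1],                   # gene 1 repressed by TF 1
]) + 0.1 * rng.normal(size=(50, 2))
net = infer_network(expr, tfs)
print(net)  # recovers [[1, 0, 0], [0, -1, 0]]
```

The output has the same shape and coding as the signed_network.tsv files listed below (genes x TFs, entries in {-1, 0, 1}).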
The cellranger pipeline is available from 10x Genomics under the MIT license (https://github.com/10XGenomics/cellranger). The fastqToMat0 pipeline is available from GitHub (https://github.com/flatironinstitute/fastqToMat0; Jackson, 2020; copy archived at https://github.com/elifesciences-publications/fastqToMat0) and is released under the MIT license. Genome sequence and annotations are included as Source code 4.
# Included in this archive are the following data files:
# data/
# 103118_SS_Data.tsv.gz (TSV count matrix of single-cell yeast reads [cells x genes])
# 110518_SS_NEG_Data (TSV count matrix of simulated single-cell reads [cells x genes])
# TRIZOL_BULK.tsv (TSV count matrix of data prepared with TRIZOL [samples x genes])
# yeast_gene_names.tsv (TSV with Systematic Names and Common Names for yeast genes)
# STable5.tsv (TSV copy of Supplemental Table 5 with cross-validation results)
# STable6.tsv (TSV copy of Supplemental Table 6 with genes grouped into categories)
# go_slim_mapping.tab (GO Slim Terms: https://downloads.yeastgenome.org/curation/literature/go_slim_mapping.tab)
# go_slim_labels.tsv (TSV with shorter figure names for GO slim terms)
# GASCH_2017_COUNTS.tsv (TSV from GSE102475; Gasch 2017 BY4741 single-cell TPM data in mid-log YPD [cells x genes])
# LEWIS_ALL.tsv (TSV from GSE135430; Scholes 2019 BY4741 bulk TPM data in mid-log YPD[samples x genes])
# LARS_2019_COUNTS.tsv (TSV from GSE122392; Nadal-Ribelles 2019 BY4741 single-cell count data in mid-log YPD [cells x genes])
# inferelator/
# jackson_2019_figureXX.py (Python script to run the network inference with the inferelator v0.3.0 for the associated figure)
# network/
# signed_network.tsv (TSV signed [-1, 0, 1] network of regulatory relationships [genes x TFs])
# COND_signed_network.tsv (TSV signed [-1, 0, 1] network of regulatory relationships [genes x TFs] for each of 11 conditions)
# priors/
# Tchourine_gold_standard.tsv.gz (TSV with gold standard from Tchourine et al 2018)
# ATAC-motif_priors.tsv.gz (TSV with atac-motif priors from Castro et al 2019)
# YEASTRACT_priors_20181118.tsv.gz (TSV with YEASTRACT priors downloaded from YEASTRACT 11/18/2018)
# YEASTRACT_20190713_BOTH.tsv (TSV with YEASTRACT priors downloaded from YEASTRACT 07/13/2019)
# YEASTRACT_20190713_DNABINDING.tsv (TSV with YEASTRACT DNA-binding interaction data downloaded from YEASTRACT 07/13/2019)
# YEASTRACT_20190713_EXPRESSION.tsv (TSV with YEASTRACT expression change interaction data downloaded from YEASTRACT 07/13/2019)
# BUSSEMAKER_priors_2008.tsv.gz (TSV with priors from Ward & Bussemaker 2008)
Source code 1
- https://cdn.elifesciences.org/articles/51254/elife-51254-code1-v3.tar.gz
Source code 2
- https://cdn.elifesciences.org/articles/51254/elife-51254-code2-v3.tsv.gz
Source code 3
- https://cdn.elifesciences.org/articles/51254/elife-51254-code3-v3.tsv
Source code 4
- https://cdn.elifesciences.org/articles/51254/elife-51254-code4-v3.tar.gz
Source code 5
- https://cdn.elifesciences.org/articles/51254/elife-51254-code5-v3.zip
Supplementary file 1
- https://cdn.elifesciences.org/articles/51254/elife-51254-supp1-v3.xlsx
Transparent reporting form
- https://cdn.elifesciences.org/articles/51254/elife-51254-transrepform-v3.pdf
Friday, November 20, 2020
docker and jupyter notebook
https://mtetiresearch.com/how-to-use-docker-and-jupyter-notebook/
Thursday, November 19, 2020
Hsiao, Gilad, cell cycle phase in scRNA of human cells
human induced pluripotent stem cells.
1536 single cells were sequenced; 888 passed quality checks (filtering out broken cells, cells in mitosis, wells with more than one cell?). So, only 57% of cells passed QC.
Hsiao20 normalized scRNA to a standard normal distribution.
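A standard way to map expression values onto a standard normal distribution is the rank-based inverse normal transform; this is a generic sketch of that idea using only the Python standard library, not necessarily Hsiao et al.'s exact procedure.

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    """Map values to standard-normal quantiles by rank;
    the 0.5 offset keeps the extremes finite."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()                       # standard normal
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]

z = inverse_normal_transform([5.0, 1.0, 3.0])
print([round(v, 3) for v in z])  # symmetric around 0, median maps to 0.0
```

After this transform every gene has the same marginal distribution, which removes scale differences before downstream modeling.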
The association between early career informal mentorship in academic collaborations and junior author performance
https://www.nature.com/articles/s41467-020-19723-8#MOESM1
microsoft academic graph
https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/?from=http://research.microsoft.com/mag
Wednesday, November 18, 2020
user admin in ubuntu
How to add new users to Lambda Ubuntu workstation
sudo useradd johndoe
sudo mkdir /home/johndoe
sudo chown johndoe:johndoe /home/johndoe
sudo passwd johndoe #set password
sudo usermod -a -G qinlab johndoe #add to a group
groups johndoe #check groups
sudo deluser johndoe qinlab #remove from a non-primary group
santolini and barabasi 2017 PNAS
predicting perturbation patterns from topology of biological networks
sensitivity matrix
The transform from ODE to Boolean network is done by taking the signs of the Jacobian matrix.
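That sign-of-Jacobian step can be sketched with finite differences: evaluate each partial derivative ∂f_i/∂x_j at a reference state and keep only its sign. A toy illustration of the idea, not Santolini & Barabási's code:

```python
def jacobian_signs(f, x, eps=1e-6):
    """Signs of the Jacobian of f at state x, by central differences.
    f maps a list of n floats to a list of n floats."""
    n = len(x)
    signs = [[0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n):
            d = (fp[i] - fm[i]) / (2 * eps)          # dfi/dxj
            signs[i][j] = (d > 1e-9) - (d < -1e-9)   # -1, 0, or +1
    return signs

# toy 2-species system: x0 activates x1, x1 represses x0, both decay
f = lambda x: [1.0 - x[1] - 0.1 * x[0], x[0] - 0.1 * x[1]]
print(jacobian_signs(f, [1.0, 1.0]))  # [[-1, -1], [1, -1]]
```

The resulting sign matrix is exactly the activation/repression wiring a Boolean or sign-based network model needs.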
network correlation method, barzel and biham 2009
Quantifying the connectivity of a network: the network correlation function method
Barzel and Biham 2009, Physical Review E
Barabási's group used this method later on.
genomic deep learning tutorial
Deep Learning in Genomics Primer (Tutorial)
https://colab.research.google.com/drive/160h26Egm0M0jguLg80zkolMjNkzyJUhr?usp=sharing
timeseries note, syed tareq
Tuesday, November 17, 2020
Pytorch versus tensorflow
Biggest difference: Static vs. dynamic computation graphs
Creating a static graph beforehand is unnecessary
Reverse-mode auto-diff implies a computation graph
PyTorch takes advantage of this => We use PyTorch
https://courses.cs.washington.edu/courses/cse446/18wi/sections/section7/446_pytorch_slides.pdf
tf.keras.layers.Dense()
tf.keras.layers.Dense(
units, activation=None, use_bias=True, kernel_initializer='glorot_uniform',
bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, bias_constraint=None,
**kwargs
)
Dense implements the operation: output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
units: dimension of the output space.
Output shape: N-D tensor with shape (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
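The operation behind Dense is just an affine map followed by an element-wise activation; a numpy sketch of the formula (illustrative, not Keras internals):

```python
import numpy as np

def dense(inputs, kernel, bias, activation=None):
    """output = activation(dot(input, kernel) + bias),
    mirroring tf.keras.layers.Dense with use_bias=True."""
    out = inputs @ kernel + bias
    return activation(out) if activation is not None else out

batch_size, input_dim, units = 8, 3, 5
x = np.random.rand(batch_size, input_dim)
kernel = np.random.rand(input_dim, units)  # the layer's weights matrix
bias = np.zeros(units)                     # the layer's bias vector
relu = lambda z: np.maximum(z, 0)
y = dense(x, kernel, bias, activation=relu)
print(y.shape)  # (8, 5): (batch_size, units), matching the docs
```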