This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Saturday, November 28, 2020
TBI and FBI background checks
Friday, November 27, 2020
graph neural networks
"Graph neural network based methods such as GraphSAGE (Hamilton et al., 2017a) typically define a unique computational
graph for each node, allowing it to perform efficient information aggregation for nodes with different degrees."
Graph Attention Network (GAT) (Veličković et al., 2017) utilizes a self-attention mechanism in the information aggregation process. Motivated by these properties, we propose our method Hyper-SAGNN based on the self-attention mechanism within each tuple to learn the function f.
https://www.kdnuggets.com/2019/08/neighbours-machine-learning-graphs.html
Graph Convolutional Network (GCN) [Kipf2016]
"The normalised adjacency matrix encodes the graph structure and upon multiplication with the design matrix effectively smooths a node’s feature vector based on those of its immediate neighbours in the graph. A’ is normalised such that each neighbouring node’s contribution is proportional to how connected that node is in the graph."
"The layer definition is completed by the application of an element-wise non-linear function, e.g., ReLu, to A’FW+b. The output matrix Z of this layer can be used as input to another GCN layer or any other type of neural network layer, allowing the creation of deep neural architectures able to learn a complex hierarchy of node features needed for the downstream node classification task."
"Training a 2-layer GCN model (done in this script using our open-source Python library StellarGraph) with 32 output units per layer on the Cora dataset with just 140 training node labels seen by the model results in a considerable boost in classification accuracy when compared to the baseline 2-layer MLP. Accuracy on predicting the subject of a hold-out test set of papers increases to approximately 81% — an improvement of 21% over the MLP that only uses the BoW node features and ignores citation relationships between the papers. This clearly demonstrates that at least for some datasets utilising relationship information in the data can significantly boost performance in a predictive task."
Reference:
Kipf, T. N., & Welling, M. (2016). “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907.
van Dorp et al. 2020, Nature Communications: no evidence for increased transmissibility from recurrent mutations in SARS-CoV-2.
The authors proposed a sister-clade comparison method to detect changes in transmissibility. The paper claims the method is unbiased, based on simulation and random permutation.
In van Dorp 2020, D614G is in linkage disequilibrium with 3 other SNPs, and the authors claim that no evidence supports D614G being associated with higher transmissibility.
Potential caveats include low genetic diversity.
HQ thinks the authors did not address the methodological problem of assuming a bifurcating phylogeny. What if it is a phylogenetic network? The authors acknowledged that bifurcations and clades often have low support.
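For reference, the linkage disequilibrium mentioned above is usually quantified with the r² statistic between two biallelic sites. A minimal sketch on toy phased haplotypes (pure Python, not the paper's pipeline):

```python
def ld_r2(hap_a, hap_b):
    """r^2 linkage disequilibrium between two biallelic sites,
    given phased haplotypes coded 0/1."""
    n = len(hap_a)
    pa = sum(hap_a) / n                 # allele frequency at site A
    pb = sum(hap_b) / n                 # allele frequency at site B
    pab = sum(a and b for a, b in zip(hap_a, hap_b)) / n
    D = pab - pa * pb                   # disequilibrium coefficient
    return D * D / (pa * (1 - pa) * pb * (1 - pb))

# perfectly linked sites give r^2 = 1
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

High r² among D614G and its linked SNPs is what makes it hard to attribute any transmissibility signal to D614G alone.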
Thursday, November 26, 2020
single cell DNA sequencing (genome sequencing)
=>QIAGEN single cell DNA library prep
https://www.qiagen.com/us/products/next-generation-sequencing/library-preparation/qiaseq-fx-single-cell-dna-library-kit/?cmpid=PC_GEN_single-cell-analysis-sales_0620_SEA_GA&clear=true#orderinginformation
The QIAseq FX Single Cell DNA Library kit provides a complete solution for whole genome sequencing from isolated single animal or bacterial cells or low amounts of genomic DNA. The kit includes all reagents required for cell lysis, whole genome amplification, enzymatic DNA fragmentation and PCR-free NGS library preparation. The kit provides comprehensive genome coverage and exceptional sequence fidelity, reducing false positives and minimizing drop-outs. The kit is ideally suited to the analysis of aneuploidy, copy number variation and sequence variation in single cells, or for whole genome sequencing from rare samples.
Wednesday, November 25, 2020
RM11-a and S288C differ by 0.5-1%
"sequence divergence between RM11 and S288C is estimated to be 0.5-1%, approaching that between human and chimp. This sequence variation is distributed throughout the genome, confirming that RM11 shares no recent history with S288C."
W303 is a relative of S288C
Open Biol. 2012 Aug; 2(8): 120093.
The Saccharomyces cerevisiae W303-K6001 cross-platform genome sequence: insights into ancestry and physiology of a laboratory mutt
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438534/
The consensus W303-K6001 genome differs in 8133 positions from S288c, predicting altered amino acid sequences in 799 proteins, including factors of ageing and stress resistance. 85.4% of the W303-K6001 genome is virtually identical (≤0.5 variations per kb) to S288c, and thus originates in the same ancestor.
Several of these clusters are shared with Σ1278B, another widely used S288c-related model, indicating that these strains share a second ancestor. Thus, the W303-K6001 genome pictures details of complex genetic relationships between the model strains that date back to the early days of experimental yeast genetics.
RM11-a
BY4742 alpha
Tuesday, November 24, 2020
scRNA tutorial
https://hbctraining.github.io/scRNA-seq_online/lessons/04_SC_quality_control.html
Cell-level filtering (for Human cells?)
Now that we have visualized the various metrics, we can decide on the thresholds to apply which will result in the removal of low quality cells. Often the recommendations mentioned earlier are a rough guideline, and the specific experiment needs to inform the exact thresholds chosen. We will use the following thresholds:
- nUMI > 500
- nGene > 250
- log10GenesPerUMI > 0.8
- mitoRatio < 0.2
# Filter out low quality cells using selected thresholds - these will change with experiment
filtered_seurat <- subset(x = merged_seurat,
                          subset = (nUMI >= 500) &
                                   (nGene >= 250) &
                                   (log10GenesPerUMI > 0.80) &
                                   (mitoRatio < 0.20))
jackson20 elife Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments
11 different TF deletion strains
a digital expression matrix was provided in the supporting doc.
From scRNA counts per gene to gene regulatory network, the authors used the Inferelator, a regression-based method.
A gold-standard prior TF-network was cited from Tchourine et al. 2018, which includes 1403 signed (-1, 0, 1) interactions in a 998 genes by 98 transcription factors regulatory matrix.
Another, unsigned, gene network cited from Teixeira et al. 2018 includes 11486 interactions across 3912 genes and 152 TFs.
A multi-task fit was used to infer a gene network across the 11 conditions.
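The core idea of regression-based inference, regress each gene's expression on TF activities and keep strong coefficients as signed edges, can be sketched with plain least squares. This is a toy stand-in for the Inferelator's actual regularised, multi-task fit; all names and the threshold are illustrative assumptions.

```python
import numpy as np

def infer_network(expr, tf_activity, threshold=0.5):
    """Toy regression-based GRN inference: for each gene, fit
    expression ~ TF activities by least squares and keep
    coefficients with |coef| > threshold as signed edges.
    expr: samples x genes; tf_activity: samples x TFs.
    Returns a genes x TFs matrix of -1/0/+1 edge signs."""
    coefs, *_ = np.linalg.lstsq(tf_activity, expr, rcond=None)
    coefs = coefs.T                                  # genes x TFs
    network = np.sign(coefs) * (np.abs(coefs) > threshold)
    return network.astype(int)

rng = np.random.default_rng(0)
tfs = rng.normal(size=(50, 3))          # 50 samples, 3 TFs
expr = np.column_stack([
    2.0 * tfs[:, 0],                    # gene 0 activated by TF 0
    -1.5 * tfs[:, 1],                   # gene 1 repressed by TF 1
]) + 0.1 * rng.normal(size=(50, 2))
net = infer_network(expr, tfs)
print(net)  # recovers [[1, 0, 0], [0, -1, 0]]
```

The output has the same shape and coding as the signed_network.tsv files listed below (genes x TFs, entries in {-1, 0, 1}).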
The cellranger pipeline is available from 10x Genomics under the MIT license (https://github.com/10XGenomics/cellranger). The fastqToMat0 pipeline is available from GitHub (https://github.com/flatironinstitute/fastqToMat0; Jackson, 2020; copy archived at https://github.com/elifesciences-publications/fastqToMat0) and is released under the MIT license. Genome sequence and annotations are included as Source code 4.
# Included in this archive are the following data files:
# data/
# 103118_SS_Data.tsv.gz (TSV count matrix of single-cell yeast reads [cells x genes])
# 110518_SS_NEG_Data (TSV count matrix of simulated single-cell reads [cells x genes])
# TRIZOL_BULK.tsv (TSV count matrix of data prepared with TRIZOL [samples x genes])
# yeast_gene_names.tsv (TSV with Systematic Names and Common Names for yeast genes)
# STable5.tsv (TSV copy of Supplemental Table 5 with cross-validation results)
# STable6.tsv (TSV copy of Supplemental Table 6 with genes grouped into categories)
# go_slim_mapping.tab (GO Slim Terms: https://downloads.yeastgenome.org/curation/literature/go_slim_mapping.tab)
# go_slim_labels.tsv (TSV with shorter figure names for GO slim terms)
# GASCH_2017_COUNTS.tsv (TSV from GSE102475; Gasch 2017 BY4741 single-cell TPM data in mid-log YPD [cells x genes])
# LEWIS_ALL.tsv (TSV from GSE135430; Scholes 2019 BY4741 bulk TPM data in mid-log YPD[samples x genes])
# LARS_2019_COUNTS.tsv (TSV from GSE122392; Nadal-Ribelles 2019 BY4741 single-cell count data in mid-log YPD [cells x genes])
# inferelator/
# jackson_2019_figureXX.py (Python script to run the network inference with the inferelator v0.3.0 for the associated figure)
# network/
# signed_network.tsv (TSV signed [-1, 0, 1] network of regulatory relationships [genes x TFs])
# COND_signed_network.tsv (TSV signed [-1, 0, 1] network of regulatory relationships [genes x TFs] for each of 11 conditions)
# priors/
# Tchourine_gold_standard.tsv.gz (TSV with gold standard from Tchourine et al 2018)
# ATAC-motif_priors.tsv.gz (TSV with atac-motif priors from Castro et al 2019)
# YEASTRACT_priors_20181118.tsv.gz (TSV with YEASTRACT priors downloaded from YEASTRACT 11/18/2018)
# YEASTRACT_20190713_BOTH.tsv (TSV with YEASTRACT priors downloaded from YEASTRACT 07/13/2019)
# YEASTRACT_20190713_DNABINDING.tsv (TSV with YEASTRACT DNA-binding interaction data downloaded from YEASTRACT 07/13/2019)
# YEASTRACT_20190713_EXPRESSION.tsv (TSV with YEASTRACT expression change interaction data downloaded from YEASTRACT 07/13/2019)
# BUSSEMAKER_priors_2008.tsv.gz (TSV with priors from Ward & Bussemaker 2008)
Source code 1
- https://cdn.elifesciences.org/articles/51254/elife-51254-code1-v3.tar.gz
Source code 2
- https://cdn.elifesciences.org/articles/51254/elife-51254-code2-v3.tsv.gz
Source code 3
- https://cdn.elifesciences.org/articles/51254/elife-51254-code3-v3.tsv
Source code 4
- https://cdn.elifesciences.org/articles/51254/elife-51254-code4-v3.tar.gz
Source code 5
- https://cdn.elifesciences.org/articles/51254/elife-51254-code5-v3.zip
Supplementary file 1
- https://cdn.elifesciences.org/articles/51254/elife-51254-supp1-v3.xlsx
Transparent reporting form
- https://cdn.elifesciences.org/articles/51254/elife-51254-transrepform-v3.pdf
Friday, November 20, 2020
docker and jupyter notebook
https://mtetiresearch.com/how-to-use-docker-and-jupyter-notebook/
Thursday, November 19, 2020
Hsiao, Gilad, cell cycle phase in scRNA of human cells
human induced pluripotent stem cells.
1536 single cells were sequenced; 888 passed quality checks (filtering out broken cells, cells in mitosis, wells with more than one cell?). So, only 57% of cells passed QC.
Hsiao20 normalized scRNA to a standard normal distribution.
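A standard way to map expression values onto a standard normal distribution is the rank-based inverse normal transform; this is a generic sketch of that idea using only the Python standard library, not necessarily Hsiao et al.'s exact procedure.

```python
from statistics import NormalDist

def inverse_normal_transform(values):
    """Map values to standard-normal quantiles by rank;
    the 0.5 offset keeps the extremes finite."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()                       # standard normal
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]

z = inverse_normal_transform([5.0, 1.0, 3.0])
print([round(v, 3) for v in z])  # symmetric around 0, median maps to 0.0
```

After this transform every gene has the same marginal distribution, which removes scale differences before downstream modeling.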
The association between early career informal mentorship in academic collaborations and junior author performance
https://www.nature.com/articles/s41467-020-19723-8#MOESM1
microsoft academic graph
https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/?from=http://research.microsoft.com/mag
Wednesday, November 18, 2020
user admin in ubuntu
How to add new users to Lambda Ubuntu workstation
sudo useradd johndoe
sudo mkdir /home/johndoe
sudo chown johndoe:johndoe /home/johndoe
sudo passwd johndoe #set password
sudo usermod -a -G qinlab johndoe #add to a group
groups johndoe #check groups
sudo deluser johndoe qinlab #remove from a non-primary group
santolini and barabasi 2017 PNAS
predicting perturbation patterns from topology of biological networks
sensitivity matrix
The transform from ODE to Boolean network is done by taking the signs of the Jacobian matrix.
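That sign-of-Jacobian step can be sketched with finite differences: evaluate each partial derivative ∂f_i/∂x_j at a reference state and keep only its sign. A toy illustration of the idea, not Santolini & Barabási's code:

```python
def jacobian_signs(f, x, eps=1e-6):
    """Signs of the Jacobian of f at state x, by central differences.
    f maps a list of n floats to a list of n floats."""
    n = len(x)
    signs = [[0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n):
            d = (fp[i] - fm[i]) / (2 * eps)          # dfi/dxj
            signs[i][j] = (d > 1e-9) - (d < -1e-9)   # -1, 0, or +1
    return signs

# toy 2-species system: x0 activates x1, x1 represses x0, both decay
f = lambda x: [1.0 - x[1] - 0.1 * x[0], x[0] - 0.1 * x[1]]
print(jacobian_signs(f, [1.0, 1.0]))  # [[-1, -1], [1, -1]]
```

The resulting sign matrix is exactly the activation/repression wiring a Boolean or sign-based network model needs.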
network correlation method, barzel and biham 2009
Quantifying the connectivity of a network: the network correlation function method
Barzel and Biham 2009, Physical Review E
Barabási's group used this method later on.
genomic deep learning tutorial
Deep Learning in Genomics Primer (Tutorial)
https://colab.research.google.com/drive/160h26Egm0M0jguLg80zkolMjNkzyJUhr?usp=sharing
timeseries note, syed tareq
Tuesday, November 17, 2020
Pytorch versus tensorflow
Biggest difference: Static vs. dynamic computation graphs
Creating a static graph beforehand is unnecessary
Reverse-mode auto-diff implies a computation graph
PyTorch takes advantage of this => We use PyTorch
https://courses.cs.washington.edu/courses/cse446/18wi/sections/section7/446_pytorch_slides.pdf
tf.keras.layers.Dense()
tf.keras.layers.Dense(
units, activation=None, use_bias=True, kernel_initializer='glorot_uniform',
bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None,
activity_regularizer=None, kernel_constraint=None, bias_constraint=None,
**kwargs
)
Dense implements the operation: output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
units: dimension of the output space.
Output shape: N-D tensor with shape (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
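The operation behind Dense is just an affine map followed by an element-wise activation; a numpy sketch of the formula (illustrative, not Keras internals):

```python
import numpy as np

def dense(inputs, kernel, bias, activation=None):
    """output = activation(dot(input, kernel) + bias),
    mirroring tf.keras.layers.Dense with use_bias=True."""
    out = inputs @ kernel + bias
    return activation(out) if activation is not None else out

batch_size, input_dim, units = 8, 3, 5
x = np.random.rand(batch_size, input_dim)
kernel = np.random.rand(input_dim, units)  # the layer's weights matrix
bias = np.zeros(units)                     # the layer's bias vector
relu = lambda z: np.maximum(z, 0)
y = dense(x, kernel, bias, activation=relu)
print(y.shape)  # (8, 5): (batch_size, units), matching the docs
```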