This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Tuesday, April 9, 2019
Duplicate gene identification:
https://media.nature.com/original/nature-assets/ng/journal/v36/n6/extref/ng1355-S2.pdf
Identification of duplicate genes and singletons

After database cleaning, we conducted an all-against-all FASTA3 self-search for the entire proteome of Drosophila melanogaster (http://www.ensembl.org/Drosophila_melanogaster/) and that of Saccharomyces cerevisiae (http://genome-www.stanford.edu/Saccharomyces/). A single copy gene (i.e., a singleton) was defined as a protein that did not hit any other proteins in the FASTA search with E = 0.1; this loose similarity search criterion was used to make sure that a singleton is indeed a singleton. Two genes were regarded as duplicate genes if they meet the following three criteria during FASTA all-against-all search (modified after Ref 4):
(1) E = 10^-10;
(2) their similarity is ≥ I, where I = 30% if L ≥ 150 a.a. and I = 0.01n + 4.8L^(-0.32(1 + exp(-L/1000))) if L < 150 a.a., with n = 6 and L the length of the alignable region; and
(3) the length of the alignable region between the two sequences is >50% of the longer protein.

Since we wanted to detect the differences in expression change between real duplicate genes and singletons, we
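For my own notes, here is a minimal Python sketch of the three duplicate criteria and the singleton rule quoted above. It is not the authors' code. The function names, the tuple layout of the hits, and the choice to treat both E values as upper cutoffs (E at or below the stated value) are my assumptions, and similarity is assumed to be a fraction between 0 and 1 rather than a percentage.

```python
import math


def identity_threshold(L, n=6):
    """Minimum fractional similarity I for an alignable region of L residues.

    I = 0.30 when L >= 150; otherwise the length-dependent formula quoted above.
    """
    if L >= 150:
        return 0.30
    return 0.01 * n + 4.8 * L ** (-0.32 * (1 + math.exp(-L / 1000)))


def is_duplicate_pair(evalue, similarity, align_len, len_a, len_b, e_cutoff=1e-10):
    """Apply the three criteria to one FASTA hit between proteins a and b.

    Assumption: criterion (1) is read as evalue <= e_cutoff.
    """
    longer = max(len_a, len_b)
    return (
        evalue <= e_cutoff                                  # criterion (1)
        and similarity >= identity_threshold(align_len)     # criterion (2)
        and align_len > 0.5 * longer                        # criterion (3)
    )


def find_singletons(protein_ids, hits, e_cutoff=0.1):
    """Return proteins with no hit to any *other* protein at the loose cutoff.

    hits is a hypothetical list of (query_id, subject_id, evalue) tuples
    parsed from the all-against-all search output.
    """
    has_hit = set()
    for query, subject, evalue in hits:
        if query != subject and evalue <= e_cutoff:
            has_hit.add(query)
            has_hit.add(subject)
    return [p for p in protein_ids if p not in has_hit]
```

The threshold function is continuous enough to sanity-check: at L = 150 the formula gives roughly 0.30, which matches the fixed 30% used for longer alignments.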
Labels:
protocol