Tuesday, April 9, 2019

Duplicate gene identification:


https://media.nature.com/original/nature-assets/ng/journal/v36/n6/extref/ng1355-S2.pdf

Identification of duplicate genes and singletons

After database cleaning, we conducted an all-against-all FASTA3 self-search for the entire proteome of Drosophila melanogaster (http://www.ensembl.org/Drosophila_melanogaster/) and that of Saccharomyces cerevisiae (http://genome-www.stanford.edu/Saccharomyces/). A single copy gene (i.e., a singleton) was defined as a protein that did not hit any other proteins in the FASTA search with E = 0.1; this loose similarity search criterion was used to make sure that a singleton is indeed a singleton. Two genes were regarded as duplicate genes if they meet the following three criteria during FASTA all-against-all search (modified after Ref 4): (1) E = 10^{-10}; (2) their similarity is ≥ I (I = 30% if L ≥ 150 a.a. and I = 0.01n + 4.8L^{-0.32(1 + exp(-L/1000))} if L < 150 a.a., where n = 6 and L is the length of the alignable region); and (3) the length of the alignable region between the two sequences is >50% of the longer protein. Since we wanted to detect the differences in expression change between real duplicate genes and singletons, we
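
For my own reference, the criteria quoted above could be sketched in Python roughly as follows. This is only a minimal sketch, not the authors' code: the function names and arguments are hypothetical, the E values are treated as upper bounds (my assumption, since the text only writes "E ="), similarity is expressed as a fraction (0.30 for 30%), and parsing of the actual FASTA3 output is left out.

```python
import math

def similarity_threshold(aligned_len, n=6):
    """Cutoff I as a fraction, for an alignable region of length L (a.a.)."""
    if aligned_len >= 150:
        return 0.30
    # I = 0.01n + 4.8 * L^(-0.32 * (1 + exp(-L/1000))) for L < 150
    exponent = -0.32 * (1.0 + math.exp(-aligned_len / 1000.0))
    return 0.01 * n + 4.8 * aligned_len ** exponent

def is_duplicate_pair(evalue, similarity, aligned_len, len_a, len_b,
                      evalue_cutoff=1e-10):
    """Apply the three criteria to one hit between proteins a and b.

    Assumption: a hit passes criterion (1) when evalue <= evalue_cutoff.
    """
    passes_evalue = evalue <= evalue_cutoff
    passes_similarity = similarity >= similarity_threshold(aligned_len)
    # Criterion (3): alignable region covers >50% of the longer protein.
    passes_coverage = aligned_len > 0.5 * max(len_a, len_b)
    return passes_evalue and passes_similarity and passes_coverage

def is_singleton(evalues_to_other_proteins, evalue_cutoff=0.1):
    """A singleton has no hit to any other protein at the loose cutoff.

    Assumption: a hit counts when its E value is <= evalue_cutoff.
    """
    return all(e > evalue_cutoff for e in evalues_to_other_proteins)
```

Note that the short-alignment formula evaluates to roughly 0.30 near L = 150, so the two branches of the cutoff join almost continuously, and the required similarity rises as the alignable region gets shorter.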
