Showing posts with label GISAID. Show all posts

Wednesday, December 17, 2025

no entries in GISAID metadata.tsv have more than one VOC assignment

 (dpgr310) [hqin@ip-10-3-4-198 dpgr_build_training_data]$ git pull

Warning: Permanently added 'github.com,140.82.114.4' (ECDSA) to the list of known hosts.

remote: Enumerating objects: 6, done.

remote: Counting objects: 100% (6/6), done.

remote: Compressing objects: 100% (6/6), done.

remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)

Unpacking objects: 100% (6/6), 10.52 KiB | 73.00 KiB/s, done.

From github.com:QinLab/dpgr_build_training_data

   3637fc5..cda8662  main                                       -> origin/main

 * [new branch]      codex/check-for-entries-with-multiple-vocs -> origin/codex/check-for-entries-with-multiple-vocs

Updating 3637fc5..cda8662

Fast-forward

 README.md                      |   8 +++++++

 scripts/check_multiple_vocs.py | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 2 files changed, 175 insertions(+)

 create mode 100644 scripts/check_multiple_vocs.py

(dpgr310) [hqin@ip-10-3-4-198 dpgr_build_training_data]$ 

(dpgr310) [hqin@ip-10-3-4-198 dpgr_build_training_data]$ 

(dpgr310) [hqin@ip-10-3-4-198 dpgr_build_training_data]$ python  scripts/check_multiple_vocs.py 

Metadata path: metadata/metadata.tsv

Total rows scanned: 17499171

Rows containing multiple VOC labels: 0
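
The contents of scripts/check_multiple_vocs.py are not shown in this post. As a rough sketch of what such a check could look like, the snippet below streams a tab-separated metadata file and counts rows whose variant field names more than one VOC. The column name "Variant", the label list in VOC_LABELS, and the function name count_multi_voc_rows are all assumptions for illustration, not taken from the repository.

```python
import csv
import io

# Assumed VOC labels to search for; adjust to the labels used in the real file.
VOC_LABELS = ("Alpha", "Beta", "Gamma", "Delta", "Omicron")


def count_multi_voc_rows(handle, column="Variant"):
    """Return (total_rows, rows_with_more_than_one_voc_label).

    `handle` is any open text file of tab-separated values with a header row.
    `column` is the assumed name of the field that carries the VOC label.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    multi = 0
    for row in reader:
        total += 1
        value = row.get(column) or ""
        # Collect every distinct VOC label mentioned in this row's field.
        hits = [label for label in VOC_LABELS if label in value]
        if len(hits) > 1:
            multi += 1
    return total, multi


# Toy input standing in for metadata/metadata.tsv.
sample = io.StringIO(
    "Virus name\tVariant\n"
    "A\tVOC Omicron GRA\n"
    "B\t\n"
    "C\tVOC Alpha; VOC Delta\n"
)
total_rows, multi_rows = count_multi_voc_rows(sample)
print("Total rows scanned:", total_rows)
print("Rows containing multiple VOC labels:", multi_rows)
```

Reading row by row keeps memory use flat, which matters for a file with about 17.5 million rows.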

Sunday, October 2, 2022

GISAID and NCBI sars-cov-2 reference genomes

NCBI uses wuhan-hu-1

GISAID uses wuhan-hu-4 (hCoV-19/Wuhan/WIV04/2019, EPI_ISL_402124).

The two sequences only differ in the polyA tails. Qin verified this using MEGA alignment. 


So, wuhan-hu-1 and wuhan-hu-4 share the same gene coordinates, and the same GFF annotation applies to both.
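
The check above was done with a MEGA alignment. A quick programmatic cross-check, sketched here under the assumption that both genomes are already loaded as plain nucleotide strings, is to trim the trailing run of A's from each and compare what is left. The function names are hypothetical, and the toy strings stand in for the real genomes.

```python
def strip_poly_a(seq):
    """Upper-case the sequence and drop the trailing run of A's."""
    return seq.upper().rstrip("A")


def differ_only_in_poly_a(seq1, seq2):
    """True when the two sequences are identical apart from trailing A's."""
    return strip_poly_a(seq1) == strip_poly_a(seq2)


# Toy sequences: same body, different polyA tail lengths.
short_tail = "ACGTTGCAAAA"
long_tail = "ACGTTGCAAAAAAAAAAAA"
print(differ_only_in_poly_a(short_tail, long_tail))
```

Because the tail starts at the 3' end and the rest is unchanged, a match here also means every coordinate upstream of the tail is the same in both sequences.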



Sunday, July 17, 2022

GISAID alignment data

 

The MSA provided by GISAID in 2022 is limited to the reference sequence. So, a position in the alignment should match the same position in the reference genome.

GISAID reference wuhan-hu-4

hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124, full length 29891 nucleotides. 
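
If every row of the alignment has exactly the reference length, a 1-based genome position maps straight to a string index. The sketch below assumes that property holds (it checks the length before indexing) and uses a hypothetical helper name, base_at_reference_position.

```python
REFERENCE_LENGTH = 29891  # length of the GISAID reference stated above


def base_at_reference_position(aligned_seq, position,
                               reference_length=REFERENCE_LENGTH):
    """Return the character at a 1-based reference position of an aligned row.

    Assumes the alignment has no extra columns relative to the reference,
    so column i (0-based) corresponds to reference position i + 1.
    """
    if len(aligned_seq) != reference_length:
        raise ValueError(
            f"aligned row has length {len(aligned_seq)}, "
            f"expected {reference_length}"
        )
    if not 1 <= position <= reference_length:
        raise ValueError(f"position {position} is outside the reference")
    return aligned_seq[position - 1]


# Toy aligned row: all N except a G at reference position 23403.
toy_row = "N" * 23402 + "G" + "N" * (REFERENCE_LENGTH - 23403)
print(base_at_reference_position(toy_row, 23403))
```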



Monday, January 31, 2022

omicron, sars-cov-2 variants of concern

# Omicron GRA (B.1.1.529+BA.*)

• B.1.1.7 (Alpha): isolate C69.1, GISAID ID EPI_ISL_3277382; 
• B.1.351 (Beta): isolate C24.1, GISAID ID EPI_ISL_1123262; 
• B.1.617.2 (Delta): isolate SARS-CoV-2-hCoV-19/USA/NYMSHSPSP-PV29995/2021, GISAID ID EPI_ISL_2290769; 
• B.1.1.529 (Omicron): isolate E16.1, GISAID ID EPI_ISL_6902053


Ref: 

https://cdn.who.int/media/docs/default-source/blue-print/janine-kimpel_c19_whoconsulation_15dec2021.pdf?sfvrsn=2f6fc437_7

https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/

https://cov-lineages.org/lineage_list.html




Thursday, September 2, 2021

GISAID lineage information

From: 

https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/

Clade and lineage nomenclature aids in genomic epidemiology studies of active hCoV-19 viruses

Due to the naturally expanding genetic diversity of hCoV-19 viruses, GISAID introduced a nomenclature system for major clades, developed by Sebastian Maurer-Stroh et al, based on marker mutations within 8 high-level phylogenetic groupings from the early split of S and L, to the further evolution of L into V and G, and later of G into GH, GR and GV, and more recently GR into GRY.

GISAID clades are augmented with more detailed lineages assigned by the Phylogenetic Assignment of Named Global Outbreak LINeages (Pango lineage) tool, aiding in the understanding of patterns and determinants of the global spread of the pandemic strain causing COVID-19. A third effort uses a Year-Letter nomenclature to facilitate discussion of large-scale diversity patterns of hCoV-19 and label clades that persist for at least several months and have significant geographic spread. 



The list of the marker variants is as follows:

   S: C8782T,T28144C includes NS8-L84S
   L: C241,C3037,A23403,C8782,G11083,G26144,T28144 (early clade markers in WIV04-reference sequence)
   V: G11083T,G26144T includes NSP6-L37F + NS3-G251V
   G: C241T,C3037T,A23403G includes S-D614G
   GK: C241T,C3037T,A23403G,C22995A includes S-D614G + S-T478K
   GH: C241T,C3037T,A23403G,G25563T includes S-D614G + NS3-Q57H
   GR: C241T,C3037T,A23403G,G28882A includes S-D614G + N-G204R
   GV: C241T,C3037T,A23403G,C22227T includes S-D614G + S-A222V
   GRY: C241T,C3037T,21765-21770del,21991-21993del,A23063T,A23403G,G28882A includes S-H69del, S-V70del, S-Y144del, S-N501Y + S-D614G + N-G204R
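
The marker list above can be turned into a simple lookup. The sketch below is not GISAID's clade assignment procedure. It assumes the input is a sequence aligned to WIV04 coordinates (one character per reference position, with "-" for deleted bases) and picks the clade whose markers all match, preferring the clade with the most markers. The names CLADE_MARKERS, has_marker, assign_clade, and toy_sequence are hypothetical.

```python
import re

# Marker variants copied from the list above (nucleotide markers only).
CLADE_MARKERS = {
    "S": ["C8782T", "T28144C"],
    "L": ["C241", "C3037", "A23403", "C8782", "G11083", "G26144", "T28144"],
    "V": ["G11083T", "G26144T"],
    "G": ["C241T", "C3037T", "A23403G"],
    "GK": ["C241T", "C3037T", "A23403G", "C22995A"],
    "GH": ["C241T", "C3037T", "A23403G", "G25563T"],
    "GR": ["C241T", "C3037T", "A23403G", "G28882A"],
    "GV": ["C241T", "C3037T", "A23403G", "C22227T"],
    "GRY": ["C241T", "C3037T", "21765-21770del", "21991-21993del",
            "A23063T", "A23403G", "G28882A"],
}


def has_marker(aligned_seq, marker):
    """Check one marker against a sequence in reference coordinates."""
    deletion = re.fullmatch(r"(\d+)-(\d+)del", marker)
    if deletion:
        start, end = int(deletion.group(1)), int(deletion.group(2))
        # A deletion matches when every position in the range is a gap.
        return set(aligned_seq[start - 1:end]) == {"-"}
    sub = re.fullmatch(r"([ACGT])(\d+)([ACGT]?)", marker)
    if sub is None:
        raise ValueError(f"unrecognized marker: {marker}")
    ref, pos, alt = sub.group(1), int(sub.group(2)), sub.group(3)
    expected = alt or ref  # "C241" means the reference base is retained
    return pos <= len(aligned_seq) and aligned_seq[pos - 1].upper() == expected


def assign_clade(aligned_seq):
    """Return the matching clade with the most markers, or None."""
    best = None
    for clade, markers in CLADE_MARKERS.items():
        if all(has_marker(aligned_seq, m) for m in markers):
            if best is None or len(markers) > len(CLADE_MARKERS[best]):
                best = clade
    return best


def toy_sequence(bases, length=29891):
    """Build an all-N toy sequence with chosen bases at 1-based positions."""
    seq = ["N"] * length
    for pos, base in bases.items():
        seq[pos - 1] = base
    return "".join(seq)


gr_like = toy_sequence({241: "T", 3037: "T", 23403: "G", 28882: "A"})
print(assign_clade(gr_like))
```

Preferring the longest fully matched marker set is one way to keep a GR sequence from being reported as plain G. Ties, such as a sequence carrying both the GH and GR markers, fall to whichever clade comes first in the dictionary, which a real pipeline would need to handle explicitly.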