HA amino acid, attention model
https://pmc.ncbi.nlm.nih.gov/articles/PMC12721039/
2.1. Data collection
The data used in this study comprise viral sequences, amino acid properties, and serological assay results. Hemagglutinin (HA) sequences of A/H3N2 influenza viruses were retrieved from the GISAID database (https://gisaid.org) and the NCBI Influenza Virus Resource (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database). Amino acid indices and substitution matrices were obtained from the AAindex database (https://www.genome.jp/aaindex/): physicochemical property indices from AAindex1 and substitution matrices from AAindex2. To reduce redundancy among AAindex1 entries, a correlation-based filter was applied that retained a representative subset of indices whose pairwise Pearson correlation coefficients were all below 0.6. From AAindex2, asymmetric substitution matrices were excluded so that each amino acid pair has a single, order-independent score. The selected indices and matrices were then z-score normalized to place them on a common scale.
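The AAindex preprocessing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the greedy, first-come selection order, the comparison of the absolute correlation |r| with the cutoff, pooling all matrix entries for z-scoring, and the helper names `filter_correlated`, `is_symmetric` and `zscore` are all hypothetical. The source specifies only the 0.6 cutoff, the exclusion of asymmetric matrices, and z-score normalization.

```python
import numpy as np


def filter_correlated(indices: dict[str, np.ndarray], threshold: float = 0.6) -> list[str]:
    """Greedily keep indices whose |Pearson r| with every kept index is < threshold.

    Assumptions (not stated in the source): indices are visited in insertion
    order, and the absolute correlation is compared with the cutoff.
    """
    kept: list[str] = []
    for name, values in indices.items():
        if all(abs(np.corrcoef(values, indices[k])[0, 1]) < threshold for k in kept):
            kept.append(name)
    return kept


def is_symmetric(matrix: np.ndarray) -> bool:
    """True if a substitution matrix is square and equal to its transpose."""
    return matrix.shape[0] == matrix.shape[1] and np.allclose(matrix, matrix.T)


def zscore(values: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and population standard deviation 1.

    For a matrix this pools all entries (an assumption).
    """
    return (values - values.mean()) / values.std()


# Usage with made-up values over 20 residues (not real AAindex entries):
a = np.arange(20, dtype=float)           # toy index
b = 2.0 * a + 1.0                        # perfectly correlated with a, so dropped
c = (np.arange(20) % 2).astype(float)    # weakly correlated with a (r ≈ 0.09)
print(filter_correlated({"a": a, "b": b, "c": c}))        # ['a', 'c']
print(is_symmetric(np.array([[1.0, 2.0], [2.0, 1.0]])))   # True
print(is_symmetric(np.array([[1.0, 2.0], [3.0, 1.0]])))   # False
```

A greedy pass makes the retained set depend on the order in which indices are visited; sorting the entries by some priority first (for example, by how often an index is cited) would be one way to make "representative" explicit, but the source does not say which rule was used.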
The serological data come from two sources. The first is the classic study by Smith et al. (2004) (https://www.antigenic-cartography.org/Science-2004/), which includes 4,228 valid hemagglutination inhibition (HI) titers measured between 1968 and 2003. The second was compiled from the annual and interim reports of the Worldwide Influenza Centre (WIC) (https://www.crick.ac.uk/research/platforms-and-facilities/worldwide-influenza-centre/annual-and-interim-reports), covering 15,077 valid HI titers from 2003 to 2025. Antigenic maps for 1968–2003 and 2003–2025 were constructed with Racmacs (Wilks, 2024), and pairwise antigenic distances were then derived from each map. This yielded two datasets containing 73,441 and 788,544 pairwise distances, respectively.
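Once a map has been fitted, the antigenic distance between two points is the Euclidean distance between their map coordinates; in antigenic cartography one map unit corresponds to a twofold difference in HI titer (Smith et al., 2004). The sketch below assumes the coordinates have already been exported from Racmacs as an n × d array and computes all pairwise distances, then flattens them into records. The function names and the `include_self` switch are hypothetical. Both reported totals happen to be perfect squares (271² = 73,441 and 888² = 788,544), which would be consistent with enumerating all n × n ordered pairs of 271 and 888 map points, but the section does not say how pairs were counted, so treat this only as an arithmetic observation.

```python
import numpy as np


def pairwise_map_distances(coords: np.ndarray) -> np.ndarray:
    """Return the n x n Euclidean distance matrix for map coordinates of shape (n, d)."""
    diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def to_pair_records(dist: np.ndarray, include_self: bool = True) -> list[tuple[int, int, float]]:
    """Flatten a distance matrix into (i, j, distance) records.

    include_self=True yields all n * n ordered pairs (diagonal included);
    include_self=False yields only the n * (n - 1) / 2 pairs with i < j.
    Which convention the study used is not stated (assumption).
    """
    n = dist.shape[0]
    if include_self:
        return [(i, j, float(dist[i, j])) for i in range(n) for j in range(n)]
    return [(i, j, float(dist[i, j])) for i in range(n) for j in range(i + 1, n)]


# Usage with made-up 2D coordinates (in antigenic units):
coords = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
dist = pairwise_map_distances(coords)
print(dist[0, 1])                                       # 5.0
print(len(to_pair_records(dist)))                       # 9
print(len(to_pair_records(dist, include_self=False)))   # 3
```

Broadcasting builds the full n × n matrix in one step, which is cheap for maps of a few hundred to a thousand points; for much larger maps, `scipy.spatial.distance.pdist` returns only the condensed upper triangle and avoids storing the redundant half.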