
Introduction

Influenza affects 5-15% of the world's population annually. Vaccination is the primary control measure for influenza, and the strains used in the vaccine need to be updated annually by the World Health Organization (WHO). To determine which strains will provide immunity, the WHO relies on the hemagglutination inhibition (HIA) assay, a serum-based measure of antigenic similarity. Because circulating influenza strains are now routinely sequenced quickly and inexpensively, such sequence data can be used to predict antigenic similarity between any pair of sequenced strains.

Table 1 Example of a set of hemagglutination inhibition assay (HIA) results

Binary feature representation: a feature takes the value "same" if the two residues are the same and "different" if they differ. Non-binary feature representation: a feature may take more than two values (e.g., a feature may take the value alanine_vs_glycine).

Functional group mapping representations: map each amino acid to a functional amino acid group.

Classification Methods
Hamming distance
Pepitope distance
Logistic regression
Support vector machines (SVMs)
Bayesian networks (BNs)
Naive Bayes (NB)
Lazy Bayesian rules (LBR)

Evaluation Metrics
We evaluated the algorithms and the feature representations using leave-one-out cross-validation and measured the area under the ROC curve (AUC). We compared algorithm performance using DeLong's one-sided tests.

Figure 1 Structure of the influenza hemagglutinin protein, which binds to human respiratory tract cells
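The pairwise feature representations and the Hamming distance baseline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of one-letter residue codes (so the non-binary value reads A_vs_G rather than alanine_vs_glycine), and the toy functional group mapping are hypothetical and are not taken from the poster.

```python
def pair_features(seq_a, seq_b, binary=True, mapping=None):
    """Encode an aligned pair of sequences as one feature per position.

    mapping (optional) replaces each residue by a group label before the
    comparison, which mimics a functional group mapping representation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if mapping is not None:
        seq_a = [mapping[r] for r in seq_a]
        seq_b = [mapping[r] for r in seq_b]
    features = []
    for a, b in zip(seq_a, seq_b):
        if binary:
            features.append("same" if a == b else "different")
        else:
            # Assumption: sort the pair so A-vs-G and G-vs-A share one value.
            features.append("_vs_".join(sorted((a, b))))
    return features


def hamming_distance(seq_a, seq_b):
    """Number of aligned positions at which the two sequences differ."""
    return pair_features(seq_a, seq_b).count("different")


# Toy mapping for illustration only; it covers just three residues.
TOY_GROUPS = {"A": "nonpolar", "G": "nonpolar", "D": "acidic"}
```

With a mapping supplied, two different residues from the same group (for example A and G above) compare as "same", which is the effect a functional group representation is meant to have.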


Feature Representations
Full sequence representation considers all amino acid residues, while the epitope sequence representation considers only the epitope regions of the HA1 subunit.
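As a rough sketch of the difference between the two representations, the code below restricts an aligned sequence to a set of epitope positions and also computes a Pepitope-style distance. The epitope positions are hypothetical toy values (the poster does not list the HA1 epitope sites), and the distance follows the commonly used definition, which the poster does not spell out: the fraction of differing residues in the epitope where that fraction is largest.

```python
# Hypothetical 0-based alignment positions for two toy epitopes.
# The real HA1 epitope site lists are NOT given in this poster.
TOY_EPITOPES = {"A": [1, 2, 5], "B": [7, 8]}


def epitope_subsequence(seq, epitopes):
    """Keep only the residues that fall at epitope positions."""
    positions = sorted(p for sites in epitopes.values() for p in sites)
    return "".join(seq[p] for p in positions)


def p_epitope(seq_a, seq_b, epitopes):
    """Largest per-epitope fraction of positions that differ (assumed definition)."""
    return max(
        sum(seq_a[p] != seq_b[p] for p in sites) / len(sites)
        for sites in epitopes.values()
    )
```

The full sequence representation corresponds to using every aligned position, so it needs no filtering step.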

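[Figure 2 plot, "Algorithm Performance": ROC curves with Sensitivity (0.0 to 1.0) on the vertical axis and Specificity (1.0 to 0.0) on the horizontal axis. Legend: Pepitope, Hamming Distance, Bayes Net, Naive Bayes, Logistic Regression, Support Vector Machine, Lazy Bayesian Rules]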

Results
Among the feature representations, epitope sequence performed significantly better than full sequence (p-value = 0.0003), binary features performed significantly better than non-binary features (p-value < 0.0001), and none of the functional group mappings resulted in better performance.

Table 2 Algorithm performance comparison using DeLong's one-sided tests.

Figure 2 ROC curves of leave-one-out cross-validation on epitope sequence using binary features and no mapping. AUCs: Pepitope 0.8155, Hamming 0.8929, BN 0.9336, NB 0.9335, LR 0.8438, SVM 0.8844, LBR 0.9441

Conclusion
These promising results support the use of sequence-based methods for determining antigenic similarity between candidate vaccine strains and potential epidemic strains. Such methods have the potential to reduce costs and improve the completeness of vaccine strain selection.

Methods
Data
HIA data and aligned hemagglutinin subunit 1 (HA1) protein sequence data (262 instances) for 62 unique strains of the H3N2 influenza subtype that circulated from 1971 to 2004 were obtained from Liao et al. HIA values were normalized using the Archetti-Horsfall measure and then dichotomized using a cutoff of 4.0 (i.e., > 4.0 was antigenically similar).

We compared feature representations and algorithms using Wilcoxon rank sum tests. As seen in Table 2 and Figure 2, among the classification methods, the Bayesian methods (BNs, NB, and LBR) performed significantly better than the other methods and overall LBR had significantly higher AUC compared to the other six methods.
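The evaluation procedure behind these AUC comparisons (leave-one-out cross-validation followed by the area under the ROC curve) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the naive Bayes scorer with add-one smoothing, the function names, and the 0/1 encoding (feature 0 = same, 1 = different; label 1 = antigenically similar) are assumptions made for the example. DeLong's test is not shown.

```python
import math


def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as one half. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nb_log_odds(train_x, train_y, x):
    """Log-odds of class 1 from naive Bayes on binary features (add-one smoothing)."""
    n1 = sum(train_y)
    n0 = len(train_y) - n1
    score = math.log((n1 + 1) / (n0 + 1))  # smoothed class prior ratio
    for j, value in enumerate(x):
        c1 = sum(1 for row, y in zip(train_x, train_y) if y == 1 and row[j] == value)
        c0 = sum(1 for row, y in zip(train_x, train_y) if y == 0 and row[j] == value)
        score += math.log((c1 + 1) / (n1 + 2)) - math.log((c0 + 1) / (n0 + 2))
    return score


def leave_one_out_scores(xs, ys, scorer):
    """Score each instance with a model trained on all the other instances."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(scorer(train_x, train_y, xs[i]))
    return scores
```

A typical call would be `auc(leave_one_out_scores(xs, ys, nb_log_odds), ys)`, where `xs` is a list of binary feature vectors for strain pairs and `ys` holds the dichotomized HIA labels. Any other scorer with the same three-argument signature can be swapped in to compare methods on the same held-out predictions.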

Acknowledgements
This research was supported by NLM grant HHSN276201000030C.


University of Pittsburgh

Department of Biomedical Informatics
