Predictive Model for Cervical Cancer Improved With Inclusion of Human Papillomavirus Genotypes


Adding human papillomavirus genotypes to the predictive model improved accuracy far more than adding epidemiological factors and pelvic examination results.

The inclusion of human papillomavirus (HPV) genotypes in a diagnostic prediction model for cervical cancer among women who test positive for high-risk HPV (hrHPV) infection markedly improved the predictive ability of the model, according to the results of a study published in JAMA Network Open.

Image credit: Naeblys -

Cervical cancer is one of the most common cancers in women worldwide and poses a serious threat to their quality of life. Early detection can reduce mortality, but it can be especially difficult to achieve in developing countries, the researchers noted.

Prior studies have constructed prediction models for cervical cancer based on clinical information, but their sample sizes were small. Further, hrHPV is recognized as an etiologic agent for cervical cancer, and different HPV genotypes are associated with varying risks of cervical cancer.

Therefore, the authors of the current study aimed to develop and validate a stacking machine learning model for predicting cervical cancer among women who tested positive for hrHPV infection by incorporating HPV genotypes and commonly available clinical information.

The primary outcome of the study was cervical intraepithelial neoplasia grade 3 or worse (CIN3+), and the secondary outcome was cervical intraepithelial neoplasia grade 2 or worse (CIN2+). For each outcome, the models’ ability to discriminate between women with and without the diagnosis was evaluated using the area under the receiver operating characteristic curve (AUROC), along with other performance measures such as sensitivity, specificity, and likelihood ratios.
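For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen woman with the diagnosis receives a higher predicted risk than a randomly chosen woman without it. The following is a minimal Python sketch of that definition, not the study authors' code; the function name `auroc` and the toy labels and scores are invented purely for illustration and are not study data.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for a case (e.g., CIN3+), 0 for a non-case.
    scores: predicted risk for each person; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy example (NOT study data): 2 cases, 3 non-cases.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print(f"Toy AUROC: {toy_auc:.2f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. This pairwise version is quadratic in the number of participants, so it is meant only to illustrate the definition.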

A total of 314,587 women participated in cervical cancer screening, of whom 24,391 (7.8%) were infected with hrHPV. After excluding those who dropped out, 21,720 women (89.0%) were included in the analysis.

The training data set included 14,553 women, of whom 349 (2.4%) received a diagnosis of CIN3+ and 673 (4.6%) received a diagnosis of CIN2+. The validation data set included 7167 women, with 167 (2.3%) identified as having CIN3+ and 228 (3.2%) having CIN2+.

Across the prediction models evaluated, those that included only epidemiological factors and pelvic examination results had AUROC values of around 0.64 for predicting CIN3+. Adding HPV genotypes raised the AUROC to 0.87, a relative improvement of 35.9% ([0.87-0.64]/0.64).

In contrast, adding epidemiological factors and pelvic examination results to a model that already included HPV genotypes improved prediction only modestly, with the AUROC rising from 0.85 (95% CI, 0.82-0.88) to 0.87 (95% CI, 0.84-0.90).
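The two comparisons above can be put on the same scale by computing the relative change in AUROC for each. The short Python sketch below does this with the AUROC values quoted in the article; the helper name `relative_improvement` is illustrative, and the 0.64 baseline is the approximate figure reported, so the results are indicative only.

```python
def relative_improvement(baseline, improved):
    """Relative change from baseline to improved, as a percentage."""
    return (improved - baseline) / baseline * 100


# Adding HPV genotypes to the clinical-only model (AUROC ~0.64 -> 0.87).
genotype_gain = relative_improvement(0.64, 0.87)

# Adding clinical factors to the genotype-only model (AUROC 0.85 -> 0.87).
clinical_gain = relative_improvement(0.85, 0.87)

print(f"Gain from adding HPV genotypes:   {genotype_gain:.1f}%")
print(f"Gain from adding clinical factors: {clinical_gain:.1f}%")
```

The first figure reproduces the 35.9% quoted above. The second is derived here from the reported AUROC values rather than taken from the study.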

With all predictors included, the stacking machine learning model for CIN3+ had an AUROC of 0.87 (95% CI, 0.84-0.90), with sensitivity of 80.1%, specificity of 83.4%, positive likelihood ratio of 4.83, and negative likelihood ratio of 0.24. For the secondary outcome, including HPV genotypes in the model for predicting CIN2+ improved the AUROC by 41.7% ([0.85-0.60]/0.60).

For CIN2+ as well, the stacking model that included all predictors performed best, with an AUROC of 0.85 (95% CI, 0.82-0.88), sensitivity of 80.4%, specificity of 81.0%, positive likelihood ratio of 4.23, and negative likelihood ratio of 0.24.
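The likelihood ratios reported above follow from the sensitivity and specificity figures: the positive likelihood ratio is sensitivity divided by (1 - specificity), and the negative likelihood ratio is (1 - sensitivity) divided by specificity. The Python sketch below applies these standard formulas to the two sets of values quoted in the article; the function and variable names are illustrative, and the pairing of the second set with CIN2+ is inferred from the matching AUROC of 0.85.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity.

    Both inputs are proportions between 0 and 1.
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Values quoted in the article, converted from percentages to proportions.
cin3_lr_pos, cin3_lr_neg = likelihood_ratios(0.801, 0.834)  # CIN3+ model
cin2_lr_pos, cin2_lr_neg = likelihood_ratios(0.804, 0.810)  # CIN2+ model (inferred)

print(f"CIN3+: LR+ = {cin3_lr_pos:.2f}, LR- = {cin3_lr_neg:.2f}")
print(f"CIN2+: LR+ = {cin2_lr_pos:.2f}, LR- = {cin2_lr_neg:.2f}")
```

Rounded to 2 decimal places, the results match the reported likelihood ratios (4.83 and 0.24; 4.23 and 0.24).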

The investigators noted that a predictive model incorporating HPV genotypes could strengthen patients’ risk awareness and help physicians identify women at high risk of cervical cancer and recommend immediate further screening, which may increase compliance.

In addition, the developed prediction model may become a practical tool for cervical cancer screening in low-resource settings, especially areas in which cytological and colposcopic examinations are unavailable, the study authors wrote.

The investigators noted several limitations. First, the predictors were obtained through a self-reported questionnaire, which could introduce reporting bias and recall bias. Second, factors that can potentially influence cervical cancer risk, such as smoking and oral contraceptive use, were not collected. Finally, the screening program did not include women younger than 30 years, so the model will need to be validated in younger women in future studies.

“Including HPV genotypes in the model markedly improved the prediction ability, suggesting that this prediction model may be an important auxiliary tool in screening for and early diagnosis of cervical cancer in low-resource settings when cytological and colposcopic examination results are unavailable,” the study authors concluded.


Xiao T, Wang C, Yang M, et al. Use of virus genotypes in machine learning diagnostic prediction models for cervical cancer in women with high-risk human papillomavirus infection. JAMA Netw Open. 2023;6(8):e2326890. doi:10.1001/jamanetworkopen.2023.26890

© 2024 MJH Life Sciences

All rights reserved.