
Integrating Microbiome and Genomics With AI Uncovers New Patterns in Early-Onset Colorectal Cancer
Enrique Velazquez Villarreal, MD, PhD, used a conversational AI platform to unify microbiome, genomic, clinical, and social determinants of health data for colorectal cancer research.
Pharmacy Times® interviewed Enrique Velazquez Villarreal, MD, PhD, on his team’s use of a conversational AI platform developed at City of Hope to integrate microbiome, genomic, clinical, and social determinants of health data into a unified framework for colorectal cancer research.
He explained how this multi-domain approach enables real-time cohort stratification and reveals that early-onset colorectal cancer is biologically distinct from late-onset disease, with unique microbial and genetic signatures that may support future noninvasive detection and precision prevention strategies.
Pharmacy Times: Can you walk us through how you integrated AI into the analysis of gut microbiome data, and what specific machine learning or deep learning approaches were most effective for distinguishing early- versus late-onset colorectal cancer?
Enrique Velazquez Villarreal, MD, PhD: We use our conversational AI platform, AI-HOPE, developed in my lab here at City of Hope, to integrate microbiome, genomic, clinical, and social determinants of health data into a unified analytical framework. Rather than relying on a single machine learning model, we use AI to interactively query and refine patterns across data sets, essentially enabling dynamic cohort stratification and hypothesis testing in real time.
For distinguishing early- versus late-onset colorectal cancer, the most effective approach was this multi-domain integration. Analyzing microbiome features alongside genomic alterations and clinical variables allowed us to identify patterns that would not emerge from microbiome data alone.
Villarreal: We focus on populations at increased risk based on factors such as younger age at diagnosis, genetic ancestry, particularly Hispanic/Latino populations from the LA area, and clinical and social determinants of health. To address bias, we use a harmonized framework that integrates data from multiple sources, including our NIH Cancer Moonshot network and public data sets like AACR Project GENIE. Importantly, the AI platform allows us to stratify analyses by ancestry, which we divide into 5 superpopulations.
Following the 1000 Genomes Project, we divide ancestry, or genomic similarity, into 5 groups. One is AMR [Admixed American], meaning ancestry that comes from this continent, and the others are the European, African, South Asian, and East Asian components. Our conversational tool also gives us the ability to include social determinants of health variables. So rather than averaging across a population, we stratify, which helps to preserve biologically and socially meaningful differences.
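The difference between pooling and stratifying can be shown with a minimal sketch. It is not AI-HOPE code: the `stratified_means` helper, the field names, and the values are hypothetical and invented for illustration. Only the five superpopulation codes come from the 1000 Genomes Project.

```python
from collections import defaultdict
from statistics import mean

# 1000 Genomes Project superpopulation codes
SUPERPOPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")

def stratified_means(samples, group_key, value_key):
    """Return the mean of value_key within each group_key stratum."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[group_key]].append(sample[value_key])
    return {group: mean(values) for group, values in groups.items()}

# Hypothetical records; the numbers are made up, not study data
samples = [
    {"ancestry": "AMR", "diversity": 2.0},
    {"ancestry": "AMR", "diversity": 3.0},
    {"ancestry": "EUR", "diversity": 4.0},
    {"ancestry": "EUR", "diversity": 5.0},
]

pooled = mean(s["diversity"] for s in samples)
by_group = stratified_means(samples, "ancestry", "diversity")

print(pooled)    # 3.5
print(by_group)  # {'AMR': 2.5, 'EUR': 4.5}
```

In this toy data the pooled mean of 3.5 describes neither group, while the per-group means keep the difference visible.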
Pharmacy Times: Social determinants of health are often difficult to quantify—how did you operationalize variables like socioeconomic status, diet, or access to care within your AI model, and how much predictive weight did they carry relative to genomic or microbiome features?
Villarreal: We operationalized social determinants of health (SDOH) using measurable variables such as education level, body mass index, and other available clinical and demographic proxies. While these are not perfect representations of complex social factors, they provide a starting point for integration into computational models.
In terms of contribution, SDOH variables did not act independently; they interacted with microbiome and genomic features. In several analyses, they helped to explain variability in microbial composition, suggesting that social context plays a meaningful role alongside biology.
Pharmacy Times: Early-onset colorectal cancer is a growing concern — did your AI model identify distinct microbial signatures or genomic patterns that differentiate it mechanistically or clinically from late-onset disease?
Villarreal: One of the key findings was that early-onset colorectal cancer is associated with a less diverse microbiome, along with distinct compositional changes, such as an enrichment of taxa like Prevotella. More importantly, these microbial patterns aligned with specific genomic alterations, including mutations in key colorectal cancer genes like APC, TP53, and KRAS. This suggests that early-onset disease is not just a younger version of late-onset cancer but may represent a biologically distinct entity with different underlying mechanisms.
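The interview does not say which diversity metric the team used. One common way to quantify a "less diverse" microbiome is the Shannon index, sketched below as an assumption. The `shannon_index` helper and the two example communities are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical read counts per taxon; not study data
even_community = [25, 25, 25, 25]   # four taxa, equally abundant
skewed_community = [97, 1, 1, 1]    # one taxon dominates

print(shannon_index(even_community) > shannon_index(skewed_community))  # True
```

A community dominated by one taxon scores lower than an evenly spread one, which is the sense in which a sample would be called less diverse under this metric.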
Pharmacy Times: Gut microbiome data is highly variable across collection methods, sequencing platforms, and cohorts—how did you handle batch effects and ensure the reproducibility and generalizability of your findings?
Villarreal: This is a critical issue in microbiome research. We address it by focusing on relative abundance patterns that are consistent across cohorts, rather than relying on a single cohort’s signals. Additionally, integrating microbiome data with genomic and clinical features helps us to identify robust, consistent biological signals that persist across data sets. The AI framework also allows interactive validation, essentially testing whether observed patterns hold under different stratifications.
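As a rough illustration of the idea, and not the team's actual pipeline, the sketch below converts raw counts to relative abundances and then checks whether an early-versus-late difference points the same way in every cohort. The helper names, the placeholder taxon `taxon_a`, and all counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-taxon counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {taxon: n / total for taxon, n in counts.items()}

def same_direction(differences):
    """True if every cohort shows a difference with the same sign."""
    if not differences:
        return False
    return all(d > 0 for d in differences) or all(d < 0 for d in differences)

# Hypothetical counts from two cohorts with very different sequencing depth
cohorts = {
    "cohort_1": {
        "early": {"taxon_a": 30, "other": 70},
        "late": {"taxon_a": 10, "other": 90},
    },
    "cohort_2": {
        "early": {"taxon_a": 400, "other": 600},
        "late": {"taxon_a": 200, "other": 800},
    },
}

# Early-minus-late difference in relative abundance, one value per cohort
differences = [
    relative_abundance(groups["early"])["taxon_a"]
    - relative_abundance(groups["late"])["taxon_a"]
    for groups in cohorts.values()
]

print(same_direction(differences))  # True
```

Working with fractions rather than raw counts keeps the two cohorts comparable despite their different depths, and the sign check is a deliberately simple stand-in for a formal cross-cohort test.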
Pharmacy Times: What are the translational implications of this work — do you envision your AI-integrated framework being used as a screening or risk-stratification tool in clinical practice, and what validation steps would be needed before deployment?
Villarreal: The long-term goal is to develop AI-driven tools for early detection and risk stratification. Our findings suggest that microbiome signatures, especially when combined with genomic and clinical data, could serve as noninvasive biomarkers for identifying individuals at high risk for early-onset colorectal cancer. Before clinical implementation, we need larger prospective validation cohorts, standardization of microbiome data collection and analysis, and development of predictive models tested in real-world clinical settings.
Ultimately, this approach moves us toward precision prevention, where we can identify at-risk individuals earlier and intervene more effectively. I will say that by integrating microbiome science with highly autonomous artificial intelligence, developed from the ground up by our team of physician-scientists, genomic experts, and computational specialists at the Velazquez Villarreal lab, and real-world patient data here at City of Hope, we are beginning to uncover why colorectal cancer is rising in younger populations while advancing our primary mission of developing early cancer detection tools to improve prevention and patient outcomes.