Data-Driven Development: How Real-World Data and AI Are Transforming Clinical Trials


Real-world data and artificial intelligence (AI) are poised to revolutionize drug development by optimizing clinical trials and regulatory approvals, if data quality and patient privacy challenges are addressed.

Earlier this month, Dandelion Health announced the launch of the first artificial intelligence (AI) database specifically aimed at bolstering research for GLP-1 drugs, giving trial sponsors unprecedented insights into which indications to investigate next.1 The announcement from Dandelion is the latest example of a growing desire in health care analytics to use AI and machine learning (ML)–driven analysis of real-world data (RWD) to find new indications for approved drugs.

Beyond research and development for new indications and molecule structures, AI technology could be revolutionary for clinical trials, helping vastly reduce the cost of certain trial elements or even whole phases.2 A 2020 study found that a pivotal phase 3 trial costs an average of $19 million across therapeutic areas, whereas RWD and AI analyses range in price from $150,000 to just over $1 million.3,4

Sponsors’ idealism around the technology, however, is hitting critical roadblocks. RWD-backed AI models in clinical trials are only effective if trained on high-quality data, which is difficult to find while maintaining patient privacy. Still, with drug development costs soaring, solving the data shortage and making fuller use of RWD is imperative to building a more responsive and inclusive future for drug development.

The Revolutionary Potential of RWD and AI

Even though RWD has proven its worth in medical product development applications, such as trial design and patient population identification, the power of these large real-world datasets to establish clinical efficacy and drug safety has only become apparent in the last few years.5

With methods such as natural language processing (NLP) and ML, both subdomains of AI, researchers are now able to comb through large patient datasets to predict future outcomes and generate real-world evidence (RWE). If AI-generated RWE is proven to be clinically valid, it could provide more inclusive and generalizable measures of treatment effectiveness. This is because RWD, which pulls information from electronic health records, wearable devices, and more, can represent a far larger patient population than a traditional clinical trial.6

Early tests combining RWD and AI to evaluate the clinical significance of in-development drugs have realized entirely new, wide-ranging benefits. Researchers at Ohio State recently documented that the efficacy of an RWD-driven AI model was on par with a randomized clinical trial in determining effective treatment options.7 The findings could mark one of the most notable shifts in the drug development space in decades. If implemented properly, RWE could allow many more drug candidates to be tested before involving human patients, without ever having to worry about raising safety or efficacy concerns.

A Glimpse Into an RWE-Driven Future

A study from Deloitte found that companies that effectively harness RWD and AI in their drug development process could save up to 60% on drug development costs while bringing their drugs to market 30% faster.8 This more efficient and cost-effective future for drug development is already on display.

Last year, drug developer Exscientia became the first to realize these findings in a clinical setting with EXS4318, a clinical candidate identified and developed by AI technology and readied for in-human trials in under a year.9 Exscientia’s advances are far from the only pertinent innovations in the space: Roche is exploring how to leverage RWD in oncology, and Novartis is probing insights from patient data using pharma AI.10,11 In the coming years, examples like these will only continue to proliferate, but data quality issues may stand in the way of drug developers realizing their full potential.


Catching Data Quality Problems

Before pharmaceutical companies can fully implement AI-powered RWD models, they must first ensure they can isolate only the highest quality data for a study. If poor data is used to evaluate a drug’s efficacy, studies show the resulting analysis could be faulty.12 Even if the data is correct but certain values are missing, a correctly specified model can still struggle.13

However, a solution may be on the horizon. Machine learning technologies are growing more capable by the day, not only at analyzing large amounts of information but also at flagging variable outliers and imputing missing values. New algorithms can proactively identify variable outliers faster than a manual review, a capability that has the potential to revolutionize not only RWE but randomized clinical trials as well.14,15

ML will never eliminate data quality issues entirely, but it could help proactively identify problems in datasets before they are used for patient-centric studies, reducing the resources wasted on simulations that return flawed results.
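To make the two tasks described above concrete, the following is a minimal sketch of outlier flagging and missing-value imputation on a single variable. It is an illustration only: it uses a simple robust statistic (the modified z-score based on the median absolute deviation) and median imputation as stand-ins, not the ML algorithms discussed in the cited work. The function names, the 3.5 cutoff, and the blood pressure readings are all hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Missing entries (None) are skipped. The 3.5 cutoff is a common rule of
    thumb and is an assumption here, not a value taken from the article.
    """
    observed = [v for v in values if v is not None]
    med = median(observed)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in observed)
    if mad == 0:
        return []  # no measurable spread, so nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    ]

def impute_median(values, exclude=()):
    """Fill missing entries (None) with the median of the observed values.

    Indices listed in `exclude` (for example, flagged outliers) are left out
    of the median calculation but are otherwise returned unchanged.
    """
    skip = set(exclude)
    observed = [
        v for i, v in enumerate(values) if v is not None and i not in skip
    ]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical systolic blood pressure readings (mm Hg) with two missing
# values and one implausible entry.
systolic_bp = [118, 122, None, 130, 125, 410, 121, None, 119]

outliers = flag_outliers(systolic_bp)
cleaned = impute_median(systolic_bp, exclude=outliers)
```

In this sketch the flagged value is reported rather than silently corrected, so a reviewer can still decide whether it is a data entry error or a genuine reading.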

Unpacking the Regulatory Implications

Concurrently, regulatory bodies are looking to ensure a more equitable, transparent, and cost-effective pathway towards drug approval. Earlier this year, the EU announced the European Health Data Space, a first-of-its-kind initiative that seeks to simplify patient consent for data collection while enabling unprecedented visibility into how these data are subsequently used.16 If more countries follow suit, it could lead to even better outcomes for leveraging RWD in drug development.

In the United States, the FDA is beginning to investigate further incorporating RWD into the regulatory process. Although randomized clinical trials are still undoubtedly the best measure of a drug’s efficacy, the agency is increasingly open to using RWD to provide supplementary evidence. The agency recently published updated guidelines on how it will determine whether RWD is sufficient for use in certain regulatory decisions (such as those involving medical devices) and incorporated its plans for evaluating submissions that include or rely entirely on RWD into its fiscal year 2024 to 2027 Information Technology Strategy.5,17 Most promising of all, the agency also established the CDER Center for Clinical Trial Innovation, a new initiative dedicated to improving the efficiency of drug development through innovation.18

Converting Disparate Data Into Trusted Results

To date, an astounding amount of RWD has been collected (a recent study pegged the full size of health care data at a staggering 2314 exabytes).19 Analyzing pertinent sections of these data could give regulators an unprecedented depth of knowledge into a drug’s full potential, underscoring the importance of further investigation of and investment in AI-driven RWE for drug development.

As the methodological rigor applied to RWD and the acceptance of its results continue to grow, AI-driven analysis will turn a mountain of patient data into a treasure trove of new evidence for drug development.

About the Author

Mike Munsell, PhD, is director of research at Panalgo, where he manages the research agenda for scientific dissemination and software development in a variety of fields including health economics, data science/machine learning, and epidemiology. Before becoming director of research, Mike was a data scientist at Panalgo, working with engineering teams to prototype, code, and validate new machine learning models and features for Panalgo’s IHD Data Science platform. He has over 10 years of experience as a health economist and data scientist and is a published thought leader in the space.


  1. Beaney A. AI database to bolster research for GLP1-RAs as precision medicines. Clinical Trials Arena. Published May 14, 2024. Accessed June 12, 2024.
  2. AlphaFold. Google DeepMind. Accessed June 12, 2024.
  3. Cost of Clinical Trials For New Drug FDA Approval Are Fraction of Total Tab. Johns Hopkins Bloomberg School of Public Health. September 24, 2018. Accessed June 12, 2024.
  4. Dagenais S, Russo L, Madsen A, Webster J, Becnel L. Use of Real-World Evidence to Drive Drug Development Strategy and Inform Clinical Trial Design. Clin Pharmacol Ther. 2022;111:77-89. doi:10.1002/cpt.2480
  5. Califf A. Realizing the Promise of Real-World Evidence. FDA. December 21, 2023. Accessed June 12, 2024.
  6. Real-World Evidence. FDA. February 5, 2023. Accessed June 12, 2024.
  7. Compass. The Ohio State University. Accessed June 12, 2024.
  8. Overcoming generative AI implementation blind spots in health care. Deloitte Insights. Accessed June 12, 2024.
  9. Staff GEN. BMS Collaboration Paying Off for Exscientia. GEN - Genetic Engineering and Biotechnology News. February 7, 2023. Accessed June 12, 2024.
  10. Keenan J. Roche’s Flatiron expands oncology work to UK with collaboration deal. Fierce Biotech. June 21, 2023. Accessed June 12, 2024.
  11. AWS and Novartis: Re-inventing pharma manufacturing. Amazon Web Services. December 4, 2019. Accessed June 12, 2024.
  12. Kilkenny MF, Robinson KM. Data quality: “Garbage in – garbage out”. Health Information Management Journal. 2018;47(3):103-105. doi:10.1177/1833358318774357
  13. Rogers JR, Lee J, Zhou Z, Cheung YK, Hripcsak G, Weng C. Contemporary use of real-world data for clinical trial conduct in the United States: a scoping review. J Am Med Inform Assoc. 2021;28(1):144-154. doi:10.1093/jamia/ocaa224
  14. Khan J, Luqman S. Machine Learning Approaches for Data Quality and Integrity in Clinical Trials. 2023. Accessed June 12, 2024.
  15. Weissler EH, Naumann T, Andersson T, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials. 2021;22(537). doi:10.1186/s13063-021-05489-x
  16. European Commission. European Health Data Space. Accessed June 12, 2024.
  17. FDA. Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices. FDA. December 22, 2023. Accessed June 12, 2024.
  18. FDA establishes CDER Center for Clinical Trial Innovation (C3TI). FDA. 2024. Accessed June 12, 2024.
  19. How AI and Real-World Data (RWD) are Reshaping Pharma Medical Affairs - Eularis. September 25, 2023. Accessed June 12, 2024.