Unlocking the Potential of Machine Learning and Large Language Models in Oncology

Alana Hippensteele

A strength of using machine learning (ML) in oncology is its potential to extract data from unstructured documents, explained Will Shapiro, vice president of Data Science at Flatiron Health, during a session at the Association of Cancer Care Centers (ACCC) Annual Meeting & Cancer Center Business Summit (AMCCBS) in Washington, DC. According to Shapiro, the ML team at Flatiron Health is focused on applying this approach to oncology data and literature.

“There's a ton of really rich information that's only in unstructured documents,” Shapiro said during the session. “We build models to extract things like metastatic status or diagnosis state, which are often not captured in any kind of regular structured way.”

Shapiro explained further that more recently, his ML team has started working with large language models (LLMs). He noted this space has significant potential within health care.

“[At Flatiron Health] we built out a tool at the point of care that matches practice-authored regimens to NCCN guidelines,” Shapiro said. “That's something that we're really excited about.”

Notably, Shapiro explained that his background is in fact not in health care, as he worked for many years at Spotify, where he built personalized recommendation engines using artificial intelligence (AI) and ML.

“I really got excited about machine learning and AI in the context of building personalized recommendation engines [at Spotify],” Shapiro explained. “While personalizing music for a place like Spotify is radically different from personalizing medicine, I think there's actually some core things that really connect them, and I believe strongly that ML and AI have a key role to play in making truly personalized medicine a reality.”

Shapiro noted that terminology can pose challenges for health care professionals, because each term carries a wealth of knowledge built on decades of research and thousands of dissertations. Terms such as LLM, natural language processing (NLP), generative AI, AI, and ML each represent a large body of work that has shaped our understanding of their potential today. Shapiro added that this collection of terms is distinct from workflow automation, a term from the same field that is often grouped with them. The difference, he said, is that there are currently well-known ways to evaluate quality for workflow automation.

“With something like generative AI—which is, I think, one of the most hyped things out in the world right now—it's so new that there really aren't ways that we can think about quality,” Shapiro said. “That's why I think it's really important to get educated and understand what's going on [around these terms].”

According to Shapiro, a lot of these terms get used interchangeably, which can lead to additional confusion.

“I think that there's a good reason for that, which is that there's a lot of overlap,” Shapiro said. “The same algorithm can be a deep learning algorithm and an NLP algorithm, and a lot of the applications are also the same.”

Shapiro noted that one way of structuring these terms is to think of AI as a very broad category that encompasses ML, deep learning, and generative AI as nested subcategories. NLP, however, does not fit neatly inside that hierarchy.

“There is an enormous amount of overlap between NLP and AI. A lot of the major advances in ML and AI stemmed from questions from NLP. But then there are also parts of NLP that are really distinct. [For example,] rules-based methods of parsing text are not something that I will think about with AI, and I will caveat this by saying that this is contentious,” Shapiro said. “If you google this, there will be 20 different ways that people try to structure this. My guidance is to not get too bogged down in the labels, but really try to focus on what the algorithm is or the product is that you're trying to understand.”
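To make the distinction concrete, the following is a minimal sketch of what a rules-based method of parsing text can look like. It is not Flatiron Health's approach; the patterns, the function name `mentions_metastasis`, and the sample notes are all hypothetical and far too simple for clinical use. The point is only that the logic is written by hand rather than learned from data.

```python
import re

# Hypothetical hand-written rules (assumptions for illustration only).
METASTATIC = re.compile(r"\bmetast(?:atic|asis|ases)\b", re.IGNORECASE)
NEGATION = re.compile(
    r"\b(?:no evidence of|negative for|without|not|no)\b", re.IGNORECASE
)


def mentions_metastasis(note: str) -> bool:
    """Return True if any sentence has a metastatic term with no negation cue before it."""
    # Split the note into sentences at terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        match = METASTATIC.search(sentence)
        # Only look for negation in the text preceding the matched term.
        if match and not NEGATION.search(sentence[: match.start()]):
            return True
    return False


print(mentions_metastasis("CT shows metastatic lesions in the liver."))  # True
print(mentions_metastasis("No evidence of metastasis on imaging."))      # False
```

Rules like these are transparent and easy to audit, but they miss phrasing the author did not anticipate, which is one reason learned models are attractive for this kind of extraction.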

According to Shapiro, one reason that oncologists should care about these terms is that ChatGPT, the most famous LLM in use today, is used by 1 in 10 doctors in their practice, according to a survey conducted over the summer of 2023. Shapiro noted that by the time of the presentation at the ACCC AMCCBS meeting in February 2024, that number had likely increased.

LLMs are, as the name suggests, a type of language model. According to Shapiro, the technical definition of a language model is a probability distribution over a sequence of words.

“So, basically, given a chunk of text, what is the probability that any word will follow the chunk that you're looking at,” Shapiro said. “LLMs are essentially language models that are trained on the internet, so they're enormous.”
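The definition above can be illustrated with a toy model that conditions only on the single previous word (a bigram model). This is a sketch under strong simplifying assumptions: the three-sentence corpus is invented, and the names `follow_counts` and `next_word_distribution` are hypothetical. Real LLMs use neural networks and far longer contexts, but the output is the same kind of object: a probability for each candidate next word.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = (
    "we finish each other's sentences . "
    "we finish each other's sentences . "
    "we finish each other's sandwiches ."
).split()

# Count how often each word follows each preceding word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_word_distribution(prev: str) -> dict[str, float]:
    """Estimate P(next word | previous word) from the observed counts."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


dist = next_word_distribution("other's")
print(dist)
```

In this corpus, "sentences" follows "other's" two times out of three, so the model assigns it a probability of about 0.67 and gives "sandwiches" the remaining 0.33.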

According to Shapiro, language models can also be used to generate text. For instance, given the example “My best friend and I are so close, we finish each other's ___,” it is not difficult for humans to fill in the blank with the appropriate word, which in this case would be “sentences.” Shapiro explained that this is very much how language models work.

“Probabilistically, ‘sentence’ is the missing word [in that example], which is very much at the core of what's happening with a language model,” Shapiro said. “In fact, autocomplete, which you probably don't even think about as you see it every day, is generative AI that's an example [of a language model], and it's one of the motivating examples of generative AI.”
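A rough sketch of autocomplete as text generation is shown here: repeatedly append the most probable next word given the last word typed. The training text and the helper names `train_bigrams` and `autocomplete` are assumptions made for illustration, and production systems use much richer models than word-pair counts.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def autocomplete(table: dict, prompt: str, max_words: int = 3) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = table.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Invented training text, for illustration only.
table = train_bigrams(
    "thank you for your time . thank you for your help . thank you for your time ."
)
print(autocomplete(table, "thank you"))  # thank you for your time
```

Always picking the single most likely word is the simplest possible strategy; sampling from the distribution instead would produce more varied completions.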

To be clear on definitions, Shapiro noted that generative AI refers to AI models that generate new content. Specifically, the “GPT” in ChatGPT (which is both an LLM and generative AI) stands for generative pre-trained transformer. According to Shapiro, pre-trained models can be understood as having foundational knowledge, in contrast to other kinds of models that do just one task.

“I mentioned my team works on building models that will extract metastatic status from documents, and that's all they do,” Shapiro said. “In contrast, pre-trained models can do a lot of different kinds of things. They can classify the sentiment of reviews, they can flag abusive messages, and they probably are going to write the next 10 Harry Potter novels. They can extract adverse events from charts, and they can also do things that extract metastatic status. So, that's a big part of the appeal—one model can do a lot of different things.”

However, this versatility can come with a trade-off in quality. Shapiro explained that his team at Flatiron Health has found this to be true in their work.

“What we've found at Flatiron Health is that generally, purpose-built models can be much better at actually predicting or doing one task. But one thing that's become really exciting, and kind of gets into the weeds of LLMs, is this concept of taking a pre-trained model and fine-tuning it on labeled examples, which is a way to really increase the performance of a pre-trained model.”

Further, the “T” in ChatGPT stands for “transformer,” which is a type of deep learning architecture that was developed at Google in 2017, explained Shapiro. It was originally described in a paper called “Attention Is All You Need.”

“Transformers are actually kind of simple,” Shapiro said. “If you read about the history of deep learning, model architectures tended to get more and more complex, and the transformer actually stripped away a fair amount of this complexity. But what's been really game changing is how big they are, as they're trained on the internet. So things like Wikipedia, Reddit, these huge corpuses of text. [They] have billions of [parameters], and they're really, really expensive to train.”

Yet their size is what has led to the incredible breakthroughs in performance and benchmarks that have caused quite a bit of buzz recently, explained Shapiro. This buzz and attention make it more important to become educated about what these models are and how they work, especially in areas such as health care.

3 Key Takeaways

Large language models (LLMs) like ChatGPT have the potential to be valuable tools in oncology. They can be used to extract data from unstructured documents, summarize visit notes, predict patient response to treatment, and discover new drug targets.
There are challenges associated with using LLMs, such as hallucinations and bias. It is important to be aware of these challenges and to take steps to mitigate them, such as using high-quality data and carefully validating the models.
Healthcare professionals need to become more educated about AI and ML. This will help them to understand the potential benefits and risks of these technologies, and to use them safely and effectively.

“With 10% of doctors using ChatGPT, it is something that everyone really needs to get educated about pretty quickly. I also just think there are so many exciting ways that ML and AI have a role to play in the future of oncology,” Shapiro said.

Shapiro explained further that these models offer the potential in oncology to conduct research drawn from enormous patient populations, which can be made available at scale. Additionally, there is the potential to summarize visit notes from audio recordings, to predict patient response to a treatment, and to discover new drug targets.

“There are huge opportunities in ML and AI, but there are also a lot of challenges and a lot of open questions. When you see someone like Sam Altman, the CEO of OpenAI, going to Congress and asking it to be regulated, you know that there's something to pay attention to,” Shapiro said. “That's because there's some real problems.”

Such problems include hallucinations, in which models invent answers. Shapiro explained that what makes hallucinations by AI models even more pernicious is that they come from a place of technological authority.

“There's an inherent inclination to trust them,” Shapiro said. “There's a lot of traditional considerations for any type of ML or AI algorithm around whether they are biased, whether they are perpetuating inequity, and whether data shifts affect their quality. For this reason, I think it's more important than ever to really think closely about how you're validating the quality of models. High quality ground truth data, I think, is essential for using any of these types of ML or AI algorithms.”
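One common way to validate a model against ground truth data is to compare its predictions with expert-assigned labels and report precision and recall. The sketch below assumes a binary task such as flagging metastatic status; the function name `precision_recall` and the six example labels are invented for illustration and are not drawn from the presentation.

```python
def precision_recall(predicted: list[bool], truth: list[bool]) -> tuple[float, float]:
    """Compare model predictions with ground truth labels for a binary task."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    true_pos = sum(1 for p, t in pairs if p and t)
    false_pos = sum(1 for p, t in pairs if p and not t)
    false_neg = sum(1 for p, t in pairs if not p and t)
    # Precision: of the cases the model flagged, how many were correct.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the truly positive cases, how many the model found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels: True means the chart was marked metastatic.
truth = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

precision, recall = precision_recall(predicted, truth)
print(round(precision, 2), round(recall, 2))  # 0.67 0.67
```

Both numbers are only as trustworthy as the labels behind them, which is why the quality of the ground truth data matters so much. Repeating the comparison on newer data or on patient subgroups is one way to look for the data shifts and bias Shapiro describes.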

Shapiro W. Deep Dive 6. Artificial and Business Intelligence Technology. Presented at: ACCC AMCCBS; February 28-March 1, 2024; Washington, DC.
