Kate H. Gamble, Senior Editor
Twitter, one of the most popular social networking services, is used by millions to provide news updates, promote everything from blogs to movies, and stay connected to friends. But now, research suggests that it has the potential to serve a much bigger purpose.
In a new study published in the journal PLoS Computational Biology, Marcel Salathé, PhD, of Penn State University conducted a two-year analysis to determine how social media can affect the spread of a disease. He was able to identify immunization patterns and track attitudes toward vaccination by factors such as geographic area.
For the study, which is the first of its kind to examine how social media sites affect and reflect disease networks, Dr. Salathé tracked how the users’ attitudes correlated with vaccination rates and how microbloggers with the same negative or positive feelings seemed to influence others in their social circles.
He chose to focus on Twitter for two reasons. First, unlike most Facebook content, Twitter messages are considered public data, and anyone can “follow” the tweets of anyone else. Second, he believes that the 140-character limit makes Twitter an ideal database for learning about people’s sentiments.
Dr. Salathé began by amassing 477,768 tweets containing vaccination-related keywords and phrases, then tracked users’ sentiments about the new H1N1 vaccine. The collection process began in August 2009, when news of the new vaccine was first made public, and continued through January 2010.
In sorting through the vaccination-related tweets, Salathé set aside a random subset of about 10% and asked Penn State students to rate them as positive, negative, neutral, or irrelevant. For example, a tweet expressing a desire to get the H1N1 vaccine would be considered positive, while a tweet expressing the belief that the vaccine causes harm would be considered negative.
Shashank Khandelwal, a computer programmer and analyst at Penn State and co-author of the paper, used the ratings to design a computer algorithm that classified the remaining 90% of the tweets according to the sentiments they expressed.
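The article does not say which algorithm Khandelwal built. As a rough illustration of the general idea (learn from hand-rated tweets, then label the rest automatically), here is a minimal multinomial Naive Bayes classifier in Python. The choice of Naive Bayes, the simple tokenizer, and the class name `NaiveBayesTweetClassifier` are assumptions for illustration, not the study's actual method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical sketch: multinomial Naive Bayes with add-one smoothing.

    Assumption: this is NOT necessarily the algorithm used in the study.
    """

    def fit(self, tweets, labels):
        # Count how many training tweets carry each label (for the priors).
        self.class_counts = Counter(labels)
        # For each label, count how often each token appears.
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior for this label
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                # Laplace-smoothed log likelihood of the token given the label.
                num = self.word_counts[label][tok] + 1
                den = self.total_words.get(label, 0) + v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch, `fit` would receive the roughly 10% of tweets rated by students, and `predict` would then be called on each of the remaining tweets.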
Because Twitter users often include a location in their profiles, Dr. Salathé was able to categorize the expressed sentiments by US region. Using data from the CDC, he was also able to determine how vaccination attitudes correlated with CDC-estimated vaccination rates. These data revealed clear patterns. For example, users expressing the most positive sentiment were concentrated in New England, the region that also had the highest H1N1 vaccination rate.
“These results could be used strategically to develop public-health initiatives,” Salathé said in a statement. “For example, targeted campaigns could be designed according to which region needs more prevention education. Such data also could be used to predict how many doses of a vaccine will be required in a particular area.”
Dr. Salathé was also able to identify clusters of like-minded Twitter users. He found that users with either negative or positive sentiments about the H1N1 vaccine followed like-minded people. “The public-health message here is obvious,” he said. “If anti-vaccination communities cluster in real, geographical space, as well, then this is likely to lead to under-vaccinated communities that are at great risk of local outbreaks.”
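One simple way to quantify this kind of clustering is to measure how often a follow relationship connects two users with the same sentiment, and to compare that with what would be expected if following ignored sentiment. The sketch below is a hypothetical illustration of that idea. The function names and the random-mixing baseline (the sum of squared label shares, which assumes both ends of a follow are picked uniformly at random among users) are assumptions, not the paper's method.

```python
from collections import Counter


def same_sentiment_edge_fraction(follows, sentiment):
    """Fraction of follow edges whose two users share a sentiment label.

    follows: iterable of (follower, followed) pairs.
    sentiment: dict mapping user -> label; users without a label are skipped.
    """
    edges = [(a, b) for a, b in follows if a in sentiment and b in sentiment]
    if not edges:
        return float("nan")
    same = sum(sentiment[a] == sentiment[b] for a, b in edges)
    return same / len(edges)


def random_mixing_baseline(sentiment):
    """Expected same-label fraction if follows ignored sentiment.

    Assumption: both endpoints are drawn independently and uniformly
    from the labeled users, so the expectation is the sum of p_i squared.
    """
    counts = Counter(sentiment.values())
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())
```

If the observed fraction is well above the baseline, like-minded users follow each other more often than chance alone would predict.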
In addition to location-related and network patterns, Dr. Salathé was able to track sentiment patterns over time. For example, he found that negative expressions spiked during the time period when the vaccine was first announced. Later, more-positive sentiments emerged when the vaccine was first shipped across the United States. Salathé also tracked spikes of negative tweets that corresponded, not surprisingly, to periods of vaccine recall.
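The article does not explain how the spikes in negative tweets were identified. A common heuristic, shown here as a hypothetical sketch, flags any day whose count of negative tweets exceeds the mean of the preceding days by more than k standard deviations. The window length, the threshold `k`, and the function name `flag_spikes` are illustrative assumptions.

```python
from statistics import mean, pstdev


def flag_spikes(daily_counts, window=7, k=2.0):
    """Return indices of days that spike above the trailing window.

    Assumption: a day is a spike if its count exceeds the mean of the
    previous `window` days by more than k population standard deviations.
    If the window is perfectly flat (std 0), any increase is flagged.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        past = daily_counts[i - window:i]
        mu, sigma = mean(past), pstdev(past)
        if daily_counts[i] > mu + k * sigma:
            spikes.append(i)
    return spikes
```

Flagged days could then be compared against known events, such as the vaccine's announcement or a recall, to see whether negative sentiment rose around them.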
Salathé plans to apply his social-media analysis to other conditions, such as obesity, hypertension, and heart disease.
“Lifestyle choices might be 'picked up' in much the same way that pathogens—viruses or bacteria—are acquired,” he said. “The difference is simply that in the one instance the infectious agent is an idea rather than a biological entity.”