If you catch yourself typing “my head hurts” into Google, give yourself a little applause: You’re doing a public health service.
One Harvard team created an algorithm that combines Twitter posts, Google searches, and location data to predict COVID-19 flare-ups with 14 or more days' notice.
Another model from University College London centers on people’s loss of smell, one of the most distinctive COVID-19 symptoms. It watches for a jump in search phrases like “I can’t smell” and “lost my sense of smell.”
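To make "watching for a jump" concrete, here is a minimal sketch of one common way to flag a spike: compare each day's count with the average of the previous week. This is not the UCL model, whose internals the article doesn't describe. The function name `find_spikes`, the 7-day window, the threshold of 3 standard deviations, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count sits far above the trailing baseline."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]  # the previous `window` days
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any increase at all counts as a jump.
            if daily_counts[i] > mu:
                spikes.append(i)
            continue
        # z-score: how many standard deviations above the recent average?
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Made-up daily counts for a phrase like "I can't smell" (not real search data).
counts = [10, 12, 11, 9, 10, 13, 11, 12, 10, 48, 55]
print(find_spikes(counts))
```

One design wrinkle worth noting: once a spike day enters the trailing window, it inflates the baseline, so later high days can look less unusual. A real system would likely need a more robust baseline than this sketch uses.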
So how good are these things at predicting disease? The initial evidence seems promising. “I can’t smell” searches correlated strongly with the peak of infection in Spain, Brazil, Italy, and the UK.
We also already use disease-monitoring systems like HealthMap and ProMED, which scrape the web for a jump in certain keywords to find new diseases.
ProMED first got wind of COVID-19 at the end of December 2019 after an uptick in words like “SARS,” “shortness of breath,” and “diarrhea” in social media posts in China.
In 2008, Google researchers built Google Flu Trends, which analyzed the use of 45 search terms to predict flu outbreaks. News outlets dubbed it a failure after it missed significant case peaks over several years.
But as Alexis Madrigal pointed out, Google Flu Trends wasn’t such a disaster. When you combined Google data with the CDC’s own monitoring system, the combo was state of the art.
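Why would a combination beat either source alone? One intuition is that two imperfect estimates can err in opposite directions, so a weighted blend cancels some of the error. The sketch below illustrates that idea with a simple weighted average whose weight is tuned on past weeks. It is an assumption for illustration only, not the method used by Google, the CDC, or the researchers Madrigal discussed; the function names and all numbers are invented.

```python
def blend(search_est, cdc_est, weight):
    """Weighted average of two estimate series; weight applies to the search series."""
    return [weight * s + (1 - weight) * c for s, c in zip(search_est, cdc_est)]

def mean_abs_error(pred, actual):
    """Average absolute gap between predictions and what really happened."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def best_weight(search_est, cdc_est, actual, steps=20):
    """Try weights 0.0, 0.05, ..., 1.0 and keep the one with the lowest error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mean_abs_error(blend(search_est, cdc_est, w), actual),
    )

# Invented weekly flu-activity figures (arbitrary units), for illustration only.
search_est = [2.4, 3.1, 4.0, 5.2]  # hypothetical search-based estimate, runs high
cdc_est = [1.6, 2.1, 2.8, 3.6]     # hypothetical surveillance estimate, runs low
actual = [2.0, 2.6, 3.4, 4.4]      # hypothetical true values

w = best_weight(search_est, cdc_est, actual)
combined = blend(search_est, cdc_est, w)
print(w, mean_abs_error(combined, actual))
```

In this toy data, the blend lands closer to the true values than either source does by itself, which is the general reason pairing a fast but noisy signal with a slower but steadier one can pay off.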