It's 2015—when we feel sick, fear disease, or have questions about our health, we turn first to the internet. According to the Pew Internet Project, 72 percent of US internet users look up health-related information online. But an astonishing number of the pages we visit to learn about private health concerns—confidentially, we assume—are tracking our queries, sending the sensitive data to third party corporations, even shipping the information directly to the same brokers who monitor our credit scores. It's happening for profit, for an "improved user experience," and because developers have flocked to "free" plugins and tools provided by data-vacuuming companies.
In April 2014, Tim Libert, a researcher at the University of Pennsylvania, custom-built software called webXray to analyze the top 50 search results for nearly 2,000 common diseases (over 80,000 pages total). He found the results startling: a full 91 percent of the pages made what are known as third-party requests to outside companies. That means when you search for "cold sores," for instance, and click the highly ranked "Cold Sores Topic Overview WebMD" link, the website is passing your request for information about the disease along to one or more (and often many, many more) other corporations.
According to Libert's research, which is published in Communications of the ACM, about 70 percent of the time, the data transmitted "contained information exposing specific conditions, treatments, and diseases." That, he says, is "potentially putting user privacy at risk." And it means you'll probably want to think twice before looking up medical information on the internet.
Here's what's happening in a bit greater detail: Let's say you make a search for "herpes." Plugging that query into a search engine will return a list of results. Chances are, whatever site you choose to click on next will send information not just to the server of the intended site—say, the Centers for Disease Control, which maintains the top search result from Google—but to companies that own the elements installed on the page. Here's why.
When you click that CDC link, you're making a so-called "first party request." That request goes to the CDC's servers, which return the HTML file for the page you're looking for. In this case, it's "Genital Herpes - CDC Factsheet," which is perhaps the page on the internet you'd least want anyone to know you're looking at. But because the CDC has installed Google Analytics to measure its traffic stats, and has, for some reason, included AddThis code which allows Facebook and Twitter sharing (raising the question of who socializes disease pages), your browser also sends a third party request to each of those companies. That request carries the address of the page you're reading (http://www.cdc.gov/std/herpes/STDFact-Herpes.htm) in its HTTP referrer string, which makes explicit to those third party corporations that your search was about herpes.
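To make the mechanics concrete, here is a minimal Python sketch of how a single page load fans out into third party requests. It is not webXray and not the CDC's actual markup: the sample HTML, the `example.com` and `example.net` hosts, and the names `ResourceCollector` and `third_party_requests` are all hypothetical, made up for illustration. It also assumes the browser sends the full page address as the referrer, as described above; a site's or browser's referrer policy can shorten what is actually sent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceCollector(HTMLParser):
    """Collect URLs of resources a browser would fetch automatically."""

    # Tag -> attribute holding a fetchable URL (a deliberately partial list).
    FETCHED = {"script": "src", "img": "src", "iframe": "src"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.FETCHED.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative and protocol-relative URLs against the page.
                self.resources.append(urljoin(self.page_url, value))


def third_party_requests(page_url, html):
    """Return the outside hosts contacted and the referrer each would see."""
    first_party = urlsplit(page_url).hostname
    collector = ResourceCollector(page_url)
    collector.feed(html)
    leaks = []
    for resource in collector.resources:
        host = urlsplit(resource).hostname
        # Simplification: any different hostname counts as third party.
        if host and host != first_party:
            leaks.append({"host": host, "referer": page_url})
    return leaks


# The page address from the article; the HTML below is a hypothetical stand-in.
PAGE_URL = "http://www.cdc.gov/std/herpes/STDFact-Herpes.htm"
SAMPLE_HTML = """
<html><body>
<img src="/images/logo.png">
<script src="//analytics.example.com/tracker.js"></script>
<script src="http://share.example.net/widget.js"></script>
</body></html>
"""

for leak in third_party_requests(PAGE_URL, SAMPLE_HTML):
    print(leak["host"], "would receive Referer:", leak["referer"])
```

The logo is served by the same host as the page, so it stays a first party request. The two scripts live elsewhere, so each of those hosts is told exactly which page was being read. A real audit tool would compare registered domains rather than exact hostnames and would watch what a live browser actually requests, which this sketch does not attempt.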
Thus, Libert has discovered that the vast majority of health sites, from the for-profit WebMD.com to the government-run CDC.gov, are loaded with tracking elements that send records of your health inquiries to web giants like Google, Facebook, and Pinterest, and to data brokers like Experian and Acxiom.
From there, it becomes relatively easy for the companies receiving the requests, many of which are collecting other kinds of data (in cookies, say) about your browsing as well, to identify you and your illness. That URL, or URI, which very clearly contains the disease being searched for, is broadcast to Google, Twitter, and Facebook, along with your computer's IP address and other identifying information.
"The underlying significance of the 91 percent figure is that this is utterly endemic across all types of sites," Libert told me, "this isn't just commercial sites who need to turn a profit, these are organizations you trust: the government, non-profits, universities."