You might think you’re pretty good at making sure you don’t share your internet life with the entire world. You use Facebook’s strictest privacy settings, don’t share anything sensitive on Twitter, and you regularly trash your laptop’s browsing history. All good, right? Nope. All that “anonymized” data you leave behind out in the ether is still totally you, and it’s far easier than you think to make it paint your picture and yours alone.
Journalist Svea Eckert and data scientist Andreas Dewes, both from Germany, wanted to find out just how easy it was to acquire and identify your web browsing history. And so, as The Guardian reports, they did just that.
The pair presented their findings recently at the annual Def Con hacker conference in Las Vegas.
Acquiring the data
The two were easily able to acquire a database holding more than 3 billion visited web addresses. That data, in turn, comprised about 9 million unique sites visited by roughly 3 million users, all in Germany.
The data clearly separated the light users, those who visited only a few dozen sites over a 30-day span, from the heavy users, who had tens of thousands of data points sitting there to be examined.
Eckert and Dewes didn’t even have to pay for access to the data, they said. What they did do was create a fake marketing company: They launched a website for the company and a LinkedIn page for its fake CEO.
That fake marketing company claimed to have developed a machine learning algorithm that could improve marketers’ tactics… but only if it was trained with a large amount of data. (This is common: machine learning depends on finding and exploiting patterns, and you need a whole crapton of data points in order to identify those patterns in a meaningful way.) In short, they used the fake company to go begging. “We wrote and called nearly a hundred companies, and asked if we could have the raw data, the clickstream from people’s lives,” Eckert said. It took longer than they expected, but only because they were specifically targeting Germany. “We often heard: ‘Browsing data? That’s no problem. But we don’t have it for Germany, we only have it for the US and UK,’” she said.
Eventually, one data broker was willing to help them “test” their “data platform,” and parted with the data trove for free.