The Plain View

Last weekend I attended an event called Social Science Foo Camp, an “unconference” where attendees spontaneously schedule discussion sessions to create a lively agenda. The venue was Facebook’s headquarters in Menlo Park, California. One of the more interesting sessions I attended concerned a project called Social Science One.

Social Science One is an effort to get the Holy Grail of data sets into the hands of private researchers. That Holy Grail is Facebook data. Yep, that same unthinkably massive trove that brought us Cambridge Analytica.

In the Foo Camp session, Stanford Law School’s Nate Persily, cohead of Social Science One, said that after 20 months of negotiations, Facebook was finally releasing the data to researchers. (The researchers had thought all of that would be settled in two months.) A Facebook data scientist who worked on the team dedicated to this project beamed in confirmation. Indeed, the official announcement came a few days later.
It’s an unprecedented drop, involving a data set of 10 trillion numbers. The information centers on URLs shared by Facebook’s billions of users—specifically, the 38 million of these that were shared more than 100 times on Facebook between January 1, 2017, and July 31, 2019. Researchers can isolate URLs by characteristics like whether they were fact-checked or flagged as hate speech, and they can see (in the aggregate) who viewed, liked, and shared them, and even whether people shared the links without viewing them. “This dataset enables social scientists to study some of the most important questions of our time about the effects of social media on democracy and elections with information to which they have never before had access,” reads the Social Science One press release.

The reason it took so long is that Facebook, quite understandably, wanted to protect the privacy of its users. Simply aggregating the information so that no individual’s activity can be identified wasn’t enough for Facebook, which insisted on also encoding the data via a technique called differential privacy. It’s a great way to protect privacy, but because it works by adding digital noise to the data set to prevent exposure of individuals, it limits what research can be done. The Social Science One people think Facebook is being excessively cautious. “But I didn’t just get a $5 billion fine from the FTC,” acknowledges Persily, referring to the penalty assessed on Facebook last summer for its privacy sins.
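To make the "digital noise" idea concrete, here is a minimal sketch of the textbook Laplace mechanism, the standard way differential privacy protects a count. This is an illustration of the general technique only; the function name, epsilon value, and share counts below are hypothetical, not Facebook's actual parameters or pipeline.

```python
import random

def dp_count(true_count, epsilon):
    """Return a differentially private version of a count.

    For a counting query (e.g., how many users shared a URL), one person
    joining or leaving changes the count by at most 1, so its sensitivity
    is 1. Adding Laplace(0, 1/epsilon) noise then gives
    epsilon-differential privacy: smaller epsilon means more noise and
    stronger privacy, but less precise research results.
    """
    scale = 1.0 / epsilon  # sensitivity (1) divided by epsilon
    # The difference of two independent Exponential(rate=1/scale) draws
    # is exactly a Laplace(0, scale) sample.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical example: a URL shared 1,200 times, released with epsilon = 0.5
noisy_shares = dp_count(true_count=1200, epsilon=0.5)
```

The trade-off Persily describes falls directly out of the `epsilon` parameter: the stricter Facebook sets the privacy budget, the noisier every released number becomes, and the fewer fine-grained questions researchers can reliably answer.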