Data's going dark
The under-the-radar crackdown on the information that once held tech power to account.
As people who help whistleblowers (and workers who don’t want to become whistleblowers) bring information about the tech and AI industries to light, why are we writing about data? Well, because even as companies are battening down the hatches internally to make it harder for employees to speak out, there’s another kind of transparency that’s disappearing fast—the data transparency that once let the public see how the digital world really works and what effects algorithms and other influences are having in our social and political realms.
Major social platforms determine how information circulates in our public square, deciding which voices are amplified or buried in the scroll. For years, researchers, journalists, and watchdogs could study these dynamics through open data access, gaining real insight into how information and misinformation flowed online. That window is now closing.
Public data has fueled some big discoveries about our culture and politics, and led to some pretty earth-shattering (and important!) insights. During the 2016 U.S. election, open Facebook and Twitter data enabled researchers to analyze how the Trump and Clinton campaigns engaged voters. A report from the Berkman Klein Center examined media coverage during the 2016 presidential campaign and revealed how right-wing sources shaped mainstream press coverage, a full analysis made possible by leveraging Twitter data. The same kind of data later helped expose the scale of COVID-19 misinformation and shape policy debates on platform accountability. There are also other cool uses of non-API public data. Take the Authoritarian Stack project, for example, which used open datasets to study the spread of authoritarian movements.
A lot of the data behind these insights came through Application Programming Interfaces (APIs), tools that journalists and academics could use to study patterns of speech, virality, and influence. Originally designed to let developers build features on top of social platforms, APIs became a vital civic instrument, offering a rare glimpse into the mechanics of the public square and the trends emerging from it.
Then came the AI boom, and with it, a new data gold mine.
As data-hungry AI models and commercial brokers vacuum up the web, platforms realized their data is another opportunity to make money. Companies including X, Meta, Reddit, and TikTok have now sharply curtailed or paywalled access to APIs once freely used for research.
When Elon Musk bought Twitter, the company (now X) ended free API access and began charging thousands of dollars for data, effectively pricing out the academics and journalists who relied on this information for research. In 2024, Meta killed CrowdTangle, once the go-to tool for social media researchers monitoring misinformation, elections, and extremist networks. Its replacement, the Meta Content Library, has been widely criticized as less transparent and less complete. Reddit has imposed new limits, and TikTok now forces researchers into a highly restrictive "Virtual Compute Environment."
It’s a striking double standard. Platforms freely mine our user data for advertising, sell it to data brokers, and feed it into AI models—yet they now block civil society from openly accessing even aggregate information about how these systems function or what is happening on their platforms. We are left with an information black box that conceals how we connect, communicate, and consume online.
The stakes are getting higher, too. Researchers investigating hate speech or election manipulation now risk legal threats simply for studying “public” data. Meanwhile, the insiders who might sound the alarm from within face similar barriers and legal threats.
Tech and AI workers have become one of the last possible lines of accountability, but at enormous personal cost. Whistleblowers like Frances Haugen revealed internal documents that reshaped the public’s understanding of social media’s harms. Meta’s response? Tighten internal access further, creating even greater obstacles for internal staff to share critical information with the public.
If platforms define politics—as every election cycle keeps reminding us—then understanding those platforms requires meaningful, independent access to data.
Gatekeeping data and punishing those who reveal it may be a cash grab, but it also allows companies to cover up harms and trends that might not make their platforms look great. Without visibility into how the information ecosystem functions, it’s a lot harder to hold power to account. We need to demand both: external transparency and internal protections for the people who make it possible.