
How we're unwittingly letting robots censor the Web

YouTube is infamously circumspect about these sorts of things, so Mike Michaud can't prove his suspicions for certain. But the chief executive of the Web entertainment empire Channel Awesome is pretty sure that, earlier this year, his eight-year-old company was almost killed by algorithm.

Channel Awesome, which was founded in 2008, produces a number of podcasts and Web series about music, movies, TV shows and video games. The company sells DVDs and publishes on platforms like Vessel, but it makes most of its money through YouTube advertising.

On Jan. 5, Michaud received notice that YouTube would no longer pay Channel Awesome the ad revenue it relies on to pay studio rent, production costs and the salaries of its five employees. The reason, as explained via automated email, was a single copyright complaint by the Japanese animation house Studio Ghibli.

The complaint was clearly flawed, Michaud and his lawyer agreed: It disputed Channel Awesome's right to use short clips of Studio Ghibli animation in a review of a film, something generally regarded as fair use - an important exception to copyright law that lets people excerpt protected material for parody, commentary or criticism. If someone at YouTube just took a moment to review it, Michaud earnestly believed, they would see the error, too.

Michaud tried to file a copyright counterclaim, but the website was broken and returned repeat error messages. He sent emails and certified letters to YouTube but only got form letters back.

Michaud didn't realize it at the time, but he had just stumbled upon the open secret of the modern online copyright system: When it comes to matters of leaving content up or taking it down, algorithms frequently make the most crucial decisions. That can leave everyday users scrambling to save their work and business.

"I understand that YouTube isn't in the position to monitor every copyright complaint it gets," said Michaud, who - after a month without revenue - went public with his story. "But once they're considering an action that will hurt someone's livelihood, that should be monitored by a person."

Now more than ever before, however, that's simply not the case. In fact, a new report on the effectiveness of online takedown procedures finds that, particularly at large companies, decisions about whether content stays up or disappears are increasingly being made entirely by software - with room for serious error.

During a single six-month period, the report finds, the companies tracked by the Lumen copyright database received 30.1 million obviously questionable takedown requests that would have "benefit(ted) from human review." Of those, 8 million takedown requests targeted content that, like Michaud's, arguably constituted fair use.

The report is the culmination of two years of research and the first truly comprehensive review of online takedown procedures since the current policy regime came into effect in the late 1990s. It's the product of three discrete studies, two of them drawing largely from Google, as well as dozens of interviews with copyright-holders, takedown enforcement organizations and online platforms.

"The sites that get a lot of takedown requests have very sophisticated processes in place to handle them," said Joe Karaganis, the vice president of Columbia University's American Assembly and the co-author, with University of California at Berkeley's Jennifer Urban and Brianna Schofield, of the new report. "Still, millions of errors go through. ... And we should weigh the limitations on speech as the system spins further out of control."

No one, least of all Karaganis, Urban and Schofield, disputes that some degree of automation is necessary. Since the Digital Millennium Copyright Act was passed in 1998, the amount of content on the Internet has increased exponentially: There are no longer enough humans to manually process either the volume of copyright complaints or the amount of piracy.

Thanks to a recent proliferation of illegal (often foreign) streaming video and direct download sites, as well as the continued growth of peer-to-peer networks and increased Internet speeds, virtually every major movie, TV show, album, software program and e-book is available illegally as soon as it's released. A widely circulated 2013 report, which found that a quarter of all Internet bandwidth is devoted to piracy, underscored the enormous, superhuman scale of the pirate economy. In all of 2009, Google received 4,000 takedown notices; it now registers 19 million per week.

"In an automated world, you can't compete manually," said Nate Glass, the outspoken owner of the porn-focused copyright firm Takedown Piracy. "You could never keep up that way. It's crazy."

And yet, as people on both sides of the process adopt more and more sophisticated automation, it becomes less clear - both to users and concerned observers - whether a human ever reviews takedown requests before punitive action is taken.

On the complaint side, major rights-enforcement organizations (REOs), like Takedown Piracy, MarkMonitor and DMCA Force, employ search spiders to crawl thousands of Web pages, searching for keywords or movie stills that match their clients' intellectual property. Once compiled, the links funnel to a database where they can hypothetically be triaged and escalated by human moderators.
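The firms don't publish their spiders' code, but the workflow they describe is a familiar crawl-and-match loop. A minimal sketch in Python, with every name and the keyword-matching approach assumed for illustration (real spiders also match images such as movie stills):

```python
import re
import requests

# Hypothetical catalog of titles an enforcement firm is paid to protect.
CLIENT_TITLES = ["Example Movie", "Example Series"]

def scan_page(url):
    """Fetch one page and return the client titles it appears to mention."""
    html = requests.get(url, timeout=10).text
    return [title for title in CLIENT_TITLES
            if re.search(re.escape(title), html, re.IGNORECASE)]

def crawl(urls):
    """Funnel suspect links into a queue that moderators can later triage."""
    suspects = []
    for url in urls:
        matches = scan_page(url)
        if matches:
            suspects.append({"url": url, "matched_titles": matches})
    return suspects
```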

But the degree of human moderation varies: Even at the largest and best-reputed REOs, most decisions about takedown notices are defaulted to the algorithm, the new report finds.

A senior manager at one leading firm, who agreed to speak to The Washington Post on condition of anonymity because he has received death threats over his work, said that while his company stresses accuracy and fairness, it's impossible for seven employees to vet each of the 90,000 links their search spider finds each day. Instead, the algorithm classifies each link as questionable, probable or definite infringement, and humans only review the questionable ones before sending packets of takedown requests to social networks, search engines, file-hosting sites and other online platforms.
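In other words, the pipeline is a three-bucket classifier in which only the lowest-confidence bucket ever reaches a person. A rough sketch of that logic, with the confidence thresholds invented for illustration:

```python
def classify(score):
    """Bucket a spider hit by match confidence; thresholds are hypothetical."""
    if score >= 0.9:
        return "definite"
    if score >= 0.6:
        return "probable"
    return "questionable"

def triage(hits):
    """Route only 'questionable' hits to human reviewers; everything else
    goes straight into the outgoing packet of takedown requests."""
    takedown_packet, human_queue = [], []
    for hit in hits:
        if classify(hit["score"]) == "questionable":
            human_queue.append(hit)
        else:
            takedown_packet.append(hit)
    return takedown_packet, human_queue
```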

Meanwhile, on the platforms' end, copyright law requires sites to process each and every complaint they receive, and to do it "expeditiously" - or risk serious, and quite expensive, legal liability. Karaganis, Urban and Schofield find that the vast majority of all sites still use humans to do this sort of processing. But among large social networks, search engines and file-hosts - in other words, the ones that get the most complaints - it's increasingly common to rely on automated triage-and-escalation systems, similar to the ones used by third-party rights-enforcement organizations.

YouTube says, for instance, that it uses a mix of "algorithmic and human review" to assess takedown requests. (A spokesperson declined to say at what point in the process, and how often, humans step in.) Through a separate, proprietary system called Content ID, which goes above and beyond current copyright law, YouTube also automatically scans new uploads against a database of copyrighted material, which allows rights-holders to block or monetize matches.
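Content ID's internals are proprietary, but the general technique is well understood: fingerprint each new upload and compare it against fingerprints of registered works. A toy sketch using exact chunk hashes (a real system uses perceptual fingerprints that survive re-encoding, cropping and pitch-shifting):

```python
import hashlib

# Toy reference database mapping fingerprints to a rights-holder policy.
# Real matching uses perceptual audio/video fingerprints, not raw hashes.
REFERENCE_DB = {}

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def register_work(media: bytes, policy: str, chunk_size: int = 4096):
    """A rights-holder registers media with a policy: 'block' or 'monetize'."""
    for i in range(0, len(media), chunk_size):
        REFERENCE_DB[fingerprint(media[i:i + chunk_size])] = policy

def scan_upload(upload: bytes, chunk_size: int = 4096) -> set:
    """Return the set of policies triggered by any matching chunk."""
    return {REFERENCE_DB[fp]
            for fp in (fingerprint(upload[i:i + chunk_size])
                       for i in range(0, len(upload), chunk_size))
            if fp in REFERENCE_DB}
```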

YouTube's parent company, Google, also relies to some degree on algorithmic review, and processes takedown requests from certain "trusted" reporters with no human review at all. Glass, who belongs to Google's Trusted Copyright Removal Program, says that Google delists roughly 97 percent of the links that he reports within seconds.

Social networks tend to be more secretive about their copyright procedures: None of the major platforms agreed to discuss automation with The Post. But high-profile mistakes made in the past suggest that many of them also employ some sort of automated copyright system. In October, for instance, Instagram unfairly suspended the accounts of dozens of users in response to an overzealous copyright complaint by Janet Jackson's management; later, the company blamed the suspensions on a "bug," and declined to answer questions on whether human moderators had ever reviewed them.

Meanwhile, Twitter manually processes copyright complaints in the order that they're received, and does not take any action without human review. But according to the site's latest transparency report, takedown notices spiked 89 percent between July and December 2015 - which means that, even if Twitter manually vets complaints now, it might not always have the luxury.

"Bots are much more likely to make mistakes I think," said Ernesto Van der Sar, the editor-in-chief of the piracy news site TorrentFreak and a long-term observer of this space. "We have spotted dozens of errors that humans could have easily avoided. Some automated systems are set up pretty badly, allowing for very simple mistakes."

Despite the margin of error, most major players seem to be trending away from human review. The next frontier in the online copyright wars is automated filtering: Many rights-holders have pressed for tools that, like YouTube's Content ID, could automatically identify protected content and prevent it from ever being published. They've also pushed for "staydown" measures that would keep content from being reposted once it's been removed, a major complaint about the current system.

"The best way to minimize the cost of sending and responding to so many notices of infringement is to use automated techniques," wrote the Information Technology and Innovation Foundation, a tech policy think tank, in a recent statement to the Copyright Office on the notice-and-takedown issue. "In particular, online service providers can use automated filtering systems that check content as it is uploaded to stop a user from reposting infringing content."

That alarms both small platforms that can't afford filtering technology and Internet rights groups who fear the silencing of legitimate speech and creative activity. Just last September, they point out, a landmark copyright case affirmed that copyright-holders must consider fair use before sending takedown notifications. While it's unclear how that would work in practice, it doesn't seem consistent with filtering or other types of unmoderated automation.

"No computer program can make accurate calls on what's free speech and what's not," said Mitch Stoltz, a senior staff attorney at the Electronic Frontier Foundation. "You can't encode the Supreme Court into a computer program."

Michaud and his colleagues at Channel Awesome agree - and now, they're going on the offensive. On Feb. 16, one of Channel Awesome's flagship shows, "Nostalgia Critic," published a 20-minute rant about fair use and YouTube that's since been viewed more than 1.5 million times and earned a tweet from YouTube chief executive Susan Wojcicki. They plan to announce some kind of initiative in the near future that will further the cause of fair use and notice-and-takedown review online.

"Fair use creates content, and content creates jobs," said Michaud, who has had to delay hiring a sixth employee because of the company's January loss. "But nobody's watching. Nobody's looking out."

Until someone does watch, Michaud and Channel Awesome aren't taking any chances: They've voluntarily deleted all of their reviews of Studio Ghibli films, even though they think they have a valid fair-use defense. It's grating, Michaud acknowledges, but the company already lost a full month's revenue. Who knows how much damage another bad copyright claim would do.
