US court states that scraping, even when against ToS, is legal

Last month a US court heard a case in which four professors and a media organization sued the US Government. The District Court for the District of Columbia concluded that moderate scraping, even when against ToS, is legal.

A district court in Washington, D.C. has ruled that using automated tools to access publicly available information on the open web is not a computer crime — even when a website bans automated access in its terms of service (document). The court ruled that the notoriously vague and outdated Computer Fraud and Abuse Act (CFAA) — a 1986 statute meant to target malicious computer break-ins — does not make it a crime to access information in a manner that the website doesn’t like if you are otherwise entitled to access that same information.

Before going through the fine details of the court’s opinion, here is the conclusion drawn from the case.
If you would rather skip the legal fine points, jump straight to the Conclusion section at the end.

In a separate case, a US court of appeals affirmed that a data analytics company may lawfully scrape public profile information from a data aggregator (LinkedIn). The court protected the data extractor’s right to gather openly presented business directory information in bulk.

Some details of the court’s opinion

Private property

Private property, the Court determined, does not “lose its private character merely because the public is generally invited to use it for designated purposes.” Why, then, would it violate the First Amendment* to arrest those who engage in expressive activity on a privately owned website against the owner’s wishes?

*The First Amendment prohibits laws:

respecting an establishment of religion
prohibiting the free exercise of religion
abridging the freedom of speech
abridging the freedom of the press (which also bears on data access on the internet)
abridging the right of the people peaceably to assemble, and to petition the government for a redress of grievances

The court compared the internet as a whole to a public space:

Stroll out onto the National Mall [web space, namely the public Internet] on any day with decent weather and you will discover a phalanx of food trucks lining the streets. Those food trucks are privately owned businesses [private websites]. Customers [web surfers, web bots driven by human-developed scripts] interact with them for the private purpose of buying a meal. If they were a brick-and-mortar store on private property, they would encounter no First Amendment barrier to removing a patron who created a ruckus [or another fraud]. Yet if a customer standing on a public sidewalk tastes her food and then yells at those in line behind her that they should avail themselves of the myriad other culinary options nearby, the truck could not call the police to arrest her for her comments. She is in a public forum, and her speech remains protected even when she interacts with a private business located within that forum.

Limitation on website content access (mass extraction, for example):

But simply placing contractual conditions on accounts that anyone can create, as social media and many other sites do, does not remove a website from the First Amendment protections of the public Internet.

Three categories of First Amendment-protected activity of an Internet user

In plain terms, the court found it plausible that a web user is protected by the First Amendment (namely freedom of the press and of speech) when conducting web searches, extracting data and re-publishing it.

Plaintiffs [web surfer/web client] assert that their conduct falls within three categories of First Amendment-protected activity. Scraping data from their target websites, they allege, is subject to the First Amendment right to record or preserve information. Moreover, employing bots and sock puppets and creating false user accounts constitute harmless false speech. And their planned post-research activities are protected by the right to publish. All of these claims are sufficiently plausible to conclude that plaintiffs’ proposed conduct is “arguably affected with a constitutional interest.”

1st category (freedom to scrape regardless of [website] restrictions)

First, scraping plausibly falls within the ambit of the First Amendment. “[T]he First Amendment goes beyond protection of the press and the self-expression of individuals to prohibit government from limiting the stock of information from which members of the public may draw.” The Supreme Court has made a number of recent statements that give full First Amendment application to the gathering and creation of information. Additionally, six courts of appeals have found that individuals have a First Amendment right to record at least some matters of public interest, in order to preserve and disseminate ideas.

That plaintiffs wish to scrape data from websites rather than manually record information does not change the analysis. Scraping is merely a technological advance that makes information collection easier; it is not meaningfully different from using a tape recorder instead of taking written notes, or using the panorama function on a smartphone instead of taking a series of photos from different positions. And, as already discussed, the information plaintiffs seek is located in a public forum. Hence, plaintiffs’ attempts to record the contents of public websites for research purposes are arguably affected with a First Amendment interest.

2nd category (freedom of user agent spoofing, false accounts and proxy usage)

Second, plaintiffs have a First Amendment interest in harmlessly misrepresenting their identities to target websites. The complaint alleges that plaintiffs’ research requires them to create false employer and job-seeker profiles on employment websites, and to use sock puppets to make it appear to a number of housing and employment sites that multiple people are accessing the information they have made available. Because “some false statements are inevitable if there is to be an open and vigorous expression of views in public and private conversation,” and because “[t]he Government has not demonstrated that false statements generally should constitute a new category of unprotected speech,” false claims that are not “made to effect a fraud or secure moneys or other valuable considerations” fall within First Amendment protection. Plaintiffs allege that their conduct will cause minimal, if any, harm to the targeted websites, and that they will take steps to avoid affecting third-party users of the website (such as informing job seekers that their fake positions are fake). Thus, plaintiffs’ harmless false or misleading speech to website owners is arguably affected with a constitutional interest.

3rd category (freedom to republish [publicly available info] regardless of ToS)

Third, plaintiffs contend that they have the right, and the desire, to publish the results of their research, and that some sites’ ToS prohibit them from doing so without prior permission or else employ anti-disparagement clauses. The Supreme Court has made very clear that the right to publish falls within the core of the First Amendment’s protections (“As a general matter, ‘state action to punish the publication of truthful information seldom can satisfy constitutional standards.’”). Applying criminal sanctions for publishing original material that uses publicly available information, or for making negative statements about a website, triggers First Amendment scrutiny.

Access publicly available info only

Plaintiffs are not asking the Court to force private websites to provide them with information that others cannot get. Instead, they seek only to prevent the government from prosecuting them for obtaining or using information that the general public can access — though they wish to do so in a manner that could have private consequences, such as a website banning them or deleting their accounts.

Credible Threat of Prosecution

Even if plaintiffs’ intended conduct is arguably affected with a constitutional interest, and is prohibited by the CFAA, plaintiffs still must show that there is a credible threat of prosecution for that conduct under the statute. The government asserts that plaintiffs cannot meet this test, because “plaintiffs make no allegation that the government has threatened them with CFAA enforcement,” plaintiffs “cite no instances in which the government has enforced the challenged provision for harmless [ToS] violations,” and DOJ “has expressly stated that it has no intention of prosecuting harmless [ToS] violations that are not in furtherance of other criminal activity or tortious conduct.” The government is, for the most part, correct on the facts. The complaint does not allege that plaintiffs have actually been threatened with prosecution. The two cases plaintiffs cite to show that prosecutors have used the Access Provision to punish ToS violations did, in fact, involve harmful conduct. And DOJ’s guidance to federal prosecutors does discourage them, though somewhat tepidly, from bringing CFAA cases based solely on harmless ToS violations.

Conclusion

Now, web users who surf the web or explore it with automated tools (scraping agents, crawlers) can feel more secure. Proper use of the extracted info is not against the law. Creating false user accounts while searching and registering on the web falls within the legal realm, as long as it is harmless. Post-research activities are protected by the right to publish and the freedom of speech.

Things you are allowed to do:

use automated tools, e.g. scrapers, to gather info
use false user accounts to get access to website info
re-publish gathered public information

Things not to do:

cause harm to third-party web users (e.g. by posting spam comments)
cause harm to the target site’s functionality (e.g. by saturating its bandwidth)
criminal activity (e.g. reselling or republishing proprietary information)
tortious conduct (e.g. using the extracted info in a misleading or harmful way)
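
One practical way to avoid harming a target site's functionality is to space out requests to it. Below is a minimal Python sketch of that idea. The class name `PoliteThrottle`, its parameters and the 2-second default interval are illustrative assumptions, not anything taken from the court's opinion, and no particular delay guarantees that a site will regard your traffic as harmless.

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; the default interval is an assumption.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last_hit = {}               # host -> time of the most recent request

    def wait(self, url):
        """Block until `url` may be requested; return the seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        delay = 0.0
        if last is not None:
            # Wait only for the part of the interval that has not yet elapsed.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last_hit[host] = now + delay
        return delay
```

Call `throttle.wait(url)` right before each request made with whatever HTTP client you use. Delays are tracked per host, so crawling several sites in turn does not slow down unnecessarily.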