Scraping by: the developing legal landscape of data scraping

January 2018 | SPOTLIGHT | DATA PRIVACY

Financier Worldwide Magazine

January 2018 Issue

Data scraping (otherwise known as web or screen scraping) has become a common tool of the data-driven digital trade, and depending on where your business stands, it represents either an opportunity or a risk. The process involves the automated extraction of publicly available, unprotected information from external websites, which is then synthesised, commercialised and redeployed or redistributed by an individual or company, generally for commercial gain. It can involve the deployment of thousands of ‘bots’ to scrape the data and interfere with the balance of markets (for example, to buy large numbers of sought-after concert tickets) or to mine private data.

In Australia, the debate about the legitimacy and legality of screen scraping has been most prominent in sectors where there is fierce competition and where deep or Big Data is a tool of trade, particularly the financial services and real estate sectors. Australian courts have previously considered claims for breach of copyright in relation to specific databases (for example, the case over the ‘WHOIS’ domain name database, brought by domain name registry operator Nominet against Australian company Diverse Internet in 2004). But recent cases in the US and Europe have considered claims against the scraping of publicly available databases by companies which require access to the data for their business. The uncertainty about the enforceability of website user (or browsewrap) agreements, where users do not have to click to consent to the website terms, has seen actions brought to stop data scraping, and counter-arguments, involving a combination of claims of copyright infringement, torts, trespass, breach of constitutional freedoms, breach of privacy, breach of anti-hacking laws and misleading conduct, as well as breach of contract.

The US

In one of the most high-profile judicial considerations of the legality of data scraping, the US District Court for the Northern District of California considered the issue in HiQ Labs Inc v. LinkedIn Corporation. HiQ’s business is based on the automated scraping of data from LinkedIn’s website, analysing that information and selling the analysis to various employers, to help them identify employees at risk of being poached and assist with employee retention. HiQ’s data analytics business is wholly dependent on LinkedIn’s public data, and LinkedIn tolerated the scraping for a number of years before proceedings commenced. In May 2017, however, LinkedIn served HiQ with a cease and desist letter, claiming that HiQ had breached LinkedIn’s user agreement by scraping, copying and sharing the profiles and information of LinkedIn users. In response, HiQ commenced proceedings, seeking an injunction to prevent LinkedIn from taking action to block HiQ from accessing the data.

In what appears to be an increasing trend in data scraping litigation in the US, LinkedIn argued that HiQ breached the federal anti-hacking law originally passed by Congress in the 1980s, the Computer Fraud and Abuse Act (CFAA), by accessing LinkedIn’s data without authorisation. LinkedIn also noted that its user agreement prohibits various methods of data collection from its website and stated that HiQ was in violation of those provisions. LinkedIn further claimed breach of privacy, which was not sustained because LinkedIn could only point to three individual complaints about data privacy and scraping activities from its hundreds of millions of users (including 50 million users with a privacy setting of ‘do not broadcast’). Had users been aware of the scraping, in particular those employees being targeted, perhaps they would have been more vocal.

HiQ raised a number of grounds for its injunction. The company claimed that LinkedIn’s attempts to restrain it from accessing publicly available data constituted unfair business practices and a violation of free speech under the California Constitution. It also brought common law tort and contract claims, including intentional interference with contract and promissory estoppel.

At first instance, the district court found in HiQ’s favour and granted the preliminary injunction, finding that the balance of hardships lay clearly in HiQ’s favour and that there were serious questions to be tried. Interestingly, the court noted that LinkedIn’s privacy claim was undermined by the fact that the privacy policy was “buried in the user agreement that likely few, if any, users have actually read”. With respect to breach of the CFAA, the court noted its concern that the interpretation favoured by LinkedIn would unduly extend the reach of the CFAA beyond its intended limits and expand its scope beyond computer hacking. The court drew a clear distinction between unwelcome access to publicly available information and the circumvention of a technological access barrier. Therefore, despite LinkedIn’s user agreement stating that its users were not permitted to “copy, use, disclose or distribute any information obtained from the services, whether directly or through third parties (such as search engines), without the consent of LinkedIn”, the court decided that allegations of ‘unauthorised access’ under the CFAA had little weight where the information obtained is otherwise publicly available.

LinkedIn filed an appeal with the Ninth Circuit Court of Appeals on 5 September 2017. This appeal has not yet been heard.

The application of the CFAA has also been considered in Facebook Inc v. Power Ventures, Inc. Power Ventures marketed itself as a platform through which customers could manage multiple social media accounts in one location. Its business model was dependent on access to Facebook and other social media sites. In response to an advertising campaign, where users could click a link to post advertisements about Power Ventures on Facebook’s servers, Facebook served a cease and desist letter and took action to block Power Ventures’ access to its servers. On appeal, the Ninth Circuit Court agreed that Power Ventures violated the CFAA by continuing to access Facebook’s servers after a cease and desist letter had been issued, in circumstances where the notice explicitly withdrew permission to access the information and rendered Power Ventures’ access to the servers ‘unauthorised’.

The court did, however, note that the violation of a website’s terms of service ‘without more’ does not trigger liability under the CFAA. This means that simply accessing data in contravention of terms of service will not amount to unauthorised access to a computer under the CFAA, but continuing to do so once access has been explicitly revoked is sufficient to fall foul of that legislation.

The EU

The European case of Ryanair v. PR Aviation considered the data scraping activities of a Dutch business which scraped data from Ryanair’s website (among others) to display price comparisons for low cost airlines. Ryanair sued PR Aviation for breach of the EU Database Directive, which protects the intellectual property of databases in the EU, and for breach of Ryanair’s website terms and conditions.

The court found that PR Aviation had breached Ryanair’s website terms and conditions, specifically the term expressly prohibiting screen scraping without a written licence agreement with Ryanair (which PR Aviation did not have). Visitors to the website are required to accept the application of the company’s terms and conditions of use by ticking a box to that effect. The court found the terms were clearly accessible, highlighted by a hyperlink. The court also considered the question of copyright protection, but decided that copyright did not extend to the data.

Lessons for Australian companies

The outcomes in these decisions so far are, to an extent, products of their factual circumstances. But they also reflect a deeper tension between the concept of an open internet and protecting legitimate commercial ownership of data. While Australia does have robust copyright and data protection laws, and the Crimes Act 1995 prohibits computer hacking, it remains to be seen how courts would deal with a claim brought in response to ‘screen scraping’ activity within these legal frameworks, particularly where there is no express acceptance of the website terms.

For example, section 30H of the Crimes Act 1995, which prohibits ‘unauthorised access to restricted data’, appears unlikely to extend to the act of scraping publicly available information. Under the Crimes Act, restricted data is data held in any computer to which access is restricted by an access control system associated with a function of a computer. This provision is likely to be interpreted in a similar way to the CFAA, meaning access control is a key consideration when seeking to prevent data scraping activities.

Enforceable website terms

The Ryanair decision suggests that a binding agreement will exist between a website operator and a user in circumstances where the user has actual or constructive knowledge of the site’s terms and conditions prior to undertaking the activity. As enforceability of contractual terms prohibiting data scraping will be critical, the attention of users will need to be clearly drawn to the website’s terms and conditions, so that users can reasonably be expected to have read them, for example through a pop-up screen, check box or other method (in the form of a clickwrap agreement). The terms that protect the public database need to be well-drafted and clearly identified, and must explicitly prohibit data scraping activities, but they should not be too broad, as overly broad terms risk amounting to anti-competitive behaviour.

Requiring users to create a profile or actively sign in to access a website’s information is likely to signal to the users (and a court) that the information contained beyond the member login is information unique to that website, and not intended to be accessible for commercial use.

Firewalls and paywalls

Many screen scrapers operate as automated ‘bots’ from thousands of varying IP addresses. The introduction of a firewall or paywall for users would not only generate a substantial cost for any ‘bots’ being used to scrape the data, but would also indicate that the data in question is protected and not for general public access or consumption. This makes it more likely that a court will view the information as commercially valuable, and therefore subject to protection. Further, the imposition of a firewall is a clear indication that users have read or turned their mind to the associated terms and conditions of the website.

The absence of data scraping claims in Australia may be due to the risks of litigating this type of claim and because the business models of many companies depend, in turn, on their ability to access and use other websites’ data. How the jurisprudence will develop in Australia is unclear and will largely depend on the facts of each case. Data protection issues may take on more significance, particularly with the commencement of the European GDPR in May 2018 and its extraterritorial application.

Depending on their business model and whether they want to protect their own content and minimise risks from data scraping, companies should be doing three things: undertaking a risk assessment of the types of data (such as purely factual, personal or creative information) and sources of data (for example, competitors’ sites or large technology company sites) they are creating or scraping; reviewing their own website terms and site access procedures to ensure they are as robust as possible; and looking at anti-scraping technology solutions they could use.

Veronica Scott is special counsel at MinterEllison. She can be contacted on +61 3 8608 2126 or by email: veronica.scott@minterellison.com.
