The internet is vast and full of data that is publicly available to anyone with the time, or technology, to mine for insights.
You can find everything from years of NYC taxi cab data and Uber information to more obscure datasets. The volume of available data is staggering, and it is poised to keep growing as players like Amazon support publicly available AWS Datasets.
There is so much free data out there that thriving companies have built entire business models based upon farming, organising, and selling insights on free, publicly available data.
Whether companies may do so is the key question underpinning the hiQ vs LinkedIn legal case, and it will see at least some resolution in March of 2018 when hiQ has its day in court against LinkedIn.
Central to the case are several sub-questions related to data ownership and control:
1 - May a hosting site prohibit third-party entities from scraping otherwise publicly available data?
2 - Does a hosting company have the right to control access to data that its users make publicly available?
Basics of the hiQ vs LinkedIn case
hiQ is a company built upon scraping publicly available data on LinkedIn.
It's very important to note that hiQ only gathers data from LinkedIn and only gathers data that is publicly available without a LinkedIn account.
The lawsuit currently being heard in California centers upon LinkedIn’s contentions that hiQ is in violation of the Computer Fraud and Abuse Act of 1986 (CFAA) and the LinkedIn Terms of Service. (hiQ is a member of LinkedIn.)
Briefly, the CFAA was passed in 1986 and was aimed at establishing civil and criminal punishments for hacking into private computers to access non-public information and/or cause damage. It was narrowly targeted when drafted, and it has not been successfully invoked on the basis of a terms of service violation, which is how LinkedIn seeks to use it here.
What are the arguments in the hiQ vs LinkedIn case?
LinkedIn contends that hiQ’s scraping of publicly available data violates its terms of service and justifies criminal punishments for hacking under the CFAA.
While scraping publicly available data does in fact violate the LinkedIn terms of service, bootstrapping a terms of service violation into CFAA criminal charges is a novel theory. It also sits awkwardly with the fact that LinkedIn happily lets your public data be collected by Google and other search engines to drive traffic to its own site.
The CFAA is very specific in its focus on punishing those who hack into private computers to steal private information and/or cause damage. LinkedIn's CFAA argument fails on the merits because the data was publicly available on the internet even without a LinkedIn account. The violation of the terms of service does present a problem for hiQ here, but the state constitutional free speech argument appears to have gained significant traction with the court.
The early court documentation supports hiQ’s contention that the use of publicly available data is protected free speech in California.
Ideally, the final verdict in the case will find that the gathering and use of publicly available data is protected free speech.
The final argument in favour of LinkedIn is its desire to protect its users’ privacy. The protection of privacy is something that will almost always attract legal support when it is founded upon actually protecting someone’s privacy.
Unfortunately for LinkedIn, this argument will not gain any traction in court because the data was publicly available, and the users opted in to have their data shared publicly.
So what happens next?
Assuming LinkedIn attempts to continue its crusade against hiQ, the court will hear the case in March of 2018.
For now, the recent injunction granted in favour of hiQ will ensure that hiQ and its business operations are protected until the case is fully heard and decided. The court has prohibited LinkedIn from blocking hiQ activities, including data scraping, until the case is complete.