CTO at Econsultancy
07 September 2008 17:15pm
We are currently helping a new client of ours develop a hybrid vertical search / content aggregation site and as part of the initial planning and consultancy phase we are looking at how the content is technically aggregated, categorised and tagged appropriately. As such we’ve come across a couple of issues which I believe could impact on the overall feasibility of the project and I am hoping someone with legal or relevant experience can offer some advice.
Whilst I have a sneaking suspicion that the law on scraping and republishing content on your own website favours the original publisher, I am surprised, as I can think of many large players whose business models rely on scraping content and repurposing it for various uses, e.g. Google, Kayak, Kelkoo, Technorati.
What should I be advising my client in respect to their rights as a content aggregator, and how that may impact on their business in the future?
Digital Services Director, Aqueduct
Copywriter at HappyCopy
07 September 2008 21:28pm
My understanding of copyright comes from my work as an online journalist. I don't think you can republish content word for word, even if it is credited and freely available. For example, Sky and the BBC make their video etc. freely available online, but it would be an obvious copyright transgression to stream it from your own site.
Having said that, very few websites are likely to complain about their content being republished elsewhere, as it will not damage their SEO and they are being credited.
Basically, I do not think you would have any legal protection if you were doing this but I think it is quite unlikely you would be challenged.
I hope this was some help!
Managing Director at Free Rein Ltd
08 September 2008 08:55am
Hi Felicity, Matt
In fact, most news services would challenge you pretty quickly even if you did credit them. Where they offer an RSS feed, they expect people to use the title and story summary but then link directly through to the article on their site.
Some will allow you to publish the full article but generally at a cost.
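As a rough illustration of the "title and summary, then link through" pattern described above, here is a minimal Python sketch. The feed content, the `news.example.com` URL and the function names `feed_to_snippets` and `render_snippet` are all made up for the example; a real feed may use different elements (Atom, for instance), and the summary length cap is an arbitrary assumption, not a legal threshold.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Example headline</title>
      <link>https://news.example.com/story-1</link>
      <description>A short summary supplied by the publisher.</description>
    </item>
  </channel>
</rss>"""


def feed_to_snippets(feed_xml, max_summary_chars=200):
    """Extract only title, link and (truncated) summary from each feed item."""
    root = ET.fromstring(feed_xml)
    snippets = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        summary = (item.findtext("description") or "").strip()
        if not title or not link:
            continue  # without a link back to the source, skip the item
        if len(summary) > max_summary_chars:
            summary = summary[:max_summary_chars].rstrip() + "..."
        snippets.append({"title": title, "link": link, "summary": summary})
    return snippets


def render_snippet(snippet):
    """Render a snippet as HTML whose headline links to the publisher's article."""
    return '<p><a href="{link}">{title}</a><br>{summary}</p>'.format(
        link=html.escape(snippet["link"], quote=True),
        title=html.escape(snippet["title"]),
        summary=html.escape(snippet["summary"]),
    )


if __name__ == "__main__":
    for snippet in feed_to_snippets(SAMPLE_FEED):
        print(render_snippet(snippet))
```

The full article body is never stored or shown here; the reader has to click through to the publisher's own page to read it.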
08 September 2008 12:56pm
Thanks for your comment.
It's all pretty confusing and certainly a grey area I suspect.
If you look at something like http://126.96.36.199/search?q=cache:26uO4FrnezUJ:uk.news.yahoo.com/skynews/20080906/tuk-teenager-arrested-over-stabbing-45dbed5.html+sky+news+stabbing&hl=en&ct=clnk&cd=3&gl=uk you will see that Google are caching content from Sky without express permission to reproduce this, and this is a news service.
How are they "allowed" to do this without express permission?
I have noticed that if you search on "Sky News" in Google, not all the results offer a view cache option. Could this be down to what they are allowed to cache and what they are not allowed to cache, or could it simply be an algorithm that Google employs to work out what is worth caching or not?
E-Business Consultant at Dan Barker
08 September 2008 14:42pm
hi, Matt, how's life?
I remember asking a similar question several years ago in a law lecture - there was no definite answer then. A couple of years later there was a test case where search engine caching was ruled 'fair use'.
Anyway, on to your problem...
If I was you, I would:
In terms of robots.txt, I would:
I hope that helps you - would be interested to hear more about what you're doing!
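On the robots.txt point, one thing an aggregator can do is check a site's robots.txt before fetching a page. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleAggregatorBot` user agent and the helper name `allowed_to_fetch` are hypothetical. Note that robots.txt only expresses the site's crawling preferences, so it does not replace reading the site's terms and conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleAggregatorBot
Disallow: /news/
"""


def allowed_to_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/news/story-1", "/about"):
        url = "https://news.example.com" + path
        print(path, allowed_to_fetch(SAMPLE_ROBOTS_TXT, "ExampleAggregatorBot", url))
```

In a live crawler you would typically call `set_url()` and `read()` on the parser to load the real file from the target site, and skip any URL for which `can_fetch` returns `False`.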
08 September 2008 14:57pm
I think using snippets and linking back to the primary site is probably not a bad approach...
Thanks for the advice, and if I hear anything further about the issue I'll be sure to post it in here.
Director at Watson Hall Ltd
08 September 2008 16:09pm
You should look at the terms and conditions on the sites you intend to target. You will also have to consider the liability for things like defamation, encouragement or glorification of terrorism, copyright theft and other intellectual property infringements, etc if you start publishing other people's content. Take the advice of some good new media lawyers soon.
Also make sure that you don't start being used to circulate malware via infected content.
Some site owners are very aggressive with those they consider to be trying to take their content. For example:
Others use firewalling techniques to block scrapers, or to log what's going on for future court action.
Technical Director, Watson Hall Ltd for website security
President at Salebug.com
25 August 2009 03:26am
Hi Matt and others,
Thank you for all your expert tips on the legal issues of website scraping.
I have one specific question:
I'm working on a business idea based on aggregating content from industry specific websites.
When I look at the terms of service on these websites, is there anything specific I should be looking for? Some of the terms and conditions say that I could use the content for non-commercial purposes. I'm not selling the content to my potential customers but will be making ad revenue (hopefully).
See the below example from one of the clients...
"You are hereby granted a non-exclusive, non-transferable, limited license to view this Site, and to download and/or print insignificant portions of materials retrieved from this Site provided (a) it is used only for informational, non-commercial purposes, and (b) you do not remove or obscure the copyright notice or other notices. Except as expressly provided above, no part of this Site, including but not limited to materials retrieved there from and the underlying code, may be reproduced, republished, copied, transmitted, or distributed in any form or by any means, without the express written permission of xxxx"
Would I need explicit permission from this website before I scrape its publicly available content?
Thank you, gurus. I truly appreciate your time and advice.