Summary of Academic Publishers Cloaking Discussion

Posted on August 5th, 2007

So the dust seems to have settled a bit about the issue of academic publishers cloaking their pages to Google. This post is a summary of the facts that emerged and the observations made, a quick recap tying it all together, and a suggestion for the next step.

For reference, the links around the web are:

Facts and Observations

There are three facets to this debate:

  1. The technical side: how this cloaking is implemented
  2. Google's policies regarding this issue
  3. User perception of this issue

So with that, the following points have been made:

    • These publishers are part of the Google Scholar program. Google initially contacted a few major publishers to join the program.
    • The cloaking is IP-based: simply switching the User Agent to Googlebot's doesn't work. That's not surprising to those of us in the field; I linked to how you can do that (with full code) in my previous post.
    • Definition of cloaking from Google's guidelines:

      Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.

      So there is no doubt this is an example of cloaking. The question is whether this is acceptable or not.
    • A relevant quote from the Google Scholar Publisher Policies:

      Google users must be offered at least a complete abstract. This is a crucial component of our indexing program. For papers with access restrictions, a full author-written abstract will help users choose among the results which paper is the most likely to have the information they are looking for.

      Some people pointed out that this is not always happening with some SpringerLink articles.
    • A lot of academics are annoyed by this cloaking. The sentiments of the comments on John Baez's original post speak loudly. The blogosphere has other posts from annoyed academics.
    • I get the feeling that people would be happy to keep the for-pay results in the Scholar search results but keep them away from the main search results. If that happens (and I think it should), the for-pay results need to be labelled clearly. John Mueller wrote a comment on the Sphinn story about how this already happens with Google News.
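
The User Agent test from the second point above can be sketched roughly as follows. This is only an illustration, not the code from my previous post: the function names, the browser User Agent string and the `fetcher` parameter are assumptions made for the example. It fetches the same URL once as a browser and once claiming to be Googlebot, then compares what comes back.

```python
import urllib.request

# Googlebot's well-known User Agent string. The browser string is an
# arbitrary placeholder chosen for this sketch.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def fetch(url, user_agent):
    """Fetch url with the given User Agent.

    Returns a tuple of (final URL after redirects, response body as bytes).
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.geturl(), response.read()


def user_agent_cloaking(url, fetcher=fetch):
    """Return True if the response changes when only the User Agent changes.

    `fetcher` is a hypothetical hook so the comparison can be tried
    without touching the network.
    """
    as_browser = fetcher(url, BROWSER_UA)
    as_googlebot = fetcher(url, GOOGLEBOT_UA)
    return as_browser != as_googlebot
```

The comparison is deliberately crude: pages with timestamps or rotating ads will differ between any two requests. More importantly, a result of False does not clear a site. With IP-based cloaking, as seen here, both requests come from the same non-Google address and so both get the same for-pay page.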

So what now?

Some people are clearly upset. Some are upset at expensive publishers in general (and so having them in Google's results makes things worse), and some are upset that Google is letting some publishers break its terms of service/policies so obviously without any perceived reward for the user.

Fundamentally, I believe the question of what's acceptable cloaking and what isn't boils down to user perception and expectation. If users expect to see for-pay content in the search results, they are OK with it, but please label it properly. Pubmed, a major aggregator of bioscience papers, has two icons to depict whether the paper is freely available (via an Open Access license) or only the abstract is freely available. There is no reason why Google shouldn't do this too.

The key question is what happens when cloaked results appear unexpectedly. Clearly people find this (very) annoying. Of course, Google's policy so far has been to ignore it (and thus allow it), since Google more or less needs the cloaking to be able to index the papers for Google Scholar. Well, Google, consider this set of posts as very vocal customer feedback: Take out for-pay content from the main search engine results pages. We're OK to keep them in the Scholar results, but label them.

And academics, you can do something about it! There are three things you can do:

  • In the short term, file a spam report with Google. Very inconveniently, there are two ways to do this. Anyone can use the so-called unauthenticated submission form, which is publicly accessible. Owners of websites can use the so-called authenticated form via Webmaster Central. More details about spam reporting from the horse's mouth.

    The spam details are as follows: state that you have found evidence for cloaking in the main search engine results pages (SERPs). Submit the full URL of the results page, state the apparent URL of the result (right click and copy the link location - exact wording varies in each browser), state that the result is labelled as a PDF file, and submit the URL you actually end up at. This gives the spam team a full audit trail. If you can submit more than one example, do so. And tell them this is annoying you if it is.

  • Stop using Google! If their search results are not useful to you, use another search engine. MSN has a great Academic Search, and for general searches, try Yahoo!. I recommend Hakia as a decent search engine (it's still in beta, so the results can be spammy or a bit irrelevant) and there are hundreds of alternatives. Take your pick and vote with your feet!
  • In the long term, if access is important to you, publish in prestigious journals that have an Open Access policy you agree with. If enough people do that, the Open Access journals will get an increase in their impact factor and the administrators will be happy again. Having a debate about it in the journals themselves is also helpful. This is a matter of awareness, but change can happen with time.

So that's it for now. I've already submitted an authenticated spam report to Google. Let's hope there is a response!

2 Responses to “Summary of Academic Publishers Cloaking Discussion”

  1. Glaring Omission from Google’s Cloaking Examples - Pocket SEO Says:

    […] I think the reason that Google doesn’t use the example of cloaking with a login page, is because Google selectively approves of certain types of cloaking. […]

  2. Odonyhoth Says:

    Is this gonna end someday??
