Fear the Google, don’t fear the Google

I came back from a break to find I’d missed the news that Google has settled out of court in its clash with US publishers over its Google Books project.

Looking over last Friday’s UK news coverage, what immediately struck me, as someone with an interest in the story, was the repeated assumption that Google was the only one digitising books. Ben Cohen’s Channel Four report even featured him at the British Library, which is part of a competing book digitisation project!

I wrote about Google Books on my blog in another context back in April and, having reviewed that post, I think it’s worth reposting.

There’s been a lot of paranoia about Google lately. We’ve had Street View arriving in the UK and provoking posh protests blocking its camera car. Now we’ve got requests for investigations into the information it holds about its users.

The former seems bizarre to me in a country that has accepted more cameras looking down on us than anywhere else in the world. The latter appears worrying when it’s suggested that Google might disclose that information to states, but states can get far more from internet service providers (and already trawl our digital use anyway).

What has really exercised some people, though, is Google’s business proposition: using that data to target ads better.

Here’s why I find this a bit ridiculous: Google just aren’t very good at it.

I have been using Gmail for ages and send and receive heaps of email. Beside each message are text ads targeted at me using all the text in all that email, plus the content of the particular email I’m looking at.

They’re consistently mistargeted.

[Image: Gmail text ads for legal firms]

[Image: Gmail text ads for a Lionel Richie concert and model trains]

I swear I don’t know why they think I might want Lionel Richie tickets, or have a legal problem.

After some hunting I found one that is vaguely related to the content of the email. But only vaguely.

[Image: Gmail text ads for training providers]

The area where we should be most concerned is one I’ve yet to see the British media really pick up on. And it actually concerns Google’s mission: to organize the world’s information and make it universally accessible and useful.

Google Books is its project to make digital copies of out-of-copyright books available and to make the text of in-copyright books searchable.

It has signed up Oxford University amongst other big-name partners.

The trouble is that there are several rivals to Google, and they’re open-source, not proprietary: services like PublicDomainReprints.org and the Internet Archive.

Recently Google changed its terms to specifically prohibit any of these services from using books it had digitised, even though those books are in the public domain. There’s not been any legal action thus far, but why change the terms unless Google wanted to be able to challenge others, like the Internet Archive, which hosts over half a million public domain books downloaded from Google?

Google has also ‘locked up’ some public domain books.

Here’s an example of a public domain book on Google that was once ‘Full access’ and is now ‘Snippet only’: The American Historical Review, 1920. For the time being, there is a copy on Internet Archive.

The agreements with libraries (mainly university libraries), which were only made public through legal action, mean that the libraries give Google all of their books for free and, in return, receive scans that they effectively cannot use for anything.

If the libraries want access to the corpus, they have to subscribe just like everyone else. This means that Google is requiring them to buy back their own copyrighted books if anyone wants to actually use them, on or off campus.

Google’s recent deal with publishers, which includes setting up a Book Rights Registry, appears to give Google different, more favourable terms than anyone else who enters into agreements with the Registry.

The Open Content Alliance (OCA) is a consortium, with the Internet Archive at its centre, which wants to build a virtual Alexandria Library II (a physical Bibliotheca Alexandrina already exists). The OCA includes the British Library, the Royal Botanic Gardens at Kew and a number of corporations, though not Google, nor Microsoft, which recently left after funding the scanning of 750,000 books in order to launch its own book scanning project.

Brewster Kahle, who founded the Internet Archive and heads the Open Content Alliance, warns of “the consequences of the consolidation of information into the hands of a few private organizations”.

Google is digitizing some great libraries. But their contracts (which were actually secret contracts with libraries – which is bizarre, but anyway, they were secret until they got sued out of them by some governments) are under such restrictions that they’re pretty useless… the copies that go back to the libraries. Pretty much Google is trying to set themselves up as the only place to get to these materials; the only library; the only access. The idea of having only one company control the library of human knowledge is a nightmare. I mean this is 1984 – a book about how bad the world would be if this really came about, if a few governments’ control and corporations’ control on information goes too far.

There are other issues with Google’s relationship with libraries too:

Some may have second thoughts if Google’s system isn’t set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty accessing some digital content.

For instance, Tufts worries Google’s optical reader won’t recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it will be more easily addressed because the group is allowing outside access to the material.

The OCA is trying to establish a standard, and both Google and now Microsoft have opted out. Not only is there duplication (triplication, even) of these vital efforts for human knowledge, but Google also refuses even to talk to the OCA, because it sees it as a rival.

The OCA is building a “permanent, publicly accessible archive” of digitized texts. Both Google and Microsoft are doing it to make money. Not that there’s anything wrong with that, but it is right to be afraid when such knowledge is available only through corporate, proprietary means.

Postscript: Thanks to Stefan Czerniawski for pointing me to this excellent piece by Doc Searls on the Google settlement.

I hope some journos are reading this because, thus far, the UK media’s coverage of the Google settlement has been dire. And we’re talking about an enormous, extremely significant issue here!


Amanda Blount

I would really like to know if banned books are copied also. I have an issue with banned books being lost forever. Of course, there are books that were once accepted, which have no place in our children’s libraries now. HOWEVER, I think these books should be scanned and preserved as a historical look into where we were, and how far we came as a society.

As for Google or Microsoft trying to use all the world’s information as a money-making tool: I believe that if they receive the information from a free entity (a public library), then it should be free to the public. If they receive the information by other means (paying a university for the use of books), then they should be able to recoup the cost. But I do hope there is a NON-profit entity large enough to be Google’s rival. It would be a shame if all the world’s information were not available to the general public. I fully support scanning books (look at how many libraries and courthouses burned down in the past), but I would hate to have to pay a huge amount of money for my grandchildren to visit the library online. Or worse, as an adult with a wide variety of interests, I would hate for Google to pick and choose what is appropriate for me to research. That would be a very bad situation.

Paul Canning

Amanda, no idea, but you would assume they are, since a copy of every publication is sent to the Library of Congress / British Library.

Very good point on future control of access – another one lost in the lackluster media coverage of this important issue, rightly placed by you in the context of our collective children’s future.

Brenadine Humphrey

Paul, only publications that are registered with the Copyright Office are sent to the LOC (in the US), which is underfunded, understaffed and overwhelmed with books, texts, music, art, blueprints, etc., and it is almost impossible to locate anything in it. What would be a really interesting idea is a digitization project run out of each country’s Copyright Office/Library. Content could be searched, checked for information, and released for public viewing when it enters the public domain (or equivalent). This would keep the projects open, funded by governments, and available to the public.

Paul Canning

Thanks for that info – didn’t know that but not entirely surprised.

Good idea on public domain digitisation, but I still think the Open Content Alliance offers the best way forward and should be supported, for ethical reasons if nothing else.