Looking over last Friday’s news coverage in the UK, what immediately struck me, as someone who has an interest in the story, was the repeated claim that Google was the only one engaged in digitising books. Ben Cohen’s Channel Four report even featured him at the British Library, which is part of a competing book digitisation project!
I wrote about Google books on my blog in another context back in April and, reviewing the content, it seems appropriate to repost.
There’s been a lot of paranoia around lately about Google. We’ve had Street View arriving in the UK and provoking posh protests blocking their camera car. Now we’ve got requests for investigations on the info they hold about their users.
The former seems bizarre to me in the country which has accepted more cameras looking down on us than anywhere else in the world. The latter appears worrying when it’s suggested that they might disclose that information to states, but states can get far more from internet service providers (and already trawl our digital use anyway).
What’s really exercised some though has been the Google business proposition, which is to use that data to better target ads.
Here’s why I find this a bit ridiculous: Google just aren’t very good at it.
I have been using Gmail for ages and send and receive heaps of email. Beside each one are text ads targeted at me using all the text in all that email plus the content of the particular email I’m looking at.
They’re consistently mistargeted.
I swear I don’t know why they think I might want Lionel Richie tickets, or have a legal problem.
After hunting I found one which is vaguely near to the content of the email. But only vaguely.
The area where we should be the most concerned is the one which I’ve yet to see British media really pick up on. And it actually concerns Google’s mission – to organize the world’s information and make it universally accessible and useful.
Google Books is their project to make available digital copies of out-of-copyright books and make copyright book text searchable.
They’ve signed up Oxford University amongst other big name partners.
Recently Google changed its terms to specifically disallow any of these services from using books they’d digitised – public domain books. There’s not been any legal action thus far, but why change the terms if they didn’t want to challenge others, like the Internet Archive, which hosts over half a million public domain books downloaded from Google?
Here’s an example of a public domain book on Google that was once ‘Full access’ and is now ‘Snippet only’: The American Historical Review, 1920. For the time being, there is a copy on Internet Archive.
The agreements with libraries (which are mainly university libraries), which were only made public by legal action, mean that they give Google all of their books for free, and in return they are given scans that they effectively cannot use for anything.
If they want access to the corpus, they have to subscribe just like everyone else. This means that Google is requiring them to buy back their own copyrighted books, if anyone wants to actually use them on or off the campus.
Their recent deal with publishers, which includes the setting up of a Books Rights Registry, appears to give Google different, more favourable terms than anyone else who enters into agreements with the Registry.
The Open Content Alliance (OCA) is a consortium with the Internet Archive at its centre which wants to build (a virtual) Alexandria Library II (a physical Bibliotheca Alexandrina exists). The OCA includes the British Library, the Royal Botanic Gardens at Kew and a number of corporations – though neither Google nor Microsoft, who recently left it after funding the scanning of 750,000 books to launch their own book scanning project.
Google is digitizing some great libraries. But their contracts (which were actually secret contracts with libraries – which is bizarre, but anyway, they were secret until they got sued out of them by some governments) are under such restrictions that they’re pretty useless… the copies that go back to the libraries. Pretty much Google is trying to set themselves up as the only place to get to these materials; the only library; the only access. The idea of having only one company control the library of human knowledge is a nightmare. I mean this is 1984 – a book about how bad the world would be if this really came about, if a few governments’ control and corporations’ control on information goes too far.
There are other issues here too with Google’s relationship with libraries:
Some may have second thoughts if Google’s system isn’t set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty accessing some digital content.
For instance, Tufts worries Google’s optical reader won’t recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it will be more easily addressed because the group is allowing outside access to the material.
The OCA is trying to establish a standard and both Google and now Microsoft have opted out. Not only is there duplication (triplication) of these vital efforts for human knowledge, but Google also refuses to even talk to them; it sees them as a rival.
The OCA are building a “permanent, publicly accessible archive” of digitized texts. Both Google and Microsoft are doing it to make money – not that there’s anything wrong with that but it is right to fear when such knowledge is only available via corporate, proprietorial means.
I hope some journos are reading this because, thus far, the UK media’s coverage of the Google settlement has been dire. And we’re talking about an enormous, extremely significant issue here!