BdD des Sciences d’Information

Google to digitize millions of books: UNIVERSITIES JOINING IN EFFORT

By Michael Bazeley, Mercury News

Monday, 20 December 2004, by Collecte CND R.L

Google is launching an ambitious effort to make digital copies of some of the world’s largest university library collections and will incorporate the texts into its vast Web index, apparently the largest project of its kind ever attempted.

As envisioned, almost anyone with a computer could instantly tap into enormous academic libraries — some with texts dating back centuries.

Stanford, Harvard and Oxford universities, as well as the University of Michigan and the New York Public Library, are participating in the program, which could span years and involve scanning and indexing well over 10 million books and periodicals.

The project is another manifestation of the company’s oft-stated goal to “organize all the world’s information,” Google representatives said, reflecting Google’s ambition to cast itself as the most authoritative search engine.

“We think this will be a huge benefit for books that are otherwise not available,” said Susan Wojcicki, Google’s director of product management. “It’s a very large task to go through every page, but we are dedicated to it.”

Google is using its own secret scanning and digitizing technology, which it says will not harm older, delicate books. At the University of Michigan, the company has already equipped a special room with scanners and has been processing thousands of books a week since June. Books will roll into Google’s Web search index as they are scanned and digitized.

The full text of all publications will be scanned. But how much of each publication is accessible will depend on copyright restrictions.

Books that are in the public domain will probably have their full text available through the search engine. For works that are protected by copyright — the majority — Google will show either bibliographic information or snippets of text that appear around a Google user’s search term.

When possible, Google’s search results will point users to libraries where they can access the publications, or to online merchants where they can purchase copies.

At Stanford, the company will copy 2 million books as part of a pilot program, University Librarian Michael A. Keller said. Harvard’s pilot program will begin with 40,000 randomly selected books from the university’s vast collection of 15 million titles, some of which date back four centuries.

“We think it’s important and we’ll learn a lot,” said Peter Kosewski, the university’s director of publications and communications. “They’re doing an absolutely wonderful thing for the scholarly community.”

At the University of Michigan, where Google co-founder Larry Page received his bachelor of science degree in engineering, the project is more ambitious. Google will digitize about 7 million titles, a process that could take six years. Google and the university have been working on the project for about two years.

“It’s access to a research collection that we never would have dared imagine possible,” said John Wilkin, associate university librarian. “Anyone with an Internet connection now has access to a vast research library.”

Librarians and researchers said the size and scope of Google’s book-scanning project appear to be unparalleled. Universities such as Stanford have their own digitizing projects. And the Internet Archive in San Francisco has launched a campaign to make digital copies of 1 million texts.

“But I can’t imagine there’s anything out there on this scale,” Wilkin said of the Google project. “Nothing has been conceived on this scale.”
