Data poetry: Ode to The General Index

In this impassioned six-minute video, Carl Malamud draws on his considerable poetic and performance skills to introduce The General Index:

What is The General Index, and how can it benefit you? The answer, some of it in the form of questions, is in the video.

Here, to whet your appetite for viewing the video, are some passionate phrases from it, and from the documentation about it:

Topics—Access to Knowledge, Text and Data Mining, Temples of Knowledge, General Index
Language—Science is our universal language.

Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.
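For readers unfamiliar with the term, an n-gram is simply a run of n consecutive words. The sketch below shows what extracting unigrams through five-grams from a piece of text might look like. It is a minimal illustration only: the function names `ngrams` and `index_ngrams` are hypothetical, and the lowercase, whitespace-based tokenization is an assumption, not a description of how Public Resource actually built The General Index.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def index_ngrams(text, max_n=5):
    """Count all n-grams from unigrams up to max_n-grams in the text.

    Assumption: tokens are lowercase, whitespace-separated words.
    A real extraction pipeline would need far more careful tokenization.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


counts = index_ngrams("Science is our universal language")
for term, count in counts.items():
    print(count, term)
```

A five-word sentence yields five unigrams, four bigrams, three trigrams, two four-grams, and one five-gram. Applied across millions of articles, the same idea produces a table of terms that can be searched without releasing the articles themselves.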

The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.

Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.

Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants.



