Brain Pickings

Culturomics: What We Can Learn from 5 Million Books

How to put your “beft” foot forward, or what the algorithm of censorship has to do with 1950.

We’ve already established that we could learn a remarkable amount about language from these 5 essential books, but imagine what we could learn from 5 million books. In this excellent talk from TEDxBoston, Harvard scientists Jean-Baptiste Michel and Erez Lieberman Aiden reveal fascinating insights from the computational tool they built, which inspired Google Labs’ addictive Ngram Viewer. The tool draws on a database of 500 billion words culled from 5 million books across many centuries, about 4% of all the books that have ever been published.

They call their approach Culturomics — “the application of massive scale data collection and analysis to the study of human culture.” From advising you on the best career choices for early success to figuring out when an artist is being censored to proving that we’re forgetting the past exponentially more quickly than ever before, the data speaks volumes when queried with intelligence and curiosity.

“[The database pulls from] a collection of 5 million books. 500 billion words. A string of characters a thousand times longer than the human genome. A text which, when written out, would stretch from here to the moon and back ten times over. A veritable shard of our cultural genome.”

Published September 21, 2011




