Brain Pickings

Culturomics: What We Can Learn from 5 Million Books


How to put your “beft” foot forward, or what the algorithm of censorship has to do with 1950.

We’ve already established that we could learn a remarkable amount about language from these 5 essential books, but imagine what we could learn from 5 million books. In this excellent talk from TEDxBoston, Harvard scientists Jean-Baptiste Michel and Erez Lieberman Aiden reveal fascinating insights from their computational tool that inspired Google Labs’ addictive NGram Viewer, which pulls from a database of 500 billion words and ideas culled from 5 million books across many centuries, 12% of the books that have ever been published.

They call their approach Culturomics — “the application of massive scale data collection and analysis to the study of human culture.” From advising you on the best career choices for early success to figuring out when an artist is being censored to proving that we’re forgetting the past exponentially more quickly than ever before, the data speaks volumes when queried with intelligence and curiosity.

[The database pulls from] a collection of 5 million books. 500 billion words. A string of characters a thousand times longer than the human genome. A text which, when written out, would stretch from here to the moon and back ten times over. A veritable shard of our cultural genome.”

Donating = Loving

Bringing you (ad-free) Brain Pickings takes hundreds of hours each month. If you find any joy and stimulation here, please consider becoming a Supporting Member with a recurring monthly donation of your choosing, between a cup of tea and a good dinner:

You can also become a one-time patron with a single donation in any amount:

Brain Pickings has a free weekly newsletter. It comes out on Sundays and offers the week’s best articles. Here’s what to expect. Like? Sign up.

Share on Tumblr