Big Data and Culturomics


Friday, August 09, 2013

Big Data, and how we use it, is changing the way we understand our culture and history. Research scientists Erez Lieberman Aiden and Jean-Baptiste Michel (Uncharted: Big Data as a Lens on Human Culture) teamed up with Google to create the (highly addictive) Ngram Viewer: it sifts through millions of digitized books and charts the frequency with which words have been used. “Geek” and “nerd” started proliferating around 1980. “Women” began to spike in the 60s, surpassed “men” in the 80s, then fell behind again in 2005.

The Ngram Viewer also exposes notable gaps. “If you take Marc Chagall or Pablo Picasso, Kandinsky or virtually any intellectual, especially Jewish, especially artists,” Michel explains, “and you look for the trajectories of their names in books published in German, you will see huge dips, holes really, literally, at about the time of the Nazi regime. So you can visually see the thumb of oppression pressing down the trajectories of these people.”

Aiden and Michel call their method of combing through text to map cultural trends “culturomics.” “It’s like genomics but with culture,” Aiden tells Kurt Andersen.
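For the curious, the core idea of charting word frequency over time can be sketched in a few lines of Python. This is only an illustration, not how the Ngram Viewer itself is built: the function name `yearly_frequency`, the `TOY_BOOKS` sample, and the simple whitespace word-splitting are all assumptions made up for this example.

```python
from collections import defaultdict


def yearly_frequency(books, term):
    """Return {year: share of that year's words that equal `term`}.

    `books` is a list of (year, text) pairs. Words are split on whitespace
    and lowercased, which is a simplification assumed for this sketch.
    """
    hits = defaultdict(int)    # occurrences of the term per year
    totals = defaultdict(int)  # total words seen per year
    target = term.lower()
    for year, text in books:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(target)
    # Skip years with no words to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Toy corpus, invented purely for illustration.
TOY_BOOKS = [
    (1950, "men and men and women"),
    (1985, "women and men and women"),
]

freq = yearly_frequency(TOY_BOOKS, "women")
for year, share in freq.items():
    print(year, share)
```

Dividing by the total number of words published each year, rather than using raw counts, is what lets years with very different numbers of books be compared on one chart.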

Stanford French professor Daniel Edelstein calls his data-driven project “Mapping the Republic of Letters” a kind of Facebook for the Enlightenment. He and a team took tens of thousands of letters written by intellectuals like Voltaire, Jean-Jacques Rousseau, and Jonathan Swift during the 1700s. Using maps, their system shows how new ideas spread through Europe and the world. And in some cases, the data doesn’t follow accepted history. Voltaire is thought to have been influenced by his English contemporaries. But when Edelstein mapped his correspondence, he asked, “Where are the letters to England? There’s such a myth around Voltaire as having transmitted English thoughts about politics, about philosophy, about science into France, and there was just this gaping hole in the map of his correspondence.” Edelstein says this discovery has led to a new research project to explain the hole.

And then there’s the question of proper spelling: you say doughnut, I say donut. Teddy Roosevelt favored the latter method, even trying to standardize that spelling in 1906, but it wouldn’t stick. What did? According to the Ngram Viewer, “it seems like [“donut”] is relatively contemporaneous with the creation of Dunkin’ Donuts.” We can’t say we’re surprised.

Special thanks to Maneesh Agrawala, our scientific advisor on this story.

Comments [51]

Russell Thomas

I just completed a long blog post using n-gram analysis to explore the origins of the "Baby on Board!" sign craze and related cultural phenomena.

I know it's way past the deadline but readers might enjoy it.

Best regards,


Oct. 07 2013 03:30 AM

I am writing a book about "interracial cooperation." Thank you so much for this tool! It helps immeasurably to be able to see how the use of the phrase changed over time.

Sep. 09 2013 11:42 AM
Nick Riddle from Bristol, UK

Something to be a little cautious about: I tried the word 'failure', and the graph suggested first usage in 1500 - but when I followed the link at the bottom to what I presume are a range of specific citations, one of the first texts was some kind of 20th-century aircraft manual, dated 1542 by Google! I guess it may have interpreted a catalogue number as a year or something? Looking further down the results page there were several other anachronisms. Seems to me that big data has a few wrinkles to iron out...

Aug. 23 2013 05:01 AM
Mary from Boston

radio,television,internet from 1890-2008.

Aug. 17 2013 11:52 PM
Mindy from Portland

I know I'm late to the party (the result of only just catching up on podcasts), but I find the graph of "introvert" vs. "extrovert" fascinating.

Aug. 16 2013 04:42 PM
Magreve from Red Hook, Brooklyn

Back when I was in college in the late 70's, my roommate (still my bff) and I found ourselves saying things like "scarf that right up" or "scarf it down" ... both could mean eat quickly, or "to scarf up" meant to get a good deal. I don't know WHERE we got that expression from. I have always remembered it as just "appearing" in our vocabulary. So ...

I searched for both phrases: scarf it up and scarf it down.

Lo and behold ... "scarf it up" has a huge spike between 1810 and 1820, and then really nothing until ... 1970. And both phrases exist together. So I find that intriguing -- in that I "felt" it while it was happening, but didn't really understand it.

Another item I checked out ... "dilver" ... a word that until a few days ago I thought was just "my family" word for something like a food mill, e.g., to grind up cooked apples into applesauce. Dilver turns out to be a company that produces food mills, AND a word that waxed / waned in use. So ... "silver" in the English language indeed has a rhyme, albeit an uncommon one.

Finally, I looked up the phrase "at this point in time" ... because a different roommate at one time said it _to_distraction_ -- we finally had to gag her to get her to pay attention and STOP using it. It turns out to be hardly used before the 1950's and then ... whoooshh!! onward and upward.

Very, very interesting tool! Thanks for sharing!

Aug. 15 2013 11:22 PM
Arthur from Minneapolis

Try graphing science and religion together. Could it be showing the rise of rational thought triumphing over dogma? Oh wait...I forgot this is America. Never mind.

Aug. 15 2013 04:59 PM
Juan Martinez from Chicago, IL

In Strong Opinions, Nabokov claims that "Lolita is famous, not I. I am an obscure, doubly obscure, novelist with an unpronounceable name." Here's a nifty Ngram chart that seems to prove him right:

Aug. 15 2013 11:15 AM
Michael Waldron from NYC

Quality, Quantity

I would have expected "quantity" to dominate in the era of big data but it's just the opposite. A perfect reversal that crisscrosses around 1917.

Aug. 15 2013 09:27 AM
Encyclopaedia Iranica from New York City

Looked into orientalism, and tweeted the ngrams for Oriental, Persian, Turkish, Arabian, Indian
Persian and Indian seem the clear favorites.

Aug. 14 2013 06:51 PM
Mary Blockley from Austin, TX

Guess which of these four words--immigrant, emigrant, refugee, and exile--has the highest frequency in American English between 1800 and 2000 and you'll probably be wrong.

Given the number of times I hear people describe those they admire (including themselves) as "humble," an attribute once reserved for servants, abodes, and Uriah Heep, I was even more surprised by the relationship between "modest" and "humble."

Aug. 14 2013 06:30 PM
W.J. from New Jersey

I was interested to see how long it takes for popular abbreviations to supplant the original brand names, so I looked at four examples from car companies. In the case of General Motors, GM spiked ahead in the late '60s only to fall back and return later in the '80s. Chevy overtook Chevrolet in 1970. Oldsmobile and Olds, however, have stayed neck-and-neck. Meanwhile, VW may be just a few years away from overtaking Volkswagen.

This is a neat (and addictive) tool.

Aug. 14 2013 01:51 AM
Nick Gavrila from Montclair

I chose the English author D.H. Lawrence. The mid-eighties is when he shows up the most. Somehow it seems to parallel trends in the politics within graduate programs in literature, regarding the rise of race, class, and gender studies.

Aug. 13 2013 06:26 PM

Twitter, kindle, and web. Was intrigued to see how they would map since their 'current' meanings/connotations now would be technology related yet they had a life before the Internet & devices came along. Instead of being 'flat' in earlier years they are actually quite active in their own way. They do swing up in the 90s and I imagine would continue in the 2000-2010 range...but not with the same meaning as in earlier decades.

Aug. 13 2013 04:49 PM
David Klein from Long Island, NY

I searched for "big data" and was surprised to find that there is a small blip in usage from 1929 to 1937 before coming back into vogue in 1955 when the search is in "English". The small early hump disappears if the search is in "American English" or "British English".

Searching in "American English" or "British English" shows that the surge in American usage started in 1955 and in British English 10 years later.

It's fascinating that if the time line is extended as far as possible, the peak seems to be in 2001. Anecdotally, with discussions of privacy, social media, NSA, and Edward Snowden, it would seem that its all-time peak usage would be right now. It made me realize that the major current usage is not in books, but in media not covered by Google Books.

Aug. 13 2013 02:48 PM
Jim Govoni from Cape Cod

Baby Boomer and assisted living follow the exact same trajectory

Aug. 13 2013 01:31 PM
Saunie Holloway from San Diego, California - USA

Nothing for "garbage patch." How can we fix it if nobody writes about it?

Aug. 13 2013 11:49 AM
Mack Blair from Haiku, Hawaii

Now, the name Dylan is an ancient name first used by Dylan Thomas and later picked up by Bob, and one of my students in my 6th grade math class is named Dylan. Interesting to look at names of musical icons.
Check out Elvis. Have fun.

Aug. 13 2013 03:06 AM
Chad from New Jersey

There is a very interesting connection between "freedom" and "war". One does not always follow the other. Sometimes war spikes first, sometimes freedom. There is definitely a correlation between the two, but while the use of freedom has a fairly steady incline leveling off since 1990 or so, war has definitely increased in usage, particularly since the late 18th century with the string of revolutions. It may seem likely that war would become less and less frequent as we become more and more "civilized" as a culture, but this does not seem to be the case.

Aug. 12 2013 09:50 PM
RANDI from Brooklyn ny

many large gaps. 1840s and both turns of the century.

Aug. 12 2013 09:01 PM
Murphy-Higgs from Jersey City.

To agree with the assertion that humanities spends too much time on race, class and gender is not only offensive but rather foolish. Humanities spends time on race, class and gender because it is the conversation and information that is missing. Only people at one with the white-male-dominated power structure think race, class and gender are focused on too much. The privilege to dismiss those things, and indifference, is truly a luxury that many don't have. That is precisely why humanities creates such an emphasis. The focus on people of color on Studio 360 has always been noticeable, but today was the first time I was ashamed to listen. It will be the last.

Aug. 12 2013 08:51 PM
Moji Shabi from Brooklyn, NY

I decided to do "girls with guns" as it is a favorite movie genre of mine and I was intrigued by two things.

1. The first time the phrase shows up is in between 1894 and 1900.
I am still trying to figure what was going on then.

2. The graph it displays looks like a city skyline. I wish I had something pithy and clever to say about it other than I think it looks really cool.

Aug. 12 2013 08:48 PM

I'm enjoying trying out various combinations of foods and cuisines. Favorites so far include:
dim sum,sushi,teriyaki,pad thai (look at the sushi spiking up over the last 20 years in America!)
olive oil,lard,vegetable oil,margarine,butter
olive oil,lard,vegetable oil,margarine (you have to look without "butter," also, since it dampens everything else)
parsnip,rutabaga,cardoon,salsify (some making a comeback after a dip in the 60's)
(they switched! fascinating!)

Aug. 12 2013 04:34 PM
John Sunderman from Redwood City, CA

In Ngram I graphed "th anniversary,year anniversary". It is a pet peeve of mine that increasingly people say "ten year anniversary" instead of "tenth anniversary". NPR even does it. Even worse is when people say six month anniversary. The incorrect choice appears to have overtaken the proper phrase in 1985. Unexpectedly, the correct phrase had a peak in WWII. My guess is that people used anniversaries to remember better times, etc.

Aug. 12 2013 03:09 PM
Paul Spirn from Nahant, MA

Big step up from Carla Bruni. Back on track. (Sorry, I couldn't resist.)

Aug. 12 2013 12:35 PM
Jason Davis from Toronto

I tried both 'earnest' and 'frank.' It seems that it's always been more important to be earnest.

Aug. 12 2013 11:00 AM
Patrick Lanin from Backwoods of Minnesota

Doughnut, donut.......Well they are both predated by the granddaddy of fried/boiled pastries.....DOUGH NAUGHT! Which literally translates (from archaic English) to a Naught or zero made of dough.
You say Pootahto and I say potato...whatever. Nonetheless, it was amazing to see the amount of air time you "frittered" away on this bit of inanity.

Aug. 11 2013 11:49 PM
Gregory Slater from CA

opium, cocaine, marijuana, heroin, methamphetamine, amphetamine, LSD, mescaline - 1930-2008 :


- Greg Slater

Aug. 11 2013 11:29 PM
Bill S from Manassas

I admit I had a hunch about this one. Use of “majority” is on a decline while “vast majority” is on the increase. I’m guessing it’s not enough to just be a majority, it only counts when it’s vast.

Aug. 11 2013 11:24 PM
Jaxon Cohen from Holladay, Ut

I am a philosopher. I define the world as a bifurcated choice of love and power. I define love as: vulnerable, intentional, participation. I define power as: silence, exclusive, force. This algorithm has proven my postulation that the world is moving towards love. Notice how two of the three terms in each search are stable and parallel while one is markedly different. Also take note of the effect of WWI. The net result: we participate more and force less.

Aug. 11 2013 06:02 PM
Kate Day from Danvers, MA

We just finished a fabulous new rail trail this week in Danvers, MA - now 7.6 miles of completed trail. This is part of a growing trend to preserve unused railroad rights of way for recreational use. See
Per Ngram, there is a definite spike in the use of the words "rail trail:"

Aug. 11 2013 02:53 PM
Tomas Sancio from Venezuela

Thank you for the Ngram awareness! Really cool stuff. It works for charting developing nations' history. For example, first thing I did was to chart "Venezuela" in English and as predicted, books on this country boomed during the USA oil investment years and dropped after the country nationalized the oil sector. Our [super]hero "Bolivar" in Spanish rose first during his living years, then peaked during the rediscovery by the mid-19th Century military dictatorship of the time, kept on rising but has been falling out of favor as there's not much more you can squeeze from a 200-year-old General. As an extra, you can try "negro" in English to find out when the USA Civil War was fought.

Aug. 11 2013 01:16 PM

My favorite is "paradigm shift." My spouse hates it, but once I read Kuhn's book in college, I saw the world differently for decades that followed. BTW - the ngram follows a classic statistical "S" curve.

Aug. 11 2013 12:33 PM
Margaret Stix from Brooklyn

The use of the word "ugh" peaked in 1670, before taking a steep decline in 1700 from which it never recovered. Usage now is only a bit above the 1700 level.

I may be sick of the term "cutting edge," but I bet they were even sicker of it in 1950 when it peaked. There are as many references to it now as in 1917. Go figure!

Aug. 11 2013 12:01 PM
Dorothy Cebula from New Jersey

"Went missing" emerged in the past century, starting in the 1920's very briefly, followed by a bump in the 1950's, with a steep increase since the 70's. I would gladly abandon this term in favor of words like "left," "lost," "kidnapped" or "escaped," which managed to express a similar idea.

Aug. 11 2013 11:37 AM
Sally Ember, Ed.D. from Hayward, CA

What happened in about 1580 and 1670 regarding Buddhism for English writers/speakers? I searched "Buddha," "Buddhist," and "Buddhism," and these spikes intrigued me. The "trend" in these terms started in 1830 and continued, increasing dramatically after 1860 (Britishers "discover" Buddhism when they invade and colonize India) and about 100 years later, in about 1950. These coincided with Caucasians' "discovering" and writing about these topics. But 1580 and 1670 I can't explain. Can anyone?

Aug. 11 2013 06:09 AM
Sally Ember, Ed.D. from Hayward, CA
I viewed the combination of "E.T.," "extra-terrestrial," "alien," and "outer-space." Nothing for "outer-space" at all.
Look at the spikes for "alien" in about 1570 - 1580, 1700-1710, 1735, 1752, 1754, 1780-81, 1805, 1830, 1845-50, 1920-30, 1950-90, 2002. Would love to know WHY!!?? And, look at "E.T." 1740-45 (pre-dating, then overlapping "aliens," above!?!), 1930, and then because of Spielberg's movie in 1982 - 1988. Then, look at what happened when I overlaid "abducted" and "kidnapped" on these n-grams!

Aug. 11 2013 05:52 AM
gil johnson from philadelphia

"gut feeling" apparently only came into existence as a phrase in the 60s, and skyrocketed into common use since.

Aug. 11 2013 05:27 AM
Margaret P. from Washington, DC

I went on a little Ngram journey, starting with Barbie doll and I ended up comparing "American girl,French girl,English girl,Chinese girl,Japanese girl" - there are some interesting, though perhaps predictable, spikes during WWI and WWII. But the steep upward trajectory of American girl after 1877 is particularly interesting. If I were a historian I'm sure I'd have an eloquent explanation.

I think the uptick in American girl beginning in the late 1980s is due to the emergence of the American Girl doll, but I don't understand why, beginning in the 21st century, some of the other "girl" phrases start climbing again. What's going on here?

Aug. 10 2013 09:28 PM
Mark Lounsbury from St.George, Utah

A search of the word apocalypse shows an interesting rise in usage during the American Revolution and the French Revolution. After these periods, the usage dies down until World War II. From this point it rises continuously until the present. These seem to suggest a rise in fundamentalist religious belief during times of social upheaval.

Aug. 10 2013 08:32 PM
Mike from Nevada, USA

Decided to try two words which resulted in an interesting correlation over the last 50 yrs. "Groovy" rose quickly from NOTHING in the 50's and the early 60's, but started its ramp up in 65-ish and peaked in 1970-72 with a major decline shortly thereafter and then a 3/4 resurgence starting around 2000 ('far out' on the other hand has not since seen any such resurgence). Also oddly it had a minor peak in 1893 (I'll leave that for a linguist to explain.)
The other interesting "correlated"? word I searched was the 'N' word for the period 1700-2008. Here we see a very sharp peak in 1863 (Emancipation Proclamation issued), and a decade-long peak throughout the 1930s.
This declined to the lowest trough in nearly a century from the mid-50's-60's (The peak of the civil rights movement). Now, following that it only began to see a major resurgence starting in 1965-ish, a peak in 1970-1971, a drop and an on-going resurgence into the 2000's almost exactly the same as the term "groovy" over the same period. Coincidence?

I also found the graphs for the 4-lettered 'seven words you can never say on TV' even going back to 1700 through present, quite interesting.

Aug. 10 2013 08:30 PM
Kevin Hodgson from USA

I was interested in the divergence of ketchup versus catsup. This seems trivial on one level, and yet on another, it seems like such a comparison might represent the way we are losing local variations of words of common foods.
What the analysis shows is how prevalent the idea of "catsup" was during the early 1900s and even higher in the years after WW II, and then catsup flattens out and loses, eh, flavor in the 1970s, mostly falling out of favor in current times as globalization of common English seems to have taken root.
Who knows. It's easy to read a lot into a data line.

Aug. 10 2013 07:02 PM
Jen Marshall from Ontario, Canada

I searched ignorance & knowledge from 1700 to the present. I wonder what happened around 1754? 1980 shows an upswing in the use of knowledge. I don't think we like to talk about ignorance (peak use in 1790).

Aug. 10 2013 06:22 PM
Howard Weinberg from NYC

Sensed that the word "Entrepreneur" has been on the rise and after hearing about nGram and listening to David Brooks, I thought I'd try it. Great app to know about. Thanks.

Aug. 10 2013 05:30 PM
Vlad Jersian from Central NJ

I compared "outsourcing" to "job security"..

The results were as expected...

Aug. 10 2013 05:28 PM
Nicholas Penning from Arlington, Va.

I hate 'big data.' I'm overwhelmed by it. Too much of too much all the time. 'News' bursts get watched, while journalism gets drowned; Senators tweet, when they should be listening; the Net is clogged with blogs; and tweets are millisecond bursts of brain farts exploding, undecipherable, from millions of data points too distant to be understood. I want life -- particularly the part that's overtaking our financial security, which exists within masses of data dollars being spent by data crooks who find all the right deceptive data corners to divert detection -- to s l o w down.

Aug. 10 2013 03:05 PM
Elliot from Boston

For me as a history major and a museum worker, I find big data incredibly interesting for finding true trends that show how people of the time change their behavior, knowingly or unconsciously, around historical events.

A good example is combining searches like "steam", "factory", "machinery", "union", "depression" etc. to see the rise and decline of the industrial revolution.

Exploring the genesis of words is fascinating too, whether it's seeing the rise of the word "diet" around 1750 or even seeing how often "google" was used as a word before the company existed, showing the appeal of the word itself even without a given meaning. Opening this tool to the public is a great step into allowing others to self-educate and make their own discoveries about human nature and history.

Note: searching single words is fascinating, but it becomes more useful when compared to similar words (add a "," between words to show multiple results on a single graph).

Aug. 10 2013 01:01 PM
eagleApex from Philly

Get it?
Google Ngram Viewer, Ngrams: 'Get' between 1700 and 2000 in English.

Aug. 09 2013 11:48 PM
Mary from Boston

I forgot to mention: try "biodiversity". Seems like a historical and timeless thing, right?

Yeah, not so much.

Aug. 09 2013 10:57 PM
Mary from Boston

I don't want to win anything, but I love big data and the opportunity to sift through it. It's like having a great expanse of beach and a metal detector. You might find something. Or you might just have an interesting walk on the beach.

One Ngram I enjoyed thinking about was a tip from someone else, so I can't take credit for it. Look at "lifespan" since 1900. Cool, eh? It's not just life, or lifetime. It seems more defined, doesn't it? Has our mindset changed? That's over my pay grade to say.

But I also like to think about technology changes--especially abrupt ones. Look at "diabetes, insulin" from 1900. That also led me to read about some of the pre-insulin days in the first peak. What horror that was in wards full of young people dying in agony as their helpless families watched. And then it changed.

Look at smallpox from 1600. Then it begins to drop after eradication.

"Genome" is telling. Now I want to go and look at nuclear.

Anyway--like all other tech--big data could be used for good or for ill. But I think we end up net winners.

Aug. 09 2013 10:48 PM
Greg Leff from Arizona

It might be obvious, but "Awesome" has been around for some time and only really picked up between '75 and '76 peaking around '82. Then a slight drop and then a steady climb to the present day.

Intel, Apple, Microsoft - three tech pioneers. Apple has been around forever, but ticked up appreciably in 1974ish around the same time that Intel did. Microsoft came along around 1980. Microsoft had a sharp peak in 2000 and then as sharp a drop.

Aug. 09 2013 11:02 AM

