It Turns Out No Is More Common Than Yes

Google crunches the numbers for us.

“Etaoin srhldcu” may read like nonsense to most English speakers at first blush, but the sequence is quite significant. It lists, in descending order of frequency, the most used letters in the English language, according to a new survey of 743 billion words conducted by Google’s head of research Peter Norvig.

The survey, which was publicized by Google Research on Monday, was an update to the seminal 1965 survey of some 20,000 words gathered from a variety of printed sources — books, magazines, newspapers — conducted by Mark Mayzner, a former Bell Labs researcher.

Mayzner’s survey involved a lengthy and painstaking process: identifying each word occurrence, transferring it to Hollerith (IBM) punch cards, and running the cards through a sorter.

Mayzner recently contacted Google’s Norvig via email to see if Norvig was interested in repeating the experiment using Google’s much more voluminous English language database: the entire Google Books collection of scanned English volumes. Norvig accepted the challenge. Using the Google Books Ngram Viewer (which shows word popularity over time), Norvig created a new dataset of some 97,565 unique words, collectively occurring 743.8 billion times, which he noted on his blog is roughly 37 million times as many occurrences as the 20,000-word sample that Mayzner assembled. Norvig’s sample also included over 3 trillion individual letters.
