December 19th, 2010

green leaves

Playing with ngrams

the past vs. the future-- Something is going on here, but I don't know what.

hope vs. fear-- Possibly evidence of a cliche, but I wonder why there was so much more hope (or possibly "hope") from 1550 to 1650.

crisis (overview)-- looks as though people are getting steadily more worried. Did things become more drastic or was it just a change in word choice?

crisis (focusing in)-- Looks like something big happened around 1648, but I don't know what-- and it seems to be during the hope period.

the obesity crisis-- maybe it's peaked. Anyone have info on the arc taken by moral panics?

I would rather have inserted the graphs, but they either didn't show up, or only the link appeared.

This entry was posted at http://nancylebov.dreamwidth.org/450434.html. Comments are welcome here or there.

A whole lot of books.....

Ngrams are a great toy, but I can't help wondering about possible systemic biases in which books form the initial pool.

I'm guessing that a book is more likely to end up in Google Books if modern people like it. If it's old, it was more likely to be found by a modern person if it was popular.

It's not modern people in general, though, or even literate modern people. There's probably a bias towards geeky modern people.

There will be a bias towards books that aren't in copyright.

Quality of OCR is a factor, and I have no idea how that plays out. Might some typefaces be problematic? If so, there could be cultural/temporal effects.

Each book presumably only shows up once, which means that the effects of popularity only show up indirectly.

Anything else?

This entry was posted at http://nancylebov.dreamwidth.org/450669.html. Comments are welcome here or there.

You actually do need to know things.....

I've long been annoyed with "You don't need to know things, you just need to know where to look them up".

It's a half truth, and I think of Google as the larger part of my brain. But I also think you need to know a lot of specific stuff to know what things mean rather than just repeating the common opinion. Sometimes the specific thing you need to know is sufficiently weird that I'm not sure what you'd need to know to look it up.

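It turns out that in ngrams, if you see a sudden rise around 1800 of a word containing s, it may be because the long s (which mercifully looks like an f) went out of fashion. In books printed before then, the OCR tends to read the long s as an f, so a word like "best" can get counted as "beft" instead.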
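Here's a rough sketch in Python of how you might check for the long s effect. It guesses the misread spelling of a word and adds the two counts together. The function names are mine, and the rule is a deliberate simplification: it assumes every lowercase s except one at the very end of the word was printed as a long s and then read as an f. Real printers had more exceptions than that, so treat this as a toy and not as a description of how the ngram viewer works.

```python
def long_s_ocr_variant(word: str) -> str:
    """Guess the spelling OCR might produce for a word printed with long s.

    Simplifying assumption (not a full typographic rule): every lowercase
    "s" except a word-final one was set as long s and misread as "f".
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", "f") + last


def combined_count(counts: dict[str, int], word: str) -> int:
    """Add the count for a word to the count for its guessed misreading."""
    variant = long_s_ocr_variant(word)
    total = counts.get(word, 0)
    if variant != word:
        total += counts.get(variant, 0)
    return total


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    example_counts = {"best": 3, "beft": 7}
    print(long_s_ocr_variant("best"))
    print(combined_count(example_counts, "best"))
```

If the combined line is flat across 1800 while the plain spelling jumps, the jump is probably typography and not a change in what people were writing about.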

I need to come up with a button slogan about the world being made of details. And maybe something about chaos, because you never know when a detail will affect or be connected to something unlikely.

Only vaguely related, but I find it satisfying that it gets colder on clear nights because more heat leaks out into space-- there isn't usually such a direct and obvious connection between something which can be felt and the larger universe.

There used to be a magazine called Lingua Franca which reported on various aspects of academe, and I miss it tremendously. One of the articles was "Atlas Shrugged", about the ability to perform searches on vast amounts of geographic data combined with uncertainty about how carefully the data was collected.

See this from thnidu about just how low quality the ngram hits can be from early books.

This entry was posted at http://nancylebov.dreamwidth.org/451237.html. Comments are welcome here or there.