nancylebov

A whole lot of books.....

Ngrams are a great toy, but I can't help wondering about possible systemic biases in which books form the initial pool.

I'm guessing that a book is more likely to end up in Google Books if modern people like it. If it's old, it was more likely to be found by a modern person if it was popular.

It's not modern people in general, though, or even literate modern people. There's probably a bias towards geeky modern people.

There will be a bias towards books that aren't in copyright.

OCR quality is also a factor, and I have no idea how that plays out. Might some typefaces be harder to OCR than others? If so, there could be cultural or temporal effects.

Each book presumably only shows up once, which means that the effects of popularity only show up indirectly.

Anything else?

