Ngrams are a great toy, but I can't help wondering about possible systemic biases in which books form the initial pool.
I'm guessing that a book is more likely to end up in Google Books if modern people like it. An old book was more likely to be found by a modern person if it was popular.
It's not modern people in general, though, or even literate modern people. There's probably a bias towards geeky modern people.
There will be a bias towards books that aren't in copyright.
Quality of OCR is a factor, and I have no idea how that plays out. Might some typefaces be problematic? If so, there could be cultural/temporal effects.
Each book presumably only shows up once, which means that the effects of popularity only show up indirectly.
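A toy calculation makes the once-per-book point concrete. This is only a sketch with invented numbers: the titles, word counts, and print runs are all hypothetical, and it isn't a claim about how Google actually builds its counts. It compares a word's frequency when each title counts once with its frequency when each title is weighted by how many copies were printed.

```python
# Toy illustration only: every title and number here is invented.
books = [
    # (title, occurrences of the word, total words, hypothetical copies printed)
    ("Obscure Treatise", 50, 10_000, 100),
    ("Runaway Bestseller", 5, 10_000, 100_000),
]


def frequency_once_per_book(books):
    """Word frequency when each title counts once, whatever its circulation."""
    hits = sum(count for _, count, _, _ in books)
    total = sum(words for _, _, words, _ in books)
    return hits / total


def frequency_weighted_by_copies(books):
    """Word frequency when each title counts once per copy printed."""
    hits = sum(count * copies for _, count, _, copies in books)
    total = sum(words * copies for _, _, words, copies in books)
    return hits / total


once = frequency_once_per_book(books)
weighted = frequency_weighted_by_copies(books)
print(f"once per book:      {once:.5f}")
print(f"weighted by copies: {weighted:.5f}")
```

With these made-up numbers, the word looks several times more common when every title counts once than when the bestseller's print run is taken into account, because the obscure book gets as much say as the popular one.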
This entry was posted at http://nancylebov.dreamwidth.org/450669.html. Comments are welcome here or there.