Google as corpus?


March 2, 2005 by Mark Warschauer

Corpus-based research, whether for deep insight into language or to answer practical questions of language use, suffers from several difficulties. First, the corpora of authentic language use that have been assembled are usually not big enough. Second, they are usually biased in certain ways (for example, containing mostly academic or journalistic text rather than texts from other contexts and sources). In addition, corpora are often clumsy to access.

My guess is that many people are starting to use Google as a huge corpus, mostly for practical information but occasionally for more academic pursuits. I personally use Google as a corpus several times a week, for example when I try to find out whether Chanukah or Hanukkah is the more common spelling (as it turns out, it’s the latter, by more than a three-to-one margin).

Though this post is more than a year old, it includes some very interesting commentary on the dangers of using Google as a corpus. Follow the links in the article for further discussion of related points. The bottom line is that there is a lot of junk on Google that can seriously distort results. I doubt that the distortion would seriously affect my own research on the spelling of the Jewish holiday, but more serious usage inquiries could be thrown a curve.


One Response

on March 4, 2005 at 8:55 am Laura Little

Have you seen the discussion on the Association of Internet Researchers (air-l) list? They recently (as in just this week!) had a discussion about a very similar topic. The URL is http://www.aoir.org. There is a link to the archives here.
Hope this helps,
Laura
