Professional Researcher's Encyclopaedia

Corpus linguistics

     From Wikipedia, the free encyclopedia.

Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. The approach runs counter to Noam Chomsky's view that real language is riddled with performance-related errors and therefore has to be studied through careful analysis of small speech samples obtained in a highly controlled laboratory setting. Corpus linguistics does away with Chomsky's competence/performance split, holding that language can only be analysed reliably when the researcher does not interfere.

In some areas there is an overlap with computational linguistics, as the latter moves towards language processing applications. This means dealing with real input data, where descriptions based on a linguist's intuition are not usually helpful.

The field was established in 1967 when Henry Kucera and Nelson Francis published their classic work Computational Analysis of Present-Day American English, on the basis of the Brown Corpus, a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, psychology, statistics, and sociology.
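A word-frequency count is among the simplest computational analyses that can be run on a corpus. The sketch below is only an illustration of the idea, not a reconstruction of Kucera and Francis's method: the function name `word_frequencies`, the tokenisation rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lower-cased word tokens in a text (hypothetical helper)."""
    # Assumed tokenisation: runs of letters, optionally with one
    # internal apostrophe (e.g. "don't"). Real corpora need more care.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Invented sample text standing in for a corpus.
sample = "The cat sat on the mat. The dog sat too."
freq = word_frequencies(sample)

# Print the three most frequent word forms with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

On a corpus of about a million words, the same kind of count produces a ranked frequency list of word forms.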

Shortly thereafter, the Boston publisher Houghton Mifflin approached Kucera to supply a million-word, three-line citation base for its new American Heritage Dictionary (AHD), the first dictionary to be compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used).

Other publishers followed suit. The British publisher Collins' COBUILD dictionaries, designed for users learning English as a foreign language, were also compiled using corpus linguistics.

This article is from Wikipedia. This article was up-to-date as of 8 May 2004 - See live article
All text is available under the terms of the GNU Free Documentation License.
