May 26, 2012

425 million words - The largest freely-available corpus of English

The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. It was created at Brigham Young University in 2008, and it is now used by tens of thousands of people every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora that we have created or modified.
The corpus contains more than 425 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words for each year from 1990 to 2011, and it is updated once or twice a year (the most recent texts are from March 2011). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language (see the 2011 article in Literary and Linguistic Computing).


