Universal Catalogue  
Languages
English
Catalog Reference : ELRA-U-W 0023
The American National Corpus
In its second release, the American National Corpus (ANC) contains 22 million words of written American English (10 million more than the first release), collected from 1990 onwards. It covers a wide range of genres, from fiction to medical articles, and also includes transcripts of spoken data such as telephone conversations.

The corpus is still growing and is planned to reach 100 million words on completion, which would make it comparable to the British National Corpus (BNC) in size and variety of genres. The resource is valuable for education, for linguistic and lexicographic research, and for NLP applications such as machine translation and information retrieval.

The corpus is annotated for lemmas, parts of speech, noun chunks and verb chunks. POS annotation uses the Penn tagset; many documents have also been POS-tagged with the Biber tagset.

Tools for processing files with stand-off annotations have also been developed.
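
As an illustration of what stand-off annotation means in general, the sketch below keeps the primary text untouched and stores each annotation separately as a pair of character offsets plus a label. It is a minimal sketch under stated assumptions: the `Annotation` class, the `resolve` function, the layer names and the sample sentence are all hypothetical and do not reproduce the ANC file format or the ANC tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation (hypothetical layout, not the ANC format)."""
    start: int  # inclusive character offset into the primary text
    end: int    # exclusive character offset
    kind: str   # annotation layer, e.g. "pos" or "noun_chunk"
    value: str  # label, e.g. a Penn tag


def resolve(text, annotations, kind):
    """Return (covered text, label) pairs for one layer, in text order."""
    selected = sorted(
        (a for a in annotations if a.kind == kind),
        key=lambda a: a.start,
    )
    return [(text[a.start:a.end], a.value) for a in selected]


# The primary text is stored once and never modified.
TEXT = "The corpus grows."

# Annotations live apart from the text and point into it by offset.
ANNOTATIONS = [
    Annotation(0, 3, "pos", "DT"),
    Annotation(4, 10, "pos", "NN"),
    Annotation(11, 16, "pos", "VBZ"),
    Annotation(16, 17, "pos", "."),
    Annotation(0, 10, "noun_chunk", "NP"),
]

if __name__ == "__main__":
    for layer in ("pos", "noun_chunk"):
        print(layer, resolve(TEXT, ANNOTATIONS, layer))
```

Because each layer only points into the text, several tagsets or chunkings can coexist over the same document without altering it, which is the usual motivation for the stand-off approach.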
Identification
Period of coverage : from 1990 onwards
Version : v2, 2005
Version history :
Production
Project : The American National Corpus project
Creation date : 2005
Applications
Possible applications : Discourse analysis, Information retrieval
Application area : Education, Research
Contents
Written corpus

Joint Copyright © 2008 ELRA & ELDA