Universal Catalogue  
Catalog Reference: ELRA-WC326
NIL Corpus
This is an annotated Chinese chat language corpus, built for research on informal language processing. It contains 12,112 pieces of chat language text comprising 92,314 words and 12,983 chat terms. Two types of chat terms were tagged: ones with anomalous morphological forms and ones with standard anomalous morphological forms. The data were produced between December 2004 and July 2005. The first 2,000 pieces were annotated with the following: text string, class, counterpart in standard language text, part-of-speech (POS) tag, segments if the term is a phrase, POS tags for all segments, and Chinese Pinyin (romanization).
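For readers planning to work with the annotations, the fields listed above could be held in a record like the one sketched below. This is only an illustration: the class name, field names, and the sample values are assumptions made here, not the corpus's actual file format or data, which this entry does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChatTermAnnotation:
    """One annotated chat term. All names here are hypothetical."""

    text: str            # the chat term's text string
    term_class: str      # class assigned to the term
    standard_form: str   # counterpart in standard language text
    pos: str             # part-of-speech (POS) tag
    pinyin: str          # Chinese Pinyin (romanization)
    # Filled in only when the term is a phrase:
    segments: List[str] = field(default_factory=list)
    segment_pos: List[str] = field(default_factory=list)

    def is_phrase(self) -> bool:
        """A term counts as a phrase here if it has any segments."""
        return len(self.segments) > 0

    def validate(self) -> None:
        """Check that every segment has exactly one POS tag."""
        if len(self.segments) != len(self.segment_pos):
            raise ValueError("segments and segment_pos differ in length")


# Invented sample record, not taken from the corpus.
example = ChatTermAnnotation(
    text="88",
    term_class="<class label>",
    standard_form="拜拜",
    pos="<POS tag>",
    pinyin="baibai",
)
example.validate()
```

Keeping the segment fields as two parallel lists mirrors the description ("segments" and "POS tags for all segments"); a list of (segment, tag) pairs would work equally well.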
Contents: written corpus
 

Joint Copyright © 2008 ELRA & ELDA