Catalog Reference: ELRA-WC326
NIL Corpus
This is an annotated Chinese chat language corpus, an informal-language corpus built for research on informal language processing. It covers 12,112 pieces of chat language text containing 92,314 words and 12,983 chat terms. Two types of chat terms were tagged: those with anomalous morphological forms and those with standard anomalous morphological forms. The data were produced between December 2004 and July 2005. The first 2,000 pieces were annotated with the following information for each chat term: text string, class, counterpart in standard language text, part-of-speech (POS) tag, segments (if the term is a phrase), POS tags for all segments, and Chinese Pinyin (romanization).
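The entry lists the attributes annotated for each chat term but not the release's file format or field names. The following is a minimal sketch, assuming one record per annotated chat term, that holds those attributes in a Python dataclass. The class name `ChatTermAnnotation`, every field name, and the class and POS tag values in the usage lines are hypothetical and are not the corpus's actual schema or tagset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChatTermAnnotation:
    """Hypothetical record for one annotated chat term.

    Field names are illustrative only; the catalogue names the annotated
    attributes but not how the corpus stores them.
    """
    text: str                  # the chat term's text string
    term_class: str            # annotation class (actual label set not given here)
    standard_counterpart: str  # counterpart in standard language text
    pos: str                   # POS tag of the whole term
    pinyin: str                # Chinese Pinyin romanization
    segments: List[str] = field(default_factory=list)     # filled only for phrases
    segment_pos: List[str] = field(default_factory=list)  # one POS tag per segment

    def __post_init__(self) -> None:
        # Every segment must carry exactly one POS tag.
        if len(self.segments) != len(self.segment_pos):
            raise ValueError("segments and segment_pos must have equal length")

    @property
    def is_phrase(self) -> bool:
        # A term counts as a phrase here when it has been segmented.
        return len(self.segments) > 0


# Usage with the common chat terms "88" and "886" ("bye-bye", "bye-bye then").
# The class label "hypothetical" and the tags "v"/"u" are placeholders.
word = ChatTermAnnotation("88", "hypothetical", "拜拜", "v", "bai bai")
phrase = ChatTermAnnotation(
    "886", "hypothetical", "拜拜了", "v", "bai bai le",
    segments=["拜拜", "了"], segment_pos=["v", "u"],
)
```

Validating in `__post_init__` keeps each segment paired with its POS tag, which matches the entry's description of segments and per-segment POS tags being annotated together.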
Contents
written corpus
Number of languages: Monolingual
Language(s): Chinese
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA