Catalog Reference : ELRA-U-W0361
Khmer Tagged Corpus
This is a written corpus that includes both official and everyday language (texts collected from LICADHO reports and from the Khmer Rouge Trial website).
It contains 73,206 words that were semi-automatically POS-tagged, of which 20,414 were tagged manually. The target size is 150,000 words.
This corpus is still under development.
Identification
Version : 2008
Production
Project : Pan Localization
Contents
written corpus
Number of languages : Monolingual
Language(s) : Khmer
Document source : Internet
Number of tokens : 73,206
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Semi automatic
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4