Catalog Reference : ELRA-U-W 0064
Chinese-English Parallel Corpus
This corpus, which is still under construction, is a Chinese-English parallel corpus that will amount to 17 million words in each language when completed. The compilation started in 2001.
It gathers texts from various genres: newspapers, technical articles, literature, movie transcripts, etc.
The corpus is annotated with global attributes (domain, written/spoken, author, time period, etc.), segmentation, and part-of-speech (POS) tagging. Sentence alignment was performed automatically and then checked manually.
It is an important resource for cross-language information processing and could be used to support Machine Aided Translation, bilingual dictionary compilation, contrastive studies, teaching, etc.
Applications
Application area : Education, Research
Contents
written corpus
Number of languages : Bilingual
Language(s) : Chinese (China), English (United Kingdom)
Alignment : Sentence
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Automatic
Annotation Scheme : TEI
Annotation language : XML
Joint Copyright © 2008 ELRA & ELDA