ELRA - ELRA-U-W 0064 : Chinese-English Parallel Corpus


Catalog Reference : ELRA-U-W 0064

Chinese-English Parallel Corpus

This Chinese-English parallel corpus is still under construction; when completed, it will contain 17 million words in each language. Compilation started in 2001.
It covers a range of genres, including newspapers, technical articles, literature and movie transcripts.

The corpus is annotated with global attributes (domain, written/spoken, author, time period, etc.), segmentation and part-of-speech (POS) tags. Sentence alignment was performed automatically and then checked manually.

It is an important resource for cross-language information processing and can support machine-aided translation, bilingual dictionary compilation, contrastive studies and language teaching.
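As a rough illustration of how sentence-aligned, segmented text can feed bilingual dictionary compilation, the following sketch scores Chinese-English word pairs with the Dice coefficient, a simple co-occurrence measure. This is not a method documented for this corpus; the function name `dice_scores` and the toy sentence pairs are hypothetical, and real data would first have to be read from the distributed files.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Dice association between Chinese and English words (hypothetical helper).

    pairs: list of (zh_words, en_words) tuples, one per aligned sentence pair,
    each side already segmented into words. A word is counted at most once
    per sentence pair.
    """
    zh_count, en_count, joint = Counter(), Counter(), Counter()
    for zh_words, en_words in pairs:
        zh_set, en_set = set(zh_words), set(en_words)
        zh_count.update(zh_set)
        en_count.update(en_set)
        # Every Chinese/English word combination in the pair co-occurs once.
        joint.update(product(zh_set, en_set))
    # Dice(z, e) = 2 * C(z, e) / (C(z) + C(e))
    return {
        (z, e): 2 * c / (zh_count[z] + en_count[e])
        for (z, e), c in joint.items()
    }


# Toy, hand-made sentence pairs (not taken from the corpus).
toy_pairs = [
    (["我", "喜欢", "音乐"], ["I", "like", "music"]),
    (["我", "喜欢", "茶"], ["I", "like", "tea"]),
    (["他", "听", "音乐"], ["he", "hears", "music"]),
]
scores = dice_scores(toy_pairs)
print(scores[("音乐", "music")])  # 1.0
print(scores[("我", "music")])    # 0.5
```

On data this small, 我 and 喜欢 score 1.0 against both "I" and "like", which shows why such scores are only useful as candidate lists over a large aligned corpus.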

Applications


Application Area : Education, Research

Contents


Written corpus
Number of languages : Bilingual
Language(s) : Chinese (China), English (United Kingdom)
Alignment : Sentence
Annotation Coverage : Full
Annotation Granularity : Word
Annotation Level : Morphological
Annotation Mode : Automatic
Annotation Scheme : TEI
Annotation Language : XML
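The catalogue entry does not document the file layout. Assuming a TEI-style encoding in which each sentence is an `<s>` element carrying an `xml:id`, each token is a `<w>` element with a `pos` attribute, and alignments are `<link>` elements inside a `<linkGrp>`, aligned sentence pairs could be read with Python's standard `xml.etree.ElementTree` module roughly as follows. The sample fragment, its POS tag values and the function `aligned_pairs` are hypothetical; the actual ELRA-U-W 0064 files may use different elements and tagsets.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment: element names follow common TEI
# conventions, but the real corpus files may be structured differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div xml:lang="zh">
        <s xml:id="zh1"><w pos="r">我</w><w pos="v">喜欢</w><w pos="n">音乐</w></s>
      </div>
      <div xml:lang="en">
        <s xml:id="en1"><w pos="PRP">I</w><w pos="VBP">like</w><w pos="NN">music</w></s>
      </div>
      <linkGrp type="alignment">
        <link target="#zh1 #en1"/>
      </linkGrp>
    </body>
  </text>
</TEI>"""

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def aligned_pairs(xml_text):
    """Return (zh_tokens, en_tokens) pairs; each token is a (word, pos) tuple."""
    root = ET.fromstring(xml_text)
    # Index every sentence by its xml:id so that links can be resolved.
    sentences = {
        s.get(XML_ID): [(w.text, w.get("pos")) for w in s.iter(TEI + "w")]
        for s in root.iter(TEI + "s")
    }
    pairs = []
    for link in root.iter(TEI + "link"):
        # target holds two space-separated pointers such as "#zh1 #en1".
        src, tgt = (ref.lstrip("#") for ref in link.get("target").split())
        pairs.append((sentences[src], sentences[tgt]))
    return pairs


for zh, en in aligned_pairs(SAMPLE):
    print(" ".join(f"{w}/{p}" for w, p in zh), "|||",
          " ".join(f"{w}/{p}" for w, p in en))
```

Resolving alignments through `xml:id` pointers keeps the two language sides independent, so one-to-many sentence links could be handled by allowing more than two pointers per `target`.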