ELRA - ELRA-U-W0288 : NICT Japanese-Chinese parallel corpus

Catalog Reference: ELRA-U-W0288

NICT Japanese-Chinese parallel corpus

The NICT Japanese-Chinese parallel corpus is one of the parallel corpora in the NICT Multilingual Corpora.

It contains 38,383 sentence pairs: Japanese sentences taken from Japanese newspapers and manually translated into Chinese. The corpus totals 947,066 Japanese words and 877,859 Chinese words and is encoded in Unicode.
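The figures above imply an average sentence length for each side of the corpus. The short Python sketch below derives it from the catalogue numbers only. Keep in mind that "word" counts depend on each language's segmentation scheme, so the Japanese/Chinese ratio is a rough indicator rather than a direct comparison.

```python
# Corpus statistics as stated in the catalogue entry.
SENTENCE_PAIRS = 38_383
JAPANESE_WORDS = 947_066
CHINESE_WORDS = 877_859


def average_words_per_sentence(words: int, sentences: int) -> float:
    """Mean number of words per sentence, rounded to two decimals."""
    return round(words / sentences, 2)


ja_avg = average_words_per_sentence(JAPANESE_WORDS, SENTENCE_PAIRS)  # 24.67
zh_avg = average_words_per_sentence(CHINESE_WORDS, SENTENCE_PAIRS)   # 22.87
# Ratio of Japanese to Chinese word counts (segmentation-dependent).
ratio = round(JAPANESE_WORDS / CHINESE_WORDS, 3)                    # 1.079

print(f"Japanese: {ja_avg} words/sentence")
print(f"Chinese:  {zh_avg} words/sentence")
print(f"Japanese/Chinese word ratio: {ratio}")
```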

The corpus is aligned at the word and phrase levels. Word segmentation and part-of-speech annotation of the Chinese text follow the Peking University specification, and the morphological and syntactic structure of the Japanese text has been annotated following the specification of the Corpus of Spontaneous Japanese.
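Word- and phrase-level alignment can be pictured as links between token positions in a sentence pair. The sketch below is a minimal in-memory model under stated assumptions: the `AlignedPair` class, its field names, and the end-exclusive span convention are hypothetical and do not reflect the corpus's actual data model, and the toy sentence pair is invented for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedPair:
    """Hypothetical in-memory view of one word- and phrase-aligned sentence pair."""
    japanese: list[str]
    chinese: list[str]
    # Word links as (japanese_index, chinese_index); a word may have several links or none.
    word_links: set[tuple[int, int]] = field(default_factory=set)
    # Phrase links as ((ja_start, ja_end), (zh_start, zh_end)), end-exclusive spans.
    phrase_links: list[tuple[tuple[int, int], tuple[int, int]]] = field(default_factory=list)

    def chinese_for(self, ja_index: int) -> list[str]:
        """Chinese words linked to the Japanese word at ja_index, in Chinese word order."""
        return [self.chinese[j] for (i, j) in sorted(self.word_links) if i == ja_index]

    def phrase_strings(self) -> list[tuple[str, str]]:
        """Render each phrase link as a (Japanese, Chinese) string pair."""
        return [
            ("".join(self.japanese[js:je]), "".join(self.chinese[cs:ce]))
            for (js, je), (cs, ce) in self.phrase_links
        ]


# Invented toy example: the topic particle "は" has no Chinese counterpart.
pair = AlignedPair(
    japanese=["私", "は", "学生", "です"],
    chinese=["我", "是", "学生"],
    word_links={(0, 0), (2, 2), (3, 1)},
    phrase_links=[((2, 4), (1, 3))],
)
print(pair.chinese_for(0))     # ['我']
print(pair.chinese_for(1))     # []
print(pair.phrase_strings())   # [('学生です', '是学生')]
```

Storing word links as a set of index pairs allows one-to-many, many-to-one, and unaligned words, which a one-to-one mapping could not express.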

Production

Creation date: 2005

Applications

Application area: Research

Contents

Written corpus
Number of languages: Parallel
Language(s): Japanese <-> Chinese
Alignment: Sentence
Annotation coverage: Full
Annotation granularity: Word, Sentence
Annotation level: Morphological
Lexical unit information: Phraseological unit
Annotation language: XML
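The entry states that the annotation is encoded in XML but does not describe the schema. As a rough illustration of how such files might be read with Python's standard library, the sketch below parses a small sample. The markup is entirely hypothetical: the element and attribute names (`pair`, `ja`, `zh`, `w`, `link`) are invented for this example and should be replaced with those of the real distribution.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL markup: element and attribute names are invented for illustration
# and are not the corpus's documented schema.
SAMPLE = """
<corpus>
  <pair id="1">
    <ja><w id="j0">私</w><w id="j1">は</w><w id="j2">学生</w><w id="j3">です</w></ja>
    <zh><w id="c0">我</w><w id="c1">是</w><w id="c2">学生</w></zh>
    <link ja="j0" zh="c0"/>
    <link ja="j2" zh="c2"/>
    <link ja="j3" zh="c1"/>
  </pair>
</corpus>
"""


def read_pairs(xml_text: str) -> list[dict]:
    """Collect the tokens and word links of every <pair> element."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for pair in root.iter("pair"):
        # Map token ids to surface forms; dicts keep document order.
        ja = {w.get("id"): w.text for w in pair.find("ja").iter("w")}
        zh = {w.get("id"): w.text for w in pair.find("zh").iter("w")}
        # Resolve each link's id references to the linked surface forms.
        links = [(ja[link.get("ja")], zh[link.get("zh")]) for link in pair.findall("link")]
        pairs.append({
            "id": pair.get("id"),
            "ja": list(ja.values()),
            "zh": list(zh.values()),
            "links": links,
        })
    return pairs


for p in read_pairs(SAMPLE):
    print(p["id"], p["links"])
```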