ELRA - ELRA-U-W 0162 : English-Vietnamese Corpus

Catalog Reference : ELRA-U-W 0162

English-Vietnamese Corpus

This parallel corpus contains 5 million words of English and Vietnamese text and covers various domains, such as science, technology and daily conversation.
It has been automatically word-aligned and POS-tagged. It includes the Susanne Corpus, a gold-standard corpus manually annotated with lemmas, POS tags, chunking tags, syntactic trees and other information; the Susanne Corpus was translated into Vietnamese by English teachers.
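
This page does not describe the EVC's file format, so the following Python sketch only illustrates what a word-aligned, POS-tagged sentence pair can look like as data. Everything in it is an assumption. The `AlignedPair` class and the helper functions are hypothetical. The `i-j` link notation follows the Pharaoh format produced by aligners such as fast_align. The underscore joining the syllables of `học_sinh` is a common Vietnamese word-segmentation convention. The tags are Universal POS-style placeholders, not the corpus's actual tagset.

```python
from dataclasses import dataclass

# Hypothetical representation of one word-aligned, POS-tagged sentence pair.
# The EVC's real storage format is not documented here; this is illustrative.


@dataclass
class AlignedPair:
    en: list[tuple[str, str]]     # English (word, POS tag) tokens
    vi: list[tuple[str, str]]     # Vietnamese (word, POS tag) tokens
    links: list[tuple[int, int]]  # (English index, Vietnamese index) word links


def parse_links(text: str) -> list[tuple[int, int]]:
    """Parse Pharaoh-style links such as '0-0 1-1 3-2' (assumed notation)."""
    links = []
    for item in text.split():
        i, j = item.split("-")
        links.append((int(i), int(j)))
    return links


def linked_words(pair: AlignedPair) -> list[tuple[str, str]]:
    """Return the (English word, Vietnamese word) pairs joined by a link."""
    return [(pair.en[i][0], pair.vi[j][0]) for i, j in pair.links]


def unaligned_english(pair: AlignedPair) -> list[str]:
    """Return English words that have no link to any Vietnamese word."""
    linked = {i for i, _ in pair.links}
    return [word for k, (word, _) in enumerate(pair.en) if k not in linked]


# Illustrative example (not taken from the corpus); tags are placeholders.
pair = AlignedPair(
    en=[("I", "PRON"), ("am", "AUX"), ("a", "DET"), ("student", "NOUN")],
    vi=[("Tôi", "PRON"), ("là", "AUX"), ("học_sinh", "NOUN")],
    links=parse_links("0-0 1-1 3-2"),
)

print(linked_words(pair))       # [('I', 'Tôi'), ('am', 'là'), ('student', 'học_sinh')]
print(unaligned_english(pair))  # ['a']
```

In this example the English article "a" has no Vietnamese counterpart. That is why the alignment is stored as a list of links rather than a one-to-one mapping: an aligner can leave words unlinked or link one word to several.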

It was compiled to train models for Vietnamese NLP tasks such as segmentation, POS tagging, word sense disambiguation (WSD) and machine translation (MT).

The quality of the EVC (English-Vietnamese Corpus) is currently being improved through manual correction of its linguistic annotations.

Applications

Application Area : Research

Contents

Written corpus
Number of languages : Bilingual
Language(s) : English, Vietnamese
Alignment : Sentence
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Semi automatic