Catalog Reference : ELRA-U-W 0162
English-Vietnamese Corpus
This parallel corpus contains 5 million words of English and Vietnamese text and covers a range of fields, including science, technology, and daily conversation.
It has been automatically word-aligned and POS-tagged. It includes the Susanne Corpus, a gold-standard corpus manually annotated with lemmas, POS tags, chunking tags, syntactic trees, etc., which has been translated into Vietnamese by English teachers.
It has been compiled to train models for Vietnamese-related NLP tasks: word segmentation, POS tagging, word sense disambiguation (WSD), and machine translation (MT).
The quality of the English-Vietnamese Corpus (EVC) is currently being improved through manual correction of its linguistic annotations.
Applications
Application Area : Research
Contents
Written corpus
Number of languages : Bilingual
Language(s) : English, Vietnamese
Alignment : Sentence
Annotation Coverage : Partial
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Semi automatic
Joint Copyright © 2008 ELRA & ELDA