Catalog Reference: ELRA-U-W 0159
GyanNidhi Multilingual Parallel Corpus
The GyanNidhi corpus contains parallel texts in English and eleven Indian languages: Hindi, Punjabi, Marathi, Bengali, Oriya, Gujarati, Telugu, Tamil, Kannada, Malayalam, and Assamese (50,000 pages per language).
Source texts: National Book Trust India, Sahitya Akademi, Navjivan Publishing House, Publications Division, SABDA Pondicherry, Pustak Mahal.
Format: XML.
It can be used for: automatic dictionary extraction, creation of translation memories, example-based machine translation, language research study and analysis, language modeling.
This project is sponsored by TDIL, DIT, MC & IT and the Government of India.
Applications
Application area: Research
Contents
written corpus
Number of languages: Multilingual
Language(s): English; Assamese; Kannada; Hindi; Panjabi, Punjabi; Marathi; Bengali; Oriya; Gujarati; Telugu; Tamil; Malayalam
Alignment: Sentence
Annotation language: XML
Joint Copyright © 2008 ELRA & ELDA