Universal Catalogue  
Catalog Reference: ELRA-U-W0295
Nepali Written Corpus
The Nepali Written Corpus is a part of the Nepali National Corpus (NNC).

This is a monolingual corpus of about 15 million words containing texts from books, magazines, newspapers and Internet websites. It is segmented and POS-tagged using a tagset of 112 part-of-speech tags, which was developed empirically to annotate the first part of the corpus, the Core Sample.

It is divided into two parts:
- the Core Sample: 398 texts (804,574 words) collected from books, journals, magazines and newspapers between 1990 and 1992. It covers a wide variety of genres: press reportage, press editorials, press reviews, religion, skills, trades and hobbies, biographies, essays, science and fiction. This part was compiled following the FLOB and FROWN framework for text corpus collection.
- the General Corpus: digitized written texts (14 million words) collected mainly from Internet websites, newspapers, books, publishers and authors between 2005 and 2006.

It is available in the ELRA catalogue (http://catalog.elra.info) under reference ELRA-W0076.

In addition to this corpus, the NNC contains:
- a parallel corpus (see Nepali-English Parallel Corpus, ref. U-W0296).
- a spoken corpus (see Nepali Spoken Corpus, ref. U-S0205).
- a text-to-speech corpus (see Nepali Text-to-Speech Corpus, ref. U-S0206).
Production
Project: NeLRaLEC
Applications
Application area: Research
Contents
Written corpus

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4