Universal Catalogue  
Written Corpora
Displaying 361 to 380 (of 730 products)

ELRA-U-W0354
NoWaC Norwegian web corpus


NoWaC is a 700-million-word Norwegian corpus built from the Web (.no domain).
Language(s) : Norwegian



ELRA-U-W0355
The FidaPLUS corpus 


This is an extensive collection of texts published between 1990 and 2006, which represents a balanced sample of texts in Slovenian. The FidaPLUS corpus extends the FIDA corpus to 600 million words.
Language(s) : Slovenian (Slovenia)



ELRA-U-W0356
English-Lao Parallel corpus 


This is a parallel corpus of 3,110 English sentences from the Penn Treebank Corpus manually translated into Lao.
Language(s) : English >>>> Lao



ELRA-U-W0357
Indonesian - English Parallel Corpus (PANL-BPPT)


This is a parallel corpus of 1 million words in English and Indonesian (Bahasa Indonesia).
Language(s) : English <<< >>> Indonesian



ELRA-U-W0358
ANTARA Corpus 


This corpus contains 250,000 sentences aligned in English and Indonesian (about 2.5 million words) from articles published between 2000 and 2007 through the ANTARA News Agency.
Language(s) : English <<< >>> Indonesian



ELRA-U-W0359
Bangla News Corpus 


This is a corpus of news in Bangla (or Bengali). It is also called the Prothom-Alo corpus.
Language(s) : Bengali



ELRA-U-W0360
English-Sinhala Parallel and Aligned Tagged Corpus 


This is a corpus of 100,000 words of Sinhala text translated into English. It is annotated with POS tags and aligned at the sentence level.
Language(s) : Sinhalese >>>> English



ELRA-U-W0361
Khmer Tagged Corpus 


This is a written corpus which contains both official and everyday spoken language.
Language(s) : Khmer



ELRA-U-W0362
BTEC-ATR Parallel Corpus English - Indonesian 


It consists of sentences translated into Indonesian from the English part of the BTEC Corpus. Annotation includes POS tagging, syllabification and word-stress tagging in XML format.
Language(s) : English >>>> Indonesian



ELRA-U-W0363
JOS Morphosyntactically Tagged Corpora of Slovene 


It contains sampled paragraphs of the Slovene reference corpus FidaPLUS (see U-W0355), morphosyntactically tagged with context-disambiguated MSDs and lemmas. This selection has been converted from SGML to XML following the TEI P4 guidelines, and the former annotation tagset has been updated.

It consists of two corpora: jos100k and jos1M.
Language(s) : Slovenian



ELRA-U-W0364
Nova beseda 


This is a wide collection of 4,158 Slovenian texts from various categories: newspapers, magazines, formal speech, fiction, non-fiction, scientific and technical texts. It contains about 162 million words, marked at the sentence level.
Language(s) : Slovenian



ELRA-U-W0365
Power Shift text corpus 


This is a corpus of e-mail messages on business or private matters. The messages were collected through simulation from men and women aged 10 to 39, using specified mobile phones or PCs.
Language(s) : Japanese



ELRA-U-W0366
POMDAF Corpus 


This corpus contains 40,000 sentences. It consists of triplets of the English original, draft Japanese translation and final Japanese translation of books and online articles.
Language(s) : English - Japanese



ELRA-U-W0367
REBECA 


This is a parallel corpus of more than 3 million words containing Dutch texts and their French translations, aligned at the sentence level.

It is under construction.
Language(s) : Dutch >>>> French



ELRA-U-W0368
LORCA Corpus 


This is a corpus of 1 million words containing the complete works of the Spanish poet Federico García Lorca. The corpus is tokenized, POS-tagged and lemmatized.
Language(s) : Spanish



ELRA-U-W0369
SUBTLEX-NL 


This is a subtitle corpus for Dutch. It contains 44 million words from 8,443 films and television series.
Language(s) : Dutch



ELRA-U-W0370
SUBTLEX-CH 


This is a subtitle corpus for Chinese. It contains 33.5 million words (46.8 million characters) from 6,243 different contexts (7,148 files).
Language(s) : Chinese



ELRA-U-W0371
Corpus of Clinical Data 


This is a large corpus of discharge letters collected from a medical system used in a hospital in Sweden.
Language(s) : Swedish



ELRA-U-W0372
FDV-IJS monolingual corpus 


The FDV-IJS is a Slovene monolingual corpus which contains over 5.5 million words.
Language(s) : Slovenian (Slovenia)



ELRA-U-W0373
VoiceTRAN application-specific corpus 


This is a restricted-domain corpus of Slovene-English parallel texts from the Slovenian Ministry of Defense.
Language(s) : Slovenian <<< >>> English




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4