Universal Catalogue  
Written Corpora
Displaying 261 to 280 (of 730 products)

ELRA-U-W 0253
Persian Today Corpus 


This is a 1,000,000-word corpus of modern Persian, mostly written between 1994 and 2004.
Language(s) : Persian



ELRA-U-W 0254
Persian Text Corpus 


The Persian Text Corpus contains 10 million words. It has been hand-annotated.
Language(s) : Persian



ELRA-U-W 0255
Corpus of Early Ontario English (CONTE)


The Corpus of Early Ontario English covers the period from the earliest Ontarian English texts to the end of the 19th century. It contains approximately 225,000 words from diaries, newspapers, official letters, etc. (informal register as well as formal writing).

It is divided by period (25-year spans) and by social criteria.
Language(s) : English



ELRA-U-W 0256
Corpus of American English 


The Corpus of American English contains more than 360 million words, equally divided among spoken language, fiction, popular magazines, newspapers, and academic texts.
It is POS-tagged with CLAWS.
Language(s) : English (USA)



ELRA-U-W 0257
Welsh-English Aligned Corpus 


This Welsh-English aligned corpus contains 510,813 aligned sentence pairs. Texts are taken from the proceedings of the National Assembly for Wales.
Language(s) : Welsh-English



ELRA-U-W 0258
deWaC German Web Corpus 


deWaC is a 1.7-billion-word German corpus constructed from the Web (.de domain).
Language(s) : German



ELRA-U-W 0259
itWaC Italian Web Corpus 


itWaC is a 2-billion-word Italian corpus constructed from the Web (.it domain).
Language(s) : Italian



ELRA-U-W 0260
ukWaC English Web Corpus 


ukWaC is a 2-billion-word English corpus constructed from the Web (.uk domain).
Language(s) : English



ELRA-U-W 0261
frWaC French Web Corpus 


frWaC is a 1.6-billion-word French corpus constructed from the Web (.fr domain).
Language(s) : French



ELRA-U-W 0262
Spanish Web Corpus 


This is a Spanish corpus constructed from the Web (.es domain).
Language(s) : Spanish



ELRA-U-W 0263
NILC Corpus 


The NILC corpus is a 40-million-word Brazilian Portuguese corpus. It is available in two forms: plain text and a POS-tagged version.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0264
Brazilian CorpusDT 


CorpusDT is a corpus of scientific texts in Brazilian Portuguese. It consists of authentic theses and dissertations in computer science.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0265
Brazilian Portuguese-English Parallel Corpora 


This is a bilingual Brazilian Portuguese-English collection of parallel texts from different domains: scientific, legal and journalistic.
It contains approximately 75,000 words.
Language(s) : Portuguese (Brazil)-English



ELRA-U-W 0266
Brazilian CorpusGIS 


This is a corpus of grammatically inadequate sentences in Brazilian Portuguese.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0267
RHETALHO 


RHETALHO is a corpus rhetorically annotated according to RST (Rhetorical Structure Theory; Mann and Thompson, 1987). It is composed of 40 scientific and news texts.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0268
Corpus of Verbal Response Mode Annotated Utterances 


This corpus is a pragmatically annotated set of utterances. It contains 1,368 annotated utterances from 14 dialogues and several sets of isolated utterances, all transcribed from spoken dialogues in various domains. Each utterance is annotated with two VRM (Verbal Response Mode) categories, one classifying its literal meaning and one its pragmatic meaning.
Language(s) : English



ELRA-U-W 0269
MPQA Opinion Corpus 


This corpus contains 535 news articles with a total of 11,114 sentences. They have been manually annotated for opinions and other private states (beliefs, emotions, sentiments, speculations, etc.).
Language(s) : English (USA)



ELRA-U-W 0270
USENET Corpus 


The USENET corpus is a collection of public USENET postings. It currently contains over 25 billion words and covers 47,860 English-language, non-binary-file newsgroups from October 2005 to January 2010. It is untagged and has been cleaned and anonymized.
Language(s) : English



ELRA-U-W 0271
Stockholm Multilingual Treebank (SMULTRON)


SMULTRON is a parallel treebank in English, German and Swedish. It contains around 1,500 sentences that have been POS-tagged and annotated with phrase structure trees.
It has been aligned at the sentence, phrase and word levels.
Language(s) : German-English, English-Swedish, German-Swedish



ELRA-U-W 0272
Parallel Corpus of Swedish, Danish and Norwegian Subtitles 


This parallel corpus consists of TV subtitles from soap operas, detective series, animation series, comedies, documentaries, feature films, etc.
This amounts to more than 14,000 subtitle files in each language, corresponding to more than 5 million subtitles (more than 50 million words).
Language(s) : Swedish-Danish, Swedish-Norwegian




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4