Universal Catalogue  
Written Corpora
Displaying 201 to 220 (of 730 products)

ELRA-U-W 0193
Repentino 


Repentino is a collection of named entity instances: proper nouns, each classified by the kind of entity it denotes (company, book title, place name, etc.). It currently gathers more than 450,000 instances, distributed in XML.
Language(s) : Portuguese



ELRA-U-W 0194
ELAN corpus of European Portuguese 


The ELAN corpus is a subcorpus of the CRPC (Corpus de Referência do Português Contemporâneo). It contains 2,840,552 words.
Language(s) : Portuguese



ELRA-U-W 0195
RL corpus of European Portuguese 


The RL corpus is a subcorpus of the CRPC (Corpus de Referência do Português Contemporâneo). It contains 8,670,438 words.
Language(s) : Portuguese



ELRA-U-W 0196
AFRICA Corpus of Portuguese 


This resource is a subcorpus of the CRPC (Corpus de Referência do Português Contemporâneo). It contains 3,070,879 words of written data and 129,245 words of spoken data. It is representative of African Portuguese (Angola, Sao Tome and Principe, Mozambique, Guinea, Cape Verde).
Language(s) : Portuguese



ELRA-U-W 0197
TeMário Corpus of Portuguese 


This corpus contains 100 news texts extracted from Folha de São Paulo and Jornal do Brasil, for a total of 61,412 words. They come with manually written summaries and ideal extracts.
Language(s) : Portuguese



ELRA-U-W 0198
Lacio-Ref Corpus 


The Lacio-Ref is a reference corpus of newspaper articles in Brazilian Portuguese. It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0199
Mac-Morpho 


Mac-Morpho is a 1.1-million-word gold-standard corpus (a portion of the Lacio-Ref) that is morphosyntactically annotated (PALAVRAS, E. Bick) and manually validated. It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0200
Annotated Part of the Lacio-Ref 


This resource is a portion of the Lacio-Ref that is automatically annotated with lemmas, POS tags and syntactic tags (Curupira parser). It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0201
Lacio-Dev Corpus 


The Lacio-Dev is a deviation corpus composed of non-revised texts (516,840 tokens). It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0202
Portuguese English Parallel Corpus (Par-C)


Par-C is a Portuguese-English parallel corpus. It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil) - English



ELRA-U-W 0203
Portuguese English Comparable Corpus (Comp_C)


Comp_C is a Portuguese-English comparable corpus (300,000 words in each language). It was developed in the Lacio-Web Project.
Language(s) : Portuguese (Brazil)



ELRA-U-W 0204
Nexing Corpus 


The Nexing corpus is a collection of written transcriptions of verbal data (around 30 hours of audio recordings) elicited during a psycholinguistic experiment on syllogistic reasoning.
Language(s) : Portuguese



ELRA-U-W 0205
COMET Multilingual Corpora 


This is a multilingual corpus that comprises three subparts:

- CorTec: Technical and Scientific corpus (Brazilian Portuguese, English).
- CoMAprend: Multilingual Learner corpus (Brazilian Portuguese, English, German, French, Spanish, Italian).
- CorTrad: Translation corpus (Brazilian Portuguese, English).

It is designed for contrastive linguistic studies (translation, terminology, teaching, etc.).
Language(s) : Portuguese (Brazil) - English - German - Spanish - French - Italian



ELRA-U-W 0206
LIVAC Synchronous Corpus 


The LIVAC contains texts from Chinese newspapers and electronic media of Hong Kong, Taiwan, Beijing, Shanghai, Macau and Singapore. The materials from the diverse communities have been synchronized. In 2005 the corpus was composed of over 150 million Chinese characters and over 720,000 word types. It is still expanding.
The analysis concerns various linguistic units (characters, words, sentences).
Language(s) : Chinese



ELRA-U-W 0207
PH Corpus of Chinese 


Guo Jim’s Mandarin Chinese PH corpus is a collection of Chinese newswire texts containing around two million words. Those texts were published by the Xinhua News Agency during 1990-1991.
Language(s) : Chinese



ELRA-U-W 0208
PFR Corpus of Chinese 


The PFR corpus consists of one month's newspaper material published by the People's Daily (January 1998).
Language(s) : Chinese



ELRA-U-W 0209
Chinese Internet Corpus 


The Chinese Internet Corpus contains 280 million words (tokens). It was compiled automatically from the Internet in February 2005, along with other Internet corpora (for English, German and Russian).
Language(s) : Chinese



ELRA-U-W 0210
Reader's Digest Corpus (Czech/English) 


The Reader's Digest corpus is a parallel text of articles from Reader's Digest (1993-1996). The Czech part is a translation of the English one.
It contains 53,117 parallel sentences.
Language(s) : English (United Kingdom) - Czech



ELRA-U-W 0211
Czech Academic Corpus (CAC)


The CAC (originally called the Corpus of the Pragmatic Style) is a Czech corpus of approximately 650,000 words with manual morphological annotation. It is composed of texts from a wide range of media (newspapers, magazines, and transcripts of spoken language from radio and TV programs).
Language(s) : Czech



ELRA-U-W 0212
Mannheimer Corpus 


The Mannheimer Corpus contains 2.53 million words. It is divided into two subcorpora: the Mannheimer Korpus 1 (293 texts from 1950 to 1967) and the Mannheimer Korpus 2 (52 texts from 1949 to 1974). It covers a wide variety of sources.
Language(s) : German




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4