Universal Catalogue  
Written Corpora
Displaying 661 to 680 (of 730 products)

ELRA-WC339
KIAP Corpus 1 


This is the first part of the KIAP Corpus, containing 180 different research articles from journals in the domains of Economics, Linguistics and Medicine.
Language(s) : English - French - Norwegian



ELRA-WC340
KIAP Corpus 2 


This is the second part of the KIAP Corpus, containing 270 different research articles from journals in the domains of Economics, Linguistics and Medicine.
Language(s) : English - French - Norwegian



ELRA-WC341
Light-verb construction (LVC) Corpus 


This is a small corpus of human annotations for sentences containing possible light-verb constructions (i.e. support-verb constructions). It consists of 741 sentences.
Language(s) : English



ELRA-WC342
Croatian National Corpus (HNK)


The Croatian National Corpus (HNK) is a collection of selected texts mainly written in contemporary Croatian covering different media, genres, styles, fields and topics.
The HNK currently contains 101.3 million tokens.
Language(s) : Croatian



ELRA-WC343
Scottish Corpus of Texts and Speech (SCOTS) 


The SCOTS Corpus contains documents in Scottish Standard English and documents in several varieties of Scots. While Scottish Standard English has a standard written form, Scots does not, so the corpus contains a wide range of spelling variation. An Advanced Search System is currently offered to exploit the corpus's extensive sociolinguistic metadata: it allows the user to build up a search profile specifying sociolinguistic or textual criteria. The latest version of the corpus includes 936 documents and a total of 2,524,431 words.
This corpus contains data in both Scottish Standard English and Scots.
Language(s) : English (Scotland)



ELRA-WC344
The Bulgarian Corpus 


It is intended to comprise 100 million running words, collected from various sources in HTML and RTF formats. It covers several genres: 15% fiction, 78% newspapers and 7% legal texts, government bulletins and other material.
Language(s) : Bulgarian



ELRA-WC345
Acquis Communautaire Corpus (Acquis)


This is a large aligned parallel corpus containing 1 billion words in 22 official EU languages (231 language pair combinations). It contains EU legislation, declarations, resolutions, acts, international agreements and documents on contents, principles and political objectives of the EU Treaties.
It is also manually classified according to EUROVOC subject domains.
Language(s) : Czech - Danish - Dutch - English - Estonian - German - Greek - Finnish - French - Hungarian - Italian - Latvian - Lithuanian - Maltese - Polish - Portuguese - Romanian - Slovak - Slovene - Spanish - Swedish - Bulgarian



ELRA-WC346
SEMiSUSANNE Corpus 


This is a structurally and syntactically annotated corpus formed from the union of the SUSANNE and SemCor corpora. It contains the 33 documents common to both corpora and is part-of-speech tagged.
Language(s) : English



ELRA-WC347
Phishing Email Corpus 


It contains 210 phishing emails produced in 2004 and 2005. Phishing is the fraudulent attempt to acquire sensitive information such as passwords or credit card details.
Language(s) : English



ELRA-WC348
Asu-wo-yumo Monologue Corpus 


This corpus contains the transcriptions of 327 programs of "Asu-wo-yumo", a TV commentary program in which a commentator speaks for 10 minutes on a social issue. The corpus is segmented into speeches and has been syntactically annotated.
Language(s) : Japanese



ELRA-WC349
Kyoto Text Corpus 


This is a morphologically and syntactically annotated corpus of 40,000 newspaper sentences. Of these, 5,000 sentences are additionally annotated with case, anaphora and coreference information.
Language(s) : Japanese



ELRA-WC350
NUS SMS Corpus 


This is a corpus of about 10,000 English SMS (Short Message Service) messages collected by university students. It is provided in XML format.
Language(s) : English



ELRA-WC352
Electronic Corpus of El Periodico de Catalunya 


This journalistic corpus consists of 13 million words.
Language(s) : Spanish (Spain)



ELRA-WC353
Sensem Corpus 


This is a lexical database consisting of sentences extracted from the electronic version of the newspaper El Periodico de Catalunya. It illustrates the semantic and syntactic behavior of the 250 most frequent Spanish verbs. The corpus comprises one million words, with 100 examples for each verb. Of these, 25,000 sentences (800,000 words) have been semantically and syntactically annotated, and about 400,000 words have been manually checked. It is provided in XML format.
Language(s) : Spanish (Spain)



ELRA-WC354
Corpus do Português 


It contains 45 million words in more than 50,000 Portuguese texts dating from the 1300s to the 1900s.
Language(s) : Portuguese



ELRA-WC355
Alpino Treebank 


The treebank contains syntactically annotated Dutch sentences totalling more than 150,000 words. It includes newspaper text (a part of the Eindhoven corpus).
Language(s) : Dutch



ELRA-WC356
ARTFL Textual Database 


The database contains nearly 2,000 texts, ranging from classic works of French literature to various kinds of non-fiction prose and technical writing. The 18th, 19th and 20th centuries are equally represented, with a smaller selection of 17th-century texts as well as some medieval and Renaissance texts. It also includes a Provençal database consisting of 38 texts in their original spellings.
Language(s) : French



ELRA-WC357
Syntactical Database of Current Spanish (Base de Datos Sintácticos del español actual) 


It contains about 160,000 clauses (1.5 million words) of Spanish with manually added syntactic analysis, taken from the ARTHUS corpus (Archivo de Textos Hispánicos de la Universidad de Santiago). Composition: 66.5% written texts (narratives, essays and journalistic texts), 14.7% drama and 18.9% oral transcriptions.
Language(s) : Spanish



ELRA-WC358
ARTHUS (Archivo de textos hispánicos de la Universidad de Santiago) 


The corpus contains contemporary written texts in Spanish from Spain and South America, totalling 1,449,005 words and covering several text types: essays, oral transcriptions, narratives and theatre.
Language(s) : Spanish



ELRA-WC359
CEXI (English Italian Translational Corpus) 


This is a bi-directional, parallel, translation-driven corpus, which will consist of about 4.6 million words, or 368 text samples of 10,000 to 15,000 words each. It contains translations from English into Italian and from Italian into English, published between 1975 and 2000.
Language(s) : Italian - English




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4