Language Resources

Displaying 661 to 680 (of 730 products)
Result Pages: 34

This is the first part of the KIAP Corpus, containing 180 different research articles from journals in the domains of Economics, Linguistics and Medicine.
Language(s) : English - French - Norwegian
This is the second part of the KIAP Corpus, containing 270 different research articles from journals in the domains of Economics, Linguistics and Medicine.
Language(s) : English - French - Norwegian
This is a small corpus of human annotations for sentences containing possible light verb constructions (that is, support verb constructions). It consists of 741 sentences.
Language(s) : English
The Croatian National Corpus (HNK) is a collection of selected texts mainly written in contemporary Croatian covering different media, genres, styles, fields and topics.
The HNK currently contains 101.3 million tokens.
Language(s) : Croatian
The SCOTS Corpus contains documents in Scottish Standard English and documents in several varieties of Scots. While Scottish Standard English has a standard written form, Scots does not, so the corpus contains a wide range of spelling variation. An Advanced Search System is currently offered to exploit the corpus's extensive sociolinguistic metadata: it allows the user to build up a search profile specifying sociolinguistic or textual criteria. The latest version of the corpus includes 936 documents and a total of 2,524,431 words.
This corpus contains data in both Scottish Standard English and Scots.
Language(s) : English (Scotland)
It is intended to yield 100 million running words, which are collected from different sources in HTML and RTF formats. It is representative of different genres: 15% fiction, 78% newspapers and 7% legal texts, government bulletins and others.
Language(s) : Bulgarian
This is a large aligned parallel corpus containing 1 billion words in 22 official EU languages (231 language pair combinations). It contains EU legislation, declarations, resolutions, acts, international agreements and documents on contents, principles and political objectives of the EU Treaties.
It is also manually classified according to EUROVOC subject domains.
Language(s) : Czech - Danish - Dutch - English - Estonian - German - Greek - Finnish - French - Hungarian - Italian - Latvian - Lithuanian - Maltese - Polish - Portuguese - Romanian - Slovak - Slovene - Spanish - Swedish - Bulgarian
This is a structurally and syntactically annotated corpus formed from the union of the SUSANNE and SemCor corpora. It contains 33 documents common to both corpora. It is part-of-speech tagged.
Language(s) : English
It contains 210 phishing emails produced in 2004 and 2005. Phishing is the activity of fraudulently attempting to acquire information such as passwords or credit card details.
Language(s) : English
This corpus consists of transcriptions of 327 programs of "Asu-wo-yumo", a TV commentary program in which a commentator speaks for 10 minutes on a social issue. The corpus is segmented into speeches and has been syntactically annotated.
Language(s) : Japanese
This is a morphologically and syntactically annotated corpus of 40,000 sentences from a newspaper. Of these, 5,000 sentences are also annotated with case, anaphora and coreference information.
Language(s) : Japanese
This is an English SMS (Short Message Service) corpus containing about 10,000 messages collected by university students. It is in XML format.
Language(s) : English
This journalistic corpus consists of 13 million words.
Language(s) : Spanish (Spain)
This is a lexical database consisting of sentences extracted from the electronic version of the newspaper El Periodico de Catalunya. It illustrates the semantic and syntactic behavior of the 250 most frequent Spanish verbs. The corpus comprises one million words, with 100 examples of each verb. 25,000 sentences (800,000 words) have been semantically and syntactically annotated, and about 400,000 words have been manually checked. It is presented in XML format.
Language(s) : Spanish (Spain)
It contains 45 million words in more than 50,000 Portuguese texts from the 1300s to the 1900s.
Language(s) : Portuguese
The treebank contains syntactically annotated Dutch sentences totalling more than 150,000 words. It includes newspaper text (a part of the Eindhoven corpus).
Language(s) : Dutch
The database contains nearly 2000 texts, ranging from classic works of French literature to various kinds of non-fiction prose and technical writing. The 18th, 19th and 20th centuries are equally represented, with a smaller selection of 17th century texts as well as some medieval and Renaissance texts. It also includes a Provençal database consisting of 38 texts in their original spellings.
Language(s) : French
It contains about 160,000 clauses (1.5 million words) of Spanish with manually added syntactic analysis, drawn from the corpus ARTHUS (Archivo de Textos Hispánicos de la Universidad de Santiago). Composition: 66.5% written (narratives, essays and journalistic texts), 14.7% drama and 18.9% oral transcriptions.
Language(s) : Spanish
The corpus contains written contemporary texts in Spanish from Spain and from South America. It comprises 1,449,005 words across several text types: essays, oral transcriptions, narratives and theatre.
Language(s) : Spanish
This is a bi-directional, parallel, translation-driven corpus, which will consist of about 4.6 million words, or 368 text samples of 10 to 15 thousand words each. It contains translations from English into Italian and translations from Italian into English, published between 1975 and 2000.
Language(s) : Italian - English