|
Language Resources |
Displaying 261 to 280 (of 730 products) |
Result Pages: 14 |
This is a 1,000,000-word corpus of modern Persian, mostly written between 1994 and 2004.
Language(s) : Persian
|
|
|
|
The Persian Text Corpus contains 10 million words. It has been hand-annotated.
Language(s) : Persian
|
|
|
|
The Corpus of Early Ontario English covers the period from the earliest Ontarian English texts to the end of the 19th century. It contains approximately 225,000 words from diaries, newspapers, official letters, etc. (informal register as well as formal writing).
It is divided along temporal (periods of 25 years) and social criteria.
Language(s) : English
|
|
|
|
The corpus of American English contains more than 360 million words, equally divided among spoken, fiction, popular magazines, newspapers, and academic texts.
It is POS tagged with CLAWS.
Language(s) : English (USA)
|
|
|
|
This aligned corpus of Welsh-English contains 510,813 aligned sentence pairs. Texts are taken from the proceedings of the National Assembly for Wales.
Language(s) : Welsh-English
|
|
|
|
The deWaC is a German 1.7 billion word corpus constructed from the Web (.de domain).
Language(s) : German
|
|
|
|
The itWaC is an Italian 2 billion word corpus constructed from the Web (.it domain).
Language(s) : Italian
|
|
|
|
The ukWaC is an English 2 billion word corpus constructed from the Web (.uk domain).
Language(s) : English
|
|
|
|
The frWaC is a French 1.6 billion word corpus constructed from the Web (.fr domain).
Language(s) : French
|
|
|
|
This is a Spanish corpus constructed from the Web (.es domain).
Language(s) : Spanish
|
|
|
|
The NILC corpus is a 40 million word Brazilian Portuguese corpus. It is available in two forms: plain text and a POS-tagged version.
Language(s) : Portuguese (Brazil)
|
|
|
|
The corpusDT is a corpus of scientific texts in Brazilian Portuguese. It consists of authentic theses and dissertations on Computer Science.
Language(s) : Portuguese (Brazil)
|
|
|
|
This is a bilingual Brazilian Portuguese-English corpus of parallel texts from different domains: scientific, legal and journalistic.
It contains approximately 75,000 words.
Language(s) : Portuguese (Brazil)-English
|
|
|
|
This is a corpus of grammatically inadequate sentences in Brazilian Portuguese.
Language(s) : Portuguese (Brazil)
|
|
|
|
RHETALHO is a corpus rhetorically annotated according to RST (Rhetorical Structure Theory; Mann and Thompson, 1987). It is composed of 40 scientific and news texts.
Language(s) : Portuguese (Brazil)
|
|
|
|
This corpus is a pragmatically-annotated set of utterances. It contains 1,368 annotated utterances from 14 dialogues and several sets of isolated utterances. They are transcripts of spoken dialogues from various domains. Each utterance is annotated with two VRM categories that classify both its literal and pragmatic meaning.
Language(s) : English
|
|
|
|
This corpus contains 535 news articles and a total of 11,114 sentences. They have been manually annotated for opinions and sentiments (beliefs, emotions, sentiments, speculations, etc.).
Language(s) : English (USA)
|
|
|
|
The USENET corpus is a collection of public USENET postings. It currently contains over 25 billion words and covers 47,860 English-language, non-binary-file newsgroups from October 2005 to January 2010. It is untagged and has been cleaned and anonymized.
Language(s) : English
|
|
|
|
SMULTRON is a parallel treebank in English, German and Swedish. It contains around 1,500 sentences that have been PoS-tagged and annotated with phrase structure trees.
It has been aligned at sentence, phrase and word levels.
Language(s) : German-English, English-Swedish, German-Swedish
|
|
|
|
This parallel corpus consists of TV subtitles from soap operas, detective series, animation series, comedies, documentaries, feature films, etc.
This amounts to more than 14,000 subtitle files in each language, corresponding to more than 5 million subtitles (more than 50 million words).
Language(s) : Swedish-Danish, Swedish-Norwegian
|
|
|
|