ELRA - ELRA-WC353 : Sensem Corpus

Catalog Reference : ELRA-WC353

Sensem Corpus

This is a lexical database of sentences extracted from the electronic version of the newspaper El Periodico de Catalunya. It illustrates the semantic and syntactic behavior of the 250 most frequent Spanish verbs. The corpus comprises one million words, with 100 examples of each verb. 25,000 sentences (800,000 words) have been semantically and syntactically annotated, and about 400,000 words have been manually checked. For the annotation, a sense was assigned to each verb and a semantic role to each verb argument, using a verb lexicon created specifically for these tasks. A category and a syntactic function were then automatically pre-selected. The corpus is distributed in XML format.
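
The figures in the description can be cross-checked with a little arithmetic. The sketch below is only illustrative: the constant and function names are invented for this example, the inputs are the numbers quoted above, and the derived values (such as the average sentence length) are computed here rather than stated in the catalogue.

```python
# Figures quoted in the catalogue description (names are illustrative).
VERBS = 250                  # most frequent Spanish verbs covered
EXAMPLES_PER_VERB = 100      # example sentences per verb
CORPUS_WORDS = 1_000_000     # total corpus size in words
ANNOTATED_WORDS = 800_000    # words in the annotated sentences
CHECKED_WORDS = 400_000      # "about 400,000", so approximate


def derived_figures():
    """Return values derived from the quoted figures (not stated in the source)."""
    sentences = VERBS * EXAMPLES_PER_VERB
    return {
        "annotated_sentences": sentences,
        "avg_words_per_sentence": ANNOTATED_WORDS / sentences,
        "annotated_share_of_corpus": ANNOTATED_WORDS / CORPUS_WORDS,
        "checked_share_of_annotated": CHECKED_WORDS / ANNOTATED_WORDS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

The sentence count matches the 25,000 annotated sentences given in the description; the remaining values are rough indicators only, since the checked word count is approximate.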

Production

Project : SENtence SEMantics Project

Applications


Application Area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Spanish (Spain)
Number of tokens : 1 million words
Annotation Granularity : Word
Annotation level : Semantic
Annotation Mode : Automatic, Manual
Annotation language : XML