ELRA - ELRA-U-W 0097 : The Salsa Corpus

Catalog Reference : ELRA-U-W 0097

The Salsa Corpus

The SALSA corpus is based on the TIGER corpus, a syntactically annotated German newspaper corpus of 1.5 million words. Word senses and semantic roles were added to TIGER using the frames of FrameNet 1.2. In addition, predicate-specific frames were developed to handle predicate instances not covered by FrameNet. The corpus was annotated by hand. In total, the annotation covers about 20,000 verbal instances and 17,000 nominal instances.

It is a valuable resource for NLP research, including automatic acquisition of lexical semantic information, training of statistical parsers on combined syntactic and semantic role information, and improvement of techniques for information access and extraction.

The Salsa corpus was developed within the framework of the Saarbrücken Lexical Semantics Annotation and Analysis Project.

Identification

Period of coverage :

Version : Release 2.0
Version history :

Production

Project : Saarbrücken Lexical Semantics Annotation and Analysis Project

Applications


Application Area : Research

Contents

Written corpus
Number of languages : Monolingual
Language(s) : German (Germany)
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Semantic
Annotation Mode : Manual