ELRA - ELRA-U-W 0014 : AMALGAM Multi-Tagged Corpus

Catalog Reference : ELRA-U-W 0014

AMALGAM Multi-Tagged Corpus

This multi-tagged corpus contains 180 sentences, 60 from each of the following texts:
- the Industrial Parsing of Software Manuals (IPSM) text,
- the Lancaster/IBM Spoken English Corpus (SEC),
- the Corpus of London Teenage Language (COLT).

The texts were tagged with the AMALGAM tagger under eight tagging schemes: Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC and UPenn. Human experts then proofread and edited the tagger output to correct tagging errors.

This resource was compiled for the AMALGAM project, which studies methods for mapping between one tagset and the others.

AMALGAM stands for Automatic Mapping Among Lexico-Grammatical Annotation Models.

Production

Project : The AMALGAM Project

Applications


Application Area : Research

Contents

Written corpus
Number of languages : Monolingual
Language(s) : English
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Part of Speech : Nouns, Verbs, Adverbs, Adjectives, Pronouns, Determiners, Articles, Prepositions, Postpositions, Conjunctions