Catalog Reference : ELRA-U-W 0014
AMALGAM Multi-Tagged Corpus
This multi-tagged corpus contains 180 sentences taken from the following texts:
- the Industrial Parsing of Software Manuals (IPSM) text (60 sentences),
- the Lancaster/IBM Spoken English Corpus (SEC) text (60 sentences),
- the Corpus of London Teenager English (COLT) text (60 sentences).
The texts were tagged with the AMALGAM tagger using the Brown, ICE, LLC, LOB, UNIX Parts, POW, SEC and UPenn tagging schemes. The tagger output was then proofread and corrected by human experts to remove errors.
This resource was compiled for the AMALGAM project, to study methods of mapping between one tag set and another.
AMALGAM stands for Automatic Mapping Among Lexico-Grammatical Annotation Models.
Production
Project : The AMALGAM Project
Applications
Application Area : Research
Contents
written corpus
Number of languages : Monolingual
Language(s) : English
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Part of Speech : Nouns, Verbs, Adverbs, Adjectives, Pronouns, Determiners, Articles, Prepositions, Postpositions, Conjunctions
Joint Copyright © 2008 ELRA & ELDA