Catalog Reference : ELRA-U-W 0123
TILT Corpus
The TILT corpus is a French-English bilingual collection of standards in XML format, provided by AFNOR (the French standards organisation). It contains 1,000 standards, representing approximately 35,000 pages.
Documents are highly structured and the vocabulary used in standards is technical and very specific. The collection covers the different industrial domains (food industry, buildings and public works, mechanics, environment, local authorities, health, services, etc.).
It has been annotated at three levels: structural, morpho-syntactic and semantic.
The corpus has been independently validated by linguists and standards experts.
TILT stands for 'Trésor Informatisé de la Langue Technique'; it is an extension to the TLF corpus of literary French.
The corpus has already been used for the semi-automatic extraction of:
- 12,000 bilingual terms (enrichment of technical dictionaries),
- 4,000 pairs of aligned sentences (enrichment of translation memories).
It is planned to extend the corpus with more standards.
It is a valuable resource for linguistic research on technical language and for NLP research (multilingual applications).
Production
Project : TILT project
Applications
Application Area : Research
Contents
written corpus
Number of languages : Bilingual
Language(s) : French (France), English (United Kingdom)
Alignment : Sentence
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Semantic
Saturday 23 November, 2024
Joint Copyright © 2008 ELRA & ELDA