


The TermExtractor algorithm has been further improved; see the news!
See also the large-scale evaluation of TermExtractor!
 



TermExtractor is a FREE software package for Terminology Extraction. The software helps a web community extract and validate the relevant terms of its domain of interest by submitting an archive of domain-related documents in any format.
TermExtractor is also a very useful starting point for Domain Ontology construction, Semantic Similarity, Knowledge Management, etc., since it identifies domain-relevant terms, which are the linguistic surface manifestation of domain concepts.

TermExtractor extracts the terminology consensually referred to in a specific application domain. The software takes as input a corpus of domain documents, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-noun phrases, etc.). Document parsing assigns greater importance to terms that appear with distinctive text layout (title, abstract, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used. Domain Consensus selects only the terms that are consensually referred to throughout the corpus documents. Domain Relevance selects only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, the extracted terms are further filtered using Lexical Cohesion, which measures the degree of association among all the words in a terminological string.
See the help page for additional information about TermExtractor.
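
The two filtering measures described above can be illustrated with a minimal Python sketch. This is not TermExtractor's actual code: the exact formulas are not given on this page, so the sketch assumes one plausible reading, namely that Domain Consensus is the entropy of a term's frequency distribution over the corpus documents, and that Domain Relevance is the term's relative frequency in the target domain divided by its highest relative frequency across the target and contrastive domains. The function names, the data layout (documents as lists of candidate terms, domains as term-to-count dictionaries) and the thresholds are all hypothetical.

```python
import math


def domain_consensus(term, docs):
    """Entropy of the term's distribution over documents (assumed formula).

    docs is a list of documents, each a list of candidate terms.
    A term used evenly across many documents scores higher than a term
    concentrated in a single document.
    """
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def relative_frequency(term, counts):
    """Share of all term occurrences in a domain that belong to `term`."""
    total = sum(counts.values())
    return counts.get(term, 0) / total if total else 0.0


def domain_relevance(term, target_counts, contrastive_counts):
    """Target-domain relative frequency over the maximum across domains
    (assumed formula). Returns a value between 0 and 1."""
    p_target = relative_frequency(term, target_counts)
    if p_target == 0.0:
        return 0.0
    best = max([p_target] + [relative_frequency(term, c)
                             for c in contrastive_counts])
    return p_target / best


def select_terms(candidates, docs, target_counts, contrastive_counts,
                 min_consensus=0.5, min_relevance=0.5):
    """Keep candidates that pass both (hypothetical) thresholds."""
    return [t for t in candidates
            if domain_consensus(t, docs) >= min_consensus
            and domain_relevance(t, target_counts,
                                 contrastive_counts) >= min_relevance]


# Toy corpus: two domain documents and one contrastive terminology.
docs = [["neural network", "stock"], ["neural network", "gradient"]]
target = {"neural network": 2, "stock": 1, "gradient": 1}
contrastive = [{"stock": 4}]
selected = select_terms(["neural network", "stock", "gradient"],
                        docs, target, contrastive)
```

In this toy corpus only "neural network" is kept: "stock" and "gradient" each occur in a single document, so their consensus is zero, and "stock" is additionally more typical of the contrastive domain. The Lexical Cohesion filter is left out of the sketch.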

Details can also be found in: F. Sclano and P. Velardi, "TermExtractor: a Web Application to Learn the Common Terminology of Interest Groups and Research Communities", 9th Conf. on Terminology and Artificial Intelligence (TIA 2007), Sophia Antipolis, October 2007.

On this page you can access a demo version of TermExtractor, in which you may upload only one document with a maximum size of 5 MB. To use the full version of TermExtractor you must register (both registration and the full version are FREE!). With the full version you will be able to upload a corpus of documents of up to 100 MB and set the options of the terminology extraction process.
If you are already registered, please log in.



Enter one document of maximum 5 MB and START the terminology extraction process.
Accepted formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd.
The document must not be encrypted and must be written in English.
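
As an illustration only, the demo constraints listed above (accepted formats and the 5 MB limit) could be checked on the client side before uploading. The sketch below is hypothetical and not part of TermExtractor: the function name is invented, 1 MB is assumed to mean 2**20 bytes, and the format is judged from the file extension alone. It does not check encryption or language.

```python
import os

# Extensions taken from the list of accepted formats on this page.
ACCEPTED_EXTENSIONS = {
    "txt", "pdf", "ps", "dvi", "tex", "doc", "rtf",
    "ppt", "xls", "xml", "html", "htm", "chm", "wpd",
}
# Assumption: "5 MB" is read as 5 * 2**20 bytes.
MAX_DEMO_BYTES = 5 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append("unsupported format: " + (extension or "none"))
    if size_bytes > MAX_DEMO_BYTES:
        problems.append("file exceeds the 5 MB demo limit")
    return problems
```

For a real file, the size can be obtained with `os.path.getsize(path)` and passed as `size_bytes`.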





TermExtractor conceived by LCL Group.
All rights reserved.
NOTE: this site requires popups (Firefox, Explorer, Google Toolbar), DHTML, cookies and JavaScript to be enabled in your browser.


TermExtractor has been used 11783 times.
TermExtractor users 491, on-line users 94.