Knowledge Yielding Ontologies
for Transition-based Organization

SEMEVAL 2010

KYOTO has been granted a SemEval-2010 task: All-words Word Sense Disambiguation on a Specific Domain (WSD-domain)

Domain adaptation is a hot issue in Natural Language Processing, including Word Sense Disambiguation (WSD). WSD systems trained on general corpora are known to perform worse when moved to specific domains (refs). WSD-domain will offer a testbed for domain-specific WSD systems and will make it possible to test domain portability issues.

There is currently no all-words corpus available for specific domains. Lexical-sample sense-tagged corpora do exist, but they cover only the occurrences of a few manually selected words. The all-words corpus of WSD-domain will make it possible to measure the performance of WSD systems deployed on domain-specific data under realistic conditions.


WSD-domain will produce sizeable all-words corpora on the environment domain. The data will be available in a number of languages, including English, Dutch and Italian, and possibly Basque and Chinese (confirmation pending). The sense inventories will be based on the wordnets of the respective languages.

WSD-domain is being developed in the framework of the Kyoto project (http://www.kyoto-project.eu/).

 

Producing the test data

The test data will comprise three documents (approx. 2000 target words) for each language. The test data will be annotated by hand using double-blind annotation plus adjudication, and Inter-Tagger Agreement will be measured. No training data will be provided, but participants are free to use existing hand-tagged corpora and lexical resources.
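
As an illustration of how Inter-Tagger Agreement could be computed, the sketch below measures simple observed agreement: the fraction of target words to which two annotators assigned the same sense. The task description does not say which agreement measure will be used, so this choice is an assumption, and the function name and the sense labels are hypothetical.

```python
from typing import Sequence


def observed_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Fraction of target words that both annotators tagged with the same sense.

    Assumption: each annotator assigns exactly one sense per target word,
    and both sequences list the target words in the same order.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same target words")
    if not tags_a:
        raise ValueError("no target words to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical sense labels for four target words.
annotator_1 = ["water%1", "bank%2", "stream%1", "flow%3"]
annotator_2 = ["water%1", "bank%2", "stream%2", "flow%3"]

ita = observed_agreement(annotator_1, annotator_2)
print(f"Observed agreement: {ita:.2f}")
```

Observed agreement is easy to interpret but does not correct for chance agreement. A chance-corrected measure such as Cohen's kappa could be substituted if the task requires it.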

 

Evaluation methodology

Traditional precision and recall measures will be used, as implemented in past Senseval and SemEval WSD tasks.
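
For readers unfamiliar with these measures, here is a minimal sketch of how precision and recall are conventionally computed for WSD. Precision is the number of correct answers divided by the number of instances the system attempted, and recall is the number of correct answers divided by the total number of instances in the gold standard. This is a simplification, not the official scorer: it assumes one answer per instance, and the function name, instance identifiers and sense labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    gold: Dict[str, Set[str]], answers: Dict[str, str]
) -> Tuple[float, float]:
    """Score system answers against gold-standard senses.

    gold maps an instance id to the set of acceptable senses.
    answers maps an instance id to the single sense chosen by the system;
    instances the system skipped are simply absent from this mapping.
    """
    # Only answers for instances present in the gold standard are scored.
    attempted = [inst for inst in answers if inst in gold]
    correct = sum(1 for inst in attempted if answers[inst] in gold[inst])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard with four instances.
gold = {
    "d001.t001": {"sense_a"},
    "d001.t002": {"sense_b", "sense_c"},  # two acceptable senses
    "d001.t003": {"sense_d"},
    "d001.t004": {"sense_e"},
}
# The system attempts three instances and gets two of them right.
answers = {
    "d001.t001": "sense_a",
    "d001.t002": "sense_c",
    "d001.t003": "sense_x",
}

precision, recall = precision_recall(gold, answers)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A system that answers every instance has equal precision and recall. The two diverge only when the system abstains on some instances, as in the example above.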

 

Availability of the resources to the participants

The test data will be freely available. The interested community will manage a pool of resources, which could include untagged corpora of the target domain, domain ontologies, etc.

 

The resources required to prepare the task (time, money, and human resources)

The consortium of the Kyoto project will provide all necessary resources. The dataset will be finished before 2009.

 

ICT-211423 - 2008 © Kyoto Consortium