SemEval2010 Task

SemEval 2010: All-words Word Sense Disambiguation on a specific domain (WSD-domain)

Domain adaptation is a hot issue in Natural Language Processing, including Word Sense Disambiguation (WSD). WSD systems trained on general corpora are known to perform worse when moved to specific domains (Escudero et al., 2000; Martinez and Agirre, 2000; Chan and Ng, 2007; Agirre and Lopez de Lacalle, 2008; Zhong et al., 2008; Lopez de Lacalle and Agirre, 2009). The WSD-domain task will offer a testbed for domain-specific WSD systems and will make it possible to test domain portability.

There is currently no all-words corpus available for specific domains. Lexical-sample sense-tagged corpora do exist, but they only cover the occurrences of a few manually selected words. The all-words corpus of WSD-domain will make it possible to measure the performance of WSD systems deployed on domain-specific data under realistic conditions.

The WSD-domain task will produce sizeable all-words corpora on the environment domain. Texts from ECNC and WWF will be used to build domain-specific test corpora (see the example below). The data will be available in a number of languages: English, Chinese, Dutch and Italian. The sense inventories will be based on the wordnets of the respective languages.

The test data will comprise three documents (a 6000-word chunk with approx. 2000 target words) for each language. The test data will be annotated by hand using double-blind annotation plus adjudication. Inter-Tagger Agreement will be measured. No training data will be provided, but participants are free to use existing hand-tagged corpora and lexical resources (e.g. SemCor and the data from previous Senseval, http://www.senseval.org, and SemEval, http://nlp.cs.swarthmore.edu/semeval/index.php, evaluations). Background text from the domain will be provided for unsupervised or semi-supervised learning.
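The task description does not say how Inter-Tagger Agreement will be computed. As a minimal sketch, assuming plain observed agreement (the fraction of target words on which two annotators chose the same sense), the double-blind step could be summarised as follows. The function name, the instance identifiers and the sense labels are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple


def observed_agreement(
    tagger_a: Dict[str, str], tagger_b: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Compare two annotators' sense choices.

    Each argument maps an instance id (hypothetical format) to the single
    sense label that annotator chose. Only instances tagged by both
    annotators are compared. Returns the fraction of shared instances with
    identical labels, plus the sorted ids that would go to adjudication.
    """
    shared = sorted(tagger_a.keys() & tagger_b.keys())
    if not shared:
        raise ValueError("the annotators have no instances in common")
    disagreements = [i for i in shared if tagger_a[i] != tagger_b[i]]
    agreement = (len(shared) - len(disagreements)) / len(shared)
    return agreement, disagreements


# Toy data: three target words, one disagreement.
annotator_1 = {"d1.t1": "sense_a", "d1.t2": "sense_b", "d1.t3": "sense_c"}
annotator_2 = {"d1.t1": "sense_a", "d1.t2": "sense_x", "d1.t3": "sense_c"}
ita, to_adjudicate = observed_agreement(annotator_1, annotator_2)
```

Chance-corrected measures such as Cohen's kappa are a common alternative; the plain ratio is used here only because it is the simplest to read.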

Traditional precision and recall measures, as implemented in past Senseval and SemEval WSD tasks, will be used to evaluate the participating systems.
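For reference, the measures are usually defined in WSD evaluations as precision = correct answers / attempted instances and recall = correct answers / all gold instances, so a system that skips some target words is penalised in recall only. The sketch below is not the official scorer; it assumes one answer per instance, and the data layout, function name and sense labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    gold: Dict[str, Set[str]], system: Dict[str, str]
) -> Tuple[float, float]:
    """Score single-answer WSD output against a gold standard.

    gold maps each instance id to the set of senses accepted as correct.
    system maps instance ids to the one sense the system chose; instances
    the system skipped are simply absent. Answers for ids that are not in
    the gold standard are ignored.
    """
    attempted = [i for i in system if i in gold]
    correct = sum(1 for i in attempted if system[i] in gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data: four gold instances, three attempted, two answered correctly.
gold_keys = {
    "d1.t1": {"sense_a"},
    "d1.t2": {"sense_b"},
    "d1.t3": {"sense_c", "sense_d"},
    "d1.t4": {"sense_e"},
}
system_answers = {"d1.t1": "sense_a", "d1.t2": "sense_x", "d1.t3": "sense_d"}
p, r = precision_recall(gold_keys, system_answers)
```

On the toy data the system attempts three of the four instances and gets two right, so precision is 2/3 and recall is 2/4.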

WSD-domain is being developed in the framework of the Kyoto project (http://www.kyoto-project.eu/), and the project consortium will provide all necessary resources.


Environment domain text example:

Projections for 2100 suggest that temperature in Europe will have risen by between 2 to 6.3 degrees Centigrade above 1990 levels. The sea level is projected to rise, and a greater frequency and intensity of extreme weather events are expected. Even if emissions of greenhouse gases stop today, these changes would continue for many decades and in the case of sea level for centuries. This is due to the historical build up of the gases in the atmosphere and time lags in the response of climatic and oceanic systems to changes in the atmospheric concentration of the gases.


Reference:

Eneko Agirre, Oier Lopez de Lacalle, Christiane Fellbaum, Andrea Marchetti, Antonio Toral, Piek Vossen, 2009
SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Proceedings of the NAACL Workshop on Semantic Evaluations (SEW-2009). Boulder, Colorado.