Knowledge Yielding Ontologies
for Transition-based Organization

Named Entity Recognition

The Named Entity Recogniser (NER) is a module that reads KAF files and tries to detect terms that denote dates and locations. It uses language-specific resources and a version of the GeoNames database that has been adapted to our needs. To detect dates, the NER module checks whether a term contains the name of a weekday or month (e.g. 'January 2006') or conforms to a particular pattern (e.g. '13-02-2003' or '1950s'). To detect locations, the NER module looks up each noun in GeoNames and selects the most likely location from the results (e.g. the largest city, or a location in the same country as the other locations in the text).
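The two detection steps described above can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the English word lists, the two regular expressions (derived only from the examples '13-02-2003' and '1950s'), the candidate fields `country` and `population`, and the way the two location heuristics are combined are all assumptions made for illustration.

```python
import re

# Illustrative English resources; the real module uses language-specific lists.
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday")

# Patterns modelled on the examples in the text: '13-02-2003' and '1950s'.
NUMERIC_DATE = re.compile(r"^\d{1,2}-\d{1,2}-\d{4}$")
DECADE = re.compile(r"^\d{4}s$")


def looks_like_date(term):
    """Return True if the term names a weekday or month, or matches a date pattern."""
    words = re.findall(r"[a-z]+", term.lower())
    # Note: a plain word lookup also accepts ambiguous words such as 'may'.
    if any(word in MONTHS or word in WEEKDAYS for word in words):
        return True
    return bool(NUMERIC_DATE.match(term) or DECADE.match(term))


def pick_location(candidates, context_countries=()):
    """Choose one candidate from a list of hypothetical gazetteer results.

    Assumed ranking: a candidate in a country already seen in the text wins;
    ties are broken by the larger population.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c["country"] in context_countries, c["population"]),
    )
```

With these definitions, `looks_like_date("January 2006")` and `looks_like_date("1950s")` are both true, while `looks_like_date("Holderness")` is false.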

What does the NER module do to the KAF?

The NER module stores the named entities it detects, in disambiguated form, in the KAF. For instance, the KAF may contain the following terms:

    <term lemma="July_2001" pos="N" tid="t5987" type="entity">
      <span>
        <target id="w6786"/>
        <target id="w6787"/>
      </span>
    </term>
    <term lemma="Holderness" pos="O" tid="t461" type="open">
      <span>
        <target id="w533"/>
      </span>
    </term>

The first term is identified as a date and disambiguated to '2001-07'; the second term is identified as a location and disambiguated to a peninsula in the UK. These named entities are stored in the KAF as separate entities, with a KafReference to the terms. If multiple terms refer to the same (disambiguated) date or location, the entity contains KafReferences to all those terms. The locations also contain ExternalReferences to the GeoNames location, and a WordNet synset which describes the type of location (such as peninsula). Assuming that there is a second term which refers to the Holderness peninsula, the KAF-representation of these two named entities is as follows:

    <date did="d11">
      <kafReferences>
        <kafReference pageId="22">
          <span id="t5987"/>
        </kafReference>
      </kafReferences>
      <dateInfo dateISO="2001-07" lemma="July 2001"/>
    </date>

    <location lid="l4">
      <kafReferences>
        <kafReference pageId="3">
          <span id="t461"/>
        </kafReference>
        <kafReference pageId="4">
          <span id="t871"/>
        </kafReference>
      </kafReferences>
      <externalReferences>
        <externalRef confidence="0.35" reference="2646769" resource="GeoNames"/>
        <externalRef reference="eng-30-09388848-n" resource="wn30g"/>
      </externalReferences>
      <geoInfo>
        <place countryCode="GB" countryName="United Kingdom" fname="peninsula"
               latitude="53.75" longitude="-0.1166667"
               name="Holderness" timezone="Europe/London"/>
      </geoInfo>
    </location>
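As a rough illustration of how a downstream tool might consume this representation, the following Python sketch reads the date and location entities and lists the term ids they refer to. The `<entities>` wrapper element and the function name `read_entities` are assumptions made for this example; only the element and attribute names inside the wrapper come from the fragment above, which is shortened here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above, wrapped in a hypothetical root element.
SAMPLE = """<entities>
  <date did="d11">
    <kafReferences>
      <kafReference pageId="22"><span id="t5987"/></kafReference>
    </kafReferences>
    <dateInfo dateISO="2001-07" lemma="July 2001"/>
  </date>
  <location lid="l4">
    <kafReferences>
      <kafReference pageId="3"><span id="t461"/></kafReference>
      <kafReference pageId="4"><span id="t871"/></kafReference>
    </kafReferences>
    <geoInfo>
      <place countryCode="GB" name="Holderness" fname="peninsula"/>
    </geoInfo>
  </location>
</entities>"""


def read_entities(xml_text):
    """Return (kind, value, term_ids) tuples for each date and location."""
    root = ET.fromstring(xml_text)
    entities = []
    for element in root.iter():
        if element.tag == "date":
            value = element.find("dateInfo").get("dateISO")
        elif element.tag == "location":
            value = element.find("geoInfo/place").get("name")
        else:
            continue
        # Every <span id="..."/> below the entity points at a KAF term.
        term_ids = [span.get("id") for span in element.iter("span")]
        entities.append((element.tag, value, term_ids))
    return entities


for kind, value, term_ids in read_entities(SAMPLE):
    print(kind, value, term_ids)
```

The location entry carries two term ids because two terms in the text refer to the same disambiguated place.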

Download and installation

The NER module can be downloaded from ner.zip.
To run the NER module, the GeoNames MySQL database must be installed on your machine. A dump of the database is included in the zip file. After installing MySQL, load the dump by running the following in a command window (the command expects a database named geonames to exist already, so create an empty one first if necessary):

mysql geonames < geonames.sql 

Run the NER module from the command line

To run the NER module from the command line, supply the following parameters:

  1. Project folder (contains the KAF files);
  2. MySQL username (default = root);
  3. MySQL password (use '-' for the empty string);
  4. Whether the output should contain only named entities (true) or the entire KAF (false).

Below is an example of how to call the NER module from the command line:

java -Xmx512m \
-cp ner-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
eu.kyoto.ner.LocationAndDateCapture upload/estuaries_english/lp+mw+wsd root - false
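
When the module has to be called from a script, the same invocation can be assembled programmatically. The Python sketch below is only an illustration: the function names `build_ner_command` and `run_ner` are hypothetical, the jar is assumed to be in the working directory, and whether the module signals failure through a non-zero exit code is an assumption.

```python
import subprocess

JAR = "ner-0.0.1-SNAPSHOT-jar-with-dependencies.jar"
MAIN_CLASS = "eu.kyoto.ner.LocationAndDateCapture"


def build_ner_command(project_folder, user="root", password="", ner_only=False):
    """Build the argument list for the four documented parameters."""
    return [
        "java", "-Xmx512m",
        "-cp", JAR,
        MAIN_CLASS,
        project_folder,
        user,
        password if password else "-",   # '-' stands for the empty password
        "true" if ner_only else "false",
    ]


def run_ner(project_folder, **options):
    """Run the module; raises CalledProcessError on a non-zero exit code."""
    command = build_ner_command(project_folder, **options)
    return subprocess.run(command, check=True)
```

Calling `build_ner_command("upload/estuaries_english/lp+mw+wsd")` yields the same arguments as the example command line, with the defaults root, '-' and false. Passing the arguments as a list avoids shell quoting problems with characters such as '+' in the folder name.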

Integration in the KYOTO pipeline

The NER module can be integrated into the KYOTO PipeT architecture. It reads from the input stream kaf/wsd (which contains the result of the WSD module) and writes its result to the output stream kaf/ner. Through the pipeline configuration options (see the documentation on PipeT), the user can specify:

  1. sqlUser: username for MySQL (default = 'root');
  2. sqlPassword: password for MySQL (default = '');
  3. ner-only: 'true' to write only named entities, 'false' to write the entire KAF to kaf/ner (default = 'false').


ICT-211423 - 2008 © Kyoto Consortium