Knowledge Yielding Ontologies
for Transition-based Organization

OntoTagger

Ontotagging is the last phase in the Kyoto LP annotation pipeline. The OntoTagger module adds a further layer of annotation to a text, using knowledge gathered from the ontology. OntoTagger takes as input a KAF file that has already passed through multiword (MW) and named-entity (NE) recognition and word-sense (WS) disambiguation. For each synset automatically assigned to a term in KAF, OntoTagger includes:

     1. the corresponding Base Concept;
     2. the correct ontology type and the appropriate relation to the ontology;

For each ontology type, it includes:
     3. the appropriate relations to other types in the ontology
 
This makes the implicit ontological statements explicit, so that reasoning can be performed offline and the Kybots can run more easily on KAF texts.

Each added piece of information is represented by means of an XML element:

 

[Figure: Ontotagger 1]

 

The OntoTagger module relies on a set of external resources, consisting of three tables that relate synsets, Base Concepts and the Kyoto2 Ontology (with the properties of its ontological types) to each other. The first table relates each synset to its Base Concept; the second table links each synset, via a relation, to its ontology type; the last table makes explicit the properties that hold between the ontological types and other types within the Kyoto2 Ontology.


The following is the description of the fields in the tables.

--- Table 1:    Synset - BC
     o Synset
     o BC

--- Table  2:     Synset - Ontology
     o Relation
     o Ontology
     o Synset


--- Table  3:     Explicit Ontology
     o Ontology label (source)
     o Property
     o Ontology (target)  

 

Description of the algorithm

Without loss of generality, in order to describe the OntoTagger algorithm, the tables where knowledge is stored can be seen as functions:
 
  F:  SYN   --> BC
  G:  SYN   -->  REL x ONTO
  H:  ONTO -->  PartOf (PROP x ONTO)
 
 Where:
 
  SYN   =  {s | s is a synset in WN}  that is the Set of Synsets;
  BC    =  {b | b is a base concept}  that is the Set of Base Concepts;
  REL   =  {r | r is a relation}  that is the Set of Relations;
  ONTO  =  {o | o is an ontology type}  that is the Set of OntologyTypes;
  PROP  =  {p | p is an ontology property}  that is the Set of ExplicitOntologyProperties;

As usual, we indicate with
     - x the Cartesian product of sets, that is, T x Z = {(t,z) | t belongsTo T and z belongsTo Z}, the set of ordered pairs of elements of T and Z. If (t,z) is a pair of elements, the projection operators "first" and "second" are used to select the first and second element of the pair;
     - PartOf (X) the set of all subsets of X, that is, PartOf (X) = { x | x isIncludedOrCoincidentWith X }
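The functional view above can be made concrete with a minimal Python sketch in which each table is a dictionary. All identifiers and values here (such as "syn-duck", "bc-bird" and "rel-a") are hypothetical placeholders invented for illustration, not entries of the real tables.

```python
from typing import Dict, Set, Tuple

# Hypothetical placeholder data; the real tables hold WordNet offsets
# and Kyoto2 ontology labels.
F: Dict[str, str] = {"syn-duck": "bc-bird"}                      # F: SYN -> BC
G: Dict[str, Tuple[str, str]] = {"syn-duck": ("rel-a", "Duck")}  # G: SYN -> REL x ONTO
H: Dict[str, Set[Tuple[str, str]]] = {                           # H: ONTO -> PartOf(PROP x ONTO)
    "Duck": {("prop-a", "Animal"), ("prop-b", "Water")},
}


def first(pair: Tuple[str, str]) -> str:
    """Projection onto the first element of an ordered pair."""
    return pair[0]


def second(pair: Tuple[str, str]) -> str:
    """Projection onto the second element of an ordered pair."""
    return pair[1]


synset = "syn-duck"
base_concept = F[synset]              # element of BC
relation = first(G[synset])           # element of REL
onto_type = second(G[synset])         # element of ONTO
properties = H.get(onto_type, set())  # a subset of PROP x ONTO
```

Modelling H with a set-valued dictionary mirrors the PartOf construction: every ontology type maps to a (possibly empty) subset of PROP x ONTO.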

Algorithm

* Parsing input KAF
* Parsing tables

For every term in the input KAF that carries a tag of the form:

  •     reference= x   resource = wn />

The following sub-nodes are added to the output:

[Figure: Ontotagger Output]

for all values x1, x2, ..., xn belonging to T3(ot), with ot = T2(x).second, where T2 and T3 denote Table 2 and Table 3 (the functions G and H above).


The attribute-value pair status="implied" is set when the explicit property of the ontology is inherited.
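The lookup logic of the algorithm can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual C++ implementation: the function name ontotag, the Annotation record, and the boolean "inherited" flag attached to each Table 3 entry are all hypothetical, and the sketch returns plain records instead of the XML sub-nodes that OntoTagger writes.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    kind: str                     # "base_concept", "ontology_type" or "property"
    label: str                    # relation or property name ("" for base concepts)
    target: str                   # Base Concept or ontology type
    status: Optional[str] = None  # "implied" when the property is inherited


def ontotag(
    synset: str,
    t1: Dict[str, str],
    t2: Dict[str, Tuple[str, str]],
    t3: Dict[str, List[Tuple[str, str, bool]]],
) -> List[Annotation]:
    """Collect the annotations for one synset reference x."""
    out: List[Annotation] = []
    if synset in t1:
        out.append(Annotation("base_concept", "", t1[synset]))
    if synset in t2:
        relation, onto_type = t2[synset]  # onto_type is T2(x).second
        out.append(Annotation("ontology_type", relation, onto_type))
        # One annotation per value x1, x2, ..., xn in T3(onto_type).
        for prop, target, inherited in t3.get(onto_type, []):
            status = "implied" if inherited else None
            out.append(Annotation("property", prop, target, status))
    return out


# Hypothetical toy tables, for illustration only.
t1 = {"s1": "b1"}
t2 = {"s1": ("r1", "O1")}
t3 = {"O1": [("p1", "O2", False), ("p2", "O3", True)]}
result = ontotag("s1", t1, t2, t3)
```

A synset that is missing from a table simply yields no annotation of that kind, so ontotag("unknown", t1, t2, t3) returns an empty list in this sketch.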


OntoTagged Sample

The following example for "duck" shows the output of the whole procedure.

[Figure: Ontotagger Example]

 

 

Download and installation

OntoTagger version 2.1 (12 April 2010) can be downloaded here: Release_V2.1.

 

Installation instructions

OntoTagger is written in C++ and is compiled in two versions, one for Windows (ontotagger_v2_WIN) and one for Linux (ontotagger_v2_LINUX). It requires no specific installation steps beyond copying the directory structure as is. The package provides a ReadMe file with specific instructions.

 

Run OntoTagger as a standalone program on KAF files on disk

The OntoTagger can be used as a standalone application on any set of WSD/NER KAF files on disk. OntoTagger takes three tables as arguments ("arg" folder):

  • To run OntoTagger on Windows platform:

more IN | ontotagger_v2_WIN.exe T1 T2 T3  > OUT 

  • To run OntoTagger on Linux platform:

cat IN | ./ontotagger_v2_LINUX T1 T2 T3  > OUT 

where:

IN   =  file_name_kaf_input ("input" folder)

T1   =  T1__Synset-BC.v2.rel.txt ("arg" folder)

T2   =  T2__Synset-Ontology.txt ("arg" folder)

T3   =  T3__Explicit_Ontology.txt ("arg" folder)

OUT  =  file_name_output (by convention, the output file name contains the suffix .ont)
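To process many KAF files, the command line above can be wrapped in a small script. The following Python sketch assumes the Linux executable and the table file names listed above; the helper names (build_command, output_name, run_ontotagger) are hypothetical, and placing ".ont" before the file extension is only one possible reading of the naming convention.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(executable: str, t1: str, t2: str, t3: str) -> List[str]:
    """Argument list equivalent to: executable T1 T2 T3."""
    return [executable, t1, t2, t3]


def output_name(kaf_in: Path) -> Path:
    """Assumed convention: doc.kaf becomes doc.ont.kaf."""
    return kaf_in.with_name(kaf_in.stem + ".ont" + kaf_in.suffix)


def run_ontotagger(executable: str, kaf_in: Path, t1: str, t2: str, t3: str) -> Path:
    """Feed the KAF file on stdin and capture stdout, like `cat IN | exe T1 T2 T3 > OUT`."""
    kaf_out = output_name(kaf_in)
    with kaf_in.open("rb") as src, kaf_out.open("wb") as dst:
        subprocess.run(
            build_command(executable, t1, t2, t3),
            stdin=src,
            stdout=dst,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return kaf_out


# Example argument list; nothing is executed until run_ontotagger is called.
command = build_command(
    "./ontotagger_v2_LINUX",
    "arg/T1__Synset-BC.v2.rel.txt",
    "arg/T2__Synset-Ontology.txt",
    "arg/T3__Explicit_Ontology.txt",
)
```

A loop such as `for kaf in Path("input").glob("*.kaf"): run_ontotagger(command[0], kaf, *command[1:])` would then tag every file in the "input" folder, assuming the input files carry a .kaf extension.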

 

The format of the tables is the following:

 

T1:   includes synset-BC mappings: all nominal synsets, connected to 297 nominal Base Concepts; all verbal synsets, connected to 579 verbal Base Concepts; and 482 Domain synsets connected to Base Concepts. Format: synset-offset base-concept-offset

T2:   includes synset-Ontology mappings for noun, adjective and verb synsets (and expanded near-hyponyms), with Base Concepts as Types of the Ontology. Format: synset_offset relation Ontology_Type

T3:   includes 20363 Explicit Ontology statements in tabular format (Top, Middle, Domain and Benchmark Adjectives Verbs). Format: Ontology_Type_source relation Ontology_Type_target (inherited)
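A reader for these formats might look like the Python sketch below. It assumes that each table holds one record per line with whitespace-separated columns, which the description above does not state explicitly; the loader names and the sample rows are hypothetical, and any column after the third in T3 is ignored here.

```python
from typing import Dict, Iterable, List, Tuple


def parse_rows(lines: Iterable[str], columns: int) -> List[List[str]]:
    """Split each line on whitespace; skip blank lines and rows that are too short."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) >= columns:
            rows.append(fields)
    return rows


def load_t1(lines: Iterable[str]) -> Dict[str, str]:
    """T1 rows: synset-offset base-concept-offset."""
    return {r[0]: r[1] for r in parse_rows(lines, 2)}


def load_t2(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """T2 rows: synset_offset relation Ontology_Type."""
    return {r[0]: (r[1], r[2]) for r in parse_rows(lines, 3)}


def load_t3(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """T3 rows: Ontology_Type_source relation Ontology_Type_target."""
    table: Dict[str, List[Tuple[str, str]]] = {}
    for r in parse_rows(lines, 3):
        table.setdefault(r[0], []).append((r[1], r[2]))
    return table


# Hypothetical sample rows, invented for illustration.
t1 = load_t1(["syn-1 bc-1", "", "syn-2 bc-1"])
t2 = load_t2(["syn-1 rel-a TypeA"])
t3 = load_t3(["TypeA prop-a TypeB", "TypeA prop-b TypeC"])
```

Real files could be passed in the same way, for example `load_t1(open("arg/T1__Synset-BC.v2.rel.txt", encoding="utf-8"))`, since a file object iterates over its lines.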

 

Integrate OntoTagger in the KYOTO pipeline

The OntoTagger_V.2 can be integrated as an external module of the KYOTO PipeT architecture, within the standard KYOTO pipeline.

The OntoTagger operates on the KAF file after word-sense disambiguation and named-entity recognition have taken place. As a pipeline module in KYOTO, OntoTagger takes kaf-ner as an input stream and generates kaf-onto as an output stream for any KAF document in the document base.

The OntoTagger module is created as a shell command. To run the OntoTagger module, the user should do the following in the PipeT graphical user interface:

  1. Go to the list of available modules (in the menu: settings, manage modules).
  2. Create the new OntoTagger module (add module) and select "external module" as a module type.
  3. Browse to select the OntoTagger executable and click "OK" (be careful to use the correct executable, depending on whether PipeT is running on Linux or Windows). A new module is now created.
  4. Select and configure the newly created module from the module list. Click “Configure”:
    • Command: contains the executable with its complete path; add the tables (T1, T2, T3) as parameters (be careful to specify their paths).
    • Input pipes: add an input pipe name, as agreed on in Kyoto: “kaf/ner”.
    • Output pipes: add an output pipe name: “kaf/onto”.
  5. The module is now ready to be integrated in the Kyoto pipeline.
 

 

 

 

ICT-211423 - 2008 © Kyoto Consortium