Knowledge Yielding Ontologies
for Transition-based Organization

Kybot Evaluation

Owner: VU University Amsterdam

License: GPLv3


Source code:


See the readme file included in the package.



Evaluation module implemented in Java for evaluating text-mined data against a gold standard. It uses a triplet representation of mined data. It can convert the Kybot output to this triplet format and compare it with a gold-standard triplet file. To create a gold standard, you can use the KafAnnotator tool described elsewhere under tools.

There are no special instructions for installing the Kybot-evaluation module. Download the zip file and unpack it into any folder. To run the program, Java 1.6 is required.

The program evaluates text mining from text on the basis of a triplet representation for events. A triplet consists of:
 - a relation
 - a list of token ids that represent the event
 - a list of token ids that represent a participant

Here is an example of a triplet in XML format:


If an event has multiple participants, a separate triplet is created for each event-participant pair. The triplet identifier is used to mark which triplets relate to the same event.
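
As a minimal sketch of this representation, the Java class below holds one triplet in memory and builds one triplet per event-participant pair under a shared identifier. The class name, field names and token ids such as "w3" are hypothetical and are not taken from the package.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory form of one triplet; not the package's own class. */
final class TripletSketch {
    final String id;                    // shared by all triplets of the same event
    final String relation;              // relation label, e.g. "patient"
    final List<String> eventIds;        // token ids that represent the event
    final List<String> participantIds;  // token ids that represent one participant

    TripletSketch(String id, String relation,
                  List<String> eventIds, List<String> participantIds) {
        this.id = id;
        this.relation = relation;
        this.eventIds = Collections.unmodifiableList(new ArrayList<String>(eventIds));
        this.participantIds =
                Collections.unmodifiableList(new ArrayList<String>(participantIds));
    }

    /** Builds one triplet per event-participant pair, all with the same identifier. */
    static List<TripletSketch> forEvent(String id, List<String> eventIds,
                                        List<String> relations,
                                        List<List<String>> participants) {
        if (relations.size() != participants.size()) {
            throw new IllegalArgumentException("one relation is needed per participant");
        }
        List<TripletSketch> result = new ArrayList<TripletSketch>();
        for (int i = 0; i < relations.size(); i++) {
            result.add(new TripletSketch(id, relations.get(i), eventIds, participants.get(i)));
        }
        return result;
    }
}
```

An event with an agent and a patient would thus yield two triplets that differ in relation and participant ids but share the identifier and the event ids.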

The evaluation module assumes that any mined data is converted to triplets. The program reads a file with triplets that represents the gold standard and another file with triplets that is generated by the system. It calculates the precision and recall for the system file, where the following definitions are used:

 - Precision = nr of correct system triplets / nr of system triplets
 - Recall = nr of correct system triplets / nr of gold standard triplets
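
The two scores can be sketched in Java as follows, taking precision as the share of system triplets that are correct and recall as the share of gold-standard triplets that the system found. The class and method names are illustrative and not part of the package; returning 0.0 for an empty file is an assumption.

```java
/** Hypothetical helper for the two scores; not the package's own code. */
final class TripletScores {

    /** Correct system triplets divided by all system triplets. */
    static double precision(int nCorrect, int nSystem) {
        // Assumption: an empty system file scores 0.0 instead of dividing by zero.
        return nSystem == 0 ? 0.0 : (double) nCorrect / nSystem;
    }

    /** Correct system triplets divided by all gold-standard triplets. */
    static double recall(int nCorrect, int nGold) {
        // Assumption: an empty gold standard scores 0.0 instead of dividing by zero.
        return nGold == 0 ? 0.0 : (double) nCorrect / nGold;
    }
}
```

For example, a system file with 4 triplets of which 3 are correct, evaluated against 6 gold-standard triplets, gives a precision of 3/4 and a recall of 3/6.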

Four evaluations are carried out by comparing the triplets in four ways:

- all identifiers and the relation exactly match
- all identifiers match and the relation is ignored
- at least one identifier matches and the relation matches
- at least one identifier matches and the relation is ignored
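
The four comparisons can be read as two independent switches: exact versus partial identifier matching, and matching versus ignoring the relation. The Java sketch below follows that reading. It assumes that "at least one identifier matches" means the event ids overlap and the participant ids overlap; the package may define this differently. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

/** Hypothetical matcher covering the four comparison modes. */
final class TripletMatchSketch {

    /** Exact match: both lists contain the same set of token ids. */
    static boolean sameIds(List<String> a, List<String> b) {
        return new HashSet<String>(a).equals(new HashSet<String>(b));
    }

    /** Partial match: the lists share at least one token id. */
    static boolean overlap(List<String> a, List<String> b) {
        return !Collections.disjoint(a, b);
    }

    /** Compares a gold triplet with a system triplet under the two switches. */
    static boolean matches(String goldRel, List<String> goldEvent, List<String> goldPart,
                           String sysRel, List<String> sysEvent, List<String> sysPart,
                           boolean exactIds, boolean ignoreRelation) {
        // Assumption: event and participant ids are both required to match.
        boolean idsOk = exactIds
                ? sameIds(goldEvent, sysEvent) && sameIds(goldPart, sysPart)
                : overlap(goldEvent, sysEvent) && overlap(goldPart, sysPart);
        boolean relationOk = ignoreRelation || goldRel.equals(sysRel);
        return idsOk && relationOk;
    }
}
```

Calling matches with each of the four combinations of exactIds and ignoreRelation corresponds to the four evaluations listed above, from strictest to most lenient.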

To keep systems from inflating partial matches with very long ranges of event ids and participant ids, we publish the average size of the ranges.

Furthermore, a range of tokens can be read to limit the range of text that is evaluated. If a file with the range of tokens is omitted, the scope of text is based on the sentences that have been used for the gold standard.

The Kybot Evaluation has two main functions:

1.  Conversion of Kybot output to the triplet format

Main class:
- eu.kyotoproject.evaluation.KybotOutput.KybotOutputToTriplets

This function takes 3 obligatory arguments:
- arg1: the output of the Kybots that extract events in the KYOTO system
- arg2: the KAF file from which the Kybot output is generated
- arg3: the threshold for the WSD score of facts and roles. If set to 0, all output is taken; if set to 100, only the highest-scoring interpretation is taken in case of competition. All other values are proportional to the highest score.

Output:
- a file with the triplets
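
One possible reading of the threshold argument is sketched below in Java: an interpretation is kept when its score is at least threshold percent of the highest score among the competing interpretations. This is an assumption about what "proportional to the highest score" means, scores are assumed to be non-negative, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter for competing interpretations by WSD score. */
final class WsdThresholdSketch {

    /** Keeps the scores that reach threshold percent of the highest score. */
    static List<Double> keep(List<Double> scores, int threshold) {
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        }
        double max = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        List<Double> kept = new ArrayList<Double>();
        for (double s : scores) {
            // Cross-multiplied form of s >= max * threshold / 100, so that the
            // top score always survives a threshold of 100.
            if (s * 100 >= max * threshold) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Under this reading, a threshold of 0 keeps every interpretation and a threshold of 100 keeps only those tied for the highest score.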

2. Evaluation of two triplet files

Main class:
- eu.kyotoproject.evaluation.EvaluateTriplets

This function takes the following arguments:

--gold-standard-triplets            file with gold standard triplets
--system-triplets                   file with system triplets
--token-range (optional)            file with tokens for the events to be covered
--ignore-participants (optional)    lumps differentiated endurants with the participant relation
--ignore-relations (optional)       relation labels are ignored for matching
--skip-time-and-location (optional) TIME and LOCATION relations are ignored

Output:
- a file with statistics, recall and precision for the system file

To run the program in debug mode use:
- eu.kyotoproject.evaluation.KybotOutput.KybotOutputToTripletsDebug

This version generates details on the type of analysis, such as the precision for each profile, per relation and different types of matches: exact and partial identifiers, exact relation and ignoring the relations. It also generates a log file listing all missed triplets (to improve recall), all correct matches and a confusion matrix for the profiles.

The KYOTO project offers an annotation tool (KafAnnotator) to create a gold-standard of triplets from a text that is represented in the Kyoto annotation format (KAF).

3. Baseline facts

The package includes a function to extract a baseline from a KAF file. The baseline creates triplets between all the heads of constituents (chunks), taking one as the event and all the others as the roles. You can call this function using the main class:


The first argument is the KAF file, the second argument is optional and can be used to name a default relation, e.g. patient.
The program directly generates a triplet file from KAF.
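
The pairing behind the baseline can be sketched in Java as follows. The sketch assumes that every chunk head in turn serves as the event, with each of the other heads as a participant, and that the heads have already been extracted from the KAF file. The class name, method name and row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical baseline pairing of chunk heads; not the package's own code. */
final class BaselineSketch {

    /** Returns rows of {relation, event head token id, participant head token id}. */
    static List<String[]> baseline(List<String> headTokenIds, String defaultRelation) {
        List<String[]> triplets = new ArrayList<String[]>();
        for (String event : headTokenIds) {
            for (String participant : headTokenIds) {
                // A head is never paired with itself.
                if (!event.equals(participant)) {
                    triplets.add(new String[] {defaultRelation, event, participant});
                }
            }
        }
        return triplets;
    }
}
```

With three chunk heads this yields six triplets, all labelled with the default relation.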



The binaries can be built using Maven and the pom.xml:

> mvn install




ICT-211423 - 2008 © Kyoto Consortium