Knowledge Yielding Ontologies
for Transition-based Organization

Semantic Search: Fact Retrieval System

Introduction

The Semantic Search module builds a multilingual index on a collection of kybot facts. The facts are indexed by the lemmas and synset IDs of their events and roles, the hypernyms of those synsets, and the translations of the lemmas in all Kyoto languages. The Kyoto Search web application can take a search term in any of those languages, analyse it into a set of lemmas and concepts, and retrieve the kybot facts that have been indexed under these.
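
To make the indexing idea concrete, here is a minimal Python sketch of an inverted index keyed by concepts, their hypernyms and their translations. It is not the KYOTO implementation (the module itself is written in Java). The dictionaries `HYPERNYMS` and `TRANSLATIONS`, the functions `index_keys` and `build_index`, and the fact ids are all hypothetical stand-ins for WordNet, the term database and real kybot facts.

```python
from collections import defaultdict

# Toy stand-ins for WordNet and the term database (hypothetical data).
HYPERNYMS = {"penguin": ["sea bird", "bird"]}
TRANSLATIONS = {"lion": {"nl": "leeuw", "jp": "ライオン"}}


def index_keys(concept):
    """Return every (language, term) key under which a concept is indexed."""
    keys = {("en", concept)}
    # Index the fact under each hypernym of the concept as well.
    keys.update(("en", hypernym) for hypernym in HYPERNYMS.get(concept, []))
    # Index the fact under the translations of the concept.
    keys.update(TRANSLATIONS.get(concept, {}).items())
    return keys


def build_index(facts):
    """Map each key to the ids of the facts whose event or roles mention it."""
    index = defaultdict(set)
    for fact_id, concepts in facts.items():
        for concept in concepts:
            for key in index_keys(concept):
                index[key].add(fact_id)
    return index


# Each fact is reduced here to the list of concepts found in its event and roles.
facts = {"f1": ["penguin"], "f2": ["lion"]}
index = build_index(facts)
```

In this sketch, looking up `("en", "bird")` returns the penguin fact and looking up `("nl", "leeuw")` returns the lion fact, which mirrors the hypernym and cross-lingual behaviour described above.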

For instance, for the query 'king penguin', the system retrieves facts tagged with the concept 'penguin', but also facts tagged with its hypernyms 'sea bird' and 'bird'. The results are ordered by the number and closeness of their matches with the search term. Given the search term 'impacts of climate change on the population of polar bears', a fact that is tagged with both 'climate change' and 'bear' is more relevant than a fact that is only tagged with 'climate change'; and a fact that is tagged with 'polar bear' is more relevant than one tagged with its hypernym 'bear'.
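
The ordering described above can be illustrated with a small Python sketch. The actual scoring formula used by the Search module is not documented here, so the weights `EXACT` and `HYPERNYM`, the functions `score` and `rank`, and the toy data are assumptions chosen only to reproduce the polar bear example.

```python
# Hypothetical weights: an exact concept match counts more than a hypernym match.
EXACT = 1.0
HYPERNYM = 0.5

# Toy hypernym table (stand-in for WordNet).
HYPERNYMS = {"polar bear": ["bear"], "climate change": []}


def score(fact_tags, query_concepts):
    """Sum the match weights of all query concepts found in a fact's tags."""
    total = 0.0
    for concept in query_concepts:
        if concept in fact_tags:
            total += EXACT
        elif any(h in fact_tags for h in HYPERNYMS.get(concept, [])):
            total += HYPERNYM
    return total


def rank(facts, query_concepts):
    """Return fact ids ordered from most to least relevant."""
    return sorted(facts, key=lambda fid: score(facts[fid], query_concepts), reverse=True)


query = ["climate change", "polar bear"]
facts = {
    "a": {"climate change"},
    "b": {"climate change", "bear"},
    "c": {"climate change", "polar bear"},
}
ranking = rank(facts, query)
```

With these assumed weights, fact `c` (tagged 'polar bear') ranks above fact `b` (tagged only with the hypernym 'bear'), which in turn ranks above fact `a`.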

The search results are displayed in the same language as the user's search term. The results are facts and are presented as such: each fact has a central 'event' and a number of 'roles', such as cause/actor, result/patient, location and date (see the Fact Extractor for a detailed explanation). You can click on an event to see its context (the text it was detected in); you can also view the page in the original document. This site contains a link to an online demo, as well as a video.

 

Connections with other KYOTO modules

Both kybot facts and KAF (Kyoto Annotation Format) annotated documents are used as input for semantic search.

 

Demonstration video

Click here to view the video in its proper resolution.

Demonstration application

The Semantic Search system (Kyoto-II) can be tried out on the online databases listed below. Click on a database to search it through the Semantic Search system. The results are shown in various views: a table, a list and a map. The system supports cross-lingual searches: although the databases contain English source documents, the Dutch search term 'leeuw' and the Japanese search term 'ライオン' will return results about lions. Do not forget to specify the search language first, in the list in the top left! The semantic search video (see above) will show you how.

Search in English estuaries database.

Search in Journal Environmental Biology

Search in WWF database

Search in Dutch database

Search in European Environment Agency

Search in Hydrology

If the links do not work, please adjust your browser security level to normal.

Download and installation

The Search module can be downloaded from search.zip.
To create an index or deploy the Search web application, the MySQL databases ontotag and termdatabase must be installed locally. Dumps of both databases are included in the zip file. After installing MySQL, create two databases called 'ontotag' and 'termdatabase', then load the dumps by running the following in a command window:

mysql ontotag < ontotag.sql
mysql termdatabase < terms.sql
You will also need a local installation of WordNet 3.0. This can be downloaded from Princeton University.

 

Creating indexes from the command line

To create an index on an English database, you must first download the original PDFs and the equivalent KAF files. You can then use the class eu.kyoto.kybotindex.DatabaseCreator to create a database structure, supplying the following parameters:

  1. overWrite (true/false): true if there is an existing database and you wish to overwrite any previous entries;
  2. Database folder;
  3. Optional: folder that contains the KAF files (defaults to the database folder if unspecified);
  4. Optional: folder that contains the PDF files (defaults to the database folder if unspecified).
Once you have a database structure, use the class eu.kyoto.kybotindex.KybotFactIndexer to create an index over a collection of kybot facts, supplying the following parameters:
  1. MySQL username (default = root);
  2. MySQL password (use '-' for the empty string);
  3. Database folder (should contain the files with kybot facts, as well as the database structure);
  4. The path to WordNet;
  5. The names of one or more files with kybot facts.
To extend the index with another language, use the class eu.kyoto.kybotindex.KybotIndexTranslator and supply the following parameters:
  1. One or more two-character language codes, separated by semicolons (e.g. nl;it;eu;es;zh;jp);
  2. Database folder (contains the files with kybot facts);
  3. The paths to one or more files with kybot facts.
Below is an example of how to build an index over an English database that supports Spanish and Basque querying:
java -cp search-1.0-SNAPSHOT-jar-with-dependencies.jar \
eu.kyoto.kybotindex.DatabaseCreator false database/ \
database/kaf database/pdfs

java -Xmx512m \
-cp search-1.0-SNAPSHOT-jar-with-dependencies.jar \
eu.kyoto.kybotindex.KybotFactIndexer root - database/ \
D:\wordnet kybot1.xml kybot2.xml

java -Xmx512m \
-cp search-1.0-SNAPSHOT-jar-with-dependencies.jar \
eu.kyoto.kybotindex.KybotIndexTranslator es database/ \
D:\wordnet kybot1.xml kybot2.xml

java -Xmx512m \
-cp search-1.0-SNAPSHOT-jar-with-dependencies.jar \
eu.kyoto.kybotindex.KybotIndexTranslator eu database/ \
D:\wordnet kybot1.xml kybot2.xml
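
If you build indexes regularly, the same steps can be scripted. The Python sketch below only assembles the command lines shown above and optionally runs them; the helper names `java_cmd`, `build_commands` and `run_all` are hypothetical and not part of the Search module. The argument order follows the example commands, which also pass the WordNet path to KybotIndexTranslator even though the parameter list for that class does not mention it, so check this against your version of the module.

```python
import subprocess

JAR = "search-1.0-SNAPSHOT-jar-with-dependencies.jar"
PACKAGE = "eu.kyoto.kybotindex."


def java_cmd(main_class, *args, heap=None):
    """Build a 'java -cp JAR main_class args...' command as an argument list."""
    cmd = ["java"]
    if heap:
        cmd.append(f"-Xmx{heap}")
    return cmd + ["-cp", JAR, main_class, *args]


def build_commands(db, kaf, pdfs, wordnet, facts, languages, user="root", password="-"):
    """Return the commands for creating, indexing and translating a database."""
    cmds = [
        # overWrite is set to false, as in the example above.
        java_cmd(PACKAGE + "DatabaseCreator", "false", db, kaf, pdfs),
        java_cmd(PACKAGE + "KybotFactIndexer", user, password, db, wordnet, *facts, heap="512m"),
    ]
    # One translator run per language, mirroring the example above.
    for lang in languages:
        cmds.append(
            java_cmd(PACKAGE + "KybotIndexTranslator", lang, db, wordnet, *facts, heap="512m")
        )
    return cmds


def run_all(cmds):
    """Run each command in order and stop at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


commands = build_commands(
    "database/", "database/kaf", "database/pdfs", r"D:\wordnet",
    ["kybot1.xml", "kybot2.xml"], ["es", "eu"],
)
# Call run_all(commands) once Java, the jar and the input files are in place.
```

Passing the commands as argument lists rather than as one shell string avoids quoting problems with Windows paths such as D:\wordnet.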

 

Installing the Kyoto Search Web application

Search.zip contains a .war file with the complete web application. It can be deployed in Tomcat 6.0 as follows:

  1. Install the MySQL databases termdatabase and ontotag (see above);
  2. Install the WordNet dictionary to a local directory;
  3. Request a gmapkey from Google for your site;
  4. Copy kyoto.war into tomcat/webapps/ and start the server;
  5. Open the configuration file tomcat/webapps/kyoto/WEB-INF/classes/21search_client_config.xml. Specify which languages your index supports. Set the MySQL username and password and the gmapkey. Provide the paths to WordNet, to Tomcat, and to the parent directory of your database.
  6. In your tomcat installation, add a file called 'database.xml' to the folder [tomcat]/conf/Catalina/localhost/, with the following contents:

    This sets a virtual directory.
  7. Restart the Tomcat server. You can then access the web application at http://localhost:8080/kyoto/.

 

Detailed documentation and video-tutorial

Video tutorials:

  1. Crosslingual semantic search video
  2. Semantic search for frog
  3. Semantic search for a range of species
  4. Semantic search for a range of species
  5. Associating endangered species with invasive species through semantic search
Documentation: Relevant deliverables:

 

 

ICT-211423 - 2008 © Kyoto Consortium