Knowledge Yielding Ontologies
for Transition-based Organization

Extracted facts

We created a number of databases with facts extracted from various websites and archives. The documents were collected by crawling the URLs listed below. Some documents could not be converted to text.

Estuary database (English)

Various URLs about the Humber Estuary, the Chesapeake Bay, bird migration, climate change, sedimentation, and habitat destruction. 

4,625 documents (HTML files and PDFs), corresponding to about 3 million words. You can search the data using semantic search. Log in with any user name and background.

 We extracted two sets of facts:

  1. Using 261 generic profiles for English: https://kyoto.let.vu.nl/~kyoto/files/data/kybotprofiles/generic_profiles_v9.zip
  2. Using complex term relations, where multiword terms are decomposed into conceptual relations; e.g. "migratory bird" is translated into a relation between "birds" and the process of "migration".
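
The decomposition of a multiword term can be illustrated with a minimal sketch. It is not the KYOTO implementation: the lookup table, the function name `decompose_term`, and the relation label `involved-in` are all hypothetical and chosen only to mirror the "migratory bird" example.

```python
# Hypothetical lookup from a modifier to the process it denotes.
# The real system derives this from wordnet and ontology data; these entries are assumptions.
MODIFIER_TO_PROCESS = {
    "migratory": "migration",
    "breeding": "reproduction",
}


def decompose_term(term):
    """Split a two-word term into a (participant, relation, process) record.

    Returns None when the term is not a two-word term or when the
    modifier is not in the (hypothetical) lookup table.
    """
    words = term.lower().split()
    if len(words) != 2:
        return None
    modifier, head = words
    process = MODIFIER_TO_PROCESS.get(modifier)
    if process is None:
        return None
    # "involved-in" is an illustrative relation name, not a KYOTO label.
    return {"participant": head, "relation": "involved-in", "process": process}


if __name__ == "__main__":
    print(decompose_term("migratory bird"))
```

Under these assumptions, "migratory bird" yields a record linking the participant "bird" to the process "migration", while terms with an unknown modifier are left undecomposed.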

The generic profile facts:

  • download, zipped XML-file in Kybot output format with events and roles (9.4 MB)
  • download, zipped RDF format (12.9 MB)
  • 465,391 events
  • 228,934 roles
  • 102,653 date expressions, 1,168 unique dates
  • 87,814 place expressions, 2,409 unique GeoNames places
  • 63,256 country expressions, 176 unique GeoNames countries

The complex term facts:

  • download, zipped XML-file with events and roles (550 KB)
  • 5,371 events
  • 2,696 roles

 

Estuary database (Dutch)

A few URLs and documents on the Westerschelde and a few background documents on the environment, all in Dutch.

93 documents, mostly PDFs, corresponding to 42,697 words. You can search the data using semantic search. Log in with any user name and background.

It took half a day to adapt the 261 English profiles into 65 Dutch profiles. Only small adaptations were needed, mainly for prepositions and some word order differences. We used the equivalence relations between the Dutch wordnet and the English WordNet to generate ontotag tables. Through these tables we could insert the same ontological properties into the Dutch KAF documents, even though the Dutch KAF was enriched with Dutch wordnet synsets. We applied the 65 profiles and generated the following facts:

  • download, zipped XML-file in Kybot output format with events and roles (295 KB)
  • download, zipped RDF format (510 KB)
  • 4,095 events
  • 6,862 roles
  • 8,118 date expressions, 82 unique dates
  • 5,928 place expressions, 60 unique GeoNames places
  • 3,302 country expressions, 9 unique GeoNames countries
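
The ontotag-table step described for the Dutch profiles can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the synset identifiers, the ontology labels, and the function `build_ontotag_table` are all invented for the example. The idea shown is that each Dutch synset inherits the ontological tags of its equivalent English synset.

```python
# Hypothetical equivalence relations: Dutch synset id -> English synset id.
EQUIVALENCES = {
    "nld-vogel-n": "eng-bird-n",
    "nld-trek-n": "eng-migration-n",
    "nld-slik-n": "eng-mudflat-n",  # no ontology tags known for this one
}

# Hypothetical English synset -> ontology labels (the "ontotags").
ENGLISH_ONTOTAGS = {
    "eng-bird-n": ["hypothetical:Bird"],
    "eng-migration-n": ["hypothetical:Migration"],
}


def build_ontotag_table(equivalences, english_ontotags):
    """Map each Dutch synset to the ontology tags of its English equivalent.

    Dutch synsets whose English equivalent has no tags are skipped.
    """
    table = {}
    for dutch_synset, english_synset in equivalences.items():
        tags = english_ontotags.get(english_synset)
        if tags:
            table[dutch_synset] = list(tags)  # copy so tables stay independent
    return table


if __name__ == "__main__":
    for synset, tags in build_ontotag_table(EQUIVALENCES, ENGLISH_ONTOTAGS).items():
        print(synset, tags)
```

With a table of this shape, a document annotated with Dutch synsets can be given the same ontological properties as an English one, so that profiles written against the ontology apply to both languages.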

 

WWF International website (English)

URL: http://wwf.panda.org/about_our_earth/all_publications (a selection made using the keyword "species")

3,271 PDF documents, of which 1,174 have been processed, containing 1,966,914 words. You can search the data using semantic search. Log in with any user name and background.

  • download, zipped XML-file in Kybot output format with events and roles (840 KB)
  • download, zipped RDF format (2.2 MB)
  • 17,811 events
  • 20,774 roles
  • 38,057 date expressions, 711 unique dates
  • 21,173 place expressions, 1,224 unique GeoNames places
  • 17,169 country expressions, 146 unique GeoNames countries

The complex-term facts:

  • download, zipped XML-file with events and roles (4.1 MB)
  • 246,932 events
  • 310,617 roles

 

Journal of Environmental Biology (English)

URL: www.jeb.co.in/journal_issues

791 PDF documents, containing 3,440,611 words. You can search the data using semantic search. Log in with any user name and background.

  • download, zipped XML-file in Kybot output format with events and roles (3.7 MB)
  • download, zipped RDF format (2.7 MB)
  • 23,406 events
    • 21,850 events based on profiles
    • 1,556 events based on cterms
  • 27,782 roles
    • 26,226 roles based on profiles
    • 1,556 roles based on cterms
  • 51,188 date expressions, 696 unique dates
  • 38,929 place expressions, 2,306 unique GeoNames places
  • 12,259 country expressions, 82 unique GeoNames countries


European Environment Agency (English)

URL: http://www.eea.europa.eu/publications

713 PDF documents, 4,814,647 word tokens. You can search the data using semantic search. Log in with any user name and background.

  • download zipped XML-file in Kybot output format with events and roles (4.6 MB)
  • download zipped RDF-file (5.6 MB)
  • 47,355 events
  • 58,628 roles
  • 105,952 date expressions, 662 unique dates
  • 52,890 place expressions, 1,348 unique GeoNames places
  • 53,091 country expressions, 93 unique GeoNames countries

 

Hydrology and Earth System Sciences (English)

 URL: http://www.hydrol-earth-syst-sci.net/volumes_and_issues.htm

1,355 PDF documents, 11,228,175 word tokens. You can search the data using semantic search. Log in with any user name and background.

  • download zipped XML-file in Kybot output format with events and roles (7.3 MB)
  • download zipped RDF-file (8.6 MB)
  • 71,781 events
  • 85,276 roles
  • 157,057 date expressions, 2,380 unique dates
  • 133,832 place expressions, 4,407 unique GeoNames places
  • 23,225 country expressions, 116 unique GeoNames countries

Medical protocols on the treatment of breast cancer (English)

7 PDF documents, 110,501 word tokens

These files were processed using WordNet, the same ontology, and the same WordNet-to-ontology mappings as the environment databases.

We used the same Kybot profiles as well. No adaptation to the medical domain was done:

  • download zipped XML-file in Kybot output format (417 KB)
  • 8,416 events
  • 15,984 roles

 

 

 

ICT-211423 - 2008 © Kyoto Consortium