HYENA is a multi-label classifier for entity types, based on hierarchical taxonomies derived from the YAGO2 knowledge base.
The HYENA type taxonomy comprises 505 types organized into a directed acyclic graph, with 5 main supertypes at its top level and 9 levels at its deepest point. HYENA was trained on 1.6 million instances extracted from 50,000 randomly selected Wikipedia articles.
As features, HYENA uses neighboring words and bigrams, part-of-speech tags, and phrases from a large gazetteer derived from the YAGO2 knowledge base.
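To make the feature groups above concrete, here is a minimal sketch of how such features could be collected for a single mention. It is an illustration only, not HYENA's actual implementation: the function name `mention_features`, the window size, the feature naming scheme, and the representation of the gazetteer as a set of lowercase phrases are all assumptions.

```python
from typing import Dict, List, Set


def mention_features(
    tokens: List[str],
    pos_tags: List[str],
    start: int,
    end: int,
    gazetteer: Set[str],
    window: int = 2,
) -> Dict[str, int]:
    """Collect binary features for the mention tokens[start:end].

    Hypothetical helper: names, window size and gazetteer format are assumptions.
    """
    feats: Dict[str, int] = {}
    left = max(0, start - window)
    right = min(len(tokens), end + window)

    # Neighboring words and their part-of-speech tags, keyed by signed offset
    # (-1 is the token just before the mention, 1 the token just after it).
    for i in list(range(left, start)) + list(range(end, right)):
        offset = i - start if i < start else i - end + 1
        feats[f"word[{offset}]={tokens[i].lower()}"] = 1
        feats[f"pos[{offset}]={pos_tags[i]}"] = 1

    # Bigrams over the mention and its surrounding window.
    for i in range(left, right - 1):
        feats[f"bigram={tokens[i].lower()}_{tokens[i + 1].lower()}"] = 1

    # Gazetteer lookup: fires if the mention string is a known phrase.
    mention = " ".join(tokens[start:end]).lower()
    if mention in gazetteer:
        feats["in_gazetteer"] = 1

    return feats


# Toy input (hypothetical sentence and tags).
tokens = ["The", "singer", "Bob", "Dylan", "performed", "in", "Hibbing"]
pos_tags = ["DT", "NN", "NNP", "NNP", "VBD", "IN", "NNP"]
feats = mention_features(tokens, pos_tags, start=2, end=4, gazetteer={"bob dylan"})
```

A sparse dictionary of binary features like this can be fed to one binary classifier per type, which is one common way to realize a multi-label setup.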
The HYENA system can be tested online. Type annotations are displayed in a color-coded interactive tree.
The HYENA type taxonomy was derived from the YAGO knowledge base, starting with five broad classes: PERSON, LOCATION, ORGANIZATION, EVENT and ARTIFACT. Under each of these superclasses, the 100 most prominent subclasses were selected based on class population. The classes are organized in a hierarchy that has 9 levels at its deepest point.
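The selection step described above can be sketched as a simple ranking by class population. This is a hedged illustration, not the procedure actually used to build the taxonomy: the function name `top_subclasses`, the input format, the tie-breaking rule and the toy counts are all assumptions.

```python
from typing import Dict, List


def top_subclasses(
    subclass_counts: Dict[str, Dict[str, int]],
    k: int = 100,
) -> Dict[str, List[str]]:
    """For each superclass, keep the k subclasses with the largest population.

    subclass_counts maps superclass -> {subclass: number of entities}.
    Ties are broken alphabetically (an assumption made for determinism).
    """
    selected: Dict[str, List[str]] = {}
    for superclass, counts in subclass_counts.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        selected[superclass] = [name for name, _ in ranked[:k]]
    return selected


# Toy populations (hypothetical numbers, not taken from YAGO).
toy_counts = {
    "PERSON": {"singer": 50, "politician": 120, "chemist": 10},
    "LOCATION": {"city": 300, "river": 40},
}
selected = top_subclasses(toy_counts, k=2)
```

With `k=100` and the five superclasses, this kind of selection yields at most 500 subclasses, which together with the 5 superclasses is consistent with the 505 types reported for the taxonomy.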
You can browse our hierarchy in the PDF file below or with our Interactive Browser.
Data property | Training | Testing |
---|---|---|
# of articles | 50,000 | 10,000 |
# of instances (all types) | 1,613,340 | 253,029 |
# of location instances | 489,003 (30%) | 86,936 (34.4%) |
# of person instances | 426,467 (26.4%) | 62,446 (24.6%) |
# of organization instances | 219,716 (13.6%) | 38,293 (15.1%) |
# of artifact instances | 204,802 (12.7%) | 31,899 (12.6%) |
# of event instances | 176,549 (10.9%) | 28,952 (11.4%) |
# instances in 1 top-level class | 1,131,994 (70.2%) | 179,240 (70.8%) |
# instances in 2 top-level classes | 182,508 (11.3%) | 33,399 (13.2%) |
# instances in more than 2 top-level classes | 6,492 (0.4%) | 828 (0.3%) |
# instances not in any class | 292,346 (18.1%) | 39,562 (15.6%) |
In the COLING 2012 paper, HYENA was tested on 253,029 instances from 10,000 randomly selected Wikipedia articles. The macro-averaged (per-class) and micro-averaged results are shown in the table below.
System | Macro precision | Macro recall | Macro F1 | Micro precision | Micro recall | Micro F1 |
---|---|---|---|---|---|---|
HYENA | 0.878 | 0.863 | 0.87 | 0.913 | 0.932 | 0.922 |
HYENA + meta-classifier | 0.89 | 0.837 | 0.862 | 0.916 | 0.914 | 0.915 |
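For readers unfamiliar with the two averaging schemes in the table, the sketch below shows how macro and micro scores are typically computed for multi-label output. It is not the evaluation script used for the paper. The function names are hypothetical, and computing macro F1 as the harmonic mean of the averaged precision and recall is an assumption (averaging per-class F1 values is another common convention).

```python
from typing import Dict, List, Set, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts; undefined ratios count as 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_micro(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Dict[str, Tuple[float, float, float]]:
    """Macro (per-class average) and micro (pooled counts) scores."""
    counts: Dict[str, List[int]] = {}  # type -> [tp, fp, fn]
    for g, p in zip(gold, pred):
        for t in g | p:
            c = counts.setdefault(t, [0, 0, 0])
            if t in g and t in p:
                c[0] += 1  # true positive
            elif t in p:
                c[1] += 1  # false positive
            else:
                c[2] += 1  # false negative

    # Macro: score every type separately, then average over types.
    per_class = [prf(*c) for c in counts.values()]
    n = len(per_class)
    macro_p = sum(x[0] for x in per_class) / n if n else 0.0
    macro_r = sum(x[1] for x in per_class) / n if n else 0.0
    macro_f = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0

    # Micro: pool the counts of all types, then score once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())

    return {"macro": (macro_p, macro_r, macro_f), "micro": prf(tp, fp, fn)}


# Toy labels (hypothetical): one set of types per test instance.
gold = [{"person", "singer"}, {"location"}, {"person"}]
pred = [{"person"}, {"location", "city"}, {"person", "singer"}]
scores = macro_micro(gold, pred)
```

Macro averaging weights every type equally, so rare types influence it as much as frequent ones, whereas micro averaging is dominated by the frequent types. This is one plausible reason for the gap between the two sets of figures.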
Detailed HYENA results for each type classifier, as well as the output for each test instance, are available here. The results can also be downloaded as a single compressed archive here.