The AIDA source code is available at github.com/yago-naga/aida. For AIDA to work, you will need to download our YAGO-based entity repository and import it into a PostgreSQL server. Further installation instructions are included in the source release.
Downloadable files:
The dataset used in the experiments in our EMNLP 2011 paper, Robust Disambiguation of Named Entities in Text, can be downloaded here:
It contains assignments of entities to the mentions of named entities annotated for the original CoNLL 2003 entity recognition task. The entities are identified by YAGO2 entity name, by Wikipedia URL, or by Freebase mid (thanks to Massimiliano Ciaramita from Google Zürich for creating the Wikipedia/Freebase mapping and making it available to us). The zip contains a README.txt with details about the format, as well as instructions on how to create the dataset from the original CoNLL 2003 dataset (which is required).
We also provide the mention-entity candidate mapping, an extension of the YAGO2 means relation, that was used in our experiments in Robust Disambiguation of Named Entities in Text:
This file contains two tab-separated columns. The first column is a quoted string denoting a potential mention that can be recognized in the input text, and the second column is one entity candidate for this mention. Both columns are encoded in the YAGO2 format; see the YAGO2 downloads for decoding utilities.
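As a rough illustration of the two-column layout described above, the following Python sketch groups entity candidates by mention. The function name `parse_means_lines` and the sample rows are made up for this example and are not part of the AIDA release. The sketch assumes the mention is wrapped in plain double quotes, and it leaves the YAGO2 encoding of both columns untouched, so the values would still need the YAGO2 decoding utilities.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_means_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group entity candidates by mention (hypothetical helper).

    Each line is expected to hold a quoted mention, a tab, and one
    entity candidate. Values stay in their YAGO2 encoding.
    """
    candidates: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip rows that do not have exactly two columns
        mention, entity = parts
        # Assumption: the mention is wrapped in plain double quotes.
        if len(mention) >= 2 and mention[0] == '"' and mention[-1] == '"':
            mention = mention[1:-1]
        candidates[mention].append(entity)
    return dict(candidates)


# Made-up sample rows, for illustration only.
sample = ['"Paris"\tParis\n', '"Paris"\tParis_Hilton\n', '"Page"\tJimmy_Page\n']
mapping = parse_means_lines(sample)
```

To read the real file, an open file handle can be passed in place of the sample list, for example `parse_means_lines(open(path, encoding="utf-8"))`, where `path` points to the downloaded mapping. The UTF-8 encoding is an assumption here; the README in the download is the authority on the actual format.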
The dataset used in the experiments in our WWW 2014 paper, Discovering Emerging Entities with Ambiguous Names, can be downloaded here:
The AIDA-EE Dataset contains 300 documents with 9,976 entity names linked to Wikipedia (2010-08-17 dump). The documents themselves are taken from the APW part of the GIGAWORD5 dataset, with 150 documents from 2010-10-01 (development data) and 150 documents from 2010-11-01 (test data). Due to licensing issues, we do not provide the document content, just the offsets with the entity annotations.
The datasets used in the experiments in our CIKM 2012 paper, KORE: Keyphrase Overlap Relatedness for Entity Disambiguation, can be downloaded here:
All datasets are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.