Link discovery on the web of data: time and space efficient large scale link discovery using string similarities
This work proposes and evaluates a time- and space-efficient approach for computing links between a source dataset and a target dataset by exploiting string similarities among entities' properties. The approach builds on a basic indexing method that facilitates pruning of dissimilar pairs and supports effective verification of candidate pairs. It introduces a blocking method that organizes the target dataset so that it can be queried for matches to a specific string. It then applies three filters that together leave a relatively small number of candidate strings requiring verification. Lastly, to verify each candidate string, it uses an optimized algorithm for computing the edit distance between two strings. Evaluation results show the time and space efficiency of the proposed method compared with state-of-the-art approaches for link discovery.
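
The general block, filter, and verify pattern described above can be illustrated with a minimal Python sketch. It is not the method proposed in this work: the abstract does not specify the blocking scheme, the three filters, or the optimized edit distance algorithm. The sketch assumes length-based blocking, a single length filter, and a threshold-bounded dynamic program as stand-ins, and the names `build_length_index`, `bounded_edit_distance`, and `link` are hypothetical.

```python
from collections import defaultdict


def bounded_edit_distance(a: str, b: str, k: int) -> int:
    """Return the Levenshtein distance if it is <= k, otherwise k + 1."""
    # Length filter: strings whose lengths differ by more than k cannot match.
    if abs(len(a) - len(b)) > k:
        return k + 1
    if len(a) > len(b):
        a, b = b, a  # iterate rows over the longer string, columns over the shorter
    prev = list(range(len(a) + 1))
    for i, cb in enumerate(b, start=1):
        cur = [i] + [0] * len(a)
        row_min = cur[0]
        for j, ca in enumerate(a, start=1):
            cur[j] = min(prev[j] + 1,               # delete from b
                         cur[j - 1] + 1,            # insert into b
                         prev[j - 1] + (ca != cb))  # substitute or match
            row_min = min(row_min, cur[j])
        # Early termination: the final distance is never below a row minimum.
        if row_min > k:
            return k + 1
        prev = cur
    return min(prev[-1], k + 1)


def build_length_index(targets):
    """Assumed blocking scheme: group target strings by their length."""
    index = defaultdict(list)
    for t in targets:
        index[len(t)].append(t)
    return index


def link(source, index, k):
    """Yield (source, target, distance) for pairs within edit distance k."""
    for s in source:
        # Only blocks with a compatible length can contain matches.
        for length in range(max(0, len(s) - k), len(s) + k + 1):
            for t in index.get(length, ()):
                d = bounded_edit_distance(s, t, k)
                if d <= k:
                    yield (s, t, d)


if __name__ == "__main__":
    idx = build_length_index(["Berlim", "Pariss", "London", "Rome"])
    for match in link(["Berlin", "Paris"], idx, k=1):
        print(match)
```

In this sketch the index is built once over the target dataset and each source string probes only the blocks whose length lies within the threshold, so most dissimilar pairs are never passed to the verification step.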