Published: 3 March 2021 | Version 4 | DOI: 10.17632/thtndvvp9s.4
Francesco Taglino, Anna Formica


This dataset contains the results of an experiment on a method for evaluating the semantic similarity between concepts in a taxonomy. The method follows the information-theoretic approach and allows the senses of concepts in a given context to be taken into account. The dataset is composed of 28 files. Each file refers to one pair of the well-known Miller and Charles benchmark dataset [1] for assessing semantic similarity. For each pair of concepts, each of the same 28 pairs is considered as a possible context. We applied our proposal by extending 7 methods for computing semantic similarity in a taxonomy, selected from the literature. These methods are referred to as R [2], W&P [3], L [4], J&C [5], P&S [6], A [7], and A&M [8]. Finally, each file reports the correlation of our proposal with human judgement.

REFERENCES
[1] Miller, G.A., Charles, W.G. Contextual correlates of semantic similarity. Language and Cognitive Processes 6(1), 1-28 (1991).
[2] Resnik, P. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proc. of the Int. Joint Conf. on Artificial Intelligence, Montreal, Quebec, Canada, August 20-25, Morgan Kaufmann, 448-453 (1995).
[3] Wu, Z., Palmer, M. Verb semantics and lexical selection. In Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, 133-138 (1994).
[4] Lin, D. An Information-Theoretic Definition of Similarity. In Proc. of the Int. Conf. on Machine Learning, Madison, Wisconsin, USA, Morgan Kaufmann, 296-304 (1998).
[5] Jiang, J.J., Conrath, D.W. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In Proc. of the Int. Conf. Research on Computational Linguistics (ROCLING X), Taiwan (1997).
[6] Pirrò, G. A Semantic Similarity Metric Combining Features and Intrinsic Information Content. Data Knowl. Eng. 68(11), 1289-1308 (2009).
[7] Adhikari, A., Dutta, B., Dutta, A., Mondal, D., Singh, S. An intrinsic information content-based semantic similarity measure considering the disjoint common subsumers of concepts of an ontology. J. Assoc. Inf. Sci. Technol. 69(8), 1023-1034 (2018).
[8] Adhikari, A., Singh, S., Mondal, D., Dutta, B., Dutta, A. A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet. CoRR, arXiv:1607.05422 (2016).
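
The correlation with human judgement mentioned in the description can be illustrated with a minimal sketch. It assumes the Pearson coefficient, which is common in this literature but is not stated in the description. The function name `pearson` and all numeric values below are hypothetical and are not taken from the dataset files, whose layout is not described here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT taken from the dataset.
human_judgement = [3.9, 3.1, 1.2, 0.4]    # assumed human similarity ratings
method_scores = [0.95, 0.70, 0.30, 0.05]  # assumed scores from one method

r = pearson(human_judgement, method_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that a method ranks and scales the concept pairs much as the human raters do.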



Istituto di analisi dei sistemi ed informatica Antonio Ruberti, Consiglio Nazionale delle Ricerche


Semantics, Similarity Measure