HESML_vs_SML: scalability and performance benchmarks between the HESML V1R2 and SML 0.9 semantic measures libraries

Published: 21 Dec 2016 | Version 1 | DOI: 10.17632/5hg3z85wf4.1
Description of this data

This dataset provides a companion reproducibility Java console program, HESML_vs_SML_test.jar, for the work of Lastra-Díaz and García-Serrano [1]. That work introduces the Half-Edge Semantic Measures Library (HESML) and carries out an experimental survey of three semantic measures libraries: HESML V1R2 [3], the Semantic Measures Library (SML) 0.9 [2], and WNetSS [4].

The HESML_vs_SML_test.jar program runs the set of performance and scalability benchmarks detailed in [1] and generates the figures and tables of results reported in the aforementioned work, which are also enclosed as complementary files of this dataset (see files below).

Licensing note:

The 'HESML_vs_SML_test.jar' program is based on the HESML V1R2 [3], SML 0.9 [2] and WNetSS [4] semantic measures libraries, and it includes these libraries in its distribution, as well as WordNet 3.0 [6] and the SimLex665 [5] dataset. Thus, if you use this dataset, you should also cite the works related to these resources.

References:

[1] Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. To appear in Information Systems Journal.

[2] Harispe, S., Ranwez, S., Janaqi, S., & Montmain, J. (2014). The Semantic Measures Library: Assessing Semantic Similarity from Knowledge Representation Analysis. In E. Métais, M. Roche, & M. Teisseire (Eds.), Proc. of the 19th International Conference on Applications of Natural Language to Information Systems (NLDB 2014) (Vol. 8455, pp. 254–257). Montpellier, France: Springer. http://dx.doi.org/10.1007/978-3-319-07983-7_37

[3] Lastra-Díaz, J. J., & García-Serrano, A. (2016). HESML V1R2 Java software library of ontology-based semantic similarity measures and information content models. Mendeley Data, v2. https://doi.org/10.17632/t87s78dg78.2

[4] Ben Aouicha, M., Taieb, M. A. H., & Ben Hamadou, A. (2016). SISR: System for integrating semantic relatedness and similarity measures. Soft Computing, 1–25. http://dx.doi.org/10.1007/s00500-016-2438-x

[5] Hill, F., Reichart, R., & Korhonen, A. (2015). SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation. Computational Linguistics, 41(4), 665–695. http://dx.doi.org/10.1162/COLI_a_00237

[6] Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38(11), 39–41. http://dx.doi.org/10.1145/219717.219748

Experiment data files

Steps to reproduce

System requirements: a Java 8-compliant workstation with at least 8 GB of RAM.
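
The Java requirement can be checked before running anything. The sketch below is illustrative only: the function names are hypothetical, it assumes that `java` is on the PATH, and it treats releases newer than Java 8 as acceptable, which the source does not state.

```shell
#!/bin/sh
# Sketch: extract the major Java version from a version string.
# "1.8.0_111" (scheme used up to Java 8) -> 8; "11.0.2" -> 11.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

# Succeeds when the given version string denotes Java 8 or later.
# Assumption: later releases are acceptable; the source only names Java 8.
is_java8_or_later() {
    [ "$(java_major_version "$1")" -ge 8 ]
}

# Usage (`java -version` prints to stderr, hence the redirection):
#   v=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
#   is_java8_or_later "$v" && echo "Java $v satisfies the requirement"
```

The benchmark commands request a 4096 MB heap through `-Xms4096m -Xmx4096m`, so the JVM must also be able to reserve that much memory at start-up.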

The HESML_vs_SML_test.zip file contains the source files and compiled versions of HESML_vs_SML_test.jar and of all the aforementioned semantic measures libraries, so you only need to run the program. To compile HESML_vs_SML_test from its source files, however, you need to install NetBeans 8.0 or higher and the Java SE Development Kit (JDK) 8.

Running of the benchmarks:

The first group of benchmarks evaluates the running time and caching ratio in a side-by-side comparison of the most significant topological algorithms implemented by HESML and SML.

(1) Download the HESML_VS_SML_test.zip file above and extract it onto your hard drive, then follow steps 2-4 below.

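(2) Open a Linux or Windows command console in the main HESML_VS_SML_test directory and run the following command, where <output_results.csv> stands for the name of the output file (on Linux, replace the backslash in the path with a forward slash):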

$prompt:> java -Xms4096m -Xmx4096m -jar dist\HESML_VS_SML_test.jar <output_results.csv>

(3) Import the raw output file with LibreOffice or MS-Excel to obtain the data as shown in the benchmarks_HESML_vs_SML.csv file above.

(4) Install and open the R statistics package, then: (a) select the "File->Open script" menu and load the 'IS_HESML_figure3_and_table18.r' script file above; (b) edit the first two lines of the script to set the path of the input directory and the name of the input file generated in step 2 ('output_results.csv'); and (c) select the "Edit->Run all" menu to generate the figure in the HESML_vs_SML.pdf file above.
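
On a POSIX shell, the step 2 command can be wrapped as in the following sketch. The JVM flags and the JAR name are copied from the command above; the function name, the `DRY_RUN` switch and the default output file name are hypothetical. Linux file systems are case-sensitive, so the JAR name should be checked against the extracted archive.

```shell
#!/bin/sh
# Sketch: run the first benchmark group from the main HESML_VS_SML_test directory.
# With DRY_RUN=1 the command line is printed instead of being executed.
run_benchmarks() {
    out=${1:-output_results.csv}       # illustrative default file name
    jar=dist/HESML_VS_SML_test.jar     # forward slash for POSIX shells
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "java -Xms4096m -Xmx4096m -jar $jar $out"
    else
        java -Xms4096m -Xmx4096m -jar "$jar" "$out"
    fi
}

# Usage:
#   DRY_RUN=1; run_benchmarks my_results.csv   # inspect the command first
#   DRY_RUN=0; run_benchmarks my_results.csv   # then run it
```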

The output CSV file obtained in step 2 will have the same structure as the complementary 'benchmarks_HESML_vs_SML.csv' file, but it will report the running times measured on your own experimental platform.
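
Because the timings depend on the platform, a byte-for-byte comparison with the reference file is not informative. A lighter sanity check is sketched below; it assumes, without confirmation from the source, that the first line of each file is a header row and that both files contain the same number of rows. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed when two CSV files share the same first line and line count.
# Assumption: the first line is a header; the timing values are not compared.
same_csv_layout() {
    [ "$(head -n 1 "$1")" = "$(head -n 1 "$2")" ] &&
    [ "$(wc -l < "$1" | tr -d ' ')" -eq "$(wc -l < "$2" | tr -d ' ')" ]
}

# Usage:
#   same_csv_layout output_results.csv benchmarks_HESML_vs_SML.csv \
#       && echo "layout matches the reference file"
```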

The second benchmark evaluates the running time of HESML, SML and WNetSS when computing the Jiang-Conrath similarity measure with the Seco et al. IC model on the SimLex665 dataset. To reproduce the WordNet-based similarity benchmark reported in Table 19 of [1] and the 'final_results-SimLex665.csv' file above, follow steps 5-8 below:

(5) Install MySQL Community Edition on your workstation (required by WNetSS).

(6) Open a Linux or Windows command console in the HESML_VS_SML_test directory and run the command below, which carries out the off-line pre-processing tasks of WNetSS in order to load WordNet 3.0 and all its topological information into the MySQL server. This task could take a few hours on a modern workstation.

$prompt:> java -Xms4096m -Xmx4096m -jar dist\HESML_VS_SML_test.jar -WNetSS_Setup mySqlRootPassword

(7) From the same Linux or Windows command console run the following command:

$prompt:> java -Xms4096m -Xmx4096m -jar dist\HESML_VS_SML_test.jar -WNetSS mySqlRootPassword <output_results.csv>

(8) Import the output file with LibreOffice or MS-Excel to obtain the data shown in the final_results_SimLex665.csv file above.

Related links

This data is associated with the following peer reviewed publication:

HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset

Published in: Information Systems

Latest version

  • Version 1

    Published: 2016-12-21

    DOI: 10.17632/5hg3z85wf4.1

    Cite this dataset

    Lastra-Diaz, Juan J.; Garcia-Serrano, Ana (2016), “HESML_vs_SML: scalability and performance benchmarks between the HESML V1R2 and SML 0.9 semantic measures libraries”, Mendeley Data, v1 http://dx.doi.org/10.17632/5hg3z85wf4.1

Institutions

National University of Distance Education

Categories

Ontological Models

Licence

CC BY NC 3.0

The files associated with this dataset are licensed under an Attribution-NonCommercial 3.0 Unported licence.

What does this mean?

You are free to adapt, copy or redistribute the material, providing you attribute appropriately and do not use the material for commercial purposes.