Spider4SSC - A semantically equivalent text-to-SQL/SPARQL/Cypher dataset

Name: Spider4SSC - A semantically equivalent text-to-SQL/SPARQL/Cypher dataset
Creator: Martin Vejvar
Published: 2025-11-14T19:00:28.759Z
Keywords: Query Language, Structured Query Language

Vejvar, Martin; Fujimoto, Yasutaka

doi:10.17632/4dnvv92trn.2

Published: 14 November 2025 | Version 2 | DOI: 10.17632/4dnvv92trn.2

Contributors:

Martin Vejvar,

Description

Spider4SSC combines and extends Spider (Yu et al., 2018), Spider4SPARQL (Kosten et al., 2023) and our own Spider4Cypher data. It is a multi-domain text-to-query-language dataset with 4525 unique (question, sql, sparql, cypher) samples, in which semantically matching queries are provided in SQL, SPARQL and Cypher. The dataset contains 159 distinct domain relational databases from Spider (Yu et al., 2018) and 159 equivalent graph databases. This allows cross-query-language benchmarking and fine-tuning of neural networks on the text-to-query task.

Steps to reproduce

Cypher queries were generated using S2CLite. We then filtered the dataset so that it only includes samples in which the execution results of the SQL, SPARQL and Cypher queries are equivalent.
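
The filtering step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record keys (sql, sparql, cypher), the function names and the executor callables are hypothetical, and comparing results as order-insensitive multisets of rows is an assumption, since the exact equivalence check is not specified here.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]
# Hypothetical executor type: takes a query string, returns the result rows.
Executor = Callable[[str], Sequence[Row]]


def normalize(rows: Iterable[Sequence[Hashable]]) -> Counter:
    """Assumption: results are equivalent if they match as multisets of rows,
    ignoring row order."""
    return Counter(tuple(row) for row in rows)


def filter_equivalent(
    samples: Iterable[Dict[str, str]],
    executors: Dict[str, Executor],
) -> List[Dict[str, str]]:
    """Keep only the samples whose queries return equivalent results
    under every executor (one executor per query language)."""
    kept = []
    for sample in samples:
        try:
            results = [
                normalize(run(sample[lang])) for lang, run in executors.items()
            ]
        except Exception:
            # A query that fails to execute cannot be verified, so drop it.
            continue
        if all(result == results[0] for result in results[1:]):
            kept.append(sample)
    return kept
```

In this sketch, executors would map "sql", "sparql" and "cypher" to functions that run a query against the corresponding relational or graph database and return its rows. Value normalization across engines (for example, numeric types or URIs versus literals) is left out and would likely be needed in practice.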

Institutions

Yokohama Kokuritsu Daigaku
