Spider4SSC - A semantically equivalent text-to-SQL/SPARQL/Cypher dataset

Published: 14 November 2025 | Version 2 | DOI: 10.17632/4dnvv92trn.2
Contributors:
Martin Vejvar,

Description

Spider4SSC combines and extends Spider (Yu et al., 2018), Spider4SPARQL (Kosten et al., 2023) and our own Spider4Cypher data. It is a multi-domain text-to-query-language dataset with 4525 unique (question, sql, sparql, cypher) samples, in which semantically matching queries are provided in SQL, SPARQL and Cypher. The dataset contains 159 relational databases from distinct domains, taken from Spider, and 159 equivalent graph databases. This allows cross-query-language benchmarking and fine-tuning of neural networks on text-to-query tasks.
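As a rough illustration of the (question, sql, sparql, cypher) sample shape, the sketch below models one sample as a Python record and removes exact duplicates. The class name `Sample`, the helper `unique_samples`, the example queries and the `http://example.org/` IRI are all assumptions made for illustration; they are not taken from the dataset files, whose actual field names and formats may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical in-memory form of one sample (field names are assumed)."""
    question: str
    sql: str
    sparql: str
    cypher: str


# Illustrative only: these queries are written for this sketch,
# not copied from the dataset.
example = Sample(
    question="How many singers are there?",
    sql="SELECT count(*) FROM singer",
    sparql="SELECT (COUNT(?s) AS ?n) WHERE { ?s a <http://example.org/singer> }",
    cypher="MATCH (s:singer) RETURN count(s)",
)


def unique_samples(samples):
    """Drop exact duplicate samples, keeping first-seen order."""
    # frozen dataclasses are hashable, so dict keys deduplicate them
    return list(dict.fromkeys(samples))
```

Making the record frozen keeps it hashable, which is what lets a plain `dict` serve as an order-preserving deduplication step.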

Steps to reproduce

Cypher queries were generated using S2CLite. The dataset was then filtered to include only samples for which the execution results of all three queries (SQL, SPARQL and Cypher) are equivalent.
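The execution-result filter described above could be sketched roughly as follows. This is a minimal sketch, not the procedure actually used for the dataset: the callables `run_sql`, `run_sparql` and `run_cypher` are hypothetical stand-ins for whatever database clients execute each query, and the notion of equivalence assumed here (same rows as a multiset, ignoring row order, with values compared as strings) is only one possible definition.

```python
from collections import Counter


def normalize(rows):
    """Turn a result set into an order-insensitive multiset of rows.

    Assumption: values are compared by their string form, so 1 and "1"
    match, while 1 and 1.0 do not.
    """
    return Counter(tuple(str(value) for value in row) for row in rows)


def results_equivalent(sql_rows, sparql_rows, cypher_rows):
    """True if all three result sets contain the same rows."""
    return normalize(sql_rows) == normalize(sparql_rows) == normalize(cypher_rows)


def filter_samples(samples, run_sql, run_sparql, run_cypher):
    """Keep only samples whose three queries return equivalent results.

    run_sql, run_sparql and run_cypher are hypothetical callables that
    take a sample and return an iterable of result rows.
    """
    kept = []
    for sample in samples:
        try:
            ok = results_equivalent(
                run_sql(sample), run_sparql(sample), run_cypher(sample)
            )
        except Exception:
            # A query that fails to execute cannot be verified, so drop it.
            ok = False
        if ok:
            kept.append(sample)
    return kept
```

Comparing multisets rather than ordered lists avoids rejecting samples only because the engines return rows in different orders; queries with an explicit ordering clause would need a stricter, order-sensitive comparison.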

Institutions

  • Yokohama Kokuritsu Daigaku

Categories

Query Language, Structured Query Language

Licence