Promset: An annotated dataset for translating natural language to PromQL

Name: Promset: An annotated dataset for translating natural language to PromQL
Creator: DAVE CHEDJOUN
Published: 2025-08-05T13:53:33.750Z
Keywords: Computer Science, Artificial Intelligence, Software Engineering, Data Science, Natural Language Processing, Query Language, System Supervision, Cloud Infrastructure, Large Language Model

CHEDJOUN, DAVE; Monthe, Valery Marcial; Tchuani Tchakonte, Diane; Tchagna Kouanou, Aurelle

doi:10.17632/mfy9ntjy7p.1

Published: 5 August 2025 | Version 1 | DOI: 10.17632/mfy9ntjy7p.1

Contributors: Dave Chedjoun, Valery Marcial Monthe, Diane Tchuani Tchakonte, Aurelle Tchagna Kouanou

Description

Promset is an annotated dataset designed to support natural language processing (NLP) research for system monitoring. It is particularly suited to training and evaluating large language models that translate natural language queries into their equivalent in PromQL, the query language of the Prometheus monitoring tool.

An initial dataset was built from the results of our experiments on Prometheus, during which we created a set of queries and their natural language descriptions. We then extended it by collecting PromQL queries and their descriptions from various web sources. This raw data was curated, reviewed, corrected, and enriched with Gemini, resulting in a high-quality dataset suitable for research and development.

The dataset contains a total of 4,350 manually curated pairs, each linking an English description to a corresponding PromQL expression. It is provided in CSV format, with two fields: description (a human-readable query) and promql (its equivalent in PromQL syntax). Each record represents a concrete, practical monitoring scenario, such as metric aggregation, label filtering, or time-based calculations. In many cases, a single PromQL query is associated with multiple English descriptions, which increases linguistic variation and enables more robust model training.

By bridging the gap between human-readable instructions and machine-interpretable PromQL syntax, Promset enables the development of intelligent systems that can automatically understand and generate monitoring queries. This facilitates the creation of more intuitive observability tools, streamlines DevOps workflows, and opens new avenues in research on natural language-to-code translation.
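As an illustration of the two-column layout described above, the following minimal Python sketch reads description/promql pairs and groups the descriptions that share a PromQL expression. The sample rows are invented for the example and are not taken from the dataset; the helper names load_pairs and group_by_query are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the documented two-column layout. These rows are
# made up for this example and are NOT taken from the dataset.
SAMPLE_CSV = """description,promql
Total HTTP requests per second over the last 5 minutes,sum(rate(http_requests_total[5m]))
Overall request rate across all instances in the past five minutes,sum(rate(http_requests_total[5m]))
"Memory available on each node, in bytes",node_memory_MemAvailable_bytes
"""


def load_pairs(handle):
    """Read (description, promql) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["description"].strip(), row["promql"].strip()) for row in reader]


def group_by_query(pairs):
    """Map each PromQL expression to the list of descriptions that target it."""
    grouped = defaultdict(list)
    for description, promql in pairs:
        grouped[promql].append(description)
    return dict(grouped)


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
grouped = group_by_query(pairs)
for promql, descriptions in grouped.items():
    print(f"{promql}  <- {len(descriptions)} description(s)")
```

To run this on the actual file, the in-memory sample could be replaced with something like open("promset.csv", newline="", encoding="utf-8"), where the file name is an assumption. Because several descriptions can map to one query, grouping by the promql field before splitting into training and evaluation sets may help keep the same query from appearing on both sides.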

Institutions

Universite de Yaounde I
