DCMS DLC Coatings Producing Assistant Testing Data

Name: DCMS DLC Coatings Producing Assistant Testing Data
Creator: Georgiy Asaturov
Published: 2025-10-10T09:52:11.349Z
Keywords: Magnetron Sputtering, Diamond-Like Carbon Coating, Large Language Model, Retrieval-Augmented LLM

Asaturov, Georgiy; Guda, Sergey; Lifar, Mikhail; Kudryakov, Oleg; Soldatov, Alexander; Kolesnikov, Igor

doi:10.17632/448b6shh8y.1


Published: 10 October 2025 | Version 1 | DOI: 10.17632/448b6shh8y.1

Contributors:

Georgiy Asaturov, Sergey Guda, Mikhail Lifar, Oleg Kudryakov, Alexander Soldatov, Igor Kolesnikov

Description

This dataset supports a research paper on the development of a specialized intelligent assistant for diamond-like carbon (DLC) coatings. The central hypothesis is that a Large Language Model (LLM) enhanced with a domain-specific Retrieval-Augmented Generation (RAG) technique can significantly outperform base LLMs on complex technical tasks in the niche field of magnetron-sputtered DLC coatings. The data was gathered to develop the assistant and to rigorously test this hypothesis. It consists of a curated knowledge base of scientific publications together with a series of technical questions and answers on DLC coating processes, properties, and problem-solving.

Each row in the dataset corresponds to one scientific document. The core fields are the original Filename, the Title, and summaries of the source paper. The key generated fields are the Questions and answers and the Relevant papers retrieved by the RAG system for each query. Performance is captured in several evaluation columns: for both the specialized Assistant (powered by either DeepSeek or GLM) and the Base LLMs, the dataset provides the model's generated answers, overall evaluation verdicts, and fractional scores giving the rate of fully correct answers (A evals fraction) and partially correct answers (A,C evals fraction).

The data shows a notable result: the RAG-enhanced Assistant answered technical questions with 87% accuracy, compared with 25% for the base LLM, which quantifies the value of domain-specific knowledge augmentation. Researchers can use this dataset to analyze which types of questions the specialized system answers correctly or incorrectly, to examine how the relevance of the retrieved papers affects answer accuracy, and to reuse the question-answer pairs as a benchmark for developing their own domain-specific AI assistants in materials science.
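The fractional scores described above can be reproduced from per-question verdicts. The sketch below is a hedged illustration, not the authors' evaluation code: it assumes that each verdict is a single letter grade, that "A" marks a fully correct answer, and that the "A,C evals fraction" counts verdicts graded either A or C. The names `verdict_fraction`, `FULLY_CORRECT`, and `CORRECT_OR_PARTIAL` are hypothetical, and the verdict lists in the usage block are made up for illustration; they are not values taken from the dataset.

```python
from collections import Counter
from collections.abc import Iterable

# Assumed grading scheme (hypothetical): "A" = fully correct, "C" = partially correct.
FULLY_CORRECT = {"A"}
CORRECT_OR_PARTIAL = {"A", "C"}


def verdict_fraction(verdicts: Iterable[str], accepted: set[str]) -> float:
    """Return the share of verdicts whose (normalized) label is in `accepted`."""
    # Normalize labels so that " a " and "A" are counted as the same grade.
    counts = Counter(v.strip().upper() for v in verdicts)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no graded answers: report zero instead of dividing by zero
    return sum(counts[label] for label in accepted) / total


if __name__ == "__main__":
    # Made-up verdicts for illustration only; NOT values from the dataset.
    assistant = ["A", "A", "C", "B"]
    base = ["B", "C", "B", "D"]
    for name, verdicts in (("Assistant", assistant), ("Base LLM", base)):
        a = verdict_fraction(verdicts, FULLY_CORRECT)
        ac = verdict_fraction(verdicts, CORRECT_OR_PARTIAL)
        print(f"{name}: A fraction = {a:.2f}, A,C fraction = {ac:.2f}")
```

Keeping the accepted grades as a set parameter means the same function covers both evaluation columns; if the dataset uses a different grading alphabet, only the two constants need to change.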


Funders

Russian Science Foundation
Russia
Grant ID: 25-19-00304
