Large Language Models in Materials Science: Evaluating RAG Performance in Graphene Synthesis Using RAGAS
Description
Retrieval-Augmented Generation (RAG) systems increasingly support scientific research, yet evaluating their performance in specialized domains remains challenging due to the technical complexity and precision requirements of scientific knowledge. This study presents the first systematic analysis of automated evaluation frameworks for scientific RAG systems, focusing on the RAGAS framework applied to RAG-augmented large language models in materials science, with graphene synthesis as a representative case study. We develop a comprehensive evaluation protocol comparing four assessment approaches, RAGAS (an automated RAG evaluation framework), BERTScore, LLM-as-a-Judge, and expert human evaluation, applied to 20 domain-specific questions. Our analysis reveals that while automated metrics can capture relative performance improvements from retrieval augmentation, they exhibit fundamental limitations in absolute score interpretation for scientific content. RAGAS successfully identified performance gains in RAG-augmented systems (a 0.52-point improvement for Gemini and a 1.03-point improvement for Qwen on a 10-point scale), demonstrating both the framework's sensitivity to retrieval augmentation and the particular benefit of retrieval for smaller, open-source models. These findings establish methodological guidelines for scientific RAG evaluation and highlight critical considerations for researchers deploying AI systems in specialized domains.
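For orientation, the snippet below is a minimal sketch of how a single question-answer pair could be scored with the ragas library's core metrics (faithfulness, answer relevancy, context precision, context recall). It assumes the pre-0.2 ragas API and uses placeholder text rather than items from the study's 20-question graphene set; column names and default models vary between ragas releases, and the actual evaluation pipeline used in this work may differ.

```python
# Illustrative sketch: scoring one RAG response with ragas (pre-0.2 API).
# The question, answer, contexts, and reference below are placeholders,
# not items from the study's graphene-synthesis question set.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

sample = Dataset.from_dict({
    "question": ["Which precursor gas is most commonly used for CVD graphene growth on copper?"],
    "answer": ["Methane is the most common carbon precursor for CVD growth of graphene on copper foil."],
    "contexts": [[
        "Chemical vapor deposition of graphene on Cu foil typically uses methane as the carbon source at around 1000 C."
    ]],
    "ground_truth": ["Methane is the standard precursor for CVD graphene synthesis on copper."],
})

# evaluate() calls an LLM and an embedding model under the hood (OpenAI by
# default), so the corresponding API key must be configured in the environment.
result = evaluate(
    sample,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores in [0, 1], e.g. {'faithfulness': 0.9, ...}
```

Each metric returns a score in [0, 1]; the 10-point scale reported above refers to the study's human and LLM-as-a-Judge ratings, so automated and manual scores are compared in relative rather than absolute terms.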