Data for: Evol-Preference: Automatic Evolution of Preference Data For Safety Alignment

Published: 15 December 2025 | Version 1 | DOI: 10.17632/ck364mhrvb.1
Contributor:

Description

The PKU-Alignment team released the "BeaverTails" dataset, which focuses on AI safety. We extended and optimized "BeaverTails" to obtain this dataset. Readers can use our dataset directly to train large language models and improve their usefulness and harmlessness. Training details: 70% of the data is used for supervised fine-tuning (SFT) and 30% for direct preference optimization (DPO); training hyperparameters are available in Appendix C of the paper.

Files

Steps to reproduce

The harmlessness and usefulness of a large language model can be improved by randomly sampling 70% of the data for supervised fine-tuning (SFT) and using the remaining 30% for direct preference optimization (DPO). Training hyperparameters are available in Appendix C of the paper.
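
The random 70/30 split described above could be done along the following lines. This is a minimal sketch, not the authors' script: the function name split_sft_dpo, the fixed seed, and the assumption that the dataset has already been loaded into a list of records are all illustrative choices that do not come from the dataset or the paper.

```python
import random


def split_sft_dpo(records, sft_fraction=0.7, seed=0):
    """Randomly partition records into an SFT subset and a DPO subset.

    `records` is assumed to be an already-loaded sequence of examples;
    the file format and field names of the dataset are not assumed here.
    """
    if not 0.0 <= sft_fraction <= 1.0:
        raise ValueError("sft_fraction must be between 0 and 1")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(records)
    # A dedicated, seeded generator keeps the split reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)

    # The first 70% goes to SFT, the remainder to DPO; the subsets are disjoint.
    cut = round(len(shuffled) * sft_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Placeholder records standing in for the real dataset entries.
    demo = [{"id": i} for i in range(10)]
    sft_set, dpo_set = split_sft_dpo(demo)
    print(len(sft_set), len(dpo_set))
```

Fixing the seed makes the partition repeatable across runs, which helps when comparing training configurations on the same split.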

Institutions

  • South China Normal University

Categories

Reinforcement Learning, Preference Learning, Ethical LLM

Licence