Data for: Evol-Preference: Automatic Evolution of Preference Data For Safety Alignment
Published: 15 December 2025 | Version 1 | DOI: 10.17632/ck364mhrvb.1
Contributor:
云 刘
Description
The PKU-Alignment team released the BeaverTails dataset, which focuses on AI safety. We extended and optimized BeaverTails to obtain this dataset. Readers can use it directly to train large language models and improve their usefulness and harmlessness. Training details: 70% of the data is used for supervised fine-tuning (SFT) and 30% for direct preference optimization (DPO); training hyperparameters are available in Appendix C of the paper.
Files
Steps to reproduce
To improve the harmlessness and usefulness of a large language model, randomly sample 70% of the data for supervised fine-tuning (SFT) and 30% of the data for direct preference optimization (DPO). Training hyperparameters are available in Appendix C of the paper.
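The 70/30 random sampling described here could be sketched roughly as follows. This is an illustration only, not the authors' script: the function name `split_sft_dpo`, the fixed seed, the placeholder records, and the choice to make the two subsets disjoint are all assumptions, and the real field names in the dataset files may differ.

```python
import random


def split_sft_dpo(records, sft_fraction=0.7, seed=0):
    """Randomly partition records into an SFT subset and a DPO subset.

    Assumption: the two subsets are disjoint, so every record that is not
    sampled for SFT goes to DPO. The source only states the 70%/30% ratio.
    """
    if not 0.0 <= sft_fraction <= 1.0:
        raise ValueError("sft_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = round(len(shuffled) * sft_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for the loaded dataset (hypothetical format).
examples = [{"id": i} for i in range(10)]
sft_set, dpo_set = split_sft_dpo(examples)
print(len(sft_set), len(dpo_set))
```

Fixing the seed keeps the split repeatable across runs. If the SFT and DPO subsets are meant to overlap, two independent samples would be drawn instead of a single partition.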
Institutions
- South China Normal University
Categories
Reinforcement Learning, Preference Learning, Ethical LLM