We present the BeaverTails dataset for safety research in large language models. Our findings show that modeling decoupled human preferences for helpfulness and harmlessness improves LLM safety without sacrificing performance.
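For intuition, decoupling the two preference axes can be read as fitting two independent Bradley-Terry models, one on helpfulness comparisons and one on harmlessness comparisons. The sketch below illustrates that idea; the model callables and batch keys are hypothetical, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(score_chosen: torch.Tensor,
                             score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the preferred response's score above the other's."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def train_step(reward_model, cost_model, batch, opt_r, opt_c):
    """One decoupled training step; model and batch names are hypothetical."""
    # Helpfulness comparisons train the reward model ...
    loss_r = pairwise_preference_loss(reward_model(batch["more_helpful"]),
                                      reward_model(batch["less_helpful"]))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    # ... while separately annotated harmlessness comparisons train a cost model.
    loss_c = pairwise_preference_loss(cost_model(batch["more_harmful"]),
                                      cost_model(batch["less_harmful"]))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
```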
Building on the BeaverTails dataset, we introduce Safe RLHF, an RLHF algorithm with explicit safety constraints. Using the Lagrangian method, Safe RLHF dynamically balances harmlessness against helpfulness during fine-tuning. Across three rounds of fine-tuning, it mitigates harmful responses more effectively than existing value-aligned algorithms while also improving model performance.
This work was kindly promoted in a tweet by Ahsen (@_akhaliq).
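A minimal sketch of the Lagrangian balancing step, assuming per-sample reward and cost tensors from the two preference models; all names below are illustrative rather than the paper's implementation.

```python
import torch

log_lambda = torch.zeros(1, requires_grad=True)   # λ = exp(log λ) stays ≥ 0
lambda_opt = torch.optim.Adam([log_lambda], lr=1e-2)
cost_limit = 0.0  # hypothetical threshold d on the expected cost

def policy_objective(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Policy ascends reward minus λ-weighted cost; λ is held fixed here."""
    lam = log_lambda.exp().detach()
    return -(reward - lam * cost).mean()  # negated for a descent optimizer

def lambda_step(cost: torch.Tensor) -> None:
    """Dual ascent: λ rises when the cost constraint is violated, falls otherwise."""
    lam = log_lambda.exp()
    lambda_loss = -lam * (cost.detach().mean() - cost_limit)
    lambda_opt.zero_grad(); lambda_loss.backward(); lambda_opt.step()
```

Intuitively, λ keeps growing while the model still produces costly (harmful) responses and shrinks once the constraint is satisfied, so the harmlessness–helpfulness trade-off is tuned automatically rather than by a fixed coefficient.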
Baichuan 2: Open large-scale language models
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu (alphabetical ordering)
Preprint / Code / Hugging Face / Bloomberg ($1bn valuation)
Technical report in public archive
During my time at Baichuan, I participated in open-sourcing our LLMs, which were trained from scratch on 2.6 trillion tokens, and I am credited as an author in the corresponding technical report.
Baichuan 2 matches or exceeds the performance of other open-source models in its class across various public benchmarks, including MMLU, CMMLU, GSM8K, and HumanEval. Notably, Baichuan2-13B-Chat achieved the highest score among all open-source models on the SuperCLUE-agent benchmark.
Active3DPose presents a multi-agent reinforcement learning (MARL) scheme for proactive multi-camera collaboration in 3D human pose estimation within dynamic human crowds. The aerial cameras are decentralized, self-interested agents, yet they must collaborate to complete the given task. We propose a reward structure inspired by the Shapley value solution concept that facilitates this collaboration (see the sketch below). The simulation environment is built in Unreal Engine, and we train our agents with the distributed RL framework Ray RLlib.
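For intuition, the Shapley value credits each camera with its average marginal contribution over all orders in which agents could join the coalition. The exact-enumeration sketch below (with a hypothetical `coalition_value` scorer) illustrates the solution concept, not the paper's precise reward shaping.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, Sequence

def shapley_rewards(agents: Sequence[int],
                    coalition_value: Callable[[FrozenSet[int]], float],
                    ) -> Dict[int, float]:
    """Exact Shapley values by enumerating agent orderings (tractable for a few cameras).

    `coalition_value` is a hypothetical callback scoring a camera subset,
    e.g. by the pose-estimation quality achievable from those viewpoints.
    """
    values = {a: 0.0 for a in agents}
    for order in itertools.permutations(agents):
        members: FrozenSet[int] = frozenset()
        prev = coalition_value(members)
        for a in order:
            members = members | {a}
            curr = coalition_value(members)
            values[a] += curr - prev  # marginal contribution of camera `a`
            prev = curr
    n_orders = math.factorial(len(agents))  # average over all join orders
    return {a: v / n_orders for a, v in values.items()}
```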
We introduce the Multi-Agent Tracking Environment (MATE), a novel multi-agent environment that simulates real-world target coverage control problems. MATE hosts an asymmetric cooperative-competitive game between two groups of learning agents with opposing interests: "cameras" and "targets". This co-evolution of cameras and targets helps realize a less exploitable camera network.
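A minimal interaction sketch for such a two-group game, assuming a Gym-style API; the attribute and method names below are placeholders for illustration, not MATE's documented interface.

```python
import numpy as np

def random_rollout(env, steps: int = 100) -> float:
    """Roll out random actions for both groups; return the cameras' total reward."""
    camera_obs, target_obs = env.reset()
    camera_return = 0.0
    for _ in range(steps):
        # Cameras and targets act simultaneously with opposing objectives.
        camera_actions = [env.camera_action_space.sample() for _ in camera_obs]
        target_actions = [env.target_action_space.sample() for _ in target_obs]
        (camera_obs, target_obs), (camera_rewards, _), done, _ = env.step(
            (camera_actions, target_actions))
        camera_return += float(np.sum(camera_rewards))
        if done:
            break
    return camera_return
```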
We introduce a framework designed to expedite SafeRL research. The framework encompasses a comprehensive collection of algorithms spanning different RL domains and places a heavy emphasis on safety.
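As a rough sketch of how such a framework can streamline experiments, a high-level training API might look like the following; every identifier here (module, class, algorithm, and environment names) is a placeholder, not the framework's actual interface.

```python
# Hypothetical high-level API; all names below are placeholders.
import saferl_framework as saferl  # placeholder module name

agent = saferl.Agent(
    algo="PPOLagrangian",           # a constrained on-policy algorithm
    env_id="SafetyCarGoal1-v0",     # a cost-emitting benchmark task
    cfgs={"cost_limit": 25.0},      # per-episode safety budget
)
agent.learn(total_steps=1_000_000)  # train until the step budget is spent
```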