This paper introduces a robust version of Direct Preference Optimization (rDPO) to address the issue of noisy preferences when aligning language models with human interests. The key contribution is a novel loss function that de-biases the effect of noise on average, so that the policy trained by minimizing this loss is robust to noisy preference labels. Theoretical analysis shows that the sub-optimality gap of the rDPO policy relative to the optimal policy is of order $O\left(\frac{1}{1-2\varepsilon}\sqrt{\frac{d}{n}}\right)$, where $\varepsilon$ is the label flip rate, $d$ is the policy parameter dimension, and $n$ is the size of the dataset. Empirical experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is more robust to noise in preference labels than vanilla DPO and other baselines, including DPO with label smoothing and other heuristics proposed by practitioners, and that it consistently outperforms these methods across different sampling temperatures. The paper also discusses the theoretical guarantees of the proposed method and its generalization to other preference optimization methods and models.
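To make the de-biasing idea concrete, here is a minimal PyTorch sketch (not the paper's code) of a loss that is unbiased under an assumed symmetric flip rate $\varepsilon$: the observed-pair loss and the label-swapped loss are combined so that their expectation over random flips recovers the clean DPO loss. The helper names `dpo_loss` and `rdpo_loss`, and the log-ratio inputs, are hypothetical illustrations.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logratios, ref_logratios, beta=0.1):
    """Vanilla DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    policy_logratios: log pi_theta(y_w|x) - log pi_theta(y_l|x), per example
    ref_logratios:    log pi_ref(y_w|x)  - log pi_ref(y_l|x),   per example
    """
    logits = beta * (policy_logratios - ref_logratios)
    return -F.logsigmoid(logits)

def rdpo_loss(policy_logratios, ref_logratios, beta=0.1, eps=0.1):
    """Sketch of a de-biased DPO loss under an assumed symmetric flip rate eps.

    Construction: [(1 - eps) * L(observed pair) - eps * L(swapped pair)] / (1 - 2*eps),
    whose expectation over label flips with rate eps equals the clean DPO loss.
    """
    loss_as_labeled = dpo_loss(policy_logratios, ref_logratios, beta)
    # Swapping y_w and y_l negates both log-ratios.
    loss_swapped = dpo_loss(-policy_logratios, -ref_logratios, beta)
    return ((1 - eps) * loss_as_labeled - eps * loss_swapped) / (1 - 2 * eps)
```

At $\varepsilon = 0$ this reduces to vanilla DPO, and the $1/(1-2\varepsilon)$ scaling mirrors the dependence on the flip rate that appears in the sub-optimality bound quoted above.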