This paper addresses the challenge of integrating Large Language Models (LLMs) into recommender systems to make them more conversational, explainable, and controllable. The authors introduce a two-stage approach, consisting of a supervised learning (SL) stage followed by a reinforcement learning (RL) stage, to improve LLMs' proficiency in following recommendation-specific instructions. The SL stage combines a suite of tasks designed to enhance controllability with label augmentation from a teacher recommender model such as SASRec. The RL stage applies Proximal Policy Optimization (PPO) with tailored reward signals to further refine the LLM's instruction-following ability. Extensive experiments on the Amazon Movie and Steam datasets demonstrate that the proposed method significantly improves the LLM's responsiveness to instructions, reduces formatting errors, and maintains high accuracy. The paper's contributions include the novel supervised learning stage, the RL stage with specialized rewards, and comprehensive experimental results showing superior performance over existing LLM-based recommendation models.
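To make the RL stage concrete, the following is a minimal sketch of the kind of composite reward such a PPO setup might use, combining a format-compliance term with a ranking-accuracy term. The parsing convention, the reciprocal-rank accuracy term, and the 0.5/0.5 weighting are illustrative assumptions, not the paper's actual reward design.

```python
# Hypothetical composite reward for PPO fine-tuning of a recommendation LLM.
# All specifics (list format, weights, reward terms) are assumptions for
# illustration; the paper's exact reward signals may differ.

from typing import List


def format_reward(response: str, expected_k: int) -> float:
    """Reward 1.0 if the response parses as a numbered list of exactly
    expected_k items, else -1.0 (penalizing formatting errors)."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    items = [ln for ln in lines if ln[0].isdigit() and ". " in ln]
    return 1.0 if len(items) == expected_k else -1.0


def accuracy_reward(ranked_items: List[str], target_item: str) -> float:
    """Reciprocal-rank style reward: higher when the held-out target
    item appears near the top of the recommended list."""
    if target_item in ranked_items:
        return 1.0 / (ranked_items.index(target_item) + 1)
    return 0.0


def total_reward(response: str, ranked_items: List[str], target_item: str,
                 expected_k: int = 10,
                 w_format: float = 0.5, w_acc: float = 0.5) -> float:
    """Scalar reward assigned to each sampled response during PPO."""
    return (w_format * format_reward(response, expected_k)
            + w_acc * accuracy_reward(ranked_items, target_item))
```

In a PPO loop, this scalar would score each sampled response, so the policy is pushed simultaneously toward well-formed outputs and toward ranking the held-out target item highly, mirroring the paper's twin goals of instruction compliance and recommendation accuracy.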