This paper proposes a novel LLM-based explainable recommendation model, LLM2ER-EQR, to address three problems that degrade explanation quality in explainable recommendation systems: lack of personalization, inconsistency, and questionable explanation data. The model is trained in a reinforcement learning paradigm with two explanation-quality reward models, CCR and HQAR, to improve the quality of generated explanations. The backbone model, LLM2ER, uses a concept graph extracted from reviews to predict ratings and to infer reasoning paths between user-item pairs. The CCR model enhances explanation consistency by aligning generated explanations with sentiment-wise candidate concepts, while the HQAR model aligns generated explanations with high-quality reference explanations via a generative adversarial network. Evaluated on three real-world datasets, the model shows significant improvements in explanation quality, diversity, and personalization over existing methods. The results demonstrate that LLM2ER-EQR can generate high-quality, personalized, and consistent explanations that align with user preferences and item features. The paper also discusses the limitations of large language models, including the risk of generating offensive content, and suggests post-processing steps to mitigate these risks. The study contributes to the field of explainable recommendation by providing a novel approach to improving the quality of explanations generated by large language models.
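
To make the training paradigm described above more concrete, the following is a minimal sketch (not the authors' code) of how two explanation-quality reward signals could be combined in a REINFORCE-style fine-tuning step. The tiny generator, the placeholder reward functions, the candidate-concept set, and the equal 0.5/0.5 reward weighting are all illustrative assumptions standing in for LLM2ER, CCR, and HQAR, not the paper's actual implementations.

```python
# Sketch: policy-gradient fine-tuning with two combined reward signals.
# All components below are toy stand-ins (assumptions), not the paper's models.
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN = 100, 32                         # toy vocabulary and hidden size (assumed)

class ToyExplanationGenerator(nn.Module):
    """Stand-in for the LLM backbone: scores next tokens from a context vector."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, context):                      # context: (batch, HIDDEN)
        return torch.log_softmax(self.proj(context), dim=-1)

def concept_consistency_reward(tokens):
    """Placeholder for a CCR-like reward: token overlap with candidate concepts."""
    candidate_concepts = {3, 7, 42}                  # assumed sentiment-wise concept ids
    return torch.tensor(
        [len(set(t.tolist()) & candidate_concepts) / len(t) for t in tokens]
    )

def adversarial_quality_reward(tokens):
    """Placeholder for an HQAR-like reward: discriminator's quality score."""
    return torch.rand(tokens.size(0))                # stub; a trained GAN discriminator goes here

generator = ToyExplanationGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

context = torch.randn(4, HIDDEN)                     # toy user-item context vectors
log_probs = generator(context)                       # (4, VOCAB_SIZE)
dist = torch.distributions.Categorical(logits=log_probs)
tokens = dist.sample((8,)).T                         # 8 sampled tokens per example -> (4, 8)

# Combine the two reward signals; the equal weighting is an assumption.
reward = 0.5 * concept_consistency_reward(tokens) + 0.5 * adversarial_quality_reward(tokens)

# REINFORCE update: scale the sampled tokens' log-probabilities by the scalar reward.
token_log_probs = dist.log_prob(tokens.T).T.sum(dim=1)    # (4,)
loss = -(reward.detach() * token_log_probs).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In such a setup, the generator only receives gradient signal through the log-probabilities of its sampled explanations, so improving either reward model (consistency or adversarial quality) directly reshapes what the generator is encouraged to produce.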