This paper addresses the challenge of generating personalized text with fine-grained linguistic control, focusing on simultaneously controlling multiple linguistic dimensions such as lexical and syntactic attributes. The authors introduce a novel benchmark for evaluating how well generative models produce text that reflects specific stylistic attributes. The benchmark is built from data drawn from several sources, including blogs, movie reviews, and product reviews, from which linguistic features such as lexical usage, morpho-syntactic information, and discourse coherence are extracted. Evaluation metrics include success rate, relative improvement over a random baseline, and grammatical error detection. The paper compares the performance of several large language models (LLMs), including GPT-3.5 and Pythia models, and analyzes their sensitivity to changes in stylistic attributes. The results show that the 1B Pythia Prefix model performs best across all metrics, demonstrating the effectiveness of incorporating attribute-specific features into the training process. The paper also examines the impact of training-sample size on model performance and discusses related work in personalized language modeling and multi-attribute controlled generation. The authors make their code, data, and pre-trained models publicly available to encourage further research in this area.
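
To make the reported metrics concrete, the following is a minimal sketch (not the authors' code) of how a per-attribute success rate and the relative improvement over a random baseline might be computed; the attribute extractor, tolerance criterion, and data structures are illustrative assumptions rather than the benchmark's actual implementation.

```python
# Hypothetical sketch of the evaluation metrics described above.
# `extract_attrs` stands in for whatever tool derives lexical/syntactic
# feature values from generated text; the tolerance-based success
# criterion is an assumption for illustration.

from typing import Callable, Dict, List


def success_rate(
    generations: List[str],
    targets: List[Dict[str, float]],
    extract_attrs: Callable[[str], Dict[str, float]],
    tolerance: float = 0.1,
) -> float:
    """Fraction of generations whose extracted attribute values fall within
    `tolerance` of every requested target value."""
    hits = 0
    for text, target in zip(generations, targets):
        observed = extract_attrs(text)  # e.g., lexical and syntactic feature values
        if all(abs(observed[attr] - value) <= tolerance for attr, value in target.items()):
            hits += 1
    return hits / len(generations)


def relative_improvement(model_sr: float, random_sr: float) -> float:
    """Model success rate gain over a random baseline, relative to that baseline."""
    return (model_sr - random_sr) / random_sr


# Example usage with toy numbers:
# model_sr = success_rate(generated_texts, target_attributes, extractor)
# print(relative_improvement(model_sr, random_sr=0.12))
```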