This paper investigates how the composition of fine-tuning data affects downstream factuality in large language models (LLMs). The central finding is that fine-tuning on well-known facts improves factuality, while fine-tuning on less-known facts can degrade it: when a fact is poorly stored in the pretrained model, fine-tuning on it encourages the model to rely on generic "shortcuts" rather than stored knowledge when answering questions. A theoretical analysis of a one-layer transformer shows that fact salience, i.e., how well a fact is stored in the pretrained model, is a key determinant of downstream factuality, and that attention imbalance during fine-tuning can suppress pretrained knowledge, leading to less accurate answers. Experiments on three question-answering benchmarks (PopQA, Entity Questions, and MMLU) and two LLMs (Llama-2-7B and Mistral-7B) support this account: fine-tuning on the top 50% most well-known facts outperforms fine-tuning on the entire dataset by 7% on MMLU, 6% on PopQA, and 4% on Entity Questions, and fine-tuning on an even smaller subset of well-known facts can match the performance of fine-tuning on the entire dataset. Together, the findings highlight the importance of considering how facts are stored in the pretrained model when fine-tuning for knowledge-intensive tasks, shedding light on the mechanisms of fine-tuning and offering practical guidance for data curation.
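To make the data-curation recipe concrete, here is a minimal sketch of salience-based example selection, assuming we proxy "how well a fact is known" by the pretrained model's average log-likelihood of the gold answer given the question; the paper's own salience measure may differ, and the function names here (`answer_log_likelihood`, `select_well_known_half`) are illustrative, not from the paper.

```python
# Sketch: keep the top 50% of QA pairs by a fact-salience proxy before fine-tuning.
# Assumption: answer log-likelihood under the pretrained model approximates salience.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # one of the two LLMs studied in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

@torch.no_grad()
def answer_log_likelihood(question: str, answer: str) -> float:
    """Average log-probability of the answer tokens, conditioned on the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Logits at position i predict token i+1, so shift targets by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_scores = log_probs[torch.arange(len(targets)), targets]
    # Keep only scores for the answer tokens (those after the prompt).
    answer_start = prompt_ids.shape[1]
    return token_scores[answer_start - 1:].mean().item()

def select_well_known_half(qa_pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the 50% of (question, answer) pairs the model already knows best."""
    scored = sorted(qa_pairs, key=lambda qa: answer_log_likelihood(*qa), reverse=True)
    return scored[: len(scored) // 2]
```

The selected subset would then be passed to a standard fine-tuning pipeline in place of the full dataset; per the paper's results, this smaller, well-known subset can match or exceed full-dataset fine-tuning on factuality benchmarks.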