The paper introduces T2I-Adapter, a simple and lightweight model designed to enhance the controllability of text-to-image (T2I) diffusion models without altering their original network topology or generation capabilities. T2I-Adapter aligns the internal knowledge of pre-trained T2I models with external control signals, enabling more precise and flexible generation. The method supports several types of guidance, including color, depth, sketch, semantic segmentation, and keypose, and allows for local editing and composable guidance. Extensive experiments demonstrate that T2I-Adapter improves generation quality and enables a wide range of applications, including multi-condition control and generalization to custom models. The paper also discusses the design and optimization of T2I-Adapter, as well as its limitations and future directions.
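
The adapter idea lends itself to a compact sketch. The PyTorch module below is a minimal illustration of the general scheme, not the paper's released code: the class name `T2IAdapterSketch`, the channel widths, the block structure, and the 512x512 input size are all assumptions chosen to mirror a typical Stable Diffusion encoder. It downsamples a spatial condition map to the latent resolution and produces one feature map per UNet encoder scale; each would be added element-wise to the corresponding feature of the frozen UNet during denoising.

```python
import torch
import torch.nn as nn

class T2IAdapterSketch(nn.Module):
    """Illustrative T2I-Adapter-style condition encoder (not the official code).

    Maps a spatial control signal (e.g. a sketch, depth, or segmentation map)
    to one feature map per encoder scale of a frozen text-to-image UNet.
    Those features are added element-wise to the UNet's intermediate
    features, so the pre-trained UNet itself is never modified.
    """

    def __init__(self, cond_channels=3, channels=(320, 640, 1280, 1280)):
        super().__init__()
        # Bring a 512x512 condition map down to the 64x64 latent resolution.
        self.unshuffle = nn.PixelUnshuffle(8)
        self.stem = nn.Conv2d(cond_channels * 8 * 8, channels[0], 3, padding=1)
        blocks, in_ch = [], channels[0]
        for i, out_ch in enumerate(channels):
            stride = 1 if i == 0 else 2  # keep the first scale, then halve
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
                nn.SiLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
            ))
            in_ch = out_ch
        self.blocks = nn.ModuleList(blocks)

    def forward(self, cond):
        x = self.stem(self.unshuffle(cond))
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)  # one feature map per UNet encoder scale
        return feats

adapter = T2IAdapterSketch()
control = torch.randn(1, 3, 512, 512)  # e.g. an edge map scaled to [-1, 1]
for f in adapter(control):
    print(f.shape)  # (1,320,64,64), (1,640,32,32), (1,1280,16,16), (1,1280,8,8)
```

Because the pre-trained UNet stays frozen and only the small adapter is trained, the approach is cheap and naturally composable: features from several adapters (e.g. sketch plus color) can be combined as a weighted sum before injection, which is the basis of the multi-condition control the paper demonstrates.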