Follow-Your-Emoji is a diffusion-based framework for portrait animation that animates a reference portrait with a sequence of target landmarks. The central challenges in portrait animation are preserving the identity of the reference portrait while transferring the target expression, and maintaining temporal consistency and fidelity. To address them, Follow-Your-Emoji introduces two key techniques: expression-aware landmarks and a facial fine-grained loss. The expression-aware landmarks provide a more accurate and expressive motion signal, enabling better alignment between the reference portrait and the target motion; the facial fine-grained loss directs the model toward subtle expression changes and detailed appearance reconstruction. The framework additionally employs a progressive generation strategy to produce stable long-term animations. To evaluate the method, the authors introduce EmojiBench, a comprehensive benchmark consisting of diverse portrait images, driving videos, and landmarks. On this benchmark, Follow-Your-Emoji outperforms existing methods in expression generation, identity preservation, and motion rendering, and handles a wide range of styles, including real humans, cartoons, sculptures, and even animals. It is particularly effective at controlling the expressions of freestyle portraits and at generating high-quality, long-term animations.
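
The summary does not specify how the expression-aware landmarks are fed to the diffusion model. A minimal sketch, assuming the landmarks arrive as 2D pixel coordinates and are rasterized into a single-channel conditioning map for the denoising network; the function name `rasterize_landmarks`, the point radius, and the canvas size are illustrative choices, not the paper's.

```python
import numpy as np

def rasterize_landmarks(landmarks, size=512, radius=2):
    """Hypothetical conditioning map: draw 2D landmark points into a
    single-channel image that a denoising UNet could consume as an
    expression/pose signal. `landmarks` is an (N, 2) array of (x, y)
    pixel coordinates; size and radius are arbitrary sketch values.
    """
    canvas = np.zeros((size, size), dtype=np.float32)
    for x, y in landmarks.astype(int):
        # Paint a small square around each landmark, clipped to the canvas.
        x0, x1 = max(x - radius, 0), min(x + radius + 1, size)
        y0, y1 = max(y - radius, 0), min(y + radius + 1, size)
        canvas[y0:y1, x0:x1] = 1.0
    return canvas
```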
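The exact formulation of the facial fine-grained loss is likewise not given here. A plausible sketch, assuming it is a spatially weighted noise-prediction objective in which a mask derived from the facial landmarks up-weights the face region; the name `facial_fine_grained_loss` and the weight `lambda_face` are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def facial_fine_grained_loss(noise_pred, noise_target, face_mask, lambda_face=2.0):
    """Sketch of a masked diffusion loss that emphasizes facial regions.

    noise_pred / noise_target: (B, C, H, W) predicted vs. ground-truth noise.
    face_mask: (B, 1, H, W) soft mask in [0, 1] covering the face
               (e.g. rasterized from the expression-aware landmarks).
    lambda_face: assumed extra weight on facial pixels.
    """
    base = F.mse_loss(noise_pred, noise_target)  # standard global diffusion term
    per_pixel = (noise_pred - noise_target).pow(2).mean(dim=1, keepdim=True)
    # Average the error only over masked (facial) pixels.
    face = (per_pixel * face_mask).sum() / face_mask.sum().clamp(min=1.0)
    return base + lambda_face * face
```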
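For the progressive generation strategy, one common way to realize stable long-term animation is sliding-window sampling with overlapping anchor frames; the sketch below follows that pattern under stated assumptions rather than reproducing the paper's procedure. `sample_clip` is a hypothetical wrapper around the diffusion sampler, and the window and overlap values are arbitrary.

```python
import torch

@torch.no_grad()
def generate_long_animation(sample_clip, landmarks, window=16, overlap=4):
    """Sliding-window sketch of progressive long-video generation.

    sample_clip(landmark_window, anchor_frames) -> (T, C, H, W) is a
    hypothetical sampler call; `landmarks` is the full target landmark
    sequence. Each new window is conditioned on the last `overlap`
    frames of the previous clip to keep the animation temporally stable.
    """
    frames, anchors, start = [], None, 0
    while start < len(landmarks):
        clip = sample_clip(landmarks[start:start + window], anchors)
        new = clip if start == 0 else clip[overlap:]  # drop re-generated overlap
        frames.append(new)
        anchors = clip[-overlap:]                     # anchors for the next window
        start += window - overlap
    return torch.cat(frames, dim=0)
```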