A systematic review of AI-based automated written feedback (AWF) research identifies 83 SSCI-indexed articles published between 1993 and 2022. The study examines research contexts, AWF systems, feedback focus, utilization methods, research design, investigation foci, and results. AWF has been studied primarily in tertiary-level language and writing classes with English as the target language, though its scope has broadened to more diverse language environments and settings. Thirty-one different AWF systems were identified, with Criterion, Pigai, and Grammarly among the most commonly used; their feedback addresses form-related aspects more often than meaning. AWF is frequently used in isolation, though some studies compare it with human feedback. Research designs vary widely in writing genres, participants, sample sizes, methodologies, and data sources. Three main foci of investigation emerged: the performance of AWF systems; learners' perceptions of, uses of, and engagement with AWF; and the effects of AWF. Results are mixed, with validation of AWF receiving less positive support than the other areas. The study highlights the importance of argument-based validity in assessing AWF and calls for further research on domain analysis, feedback accuracy, and the effectiveness of AWF in language learning. While AWF shows positive effects on writing performance, its accuracy and alignment with human feedback remain areas for improvement. The study also notes the potential of generative AI to enhance AWF. Overall, AWF research is diverse, and more comprehensive studies are needed to address validation, effectiveness, and integration into language learning.
The review underscores the importance of considering AWF's role in complementing or replacing human feedback, and the need for further research on its impact on various writing-related factors.