Slicedit is a zero-shot video editing method that leverages a pretrained text-to-image (T2I) diffusion model to process both spatial and spatiotemporal slices of videos. The method aims to edit challenging long videos with complex nonrigid motion and occlusions while preserving regions not specified in the text prompt. By applying the T2I model on spatiotemporal slices, Slicedit enhances temporal consistency and maintains the structure and motion of the original video. The method involves inflating the T2I denoiser to work on videos, using extended attention for capturing dynamics between frames, and processing spatiotemporal slices to enforce temporal consistency.
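To make the distinction between spatial and spatiotemporal slices concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a video stored as an array of shape (T, H, W, C). The function names, the choice of slicing axis, and the weighted `blend` with its `gamma` parameter are all hypothetical illustrations; the text above does not specify which spatial axis is paired with time or how the two outputs are combined.

```python
import numpy as np

def spatial_slices(video):
    """Return the T frames of a (T, H, W, C) video, each of shape (H, W, C)."""
    return [video[t] for t in range(video.shape[0])]

def spatiotemporal_slices(video, axis="y"):
    """Return 2D slices that pair time with one spatial axis.

    axis="y": fix a column x, giving y-t slices of shape (T, H, C).
    axis="x": fix a row y, giving x-t slices of shape (T, W, C).
    Which axis a real method uses is an assumption here.
    """
    if axis == "y":
        return [video[:, :, x] for x in range(video.shape[2])]
    if axis == "x":
        return [video[:, y] for y in range(video.shape[1])]
    raise ValueError("axis must be 'x' or 'y'")

def apply_on_slices(video, fn, axis="y"):
    """Apply an image-level function to every spatiotemporal slice
    and reassemble the results into a (T, H, W, C) array."""
    processed = [fn(s) for s in spatiotemporal_slices(video, axis)]
    stack_axis = 2 if axis == "y" else 1
    return np.stack(processed, axis=stack_axis)

def blend(frame_output, slice_output, gamma=0.5):
    """Hypothetical weighted combination of the frame-wise output and
    the slice-wise output; gamma is an illustrative parameter."""
    return gamma * frame_output + (1.0 - gamma) * slice_output
```

In this sketch, `fn` stands in for any model that operates on a single 2D image, such as one denoising step of a T2I model. Because each spatiotemporal slice spans all T frames, a function applied to it sees motion along the time axis, which is the intuition behind using such slices to encourage temporal consistency.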
Extensive experiments demonstrate Slicedit's ability to edit a wide range of real-world videos, showing superior performance compared to existing methods in terms of editing fidelity, structure preservation, and temporal consistency. The method is evaluated on various datasets and compared against state-of-the-art techniques, with results highlighting its effectiveness in preserving regions not specified in the text prompt while adhering to the target prompt.