InstanceAnimator: Multi-Instance Sketch Video Colorization

arXiv 2025

¹HKUST(GZ), ²HKUST, ³NUS

Example results of InstanceAnimator. Given diverse reference instances, sketch sequences, and text descriptions, our framework produces high-quality, controllable video colorization with multi-instance and background customization.

Overview

Motivation. Unlike traditional workflows that require multi-stage, frame-by-frame colorization, InstanceAnimator colorizes sketch sequences into videos directly once the background and characters have been designed. Our method no longer relies on a single reference keyframe; instead, it supports instance-level colorization control, which gives users more flexibility and substantially reduces time and labor.
Overview. Existing animation colorization methods rely heavily on a single initial reference frame, resulting in fragmented workflows and limited customizability. To eliminate these constraints, we introduce a Canvas Guidance Condition that allows users to freely place reference elements on a blank canvas, enabling flexible user control. To address the misalignment and quality degradation issues of DiT-based approaches, we design an Instance Matching Mechanism that integrates the instances with the sketch and noise channels, ensuring visual consistency across different sequences while maintaining controllability. Additionally, to mitigate the degradation of fine-grained instance and background details, we propose an Adaptive Decoupled Control Module that injects semantic features from characters, backgrounds, and text conditions into the diffusion model, significantly enhancing detail fidelity.
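The page does not describe how the Canvas Guidance Condition is constructed. As a minimal sketch, assuming the canvas is a blank RGB image at the target frame size and each reference element is an image patch the user drops at a chosen top-left position, building the condition could look like this. The names `Placement` and `compose_canvas` are hypothetical, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels

WHITE: Pixel = (255, 255, 255)


@dataclass
class Placement:
    """Hypothetical: one reference element (e.g. a character crop) and where the user puts it."""
    image: Image
    top: int   # canvas row of the element's upper-left corner
    left: int  # canvas column of the element's upper-left corner


def blank_canvas(height: int, width: int, fill: Pixel = WHITE) -> Image:
    return [[fill for _ in range(width)] for _ in range(height)]


def compose_canvas(height: int, width: int, placements: List[Placement]) -> Image:
    """Paste each element onto a blank canvas.

    Later placements overwrite earlier ones where they overlap, and any
    pixels that fall outside the canvas are clipped.
    """
    canvas = blank_canvas(height, width)
    for p in placements:
        for r, row in enumerate(p.image):
            y = p.top + r
            if not 0 <= y < height:
                continue
            for c, px in enumerate(row):
                x = p.left + c
                if 0 <= x < width:
                    canvas[y][x] = px
    return canvas


# Usage: a 2x2 red "character" placed near the right edge of a 4x6 canvas.
red_patch = [[(255, 0, 0)] * 2 for _ in range(2)]
canvas = compose_canvas(4, 6, [Placement(red_patch, top=1, left=3)])
```

Unlike a single reference keyframe, the canvas can hold any number of elements at arbitrary positions. The remaining pixels stay blank, so only the placed elements carry color information.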


Framework

Overview of InstanceAnimator. We first fuse the instance latent features with the sketch and noise channels to establish a correspondence between the line drawing and the reference instance and to preserve character features. In parallel, the instances, background, and text description are fed independently into the Adaptive Decoupled Control Module, which dynamically injects condition information into the DiT blocks through three condition-specific expert modules. At inference time, users can adjust the per-condition weights to increase controllability and creative flexibility.
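The caption describes two mechanisms. The first is channel-wise fusion of the instance latent with the sketch and noise latents. The second is condition injection through three expert modules whose weights users can adjust at inference. The page gives no implementation details, so the following is only a minimal pure-Python sketch under these assumptions:

- Latents are lists of flattened channels.
- The channel order is noise, sketch, instance.
- Each expert is a hypothetical linear projection whose output is added to a DiT hidden state, scaled by a user-chosen weight.

`fuse_channels`, `inject`, and the condition keys are illustrative names, not the authors' API.

```python
from typing import Dict, List

Channel = List[float]   # one flattened H*W latent channel
Latent = List[Channel]  # a latent is a list of channels


def fuse_channels(noise: Latent, sketch: Latent, instance: Latent) -> Latent:
    """Concatenate along the channel axis (assumed order: noise, sketch, instance)."""
    size = len(noise[0])
    for lat in (noise, sketch, instance):
        if any(len(ch) != size for ch in lat):
            raise ValueError("all channels must share the same spatial size")
    return noise + sketch + instance


def linear(weights: List[List[float]], x: List[float]) -> List[float]:
    """Matrix-vector product standing in for a learned expert projection (hypothetical)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def inject(hidden: List[float],
           conditions: Dict[str, List[float]],
           experts: Dict[str, List[List[float]]],
           scales: Dict[str, float]) -> List[float]:
    """Add each condition's expert output to the hidden state, scaled by a user weight.

    A condition without an entry in `scales` uses weight 1.0.
    """
    out = list(hidden)
    for name, cond in conditions.items():
        delta = linear(experts[name], cond)
        s = scales.get(name, 1.0)
        out = [h + s * d for h, d in zip(out, delta)]
    return out


# Usage with toy sizes: three separate experts (identity maps here).
fused = fuse_channels([[0.1, 0.2]], [[1.0, 0.0]], [[0.5, 0.5], [0.3, 0.7]])
identity = [[1.0, 0.0], [0.0, 1.0]]
experts = {"instance": identity, "background": identity, "text": identity}
h = inject([0.0, 0.0],
           {"instance": [1.0, 2.0], "background": [0.5, 0.5], "text": [0.0, 1.0]},
           experts,
           {"instance": 1.0, "background": 0.0, "text": 2.0})
```

Each condition has its own expert and its own scale, so a single knob can weaken or strengthen one condition without touching the others. In this sketch, a scale of 0 removes the background contribution entirely.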


Dataset

Existing colorization datasets are generally built around reference frames, and their clips often span multiple scenes, so they cannot support instance-level colorization. We therefore construct a new dataset for the instance-aware sketch video colorization task, aimed at advancing automatic animation techniques. It consists of two parts: a filtered subset of the Sakuga42M dataset and clips collected from animated films on the internet. In total, it contains more than 40K video clips, each confined to a single scene by a scene-cutting algorithm. For each clip, we provide a reference frame, multiple reference instances, a background image, and a text description.
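The page does not name the scene-cutting algorithm or the annotation format. The sketch below is a hypothetical illustration of both pieces. `ClipRecord` is an assumed per-clip schema that mirrors the listed fields. `split_scenes` is a simple frame-difference threshold, a stand-in rather than the authors' method, that keeps each clip within one scene.

```python
from dataclasses import dataclass, field
from typing import List

Frame = List[float]  # a flattened grayscale frame with values in [0, 1]


@dataclass
class ClipRecord:
    """Hypothetical per-clip annotation mirroring the fields listed for the dataset."""
    video_path: str
    reference_frame: str
    instances: List[str] = field(default_factory=list)  # one crop per character
    background: str = ""
    caption: str = ""


def mean_abs_diff(a: Frame, b: Frame) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def split_scenes(frames: List[Frame], threshold: float = 0.3) -> List[List[int]]:
    """Start a new scene wherever consecutive frames differ by more than `threshold`.

    Returns one list of frame indices per scene.
    """
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            scenes.append([])
        scenes[-1].append(i)
    return scenes


# Usage: two near-identical dark frames followed by two bright ones.
frames = [[0.1, 0.1], [0.12, 0.1], [0.9, 0.8], [0.88, 0.8]]
scenes = split_scenes(frames)
record = ClipRecord(video_path="clip_0001.mp4", reference_frame="clip_0001_ref.png")
```

The file names above are placeholders. Production pipelines usually use more robust detectors (for example, histogram or content-based comparisons), but the splitting logic is the same.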

More Results

Given the same sketch and different reference instances, InstanceAnimator generates a variety of colorful videos.


Using the same designed characters, our framework colorizes different sketches with consistent colors and user-customized backgrounds.


Visual background control. During colorization, our method supports customized visual background control combined with the reference instances and line drawings.


Colorization from References in Different Styles. For the same character, our method readily learns its features. For characters in different styles, our method transfers their style to the sketches, even without prior knowledge.


Multiple Reference Instance Control. Given reference instances in different styles collected from the internet, our method matches the most similar character and colorizes the sketch sequence following that reference instance's style.
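The page does not explain how a sketched character is matched to one of several references; in the model this likely happens implicitly inside the network. As a hypothetical illustration of "match the most similar character", the sketch below assumes each character already has a feature vector and picks the reference with the highest cosine similarity. The feature values and names are made up for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def best_reference(query: List[float], refs: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the name and score of the reference feature most similar to `query`."""
    if not refs:
        raise ValueError("need at least one reference")
    return max(((name, cosine(query, feat)) for name, feat in refs.items()),
               key=lambda pair: pair[1])


# Usage with hypothetical 3-D character features.
refs = {"hero": [1.0, 0.0, 0.2], "villain": [0.0, 1.0, 0.1]}
name, score = best_reference([0.9, 0.1, 0.2], refs)
```

Cosine similarity ignores feature magnitude, so a reference drawn in a different style can still match if its features point in a similar direction.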

BibTeX

If you find this paper helpful to your work, please cite our paper with the following BibTeX reference:

 ...