ECCV 2026

PeCA: Palette Context Assisted Inference for Test-Time Paint-Bucket Colourisation on Animation Videos

“A colour shines in its surroundings.”
Ludwig Wittgenstein

Dongheng Lin, Jianbo Jiao

The MIx Group, University of Birmingham

Conceptual overview of PeCA using spatial and temporal context for paint-bucket colourisation.
PeCA uses palette-level context to improve region-level paint-bucket colour assignment.
Abstract

In animation production, paint-bucket colourisation assigns each enclosed region in line sketches a colour from reference design sheets. Recent automatic paint-bucket colourisation pipelines mirror this workflow via region correspondence, but correspondences can be brittle when regions are ambiguous fragments without proper context.

We propose Palette Context Assisted inference (PeCA), a training-free, plug-and-play framework for animation video colourisation that improves test-time reasoning over spatial and temporal context. PeCA strengthens reference coverage, aggregates noisy colour evidence, and refines predictions over time, while preserving the production requirement of discrete palette colours.

Method

🧩 Context for region-to-palette assignment

The input is a target line-art frame and one or more coloured references. The output is not a free-form generated image: each enclosed target region must receive one discrete palette colour from the references. PeCA keeps this paint-bucket interface, but makes the underlying region matching less brittle by adding context at inference time.
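
The paint-bucket interface described here can be illustrated with a minimal sketch. It is not the authors' implementation: the `region_map` layout (one integer region id per pixel), the `assignment` dictionary, and the function name `paint_bucket_fill` are assumptions made purely for the example.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def paint_bucket_fill(
    region_map: List[List[int]],
    assignment: Dict[int, int],
    palette: List[RGB],
    background: RGB = (255, 255, 255),
) -> List[List[RGB]]:
    """Fill every pixel with the palette colour assigned to its region.

    region_map: hypothetical per-pixel region ids for the target frame.
    assignment: region id -> index into `palette` (one colour per region).
    Regions without an assignment keep the `background` colour.
    """
    image = []
    for row in region_map:
        out_row = []
        for region_id in row:
            idx = assignment.get(region_id)
            out_row.append(palette[idx] if idx is not None else background)
        image.append(out_row)
    return image


# Toy example: three regions, two palette colours, region 2 left unassigned.
palette = [(230, 60, 60), (40, 90, 200)]
region_map = [[0, 0, 1],
              [0, 2, 1]]
image = paint_bucket_fill(region_map, {0: 0, 1: 1}, palette)
```

Because the output is indexed by region rather than generated per pixel, every pixel in a region shares exactly one palette colour, which is the constraint the page describes.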

Overview diagram of the PeCA inference pipeline.
In plain terms, PeCA asks three questions before assigning a colour: which reference views better cover this target shot, which candidate region matches agree on the same palette colour, and whether neighbouring frames support or contradict the current prediction. These become spatial, probabilistic, and temporal context.
Active Reference Expansion algorithm diagram.
Spatial Context

Active Reference Expansion

Builds a target-aware reference pool from cheap geometric view proposals, then selects useful supports for difficult target frames.
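
As a rough illustration of the idea (generate cheap geometric views, then keep the ones that best support a target), consider the sketch below. The specific transforms (identity, horizontal flip, 90-degree rotation), the `score` callable, and the top-k selection rule are all assumptions for the example; the page does not specify which proposals or selection criterion PeCA uses.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]


def hflip(img: Grid) -> Grid:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img: Grid) -> Grid:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def propose_views(ref: Grid) -> List[Tuple[str, Grid]]:
    """Hypothetical set of cheap geometric view proposals for one reference."""
    return [("identity", ref), ("hflip", hflip(ref)), ("rot90", rot90(ref))]


def expand_references(
    refs: List[Grid],
    target: Grid,
    score: Callable[[Grid, Grid], float],
    k: int,
) -> List[Tuple[str, Grid]]:
    """Score every proposed view against the target and keep the top k."""
    pool = [
        (score(view, target), name, view)
        for ref in refs
        for name, view in propose_views(ref)
    ]
    pool.sort(key=lambda item: item[0], reverse=True)
    return [(name, view) for _, name, view in pool[:k]]


def overlap(view: Grid, target: Grid) -> float:
    """Toy score (assumption): fraction of cells whose values agree."""
    cells = [(a, b) for ra, rb in zip(view, target) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in cells) / len(cells)


reference = [[1, 2], [0, 0]]
target = [[2, 1], [0, 0]]  # a mirrored view of the reference
selected = expand_references([reference], target, overlap, k=1)
```

In this toy case the mirrored proposal is selected because it covers the target better than the original view. A real scoring function would presumably operate on region descriptors rather than raw cell values.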

Soft kNN palette voting diagram for Probability Aggregation.
Probabilistic Context

Probability Aggregation

Aggregates top correspondence evidence in colour space, reducing sensitivity to individual spurious region matches.
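
One way to read "aggregating correspondence evidence in colour space" is a soft k-nearest-neighbour vote, in which the top matches contribute softmax-weighted votes to their palette colours. The sketch below assumes that reading; the function name, the `(similarity, palette_index)` input format, and the `k` and `temperature` values are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def soft_knn_palette_vote(
    matches: List[Tuple[float, int]],
    k: int = 3,
    temperature: float = 0.1,
) -> Tuple[int, Dict[int, float]]:
    """Aggregate the top-k matches into a distribution over palette colours.

    matches: (similarity, palette index) pairs for candidate reference regions.
    Returns the winning palette index and the per-colour probabilities.
    """
    if not matches:
        raise ValueError("at least one candidate match is required")
    top = sorted(matches, key=lambda m: m[0], reverse=True)[:k]
    max_sim = top[0][0]
    # Softmax over similarities (shifted by the maximum for stability).
    weights = [math.exp((sim - max_sim) / temperature) for sim, _ in top]
    total = sum(weights)
    probs: Dict[int, float] = {}
    for weight, (_, colour) in zip(weights, top):
        # Votes for the same palette colour accumulate.
        probs[colour] = probs.get(colour, 0.0) + weight / total
    best = max(probs, key=probs.get)
    return best, probs


# The single best match points to colour 2, but two slightly weaker
# matches agree on colour 1 and outvote it.
matches = [(0.90, 2), (0.88, 1), (0.87, 1), (0.40, 0)]
best, probs = soft_knn_palette_vote(matches, k=3, temperature=0.1)
hard_choice = max(matches)[1]
```

Here hard top-1 matching would assign colour 2, whereas the aggregated vote assigns colour 1, which shows how agreement among several candidates can override one spurious match.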

Cyclic-gated Temporal Fusion algorithm diagram.
Temporal Context

Cyclic-gated Temporal Fusion

Uses neighbouring frames as temporal context and gates unreliable matches to avoid propagating colourisation mistakes.
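
A common way to gate unreliable temporal matches is a forward-backward cycle check: a region's match in a neighbouring frame is trusted only if matching back returns to the same region. The sketch below assumes that interpretation of "cyclic-gated"; the match dictionaries `fwd` and `bwd`, the linear blend, and the `alpha` weight are illustrative assumptions rather than the authors' formulation.

```python
from typing import Dict, List


def cycle_consistent(region: int, fwd: Dict[int, int], bwd: Dict[int, int]) -> bool:
    """True if current -> neighbour -> current returns to the same region."""
    matched = fwd.get(region)
    return matched is not None and bwd.get(matched) == region


def fuse_with_neighbour(
    cur_probs: Dict[int, List[float]],
    nbr_probs: Dict[int, List[float]],
    fwd: Dict[int, int],
    bwd: Dict[int, int],
    alpha: float = 0.5,
) -> Dict[int, List[float]]:
    """Blend palette distributions with a neighbouring frame where the gate passes."""
    fused = {}
    for region, p in cur_probs.items():
        if cycle_consistent(region, fwd, bwd):
            q = nbr_probs[fwd[region]]
            fused[region] = [(1 - alpha) * a + alpha * b for a, b in zip(p, q)]
        else:
            # Gate closed: keep the current prediction untouched.
            fused[region] = list(p)
    return fused


cur = {0: [0.6, 0.4], 1: [0.45, 0.55]}
nbr = {10: [0.9, 0.1], 11: [0.8, 0.2]}
fwd = {0: 10, 1: 11}   # current region -> neighbour region
bwd = {10: 0, 11: 0}   # neighbour region -> current region
fused = fuse_with_neighbour(cur, nbr, fwd, bwd, alpha=0.5)
```

Region 0 passes the cycle check and is pulled towards its neighbour's distribution. Region 1 fails it, because its match maps back to region 0, so its prediction is left unchanged and the neighbour's evidence is not propagated.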

Results

📊 Experiment Results

Across diverse settings, PeCA improves region-to-palette assignment while preserving paint-bucket constraints.

Natural Videos

🌿 Extending to semantic matching in natural videos

The same idea can be tested outside cartoon colourisation by replacing palette colours with semantic labels. In the VIPSeg diagnostic, reference frames provide panoptic semantic labels, target frames are over-segmented into SLIC superpixels, and the task is to propagate semantic labels from reference regions to target regions by matching region descriptors.

Pipeline for reference-guided semantic region label propagation on natural videos.
Pipeline: external labelled reference frames and target RGB frames are converted into superpixels, then reference labels are propagated through region matching.
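
The direct hard-matching baseline that this pipeline starts from can be sketched as nearest-neighbour label transfer between region descriptors. The cosine similarity measure, the descriptor values, and the label names below are assumptions for illustration only; in practice the descriptors would come from a pretrained backbone.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(
    target_desc: List[Sequence[float]],
    ref_desc: List[Sequence[float]],
    ref_labels: List[str],
) -> List[str]:
    """Give each target region the label of its most similar reference region."""
    labels = []
    for t in target_desc:
        sims = [cosine(t, r) for r in ref_desc]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(ref_labels[best])
    return labels


# Hypothetical 2-D descriptors for two reference regions and two superpixels.
ref_desc = [[1.0, 0.0], [0.0, 1.0]]
ref_labels = ["sky", "road"]
target_desc = [[0.9, 0.1], [0.2, 0.8]]
predicted = propagate_labels(target_desc, ref_desc, ref_labels)
```

Each target region depends on a single best match here, which is the brittleness that the context mechanisms are meant to reduce.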

VIPSeg region label propagation results

PeCA improves over direct hard matching on both generic pretrained backbones and on all three metrics, which suggests that the context mechanism transfers beyond cartoon palette assignment. With DINOv2 ViT-L/14, for example, Seg-Acc rises from 44.12 to 52.47.

Backbone          Pipeline      Seg-Acc  Pix-Acc  Pix-MIoU
SAM2.1-Large      Base          33.35    33.05     6.78
SAM2.1-Large      PeCA (ours)   38.95    38.79    10.85
DINOv2 ViT-L/14   Base          44.12    44.03    12.68
DINOv2 ViT-L/14   PeCA (ours)   52.47    52.38    19.23

Evaluation uses the VIPSeg validation split: 343 videos and 8,255 frames. Metrics are frame-wise averages over SLIC superpixel predictions.
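
To make "frame-wise averages" concrete, the sketch below computes an accuracy and a mean IoU per frame and then averages them over frames. The metric definitions are assumptions (the page does not spell out how Seg-Acc, Pix-Acc, and Pix-MIoU are computed), and the label lists are toy data, not VIPSeg results.

```python
from typing import List, Sequence, Tuple


def accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of elements (superpixels or pixels) with the correct label."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Mean intersection-over-union across the classes present in one frame."""
    ious = []
    for c in set(pred) | set(gt):
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        ious.append(inter / union)  # union >= 1 because c occurs in pred or gt
    return sum(ious) / len(ious)


def frame_average(values: List[float]) -> float:
    """Average a per-frame metric over all frames."""
    return sum(values) / len(values)


# Two toy frames given as (predicted labels, ground-truth labels).
frames: List[Tuple[List[int], List[int]]] = [
    ([0, 0, 1, 1], [0, 1, 1, 1]),
    ([2, 2], [2, 2]),
]
mean_acc = frame_average([accuracy(p, g) for p, g in frames])
mean_miou = frame_average([mean_iou(p, g) for p, g in frames])
```

Averaging per frame gives each frame equal weight regardless of how many regions it contains, which differs from pooling all regions across the dataset before computing the metric.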

Qualitative VIPSeg semantic label propagation examples comparing Base, PeCA, and ground truth.
Qualitative examples: compared with direct matching, PeCA produces more coherent semantic regions with fewer fragmented labels.
Citation

📚 BibTeX

@inproceedings{lin2026peca,
  title={PeCA: Palette Context Assisted Inference for Test-Time Paint-Bucket Colourisation on Animation Videos},
  author={Dongheng Lin and Jianbo Jiao},
  booktitle={European Conference on Computer Vision},
  year={2026}
}