Interpretability through Training Samples: Data Attribution for Diffusion Models

Tong Xie, Haoyu Li, Andrew Bai, Cho-jui Hsieh

Data attribution methods help interpret how neural networks behave by linking model behavior back to the training data. We extend TracIn, a first-order influence approximation, to diffusion models by incorporating the dynamics of the denoising timesteps. We show that this influence estimate can be biased when a few training samples have dominating gradient norms. To address this, we introduce Diffusion-ReTrac, which applies a renormalization technique that yields notably more localized influence estimates and enables targeted attribution of training samples.
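To make the idea concrete, below is a minimal PyTorch-style sketch of a TracIn-style influence score for a diffusion model, with an optional per-sample gradient renormalization in the spirit of Diffusion-ReTrac. The function names (`denoising_loss_grad`, `retrac_influence`), the assumed model interface `model(x_t, t)`, and the checkpoint format are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def denoising_loss_grad(model, x0, t, alphas_cumprod):
    """Flattened per-sample gradient of the epsilon-prediction loss at timestep t."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    # Forward diffusion q(x_t | x_0) under the standard DDPM parameterization.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    # Assumed model interface: predicts the added noise from (x_t, t).
    pred = model(x_t.unsqueeze(0), torch.tensor([t]))
    loss = F.mse_loss(pred, noise.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def retrac_influence(model, checkpoints, x_train, x_test, timesteps,
                     alphas_cumprod, renormalize=True):
    """TracIn-style influence: gradient dot products summed over training
    checkpoints and sampled denoising timesteps. With renormalize=True, each
    training gradient is rescaled to unit norm so that samples with dominating
    gradient magnitudes do not bias the attribution."""
    score = 0.0
    for state_dict, lr in checkpoints:          # (weights, learning rate) pairs
        model.load_state_dict(state_dict)
        for t in timesteps:
            g_tr = denoising_loss_grad(model, x_train, t, alphas_cumprod)
            g_te = denoising_loss_grad(model, x_test, t, alphas_cumprod)
            if renormalize:
                g_tr = g_tr / (g_tr.norm() + 1e-12)
            score += lr * torch.dot(g_tr, g_te).item()
    return score
```

A higher score suggests the training sample pushed the model's loss on the test sample down during training; the renormalization keeps a handful of large-gradient training samples from appearing influential for every test query.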
