This work investigates text-to-texture synthesis with diffusion models, generating physically-based texture maps so that the resulting models look realistic under varying lighting conditions. A prominent solution for this task is score distillation sampling, which recovers a complex texture through gradient guidance given a differentiable rasterization and shading pipeline. In practice, however, combining this solution with widespread latent diffusion models produces severe visual artifacts and requires additional regularization, such as an implicit texture parameterization. As a more direct alternative, we propose CasTex, an approach that uses cascaded diffusion models for texture synthesis. In our setup, score distillation sampling yields high-quality textures out of the box; in particular, we were able to replace the implicit texture parameterization with an explicit one, further improving the procedure. Our experiments show that the approach significantly outperforms state-of-the-art optimization-based solutions on public texture synthesis benchmarks.
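For context, score distillation sampling (as introduced in DreamFusion) optimizes the rendered appearance by backpropagating a denoising residual of the diffusion model through the renderer. Below is a sketch of the standard SDS gradient under assumed notation, not taken verbatim from the paper: \theta denotes the (here, explicit) texture map, g the differentiable rasterization and shading pipeline, \hat{\epsilon}_{\phi} the diffusion model's noise prediction conditioned on the text prompt y, and w(t) a timestep weighting.

\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta),\quad x_t = \alpha_t x + \sigma_t \epsilon .

In the explicit setting, \theta is simply the pixel values of the texture image, so this gradient updates the texture directly rather than the weights of an implicit network.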
@article{aliev2025castex,
title={CasTex: Cascaded Text-to-Texture Synthesis via Explicit Texture Maps and Physically-Based Shading},
author={Aliev, Mishan and Baranchuk, Dmitry and Struminsky, Kirill},
journal={arXiv preprint arXiv:2504.06856},
year={2025}
}