Can a generative model be trained to produce images from a specific domain, guided only by a text prompt, without seeing any image? In other words: can an image generator be trained blindly?
Leveraging the semantic power of large-scale Contrastive Language-Image Pre-training (CLIP) models, we present a text-driven method that shifts a generative model to new domains, without having to collect even a single image from those domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or outright impossible to reach with existing methods.
We conduct an extensive set of experiments and comparisons across a wide range of domains. These demonstrate the effectiveness of our approach and show that our shifted models maintain the latent-space properties that make generative models appealing for downstream tasks.
We start with a pre-trained generator and two text prompts describing a direction of change (e.g., "Dog" to "Cat"). Instead of editing a single image, we use the signal from OpenAI's CLIP model to train the generator itself. There's no need for training data, and it works fast! How fast? Minutes or less. See below.
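To make this concrete, here is a minimal PyTorch sketch of a CLIP-based directional loss in the spirit of the description above (not our exact training code). It assumes OpenAI's open-source clip package, generator outputs in [-1, 1], and placeholder names such as frozen_generator and trainable_generator:

import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device, jit=False)
clip_model = clip_model.float().eval()  # fp32 keeps gradient handling simple

# CLIP's expected input statistics (224x224 RGB).
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def encode_images(images):
    """Embed generator outputs (assumed to lie in [-1, 1]) with CLIP's image encoder."""
    images = (images + 1) / 2  # to [0, 1]
    images = F.interpolate(images, size=224, mode="bilinear", align_corners=False)
    return clip_model.encode_image((images - CLIP_MEAN) / CLIP_STD)

def encode_texts(prompts):
    """Embed a list of prompts with CLIP's text encoder."""
    with torch.no_grad():
        return clip_model.encode_text(clip.tokenize(prompts).to(device))

def clip_directional_loss(frozen_images, trained_images, source_text, target_text):
    """1 - cosine similarity between the image-space and text-space directions."""
    text_dir = encode_texts([target_text]) - encode_texts([source_text])
    img_dir = encode_images(trained_images) - encode_images(frozen_images)
    return (1 - F.cosine_similarity(img_dir, text_dir.expand_as(img_dir), dim=-1)).mean()

# Training step (sketch): sample latents, render them with a frozen copy of the
# generator and with the trainable copy, and align the change with "Dog" -> "Cat".
# z = torch.randn(batch_size, 512, device=device)
# loss = clip_directional_loss(frozen_generator(z), trainable_generator(z), "Dog", "Cat")
# loss.backward(); optimizer.step(); optimizer.zero_grad()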
StyleGAN-NADA can convert real images between domains, enabling out-of-domain image editing.
Since we train a new generator, you can even edit images in the new domain using your favorite off-the-shelf StyleGAN editing methods.
Methods like pSp already allow for image-to-image translation from arbitrary input domains, but their output is restricted to the domain of a pre-trained GAN. StyleGAN-NADA greatly expands the range of available GAN domains, enabling a wider range of image-to-image translation tasks such as sketch-to-drawing.
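Conceptually, the pipeline is simple: invert a real image with an off-the-shelf encoder, optionally apply a latent edit, and decode with the adapted generator. Below is a hedged sketch under that assumption; encoder, nada_generator, and edit_direction are generic placeholders rather than the pSp API:

import torch

@torch.no_grad()
def translate(image, encoder, nada_generator, edit_direction=None, strength=0.0):
    """Map a real image into the new domain, optionally editing it along the way."""
    w = encoder(image)                     # latent code(s) in W / W+ space
    if edit_direction is not None:
        w = w + strength * edit_direction  # reuse any off-the-shelf StyleGAN edit direction
    return nada_generator(w)               # render in the CLIP-adapted domain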
Our models and latent spaces are well aligned, so we can freely interpolate between the model weights in order to smoothly transition between domains. We can even apply latent space editing at the same time, creating videos like the one below.
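As a sketch (assuming both generators are PyTorch modules with identical architectures; the helper name is ours), blending the weights looks roughly like this:

import copy
import torch

def interpolate_generators(gen_source, gen_target, alpha):
    """Return a generator whose float weights are (1 - alpha) * source + alpha * target."""
    gen_mix = copy.deepcopy(gen_source)
    src_state, tgt_state = gen_source.state_dict(), gen_target.state_dict()
    mixed = {}
    for name, src_tensor in src_state.items():
        if torch.is_floating_point(src_tensor):
            mixed[name] = torch.lerp(src_tensor, tgt_state[name], alpha)
        else:
            mixed[name] = src_tensor  # copy non-float buffers unchanged
    gen_mix.load_state_dict(mixed)
    return gen_mix

# Sweeping alpha from 0 to 1 on a fixed latent code gives a smooth transition
# between domains; latent-space edits can be applied to the same code at every step.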
Our work focuses on StyleGAN, but the approach can just as easily be applied to other generative architectures. For example, we can take models that convert segmentation masks to images, such as OASIS, and completely replace the identity of a class, using nothing but text!
If you find our work useful, please cite our paper:
@misc{gal2021stylegannada,
      title={StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators},
      author={Rinon Gal and Or Patashnik and Haggai Maron and Gal Chechik and Daniel Cohen-Or},
      year={2021},
      eprint={2108.00946},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}