FYI
This post is much less useful than my newer post on Stable Diffusion.
Reference
This entire effort, and blog post, was inspired by a single Tweet by Simon Willison (@simonw).

Hugging Face recently announced the release of new diffusion models. For those who are unaware, diffusion models take in some user input plus some noise and create content similar to what they have been trained on. In this case, these models were trained on images, meaning you can pass in a prompt and have the model produce an image related to that prompt.
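
At a high level, these models start from pure noise and repeatedly predict and subtract a little of that noise, with each step conditioned on the text prompt. Here is a toy sketch of that sampling loop, purely for intuition (predict_noise is a hypothetical stand-in for the trained network, not part of any library):

import numpy as np

def toy_denoising_loop(predict_noise, prompt_embedding, steps=50, size=(256, 256, 3)):
  # Start from pure Gaussian noise
  image = np.random.randn(*size)
  # Walk the steps in reverse, peeling away a little predicted noise each time
  for step in reversed(range(steps)):
    noise_estimate = predict_noise(image, step, prompt_embedding)
    image = image - noise_estimate / steps
  return image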

Let’s look at some example prompts and their respective generated images:

- teddy bear
- beagle dog
- Abraham Lincoln
- tortuga
- cute puppy

I did hand-pick these images, as some of them can look quite messy or even scary. This is especially true when generating human faces: often the nose, eyes, or ears will not be fully formed, leaving the result somewhere in the uncanny valley.

The Code

Let’s walk through how you can produce these images yourself. Fortunately, Hugging Face makes it incredibly easy to do so. You’ll first want to install the two dependencies:

$ pip install diffusers transformers

(If you’re running inside a notebook, use !pip install diffusers transformers instead.)
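
Before running anything, it can be worth a quick sanity check that the libraries import and that a GPU is visible. This assumes PyTorch is already present in your environment (Colab includes it by default):

import diffusers
import torch

# Print the installed versions and whether CUDA can see a GPU
print(f'diffusers {diffusers.__version__}, torch {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')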

From here, the code is relatively straightforward.

Example Note
This code is not the cleanest; it is provided simply as a starting point.
# Import the necessary Hugging Face pipeline
from diffusers import DiffusionPipeline

# Load the pre-trained Hugging Face diffusion model
ldm = DiffusionPipeline.from_pretrained('CompVis/ldm-text2im-large-256')

# Define a list of prompts you want to generate images for
prompts = [
  "beagle dog",
  "a cute fish"
]

# Simply loop over all the prompts, generate the images, and save them
for prompt in prompts:
  print(f'processing prompt {prompt}')

  images = ldm(
      [prompt],
      num_inference_steps=50,  # more steps generally means higher quality, but slower
      eta=0.3,                 # controls how stochastic the sampling is
      guidance_scale=6         # how strongly generation is steered toward the prompt
  )['sample']

  print(f'saving {len(images)} images')
  for idx, image in enumerate(images):
    file_path = f"./{prompt.replace(' ', '-').lower()}-{idx}.png"
    print(f'saving to {file_path}')
    image.save(file_path)
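
Since the pipeline accepts a list of prompts and returns one image per entry, you can also generate several variations of a single prompt by repeating it. Here is a small sketch using the same pipeline object and the same dict-style return as above:

# Repeat one prompt to get multiple variations in a single call;
# the pipeline returns one image per entry in the input list
variations = ldm(
    ["beagle dog"] * 4,
    num_inference_steps=50,
    eta=0.3,
    guidance_scale=6
)['sample']

for idx, image in enumerate(variations):
  image.save(f'./beagle-dog-variation-{idx}.png')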

One can even run this code in a Google Colab notebook and get access to a free GPU. On a GPU, generating an image takes roughly 11 seconds, compared to the roughly five minutes my old server’s CPU takes.
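
If you do have a GPU available, you can move the pipeline onto it before running the generation loop. A minimal sketch (the exact speedup will vary with your hardware and diffusers version):

import torch

# Use the GPU when one is visible; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
ldm = ldm.to(device)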

In under thirty lines of code, you can load a pre-trained diffusion model and generate images from your own prompts. Cheers!