
[Half precision] Make sure half-precision is correct #182

Merged: 8 commits merged into main from correct_some_stuff on Aug 16, 2022

Conversation

@patrickvonplaten (Contributor) commented Aug 15, 2022

This PR fixes a couple of things so that stable diffusion gives 1-to-1 the same results as the original implementation.

Changes include:

  • Make sure group norm is always run in fp32 (a minimal sketch follows this list)
  • Change the ordering in which time embeddings are computed -> for some reason this makes a big difference for stable diffusion, but apparently not really for other models
  • Make sure time embeddings are expanded if batch size > 1
  • Adapt the DDIM sampler to be 1-to-1 compatible with stable diffusion
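
As a rough illustration of the first and third bullets, a minimal sketch (FP32GroupNorm and the shapes are illustrative, not the PR's actual diff):

import torch
import torch.nn as nn
import torch.nn.functional as F

class FP32GroupNorm(nn.GroupNorm):
    """Group norm whose statistics are always computed in fp32."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Upcast the input and the affine parameters to fp32, normalize,
        # then cast back so the surrounding half-precision graph is unaffected.
        out = F.group_norm(
            x.float(),
            self.num_groups,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return out.type(x.dtype)

# Expanding a scalar timestep to the batch dimension before embedding it:
timesteps = torch.tensor([999])  # shape (1,)
batch_size = 4
if timesteps.shape[0] != batch_size:
    timesteps = timesteps.expand(batch_size)  # shape (4,)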

Both DDIM and PNDM work with the following script:

#!/usr/bin/env python3
import os

from diffusers import StableDiffusionPipeline, DDIMScheduler
from time import time
from PIL import Image
from einops import rearrange
import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
#prompt = "a photograph of the eiffel tower on the moon"
#prompt = "an oil painting of a futuristic forest gives"

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`
pipe = pipe.to("cuda")  # move the pipeline to the GPU so that autocast("cuda") below actually applies

# uncomment to use DDIM
#scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
#pipe.scheduler = scheduler

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # with output_type="np" the pipeline returns numpy arrays in [0, 1], not PIL images
        all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

os.makedirs("./images/diffusers", exist_ok=True)  # make sure the output directory exists
image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")


@patrickvonplaten (Contributor, Author):

The script above gives the following results.

For PNDM:
[image: a_photograph_of_an_astronaut_riding_a_horse_1660583008]

And for DDIM:
[image: a_photograph_of_an_astronaut_riding_a_horse_1660582465]

@patrickvonplaten (Contributor, Author) commented Aug 15, 2022

The original code base (https://github.com/CompVis/stable-diffusion) gives, with the following command:

CUDA_VISIBLE_DEVICES="0" python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms

for PNDM:
[image: grid-0006]

and

CUDA_VISIBLE_DEVICES="0" python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code

for DDIM:
[image: grid-0005]

@patil-suraj (Contributor) left a comment:

Thanks a lot for fixing these sneaky bugs! Looks good to me. Left some nits.

Just one question: why do we need the negative_alpha_cumprod?

Comment on lines +53 to +59
text_input = self.tokenizer(
    prompt,
    padding="max_length",
    max_length=self.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
@patil-suraj (Contributor):

This is important here to always pad to max_length, as that's how the model was trained.

@patrickvonplaten (Contributor, Author) replied on Aug 15, 2022:

Yes, agreed, but let's make sure not to do this when we create our text-to-image training script (it's definitely cleaner to mask out padding tokens, and it should help the model learn better, as Katherine also stated on Slack).
cc @anton-l
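
For reference, a hypothetical sketch of the masking variant discussed above for such a training script (tokenizer and text_encoder are placeholder names, not actual diffusers code):

# Hypothetical sketch: still pad to max_length, but forward the attention
# mask so padding tokens can be masked out during training.
text_input = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
text_embeddings = text_encoder(
    text_input.input_ids,
    attention_mask=text_input.attention_mask,
)[0]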

@@ -75,7 +76,7 @@ def __init__(

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = np.cumprod(self.alphas, axis=0)
        self.one = np.array(1.0)
        self.negative_alpha_cumprod = np.array(1.0) if do_neg_alpha_one else self.alphas_cumprod[0]
@patil-suraj (Contributor):

I'm not sure I understand why this is needed. Could you maybe add a comment here explaining why we added it, since this is True by default? Does it affect other pipelines?

@patrickvonplaten (Contributor, Author):

Yeah, maybe we need a better name here indeed. Let me know if you have better ideas.
In short, the story is the following:

For every step we need to know the previous step. Only at step t=0 there is no previous step -> so what should we do?
The default here now, and what was used before (no breaking change), is that we just set it to 1. But DDIM in stable diffusion instead uses the highest value that exists in the cumulative products of the alphas (i.e. alphas_cumprod[0]), which is often smaller than one. This can actually make quite a difference in the end.
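
A minimal numeric sketch of the two options (beta values match the DDIMScheduler kwargs in the script above; variable names are illustrative):

import numpy as np

# "scaled_linear" schedule with beta_start=0.00085, beta_end=0.012
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)

# Option 1 (previous diffusers default): pretend the step before t=0 is fully denoised.
final_alpha_option_1 = np.array(1.0)

# Option 2 (stable diffusion's DDIM): reuse the first, i.e. largest, cumulative
# product, which is slightly smaller than one.
final_alpha_option_2 = alphas_cumprod[0]  # 1 - 0.00085 = 0.99915 for this schedule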

@patrickvonplaten (Contributor, Author):

OK, I changed it to clip_alpha_at_one - I think the naming is a bit better, no? Wdyt?

Member:

set_alpha_to_one, since it's not necessarily clipping it? (Clipping, to me, suggests that we perform the cumulative product and then make sure it's not larger than 1).

@patrickvonplaten (Contributor, Author):

Changing that real quick.
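
For context, this is how the renamed kwarg ends up being passed (matching the commented-out DDIM lines in the script above):

from diffusers import DDIMScheduler

# set_alpha_to_one=False makes the scheduler fall back to alphas_cumprod[0]
# instead of 1.0 at the final step, as stable diffusion's DDIM sampler does.
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)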

@anton-l (Member) left a comment:

Great fixes! Will report LMS test results once this is merged :)

@patrickvonplaten (Contributor, Author):

Merging! Now looking into PNDM.

@patrickvonplaten (Contributor, Author):

Adapted tests to run locally only on GPU with autocast - careful: test results might differ depending on the machine.
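
A hedged sketch of what such a GPU-gated autocast test can look like (test name and body are illustrative, not the PR's actual tests):

import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="autocast fp16 tests need a GPU")
def test_half_precision_output():
    with torch.autocast("cuda"):
        # run the pipeline and compare against reference values; results may
        # differ slightly across GPU types, hence the warning above
        ...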

@patrickvonplaten merged commit 051b346 into main on Aug 16, 2022
@patrickvonplaten deleted the correct_some_stuff branch on August 16, 2022 at 08:42
@patil-suraj mentioned this pull request on Sep 6, 2022
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request on Mar 1, 2023 (…e#182):

* Change tflite test from sharkimporter -> sharkdownloader
* xfail all uint/int tflite sharkdownloader tests
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request on Dec 25, 2023:
* [Half precision] Make sure half-precision is correct

* Update src/diffusers/models/unet_2d.py

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

* correct some tests

* Apply suggestions from code review

Co-authored-by: Suraj Patil <[email protected]>

* finalize

* finish

Co-authored-by: Suraj Patil <[email protected]>