The A1111 webUI and extension ecosystem is a beautiful example of what FOSS can be.
- Users can install with a few clicks, with no need to touch the command line
- The webUI has hover tooltips on everything, so users can mostly figure out what's going on without ever needing to touch documentation
- A1111 has a tab which can load a list of extensions from a github page
- Click to install, then refresh the UI and the extension just works
- Users are getting incredibly powerful new extensions every week or two - Deforum lets you sequentially generate as many frames as you want and stitch them into a video, ControlNet lets you copy-paste features from a source image to your target image(s). ControlNet was added to A1111 ~2 weeks ago and is already integrated into the Deforum tab so you can use both together.
Truly beautiful. I'd love to see more FOSS projects that feel this user-friendly, generous with features, and fast-moving. Really fun to play with new cutting-edge tech every couple of weeks.
This is really interesting. Another thing I noticed in my fun with SD is that it is extremely stubborn about colors during the denoising process.
That is, whatever color a region of the image has during denoising step 3, it will almost surely have that color at step 50, even if it makes no logical sense for the thing in that location to have that color.
This may not seem bad, but it's annoying when doing anything image-to-image, because regardless of the prompt you give it, the colors are "sticky".
If you have an image of an apple, and you use image-to-image with the prompt "an image of an orange", you will get a very reddish orange (in my experience at least).
You can use this to your advantage too though. I've manually generated "noise" for img2img by taking a base image, hitting it with a bunch of filters, then putting it back into img2img with a low-ish denoising strength. This method works quite well for ensuring composition while still letting the image be styled, but it's probably obsolete for that purpose now that we have controlnet.
I've heard that putting a fuzzy multicolored noise overlay (Perlin in RGB space or something) over your img2img input can help. Not sure what scale or what opacity you need, but that's something you can try.
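For anyone who wants to try that overlay trick, here's a rough sketch in Python with Pillow. It uses blurred RGB noise as a cheap stand-in for Perlin noise, and the opacity, blur radius, and file names are just placeholder guesses you'd have to tune:

    # Overlay fuzzy color noise on an image before feeding it to img2img.
    # Blurred uniform RGB noise stands in for Perlin noise here.
    import numpy as np
    from PIL import Image, ImageFilter

    def add_color_noise(path_in, path_out, opacity=0.25, blur_radius=8):
        base = Image.open(path_in).convert("RGB")
        # Random RGB noise at the same size as the input image.
        noise = np.random.randint(0, 256, (base.height, base.width, 3), dtype=np.uint8)
        noise_img = Image.fromarray(noise).filter(ImageFilter.GaussianBlur(blur_radius))
        # Blend the fuzzy noise over the base image, then save for img2img.
        Image.blend(base, noise_img, opacity).save(path_out)

    add_color_noise("apple.png", "apple_noised.png")  # placeholder file names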
This could possibly be fixed by generating training image sequences which, along with random noise pixels, also have a random hue shift applied to subregions of the image.
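If someone wanted to experiment with that, a minimal sketch of the augmentation (a random hue shift applied to a random rectangular subregion; the region sizes and shift range are arbitrary) might look like this:

    # Hue-shift a random rectangular subregion of an RGB training image.
    import random
    import numpy as np
    from PIL import Image

    def random_region_hue_shift(img):
        w, h = img.size
        # Pick a random box covering roughly 10-50% of each dimension.
        bw, bh = random.randint(w // 10, w // 2), random.randint(h // 10, h // 2)
        x0, y0 = random.randint(0, w - bw), random.randint(0, h - bh)
        box = (x0, y0, x0 + bw, y0 + bh)

        hsv = np.array(img.crop(box).convert("HSV"))
        # PIL stores hue as 0-255; rotate it by a random amount.
        hsv[..., 0] = (hsv[..., 0].astype(int) + random.randint(0, 255)) % 256
        shifted = Image.fromarray(hsv, mode="HSV").convert("RGB")

        out = img.copy()
        out.paste(shifted, box)
        return out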
As a ckpt that's cool, but I'd like to see it as a LoRA so you can use it with any checkpoint you already have. That would (let's face it... will next week) be amazing.
It's a way to cut just the components/styles/themes/patterns out of a model and apply them to other models.
So if I have a Disney characters checkpoint, but I really like this MakeGiantEyes checkpoint, if I can get it down to a MakeGiantEyes LoRA, I can apply that on top of my Disney Characters model which is already a custom trained set. It definitely does not always work, but when it does it's like magic. At a practical level, it's a model-modifier.
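For example, with the diffusers library (recent versions expose a load_lora_weights helper; the checkpoint and LoRA paths below are hypothetical placeholders), layering a LoRA over a custom checkpoint looks roughly like this:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the custom base checkpoint first.
    pipe = StableDiffusionPipeline.from_pretrained(
        "path/to/disney-characters-checkpoint",  # hypothetical custom model
        torch_dtype=torch.float16,
    ).to("cuda")

    # Then layer the LoRA weights on top of the already-loaded model.
    pipe.load_lora_weights("path/to/make-giant-eyes-lora")  # hypothetical LoRA

    image = pipe("a knight with giant eyes, disney style").images[0]
    image.save("knight.png")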
... It took me a minute to get those because I had to sort through a LOT of things that would probably get me banned here. If anyone wanted to know which nationalities are using SD most... It's Asians, hands down, all day long, and I think that's interesting.
EDIT: And if you were wondering what a Textual Inversion is vs a LoRA... Don't ask me! They're both model modifiers, but as I understand it, textual inversions are good for faces (which is why most of those are people, and they are kilobytes in size), and LoRAs aren't as good for faces specifically but better for themes.
TI examples (I couldn't use any of the million women... there are almost none that would be appropriate to post. Even though civitai does a good job of removing the NSFW posts of real people, even with clothes on some are just still too much... Thirst is driving AI now)
https://civitai.com/models/11039/ian-mckellen
or
https://civitai.com/models/8060/seu-madruga
I'm curious if this generalizes to mid frequencies (i.e. adding some blurred noise in addition to the offset) and what effect that might have on the generations.
Hi Jack, I've been fascinated with your work on Colormind and Fontjoy and tried to reach out to you on your @colormind email to ask about your API.
Is there a better way to reach you?
Exactly my thought too. The offset is just the zero-frequency component. But in general, the need to do this for the zero frequency would suggest that there's a scaling problem for all the other long-wavelength Fourier components too? And that, perhaps, the effective spatial Fourier spectrum of the noise used in SD is not optimal?
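If I remember the post right, the offset trick is just adding a per-image, per-channel constant to the training noise. A sketch of that, plus the "mid frequency" variant being speculated about here (blurred noise mixed in; the strengths and kernel size are arbitrary guesses), might look like:

    import torch
    import torch.nn.functional as F

    def offset_noise(latents, offset_strength=0.1, mid_strength=0.0, blur_kernel=9):
        noise = torch.randn_like(latents)
        # Zero-frequency component: a per-image, per-channel constant offset.
        noise = noise + offset_strength * torch.randn(
            latents.shape[0], latents.shape[1], 1, 1, device=latents.device
        )
        if mid_strength > 0:
            # Low-pass some fresh noise with an average-pool blur to boost
            # the longer-wavelength components, then mix it in.
            blurred = F.avg_pool2d(
                torch.randn_like(latents), blur_kernel, stride=1, padding=blur_kernel // 2
            )
            noise = noise + mid_strength * blurred
        return noise

    noise = offset_noise(torch.randn(1, 4, 64, 64), mid_strength=0.1)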
Does anyone know how they are taking wavelengths from an image, or what exactly "long-wavelength features" means?
I googled "wavelength of images", but it doesn't seem like I'm going in the right direction, because the results are about finding the wavelength of light from images rather than the "wavelength of features" that this blog is talking about.
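For what it's worth, "long wavelength" here seems to mean low spatial frequency: structure that varies slowly across the image (overall brightness, big soft gradients) rather than fine detail. One way to see it is to take a 2D FFT and keep only the low frequencies (the file name and cutoff below are arbitrary):

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)

    # After fftshift, the center of the spectrum holds the lowest spatial
    # frequencies, i.e. the longest-wavelength features of the image.
    spectrum = np.fft.fftshift(np.fft.fft2(img))

    # Keep only a small central block (low frequencies) and invert.
    h, w = img.shape
    cy, cx = h // 2, w // 2
    radius = 8  # arbitrary cutoff: smaller keeps only the longest wavelengths
    mask = np.zeros_like(spectrum)
    mask[cy - radius:cy + radius, cx - radius:cx + radius] = 1

    low_freq = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    Image.fromarray(np.clip(low_freq, 0, 255).astype(np.uint8)).save("long_wavelength_only.png")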
Diffusion is so interesting. Unlike LLMs, which have some parallels to how the human mind works, it's not as obvious that working backwards from noise toward a prompt has any similar parallel.
Will this cause us to hit walls at some point or actually exceed what a human can create?
We know that the biological brain does a lot of iterative refinement via recurrent processing (attractor dynamics), which is very similar to how diffusion works.
However, while prediction is also a core function of the brain, it's not really the case that you always auto-regressively generate word by word.
Is the noise function used just for the starting data provided to the first iteration of denoising, or does it get called repeatedly throughout the iterations?
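A simplified DDPM-style loop (not the actual sampler code A1111 uses) shows where noise can enter: once at the start as the initial latent, and, for stochastic "ancestral" samplers, again at every step; deterministic samplers like DDIM with eta=0 skip the per-step draw:

    import torch

    def sample(model, steps=50, shape=(1, 4, 64, 64), stochastic=True):
        # Toy linear beta schedule, not the one SD actually uses.
        betas = torch.linspace(1e-4, 0.02, steps)
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)

        x = torch.randn(shape)  # noise drawn once, as the starting latent
        for t in reversed(range(steps)):
            eps = model(x, t)  # the network's prediction of the noise in x
            # Deterministic part of the update: remove the predicted noise.
            x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
            if stochastic and t > 0:
                # Stochastic samplers also draw *fresh* noise at each step.
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        return x

    # Toy usage with a dummy "model" that predicts zero noise.
    out = sample(lambda x, t: torch.zeros_like(x), steps=10)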
Just last week I saw ControlNet[1], which adds a lot more control.
Today I saw what Corridor Crew[2] did to stabilize the randomness when you want to make videos. Very exciting.
[1] https://github.com/lllyasviel/ControlNet
[2] https://www.youtube.com/watch?v=_9LX9HSQkWo