This is a secret DRAFT guide to Stable Diffusion.
It’s secret because it’s not yet in a shareable format. It’s mainly a dump of links that must be accompanied by introductory text or that must be digested and summarized.
Also, this guide has to be significantly improved from a structure and design standpoint.
Most of the content below is based on version 1.5 of the Stable Diffusion model. As the community experiments with the new version 2.1 of the Stable Diffusion model, this guide will be updated.
If you are using a different model, some of this guidance might be inapplicable.
Living Document – Last update: Jan 30, 2023
Table of Contents
- Latent diffusion models
- User Interfaces to the Stable Diffusion model
- Applications of the Stable Diffusion model
- Prompt Engineering for Stable Diffusion
- Parameters of the Stable Diffusion model
- Samplers of the Stable Diffusion model
- Seeds and Determinism
- Generating studies and variants
- Fine-tuning of the Stable Diffusion model
- Stable Diffusion models fine-tuned by the community
- AI models assisting Stable Diffusion in other tasks
- Examples
Latent diffusion models
You don’t need this to learn how to use Stable Diffusion, but this section will give you a much better understanding of why Stable Diffusion works in the way it does.
What are they
What happens behind the scenes
https://jalammar.github.io/illustrated-stable-diffusion/
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts#saving-steps-of-the-sampling-process
https://www.youtube.com/watch?v=_7rMfsA24Ls
Origins of Stable Diffusion
Stable Diffusion is an AI model developed by Patrick Esser from Runway and Robin Rombach from LMU Munich. The research and code behind Stable Diffusion were open-sourced in 2022. The model was released under the CreativeML Open RAIL-M License.
The full story: https://research.runwayml.com/the-research-origins-of-stable-difussion
Original paper: https://ommer-lab.com/research/latent-diffusion-models/
The LAION and LAION-Aesthetics datasets
https://followfoxai.substack.com/p/exploring-the-laion-aesthetics-image
User Interfaces to the Stable Diffusion model
Below I list software that can be installed locally on any platform (including Apple Silicon systems) or that is available via a SaaS model. Software that is available only for Windows, or only for Windows and Linux, is not included in the list.
Dream Studio by Stability.AI (beta)
InvokeAI by Lincoln Stein & team
https://github.com/invoke-ai/InvokeAI
Stable Diffusion CLI by Stability.AI
https://github.com/Stability-AI/stablediffusion
Stable Diffusion WebUI by AUTOMATIC1111 & team
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Applications of the Stable Diffusion model
Stable Diffusion is famous for generating images from a sentence written in natural language (mainly plain English at the time of writing). This process is called text to image (txt2img). However, the Stable Diffusion model is capable of much more. These are its main applications:
Text to Image (txt2img)
The term txt2img can refer to either a process or a model.
A txt2img process is the act of generating an image starting from a text description (called prompt).
A txt2img model is a latent diffusion model optimized for the txt2img process.
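To make the txt2img process concrete outside of any user interface, here is a minimal sketch using Hugging Face's diffusers library (the library, model ID, and parameter values are assumptions of this example, not tools recommended elsewhere in this guide):

```python
# Minimal txt2img sketch with the diffusers library (assumes: pip install diffusers transformers torch)
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # Stable Diffusion 1.5 checkpoint on Hugging Face
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a photograph of an astronaut riding a horse",
    num_inference_steps=30,  # diffusion steps
    guidance_scale=7.5,      # CFG scale
).images[0]
image.save("txt2img.png")
```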
Image to Image (img2img)
The term img2img only refers to a process.
An img2img process is the act of generating an image starting from a preexisting picture.
At the time of writing, there is no specialized img2img model. However:
Depth-Conditional Stable Diffusion (depth2img)
Alongside the launch of the Stable Diffusion 2.0 model, Stability AI released a special variant called the depth2img model.
The depth2img model is much better than the standard txt2img model at conditioning the image generation in an img2img process.
The depth2img model retains the structure and shape of the starting picture:
https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion
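If you script the process instead of using a UI, the depth2img variant is also exposed by the diffusers library; a minimal sketch (pipeline class, model ID, and file names are assumptions of this example):

```python
# Sketch: depth-conditional img2img with the SD 2.0 depth model via diffusers
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("starting_picture.png")  # hypothetical input file
image = pipe(
    prompt="a bronze statue in a museum",
    image=init_image,
    strength=0.7,  # how far the result is allowed to drift from the original
).images[0]
image.save("depth2img.png")
```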
Guides:
https://stable-diffusion-art.com/depth-to-image/
https://www.youtube.com/watch?v=TmOtHWNnPZM
depth2img model vs. txt2img model:
https://www.reddit.com/r/StableDiffusion/comments/zk32dg/a_quick_demo_to_show_how_structurally_coherent/
depth2img model vs. depthmap2mask script:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/5542
depth2img model vs. inpainting conditioning mask strength:
https://www.reddit.com/r/StableDiffusion/comments/zgwm5m/testing_depth2img_vs_inpainting_conditioning/
Best practices
To get as close as possible to the original image, use a low denoising strength, setting “Denoising Strength” to a value between 0.1 and 0.3, with the Euler sampler.
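As an illustration of that best practice in a scripted img2img pipeline, here is a sketch with the diffusers library, where the equivalent parameter is called strength (library, model ID, and file names are assumptions of this example):

```python
# Sketch: img2img with a low denoising strength to stay close to the original image
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("original.png")  # hypothetical starting picture
image = pipe(
    prompt="the same scene as an oil painting",
    image=init_image,
    strength=0.25,  # roughly equivalent to a "Denoising Strength" between 0.1 and 0.3
).images[0]
image.save("img2img.png")
```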
Rather than replicating an existing picture, a creative way to use the img2img capability is to compose an original image with the help of multiple tools:
- Magic Poser: https://webapp.magicposer.com
- Film Grab: https://film-grab.com/movies-a-z/
Using img2img to cartoonize photos:
https://stable-diffusion-art.com/cartoonize-photo/
Using LEGO to stage complex scenes:
https://www.reddit.com/r/StableDiffusion/comments/zggm9w/stable_diffusion_is_an_adults_viewport_into_a/
Inpainting
The term inpainting can refer to either a process or a model.
An inpainting process is the act of replacing a portion of a pre-existing image by generating a new version of that portion.
An inpainting model is a latent diffusion model optimized for the inpainting process.
While it is possible to use a txt2img model for the inpainting process, it’s highly recommended to use one of the following inpainting models:
- SD 2.0 Inpainting model by Stability AI: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
- SD 1.5 Inpainting model by RunwayML: https://huggingface.co/runwayml/stable-diffusion-inpainting
How to use an inpainting model: https://rentry.org/drfar
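For reference, the SD 1.5 inpainting model listed above can also be driven from a script; a minimal sketch with the diffusers library (file names are hypothetical):

```python
# Sketch: inpainting with the RunwayML SD 1.5 inpainting model via diffusers
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("portrait.png")       # hypothetical source image
mask_image = load_image("portrait_mask.png")  # white = area to regenerate, black = area to keep

image = pipe(
    prompt="wearing a red scarf",
    image=init_image,
    mask_image=mask_image,
).images[0]
image.save("inpainted.png")
```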
Best practices
Inpainting and high resolutions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4530
Turn any model into an inpainting model: https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/
InstructPix2Pix
A very different way to do inpainting: instead of masking an area, you describe the desired change with a natural-language instruction.
https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix
https://www.reddit.com/r/StableDiffusion/comments/10l36p7/instruct_pix2pix_just_added_to_auto1111/
https://www.youtube.com/watch?v=0fkGd9wIhrA&feature=share
https://www.youtube.com/watch?v=CuPCVOj2LgE
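Outside the WebUI, the diffusers pipeline linked above can be used directly; a minimal sketch (model ID and parameter values are assumptions of this example):

```python
# Sketch: instruction-based editing with InstructPix2Pix via diffusers
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("room.png")  # hypothetical input image
edited = pipe(
    prompt="make it look like winter",  # an instruction, not a scene description
    image=image,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.png")
```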
Sketch Inpainting
https://www.reddit.com/r/StableDiffusion/comments/10jqkd5/sketch_function_in_automatic1111/
Outpainting
The term outpainting only refers to a process.
An outpainting process is the act of extending a pre-existing image by generating additional portions of the image on a larger canvas.
At the time of writing, there is no specialized outpainting model; instead, you should use an inpainting model for the outpainting process:
Outpainting with AUTOMATIC1111 WebUI:
- Poor man’s outpainting: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#outpainting
- Alpha Canvas: https://github.com/TKoestlerx/sdexperiments
- openOutpaint: https://github.com/zero01101/openOutpaint
- Outpainting mk2
Other techniques:
- Using an inpainting model for the outpainting process: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3192#issuecomment-1287162335
Text to Video (txt2vid)
https://github.com/deforum-art/deforum-for-automatic1111-webui
https://docs.google.com/document/d/1pEobUknMFMkn8F5TMsv8qRzamXX_75BShMMXV8IFslI/edit
https://www.reddit.com/r/StableDiffusion/comments/10dc0mx/new_version_of/
Latent blending method: https://www.reddit.com/r/StableDiffusion/comments/109754j/introducing_latent_blending_a_new_stablediffusion/
Steps Animation extension: https://github.com/vladmandic/sd-extension-steps-animation
Text to 3D (txt23D)
Prompt Engineering for Stable Diffusion
Prompt tokenization and its implications
Modifiers
Modifiers Studies
- https://proximacentaurib.notion.site/2b07d3195d5948c6a7e5836f9d535592?v=b5b75a67cc52483c9965cfc141f6f582
- https://www.reddit.com/r/StableDiffusion/comments/zon9zn/648_aesthetic_modifiers_study_based_on_aesthetics/
Artistic styles and artists for paintings/illustrations/drawings
Artists Studies
- For Stable Diffusion 2.x: https://proximacentaurib.notion.site/28e037176b58439785ee04af6b0ae4ea?v=d04dc32d95334deca8025371ac778745
- For Stable Diffusion 1.x: https://proximacentaurib.notion.site/e28a4f8d97724f14a784a538b8589e7d?v=ab624266c6a44413b42a6c57a41d828c
Camera focus and shot types for photos
DOF Simulator
Shot types
Prompt Structure
https://www.cs.columbia.edu/~chilton/web/my_publications/LiuPromptsAIGenArt_CHI2022.pdf
https://github.com/invoke-ai/InvokeAI/discussions/904
The prompt structure that has produced the best results for me:
Medium + Context + Subject + Modifiers + Style/Artist
If you are trying to generate an image mimicking the style of a specific artist, the Stable Diffusion model is particularly sensitive to the expression “by artist Artist Name Artist Surname”. The keyword “by artist” seems to have a significant impact on the generation of the image.
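Purely as an illustration of that structure (all values below are made up for the example), the pieces can be assembled like this:

```python
# Illustrative only: assembling a prompt following Medium + Context + Subject + Modifiers + Style/Artist
medium = "oil painting"
context = "in a misty forest at dawn"
subject = "a lone red fox"
modifiers = "highly detailed, dramatic lighting"
style = "by artist Caspar David Friedrich"

prompt = ", ".join([medium, context, subject, modifiers, style])
print(prompt)
# oil painting, in a misty forest at dawn, a lone red fox, highly detailed, dramatic lighting, by artist Caspar David Friedrich
```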
Attention/Emphasis Control (Weighted Prompt)
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#attentionemphasis
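A few examples of the syntax documented on that wiki page (the weights shown are arbitrary):

```python
# Examples of the AUTOMATIC1111 attention/emphasis syntax
prompt = "a portrait of a knight, (ornate armor:1.4), [cluttered background]"
# (text:1.4) -> sets the weight of "ornate armor" to 1.4
# (text)     -> multiplies the weight by 1.1; ((text)) multiplies it by 1.21, and so on
# [text]     -> divides the weight by 1.1, de-emphasizing "cluttered background"
```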
Negative Prompt
Composed Prompt
Prompt Editing
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing
https://www.reddit.com/r/StableDiffusion/comments/z7x32l/the_power_of_prompt_delay_of_artists_in_20/
Prompt Alternating
Prompt Interpolation
https://github.com/DiceOwl/StableDiffusionStuff
Cross Attention Control
https://github.com/bloc97/CrossAttentionControl
Other Tips
Useful browser extensions
Where to find prompts to learn from
Synthetic media search engines
DiffusionDB
DiffusionDB is the first large-scale dataset containing 14 million Stable Diffusion images and their text prompts and hyperparameters.
The Open Prompts project
https://github.com/krea-ai/open-prompts
CLIP Interrogator
https://github.com/pharmapsychotic/clip-interrogator
Online versions:
- https://colab.research.google.com/github/pharmapsychotic/clip-interrogator/blob/main/clip_interrogator.ipynb
- https://huggingface.co/spaces/pharma/CLIP-Interrogator
- https://replicate.com/methexis-inc/img2prompt
Magic Prompt (aka prompt randomizer)
Parameters of the Stable Diffusion model
Diffusion steps
Classifier-Free Guidance (CFG) scale
https://arxiv.org/pdf/2207.12598.pdf
While the default setting is 7-7.5 in most user interfaces for Stable Diffusion, and many studies show optimal results at a CFG scale of 14-18, for some reason I get the best results at a CFG scale of 4. Anything above or below that is unsatisfactory most of the time.
A high CFG scale results in over-saturated images: https://www.reddit.com/r/StableDiffusion/comments/xalo78/fixing_excessive_contrastsaturation_resulting/
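In a scripted pipeline the CFG scale is usually a single parameter; a sketch with the diffusers library, where it is called guidance_scale (library, model ID, and values are assumptions of this example):

```python
# Sketch: comparing CFG scales with a fixed seed so that only the guidance changes
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for cfg in (4, 7.5, 14):
    generator = torch.Generator("cuda").manual_seed(42)  # same starting noise for every run
    image = pipe("a cozy cabin in the woods", guidance_scale=cfg, generator=generator).images[0]
    image.save(f"cfg_{cfg}.png")
```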
CLIP layers
CLIP (Contrastive Language–Image Pre-training) is an AI model originally developed by OpenAI to connect text and images, scoring how well a text description matches a picture. Every generative AI tool/product/service designed to use the Stable Diffusion 1.x models uses CLIP’s text encoder to translate your prompt into a representation that the diffusion model then uses to arrive at the result you want.
The technical paper describing how CLIP works is here: https://openai.com/blog/clip/
You don’t have to read it, but it’s useful to understand what comes next.
AUTOMATIC1111 WebUI gives you some control over how CLIP works:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#ignore-last-layers-of-clip-model
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/5674
This setting is especially useful when you use fine-tuned models like the NovelAI one.
With the launch of the Stable Diffusion 2.0 model, Stability AI released a completely open version of the OpenAI CLIP model called OpenCLIP:
https://github.com/mlfoundations/open_clip
Whenever you use a Stable Diffusion 2.x model in a tool/product/service, the program will switch from the OpenAI CLIP to OpenCLIP.
At the time of writing, the CLIP layers setting in AUTOMATIC1111 WebUI has no effect on OpenCLIP.
Samplers of the Stable Diffusion model
Main samplers
- Euler a
Studies: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/5761
- Euler
- LMS
- Heun
- DPM2
- DPM2 a
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2274
- DPM++ 2S a
- DPM++ 2M
- DPM++ SDE
- DPM fast
- DPM adaptive
- DDIM
- PLMS
Karras variants
- LMS Karras
- DPM2 Karras
- DPM2 a Karras
- DPM++ 2S a Karras
- DPM++ 2M Karras
- DPM++ SDE Karras
Which one is best for what?
https://www.reddit.com/r/StableDiffusion/comments/yyqi67/best_sampling_method_in_a1111_gui/
https://stablediffusion.miraheze.org/wiki/Ultimate_Guide
Tips
Some people use Euler a for a fast test, then they switch to DDIM, Heun or DPM2 Karras.
Recommended steps for each sampler:
| Sampler | Recommended Steps |
| --- | --- |
| Euler a | 20-40 |
| Euler | |
| LMS | 50 |
| Heun | |
| DPM2 | |
| DPM2 a | |
| DPM++ 2S a | |
| DPM++ 2M | |
| DPM++ SDE | |
| DPM fast | |
| DPM adaptive | |
| DDIM | min 70 |
| PLMS | |
| LMS Karras | |
| DPM2 Karras | |
| DPM2 a Karras | |
| DPM++ 2S a Karras | max 30 |
| DPM++ 2M Karras | |
| DPM++ SDE Karras | |
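If you generate images from a script instead of the WebUI, the diffusers library calls these samplers schedulers and lets you swap them on an existing pipeline; a sketch (the class names are that library’s, and the mapping to the WebUI sampler names is approximate):

```python
# Sketch: switching the sampler (scheduler) on a diffusers pipeline
import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerAncestralDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Roughly "Euler a" in the WebUI: good for a fast test
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
draft = pipe("a lighthouse at sunset", num_inference_steps=20).images[0]

# Roughly "DPM++ 2M" in the WebUI: a common choice for the final render
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
final = pipe("a lighthouse at sunset", num_inference_steps=30).images[0]
```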
Seeds and Determinism
Every time you generate a new image with the Stable Diffusion model, your computer also generates a random number called a seed.
In theory, knowing the seed of a generated image, plus the original prompt that generated it (both positive and negative), plus the sampler and the hyperparameters used to configure it, plus the specific model used, you could recreate the same identical image.
In practice, image generation is sensitive to other aspects of the Stable Diffusion environment, like the GPU used to generate the image. For this reason, even if you know the seed and all the other details associated with an image generated on a Windows system with an NVIDIA GPU, you won’t be able to reproduce the same image on a macOS system with an Apple Silicon chip.
This is partially due to the difference between Pytorch and CoreML:
Q: Are the Core ML and PyTorch generated images going to be identical?
A: If desired, the generated images across PyTorch and Core ML can be made approximately identical. However, it is not guaranteed by default. There are several factors that might lead to different images across PyTorch and Core ML:
- Random Number Generator Behavior
The main source of potentially different results across PyTorch and Core ML is the Random Number Generator (RNG) behavior. PyTorch and Numpy have different sources of randomness. python_coreml_stable_diffusion generally relies on Numpy for RNG (e.g. latents initialization) and the StableDiffusion Swift library reproduces this RNG behavior. However, PyTorch-based pipelines such as Hugging Face diffusers rely on PyTorch’s RNG behavior.
- PyTorch
“Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.” (source).
- Model Function Drift During Conversion
The difference in outputs across corresponding PyTorch and Core ML models is a potential cause. The signal integrity is tested during the conversion process (enabled via --check-output-correctness argument to python_coreml_stable_diffusion.torch2coreml) and it is verified to be above a minimum PSNR value as tested on random inputs. Note that this is simply a sanity check and does not guarantee this minimum PSNR across all possible inputs. Furthermore, the results are not guaranteed to be identical when executing the same Core ML models across different compute units. This is not expected to be a major source of difference as the sample visual results indicate in this section.
- Weights and Activations Data Type
When quantizing models from float32 to lower-precision data types such as float16, the generated images are known to vary slightly in semantics even when using the same PyTorch model. Core ML models generated by coremltools have float16 weights and activations by default unless explicitly overridden. This is not expected to be a major source of difference.
From https://github.com/apple/ml-stable-diffusion#faq
More: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6941
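To pin the seed in a scripted pipeline, a minimal sketch with the diffusers library (and, as explained above, this is only reproducible on the same hardware and software stack):

```python
# Sketch: fixing the seed so the same prompt + settings reproduce the same image on the same setup
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

seed = 1234
generator = torch.Generator("cuda").manual_seed(seed)  # the seed determines the initial latent noise
image = pipe(
    "portrait photo of an astronaut",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save(f"seed_{seed}.png")
```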
Generating studies and variants
Study generators
X/Y Plot
X/Y/Z Plot
https://github.com/Gerschel/xyz-plot-grid
Prompt Matrix
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-matrix
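The same kind of study can also be scripted outside the WebUI by sweeping two parameters and pasting the results into a grid; a rough sketch assuming the diffusers library and Pillow:

```python
# Sketch: a manual X/Y study (CFG scale vs. steps) built with diffusers and Pillow
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

cfg_values = [4, 7.5, 11]   # X axis
step_values = [20, 30, 50]  # Y axis
size = 512
grid = Image.new("RGB", (size * len(cfg_values), size * len(step_values)))

for row, steps in enumerate(step_values):
    for col, cfg in enumerate(cfg_values):
        generator = torch.Generator("cuda").manual_seed(42)  # same seed for every cell
        cell = pipe("a medieval castle on a cliff", num_inference_steps=steps,
                    guidance_scale=cfg, generator=generator).images[0]
        grid.paste(cell, (col * size, row * size))

grid.save("xy_study.png")
```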
Generating variants
Variation strength and seed
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#variations
Shifting attention
https://github.com/yownas/shift-attention
Alternative img2img test
https://stable-diffusion-art.com/stylize-images/
Fine-tuning of the Stable Diffusion model
During its training phase, a Stable Diffusion model learns about many different concepts of the world. However, given that the amount of training time is not unlimited, the model does not learn every concept about the world.
To teach the model about new concepts, we can further train it, or fine-tune it, on a specific person, object, style, mood, etc.
Unlike the original training phase, fine-tuning doesn’t require exceptional computational resources and can be done on consumer hardware.
At the time of writing this guide, these are the most common approaches to fine-tune Stable Diffusion:
- DreamBooth
- Every Dream
- Hypernetworks
- Aesthetic Gradients
- Embeddings (via Textual Inversion)
- LoRA
DreamBooth
A Stable Diffusion model can be fine-tuned locally, via the DreamBooth extension for AUTOMATIC1111 WebUI, or online, via a Jupyter notebook.
The most popular Jupyter notebooks for DreamBooth are:
- https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
- https://github.com/JoePenna/Dreambooth-Stable-Diffusion
Best practices
Nitrosocke guide:
https://github.com/nitrosocke/dreambooth-training-guide/
General recommendations by HuggingFace:
https://huggingface.co/blog/dreambooth
Specific recommendations for the ShivamShrirao method:
https://www.reddit.com/r/StableDiffusion/comments/ybxv7h/good_dreambooth_formula/
Software Engineering Courses’ Guide:
https://youtu.be/KwxNcGhHuLY
Dushyant M guide for Stable Diffusion 2.0:
https://dushyantmin.com/fine-tuning-stable-diffusion-v20-with-dreambooth
Terrariyum studies:
https://www.reddit.com/r/StableDiffusion/comments/z9g46h/i_was_wrong_classifierregularization_images_do/
https://www.reddit.com/r/StableDiffusion/comments/zcr644/make_better_dreambooth_style_models_by_using/
Dr.Derp’s guide to training:
https://pdfhost.io/v/SnKTqK5ca_Untitled_document
More guides:
https://www.youtube.com/watch?v=OwNgOZ-y-T4
https://www.youtube.com/watch?v=usgqmQ0Mq7g
Every Dream
https://github.com/victorchall/EveryDream-trainer/
Hypernetworks
In this context, a hypernetwork is a small auxiliary network that modifies the behavior of the main model (in the AUTOMATIC1111 implementation, it is attached to the cross-attention layers of the U-Net).
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4940
Aesthetic Gradients
https://arxiv.org/abs/2209.12330
https://metaphysic.ai/custom-styles-in-stable-diffusion-without-retraining-or-high-computing-resources/
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/3350
Embeddings (via Textual Inversion)
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion#training-an-embedding
https://www.reddit.com/r/sdforall/comments/y15uvv/idiots_guide_to_sticking_your_head_in_stuff_using/
https://www.reddit.com/r/sdforall/comments/y4ua99/auto1111_new_shareable_embeddings_as_images/
https://www.youtube.com/watch?v=ueetKxsF25g
https://www.reddit.com/r/sdforall/comments/y1hv2d/the_4_pictures_you_need_for_the_perfect_textual/
https://www.reddit.com/r/StableDiffusion/comments/zipq5g/sd_21_is_unreal/
How to train new characters with Textual Inversion:
https://www.youtube.com/watch?v=2ityl_dNRNw
https://www.reddit.com/r/StableDiffusion/comments/zpcutz/breakdown_of_how_i_make_embeddings_for_my/
Embeddings with gradient accumulation: https://www.reddit.com/r/StableDiffusion/comments/z60uxw/sd_v2_768_embeddings_with_gradient_accumulation/
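Embeddings produced by Textual Inversion can also be used outside the WebUI; recent versions of the diffusers library expose a loader for them. A sketch (the load_textual_inversion call and the example concept repository are assumptions tied to that library; check its documentation for your version):

```python
# Sketch: loading a Textual Inversion embedding into a diffusers pipeline
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Example concept from the Stable Diffusion concepts library; the learned token
# is referenced in the prompt through its placeholder name
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a photo of a <cat-toy> on a beach").images[0]
image.save("textual_inversion.png")
```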
LoRA
https://github.com/cloneofsimo/lora
https://huggingface.co/blog/lora
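The Hugging Face blog post above attaches LoRA weights trained with their scripts to a regular pipeline; a condensed sketch (the load_attn_procs call is specific to diffusers, and the weights path is hypothetical):

```python
# Sketch: applying LoRA attention weights to a diffusers pipeline
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Directory or Hugging Face repo containing LoRA weights produced by the diffusers training script (hypothetical path)
pipe.unet.load_attn_procs("path/to/lora_weights")
image = pipe("a portrait in the fine-tuned style", guidance_scale=7.5).images[0]
image.save("lora_sample.png")
```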
Comparisons
Stable Diffusion models fine-tuned by the community
Thanks to the DreamBooth technique, the community has fine-tuned the Stable Diffusion model on a wide range of styles and concepts.
These are some of the most popular models:
Anything
https://huggingface.co/Linaqruf/anything-v3.0
Classic Animation Diffusion
https://huggingface.co/nitrosocke/classic-anim-diffusion
Comic Diffusion
https://huggingface.co/ogkalu/Comic-Diffusion
Elden Ring Diffusion
https://huggingface.co/nitrosocke/elden-ring-diffusion
FFXIV Diffusion
https://huggingface.co/herpritts/FFXIV-Style
Future Diffusion
https://huggingface.co/nitrosocke/Future-Diffusion
Ghibli Diffusion
https://huggingface.co/nitrosocke/Ghibli-Diffusion
Inkpunk Diffusion
https://huggingface.co/Envvi/Inkpunk-Diffusion
Modern Disney Diffusion
https://huggingface.co/nitrosocke/mo-di-diffusion
Redshift Diffusion
https://huggingface.co/nitrosocke/redshift-diffusion
Robo Diffusion
https://huggingface.co/nousr/robo-diffusion-2-base
Waifu Diffusion
https://huggingface.co/hakurei/waifu-diffusion-v1-3
Woolitize
https://huggingface.co/plasmo/woolitize
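Most of the checkpoints above are hosted on Hugging Face, so outside the WebUI they can usually be loaded by just swapping the model ID; a sketch with the diffusers library (each model card documents its own trigger phrase, so the prompt below is only an example):

```python
# Sketch: loading a community fine-tuned checkpoint from Hugging Face
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "nitrosocke/mo-di-diffusion",  # Modern Disney Diffusion, listed above
    torch_dtype=torch.float16,
).to("cuda")

# DreamBooth models usually need the trigger phrase from their model card in the prompt
image = pipe("a cat wearing a hat, modern disney style").images[0]
image.save("modern_disney_cat.png")
```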
AI models assisting Stable Diffusion in other tasks
For detail improvements
Variational Auto Encoder (VAE)
For Face Restoration
I think the faces in those images are too small to be detected, if you aren’t upscaling them first. I would recommend upscaling, then restoring faces in that order.
Send it to the Extras tab, upscale it, set the visibility of GFPGAN or CodeFormer to max, and tick the box at the bottom that says upscale before restoring faces.
GFPGAN
CodeFormer
For Enlarging the image
Given that the Stable Diffusion 1.5 model was trained on low-resolution images at 512×512 pixels, any attempt to generate larger images via the height and width parameters produces poor-quality results or results featuring the same subject repeated multiple times to fill the space.
We can use a number of assistive AI models to either extend the dimensions of an image in one or more directions, or to upscale the image as is without losing quality.
High-Resolution Fixing
Upscalers
- Ultimate SD Upscale
- RealESRGAN
- Latent Diffusion Super Resolution (LDSR)
- Universal Upscaler
The full list of upscalers is here: https://upscale.wiki/wiki/Model_Database
Locke_Moghan comparison: https://www.reddit.com/r/StableDiffusion/comments/y2mrc2/the_definitive_comparison_to_upscalers/
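One scriptable option in this family is the 4x upscaler that Stability AI released alongside Stable Diffusion 2.0, which diffusers exposes as a dedicated pipeline; a sketch (model ID, pipeline class, and file names are assumptions of this example, and the other upscalers listed above come with their own tooling):

```python
# Sketch: upscaling a 512x512 generation with the Stable Diffusion x4 upscaler
import torch
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = load_image("generation_512.png")  # hypothetical 512x512 input
upscaled = pipe(prompt="a detailed photo", image=low_res).images[0]  # the prompt guides the added detail
upscaled.save("generation_2048.png")
```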
Latent Mirroring
-> find pose, capture seed into “variation seed” with the resolution the pose was found at, denoise set to 0.5
-> used the Latent Mirroring extension to help push resolution up (got to 1024×1024 before I got tired of waiting for renders to happen) (set “alternate steps” “vertical” and leave it at 0.25)
-> threw to img2img to try to nudge up the resolution before upscaling (did not succeed this time, partially because video card began complaining about memory)
-> did SD upscale using UniversalUpscalerV2 (the very sharp one, also thanks to whoever mentioned it in the other thread yesterday)
From https://www.reddit.com/r/StableDiffusion/comments/z9h9yo/is_this_detailed_enough_yet_sd_v2_768_a1111_fork/
Tips
Upscale to huge sizes and add detail with SD Upscale: https://www.reddit.com/r/StableDiffusion/comments/xkjjf9/upscale_to_huge_sizes_and_add_detail_with_sd/
Also:
Baseline image generated from txt2img is a 1024×1024 image generated using highres-fix (settings in PNG file). No touching up or inpainting was done. [3]
Workflow:
Use baseline (or generated it yourself) in img2img
Do SD upscale with upscaler using 5×5 (basically 512×512 tilesize, 64 padding) [1]
Send to extras, and upscale (scale 4) with upscaler
8192×8192 image saved as
The upscalers used here are:
UniversalUpscalerV2-Neutral
UniversalUpscalerV2-Sharp
Next I generated the 4 combos of and . Then made (192×1080) crops for hair, eyes and lips of the 8192 images (also face but over 4k). [2]
From an objective point of view, it would seem that using for SD upscale and then (aka ) would produce the best native images. However, if you downscale the 8192 image to half size, it appears the was the better result.
Conclusion:
Unless you are going to downscale the final result, do not use a sharp upscaler (i.e. LDSR, ESRGAN, ) as the final step.
Footnotes:
[1] The SD upscale used conservative settings (low CFG scale, low denoising, 20 steps) (don’t recall if I used Euler or LMS during this step, but it should not matter).
[2] The eyes aren’t perfect, but that is not important now; it is the skin and hair around them you need to focus on. Likewise the hair, looking for straight lines. For the lips, you want the texture to come out. For face shots, it is about the skin.
[3] Finding a good baseline image probably takes more computing time than anything else (actual 2 step upscale workflow is less than 3 minutes on my 3080). Sometimes you just get one that pops out like this one. I suggest using Euler/LMS at 16 steps with the max batch size your GPU can handle (my case 6 (because nice grid, 8 could work probably too), resulting in about an image a second at 512×512 when you do big batches). This will also be relatively fast if doing highres-fix.
For Tiling
- Generate a batch of 512×512’s
- Find the one I like and enter its seed into the seed box
- Change the resolution to desired
- Select hi-res fix and set the firstpass width and height to 512
(As a tip, your output resolution doesn’t have to be a square. It will just crop the firstpass image.)
Examples
Photorealism
Portraits
- Model: Stable Diffusion 2.0
- Positive prompt: {character}, by {artist}, studio lighting, High quality, professional, dramatic, cinematic movie still, very detailed, character art, concept art, subsurface scatter, focused, lens flare, digital art
- Negative prompt: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, (((body out of frame))), blurry, bad art, bad anatomy, blurred, text, watermark, grainy
- Sampler: N/A
- Steps: N/A
- CFG Scale: N/A
- Restore Faces: N/A
- Upscaling: N/A
- CLIP layers: N/A
- Example seed: N/A
- Model: Stable Diffusion 1.5
- Positive prompt: (ultra realistic:1.3) (photorealistic:1.15)
- Negative prompt:
- Sampler: N/A
- Steps: N/A
- CFG Scale: N/A
- Restore Faces: N/A
- Upscaling: N/A
- CLIP layers: N/A
- Example seed: N/A
Underwater shots
- Model: F222 (0.7) + Anything v3 (0.3)
- Positive prompt: portrait of photo realistic a young woman underwater in a swimming pool, summer fashion spaghetti strap dress, dreamy and ethereal, expressive pose, big black eyes, jewel-like eyes, exciting expression, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by guy denning, full body, bokeh, 8k photography, Cannon 85mm, air bubbles
- Negative prompt: (black and white),(frame),(white edge), (text in the lower left corner), (text in the lower right corner), ((((visible hand)))), ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))
- Sampler: DPM2
- Steps: 25
- CFG Scale: 11
- Restore Faces: N/A
- Upscaling: N/A
- CLIP layers: -2
- Example seed: 2193700537
Digital Art
Pixar-style
At the time of writing, nobody in the community has created a fine-tuned version of the Stable Diffusion model that achieves a consistent Pixar-ification of characters. However, it’s possible to obtain exceptional results with a prompt like this one:
- Model: Stable Diffusion 1.5
- Positive prompt: Pixar style XYZ, 4k, 8k, unreal engine, octane render photorealistic by cosmicwonder, hdr, photography by cosmicwonder, high definition, symmetrical face, volumetric lighting, dusty haze, photo, high octane render, 24mm, 4k, 24mm, DSLR, high quality, 60 fps, ultra realistic
- Negative prompt:
- Sampler: N/A
- Steps: 50
- CFG Scale: 7
- Restore Faces: GFPGAN 10
- Upscaling: RealESRGAN_x4plus
- CLIP layers: N/A
- Example seed: 3018