AP Workflow 6.0 for ComfyUI
Stable Diffusion is an AI model able to generate images from text instructions written in natural language (text-to-image, txt2img, or t2i), or from existing images used as guidance (image-to-image, img2img, or i2i).
When an AI model like Stable Diffusion is paired with an automation engine like ComfyUI, it allows individuals and organizations to accomplish extraordinary things.
Individual artists and small design studios can use it to create very complex images in a matter of minutes, instead of hours or days. Large organizations can use it to generate or modify images and videos at industrial scale for commercial applications.
To study and experiment with the enormous power of automation applied to AI, Alessandro created the AP Workflow.
AP Workflow is now used by organizations all around the world to power enterprise and consumer applications.
What’s new in 6.0
- The T2I function now supports image generation with Stable Diffusion 1.5 base and fine-tuned models.
- The Prompt Enricher function now supports local large language models (LLaMA, Mistral, etc.) via LM Studio, Oobabooga WebUI, and other AI systems.
- A new HighRes Fix function has been added.
- A new IPAdapter function has been added.
- A new Object Swapper function has been added. Now, you can automatically recognize objects in generated or uploaded images and change their appearance according to the user prompt.
- A true image manipulation pipeline has been created. Now, images created with the SDXL/SD1.5 models, or the ones uploaded via the I2I function, can go through a first level of manipulation, via the Refiner, HighRes Fix, IPAdapter, or the ReVision functions. The resulting images, then, can go through a second level of manipulation, via the following functions, in the specified order: Hand Detailer, Face Detailer, Object Swapper, Face Swapper, and finally Upscaler.
You can activate one or more image manipulation functions, creating a chain.
- The Prompt Builder in the T2I function has been revamped to be more visual and flexible.
- Support for @jags111’s fork of @LucianoCirino’s Efficiency Nodes for ComfyUI Version 2.0+ has been added.
- The ControlNet function now leverages the image upload capability of the I2I function.
- The workflow’s wires have been reorganized to simplify debugging.
Upcoming in 6.1
- Support for Kohya Deep Shrink optimization.
- Universal Negative Prompt, Free Lunch, and Kohya Deep Shrink optimizations have been reorganized in dedicated functions, and can be un-bypassed/bypassed with dedicated switches in the Controller function.
- The Debug function has been moved to reduce wire clutter.
- Fewer nodes and wires thanks to another brilliant update of @receyuki’s SD Parameter Generator node.
How to Download It
The entire workflow is embedded in the workflow picture itself. Click on it and the full version will open in a new tab. Right-click on the full version image and download it. Drag it inside ComfyUI, and you’ll have the same workflow you see below.
Before you download the workflow, be sure you can read “6.0” in the image. If not, you are looking at a cached version of the image.
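If you want to confirm that the image you downloaded actually carries an embedded workflow, a minimal sketch like the following can read it back. It assumes the workflow is stored, as ComfyUI normally does, as a PNG text chunk under a "workflow" key; the file path is a placeholder.

```python
# Sketch: extract the ComfyUI workflow JSON embedded in a downloaded
# workflow image. ComfyUI stores the graph in the PNG metadata under
# the "workflow" key (an assumption based on ComfyUI's default saver).
import json
from PIL import Image

def extract_workflow(png_path):
    """Return the workflow graph embedded in a ComfyUI PNG, or None."""
    raw = Image.open(png_path).info.get("workflow")
    return json.loads(raw) if raw else None

# Usage (path is a placeholder):
# graph = extract_workflow("ap-workflow-6.0.png")
```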
Required Custom Nodes
If, after loading the workflow, you see a lot of red boxes, you must install some custom node suites.
This workflow depends on multiple custom nodes that you might not have installed. You should download and install ComfyUI Manager and then install the following custom nodes suites to be sure you can replicate this workflow:
If you already have some or all of the custom node suites required to run the AP Workflow 6.0, you might still encounter problems. That’s because you might have older versions of the required custom node suites.
To improve your chances of running this workflow, you can download a snapshot of Alessandro’s ComfyUI working environment. This will help you install a version of each custom node suite that is known to work with the AP Workflow 6.0.
- Shut down ComfyUI.
- Download the snapshot.
- Move/copy the snapshot to the /ComfyUI/custom_nodes/ComfyUI-Manager/snapshots folder.
- Restart ComfyUI.
- Open ComfyUI Manager and then the new Snapshot Manager.
- Restore the AP Workflow 6.0 snapshot.
- Restart ComfyUI.
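The file-placement part of the steps above can be sketched as follows. The snapshot file name and install location are placeholders; adjust them to your own environment.

```shell
# Sketch of the snapshot placement, assuming a default ComfyUI install
# in the home directory and a downloaded snapshot file named
# ap-workflow-6.0.json (both names are placeholders).
# 1. Shut down ComfyUI first.
# 2. Copy the snapshot into ComfyUI Manager's snapshots folder:
cp ~/Downloads/ap-workflow-6.0.json \
   ~/ComfyUI/custom_nodes/ComfyUI-Manager/snapshots/
# 3. Restart ComfyUI, then restore the snapshot from
#    ComfyUI Manager > Snapshot Manager in the UI, and restart again.
```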
Notice that the Snapshot Manager is a new, experimental feature and it might not work in every situation. If you encounter errors, check the documentation: https://github.com/ltdrdata/ComfyUI-Manager#snapshot-manager
Required AI Models
Many nodes used throughout the AP Workflow require specific AI models to perform their task. While some nodes automatically download the required models, others require you to download them manually.
As of today, there’s no easy way to export the full list of models Alessandro is using in his ComfyUI environment.
The best way to know which models you need to download is to open ComfyUI Manager and go to the Install Models section. There you’ll find a list of all the models each node requires or recommends.
Where to Start?
AP Workflow 6.0 is a large, moderately complex workflow. It can be difficult to navigate if you are new to ComfyUI.
Start from the Functions section of the workflow on the left, and proceed to the right by configuring each section relevant to you: I2I or T2I, Prompt Enricher and, finally, Parameters.
You should not change any additional settings in other areas of the workflow unless you have to tweak how a certain function behaves.
AP Workflow 6.0 comes pre-configured to generate images with the SDXL 1.0 Base + Refiner models. After you download the workflow, you don’t have to do anything but queue a generation with the prompt already configured in the Visual Prompt Builder section of the workflow.
Check the outcome of your image generation/manipulation in the Magic section on the bottom-left of the workflow.
If ComfyUI doesn’t generate anything, you might have a problem with your installation. Carefully check the Required Custom Nodes section of this document and the Reddit forums.
AP Workflow 6.0 allows you to generate images from text instructions written in natural language (text-to-image, txt2img, or t2i), or to upload existing images for further manipulation (image-to-image, img2img, or i2i).
You can use the T2I function to generate images with:
SDXL 1.0 Base + Refiner models
By default, AP Workflow 6.0 is configured to generate images with the SDXL 1.0 Base model used in conjunction with the SDXL 1.0 Refiner model.
When you define the total number of diffusion steps you want the system to perform, the workflow will automatically allocate a certain number of those steps to each model, according to the refiner_start parameter in the Parameters section.
Further guidance is provided in a Note in the Parameters section of the workflow.
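For example, assuming refiner_start represents the fraction of the total steps assigned to the Base model before the hand-off to the Refiner (the exact rounding used by the workflow is an assumption here), the split works roughly like this:

```python
# Sketch of how total diffusion steps are divided between the SDXL
# Base and Refiner models according to refiner_start. The rounding
# behavior is an assumption for illustration purposes.
def split_steps(total_steps, refiner_start):
    """Return (base_steps, refiner_steps) for a given refiner_start fraction."""
    base_steps = round(total_steps * refiner_start)
    refiner_steps = total_steps - base_steps
    return base_steps, refiner_steps

# With 30 total steps and refiner_start = 0.8, the Base model runs
# 24 steps and the Refiner runs the remaining 6.
```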
Here’s an example of the images that the SDXL 1.0 Base + Refiner models can generate:
SDXL 1.0 Base model only or Fine-Tuned SDXL models
If you prefer, you can disable the Refiner function. That is useful, for example, when you want to generate images with fine-tuned SDXL models that require no Refiner.
If you don’t want to use the Refiner, you must disable it in the Functions section of the workflow, and then set the refiner_start parameter to 1 in the Parameters section.
Here’s an example of the images that a fine-tuned SDXL model can generate:
SD 1.5 Base model or Fine-Tuned SD 1.5 models
AP Workflow 6.0 now supports image generation with Stable Diffusion 1.5 models.
To reconfigure the workflow to generate images with those models, you must follow a number of steps defined in the Support section of this document.
Visual Prompt Builder
AP Workflow 6.0 expands the previous prompt builder to make it more visual and flexible.
You can use it to quickly switch between frequently used types and styles of image for the positive prompt, and frequently used negative prompts.
You can use the I2I function to upload individual images as well as entire directories of images.
The image/s you upload can be used as source image/s for Level 1 Image Manipulators: SDXL Refiner, HighRes Fix, IPAdapter, and ReVision.
The image/s you upload can also be further modified by Level 2 Image Manipulators: Hand Detailer, Face Detailer, Object Swapper, Face Swapper, and Upscaler.
Notice that an uploaded image can be used by Level 1 Image Manipulators, Level 2 Image Manipulators, or both.
Image Manipulators (Level 1)
When you upload an image instead of generating one, you can use it as a base for a number of Level 1 Image Manipulators. Each can be activated/deactivated in the Functions section of the workflow.
This function enables the use of the SDXL 1.0 Refiner model, designed to improve the quality of images generated with the SDXL 1.0 Base model.
It’s useful exclusively in conjunction with the SDXL 1.0 Base model.
This function enables the use of the HighRes Fix technique, useful to increase the resolution of an image while preserving, or even improving, its quality and coherence.
It’s mainly useful in conjunction with Stable Diffusion 1.5 base and fine-tuned models. Some people use it with SDXL models, too.
This function enables the use of the IPAdapter technique, to generate variants of the source image. People use this technique to generate consistent characters in different poses.
For more information on how to use this technique, Alessandro recommends reading the documentation of @cubiq’s IPAdapter Plus custom node suite: https://github.com/cubiq/ComfyUI_IPAdapter_plus.
This function enables the use of the ReVision technique, to generate variants of the source image according to a secondary image that you decide.
The secondary image must be defined in the ReVision section of the workflow.
Notice that the ReVision model does not take into account the positive prompt defined in the Visual Prompt Builder section of the workflow, but it considers the negative prompt.
Image Manipulators (Level 2)
After you generate an image with the T2I function or upload an image with the I2I function, you can pass that image through a series of Level 2 Image Manipulators. Each can be activated/deactivated in the Functions section of the workflow.
Notice that you can activate multiple Level 2 Image Manipulators in sequence, creating an image manipulation pipeline.
If every Level 2 Image Manipulator is activated, the image will be passed through the following functions, in the specified order: Hand Detailer, Face Detailer, Object Swapper, Face Swapper, and finally Upscaler.
The Hand Detailer function identifies hands in the source image, and attempts to improve their anatomy through two consecutive passes, generating an image after each pass.
Notice that the Hand Detailer function uses dedicated ControlNet and T2I models based on Stable Diffusion 1.5. They work even if your source image has been generated with an SDXL 1.0 base model or a Fine-Tuned SDXL model.
The reason for this design choice is that the Hand Detailer function seems to perform better (so far) with a particular model not yet available in the SDXL variant.
The Face Detailer function identifies small and large faces in the source image, and attempts to improve their aesthetics according to two independent configurations: large faces require a different treatment than small faces.
The Face Detailer function will generate an image after processing small faces and another after also processing large faces.
AP Workflow 6.0 introduces a new Object Swapper function capable of identifying a wide range of objects and features in the source image thanks to the GroundingDINO technique.
You can describe the feature/s to be found in the source image with natural language.
Once an object/feature has been identified, it will be modified according to the prompt you defined in the Object Swapper function.
Notice that the Object Swapper function uses dedicated ControlNet and T2I models based on Stable Diffusion 1.5. They work even if your source image has been generated with an SDXL 1.0 base model or a Fine-Tuned SDXL model.
The reason for this design choice is that the Object Swapper function seems to perform better (so far) with a particular model not yet available in the SDXL variant.
Notice that the Object Swapper function can be used also to modify the physical aspect of the subjects in the source image.
The Face Swapper function identifies the face of one or more subjects in the source image, and swaps them with a face of choice. If your source image has multiple faces, you can target the desired one via an index value.
You must upload an image of the face to be swapped in the Face Swapper section of the workflow.
The Upscaler function upscales the source image with one or two upscalers in sequence.
If you want a single upscaling pass, you must manually disable the second upscaler in the Upscaler section of the workflow. This approach will be simplified in future versions of the workflow.
Some people prefer to use the Ultimate SD Upscale node for image upscaling. AP Workflow 6.0 doesn’t support it, but it’s very easy to drop it in the workflow and use it instead of the provided upscalers.
AP Workflow 6.0 includes the following auxiliary functions:
The Prompt Enricher function will enrich the user positive prompt with additional text generated by a large language model.
AP Workflow 6.0 introduces the possibility to use either OpenAI models (GPT-3.5-Turbo, GPT-4, or GPT-4-Turbo) or open access models (e.g., LLaMA, Mistral, etc.) installed locally.
The use of OpenAI models requires an OpenAI API key. To setup your OpenAI API key, follow these instructions: https://github.com/omar92/ComfyUI-QualityOfLifeSuit_Omar92#chatgpt-simple.
You will be charged every time the Prompt Enricher function is enabled and a new queue is processed.
The use of local open access models requires the separate installation of an AI system like LM Studio or Oobabooga WebUI.
Alessandro highly recommends the use of LM Studio and AP Workflow 6.0 is configured to use it by default.
Additional details are provided in the Setup for prompt enrichment with LM Studio section of this document.
The Prompt Enricher function features two example system prompts that you can use to enrich your ComfyUI positive prompts: a generic one, and one focused on photographic image generation. Multiple switches are in place to choose the preferred system prompt and the preferred AI system to process system prompt and user prompt.
WARNING: When you use the Prompt Enricher, even if the user prompt doesn’t change, even if you use the I2I function instead of the T2I function, or even if you use LM Studio instead of ChatGPT, the Prompt Enrichment (ChatGPT) node will make an API call, resulting in a charge. Because of this, the Prompt Enricher function is disabled by default. If you are concerned about being charged, either manually mute the node with CTRL+M every time, delete the node, or remove your API key from the configuration file.
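For illustration, the request the Prompt Enricher sends is an OpenAI-style chat completion, where a system prompt tells the LLM how to expand the user prompt. The sketch below only builds the request payload; the system prompt text and model name are illustrative, not the workflow’s exact ones. The same payload shape works against LM Studio’s OpenAI-compatible Local Inference Server.

```python
# Sketch of the chat-completion payload a prompt-enrichment call uses.
# The system prompt wording and default model name are illustrative
# assumptions, not the workflow's actual configuration.
def build_enrichment_request(user_prompt, model="gpt-3.5-turbo"):
    """Build an OpenAI-style chat request that asks an LLM to enrich a prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Expand this Stable Diffusion prompt with rich visual detail."},
            {"role": "user", "content": user_prompt},
        ],
    }

# The resulting dict can be POSTed to an OpenAI-compatible
# /v1/chat/completions endpoint, local or remote.
```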
The XY Plot function generates a series of images by permuting different generation parameters, according to the configuration you defined.
For maximum flexibility, AP Workflow 6.0 features a Manual XY Entry node that can be configured according to the values displayed in the Manual XY Entry Info node. This node is disconnected by default.
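Conceptually, the XY Plot function enumerates every combination of the two configured parameter lists and queues one image per combination. A minimal sketch (the parameter values are illustrative):

```python
# Sketch of the permutation behind an XY plot: every X value is
# paired with every Y value, one image per (x, y) combination.
from itertools import product

def xy_grid(x_values, y_values):
    """Return (x, y) pairs in row-major order, as an XY plot enumerates them."""
    return list(product(x_values, y_values))

# Example: two CFG values on X and two step counts on Y
# produce a grid of four combinations.
combos = xy_grid([6.0, 7.5], [20, 30])
```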
ControlNet, Control-LoRAs, and LoRAs
You can condition the image generation performed with the T2I function thanks to a number of ControlNet/ControlLoRA preprocessors. Some of these preprocessors are compatible with SDXL models while others are compatible with SD 1.5 models.
AP Workflow 6.0 supports the configuration of up to six concurrent ControlNet preprocessors, but you can easily add more ControlNet stack nodes if you wish.
Each ControlNet preprocessor must be defined and manually activated in the ControlNet + Control-LoRAs section of the workflow.
You can further condition the image generation performed with the T2I function thanks to a number of LoRAs.
Each LoRA must be activated in the VAE + LoRAs section of the workflow.
AP Workflow 6.0 includes the following experimental functions:
Free Lunch (v1 and v2)
AI researchers have discovered an optimization for Stable Diffusion models that improves the quality of the generated images. The optimization, named “Free Lunch”, is implemented in a node called FreeUv1.
AI researchers have further refined the Free Lunch technique, leading to the availability of a FreeUv2 node.
For more information, read:
You can enable either the FreeUv1 or the FreeUv2 node. They are located just above the Prompt Encoder function.
Notice that the FreeU experimental nodes are not optimized for MPS and DirectML devices. On these systems, the nodes force the use of the CPU rather than the GPU, slowing down the image generation process. Hence, these nodes are bypassed by default.
To enable either of them, you must un-bypass it manually with CTRL+B.
Choose Image to Proceed
AP Workflow 6.0 allows you to generate a batch of images with the T2I function and then pick the favorite image to continue the execution of the workflow with Level 1 and Level 2 Image Manipulators.
The node is bypassed by default. To enable it, you must un-bypass it manually with CTRL+B.
The node is located just above the SDXL Refiner function.
Universal Negative Prompt
The Reddit user /u/AI_Characters has identified a “Universal Negative Prompt”, which tends to improve the quality of most images with a single subject/concept.
For more information, read:
In Alessandro’s tests, it works very well, but it makes it very difficult to generate images with multiple subjects/concepts. Hence, it’s disabled by default.
To enable it, you must:
- Change input to 5 in the Negative String switch in the T2I function.
- Change input to 2 in the Negative Prompt switch in the Universal Negative Prompt function.
AP Workflow 6.0 includes the following debug functions:
Almost all images generated or manipulated with AP Workflow 6.0 include metadata about the prompts and the generation parameters used by ComfyUI. That metadata should be readable by A1111 WebUI, Vladmandic SD.Next, SD Prompt Reader, and other applications.
The only images that don’t carry metadata are the ones manipulated with the Object Swapper or the Face Swapper functions.
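If you want to check from a script whether a given output image carries generation metadata, a minimal sketch follows. The metadata key names are the ones ComfyUI and compatible readers commonly use, and may vary.

```python
# Sketch: detect whether a PNG carries generation metadata.
# ComfyUI typically writes PNG text chunks under "prompt" and
# "workflow"; "parameters" is the A1111-style key. These key names
# are assumptions based on common tooling, not a guarantee.
from PIL import Image

def has_generation_metadata(png_path):
    """Return True if the PNG carries any known generation-metadata key."""
    info = Image.open(png_path).info
    return any(key in info for key in ("prompt", "workflow", "parameters"))
```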
Prompt and Parameters Print
Both the positive and negative prompts, together with a number of parameters, are printed in the terminal to provide more information about the ongoing image generation.
You can also write a note about the next generation. The note will be printed in the terminal as soon as the ComfyUI queue is processed.
This information can be further saved in a file by adding the appropriate node, if you wish.
Universal Negative Prompt Print
A string printed in the terminal will inform you if the Universal Negative Prompt optimization is used or not.
For more advanced debugging, it’s recommended that you send the output of the node you want to debug to a Beautify node, available as part of the 0246 custom node suite. No Beautify node is present in AP Workflow. You have to manually add it where relevant.
Setup for image generation with Stable Diffusion 1.5
AP Workflow 6.0 is not designed to let you switch quickly between SDXL and Stable Diffusion 1.5 models. By default, the workflow is configured to use SDXL models. To switch to SD 1.5 models, you must follow these steps:
- In the Functions section of the workflow, enable SDXL or SD1.5 (Base / Fine-Tuned) function and disable the SDXL Refiner function.
- In the Parameters section of the workflow, change the ckpt_name to an SD1.5 model, change model_version to SDv1 512px, set refiner_start to 1, and change the aspect_ratio to 1:1.
- In the VAE+LoRAs section of the workflow, change the vae_name to vae-ft-mse-840000-ema-pruned.safetensors, or any other VAE model optimized for Stable Diffusion 1.5.
- In the Prompt Encoder section of the workflow, change the two SDXL or SD1.5 Encoding? switches to 2.
WARNING: Support for SD 1.5 models is not thoroughly tested. If the models don’t behave as expected, please report it via Reddit.
Setup for prompt enrichment with LM Studio
AP Workflow 6.0 allows you to enrich your positive prompt with additional text generated by a locally-installed open access large language model.
AP Workflow 6.0 supports this feature through the integration with LM Studio.
Any model supported by LM Studio can be used by the AP Workflow, including all models at the top of the HuggingFace LLM Leaderboard.
Guidance on how to install and configure LM Studio is beyond the scope of this document and you should refer to the product documentation for more information.
Once LM Studio is installed and configured, you must load the LLM of choice, assign to it the appropriate preset, and activate the Local Inference Server.
Alessandro usually works with LLaMA 2 fine-tuned models and the Meta AI LLaMA 2 Chat preset.
If, once you have configured everything, ComfyUI generates a timeout error while waiting for LM Studio to respond, you could try increasing the request_timeout parameter in /ComfyUI/venv/lib/python3.11/site-packages/autogen/oai/completion.py. Be sure to back up the file before attempting this change.
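Before queueing a generation, you can verify that the Local Inference Server is reachable by querying its OpenAI-compatible models endpoint. The port below is LM Studio’s default at the time of writing; check your own server settings if you changed it.

```shell
# Sketch: confirm LM Studio's Local Inference Server is up.
# Assumes the default port (1234); a JSON list of loaded models
# indicates the server is reachable.
curl http://localhost:1234/v1/models
```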
WARNING: Assigning the wrong preset to an LLM will result in the Prompt Enrichment function not working correctly.
Why is this workflow so sparse? You wasted a lot of space!
The workflow is designed to be easy to follow, not to be space-efficient. A tighter arrangement of the nodes, or the collapse of some of them, would make it hard to understand the flow of information through the pipeline.
Given the size of the workflow, it’s highly recommended that you install the ComfyUI extension called graphNavigator and you save views for the areas that you want to jump to quickly.
Here’s an example configuration:
Can’t you consolidate the configuration parameters and switches a little bit more?
This workflow could be partially simplified by using the Efficiency Loader SDXL and Efficiency KSampler SDXL custom nodes. However, doing so would hide a lot of the SDXL architecture, which Alessandro prefers to remain visible for educational purposes.
Are you trying to replicate A1111 WebUI, Vladmandic SD.Next, or Invoke AI?
Alessandro never intended to recreate those UIs in ComfyUI and has no plan to do so in the future.
While the AP Workflow enables some of the capabilities offered by those UIs, its philosophy and goals are very different. Read below.
Why are you using ComfyUI instead of easier-to-maintain solutions like A1111 WebUI, Vladmandic SD.Next, or Invoke AI?
- Alessandro wanted to learn, and help others learn, the SDXL architecture, understanding what goes where and how different building blocks influence the generation. A1111 WebUI and similar tools make that harder. With ComfyUI, you know exactly what’s happening. So AP Workflow is, first and foremost, a learning tool.
- While Alessandro considers A1111 WebUI and similar tools invaluable, and he’s grateful for their gift to the community, he finds their interfaces chaotic. He wanted to explore alternative design layouts. Many people might argue that ComfyUI is even more chaotic than A1111 WebUI, or that AP Workflow, in particular, is more chaotic than A1111 WebUI.
Ultimately, different brains process information in different ways, and some people prefer one approach over the other. Some people find node systems easier to work with than more standard UIs.
- Alessandro served in the enterprise IT industry for over two decades. ComfyUI allowed him to demonstrate how AI models paired with automation can be used to create complex image generation pipelines useful in certain industrial applications. This is not currently possible with A1111 WebUI and similar solutions.
- The most sophisticated AI systems we have today (Midjourney, ChatGPT, etc.) don’t generate images or text by simply processing the user prompt. They depend on complex pipelines and/or Mixture of Experts (MoE) which enrich the user prompt and process it in many different ways. Alessandro’s long-term goal is to use ComfyUI to create multi-modal pipelines that can produce digital outputs as good as the ones from the AI systems mentioned above, without human intervention. AP Workflow 5.0 was the first step in that direction. The goal is not attainable with A1111 WebUI and similar solutions as they are implemented today.
What else is missing? / Can you add XYZ?
Features under evaluation:
- More advanced ways to write and enrich the user prompt.
- Support for LCM-LoRA models.
- Support for Stable Diffusion Video.
I Need More Help!
The AP Workflow is provided as is, for research and education purposes only.
However, if your company wants to build commercial solutions on top of ComfyUI and you need help with this workflow, you could work with Alessandro on your specific challenge.