Install ComfyUI Locally with LM Downloader
ComfyUI is a visual tool for image and video generation that lets you use and modify AI image and video models as easily as assembling building blocks. It supports popular models such as FLUX.1 Dev, Stable Diffusion 3.5, Hunyuan Video, Framepack, WAN Video, Qwen Image, and LTX-2 Video.
If you have some development experience, you can download the source code from GitHub and deploy it locally. Alternatively, you can directly download the client version from the official website, which is simpler and perfect for beginners to get started quickly.
- ComfyUI GitHub repo: https://github.com/comfyanonymous/ComfyUI
- ComfyUI site: https://comfy.org/
Note
- For Windows users: An NVIDIA GPU with updated CUDA drivers and at least 4GB VRAM is recommended.
- For macOS users: M-series chips are preferred for optimal performance.
- Ensure at least 10GB of free disk space for a smooth ComfyUI experience. For advanced usage with multiple models, 100GB+ of storage is advised. (A quick way to check both is sketched below.)
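If you want to confirm your machine meets these notes before installing, here is a minimal sketch that checks free disk space and, if PyTorch happens to be installed already, GPU VRAM. The 10GB and 4GB thresholds come from the notes above; the script itself is illustrative, not part of LM Downloader.

```python
# check_resources.py -- illustrative pre-install check, assuming Python 3
# (and optionally PyTorch) are available. Thresholds mirror the notes above.
import shutil

free_gb = shutil.disk_usage(".").free / 1024**3
print(f"Free disk space: {free_gb:.1f} GB ({'OK' if free_gb >= 10 else 'low'})")

try:
    import torch
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB "
              f"({'OK' if vram_gb >= 4 else 'below recommended'})")
    else:
        print("No CUDA GPU detected (on macOS, M-series chips use MPS instead).")
except ImportError:
    print("PyTorch not installed; skipping the GPU check.")
```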
Find ComfyUI in LM Downloader
Open LM Downloader, then click "Local Apps" in the left menu. You will see ComfyUI in the app list. Click the ComfyUI icon to go to the introduction page.
Click the Install button; the install window opens. If you already have ComfyUI installed, don't worry: this can be treated as an update to ComfyUI and won't affect the models you've previously downloaded.
Close this window after the installation is complete.
Run ComfyUI
On the application details page, click the Run button on the right to open the execution window. Upon successful launch, your browser will open automatically.
How to Choose Your "Startup Options"
If you are using Windows with an NVIDIA GPU, you will see startup options on the right side allowing you to select different execution environments. Choosing torch 2.7.1 ensures maximum stability, while other versions are optimized to unlock higher hardware performance. Users with other GPU brands or macOS will not see these options.
These options go beyond consumer-grade cards to cover the full range of NVIDIA hardware, from the classic Pascal architecture to the flagship Rubin architecture, including workstation GPUs (RTX Ada/A-series) and data-center GPUs (H/B/R-series).
🚀 ComfyUI Environment Guide: Unleashing VRAM Potential from the Architecture Level
ComfyUI performance depends on the "Trinity" of PyTorch (Backend), CUDA (Instruction Set), and GPU Architecture (Hardware Core). We have preset three environments designed to cover everything from decade-old classics to top-tier chips of the next five years.
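If you want to see which "Trinity" your current setup reports, a minimal sketch like the one below prints all three pillars. It assumes you run it inside the Python environment that ComfyUI uses:

```python
# trinity_check.py -- print the three pillars: PyTorch (backend),
# CUDA (instruction set), and GPU architecture (hardware core).
import torch

print("PyTorch :", torch.__version__)   # backend, e.g. 2.7.1
print("CUDA    :", torch.version.cuda)  # instruction set, e.g. 12.8
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU arch: sm_{major}{minor} ({torch.cuda.get_device_name(0)})")
else:
    print("GPU arch: no CUDA device visible")
```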
📊 Hardware Architecture Matching Matrix
| Environment Option | Core Architecture | Typical GPU Models | Key Advantage |
|---|---|---|---|
| torch 2.7.1 | Pascal / Turing / Ampere | GTX 10/20 series, RTX 30 series, Quadro P/RTX series, A100/A800 | Stability First: Rock-solid drivers for older architectures. |
| torch 2.8.0 | Ada Lovelace / Blackwell | RTX 40/50 series, RTX 6000 Ada, L40S, H100 / H200 / B100 | Performance Hub: Native support for FP8 inference. |
| torch 2.9.1 | Blackwell / Rubin (Next Gen) | RTX 50 series, B200, GB200, Rubin R100 / R200 | Arch Overclocking: Supports NVFP4 quantization. |
🔍 Deep Dive: Which One Should You Choose?
1️⃣ Default Environment: torch 2.7.1 (Compatibility & Stability)
- Target Architectures: `sm_60` to `sm_86`.
- Core Specs: Torch 2.7.1 + CUDA 12.8. This is currently the most mature ecosystem. Almost all custom nodes (especially those involving C++ compilation, such as face-swapping or video-streaming nodes) are developed and tested on this version.
- Best For: Legacy workstation upgrades or A100/A800 compute clusters. If your priority is "running every workflow without errors," choose this.
2️⃣ torch 2.8.0 (Ada/Blackwell Optimized)
- Target Architectures: `sm_89` (Ada) / `sm_100` (Blackwell Initial).
- Core Specs: Torch 2.8.0 + CUDA 12.8. Tailored specifically for the RTX 40 series and H100/B100 data-center cards.
- Key Improvements: Significantly optimizes FP8 (8-bit floating point) matrix operations. When running FLUX.1 (Schnell/Dev) or large-scale SD3.5 models, VRAM recycling efficiency is improved by 20%, with generation speeds noticeably faster than the default environment (see the FP8 check after this list).
- Best For: RTX 40 series owners or H100 server users who need a balance of performance and node compatibility.
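As a quick way to check whether a given environment exposes the FP8 data type these optimizations rely on, here is a minimal sketch. It only verifies that the dtype exists and round-trips values; it says nothing about kernel speed:

```python
# fp8_check.py -- test whether this PyTorch build exposes the FP8 dtype
# used by FP8-quantized checkpoints. A dtype check only, not a benchmark.
import torch

if hasattr(torch, "float8_e4m3fn"):
    x = torch.randn(2, 2)
    x_fp8 = x.to(torch.float8_e4m3fn)  # cast values down to 8-bit float
    print("FP8 dtype available:", x_fp8.dtype)
    print("Round-trip values  :", x_fp8.to(torch.float32))
else:
    print("No FP8 dtype in this build; use fp16/bf16 models instead.")
```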
3️⃣ torch 2.9.1 (Blackwell / Rubin Experimental)
- Target Architectures: `sm_120` (Blackwell) and the forward-looking `sm_130`/`sm_140` (Rubin).
- Core Specs: Torch 2.9.1 + CUDA 12.8. Cutting-edge adaptation for the RTX 50 series (5060 Ti to 5090) and NVIDIA's next-gen Rubin (R100) architecture.
- Architectural Benefits:
  - Native NVFP4 Support: Leverages Blackwell's hardware-level 4-bit floating-point acceleration to potentially double inference efficiency.
  - Deep Optimization: Reserves higher memory-exchange bandwidth for the HBM4 VRAM features of the Rubin architecture.
- Best For: Top-tier personal studios or B200/GB200 compute nodes. Choose this if you are processing 10B+ parameter models and demand "instant" generation.
💡 Tech Primer: Where does my GPU sit?
| Category | Architecture | Representative Models |
|---|---|---|
| Future / Cutting-Edge | Rubin | R100, R200 (Mainstream after 2026) |
| Current Flagship | Blackwell | RTX 50 series, B100, B200, GB200 |
| Modern Mainstream | Ada Lovelace | RTX 40 series, RTX 6000 Ada, L40S |
| Classic High-Perf | Ampere | RTX 30 series, A100, A10, A30 |
| Legacy/Basic | Turing / Pascal | RTX 20 series, GTX 10 series, P100, T4 |
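To find out where your own card sits, a small sketch like the following maps the compute capability reported by PyTorch onto the architecture families above. The lookup table is an approximation limited to the families this guide covers:

```python
# arch_lookup.py -- map compute capability (sm_XY) to an architecture name.
# The mapping is approximate and limited to the families discussed above.
import torch

ARCHES = {6: "Pascal", 7: "Turing/Volta", 8: "Ampere",
          9: "Hopper", 10: "Blackwell (data center)",
          12: "Blackwell (consumer)"}

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    # Ada Lovelace shares major version 8 with Ampere but is exactly sm_89
    arch = "Ada Lovelace" if (major, minor) == (8, 9) else ARCHES.get(major, "unknown")
    print(f"sm_{major}{minor} -> {arch}: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device detected.")
```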
⚠️ torch 2.9.1 Risks & Maintenance Tips
- Driver Requirements: If selecting "torch 2.9.1," ensure your GPU driver is updated to the latest version (v580.xx or higher), or the CUDA instruction set may fail to initialize.
- Node Breakage: Newer versions may cause errors in custom nodes that haven't been updated since before 2024. If an environment fails to load, check the `custom_nodes` directory for outdated plugins (a quick audit sketch follows this list).
- VRAM Management: Features enhanced `Async Offloading` capabilities, ideal for devices with high bandwidth or large VRAM, such as the 5060 Ti 16G or B100.
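Here is the audit sketch mentioned above: it lists each installed custom node and when it was last modified, which helps spot plugins that haven't been touched in years. The install path is an assumption; replace it with wherever LM Downloader placed ComfyUI on your machine:

```python
# scan_custom_nodes.py -- list custom nodes with their last-modified date
# to spot outdated plugins. CUSTOM_NODES is a hypothetical path; adjust it.
import os
import time

CUSTOM_NODES = os.path.expanduser("~/ComfyUI/custom_nodes")  # hypothetical

for name in sorted(os.listdir(CUSTOM_NODES)):
    path = os.path.join(CUSTOM_NODES, name)
    if os.path.isdir(path):
        mtime = time.strftime("%Y-%m-%d", time.localtime(os.path.getmtime(path)))
        print(f"{mtime}  {name}")
```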
❓ FAQ
Q: I have an RTX 5060 Ti 16G. Which should I choose?
A: For stability, go with torch 2.7.1. However, since this card uses the new Blackwell architecture, you need at least torch 2.8.0 to take full advantage of it, and FP4 (NVFP4) quantized models require torch 2.9.1. For maximum performance, try torch 2.9.1.
Q: I have an RTX 4060 Ti 16G. Which should I choose?
A: torch 2.7.1 is recommended for stability. For higher performance, torch 2.8.0 is suitable: it runs stably in most scenarios and supports FP8 quantized models. However, this GPU gains nothing from FP4 quantized models (FP4 acceleration is a Blackwell hardware feature), so their use is not advised.
Q: I have an RTX 3060 8GB. Which should I choose?
A: Stick with torch 2.7.1. While torch 2.8.0 might be faster in some cases, it could also be slower on this older architecture.
Q: Can I switch environments later?
A: Yes. Choosing one environment does not affect your ability to pick another next time. However, switching triggers a time-consuming dependency re-check, so staying with the same environment gives faster startups.
Q: Is there a difference in image quality between the three?
A: No. The environment only affects generation speed, temperature control, and VRAM usage. It does not change the mathematical logic of the final image output.
Example Workflows
If you're using ComfyUI for the first time, you'll see the simplest example provided by the official team. You can try this to quickly experience what ComfyUI can do.
While the default example gives you a basic idea, we recommend exploring other workflows and models that offer richer features and much better results.
LAN Access
If you want other computers on your local network to access this ComfyUI service, enable "Allow LAN Access". Be aware that this carries risks of data leakage and attack, so make sure your network is trusted. If you're unsure, leave this feature disabled.
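To verify from another computer on the LAN that the service is actually reachable, a minimal sketch like this one probes the port. The IP address below is a placeholder, and 8188 is ComfyUI's default port; substitute the address shown in your execution window:

```python
# lan_check.py -- run on ANOTHER machine to probe the ComfyUI port after
# enabling "Allow LAN Access". HOST is a placeholder; 8188 is the default port.
import socket

HOST = "192.168.1.50"  # placeholder: the machine running ComfyUI
PORT = 8188            # ComfyUI's default port

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(3)
    try:
        s.connect((HOST, PORT))
        print(f"ComfyUI is reachable at http://{HOST}:{PORT}")
    except OSError:
        print("Not reachable: check the LAN Access setting and firewall rules.")
```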
If you still encounter issues, please contact our technical support team at tech@daiyl.com.