jaidaken/ComfyUI

jaidaken f09734b0ee

Add custom nodes, Civitai loras (LFS), and vast.ai setup script

Includes 30 custom nodes committed directly, 7 Civitai-exclusive
loras stored via Git LFS, and a setup script that installs all
dependencies and downloads HuggingFace-hosted models on vast.ai.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-09 00:56:42 +00:00


Florence2 in ComfyUI

Florence-2 is a vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Given a simple text prompt, it can perform tasks such as captioning, object detection, and segmentation. It is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to learn these tasks jointly. Its sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.
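As a rough illustration of the prompt-based approach, the sketch below maps a few tasks to the task tokens listed on the public Florence-2 model card and builds the text prompt that is paired with an image. The dictionary keys and the `build_prompt` helper are hypothetical names chosen for this example; they are not part of this repository's node code.

```python
# Task tokens as listed on the public Florence-2 model card.
# The dictionary keys are hypothetical labels chosen for this sketch.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose token is followed by free text (a phrase or expression).
TASKS_WITH_TEXT = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the text prompt for a task (hypothetical helper)."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"unknown task: {task}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text.strip():
            raise ValueError(f"task '{task}' needs accompanying text")
        # The token is followed directly by the user's text.
        return token + text.strip()
    # All other tasks use the bare token; extra text is ignored.
    return token


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("phrase_grounding", "a green car"))
```

The same image can be passed through the model several times with different prompts, which is what lets a single checkpoint cover captioning, detection, and OCR.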

New Feature: Document Visual Question Answering (DocVQA)

This fork includes support for Document Visual Question Answering (DocVQA) using the Florence-2 model. DocVQA lets you ask questions about the content of a document image, and the model answers based on the visual and textual information in the document. This feature is particularly useful for extracting information from scanned documents, forms, receipts, and other text-heavy images.
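A minimal sketch of how a DocVQA query might be turned into a model prompt is shown below. It assumes the question is appended directly to a `<DocVQA>` task token, which is the convention commonly used by DocVQA fine-tunes of Florence-2; the token and the `build_docvqa_prompt` helper are assumptions for illustration, so check the node source for the exact format this fork uses.

```python
# Assumption: DocVQA prompts are the task token followed by the question.
DOCVQA_TOKEN = "<DocVQA>"


def build_docvqa_prompt(question: str) -> str:
    """Build a DocVQA text prompt from a question (hypothetical helper)."""
    question = question.strip()
    if not question:
        raise ValueError("a DocVQA query needs a non-empty question")
    return DOCVQA_TOKEN + question


if __name__ == "__main__":
    # Example question about a scanned receipt.
    print(build_docvqa_prompt("What is the total amount due?"))
```

Rejecting empty questions early avoids sending the model a bare task token, which would give it nothing to answer.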