Model Showcase: Explore Hugging Face Models

Salesforce/blip-image-captioning-large

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

microsoft/trocr-base-handwritten

TrOCR (base-sized model, fine-tuned on IAM)

xtuner/llava-phi-3-mini-gguf

llava-phi-3-mini is a LLaVA model that XTuner fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 on the ShareGPT4V-PT and InternVL-SFT datasets.

nlpconnect/vit-gpt2-image-captioning

The Illustrated Image Captioning using transformers

microsoft/trocr-large-printed

TrOCR (large-sized model, fine-tuned on SROIE)

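The captioning and OCR checkpoints listed above can usually be tried through the Transformers `pipeline` API. The following is a minimal sketch, not an official recipe: it assumes a `transformers` release that still ships the `image-to-text` pipeline, and the helper names `IMAGE_TO_TEXT_MODELS`, `build_image_to_text`, and `first_text`, as well as the file name `photo.jpg`, are hypothetical and introduced only for illustration.

```python
# Showcased checkpoints that load as regular Transformers vision-to-text models.
# The "captioning" / "ocr" labels are this sketch's own grouping (an assumption).
# xtuner/llava-phi-3-mini-gguf is left out on purpose: it ships GGUF weights,
# which are typically run with llama.cpp-style runtimes instead.
IMAGE_TO_TEXT_MODELS = {
    "Salesforce/blip-image-captioning-large": "captioning",
    "nlpconnect/vit-gpt2-image-captioning": "captioning",
    "microsoft/trocr-base-handwritten": "ocr",
    "microsoft/trocr-large-printed": "ocr",
}


def build_image_to_text(model_id: str):
    """Return an image-to-text pipeline for one of the showcased models."""
    if model_id not in IMAGE_TO_TEXT_MODELS:
        raise ValueError(f"not a showcased image-to-text model: {model_id}")
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    return pipeline("image-to-text", model=model_id)


def first_text(outputs) -> str:
    """Pull the generated string out of the pipeline's list-of-dicts result."""
    return outputs[0]["generated_text"].strip()


# Example usage (needs network access and a local image; photo.jpg is hypothetical):
# captioner = build_image_to_text("Salesforce/blip-image-captioning-large")
# print(first_text(captioner("photo.jpg")))
```

The same call shape works for the TrOCR entries, although those models expect a cropped image of a single text line rather than a full page.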