The code used to train and run inference with the ColPali architecture.
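For the ColPali entry above, here is a minimal retrieval sketch based on the colpali-engine package's published quickstart; the class names, the vidore/colpali-v1.2 checkpoint, and the method signatures reflect recent versions of the library and may differ in yours.

```python
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"  # pretrained checkpoint on the Hugging Face Hub

# Load the model and its processor (bfloat16 on GPU; use "cpu" if no CUDA device).
model = ColPali.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda:0"
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Toy inputs: one blank page image and one text query.
images = [Image.new("RGB", (448, 448), color="white")]
queries = ["Which section discusses document retrieval?"]

# Embed document pages and queries into multi-vector representations.
batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

# Late-interaction (MaxSim) scoring: one score per (query, page) pair.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(scores)
```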
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
Index your memes by their content and text, making them easily retrievable for your meme warfare pleasures. Find funny fast.
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
Seamlessly integrate state-of-the-art transformer models into robotics stacks
A multi-modal chatbot built on OpenAI models.
TextSnap: Demo of the Florence-2 model used in OCR tasks to extract and visualize text from images.
Vision Document Retrieval (ViDoRe): Benchmark 👀. Evaluation code for the "ColPali: Efficient Document Retrieval with Vision Language Models" paper.
A curated list of prompt/adapter learning methods for vision-language models.
Evaluating text-to-image/video/3D models with VQAScore
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.