MultiCLIP: A framework for multimodal, multilabel, multistage classification built on advanced pretrained models such as CLIP and BLIP.
An API for automated disease detection and report generation from medical images.
Python code for vision tasks using the Microsoft Phi-3 Vision model via the Hugging Face library. It demonstrates generating textual responses from image content, integrating a vision-language model for tasks such as image analysis and descriptive text generation.
Visual Question Answering project as a part of 11-777 course requirements
A repository for the article "Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora" published in Digital Scholarship in the Humanities (2022)
A repository for the article "Corpus-based insights into multimodality and genre in primary school science diagrams" published in Visual Communication (2023)
Predicting adult-site user numbers from multimodal sources (image, text, and tags).
A Python script to automatically upload multimodal data into a repovizz repository, developed under the TELMI project at the MTG, Universitat Pompeu Fabra.
Repository for the conference article "Enhancing the AI2 Diagrams dataset using Rhetorical Structure Theory", published in the Proceedings of the 11th International Language Resources and Evaluation Conference.
Vision Language Dataset Construction Library for Remote Sensing Domain
Jittor reimplementation of DiverseSampling (MM22)
Analysing Adversarial Loss of Social GAN
[FR|EN - Trio] 2023 - 2024 Centrale Méditerranée AI Master | Multimodal transcription with text, audio, and video
[CVMI 2022] Multimodal Controller for Generative Models
An exploration of generating multi-sentence image descriptions by leveraging the latent dependencies between visual concepts in an image and their textual counterparts.
Experiments with Multi-Modal Causal Attention combined with Multi-Grouped Query Attention.
Integrating machine learning and multimodal neuroimaging to detect schizophrenia at the level of the individual.
[IN PROGRESS] Multimodal feature extraction modules to ease research and improve reproducibility.
Code and Dataset for paper "On the Role of Images for Analyzing Claims in Social Media" @2nd International Workshop on Cross-lingual Event-centric Open Analytics (CLEOPATRA) co-located with The Web Conf 2021