Deeplearning utils for multimodal research
Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning
Project to transform a natural language description into an image using Generative Adversarial Networks.
Accepted at The Web Conference 2024.
Learning a common representation space from speech and text for cross-modal retrieval given textual queries and speech files.
This code is part of the paper: "A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation" published at ACM ICMI 2022.
Semi-Supervised Learning (SSL)
Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.
API for automated disease detection and report generation from medical images.
Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.
Adding Bottlenecked Fusion to [ACL'19] Multimodal Transformer
Visual Question Answering (VQA) Model
Demo for Binding Text, Images, Graphs, and Audio for Music Representation Learning
Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.
Utilizing a multimodal architecture to predict the appropriate speaker turn in a dialogue.
Classifying multimodal health data with LSTMs
A Discord chatbot built on the Mistral and LLaVA models.
Repository for context-based emotion recognition.