Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting applications such as multi-modal classification, cross-modal retrieval, and image captioning.
Updated May 7, 2023 - Python
Image to LaTeX (Seq2seq + Attention with Beam Search) - TensorFlow
PyTorch implementation of a Deep CNN Encoder + LSTM Decoder with Attention for Image to LaTeX
Ready-to-go image captioning inference: Show and Tell model compatible with TensorFlow r1.9
Code of Dense Relational Captioning
Repo for Implementing Research Papers & Projects related to Machine Learning
Image captioning improved with an attention mechanism, plus a PyQt5 application
Some interesting applications of RNNs, e.g. char-rnn (poem generation), seq2seq (machine translation), image captioning (NIC)
An implementation of the paper "Context-aware Captions from Context-agnostic Supervision"
A dockerised web-app to generate captions for uploaded images.
An app with voice-assisted image captioning and VQA for visually challenged individuals
TensorFlow 2 implementation of the Im2Latex deep learning model described in the HarvardNLP paper "What You Get Is What You See: A Visual Markup Decompiler"
Web service for an AI poet that looks at images and writes poems.
Implementation of forward and backward propagation for various basic layers. Solutions to the assignments for CS 231n (Stanford, Spring 2018): Convolutional Neural Networks for Visual Recognition.
A CNN model to predict the scene or location from any given image
Generating captions for images using a CNN and LSTM on the Flickr8K dataset. Caption generation has various practical benefits, such as aiding the visually impaired.
Generative AI Models is a comprehensive repository dedicated to the implementation of cutting-edge generative AI models using Python. It features various models, including those for image captioning and text-to-image generation, leveraging advanced architectures like Vision Transformers (ViT), GPT-2, and Stable Diffusion.
First Chinese Multi-Style Image Caption Model
LSTM and RNN implementation of an image caption generator on the COCO dataset