xiaoyezuo/CLIPP

VLA-Nav

Embedding Vision-Language-Actions for learning robot navigation

Running Docker

Build the image with `docker build --rm -t vla-docker .` (the trailing `.` is the build context). You'll have to change the username at the bottom of the Dockerfile first. Once the image is built, run it with `./docker_run.sh`. You'll also need to change the file path after `--volume` so that it points to where your data is. You can configure which GPUs are used; pass `--gpus all` to use all of them. Once inside the container, activate the conda environment with `conda activate habitat`.

Proposed Architecture

The pretraining architecture draws inspiration from CLIP. We first embed three modalities: potential paths, images, and text. Pretraining then follows a 3D contrastive learning paradigm, computing cosine similarities across these three modalities. In this way the model captures the relationships between paths, images, and text, which improves its ability to interpret multimodal inputs.

[Figure: pretrain_architecture] 3D Contrastive Pre-training
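As a rough illustration of contrastive pretraining over three modalities, the sketch below sums CLIP-style symmetric losses over the three modality pairs (path/image, path/text, image/text). This README does not specify the exact loss, so the pairwise-sum formulation, the `temperature` value, and all function names here are assumptions for illustration only, not the repository's implementation.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def pairwise_clip_loss(a, b, temperature=0.07):
    """Symmetric CLIP-style loss for one pair of modalities (temperature is assumed)."""
    sim = cosine_matrix(a, b)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def three_way_contrastive_loss(paths, images, texts, temperature=0.07):
    """Assumed combination: sum of the three pairwise losses."""
    return (pairwise_clip_loss(paths, images, temperature)
            + pairwise_clip_loss(paths, texts, temperature)
            + pairwise_clip_loss(images, texts, temperature))

# Toy batch of two samples with 2-D embeddings (hypothetical values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = three_way_contrastive_loss(aligned, aligned, aligned)
loss_mismatched = three_way_contrastive_loss(aligned, shuffled, aligned)
print(loss_aligned < loss_mismatched)
```

In the toy batch, the loss is lower when the path, image, and text embeddings of each sample agree than when the image embeddings are swapped between samples, which is the behavior a contrastive objective is meant to reward.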

During inference, our model works zero-shot in two stages. First, it receives an image and accompanying text as input. It then identifies the most plausible path for that input pair by maximizing cosine similarity within the embedding space learned during pretraining. Because this embedding space carries the semantic information learned during pretraining, the path with the highest similarity is taken as the most probable path.

[Figure: inference_architecture] Inference Architecture for Best Path
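The selection step described above can be sketched as a cosine-similarity argmax over candidate path embeddings. How the image and text similarities are combined is not stated in this README; the sketch assumes a simple sum of the two, and `best_path` and the toy embeddings are hypothetical names and values, not part of the repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def best_path(image_emb, text_emb, path_embs):
    """Return the index of the best-scoring candidate path and all scores.

    Assumption: a path's score is its cosine similarity to the image
    embedding plus its cosine similarity to the text embedding.
    """
    scores = [cosine(p, image_emb) + cosine(p, text_emb) for p in path_embs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores

# Hypothetical embeddings, as if produced by pretrained encoders.
image_emb = [1.0, 0.0, 0.0]
text_emb = [0.0, 1.0, 0.0]
candidate_paths = [
    [1.0, 0.0, 0.0],  # agrees with the image only
    [1.0, 1.0, 0.0],  # agrees partly with both image and text
    [0.0, 0.0, 1.0],  # agrees with neither
]

best_index, scores = best_path(image_emb, text_emb, candidate_paths)
print(best_index)  # index of the selected candidate path
```

No training happens at this stage: the candidate paths are only ranked against the input pair in the pretrained embedding space, which is what makes the procedure zero-shot.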
