
New submissions for Fri, 12 Aug 22 #215

zhuhu00 opened this issue Aug 12, 2022 · 0 comments

zhuhu00 commented Aug 12, 2022

New submissions for Fri, 12 Aug 22

Keyword: SLAM

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

  • Authors: Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.05963
  • Pdf link: https://arxiv.org/pdf/2208.05963
  • Abstract
    We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. This task is a core component of classic geometric pipelines such as SfM and SLAM, and also serves as a vital pre-processing requirement for contemporary neural approaches (e.g. NeRF) to object reconstruction and view synthesis. In contrast to existing correspondence-driven methods that do not perform well given sparse views, we propose a top-down prediction based approach for estimating camera viewpoints. Our key technical insight is the use of an energy-based formulation for representing distributions over relative camera rotations, thus allowing us to explicitly represent multiple camera modes arising from object symmetries or views. Leveraging these relative predictions, we jointly estimate a consistent set of camera rotations from multiple images. We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories. Further, our probabilistic approach significantly outperforms directly regressing relative poses, suggesting that modeling multimodality is important for coherent joint reconstruction. We demonstrate that our system can be a stepping stone toward in-the-wild reconstruction from multi-view datasets. The project page with code and videos can be found at https://jasonyzhang.com/relpose.
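
To make the energy-based formulation a little more concrete: below is a minimal, hypothetical sketch (not the authors' code) that scores a finite set of candidate relative rotations with a small network and turns the energies into a distribution with a softmax. The `EnergyNet` architecture and the candidate-sampling step are illustrative assumptions only.

```python
# Minimal sketch (not the RelPose implementation): score candidate relative
# rotations with an energy network and normalize over the candidate set.
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Toy energy model: maps two image features plus a candidate rotation
    (flattened 3x3 matrix) to a scalar energy."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 9, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, f1, f2, R):
        # f1, f2: (N, feat_dim); R: (N, 3, 3)
        x = torch.cat([f1, f2, R.flatten(1)], dim=-1)
        return self.mlp(x).squeeze(-1)  # (N,) unnormalized energies

def rotation_distribution(energy_net, f1, f2, candidates):
    """Approximate the distribution over relative rotations by evaluating
    energies on a finite candidate set and applying a softmax."""
    n = candidates.shape[0]
    energies = energy_net(f1.expand(n, -1), f2.expand(n, -1), candidates)
    return torch.softmax(-energies, dim=0)  # lower energy -> higher probability

if __name__ == "__main__":
    net = EnergyNet()
    f1, f2 = torch.randn(1, 128), torch.randn(1, 128)
    # Random orthonormal candidates via QR (may include reflections; fine for a demo).
    candidates, _ = torch.linalg.qr(torch.randn(64, 3, 3))
    probs = rotation_distribution(net, f1, f2, candidates)
    print(probs.sum())  # ~1.0
```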

Keyword: odometry

There is no result

Keyword: livox

There is no result

Keyword: loam

There is no result

Keyword: lidar

Predictive Angular Potential Field-based Obstacle Avoidance for Dynamic UAV Flights

  • Authors: Daniel Schleich, Sven Behnke
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.05873
  • Pdf link: https://arxiv.org/pdf/2208.05873
  • Abstract
    In recent years, unmanned aerial vehicles (UAVs) are used for numerous inspection and video capture tasks. Manually controlling UAVs in the vicinity of obstacles is challenging, however, and poses a high risk of collisions. Even for autonomous flight, global navigation planning might be too slow to react to newly perceived obstacles. Disturbances such as wind might lead to deviations from the planned trajectories. In this work, we present a fast predictive obstacle avoidance method that does not depend on higher-level localization or mapping and maintains the dynamic flight capabilities of UAVs. It directly operates on LiDAR range images in real time and adjusts the current flight direction by computing angular potential fields within the range image. The velocity magnitude is subsequently determined based on a trajectory prediction and time-to-contact estimation. Our method is evaluated using Hardware-in-the-Loop simulations. It keeps the UAV at a safe distance to obstacles, while allowing higher flight velocities than previous reactive obstacle avoidance methods that directly operate on sensor data.
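
As a rough illustration of the angular-potential-field idea (this is not the paper's method; the gains, the cosine weighting, and the 1D scan format are all assumptions), the sketch below picks a flight direction from a single horizontal LiDAR scan by trading off attraction toward a goal bearing against repulsion from close returns.

```python
# Illustrative sketch only: direction selection from a 1D LiDAR scan by
# combining goal attraction with repulsion from returns closer than a
# fixed influence distance.
import numpy as np

def angular_potential_direction(ranges, angles, goal_angle,
                                influence_dist=5.0, k_rep=1.0, k_att=1.0):
    """ranges, angles: 1D arrays over azimuth (rad). Returns the candidate
    angle with the lowest combined potential."""
    potentials = np.zeros_like(angles)
    close = ranges < influence_dist
    for i, a in enumerate(angles):
        # Attraction grows with the (wrapped) angular distance to the goal bearing.
        att = k_att * np.abs(np.angle(np.exp(1j * (a - goal_angle))))
        # Repulsion from each close return, weighted by its angular proximity
        # to the candidate direction (cos^2 falls to zero at +/-90 degrees).
        w = np.cos(np.clip(angles[close] - a, -np.pi / 2, np.pi / 2)) ** 2
        rep = k_rep * np.sum(w * (1.0 / ranges[close] - 1.0 / influence_dist))
        potentials[i] = att + rep
    return angles[np.argmin(potentials)]

if __name__ == "__main__":
    angles = np.linspace(-np.pi, np.pi, 360, endpoint=False)
    ranges = np.full_like(angles, 20.0)
    ranges[(angles > -0.2) & (angles < 0.2)] = 2.0  # obstacle straight ahead
    print(angular_potential_direction(ranges, angles, goal_angle=0.0))  # steers away from 0
```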

Barrier Certificate based Safe Control for LiDAR-based Systems under Sensor Faults and Attacks

  • Authors: Hongchao Zhang, Shiyu Cheng, Luyao Niu, Andrew Clark
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2208.05944
  • Pdf link: https://arxiv.org/pdf/2208.05944
  • Abstract
    Autonomous Cyber-Physical Systems (CPS) fuse proprioceptive sensors such as GPS and exteroceptive sensors including Light Detection and Ranging (LiDAR) and cameras for state estimation and environmental observation. It has been shown that both types of sensors can be compromised by malicious attacks, leading to unacceptable safety violations. We study the problem of safety-critical control of a LiDAR-based system under sensor faults and attacks. We propose a framework consisting of fault tolerant estimation and fault tolerant control. The former reconstructs a LiDAR scan with state estimations, and excludes the possible faulty estimations that are not aligned with LiDAR measurements. We also verify the correctness of LiDAR scans by comparing them with the reconstructed ones and removing the possibly compromised sector in the scan. Fault tolerant control computes a control signal with the remaining estimations at each time step. We prove that the synthesized control input guarantees system safety using control barrier certificates. We validate our proposed framework using a UAV delivery system in an urban environment. We show that our proposed approach guarantees safety for the UAV whereas a baseline fails.
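
The control-barrier-certificate idea can be illustrated with a toy 1D example, far simpler than the paper's fault-tolerant LiDAR pipeline: for a single integrator approaching an obstacle, the barrier condition reduces to clipping the nominal velocity command.

```python
# Toy sketch (not the paper's framework): a scalar control barrier function
# for a 1D single integrator. With h(x) = x_obs - x - d_safe and dx/dt = u,
# the CBF condition dh/dt + alpha*h >= 0 becomes u <= alpha*h, so the safety
# filter simply clips the nominal command.

def cbf_filter(x, u_nominal, x_obs, d_safe=1.0, alpha=2.0):
    h = x_obs - x - d_safe   # barrier value (remaining safety margin)
    return min(u_nominal, alpha * h)

if __name__ == "__main__":
    x, dt = 0.0, 0.05
    for _ in range(200):
        x += dt * cbf_filter(x, u_nominal=1.0, x_obs=5.0)
    print(x)  # settles near 4.0, i.e. d_safe short of the obstacle
```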

Keyword: loop detection

There is no result

Keyword: nerf

FDNeRF: Few-shot Dynamic Neural Radiance Fields for Face Reconstruction and Expression Editing

  • Authors: Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.05751
  • Pdf link: https://arxiv.org/pdf/2208.05751
  • Abstract
    We propose a Few-shot Dynamic Neural Radiance Field (FDNeRF), the first NeRF-based method capable of reconstruction and expression editing of 3D faces based on a small number of dynamic images. Unlike existing dynamic NeRFs that require dense images as input and can only be modeled for a single identity, our method enables face reconstruction across different persons with few-shot inputs. Compared to state-of-the-art few-shot NeRFs designed for modeling static scenes, the proposed FDNeRF accepts view-inconsistent dynamic inputs and supports arbitrary facial expression editing, i.e., producing faces with novel expressions beyond the input ones. To handle the inconsistencies between dynamic inputs, we introduce a well-designed conditional feature warping (CFW) module to perform expression conditioned warping in 2D feature space, which is also identity adaptive and 3D constrained. As a result, features of different expressions are transformed into the target ones. We then construct a radiance field based on these view-consistent features and use volumetric rendering to synthesize novel views of the modeled faces. Extensive experiments with quantitative and qualitative evaluation demonstrate that our method outperforms existing dynamic and few-shot NeRFs on both 3D face reconstruction and expression editing tasks. Our code and model will be available upon acceptance.

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

  • Authors: Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.05963
  • Pdf link: https://arxiv.org/pdf/2208.05963
  • Abstract
    We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. This task is a core component of classic geometric pipelines such as SfM and SLAM, and also serves as a vital pre-processing requirement for contemporary neural approaches (e.g. NeRF) to object reconstruction and view synthesis. In contrast to existing correspondence-driven methods that do not perform well given sparse views, we propose a top-down prediction based approach for estimating camera viewpoints. Our key technical insight is the use of an energy-based formulation for representing distributions over relative camera rotations, thus allowing us to explicitly represent multiple camera modes arising from object symmetries or views. Leveraging these relative predictions, we jointly estimate a consistent set of camera rotations from multiple images. We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories. Further, our probabilistic approach significantly outperforms directly regressing relative poses, suggesting that modeling multimodality is important for coherent joint reconstruction. We demonstrate that our system can be a stepping stone toward in-the-wild reconstruction from multi-view datasets. The project page with code and videos can be found at https://jasonyzhang.com/relpose.

Keyword: mapping

ROC: A New Paradigm for Lyric-to-Melody Generation

  • Authors: Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan
  • Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2208.05697
  • Pdf link: https://arxiv.org/pdf/2208.05697
  • Abstract
    Lyric-to-melody generation is an important task in songwriting, and is also quite challenging due to its distinctive characteristics: the generated melodies should not only follow good musical patterns, but also align with features in lyrics such as rhythms and structures. These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, due to several issues: (1) lack of aligned lyric-melody training data to sufficiently learn lyric-melody feature alignment; (2) lack of controllability in generation to explicitly guarantee the lyric-melody feature alignment. In this paper, we propose ROC, a new paradigm for lyric-to-melody generation that addresses the above issues through a generation-retrieval pipeline. Specifically, our paradigm has two stages: (1) creation stage, where a huge amount of music pieces are generated by a neural-based melody language model and indexed in a database through several key features (e.g., chords, tonality, rhythm, and structural information including chorus or verse); (2) re-creation stage, where melodies are recreated by retrieving music pieces from the database according to the key features from lyrics and concatenating best music pieces based on composition guidelines and melody language model scores. Our ROC paradigm has several advantages: (1) It only needs unpaired melody data to train melody language model, instead of paired lyric-melody data in previous models. (2) It achieves good lyric-melody feature alignment in lyric-to-melody generation. Experiments on English and Chinese datasets demonstrate that ROC outperforms previous neural based lyric-to-melody generation models on both objective and subjective metrics.
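
The re-creation stage is essentially feature-keyed retrieval. A toy schematic (not the ROC system; the feature set and data structures are invented for illustration) might index melody fragments by a few key features and concatenate the retrieved pieces:

```python
# Toy schematic of feature-keyed retrieval and concatenation; the real ROC
# pipeline additionally ranks pieces with a melody language model and
# composition guidelines, which this sketch omits.
from collections import defaultdict

index = defaultdict(list)

def add_fragment(fragment, chord, tonality, rhythm):
    index[(chord, tonality, rhythm)].append(fragment)

def retrieve(chord, tonality, rhythm):
    return index.get((chord, tonality, rhythm), [])

if __name__ == "__main__":
    add_fragment(["C4", "E4", "G4"], chord="C", tonality="major", rhythm="4/4")
    add_fragment(["D4", "F4", "A4"], chord="Dm", tonality="major", rhythm="4/4")
    melody = retrieve("C", "major", "4/4") + retrieve("Dm", "major", "4/4")
    print(sum(melody, []))  # concatenated note sequence
```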

Learning Based Joint Coding-Modulation for Digital Semantic Communication Systems

  • Authors: Yufei Bo, Yiheng Duan, Shuo Shao, Meixia Tao
  • Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.05704
  • Pdf link: https://arxiv.org/pdf/2208.05704
  • Abstract
    In learning-based semantic communications, neural networks have replaced different building blocks in traditional communication systems. However, the digital modulation still remains a challenge for neural networks. The intrinsic mechanism of neural network based digital modulation is mapping continuous output of the neural network encoder into discrete constellation symbols, which is a non-differentiable function that cannot be trained with existing gradient descend algorithms. To overcome this challenge, in this paper we develop a joint coding-modulation scheme for digital semantic communications with BPSK modulation. In our method, the neural network outputs the likelihood of each constellation point, instead of having a concrete mapping. A random code rather than a deterministic code is hence used, which preserves more information for the symbols with a close likelihood on each constellation point. The joint coding-modulation design can match the modulation process with channel states, and hence improve the performance of digital semantic communications. Experiment results show that our method outperforms existing digital modulation methods in semantic communications over a wide range of SNR, and outperforms neural network based analog modulation method in low SNR regime.
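
A hedged sketch of the "likelihood over constellation points" idea (the abstract does not spell out the training procedure, so the straight-through Gumbel-softmax sampling below is an assumption): the encoder outputs per-symbol probabilities over {-1, +1} and samples BPSK symbols in a way that stays differentiable.

```python
# Illustrative sketch: probabilistic BPSK mapping trained end-to-end by
# sampling symbols with a straight-through Gumbel-softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticBPSKEncoder(nn.Module):
    def __init__(self, in_dim=32, n_symbols=64):
        super().__init__()
        self.net = nn.Linear(in_dim, n_symbols * 2)   # logits over {-1, +1} per symbol
        self.register_buffer("constellation", torch.tensor([-1.0, 1.0]))
        self.n_symbols = n_symbols

    def forward(self, x, tau=1.0):
        logits = self.net(x).view(-1, self.n_symbols, 2)
        if self.training:
            onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # differentiable sampling
        else:
            onehot = F.one_hot(logits.argmax(-1), 2).float()       # hard decision
        return onehot @ self.constellation  # (batch, n_symbols) BPSK symbols

if __name__ == "__main__":
    enc = ProbabilisticBPSKEncoder()
    symbols = enc(torch.randn(4, 32))
    print(symbols.shape, symbols.unique())  # values drawn from {-1, +1}
```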

Predictive Angular Potential Field-based Obstacle Avoidance for Dynamic UAV Flights

  • Authors: Daniel Schleich, Sven Behnke
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.05873
  • Pdf link: https://arxiv.org/pdf/2208.05873
  • Abstract
    In recent years, unmanned aerial vehicles (UAVs) are used for numerous inspection and video capture tasks. Manually controlling UAVs in the vicinity of obstacles is challenging, however, and poses a high risk of collisions. Even for autonomous flight, global navigation planning might be too slow to react to newly perceived obstacles. Disturbances such as wind might lead to deviations from the planned trajectories. In this work, we present a fast predictive obstacle avoidance method that does not depend on higher-level localization or mapping and maintains the dynamic flight capabilities of UAVs. It directly operates on LiDAR range images in real time and adjusts the current flight direction by computing angular potential fields within the range image. The velocity magnitude is subsequently determined based on a trajectory prediction and time-to-contact estimation. Our method is evaluated using Hardware-in-the-Loop simulations. It keeps the UAV at a safe distance to obstacles, while allowing higher flight velocities than previous reactive obstacle avoidance methods that directly operate on sensor data.

Keyword: localization

Towards Automated Key-Point Detection in Images with Partial Pool View

  • Authors: T. J. Woinoski, I. V. Bajic
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.05641
  • Pdf link: https://arxiv.org/pdf/2208.05641
  • Abstract
    Sports analytics has been an up-and-coming field of research among professional sporting organizations and academic institutions alike. With the insurgence and collection of athlete data, the primary goal of such analysis is to improve athletes' performance in a measurable and quantifiable manner. This work is aimed at alleviating some of the challenges encountered in the collection of adequate swimming data. Past works on this subject have shown that the detection and tracking of swimmers is feasible, but not without challenges. Among these challenges are pool localization and determining the relative positions of the swimmers relative to the pool. This work presents two contributions towards solving these challenges. First, we present a pool model with invariant key-points relevant for swimming analytics. Second, we study the detectability of such key-points in images with partial pool view, which are challenging but also quite common in swimming race videos.

Predictive Angular Potential Field-based Obstacle Avoidance for Dynamic UAV Flights

  • Authors: Daniel Schleich, Sven Behnke
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.05873
  • Pdf link: https://arxiv.org/pdf/2208.05873
  • Abstract
    In recent years, unmanned aerial vehicles (UAVs) are used for numerous inspection and video capture tasks. Manually controlling UAVs in the vicinity of obstacles is challenging, however, and poses a high risk of collisions. Even for autonomous flight, global navigation planning might be too slow to react to newly perceived obstacles. Disturbances such as wind might lead to deviations from the planned trajectories. In this work, we present a fast predictive obstacle avoidance method that does not depend on higher-level localization or mapping and maintains the dynamic flight capabilities of UAVs. It directly operates on LiDAR range images in real time and adjusts the current flight direction by computing angular potential fields within the range image. The velocity magnitude is subsequently determined based on a trajectory prediction and time-to-contact estimation. Our method is evaluated using Hardware-in-the-Loop simulations. It keeps the UAV at a safe distance to obstacles, while allowing higher flight velocities than previous reactive obstacle avoidance methods that directly operate on sensor data.

Keyword: transformer

ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design

  • Authors: Xujie Zhang, Yu Sha, Michael C. Kampffmeyer, Zhenyu Xie, Zequn Jie, Chengwen Huang, Jianqing Peng, Xiaodan Liang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.05621
  • Pdf link: https://arxiv.org/pdf/2208.05621
  • Abstract
    Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain due to the vast untapped potential of incorporating multiple modalities and the wide range of fashion image applications. To facilitate accurate generation, cross-modal synthesis methods typically rely on Contrastive Language-Image Pre-training (CLIP) to align textual and garment information. In this work, we argue that simply aligning texture and garment information is not sufficient to capture the semantics of the visual information and therefore propose MaskCLIP. MaskCLIP decomposes the garments into semantic parts, ensuring fine-grained and semantically accurate alignment between the visual and text information. Building on MaskCLIP, we propose ARMANI, a unified cross-modal fashion designer with part-level garment-text alignment. ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image given the tokens of the control signals in its second stage. Contrary to prior approaches that also rely on two-stage paradigms, ARMANI introduces textual tokens into the codebook, making it possible for the model to utilize fine-grain semantic information to generate more realistic images. Further, by introducing a cross-modal Transformer, ARMANI is versatile and can accomplish image synthesis from various control signals, such as pure text, sketch images, and partial images. Extensive experiments conducted on our newly collected cross-modal fashion dataset demonstrate that ARMANI generates photo-realistic images in diverse synthesis tasks and outperforms existing state-of-the-art cross-modal image synthesis approaches. Our code is available at https://github.com/Harvey594/ARMANI.

Solving Math Word Problems Automatically with Heterogeneous Line Graph Transformer for Online Learning

  • Authors: Zijian Hu, Meng Jiang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2208.05645
  • Pdf link: https://arxiv.org/pdf/2208.05645
  • Abstract
    This paper describes the design and implementation of a new machine learning model for online learning systems. We aim at improving the intelligence level of the systems by enabling an automated math word problem solver which can support a wide range of functions such as homework correction, difficulty estimation, and priority recommendation. We originally planned to employ existing models but realized that they processed a math word problem as a sequence or a homogeneous graph of tokens. Relationships between the multiple types of tokens such as entity, unit, rate, and number were ignored. We decided to design and implement a novel model to use such relational data to bridge the information gap between human-readable language and machine-understandable logical form. We propose a heterogeneous line graph transformer (HLGT) model that constructs a heterogeneous line graph via semantic role labeling on math word problems and then performs node representation learning aware of edge types. We add numerical comparison as an auxiliary task to improve model training for real-world use. Experimental results show that the proposed model achieves better performance than existing models and suggest that it is still far below human performance. Information utilization and knowledge discovery are continuously needed to improve the online learning systems.

Semi-supervised Vision Transformers at Scale

  • Authors: Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.05688
  • Pdf link: https://arxiv.org/pdf/2208.05688
  • Abstract
    We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracies. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels.
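
The EMA-Teacher update mentioned in the abstract is a standard construction; a minimal sketch (details such as the decay value are assumptions):

```python
# Minimal sketch of an exponential-moving-average (EMA) teacher update for
# teacher-student semi-supervised training.
import copy
import torch

@torch.no_grad()
def update_ema_teacher(student, teacher, decay=0.999):
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

if __name__ == "__main__":
    student = torch.nn.Linear(8, 2)
    teacher = copy.deepcopy(student)   # teacher starts as a copy of the student
    # ...after each optimizer step on the student:
    update_ema_teacher(student, teacher, decay=0.999)
```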

Learning Point Processes using Recurrent Graph Network

  • Authors: Saurabh Dash, Xueyuan She, Saibal Mukhopadhyay
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2208.05736
  • Pdf link: https://arxiv.org/pdf/2208.05736
  • Abstract
    We present a novel Recurrent Graph Network (RGN) approach for predicting discrete marked event sequences by learning the underlying complex stochastic process. Using the framework of Point Processes, we interpret a marked discrete event sequence as the superposition of different sequences each of a unique type. The nodes of the Graph Network use LSTM to incorporate past information whereas a Graph Attention Network (GAT Network) introduces strong inductive biases to capture the interaction between these different types of events. By changing the self-attention mechanism from attending over past events to attending over event types, we obtain a reduction in time and space complexity from $\mathcal{O}(N^2)$ (total number of events) to $\mathcal{O}(|\mathcal{Y}|^2)$ (number of event types). Experiments show that the proposed approach improves performance in log-likelihood, prediction and goodness-of-fit tasks with lower time and space complexity compared to state-of-the-art Transformer based architectures.

HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding

  • Authors: Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Wenqiao Zhang, Jiaxu Miao, Shiliang Pu, Fei Wu
  • Subjects: Multimedia (cs.MM)
  • Arxiv link: https://arxiv.org/abs/2208.05818
  • Pdf link: https://arxiv.org/pdf/2208.05818
  • Abstract
    Video Object Grounding (VOG) is the problem of associating spatial object regions in the video to a descriptive natural language query. This is a challenging vision-language task that necessitates constructing the correct cross-modal correspondence and modeling the appropriate spatio-temporal context of the query video and caption, thereby localizing the specific objects accurately. In this paper, we tackle this task by a novel framework called HiErarchical spatio-tempoRal reasOning (HERO) with contrastive action correspondence. We study the VOG task at two aspects that prior works overlooked: (1) Contrastive Action Correspondence-aware Retrieval. Notice that the fine-grained video semantics (e.g., multiple actions) is not totally aligned with the annotated language query (e.g., single action), we first introduce the weakly-supervised contrastive learning that classifies the video as action-consistent and action-independent frames relying on the video-caption action semantic correspondence. Such a design can build the fine-grained cross-modal correspondence for more accurate subsequent VOG. (2) Hierarchical Spatio-temporal Modeling Improvement. While transformer-based VOG models present their potential in sequential modality (i.e., video and caption) modeling, existing evidence also indicates that the transformer suffers from the issue of the insensitive spatio-temporal locality. Motivated by that, we carefully design the hierarchical reasoning layers to decouple fully connected multi-head attention and remove the redundant interfering correlations. Furthermore, our proposed pyramid and shifted alignment mechanisms are effective to improve the cross-modal information utilization of neighborhood spatial regions and temporal frames. We conducted extensive experiments to show our HERO outperforms existing techniques by achieving significant improvement on two benchmark datasets.

Hybrid Transformer Network for Deepfake Detection

  • Authors: Sohail Ahmed Khan, Duc-Tien Dang-Nguyen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.05820
  • Pdf link: https://arxiv.org/pdf/2208.05820
  • Abstract
    Deepfake media is becoming widespread nowadays because of the easily available tools and mobile apps which can generate realistic looking deepfake videos/images without requiring any technical knowledge. With further advances in this field of technology in the near future, the quantity and quality of deepfake media is also expected to flourish, while making deepfake media a likely new practical tool to spread mis/disinformation. Because of these concerns, the deepfake media detection tools are becoming a necessity. In this study, we propose a novel hybrid transformer network utilizing early feature fusion strategy for deepfake video detection. Our model employs two different CNN networks, i.e., (1) XceptionNet and (2) EfficientNet-B4 as feature extractors. We train both feature extractors along with the transformer in an end-to-end manner on FaceForensics++, DFDC benchmarks. Our model, while having relatively straightforward architecture, achieves comparable results to other more advanced state-of-the-art approaches when evaluated on FaceForensics++ and DFDC benchmarks. Besides this, we also propose novel face cut-out augmentations, as well as random cut-out augmentations. We show that the proposed augmentations improve the detection performance of our model and reduce overfitting. In addition to that, we show that our model is capable of learning from considerably small amount of data.
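
The cut-out augmentations can be pictured with a generic random cut-out (the paper's face cut-outs additionally target landmark-defined facial regions, which this sketch omits):

```python
# Illustrative random cut-out: zero out one randomly placed rectangle.
import numpy as np

def random_cutout(image, max_frac=0.3, rng=None):
    """image: (H, W, C) array. Returns a copy with one random patch set to 0."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    ph = rng.integers(1, int(h * max_frac))
    pw = rng.integers(1, int(w * max_frac))
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = image.copy()
    out[y:y + ph, x:x + pw] = 0
    return out

if __name__ == "__main__":
    img = np.ones((224, 224, 3), dtype=np.float32)
    print((random_cutout(img) == 0).any())  # True: a patch was zeroed
```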

Domain-Specific Text Generation for Machine Translation

  • Authors: Yasmin Moslem, Rejwanul Haque, John D. Kelleher, Andy Way
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2208.05909
  • Pdf link: https://arxiv.org/pdf/2208.05909
  • Abstract
    Preservation of domain knowledge from the source to target is crucial in any translation workflow. It is common in the translation industry to receive highly specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate huge amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we use the state-of-the-art Transformer architecture. We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, in both scenarios, our proposed methods achieve improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on the Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.

Keyword: autonomous driving

There is no result

@zhuhu00 zhuhu00 self-assigned this Aug 12, 2022