📚 Paper Review - 2026-05-27 #245

@github-actions

📚 Daily Paper Review - 2026-05-27

Found 10 relevant papers today. Please review and approve/reject.


1. EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory

Score: 5.5/10 | arXiv: 2605.25944v1

Authors: Ruiqiang Xiao, Zhaohu Xing, Yijun Yang...

Relevance:

  • 🎯 Field Match: 0.51/10 - Matches: segmentation
  • 🏆 Venue: MICCAI (10/10)
  • 💻 Code: ✅ Available

AI Summary:
Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation. Recent promptable foundation models enable point-guided segmentation, but their direct deployment in ultrasound remains unreliable: a single point provides insufficient spatial context to resolve scale ambiguity, and greedy memory updates amplify early errors ...

Key Contributions:

  • Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation.
  • Recent promptable foundation models enable point-guided segmentation, but their direct deployment in ultrasound remains unreliable: a single point provides insufficient spatial context to resolve scale ambiguity, and greedy memory updates amplify early errors into severe temporal drift.
  • We present EchoPilot, a training-free framework for ultrasound video segmentation under sparse first-frame interaction, requiring only a single point click and an anatomical category name.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

2. Global Structure-from-Motion Meets Feedforward Reconstruction

Score: 4.9/10 | arXiv: 2605.26103v1

Authors: Linfei Pan, Johannes Schönberger, Marc Pollefeys

Relevance:

  • 🎯 Field Match: 1.02/10 - Matches: 3d reconstruction, computer vision
  • 🏆 Venue: CVPR (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge in computer vision, with many open problems yet to be solved. Recent advances in feedforward 3D reconstruction have made significant strides in overcoming persistent failure cases of classical SfM methods, particularly in scenarios charact...

Key Contributions:

  • Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge in computer vision, with many open problems yet to be solved.
  • Recent advances in feedforward 3D reconstruction have made significant strides in overcoming persistent failure cases of classical SfM methods, particularly in scenarios characterized by low texture, limited overlap, and symmetries.
  • However, while feedforward approaches excel in these challenging conditions, they often face limitations regarding scalability, accuracy, or robustness, and typically fall short of classical methods in standard reconstruction settings.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

3. Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

Score: 4.3/10 | arXiv: 2605.25967v1

Authors: Georgios Milis, Yubin Qin, Yihan Wu...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts. Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies. Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage. In this work, motivated by the voca...

Key Contributions:

  • As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts.
  • Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies.
  • Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

4. Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

Score: 4.2/10 | arXiv: 2605.26067v1

Authors: Rustem Takhanov, Zhenisbek Assylbekov

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is ...

Key Contributions:

  • Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$.
  • It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm.
  • This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

5. Towards 3D heart mesh generation using contactless radar imaging and physics-informed neural network

Score: 4.1/10 | arXiv: 2605.26003v1

Authors: Jinye Li, Chenxi Fu, Minghang Zheng...

Relevance:

  • 🎯 Field Match: 1.44/10 - Matches: cardiac, heart
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Cardiac function evaluation necessitates continuous, non-invasive monitoring, a capability limited in MRI. Millimeter-wave (mmWave) radar and its Synthetic Aperture Radar (SAR) mode offer privacy-preserving and portable point-of-care clinical applications. However, reconstructing high-fidelity 3D cardiac geometry from SAR remains an open challenge. Traditional radar methods generate sparse point...

Key Contributions:

  • Cardiac function evaluation necessitates continuous, non-invasive monitoring, a capability limited in MRI.
  • Millimeter-wave (mmWave) radar and its Synthetic Aperture Radar (SAR) mode offer privacy-preserving and portable point-of-care clinical applications.
  • However, reconstructing high-fidelity 3D cardiac geometry from SAR remains an open challenge.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

6. Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

Score: 4.1/10 | arXiv: 2605.26108v1

Authors: Yushi Huang, Xiangxin Zhou, Ruoyu Wang...

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ✅ Available

AI Summary:
Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the...

Key Contributions:

  • Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging.
  • We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators.
  • We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

7. TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Score: 4.1/10 | arXiv: 2605.26115v1

Authors: Weijie Wang, Zimu Li, Jinchuan Shi...

Relevance:

  • 🎯 Field Match: 0.59/10 - Matches: 3d reconstruction
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ✅ Available

AI Summary:
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the ...

Key Contributions:

  • Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images.
  • Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise.
  • This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

8. AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

Score: 4.1/10 | arXiv: 2605.26113v1

Authors: Haiming Zhang, Junfei Zhou, Feng Jiang...

Relevance:

  • 🎯 Field Match: 0.59/10 - Matches: 3d reconstruction
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ✅ Available

AI Summary:
Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of rare safety-critical scenarios. Existing occupancy-guided methods typically rely on shallow conditioning mechanisms and reference-frame-dependent video synthesis, which limits fine-grained controllability from arbitrary BEV layouts and restr...

Key Contributions:

  • Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of rare safety-critical scenarios.
  • Existing occupancy-guided methods typically rely on shallow conditioning mechanisms and reference-frame-dependent video synthesis, which limits fine-grained controllability from arbitrary BEV layouts and restricts their applicability for scalable simulation.
  • In this paper, we propose AnyScene, a unified occupancy-centric framework for driving scene generation.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

9. Where Concept Erasure Should Occur: Concept-Layer Alignment in Text-to-Video Diffusion Models

Score: 4.0/10 | arXiv: 2605.25941v1

Authors: Yiwei Xie, Ping Liu, Zheng Zhang

Relevance:

  • 🎯 Field Match: 0.0/10 - Matches: none
  • 🏆 Venue: ICML (10/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure. We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths. Outside these depths, concept and non-target signals remain strongly entangled, lim...

Key Contributions:

  • Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure.
  • We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths.
  • Outside these depths, concept and non-target signals remain strongly entangled, limiting the effectiveness of depth-specific erasure.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

10. F-RNG: Feed-Forward Relightable Neural Gaussians

Score: 4.0/10 | arXiv: 2605.25975v1

Authors: Guangming Fu, Jiahui Fan, Jian Yang...

Relevance:

  • 🎯 Field Match: 1.69/10 - Matches: 3d gaussian, gaussian splatting
  • 🏆 Venue: None (5.0/10)
  • 💻 Code: ❌ Not mentioned

AI Summary:
Capturing relightable 3D assets from real-world objects is a widely researched problem. Several per-scene optimization-based methods, based on 3D Gaussian splatting (3DGS), support relighting; however, they usually require dense input views, and their overfitting nature makes it difficult to generalize across scenes. Unlike per-scene optimization methods, generalized feed-forward models can direct...

Key Contributions:

  • Capturing relightable 3D assets from real-world objects is a widely researched problem.
  • Several per-scene optimization-based methods, based on 3D Gaussian splatting (3DGS), support relighting; however, they usually require dense input views, and their overfitting nature makes it difficult to generalize across scenes.
  • Unlike per-scene optimization methods, generalized feed-forward models can directly reconstruct Gaussians from sparse input views.

Links: 📄 Paper | 📥 PDF

Actions:

  • ✅ Approve: Add label approved and comment "approve"
  • ❌ Reject: Add label rejected and comment "reject"
  • ⭐ Important: Add label starred

How to Review

  1. Read the summaries above
  2. Check paper links for more details
  3. Add labels to indicate your decision:
    • approved - Add to collection
    • rejected - Skip this paper
    • starred - Mark as particularly important
  4. Comment "approve" or "reject" to trigger automation

Note: Papers with approved label will be automatically added to the collection.
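
The review steps above can be read as a small decision rule over the issue's labels and the latest comment. The sketch below is only an illustration of that rule, not the bot's actual implementation: the function name `review_decision` and the returned dictionary shape are hypothetical, and the assumption that both the label and the matching comment are required is taken from the wording "Add label approved and comment \"approve\"".

```python
from typing import Dict, Iterable, Union


def review_decision(labels: Iterable[str], comment: str) -> Dict[str, Union[str, bool]]:
    """Hypothetical helper: map issue labels plus a comment to a review outcome.

    Assumption: a decision only counts when the label and the matching
    comment keyword are both present, as the Actions lists describe.
    """
    label_set = {label.strip().lower() for label in labels}
    keyword = comment.strip().lower()

    if "approved" in label_set and keyword == "approve":
        status = "approved"   # paper would be added to the collection
    elif "rejected" in label_set and keyword == "reject":
        status = "rejected"   # paper would be skipped
    else:
        status = "pending"    # label/comment missing or mismatched

    # "starred" is independent of the approve/reject decision.
    return {"status": status, "starred": "starred" in label_set}
```

For example, `review_decision(["approved", "starred"], "approve")` yields an approved, starred outcome, while a label without the matching comment stays pending under this assumed rule.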
