📚 Paper Review - 2026-05-27 #245

# 📚 Daily Paper Review - 2026-05-27

Found **10** relevant papers today. Please review and approve/reject.

---

## 1. EchoPilot: Training-Free Ultrasound Video Segmentation via Scale-Space Semantic Prompting and Reliability-Gated Memory

**Score:** `5.5/10` | **arXiv:** [2605.25944v1](http://arxiv.org/abs/2605.25944v1)

**Authors:** Ruiqiang Xiao, Zhaohu Xing, Yijun Yang...

**Relevance:**
- 🎯 Field Match: 0.51/10 - Matches: segmentation
- 🏆 Venue: MICCAI (10/10)
- 💻 Code: ✅ Available

**AI Summary:**
Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation. Recent promptable foundation models enable point-guided segmentation, but their direct deployment in ultrasound remains unreliable: a single point provides insufficient spatial context to resolve scale ambiguity, and greedy memory updates amplify early errors ...

**Key Contributions:**
- Ultrasound video segmentation is clinically valuable yet difficult due to speckle noise, weak boundaries, and rapid anatomical deformation.
- Recent promptable foundation models enable point-guided segmentation, but their direct deployment in ultrasound remains unreliable: a single point provides insufficient spatial context to resolve scale ambiguity, and greedy memory updates amplify early errors into severe temporal drift.
- We present EchoPilot, a training-free framework for ultrasound video segmentation under sparse first-frame interaction, requiring only a single point click and an anatomical category name.
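
The contrast between greedy and reliability-gated memory updates can be illustrated with a short sketch. This is not EchoPilot's implementation: the class `GatedMemory`, the `threshold` and `capacity` values, and the idea of a scalar per-frame reliability score are all assumptions made for illustration. The sketch only shows the general idea that a frame enters the memory bank when its prediction is judged reliable, so that an early bad mask is not propagated.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GatedMemory:
    """Hypothetical memory bank that admits only reliable frames."""
    threshold: float = 0.8  # assumed reliability cut-off, not from the paper
    capacity: int = 4       # assumed maximum number of stored frames
    frames: deque = field(default_factory=deque)

    def update(self, frame_id: int, reliability: float) -> bool:
        """Store the frame if it passes the gate; return whether it was stored."""
        if reliability < self.threshold:
            # A greedy policy would store this frame anyway and risk drift.
            return False
        self.frames.append(frame_id)
        if len(self.frames) > self.capacity:
            self.frames.popleft()  # evict the oldest stored frame
        return True


memory = GatedMemory(threshold=0.8, capacity=2)
for frame_id, score in [(0, 0.95), (1, 0.40), (2, 0.90), (3, 0.85)]:
    memory.update(frame_id, score)
```

In this toy run, frame 1 is rejected by the gate, and frame 0 is evicted once the capacity of two is exceeded.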

**Links:** [📄 Paper](http://arxiv.org/abs/2605.25944v1) | [📥 PDF](https://arxiv.org/pdf/2605.25944v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 2. Global Structure-from-Motion Meets Feedforward Reconstruction

**Score:** `4.9/10` | **arXiv:** [2605.26103v1](http://arxiv.org/abs/2605.26103v1)

**Authors:** Linfei Pan, Johannes Schönberger, Marc Pollefeys

**Relevance:**
- 🎯 Field Match: 1.02/10 - Matches: 3d reconstruction, computer vision
- 🏆 Venue: CVPR (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge in computer vision, with many open problems yet to be solved. Recent advances in feedforward 3D reconstruction have made significant strides in overcoming persistent failure cases of classical SfM methods, particularly in scenarios charact...

**Key Contributions:**
- Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge in computer vision, with many open problems yet to be solved.
- Recent advances in feedforward 3D reconstruction have made significant strides in overcoming persistent failure cases of classical SfM methods, particularly in scenarios characterized by low texture, limited overlap, and symmetries.
- However, while feedforward approaches excel in these challenging conditions, they often face limitations regarding scalability, accuracy, or robustness, and typically fall short of classical methods in standard reconstruction settings.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26103v1) | [📥 PDF](https://arxiv.org/pdf/2605.26103v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 3. Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

**Score:** `4.3/10` | **arXiv:** [2605.25967v1](http://arxiv.org/abs/2605.25967v1)

**Authors:** Georgios Milis, Yubin Qin, Yihan Wu...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts. Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies. Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage. In this work, motivated by the voca...

**Key Contributions:**
- As policy catches up with the capabilities of generative AI, watermarking is central to content provenance efforts.
- Inference-time watermarks for autoregressive models are unfit for continuous modalities due to discretization inconsistencies.
- Existing methods overcome this by finetuning the modality tokenizers, nullifying the watermark's training-free advantage.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.25967v1) | [📥 PDF](https://arxiv.org/pdf/2605.25967v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 4. Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

**Score:** `4.2/10` | **arXiv:** [2605.26067v1](http://arxiv.org/abs/2605.26067v1)

**Authors:** Rustem Takhanov, Zhenisbek Assylbekov

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is ...

**Key Contributions:**
- Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$.
- It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm.
- This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable.
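
The two-stage reading in the last point can be sketched in a few lines of NumPy. This is a literal illustration of "linear regression on the features, then standard KRR on the residual", not the authors' estimator. The function names, the affine feature map, the Gaussian kernel (which is positive definite rather than merely conditionally positive definite), and the values of `lam` and `gamma` are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def affine_features(x):
    """Hypothetical unpenalized feature class: constant plus linear terms."""
    return np.hstack([np.ones((len(x), 1)), x])


def fit_two_stage(x, y, features, lam=1e-2, gamma=1.0):
    """Stage 1: unpenalized least squares on the features.
    Stage 2: ridge-penalized kernel fit on what stage 1 left unexplained."""
    F = features(x)
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    residual = y - F @ beta
    K = rbf_kernel(x, x, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), residual)

    def predict(x_new):
        return features(x_new) @ beta + rbf_kernel(x_new, x, gamma) @ alpha

    return predict


# Purely affine data: the unpenalized part explains everything,
# so the kernel part contributes essentially nothing.
x_train = np.linspace(0.0, 1.0, 20)[:, None]
y_train = 2.0 * x_train[:, 0] + 1.0
predict = fit_two_stage(x_train, y_train, affine_features)
```

Because the affine part is not penalized, the fit extrapolates the linear trend far from the training data, where a plain Gaussian-kernel KRR prediction would decay toward zero.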

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26067v1) | [📥 PDF](https://arxiv.org/pdf/2605.26067v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 5. Towards 3D heart mesh generation using contactless radar imaging and physics-informed neural network

**Score:** `4.1/10` | **arXiv:** [2605.26003v1](http://arxiv.org/abs/2605.26003v1)

**Authors:** Jinye Li, Chenxi Fu, Minghang Zheng...

**Relevance:**
- 🎯 Field Match: 1.44/10 - Matches: cardiac, heart
- 🏆 Venue: None (5.0/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Cardiac function evaluation necessitates continuous, non-invasive monitoring, a capability that is limited in MRI. Millimeter-wave (mmWave) radar and its Synthetic Aperture Radar (SAR) mode offer a privacy-preserving and portable option for point-of-care clinical applications. However, reconstructing high-fidelity 3D cardiac geometry from SAR remains an open challenge. Traditional radar methods generate sparse point...

**Key Contributions:**
- Cardiac function evaluation necessitates continuous, non-invasive monitoring, a capability that is limited in MRI.
- Millimeter-wave (mmWave) radar and its Synthetic Aperture Radar (SAR) mode offer a privacy-preserving and portable option for point-of-care clinical applications.
- However, reconstructing high-fidelity 3D cardiac geometry from SAR remains an open challenge.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26003v1) | [📥 PDF](https://arxiv.org/pdf/2605.26003v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 6. Reinforcing Few-step Generators via Reward-Tilted Distribution Matching

**Score:** `4.1/10` | **arXiv:** [2605.26108v1](http://arxiv.org/abs/2605.26108v1)

**Authors:** Yushi Huang, Xiangxin Zhou, Ruoyu Wang...

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: None (5.0/10)
- 💻 Code: ✅ Available

**AI Summary:**
Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the...

**Key Contributions:**
- Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging.
- We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators.
- We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term.
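
The decomposition in the last point follows from a standard identity, sketched here under assumed notation that does not come from the paper: $q_\theta$ is the few-step generator's distribution, $p_T$ the teacher distribution, $r$ the reward, $\beta > 0$ a temperature, and the reward-tilted teacher is taken to be $p^*(x) = p_T(x)\exp(r(x)/\beta)/Z$ with $Z = \mathbb{E}_{p_T}[\exp(r(x)/\beta)]$. The paper's exact objective may differ.

$$
\begin{aligned}
\mathrm{KL}(q_\theta \,\|\, p^*)
&= \mathbb{E}_{x \sim q_\theta}\!\left[\log q_\theta(x) - \log p_T(x) - \tfrac{1}{\beta} r(x) + \log Z\right] \\
&= \underbrace{\mathrm{KL}(q_\theta \,\|\, p_T)}_{\text{distribution matching}}
\;-\; \underbrace{\tfrac{1}{\beta}\,\mathbb{E}_{x \sim q_\theta}[r(x)]}_{\text{reward maximization}}
\;+\; \log Z .
\end{aligned}
$$

Since $\log Z$ does not depend on $\theta$, minimizing the left-hand side is equivalent to matching the teacher while maximizing the expected reward, with $\beta$ setting the trade-off.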

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26108v1) | [📥 PDF](https://arxiv.org/pdf/2605.26108v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 7. TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

**Score:** `4.1/10` | **arXiv:** [2605.26115v1](http://arxiv.org/abs/2605.26115v1)

**Authors:** Weijie Wang, Zimu Li, Jinchuan Shi...

**Relevance:**
- 🎯 Field Match: 0.59/10 - Matches: 3d reconstruction
- 🏆 Venue: None (5.0/10)
- 💻 Code: ✅ Available

**AI Summary:**
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the ...

**Key Contributions:**
- Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images.
- Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise.
- This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26115v1) | [📥 PDF](https://arxiv.org/pdf/2605.26115v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 8. AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

**Score:** `4.1/10` | **arXiv:** [2605.26113v1](http://arxiv.org/abs/2605.26113v1)

**Authors:** Haiming Zhang, Junfei Zhou, Feng Jiang...

**Relevance:**
- 🎯 Field Match: 0.59/10 - Matches: 3d reconstruction
- 🏆 Venue: None (5.0/10)
- 💻 Code: ✅ Available

**AI Summary:**
Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of rare safety-critical scenarios. Existing occupancy-guided methods typically rely on shallow conditioning mechanisms and reference-frame-dependent video synthesis, which limits fine-grained controllability from arbitrary BEV layouts and restr...

**Key Contributions:**
- Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of rare safety-critical scenarios.
- Existing occupancy-guided methods typically rely on shallow conditioning mechanisms and reference-frame-dependent video synthesis, which limits fine-grained controllability from arbitrary BEV layouts and restricts their applicability for scalable simulation.
- In this paper, we propose AnyScene, a unified occupancy-centric framework for driving scene generation.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.26113v1) | [📥 PDF](https://arxiv.org/pdf/2605.26113v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 9. Where Concept Erasure Should Occur: Concept-Layer Alignment in Text-to-Video Diffusion Models

**Score:** `4.0/10` | **arXiv:** [2605.25941v1](http://arxiv.org/abs/2605.25941v1)

**Authors:** Yiwei Xie, Ping Liu, Zheng Zhang

**Relevance:**
- 🎯 Field Match: 0.0/10 - Matches: none
- 🏆 Venue: ICML (10/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure. We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths. Outside these depths, concept and non-target signals remain strongly entangled, lim...

**Key Contributions:**
- Text-to-video diffusion transformers encode semantic information unevenly across model depth, which constrains effective concept erasure.
- We identify a representational bottleneck, termed concept-layer topological alignment, under which target concepts exhibit higher separability at certain representational depths.
- Outside these depths, concept and non-target signals remain strongly entangled, limiting the effectiveness of depth-specific erasure.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.25941v1) | [📥 PDF](https://arxiv.org/pdf/2605.25941v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---

## 10. F-RNG: Feed-Forward Relightable Neural Gaussians

**Score:** `4.0/10` | **arXiv:** [2605.25975v1](http://arxiv.org/abs/2605.25975v1)

**Authors:** Guangming Fu, Jiahui Fan, Jian Yang...

**Relevance:**
- 🎯 Field Match: 1.69/10 - Matches: 3d gaussian, gaussian splatting
- 🏆 Venue: None (5.0/10)
- 💻 Code: ❌ Not mentioned

**AI Summary:**
Capturing relightable 3D assets from real-world objects is a widely researched problem. Several per-scene optimization-based methods, based on 3D Gaussian splatting (3DGS), support relighting; however, they usually require dense input views, and their overfitting nature makes it difficult to generalize across scenes. Unlike per-scene optimization methods, generalized feed-forward models can direct...

**Key Contributions:**
- Capturing relightable 3D assets from real-world objects is a widely researched problem.
- Several per-scene optimization-based methods, based on 3D Gaussian splatting (3DGS), support relighting; however, they usually require dense input views, and their overfitting nature makes it difficult to generalize across scenes.
- Unlike per-scene optimization methods, generalized feed-forward models can directly reconstruct Gaussians from sparse input views.

**Links:** [📄 Paper](http://arxiv.org/abs/2605.25975v1) | [📥 PDF](https://arxiv.org/pdf/2605.25975v1)

**Actions:**
- ✅ Approve: Add label `approved` and comment "approve"
- ❌ Reject: Add label `rejected` and comment "reject"
- ⭐ Important: Add label `starred`

---


## How to Review

1. Read the summaries above
2. Check paper links for more details
3. Add labels to indicate your decision:
   - `approved` - Add to collection
   - `rejected` - Skip this paper
   - `starred` - Mark as particularly important
4. Comment "approve" or "reject" to trigger automation

**Note:** Papers with `approved` label will be automatically added to the collection.
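
The label rules above can be summarized as a small lookup. This is a hypothetical sketch rather than the automation's actual code: the function name `review_decision`, the returned dictionary, and the handling of missing or conflicting labels are assumptions.

```python
def review_decision(labels: set[str]) -> dict:
    """Map the labels on a review issue to a decision, following the list above."""
    approved = "approved" in labels
    rejected = "rejected" in labels
    if approved and rejected:
        decision = "conflict"  # assumption: contradictory labels need a human
    elif approved:
        decision = "add"       # add to collection
    elif rejected:
        decision = "skip"      # skip this paper
    else:
        decision = "pending"   # assumption: no decision label yet
    return {"decision": decision, "important": "starred" in labels}
```

For example, `review_decision({"approved", "starred"})` yields an `add` decision flagged as important.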

