- Math breaks in GitHub's Markdown preview, so refer to the PDFs instead
- vision-basic
- u-net : U-Net(Convolutional Networks for Biomedical Image Segmentation)
- vision-transformer : VisionTransformer(AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE)
- swin-transformer : SwinTransformer(Swin Transformer: Hierarchical Vision Transformer using Shifted Windows)
- distillation : Distilling the Knowledge in a Neural Network
- maxvit : MaxViT: Multi-Axis Vision Transformer
- mae : Masked Autoencoders Are Scalable Vision Learners
- simmim : SimMIM: a Simple Framework for Masked Image Modeling
- revnet : The Reversible Residual Network: Backpropagation Without Storing Activations
- rev-vit : Reversible Vision Transformers
- diffusion
- ddpm : Denoising Diffusion Probabilistic Models
- palette : Palette: Image-to-Image Diffusion Models
- ddim : Denoising Diffusion Implicit Models
- improved-ddpm : Improved Denoising Diffusion Probabilistic Models
- adm : Diffusion Models Beat GANs on Image Synthesis
- glide : Guided Language to Image Diffusion for Generation and Editing
- ldm : Latent Diffusion Model(Stable diffusion)
- cdm : Cascaded Diffusion Model
- inpaint-survey : Deep Learning-based Image and Video Inpainting: A Survey
- super-resolution
- srcnn-vdsr-fsrcnn-fspcn : History of super-resolution (from early CNN methods to the advent of GANs)
- swinir : SwinIR(SwinIR: Image Restoration Using Swin Transformer)
- hat : HAT-L(Hybrid Attention Transformer)
- drct : DRCT(Dense Residual Connected Transformer)
- sr3 : Image Super-Resolution via Iterative Refinement
- ipg : Image Processing GNN: Breaking Rigidity in Super-Resolution
- yonos-sr : You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
- hmanet : HMANet: Hybrid Multi-Axis Aggregation Network for Image Super-Resolution
- diffusion-sr-survey : Diffusion Models, Image Super-Resolution And Everything: A Survey
- tr-misr : TR-MISR: Multiimage Super-Resolution Based on Feature Fusion With Transformers
- div2k : NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study
- lsdir : LSDIR: A Large Scale Dataset for Image Restoration
- df2k : DF2K
- ntire-challenge-on-lfsr : NTIRE 2024 Challenge on Light Field Image Super-Resolution: Methods and Results
- epit : (EPIT)Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution
- pixel-shuffle : Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
- datsr : Reference-based Image Super-Resolution with Deformable Attention Transformer
- ais2024challenge-survey : Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
- image-restoration
- aioir-survey : A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends
- ram : Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
- deblurring
- image-deblurring-survey : Deep Image Deblurring: A Survey
- adarevd : AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
- 3dgs
- 3dgs : 3D Gaussian Splatting for Real-Time Radiance Field Rendering
- srgs : SRGS: Super-Resolution 3D Gaussian Splatting
- gaussiansr : GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
- supergaussian : SuperGaussian: Repurposing Video Models for 3D Super Resolution
- supergs : SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting
- e-3dgs : Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
- deblurring-3dgs : Deblurring 3D Gaussian Splatting
- nerf
- nerf : NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
- nerf-sr : NeRF-SR: High Quality Neural Radiance Fields using Supersampling
- mip-nerf : Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
- crop : Cross-Guided Optimization of Radiance Fields with Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis
- video
- adatad : End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
- iaw : Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
- vision-and-language
- clip : CLIP(Learning Transferable Visual Models From Natural Language Supervision)
- lit : LiT : Zero-Shot Transfer with Locked-image text Tuning
- blip : BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- blip2 : BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- siglip : Sigmoid Loss for Language Image Pre-Training
- Flamingo: a Visual Language Model for Few-Shot Learning
- video-llm-survey : Video Understanding with Large Language Models: A Survey (in progress)
- llava : Visual Instruction Tuning
- llava-next-video : blog
- llava-next-stronger : blog
- llava-video : VIDEO INSTRUCTION TUNING WITH SYNTHETIC DATA
- long-vlm : LongVLM: Efficient Long Video Understanding via Large Language Models(ECCV2024)
- tcr : Text-Conditioned Resampler For Long Form Video Understanding(ECCV2024)
- nlp
- keyword : Glossary of LLM terms
- transformer : Transformer(Attention is all you need)
- lora : LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
- auxiliary-loss-free : Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
- deepseek-v3 : DeepSeek-V3 Technical Report