wcphkust/LLM-PLSE-paper

LLM-PLSE

A. Code Model

A.1 Benchmark, Empirical Study, and Survey

  • Security of Language Models for Code: A Systematic Literature Review. (arXiv 2024) [Link]

  • LLMs: Understanding Code Syntax and Semantics for Code Analysis. (arXiv 2024) [Link]

  • CodeMind: A Framework to Challenge Large Language Models for Code Reasoning. (arXiv 2024) [Link]

  • Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? (ICSE 2024) [Link]

  • Grounded Copilot: How Programmers Interact with Code-Generating Models. (OOPSLA 2023) [Link]

A.2 Source Code Model

  • SemCoder: Training Code Language Models with Comprehensive Semantics. (NeurIPS 2024) [Link]

  • CodeFort: Robust Training for Code Generation Models. (EMNLP 2024) [Link]

  • Constrained Decoding for Secure Code Generation. (DeepMind 2024) [Link]

  • Instruction Tuning for Secure Code Generation. (ICML 2024) [Link]

  • Large Language Models for Code: Security Hardening and Adversarial Testing. (CCS 2023) [Link]

  • GraphCodeBERT: Pre-training Code Representations with Data Flow. (ICLR 2021) [Link]

  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages. (EMNLP 2020) [Link]

  • Neural Code Comprehension: A Learnable Representation of Code Semantics. (NeurIPS 2018) [Link]

A.3 IR Code Model

  • Meta Large Language Model Compiler: Foundation Models of Compiler Optimization. (Meta 2024) [Link]

  • Symmetry-Preserving Program Representations for Learning Code Semantics. (ICML 2024) [Link]

  • FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations. (ICSE 2024) [Link]

  • How could Neural Networks understand Programs? (ICML 2021) [Link]

  • ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations. (ICML 2021) [Link]

A.4 Binary Code Model

  • ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries. (CCS 2024) [Link]

  • Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases. (NeurIPS 2024) [Link]

  • CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking. (FSE 2024) [Link]

  • LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis. (arXiv 2023) [Link]

  • jTrans: Jump-Aware Transformer for Binary Code Similarity Detection. (ISSTA 2022) [Link]

B. Code Generation

B.1 Benchmark, Empirical Study, and Survey

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR 2024) [Link]

  • EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. (arXiv 2024) [Link]

  • CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks. (arXiv 2024) [Link]

  • A Survey on Large Language Models for Code Generation. (arXiv 2024) [Link]

B.2 Program Repair

  • Large Language Models for Test-Free Fault Localization. (ICSE 2024) [Link]

  • AutoCodeRover: Autonomous Program Improvement. (ISSTA 2024) [Link]

  • PyDex: Repairing Bugs in Introductory Python Assignments using LLMs. (OOPSLA 2024) [Link]

  • Is Self-Repair a Silver Bullet for Code Generation? (ICLR 2024) [Link]

  • RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. (arXiv 2024) [Link]

B.3 Program Synthesis

  • ChatDev: Mastering the Virtual Social Realm, Shaping the Future of Intelligent Interactions. (ACL 2024) [Link]

  • Natural Language Commanding via Program Synthesis. (Microsoft 2024) [Link]

  • Effective Large Language Model Debugging with Best-first Tree Search. (NVIDIA 2024) [Link]

  • Automatic Programming: Large Language Models and Beyond. (arXiv 2024) [Link]

  • Towards AI-Assisted Synthesis of Verified Dafny Methods. (FSE 2024) [Link]

  • Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search. (NeurIPS 2024) [Link]

  • Hypothesis Search: Inductive Reasoning with Language Models. (ICLR 2024) [Link]

  • Guess & Sketch: Language Model Guided Transpilation. (ICLR 2024) [Link]

  • AutoGen: A Programming Framework for Agentic AI. (Microsoft 2023) [Link]

  • Data Extraction via Semantic Regular Expression Synthesis. (OOPSLA 2023) [Link]

  • Optimal Neural Program Synthesis from Multimodal Specifications. (EMNLP 2021) [Link]

  • Web Question Answering with Neurosymbolic Program Synthesis. (PLDI 2021) [Link]

B.4 Program Transformation

  • Exploring and Unleashing the Power of Large Language Models in Automated Code Translation. (FSE 2024) [Link]

  • Rectifier: Code Translation with Corrector via LLMs. (arXiv 2024) [Link]

  • Learning Performance-Improving Code Edits. (ICLR 2024) [Link]

  • Enabling Memory Safety of C Programs using LLMs. (arXiv 2024) [Link]

  • Refactoring Programs Using Large Language Models with Few-Shot Examples. (arXiv 2023) [Link]

B.5 Code Completion

  • Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search. (NeurIPS 2024) [Link]

  • CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules. (ICLR 2024) [Link]

  • LongCoder: A Long-Range Pre-trained Language Model for Code Completion. (ICML 2023) [Link]

  • CodePlan: Repository-level Coding using LLMs and Planning. (NeurIPS 2023) [Link]

  • Repository-Level Prompt Generation for Large Language Models of Code. (ICML 2023) [Link]

C. Static Analysis

C.1 Static Bug Detection

C.1.1 Benchmark, Empirical Study, and Survey

  • Vulnerability Detection with Code Language Models: How Far Are We? (ICSE 2025) [Link]

  • VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection. (arXiv 2024) [Link]

  • LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. (S&P 2024) [Link]

  • A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection. (arXiv 2024) [Link]

  • Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs. (arXiv 2024) [Link]

  • Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection. (arXiv 2024) [Link]

  • LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning. (arXiv 2024) [Link]

  • Detecting Misuse of Security APIs: A Systematic Review. (arXiv 2024) [Link]

  • Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection. (arXiv 2024) [Link]

  • How Far Have We Gone in Vulnerability Detection Using Large Language Models. (arXiv 2023) [Link]

  • DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. (RAID 2023) [Link]

  • Large Language Models for Code Analysis: Do LLMs Really Do Their Job? (USENIX Security 2023) [Link]

  • Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. (arXiv 2023) [Link]

  • Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection. (arXiv 2023) [Link]

  • SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models. (arXiv 2023) [Link]

C.1.2 General Bug Detection

  • LLM-based Resource-Oriented Intention Inference for Static Resource Detection. (ICSE 2025) [Link]

  • LLMDFA: Analyzing Dataflow in Code with Large Language Models. (NeurIPS 2024) [Link]

  • Sanitizing Large Language Models in Bug Detection with Data-Flow. (EMNLP 2024) [Link]

  • Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. (ICSE 2024) [Link]

  • Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. (OOPSLA 2024) [Link]

  • Interleaving Static Analysis and LLM Prompting. (SOAP 2024) [Link]

  • LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. (arXiv 2024) [Link]

  • Beware of the Unexpected: Bimodal Taint Analysis. (ISSTA 2023) [Link]

  • E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. (Microsoft 2023) [Link]

  • Harnessing the Power of LLM to Support Binary Taint Analysis. (arXiv 2023) [Link]

C.1.3 Domain-Specific Bug Detection

  • Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. (ICSE 2025) [Link]

  • GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. (ICSE 2024) [Link]

  • An Investigation into Misuse of Java Security APIs by Large Language Models. (ASIACCS 2024) [Link]

  • SMARTINV: Multimodal Learning for Smart Contract Invariant Inference. (S&P 2024) [Link]

  • Do you still need a manual smart contract audit? (arXiv 2023) [Link]

  • Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. (arXiv 2023) [Link]

  • Continuous Learning for Android Malware Detection. (USENIX Security 2023) [Link]

C.2 Program Verification

C.2.1 Invariant Generation

  • LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference. (ASE 2024) [Link]

  • LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling. (ASE 2024) [Link]

  • Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification. (CAV 2024) [Link]

  • Lemur: Integrating Large Language Models in Automated Program Verification. (ICLR 2024) [Link]

  • Can ChatGPT support software verification? (FASE 2024) [Link]

  • Can Large Language Models Reason about Program Invariants? (ICML 2023) [Link]

  • Ranking LLM-Generated Loop Invariants for Program Verification. (EMNLP 2023) [Link]

  • Finding Inductive Loop Invariants using Large Language Models. (arXiv 2023) [Link]

C.2.2 Specification Inference

  • Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? (FSE 2024) [Link]

  • Zero and Few-shot Semantic Parsing with Ambiguous Inputs. (ICLR 2024) [Link]

  • SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. (arXiv 2024) [Link]

  • SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications. (arXiv 2024) [Link]

  • Impact of Large Language Models on Generating Software Specifications. (arXiv 2023) [Link]

C.3 Fundamental Static Analysis

  • A Learning-Based Approach to Static Program Slicing. (OOPSLA 2024) [Link]

  • Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. (OOPSLA 2024) [Link]

  • Using an LLM to Help With Code Understanding. (ICSE 2024) [Link]

  • Program Slicing in the Era of Large Language Models. (arXiv 2024) [Link]

D. Dynamic Analysis

D.1 Debugging

  • Teaching Large Language Models to Self-Debug. (ICLR 2024) [Link]

  • LPR: Large Language Models-Aided Program Reduction. (ISSTA 2024) [Link]

D.2 Fuzzing and Mutation Testing

  • When Fuzzing Meets LLMs: Challenges and Opportunities. (FSE 2024) [Link]

  • Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation. (ASE 2024) [Link]

  • Prompt Fuzzing for Fuzz Driver Generation. (CCS 2024) [Link]

  • Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. (ICSE 2024) [Link]

  • Large Language Model guided Protocol Fuzzing. (NDSS 2024) [Link]

  • LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models. (arXiv 2024) [Link]

  • LLMorpheus: Mutation Testing using Large Language Models. (arXiv 2024) [Link]

  • Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. (ISSTA 2023) [Link]

D.3 Unit Test Generation

  • An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. (TSE 2024) [Link]

  • Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. (ASE 2023) [Link]

D.4 Execution Prediction

  • Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning. (FSE 2024) [Link]

D.5 PoC Generation

  • From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code. (Google 2024/10) [Link]

  • Evaluating Offensive Security Capabilities of Large Language Models. (Google 2024/06) [Link]

  • Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models. (arXiv 2024) [Link]

  • Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag. (NeurIPS 2023) [Link]

E. Hallucinations in Reasoning Tasks

E.1 Study of Hallucinations

  • GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. (Apple 2024) [Link]

  • Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models. (OOPSLA 2024) [Link]

  • Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? (NAACL 2024) [Link]

  • Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. (ICLR 2024) [Link]

  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. (arXiv 2023) [Link]

E.2 General Prompting Strategy (in Reasoning Tasks)

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator. (Google 2024) [Link]

  • When Do Program-of-Thought Works for Reasoning? (AAAI 2024) [Link]

  • Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting. (EMNLP 2023) [Link]

  • Complementary Explanations for Effective In-Context Learning. (ACL 2023) [Link]

  • Self-Evaluation Guided Beam Search for Reasoning. (NeurIPS 2023) [Link]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models. (NeurIPS 2023) [Link]

  • ReAct: Synergizing Reasoning and Acting in Language Models. (ICLR 2023) [Link]

  • Reflexion: Language Agents with Verbal Reinforcement Learning. (NeurIPS 2023) [Link]

  • SATLM: Satisfiability-Aided Language Models Using Declarative Prompting. (NeurIPS 2023) [Link]

  • Cumulative Reasoning With Large Language Models. (arXiv 2023) [Link]

  • Self-consistency improves chain of thought reasoning in language models. (NeurIPS 2022) [Link]

  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. (arXiv 2021) [Link]

E.3 Prompting Strategy in Code Reasoning Tasks

  • Steering Large Language Models between Code Execution and Textual Reasoning. (Microsoft 2024) [Link]

  • Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs. (Meta 2024) [Link]

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. (NeurIPS 2023) [Link]

F. Other Surveys, Models, and Frameworks

F.1 Surveys of Agent

  • Large Language Model-Based Agents for Software Engineering: A Survey. (arXiv 2024) [Link]

  • Large Language Models for Software Engineering: A Systematic Literature Review. (arXiv 2024) [Link]

  • Awesome things about LLM-powered agents: Papers, Repos, and Blogs. (arXiv 2024) [Link]

  • Comprehensive Outline of Large Language Model-based Multi-Agent Research. (None 2024) [Link]

  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents. (arXiv 2023) [Link]

  • Cognitive Architectures for Language Agents. (arXiv 2023) [Link]

  • The Rise and Potential of Large Language Model Based Agents: A Survey. (arXiv 2023) [Link]

F.2 Models and Frameworks

  • LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. (ACL 2024) [Link]

  • codellama: Inference code for CodeLlama models. (Meta 2023) [Link]

  • CodeFuse: LLM for Code from Ant Group. (Ant 2023) [Link]

  • Owl-LM: Large Language Model for Blockchain. (Sec3 2023) [Link]
