wcphkust/LLM-PLSE-paper

LLM-PLSE

A. Code Model

A.1 Benchmark, Empirical Study, and Survey

  • Security of Language Models for Code: A Systematic Literature Review. (arXiv 2024) [Link]

  • LLMs: Understanding Code Syntax and Semantics for Code Analysis. (arXiv 2024) [Link]

  • CodeMind: A Framework to Challenge Large Language Models for Code Reasoning. (arXiv 2024) [Link]

  • Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? (ICSE 2024) [Link]

  • Grounded Copilot: How Programmers Interact with Code-Generating Models. (OOPSLA 2023) [Link]

A.2 Source Code Model

  • SemCoder: Training Code Language Models with Comprehensive Semantics. (NeurIPS 2024) [Link]

  • CodeFort: Robust Training for Code Generation Models. (EMNLP 2024) [Link]

  • Constrained Decoding for Secure Code Generation. (DeepMind 2024) [Link]

  • Instruction Tuning for Secure Code Generation. (ICML 2024) [Link]

  • Large Language Models for Code: Security Hardening and Adversarial Testing. (CCS 2023) [Link]

  • GraphCodeBERT: Pre-training Code Representations with Data Flow. (ICLR 2021) [Link]

  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages. (EMNLP 2020) [Link]

  • Neural Code Comprehension: A Learnable Representation of Code Semantics. (NeurIPS 2018) [Link]

A.3 IR Code Model

  • Meta Large Language Model Compiler: Foundation Models of Compiler Optimization. (Meta 2024) [Link]

  • Symmetry-Preserving Program Representations for Learning Code Semantics. (ICML 2024) [Link]

  • FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations. (ICSE 2024) [Link]

  • How could Neural Networks understand Programs? (ICML 2021) [Link]

  • ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations. (ICML 2021) [Link]

A.4 Binary Code Model

  • ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries. (CCS 2024) [Link]

  • Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases. (NeurIPS 2024) [Link]

  • CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking. (FSE 2024) [Link]

  • LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis. (arXiv 2023) [Link]

  • jTrans: Jump-Aware Transformer for Binary Code Similarity Detection. (ISSTA 2022) [Link]

B. Code Generation

B.1 Benchmark, Empirical Study, and Survey

  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR 2024) [Link]

  • EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. (arXiv 2024) [Link]

  • CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks. (arXiv 2024) [Link]

  • A Survey on Large Language Models for Code Generation. (arXiv 2024) [Link]

B.2 Program Repair

  • Large Language Models for Test-Free Fault Localization. (ICSE 2024) [Link]

  • AutoCodeRover: Autonomous Program Improvement. (ISSTA 2024) [Link]

  • PyDex: Repairing Bugs in Introductory Python Assignments using LLMs. (OOPSLA 2024) [Link]

  • Is Self-Repair a Silver Bullet for Code Generation? (ICLR 2024) [Link]

  • RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. (arXiv 2024) [Link]

B.3 Program Synthesis

  • ChatDev: Mastering the Virtual Social Realm, Shaping the Future of Intelligent Interactions. (ACL 2024) [Link]

  • Natural Language Commanding via Program Synthesis. (Microsoft 2024) [Link]

  • Effective Large Language Model Debugging with Best-first Tree Search. (NVIDIA 2024) [Link]

  • Automatic Programming: Large Language Models and Beyond. (arXiv 2024) [Link]

  • Towards AI-Assisted Synthesis of Verified Dafny Methods. (FSE 2024) [Link]

  • Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search. (NeurIPS 2024) [Link]

  • Hypothesis Search: Inductive Reasoning with Language Models. (ICLR 2024) [Link]

  • Guess & Sketch: Language Model Guided Transpilation. (ICLR 2024) [Link]

  • AutoGen: A Programming Framework for Agentic AI. (Microsoft 2023) [Link]

  • Data Extraction via Semantic Regular Expression Synthesis. (OOPSLA 2023) [Link]

  • Optimal Neural Program Synthesis from Multimodal Specifications. (EMNLP 2021) [Link]

  • Web Question Answering with Neurosymbolic Program Synthesis. (PLDI 2021) [Link]

B.4 Program Transformation

  • Exploring and Unleashing the Power of Large Language Models in Automated Code Translation. (FSE 2024) [Link]

  • Rectifier: Code Translation with Corrector via LLMs. (arXiv 2024) [Link]

  • Learning Performance-Improving Code Edits. (ICLR 2024) [Link]

  • Enabling Memory Safety of C Programs using LLMs. (arXiv 2024) [Link]

  • Refactoring Programs Using Large Language Models with Few-Shot Examples. (arXiv 2023) [Link]

B.5 Code Completion

  • Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search. (NeurIPS 2024) [Link]

  • CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules. (ICLR 2024) [Link]

  • LongCoder: A Long-Range Pre-trained Language Model for Code Completion. (ICML 2023) [Link]

  • CodePlan: Repository-level Coding using LLMs and Planning. (NeurIPS 2023) [Link]

  • Repository-Level Prompt Generation for Large Language Models of Code. (ICML 2023) [Link]

C. Static Analysis

C.1 Static Bug Detection

C.1.1 Benchmark, Empirical Study, and Survey

  • Vulnerability Detection with Code Language Models: How Far Are We? (ICSE 2025) [Link]

  • VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection. (arXiv 2024) [Link]

  • LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. (S&P 2024) [Link]

  • A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection. (arXiv 2024) [Link]

  • Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs. (arXiv 2024) [Link]

  • Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection. (arXiv 2024) [Link]

  • LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning. (arXiv 2024) [Link]

  • Detecting Misuse of Security APIs: A Systematic Review. (arXiv 2024) [Link]

  • Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection. (arXiv 2024) [Link]

  • How Far Have We Gone in Vulnerability Detection Using Large Language Models. (arXiv 2023) [Link]

  • DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. (RAID 2023) [Link]

  • Large Language Models for Code Analysis: Do LLMs Really Do Their Job? (USENIX Security 2023) [Link]

  • Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. (arXiv 2023) [Link]

  • Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection. (arXiv 2023) [Link]

  • SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models. (arXiv 2023) [Link]

C.1.2 General Bug Detection

  • LLM-based Resource-Oriented Intention Inference for Static Resource Detection. (ICSE 2025) [Link]

  • LLMDFA: Analyzing Dataflow in Code with Large Language Models. (NeurIPS 2024) [Link]

  • Sanitizing Large Language Models in Bug Detection with Data-Flow. (EMNLP 2024) [Link]

  • Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. (ICSE 2024) [Link]

  • Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. (OOPSLA 2024) [Link]

  • Interleaving Static Analysis and LLM Prompting. (SOAP 2024) [Link]

  • LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. (arXiv 2024) [Link]

  • Beware of the Unexpected: Bimodal Taint Analysis. (ISSTA 2023) [Link]

  • E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. (Microsoft 2023) [Link]

  • Harnessing the Power of LLM to Support Binary Taint Analysis. (arXiv 2023) [Link]

C.1.3 Domain-Specific Bug Detection

  • Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. (ICSE 2025) [Link]

  • GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. (ICSE 2024) [Link]

  • An Investigation into Misuse of Java Security APIs by Large Language Models. (ASIACCS 2024) [Link]

  • SMARTINV: Multimodal Learning for Smart Contract Invariant Inference. (S&P 2024) [Link]

  • Do you still need a manual smart contract audit? (arXiv 2023) [Link]

  • Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. (arXiv 2023) [Link]

  • Continuous Learning for Android Malware Detection. (USENIX Security 2023) [Link]

C.2 Program Verification

C.2.1 Invariant Generation

  • LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference. (ASE 2024) [Link]

  • LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling. (ASE 2024) [Link]

  • Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification. (CAV 2024) [Link]

  • Lemur: Integrating Large Language Models in Automated Program Verification. (ICLR 2024) [Link]

  • Can ChatGPT support software verification? (FASE 2024) [Link]

  • Can Large Language Models Reason about Program Invariants? (ICML 2023) [Link]

  • Ranking LLM-Generated Loop Invariants for Program Verification. (EMNLP 2023) [Link]

  • Finding Inductive Loop Invariants using Large Language Models. (arXiv 2023) [Link]

C.2.2 Specification Inference

  • Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? (FSE 2024) [Link]

  • Zero and Few-shot Semantic Parsing with Ambiguous Inputs. (ICLR 2024) [Link]

  • SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. (arXiv 2024) [Link]

  • SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications. (arXiv 2024) [Link]

  • Impact of Large Language Models on Generating Software Specifications. (arXiv 2023) [Link]

C.3 Fundamental Static Analysis

  • A Learning-Based Approach to Static Program Slicing. (OOPSLA 2024) [Link]

  • Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. (OOPSLA 2024) [Link]

  • Using an LLM to Help With Code Understanding. (ICSE 2024) [Link]

  • Program Slicing in the Era of Large Language Models. (arXiv 2024) [Link]

D. Dynamic Analysis

D.1 Debugging

  • Teaching Large Language Models to Self-Debug. (ICLR 2024) [Link]

  • LPR: Large Language Models-Aided Program Reduction. (ISSTA 2024) [Link]

D.2 Fuzzing and Mutation Testing

  • When Fuzzing Meets LLMs: Challenges and Opportunities. (FSE 2024) [Link]

  • Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation. (ASE 2024) [Link]

  • Prompt Fuzzing for Fuzz Driver Generation. (CCS 2024) [Link]

  • Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. (ICSE 2024) [Link]

  • Large Language Model guided Protocol Fuzzing. (NDSS 2024) [Link]

  • LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models. (arXiv 2024) [Link]

  • LLMorpheus: Mutation Testing using Large Language Models. (arXiv 2024) [Link]

  • Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. (ISSTA 2023) [Link]

D.3 Unit Test Generation

  • An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. (TSE 2024) [Link]

  • Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. (ASE 2023) [Link]

D.4 Execution Prediction

  • Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning. (FSE 2024) [Link]

D.5 PoC Generation

  • From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code. (Google 2024/10) [Link]

  • Evaluating Offensive Security Capabilities of Large Language Models. (Google 2024/06) [Link]

  • Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models. (arXiv 2024) [Link]

  • Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag. (NeurIPS 2023) [Link]

E. Hallucinations in Reasoning Tasks

E.1 Study of Hallucinations

  • GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. (Apple 2024) [Link]

  • Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models. (OOPSLA 2024) [Link]

  • Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? (NAACL 2024) [Link]

  • Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. (ICLR 2024) [Link]

  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. (arXiv 2023) [Link]

E.2 General Prompting Strategy (in Reasoning Tasks)

  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator. (Google 2024) [Link]

  • When Do Program-of-Thought Works for Reasoning? (AAAI 2024) [Link]

  • Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting. (EMNLP 2023) [Link]

  • Complementary Explanations for Effective In-Context Learning. (ACL 2023) [Link]

  • Self-Evaluation Guided Beam Search for Reasoning. (NeurIPS 2023) [Link]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models. (NeurIPS 2023) [Link]

  • ReAct: Synergizing Reasoning and Acting in Language Models. (ICLR 2023) [Link]

  • Reflexion: Language Agents with Verbal Reinforcement Learning. (NeurIPS 2023) [Link]

  • SATLM: Satisfiability-Aided Language Models Using Declarative Prompting. (NeurIPS 2023) [Link]

  • Cumulative Reasoning With Large Language Models. (arXiv 2023) [Link]

  • Self-consistency improves chain of thought reasoning in language models. (NeurIPS 2022) [Link]

  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. (arXiv 2021) [Link]

E.3 Prompting Strategy in Code Reasoning Tasks

  • Steering Large Language Models between Code Execution and Textual Reasoning. (Microsoft 2024) [Link]

  • Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs. (Meta 2024) [Link]

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. (NeurIPS 2023) [Link]

F. Other Surveys, Models, and Frameworks

F.1 Surveys of Agent

  • Large Language Model-Based Agents for Software Engineering: A Survey. (arXiv 2024) [Link]

  • Large Language Models for Software Engineering: A Systematic Literature Review. (arXiv 2024) [Link]

  • Awesome things about LLM-powered agents: Papers, Repos, and Blogs. (arXiv 2024) [Link]

  • Comprehensive Outline of Large Language Model-based Multi-Agent Research. (None 2024) [Link]

  • If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents. (arXiv 2023) [Link]

  • Cognitive Architectures for Language Agents. (arXiv 2023) [Link]

  • The Rise and Potential of Large Language Model Based Agents: A Survey. (arXiv 2023) [Link]

F.2 Models and Frameworks

  • LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. (ACL 2024) [Link]

  • codellama: Inference code for CodeLlama models. (Meta 2023) [Link]

  • CodeFuse: LLM for Code from Ant Group. (Ant 2023) [Link]

  • Owl-LM: Large Language Model for Blockchain. (Sec3 2023) [Link]
