-
Security of Language Models for Code: A Systematic Literature Review. (arXiv 2024) [Link]
-
LLMs: Understanding Code Syntax and Semantics for Code Analysis. (arXiv 2024) [Link]
-
CodeMind: A Framework to Challenge Large Language Models for Code Reasoning. (arXiv 2024) [Link]
-
Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? (ICSE 2024) [Link]
-
Grounded Copilot: How Programmers Interact with Code-Generating Models. (OOPSLA 2023) [Link]
-
SemCoder: Training Code Language Models with Comprehensive Semantics. (NeurIPS 2024) [Link]
-
CodeFort: Robust Training for Code Generation Models. (EMNLP 2024) [Link]
-
Constrained Decoding for Secure Code Generation. (DeepMind 2024) [Link]
-
Instruction Tuning for Secure Code Generation. (ICML 2024) [Link]
-
Large Language Models for Code: Security Hardening and Adversarial Testing. (CCS 2023) [Link]
-
GraphCodeBERT: Pre-training Code Representations with Data Flow. (ICLR 2021) [Link]
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages. (EMNLP 2020) [Link]
-
Neural Code Comprehension: A Learnable Representation of Code Semantics. (NeurIPS 2018) [Link]
-
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization. (Meta 2024) [Link]
-
Symmetry-Preserving Program Representations for Learning Code Semantics. (ICML 2024) [Link]
-
FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations. (ICSE 2024) [Link]
-
How could Neural Networks understand Programs? (ICML 2021) [Link]
-
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations. (ICML 2021) [Link]
-
ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries. (CCS 2024) [Link]
-
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases. (NeurIPS 2024) [Link]
-
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking. (FSE 2024) [Link]
-
LmPa: Improving Decompilation by Synergy of Large Language Model and Program Analysis. (arXiv 2023) [Link]
-
jTrans: Jump-Aware Transformer for Binary Code Similarity Detection. (ISSTA 2022) [Link]
-
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR 2024) [Link]
-
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. (arXiv 2024) [Link]
-
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks. (arXiv 2024) [Link]
-
A Survey on Large Language Models for Code Generation. (arXiv 2024) [Link]
-
Large Language Models for Test-Free Fault Localization. (ICSE 2024) [Link]
-
AutoCodeRover: Autonomous Program Improvement. (ISSTA 2024) [Link]
-
PyDex: Repairing Bugs in Introductory Python Assignments using LLMs. (OOPSLA 2024) [Link]
-
Is Self-Repair a Silver Bullet for Code Generation? (ICLR 2024) [Link]
-
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. (arXiv 2024) [Link]
-
ChatDev: Communicative Agents for Software Development. (ACL 2024) [Link]
-
Natural Language Commanding via Program Synthesis. (Microsoft 2024) [Link]
-
Effective Large Language Model Debugging with Best-first Tree Search. (NVIDIA 2024) [Link]
-
Automatic Programming: Large Language Models and Beyond. (arXiv 2024) [Link]
-
Towards AI-Assisted Synthesis of Verified Dafny Methods. (FSE 2024) [Link]
-
Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search. (NeurIPS 2024) [Link]
-
Hypothesis Search: Inductive Reasoning with Language Models. (ICLR 2024) [Link]
-
Guess & Sketch: Language Model Guided Transpilation. (ICLR 2024) [Link]
-
AutoGen: A programming framework for agentic AI. (Microsoft 2023) [Link]
-
Data Extraction via Semantic Regular Expression Synthesis. (OOPSLA 2023) [Link]
-
Optimal Neural Program Synthesis from Multimodal Specifications. (EMNLP 2021) [Link]
-
Web Question Answering with Neurosymbolic Program Synthesis. (PLDI 2021) [Link]
-
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation. (FSE 2024) [Link]
-
Rectifier: Code Translation with Corrector via LLMs. (arXiv 2024) [Link]
-
Learning Performance-Improving Code Edits. (ICLR 2024) [Link]
-
Enabling Memory Safety of C Programs using LLMs. (arXiv 2024) [Link]
-
Refactoring Programs Using Large Language Models with Few-Shot Examples. (arXiv 2023) [Link]
-
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search. (NeurIPS 2024) [Link]
-
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules. (ICLR 2024) [Link]
-
LongCoder: A Long-Range Pre-trained Language Model for Code Completion. (ICML 2023) [Link]
-
CodePlan: Repository-level Coding using LLMs and Planning. (NeurIPS 2023) [Link]
-
Repository-Level Prompt Generation for Large Language Models of Code. (ICML 2023) [Link]
-
Vulnerability Detection with Code Language Models: How Far Are We? (ICSE 2025) [Link]
-
VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection. (arXiv 2024) [Link]
-
LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. (S&P 2024) [Link]
-
A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection. (arXiv 2024) [Link]
-
Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs. (arXiv 2024) [Link]
-
Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection. (arXiv 2024) [Link]
-
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning. (arXiv 2024) [Link]
-
Detecting Misuse of Security APIs: A Systematic Review. (arXiv 2024) [Link]
-
Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection. (arXiv 2024) [Link]
-
How Far Have We Gone in Vulnerability Detection Using Large Language Models. (arXiv 2023) [Link]
-
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. (RAID 2023) [Link]
-
Large Language Models for Code Analysis: Do LLMs Really Do Their Job? (USENIX Security 2023) [Link]
-
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. (arXiv 2023) [Link]
-
Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection. (arXiv 2023) [Link]
-
SkipAnalyzer: An Embodied Agent for Code Analysis with Large Language Models. (arXiv 2023) [Link]
-
LLM-based Resource-Oriented Intention Inference for Static Resource Detection. (ICSE 2025) [Link]
-
LLMDFA: Analyzing Dataflow in Code with Large Language Models. (NeurIPS 2024) [Link]
-
Sanitizing Large Language Models in Bug Detection with Data-Flow. (EMNLP 2024) [Link]
-
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. (ICSE 2024) [Link]
-
Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. (OOPSLA 2024) [Link]
-
Interleaving Static Analysis and LLM Prompting. (SOAP 2024) [Link]
-
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. (arXiv 2024) [Link]
-
Beware of the Unexpected: Bimodal Taint Analysis. (ISSTA 2023) [Link]
-
E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. (Microsoft 2023) [Link]
-
Harnessing the Power of LLM to Support Binary Taint Analysis. (arXiv 2023) [Link]
-
Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. (ICSE 2025) [Link]
-
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. (ICSE 2024) [Link]
-
An Investigation into Misuse of Java Security APIs by Large Language Models. (ASIACCS 2024) [Link]
-
SMARTINV: Multimodal Learning for Smart Contract Invariant Inference. (S&P 2024) [Link]
-
Do you still need a manual smart contract audit? (arXiv 2023) [Link]
-
Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. (arXiv 2023) [Link]
-
Continuous Learning for Android Malware Detection. (USENIX Security 2023) [Link]
-
LLM Meets Bounded Model Checking: Neuro-symbolic Loop Invariant Inference. (ASE 2024) [Link]
-
LLM-Generated Invariants for Bounded Model Checking Without Loop Unrolling. (ASE 2024) [Link]
-
Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification. (CAV 2024) [Link]
-
Lemur: Integrating Large Language Models in Automated Program Verification. (ICLR 2024) [Link]
-
Can ChatGPT support software verification? (FASE 2024) [Link]
-
Can Large Language Models Reason about Program Invariants? (ICML 2023) [Link]
-
Ranking LLM-Generated Loop Invariants for Program Verification. (EMNLP 2023) [Link]
-
Finding Inductive Loop Invariants using Large Language Models. (arXiv 2023) [Link]
-
Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? (FSE 2024) [Link]
-
Zero and Few-shot Semantic Parsing with Ambiguous Inputs. (ICLR 2024) [Link]
-
SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. (arXiv 2024) [Link]
-
SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications. (arXiv 2024) [Link]
-
Impact of Large Language Models on Generating Software Specifications. (arXiv 2023) [Link]
-
A Learning-Based Approach to Static Program Slicing. (OOPSLA 2024) [Link]
-
Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks. (OOPSLA 2024) [Link]
-
Using an LLM to Help With Code Understanding. (ICSE 2024) [Link]
-
Program Slicing in the Era of Large Language Models. (arXiv 2024) [Link]
-
Teaching Large Language Models to Self-Debug. (ICLR 2024) [Link]
-
LPR: Large Language Models-Aided Program Reduction. (ISSTA 2024) [Link]
-
When Fuzzing Meets LLMs: Challenges and Opportunities. (FSE 2024) [Link]
-
Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation. (ASE 2024) [Link]
-
Prompt Fuzzing for Fuzz Driver Generation. (CCS 2024) [Link]
-
Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing via Cross-DBMS SQL Transfer. (ICSE 2024) [Link]
-
Large Language Model guided Protocol Fuzzing. (NDSS 2024) [Link]
-
LLM4FUZZ: Guided Fuzzing of Smart Contracts with Large Language Models. (arXiv 2024) [Link]
-
LLMorpheus: Mutation Testing using Large Language Models. (arXiv 2024) [Link]
-
Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models. (ISSTA 2023) [Link]
-
An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. (TSE 2024) [Link]
-
Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. (ASE 2023) [Link]
-
Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning. (FSE 2024) [Link]
-
From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code. (Google 2024/10) [Link]
-
Evaluating Offensive Security Capabilities of Large Language Models. (Google 2024/06) [Link]
-
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models. (arXiv 2024) [Link]
-
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag. (NeurIPS 2023) [Link]
-
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. (Apple 2024) [Link]
-
Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models. (OOPSLA 2024) [Link]
-
Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? (NAACL 2024) [Link]
-
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. (ICLR 2024) [Link]
-
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. (arXiv 2023) [Link]
-
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator. (Google 2024) [Link]
-
When Do Program-of-Thoughts Work for Reasoning? (AAAI 2024) [Link]
-
Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting. (EMNLP 2023) [Link]
-
Complementary Explanations for Effective In-Context Learning. (ACL 2023) [Link]
-
Self-Evaluation Guided Beam Search for Reasoning. (NeurIPS 2023) [Link]
-
Tree of Thoughts: Deliberate Problem Solving with Large Language Models. (NeurIPS 2023) [Link]
-
ReAct: Synergizing Reasoning and Acting in Language Models. (ICLR 2023) [Link]
-
Reflexion: Language Agents with Verbal Reinforcement Learning. (NeurIPS 2023) [Link]
-
SATLM: Satisfiability-Aided Language Models Using Declarative Prompting. (NeurIPS 2023) [Link]
-
Cumulative Reasoning With Large Language Models. (arXiv 2023) [Link]
-
Self-Consistency Improves Chain of Thought Reasoning in Language Models. (ICLR 2023) [Link]
-
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. (arXiv 2021) [Link]
-
Steering Large Language Models between Code Execution and Textual Reasoning. (Microsoft 2024) [Link]
-
Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs. (Meta 2024) [Link]
-
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. (NeurIPS 2023) [Link]
-
Large Language Model-Based Agents for Software Engineering: A Survey. (arXiv 2024) [Link]
-
Large Language Models for Software Engineering: A Systematic Literature Review. (arXiv 2024) [Link]
-
Awesome things about LLM-powered agents: Papers, Repos, and Blogs. (arXiv 2024) [Link]
-
Comprehensive Outline of Large Language Model-based Multi-Agent Research. (None 2024) [Link]
-
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents. (arXiv 2023) [Link]
-
Cognitive Architectures for Language Agents. (arXiv 2023) [Link]
-
The Rise and Potential of Large Language Model Based Agents: A Survey. (arXiv 2023) [Link]
-
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. (ACL 2024) [Link]
-
codellama: Inference code for CodeLlama models. (Meta 2023) [Link]
-
CodeFuse: LLM for Code from Ant Group. (Ant 2023) [Link]
-
Owl-LM: Large Language Model for Blockchain. (Sec3 2023) [Link]