| Title | Venue | Year | Paper | Slide | Video | Github |
|---|---|---|---|---|---|---|
| FIN: Boosting binary code embedding by normalizing function inlinings | JSS | 2025 | link | link | ||
| REVDECODE: Enhancing Binary Function Matching with Context-Aware Graph Representations and Relevance Decoding | USENIX | 2025 | link | link | ||
| VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | TOSEM | 2025 | link | webapp | link | |
| Cross-Inlining Binary Function Similarity Detection | ICSE | 2024 | Link | link | ||
| Improving ML-based Binary Function Similarity Detection by Assessing and Deprioritizing Control Flow Graph Features | USENIX | 2024 | link | |||
| BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | ICSE | 2024 | link | |||
| Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection | USENIX | 2024 | link | link | ||
| CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision | ISSTA | 2024 | link | link | ||
| CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | ISSTA | 2024 | link | link | ||
| FASER: Binary Code Similarity Search through the use of Intermediate Representations | CAMLIS | 2023 | link | link | link | |
| kTrans: Knowledge-Aware Transformer for Binary Code Embedding | | 2023 | link | link | | |
| Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis | ISSTA | 2023 | link | link | ||
| Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge | TOSEM | 2023 | link | link | ||
| sem2vec: Semantics-aware Assembly Tracelet Embedding | TOSEM | 2023 | link | link | ||
| 1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis | TOSEM | 2023 | link | |||
| Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures | AsiaCCS | 2023 | Link | |||
| VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search | NDSS | 2023 | link | link | ||
| A Game-Based Framework to Compare Program Classifiers and Evaders | CGO | 2023 | link | link | link | link |
| BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features | MDPI | 2023 | link | |||
| A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features | CSUR | 2022 | link | |||
| Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | link | link | link | |
| Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | link | link | link | |
| jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | link | link | link | |
| COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | link | |||
| A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | link | link | link | |
| How Machine Learning Is Solving the Binary Function Similarity Problem | USENIX | 2022 | link | link | link | |
| Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | link | link | ||
| Program Representations for Predictive Compilation: State of Affairs in the Early 20's | COLA | 2022 | link | link | link | |
| Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study | JCVHT | 2022 | link | |||
| PalmTree: Learning an Assembly Language Model for Instruction Embedding | CCS | 2021 | link | link | link | |
| Binary code similarity detection | ASE | 2021 | link | |||
| Binary diffing as a network alignment problem via belief propagation | ASE | 2021 | link | |||
| Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection | IEEE DSN 2021 | 2021 | link | link | ||
| BinDeep: A deep learning approach to binary code similarity detection | ESWA | 2021 | link | |||
| EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | link | |||
| BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | link | link | ||
| Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 | link | link | ||
| Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE(revision) | 2021 | link | link | ||
| How could Neural Networks understand Programs? | ICML 2021 | 2021 | link | link | ||
| Multi-threshold token-based code clone detection | SANER 2021 | 2021 | link | |||
| FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE Euro S&P 2021 | 2021 | link | link | link | |
| TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | link | link | | |
| Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS 2020 | 2020 | link | link | ||
| Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | link | | | |
| Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | | | | |
| LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | link | | | |
| Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | link | |||
| What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | link | | | |
| Clone Detection on Large Scala Codebases | | 2020 | link | | | |
| CloneCompass: Visualizations for Code Clone Analysis | | 2020 | link | | | |
| DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | link | link | link | |
| VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | link | |||
| Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | link | |||
| Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | link | link | ||
| Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | link | link | ||
| Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | link | link | link | |
| Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | link | |||
| A Survey of Binary Code Similarity | CSUR | 2019 | link | |||
| Research Progress on Code Clone Detection (代码克隆检测研究进展) | Journal of Software (软件学报) | 2019 | link | |||
| A Systematic Review on Code Clone Detection | | 2019 | link | | | |
| A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | link | link | ||
| Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | link | link | link | model |
| SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | link | link | link | |
| Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | link | |||
| Deep Learning-Based Cross-Platform Binary Code Association Analysis (基于深度学习的跨平台二进制代码关联分析) | | 2019 | link | | | |
| CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | link | | | |
| Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | link | |||
| BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | link | |||
| αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | link | dataset | ||
| Binary Similarity Detection Using Machine Learning | PLDI | 2018 | link | |||
| CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | link | |||
| Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | link | |||
| VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | link | link | ||
| VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | link | | | |
| FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | link | | | |
| BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | link | | | |
| A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | link | | | |
| Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | link | link | | |
| BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | link | |||
| A Deep Learning Approach to Program Similarity | MASES | 2018 | link | |||
| Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | link | |||
| The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | link | link | | |
| Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | link | |||
| Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | link | |||
| Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | link | link | ||
| BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | link | |||
| BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | link | |||
| Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | link | link | ||
| BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | link | | | |
| Similarity of binaries through re-optimization | PLDI | 2017 | link | link | ||
| Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | link | |||
| Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | link | |||
| Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | link | |||
| Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | link | |||
| SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | link | |||
| CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | link | link | | |
| BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | link | link | link | |
| In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | link | |||
| DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | link | |||
| Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | link | |||
| CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | link | | | |
| Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | link | link | ||
| Scalable graph-based bug search for firmware images (Genius) | ACM CCS | 2016 | link | link | link | |
| Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | link | |||
| discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | link | |||
| BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | link | |||
| Kam1n0: Mapreduce-based assembly clone search for reverse engineering | KDD | 2016 | link | link | ||
| Statistical similarity of binaries | PLDI | 2016 | link | link | link | |
| Deep learning code fragments for code clone detection | ASE | 2016 | link | |||
| A Survey of Software Clone Detection Techniques | | 2016 | link | | | |
| SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | link | |||
| Binary executable file similarity calculation using function matching | | 2016 | link | | | |
| Matching Similar Functions in Different Versions of a Malware | | 2016 | link | | | |
| BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | link | | | |
| VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | link | link | ||
| BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | link | link | | |
| Cross-architecture bug search in binary executables | IEEE S&P | 2015 | link | |||
| Library functions identification in binary code by using graph isomorphism testings | | 2015 | link | | | |
| Evaluating clone detection tools with BigCloneBench | | 2015 | link | link | | |
| Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | link | | | |
| Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | link | link | | |
| BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | link | link | link | |
| Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | link | |||
| Binclone: Detecting code clones in malware | SERE | 2014 | link | link | ||
| Detecting fine-grained similarity in binaries | | 2014 | link | | | |
| Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | link | |||
| How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors | | 2014 | link | | | |
| Tracelet-based code search in executables | PLDI | 2014 | link | |||
| Control Flow-Based Malware Variant Detection | | 2014 | link | | | |
| Hashing for Similarity Search: A Survey | | 2014 | link | | | |
| Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | link | |||
| Identifying Shared Software Components to Support Malware Forensics | | 2014 | link | | | |
| Evaluating Modern Clone Detection Tools | | 2014 | link | | | |
| Rendezvous: a search engine for binary code | MSR | 2013 | link | |||
| Binslayer: accurate comparison of binary executables | PPREW | 2013 | link | link | ||
| Software clone detection: A systematic review | | 2013 | link | | | |
| How to extract differences from similar programs? A cohesion metric approach | | 2013 | link | | | |
| Software clone detection and refactoring | | 2013 | link | | | |
| An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | link | | | |
| A hybrid-token and textual based approach to find similar code segments | | 2013 | link | | | |
| Gapped code clone detection with lightweight source code analysis | | 2013 | link | | | |
| MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | link | link | ||
| Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | link | |||
| Towards Automatic Software Lineage Inference | USENIX | 2013 | link | link | ||
| AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | link | | | |
| Expose: Discovering potential binary code re-use | | 2013 | link | | | |
| Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | link | |||
| FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | link | |||
| A study of repetitiveness of code changes in software evolution | ASE | 2013 | link | |||
| ibinhunt: Binary hunting with interprocedural control flow | | 2012 | link | link | | |
| ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | USENIX | 2012 | link | |||
| Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | link | |||
| Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | link | | | |
| A metrics-based data mining approach for software clone detection | | 2012 | link | | | |
| Comparison of Clone Detection Techniques | | 2012 | | | | |
| Malware Classification Method via Binary Content Comparison | RACS | 2012 | link | |||
| Binary function clustering using semantic hashes | ICMLA | 2012 | link | |||
| Value-based program characterization and its application to software plagiarism detection | | 2011 | link | | | |
| CMCD: Count Matrix Based Code Clone Detection | | 2011 | link | | | |
| Incremental code clone detection: A pdg-based approach | | 2011 | link | | | |
| Anywhere, Any-Time Binary Instrumentation | | 2011 | link | | | |
| Code reuse in open source software development: Quantitative evidence, drivers, and impediments | | 2010 | | | | |
| Index-based code clone detection: incremental, distributed, scalable | | 2010 | | | | |
| Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | | | | |
| A hybrid approach (syntactic and textual) to clone detection | | 2010 | | | | |
| Evaluating code clone genealogies at release level: An empirical study | | 2010 | | | | |
| A survey of Binary similarity and distance measures | | 2010 | | | | |
| Idea: Opcode-Sequence-Based Malware Detection | | 2010 | link | | | |
| Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | ||||
| Data fingerprinting with similarity digests | | 2010 | | | | |
| Automatic mining of functionally equivalent code fragments via random testing | | 2009 | | | | |
| A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | | | | |
| Problematic code clones identification using multiple detection results | | 2009 | | | | |
| Incremental clone detection | | 2009 | | | | |
| Scalable and incremental clone detection for evolving software | | 2009 | | | | |
| Large-scale Malware Indexing Using Function-call Graphs | | 2009 | | | | |
| Scalable, Behavior-Based Malware Clustering | | 2009 | | | | |
| peHash: A Novel Approach to Fast Malware Clustering | USENIX | 2009 | ||||
| Detecting Code Clones in Binary Executables | | 2009 | | | | |
| Binhunt: Automatically finding semantic differences in binary programs | | 2008 | link | | | |
| Scalable detection of semantic clones | | 2008 | link | | | |
| Deckard: Scalable and accurate tree-based detection of code clones | | 2007 | | | | |
| Large-scale code reuse in open source software | | 2007 | | | | |
| A survey on software clone detection research | | 2007 | link | | | |
| A study of consistent and inconsistent changes to code clones | | 2007 | | | | |
| Comparison and evaluation of clone detection tools | | 2007 | | | | |
| Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | | | | |
| A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | | | | |
| CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | | | | |
| Survey of research on software clones | | 2006 | link | | | |
| "Cloning considered harmful" considered harmful: patterns of cloning in software | | 2006 | link | | | |
| GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | | | | |
| Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | | | | |
| Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | | | | |
| Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | ||||
| Graph-based comparison of executable objects | | 2005 | | | | |
| SDD: high performance code clone detection system for large scale source code | | 2005 | link | | | |
| Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | | | | |
| K-gram Based Software Birthmarks | | 2005 | | | | |
| Insights into System-Wide Code Duplication | IEEE | 2004 | link | |||
| Clone detection in source code by frequent itemset techniques | | 2004 | | | | |
| Evaluating clone detection techniques from a refactoring perspective | | 2004 | | | | |
| Structural comparison of executable objects | | 2004 | | | | |
| Code compaction of matching single-entry multiple-exit regions | | 2003 | link | | | |
| CloSpan: Mining closed sequential patterns in large datasets | | 2003 | | | | |
| Ccfinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | | | | |
| Identifying similar code with program dependence graphs | | 2001 | | | | |
| Using slicing to identify duplication in source code | | 2001 | | | | |
| BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | | | | |
| A language independent approach for detecting duplicated code | | 1999 | | | | |
| Compressing Differences of Executable Code | | 1999 | | | | |
| Similarity search in high dimensions via hashing | | 1999 | | | | |
| Clone detection using abstract syntax trees | | 1998 | | | | |
| Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | | | | |
| Pattern matching for clone and concept detection | | 1996 | | | | |
| On finding duplication and near-duplication in large software systems | | 1995 | link | | | |
| Detecting code similarity using patterns | | 1995 | | | | |
| A Cross-platform Binary Diff | | 1995 | | | | |
forked from SystemSecurityStorm/Awesome-Binary-Similarity
yangtt57/Awesome-Binary-Similarity
An awesome & curated list of binary code similarity papers