Automated Semi-Polymorphic String Obfuscator
This project and all associated materials are provided strictly for authorized red teaming, education, and research.
This project declares that it is NOT intended to hinder malware analysis or disrupt DFIR.
XStringer is an automated semi-polymorphic string obfuscator that converts plain strings into semi-polymorphic obfuscated strings, protecting string data within a binary from analysis with reverse engineering software (e.g. IDA Free, Ghidra, Binary Ninja).
The ultimate goal of this project is to develop an automated semi-polymorphic obfuscator that protects sensitive string data within a binary or script, and to examine how effective the obfuscation is against automatic string recovery in commercial decompilers.
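To illustrate the idea, the following is a minimal sketch of a parameterized encoder/decoder pair. It is not XStringer's actual algorithm: the `xs_params` struct and the `xs_encode` / `xs_decode` names are hypothetical, and the sketch assumes that a "semi-polymorphic" generator picks fresh parameters on every run, so the same plaintext yields different encoded bytes each time while the decode logic keeps the same overall shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-run parameters. A generator would randomize these,
 * so two runs on the same plaintext emit different encoded bytes. */
typedef struct {
    uint8_t key;   /* starting key byte */
    uint8_t step;  /* added to the key after every byte (wraps mod 256) */
} xs_params;

/* Encode the NUL-terminated string src into dst (terminator excluded).
 * dst must hold at least strlen(src) bytes. Returns the encoded length. */
static size_t xs_encode(const xs_params *p, const char *src, uint8_t *dst)
{
    size_t len = strlen(src);
    uint8_t k = p->key;
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)((uint8_t)src[i] ^ k);
        k = (uint8_t)(k + p->step);
    }
    return len;
}

/* Decode len bytes of src into out and NUL-terminate the result.
 * out_size must be at least len + 1. */
static void xs_decode(const xs_params *p, const uint8_t *src, size_t len,
                      char *out, size_t out_size)
{
    assert(out_size > len);
    uint8_t k = p->key;
    for (size_t i = 0; i < len; i++) {
        out[i] = (char)(src[i] ^ k);
        k = (uint8_t)(k + p->step);
    }
    out[len] = '\0';
}
```

Under these assumptions, encoding one string with two different `xs_params` values gives two different byte sequences that both decode to the original text. A real generator would likely vary more than two bytes of state.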
The scope of this project covers source code written in:
- C
- C++
The outcome of this project will be tested on binaries built for the following operating systems:
- Ubuntu Linux
| Environment | Information |
|---|---|
| Operating System | Ubuntu 24.04.4 LTS |
| Architecture | Intel x64 |
- Ubuntu 24.04 LTS (Acer Laptop)
- Ubuntu Compiler
- Ubuntu clang version 18.1.3 (1ubuntu1)
clang -O2 -s -fvisibility=hidden [C_CODE] -o [OUTPUT]
usage: XStringer [-h] [-l {c,cplusplus,powershell,bash}] [-r REPEAT] [-b BLOCK_SIZE] [-o OUTPUT] [-n NAME] string
Automated Semi-Polymorphic String Obfuscator
positional arguments:
string The string to be encoded
options:
-h, --help show this help message and exit
-l {c,cplusplus,powershell,bash}, --lang {c,cplusplus,powershell,bash}
Programming language to generate the encoded string for (e.g., c, cplusplus, etc.)
-r REPEAT, --repeat REPEAT
Number of times to repeat the encoded string (default: 2)
-b BLOCK_SIZE, --block-size BLOCK_SIZE
Block size for encoding (default: 1)
-o OUTPUT, --output OUTPUT
Output file to save the generated code (optional)
-n NAME, --name NAME The name of the encoded string variable (optional, default: 'encoded_string')
- Automatic String Recovery
- Automatic RE using Generative AI
- The compiled binary does not contain debug symbols
- The compiled binary is built with recommended optimizations (-O2)
For a quick validation, a simple printf sample written in C will be compiled with the recommended optimization level (-O2) and reverse engineered using Binary Ninja. Throughout this process, pseudocode readability and the result of automatic string recovery will be tested. Additionally, the assembly instructions will be provided to the following generative AI models for testing:
- GPT 5.5
- Gemini 3
  - Thinking
  - Pro
- Grok Free Tier
- Deepseek
  - DeepThink
The following prompt will be used to analyze the speed and accuracy of each AI model:
The x64 assemblies below is an obfuscated string that contains information. Your goal is to analyze this assembly instructions "statically" (meaning you must not use any emulations) and find the final output:
[DISASSEMBLY]
When the decompiler was set to Pseudo C mode, automatic string recovery by the decompiler failed and showed only &str, the obfuscated string variable.
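One plausible reason for this result is that the plaintext never appears in the binary's data; only encoded bytes and a decode routine do. The sketch below shows what such a sample could look like. It is a hand-written illustration, not XStringer output: the names `enc_str`, `decode_str`, and `print_sample`, the key `0x5A`, and the step `7` are all assumptions. The array is declared `volatile` so that its reads happen at runtime; without it, an optimizing compiler is free to evaluate the loop at build time and store the plaintext after all.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoded form of "Hello": byte i is XORed with (0x5A + 7*i).
 * volatile forces the reads to happen at runtime, so the decode loop
 * cannot be folded back into a plaintext literal at -O2. */
static volatile const unsigned char enc_str[] = { 0x12, 0x04, 0x04, 0x03, 0x19 };

/* Decode into out, which needs sizeof enc_str + 1 bytes.
 * Returns out on success, or NULL if the buffer is too small. */
static char *decode_str(char *out, size_t out_size)
{
    size_t n = sizeof enc_str;
    if (out_size < n + 1)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char)(enc_str[i] ^ (unsigned char)(0x5A + 7 * i));
    out[n] = '\0';
    return out;
}

/* What the sample's main() would do: decode on the stack, then print. */
static int print_sample(void)
{
    char buf[sizeof enc_str + 1];
    return printf("%s\n", decode_str(buf, sizeof buf));
}
```

A `main` that simply calls `print_sample()` can be built with the clang command listed earlier. In such a build, a decompiler's string recovery has no literal to find and can only show the reference to the encoded array.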
GPT 5.5 took 55 seconds to think and 65.19 seconds to fully answer the request. The recovery was successful.
Gemini 3 took 43 seconds to fully answer the request. The recovery failed.
Gemini 3 Thinking timed out. The recovery failed.
Gemini 3 Pro took 99.99 seconds to fully answer the request. The recovery was successful.
Grok Free Tier took 14 seconds to think and 20.2 seconds to fully answer the request. The recovery failed.
Deepseek took 72.34 seconds to fully answer the request. The recovery failed.
Deepseek DeepThink took 583.74 seconds to fully answer the request. The recovery was successful.
GPT 5.5 and Deepseek DeepThink showed meaningful outcomes in reverse engineering polymorphic algorithms and decoding obfuscated strings at the assembly level. In general, default models (often called "fast") showed poor speed and accuracy in deobfuscation.
Note: Grok's reverse engineering ability may depend on its (paid) license tier. As the Gemini 3 models were not fully tested, they may still be capable of effectively deobfuscating encoded strings.
- Debugging: Although it depends on the size of the binary, debugging can expose the encoded strings with little effort.
- Automatic RE: As this research suggests, a few generative AI models (GPT 5.5 and Deepseek DeepThink) can statically reverse engineer the polymorphic algorithm to deobfuscate the encoded strings.
- Source Code Complexity: As the algorithm is automatically generated, the readability of the source code can decrease significantly.






