A powerful Python tool for decoding Base64-encoded content in XML files, with special support for UTF-8 Chinese characters, Unicode escape sequences, and string escape characters.
✅ Base64 Decoding: Automatically detects and decodes base64="true" XML elements
✅ UTF-8 Support: Properly handles Chinese and other multi-byte characters
✅ Unicode Escapes: Converts \uxxxx sequences to actual characters
✅ Escape Sequences: Expands \n, \t, \", etc. for better readability
✅ JSON Formatting: Auto-detects and pretty-prints JSON content
✅ HTTP Request Formatting: Formats HTTP requests with proper line breaks
✅ Plain Text View: Creates readable plain-text representation of JSON data
✅ Command-Line Interface: Easy to use with flexible options
✅ Secure XML Parsing: Uses the defusedxml library by default to mitigate XXE and entity expansion attacks. Falling back to the standard parser requires the --unsafe-xml flag
✅ Size Limits: Optional --max-xml-size and --max-b64-size parameters allow limiting XML file size and Base64 payload length to prevent resource exhaustion attacks
✅ Strict Base64 Validation: Accepts only valid Base64 characters and rejects malformed input to prevent data integrity issues
✅ Output Sanitization: Escapes control characters and ANSI escape sequences to protect your terminal and logs (disable with --raw-output)
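The strict validation and size-limit features above can be illustrated with the standard library alone. This is a minimal sketch, not the tool's actual code: the function name `decode_strict` and its `max_b64_size` parameter are hypothetical and only mirror the `--max-b64-size` option, and trimming surrounding whitespace before validation is an assumption.

```python
import base64
import binascii
from typing import Optional


def decode_strict(payload: str, max_b64_size: Optional[int] = None) -> str:
    """Decode a Base64 payload to UTF-8 text, rejecting malformed input."""
    # Assumption: surrounding whitespace (common around CDATA) is trimmed first.
    compact = payload.strip()
    if max_b64_size is not None and len(compact) > max_b64_size:
        raise ValueError(f"Base64 payload exceeds {max_b64_size} characters")
    try:
        # validate=True makes b64decode raise on characters outside the alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Malformed Base64: {exc}") from exc
    return raw.decode("utf-8")


print(decode_strict("SGVsbG8gV29ybGQ="))  # Hello World
```

Without `validate=True`, `base64.b64decode` ignores stray characters, which is how a corrupted payload can decode to plausible-looking but wrong data.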
- Python 3.10 or higher
- Requires the defusedxml library for secure XML parsing (pip install defusedxml). The tool falls back to the standard library parser only when the --unsafe-xml flag is used.
# Clone the repository
git clone https://github.com/yu-min/xml-base64-decoder.git
cd xml-base64-decoder
# Install dependencies
pip install defusedxml
# Make the script executable (optional)
chmod +x xml_decoder.py
# Run the decoder
python3 xml_decoder.py input.xml
# Decode and display results in console
python3 xml_decoder.py input.xml
# Decode and save to file
python3 xml_decoder.py input.xml -o output.txt
# Quiet mode (only show summary)
python3 xml_decoder.py input.xml -o output.txt -q
# Keep escape sequences in HTTP headers (don't expand \n, \t, etc.)
python3 xml_decoder.py input.xml --no-unescape
# Keep escape sequences in JSON strings
python3 xml_decoder.py input.xml --no-expand-json
# Both options together
python3 xml_decoder.py input.xml --no-unescape --no-expand-json -o output.txt
# Set maximum XML and Base64 sizes
python3 xml_decoder.py input.xml --max-xml-size 102400 --max-b64-size 2000
# Disable sanitisation of control characters (raw output)
python3 xml_decoder.py input.xml --raw-output
# Use standard (unsafe) XML parser
python3 xml_decoder.py input.xml --unsafe-xml
# Show the help message
python3 xml_decoder.py -h
The tool processes XML files containing Base64-encoded content. Elements with the base64="true" attribute are decoded:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item>
<request base64="true"><![CDATA[UE9TVCAvYXBpL3YzL3YxL2NoYXQvY29tcGxldGlvbnM...]]></request>
<response base64="true"><![CDATA[SGVsbG8gV29ybGQ=]]></response>
</item>
</root>
Before:
POST /api HTTP/1.1\nHost: example.com\nContent-Type: application/json\n\n{"key":"value"}
After:
POST /api HTTP/1.1
Host: example.com
Content-Type: application/json
{
"key": "value"
}
Automatically detects and formats JSON content with proper indentation.
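The Before/After pair above can be reproduced with a few lines of Python. This is a hedged sketch of one way to do it, not the tool's implementation: `format_http_message` is a hypothetical name, and splitting headers from the body at the first empty line is an assumption about how the message is laid out.

```python
import json


def format_http_message(text: str) -> str:
    """Expand literal backslash-n sequences and pretty-print a JSON body."""
    # Turn the literal two-character escapes into real line breaks.
    expanded = text.replace("\\r\\n", "\n").replace("\\n", "\n")
    # Assumption: the first empty line separates headers from the body.
    head, sep, body = expanded.partition("\n\n")
    if not sep:
        return expanded
    try:
        body = json.dumps(json.loads(body), indent=2, ensure_ascii=False)
    except json.JSONDecodeError:
        pass  # Body is not JSON: leave it untouched.
    return f"{head}\n\n{body}"


raw = r'POST /api HTTP/1.1\nHost: example.com\nContent-Type: application/json\n\n{"key":"value"}'
print(format_http_message(raw))
```

Passing `ensure_ascii=False` keeps Chinese and other non-ASCII characters readable instead of re-escaping them as \uxxxx.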
Before:
{
"content": "Line 1\nLine 2\nLine 3"
}
After (Plain Text View):
content:
Line 1
Line 2
Line 3
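A plain-text view like the one above could be produced by walking the parsed JSON recursively. The sketch below is illustrative only: `plain_text_view` is a hypothetical helper, and the two-space indentation is an assumption, since the exact layout the tool emits is not shown here.

```python
import json


def plain_text_view(data, indent: int = 0) -> str:
    """Render parsed JSON as "key:" blocks with real line breaks."""
    pad = "  " * indent
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{pad}{key}:")
            lines.append(plain_text_view(value, indent + 1))
    elif isinstance(data, list):
        for item in data:
            lines.append(plain_text_view(item, indent))
    else:
        # Scalars: emit one output line per line of text.
        for line in str(data).splitlines() or [""]:
            lines.append(f"{pad}{line}")
    return "\n".join(lines)


parsed = json.loads('{"content": "Line 1\\nLine 2\\nLine 3"}')
print(plain_text_view(parsed))
```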
Converts \u6587 → 文, \u4ef6 → 件, etc.
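For illustration, the conversion of \uxxxx sequences can be written as a single regular-expression substitution. This is a minimal sketch under stated assumptions: `expand_unicode_escapes` is a hypothetical name, and the sketch handles only four-digit escapes in the Basic Multilingual Plane, not surrogate pairs.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def expand_unicode_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(expand_unicode_escapes(r"\u6587\u4ef6"))  # 文件
```

A targeted substitution like this leaves already-decoded UTF-8 text alone, which a blanket `unicode_escape` decode would not.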
Input XML:
<request base64="true">UE9TVCAvYXBpIEhUVFAvMS4xDQpIb3N0OiBleGFtcGxlLmNvbQ0KDQp7ImhlbGxvIjoid29ybGQifQ==</request>
Output:
POST /api HTTP/1.1
Host: example.com
{
"hello": "world"
}
Input XML:
<data base64="true">eyJuYW1lIjoi546L5bCP5piOIiwiY2l0eSI6IuWPsOWMlyJ9</data>
Output:
{
"name": "王小明",
"city": "台北"
}
See the examples/ directory for more complex examples.
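To make the examples above concrete, here is a rough end-to-end sketch of the decoding step: find elements marked base64="true", decode them as UTF-8, and try to parse the result as JSON. It is not the tool's source. `decode_elements` is a hypothetical function, the result keys simply mirror those used in the module example in this README, and the fallback import exists only so the sketch runs without defusedxml installed (the tool itself requires --unsafe-xml for that).

```python
import base64
import json

try:
    from defusedxml.ElementTree import fromstring  # hardened parser
except ImportError:
    # Sketch-only fallback to the standard (unprotected) parser.
    from xml.etree.ElementTree import fromstring

XML = (
    '<root><data base64="true">'
    "eyJuYW1lIjoi546L5bCP5piOIiwiY2l0eSI6IuWPsOWMlyJ9"
    "</data></root>"
)


def decode_elements(xml_text: str) -> list:
    """Decode every element carrying base64="true" in the given XML text."""
    results = []
    for element in fromstring(xml_text).iter():
        if element.get("base64") != "true" or not element.text:
            continue
        raw = base64.b64decode(element.text.strip(), validate=True)
        decoded = raw.decode("utf-8")
        try:
            json_data = json.loads(decoded)
        except json.JSONDecodeError:
            json_data = None  # Decoded content is not JSON.
        results.append(
            {"tag": element.tag, "decoded_content": decoded, "json_data": json_data}
        )
    return results


for result in decode_elements(XML):
    print(result["tag"], result["json_data"])
```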
xml-base64-decoder/
├── xml_decoder.py # Main decoder script
├── README.md # This file
├── LICENSE # MIT License
├── examples/ # Example input files
│   ├── sample_input.xml
└── tests/ # Test files (optional)
    └── test_decoder.py
You can also use the decoder as a Python module:
from xml_decoder import XMLDecoder
# Create decoder instance
decoder = XMLDecoder(
    verbose=True,
    unescape_strings=True,
    expand_json_strings=True
)
# Process file
results = decoder.process_file('input.xml', 'output.txt')
# Access results
for result in results:
    print(f"Tag: {result['tag']}")
    print(f"Content: {result['decoded_content']}")
    if result['json_data']:
        print(f"JSON: {result['json_data']}")

| Option | Description |
|---|---|
| input_file | Input XML file path (required) |
| -o, --output | Output file path (optional) |
| -q, --quiet | Quiet mode: suppress detailed output |
| --no-unescape | Don't decode escape sequences in HTTP headers |
| --no-expand-json | Don't expand escape sequences in JSON strings |
| --raw-output | Do not sanitise control characters in console output |
| --max-xml-size | Maximum XML file size in bytes (default: unlimited) |
| --max-b64-size | Maximum Base64 payload length in characters (default: unlimited) |
| --unsafe-xml | Use standard XML parser (disables protections) |
| -h, --help | Show help message |
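The --raw-output option switches off sanitisation of control characters. As a rough idea of what such sanitisation could involve, the sketch below replaces control characters (including the ESC byte that starts ANSI sequences) with visible \xNN escapes while keeping newlines and tabs. `sanitize_for_terminal` is a hypothetical name, and the exact set of characters the tool escapes is an assumption.

```python
def sanitize_for_terminal(text: str) -> str:
    """Make control characters visible so decoded data cannot drive the terminal."""
    out = []
    for ch in text:
        code = ord(ch)
        is_control = code < 0x20 or code == 0x7F or 0x80 <= code <= 0x9F
        if ch in "\n\t" or not is_control:
            out.append(ch)  # Printable text, newline, or tab: keep as-is.
        else:
            out.append(f"\\x{code:02x}")  # e.g. ESC becomes the text \x1b
    return "".join(out)


print(sanitize_for_terminal("ok\x1b[31mred"))  # ok\x1b[31mred
```

Once the ESC byte is escaped, the rest of an ANSI sequence such as [31m is ordinary text and no longer changes terminal state.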
Decode captured HTTP requests/responses from network traffic logs.
Extract and format Base64-encoded content from application logs.
Convert legacy Base64-encoded XML data to readable format.
Inspect encoded API payloads during development and testing.
Solution: This tool correctly handles UTF-8 encoding. If you still see issues, ensure your terminal supports UTF-8:
export LANG=en_US.UTF-8
Solution: The JSON must be valid. Use --no-expand-json to see the raw JSON and debug syntax errors.
Solution: Make sure you're not using --no-unescape or --no-expand-json flags.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
YM - GitHub
- Inspired by the need to decode complex XML traffic logs
- Built for cybersecurity professionals and developers
- Special thanks to the open-source community
- Initial release
- Base64 decoding with UTF-8 support
- Unicode escape sequence handling
- HTTP request formatting
- JSON pretty-printing
- Plain text view generation
- Command-line interface
- Add support for gzip-compressed content
- XML validation before processing
- Batch file processing
- Web interface
- Docker support
- Additional output formats (CSV, HTML)
If you encounter any issues or have questions:
- Open an issue on GitHub
- Check existing issues for solutions