Skip to content

yu-min/xml-base64-decoder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

XML Base64 Decoder

A powerful Python tool for decoding Base64-encoded content in XML files, with special support for UTF-8 Chinese characters, Unicode escape sequences, and string escape characters.

Features

Base64 Decoding: Automatically detects and decodes base64="true" XML elements
UTF-8 Support: Properly handles Chinese and other multi-byte characters
Unicode Escapes: Converts \uxxxx sequences to actual characters
Escape Sequences: Expands \n, \t, \", etc. for better readability
JSON Formatting: Auto-detects and pretty-prints JSON content
HTTP Request Formatting: Formats HTTP requests with proper line breaks
Plain Text View: Creates readable plain-text representation of JSON data
Command-Line Interface: Easy to use with flexible options
Secure XML Parsing: Uses the defusedxml library by default to mitigate XXE and entity expansion attacks. Falling back to the standard parser requires the --unsafe-xml flag
Size Limits: Optional --max-xml-size and --max-b64-size parameters allow limiting XML file size and Base64 payload length to prevent resource exhaustion attacks
Strict Base64 Validation: Accepts only valid Base64 characters and rejects malformed input to prevent data integrity issues
Output Sanitization: Escapes control characters and ANSI escape sequences to protect your terminal and logs (disable with --raw-output)

Installation

Requirements

  • Python 3.10 or higher
  • Requires the defusedxml library for secure XML parsing (pip install defusedxml). The tool falls back to the standard library only when the --unsafe-xml flag is used.

Quick Start

# Clone the repository
git clone https://github.com/yu-min/xml-base64-decoder.git
cd xml-base64-decoder
# Install dependencies
pip install defusedxml

# Make the script executable (optional)
chmod +x xml_decoder.py

# Run the decoder
python3 xml_decoder.py input.xml

Usage

Basic Usage

# Decode and display results in console
python3 xml_decoder.py input.xml

# Decode and save to file
python3 xml_decoder.py input.xml -o output.txt

# Quiet mode (only show summary)
python3 xml_decoder.py input.xml -o output.txt -q

Advanced Options

# Keep escape sequences in HTTP headers (don't expand \n, \t, etc.)
python3 xml_decoder.py input.xml --no-unescape

# Keep escape sequences in JSON strings
python3 xml_decoder.py input.xml --no-expand-json

# Both options together
# Set maximum XML and Base64 sizes
python3 xml_decoder.py input.xml --max-xml-size 102400 --max-b64-size 2000

# Disable sanitisation of control characters (raw output)
python3 xml_decoder.py input.xml --raw-output

# Use standard (unsafe) XML parser
python3 xml_decoder.py input.xml --unsafe-xml
python3 xml_decoder.py input.xml --no-unescape --no-expand-json -o output.txt

Help

python3 xml_decoder.py -h

Input Format

The tool processes XML files containing Base64-encoded content. Elements with base64="true" attribute will be decoded:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <item>
    <request base64="true"><![CDATA[UE9TVCAvYXBpL3YzL3YxL2NoYXQvY29tcGxldGlvbnM...]]></request>
    <response base64="true"><![CDATA[SGVsbG8gV29ybGQ=]]></response>
  </item>
</root>

Output Features

1. HTTP Request Formatting

Before:

POST /api HTTP/1.1\nHost: example.com\nContent-Type: application/json\n\n{"key":"value"}

After:

POST /api HTTP/1.1
Host: example.com
Content-Type: application/json

{
  "key": "value"
}

2. JSON Pretty-Printing

Automatically detects and formats JSON content with proper indentation.

3. Escape Sequence Expansion

Before:

{
  "content": "Line 1\nLine 2\nLine 3"
}

After (Plain Text View):

content:
  Line 1
  Line 2
  Line 3

4. Unicode Character Decoding

Converts \u6587, \u4ef6, etc.

Examples

Example 1: Basic HTTP Request

Input XML:

<request base64="true">UE9TVCAvYXBpIEhUVFAvMS4xDQpIb3N0OiBleGFtcGxlLmNvbQ0KDQp7ImhlbGxvIjoid29ybGQifQ==</request>

Output:

POST /api HTTP/1.1
Host: example.com

{
  "hello": "world"
}

Example 2: Traditional Chinese Content

Input XML:

<data base64="true">eyJuYW1lIjoi546L5bCP5piOIiwiY2l0eSI6IuWPsOWMlyJ9</data>

Output:

{
  "name": "王小明",
  "city": "台北"
}

Example 3: Complex JSON with Escape Sequences

See the examples/ directory for more complex examples.

Project Structure

xml-base64-decoder/
├── xml_decoder.py          # Main decoder script
├── README.md               # This file
├── LICENSE                 # MIT License
├── examples/               # Example input files
│   ├── sample_input.xml
└── tests/                  # Test files (optional)
    └── test_decoder.py

API Usage

You can also use the decoder as a Python module:

from xml_decoder import XMLDecoder

# Create decoder instance
decoder = XMLDecoder(
    verbose=True,
    unescape_strings=True,
    expand_json_strings=True
)

# Process file
results = decoder.process_file('input.xml', 'output.txt')

# Access results
for result in results:
    print(f"Tag: {result['tag']}")
    print(f"Content: {result['decoded_content']}")
    if result['json_data']:
        print(f"JSON: {result['json_data']}")

Command-Line Options

Option Description
input_file Input XML file path (required)
-o, --output Output file path (optional)
-q, --quiet Quiet mode - suppress detailed output
--no-unescape Don't decode escape sequences in HTTP headers
--no-expand-json Don't expand escape sequences in JSON strings
'--raw-output' Do not sanitise control characters in console output
'--max-xml-size' Maximum XML file size in bytes (default: unlimited)
'--max-b64-size' Maximum Base64 payload length in characters (default: unlimited)
'--unsafe-xml' Use standard XML parser (disables protections)
-h, --help Show help message

Common Use Cases

1. API Traffic Analysis

Decode captured HTTP requests/responses from network traffic logs.

2. Log File Processing

Extract and format Base64-encoded content from application logs.

3. Data Migration

Convert legacy Base64-encoded XML data to readable format.

4. Debugging

Inspect encoded API payloads during development and testing.

Troubleshooting

Issue: Chinese characters appear as gibberish

Solution: This tool correctly handles UTF-8 encoding. If you still see issues, ensure your terminal supports UTF-8:

export LANG=en_US.UTF-8

Issue: JSON not formatting properly

Solution: The JSON must be valid. Use --no-expand-json to see the raw JSON and debug syntax errors.

Issue: Escape sequences not expanding

Solution: Make sure you're not using --no-unescape or --no-expand-json flags.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

YM - GitHub

Acknowledgments

  • Inspired by the need to decode complex XML traffic logs
  • Built for cybersecurity professionals and developers
  • Special thanks to the open-source community

Changelog

Version 1.0.0 (2026-03-03)

  • Initial release
  • Base64 decoding with UTF-8 support
  • Unicode escape sequence handling
  • HTTP request formatting
  • JSON pretty-printing
  • Plain text view generation
  • Command-line interface

Roadmap

  • Add support for gzip-compressed content
  • XML validation before processing
  • Batch file processing
  • Web interface
  • Docker support
  • Additional output formats (CSV, HTML)

Support

If you encounter any issues or have questions:

  • Open an issue on GitHub
  • Check existing issues for solutions

About

A powerful Python tool for decoding Base64-encoded content in XML files

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages