Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 0 additions & 49 deletions .github/workflows/build-20.04.yml

This file was deleted.

26 changes: 18 additions & 8 deletions .github/workflows/build-24.04.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,10 @@ name: CI Build - Python 3.8+

on:
push:
branches: [ "master" ]
branches: [ "main", "master" ]
tags: '**'
pull_request:
branches: [ "master" ]
branches: [ "main", "master" ]

jobs:
build:
Expand All @@ -17,7 +17,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14-dev"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]

steps:
- uses: actions/checkout@v4
Expand All @@ -32,13 +32,23 @@ jobs:
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
# javaobj/v3 and tests/test_v3.py require Python 3.12+ syntax; exclude them on older versions
if python -c "import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)"; then
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
else
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=javaobj/v3,tests/test_v3.py
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude=javaobj/v3,tests/test_v3.py
fi
- name: Test
run: |
coverage run -m pytest
if python -c "import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)"; then
coverage run -m pytest
else
coverage run --omit='javaobj/v3/*,tests/test_v3.py' -m pytest --ignore=tests/test_v3.py
fi
- name: Coveralls
env:
COVERALLS_REPO_TOKEN: ${{ secrets.COVERALLS_REPO_TOKEN }}
Expand Down
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -46,3 +46,7 @@ nosetests.xml
/issue*/
/repro*.py
/test*.py

# uv
.venv
uv.lock
185 changes: 172 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,22 +28,27 @@ This fork intends to work both on Python 2.7 and Python 3.4+.
| Implementations | Version |
|-----------------|----------|
| `v1`, `v2` | `0.4.0+` |
| `v3` | `0.5.0+` |

Since version 0.4.0, two implementations of the parser are available:
Since version 0.4.0, three implementations of the parser are available:

* `v1`: the *classic* implementation of `javaobj`, with a work in progress
implementation of a writer.
* `v2`: the *new* implementation, which is a port of the Java project
* `v2`: a rewritten implementation, which is a port of the Java project
[`jdeserialize`](https://github.com/frohoff/jdeserialize/),
with support of the object transformer (with a new API) and of the `numpy`
arrays loading.
* `v3`: a **new** implementation, written from scratch to benefit from
Python 3.12+ features.

You can use the `v1` parser to ensure that the behaviour of your scripts
doesn't change and to keep the ability to write down files.

You can use the `v2` parser for new developments
*which won't require marshalling* and as a *fallback* if the `v1`
fails to parse a file.
You can use the `v2` parser for developments in Python versions lower
than 3.12 and *which won't require marshalling*, or as a *fallback*
if the `v1` parser fails to parse a file.

For new development, you should use the `v3` parser.

### Object transformers V1

Expand All @@ -67,6 +72,18 @@ it, and avoids a mismatch between the referenced object and the transformed one.
The `v2` implementation provides a new API for the object transformers.
Please look at the *Usage (V2)* section in this file.

### Object transformers V3

| Implementations | Version |
|-----------------|----------|
| `v3` | `0.5.0+` |

The `v3` implementation is a full rewrite targeting **Python 3.12+**.
It uses `dataclasses`, structural pattern matching (`match/case`) and PEP 604
union types. Its API is intentionally similar to `v2` but fixes several
correctness issues and adds stricter safety limits.
Please look at the *Usage (V3)* and *Migration to V3* sections in this file.

### Bytes arrays

| Implementations | Version |
Expand Down Expand Up @@ -98,7 +115,8 @@ You can find a sample usage in the *Custom Transformer* section in this file.

## Requirements

* Python >= 2.7 or Python >= 3.4
* Python >= 2.7 or Python >= 3.4 for `v1` and `v2`
* Python >= 3.12 for `v3`
* `enum34` and `typing` when using Python <= 3.4 (installable with `pip`)
* Maven 2+ (for building test data of serialized objects.
You can skip it if you do not plan to run `tests.py`)
Expand Down Expand Up @@ -134,8 +152,8 @@ with open("objCollections.ser", "rb") as fd:

**Note:** The objects and methods provided by `javaobj` module are shortcuts
to the `javaobj.v1` package, for Compatibility purpose.
It is **recommended** to explicitly import methods and classes from the `v1`
(or `v2`) package when writing new code, in order to be sure that your code
It is **recommended** to explicitly import methods and classes from the `v1`,
`v2`, or `v3` package when writing new code, in order to be sure that your code
won't need import updates in the future.


Expand Down Expand Up @@ -391,13 +409,13 @@ class JavaRandomTransformer(BaseTransformer):
values = []
for f_name, f_type in zip(self.field_names, self.field_types):
values.append(parser._read_field_value(f_type))
fields.append(javaobj.beans.JavaField(f_type, f_name))
fields.append(javaobj.v2.beans.JavaField(f_type, f_name))

class_desc = javaobj.beans.JavaClassDesc(
javaobj.beans.ClassDescType.NORMALCLASS
class_desc = javaobj.v2.beans.JavaClassDesc(
javaobj.v2.beans.ClassDescType.NORMALCLASS
)
class_desc.name = self.name
class_desc.desc_flags = javaobj.beans.ClassDataType.EXTERNAL_CONTENTS
class_desc.desc_flags = javaobj.v2.beans.ClassDataType.EXTERNAL_CONTENTS
class_desc.fields = fields
class_desc.field_data = values
return class_desc
Expand Down Expand Up @@ -473,10 +491,151 @@ transformers = [
RandomChildTransformer(),
JavaRandomTransformer()
]
pobj = javaobj.loads("custom_objects.ser", *transformers)
with open("custom_objects.ser", "rb") as fd:
pobj = javaobj.load(fd, *transformers)

# Here we show a field that isn't visible from the class description
# The field belongs to the class but it's not serialized by default because
# it's static. See: https://stackoverflow.com/a/16477421/12621168
print(pobj.field_data["int_not_in_fields"])
```

## Usage (V3 implementation)

> **Requires Python 3.12+.**

The `javaobj.v3` package is a full rewrite of the Java object stream parser.
It provides the same two entry-points as `v2`:

* `load(fd, *transformers, use_numpy_arrays=False, max_array_size=…, max_depth=500)`:
Parses a binary file descriptor opened in `rb` mode and returns the top-level
object if the stream contains exactly one, a list of objects if there are
several, or `None` for an empty stream. Pass additional `ObjectTransformer`
instances as positional arguments.

* `loads(data, *transformers, …)`:
Convenience wrapper around `load()` that accepts `bytes`.

Sample usage:

```python
import javaobj.v3 as javaobj

with open("obj5.ser", "rb") as fd:
pobj = javaobj.load(fd)

# Access fields by name (preferred)
value = pobj.get_field("myField")

# Or use attribute-style access (issues a warning on ambiguity)
value = pobj.myField
```

### New features in V3

| Feature | V1 | V2 | V3 |
|---|---|---|---|
| Python 3.12+ (`match/case`, PEP 604) | ✗ | ✗ | ✓ |
| Fully typed (`dataclasses`, PEP 695 `type` aliases) | ✗ | partial | ✓ |
| `TC_RESET` handling | ✗ | ✗ | ✓ |
| `TC_EXCEPTION` in object graph | ✗ | ✗ | ✓ |
| `TC_PROXYCLASSDESC` | ✗ | ✓ | ✓ |
| Security limits (max depth / array size) | ✗ | ✗ | ✓ |
| Correct `TYPE_CHAR` numpy dtype (`>u2`) | ✗ | ✗ | ✓ |
| Typed exception hierarchy | ✗ | ✗ | ✓ |
| `BlockData.__eq__(bytes)` compatibility | ✓ | ✓ | ✓ |

### Security limits

`v3` adds two optional safety limits that prevent resource exhaustion when
parsing untrusted streams:

```python
import javaobj.v3 as javaobj

with open("untrusted.ser", "rb") as fd:
pobj = javaobj.load(
fd,
max_array_size=10 * 1024 * 1024, # 10 MiB max per array
max_depth=100, # max object-graph depth
)
```

### Object Transformer V3

The `ObjectTransformer` base class in `v3` has the same three override points
as in `v2`:

* `create_instance(classdesc)` — return a `JavaInstance` subclass (or `None`
to fall back to the next transformer).
* `load_array(reader, type_code, size)` — called for `TC_ARRAY` records;
return the array data (`bytes` or `list`) or `None` to use the default logic.
* `load_custom_writeObject(parser, reader, class_name)` — called when a
class written with `writeObject()` requires fully custom parsing.

The `DefaultObjectTransformer` additionally exposes a public `handles(name)`
method that returns `True` when the transformer knows how to load the given
Java class name.

### Using NumPy arrays (V3)

```python
import javaobj.v3 as javaobj

with open("arrays.ser", "rb") as fd:
pobj = javaobj.load(fd, use_numpy_arrays=True)
```

When `use_numpy_arrays=True`, a `NumpyArrayTransformer` is appended to the
transformer list and primitive arrays are returned as `numpy.ndarray`.

---

## Migration to V3

### From V1 to V3

| V1 | V3 |
|---|---|
| `import javaobj` | `import javaobj.v3 as javaobj` |
| `pobj.classdesc.name` | `pobj.classdesc.name` (unchanged) |
| `pobj.myField` (direct attribute) | `pobj.get_field("myField")` (preferred) or `pobj.myField` |
| `pobj._data` on arrays | `pobj.data` (public) |
| `javaobj.JavaObjectUnmarshaller` | removed — use `javaobj.v3.parser.JavaStreamParser` |
| `javaobj.JavaObjectMarshaller` | marshalling not available in `v3` |
| Exceptions: bare `Exception` | Typed: `ParseError`, `UnexpectedOpcodeError`, … |

Shallow conversion helper (best-effort, for gradual migration):

```python
from javaobj.v3._compat import v1_to_v3
v3_obj = v1_to_v3(v1_obj)
```

### From V2 to V3

| V2 | V3 |
|---|---|
| `import javaobj.v2 as javaobj` | `import javaobj.v3 as javaobj` |
| `javaobj.load(fd)` | `javaobj.load(fd)` (same signature) |
| `javaobj.loads(data)` | `javaobj.loads(data)` (same signature) |
| `pobj.classdesc.name` | `pobj.classdesc.name` (unchanged) |
| `pobj.field_data[cd][field]` | `pobj.field_data[cd][field]` (unchanged) |
| `pobj.get_field("name")` | `pobj.get_field("name")` (unchanged) |
| `pobj.__getattr__` ambiguity silent | warns when field exists in multiple classes |
| `transformer._type_mapper` (private) | `transformer.handles(name)` (public) |
| `JavaArray.data` (`tuple` of ints for bytes) | `JavaArray.data` (`bytes` for `TYPE_BYTE`) |
| `BlockData` compared with `bytes` | `BlockData.__eq__(bytes)` still works |
| `use_numpy_arrays=True` (v2 option) | `use_numpy_arrays=True` (same) |
| No depth/size limits | `max_depth=500`, `max_array_size=100 MiB` |
| No typed exceptions | `ParseError`, `SecurityError`, … |

Shallow conversion helper (best-effort, for gradual migration):

```python
from javaobj.v3._compat import v2_to_v3
v3_obj = v2_to_v3(v2_obj)
```

> **Note:** `v3` requires **Python 3.12+** and does **not** support marshalling
> (writing). If you need to write Java object streams, use `v1`.
Loading
Loading