
bug/include_orig_elements doesn't include element metadata / coords #73

@Depcos

Description


Describe the bug
When using:

shared.PartitionParameters(
    coordinates=True,
    strategy="hi_res",
    chunking_strategy="by_title",
    include_orig_elements=True,
)

the resulting CompositeElement does not include .metadata.orig_elements.
It also doesn't return coordinates for the CompositeElement. This only works for tables.
Version 0.22.0 returned this metadata, but only as memory addresses, so it wasn't very useful.
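To check which returned elements actually carry this metadata, a small diagnostic like the one below can be run over `resp.elements`. It is a sketch that assumes each element is a plain dict with a `type` key and an optional `metadata` dict. How the 0.23.0 client shapes its elements is an assumption here, and `report_metadata` is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def report_metadata(elements: list[dict[str, Any]]) -> list[tuple[str, bool, bool]]:
    """Return (type, has_coordinates, has_orig_elements) for each element dict.

    Hypothetical helper. Assumes the shape {"type": ..., "metadata": {...}};
    the key names "coordinates" and "orig_elements" mirror the ones in this report.
    """
    rows = []
    for el in elements:
        meta = el.get("metadata") or {}
        rows.append(
            (
                el.get("type", "?"),
                bool(meta.get("coordinates")),
                bool(meta.get("orig_elements")),
            )
        )
    return rows


# Hand-made dicts mimicking the reported behavior (not real API output):
# the chunk has neither key, while the table has both.
sample = [
    {"type": "CompositeElement", "text": "Intro ...", "metadata": {"page_number": 1}},
    {
        "type": "Table",
        "text": "a b",
        "metadata": {
            "coordinates": {"points": [[0, 0], [0, 1], [1, 1], [1, 0]]},
            "orig_elements": "...",
        },
    },
]
for row in report_metadata(sample):
    print(row)
```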

To Reproduce
Provide a code snippet that reproduces the issue.
I am using the latest version, 0.23.0, downloaded and built from GitHub.

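import unstructured_client
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError


def partition_file(file, file_name, api_key, api_url):
    """Partition one document and return its elements (None on an SDK error)."""
    s = unstructured_client.UnstructuredClient(
        api_key_auth=api_key,
        server_url=api_url,
    )
    x_files = shared.Files(
        content=file,  # raw bytes of the document
        file_name=file_name,
    )
    req = shared.PartitionParameters(
        files=x_files,
        coordinates=True,
        split_pdf_page=True,
        hi_res_model_name="detectron2_onnx",
        strategy="hi_res",
        chunking_strategy="by_title",
        include_orig_elements=True,  # should populate metadata.orig_elements on each chunk
        max_characters=50,
        combine_under_n_chars=20,
    )
    try:
        resp = s.general.partition(req)
        return resp.elements
    except SDKError as e:
        print(e)
        return None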

Expected behavior
A clear and concise description of what you expected to happen.
I expect to receive coordinates for the CompositeElement or, at the very least, coordinates for each of the elements it was composed from.
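As an illustration of the first option, a chunk's coordinates could in principle be derived from its original elements by taking the union of their bounding boxes on each page. The sketch below assumes each original element is a dict with `metadata["page_number"]` and `metadata["coordinates"]["points"]` given as (x, y) pairs, and that all points on a page share one coordinate system. `union_boxes_by_page` is a hypothetical helper, not something the API provides.

```python
from typing import Any

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def union_boxes_by_page(orig_elements: list[dict[str, Any]]) -> dict[int, Box]:
    """Merge the coordinates of a chunk's original elements into one box per page.

    Hypothetical helper. Elements without coordinates are skipped; a missing
    page_number is treated as page 1.
    """
    boxes: dict[int, Box] = {}
    for el in orig_elements:
        meta = el.get("metadata") or {}
        coords = meta.get("coordinates")
        if not coords:
            continue
        xs = [p[0] for p in coords["points"]]
        ys = [p[1] for p in coords["points"]]
        box = (min(xs), min(ys), max(xs), max(ys))
        page = meta.get("page_number", 1)
        if page in boxes:
            old = boxes[page]
            # Grow the existing box so it also encloses this element.
            box = (
                min(old[0], box[0]),
                min(old[1], box[1]),
                max(old[2], box[2]),
                max(old[3], box[3]),
            )
        boxes[page] = box
    return boxes


origs = [
    {"metadata": {"page_number": 1, "coordinates": {"points": [[10, 20], [10, 40], [100, 40], [100, 20]]}}},
    {"metadata": {"page_number": 1, "coordinates": {"points": [[15, 50], [15, 70], [120, 70], [120, 50]]}}},
    {"metadata": {"page_number": 2}},  # no coordinates: skipped
]
print(union_boxes_by_page(origs))  # {1: (10, 20, 120, 70)}
```

Grouping by page matters because a `by_title` chunk can span a page break, and a single box across pages would be meaningless.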

Screenshots
If applicable, add screenshots to help explain your problem.

Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.
I don't have this script.
Additional context
Add any other context about the problem here.
The regular unstructured library returns a list of objects with their respective coordinates, so this is probably something that has yet to be implemented.
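If the API returns `orig_elements` as a serialized string rather than Python objects, the element dicts (with their coordinates) can be recovered on the client side. The sketch below assumes the value is Base64 text wrapping zlib- or gzip-compressed UTF-8 JSON. That wire format is an assumption, since this issue doesn't show it, and `decode_orig_elements` is a hypothetical helper.

```python
import base64
import gzip
import json
import zlib
from typing import Any


def decode_orig_elements(encoded: str) -> list[dict[str, Any]]:
    """Decode a serialized orig_elements value into a list of element dicts.

    Assumption: Base64 text around compressed UTF-8 JSON. wbits=MAX_WBITS | 32
    makes zlib auto-detect either a zlib or a gzip header.
    """
    compressed = base64.b64decode(encoded)
    raw = zlib.decompress(compressed, wbits=zlib.MAX_WBITS | 32)
    return json.loads(raw.decode("utf-8"))


# Round-trip demo with locally built values (not real API output).
demo = [
    {
        "type": "Title",
        "text": "Intro",
        "metadata": {
            "page_number": 1,
            "coordinates": {"points": [[0, 0], [0, 10], [50, 10], [50, 0]]},
        },
    }
]
for compress in (zlib.compress, gzip.compress):
    enc = base64.b64encode(compress(json.dumps(demo).encode("utf-8"))).decode("ascii")
    assert decode_orig_elements(enc) == demo
```

Auto-detecting the header keeps the decoder working whether the server uses a raw zlib stream or a true gzip container, which the name "gzipped" alone does not settle.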

Metadata

Assignees

No one assigned

Labels

investigating: Attempting to reproduce or otherwise diagnosing the problem.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
