I have a pptx file that raises an error when I try to convert it with MarkItDown().convert().
Traceback is:
Traceback (most recent call last):
[...]
  File "/var/lang/lib/python3.12/site-packages/markitdown/_markitdown.py", line 273, in convert
    return self.convert_stream(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.12/site-packages/markitdown/_markitdown.py", line 361, in convert_stream
    return self._convert(file_stream=stream, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.12/site-packages/markitdown/_markitdown.py", line 600, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
- PptxConverter threw TypeError with message: '<' not supported between instances of 'NoneType' and 'Emu'
The problematic pptx is confidential, so I cannot provide it. However, I could pin the bug down to the sorted() call in the PPTX converter: for one shape in the file, shape.top is None, so comparing it against the Emu position of another shape fails during sorting. The problematic shape seems to be empty anyway.
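The failure mode can be reproduced without the confidential file. The following is a minimal sketch, assuming the converter sorts shapes by a (top, left) key, as the error message suggests. FakeShape is a hypothetical stand-in for a python-pptx shape, and plain ints stand in for Emu values, so the message names 'int' rather than 'Emu'.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Optional


@dataclass
class FakeShape:
    """Hypothetical stand-in for a python-pptx shape; ints replace Emu."""
    name: str
    top: Optional[int]
    left: Optional[int]


shapes = [
    FakeShape("title", top=100, left=50),
    FakeShape("empty", top=None, left=None),  # shape without a position
    FakeShape("body", top=300, left=50),
]

try:
    # Assumed sort key; the real converter code may differ.
    sorted(shapes, key=attrgetter("top", "left"))
    error_message = None
except TypeError as exc:
    # Comparing the key tuples ends up comparing None with an int,
    # which Python 3 does not support.
    error_message = str(exc)

print(error_message)
```

A single shape without a position is enough to make the whole sort, and therefore the whole conversion, fail.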
Currently, I use a very ugly monkey patch:
def _shape_filter(s):
    # Keep a shape unless it has "top" and "left" attributes and either is None.
    return not (
        hasattr(s, "top") and hasattr(s, "left") and (s.top is None or s.left is None)
    )

def _mock_sorted(iterable, **kwargs):
    # Drop shapes without a position, then defer to the built-in sorted().
    iterable = (it for it in iterable if _shape_filter(it))
    return sorted(iterable, **kwargs)

from markitdown.converters import _pptx_converter

# Shadow the built-in sorted() inside the converter module only.
_pptx_converter.sorted = _mock_sorted  # type: ignore
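An alternative to filtering would be a sort key that tolerates missing positions, so that such shapes are kept rather than dropped. This is only a sketch of the idea, not the library's fix: position_key and FakeShape are hypothetical names, plain ints stand in for Emu values, and treating None as 0 (which places such shapes first) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeShape:
    """Hypothetical stand-in for a python-pptx shape; ints replace Emu."""
    name: str
    top: Optional[int]
    left: Optional[int]


def position_key(shape) -> Tuple[int, int]:
    """Sort key that substitutes 0 for a missing or None top/left."""
    top = getattr(shape, "top", None)
    left = getattr(shape, "left", None)
    return (top if top is not None else 0, left if left is not None else 0)


shapes = [
    FakeShape("body", top=300, left=50),
    FakeShape("empty", top=None, left=None),  # shape without a position
    FakeShape("title", top=100, left=50),
]

ordered = [s.name for s in sorted(shapes, key=position_key)]
print(ordered)  # ['empty', 'title', 'body']
```

Whether shapes without a position should be kept or skipped depends on what the converter does with them afterwards; for an empty shape, as in this case, both approaches should produce the same Markdown.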