Python utility function to convert an iterable of bytes or str to a readable file-like object.
It can be seen as the inverse of the two-argument iter function. The iter function allows conversion of file-like objects to iterables, but the function here converts from iterables to file-like objects. This allows you to bridge the gap between incompatible streaming APIs - passing data from sources that offer data as iterables to destinations that only accept file-like objects.
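For reference, the two-argument form of the built-in iter calls a function repeatedly until it returns a sentinel value, which is a common way to turn a file-like object into an iterable of chunks. A minimal sketch using only the standard library, where io.BytesIO and the chunk size of 4 are stand-ins chosen for illustration:

```python
import io

# io.BytesIO stands in for any readable bytes-based file-like object
f = io.BytesIO(b'onetwothree')

# Call f.read(4) repeatedly until it returns the sentinel b'' (end of file)
chunks = list(iter(lambda: f.read(4), b''))
print(chunks)  # [b'onet', b'woth', b'ree']
```

The function in this package goes in the opposite direction: it starts from an iterable of chunks like the one above and exposes it through read.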
- Inherits from IOBase - some APIs require this
- The resulting file-like object is well-behaved - it does not return more data than requested
- It evaluates the iterable lazily - avoiding loading all its data into memory
- Under the hood copying is avoided as much as possible
- Converts iterables of bytes to bytes-based file-like objects, which can be passed to boto3's upload_fileobj or to io.TextIOWrapper, which is useful in stream CSV parsing
- Converts iterables of str to text-based file-like objects, which can be passed to psycopg2's copy_expert
pip install to-file-like-obj
If you have an iterable of bytes instances, you can pass it to the to_file_like_obj function, and it will return the corresponding file-like object.
from to_file_like_obj import to_file_like_obj
f = to_file_like_obj((b'one', b'two', b'three',))
print(f.read(5)) # b'onetw'
print(f.read(6)) # b'othree'
If you have an iterable of str instances, you can pass it to the to_file_like_obj function, along with base=str as a named argument, and it will return the corresponding file-like object.
from to_file_like_obj import to_file_like_obj
f = to_file_like_obj(('one', 'two', 'three',), base=str)
print(f.read(5)) # 'onetw'
print(f.read(6)) # 'othree'
These examples have the iterables hard-coded, and so loaded entirely into memory. However, to_file_like_obj works equally well with iterables that are generated dynamically, and without loading them all into memory.
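To illustrate what reading lazily from a dynamically generated iterable can look like, here is a minimal sketch using only the standard library. The IterableReader class, its attributes and the chunks generator are hypothetical names made up for this illustration. It is not the implementation used by this package, and it makes no attempt to avoid copying.

```python
from io import IOBase


class IterableReader(IOBase):
    # Hypothetical illustration only: not this package's actual implementation
    def __init__(self, iterable, base=bytes):
        self._it = iter(iterable)
        self._base = base
        self._leftover = base()  # unread tail of the most recently pulled chunk

    def readable(self):
        return True

    def read(self, size=-1):
        if size is None:
            size = -1
        parts = []
        remaining = size
        while size < 0 or remaining > 0:
            # Use any leftover data first, then pull the next chunk on demand
            chunk, self._leftover = self._leftover, self._base()
            if not chunk:
                chunk = next(self._it, None)
                if chunk is None:
                    break  # the iterable is exhausted
            if size >= 0 and len(chunk) > remaining:
                # Never return more than requested: keep the excess for later
                chunk, self._leftover = chunk[:remaining], chunk[remaining:]
            parts.append(chunk)
            remaining -= len(chunk)
        return self._base().join(parts)


pulled = []  # records which chunks have been generated so far


def chunks():
    for chunk in (b'one', b'two', b'three'):
        pulled.append(chunk)
        yield chunk


f = IterableReader(chunks())
first = f.read(5)
pulled_after_first = list(pulled)
second = f.read(6)

print(first, pulled_after_first)  # b'onetw' [b'one', b'two']
print(second)  # b'othree'
```

After the first read only two of the three chunks have been generated, which is the behaviour that keeps memory use bounded when the iterable is a network response or another large stream.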
Using httpx it's possible to use the to_file_like_obj function to parse a CSV file while downloading it.
import csv
import io
import httpx
from to_file_like_obj import to_file_like_obj
with httpx.stream("GET", "https://www.example.com/my.csv") as r:
    bytes_iter = r.iter_bytes()
    f = to_file_like_obj(bytes_iter)
    # Decode the bytes to text; newline='' leaves line endings for csv.reader to handle
    lines_iter = io.TextIOWrapper(f, newline='', encoding='utf-8')
    rows_iter = csv.reader(lines_iter)
    for row in rows_iter:
        print(row)
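The decoding and parsing steps above can be tried without a network connection. The sketch below uses only the standard library, with io.BytesIO as a stand-in for the file-like object wrapping the download and made-up CSV data. It also shows the reason for passing newline='': a quoted field that contains a line break is kept as a single value.

```python
import csv
import io

# io.BytesIO stands in for the file-like object wrapping the download
f = io.BytesIO(b'id,comment\r\n1,"line one\r\nline two"\r\n')

# newline='' leaves line endings untranslated so csv.reader can handle them itself
lines_iter = io.TextIOWrapper(f, newline='', encoding='utf-8')
rows = list(csv.reader(lines_iter))
print(rows)  # [['id', 'comment'], ['1', 'line one\r\nline two']]
```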
Similarly, using httpx and stream-unzip, it's possible to use the to_file_like_obj function to robustly parse a zipped CSV file while downloading it.
import csv
import io
import httpx
from stream_unzip import stream_unzip
from to_file_like_obj import to_file_like_obj
with httpx.stream("GET", "https://www.example.com/my.zip") as r:
    zipped_bytes_iter = r.iter_bytes()
    # Assumes a single CSV file in the ZIP (in the case of more, this will concatenate them together)
    unzipped_bytes_iter = (
        chunk
        for _, _, chunks in stream_unzip(zipped_bytes_iter)
        for chunk in chunks
    )
    f = to_file_like_obj(unzipped_bytes_iter)
    # Decode the bytes to text; newline='' leaves line endings for csv.reader to handle
    lines_iter = io.TextIOWrapper(f, newline='', encoding='utf-8')
    rows_iter = csv.reader(lines_iter)
    for row in rows_iter:
        print(row)
boto3's upload_fileobj is a powerful function, but it's not obvious that it can be used with iterables of bytes that are returned from various APIs, such as those in httpx.
import boto3
import httpx
from to_file_like_obj import to_file_like_obj
s3 = boto3.client('s3')
with httpx.stream("GET", "https://www.example.com/my.zip") as r:
    bytes_iter = r.iter_bytes()
    f = to_file_like_obj(bytes_iter)
    # Upload to S3 while the download is still in progress
    s3.upload_fileobj(f, 'my-bucket', 'my.zip')
stream-zip can be used with boto3 and this package to upload objects to S3 while zipping them.
from datetime import datetime
from stat import S_IFREG
import boto3
import httpx
from to_file_like_obj import to_file_like_obj
from stream_zip import ZIP_32, stream_zip
s3 = boto3.client('s3')
with httpx.stream("GET", "https://www.example.com/my.txt") as r:
    unzipped_bytes_iter = r.iter_bytes()
    # One member file: name, modification time, mode, ZIP method, contents
    member_files = (
        (
            'my.txt',
            datetime.now(),
            S_IFREG | 0o600,
            ZIP_32,
            unzipped_bytes_iter,
        ),
    )
    zipped_bytes_iter = stream_zip(member_files)
    f = to_file_like_obj(zipped_bytes_iter)
    # Upload the ZIP to S3 as it is being generated
    s3.upload_fileobj(f, 'my-bucket', 'my.zip')