✨ Lazy image reading support (reading, converting, resizing) #1288

Draft
wants to merge 5 commits into
base: master

maartenbreddels
Member

Example:

image

cc @xdssio


@vaex.register_function(scope='image')
def as_image(arrays):
    images = [rgba_2_pil(image_array) for image_array in images]
Collaborator

images = [rgba_2_pil(image_array) for image_array in **arrays**]?

Member Author

yeah, that's wrong, you can also see this is not tested yet in the unittests, feel free to fix.
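
A minimal sketch of a unit test that would have caught this, assuming nothing about the real helpers: `rgba_2_pil` below is a hypothetical stand-in (the real one presumably builds a PIL image), and `as_image_buggy` / `as_image_fixed` are illustrative names, not functions from the PR.

```python
def rgba_2_pil(image_array):
    # Hypothetical stand-in for the real converter; it only tags its input
    # so the test can check that every array was passed through.
    return ("pil", image_array)


def as_image_buggy(arrays):
    # Mirrors the line under review: `images` is read before it is assigned.
    images = [rgba_2_pil(image_array) for image_array in images]
    return images


def as_image_fixed(arrays):
    # Iterate over the function argument, as suggested in the review.
    images = [rgba_2_pil(image_array) for image_array in arrays]
    return images


def test_as_image():
    arrays = [[0, 0, 0, 255], [255, 255, 255, 255]]
    assert as_image_fixed(arrays) == [("pil", a) for a in arrays]
    try:
        as_image_buggy(arrays)
    except NameError:
        # UnboundLocalError is a subclass of NameError.
        pass
    else:
        raise AssertionError("the buggy version should not succeed")


test_as_image()
```

Even a single call like this in the unittests would surface the error, since the buggy line fails on any input.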

@xdssio
Collaborator

xdssio commented Apr 13, 2021

I think printing the image in different sizes is overkill. Having the image always look as it does in the first "image" column is fine, but the resize is definitely important.

@maartenbreddels
Member Author

printing of the image in different sizes

This is not something vaex is doing; if you look at the code, you can see I created 3 image columns just for a demo.

@xdssio
Collaborator

xdssio commented Apr 13, 2021

We might want a helper function to find all files in a dir.

Something like this or so:

import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)

def get_paths(path, suffix=None, resize=None):  # resize is not used yet
    # Collect candidates: a single file, or a recursive search of a directory.
    if os.path.isfile(path):
        files = [str(path)]
    elif os.path.isdir(path):
        suffixes = [suffix] if suffix is not None else ['jpg', 'png', 'jpeg', 'ppm', 'thumbnail']
        files = [str(p) for s in suffixes for p in Path(path).rglob(f"*{s}")]
    else:
        files = []
    ignores = set()
    for file in files:
        # The JFIF marker only exists in JPEG files, so only check those;
        # otherwise every PNG or PPM would be reported as corrupted.
        if not file.lower().endswith(('jpg', 'jpeg')):
            continue
        with open(file, "rb") as fobj:
            is_jfif = b"JFIF" in fobj.read(10)
        if not is_jfif:
            logger.error(f"file {file} is corrupted - ignore")
            ignores.add(file)
    return [file for file in files if file not in ignores]
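
As a possible variation on the helper above, the check could look at the leading bytes of each file instead of its suffix, which also covers PNG. This is only a sketch using the standard library: `SIGNATURES`, `sniff_image_format` and `find_images` are hypothetical names, and only two formats are recognised.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of two common formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def sniff_image_format(path):
    """Return 'png' or 'jpeg' based on the first bytes of the file, else None."""
    with open(path, "rb") as fobj:
        head = fobj.read(8)
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def find_images(root):
    """Recursively list files under `root` whose signature is recognised."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and sniff_image_format(p) is not None
    )
```

This only inspects the header, so a truncated file with a valid start would still pass; a stricter check would have to decode the image.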

@Ben-Epstein
Contributor

Hi @maartenbreddels @xdssio are there plans to continue/merge these efforts? We use Vaex for a lot of processing now and plan to use it for images in the near future :-]

@maartenbreddels
Member Author

@JovanVeljanoski what do you think?

@JovanVeljanoski
Member

Yeah, I think this is definitely worth looking into at some point soon. I think it would be quite cool to do various types of pre-processing on all images instead of per-batch, especially for deep NN stuff. I wonder what impact that would have.

We are working hard on the next major version, and the roadmap for that is pretty much fixed, I believe. It revolves around stabilizing features like shift and diff, as well as major improvements to the internal "pipeline" of vaex dataframes, together with various bugfixes, performance improvements and the like.

After that I do not know what the plan is yet, so we could look into this. We typically look at what is most in demand or has the highest impact. Or if someone is willing to fund/sponsor the development of certain features, it would get a priority.

Do you agree @maartenbreddels ?

@maartenbreddels
Member Author

I agree, I'd like to have a feature for displaying images, but let's focus on those things you mention first, and work on an example that uses/requires images, unless funding ups the priority.

@xdssio
Collaborator

xdssio commented Oct 7, 2021

To continue the discussion.
I propose an "Image" column.

df = vaex.open_images("dir/with/images/*.png", column_name='image')  # or so
df = vaex.from_arrays(image=[np.frombuffer(image_data, dtype=np.uint8)])  # this is for loading an image on a server

df['image'].path
df['image'].shape

df['image'].pixels
# or
df['image'].array
# or 
df['image'].matrix
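
To make the lazy part of the proposal concrete, here is a rough standard-library sketch of how `path` and `shape` could be served without decoding any pixels. `LazyImage` is a hypothetical class, not vaex API; it handles PNG only and leaves pixel decoding out entirely.

```python
import struct
from functools import cached_property

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class LazyImage:
    """Hypothetical lazy image handle: nothing is read until it is asked for."""

    def __init__(self, path):
        self.path = path  # only the path is stored; the file is not opened yet

    @cached_property
    def shape(self):
        # A PNG starts with an 8-byte signature followed by the IHDR chunk:
        # 4-byte length, 4-byte type, then big-endian width and height.
        with open(self.path, "rb") as fobj:
            head = fobj.read(24)
        if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
            raise ValueError(f"{self.path} is not a PNG file")
        width, height = struct.unpack(">II", head[16:24])
        return (height, width)
```

Reading 24 bytes per file keeps `shape` cheap even for a large directory; a `pixels` (or `array`) attribute could follow the same pattern and decode only on first access.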


4 participants