Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

[Upload Module] - HTTP Concerns and RESTful API #121

Closed
nhuray opened this Issue · 20 comments

6 participants

@nhuray

Hi Globo Team,

I think there's an issue on the Upload Module regarding the HTTP concerns.

The issue

Indeed, when you upload an image on thumbor :

curl -XPOST -F 'media=@cat-eye.jpg;filename=cat-eye.jpg' http://thumbor-server/upload

The response would be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: cat-eye.jpg
Server: TornadoServer/2.1.1

Regarding the HTTP specification, the Location header MUST be an URI (absolute or relative) and in the case of created responses (201 Created) the Location is the URI of the new resource which was created by the request.

In the example above the resolution of the Location header (as a relative URI) will be :

http://thumbor-server/cat-eye.jpg

but there's no resource created at this uri (404 is returned) :(

A solution

So I think we should create a /original endpoint to GET the resource created by the Upload Module.

In the previous example the response will be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: original/cat-eye.jpg
Server: TornadoServer/2.1.1

And why not a RESTful API ?

Moreover this /original endoint should be REST compliant to support POST / PUT / DELETE / GET methods.

The main advantage of a REST endpoint compared to the current Upload Module is that the id of the uploaded image is managed by the server itself and not by the user through the filename. So there are no risk of id collision when new images are uploaded.

For example to create a new image we just do that :

curl -XPOST -d @cat-eye.jpg http://thumbor-server/original/cat-eye.jpg

The response would be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: 22e7c25d39a24e45a6412148bbd466f4/cat-eye.jpg
Server: TornadoServer/2.1.1

The id of the image uploaded is created by concatenation of an uuid and the filename : 22e7c25d39a24e45a6412148bbd466f4/cat-eye.jpg

I started some time ago a REST API to do this, so I can make a pull request if this feature appears to be cool for the community.

Nicolas

@heynemann
Owner

I've asked a few people to review this and comment here. Let's wait and see.

@nhuray

Ok thank you Bernardo !

@heynemann
Owner

Just to clarify, I'm not saying I agree or disagree... :) I just think we should get more people to review this. Thanks A LOT for your contribution, Nicolas.

@nhuray

I understood :) I think also It's a feature that more people should review !

@fabiomcosta
Collaborator

Looks like there is a bug and a feature request, right?
First the Location should be returned correctly. This could be fixed.
The REST API idea is very nice and clean. I liked it.
Maybe we should split the upload part of thumbor into a new project, since it's like an optional thing and it's starting to grow (even more if the API is implemented).
These are just my thoughts.
Thanks @nhuray !

@nhuray

Yeah it's a bug and a feature request. If you want I can split this issue for more visibility.

IMO It's not necessary to split the Upload Module in a new project, in the core of Thumbor it's really perfect.
But the current module is for me designed for forms (html forms) because it manipulate form-data. The Upload Rest API I propose should be more REST compliant.

So we could create a base UploadHandler containing the validation logic.

The current module should be renamed in UploadFormHandler extending UploadHandler and the new REST module should be UploadApiHandlerextending UploadHandler too.

Regards

@heynemann
Owner
@diogomonica

High level comments:

  • Agreed that we should respect the Location spec and return a valid image URL with the 201.
  • Having the server managing the ID for the images is a good idea (ignoring any user submitted content, except the image itself).
  • And finally, I agree that a single endpoint for uploading files would be ideal.

I don't think I'm familiar enough with the code base to provide a very insightful comment on the way to organize this, or even if it is possible to achieve all three properties without some ugly hack, but I would be more than happy to review a pull-request.

@heynemann I don't fully understand what you mean with providing "file field with the name", but we should minimize the venue for users to submit parameters to this API. One very common mistake is to respect the user's provided filename, and ending up with a vulnerability in which I can overwrite arbitrary files on the remote server.

@heynemann
Owner
@nhuray nhuray referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@nhuray

Hi Bernardo and Globo Community,

I've just commit in my fork a new upload module.

This module have 2 handlers :

  • ImagesHandler implementing only POST method to upload images. This handler support image content in the request body (rest style) and in a multipart/form-data (designed for forms)

  • ImageHandler implementing GET, PUT, DELETE to manipulate the existing images

The validation logic between this handlers is centralized in ImageApiHandler located in __init__.py.

Usage:

Assuming the thumbor server is located at : http://thumbor-server

Using the new upload module via a form

When using the new upload module via a form, the user is free to choose the id of the image via the filename field :

curl -i -XPOST http://thumbor-server/image -F 'media=@vows/crocodile.jpg;type=image/jpeg;filename=something.jpg'
HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/something.jpg
Server: TornadoServer/2.1.1

The image is created at http://thumbor-server/image/something.jpg. It can be retrieve, modify or delete via this URI.

Note : a second POST will overwrite the created image, if the property UPLOAD_PUT_ALLOWED is true. A 403 Forbidden is return if UPLOAD_PUT_ALLOWED is false.

Using the new upload module via the REST api

When using the new upload module via the REST api :

curl -i -H "Content-Type: image/jpeg" -XPOST http://thumbor-server/image --data-binary "@vows/crocodile.jpg"
HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d
Server: TornadoServer/2.1.1

The image is created at http://thumbor-server/image/05b2eda857314e559630c6f3334d818d. It can be retrieve, modify or delete via this URI.

Could you review my code and tell me if it was you expect.

Thanks.

Nicolas

@nhuray nhuray referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@heynemann
Owner
@nhuray

Bernardo,

IMO the use case you are describing (where the user define the uri of the resource to create) should be use with a PUT method.

You can read this thread on stackoverflow which explain the semantic difference between POST and PUT : http://stackoverflow.com/questions/630453/put-vs-post-in-rest

Cheers

@heynemann
Owner
@nhuray

Thank you Bernardo for these explanations.

I expect the problem you have with POST, is the id auto-generated without the filename.

I understand the SEO constraints you want to preserve, so I would like to propose another way to answer this constraints, preserving the REST style :

Upload images

Upload via REST API

For now, I chose a simple uuid to create the image id but it's possible to add an optional http header Slug to specify a filename :

For example :

POST /image

Content-Type: image/jpeg
Content-Length: 822
Slug: crocodile.jpg

should return :

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d/crocodile.jpg

This way to do is the strategy chosen by Google for Picasa

Not specifying a Slug causes the server to use a default filename for the image. This could be configured in Thumbor with an UPLOAD_DEFAULT_SLUG parameter.

For example :

UPLOAD_DEFAULT_SLUG = image

should return :

Content-Type Default Slug
image/jpeg image.jpg
image/png image.png
image/gif image.gif

Upload via a Form

In case of using the upload module via a form the Slug parameter should be ignored because the filename field is required (and have the same semantic).

POST /image

Content-Disposition: form-data; name=crocodile.jpg; filename=something.jpg
Content-Type: image/jpeg
Content-Length: 822
Slug: slug-should-be-ignored.jpg

should return :

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d/something.jpg

Resolving images

IMO, one thing we have to consider is if the image filename (set via Slug or filename field) is a part of the image id for the storage.

Google answered NO to this question. Indeed these URLs returns the same content :

IMO it's a good way to prevent 404 and a good strategy regarding the SEO constraints.

Resolving uploaded images

In this case these URLs should return the same content :

  • http://thumbor-server/image/05b2eda857314e559630c6f3334d818d/crocodile.jpg

  • http://thumbor-server/image/05b2eda857314e559630c6f3334d818d/another-filename.jpg

This is easy we just have to take the 32 first characters.

Resolving processed images

In image processing these URLs should return the same content :

  • http://thumbor-server/K97LekICOXT9MbO3X1u8BBkrjbu5/300x200/smart/05b2eda857314e559630c6f3334d818d/crocodile.jpg

  • http://thumbor-server/K97LekICOXT9MbO3X1u8BBkrjbu5/300x200/smart/05b2eda857314e559630c6f3334d818d/another-filename.jpg

We have to modify url.py to check if image have the following pattern ([^/]{32})/.* and capture the uuid.

I let you, the globo team and the thumbor community evaluate these proposals.

Cheers,

Nicolas

PS: Sorry for this so long thread

@texvex

Hi,

This kind of Api would be great for us.
We have several applications using the upload module. Currently each application have to manage the image id (using a pattern) in order to prevent image overwriting.
So we need that thumbor manage the unicity of these images (with the uuid).
Keeping the filename in the last path of the uri would be great too.

best regards,

Jérémie

@heynemann
Owner

After careful consideration, I really like the last proposal Nicolas. My only issue with it is the "special case" of the UUID, which goes against thumbor's extensibility pattern.

That can be the default loader and storage (i.e.: FileLoader and FileStorage) way of resolving the original, but I'm not willing to modify url.py to include such a special scenario (nor I think we need to).

Consider the scenario where a request arrives for an image with this URL:

http://thumbor-server/K97LekICOXT9MbO3X1u8BBkrjbu5/300x200/smart/05b2eda857314e559630c6f3334d818d/crocodile.jpg

Thumbor would ask the configured loader to load the image 05b2eda857314e559630c6f3334d818d/crocodile.jpg. One of two things happen:

  • The loader can find the image and it loads its bytes;
  • The loader can't find the image and thumbor returns 404.

Do you think that makes sense?

Cheers,
Bernardo Heynemann

@heynemann
Owner

Btw, no need to apologize for the long thread. You can't even begin to imagine how happy we are to collaborate with the community around thumbor.

Cheers,

@nhuray

Hi Bernardo,

IMO the problem to integrate this logic in the loader is that we have to define it in all loaders (fileloader, hbase_loader, etc...). I would prefer to integrate this logic before, but I don't know how to do that for the moment.

I let you think of that with the community during my vacation. I come back the 25 september.

Regards,

Nicolas

@sefournier

Hi,

This is a more well-presentend & argued of my request (#88).

A restful api would simplify our implementation of Thumbor.

Regards,

Sébastien

@nhuray nhuray referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@nhuray nhuray referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@nhuray nhuray referenced this issue from a commit
Commit has since been removed from the repository and is no longer available.
@nhuray nhuray referenced this issue from a commit in nhuray/thumbor
@nhuray nhuray [REST Upload Module] : REST Upload API (#121) :
   - Rename image.py to imaging.py
   - Create ImagesHandler to upload new image with content in body or via multipart/form-data (POST)
   - Create ImageHandler to retrieve, modify and delete existing image (GET, PUT, DELETE)
   - Modify imaging.py to check if the 32 first characters is an image existing in storage
d854629
@nhuray

Bernardo, I close this issue and create another for the documentation

@nhuray nhuray closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.