[Upload Module] - HTTP Concerns and RESTful API #121

Closed
nhuray opened this Issue Aug 30, 2012 · 20 comments

Projects

None yet

6 participants

@nhuray
Contributor
nhuray commented Aug 30, 2012

Hi Globo Team,

I think there's an issue on the Upload Module regarding the HTTP concerns.

The issue

Indeed, when you upload an image on thumbor :

curl -XPOST -F 'media=@cat-eye.jpg;filename=cat-eye.jpg' http://thumbor-server/upload

The response would be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: cat-eye.jpg
Server: TornadoServer/2.1.1

Regarding the HTTP specification, the Location header MUST be an URI (absolute or relative) and in the case of created responses (201 Created) the Location is the URI of the new resource which was created by the request.

In the example above the resolution of the Location header (as a relative URI) will be :

http://thumbor-server/cat-eye.jpg

but there's no resource created at this uri (404 is returned) :(

A solution

So I think we should create a /original endpoint to GET the resource created by the Upload Module.

In the previous example the response will be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: original/cat-eye.jpg
Server: TornadoServer/2.1.1

And why not a RESTful API ?

Moreover this /original endoint should be REST compliant to support POST / PUT / DELETE / GET methods.

The main advantage of a REST endpoint compared to the current Upload Module is that the id of the uploaded image is managed by the server itself and not by the user through the filename. So there are no risk of id collision when new images are uploaded.

For example to create a new image we just do that :

curl -XPOST -d @cat-eye.jpg http://thumbor-server/original/cat-eye.jpg

The response would be :

HTTP/1.1 201 Created
Content-Length: 22
Content-Type: text/html; charset=UTF-8
Location: 22e7c25d39a24e45a6412148bbd466f4/cat-eye.jpg
Server: TornadoServer/2.1.1

The id of the image uploaded is created by concatenation of an uuid and the filename : 22e7c25d39a24e45a6412148bbd466f4/cat-eye.jpg

I started some time ago a REST API to do this, so I can make a pull request if this feature appears to be cool for the community.

Nicolas

@heynemann
Member

I've asked a few people to review this and comment here. Let's wait and see.

@nhuray
Contributor
nhuray commented Aug 30, 2012

Ok thank you Bernardo !

@heynemann
Member

Just to clarify, I'm not saying I agree or disagree... :) I just think we should get more people to review this. Thanks A LOT for your contribution, Nicolas.

@nhuray
Contributor
nhuray commented Aug 30, 2012

I understood :) I think also It's a feature that more people should review !

@fabiomcosta
Member

Looks like there is a bug and a feature request, right?
First the Location should be returned correctly. This could be fixed.
The REST API idea is very nice and clean. I liked it.
Maybe we should split the upload part of thumbor into a new project, since it's like an optional thing and it's starting to grow (even more if the API is implemented).
These are just my thoughts.
Thanks @nhuray !

@nhuray
Contributor
nhuray commented Aug 30, 2012

Yeah it's a bug and a feature request. If you want I can split this issue for more visibility.

IMO It's not necessary to split the Upload Module in a new project, in the core of Thumbor it's really perfect.
But the current module is for me designed for forms (html forms) because it manipulate form-data. The Upload Rest API I propose should be more REST compliant.

So we could create a base UploadHandler containing the validation logic.

The current module should be renamed in UploadFormHandler extending UploadHandler and the new REST module should be UploadApiHandlerextending UploadHandler too.

Regards

@heynemann
Member

For the sake of simplicity I'd rather have one way of uploading things to
thumbor.

I like the rest api, but I'd rather have a different name for the route.

/image/some-image.jpg

The storage can then define how to store that image and how to retrieve it
later on.

What do you guys think of going with that approach? When uploading from a
form all you have to do is send the upload file field with the name the
rest API expects for the file. It should work.

Cheers,
Bernardo Heynemann

On Thu, Aug 30, 2012 at 3:24 PM, Nicolas Huray notifications@github.comwrote:

Yeah it's a bug and a feature request. If you want I can split this issue
for more visibility.

IMO It's not necessary to split the Upload Module in a new project, in the
core of Thumbor it's really perfect.
But the current module is for me designed for forms (html forms) because
it manipulate form-data. The Upload Rest API I propose should be more REST
compliant.

So we could create a base UploadHandler containing the validation logic.

The current module should be renamed in UploadFormHandler extending
UploadHandler and the new REST module should be UploadApiHandlerextending
UploadHandler too.

Regards


Reply to this email directly or view it on GitHubhttps://github.com/globocom/thumbor/issues/121#issuecomment-8169314.

@diogomonica

High level comments:

  • Agreed that we should respect the Location spec and return a valid image URL with the 201.
  • Having the server managing the ID for the images is a good idea (ignoring any user submitted content, except the image itself).
  • And finally, I agree that a single endpoint for uploading files would be ideal.

I don't think I'm familiar enough with the code base to provide a very insightful comment on the way to organize this, or even if it is possible to achieve all three properties without some ugly hack, but I would be more than happy to review a pull-request.

@heynemann I don't fully understand what you mean with providing "file field with the name", but we should minimize the venue for users to submit parameters to this API. One very common mistake is to respect the user's provided filename, and ending up with a vulnerability in which I can overwrite arbitrary files on the remote server.

@heynemann
Member

True, yet for SEO reasons the name of the file (not the path) may be
important.

The actual storage strategy may be decided by the storage and returned in
the location.

The issue I have with this strategy is that it requires the URL to be
uniform, meaning that if I post to an url I want to get at the same URL,
not at the one returned in the location header.

The flow is:

POST => http://thumbor-server/image/something.jpg
GET => http://thumbor-server/image/something.jpg (will be 404 if we use a
auto-generated ID)

That's not very consistent IMHO.

Cheers,
Bernardo Heynemann

On Thu, Aug 30, 2012 at 3:47 PM, Diogo Mónica notifications@github.comwrote:

High level comments:

  • Agreed that we should respect the Location spec and return a valid
    image URL with the 201.
  • Having the server managing the ID for the images is a good idea
    (ignoring any user submitted content, except the image itself).
  • And finally, I agree that a single endpoint for uploading files
    would be ideal.

I don't think I'm familiar enough with the code base to provide a very
insightful comment on the way to organize this, or even if it is possible
to achieve all three properties without some ugly hack, but I would be more
than happy to review a pull-request.

@heynemann https://github.com/heynemann I don't fully understand what
you mean with providing "file field with the name", but we should minimize
the venue for users to submit parameters to this API. One very common
mistake is to respect the user's provided filename, and ending up with a
vulnerability in which I can overwrite arbitrary files on the remote server.


Reply to this email directly or view it on GitHubhttps://github.com/globocom/thumbor/issues/121#issuecomment-8170051.

@nhuray
Contributor
nhuray commented Aug 31, 2012

Hi Bernardo and Globo Community,

I've just commit in my fork a new upload module.

This module have 2 handlers :

  • ImagesHandler implementing only POST method to upload images. This handler support image content in the request body (rest style) and in a multipart/form-data (designed for forms)
  • ImageHandler implementing GET, PUT, DELETE to manipulate the existing images

The validation logic between this handlers is centralized in ImageApiHandler located in __init__.py.

Usage:

Assuming the thumbor server is located at : http://thumbor-server

Using the new upload module via a form

When using the new upload module via a form, the user is free to choose the id of the image via the filename field :

curl -i -XPOST http://thumbor-server/image -F 'media=@vows/crocodile.jpg;type=image/jpeg;filename=something.jpg'
HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/something.jpg
Server: TornadoServer/2.1.1

The image is created at http://thumbor-server/image/something.jpg. It can be retrieve, modify or delete via this URI.

Note : a second POST will overwrite the created image, if the property UPLOAD_PUT_ALLOWED is true. A 403 Forbidden is return if UPLOAD_PUT_ALLOWED is false.

Using the new upload module via the REST api

When using the new upload module via the REST api :

curl -i -H "Content-Type: image/jpeg" -XPOST http://thumbor-server/image --data-binary "@vows/crocodile.jpg"
HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d
Server: TornadoServer/2.1.1

The image is created at http://thumbor-server/image/05b2eda857314e559630c6f3334d818d. It can be retrieve, modify or delete via this URI.

Could you review my code and tell me if it was you expect.

Thanks.

Nicolas

@heynemann
Member

I'm not sure if the "auto-generated" image name is a good thing. I think
aiming at:

curl -i -XPOST http://thumbor-server/image/crocodile.jpg -F
'media=@vows/crocodile.jpg;type=image/jpeg;filename=something.jpg'

For the rest API makes much more sense. The user is posting to the URI that
he intends for the image. This means that he already knows the image URI
and can easily "guess" it.

What do you guys think?

Cheers,
Bernardo Heynemann

On Fri, Aug 31, 2012 at 1:02 PM, Nicolas Huray notifications@github.comwrote:

Hi Bernardo and Globo Community,

I've just commit in my fork a new upload module.

This module have 2 handlers :

ImagesHandler implementing only POST method to upload images. This
handler support image content in the request body (rest style) and in a
multipart/form-data (designed for forms)

ImageHandler implementing GET, PUT, DELETE to manipulate the existing
images

The validation logic between this handlers is centralized in
ImageApiHandler located in init.py.
Usage:

Assuming the thumbor server is located at : http://thumbor-server
Using the new upload module via a form

When using the new upload module via a form, the user is free to choose
the id of the image via the filename field :

curl -i -XPOST http://thumbor-server/image -F 'media=@vows/crocodile.jpg;type=image/jpeg;filename=something.jpg'

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/something.jpg
Server: TornadoServer/2.1.1

The image is created at http://thumbor-server/image/something.jpg. It can
be retrieve, modify or delete via this URI.

Note : a second POST will overwrite the created image, if the property
UPLOAD_PUT_ALLOWED is true. A 403 Forbidden is return if UPLOAD_PUT_ALLOWED
is false.
Using the new upload module via the REST api

When using the new upload module via the REST api :

curl -i -H "Content-Type: image/jpeg" -XPOST http://thumbor-server/image --data-binary "@vows/crocodile.jpg"

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d
Server: TornadoServer/2.1.1

The image is created at
http://thumbor-server/image/05b2eda857314e559630c6f3334d818d. It can be
retrieve, modify or delete via this URI.

Could you review my code and tell me if it was you expect.

Thanks.

Nicolas


Reply to this email directly or view it on GitHubhttps://github.com/globocom/thumbor/issues/121#issuecomment-8196699.

@nhuray
Contributor
nhuray commented Aug 31, 2012

Bernardo,

IMO the use case you are describing (where the user define the uri of the resource to create) should be use with a PUT method.

You can read this thread on stackoverflow which explain the semantic difference between POST and PUT : http://stackoverflow.com/questions/630453/put-vs-post-in-rest

Cheers

@heynemann
Member

When I replied, I was aware of that difference. I do not like it, though.

It makes for a difficult API. You can create a resource using POST (which
makes it a random-named resource) through one URL (/image) OR you can
create a resource using PUT (which makes a specified name resource) through
a different URL (/image/my-image.jpg).

Thumbor, as you guys are well aware, has a myriad of different URL
variations. It is a little overwhelming for someone starting to use it as
of now. I'm very concerned of adding this many options right now.

I would be more comfortable with just choosing put over post, due to the
implications of having images named "1939419481248" (without even
extension).

My dream scenario is:

PUT /image/path.jpg (if image exists update it, otherwise create it)
POST /image/path.jpg (if image exists 406 (not sure anyway), otherwise
create it)
DELETE /image/path.jpg (if image exists, delete it. otherwise 404)
GET /image/path.jpg (if image exists, return it. otherwise 404)

This is the least surprising FOR ME. Not saying it is for everyone else.
Maybe it is not in accord with the definition of REST in that post. It is
simpler IMHO.

That's why I said I would be VERY interested in everyone else's opinion.

Cheers,
Bernardo Heynemann

On Fri, Aug 31, 2012 at 5:29 PM, Nicolas Huray notifications@github.comwrote:

Bernardo,

IMO the use case you are describing (where the user define the uri of the
resource to create) should be use with a PUT method.

You can read this thread on stackoverflow which explain the semantic
difference between POST and PUT :
http://stackoverflow.com/questions/630453/put-vs-post-in-rest

Cheers


Reply to this email directly or view it on GitHubhttps://github.com/globocom/thumbor/issues/121#issuecomment-8204121.

@nhuray
Contributor
nhuray commented Sep 1, 2012

Thank you Bernardo for these explanations.

I expect the problem you have with POST, is the id auto-generated without the filename.

I understand the SEO constraints you want to preserve, so I would like to propose another way to answer this constraints, preserving the REST style :

Upload images

Upload via REST API

For now, I chose a simple uuid to create the image id but it's possible to add an optional http header Slug to specify a filename :

For example :

POST /image

Content-Type: image/jpeg
Content-Length: 822
Slug: crocodile.jpg

should return :

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d/crocodile.jpg

This way to do is the strategy chosen by Google for Picasa

Not specifying a Slug causes the server to use a default filename for the image. This could be configured in Thumbor with an UPLOAD_DEFAULT_SLUG parameter.

For example :

UPLOAD_DEFAULT_SLUG = image

should return :

Content-Type Default Slug
image/jpeg image.jpg
image/png image.png
image/gif image.gif

Upload via a Form

In case of using the upload module via a form the Slug parameter should be ignored because the filename field is required (and have the same semantic).

POST /image

Content-Disposition: form-data; name=crocodile.jpg; filename=something.jpg
Content-Type: image/jpeg
Content-Length: 822
Slug: slug-should-be-ignored.jpg

should return :

HTTP/1.1 201 Created
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: /image/05b2eda857314e559630c6f3334d818d/something.jpg

Resolving images

IMO, one thing we have to consider is if the image filename (set via Slug or filename field) is a part of the image id for the storage.

Google answered NO to this question. Indeed these URLs returns the same content :

IMO it's a good way to prevent 404 and a good strategy regarding the SEO constraints.

Resolving uploaded images

In this case these URLs should return the same content :

This is easy we just have to take the 32 first characters.

Resolving processed images

In image processing these URLs should return the same content :

We have to modify url.py to check if image have the following pattern ([^/]{32})/.* and capture the uuid.

I let you, the globo team and the thumbor community evaluate these proposals.

Cheers,

Nicolas

PS: Sorry for this so long thread

@texvex
texvex commented Sep 4, 2012

Hi,

This kind of Api would be great for us.
We have several applications using the upload module. Currently each application have to manage the image id (using a pattern) in order to prevent image overwriting.
So we need that thumbor manage the unicity of these images (with the uuid).
Keeping the filename in the last path of the uri would be great too.

best regards,

Jérémie

@heynemann
Member

After careful consideration, I really like the last proposal Nicolas. My only issue with it is the "special case" of the UUID, which goes against thumbor's extensibility pattern.

That can be the default loader and storage (i.e.: FileLoader and FileStorage) way of resolving the original, but I'm not willing to modify url.py to include such a special scenario (nor I think we need to).

Consider the scenario where a request arrives for an image with this URL:

http://thumbor-server/K97LekICOXT9MbO3X1u8BBkrjbu5/300x200/smart/05b2eda857314e559630c6f3334d818d/crocodile.jpg

Thumbor would ask the configured loader to load the image 05b2eda857314e559630c6f3334d818d/crocodile.jpg. One of two things happen:

  • The loader can find the image and it loads its bytes;
  • The loader can't find the image and thumbor returns 404.

Do you think that makes sense?

Cheers,
Bernardo Heynemann

@heynemann
Member

Btw, no need to apologize for the long thread. You can't even begin to imagine how happy we are to collaborate with the community around thumbor.

Cheers,

@nhuray
Contributor
nhuray commented Sep 6, 2012

Hi Bernardo,

IMO the problem to integrate this logic in the loader is that we have to define it in all loaders (fileloader, hbase_loader, etc...). I would prefer to integrate this logic before, but I don't know how to do that for the moment.

I let you think of that with the community during my vacation. I come back the 25 september.

Regards,

Nicolas

@sefournier

Hi,

This is a more well-presentend & argued of my request (#88).

A restful api would simplify our implementation of Thumbor.

Regards,

Sébastien

@nhuray nhuray added a commit to nhuray/thumbor that referenced this issue Oct 24, 2012
@nhuray nhuray + Nicolas Huray [REST Upload Module] : REST Upload API (#121) :
   - Rename image.py to imaging.py
   - Create ImagesHandler to upload new image with content in body or via multipart/form-data (POST)
   - Create ImageHandler to retrieve, modify and delete existing image (GET, PUT, DELETE)
   - Modify imaging.py to check if the 32 first characters is an image existing in storage
d854629
@nhuray
Contributor
nhuray commented Nov 4, 2012

Bernardo, I close this issue and create another for the documentation

@nhuray nhuray closed this Nov 4, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment