Video upload (take 2) #929

Closed
wants to merge 9 commits into from
223 changes: 216 additions & 7 deletions tweepy/api.py
@@ -9,15 +9,25 @@

import six

if six.PY2:
from urllib import urlencode
elif six.PY3:
from urllib.parse import urlencode

from tweepy.binder import bind_api
from tweepy.error import TweepError
from tweepy.parsers import ModelParser, Parser
from tweepy.parsers import ModelParser, Parser, RawParser
from tweepy.utils import list_to_csv

IMAGE_MIMETYPES = ('image/gif', 'image/jpeg', 'image/png', 'image/webp')
CHUNKED_MIMETYPES = ('image/gif', 'image/jpeg', 'image/png', 'image/webp', 'video/mp4')
Member

Suggested change
CHUNKED_MIMETYPES = ('image/gif', 'image/jpeg', 'image/png', 'image/webp', 'video/mp4')
CHUNKED_MIMETYPES = IMAGE_MIMETYPES + ('video/mp4',)
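
A quick check of the suggested form, as a minimal sketch using the constant names from the diff above. It only shows that the concatenation yields the same tuple as the literal in the PR.

```python
IMAGE_MIMETYPES = ('image/gif', 'image/jpeg', 'image/png', 'image/webp')

# The trailing comma makes ('video/mp4',) a one-element tuple. Without it,
# the parentheses would just wrap a string and the + would raise TypeError.
CHUNKED_MIMETYPES = IMAGE_MIMETYPES + ('video/mp4',)

print(CHUNKED_MIMETYPES)
```

Deriving one tuple from the other means a new image type only has to be added in one place.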


class API(object):
"""Twitter API"""

max_size_standard = 5120 # standard uploads must be less than 5 MB
max_size_chunked = 15360 # chunked uploads must be less than 15 MB

Comment on lines +28 to +30
Member

Is there a reason these aren't initialized in __init__ or better yet, constants outside the class?

Member

These appear to be 5 and 15 MiB rather than MB. Have these limits been tested?
The library currently uses limits of 4883 and 14649 KiB, corresponding to just over 5 and 15 MB.
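
To make the unit mismatch concrete, here is a small sketch of the arithmetic. The constant names `KIB` and `MB` are hypothetical, and the values are the ones quoted in this thread, not limits verified against Twitter's API.

```python
KIB = 1024        # bytes per kibibyte
MB = 1000 * 1000  # bytes per (decimal) megabyte

# Limits in KiB: the PR's values, then the values the library used at the time.
limits_kib = [
    ('PR standard', 5120),
    ('PR chunked', 15360),
    ('current standard', 4883),
    ('current chunked', 14649),
]

for label, limit_kib in limits_kib:
    limit_bytes = limit_kib * KIB
    print('%-17s %6d KiB = %9d bytes = %.3f MB'
          % (label, limit_kib, limit_bytes, float(limit_bytes) / MB))
```

The PR's values come out as exactly 5 and 15 MiB, while the existing values land just above 5 and 15 decimal MB.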

def __init__(self, auth_handler=None,
host='api.twitter.com', search_host='search.twitter.com',
upload_host='upload.twitter.com', cache=None, api_root='/1.1',
@@ -195,11 +205,34 @@ def update_status(self, *args, **kwargs):
)(post_data=post_data, *args, **kwargs)

def media_upload(self, filename, *args, **kwargs):
""" :reference: https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload
""" :reference: https://dev.twitter.com/rest/reference/post/media/upload
Member

This link is old and simply redirects to https://developer.twitter.com/en/docs/media/upload-media/overview.

Suggested change
""" :reference: https://dev.twitter.com/rest/reference/post/media/upload
""" :reference: https://developer.twitter.com/en/docs/media/upload-media/overview

:allowed_param:
"""
f = kwargs.pop('file', None)

mime, _ = mimetypes.guess_type(filename)
Member

This has been changed to use imghdr.what with #1086.
mimetypes.guess_type should probably be used only as a fallback, to check whether the file is an mp4 when imghdr.what fails to determine the type.

Contributor

imghdr doesn't work on video files, so that's another reason to keep it as a backup.

Member

Right, I was suggesting that mimetypes.guess_type be used as a fallback to check if the file is an mp4 in that case.
This has already been implemented in the master branch with 7c60edd to resolve #1411.
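
The fallback discussed in this thread could look roughly like the sketch below. It is not the code from 7c60edd: `guess_media_type` is a hypothetical helper, and the guarded import is an assumption added because `imghdr` was removed from the standard library in Python 3.13.

```python
import mimetypes

try:
    import imghdr  # deprecated in Python 3.11, removed in 3.13
except ImportError:
    imghdr = None


def guess_media_type(filename, header=None):
    """Return a MIME type string, or None if it cannot be determined.

    `header` may hold the first bytes of the file; if it is None,
    imghdr.what opens `filename` itself.
    """
    if imghdr is not None:
        kind = imghdr.what(filename, h=header)
        if kind is not None:
            # Simplification: 'image/' + kind is a real MIME type for
            # gif, jpeg, png and webp, but not for every imghdr result.
            return 'image/' + kind
    # imghdr does not recognise video, so fall back to the file extension.
    mime, _ = mimetypes.guess_type(filename)
    return mime
```

With this ordering, an mp4 file is only ever identified by its extension, which is the trade-off the thread accepts.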

try:
size = os.path.getsize(filename)
except OSError:
f.seek(0, 2)
size = f.tell()
f.seek(0)

if mime in IMAGE_MIMETYPES and size < self.max_size_standard:
Member

os.path.getsize returns the size in bytes.
Given a filename, this will chunk any image greater than 5120 bytes rather than 5 MB.
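
A minimal sketch of the unit problem described here, assuming the limit stays expressed in KiB as in the PR. `fits_standard_upload` and `MAX_SIZE_STANDARD_KIB` are hypothetical names for illustration.

```python
MAX_SIZE_STANDARD_KIB = 5120  # the PR's limit, expressed in KiB


def fits_standard_upload(size_in_bytes, max_size_kib=MAX_SIZE_STANDARD_KIB):
    """Compare like with like: convert the KiB limit to bytes first."""
    return size_in_bytes < max_size_kib * 1024


size = 200 * 1024  # a 200 KiB image, as os.path.getsize would report it

print(size < MAX_SIZE_STANDARD_KIB)  # the comparison in the PR
print(fits_standard_upload(size))    # the comparison with the conversion
```

The first comparison treats 204800 bytes as if it were 204800 KiB, so a small image would be sent down the chunked path.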

return self.image_upload(filename, f=f, *args, **kwargs)

elif mime in CHUNKED_MIMETYPES:
return self.upload_chunked(filename, f=f, *args, **kwargs)

else:
raise TweepError("Can't upload media with mime type %s" % mime)

def image_upload(self, filename, *args, **kwargs):
""" :reference: https://dev.twitter.com/rest/reference/post/media/upload
Member

Suggested change
""" :reference: https://dev.twitter.com/rest/reference/post/media/upload
""" :reference: https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload

:allowed_param:
"""
f = kwargs.pop('file', None)
headers, post_data = API._pack_image(filename, 4883, form_field='media', f=f)
headers, post_data = API._pack_image(filename, self.max_size_standard, form_field='media', f=f)
kwargs.update({'headers': headers, 'post_data': post_data})

return bind_api(
@@ -212,6 +245,70 @@ def media_upload(self, filename, *args, **kwargs):
upload_api=True
)(*args, **kwargs)

def upload_chunked(self, filename, *args, **kwargs):
""" :reference https://dev.twitter.com/rest/reference/post/media/upload-chunked
Member

This is an old link that now 404s.

Suggested change
""" :reference https://dev.twitter.com/rest/reference/post/media/upload-chunked
""" :reference: https://developer.twitter.com/en/docs/media/upload-media/uploading-media/chunked-media-upload

:allowed_param:
"""
f = kwargs.pop('file', None)

# Media category is dependent on whether media is attached to a tweet
# or to a direct message. Assume tweet by default.
is_direct_message = kwargs.pop('is_direct_message', False)

# Initialize upload (Twitter cannot handle videos > 15 MB)
headers, post_data, fp = API._chunk_media('init', filename, self.max_size_chunked, form_field='media', f=f, is_direct_message=is_direct_message)
kwargs.update({ 'headers': headers, 'post_data': post_data })

# Send the INIT request
media_info = bind_api(
api=self,
path='/media/upload.json',
method='POST',
payload_type='media',
allowed_param=[],
require_auth=True,
upload_api=True
)(*args, **kwargs)

# If a media ID has been generated, we can send the file
if media_info.media_id:
# default chunk size is 1MB, can be overridden with keyword argument.
# minimum chunk size is 16K, which keeps the maximum number of chunks under 999
chunk_size = kwargs.pop('chunk_size', 1024 * 1024)
chunk_size = max(chunk_size, 16 * 1024)

fsize = os.path.getsize(filename)
nloops = int(fsize / chunk_size) + (1 if fsize % chunk_size > 0 else 0)
Member

Suggested change
nloops = int(fsize / chunk_size) + (1 if fsize % chunk_size > 0 else 0)
nloops = fsize // chunk_size + (fsize % chunk_size > 0)
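
For reference, a short sketch comparing the original expression with the suggested one. The function names are hypothetical, and the third variant is an additional common idiom for ceiling division, not something proposed in the PR.

```python
def chunk_count_original(fsize, chunk_size):
    # Expression from the PR
    return int(fsize / chunk_size) + (1 if fsize % chunk_size > 0 else 0)


def chunk_count_suggested(fsize, chunk_size):
    # A bool is an int in Python, so True adds 1 and False adds 0
    return fsize // chunk_size + (fsize % chunk_size > 0)


def chunk_count_ceil(fsize, chunk_size):
    # Negated floor division rounds towards positive infinity
    return -(-fsize // chunk_size)


chunk = 1024 * 1024
for fsize in (0, 1, chunk - 1, chunk, chunk + 1, 15 * chunk):
    print(fsize, chunk_count_original(fsize, chunk),
          chunk_count_suggested(fsize, chunk), chunk_count_ceil(fsize, chunk))
```

The suggested form stays in integer arithmetic throughout, so it avoids the float round trip that `/` introduces on Python 3.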

for i in range(nloops):
headers, post_data, fp = API._chunk_media('append', filename, self.max_size_chunked, chunk_size=chunk_size, f=fp, media_id=media_info.media_id, segment_index=i, is_direct_message=is_direct_message)
kwargs.update({ 'headers': headers, 'post_data': post_data, 'parser': RawParser() })
# The APPEND command returns an empty response body
bind_api(
api=self,
path='/media/upload.json',
method='POST',
payload_type='media',
allowed_param=[],
require_auth=True,
upload_api=True
)(*args, **kwargs)
# When all chunks have been sent, we can finalize.
headers, post_data, fp = API._chunk_media('finalize', filename, self.max_size_chunked, media_id=media_info.media_id, is_direct_message=is_direct_message)
kwargs = {'headers': headers, 'post_data': post_data}

# The FINALIZE command returns media information
return bind_api(
api=self,
path='/media/upload.json',
method='POST',
payload_type='media',
allowed_param=[],
require_auth=True,
upload_api=True
)(*args, **kwargs)
else:
return media_info
Comment on lines +309 to +310
Member

When does this happen?


def update_with_media(self, filename, *args, **kwargs):
""" :reference: https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/post-statuses-update_with_media
:allowed_param:'status', 'possibly_sensitive', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'auto_populate_reply_metadata', 'lat', 'long', 'place_id', 'display_coordinates'
@@ -1334,7 +1431,7 @@ def configuration(self):
@staticmethod
def _pack_image(filename, max_size, form_field="image", f=None):
"""Pack image from file into multipart-formdata post body"""
# image must be less than 700kb in size
# image must be less than 5MB in size
if f is None:
try:
if os.path.getsize(filename) > (max_size * 1024):
@@ -1352,11 +1449,12 @@ def _pack_image(filename, max_size, form_field="image", f=None):
fp = f

# image must be gif, jpeg, or png
file_type = mimetypes.guess_type(filename)
file_type, _ = mimetypes.guess_type(filename)

if file_type is None:
raise TweepError('Could not determine file type')
file_type = file_type[0]
if file_type not in ['image/gif', 'image/jpeg', 'image/png']:

if file_type not in IMAGE_MIMETYPES:
raise TweepError('Invalid file type for image: %s' % file_type)

if isinstance(filename, six.text_type):
@@ -1383,3 +1481,114 @@ def _pack_image(filename, max_size, form_field="image", f=None):
}

return headers, body

@staticmethod
def _chunk_media(command, filename, max_size, form_field="media", chunk_size=4096, f=None, media_id=None, segment_index=0, is_direct_message=False):
Member

The default chunk_size should be consistent here.
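
One way to keep the default consistent is to define it once and resolve it in a single place. This is only a sketch: `DEFAULT_CHUNK_SIZE`, `MIN_CHUNK_SIZE` and `resolve_chunk_size` are hypothetical names, with values taken from the comments in `upload_chunked` above.

```python
DEFAULT_CHUNK_SIZE = 1024 * 1024  # 1 MiB, the default stated in upload_chunked
MIN_CHUNK_SIZE = 16 * 1024        # 16 KiB, the minimum stated in upload_chunked


def resolve_chunk_size(chunk_size=None):
    """Apply the shared default, then enforce the minimum."""
    if chunk_size is None:
        chunk_size = DEFAULT_CHUNK_SIZE
    return max(chunk_size, MIN_CHUNK_SIZE)


print(resolve_chunk_size())      # shared default
print(resolve_chunk_size(4096))  # raised to the minimum
```

Both `upload_chunked` and `_chunk_media` could then default to `None` and call the same helper, so the 4096 and 1 MiB defaults can no longer drift apart.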

fp = None
if command == 'init':
if f is None:
file_size = os.path.getsize(filename)
try:
if file_size > (max_size * 1024):
raise TweepError('File is too big, must be less than %skb.' % max_size)
except os.error as e:
Member

Suggested change
except os.error as e:
except OSError as e:

raise TweepError('Unable to access file: %s' % e.strerror)
Comment on lines +1494 to +1495
Member

os.path.getsize is outside the try block and nothing else in it will raise OSError.
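
A sketch of how the check could be arranged so that the `except` clause can actually fire. `UploadError` is a hypothetical stand-in for `TweepError`, and `checked_file_size` is an illustrative helper, not part of the PR.

```python
import os


class UploadError(Exception):
    """Hypothetical stand-in for TweepError."""


def checked_file_size(filename, max_size_kib):
    """Return the file size in bytes, or raise UploadError."""
    try:
        # getsize is the only call here that can raise OSError,
        # so it is the only statement inside the try block.
        file_size = os.path.getsize(filename)
    except OSError as e:
        raise UploadError('Unable to access file: %s' % e.strerror)

    if file_size > max_size_kib * 1024:
        raise UploadError('File is too big, must be less than %skb.' % max_size_kib)
    return file_size
```

Keeping the size comparison outside the `try` also avoids any confusion about which statement the handler is meant to guard.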


# build the multipart-formdata body
fp = open(filename, 'rb')
else:
f.seek(0, 2) # Seek to end of file
file_size = f.tell()
if file_size > (max_size * 1024):
raise TweepError('File is too big, must be less than %skb.' % max_size)
f.seek(0) # Reset to beginning of file
fp = f
elif command != 'finalize':
Member

The only other option is

Suggested change
elif command != 'finalize':
elif command == 'append':

if f is not None:
fp = f
else:
raise TweepError('File input for APPEND is mandatory.')

# video must be mp4
file_type, _ = mimetypes.guess_type(filename)

if file_type is None:
raise TweepError('Could not determine file type')

if file_type not in CHUNKED_MIMETYPES:
raise TweepError('Invalid file type for video: %s' % file_type)
Comment on lines +1512 to +1519
Member

This was already determined in media_upload and could be an image.
The file type should be passed to this method rather than redetermined here.


BOUNDARY = b'Tw3ePy'
body = list()
if command == 'init':
query = {
'command': 'INIT',
'media_type': file_type,
'total_bytes': file_size,
'media_category': API._get_media_category(
is_direct_message, file_type)
}
body.append(urlencode(query).encode('utf-8'))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'
}
elif command == 'append':
if media_id is None:
raise TweepError('Media ID is required for APPEND command.')
body.append(b'--' + BOUNDARY)
body.append('Content-Disposition: form-data; name="command"'.encode('utf-8'))
body.append(b'')
body.append(b'APPEND')
body.append(b'--' + BOUNDARY)
body.append('Content-Disposition: form-data; name="media_id"'.encode('utf-8'))
body.append(b'')
body.append(str(media_id).encode('utf-8'))
body.append(b'--' + BOUNDARY)
body.append('Content-Disposition: form-data; name="segment_index"'.encode('utf-8'))
body.append(b'')
body.append(str(segment_index).encode('utf-8'))
body.append(b'--' + BOUNDARY)
body.append('Content-Disposition: form-data; name="{0}"; filename="{1}"'.format(form_field, os.path.basename(filename)).encode('utf-8'))
body.append('Content-Type: {0}'.format(file_type).encode('utf-8'))
body.append(b'')
body.append(fp.read(chunk_size))
body.append(b'--' + BOUNDARY + b'--')
headers = {
'Content-Type': 'multipart/form-data; boundary=Tw3ePy'
}
elif command == 'finalize':
if media_id is None:
raise TweepError('Media ID is required for FINALIZE command.')
body.append(
urlencode({
'command': 'FINALIZE',
'media_id': media_id
}).encode('utf-8')
)
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'
}
Comment on lines +1523 to +1570
Member

This can be combined with the prior if-elif statement.


body = b'\r\n'.join(body)
# build headers
headers['Content-Length'] = str(len(body))

return headers, body, fp

@staticmethod
def _get_media_category(is_direct_message, file_type):
""" :reference: https://developer.twitter.com/en/docs/direct-messages/message-attachments/guides/attaching-media
:allowed_param:
"""
if is_direct_message:
prefix = 'dm'
else:
prefix = 'tweet'

if file_type in IMAGE_MIMETYPES:
if file_type == 'image/gif':
return prefix + '_gif'
else:
return prefix + '_image'
elif file_type == 'video/mp4':
return prefix + '_video'
Member

Suggested change
return prefix + '_video'
return prefix + '_video'

20 changes: 14 additions & 6 deletions tweepy/binder.py
@@ -182,12 +182,20 @@ def execute(self):

# Execute request
try:
resp = self.session.request(self.method,
full_url,
data=self.post_data,
timeout=self.api.timeout,
auth=auth,
proxies=self.api.proxy)
try:
resp = self.session.request(self.method,
full_url,
data=self.post_data,
timeout=self.api.timeout,
auth=auth,
proxies=self.api.proxy)
except UnicodeEncodeError:
resp = self.session.request(self.method,
full_url,
data=self.post_data.decode('utf-8'),
timeout=self.api.timeout,
auth=auth,
proxies=self.api.proxy)
Comment on lines -185 to +198
Member

Why is this necessary?

except Exception as e:
six.reraise(TweepError, TweepError('Failed to send request: %s' % e), sys.exc_info()[2])
