This repository has been archived by the owner on May 22, 2024. It is now read-only.

Fix 556/repo conversion times out (#92)
* RC - fixed so that ids of reg and rsv are Bible types.  Added unit test case.

* Changed back to original approach: RC - reverted changes.  Usfm2HtmlConverter - added reg as supported resource.

* Preprocessors - fixed do_preprocess to add "reg" support.

* Updated to require newer usfm-tools.  Added test case for converter with illegal USFM sequence.

* Updated MAT test case with more illegal USFM sequence.

* TxManager - added support for child jobs.
ManagerTest - added tests for child jobs.
TestBiblePreprocessor - added test case with multiple books

* Preprocessor - added support for multiple-book jobs.

* TestClientWebhook - adding unit test for process_webhook.

* TestClientWebhook - more work on unit test for process_webhook.

* ClientWebhook & TestClientWebhook - Added support for starting multiple jobs if more than one book in repo.

* ClientWebhook  - merge errors in multiple projects.

* ClientWebhook - added overall job log for multiple parts and separate job logs for each part
TestClientWebhook - added error test cases.

* ClientCallback - added support for multiple project parts
ClientWebhook & TestClientWebhook - improved testability.
TestClientCallbackHandler - added deeper testing.

* ClientCallback - tweaked support for multiple project parts
ClientWebhook & TestClientWebhook - improved testability.
TestClientCallbackHandler - added deeper testing.

* ClientCallback - worked on checking for and reassembling multiple project parts
ClientWebhook & TestClientWebhook - improved testability.
TestClientCallbackHandler - added deeper testing.

* ClientCallback - worked on checking for and reassembling multiple project parts
TestClientCallback - new unit tests.

* ClientCallback - have multiple working in unit tests
TestClientCallback - added unit tests for multiple.

* ClientCallback - minor fix for multiple part job
ClientWebhook - added cdn cleanup.

* ClientCallback - logging additions for multiple part job

* ClientCallback - fix double .zip extensions

* ClientCallback - fixes for merging build_logs of each part

* ClientWebhook - fixes to return last job id

* ClientWebhook - fix path for build_log
TestClientCallback - improvements for testing

* ClientWebhook - added logging

* ClientCallback & ClientWebhook - moved location of build_log for parts

* ClientCallback - fix parameter parts in updateBuildLog.

* ClientWebhook - added logging

* ClientCallback - testing.

* ClientCallback - testing.

* ClientWebhook - fixed problem with files being deleted

* ClientWebhook - fixed master build_log

* ClientWebhook - fixed master build_log

* TestClientCallback - improved testing.

* ClientCallback - fixes to master_build_log_json
TestClientCallback - improved testing.

* ClientCallback - cleanup temp files
ClientWebhook - cleanup temp files

* ClientCallback - restored missing method

* ClientCallback - fixed path for finish file

* ProjectDeployer - added cleanup of tempfiles

* ClientWebhook - fix source path

* ClientWebhook - fix file name for finished files

* ClientCallback - fix display of finished jobs filenames list
TestClientCallback - added unit test for no jobs finished.

* ClientWebhook & TestClientWebhook - cleanup

* ClientCallback - removed looking for {0}.zip converted files.

* ClientCallback - if multiple part job and download error, we note error and move on.
ClientWebhook - added note of book converted if multiple part job

* ClientWebhook - fixed note of book converted if multiple part job

* ClientWebhook & TestClientWebhook - minimizing zip files created for multi part job
Usfm2HtmlConverter - add support for converting only files specified in options

* Cleanup Python warnings

* ClientWebhook - fix options

* Logging

* ClientWebhook - pass convert_only as source parameter
Usfm2HtmlConverter - decode convert_only from source parameter
Added more unit tests

* Usfm2HtmlConverter - minimize logging

* Cleanup unneeded stuff.

* ClientCallback - add check for errors.

* ClientCallback - fixed setting resource type.

* ClientCallback - added ID of logs and warnings.

* ClientCallback & TestClientCallback - code cleanup and better S3 Mocking.

* ClientCallback & TestClientCallback - code cleanup and better S3 Mocking.

* ClientWebhook & TestClientWebhook - code cleanup and better S3 Mocking.

* ClientWebhook & ClientCallback - code cleanup.

* Changes to using invoke for starting jobs

* Fixes converter function name

* Handled no payload

* Removes files and dirs no longer needed

* ClientWebhook - fix build log.

* converter - Removed unneeded ignore_error flag.

* ClientWebhook - added needed commit id.
TestClientWebhook - improving tests.

* ClientWebhook - fix build log.
TestClientWebhook - improving unit test validation.

* Merge with develop - resolve merge conflicts

* TestConversions - updated with changes from lambda

* ManagerTest - converted mock response to lambda response

* cleanup

* cleanup

* fix project files

* cleanup merge conflicts in project files

* cleanup merge conflicts in project files

* TestClientWebhook - fix aws_tools path
PhotoNomad0 authored and richmahn committed Jun 21, 2017
1 parent 8323949 commit 0b2885d
Showing 22 changed files with 1,722 additions and 434 deletions.
26 changes: 26 additions & 0 deletions libraries/aws_tools/lambda_handler.py
@@ -0,0 +1,26 @@
from __future__ import unicode_literals, print_function
import json
import boto3
import logging


class LambdaHandler(object):
def __init__(self, aws_access_key_id=None, aws_secret_access_key=None, aws_region_name='us-west-2'):
self.aws_access_key_id = aws_access_key_id
self.aws_secret_access_key = aws_secret_access_key
self.aws_region_name = aws_region_name
self.client = None
self.logger = logging.getLogger()
self.setup_resources()

    def setup_resources(self):
        # use explicit credentials when provided, otherwise fall back to the default chain
        self.client = boto3.client(
            'lambda',
            aws_access_key_id=self.aws_access_key_id,
            aws_secret_access_key=self.aws_secret_access_key,
            region_name=self.aws_region_name
        )

def invoke(self, function_name, payload):
return self.client.invoke(
FunctionName=function_name,
InvocationType='RequestResponse',
LogType='Tail',
            Payload=json.dumps(payload)
)
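
For context, a minimal usage sketch of the handler above (the function name and payload keys are hypothetical placeholders, not values from this commit):

    import json
    from libraries.aws_tools.lambda_handler import LambdaHandler

    handler = LambdaHandler()  # credentials resolve via the default boto3 credential chain
    response = handler.invoke('tx_convert_usfm2html',  # hypothetical function name
                              {'job_id': '123', 'source': 'https://example.com/repo.zip'})
    result = json.loads(response['Payload'].read())  # Lambda returns the payload as a StreamingBody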
228 changes: 195 additions & 33 deletions libraries/client/client_callback.py
@@ -1,10 +1,10 @@
from __future__ import print_function, unicode_literals
import json
import os
import tempfile
import logging
import time
from logging import Logger
from libraries.general_tools.file_utils import unzip, write_file, remove_tree, remove
from libraries.general_tools.url_utils import download_file
from libraries.aws_tools.s3_handler import S3Handler
from libraries.manager.job import TxJob
@@ -22,52 +22,194 @@ def __init__(self, job_data=None, cdn_bucket=None, gogs_url=None):
self.job = TxJob(job_data)
self.cdn_bucket = cdn_bucket
self.gogs_url = gogs_url
self.temp_dir = tempfile.mkdtemp(suffix="", prefix="client_callback_")
self.cdn_handler = None

def process_callback(self):
if not self.cdn_handler:
self.cdn_handler = S3Handler(self.cdn_bucket)

parts = self.job.identifier.split('/')
multiple_project = len(parts) >= 6
part_count = '0'
part_id = '0'
if not multiple_project:
owner_name, repo_name, commit_id = parts[0:3] # extract fields
else:
owner_name, repo_name, commit_id, part_count, part_id, book = parts # extract fields
self.logger.debug('Multiple project, part {0} of {1}, converting book {2}'
.format(part_id, part_count, book))

# The identifier is how to know which username/repo/commit this callback goes to
s3_commit_key = 'u/{0}/{1}/{2}'.format(owner_name, repo_name, commit_id)

self.logger.debug('Callback for commit {0}...'.format(s3_commit_key))

# Download the ZIP file of the converted files
converted_zip_url = self.job.output
        converted_zip_file = os.path.join(self.temp_dir, converted_zip_url.rpartition('/')[2])
        remove(converted_zip_file)  # make sure old file not present
        download_success = True
        self.logger.debug('Downloading converted zip file from {0}...'.format(converted_zip_url))
        try:
            download_file(converted_zip_url, converted_zip_file)
        except:
            download_success = False  # if multiple project we note fail and move on
            if not multiple_project:
                remove_tree(self.temp_dir)  # cleanup
            if self.job.errors is None:
                self.job.errors = []
            self.job.errors.append("Missing converted file: " + converted_zip_url)
        finally:
            self.logger.debug('download finished, success={0}'.format(str(download_success)))

if download_success:
# Unzip the archive
unzip_dir = self.unzip_converted_files(converted_zip_file)

# Upload all files to the cdn_bucket with the key of <user>/<repo_name>/<commit> of the repo
self.upload_converted_files(s3_commit_key, unzip_dir)

if multiple_project:
# Now download the existing build_log.json file, update it and upload it back to S3
build_log_json = self.update_build_log(s3_commit_key, part_id + "_")

# mark part as finished
self.cdn_upload_contents(build_log_json, s3_commit_key + '/' + part_id + '.finished')

# check if all parts are present, if not return
missing_parts = []
finished_parts = self.cdn_handler.get_objects(prefix=s3_commit_key, suffix='.finished')
            finished_parts_file_names = ','.join([part.key for part in finished_parts])
self.logger.debug('found finished files: ' + finished_parts_file_names)

count = int(part_count)
for i in range(0, count):
file_name = '{0}.finished'.format(i)

match_found = False
for part in finished_parts:
if file_name in part.key:
match_found = True
self.logger.debug('Found converted part: ' + part.key)
break

if not match_found:
missing_parts.append(file_name)

if len(missing_parts) > 0:
self.logger.debug('Finished processing part. Other parts not yet completed: ' + ','.join(missing_parts))
remove_tree(self.temp_dir) # cleanup
return build_log_json

self.logger.debug('All parts finished. Merging.')

# all parts are present, merge together

master_build_log_json = self.get_build_log(s3_commit_key)
build_logs_json = []
self.job.status = 'success'
self.job.log = []
self.job.warnings = []
self.job.errors = []
for i in range(0, count):
self.logger.debug('Merging part {0}'.format(i))

# Now download the existing build_log.json file
build_log_json = self.get_build_log(s3_commit_key, str(i) + "_")

self.build_log_sanity_check(build_log_json)

build_logs_json.append(build_log_json)

if 'book' in build_log_json:
book = build_log_json['book']
else:
book = build_log_json['commit_id'] # if no book then use commit_id

# merge build_log data
self.job.log += self.prefix_list(build_log_json, 'log', book)
self.job.errors += self.prefix_list(build_log_json, 'errors', book)
self.job.warnings += self.prefix_list(build_log_json, 'warnings', book)
if ('status' in build_log_json) and (build_log_json['status'] != 'success'):
self.job.status = build_log_json['status']
if ('success' in build_log_json) and (build_log_json['success'] is not None):
self.job.success = build_log_json['success']
if ('message' in build_log_json) and (build_log_json['message'] is not None):
self.job.message = build_log_json['message']

            # Now update the master build_log.json with data from all the parts and upload it back to S3
master_build_log_json['build_logs'] = build_logs_json # add record of all the parts
build_logs_json0 = build_logs_json[0]
master_build_log_json['commit_id'] = build_logs_json0['commit_id']
master_build_log_json['created_at'] = build_logs_json0['created_at']
master_build_log_json['started_at'] = build_logs_json0['started_at']
master_build_log_json['repo_owner'] = build_logs_json0['repo_owner']
master_build_log_json['repo_name'] = build_logs_json0['repo_name']
master_build_log_json['resource_type'] = build_logs_json0['resource_type']
build_log_json = self.upload_build_log(master_build_log_json, s3_commit_key)
self.logger.debug('Updated build_log.json: ' + json.dumps(build_log_json))

# Download the project.json file for this repo (create it if doesn't exist) and update it
project_json = self.update_project_file(commit_id, owner_name, repo_name)
self.logger.debug('Updated project.json: ' + json.dumps(project_json))

self.logger.debug('Multiple parts: Finished deploying to cdn_bucket. Done.')
remove_tree(self.temp_dir) # cleanup
return build_log_json

else: # single part conversion
# Download the project.json file for this repo (create it if doesn't exist) and update it
self.update_project_file(commit_id, owner_name, repo_name)

# Now download the existing build_log.json file, update it and upload it back to S3
build_log_json = self.update_build_log(s3_commit_key)

self.logger.debug('Finished deploying to cdn_bucket. Done.')
remove_tree(self.temp_dir) # cleanup
return build_log_json

def prefix_list(self, build_log_json, key, book):
if key not in build_log_json:
return []

items = build_log_json[key]
        for i, item in enumerate(items):
            items[i] = book + ': ' + item
        return items

def build_log_sanity_check(self, build_log_json):
# sanity check
if 'log' not in build_log_json:
build_log_json['log'] = []
if 'warnings' not in build_log_json:
build_log_json['warnings'] = []
if 'errors' not in build_log_json:
build_log_json['errors'] = []

def unzip_converted_files(self, converted_zip_file):
unzip_dir = tempfile.mkdtemp(prefix='unzip_', dir=self.temp_dir)
try:
self.logger.debug('Unzipping {0}...'.format(converted_zip_file))
unzip(converted_zip_file, unzip_dir)
finally:
self.logger.debug('finished.')

return unzip_dir

def upload_converted_files(self, s3_commit_key, unzip_dir):
for root, dirs, files in os.walk(unzip_dir):
for f in sorted(files):
path = os.path.join(root, f)
key = s3_commit_key + path.replace(unzip_dir, '')
self.logger.debug('Uploading {0} to {1}'.format(f, key))
self.cdn_handler.upload_file(path, key)

# Download the project.json file for this repo (create it if doesn't exist) and update it
def update_project_file(self, commit_id, owner_name, repo_name):
project_json_key = 'u/{0}/{1}/project.json'.format(owner_name, repo_name)
project_json = self.cdn_handler.get_json(project_json_key)
project_json['user'] = owner_name
project_json['repo'] = repo_name
project_json['repo_url'] = 'https://{0}/{1}/{2}'.format(self.gogs_url, owner_name, repo_name)
@@ -91,12 +233,17 @@ def process_callback(self):
commits.append(c)
commits.append(commit)
project_json['commits'] = commits
project_file = os.path.join(self.temp_dir, 'project.json')
write_file(project_file, project_json)
self.cdn_handler.upload_file(project_file, project_json_key, 0)
return project_json

def update_build_log(self, s3_base_key, part=''):
build_log_json = self.get_build_log(s3_base_key, part)
self.upload_build_log(build_log_json, s3_base_key, part)
return build_log_json

def upload_build_log(self, build_log_json, s3_base_key, part=''):
build_log_json['started_at'] = self.job.started_at
build_log_json['ended_at'] = self.job.ended_at
build_log_json['success'] = self.job.success
@@ -114,10 +261,25 @@ def process_callback(self):
build_log_json['errors'] = self.job.errors
else:
build_log_json['errors'] = []
build_log_key = self.get_build_log_key(s3_base_key, part)
self.logger.debug('Writing build log to ' + build_log_key)
# self.logger.debug('build_log contents: ' + json.dumps(build_log_json))
self.cdn_upload_contents(build_log_json, build_log_key)
return build_log_json

def cdn_upload_contents(self, contents, key):
file_name = os.path.join(self.temp_dir, 'contents.json')
write_file(file_name, contents)
self.logger.debug('Writing file to ' + key)
self.cdn_handler.upload_file(file_name, key, 0)

def get_build_log(self, s3_base_key, part=''):
build_log_key = self.get_build_log_key(s3_base_key, part)
self.logger.debug('Reading build log from ' + build_log_key)
build_log_json = self.cdn_handler.get_json(build_log_key)
# self.logger.debug('build_log contents: ' + json.dumps(build_log_json))
return build_log_json

def get_build_log_key(self, s3_base_key, part=''):
upload_key = '{0}/{1}build_log.json'.format(s3_base_key, part)
return upload_key
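
To make the multi-part flow concrete, here is a hedged sketch of driving the callback for one part of a multi-book job. All values (job id, owner, repo, bucket, hostnames) are illustrative assumptions; only the six-field identifier layout (owner/repo/commit/part_count/part_id/book) and the build-log key scheme come from the code above.

    from libraries.client.client_callback import ClientCallback

    job_data = {
        'job_id': 'abc123',                                 # hypothetical job id
        'identifier': 'tx-user/en-ulb/8323949/3/1/mat',     # part 1 of 3, converting MAT
        'output': 'https://cdn.example.com/tx/abc123.zip',  # zip produced by the converter
        'status': 'success',
        'success': True,
    }
    callback = ClientCallback(job_data=job_data,
                              cdn_bucket='cdn.example.org',  # hypothetical CDN bucket
                              gogs_url='git.example.org')    # hypothetical Gogs host
    build_log = callback.process_callback()
    # This part's log lands at u/tx-user/en-ulb/8323949/1_build_log.json plus a 1.finished
    # marker; once parts 0..2 are all finished, the logs are merged into build_log.json.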