This repository has been archived by the owner on May 22, 2024. It is now read-only.

Feature 9/add TW to md2html conversion module (#178)
* Refactoring of linter and converter

* Feature - Issue #737 - Merge request_job and start_job into client_webhook

* Fixes for test_client_webhook

* Updates test_manager

* Removed functions no longer used

* Adds back in markdown_linter

* Fixes for client_linter_callback

* Reverted callbacks and standardized the payloads

* Fix to tests

* Adds manifests_id to jobs table

* Fix to convert handler

* Moved callback urls into init

* Adds info to job object

* First pass adding TW support for preprocessor and converter

* Uses main job for first multipart

* Check if job exists in linter callback

* Checks if build log exists

* TwPreprocessor - worked on organizing into key.md and other.md files

* TwPreprocessor - worked on fixing links to TW items

* TwPreprocessor - worked on fixing links to TN items
updated with more recent en_tw.zip

* updated with more recent en_tw.zip

* TestMd2HtmlConverter - tweaking test_tw

* TestPreprocessor - tweaking test_preprocessor_for_tw

* TwPreprocessor - added index for words

* TwPreprocessor - added repo_name for help link mapping
TestTwPreprocessor - fixed test_fix_links() tests

* TwLinter - added checking of unconverted links
TestTwLinter - added tests for unconverted links warnings

* TwLinter & TestTwLinter - improving code coverage

* TestTwPreprocessor - improving code coverage

* TwPreprocessor & TestTwPreprocessor - improving code coverage

* fix timezone problem

* in job table changed owner_name to user_name so that we can use test pipeline.

* Fixes time zones

* Fixed convert handler vars

* in job table reverted back user_name to owner_name

* in job table reverted back user_name to owner_name

* ClientLinterCallback - fixes to prevent null mergers.  Fix for unit test and job db.

* TestClientLinterCallback - added tests for many edge cases.

* ClientWebhook - fix to use the right job for multi-part.
Usfm2HtmlConverter - add logging.

* ClientWebhook - fixes for job fields in multi-part.

* Converter - tweak to upload file to cdn immediately.

* Fix for travis.

* ClientConverterCallback - switch to downloading from cdn to get around permissions issue.
Fix for travis.

* ClientConverterCallback - switch to downloading from cdn to get around permissions issue.

* ClientConverterCallback - reverted downloading from cdn to get around permissions issue.

* ClientWebhook - fix identifier for multi-job.

* ClientWebhook - fix linter for single job.

* ClientLinterCallback - fix setting job end time.

* ClientLinterCallback - fix setting job end time.

* ClientLinterCallback - fix setting job end time.

* tweaked db closing.

* Fixes for deploy.

* Fixes for deploy.

* Removed upload delays.

* ProjectDeployer - changed to not move new file names.

* ProjectDeployer - changed to not move new file names.

* Fixing unit tests

* Fixing unit tests

* Fixing unit tests

* Fixing unit tests

* Fixing unit tests

* Preprocessor - add logging

* Fix registration of tw linter
ClientLinterCallback - limit warnings list

* debug logging

* debug logging

* cleanup

* ClientConverterCallback - setting final job conversion state

* ClientConverterCallback - setting final job conversion state

* tweak

* TestClientLinterCallback - added validation that build log contains final status and success

* TestClientLinterCallback - added validation that build log contains final status and success

* TestClientLinterCallback - removed code no longer used.

* TestClientLinterCallback - removed code no longer used.

* TestClientLinterCallback - revert removed code no longer used.

* Preprocessor - sorted files and tweaked formatting in index.

* TwPreprocessor - creates index.json.

* ProjectDeployerTests - add test for TW.

* TwTemplater - added templater for TW.
ProjectDeployerTests - added test for deploying TW project

* renamed TW index.html & index.md to 0toc.html & index.html

* ClientWebhook - switch to lint original files

* changed converter output to go to convert_log.json so it doesn't kick off deployer before linter is finished.

* Updated unit tests.

* TwLinter & TestTwLinter - Added checking of relative links.

* TestTwPreprocessor - fix unit test.

* ClientLinterCallback - fixed logic for multiple.

* ClientLinterCallback - fixed logic for multiple.

* tweak logging.

* working

* ClientLinterCallback - fix for linter to do deploy of single part of multipart

* ClientLinterCallback - fix for linter to do deploy of single part of multipart

* ClientLinterCallback - fix for linter to do deploy of single part of multipart

* TwLinter - getting url for file with error

* TwLinter - getting link for file containing error

* Linter - logging fix

* TestPreprocessor - fixed tw unit test

* documentation

* cleanup

* testing

* Md2HtmlConverter - switch tw to use markdown instead of markdown2 class.

* TestConversions - added test case for tw.

* TestConversions - added test case for tw.

* TwPreprocessor & TestTwPreprocessor - removed passing repo name

* TestTwPreprocessor - cleanup

* cleanup
PhotoNomad0 authored and richmahn committed Oct 17, 2017
1 parent 25caedc commit 09137df
Showing 14 changed files with 531 additions and 8 deletions.
2 changes: 1 addition & 1 deletion functions/convert_md2html/module.json
@@ -2,7 +2,7 @@
 "name": "md2html",
 "version": "2",
 "type": "converter",
-"resource_types": ["obs", "ta", "tq"],
+"resource_types": ["obs", "ta", "tq", "tw"],
 "input_format": ["md"],
 "output_format": ["html"],
 "options": [],
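The `resource_types` change above is what advertises `tw` as something the md2html converter can handle. A minimal sketch of how a registry might match a request against such a manifest follows; `find_converter` and `MD2HTML_MODULE` are hypothetical names used only for illustration, and the matching rule is an assumption rather than the project's actual lookup code.

```python
# Hypothetical in-memory copy of the updated module.json fields shown in the diff.
MD2HTML_MODULE = {
    "name": "md2html",
    "version": "2",
    "type": "converter",
    "resource_types": ["obs", "ta", "tq", "tw"],
    "input_format": ["md"],
    "output_format": ["html"],
    "options": [],
}


def find_converter(modules, resource_type, input_format, output_format):
    """Return the first converter that declares support for the request, else None."""
    for module in modules:
        if (module.get("type") == "converter"
                and resource_type in module.get("resource_types", [])
                and input_format in module.get("input_format", [])
                and output_format in module.get("output_format", [])):
            return module
    return None
```

Under this assumed rule, a `tw` repository in Markdown would only be routed to md2html once `"tw"` appears in `resource_types`.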
136 changes: 136 additions & 0 deletions libraries/client/preprocessors.py
@@ -20,6 +20,9 @@ def do_preprocess(rc, repo_dir, output_dir):
    elif rc.resource.identifier == 'ta':
        App.logger.debug("do_preprocess: using TaPreprocessor")
        preprocessor = TaPreprocessor(rc, repo_dir, output_dir)
    elif rc.resource.identifier == 'tw':
        App.logger.debug("do_preprocess: using TwPreprocessor")
        preprocessor = TwPreprocessor(rc, repo_dir, output_dir)
    elif rc.resource.identifier == 'tq':
        App.logger.debug("do_preprocess: using TqPreprocessor")
        preprocessor = TqPreprocessor(rc, repo_dir, output_dir)
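The `elif` chain in `do_preprocess` selects a preprocessor class from the resource identifier. One possible alternative, shown here only as a sketch, is a lookup table. The classes below are empty stand-ins for the real ones, and the fallback to the base `Preprocessor` is an assumption, since the diff does not show the final `else` branch.

```python
class Preprocessor(object):
    """Stand-in for the real base class in libraries/client/preprocessors.py."""

    def __init__(self, rc, repo_dir, output_dir):
        self.rc = rc
        self.source_dir = repo_dir
        self.output_dir = output_dir


class TaPreprocessor(Preprocessor):
    pass


class TwPreprocessor(Preprocessor):
    pass


class TqPreprocessor(Preprocessor):
    pass


# Maps resource identifiers to preprocessor classes (only the three seen in the hunk).
PREPROCESSORS = {
    'ta': TaPreprocessor,
    'tw': TwPreprocessor,
    'tq': TqPreprocessor,
}


def pick_preprocessor(identifier):
    """Return the class registered for identifier, or the base class (assumed default)."""
    return PREPROCESSORS.get(identifier, Preprocessor)
```

With a table like this, supporting a new resource type would be a one-line registration instead of another branch.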
@@ -658,3 +661,136 @@ def get_link_for_section(self, section):
        if len(parts) > 1:
            link = parts[1].lower()
        return link


class TwPreprocessor(Preprocessor):
    sections = [
        {'link': 'kt', 'title': 'Key Terms'},
        {'link': 'names', 'title': 'Names'},
        {'link': 'other', 'title': 'Other'}
    ]

    def __init__(self, *args, **kwargs):
        super(TwPreprocessor, self).__init__(*args, **kwargs)
        self.section_container_id = 1
        self.toc = ''
        self.index_json = None

    def get_title(self, project, alt_title=None):
        title = alt_title
        return title.title()

    def get_content(self, content_file):
        if os.path.isfile(content_file):
            return read_file(content_file)

    def compile_section(self, project, section, level):
        """
        Recursive section markdown creator
        :param project:
        :param dict section:
        :param int level:
        :return:
        """
        if 'link' in section:
            link = section['link']
        else:
            return ''
        title = self.get_title(project, section['title'])
        markdown = ''
        if 'link' in section:
            level_increase = ('#' * level)
            files = sorted(glob(os.path.join(self.source_dir, project.path, link, '*.md')))
            if files:
                markdown += '{0} <a id="{1}"/>{2}\n\n'.format('#' * level, link, title)
                self.toc += '### {0}:\n\n'.format(title)
                for file in files:
                    top_box = ""
                    if top_box:
                        markdown += '<div class="top-box box" markdown="1">\n{0}\n</div>\n\n'.format(top_box)
                    content = self.get_content(file)
                    content = content.replace('\r', '')
                    lines = content.split('\n')
                    for i in range(0, len(lines)):
                        line = lines[i]
                        if line and (line[0] == '#'):
                            line = level_increase + line.rstrip() + level_increase
                        lines[i] = line
                    content = '\n'.join(lines)
                    if content:
                        file_name = os.path.basename(file)
                        anchor = os.path.splitext(file_name)[0]
                        markdown += '<a id="{0}"/>\n\n{1}\n\n'.format(anchor, content)
                        self.toc += '* [{1}]({0}.html#{1})\n'.format(link, anchor)

                markdown += '---\n\n'  # horizontal rule

        return markdown

    def run(self):
        self.index_json = {
            'titles': {},
            'chapters': {},
            'book_codes': {}
        }

        for idx, project in enumerate(self.rc.projects):
            self.section_container_id = 1
            title = project.title
            self.toc = '# {0}\n\n'.format(title)
            self.toc += '## Table of Contents:\n\n'
            for section in TwPreprocessor.sections:
                markdown = '# {0}\n\n'.format(title)
                section_md = self.compile_section(project, section, 2)
                if not section_md:
                    continue
                markdown += section_md
                markdown = self.fix_links(markdown, section['link'])
                output_file = os.path.join(self.output_dir, '{0}.md'.format(section['link']))
                write_file(output_file, markdown)
                self.index_json['titles'][section['link'] + '.html'] = section['title']

            self.toc = self.fix_links(self.toc, '-')
            output_file = os.path.join(self.output_dir, '0toc.md')
            write_file(output_file, self.toc)
            self.index_json['titles']['0toc.html'] = 'Table of Contents'
            output_file = os.path.join(self.output_dir, 'index.json')
            write_file(output_file, self.index_json)

            # Copy the toc and config.yaml file to the output dir so they can be used to
            # generate the ToC on live.door43.org
            toc_file = os.path.join(self.source_dir, project.path, 'toc.yaml')
            if os.path.isfile(toc_file):
                copy(toc_file, os.path.join(self.output_dir, 'toc.yaml'))
            config_file = os.path.join(self.source_dir, project.path, 'config.yaml')
            if os.path.isfile(config_file):
                copy(config_file, os.path.join(self.output_dir, 'config.yaml'))
        return True

    def fix_links(self, content, section_link):
        # convert RC links, e.g. rc://en/tn/help/1sa/16/02 => https://git.door43.org/Door43/en_tn/1sa/16/02.md
        content = re.sub(r'rc://([^/]+)/([^/]+)/([^/]+)/([^\s)\]\n$]+)',
                         r'https://git.door43.org/{0}/\1_\2/src/master/\4.md'.format(self.rc.repo_name), content,
                         flags=re.IGNORECASE)
        # fix links to other sections within the same manual (only one ../ and a section name that matches section_link)
        # e.g. [covenant](../kt/covenant.md) => [covenant](#covenant)
        pattern = r'\]\(\.\.\/{0}\/([^/]+).md\)'.format(section_link)
        content = re.sub(pattern, r'](#\1)', content)
        # fix links to other sections within the same manual (only one ../ and a section name)
        # e.g. [commit](../other/commit.md) => [commit](other.html#commit)
        for section in TwPreprocessor.sections:
            link_ = section['link']
            pattern = re.compile(r'\]\(\.\./{0}/([^/]+).md\)'.format(link_))
            replace = r']({0}.html#\1)'.format(link_)
            content = re.sub(pattern, replace, content)
        # fix links to other sections that just have the section name but no 01.md page (preserve http:// links)
        # e.g. See [Verbs](figs-verb) => See [Verbs](#figs-verb)
        content = re.sub(r'\]\(([^# :/)]+)\)', r'](#\1)', content)
        # convert URLs to links if not already
        content = re.sub(r'([^"(])((http|https|ftp)://[A-Z0-9/?&_.:=#-]+[A-Z0-9/?&_:=#-])', r'\1[\2](\2)',
                         content, flags=re.IGNORECASE)
        # URLs with just www at the start, no http
        content = re.sub(r'([^A-Z0-9"(/])(www\.[A-Z0-9/?&_.:=#-]+[A-Z0-9/?&_:=#-])', r'\1[\2](http://\2)',
                         content, flags=re.IGNORECASE)
        return content
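Two of the text transformations in `TwPreprocessor` are easier to follow in isolation: the heading bump in `compile_section` and the section-link rewriting in `fix_links`. The sketch below re-implements just those two steps as free functions. `bump_headings`, `fix_section_links` and `SECTIONS` are illustrative names that do not exist in the repository, and the regex is a slightly tightened variant of the one in the diff (the `.` before `md` is escaped and `)` is excluded from the captured name).

```python
import re

SECTIONS = ('kt', 'names', 'other')  # section folder names taken from the diff


def bump_headings(content, level):
    """Prefix and suffix every Markdown heading line with `level` extra '#' characters."""
    prefix = '#' * level
    lines = []
    for line in content.replace('\r', '').split('\n'):
        if line.startswith('#'):
            # Mirrors the diff: the same run of hashes is added on both sides.
            line = prefix + line.rstrip() + prefix
        lines.append(line)
    return '\n'.join(lines)


def fix_section_links(content, current_section):
    """Rewrite ../<section>/<word>.md links into in-page or cross-page anchors."""
    link_re = r'\]\(\.\./{0}/([^/)]+)\.md\)'
    # Links into the section being compiled become anchors on the same page.
    same = re.compile(link_re.format(re.escape(current_section)))
    content = same.sub(r'](#\1)', content)
    # Remaining links into any known section point at that section's HTML page.
    for name in SECTIONS:
        other = re.compile(link_re.format(re.escape(name)))
        content = other.sub(r']({0}.html#\1)'.format(name), content)
    return content
```

For example, while compiling the `kt` section, `[covenant](../kt/covenant.md)` becomes `[covenant](#covenant)` and `[commit](../other/commit.md)` becomes `[commit](other.html#commit)`, which matches the examples in the code comments of `fix_links`. The order matters: the same-section pass has to run first, otherwise the generic pass would turn those links into `kt.html#covenant`.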
10 changes: 10 additions & 0 deletions libraries/door43_tools/templaters.py
@@ -25,6 +25,8 @@ def init_template(resource_type, source_dir, output_dir, template_file):
        templater = TaTemplater(resource_type, source_dir, output_dir, template_file)
    elif resource_type == 'tq':
        templater = TqTemplater(resource_type, source_dir, output_dir, template_file)
    elif resource_type == 'tw':
        templater = TwTemplater(resource_type, source_dir, output_dir, template_file)
    else:
        templater = Templater(resource_type, source_dir, output_dir, template_file)
    return templater
@@ -247,6 +249,14 @@ def __init__(self, *args, **kwargs):
            self.titles = index['titles']


class TwTemplater(Templater):
    def __init__(self, *args, **kwargs):
        super(TwTemplater, self).__init__(*args, **kwargs)
        index = file_utils.load_json_object(os.path.join(self.source_dir, 'index.json'))
        if index:
            self.titles = index['titles']


class BibleTemplater(Templater):
    def __init__(self, *args, **kwargs):
        super(BibleTemplater, self).__init__(*args, **kwargs)
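`TwTemplater` reads the `titles` map that `TwPreprocessor.run()` writes to `index.json`, so the two classes are coupled through that file. A rough, self-contained sketch of the round trip is shown below. It uses the standard `json` module instead of the project's `file_utils` helpers, and `write_index` and `load_titles` are hypothetical names chosen for this example.

```python
import json
import os


def write_index(output_dir, titles):
    """Write an index.json with the same top-level keys the preprocessor uses."""
    index = {'titles': titles, 'chapters': {}, 'book_codes': {}}
    with open(os.path.join(output_dir, 'index.json'), 'w') as f:
        json.dump(index, f)


def load_titles(source_dir):
    """Return the page-title map from index.json, or an empty dict if it is missing."""
    path = os.path.join(source_dir, 'index.json')
    if not os.path.isfile(path):
        return {}
    with open(path) as f:
        index = json.load(f)
    return index.get('titles', {})
```

The empty-dict fallback is a choice made for the sketch. The class in the diff simply leaves `self.titles` untouched when no index is found.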
44 changes: 44 additions & 0 deletions libraries/linters/tw_linter.py
@@ -1,9 +1,16 @@
from __future__ import print_function, unicode_literals
import os
import re
from libraries.app.app import App
from libraries.general_tools import file_utils
from libraries.linters.markdown_linter import MarkdownLinter


class TwLinter(MarkdownLinter):

    # match links of form '](link)'
    link_marker_re = re.compile(r'\]\(([^\n()]+)\)', re.UNICODE)

    def lint(self):
        """
        Checks for issues with translationWords
@@ -12,4 +19,41 @@ def lint(self):
        self.source_dir is the directory of source files (.md)
        :return bool:
        """
        self.source_dir = os.path.abspath(self.source_dir)
        for root, dirs, files in os.walk(self.source_dir):
            for f in files:
                file_path = os.path.join(root, f)
                parts = os.path.splitext(f)
                if parts[1] == '.md':
                    contents = file_utils.read_file(file_path)
                    self.find_invalid_links(root, f, contents)

        return super(TwLinter, self).lint()  # Runs checks on Markdown, using the markdown linter

    def find_invalid_links(self, folder, f, contents):
        for link_match in TwLinter.link_marker_re.finditer(contents):
            link = link_match.group(1)
            if link:
                if link[:4] == 'http':
                    continue
                if link.find('.md') < 0:
                    continue

                file_path = os.path.join(folder, link)
                file_path_abs = os.path.abspath(file_path)
                exists = os.path.exists(file_path_abs)
                if not exists:
                    a = self.get_file_link(f, folder)
                    msg = "{0}: contains invalid link: ({1})".format(a, link)
                    self.log.warnings.append(msg)
                    App.logger.debug(msg)

    def get_file_link(self, f, folder):
        parts = folder.split(self.source_dir)
        sub_path = self.source_dir  # default
        if len(parts) == 2:
            sub_path = parts[1][1:]
        url = "https://git.door43.org/{0}/{1}/src/master/{2}/{3}".format(self.repo_owner, self.repo_name,
                                                                         sub_path, f)
        a = '<a href="{0}">{1}/{2}</a>'.format(url, sub_path, f)
        return a
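The core of `find_invalid_links` is a relative-path existence check: every `](...)` target that is not an `http` URL and contains `.md` is resolved against the folder of the file being linted. The sketch below isolates that check as a plain function that returns the broken targets instead of logging warnings. `find_missing_md_links` is a hypothetical name, and the regex is copied from `link_marker_re` in the diff without the `re.UNICODE` flag.

```python
import os
import re

# Same pattern as TwLinter.link_marker_re: captures the target of '](target)'.
LINK_RE = re.compile(r'\]\(([^\n()]+)\)')


def find_missing_md_links(folder, contents):
    """Return relative .md link targets in contents that do not exist on disk."""
    missing = []
    for match in LINK_RE.finditer(contents):
        link = match.group(1)
        # Skip external URLs and anything that is not a Markdown file reference.
        if link.startswith('http') or '.md' not in link:
            continue
        target = os.path.abspath(os.path.join(folder, link))
        if not os.path.exists(target):
            missing.append(link)
    return missing
```

Returning a list keeps the check easy to unit test. A linter could then format each entry into a warning, as the diff does with `get_file_link`.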
Binary file modified tests/client_tests/resources/raw_sources/en_tw.zip
100644 → 100755
Binary file not shown.
3 changes: 2 additions & 1 deletion tests/client_tests/test_preprocessor.py
@@ -31,7 +31,8 @@ def test_preprocessor_for_tw(self):
         rc, repo_dir, self.temp_dir = self.extractFiles(file_name, repo_name)
         self.out_dir = tempfile.mkdtemp(prefix='output_')
         do_preprocess(rc, repo_dir, self.out_dir)
-        self.assertTrue(os.path.isfile(os.path.join(self.out_dir, '01-bible.md')))
+        self.assertTrue(os.path.isfile(os.path.join(self.out_dir, '0toc.md')))
+        self.assertTrue(os.path.isfile(os.path.join(self.out_dir, 'kt.md')))

     def test_preprocessor_for_tq_two_books(self):
         file_name = os.path.join('raw_sources', 'en_tq_two_books.zip')