initial commit after deleting history
withanage committed Oct 19, 2017
0 parents commit b770e2f
Showing 1,825 changed files with 566,979 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
*~lock*
*~
*.pyc
.idea/*
tests/example/example_project/*
tools/fop/*
sessions/*
editors/metadata/bower_components/*
plugins/import/omp/settings.json
6 changes: 6 additions & 0 deletions .gitmodules
@@ -0,0 +1,6 @@
[submodule "tools/meTypeset"]
	path = tools/meTypeset
	url = https://github.com/UB-Heidelberg/meTypeset.git
[submodule "tools/saxon-he"]
	path = tools/saxon-he
	url = https://github.com/pressbooks/saxon-he
Empty file added .nojekyll
Empty file.
636 changes: 636 additions & 0 deletions LICENSE.md

Large diffs are not rendered by default.

83 changes: 83 additions & 0 deletions README.md
@@ -0,0 +1,83 @@
![Heidelberg Monograph Publication Tool (heiMPT)](https://raw.githubusercontent.com/withanage/heimpt/master/static/images/heiMPT.jpg)

**Heidelberg Monograph Publication Tool (heiMPT)** is a stand-alone platform, as well as a plug-in application for OMP, developed by staff of **Heidelberg University Library** in cooperation with external partners, with funding from the German Research Foundation ([DFG](http://www.dfg.de/)). It enables a high degree of automation in the digital publication process.
The platform consists of four modules: (1) typesetting (meTypeset), (2) an XML processor, (3) an output generation engine and (4) a WYSIWYG editor.


(1) To convert Microsoft Word .docx files to NLM/JATS XML for scholarly/scientific article typesetting, we use meTypeset, which we developed in collaboration with Dr. Martin Eve and the Public Knowledge Project (PKP). meTypeset is an extension/wrapper of OxGarage and uses TEI as an intermediary format to facilitate interchange. meTypeset provides intelligent size processing of input documents and section-grouping algorithms. It automatically detects figure and table lists, footnotes, heading structure, bibliographies, and metadata.

(2) A set of utilities for processing the XML files and manipulating their content, for example numbering elements, sorting references, and deleting unreferenced references.

(3) XML documents are converted to desired output formats that can then be offered to users, including HTML, PDF, and ePub.

(4) The WYSIWYG editor provides an interactive interface to confirm the information detected by meTypeset and to generate a suitable layout for the desired output format. The editor is written in HTML and JavaScript and handles data in XML format, so that each monograph is efficiently standardized and can be re-used. Its WYSIWYG (what you see is what you get) design lets users work with both text and images as they envision them.
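
To give a feel for the kind of utility described under (2), here is a minimal sketch of "deleting unreferenced references" on a JATS-like document. It is not heiMPT's implementation: the function name `drop_unreferenced_refs` and the sample XML are hypothetical, and it uses only the Python standard library. It assumes citations are `<xref rid="...">` elements pointing at `<ref id="...">` entries inside a `<ref-list>`.

```python
import xml.etree.ElementTree as ET


def drop_unreferenced_refs(root):
    """Remove <ref> entries whose id is not the target of any <xref>.

    Hypothetical helper for illustration only; not part of heiMPT.
    Returns the ids of the removed entries.
    """
    # Collect every id pointed to by an <xref>; rid may hold several ids.
    cited = set()
    for xref in root.iter("xref"):
        cited.update(xref.get("rid", "").split())

    removed = []
    for ref_list in root.iter("ref-list"):
        # Copy the child list so removal does not disturb the iteration.
        for ref in list(ref_list.findall("ref")):
            if ref.get("id") not in cited:
                ref_list.remove(ref)
                removed.append(ref.get("id"))
    return removed


# Made-up sample document: r1 is cited in the body, r2 is not.
sample = """
<article>
  <body><p>See <xref ref-type="bibr" rid="r1">Knaller 2012</xref>.</p></body>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>Knaller, Susanne. 2012.</mixed-citation></ref>
      <ref id="r2"><mixed-citation>Never cited.</mixed-citation></ref>
    </ref-list>
  </back>
</article>
"""

root = ET.fromstring(sample.strip())
print(drop_unreferenced_refs(root))
```

Collecting the cited ids first keeps the removal pass to a single set lookup per reference, which matters little for one chapter but helps on a merged monograph.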

![doc2pdf Pipeline](https://raw.githubusercontent.com/withanage/heimpt/master/images/mpt.png)

## Presentations
* PKP Conference, 2017 [Paper](https://pkp.sfu.ca/pkp2017/paper/view/565) [Slides](https://pkp.sfu.ca/pkp2017/paper/download/565/402) [:movie_camera: Video](https://www.youtube.com/watch?v=yOH1DS2EUck)


## Documentation
https://withanage.github.io/heimpt


## heiMPT Installation

Check that you have permission to install into `BUILD_DIR`.

```
BUILD_DIR=/usr/local
git clone https://github.com/withanage/heimpt.git $BUILD_DIR/heimpt
cd $BUILD_DIR/heimpt
git submodule update --init --recursive
pip install -r requirements.txt
java -version
```
Optional (installs the Bower components under `editors/metadata/`):
```
cd editors/metadata/
bower install
```

### FO Processors
Only needed if you generate PDF files.

* Apache FOP (free): Download the binary version from the [Apache FOP download page](https://xmlgraphics.apache.org/fop/download.html) into `$BUILD_DIR/heimpt/tools`.
```
cd $BUILD_DIR/heimpt/tools
tar -xvzf fop-2.2-bin.tar.gz
mv fop-2.2 fop
chmod u+x fop/fop/fop
```
If you changed the default `$BUILD_DIR` during installation, update the path in `fop.print.xml` and `fop.electronic.xml` in the `tools/configurations/fop/conf/` folder.

* Antenna House (commercial): See the [distributor's](https://www.antennahouse.com) instructions.


### Test your Installation
If your `$BUILD_DIR` differs from the default used above, change the project path in `example.json`.

```
python $BUILD_DIR//heimmpt.py $BUILD_DIR/configurations/example.json --debug
```
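
If you prefer to script the path change rather than edit the file by hand, a minimal sketch could look like the following. It assumes that `example.json` follows the same layout as the configuration files in this repository (a top-level `"projects"` list whose entries carry a `"path"` key); the helper name `set_project_path` and the paths shown are hypothetical.

```python
import json


def set_project_path(config, new_path):
    """Return a copy of the configuration with every project's "path" replaced.

    Hypothetical helper; assumes a top-level "projects" list.
    """
    updated = json.loads(json.dumps(config))  # simple deep copy for JSON data
    for project in updated.get("projects", []):
        project["path"] = new_path
    return updated


# Stand-in for the contents of example.json; the real file may differ.
config = {"projects": [{"name": "example", "path": "/usr/local/heimpt/tests/example"}]}
print(json.dumps(set_project_path(config, "/opt/heimpt/tests/example"), indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through the helper, and write it back with `json.dump`.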
## Tests
```
pip install -U pytest pytest-xdist pytest-json
```


## Credits

The lead developer is Dulip Withanage, Heidelberg University Library.

Additional contributions were made (in alphabetical order) by:

* Frank Krabbes, Heidelberg University Library
* Mayumi Ohta (Jun. 2014 - Feb. 2015), Cluster of Excellence, Heidelberg University
* Katharina Wäschle (Nov. 2015 - Oct. 2016), Heidelberg University Library
* Nils Weiher, Heidelberg University Library


Empty file added __init__.py
Empty file.
78 changes: 78 additions & 0 deletions cmos2jats_citations.py
@@ -0,0 +1,78 @@
import re
import sys
from lxml import etree


cmos1 = 'Heilman, James M., and Andrew G. West "Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language." Journal of Medical Internet Research 17 no. 3 (2015) e62 doi:10.2196/jmir.4069'
cmos2 = 'Knaller, Susanne. 2012. &#8220;The Ambiguousness of the Authentic: Authenticity Between Reference, Fictionality, and Fake in Modern and Contemporary Art.&#8221; In <italic>Authenticity: Contemporary Perspectives on a Critical Concept</italic>, edited by Julia Straub, 51&#8211;75. Bielefeld: transcript.'
cmos3 = 'Susanne Knaller studied Romance and German Philology at the University of Graz. 2002 Habilitation (Romance Philology and General and Comparative Literature) at Johann-Wolfgang-Goethe-University Frankfurt. In 1999/2000 Fellow and in 2002/2003 Visiting Professor at Columbia University. Since 2002 Associate Professor at the University of Graz. Founder and Speaker of the Research Department General and Comparative Literature in Graz. Since 2013 Director of the Center of Cultural Studies at the University of Graz. A selection of recent publications includes: <italic>Ein Wort aus der Fremde. Geschichte und Theorie des Begriffs Authentizit&#228;t</italic> (2007); <italic>Realit&#228;tsbegriffe in der Moderne. Beitr&#228;ge zu Literatur, Kunst, Philosophie und Wissenschaft</italic>. (2011), edited with Harro M&#252;ller; <italic>Literaturwissenschaft heute &#8211; Gegenstand, Positionen, Relevanz </italic>(2013), edited with Doris Pichler; <italic>Realit&#228;t und Wirklichkeit in der Moderne. Texte zu Literatur, Kunst, Film und Fotografie </italic>(2013); and <italic>Die Realit&#228;t der Kunst. Programme und Theorien zu Literatur, Kunst und Fotografie seit 1700</italic> (2015); <italic>&#196;sthetische Emotion. Formen und Figurationen zur Zeit des Umbruchs der Medien und Gattungen (1880 &#8211; 1939)</italic> (2016), edited with Rita Rieger.'


#article_title = re.findall('<italic>(.*)</italic>',cmos2)
#journal_title = re.findall('\"(.*)\"',cmos1)
#person_group = re.split('^(\D+)',cmos2)

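def clean(s):
    """Strip surrounding whitespace and flatten line breaks; pass None through."""
    return s.strip().replace('\n', ' ').replace('\r', '') if s else s


def cre(s):
    """Create a new XML element with the given tag name."""
    return etree.Element(s)


def set_element_citations(t):
    """Turn each <mixed-citation> into a skeleton <element-citation>."""
    rl = t.findall('.//mixed-citation')
    for r in rl:
        r.tag = 'element-citation'
        r.attrib['publication-type'] = "book"
        pr = clean(r.text)
        r.text = ''
        # Split off the leading non-digit run (author names); not used yet.
        athrs = re.split(r'^(\D+)', pr) if pr else []
        pg = cre('person-group')
        pg.attrib['person-group-type'] = "author"
        n = cre("name")
        g = cre('given-names')
        s = cre('surname')
        n.append(g)
        n.append(s)
        pg.append(n)
        r.append(pg)

        # Each <italic> becomes a title; the text after it becomes <source>.
        for i in r.findall('.//italic'):
            i.tag = 'article-title'
            tl = clean(i.tail)
            i.tail = ''
            s = cre('source')

            s.text = tl
            r.append(s)

        # print(etree.tostring(r))


t = etree.parse(sys.argv[1])


def convert_citation2reference(t):
    """Move all citations into a 'References' section at the end of <body>."""
    bd = t.find('.//body')
    sc = etree.Element('sec')
    ttl = etree.Element('title')
    ttl.text = 'References'
    sc.append(ttl)
    mc = t.findall('.//mixed-citation')
    # Do nothing if there is no <body> or no citation to move.
    if bd is not None and len(mc) > 0:
        for r in mc:
            r.tag = 'p'
            sc.append(r)
        bd.append(sc)
        # Replace the old <ref-list> with an empty one in <back>.
        rlst = t.find('.//ref-list')
        if rlst is not None:
            rlst.getparent().remove(rlst)
        bck = t.find('.//back')
        if bck is not None:
            bck.append(etree.Element('ref-list'))


convert_citation2reference(t)


t.write('output.xml', pretty_print=True,
        xml_declaration=True, encoding="utf-8")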
Binary file added color-profiles/AdobeRGB1998.icc
Binary file not shown.
Binary file added color-profiles/CoatedFOGRA39.icc
Binary file not shown.
108 changes: 108 additions & 0 deletions configurations/01_wintz.bits.json
@@ -0,0 +1,108 @@
{
  "projects": [
    {
      "active": true,
      "chain": true,
      "files": {
        "1": "Wintz_01_Remerciements.docx",
        "2": "Wintz_02_Introduction.docx",
        "3": "Wintz_03_Chapitre_1.docx",
        "4": "Wintz_04_Chapitre_2.docx",
        "5": "Wintz_05_Chapitre_3.docx",
        "6": "Wintz_06_Conclusion.docx",
        "7": "Wintz_07_Bibliographie.docx"
      },
      "name": "Processed_Data",
      "path": "/home/wit/Arbeit/OMP/Heiup/Wintz",
      "typesetters": {
        "1": {
          "arguments": {
            "1": "--create-dir"
          },
          "name": "metypeset",
          "out_type": "xml",
          "process": true
        },
        "2": {
          "arguments": {
            "1": "--create-dir"
          },
          "name": "xmlprocess",
          "out_type": "xml",
          "process": true
        },
        "3": {
          "arguments": {
            "1": "--create-dir",
            "2": "bits",
            "3": "--metadata book-meta.bits2",
            "4": "--set-numbering-tags=disp-quote,tr,sec,title,p"
          },
          "name": "xmlmerge",
          "out_type": "xml",
          "out_file": "fullFile.xml",
          "merge": true
        },
        "4": {
          "name": "xml2fo",
          "out_type": "fo",
          "expand": true,
          "arguments": {
            "1": "--create-dir"
          }
        },
        "5": {
          "name": "fo2pdf",
          "out_type": "pdf",
          "expand": true,
          "arguments": {
            "1": "--create-dir"
          }
        }
      }
    }
  ],
  "typesetters": {
    "metypeset": {
      "arguments": {
        "1": "docx",
        "2": "--debug",
        "3": "--nogit",
        "4": "--noimageprocessing"
      },
      "executable": "/home/wit/projects/heimpt/meTypeset/bin/meTypeset.py"
    },
    "xmlprocess": {
      "arguments": {
        "1": "--metadata book-part-meta.bits2",
        "2": "--set-uuids=fn,ref",
        "3": "--set-numbering-values=xref,ref-type,fn",
        "4": "--clean-references"
      },
      "executable": "/home/wit/projects/heimpt/prepare.py"
    },
    "xmlmerge": {
      "arguments": {
      },
      "executable": "/home/wit/projects/heimpt/merge.py"
    },
    "xml2fo": {
      "arguments": {
        "1": "--xsl=/formatter.xsl",
        "2": "--medium=electronic,print",
        "3": "--formatter=AH",
        "4": "--out-type=FO"
      },
      "executable": "/home/wit/projects/heimpt/disseminate.py"
    },
    "fo2pdf": {
      "arguments": {
        "1": "--medium=electronic,print",
        "2": "--formatter=AH",
        "3": "--out-type=PDF"
      },
      "executable": "/home/wit/projects/heimpt/disseminate.py"
    }
  }
}
48 changes: 48 additions & 0 deletions configurations/01_wintz.jats.json
@@ -0,0 +1,48 @@
{
  "projects": [
    {
      "active": true,
      "chain": true,
      "files": {
        "1": "Wintz_01_Remerciements.xml",
        "2": "Wintz_02_Introduction.xml",
        "3": "Wintz_03_Chapitre_1.xml",
        "4": "Wintz_04_Chapitre_2.xml",
        "5": "Wintz_05_Chapitre_3.xml",
        "6": "Wintz_06_Conclusion.xml",
        "7": "Wintz_07_Bibliographie.xml"
      },
      "name": "wintz_xml",
      "path": "/home/wit/Arbeit/OMP/wintz/wintz-jats/",
      "typesetters": {
        "1": {
          "arguments": {
            "1": "--create-dir",
            "2": "--stand-alone"
          },
          "name": "xmlprepare",
          "out_type": "xml",
          "process": true
        }
      }
    }
  ],
  "typesetters": {
    "xmlprepare": {
      "arguments": {
        "1": "--metadata book-part-meta.jats",
        "2": "--set-uuids=fn,ref",
        "3": "--set-numbering-values=xref,ref-type,fn",
        "4": "--citations-to-references"
      },
      "executable": "/home/wit/projects/heimpt/prepare.py"
    },
    "xmlmerge": {
      "arguments": {
      },
      "executable": "/home/wit/projects/heimpt/merge.py"
    }
  }
}