Makefiles, scenarios and support scripts for the development of HamleDT within the Treex infrastructure
doc
estimate_morpho_richness
licenses
nonprojectivity
normalize
parsing
pmltq
statistics
tests
transformations
tred-configuration
user-zeman-treebanks_files
web
.gitignore
LICENSE.txt
README.txt
apply_conll_patch.pl
beta2unicode.pl
clean_shared_data.pl
conll_wx2utf.pl
conllu_slp2utf.pl
coordination_samples.pl
create_conll_patch.pl
find_duplicate_sentences.pl
hamledtdiff.pl
populate_ud.pl
populate_ud12.pl
populate_ud13.pl
populate_ud20.pl
populate_ud21.pl
populate_ud22.pl
publish_treebanks.sh
publish_treebanks_20.sh
regtest.pl
regtest2.pl
release_hamledt_2.0.sh
test_hamledt_release.pl
tred-ud-treex-backup-possibly-outdated.mac
tred-ud-treex.mac – zástupce.lnk
treebanks-for-web.ods
treebanks.ods
update-web.sh
user-zeman-treebanks.dokuwiki
user-zeman-treebanks.html

README.txt

HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of
existing dependency treebanks (or dependency conversions of other treebanks),
transformed so that they all conform to the same annotation style. For more
information please see the project website at

http://ufal.mff.cuni.cz/hamledt

This repository contains makefiles and support scripts needed for HamleDT
development. You also need Treex and Interset, which are in separate
repositories. In particular, the tree transformation and harmonization code
is part of Treex (implemented as Treex blocks); see the ufal/treex GitHub
repository.



History:

These files were originally stored in the TectoMT Subversion repository
(https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/treex/devel/hamledt).
Some important points in time:

r5974  (2011-06-27 zabokrtsky) ... created treex/devel/normalize_treebanks
r7684  (2011-12-31) .............. HamleDT 0.9 or 1.0 approximate date (not fixed and archived)
r8819  (2012-06-11 popel) ........ normalize_treebanks renamed to hamledt
r11004 (2013-08-28 rosa) ......... hamledt copied to hamledt2
r11606 (2014-02-15 zeman) ........ HamleDT release 1.5 (Prague, article in LRE)
r11870 (2014-03-14 zeman) ........ removed old hamledt (after checking all languages for HamleDT release 2.0)
r11991 (2014-03-23 zeman) ........ hamledt2 renamed to hamledt
r12700 (2014-05-24 zeman) ........ HamleDT release 2.0 (Prague + Stanford)
r14841 (2015-04-23 zeman) ........ pruned large generatable files, hamledt with history copied to GitHub ufal/hamledt
r14847 (2015-04-24 zeman) ........ hamledt removed from the Subversion repository
commit 19f47665fed00b9defe5119b557ca950384db0ba (2015-08-18 zeman) ..... HamleDT release 3.0 (UD)

See also

https://svn.ms.mff.cuni.cz/trac/tectomt_devel/ (password-protected access, only for ÚFAL members)
https://github.com/ufal/hamledt



Notes on migration to GitHub:

A users.txt file (mapping Subversion usernames to Git authors) was created following the instructions in
http://git-scm.com/book/es/v2/Git-and-Other-Systems-Migrating-to-Git
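
As a rough sketch of how such a users.txt skeleton could be produced: the
function below reads the XML output of "svn log --xml --quiet" on standard
input and prints one line per distinct Subversion user in the
"user = Name <email>" form expected by --authors-file. The function name
make_authors_file and the example.com addresses are placeholders introduced
here for illustration; they are not taken from the original migration, and the
real names and addresses have to be filled in by hand.

```shell
# Hypothetical helper: turn `svn log --xml --quiet` output (stdin) into
# authors-file lines of the form "svnuser = Name <email>".
make_authors_file() {
    # Extract the text of every <author>...</author> element, one per line.
    sed -n 's/.*<author>\(.*\)<\/author>.*/\1/p' \
        | sort -u \
        | while read -r user; do
              # Placeholder name and address; edit the result by hand.
              printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
          done
}
```

It could be used roughly like this, given read access to the Subversion
repository:

    svn log --xml --quiet https://svn.ms.mff.cuni.cz/svn/tectomt_devel | make_authors_file > users.txt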

git svn clone https://svn.ms.mff.cuni.cz/svn/tectomt_devel --authors-file=users.txt --no-metadata --trunk=trunk/treex/devel/hamledt --prefix=svn/

Tag statistics and similar files that were comparatively large and that could
be generated again if necessary were removed from the repository. The history
was then pruned using the BFG repo-cleaner (https://rtyley.github.io/bfg-repo-cleaner/),
with the blob size limit set to 400K. Subsequently, Git garbage collection
was invoked as recommended in the BFG documentation:

java -jar bfg-1.12.3.jar --private -b 400K hamledt
cd hamledt
git reflog expire --expire=now --all && git gc --prune=now --aggressive
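
One possible way to check the result of the pruning is to list any blobs that
are still above the size limit. The sketch below is not part of the original
migration; the function name list_big_blobs is hypothetical, and the default
limit assumes that 400K means 400 * 1024 bytes. It filters lines of the form
"type sha size path" read from standard input.

```shell
# Hypothetical helper: print "size path" for every blob above a byte limit.
# Input lines (stdin): "<objecttype> <objectname> <objectsize> <path>".
list_big_blobs() {
    limit="${1:-409600}"   # assumed meaning of 400K: 400 * 1024 bytes
    awk -v limit="$limit" '
        $1 == "blob" && $3 + 0 > limit + 0 {
            size = $3
            # Drop the first three fields so that paths with spaces survive.
            sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
            print size, $0
        }'
}
```

Inside the cloned repository it could be fed like this:

    git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | list_big_blobs

Files that are part of the latest commit may still show up, because the BFG
by default leaves the contents of the current HEAD commit untouched.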

git remote add origin https://github.com/ufal/hamledt.git
git push -u origin master