Merge branch 'master' into sandbox-oplatek
Conflicts:
	.gitignore
	INSTALL
	README.txt
	egs/babel/s5/local/generate_proxy_keywords.sh
	egs/wsj/s5/steps/train_nnet_cpu.sh
	egs/wsj/s5/utils/nnet-cpu/make_nnet_config_preconditioned.pl
	src/Makefile
	src/configure
	src/lat/Makefile
	src/makefiles/cygwin.mk
	src/makefiles/darwin_10_5.mk
	src/makefiles/darwin_10_6.mk
	src/makefiles/darwin_10_7.mk
	src/makefiles/darwin_10_8.mk
	src/makefiles/linux_atlas.mk
	src/makefiles/linux_atlas_64bit.mk
	src/makefiles/linux_clapack.mk
	src/makefiles/linux_openblas.mk
	src/nnet-cpu/mixup-nnet.cc
	src/nnet-cpu/nnet-component-test.cc
	src/nnet-cpu/nnet-component.cc
	src/nnet-cpu/nnet-component.h
	src/nnet-cpu/nnet-nnet.cc
	src/nnet-cpu/nnet-nnet.h
	src/nnet-cpu/nnet-update-parallel.cc
	src/nnet-cpu/nnet-update-parallel.h
	src/nnet-cpubin/nnet-train-parallel.cc
	src/nnet/nnet-pdf-prior.h
	src/nnetbin/nnet-forward.cc
	tools/Makefile
	tools/extras/install_portaudio.sh

git-svn-id: https://svn.code.sf.net/p/kaldi/code/sandbox/oplatek@2520 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8
oplatek committed Jun 18, 2013
1 parent f38fe56 commit 817ce77
Showing 155 changed files with 14,774 additions and 8 deletions.
2 changes: 1 addition & 1 deletion INSTALL
@@ -1,4 +1,4 @@

This is the official Kaldi INSTALL. Look also at INSTALL.md for the git mirror installation.
[for native Windows install, see windows/INSTALL]

(1)
168 changes: 168 additions & 0 deletions INSTALL.md
@@ -0,0 +1,168 @@
Installation TIPS for KALDI and installation INSTRUCTIONS for my additional repositories
=================================================================================
Intro
-----
Kaldi has very good instructions and a tutorial
for building it from source. It is easy and straightforward.
However, I also needed to build shared libraries,
and you may run into some of the same problems.
That is the reason for writing my build procedure down.

Installing external dependencies
================================
See `kaldi-trunk/tools/INSTALL` for info.
Basically, it tells you to use `kaldi-trunk/tools/Makefile`, which I also used.

How have I installed OpenBlas?
----------------------
Simple enough:
```bash
make openblas
```

How have I installed Openfst?
----------------------
To also install the shared libraries,
I changed line 37 in
`kaldi-trunk/tools/Makefile`

```sh
*** Makefile
************
*** 34,38 ****

openfst-1.3.2/Makefile: openfst-1.3.2/.patched
cd openfst-1.3.2/; \
! ./configure --prefix=`pwd` --enable-static --disable-shared --enable-far --enable-ngram-fsts

--- 34,38 ----

openfst-1.3.2/Makefile: openfst-1.3.2/.patched
cd openfst-1.3.2/; \
! ./configure --prefix=`pwd` --enable-static --enable-shared --enable-far --enable-ngram-fsts

```
Then I ran
```bash
make openfst_tgt
```
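The manual edit above can also be scripted. This is only a sketch: `enable_shared_openfst` is a hypothetical helper name, not part of Kaldi, and it assumes the Makefile still contains the `--disable-shared` flag shown in the diff.

```bash
# Hypothetical helper (not part of Kaldi): switch the OpenFst configure
# flag from --disable-shared to --enable-shared in the given Makefile.
enable_shared_openfst() {
  local makefile="$1"
  # Keep a backup next to the original, then rewrite the flag.
  cp "$makefile" "$makefile.orig"
  sed 's/--disable-shared/--enable-shared/' "$makefile.orig" > "$makefile"
}

# Usage (path taken from the text above):
#   enable_shared_openfst kaldi-trunk/tools/Makefile
```

Working on the flag text rather than on line 37 means the script keeps working if the line number shifts in a later Kaldi revision.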

How have I installed PortAudio?
--------------------------
NOTE: Necessary only for the Kaldi online decoder.

In `kaldi-trunk/tools/extras/install_portaudio.sh`
I changed the line
```
./configure --prefix=`pwd`/install
```
to
```
./configure --prefix=`pwd`/install --with-pic
```

Then I ran
```bash
extras/install_portaudio.sh
```


How have I built Kaldi?
------------------
```bash
./configure --openblas-root=`pwd`/../tools/OpenBLAS/install --fst-root=`pwd`/../tools/openfst --static-math=no
```

Edit `kaldi.mk` and add the `-fPIC` flag.
TODO: It would be nice to do something like
```bash
EXTRA_CXXFLAGS=-fPIC make
EXTRA_CXXFLAGS=-fPIC make ext
```
However, the local makefiles override `EXTRA_CXXFLAGS`.

If you updated from the svn repository, do not forget to run `make depend`,
since *by default it is turned off! I always forget about that!*
```
# DO NOT FORGET TO CHANGE kaldi.mk TODO SCRIPT IT!
# make depend and make ext_depend are necessary only if dependencies changed
make depend && make ext_depend && make && make ext
```
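The `kaldi.mk` edit marked "TODO SCRIPT IT" could be scripted roughly as follows. This is a sketch under assumptions: `add_fpic` is a hypothetical helper, and it assumes `kaldi.mk` defines the compiler flags on a line that starts with `CXXFLAGS =`. Check your generated file before relying on it.

```bash
# Hypothetical helper: add -fPIC to the CXXFLAGS definition in kaldi.mk.
# Assumption: the file has a line starting with "CXXFLAGS =".
add_fpic() {
  local mk="$1"
  # Do nothing if the flag is already present anywhere in the file.
  if grep -q -- '-fPIC' "$mk"; then
    return 0
  fi
  cp "$mk" "$mk.orig"
  # Insert -fPIC right after the "=" so that a trailing "\" line
  # continuation on the same line stays intact.
  sed 's/^CXXFLAGS[[:space:]]*=[[:space:]]*/CXXFLAGS = -fPIC /' "$mk.orig" > "$mk"
}

# Usage, from kaldi-trunk/src after running ./configure:
#   add_fpic kaldi.mk && make depend && make
```

Because `./configure` regenerates `kaldi.mk`, the helper has to be run again after every reconfigure; the early return makes repeated runs harmless.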

How have I updated Kaldi src code?
----------------------------
I check out the kaldi-trunk version.

[Kaldi install instructions](http://kaldi.sourceforge.net/install.html)

Note: If you checked out Kaldi before March 2013, you need to relocate svn. See the instructions in the link above!


What setup did I use?
--------------------
In order to use the Kaldi binaries everywhere, I add them to `PATH`.
In addition, I needed to add the `openfst` directory to `LD_LIBRARY_PATH`, because I compiled Kaldi dynamically linked against `openfst`. In short, I added the following lines to my `.bashrc`.
```bash
############# Kaldi ###########
kaldisrc=/net/work/people/oplatek/kaldi/src
export PATH="$PATH":$kaldisrc/bin:$kaldisrc/fgmmbin:$kaldisrc/gmmbin:$kaldisrc/nnetbin:$kaldisrc/sgmm2bin:$kaldisrc/tiedbin:$kaldisrc/featbin:$kaldisrc/fstbin:$kaldisrc/latbin:$kaldisrc/onlinebin:$kaldisrc/sgmmbin

### Openfst ###
openfst=/ha/home/oplatek/50GBmax/kaldi/tools/openfst
export PATH="$PATH":$openfst/bin
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH":$openfst/lib
```
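As an alternative to listing every binary directory by hand, the list can be built with a glob. This is a sketch only: `kaldi_path` is a hypothetical helper, and it picks up every directory under the source root whose name ends in `bin`, which may be more than the hand-written list above.

```bash
# Hypothetical helper: print a colon-separated list of every directory
# under the given Kaldi src root whose name ends in "bin".
kaldi_path() {
  local src="$1" dir out=""
  for dir in "$src"/*bin; do
    [ -d "$dir" ] || continue      # skip files, or an unmatched glob
    out="${out:+$out:}$dir"        # add ":" only between entries
  done
  printf '%s\n' "$out"
}

# In .bashrc, with kaldisrc defined as above:
#   export PATH="$PATH:$(kaldi_path "$kaldisrc")"
```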

Which tool for building a Language Model (LM) have I used?
---------------------------------------------------------
None. I received a prebuilt LM in ARPA format.

NOTE: Probably, I should build my own LM.


How have I installed Atlas?
--------------------
NOTE: I decided NOT to use Atlas; I USE OpenBlas INSTEAD. It is open source and it allows me to compile both shared and static libraries in one run.

Nevertheless, here is how I installed Atlas:

* I installed version atlas3.10.1.tar.bz2 (available at sourceforge)
* I unpacked it under `kaldi-trunk/tools`, which created `kaldi-trunk/tools/ATLAS`
* For me, the main problem with building ATLAS was disabling CPU throttling.
* I solved it by

```bash
# I ran the following command as root on my Ubuntu 12.10.
# It does not in fact turn off CPU throttling, but I do not need things optimized on my local machine.
# I ran it for all 4 of my cores.
# for n in 0 1 2 3 ; do echo 'performance' > /sys/devices/system/cpu/cpu${n}/cpufreq/scaling_governor ; done
```

* Then I needed to install a Fortran compiler (the error from configure was somewhat hidden by the errors that followed it):

```bash
sudo apt-get install gfortran
```

* On Ubuntu 12.04 I had an issue with

```bash
/usr/include/features.h:323:26: fatal error: bits/predefs.h
```

Which I solved by

```bash
sudo apt-get install --reinstall libc6-dev
```

* Finally, in `kaldi-trunk/tools/ATLAS` I ran:

```bash
mkdir build
mkdir ../atlas_install
cd build
../configure --shared --incdir=`pwd`/../../atlas_install
make
make install
```
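The governor command in the CPU throttling step hard-codes cores 0 to 3. A sketch that covers however many cores the machine has is shown here. `set_governor` is a hypothetical helper; its second argument defaults to the sysfs path used in that command and exists only so the loop can be tried on a scratch directory without root.

```bash
# Hypothetical helper: write the given governor name into every
# cpuN/cpufreq/scaling_governor file found under the sysfs root.
set_governor() {
  local governor="$1" root="${2:-/sys/devices/system/cpu}" f
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue        # no cpufreq entry: nothing to do
    printf '%s\n' "$governor" > "$f"
  done
}

# As root:
#   set_governor performance
```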
25 changes: 25 additions & 0 deletions README.md
@@ -0,0 +1,25 @@
ABOUT
=====
* This is a Git mirror of [Svn trunk of Kaldi project](http://sourceforge.net/projects/kaldi/)
`svn://svn.code.sf.net/p/kaldi/code/trunk`
* In the branch `master` I commit my work. In the branch `svn_mirror` I mirror `svn://svn.code.sf.net/p/kaldi/code/trunk`. In the branch `sandbox-oplatek` I am developing changes which I would like to check in back to Kaldi.
* Currently, I mirror the repository manually as often as I need.
* The main purpose for mirroring is that I want to build my own decoder and train my models for decoding based on up-to-date Kaldi version.
* The recipe for training the models can be found at `egs/kaldi-vystadial-recipe`
* The source code for the Python wrapper for the online decoder is at `src/python-kaldi-decoding`
* Remarks about the new decoder are located at `src/vystadial-decoder`
* I use the `Fake submodules` approach to merge the 3 subprojects to this repository. More about `Fake submodules` [at this blog](http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb).
* I mirror the svn via `git svn`. [Nice intro to git svn](http://viget.com/extend/effectively-using-git-with-subversion), [Walk through](http://blog.shinetech.com/2009/02/17/my-git-svn-workflow/) and [Multiple svn-remotes](http://blog.shuningbian.net/2011/05/git-with-multiple-svn-remotes.html)

OTHER INFO
----------
* Read `INSTALL.md` and `INSTALL` first!
* For training models, read `egs/kaldi-vystadial-recipe/s5/README.md`
* For building and developing the decoder callable from Python, read `src/python-kaldi-decoding/README.md`
* For information about the new decoder, read `src/vystadial-decoder/README.md`
* This work is done under [Vystadial project](https://sites.google.com/site/filipjurcicek/projects/vystadial).

LICENSE
--------
* We release all the changes in pyKaldi under the `Apache 2.0` license (Kaldi also uses the `Apache 2.0` license).
* We also want to publicly release the training data in autumn 2013.
3 changes: 2 additions & 1 deletion README.txt
@@ -1,4 +1,5 @@

This is the official Kaldi readme. You are now in the Kaldi/trunk mirror.
Read Kaldi.md and INSTALL.md first!


See http://kaldi.sourceforge.net/ for documentation
124 changes: 124 additions & 0 deletions egs/babel/s5/local/annotatedKwlist2KWs.pl
@@ -0,0 +1,124 @@
#!/usr/bin/perl

# Copyright 2012 Johns Hopkins University (Author: Guoguo Chen)
# Apache 2.0.
#

use strict;
use warnings;
use Getopt::Long;

my $Usage = <<EOU;
Usage: annotatedKwlist2KWs.pl [options] <kwlist.annot.xml|-> <keywords|-> [category]
e.g.: annotatedKwlist2KWs.pl kwlist.annot.list keywords.list "NGram Order:2,3,4"
This script reads an annotated kwlist xml file and writes a list of keywords, according
to the given categories. The "category" is a "key:value" pair in the annotated kwlist xml
file. For example
1. "NGram Order:2,3,4"
2. "NGram Order:2"
3. "NGram Order:-"
where "NGram Order" is the category name. The first line means print keywords that are
bigram, trigram and 4gram; The second line means print keywords only for bigram; The last
line means print all possible ngram keywords.
If no "category" is specified, the script will print out the possible categories.
Allowed options:
EOU

GetOptions();

@ARGV >= 2 || die $Usage;

# Work out the input/output source
my $kwlist_filename = shift @ARGV;
my $kws_filename = shift @ARGV;

my $source = "STDIN";
if ($kwlist_filename ne "-") {
  open(KWLIST, "<$kwlist_filename") || die "Fail to open kwlist file: $kwlist_filename\n";
  $source = "KWLIST";
}

# Process kwlist.annot.xml
my %attr;
my %attr_kws;
my $kwid="";
my $name="";
my $value="";
while (<$source>) {
  chomp;
  if (m/<kw kwid=/) {($kwid) = /kwid="(\S+)"/; next;}
  if (m/<name>/) {($name) = /<name>(.*)<\/name>/; next;}
  if (m/<value>/) {
    ($value) = /<value>(.*)<\/value>/;
    if (defined($attr{$name})) {
      $attr{"$name"}->{"$value"} = 1;
    } else {
      $attr{"$name"} = {"$value", 1};
    }
    if (defined($attr_kws{"${name}_$value"})) {
      $attr_kws{"${name}_$value"}->{"$kwid"} = 1;
    } else {
      $attr_kws{"${name}_$value"} = {"$kwid", 1};
    }
  }
}

my $output = "";
if (@ARGV == 0) {
  # If no category provided, print out the possible categories
  $output .= "Possible categories are:\n\n";
  foreach my $name (keys %attr) {
    $output .= "$name:";
    my $count = 0;
    foreach my $value (keys %{$attr{$name}}) {
      if ($value eq "") {$value = "\"\"";}
      if ($count == 0) {
        $output .= "$value";
        $count ++; next;
      }
      if ($count == 6) {
        $output .= ", ...";
        last;
      }
      $output .= ",$value"; $count ++;
    }
    $output .= "\n";
  }
  print STDERR $output;
  $output = "";
} else {
  my %keywords;
  while (@ARGV > 0) {
    my $category = shift @ARGV;
    my @col = split(/:/, $category);
    @col == 2 || die "Bad category \"$category\"\n";
    $name = $col[0];
    if ($col[1] eq "-") {
      foreach my $value (keys %{$attr{$name}}) {
        foreach my $kw (keys %{$attr_kws{"${name}_$value"}}) {
          $keywords{$kw} = 1;
        }
      }
    } else {
      my @col1 = split(/,/, $col[1]);
      foreach my $value (@col1) {
        foreach my $kw (keys %{$attr_kws{"${name}_$value"}}) {
          $keywords{$kw} = 1;
        }
      }
    }
  }
  foreach my $kw (keys %keywords) {
    $output .= "$kw\n";
  }
}

if ($kwlist_filename ne "-") {close(KWLIST);}
if ($kws_filename eq "-") { print $output;}
else {
  open(O, ">$kws_filename") || die "Fail to open file $kws_filename\n";
  print O $output;
  close(O);
}
