=============================================
| Embedded Relational Learning |
=============================================
This program produces SVG graphics describing the relationships between queries
and interpretations in a relational learning setting.
Prerequisites
---------------------------------------------
For the C++ program:
- a current C++ compiler (>=4.1)
- Boost >= 1.35
- for Boost.uBLAS to work, you need to install the LAPACK backend.
- boost-numeric-bindings
  This is a header-only library you can get from:
  http://mathema.tician.de/software/boost-bindings
  Just drop it in /usr/local/include/boost-numeric-bindings
- matio for MATLAB output
For the XML to SVG conversion script:
- Perl >= 5.1
- the Perl modules SVG, Convert::Color, File::Util
Building
---------------------------------------------
$ tar xzvf erl.tar.gz
$ cd erl
$ mkdir build
$ cd build
$ cmake ../src
$ make -j2
Installing
---------------------------------------------
There is no install step yet; just run the program from the build directory.
Running
---------------------------------------------
Try:
$ ./erl --help
$ ./erl {action} [options]
There are two main actions:
- count
  This is a Molfea implementation for SDF files.
  count currently does not compile on 64-bit systems due to known bugs in Boost.
- code
  This is a specific RPROP-based CODE implementation for visualization
  purposes.
Configuration File
---------------------------------------------
You can put all options in the configuration file as well.
The configuration file defaults to config.dat and can be changed using the -c
parameter.
Configuration files use INI syntax.
Options translate as follows.
The option:
--count.out_base="something"
becomes
[count]
out_base = something
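The translation rule above can be sketched in a few lines of Python. This is only an illustration of the mapping, not part of erl: the function name options_to_ini is made up, and because this README does not say how options without a section prefix (such as --output-dir) are written in the file, the sketch simply skips them.

```python
import configparser
import io


def options_to_ini(options):
    """Turn '--section.key=value' options into INI text (illustrative only)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key spelling exactly as given
    for opt in options:
        name, _, value = opt.lstrip("-").partition("=")
        section, _, key = name.partition(".")
        if not key:
            # No "section." prefix: placement in the file is not documented
            # here, so this sketch leaves such options out.
            continue
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value.strip('"'))
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


ini_text = options_to_ini(['--count.out_base="something"',
                           '--code.input_file=BASENAME'])
print(ini_text)
```

The printed text contains a [count] section with out_base = something and a [code] section with input_file = BASENAME.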
Examples -- count
---------------------------------------------
You need SDF files (the LONG format!).
The output of the count stage is cached, so you need to specify "-r" if
you want the program to re-count and re-read everything.
An example call could be:
$ ./erl count --count.out_base=base --SDFReader.files molecules_a.sdf:1,molecules_i.sdf:0 --output-dir=`pwd` -r -v
Explanation:
This count action generates two CSV files in output-dir, base.csv and
base-names.csv, which are needed for the code action (see below).
It uses two SDF files, molecules_a.sdf and molecules_i.sdf, with class 1 and 0,
respectively. The output files (and the cache) are written to the current
directory, and re-counting is forced. The program is verbose.
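To make the shape of the --SDFReader.files argument explicit, here is a small Python sketch that splits such a value into (file, class) pairs. It only illustrates how the argument reads; it is not erl's own parser, and the function name parse_sdf_files is hypothetical.

```python
def parse_sdf_files(spec):
    """Split 'a.sdf:1,b.sdf:0' into a list of (filename, class label) pairs."""
    pairs = []
    for item in spec.split(","):
        # Split on the last colon so the file name itself may contain colons.
        path, sep, label = item.rpartition(":")
        if not sep or not path:
            raise ValueError("expected FILE:CLASS, got %r" % item)
        pairs.append((path, int(label)))
    return pairs


pairs = parse_sdf_files("molecules_a.sdf:1,molecules_i.sdf:0")
for path, label in pairs:
    print(path, "-> class", label)
```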
Examples -- code
---------------------------------------------
You need two files: the matrix file and the names file.
Matrix file
- contains the description of one interpretation per line
- a line consists of
  - a name for the interpretation (currently ignored)
  - for every query, a number of co-occurrences (usually 1/0)
  - a class label (should be an integer)
Names file
- contains a list of names for the queries, one per line.
Use this naming scheme for the files:
Matrix file: BASENAME.csv
Names file: BASENAME-names.csv
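If you want to produce these two files yourself instead of using the count action, a Python sketch along the following lines may help. Several things here are assumptions, not statements about erl: fields are taken to be comma-separated (suggested only by the .csv extension), columns are written in the order listed above (name, counts, class label), and the function name, the query names and the basename "demo" are made up for the example.

```python
import csv
import os
import tempfile


def write_code_input(basename, query_names, interpretations, out_dir="."):
    """Write BASENAME.csv and BASENAME-names.csv (assumed comma-separated).

    interpretations is a list of (name, counts, label) tuples with one
    count per query.
    """
    matrix_path = os.path.join(out_dir, basename + ".csv")
    names_path = os.path.join(out_dir, basename + "-names.csv")
    with open(matrix_path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for name, counts, label in interpretations:
            if len(counts) != len(query_names):
                raise ValueError("%s: expected one count per query" % name)
            # Assumed column order: name, one count per query, class label.
            writer.writerow([name] + list(counts) + [int(label)])
    with open(names_path, "w") as f:
        for query in query_names:
            f.write(query + "\n")
    return matrix_path, names_path


out_dir = tempfile.mkdtemp()
matrix_path, names_path = write_code_input(
    "demo",
    ["C-C", "C=O", "C-N"],  # hypothetical query names
    [("mol1", [1, 0, 1], 1), ("mol2", [0, 1, 0], 0)],
    out_dir,
)
print(matrix_path)
print(names_path)
```

With these inputs, BASENAME would be "demo" and the files end up in a fresh temporary directory.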
You can then run erl using the following command:
$ ./erl code --code.input_file BASENAME --output-dir=`pwd`
Currently, the output-dir parameter is _mandatory_.
Good luck!