Lets suppose you have a configuration file in the following form:
# System wide configuration
[SystemConstants]
MAX_MEMORY = 120
DISK_QUOTA = 50
HOME_ROOT = "/home"
# Users module
[UserConstants]
BOOST = 3.5
BOOST_ENABLED = true
Maybe because it's shared between your java app and some legacy app. Maybe because your technical support / ops like this format and can adapt it easily, before building a new release. Or maybe because you don't want to change something everyone i familiar with.
Your task is to read the config file in your application and use those constants on proper places.
So how would you handle it?
If you are lucky, there already exists a library to parse this config in java. Maybe it's ini4j, maybe standard java properties file. If not, you are on your own. So hack some regex and iterate over all lines, do some find-and-replace, prepare a singleton and utility classes to access properties easily? And then usage like:
int maxMemory = Config.getInstance().getIntValue("SystemConstants", "MAX_MEMORY");
You write tests, right? So cover it for some obscure cases and hope for the best?
But there must be a better approach. This isn't how one builds a reliable solution. We have compilers, data types checking, code generators, maven goals and plugins, tons of useful libraries and tools. Let's get the best of it.
What would I like to see? A Java class, called SystemConstants.java
with following content, automatically generated from the config above:
public final class SystemConstants {
public static final int MAX_MEMORY = 120;
public static final int DISK_QUOTA = 50;
public static final String HOME_ROOT = "/home";
}
Then in my real code, access to this config value would be only
int maxMemory = SystemConstants.MAX_MEMORY;
Clean, easy to read and actually use, compile-time verifiable, type secure, code completion friendly.
All we need to figure out is, how to transform our config file and generate some java source code of it. In the same way we do it with WSDL (wsdl2java).
No matter how simple or complicated your config is, regex is probably not the right tool. It will be either too naive and fragile or extremely complicated and unreadable. You have probably already heard of abstract syntax trees. This is how source code is translated to a tree representation before compilation.
In our case it should translate the config into something like:
Config
├── SystemConstants
│ ├── MAX_MEMORY = 120 (int)
│ ├── DISK_QUOTA = 50 (int)
│ └── HOME_ROOT = "/home" (String)
└── UserConstants
├── BOOST = 3.5 (double)
└── BOOST_ENABLED = true (boolean)
Two different config enums, each with several key-value pairs. Values can be of different type - integer
, double
, boolean
, String
. All we need to do is to parse a *.cfg
file and construct similar tree of it.
To avoid regexes and line iterating, let's check Parboiled. Parboiled is a parsing library based on parsing expression grammars. We have to describe supported elements of our config language an subsequent actions on the tree nodes. To get a better idea, what is this all about, check the Calculator example.
In our case, we have to describe several structures:
- Section (SystemConstants, UserConstants), each holding one or more key-value pairs
- Key-Value pair, compound of
- Key identificator (variable name)
- Value -
integer
,double
,boolean
,String
- Comments and whitespaces, to cover rest the of the file
Parboiled gives us the power to define each structure as a java method. Take for example the key-value pair:
Rule Assignment() {
return Sequence(
Identifier(),
EQUAL,
Value(),
Spacing()
);
}
And a Section consists of one or more Assignments:
Rule Section() {
return Sequence(
LEFT_BRACKET,
Identifier(),
RIGHT_BRACKET,
OneOrMore(Assignment())
);
}
This is the actual code, used to parse our configuration file - not a pseudo code. Pretty cool, right? Readable, easy to write and actually understand, unlike typical regex for such a complicated structure.
Parboiled provides a value stack, convenient for storing our node values. Consider following Integer
value rule:
Rule Integer() {
return Sequence(
OneOrMore(Digit()),
push(Integer.parseInt(match()))
);
}
Integer consists of one or more digits. At the end of digits matching, we push the match to the value stack. From the stack can be the value obtained later, when the whole Assignment is matched and a key-value pair build. The same methods can be applied to an Assignment and later to a Section. Assignment will pop the last and last minus one values. Last is the actual value, last minus one is the key. And so one. You get the idea.
Now we are able to build the whole configuration tree described above. Sections->Assignments->Key-value pairs. What's next?
We have a config tree. Converting to the java classes should be more or less:
- Iterate over all Section nodes
- Convert Section to a class
- Attach every Assignment to this class as a field
- Persist class to a file
Since java source codes are usual text files, we could do it on our own. But String joining of all the pieces together could quickly become same nightmare as those regexes before. Luckily we don't have to do it. Roaster is a parser / formatter of java source codes. It provides fluent API to generate java classes:
final JavaClassSource javaClass = Roaster.create(JavaClassSource.class);
javaClass.setPackage("cz.tomasdvorak").setName("SystemConstants");
javaClass.addField()
.setName("MAX_MEMORY")
.setType("int")
.setLiteralInitializer("120")
.setPublic()
.setStatic(true)
.setFinal(true);
Will produce following source code:
package cz.tomasdvorak;
public class SystemConstants {
public static final int MAX_MEMORY = 120;
}
All we need to do is to iterate over Sections, iterate over Assignments and generate bunch of the source codes.
Now we are able to parse the configuration and generate java source codes of it. Those generated-sources have to be available before the compiler starts to compile our actual application sources. How to glue everything together?
We will need following project layout:
Root project
├── Codegen
│ ├── Parser
│ ├── Generator
│ └── Main
└── MyApp
├── Generated-sources
│ ├── SystemConstants.java
│ └── UserConstants.java
└── Sources
└── *.java
With the right Maven setup, following steps take places during mvn package
of the root project:
compile
phase of Codegen source codes - prepaires Parser, Grammar, Generator and Maingenerate-sources
phase of MyApp, where Codegen.Main is executed and all configuration classes are generated from provided config file, then everything persistedcompile
phase of MyApp is executed, with already available Configuration sourcestest
phase of MyApp is executed, when compilation succeeds- ...
The root project doesn't have any sources and serves only as an aggregator project. Two subprojects are needed to separate compile phases of Parser/Generator and actual application sources. Exec Maven Plugin executes the Generator.Main during generate-sources
phase of MyApp project, triggering the whole parse-generate-persist chain.
This project structure also ensures the Separation of concerns. Your application only provides a *.cfg
file and receives ready-to-use java classes from them. All the parsing and generating is separated in its own project. You can have many apps using the similar configuration files, every app depending on the same Codegen project. So next time you don't have to repeat yourself.
We are ready, all the infrastructure is prepaired. It's time to write our app, that depends on generated SystemConstants and UserConstants. It could be something like:
package cz.tomasdvorak.users;
import cz.tomasdvorak.myapp.settings.SystemConstants;
import cz.tomasdvorak.myapp.settings.UserConstants;
public class UserUtils {
public static double getMaxMemory() {
if(UserConstants.BOOST_ENABLED) {
return SystemConstants.MAX_MEMORY * UserConstants.BOOST;
} else {
return SystemConstants.MAX_MEMORY;
}
}
}
If we remove any of BOOST_ENABLED
, BOOST
or MAX_MEMORY
assignments in the *.cfg
file, the compilation fails. If we change any of them to different type - like String, the compilation fails. If we rename them, the compilation fails. Everything must be pretty much aligned to pass the compile phase. The majority of errors can and will be detected before our tests are started. There exists no simple workaround / hack like disabling or ignoring failing test results. Even without a single test we are still able to catch most of the typical errors and mistakes. Our app code is clean, reliable, typed, available for code completion. What more to ask for?!
Don't forget to check the travis-ci build of this repository. It proves, that everything plays nicely together. Have you find a better solution or a bug in my code? Let me know on twitter or send a pull request. Thanks!