RPZLA translates as Response Policy Zone (RPZ) Log Analysis (LA). It's a bit of a misnomer, in that RPZ is a BIND feature and the project also relies heavily on the analysis of Apache logs.
It is a Proof of Concept that allows IT organisations to use the data gathered from an implementation of the RPZ feature included in BIND to enhance their security. This is NOT a carrier-grade solution.
The first release version of BIND to make the RPZ feature available was version 9.8.1. This feature has been continuously supported by BIND ever since.
See the LICENCE.txt file (GPLv2).
This project is evolving.
A first alpha release was achieved in late 2012. Since then, the project has looped between requirement re-analysis and alpha.
Documentation for the project is in Perl-based 'pod' files. They are located in the source code tree where they are relevant. It's great that github.com is using the multi-rendering capability that was one of the proposed uses of the POD format.
Overall documentation is placed in the https://github.com/yesxorno/rpzla/tree/master/doc directory.
Wikis may come. The current focus is that if you download the repository onto a server that may be command-line only, the documentation is still there in an easily readable form. That is the Perl thing: pod == Plain Old Documentation.
This is a project in development. In a production environment it likely involves 3+ sub-systems, all of which need to be configured. The required supporting technology may need a manual install (download, build, deploy, configure, use), as it may not yet be available as a 'standard package'.
It is of note that Red Hat have included BIND 9.8.2 in their repositories, so a manual build may not be required, depending on the Linux distribution.
Funding and Sponsorship
This project is funded exclusively by the generous donation of time by private individuals.
The product of the project is and will continue to be available for general free (as in both speech and beer) use based on the License.
Paul Vixie of the Internet Systems Consortium (ISC) (http://www.isc.org) implemented the first reputational data based method of spam filtering (summary available at https://en.wikipedia.org/wiki/Mail_Abuse_Prevention_System). This reputational method has become a standard component in spam protection.
Domain Based Reputational Data
With the release of BIND 9.8.1 a new reputational mechanism is available, this time for use by DNS resolvers. An organisation is able to receive a reputational data feed describing internet domains that have a 'poor' reputation. A poor reputation is usually based on the delivery of malware, or other forms of nefarious internet activity.
The ISC have provided an efficient standardised mechanism for the use of reputational data by recursive DNS resolvers and have left the provision of the reputational data itself to professional organisations that specialize in this type of information. Additionally, the response that shall be given to a client attempting to resolve a domain which is listed amongst those with a 'poor' reputation is left to the local organisation to decide.
The response delivered will commonly be one of two alternatives.
The first is an NXDOMAIN (no such domain) which is a simple black hole which prevents the client from reaching the domain with a poor reputation.
The second is to respond with a CNAME that redirects the client to a common, staged location designed both to collect information about who has visited and to inform visitors of the danger they have avoided by reaching the alternate site rather than the potentially dangerous domain.
This second, CNAME-based response is known in the RPZ literature as a 'Walled Garden'. The RPZLA project focusses on this approach, integrating data gathered by both the DNS resolvers and visits to the Walled Garden to distinguish between human behaviour (more precisely, web-browser-type behaviour) and malware behaviour. This distinction and its value are discussed further below.
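To make the two response styles concrete, here is a minimal Python sketch that renders one RPZ zone line per listed domain. It relies on the RPZ convention that a CNAME to the root (".") means "answer NXDOMAIN", while a CNAME to any other name redirects the client. The function name `rpz_record` and the host names are hypothetical and are not part of RPZLA.

```python
from typing import Optional


def rpz_record(domain: str, walled_garden: Optional[str] = None) -> str:
    """Return one RPZ zone line for `domain`.

    With no walled garden the target is the root ("."), which RPZ treats
    as "answer NXDOMAIN". Otherwise the client is redirected via CNAME.
    """
    # The owner name is written relative to the RPZ zone origin.
    owner = domain.rstrip(".")
    if walled_garden is None:
        target = "."
    else:
        # Fully qualify the redirect target with a trailing dot.
        target = walled_garden.rstrip(".") + "."
    return f"{owner} CNAME {target}"


# Hypothetical names, for illustration only.
print(rpz_record("bad.example.com"))
print(rpz_record("bad.example.com", "garden.example.org"))
```

The first call yields the black-hole form and the second the Walled Garden form; which one an organisation publishes is a local policy decision.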
If a recursive DNS resolver uses RPZ and logs the responses it makes based on RPZ data (i.e. something asked for the resolution of a domain with a poor reputation according to the data known to RPZ), then that data is available for later analysis.
This tells us which client systems using the resolver(s) are attempting to resolve domains with a poor reputation.
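As a rough illustration of that question, the sketch below counts RPZ-triggering lookups per client. It assumes the resolver log has already been parsed into `(client_ip, domain)` pairs; the data and the helper name `hits_per_client` are hypothetical, and no particular BIND log format is implied.

```python
from collections import Counter

# Hypothetical, already-parsed RPZ hits: (client_ip, queried_domain).
rpz_hits = [
    ("192.0.2.10", "bad.example.com"),
    ("192.0.2.10", "bad.example.com"),
    ("192.0.2.11", "worse.example.net"),
]


def hits_per_client(hits):
    """Count RPZ-triggering lookups per client address."""
    return Counter(client for client, _domain in hits)


# List clients from most to least active.
for client, count in hits_per_client(rpz_hits).most_common():
    print(client, count)
```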
Additionally, if the organisation chooses to use a constant CNAME (Walled Garden) strategy and visits to the Walled Garden are also logged, we can then correlate the DNS logs with the Walled Garden logs.
This is particularly useful, as some software will continuously re-ask the DNS for resolution, while other software will cache the answer. For software that caches, the frequency of attempts to visit the 'dangerous' domain is not recorded in the DNS logs, but it is recorded in the Walled Garden logs (assuming the software speaks HTTP).
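The correlation described above can be sketched as a join on `(client, domain)`. This is only an illustration under assumptions: both logs are taken to be already parsed into `(client_ip, domain)` pairs, and the function name `correlate` and the sample data are hypothetical.

```python
from collections import Counter


def correlate(dns_hits, web_hits):
    """Join DNS and Walled Garden counts on (client, domain).

    Both inputs are iterables of (client_ip, domain) pairs.
    Returns {(client, domain): (dns_count, web_count)}.
    """
    dns = Counter(dns_hits)
    web = Counter(web_hits)
    # A Counter returns 0 for keys it has not seen, so one-sided pairs are kept.
    return {key: (dns[key], web[key]) for key in dns.keys() | web.keys()}


# Hypothetical sample data.
dns_hits = [("192.0.2.10", "bad.example.com")] + [("192.0.2.11", "bad.example.com")] * 3
web_hits = [("192.0.2.10", "bad.example.com")] * 5

for (client, domain), (n_dns, n_web) in sorted(correlate(dns_hits, web_hits).items()):
    print(client, domain, "dns:", n_dns, "web:", n_web)
```

In this sample, 192.0.2.10 resolved the name once but visited the Walled Garden five times, which is what a caching client that speaks HTTP would look like. 192.0.2.11 resolved it three times and never visited. What such patterns mean is left to the person monitoring the data.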
The RPZLA project attempts to make all of this DNS and Walled Garden data available, and to allow persons monitoring this data to infer differences in malware from the interactions logged.
Additionally, the project wishes to ensure that data gathered is stored in a manner which is simple and allows an organisation to utilise that data according to their own wishes.
The key elements of RPZLA are a collection of resolvers which are implementing RPZ, a Walled Garden web site, an RDBMS, and a web site (known as the 'analysis site' in RPZLA parlance) which allows the organisation to view the data delivered to the RDBMS from the resolvers and the Walled Garden.
See the pictorial overview (Open Document Graphics): https://github.com/yesxorno/rpzla/raw/master/doc/Pictorial-Overview.odg
To allow analysis of the data from the DNS resolvers and the Walled Garden, their log data must be delivered in a known format and then 'watched'. The 'watchers' collect the relevant data and ship it off to a central location (the RDBMS) for storage.
Then, the analysis site can present any view of that data as wished for by the organisation.
Thus, the implementation is:
a collection of data gatherers running on the DNS resolvers and Walled Garden which ship data to the RDBMS
the analysis web site which queries that RDBMS and displays data as wished for by the organisation.
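The two halves of that implementation can be sketched in a few lines of Python. This is not the project's code: an in-memory SQLite database stands in for the RDBMS, and the table names (`dns_hit`, `web_hit`), column names, and the `ship` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the central RDBMS
conn.executescript("""
    CREATE TABLE dns_hit (seen_at TEXT, client TEXT, domain TEXT);
    CREATE TABLE web_hit (seen_at TEXT, client TEXT, domain TEXT);
""")


def ship(table, rows):
    """What a gatherer might do: insert parsed log rows into one table."""
    if table not in ("dns_hit", "web_hit"):
        raise ValueError(f"unknown table: {table}")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical rows, as a resolver watcher and a Walled Garden watcher might send them.
ship("dns_hit", [("2013-01-01T10:00:00", "192.0.2.10", "bad.example.com")])
ship("web_hit", [("2013-01-01T10:00:01", "192.0.2.10", "bad.example.com"),
                 ("2013-01-01T10:05:00", "192.0.2.10", "bad.example.com")])

# What the analysis site might run: per-client totals from both sources.
query = """
    SELECT client, source, COUNT(*) FROM (
        SELECT client, 'dns' AS source FROM dns_hit
        UNION ALL
        SELECT client, 'web' AS source FROM web_hit
    ) GROUP BY client, source ORDER BY client, source
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

Keeping the gatherers limited to plain inserts and putting all interpretation into queries is one way to keep the stored data simple, so an organisation can use it as it sees fit.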
The project community interact via the github mechanisms (issue logs and/or email). Additional mechanisms will be deployed if/when that is required.
The site below is the single point of reference for materials about RPZ: