tbana

tbana is a suite of connectors that lets several "big data" frameworks access data collected by Splunk.

With tbana, users of Hadoop, Cascading, Spark, and Storm can use data already collected in Splunk, such as system log files and other sources of "machine data," rather than setting up a completely new, parallel data ingestion system.

tbana is designed so that the non-Splunk system performs the analytics, in contrast to other connectors that enable Splunk to analyze data on non-Splunk systems. In Hadoop, for instance, this is done by providing an InputFormat that Hadoop MapReduce jobs can use. In effect, this opens up the data to any higher-layer system that can use a Hadoop InputFormat, such as Hive or Pig.

Features

Currently, tbana supports the following:

Apache Hadoop
Cascading
Apache Spark
Twitter Storm

Wish List

On the todo list are the following:

Scalding
Cascalog
Spark Streaming
Hive
Pig

Some of these "should work" but are untested, or may need a thin wrapper.

Data Retrieval

Data from Splunk is retrieved in the following ways:

Via a live search request using Splunk's REST API
Via Splunk data that Shuttl has persisted in HDFS

Search requests to Splunk can be performed in the following ways:

synchronous (via "export" mode)
asynchronous (via "job" mode)
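
The difference between the two modes can be sketched as the REST requests each one implies. This is a minimal illustration, not tbana's own code: the endpoint paths are the ones in Splunk's public REST API (`/services/search/jobs/export` and `/services/search/jobs`), while the function names, the host, and the `sid` value are hypothetical. The sketch only builds request descriptions and sends nothing over the network.

```python
from urllib.parse import urlencode


def export_request(base_url, query):
    """Synchronous "export" mode: a single POST whose response streams results."""
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    body = urlencode({"search": query, "output_mode": "json"})
    return {"method": "POST", "url": url, "body": body}


def job_requests(base_url, query, sid):
    """Asynchronous "job" mode: create a job, poll its status, then fetch results.

    `sid` stands in for the search ID that Splunk returns from the first request.
    """
    root = base_url.rstrip("/") + "/services/search/jobs"
    return [
        {"method": "POST", "url": root, "body": urlencode({"search": query})},
        {"method": "GET", "url": f"{root}/{sid}", "body": None},
        {"method": "GET", "url": f"{root}/{sid}/results?output_mode=json", "body": None},
    ]


if __name__ == "__main__":
    base = "https://splunk.example.com:8089"  # hypothetical management endpoint
    print(export_request(base, "search index=main"))
    for request in job_requests(base, "search index=main", "SID_PLACEHOLDER"):
        print(request)
```

Export mode needs one round trip and suits streaming reads; job mode costs extra requests but lets the client poll and page through results once the search has finished.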

Splits and Parallel Transfers

When pulling live data from Splunk into MapReduce jobs, the InputFormat class has an option to split the data by indexer. Normally, a Splunk client accesses the cluster through the Search Head, which distributes searches to all the Indexers and performs aggregation/reduce functions on the data before returning results to the client. With the Indexer Split scheme, the Search Head is bypassed so that it does not become a bottleneck. Keep in mind that the results of searches run directly against the Indexers may differ from those run through the Search Head, depending on how you've configured Splunk. However, if you know that searches run directly against the individual Indexers are sufficient for your needs, you can use the Indexer Split scheme to do parallel transfers.
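
The Indexer Split idea can be sketched as one split per indexer, each fetched independently. This is a rough illustration under stated assumptions, not tbana's InputFormat: `make_splits`, `parallel_fetch`, and `fake_search` are hypothetical names, the indexer hostnames are placeholders, and a thread pool stands in for the separate map tasks that Hadoop would schedule.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(indexers, query):
    """Create one split per indexer; each split carries the same search."""
    return [{"host": host, "query": query} for host in indexers]


def parallel_fetch(indexers, query, search_fn, max_workers=4):
    """Run the search against every indexer concurrently, bypassing the search head.

    `search_fn(host, query)` is supplied by the caller and returns that
    indexer's events. Nothing here merges or reduces results across
    indexers; that work is left to the downstream job.
    """
    splits = make_splits(indexers, query)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda s: search_fn(s["host"], s["query"]), splits))
    return {split["host"]: events for split, events in zip(splits, results)}


def fake_search(host, query):
    """Stand-in for a real per-indexer search request (hypothetical)."""
    return [f"{host}: event matching {query!r}"]


if __name__ == "__main__":
    per_indexer = parallel_fetch(["idx1", "idx2", "idx3"], "search index=main", fake_search)
    for host, events in per_indexer.items():
        print(host, events)
```

Because each indexer only sees its own data, any aggregation the Search Head would normally perform has to happen in the consuming job, for example in the reduce phase.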

Documentation

Visit the wiki for the build instructions and developer manual.
