Ethan Ruszanowski edited this page Nov 22, 2022 · 6 revisions

ETL Documentation

This ETL component extracts login attempts from SSH access logs and loads them into the database component that backs the rest of the stack.

About

We chose Go to take advantage of goroutines and channels. As the read goroutine ingests each line of the input file, it transforms that line into a struct holding the attributes needed for insertion into the logs table. The struct is then passed through a channel to an insert goroutine. Loading is a batch process that runs once per night as a scheduled job.
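The read-then-insert flow described above can be sketched as a two-goroutine pipeline. This is a minimal illustration, not the project's actual code: `readLogins`, `insertLogins`, and `runPipeline` are hypothetical names, the parser is a toy stand-in, and the insert goroutine only collects records where the real one would write to the logs table. The struct mirrors `LoginAttempt` from the Code Outline section.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// LoginAttempt mirrors the struct shown in the Code Outline section.
type LoginAttempt struct {
	Username  string `json:"username"`
	Timestamp string `json:"timestamp"`
	Success   string `json:"success"`
}

// readLogins (hypothetical) scans input line by line, transforms each line with
// parse, and sends accepted records on out. It closes out when input is exhausted.
func readLogins(sc *bufio.Scanner, parse func(string) (LoginAttempt, bool), out chan<- LoginAttempt) {
	defer close(out)
	for sc.Scan() {
		if a, ok := parse(sc.Text()); ok {
			out <- a
		}
	}
}

// insertLogins (hypothetical) drains the channel until it is closed.
// A real insert routine would write each record to the logs table;
// this sketch only collects them into a slice.
func insertLogins(in <-chan LoginAttempt, done chan<- []LoginAttempt) {
	var batch []LoginAttempt
	for a := range in {
		batch = append(batch, a)
	}
	done <- batch
}

// runPipeline wires the reader and inserter together with channels.
func runPipeline(input string, parse func(string) (LoginAttempt, bool)) []LoginAttempt {
	attempts := make(chan LoginAttempt, 64) // buffered so reading can run ahead of inserts
	done := make(chan []LoginAttempt)
	go readLogins(bufio.NewScanner(strings.NewReader(input)), parse, attempts)
	go insertLogins(attempts, done)
	return <-done
}

func main() {
	// Toy parser for illustration: every non-empty line becomes a record for that user.
	parse := func(line string) (LoginAttempt, bool) {
		if line == "" {
			return LoginAttempt{}, false
		}
		return LoginAttempt{Username: line, Success: "true"}, true
	}
	got := runPipeline("abc1234\n\nxyz9876\n", parse)
	fmt.Println(len(got), "records collected")
}
```

Because there is one reader and one consumer on a FIFO channel, records arrive at the insert side in file order.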

Docker

Automated builds of the image are published to Docker Hub and are pulled automatically when the stack is started with docker-compose.yml.

Setup

Input formatting

The ETL expects sshd log lines in the following formats.

Failed authentication

Oct 25 08:46:49 debian sshd[12345]: Failed password for abc1234 from 10.10.10.10 port 12345 ssh2

Successful authentication

Oct 09 19:18:10 debian sshd[12345]: pam_unix(sshd:session): session opened for user abc1234(uid=1000) by (uid=0)
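One way to turn the two line formats above into `LoginAttempt` records is a pair of regular expressions. This is a sketch under assumptions: `parseLine` is a hypothetical name, the patterns are written only against the two example lines shown here, and encoding `Success` as the strings `"true"`/`"false"` is a guess since the page does not say what values the field holds.

```go
package main

import (
	"fmt"
	"regexp"
)

type LoginAttempt struct {
	Username  string `json:"username"`
	Timestamp string `json:"timestamp"`
	Success   string `json:"success"`
}

var (
	// Failed authentication, e.g.
	// "Oct 25 08:46:49 debian sshd[12345]: Failed password for abc1234 from ..."
	failedRe = regexp.MustCompile(`^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: Failed password for (\S+) from `)
	// Successful authentication, e.g.
	// "... sshd[12345]: pam_unix(sshd:session): session opened for user abc1234(uid=1000) ..."
	openedRe = regexp.MustCompile(`^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: pam_unix\(sshd:session\): session opened for user ([^\s(]+)`)
)

// parseLine (hypothetical) returns the login attempt on a line, or false if
// the line matches neither expected format.
func parseLine(line string) (LoginAttempt, bool) {
	if m := failedRe.FindStringSubmatch(line); m != nil {
		return LoginAttempt{Username: m[2], Timestamp: m[1], Success: "false"}, true
	}
	if m := openedRe.FindStringSubmatch(line); m != nil {
		return LoginAttempt{Username: m[2], Timestamp: m[1], Success: "true"}, true
	}
	return LoginAttempt{}, false
}

func main() {
	for _, line := range []string{
		"Oct 25 08:46:49 debian sshd[12345]: Failed password for abc1234 from 10.10.10.10 port 12345 ssh2",
		"Oct 09 19:18:10 debian sshd[12345]: pam_unix(sshd:session): session opened for user abc1234(uid=1000) by (uid=0)",
		"Oct 09 19:18:11 debian CRON[999]: unrelated line",
	} {
		a, ok := parseLine(line)
		fmt.Printf("matched=%v user=%q time=%q success=%q\n", ok, a.Username, a.Timestamp, a.Success)
	}
}
```

Lines that match neither pattern are skipped rather than treated as errors, which suits a shared log file that also contains unrelated entries.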

Environment variables

Environment variables containing the database credentials and the log file location must be passed to this container at startup through the environment section of docker-compose.yml.

Example

  # Go container for ETL and nightly jobs
  etl:
    image: ethanrusz/tmc-etl:latest
    hostname: etl
    environment:
      DB_USER: user
      DB_PASS: password
      DB_HOST: db
      DB_NAME: operational_analytics
      LOG_PATH: logfile
    stdin_open: true
    depends_on:
      - db
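Inside the container, the variables from the example above could be read into a single config value at startup. This is a sketch, not the project's code: `Config` and `loadConfig` are hypothetical names, and treating an empty value as missing is an assumption. Passing the lookup function in (rather than calling `os.Getenv` directly) keeps the function easy to test.

```go
package main

import (
	"fmt"
	"os"
)

// Config (hypothetical) holds the settings defined in docker-compose.yml.
type Config struct {
	DBUser, DBPass, DBHost, DBName, LogPath string
}

// loadConfig reads every required variable via getenv and reports all
// variables that are unset or empty in a single error.
func loadConfig(getenv func(string) string) (Config, error) {
	var missing []string
	get := func(key string) string {
		v := getenv(key)
		if v == "" {
			missing = append(missing, key)
		}
		return v
	}
	cfg := Config{
		DBUser:  get("DB_USER"),
		DBPass:  get("DB_PASS"),
		DBHost:  get("DB_HOST"),
		DBName:  get("DB_NAME"),
		LogPath: get("LOG_PATH"),
	}
	if len(missing) > 0 {
		return Config{}, fmt.Errorf("missing environment variables: %v", missing)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("loading", cfg.LogPath, "into", cfg.DBName, "on", cfg.DBHost)
}
```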

Testing

Automated tests for some functions are included in main_test.go. Run them with:

go test *.go

Code Outline


Login struct example

type LoginAttempt struct { // Struct to store login attempts from log file
	Username  string `json:"username"`
	Timestamp string `json:"timestamp"`
	Success   string `json:"success"`
}