Various examples for the Tweakstreet Data Integration Tool. The samples cover different levels of difficulty, so both beginners and advanced users can benefit from them. For the Tweakstreet ETL tool itself, check out the Tweakstreet website: https://tweakstreet.io
The base components of the Tweakstreet GUI application are control flows, data flows, steps, hops and modules. Control flows process tasks serially; they typically contain - amongst other components - one or more data flows. Data flows execute processing tasks in parallel. The core components in a data flow are steps - which encapsulate specific functionality - and hops, which connect steps to each other. Modules let you define variables or additional functionality (functions) in a file, and config modules can be used to define configuration, e.g. for development and production environments. Besides handling flat records, Tweakstreet represents all data natively as nested data structures.
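As a rough illustration of the nested-data idea (not Tweakstreet syntax), a single record can carry lists and dictionaries inside its fields. The sketch below expresses such a record in plain Java collections; all field names are invented for illustration and do not reflect the actual sample data or Tweakstreet's data model:

```java
import java.util.List;
import java.util.Map;

public class NestedRecordIllustration {
    public static void main(String[] args) {
        // A made-up airport record with nested fields; field names are
        // purely illustrative.
        Map<String, Object> airport = Map.of(
                "name", "Example Field",
                "codes", Map.of("iata", "EXA", "icao", "KEXA"),
                "runways", List.of(
                        Map.of("length_m", 3200, "surface", "asphalt")));
        System.out.println(airport);
    }
}
```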
Except for the database-related examples, the samples in this repository are self-contained. You do not need anything other than the Tweakstreet application, which can be downloaded from the site above. The repository also contains a data folder; the files in it are used by the samples. For each sample there is also a screenshot available. All samples carry a description at the data-flow or control-flow level, as well as descriptions for all or most of the steps.
For the samples that use the SQLite database, you need to download the JDBC driver from https://github.com/xerial/sqlite-jdbc or another source. Put the JDBC driver jar file into the HOME/.tweakstreet/drivers folder.
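If you want to verify, independently of Tweakstreet, that the driver jar and the sample database are accessible, a minimal JDBC check like the following sketch can help. It assumes the sqlite-jdbc jar is on the classpath and that the sample database contains a table named airports (as suggested by the data files table below); column names are not listed in this README, so the query simply selects everything:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteDriverCheck {
    public static void main(String[] args) throws Exception {
        // Open the sample database shipped in data/sqlite and print the
        // first column of a few rows.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:data/sqlite/airports.db");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM airports LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```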
I also recommend checking the Tweakstreet forum at https://forum.tweakstreet.io/, which contains tutorials, data challenges and more.
To get a copy of all samples, clone this repository, install the Tweakstreet Data Integration Tool and run it. On the Tweakstreet homepage you will find a link to the documentation. There is also an introductory tutorial video on YouTube: https://www.youtube.com/watch?v=zjduMFtbmFM.
Attention: After starting Tweakstreet, select the folder of the cloned repository as your workspace folder ('File' menu). Then make sure that you use the config module 'conf-module-samples.tsm', which contains variables that the samples rely on. In the sidebar on the left, which shows the file and directory tree (Ctrl+B to activate it if you don't see it), click on the file 'conf-module-samples.tsm' to open it. Then right-click on the tab that carries the name of the file and select 'Set as config module'. Alternatively, right-click on '$none' in the blue toolbar at the bottom, select 'Choose config module', then navigate to this file and select it.
Flows: Below is the list of available samples, giving the name of the folder, the name of the flow file and a short description of what the flow does.
Folder | Filename | Description |
---|---|---|
basic-01 | read-csv-01.dfl | read a CSV file |
basic-01 | read-csv-02.dfl | read and filter a CSV file |
basic-01 | read-csv-03.dfl | read CSV file and remove some fields from the output |
basic-01 | read-csv-04.dfl | read CSV file, add a calculated list field, convert elevation to meters and write to log |
basic-01 | read-csv-05.dfl | read CSV file, group by country and sort by highest number of airports |
basic-01 | read-csv-06.dfl | read CSV file, lookup country details, group by country and sort by highest number of airports |
basic-01 | read-csv-07.dfl | read fixed-length ASCII file and split each row into its individual fields |
basic-02 | modules-01.dfl | simply output variables from a config module to the log |
basic-02 | modules-02.dfl | import config variables to the data flow |
basic-02 | modules-03.dfl | call function from a global module |
basic-03 | random-data-01.dfl | generate random data using generators |
basic-03 | random-data-02.dfl | generate random data using generators, merge results |
basic-03 | random-data-03.dfl | generate random data using generators, merge results directly in random data step |
basic-03 | random-data-04.dfl | generate random data using list and dict generators |
basic-03 | random-data-05.dfl | generate random data based on distinct lists generated from a CSV file |
basic-04 | split-01.dfl | split a string of multiple key/value pairs into its components and create a dictionary |
basic-05 | diff-01.dfl | calculate the difference between a reference and a second data set |
basic-05 | diff-02.dfl | calculate the difference between a reference and a second data set. use a decision step to route output |
basic-06 | misc-01.dfl | retrieve system information and do some date calculations using the time library |
basic-06 | serialize-01.dfl | serialize data to a file |
basic-06 | deserialize-01.dfl | deserialize data from the file produced in the serialize-01 data flow |
basic-06 | distinct-01.dfl | determine the distinct values over a group of fields |
basic-06 | rest-01.dfl | get data from a REST endpoint and split into individual rows |
basic-06 | rest-02.dfl | get data from the geonames REST endpoint, split the JSON response into rows and sum up population per continent |
basic-07 | partitioning-01.dfl | partition data to process in parallel |
basic-08 | database-01.dfl | read airport data from a local SQLite database |
basic-08 | database-02.dfl | read airport data and lookup country data from a local SQLite database using a "SQL Input" step |
basic-08 | database-03.dfl | read airport data and join country data using a "Join on Condition" step |
basic-08 | database-04.dfl | read airport data and join country data using a "SQL Script" step and convert list to rows |
basic-08 | database-05.dfl | read airport and country data and output the results including JSON field |
basic-08 | database-06.dfl | read data stored in the database-05.dfl flow and restore the dictionary with the country data |
basic-08 | database-07.dfl | read data stored in the database-05.dfl flow and restore the dictionary data already in the SQL query |
basic-09 | stateful-01.dfl | Generate a row count and calculate the running average age using the "Stateful Calculator" step |
basic-10 | templates-01.dfl | Use sample data and merge it with a template using the Freemarker template engine |
basic-11 | mainflow-01.dfl, subflow-01.dfl | usage of the subflow step |
basic-11 | mainflow-02.dfl, mainflow-02_subflow-01, mainflow-02_subflow-02 | switching subflows depending on a variable's value |
medium-01 | cast-values-01.dfl | Cast selected values of a list of dictionaries from string to long or double |
medium-01 | functions-01.dfl | Show the use of functions as a formula or using the widgets |
Available modules:
Filename | Description |
---|---|
conf-module-samples.tsm | config module with variables for the data files |
global-module-samples.tsm | global module with variables for re-use across flows |
random-data-lists.tsm | various lists of random data, used with the "generate from list" generator |
Data Files:
Folder | Filename | Description |
---|---|---|
data | airports.csv | 7733 airports with name, codes, coordinates, elevation |
data | airports_fixed_length.csv | 7733 airports with name, codes, coordinates, elevation. fields have fixed start/end positions |
data | airports.json | 7733 airports with name, codes, coordinates, elevation |
data | airports_cleaned.json | 7733 airports with name, codes, coordinates, elevation. with corrected long and double data types |
data | countries.csv | 241 countries with name and code |
data | country_continent_lookup.csv | country to continent lookup data |
data | continent_names_lookup.csv | continent names lookup data |
data/sqlite | airports.db | airports and countries in a SQLite database file |
Steps: Below is the list of steps and the data flows in which each is used.
Step | Dataflow |
---|---|
CSV Input | read-csv-01.dfl, read-csv-02.dfl, read-csv-03.dfl, read-csv-04.dfl, read-csv-05.dfl, read-csv-06.dfl, read-csv-07.dfl, random-data-05.dfl |
Filter | read-csv-02.dfl, read-csv-04.dfl |
Pick Fields | read-csv-03.dfl, random-data-02.dfl, partitioning-01.dfl, database-05.dfl, random-data-05.dfl, rest-02.dfl |
Calculator | read-csv-04.dfl, read-csv-07.dfl, modules-03.dfl, random-data-02.dfl, split-01.dfl, partitioning-01.dfl, database-06.dfl, templates-01.dfl, cast-values-01.dfl, functions-01.dfl |
Logger | read-csv-04.dfl, modules-01.dfl, modules-02.dfl |
Group By | read-csv-05.dfl, read-csv-06.dfl, random-data-05.dfl, rest-02.dfl |
Sort | read-csv-05.dfl, read-csv-06.dfl |
Stream Lookup | read-csv-06.dfl |
Generate Rows | random-data-01.dfl, random-data-02.dfl, random-data-03.dfl, random-data-04.dfl, diff-01.dfl, diff-02.dfl, serialize-01.dfl, partitioning-01.dfl, random-data-05.dfl, mainflow-01.dfl |
Random Data | random-data-01.dfl, random-data-02.dfl, random-data-03.dfl, random-data-04.dfl, serialize-01.dfl, partitioning-01.dfl, random-data-05.dfl, mainflow-01.dfl |
Data Table | split-01.dfl, stateful-01.dfl, distinct-01.dfl, templates-01.dfl |
Diff on sorted Keys | diff-01.dfl, diff-02.dfl |
Decision | diff-02.dfl, partitioning-01.dfl |
Deserialize | deserialize-01.dfl |
Serialize | serialize-01.dfl |
System Info | misc-01.dfl |
Clock | misc-01.dfl, stateful-01.dfl |
SQL Input | database-01.dfl, database-02.dfl, database-03.dfl, database-05.dfl, database-06.dfl, database-07.dfl, mainflow-02.dfl |
SQL Lookup | database-02.dfl, database-05.dfl |
Join on Condition | database-03.dfl |
SQL Script | database-04.dfl |
List to Rows | database-04.dfl, rest-01.dfl, rest-02.dfl, cast-values-01.dfl |
SQL Insert | database-05.dfl |
Stateful Calculator | stateful-01.dfl |
Distinct | distinct-01.dfl |
Freemarker | templates-01.dfl |
HTTP Request | rest-01.dfl, rest-02.dfl |
Read File | cast-values-01.dfl, functions-01.dfl |
Sub Flow | mainflow-01.dfl, mainflow-02.dfl |
Interface Input | subflow-01.dfl, mainflow-02_subflow-01, mainflow-02_subflow-02 |
Interface Output | subflow-01.dfl, mainflow-02_subflow-01, mainflow-02_subflow-02 |
Value Mapper | mainflow-02.dfl |
last update: 2021-10-26