# Technologies for advanced programming (TAP) - 2025
[Salvo Nicotra](https://about.me/snicotra)

![tap logo](images/tap_logo.jpg)

# Syllabus

General knowledge of the technologies used to build end-to-end solutions that analyse, manage, store, process, visualize, and *document* data acquired in (near) real time.

Using simplified, cross-infrastructure software deployment systems (containers) and microservice orchestration tools (Compose/Kubernetes), the course presents cutting-edge technologies for data ingestion, pipelines, big data processing, visualization, and metadata description.

Using an agile and multidisciplinary approach, the topics, technologies and environments discussed will be applied to real case examples.

## Course detail

### Lessons 

- Tue 16-19  
- Fri 14-17

## Contacts

Office hours: Tue 15-16

Email: salvatore.nicotra1@unict.it

## Telegram Group
![](https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Telegram_logo.svg/240px-Telegram_logo.svg.png)
[@tap](https://t.me/joinchat/DPCishPgqXBxeWUbrUKyCg)

## GitHub Organization
![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/GitHub_Mark.png/246px-GitHub_Mark.png)
https://github.com/tapunict

# What are we going to deal with?

## Technologies for realtime data collection and analytics systems

![](images/mediaset-enabler.png)

## Running Prototypes of End to End Solutions

### Tap Projects

https://github.com/tapunict/crew/blob/main/projects.md

### Pass the baton

![](https://www.leadershipmanagementmagazine.com/wp-content/uploads/Passaggio-generazionale-960x520.jpg)

### PARL
![](https://github.com/ManciSee/PARL/raw/main/images/Parlogo.png)

## Meme

![](https://i.imgflip.com/4zth70.jpg)

# Concepts
> Areas of interest where technologies can be applied to get more value

## Big Data

Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze

[McKinsey 2011](https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation)

![](images/Big_Data.png)

### Big data is better data

[▶![](http://img.youtube.com/vi/8pHzROP1D-w/0.jpg)](https://www.youtube.com/watch?v=8pHzROP1D-w&start=113)

## Digital Marketing

Digital marketing is a form of direct marketing which links consumers with sellers electronically using interactive technologies like emails, websites, online forums and newsgroups, interactive television, mobile communications etcetera (Kotler and Armstrong, 2009)

![](images/digital-marketing.jpg)

### Top 15 Global Brands Ranking (2000-2023)

[▶![](http://img.youtube.com/vi/WHBYnHu4rv4/0.jpg)](https://www.youtube.com/watch?v=WHBYnHu4rv4)

    

## Social Impact of Big Data

Big data applications in areas such as lifestyle, disaster relief, energy and sustainability, and critical infrastructure show promise for making a societal impact through the use of analytics.
[Big Data & Analytics for Societal Impact](https://link.springer.com/article/10.1007/s10796-018-9846-7)

![](images/social-impact.jpg)

### Ethics and AI: tackling biases hidden in big data

[▶![](http://img.youtube.com/vi/a081Gpp5MeQ/0.jpg)](http://www.youtube.com/watch?v=a081Gpp5MeQ)

### The era of blind faith in big data must end

[▶![](http://img.youtube.com/vi/_2u_eHHzRto/0.jpg)](http://www.youtube.com/watch?v=_2u_eHHzRto)

## Literate Programming

Literate programming: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do. 
> Donald Knuth (1984)

![](images/knuth-vs-mcilroy.png)

Source: https://catonmat.net/knuth-vs-mcilroy

## Stream Processing

The definition of stream processing is exactly opposite of my definition of batch processing. In stream processing, you do not collect your data to reach certain quorum or timeout before you trigger your process. As soon as the data event is received, the program processes it, and creates the output. It’s event processing. So “real-time” word is somewhat redundant. Yet, a lot of systems do use “real-time” to describe them as low latency systems.

![](images/streams.png)
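
The contrast between the two models can be sketched in a few lines of plain Python. This is only an illustration, not how any specific streaming engine works: the function names `batch_process` and `stream_process`, the `batch_size` parameter, and the sample `readings` are all hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_process(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Collect events until the batch is full, then emit one result per batch."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield sum(batch)
            batch = []
    if batch:  # flush the incomplete final batch
        yield sum(batch)


def stream_process(events: Iterable[int]) -> Iterator[int]:
    """Emit an updated running total as soon as each event arrives."""
    total = 0
    for event in events:
        total += event
        yield total


# Hypothetical sensor readings, used only for this example.
readings = [3, 1, 4, 1, 5]
batch_results = list(batch_process(readings, batch_size=2))
stream_results = list(stream_process(readings))
print(batch_results)   # [4, 5, 5]
print(stream_results)  # [3, 4, 8, 9, 14]
```

The batch version produces nothing until enough events have accumulated, while the streaming version produces one output per incoming event.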

## Machine Learning

> Machine learning is the science (and art) of programming computers so they can learn from data

Aurélien Géron in Hands-on Machine Learning with Scikit-Learn and TensorFlow.

![](images/machine-learning.jpg)
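
As a toy illustration of "learning from data", the sketch below fits a straight line to a handful of points with ordinary least squares, using only the standard library. The function `fit_line` and the sample data are assumptions made for this example; they do not come from the book quoted above.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict y for an unseen x = 5
print(slope, intercept, prediction)
```

The program is never told the rule `y = 2x + 1`; it recovers the slope and intercept from the examples and then applies them to a new input.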

## Cloud Computing

> Cloud computing is a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies.

Gartner Glossary

![](images/cloudcomputing.jpg)

## Modern Data Platform

Companies have contended with a deluge of data for years. And while most have not yet found a good way of managing it all, the challenges—diverse data sources, types, and structures and new environments and platforms—have grown ever more complex. At the same time, deriving value from data has become a business imperative, making the consequences of not managing your organization’s data more severe—from lack of critical business insights to the hobbling of AI implementations.

https://www.technologyreview.com/2023/01/05/1066239/modern-data-architectures-fuel-innovation/


![](https://wp.technologyreview.com/wp-content/uploads/2023/01/MIT_Kyndryl_Cover_1200.png?w=1200)

# Technologies

## Containers

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

![](images/docker.png)

## Workload management

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

![](https://d33wubrfki0l68.cloudfront.net/69e55f968a6f44613384615c6a78b881bfe28bd6/9e66c/it/_common-resources/images/flower.svg)

## Data Ingestion

Logstash is an open source data collection engine with real-time pipelining capabilities.

Logstash can dynamically unify data from disparate sources and normalize the data into destinations of your choice. Cleanse and democratize all your data for diverse advanced downstream analytics and visualization use cases.

![](https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/blt0ee60d54428ec0b2/614b1cea69b7947c1b3ae7c5/illustration-logstash-white-bg-608x404.png)
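
The idea of unifying disparate sources into one normalized shape can be sketched in plain Python. This is not Logstash configuration or its API: the two input formats, the field names (`ts`, `msg`, `timestamp`, `source`, `message`), and the helper functions are all hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def from_json_line(line: str) -> Dict[str, str]:
    """Normalize a JSON record that carries an epoch timestamp in seconds."""
    record = json.loads(line)
    ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return {"timestamp": ts.isoformat(), "source": "json", "message": record["msg"]}


def from_csv_line(line: str) -> Dict[str, str]:
    """Normalize a 'timestamp,message' record with an ISO 8601 timestamp."""
    raw_ts, message = line.split(",", 1)
    ts = datetime.fromisoformat(raw_ts)
    return {"timestamp": ts.isoformat(), "source": "csv", "message": message}


# Two made-up inputs in different formats end up with the same schema.
events = [
    from_json_line('{"ts": 0, "msg": "service started"}'),
    from_csv_line("2025-03-04T10:00:00+00:00,disk almost full"),
]
for event in events:
    print(event)
```

Once every record shares the same fields, downstream indexing and visualization tools can treat all sources uniformly.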

## Data Streaming

> Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.

![](images/kafka.jpg)

## Data Processing

> Apache Spark™ is a unified analytics engine for large-scale data processing.

![](images/spark-logo.png)

## Data Indexing

> Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

![](images/elastic-search.jpg)

## Data Visualization

> Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. Do anything from tracking query load to understanding the way requests flow through your apps.

![](https://static-www.elastic.co/v3/assets/bltefdd0b53724fa2ce/blt0423c2ca741d3c05/5ea8c90064f47652ec7993f4/brand-kibana-220x130.svg)

## Notebooks

> Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.

![](images/jupyter.png)

## Data Catalog Tools

DataHub's extensible metadata platform enables data discovery, data observability and federated governance that helps tame the complexity of your data ecosystem.

![](https://datahubproject.io/assets/ideal-img/datahub-flow-diagram-light.5ce651b.1600.png)

# Applications

## Data Science && Business Intelligence

![](https://www.kdnuggets.com/images/data-science-vs-business-intelligence-700.jpg)

[Source](https://www.kdnuggets.com/2021/02/data-science-vs-business-intelligence-explained.html)

## Stream Mining

![](https://miro.medium.com/max/637/1*JTtkTYEiH12mJI9MjFasxg.png)
[Source](https://towardsdatascience.com/introduction-to-stream-mining-8b79dd64e460)

## Data Governance

![Data Governance and SMART cities, @NTusikov lays down the concerns #viznotes](https://live.staticflickr.com/1942/44529706365_385018c739_3k.jpg)

## New Ideas Welcome

![](images/supriseme.gif)