🐍 Easy data processing with Azure functions and Python
This repository contains the tutorial for the Microsoft Azure sponsored workshop, as well as the solutions to each of its sections.
Serverless computing (also known as function as a service, FaaS) is a design pattern where applications are hosted by a third-party service (e.g. Azure), eliminating the need for the developer to manage server software and hardware.
Serverless can be an excellent alternative for Pythonistas interested in data processing, as it allows them to focus on their code rather than the cloud infrastructure. In this workshop we introduce attendees to Azure Functions for data processing scenarios, including data acquisition, cleaning, transformation, and storage for subsequent use.
After this tutorial, attendees will have practical experience with Azure Functions for data processing scenarios. They will also leave the workshop with a basic data processing function that they can modify or extend to suit their own needs.
- Introduction to serverless and Azure Functions
- Creating your first Azure function:
  - Create a simple scheduled function using the VS Code extension
  - Familiarise yourself with Functions projects and their structure
  - Running and debugging locally
- Functions deployment:
  - Deploy your function to Azure
  - Familiarise yourself with the Azure portal
- Data processing use case:
  - Updating your function to collect data
  - Data cleaning, aggregation and storage
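
To give a feel for the data processing use case in the outline above, here is a minimal sketch of the cleaning, aggregation and storage-preparation steps, written with the standard library only so it runs anywhere. It is not the workshop's solution: the record fields (`tag`, `score`), the sample data and the helper names `clean`, `aggregate` and `to_csv` are all hypothetical, chosen purely for illustration. In a Functions project, logic like this would typically be called from the function's entry point, with the resulting text handed to a storage output.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def clean(records):
    """Normalise tags and drop records with an empty tag or unparsable score."""
    cleaned = []
    for rec in records:
        tag = (rec.get("tag") or "").strip().lower()
        try:
            score = float(rec.get("score"))
        except (TypeError, ValueError):
            continue  # skip rows whose score is missing or not numeric
        if tag:
            cleaned.append({"tag": tag, "score": score})
    return cleaned


def aggregate(records):
    """Return the count and mean score per tag, sorted by tag."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["tag"]].append(rec["score"])
    return [
        {"tag": tag, "count": len(scores), "mean_score": mean(scores)}
        for tag, scores in sorted(grouped.items())
    ]


def to_csv(rows):
    """Serialise the aggregated rows to CSV text, ready to be stored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["tag", "count", "mean_score"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for data collected from an API.
raw = [
    {"tag": "Python ", "score": "3"},
    {"tag": "python", "score": 5},
    {"tag": "azure", "score": "2"},
    {"tag": "", "score": "9"},      # dropped: empty tag
    {"tag": "azure", "score": None},  # dropped: missing score
]

summary = aggregate(clean(raw))
print(to_csv(summary))
```

Keeping the processing steps as plain functions, separate from any trigger or binding code, makes them easy to run and debug locally before deploying.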
This workshop is aimed at folks interested in data processing, data engineering or data science. The goal is to provide a practical introduction to serverless for data processing scenarios.
We assume that you:
- Have intermediate Python knowledge:
  - Have a good understanding of how to write and call functions
  - Have a good understanding of how Python modules and scripts work
- Have some experience with data wrangling and/or data processing (extensive experience is not required, but you should have, for example, used libraries like pandas and requests for data wrangling and API access)
- Are comfortable using the command line/terminal (no need to be an expert, but you should be comfortable enough to navigate file systems and perform the necessary Git tasks)
The solutions can be found in the solutions directory in this repository.
- Timer function: API data acquisition only
- Timer function: API + Blob binding
- Timer function + Data processing/email sending: full pipeline
ARM templates included:
The contents of this repo are licensed under the MIT license (https://opensource.org/licenses/MIT), an OSI-approved license.