Leveraging Cloudera CDF and CDH components, this tutorial guides the user through steps to stream data from a REST API into a live dashboard using NiFi, Kafka, Schema Registry, Streams Messaging Manager, Kudu, Impala and Hue.


CDF-CDH Labs: Real-time sentiment analysis with NiFi, Kafka, Schema Registry, Streams Messaging Manager, Kudu, Impala and Hue.

--

Objective

The objective of this lab is to provide hands-on experience with Cloudera CDF and CDH (NiFi, Schema Registry, Streams Messaging Manager, Kafka, Kudu, Impala and Hue) through a single integrated workflow that brings all of these components together in one use case.

For the purpose of this lab, we will build an end-to-end use case that will:

  • Ingest data sets from Meetup.com for a specific event through NiFi
  • Parse the data set, extract key terms, and derive a sentiment rating with the Stanford CoreNLP engine
  • Configure NiFi with NiFi Registry for version control
  • Configure Schema Registry to maintain a schema version that all services will use as a reference
  • Configure Streams Messaging Manager (SMM) to set up and manage Kafka topics
  • Set up Kafka topics to ingest data from NiFi
  • Set up Kudu tables to store the social data
  • Use Impala to run queries on Kudu
  • Build a dashboard in Hue for better visualization of this dataset

Pre-requisites

You will need an environment where the following are installed and configured:

  • Cloudera CDH (Impala, Kudu, Hue)
  • Cloudera Data Flow (NiFi, NiFi Registry)

We provide instructions to deploy a single-node CDH cluster with all of the above pre-requisites configured and installed here: Github Repo

Using this repo, you can bring up a CDH cluster with all components pre-installed (CDSW deployment instructions are included as well). For specific Cloudera workshops, we may provide an AMI image that can be launched without the need to install all components from scratch.

To prepare the environment:

  • Deploy the OneNodeCluster using the Github Repo. This OneNodeCluster repo was built by my colleague Fabio Ghirardello, who put in a lot of effort to create a single CDSW+CDH+CDF instance that can be leveraged for end-to-end demos and labs. The deployment takes less than 30 minutes for a built-from-scratch environment.

--

Content

Accessing the sandbox

SSH to the sandbox

If you are using a Mac, open a terminal, navigate to the directory where you downloaded the .pem file, and execute the following:

$ chmod 400 sg-cdf-cdp-cdsw-workshop.pem

Then you can SSH in by typing (the public IP will be provided by Cloudera):

$ ssh -i sg-cdf-cdp-cdsw-workshop.pem centos@public_ip_of_instance

On Mac, use the terminal to SSH.

On Windows, use PuTTY.

Image of Putty ssh

In the event you don't have local installation rights, the easiest option is Google Secure Shell.

Important: Before you start the lab exercises, and if you are using an AWS/Azure/Google instance to run this OneNodeCluster, please ensure that you have opened the following ports (inbound) to your laptop IP:

80, 8080, 9999, 22, 7180, 7788, 9991, 10080, 18080

Some of these ports are also used by the services to interoperate. Please ensure that these ports are also open (inbound) to the instance's own public IP, in addition to your laptop's. If you are doing this in a Cloudera workshop conducted by a Solutions Architect, this will be addressed by the instructor.
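
For example, on AWS the inbound rules can be added from the command line (a sketch, assuming the AWS CLI is configured and that sg-xxxxxxxx is the security group attached to your instance; Azure and GCP have equivalent firewall commands):

$ for port in 22 80 8080 7180 7788 9991 9999 10080 18080; do
>   aws ec2 authorize-security-group-ingress \
>     --group-id sg-xxxxxxxx --protocol tcp --port $port \
>     --cidr YOUR_LAPTOP_IP/32
> done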

Back to Index

--

Accessing Cloudera Manager

The following services are installed, but initially only Cloudera Manager will be accessible, as all services are in a shutdown state by default:

  • Cloudera Manager : http://YOUR_PUBLIC_IP:7180
  • NiFi : http://YOUR_PUBLIC_IP:8080/nifi
  • NiFi Registry: http://YOUR_PUBLIC_IP:18080/nifi-registry
  • Schema Registry: http://YOUR_PUBLIC_IP:7788/
  • Streams Messaging Manager: http://YOUR_PUBLIC_IP:9991/
  • HUE: http://YOUR_PUBLIC_IP:8888

Log in to Cloudera Manager with username/password admin/admin, and familiarize yourself with all the services installed. The first startup can take up to 20 minutes, especially for CDSW.

After a successful startup, all services will show a green tick.

Cloudera Manager

Back to Index

--

Preparing your instance for labs

There are a few configuration files and scripts that need to be downloaded to your instance before we can start with the labs. A summary of these is as follows:

  • Install Unzip
  • Download the Stanford NLP package - CoreNLP - natural language software
  • Unzip the Stanford NLP package
  • Download Kudu Spark Jar file
  • Download Spark Core Jar file
  • Download all scripts needed for labs and setup execute permission

The above setup is provided in a script that you can download and run by issuing the following commands in terminal/PuTTY:

$ wget https://raw.githubusercontent.com/rajatrakesh/CDF-CDH-Workshop/master/scripts/config_lab.sh
$ chmod +x config_lab.sh
$ ./config_lab.sh

This script downloads and sets up all the required dependencies listed above, and also downloads a number of housekeeping scripts that we will use during the labs. These will be available in the home folder (/home/centos).

Back to Index

--

Stream data using NiFi

Run the sentiment analysis model

For the purpose of this exercise we are not going to train, test and implement a classification model; instead, we will re-use an existing sentiment analysis model provided by Stanford University as part of their CoreNLP toolkit.

The Stanford NLP engine is set up in the directory /home/centos/stanford-corenlp-full-2018-10-05

The NLP engine will have been started automatically. (The script start_nlp_engine.sh starts it manually; this is only required if you are troubleshooting.)
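
For reference, a CoreNLP server is typically launched as follows (a sketch of what start_nlp_engine.sh likely does, based on the standard CoreNLP documentation; the exact flags used by the script may differ):

$ cd /home/centos/stanford-corenlp-full-2018-10-05
$ nohup java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9999 -timeout 15000 &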

Details on the CoreNLP server are available here: Stanford NLP

The NLP engine will run in the background on port 9999, and you can visit http://YOUR_PUBLIC_IP:9999 to make sure it's running.

CoreNLP Engine

To test out the engine, remove all annotations and use Sentiment only.

The model will classify the given text into 5 categories:

  • very negative
  • negative
  • neutral
  • positive
  • very positive
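
You can also exercise the sentiment annotator from the terminal, using the same URL-encoded properties string that the NiFi flow will POST later in this lab (curl sends --data as application/x-www-form-urlencoded by default, which is what the server expects; replace YOUR_PUBLIC_IP accordingly):

$ curl --data 'This workshop is absolutely fantastic' \
    'http://YOUR_PUBLIC_IP:9999/?properties=%7B%22annotators%22%3A%22sentiment%22%2C%22outputFormat%22%3A%22json%22%7D'

The JSON response includes a sentences[0].sentiment field, which is exactly what the flow will extract later.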

Back to Index

--

Build NiFi flow

In order to have a streaming source available for our workshop, we are going to make use of the publicly available Meetup.com API and connect to their WebSocket.

The API documentation is available here.

In this scenario we are going to stream all comments, for all topics, into NiFi and classify each one of them into the 5 categories listed above.

To do that we need to score each comment's content against the Stanford CoreNLP's sentiment model.

In a real-world use case we would probably filter by the event of interest, but for the sake of this workshop we won't, and we will assume all comments are given for the same event: the famous CDF workshop!
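
Optionally, you can preview the comment stream from a terminal before wiring it into NiFi (a sketch; the feed has historically also been served over plain HTTP chunked encoding, assuming the public endpoint is still available):

$ curl -sN http://stream.meetup.com/2/event_comments | head -2

Each line is a single JSON comment event in the shape we will parse below.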

Let's get started... Open the NiFi UI at http://YOUR_PUBLIC_IP:8080/nifi and follow the steps below:

  • Step 1: Adding a Processor Group

    • Drag the Processor Group to the Canvas and give it a name 'CDF Workshop'.
    • Click Add.
    • Double click the Processor Group and it will show up a blank canvas (inside the group).

    Add Processor Group

  • Step 2: Enabling NiFi Registry

    • Open the NiFi Registry Portal at http://YOUR_PUBLIC_IP:18080/nifi-registry

    Nifi Registry

    • Let's add a bucket. Click the wrench icon in the right corner to open up the Bucket screen.
    • Let's create a new bucket and call it workshop.

    Nifi Registry

    • We need to connect NiFi with NiFi Registry. Click the three horizontal bars icon in the top right corner, then click 'Controller Settings'.

    Nifi Registry

    • Let's create a new registration for the client. Note: for this option to work, you need to open port 18080 not just to your public IP (computer), but also to the public IP of your instance.

    Nifi Registry

    • Add the URL of NiFi Registry and click Update.

    Nifi Registry

    • Right click anywhere on the Nifi canvas and select 'Refresh' in the menu.
    • Right click again and this time select 'Version' -> 'Start Version Control'.

    Nifi Registry

    • If everything is set up correctly, the bucket you created earlier should be auto-selected (being the only bucket).
    • Provide a name for the Flow - CDF Workshop for example.
    • You can also provide a description for your flow.
    • Click Save.

    Nifi Registry

    • A green tick will appear on your Processor Group, indicating that your flow now has version control enabled.

    Nifi Registry

  • Step 3: Add a ConnectWebSocket processor to the canvas

    • Double click on the processor
    • On the settings tab, check all relationships except text message, as we only want text messages to go forward to the subsequent flow.

    ConnectWebSocket Configuration

    • Go to the properties tab and select or create JettyWebSocketClient as the WebSocket Client Controller Service
    • Still on the properties tab, give WebSocket Client Id a value, such as demo

    ConnectWebSocket Configuration

    • Then configure the service (click on the arrow on the right)
    • Go to properties tab and add this value: ws://stream.meetup.com/2/event_comments to property WebSocket URI
    • Apply the change
    • Enable the controller service (click on the lightning bolt icon) and close the window
    • Apply changes

ConnectWebSocket Configuration

  • Step 4: Add an UpdateAttribute connector to the canvas

    • Double click to open the processor.
    • On the properties tab, add a new property mime.type by clicking the + icon and give it the value application/json. This tells the next processor that the messages sent by the Meetup WebSocket are in JSON format.
    • Add another property event and give it the value CDF workshop for the purpose of this exercise.
    • Apply changes.

    UpdateAttribute Configuration

  • Step 5: Link the ConnectWebSocket with the UpdateAttribute processor

    • Hover the mouse over the ConnectWebSocket processor and a link icon will appear

    Link Processor

    • Drag and link this to the UpdateAttribute processor (green link will appear)

    Link Processor

    • A property box will open up.
    • Select only text message

    Link Processor

    • The two processors are now linked

    Link Processor

  • Step 6: Add EvaluateJsonPath processor to the canvas

    • Double click the processor.
    • On the Settings tab, select both failure and unmatched relationships
    • On properties tab, change Destination value to flowfile-attribute.
    • Add the following properties:
      • timestamp: $.mtime
      • member: $.member.member_name
      • country: $.group.country
      • message_comment: $.comment
    • The message coming out of the processor will look like this (you can replay these field extractions with jq; see the sketch at the end of this section):
     {"visibility":"public","member":{"member_id":11643711,"photo":"https:\/\/secure.meetupstatic.com\/photos\/member\/3\/1\/6\/8\/thumb_273072648.jpeg","member_name":"Loka Murphy"},"comment":"I didn’t when I registered but now thinking I want to try and get one since it’s only taking place once.","id":-259414201,"mtime":1541557753087,"event":{"event_name":"Tunnel to Viaduct 8k Run","event_id":"256109695"},"table_name":"event_comment","group":{"join_mode":"open","country":"us","city":"Seattle","name":"Seattle Green Lake Running Group","group_lon":-122.34,"id":1608555,"state":"WA","urlname":"Seattle-Greenlake-Running-Group","category":{"name":"fitness","id":9,"shortname":"fitness"},"group_photo":{"highres_link":"https:\/\/secure.meetupstatic.com\/photos\/event\/9\/e\/f\/4\/highres_465640692.jpeg","photo_link":"https:\/\/secure.meetupstatic.com\/photos\/event\/9\/e\/f\/4\/600_465640692.jpeg","photo_id":465640692,"thumb_link":"https:\/\/secure.meetupstatic.com\/photos\/event\/9\/e\/f\/4\/thumb_465640692.jpeg"},"group_lat":47.61},"in_reply_to":496130460,"status":"active"}
    
    • Link the UpdateAttribute processor to the EvaluateJsonPath processor
  • Step 7: Add an AttributesToCSV processor to the canvas

    • Double click to open the processor.
    • On the settings tab, select the failure relationship
    • Change the Destination value to flowfile-content
    • Change the Attribute List value to write only the attributes parsed above: timestamp,event,member,country,message_comment
    • Set Include Schema to true
    • Apply changes
    • Link AttributesToCSV with EvaluateJsonPath on matched relationship
  • Step 8: Add a PutFile processor to the canvas

    • Open the processor.
    • On the settings tab, select all relationships - since this is the last processor in the flow, we need to terminate all relationships.
    • Change the Directory value to /tmp/workshop
    • Link the PutFile processor with the AttributesToCSV processor on the success relationship.
    • Apply Changes.
  • Step 9: Commit your first Flow

    • Right click anywhere on the canvas.
    • Click 'Version' -> 'Commit Local Changes'.

    Link Processor.

    • Provide commentary for your Version.

    Link Processor.

    • Now go to NiFi Registry and you will be able to see the version information show up there as well.

    Link Processor.

  • Step 10: Start your First Flow

    • Click the Play icon on the 'CDF Workshop' Processor Group to start the entire flow.
    • All the processors will now show a green play icon as they start to execute.
    • You will see flowfiles starting to move, and you can gradually see how the data is being read from meetup.com

    Link Processor.

    • Optional: To review the actual files being written, ssh to your instance and navigate to the /tmp/workshop directory.
    • You will see all the files being written here. You can select any one of them and issue a cat <filename> command to view the contents as well.

    Link Processor.

    • Once done, stop the flow and delete the files created by this flow by typing:

      sudo rm -rf /tmp/workshop/*
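
As noted in Step 6, you can replay those JsonPath extractions outside NiFi with jq (a sketch; jq is assumed to be installed on the instance, which the lab setup script does not necessarily provide). Save the sample message from Step 6 as sample.json and run:

$ jq '{timestamp: .mtime, member: .member.member_name, country: .group.country, message_comment: .comment}' sample.json

The four fields in the output correspond exactly to the attributes that EvaluateJsonPath adds to each flowfile.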

Back to Index

--

Creating and Registering Schema in Schema Registry

  • The data produced in the NiFi flow is described in the schema file sentiment.avsc. In this lab, we will register this schema in the Schema Registry so that our flows in NiFi can refer to the schema through a unified service. This will also allow us to evolve and update our schema in the future, if needed, keeping older versions under control so that existing flows and flowfiles will continue to work.

  • To configure Schema Registry, go to the following URL, which contains the schema definition that we will be using for this lab.

    https://raw.githubusercontent.com/rajatrakesh/CDF-CDH-Workshop/master/sentiment.avsc
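
    If you prefer the terminal, you can also fetch and inspect the schema on your instance first:

    $ wget https://raw.githubusercontent.com/rajatrakesh/CDF-CDH-Workshop/master/sentiment.avsc
    $ cat sentiment.avsc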

  • Access Schema Registry at: http://YOUR_PUBLIC_IP:7788

  • In the Schema Registry UI, click the + sign to register a new schema.

  • Click on Schema Text field and paste the contents that you have copied from the file above.

  • Complete the schema creation by filling in the following properties:

    Name:          SentimentAnalysis
    Description:   Schema for data generated by Meetup
    Type:          Avro Schema Reader
    Schema Group:  Kafka
    Compatibility: Backward
    Evolve:        checked
    

    Schema Registry

  • Save the schema.

  • Now let's enable the processors in our Process Group to use schemas stored in Schema Registry. Right-click on the Process Group, select Configure and navigate to the Controller Services tab. Click the + icon and add a HortonworksSchemaRegistry service.

  • After the service is added, click the service's cog icon, go to the Properties tab, configure it with the following Schema Registry URL, and click Apply.

    URL: http://YOUR_PUBLIC_IP:7788/api/v1
    

  • Click on the lightning bolt icon to enable the HortonworksSchemaRegistry Controller Service.

  • Still on the Controller Services screen, let's add two additional services to handle the reading and writing of JSON records. This will let NiFi connect to Schema Registry and access the Schema we created earlier. Click the + button and add the following two services:

  • JsonTreeReader, with the following properties:

    Schema Access Strategy: Use 'Schema Name' Property
    Schema Registry: 		HortonworksSchemaRegistry
    Schema Name: 		${schema.name} -> already set by default
    

    JSONTreeReader

  • JsonRecordSetWriter, with the following properties:

    Schema Write Strategy:  HWX Schema Reference Attributes
    Schema Access Strategy: Inherit Record Schema
    Schema Registry:     HortonworksSchemaRegistry
    

  • Enable the JsonTreeReader and the JsonRecordSetWriter Controller Services you just created, by clicking on their respective lightning bolt icons.

  • If you have set up all the services, the Controller Services screen for the Process Group will look as follows:

Back to Index

--

Using Streams Messaging Manager to create and manage Kafka Topics

  • Accessing SMM

    • SMM Web UI is accessible at: http://YOUR_PUBLIC_IP:9991

    • Familiarize yourself with the options there, especially the filters (green boxes) at the top of the screen.

    • The first thing we need to do is create a new topic, to which we will send our data. Click the third icon in the left toolbar (on hovering, it will read 'Topics')

    • This screen lists all the topics that SMM is tracking. Let's create a new topic. Click the 'Add New' button on the right top corner of the screen.

    • In the pop-up screen that opens, select the 'Low' option for Availability. Since we are running a single-node cluster, we don't have replication set up for Kafka, hence we choose this option.

    • Configure the properties on the screen as follows:

      Topic Name:   meetup_comment_ws
      Partitions:   1
      Availability: Low
      Limits:       delete (cleanup policy)
      

    • Click Save.

    • Currently we are not writing anything to our topic. Any active topics are highlighted as 'Active' on the 'Overview' page.

    • We will revisit SMM once we finish the NiFi flow.
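
    • Optional: you can also confirm the topic from the instance itself using the Kafka command-line tools (a sketch; depending on the Kafka version in your CDH parcel, you may need --zookeeper YOUR_PUBLIC_IP:2181 instead of --bootstrap-server):

      $ kafka-topics --bootstrap-server YOUR_PUBLIC_IP:9092 --describe --topic meetup_comment_ws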

Back to Index

--

Enhance NiFi Flow to identify sentiment on comments

Go back to NiFi UI and follow the steps below:

  • Step 1: Remove the AttributesToCSV and PutFile processors

    • Right click on the relationship between the EvaluateJsonPath and AttributesToCSV processors and delete it
    • Delete the AttributesToCSV processor
    • Do the same for the PutFile processor
  • Step 2: Parse content through the sentiment engine to derive sentiment

    • Add a ReplaceText processor and link it from EvaluateJsonPath on the matched relationship
    • Double click on the processor and check failure on the settings tab
    • Go to the properties tab, remove the value for Search Value, and set it to the empty string
    • Set Replacement Value to ${message_comment:replaceAll('\\.', ';')}. We want to make sure the entire comment is evaluated as one sentence instead of one evaluation per sentence within the same comment.
    • Set Replacement Strategy to Always Replace
    • Apply changes

Link Processor

  • Step 3: Invoke the Sentiment engine through InvokeHTTP processor
    • Add InvokeHTTP processor and link from ReplaceText on success relationship
    • Double click on the processor and, on the settings tab, check all relationships except Response
    • Go to the properties tab and set HTTP Method to POST
    • Set Remote URL to: http://YOUR_PUBLIC_IP:9999/?properties=%7B%22annotators%22%3A%22sentiment%22%2C%22outputFormat%22%3A%22json%22%7D
    • Make sure you use the encoded URL (it decodes to properties={"annotators":"sentiment","outputFormat":"json"}) and that you have replaced 'YOUR_PUBLIC_IP' with your actual IP
    • Set Content-Type to application/x-www-form-urlencoded
    • Apply changes

Link Processor

  • Step 4: Add an EvaluateJsonPath processor to process the sentiment
    • Add EvaluateJsonPath to the canvas and link from InvokeHTTP on Response relationship
    • Double click on the processor
    • On the settings tab, check both failure and unmatched relationships
    • On the properties tab:
    • Change the Destination value to flowfile-attribute
    • Add the property sentiment with the value $.sentences[0].sentiment
    • Apply changes

Link Processor

  • Step 5: Format time ISO format with UpdateAttribute processor (future use for time based analytics)
    • Add UpdateAttribute processor and link from EvaluateJsonPath on matched relationship
    • Using NiFi's handy expression language, add a new attribute dateandtime with the value ${timestamp:format("yyyy-MM-dd'T'HH:mm:ss'Z'", "Asia/Singapore")} on the properties tab.
    • Add another property message_id with value: ${uuid}

Link Processor

  • Step 6: Prepare attributes list that will be pushed to Kafka with AttributesToJSON processor
    • Add AttributesToJSON Processor.
    • Link with UpdateAttribute on success.
    • Double click on processor
    • On settings tab, check failure relationship
    • Go to properties tab
    • In the Attributes List value, set dateandtime, country, event, member, sentiment, message_comment, message_id. We will match this structure later in the table that we create in Kudu.
    • Change Destination to flowfile-content
    • Set Include Core Attributes to false
    • Apply changes

Link Processor

Back to Index

--

Incorporating Schema Registry in the NiFi flow

  • Defining Schema Name

    • We need to tell NiFi which schema should be used to read and write the message data. For this, we'll use an UpdateAttribute processor to add an attribute to the FlowFile indicating the schema name.

    • Add an UpdateAttribute processor by dragging the processor icon to the canvas.

    • Double-click the UpdateAttribute processor and configure it as follows:

    • In the Properties tab, click the + button and add the following properties:

      Property Name:		schema.name
      Property Value: 	SentimentAnalysis
      

    • Click Apply.

    • Connect the AttributesToJSON processor to UpdateAttribute Processor on success.

  • Push the data to Kafka

    • Add a PublishKafkaRecord_2_0 processor to the canvas and link it from the previous UpdateAttribute on the success relationship. Double click on the processor.

    • On the settings tab, select both failure and success for Automatically Terminate Relationships.

    • On the properties tab, we need to configure the following:

      Kafka Brokers:                         kafka_broker_ip:9092
      Topic Name:                            meetup_comment_ws
      Record Reader:                         JsonTreeReader
      Record Writer:                         JsonRecordSetWriter
      Use Transactions:                      false
      Attributes to Send as Headers (Regex): schema.*
      
      
    • Make sure you use the PublishKafkaRecord_2_0 processor and NOT PublishKafka_2_0

    • While still in the Properties tab of the PublishKafkaRecord_2_0 processor, click on the (+) button and add the following property:

      Property Name:  client.id
      Property Value: nifi-meetup-data
      
    • The above property will help us clearly identify who is producing data into the Kafka topic.

    • Click Apply.

    • Your canvas should look as follows:

Back to Index

--

Using SMM to track messages in Kafka

  • Let's start our NiFi Process Group by clicking the play button on the Process Group. For tracking, you can also choose to start each processor one by one; this provides clarity into how the flowfile changes throughout the NiFi flow.

  • Once the NiFi flow is started, you will see that messages will start to flow into Kafka.

  • Click on the Producers filter and select only the nifi-meetup-data producer (the client.id we configured on the PublishKafkaRecord_2_0 processor).

  • If you filter by Topic instead and select the meetup_comment_ws topic, you’ll be able to see all the producers and consumers that are writing to and reading from it, respectively. Since we haven’t implemented any consumers yet, the consumer list should be empty.

  • Click on the topic to explore its details. You can see more details, metrics and the break down per partition. Click on one of the partitions and you’ll see additional information and which producers and consumers interact with that partition.

  • Click on the EXPLORE link to visualize the data in a particular partition. Confirm that there’s data in the Kafka topic and it looks like the JSON produced by NiFi.

  • Every message has a 'show more' link. Clicking it shows the full incoming message. You will notice that the schema is being picked up from Schema Registry.

  • Once you see that messages are starting to flow into Kafka, you can stop your CDF flow. We will start it again later once we have setup a table in Kudu and Impala.
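
  • As an alternative to the EXPLORE view, you can tail the topic from the command line (a sketch, assuming the Kafka client tools are on the instance's PATH):

      $ kafka-console-consumer --bootstrap-server YOUR_PUBLIC_IP:9092 --topic meetup_comment_ws --from-beginning --max-messages 5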

Back to Index

--

Configure Kudu and Impala

We will now set up a Kudu table with the same structure that we used in Step 6 above. The steps are as follows:

  • Connect to Hue. We will be using Hue as our query client to create and access tables with Impala. To connect to Hue, click the Hue URL.

    Important: Since we are accessing Hue for the first time, the first user to log in becomes the Hue admin

Link Processor

  • Let's use admin/admin as the userid/password for Hue.
  • Before you create the table, confirm that the Kudu service is running (using Cloudera Manager)

Link Processor

  • Execute the following query in the Impala query console

    CREATE TABLE meetup_comment_sentiment
    (
      message_id string,
      dateandtime string,
      country string,
      event string,
      member string,
      sentiment string,
      message_comment string,
      PRIMARY KEY (message_id)
    )
    PARTITION BY HASH PARTITIONS 10
    STORED AS KUDU
    TBLPROPERTIES ('kudu.num_tablet_replicas' = '1');
    
  • Click the blue 'Play' button on the left. You will get a confirmation that the table has been created. (kudu.num_tablet_replicas is set to '1' because this is a single-node cluster, consistent with the 'Low' availability option we chose in SMM.)

Link Processor
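
  • You can also confirm that the table exists from the command line (a sketch, assuming impala-shell is available on the instance and the Impala daemon is listening on its default port):

      $ impala-shell -i YOUR_PUBLIC_IP -q 'show tables in default;'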

Back to Index

--

Using NiFi to populate the data from Kafka to Kudu

When the meetup data was sent to Kafka using the PublishKafkaRecord processor, we chose to attach the schema information in the header of Kafka messages. Now, instead of hard-coding which schema we should use to read the message, we can leverage that metadata to dynamically load the correct schema for each message.

  • Let's start by creating a new flow on the same canvas that we were using before (inside the same Process Group).

  • Add a ConsumeKafkaRecord_2_0 processor to the canvas and configure it as follows:

  • Properties:

     Kafka Brokers:                        internal_ip:9092
     Topic Name(s):                        meetup_comment_ws
     Topic Name Format:                    names
     Record Reader:                        JsonTreeReader
     Record Writer:                        JsonRecordSetWriter
     Honor Transactions:                   false
     Group ID:                             nifi-message-consumer
     Offset Reset:                         latest
     Headers to Add as Attributes (Regex): schema.*
    

  • Add a PutKudu processor to the canvas and configure it as follows:

  • Properties:

     Kudu Masters:	internal_ip:7051
     Table Name:	impala::default.meetup_comment_sentiment
     Record Reader: JsonTreeReader
    

  • Connect the ConsumeKafkaRecord_2_0 processor to the PutKudu one. When prompted, check the success relationship for this connection.

  • Double-click on the PutKudu processor, go to the SETTINGS tab, check the "success" relationship in the AUTOMATICALLY TERMINATED RELATIONSHIPS section. Click Apply.

  • This section looks as follows:

Back to Index

--

Use Impala to query Kudu

  • With the NiFi flow running, let's validate that data is being written to our Kudu table.

  • Access Hue and navigate to the Impala Query.

  • Execute the following query:

      select count(*) from meetup_comment_sentiment;
    
  • If you execute this query a few times, you will see the record count growing as data is populated into your table.

  • Let's also check the values of the columns. For this, we advise using a limit clause when executing a select * statement, as follows:

      select * from meetup_comment_sentiment limit 10;
    

Link Processor

  • We can see that data is coming in for all the columns that we set up.
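
  • The same checks can be scripted via impala-shell, which is handy for watching the row count grow (a sketch, under the same impala-shell assumption as above):

      $ impala-shell -i YOUR_PUBLIC_IP -q 'select count(*) from meetup_comment_sentiment;'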

Back to Index

--

Configure Hue

  • The last thing that remains is to build a dashboard on the data that we have ingested. There is a small configuration change we need to make to enable Hue's built-in dashboard capability; by default it is disabled, as it is typically leveraged with Solr.

  • To enable this functionality, open Cloudera Manager. Click Hue.

  • Then on the Hue screen, select the 'Configuration' tab and search for the property Hue Safety Valve, under Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

Link Processor

  • Copy and paste the following in the text box:

      [dashboard]
        ## Activate the Dashboard link in the menu.
        is_enabled=true 
        has_sql_enabled=true
      
        [[engines]]
      
          [[[solr]]]
          ##  Requires Solr 6+
           analytics=true
           nesting=true
      
          [[[sql]]]
            analytics=true
            nesting=true
    
  • Restart Hue for these settings to be applied.

Link Processor

  • After the restart, you will see the Dashboard option in Hue.

Link Processor

Additional details on this are available here

Back to Index

--

Build Dashboard with Hue

  • You can now access the Dashboard feature to create charts and widgets using a drag and drop approach.

Link Processor

  • Drag and Drop 'sentiment' into the Empty Widget

Link Processor

  • This will automatically create a chart for you. This can further be enhanced with additional metrics, calculations and filters.

Link Processor

  • Drag an additional column 'country' into another area of the chart.

Link Processor

This concludes our lab. We hope it has given you a better understanding of CDF and CDH, and of how the different components work together to address a business use case.

Back to Index
