Thanks to Ryan Templeton for their help with troubleshooting.
This is based on the Hortonworks Twitter Demo.
Purpose: Monitor the Twitter stream for the provided hashtags & act on unexpected increases in tweet volume
-
Ingest: Listen for Twitter streams related to the hashtags provided to the NiFi Garden Hose (GetHTTP) processor
-
Processing:
- Monitor tweets for unexpected volume
- Volume thresholds managed in HBase
-
Persistence:
- HDFS (for future batch processing)
- Hive (for interactive query)
- HBase (for realtime alerts)
- Solr/Banana (for search and reports/dashboards)
-
Refine:
- Update threshold values based on historical analysis of tweet volumes
-
Demo setup:
- Either download and start the prebuilt VM, or
- Start the HDP 2.3 sandbox and run the provided scripts to set up the demo
- Download the VM from here. Import it into VMware Fusion and start it up.
- Find the IP address of the VM and add an entry to your machine's hosts file, e.g.
192.168.191.133 sandbox.hortonworks.com sandbox
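If you are unsure of the VM's IP, one way to find it is from the VM console, and on macOS/Linux the hosts entry can be appended from a terminal (generic commands, not part of the demo scripts):
#inside the VM console: show the VM's IP address
ip addr show
#on your own machine: append the hosts entry, substituting the IP you found
echo '192.168.191.133 sandbox.hortonworks.com sandbox' | sudo tee -a /etc/hosts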
- Connect to the VM via SSH (password hadoop)
ssh root@sandbox.hortonworks.com
- Start the demo by running:
cd /root/hdp_nifi_twitter_demo
./start-demo.sh
#once the Storm topology is submitted, press Control-C
#start the NiFi processor:
1. Using a browser, go to http://sandbox.hortonworks.com:9090/nifi
2. Upload the XML file into the NiFi templates section of the UI. The XML file is under /root/hdp_nifi_twitter_demo/nifi-template
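If you need the template XML on your local machine in order to upload it through the browser, one way to copy it down is via scp (the exact filename may differ; adjust as needed):
scp root@sandbox.hortonworks.com:/root/hdp_nifi_twitter_demo/nifi-template/*.xml .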
-
Observe results in HDFS, Hive, Solr/Banana, HBase
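A couple of quick checks from the sandbox shell once tweets start flowing (these reuse only table names already mentioned in this demo):
hive -e 'select count(*) from tweets_text_partition'
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure
select * from alerts;
!q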
-
Troubleshooting: check the Storm web UI for any errors and try resetting using the script below:
./reset-demo.sh
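If it's unclear whether the topology is running at all, the standard Storm CLI can list active topologies (not a demo-specific script):
storm list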
First-time setup: these steps are only needed the first time and may take up to 30 min to execute (depending on your internet connection)
- Download the HDP 2.3 sandbox VM image file (Sandbox_HDP_2.3_VMWare.ova) from the Hortonworks website
- Find the IP address of the VM and add an entry to your machine's hosts file, e.g.
192.168.191.241 sandbox.hortonworks.com sandbox
- Connect to the VM via SSH (password hadoop)
ssh root@sandbox.hortonworks.com
- Pull the latest code/scripts
git clone git@github.com:vedantja/hdp_nifi_twitter_demo.git
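If you don't have SSH keys registered with GitHub, cloning the same repository over HTTPS should also work:
git clone https://github.com/vedantja/hdp_nifi_twitter_demo.git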
- The NiFi Garden Hose processor requires you to have a Twitter account and to obtain developer keys by registering an "app". Create a Twitter account and app, then get your consumer key/secret and access token/secret: https://apps.twitter.com > sign in > create new app > fill anything > create access tokens
- Then enter the four values into the appropriate fields (see screenshot)
consumerKey
consumerSecret
oauth.accessToken
oauth.accessTokenSecret
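(Optional) Before pasting the values into NiFi, you can sanity-check the consumer key/secret by requesting an application-only bearer token from Twitter. This uses Twitter's standard OAuth2 token endpoint and is not part of the demo scripts; replace the placeholders with your own values:
curl -s -u 'CONSUMER_KEY:CONSUMER_SECRET' --data 'grant_type=client_credentials' https://api.twitter.com/oauth2/token
#a JSON response containing "access_token" means the key/secret pair is valid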
- Run the script below to set up the demo (one time): it starts Ambari/HBase/Kafka/Storm and installs Maven, Solr, and Banana (may take 10 min)
cd /root/hdp_nifi_twitter_demo
./setup-demo.sh
Most of the steps below are optional, as they were already executed by the setup script above, but they are useful for understanding the components of the demo:
-
(Optional) Review the list of stock symbols whose Twitter mentions we will be tracking: http://en.wikipedia.org/wiki/List_of_S%26P_500_companies
-
(Optional) Generate the securities CSV from the page above and review the generated securities.csv. The last field is the generated tweet-volume threshold
/root/hdp_nifi_twitter_demo/fetchSecuritiesList/rungeneratecsv.sh
cat /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv
- (Optional) For future runs, you can add other stocks/hashtags to monitor to the CSV (make sure there are no trailing spaces or newlines at the end of the file; a quick check is shown after the sed examples below). Find trending hashtags at http://mobile.twitter.com/trends
sed -i '1i$HDP,Hortonworks,Technology,Technology,Santa Clara CA,0000000001,5' /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv
sed -i '1i#hadoopsummit,Hadoop Summit,Hadoop,Hadoop,Santa Clara CA,0000000001,5' /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv
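A quick way to eyeball the end of the file for stray blank lines or trailing spaces (generic shell, not part of the demo scripts; cat -A marks each line end with $):
tail -n 3 /root/hdp_nifi_twitter_demo/fetchSecuritiesList/securities.csv | cat -A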
- (Optional) Open a connection to HBase via Phoenix and check that you can list tables. Notice the securities data was imported and the alerts table is empty
/usr/hdp/current/phoenix-client/bin/sqlline.py localhost:2181:/hbase-unsecure
!tables
select * from securities;
select * from alerts;
select * from dictionary;
!q
- (Optional) Check the Hive table schema where the tweets will be stored for later analysis
hive -e 'desc tweets_text_partition'
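Once the topology has been running for a while, you can also list the table's partitions to confirm data is arriving (standard Hive command; only the table name comes from the demo):
hive -e 'show partitions tweets_text_partition'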
- Start the Storm Twitter topology to generate alerts into an HBase table for stocks whose tweet volume is higher than the threshold. This will also write tweets to Hive/HDFS/local disk/Solr/Banana. The first time you run the command below, Maven will take about 15 min to download dependent jars
cd /root/hdp_nifi_twitter_demo
./start-demo.sh
#once storm topology is submitted, press control-C
- (Optional) Other modes the topology can be started in for future runs, if you want to clean the setup or run locally (not on the Storm instance running on the sandbox):
cd /root/hdp_nifi_twitter_demo/twitterstorm
./runtopology.sh runOnCluster clean
./runtopology.sh runLocally skipclean
- Open the Storm UI and confirm the topology was created: http://sandbox.hortonworks.com:8744/
-
To stop producing tweets, hit the stop button on the template processor in the NiFi console.
-
Kill the Storm topology to stop processing tweets:
storm kill Twittertopology
- SSH into the sandbox: ssh root@sandbox.hortonworks.com
- There is an xml file in the nifi-template folder. Scp the file to your local disk.
- Start NiFi: nifi.sh start
- Go to sandbox.hortonworks.com:9090/nifi & upload the template
- Add the Access Keys from your Twitter Developer account.
- Meanwhile, start Solr & Banana: sh ~/setup-scripts/restart_solr_banana.sh
- Start the Storm topology: sh ~/twittertopology/runtopology.sh
- Once the topology has started, hit play on the NiFi dashboard
- Go to sandbox.hortonworks.com:8983/banana
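If the Banana dashboard does not load, a quick check that Solr itself is responding (standard Solr admin API; no demo-specific core name assumed):
curl 'http://sandbox.hortonworks.com:8983/solr/admin/cores?action=STATUS&wt=json'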