VMware has ended active development of this project; this repository will no longer be updated.

xd-demo with Pivotal HD Retail data
===================================

Contributors

James Williams - jwilliams@gopivotal.com
Michael Goddard - mgoddard@gopivotal.com
Adam Zwickey - azwickey@gopivotal.com

Demo User Story
We want to ingest real-time orders from our POS system directly into HDFS via an HTTP POST, landing them as pipe-delimited records. A sample post looks like:

Fields: Customer ID, Order ID, Order Amount, Store ID

Good post:
curl -d "{\"orderid\":\"123\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000

Bad post:
curl -d "{\"orderid\":\"BAD_DATA\",\"storeid\":\"456\",\"customerid\":\"789\",\"orderamount\":\"5000.01\"}" http://localhost:8000

Dream state in HDFS, with HAWQ and in-memory query:
123|456|789|5000.01

We are going to reuse some integration work that was done in the past, and we need to transform and filter the POS data before ingesting it into Hadoop. The HTTP stream will accept JSON-formatted key/value pairs of order data. Some orders have bad data, and we need to filter these records out before persisting to HDFS. After landing the data in Hadoop, we would like to run SQL analytics on the orders to see if they match known fraudulent orders from the past. Hive is not an option because it does not provide fast enough response times or full ANSI compliance. We want to run a logistic regression model on all orders to feed our real-time fraud detection applications, which aim to catch criminals before they leave the store. The logistic regression model needs to be retrained periodically via a scheduled process. The in-memory fraud data store needs to be flushed on a configurable interval, and HDFS files need to be archived via a scheduled process.
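The transform-and-filter step in the user story can be sketched in a few lines of Python. This is only an illustration, not the Spring XD module the demo installs: the output column order is inferred from the sample values (orderid, storeid, customerid, orderamount), and the validity rule (digit-only IDs, numeric amount) is an assumption based on the `BAD_DATA` example.

```python
import json

# Assumed output column order, taken from the sample values in the user story
# (orderid=123, storeid=456, customerid=789, orderamount=5000.01).
FIELDS = ("orderid", "storeid", "customerid", "orderamount")


def to_pipe_delimited(payload):
    """Return a pipe-delimited record, or None if the order should be filtered out."""
    try:
        order = json.loads(payload)
    except ValueError:
        return None  # not valid JSON
    if not isinstance(order, dict):
        return None
    values = [str(order.get(name, "")) for name in FIELDS]
    order_id, store_id, customer_id, amount = values
    # Hypothetical validity rule: IDs are digits only, amount parses as a number.
    if not all(v.isdigit() for v in (order_id, store_id, customer_id)):
        return None
    try:
        float(amount)
    except ValueError:
        return None
    return "|".join(values)
```

Passing the good sample post through `to_pipe_delimited` yields the pipe-delimited record shown in the user story, while the bad post is dropped because its order ID is not numeric.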

In order to get this running with Pivotal HD

Start a Pivotal HD instance. Running the "pivotal-samples" data labs to populate the retail_demo DB with HAWQ tables/data is optional. The "pivotal-samples" GitHub project is located at:
https://github.com/PivotalHD/pivotal-samples

Download and install the latest Spring XD binary. The project is located at:
http://projects.spring.io/spring-xd/

Update your Spring XD Hadoop config to reflect your HDFS address. The file is either $SPRING_XD/xd/config/hadoop.properties or $SPRING_XD/conf/hadoop.properties, whichever exists in your install:
fs.default.name=hdfs://my-hadoop:8020
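Before starting the runtimes, it can help to sanity-check the property value. The helper below is a hypothetical convenience, not part of this repository; it assumes the file uses plain `key=value` lines and that the value is an `hdfs://host:port` URI as in the example above.

```python
from urllib.parse import urlparse


def hdfs_endpoint(properties_text, key="fs.default.name"):
    """Return (host, port) for the HDFS URI stored under `key`, or raise ValueError."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() != key:
            continue
        uri = urlparse(value.strip())
        if uri.scheme != "hdfs" or not uri.hostname or uri.port is None:
            raise ValueError("expected hdfs://host:port, got %r" % value.strip())
        return uri.hostname, uri.port
    raise ValueError("%s not found" % key)
```

Reading the properties file and passing its text to `hdfs_endpoint` gives a host and port that can then be checked for reachability with whatever tool you prefer.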

Open config.py and add entries for each property. This is very important to ensure connectivity to Pivotal HD and SQLFire.
In a terminal window, run the install script. It will scp the Python demo scripts to the Pivotal HD and SQLFire VMs and copy the Spring XD scripts, lib jars, modules, and sink config:
./install.py
Run the 3 Spring XD runtimes (redis, admin, container), each in its own terminal window:
sudo sysctl -w net.inet.tcp.msl=1000
$SPRING_XD/redis/bin/redis-server
$SPRING_XD/xd/bin/xd-admin --hadoopDistro phd1
$SPRING_XD/xd/bin/xd-container --hadoopDistro phd1
Run the Spring XD Shell in a terminal window:
$SPRING_XD/shell/bin/spring-xd-shell --hadoopDistro phd1
In Spring XD Shell - Create Hadoop ingest, Pivotal HD analytics tap and SQLFire sink.
script --file ../../xd/cmd/create-all.cmds
[PIVOTALHD TERMINAL] Open an ssh session to your Pivotal VM and run this script. You must do this before starting the data stream.
./demo.py setup_hdfs
In a terminal window, run send_data.py to start a data stream simulation.
./send_data.py
[SQLFIRE TERMINAL] Verify that SQLFire is getting only a small subset of orders
./demo.py query
In Spring XD Shell - Re-run batch jobs (should delete SQLFire data, populate HAWQ tables, and re-run the analytic training model)
script --file ../../xd/cmd/deploy-batch.cmds
In Spring XD Shell - Reset the richgauge taps to 0
script --file ../../xd/cmd/reset-taps.cmds
[PIVOTALHD_TERMINAL] Run a PXF and HAWQ Query
./demo.py query_hawq
Install DbVisualizer (http://www.dbvis.com/) and run queries through a JDBC client GUI. You will need to add a new "Cache" Driver JAR for SQLFire. To connect remotely, you will need to modify '/data/1/hawq_master/gpseg-1/pg_hba.conf' in your Pivotal HD VM.
[PIVOTALHD TERMINAL] Restart Pivotal HD via the stop/start scripts.
```
/home/gpadmin/stop_all.sh;
/home/gpadmin/start_all.sh;
```
In Spring XD Shell - Remove all streams/taps from Spring XD (does not delete any data)
script --file ../../xd/cmd/destroy-all.cmds
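For readers who want to see what the data stream simulation step involves, here is a rough sketch of an order sender. It is not the repository's send_data.py: the ID ranges, the amount range, and the 10% bad-record rate are all hypothetical, and only the endpoint and JSON keys come from the sample curl posts.

```python
import json
import random
import urllib.request

XD_HTTP_URL = "http://localhost:8000"  # endpoint used by the sample curl posts


def make_order(rng, bad_rate=0.1):
    """Build one order dict; roughly `bad_rate` of them carry a bad order ID."""
    order_id = "BAD_DATA" if rng.random() < bad_rate else str(rng.randint(1, 999999))
    return {
        "orderid": order_id,
        "storeid": str(rng.randint(1, 500)),          # hypothetical range
        "customerid": str(rng.randint(1, 100000)),    # hypothetical range
        "orderamount": "%.2f" % rng.uniform(1, 10000),
    }


def build_request(order, url=XD_HTTP_URL):
    """Wrap an order as an HTTP POST with a JSON body, similar to `curl -d`."""
    body = json.dumps(order).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_orders(count, seed=None, url=XD_HTTP_URL):
    """POST `count` simulated orders; needs the Spring XD HTTP source running."""
    rng = random.Random(seed)
    for _ in range(count):
        with urllib.request.urlopen(build_request(make_order(rng), url)) as resp:
            resp.read()
```

Calling `send_orders(100, seed=1)` would post one hundred orders against a running stream; nothing is sent until that function is called.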

xd-demo-client

Update app.properties (src/main/webapps/WEB-INF/classes) to reflect the IP addresses of your SQLFire environment
Open a terminal and build the WAR via Maven
mvn install
Copy the WAR file to a working tc Server or Tomcat server
The application will be available at: http://localhost:8080/xd-demo-client/resources/index.html
