Example ingestion using Apache Gearpump
Piotr Grabuszynski edited this page May 17, 2016
·
13 revisions
How to deploy and use the example Apache Gearpump application - ingestion-ws-kafka-gearpump-hbase
This example also uses the hbase-java-api-example and twitter-to-kafka applications.
- Create a kafka instance on the platform. This example uses the twitter-to-kafka app, which needs an instance named “kafka-twitter-instance”
- In TAP console go to marketplace and choose “kafka”
- Create new instance
- Create an hbase instance on the platform. This example uses hbase-java-api-example, which needs an instance named “hbase1”
- In TAP console go to marketplace and choose “hbase”
- Choose tab “Shared”
- Create new instance
- Deploy the applications twitter-to-kafka and hbase-java-api-example on TAP (see the documentation of these components)
- Set the KAFKA_TOPIC environment variable of twitter-to-kafka to, for example, “topicIn”
- Get zookeeper uri
- In TAP console click Services > Instances,
- Create a key for “hbase1”,
- Click “Export Keys” in the top right corner,
- Click “+ Add to exports” next to the key you just created,
- At the bottom, find the “exported Keys” section, then your instance, its credentials and zookeeperUri. Copy this uri:
"credentials":{ ... "zookeeperUri": "<<COPY THIS VALUE, example value: cdh-master-0.node.domain.consul:2181,cdh-master-5.node.domain.consul:2181,cdh-master-2.node.domain.consul:2181/kafka>>"}
- Now create the topics and the output table for the ingestion-ws-kafka-gearpump-hbase example
- Go to the cdh node, and then to cdh-master-0:
ssh ec2-user@cdh.domain.com -i ~/.ssh/yourKey.pem
ssh cdh-master-0
- Create the topics (replace “<<ZOOKEEPER_URI>>” with the uri copied in step 4, “Get zookeeper uri”, including the ‘/kafka’ suffix)
kafka-topics --create --zookeeper <<ZOOKEEPER_URI>> --replication-factor 1 --partitions 1 --topic topicIn
kafka-topics --create --zookeeper <<ZOOKEEPER_URI>> --replication-factor 1 --partitions 1 --topic topicOut
- To check whether the topics have been created:
kafka-topics --list --zookeeper <<ZOOKEEPER_URI>>
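The topic commands above differ only in the topic name, so they can be wrapped in a small loop. The following is a minimal sketch, assuming a POSIX shell on cdh-master-0; the function name `create_topics` and the `KAFKA_TOPICS_BIN` override are hypothetical and only there to make dry runs possible:

```shell
# Hypothetical helper (not part of the example repositories).
# $1      = ZooKeeper URI copied from the exported keys, including the /kafka suffix
# $2 ...  = topic names to create
# KAFKA_TOPICS_BIN is an assumed override; set it to "echo" to print the
# arguments instead of calling kafka-topics.
create_topics() {
  zk_uri="$1"
  shift
  for topic in "$@"; do
    "${KAFKA_TOPICS_BIN:-kafka-topics}" --create --zookeeper "$zk_uri" \
      --replication-factor 1 --partitions 1 --topic "$topic" || return 1
  done
}

# Usage (not executed here):
#   create_topics "<<ZOOKEEPER_URI>>" topicIn topicOut
```

The flags are the same ones used in the commands above; only the loop and the dry-run switch are added.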
- Create a table in hbase using hbase-java-api-example
curl http://domain.and:port/api/tables -X POST -H "Content-Type: application/json" -d '{"tableName":"pipeline","columnFamilies":["message"]}'
- where domain.and:port is your real domain and port, for example:
curl http://trustedanalytics.org:80/api/tables -X POST -H "Content-Type: application/json" -d '{"tableName":"pipeline","columnFamilies":["message"]}'
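Since the request differs only in host, table name and column family, it can be parameterised. This is a minimal sketch, assuming a POSIX shell with curl available; `table_payload` and `create_table` are hypothetical names, and the payload shape is copied from the curl command above:

```shell
# Hypothetical helpers (not part of hbase-java-api-example).

# Print the JSON body for a table with a single column family.
# $1 = table name, $2 = column family
table_payload() {
  printf '{"tableName":"%s","columnFamilies":["%s"]}\n' "$1" "$2"
}

# POST the payload to the tables endpoint used in the example above.
# $1 = base URL (e.g. http://domain.and:port), $2 = table name, $3 = column family
create_table() {
  curl "$1/api/tables" -X POST -H "Content-Type: application/json" \
    -d "$(table_payload "$2" "$3")"
}

# Usage (not executed here):
#   create_table http://domain.and:port pipeline message
```

Note that the names are inserted into the JSON as-is, so this sketch only suits simple names such as “pipeline” and “message” that need no JSON escaping.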
- Create a GearPump instance or deploy on an existing one
- In TAP console, go to Data Science > GearPump tab,
- If you don’t have an instance, you can create one now,
- Click the “Deploy App” link to the right of the chosen instance name,
- Choose the gearpump application jar (from ingestion-ws-kafka-gearpump-hbase/gearpump),
- Add extra parameters:
- inputTopic – topicIn
- outputTopic – topicOut
- tableName – pipeline
- columnFamily – message
- Select the hbase instance called “hbase1” and the kafka instance “kafka-twitter-instance” from the list, then deploy the application.
- Check if the information flow is working – tweets should be visible in hbase:
curl http://domain.and:port/api/tables/pipeline/head
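Tweets may take a moment to arrive after deployment, so it can help to repeat the check a few times. The following is a rough sketch of a generic retry loop, assuming a POSIX shell; `poll_until_output` is a hypothetical name, and treating an empty response as "no rows yet" is an assumption, since the response of the head endpoint for an empty table is not documented here:

```shell
# Hypothetical helper: run a command repeatedly until it prints something.
# $1 = maximum number of attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
# Assumption: empty output means "nothing there yet"; adjust the check if the
# endpoint returns, for example, an empty JSON array instead.
poll_until_output() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not executed here):
#   poll_until_output 10 5 curl -s http://domain.and:port/api/tables/pipeline/head
```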
- You can also check output kafka topic:
- Go to cdh and cdh-master-0, then use this command (replace “<<ZOOKEEPER_URI>>” with the uri copied in step 4, “Get zookeeper uri”):
kafka-console-consumer --zookeeper <<ZOOKEEPER_URI>> --topic topicOut --from-beginning