Set up the AWS infrastructure for a small Hadoop cluster as well as install the Cloudera Manager server and agents.

teamclairvoyant/terraform-hadoop-talk

This demo will use Terraform to set up and manage the AWS infrastructure for a small Hadoop cluster as well as install the Cloudera Manager server and agents.

WARNING: Running this code will cost you money in AWS fees.

  • You must have an AWS account.
  • You must have Terraform and awscli installed.
  • You must have awscli credentials (aws configure) already set up.
  • You must have git installed locally.
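The requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function names missing_tools and check_prereqs are hypothetical, and it assumes awscli is installed under its usual binary name, aws.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which of this demo's required tools are missing, if any.
# Assumption: awscli provides a binary named "aws".
check_prereqs() {
    missing=$(missing_tools terraform aws git)
    if [ -n "$missing" ]; then
        # $missing is left unquoted on purpose so the names print on one line.
        echo "Missing tools:" $missing >&2
        return 1
    fi
    echo "All required tools found."
}
```

Call check_prereqs after sourcing the snippet. It only checks that the tools exist; to confirm that the awscli credentials work, aws sts get-caller-identity is one option.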

Summary Terraform commands

terraform init  # only needed once

terraform plan
terraform apply

terraform destroy

Instructions

WARNING: Running this code will cost you money in AWS fees.

This demonstration will build a seven-node Cloudera Hadoop environment on CentOS Linux that is ready for you to configure within Cloudera Manager. All required and recommended OS preparation and tuning steps are performed automatically. There are three master nodes and four worker nodes.

There is also a jumphost (bastion server) through which you SSH into the Hadoop environment. It is required because the Hadoop servers sit on a private subnet in the AWS VPC.

Run Terraform to apply the configuration. There will be some questions to answer and then you can wait about 16 minutes until the infrastructure is ready.

$ terraform apply
var.aws_profile
  AWS CLI profile name

  Enter a value: default

var.aws_region
  EC2 Region for the VPC

  Enter a value: ap-south-1

var.remote_ips
  Your IP address used to limit SSH access to the jumphost. (ex ["10.1.2.3/32"])

  Enter a value: ["1.2.3.4/32"]

var.ssh_key_file
  The full pathname of the file which holds the SSH private key. (ex ~/.ssh/id_rsa)

  Enter a value: ~/.ssh/id_demo

var.tag_name
  Your cluster name which will be added to object tags.

  Enter a value: PUNE-DEMO

var.tag_owner
  Your email address which will be added to object tags.

  Enter a value: hello@clairvoyantsoft.com

...

NOTE: The SSH private key cannot have a passphrase.
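One way to confirm that a key has no passphrase before running terraform apply is to ask ssh-keygen to read it with an empty passphrase. This is a sketch under the assumption that OpenSSH's ssh-keygen is installed; the function name key_has_no_passphrase is hypothetical.

```shell
#!/bin/sh
# Succeeds only if the private key file can be read with an empty passphrase.
# -y derives the public key from the private key, -P "" supplies an empty
# passphrase so ssh-keygen fails instead of prompting when the key is encrypted.
key_has_no_passphrase() {
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}
```

For example: key_has_no_passphrase ~/.ssh/id_demo && echo "key is usable". The check also fails if the file is missing or is not a private key.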

If the terraform apply fails, attempt to fix the error and then run terraform apply again.
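Re-running terraform apply means answering the same prompts again. Terraform automatically loads a file named terraform.tfvars from the working directory, so the answers can be saved once. The sketch below writes such a file using the example values shown above; the function name write_tfvars is hypothetical, and the values should be replaced with your own.

```shell
#!/bin/sh
# Write the answers to the terraform apply prompts into a tfvars file.
# The target defaults to terraform.tfvars and is overwritten if it exists.
# The values are the example answers from this README, not real settings.
write_tfvars() {
    cat > "${1:-terraform.tfvars}" <<'EOF'
aws_profile  = "default"
aws_region   = "ap-south-1"
remote_ips   = ["1.2.3.4/32"]
ssh_key_file = "~/.ssh/id_demo"
tag_name     = "PUNE-DEMO"
tag_owner    = "hello@clairvoyantsoft.com"
EOF
}
```

With the file in place, terraform plan, terraform apply, and terraform destroy should no longer ask for these variables.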

Once the terraform apply has completed successfully, there should be output listing the hostnames of the instances in EC2. Find the value of ec2_instance.jumphost.pub. You will use it to SSH to the jumphost (step 1) in order to further access the cluster environment.

ssh-add ~/.ssh/id_demo
ssh -ACD 8157 -i ~/.ssh/id_demo -l centos <EC2_INSTANCE.JUMPHOST.PUB VALUE>
# Example: ssh -ACD 8157 -i ~/.ssh/id_demo -l centos ec2-35-154-146-83.ap-south-1.compute.amazonaws.com

Next, configure your web browser (step 2) to connect to the Cloudera Manager server through the SSH tunnel. First, find your browser's network proxy settings and set the SOCKS proxy to localhost, port 8157. Then find the value of ec2_instance.manager.priv in the Terraform output and enter it in your browser's address bar, adding port 7180.

Example: http://ip-192-168-100-84.ap-south-1.compute.internal:7180/
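Before changing browser settings, the tunnel can be tested from the command line. The following sketch assumes curl is installed and that the ssh -D 8157 session from the previous step is still open; the function names manager_url and check_manager are hypothetical.

```shell
#!/bin/sh
# Build the Cloudera Manager URL from the manager's private hostname.
manager_url() {
    printf 'http://%s:7180/\n' "$1"
}

# Print only the HTTP status code returned through the SOCKS proxy.
# --socks5-hostname lets the proxy side resolve the name, which matters for
# *.compute.internal hostnames that only resolve inside the VPC.
check_manager() {
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
        --socks5-hostname localhost:8157 "$(manager_url "$1")"
}
```

Usage: check_manager followed by the ec2_instance.manager.priv value. Any HTTP status code in the output means the proxy reached the server; a connection error suggests the SSH session or the port number is wrong.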

Log in to the Cloudera Manager server with the username "admin" and the password "admin".

Finish the CDH deployment by following the new cluster deployment wizard.

Do not forget to run terraform destroy when you are done.

WARNING: Running this code will cost you money in AWS fees.

License

Copyright (C) 2017 Clairvoyant, LLC.

Licensed under the Apache License, Version 2.0.
