AADHAR-Dataset-Analysis

Data analysis of the Aadhaar dataset using Apache Spark

Technologies Used

Spark
Scala
Spark SQL
Linux Shell Scripting

Initial Data Cleaning

Removing the header row containing the column names (done using Scala)
Replacing NULL values with 0 (done using UNIX sed)
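
The two cleaning steps above can be sketched in Scala as follows. This is a minimal illustration, not the repository's actual code: the object name, function names, and input path are hypothetical, and `fillNulls` is a Scala stand-in for the sed step. It assumes a missing value appears either as an empty field or as the literal text NULL, and that no field contains a quoted comma.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object AadhaarCleaning {
  // Drop the header by removing every line identical to the first line.
  // Note: first() throws on an empty input.
  def dropHeader(lines: RDD[String]): RDD[String] = {
    val header = lines.first()
    lines.filter(_ != header)
  }

  // Assumption: missing values are empty fields or the literal NULL.
  // Naive comma split; quoted fields containing commas are not handled.
  def fillNulls(line: String): String =
    line.split(",", -1) // -1 keeps trailing empty fields
      .map(f => if (f.trim.isEmpty || f.trim.equalsIgnoreCase("NULL")) "0" else f)
      .mkString(",")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aadhaar-cleaning")
      .master("local[*]")
      .getOrCreate()
    // Hypothetical path; pass the real file as the first argument.
    val path = args.headOption.getOrElse("InputData/aadhaar.csv")
    val cleaned = dropHeader(spark.sparkContext.textFile(path)).map(fillNulls)
    cleaned.take(5).foreach(println)
    spark.stop()
  }
}
```

Filtering by equality with the first line also removes any data row that happens to match the header exactly, which is usually harmless for a CSV with named columns.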

Creating a DataFrame

Creating the DataFrame used for the analysis from a case class whose fields correspond to the column names in the input data
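
A minimal sketch of this step is shown below. The README does not list the columns, so the six fields of `AadhaarRecord`, their order, and the names `AadhaarFrame`, `parse`, and `toDataFrame` are all assumptions made for illustration; the real case class should mirror the actual input file.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumed column layout; adjust to match the real input data.
case class AadhaarRecord(
  registrar: String,
  enrolmentAgency: String,
  state: String,
  district: String,
  gender: String,
  aadhaarGenerated: Int
)

object AadhaarFrame {
  // Parse one cleaned CSV line (header removed, NULLs already replaced by 0).
  // Assumes exactly six comma-separated fields and no quoted commas.
  def parse(line: String): AadhaarRecord = {
    val f = line.split(",", -1).map(_.trim)
    AadhaarRecord(f(0), f(1), f(2), f(3), f(4), f(5).toInt)
  }

  // Convert the cleaned lines into a DataFrame whose columns are the case class fields.
  def toDataFrame(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._
    lines.map(parse).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aadhaar-dataframe")
      .master("local[*]")
      .getOrCreate()
    // Made-up sample line, used only to show the resulting schema.
    val lines = spark.sparkContext.parallelize(Seq("R1,A1,Bihar,Siwan,F,3"))
    toDataFrame(spark, lines).printSchema()
    spark.stop()
  }
}
```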

Questions Answered about data

Count for number of participants and count for each gender

Number of Male Participants = 102037
Number of Female Participants = 120225
Total Number of Participants = 222281
Number of records with unspecified gender (T) = 19
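
Counts like these can be obtained by grouping on the gender column. The sketch below assumes a column named `gender` and counts records (rows), which matches the wording "number of records" above; the object and function names are hypothetical, and the sample rows are made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GenderCounts {
  // Number of records per distinct value of the assumed `gender` column.
  def countByGender(df: DataFrame): Map[String, Long] =
    df.groupBy("gender").count().collect()
      .map(r => r.getString(0) -> r.getLong(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gender-counts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // Made-up rows standing in for the real DataFrame.
    val df = Seq("M", "F", "F", "T").toDF("gender")
    println(s"Total = ${df.count()}")
    countByGender(df).foreach { case (g, n) => println(s"$g = $n") }
    spark.stop()
  }
}
```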

Number of identities (Aadhaar) generated by each Enrollment Agency, top 3

CSC SPV : 85088
Rajcomp Info Services Ltd : 16356
Mahaonline Limited : 7749
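
Since Spark SQL is among the technologies used, a ranking like this one can be written as a query over a temporary view. The following is a sketch under assumptions: the view name `aadhaar`, the column names `enrolment_agency` and `aadhaar_generated`, and the object and function names are hypothetical, and the sample rows are made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TopAgencies {
  // Sum identities per agency and keep the n largest totals.
  def topAgencies(spark: SparkSession, df: DataFrame, n: Int): DataFrame = {
    df.createOrReplaceTempView("aadhaar")
    spark.sql(
      s"""SELECT enrolment_agency, SUM(aadhaar_generated) AS identities
         |FROM aadhaar
         |GROUP BY enrolment_agency
         |ORDER BY identities DESC
         |LIMIT $n""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("top-agencies")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // Made-up rows standing in for the real DataFrame.
    val df = Seq(("A", 5), ("B", 2), ("A", 1), ("C", 9), ("D", 1))
      .toDF("enrolment_agency", "aadhaar_generated")
    topAgencies(spark, df, 3).show()
    spark.stop()
  }
}
```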

Top 10 districts by identities generated (Male and Female combined)

East Champaran : 3700
Jaipur : 3144
West Champaran : 2619
East Khasi Hills : 2481
Siwan : 2402
Muzaffarpur : 2250
Bharatpur : 1999
Agra : 1865
Ahmedabad : 1851
Shrawasti : 1810

Bottom 10 districts by identities generated (Male and Female combined)

Serchhip : 0
Yanam : 1
Nicobar : 1
North Sikkim : 1
Dibang Valley : 1
Anjaw : 1
Tirap : 2
Mokokchung : 2
North Cachar Hills : 2
Narayanpur : 3

Comparing the top 10 and bottom 10 suggests that it is easy to bring well-known districts under Aadhaar coverage, but work still needs to be done in remote areas
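
The top and bottom district lists differ only in sort direction, which the sketch below makes explicit using the DataFrame API. The column names `district` and `aadhaar_generated` and the object and function names are assumptions. Ties are broken by district name here so that repeated runs return the same rows; the lists above may have been produced with a different tie order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object DistrictRanking {
  // Total identities generated per district, all genders combined.
  def totals(df: DataFrame): DataFrame =
    df.groupBy("district").agg(sum("aadhaar_generated").as("identities"))

  // n districts with the largest totals.
  def top(df: DataFrame, n: Int): DataFrame =
    totals(df).orderBy(col("identities").desc, col("district")).limit(n)

  // n districts with the smallest totals.
  def bottom(df: DataFrame, n: Int): DataFrame =
    totals(df).orderBy(col("identities").asc, col("district")).limit(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("district-ranking")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // Made-up rows standing in for the real DataFrame.
    val df = Seq(("X", 4), ("Y", 1), ("X", 2), ("Z", 0))
      .toDF("district", "aadhaar_generated")
    top(df, 2).show()
    bottom(df, 2).show()
    spark.stop()
  }
}
```

The same pattern applies to the state-level rankings by grouping on a state column instead.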

Top 3 states by number of identities generated (Male and Female combined)

Uttar Pradesh : 50254
Bihar : 29842
Rajasthan : 20744

Bottom 3 states by number of identities generated (Male and Female combined)

Lakshadweep : 14
Dadra and Nagar Haveli : 27
Daman and Diu : 45

Top 3 states by number of identities generated for Female

Uttar Pradesh : 26063
Bihar : 15353
Rajasthan : 11404

Bottom 3 states by number of identities generated for Female

Lakshadweep : 6
Others : 17
Dadra and Nagar Haveli : 21

Top 3 states by number of identities generated for Male

Uttar Pradesh : 24191
Bihar : 14489
Rajasthan : 9340

Bottom 3 states by number of identities generated for Male

Dadra and Nagar Haveli : 6
Lakshadweep : 8
Daman and Diu : 17

The gender-wise distribution follows the same trend as the overall distribution
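
One way to compare the per-gender state totals side by side is to pivot on gender, so that each state gets one row with a column per gender. This is a sketch under assumptions, not necessarily how the figures above were computed: the column names `state`, `gender`, and `aadhaar_generated`, the gender codes `M` and `F`, and the object and function names are hypothetical, and the sample rows are made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object StateByGender {
  // One row per state with columns M and F holding the summed identities.
  // States with no rows for a gender get 0 instead of null.
  def perGender(df: DataFrame): DataFrame =
    df.groupBy("state")
      .pivot("gender", Seq("M", "F")) // assumed gender codes
      .sum("aadhaar_generated")
      .na.fill(0L)

  // n states with the largest totals for the given gender column ("M" or "F").
  def top(df: DataFrame, gender: String, n: Int): DataFrame =
    perGender(df).orderBy(col(gender).desc, col("state")).limit(n)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("state-by-gender")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // Made-up rows standing in for the real DataFrame.
    val df = Seq(("UP", "F", 3), ("UP", "M", 2), ("Goa", "F", 1))
      .toDF("state", "gender", "aadhaar_generated")
    perGender(df).show()
    top(df, "F", 1).show()
    spark.stop()
  }
}
```

Passing the pivot values explicitly avoids an extra pass over the data to discover them and leaves out any other gender codes, such as the unspecified (T) records.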
