Hi everyone. In this notebook you will find visualizations of the Google Play Store Apps dataset.

In [None]:
library(dplyr)
library(readr)
library(ggplot2)

In [None]:
dataset <- read_csv("../input/googleplaystore.csv")

In [None]:
head(dataset)

In [None]:
dataset$Category <- tolower(dataset$Category)

First, we visualize the categories of the most-installed apps (those with 1,000,000,000+ installs).

In [None]:
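# Keep only apps in the top install bucket ("1,000,000,000+")
a <- dataset %>%
  select(Category, Installs) %>%
  filter(Installs == "1,000,000,000+") %>%
  arrange(Category)

# One bar per category; bar length = number of apps with 1B+ installs
ggplot(a, aes(x = Installs, fill = Category)) +
  geom_bar(position = "dodge") +
  coord_flip()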

*  Communication, social and game, in that order, are the categories with the most apps that have over one billion installs.
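Note that the chart counts apps, not people: each bar's length is the number of apps in a category whose `Installs` value is "1,000,000,000+". The base-R sketch below reproduces that count on a few hypothetical rows (the categories and values are made up for illustration, not taken from the dataset).

```r
# Hypothetical toy rows standing in for the Category and Installs columns
apps <- data.frame(
  Category = c("communication", "communication", "social", "game", "tools"),
  Installs = c("1,000,000,000+", "1,000,000,000+", "1,000,000,000+",
               "1,000,000,000+", "500,000,000+"),
  stringsAsFactors = FALSE
)

# Keep only apps in the top install bucket, as the filter() above does
billion <- apps[apps$Installs == "1,000,000,000+", ]

# Each bar in the chart corresponds to one of these per-category app counts
billion_counts <- sort(table(billion$Category), decreasing = TRUE)
print(billion_counts)
```

The "tools" app drops out because it sits in a lower install bucket, so it gets no bar at all.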

**How many apps are included in each category?**

In [None]:
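# Count apps per category and keep the 10 largest
# (named top_categories so it does not mask base R's c() function)
top_categories <- dataset %>%
  group_by(Category) %>%
  summarize(Count = n()) %>%
  arrange(desc(Count)) %>%
  head(10)

# reorder() sorts the bars by count instead of alphabetically
ggplot(top_categories, aes(x = reorder(Category, -Count), y = Count)) +
  geom_bar(stat = "identity", width = .5, fill = "firebrick4") +
  labs(title = "Top10 Categories",
       subtitle = "How many apps are included in each category?",
       x = "Category") +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.6))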

Family is the category with the most apps.

Is there actually demand for family apps, though? To answer that, we need to calculate the total number of installs for each category.

The `Installs` column holds character values such as "1,000,000+". Before we can sum it, we need to:

1. remove the commas,

2. remove the trailing "+" sign,

3. convert the result to numeric.
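The three steps above can be sketched as a small base-R helper. `parse_installs()` is a hypothetical name, not part of the original notebook. It assumes install strings shaped like "1,000,000+" and turns anything that still is not a number (for example a stray "Free") into `NA`.

```r
# Hypothetical helper: converts install strings such as "1,000,000+" to numbers
parse_installs <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # 1. remove the commas
  x <- sub("\\+$", "", x)              # 2. remove a trailing "+" sign, if present
  suppressWarnings(as.numeric(x))      # 3. convert; non-numeric strings become NA
}

installs <- parse_installs(c("1,000,000,000+", "10,000+", "0", "Free"))
print(installs)

# The numeric result can be summed; na.rm = TRUE skips the unparseable value
total <- sum(installs, na.rm = TRUE)
print(total)
```

Using `sub("\\+$", "", x)` rather than dropping the last character with `substr()` only removes an actual "+" sign, so a value without one keeps all of its digits.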

In [None]:
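# Missing ratings are already NA/NaN in the numeric Rating column, so they are
# left as they are: assigning the string "None" would turn the whole column
# into character. Count them instead:
sum(is.na(dataset$Rating))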

In [None]:
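# Drop apps with zero installs
dataset <- dataset %>%
  filter(Installs != "0")

# Print large numbers in fixed rather than scientific notation
options(scipen = 999)

#1 remove the commas
dataset$Installs <- gsub(",", "", dataset$Installs, fixed = TRUE)

#2 remove the trailing "+" sign (only if one is present)
dataset$Installs <- sub("\\+$", "", dataset$Installs)

#3 convert to numeric; any value that is still not a number becomes NA
dataset$Installs <- as.numeric(dataset$Installs)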


In [None]:
dataset %>%
  group_by(Category) %>%
  # na.rm = TRUE skips install values that could not be converted to numbers
  summarize(totalInstalls = sum(Installs, na.rm = TRUE)) %>%
  arrange(desc(totalInstalls)) %>%
  head(10) %>%
  ggplot(aes(x = reorder(Category, -totalInstalls), y = totalInstalls, fill = Category)) +
  geom_bar(stat = "identity") +
  labs(title = "Top10 Installed Categories", x = "Category") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

Game has the highest total number of installs. So why are family apps more frequent? Let's look at paid apps for an answer.
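A category can have the most apps without having the most installs when each of its apps is installed less often on average. The base-R sketch below shows that arithmetic on hypothetical rows; the numbers are invented for illustration and are not taken from the dataset, and `Installs` is assumed to be numeric already.

```r
# Hypothetical toy rows: numeric installs after the cleaning steps
apps <- data.frame(
  Category = c("family", "family", "family", "game", "game"),
  Installs = c(1e5, 1e4, 1e3, 1e8, 5e7),
  stringsAsFactors = FALSE
)

n_apps  <- tapply(apps$Installs, apps$Category, length)  # how frequent each category is
total   <- tapply(apps$Installs, apps$Category, sum)     # total installs per category
per_app <- total / n_apps                                # average installs per app

print(rbind(n_apps, total, per_app))
```

Here family has more apps (3 vs 2) but far fewer installs per app (37,000 vs 75,000,000), so game wins on total installs.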

In [None]:
dataset %>%
  filter(Type == "Paid") %>%
  group_by(Category) %>%
  summarize(totalInstalls = sum(Installs)) %>%
  arrange(desc(totalInstalls)) %>%
  head(10) %>%
  ggplot(aes(x = Category, y = totalInstalls)) +
  geom_bar(stat="identity", width=.5,  fill="forestgreen") +
  labs(title= "Top10 Paid Categories" ) +
  theme(axis.text.x = element_text(angle=65, vjust=0.6))

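* Among paid apps, family is the category with the most installs, which suggests it is also the biggest earner (*Money talks* :) ). Strictly speaking, this chart shows installs rather than income; revenue also depends on each app's `Price`.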
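Installs of paid apps are only a proxy for income. A rough revenue estimate multiplies each paid app's price by its install count. The sketch below assumes that `Price` holds strings like "$4.99" and that `Installs` has already been converted to numbers; the rows are hypothetical. Because `Installs` is the lower edge of a bucket and the estimate ignores in-app purchases, refunds and discounts, treat the result as an order-of-magnitude indication only.

```r
# Hypothetical paid apps (values invented for illustration)
paid <- data.frame(
  Category = c("family", "family", "game", "medical"),
  Installs = c(1000000, 100000, 1000000, 1000),  # already numeric
  Price    = c("$6.99", "$2.99", "$0.99", "$79.99"),
  stringsAsFactors = FALSE
)

# Strip the assumed "$" prefix so Price can be used in arithmetic
paid$PriceNum <- as.numeric(sub("^\\$", "", paid$Price))

# Rough lower-bound estimate of gross revenue per app
paid$EstRevenue <- paid$PriceNum * paid$Installs

# Sum per category and sort from highest to lowest estimate
revenue_by_category <- sort(tapply(paid$EstRevenue, paid$Category, sum),
                            decreasing = TRUE)
print(revenue_by_category)
```

A single expensive app with few installs (the "medical" row) can still rank low, which is why installs and revenue can tell different stories.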

What are the genres within the family category?

In [None]:
dataset %>%
  filter(Category == "family") %>%
  group_by(Genres) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  head(10) %>%
  ggplot(aes(x = Genres, y = Count)) +
  geom_bar(stat="identity", width=.5,  fill="gold1") +
  labs(title= "Top10 Genres of 'Family' Category" ) +
  theme(axis.text.x = element_text(angle=65, vjust=0.6))

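* Entertainment and education are the most common genres in the family category. Together with the paid-apps chart, this suggests that parents are especially willing to pay for entertainment and education apps, although this chart counts all family apps, free and paid.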
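Because the genre chart mixes free and paid apps, one way to check the "parents pay" reading is to compute the share of paid apps within each genre. A minimal base-R sketch, assuming the `Genres` and `Type` columns used above; the rows are hypothetical.

```r
# Hypothetical family-category apps (values invented for illustration)
family <- data.frame(
  Genres = c("Entertainment", "Entertainment", "Education", "Education",
             "Education", "Casual"),
  Type   = c("Paid", "Free", "Paid", "Free", "Free", "Free"),
  stringsAsFactors = FALSE
)

# Rows = genres, columns = Free/Paid counts
type_by_genre <- table(family$Genres, family$Type)

# Row-wise proportions, then keep the "Paid" column: share of paid apps per genre
paid_share <- prop.table(type_by_genre, margin = 1)[, "Paid"]
print(round(paid_share, 2))
```

On the real data, a genre with both many apps and a high paid share would support the claim more directly than app counts alone.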

In [None]:
dataset %>%
  filter(Type == "Free") %>%
  group_by(Category) %>%
  summarize(totalInstalls = sum(Installs)) %>%
  arrange(desc(totalInstalls)) %>%
  head(10) %>%
  ggplot(aes(x = Category, y = totalInstalls)) +
  geom_bar(stat="identity", width=.5,  fill="deepskyblue2") +
  labs(title= "Top10 Free Categories" ) +
  theme(axis.text.x = element_text(angle=65, vjust=0.6))


* For free apps, however, game is the most-installed category.

In [None]:
dataset %>%
  filter(Category == "game") %>%
  group_by(Genres) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  head(10) %>%
  ggplot(aes(x = Genres, y = Count)) +
  geom_bar(stat="identity", width=.5,  fill="cyan2") +
  labs(title= "Top10 Genres of 'Game' Category" ) +
  theme(axis.text.x = element_text(angle=65, vjust=0.6)) 




* Action and Arcade are the most common genres in the game category.






**Thank you for reading.**