# Exploratory Data Analysis
As of May 2023, Apple's App Store offered over 1.76 million apps across 70 categories, accounting for over 62% of the global revenue share in 2022. This iterative, yet lean, exploratory data analysis (EDA) of over 300,000 apps across 26 categories aims to discover patterns, trends and clusters of opinion, and to generate actionable, high-impact insights for opportunity discovery and analysis.

> “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” — John Tukey

The EDA is guided by the following nine questions. They are intentionally vague yet provocative, and are meant to stimulate discovery of latent, high-impact aspects of the customer experience through an iterative, and likely chaotic, learning process that is constrained by the available data.

1. What best characterizes the nature of customer opinion within the mobile app market today?  
2. What best describes the intensity of customer opinion?
3. How do opinion and customer satisfaction vary across mobile app markets and categories?
4. In what ways do the nature and intensity of opinion correlate, overall and within and across categories?
5. Are there clusters revealed by the nature and intensity of opinion?
6. How and in which ways has opinion changed or evolved over time?
7. Where is customer opinion most varied? Most similar?
8. Does the distribution of rating and rating volume reveal any clustering of app creators / developers?
9. To what degree do these data expose and quantify unmet needs, and empower opportunity discovery?

This EDA is divided into two parts, corresponding to the two primary sources of data for this endeavor.
1. AppData EDA: Exploratory *quantitative* analysis of rating data for apps and developers, within and across categories, and over time.
2. Review EDA: Exploratory *qualitative* content, sentiment, intensity, and emotion analysis of review data.

## AppData EDA

To discover the voice of the mobile app customer, along with patterns and clusters of opinion, rating and review data were obtained for approximately 300,000 apps across 26 categories. This exploratory analysis of those apps, all currently offered in Apple's App Store, spans the 26 categories and covers the following 15 variables.

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average customer rating of current release |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |
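As a sketch of how the table's data types might be enforced in pandas, the hypothetical `APPDATA_SCHEMA` mapping and `schema_mismatches` helper below check a DataFrame against the 15 variables. The dtypes for the first ten variables follow the profile reported by `dataset.info`; those for `rating_current_version`, `ratings_current_version`, `released`, `released_current`, and `version` are assumptions, and `AppDataDataset` may cast them differently.

```python
import pandas as pd

# Hypothetical schema for the 15 AppData variables. The first ten dtypes match
# the dataset.info profile; the last five are assumptions for illustration.
APPDATA_SCHEMA = {
    "id": "int64",
    "name": "string",
    "description": "string",
    "category_id": "category",
    "category": "category",
    "price": "float64",
    "developer_id": "int64",
    "developer": "string",
    "rating": "float64",
    "ratings": "int64",
    "rating_current_version": "float64",   # assumed
    "ratings_current_version": "int64",    # assumed
    "released": "datetime64[ns]",          # assumed
    "released_current": "datetime64[ns]",  # assumed
    "version": "string",                   # assumed
}


def schema_mismatches(df: pd.DataFrame, schema: dict = APPDATA_SCHEMA) -> dict:
    """Return {column: (expected_dtype, actual_dtype or None)} for every problem found."""
    mismatches = {}
    for column, expected in schema.items():
        if column not in df.columns:
            # Column absent from the DataFrame
            mismatches[column] = (expected, None)
        elif str(df[column].dtype) != expected:
            # Column present but stored with a different dtype
            mismatches[column] = (expected, str(df[column].dtype))
    return mismatches
```

An empty dictionary means the DataFrame conforms; casting raw data first with `df.astype(APPDATA_SCHEMA)` is one way to get there.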


**Dependencies** 

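```python
import os

import pandas as pd
from IPython.display import HTML

from aimobile.data.dataset.appdata import AppDataDataset
from aimobile.container import AIMobileContainer

# Create the project's dependency-injection container, initialize its resources,
# and wire it into the dataset package so injected dependencies resolve there.
container = AIMobileContainer()
container.init_resources()
container.wire(packages=["aimobile.data.dataset"])
```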


## AppData Dataset 
### Overview
The characteristics of the AppData dataset are as follows:

<a id='appdata'></a>

```python
# Instantiate the AppData dataset and display its dataset-level characteristics
dataset = AppDataDataset()
dataset.structure
```

| Characteristic         | Total         |
|------------------------|---------------|
| Number of Observations | 334,821       |
| Number of Variables    | 15            |
| Number of Cells        | 5,022,315     |
| Missing Cells          | 0             |
| Missing Cells (%)      | 0.0           |
| Duplicate Rows         | 0             |
| Duplicate Rows (%)     | 0.0           |
| Size (Bytes)           | 1,087,470,469 |
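The implementation of `dataset.structure` is not shown here. Assuming it reports standard pandas quantities, a comparable summary can be sketched with the hypothetical `structure_summary` function below, where "Size (Bytes)" is assumed to be deep memory usage including the index. As a sanity check on the reported figures, 334,821 observations × 15 variables = 5,022,315 cells.

```python
import pandas as pd


def structure_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Dataset-level characteristics, similar in spirit to dataset.structure (hypothetical)."""
    n_obs, n_vars = df.shape
    n_cells = n_obs * n_vars
    missing = int(df.isna().sum().sum())   # total null cells across all columns
    duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    rows = [
        ("Number of Observations", n_obs),
        ("Number of Variables", n_vars),
        ("Number of Cells", n_cells),
        ("Missing Cells", missing),
        ("Missing Cells (%)", round(100 * missing / n_cells, 2) if n_cells else 0.0),
        ("Duplicate Rows", duplicates),
        ("Duplicate Rows (%)", round(100 * duplicates / n_obs, 2) if n_obs else 0.0),
        # Assumed definition of size: deep memory usage, index included
        ("Size (Bytes)", int(df.memory_usage(deep=True).sum())),
    ]
    return pd.DataFrame(rows, columns=["Characteristic", "Total"])
```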


### Profile
Data type, cardinality, validity, and size data are summarized at the variable level. 

```python
# Variable-level profile: dtype, validity, cardinality, and memory size
dataset.info
```

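| Column       | Dtype          | Valid   | Null | Validity | Unique  | Cardinality | Size        |
|--------------|----------------|---------|------|----------|---------|-------------|-------------|
| id           | int64          | 334,821 | 0    | 1.0      | 302,760 | 0.9         | 2,678,568   |
| name         | string[python] | 334,821 | 0    | 1.0      | 302,434 | 0.9         | 28,546,848  |
| description  | string[python] | 334,821 | 0    | 1.0      | 297,963 | 0.89        | 987,808,693 |
| category_id  | category       | 334,821 | 0    | 1.0      | 26      | 0.0         | 336,101     |
| category     | category       | 334,821 | 0    | 1.0      | 26      | 0.0         | 337,633     |
| price        | float64        | 334,821 | 0    | 1.0      | 116     | 0.0         | 2,678,568   |
| developer_id | int64          | 334,821 | 0    | 1.0      | 168,841 | 0.5         | 2,678,568   |
| developer    | string[python] | 334,821 | 0    | 1.0      | 168,335 | 0.5         | 25,693,867  |
| rating       | float64        | 334,821 | 0    | 1.0      | 52,988  | 0.16        | 2,678,568   |
| ratings      | int64          | 334,821 | 0    | 1.0      | 23,167  | 0.07        | 2,678,568   |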
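The profile columns appear to follow simple definitions that can be checked against the output: Validity is the share of non-null values, Cardinality is the ratio of unique to valid values (302,760 / 334,821 ≈ 0.90 for `id`), and Size matches 8 bytes per value for 64-bit columns (334,821 × 8 = 2,678,568). The hypothetical `variable_profile` function below reproduces those definitions with pandas; it is a sketch under those assumptions, not the `AppDataDataset.info` implementation.

```python
import pandas as pd


def variable_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-variable dtype, validity, cardinality, and deep memory size (hypothetical)."""
    n = len(df)
    valid = df.notna().sum()  # non-null count per column
    unique = df.nunique()     # distinct non-null values per column
    return pd.DataFrame({
        "Column": df.columns,
        "Dtype": df.dtypes.astype(str).values,
        "Valid": valid.values,
        "Null": df.isna().sum().values,
        "Validity": (valid / n).round(2).values,
        "Unique": unique.values,
        # Ratio of distinct values to valid values
        "Cardinality": (unique / valid).round(2).values,
        # Deep memory usage per column in bytes; index=False excludes the index
        "Size": df.memory_usage(deep=True, index=False).values,
    })
```

Low cardinality, as for `category` and `price`, flags variables suited to grouping, while near-unique variables such as `id` and `name` behave as identifiers.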


App, price, rating and market release data are characterized for over 300,000 apps across 26 categories, created by over 168,000 developers. Variables of interest:
1. Average customer rating, overall and for the current release (most recent version) of the app
2. Rating counts, overall and for the current release of the app
3. The distribution of average customer ratings
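A minimal sketch of these summaries follows, assuming a DataFrame with the `category`, `rating`, `ratings`, `rating_current_version`, and `ratings_current_version` columns described above. The hypothetical `rating_summary` reports both the unweighted mean of app-level average ratings and a mean weighted by rating count, and the hypothetical `rating_distribution` bins average ratings over an assumed 0 to 5 range in half-star intervals.

```python
import numpy as np
import pandas as pd


def weighted_mean(values: pd.Series, weights: pd.Series) -> float:
    """Mean of values weighted by weights; NaN when the weights sum to zero."""
    total = weights.sum()
    return float((values * weights).sum() / total) if total else float("nan")


def rating_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-category rating averages and counts, all-time and for the current version."""
    rows = []
    for category, g in df.groupby("category", observed=True):
        rows.append({
            "category": category,
            "apps": len(g),
            # Unweighted: every app counts equally
            "mean_rating": g["rating"].mean(),
            # Weighted by rating count: the average over all individual ratings
            "weighted_rating": weighted_mean(g["rating"], g["ratings"]),
            "ratings": int(g["ratings"].sum()),
            "mean_rating_current": g["rating_current_version"].mean(),
            "ratings_current": int(g["ratings_current_version"].sum()),
        })
    return pd.DataFrame(rows).set_index("category")


def rating_distribution(ratings: pd.Series, bin_width: float = 0.5) -> pd.Series:
    """Count average ratings in equal-width bins over an assumed 0 to 5 range."""
    edges = np.arange(0, 5 + bin_width, bin_width)
    return pd.cut(ratings, bins=edges, include_lowest=True).value_counts().sort_index()
```

The two means answer different questions: the unweighted mean treats every app equally, while the weighted mean reflects the rating an average customer actually gave, so a few heavily rated apps can dominate it. Comparing the all-time and current-version columns gives a first, rough view of how opinion has shifted between releases.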