# Apple App Store Exploratory Data Analysis (EDA)
As of May 2023, Apple's App Store offered over 1.76 million apps across 70 categories and accounted for over 62% of global app revenue in 2022. This iterative, yet lean, exploratory data analysis (EDA) of over 300,000 apps across 26 categories aims to discover patterns, trends, and clusters of opinion, and to generate actionable, high-impact insights for opportunity discovery and analysis.

> “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” — John Tukey

The EDA is guided by the following eight intentionally vague yet provocative questions. They are meant to stimulate discovery of unknown, latent, yet high-impact, high-value aspects of the customer experience through an iterative and likely chaotic learning process.

1. What best characterizes the nature of customer opinion within the mobile app market today?  
2. What best describes the intensity of customer opinion?
3. How do opinion and customer satisfaction vary across mobile app categories?
4. In what ways do the nature and intensity of opinion correlate, overall, within and across categories?
5. Are there clusters revealed by the nature and intensity of opinion?
6. Where is customer opinion most varied? Most similar?
7. Does the distribution of rating and rating volume reveal any clustering of app creators / developers?
8. To what degree do these data expose and quantify unmet needs, and empower opportunity discovery?

## Datasets
These data were obtained from the Apple App Store and represent the data available as of May 23, 2023.
- AppData Summary: Summary data including average customer rating, rating count, and price.
- AppData Quantitative: Detailed app performance data, including rating histograms and rating and review counts.
- AppData Qualitative: Review data for selected app categories.

### AppData Summary Dataset
The AppData Summary dataset provides basic app, developer, and rating data. 

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average customer rating of current release |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |
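
The two DateTime fields can be combined into a simple measure of how long an app has been maintained. The following is a minimal sketch, assuming `released` and `released_current` arrive as ISO 8601 strings (the actual storage format is not stated here); the function name `days_between_releases` is hypothetical.

```python
from datetime import datetime


def days_between_releases(released: str, released_current: str) -> int:
    """Whole days between the first release and the current release.

    Assumes both arguments are ISO 8601 strings without a timezone suffix,
    e.g. "2020-01-15T08:00:00". This format is an assumption.
    """
    first = datetime.fromisoformat(released)
    current = datetime.fromisoformat(released_current)
    return (current - first).days


# Illustrative values only, not taken from the dataset.
print(days_between_releases("2023-01-01T00:00:00", "2023-01-31T00:00:00"))
```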

### AppData Quantitative
This dataset provides additional rating and review quantitative data.

| #  | Variable    | Description                      | Data Type   |
|----|-------------|----------------------------------|-------------|
| 1  | id          | App Identifier                   | Nominal     |
| 2  | name        | App Name                         | Nominal     |
| 3  | category_id | Four Digit Category Id           | Categorical |
| 4  | category    | Category Name                    | Categorical |
| 5  | rating      | Average Customer Rating          | Interval    |
| 6  | reviews     | Total Number Of Customer Reviews | Discrete    |
| 7  | ratings     | Rating Count                     | Discrete    |
| 8  | onestar     | One Star Rating Count            | Discrete    |
| 9  | twostar     | Two Star Rating Count            | Discrete    |
| 10 | threestar   | Three Star Rating Count          | Discrete    |
| 11 | fourstar    | Four Star Rating Count           | Discrete    |
| 12 | fivestar    | Five Star Rating Count           | Discrete    |
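
The five star-count columns form a rating histogram, from which an average rating can be recomputed and compared against the reported `rating` column. The sketch below uses plain Python and assumes each row is available as a mapping keyed by the column names above; the helper name `mean_rating_from_histogram` and the sample counts are illustrative, not part of the dataset.

```python
from typing import Mapping, Optional

STAR_COLUMNS = ("onestar", "twostar", "threestar", "fourstar", "fivestar")


def mean_rating_from_histogram(row: Mapping[str, int]) -> Optional[float]:
    """Weighted mean star rating implied by the five histogram counts."""
    counts = [row[col] for col in STAR_COLUMNS]
    total = sum(counts)
    if total == 0:
        return None  # no ratings, so the mean is undefined
    # enumerate(..., start=1) pairs each count with its star value (1 to 5).
    weighted = sum(stars * n for stars, n in enumerate(counts, start=1))
    return weighted / total


# Made-up counts for illustration only.
example = {"onestar": 10, "twostar": 5, "threestar": 5, "fourstar": 30, "fivestar": 50}
print(mean_rating_from_histogram(example))
```

A large gap between this value and the reported `rating` for an app would be worth a closer look during data quality checks.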

### AppData Qualitative
This dataset contains app reviews for qualitative analysis.

| #  | Variable    | Description                    | Variable Type |
|----|-------------|--------------------------------|---------------|
| 1  | id          | Review id                      | Nominal       |
| 2  | app_id      | App identifier                 | Nominal       |
| 3  | app_name    | Name of application            | Nominal       |
| 4  | category_id | Four digit category identifier | Categorical   |
| 5  | category    | Category name                  | Categorical   |
| 6  | author      | Review author                  | Nominal       |
| 7  | rating      | Author's rating for app        | Discrete      |
| 8  | title       | Title for review               | Text          |
| 9  | content     | Review content                 | Text          |
| 10 | vote_sum    | Sum of all votes               | Discrete      |
| 11 | vote_count  | Number of votes                | Discrete      |
| 12 | date        | Review date                    | DateTime      |
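
One way to start on the question of where opinion is most varied is to summarize review ratings by category. The following is a minimal sketch in plain Python, assuming reviews are available as dictionaries with the `category` and `rating` fields above. The function name `rating_spread_by_category` and the sample rows are hypothetical, and the population standard deviation is used here as one simple measure of spread, not necessarily the one used in the notebooks.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rating_spread_by_category(reviews):
    """Return review count, mean rating, and rating std. dev. per category."""
    by_category = defaultdict(list)
    for review in reviews:
        by_category[review["category"]].append(review["rating"])
    return {
        category: {"n": len(ratings), "mean": mean(ratings), "std": pstdev(ratings)}
        for category, ratings in by_category.items()
    }


# Made-up rows for illustration only.
sample_reviews = [
    {"category": "Games", "rating": 5},
    {"category": "Games", "rating": 1},
    {"category": "Books", "rating": 4},
    {"category": "Books", "rating": 4},
]
for category, stats in rating_spread_by_category(sample_reviews).items():
    print(category, stats)
```

Categories with a high standard deviation relative to their mean are candidates for polarized opinion; categories with a low one are candidates for consensus.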

## Organization
This analysis is organized along quantitative and qualitative motifs.
1. [App Quantitative EDA](notebooks/02_eda/02_quantitative.ipynb): Exploratory *quantitative* data analysis of rating and review data.
2. [App Qualitative EDA](notebooks/02_eda/03_qualitative.ipynb): Exploratory *qualitative* content, sentiment, intensity, and emotion analysis of review data.

