# AppData Acquisition
This notebook covers the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 15 variables for each of 16 search terms. The search terms, which loosely correspond to Apple's app category taxonomy, are:

|               |              |
|---------------|--------------|
| business      | music        |
| education     | photo        |
| entertainment | productivity |
| finance       | reference    |
| food          | shopping     |
| health        | social       |
| lifestyle     | travel       |
| medical       | utilities    |


The 15 AppData variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average customer rating of current release |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |

In [1]:
from IPython.display import display, HTML

from aimobile.container import AIMobileContainer
from aimobile.data.acquisition.appdata.controller import AppDataController
from aimobile.data.dataset.appdata import AppDataDataset

In [2]:
container = AIMobileContainer()
container.init_resources()
container.wire(packages=["aimobile.data.acquisition", "aimobile.data.dataset"])
repo = container.data.uow().appdata_repo

In [3]:
TERMS = ["health", "productivity", "social", "business", "education", "entertainment", "lifestyle", "medical",
         "finance", "food", "music", "reference", "photo", "shopping", "travel", "utilities"]

In [4]:
# Scrape AppData for every search term in TERMS.
controller = AppDataController()
controller.scrape(terms=TERMS)

[05/21/2023 05:10:07 AM] [INFO] [AppDataController] [scrape] : Running AppDataController is not authorized at this time.


## AppData Dataset Overview

In [5]:
dataset = AppDataDataset()
dataset.structure

|   | Characteristic         | Total        |
|---|------------------------|--------------|
| 0 | Number of Observations | 334821.0     |
| 1 | Number of Variables    | 15.0         |
| 2 | Number of Cells        | 5022315.0    |
| 3 | Missing Cells          | 0.0          |
| 4 | Missing Cells (%)      | 0.0          |
| 5 | Duplicate Rows         | 0.0          |
| 6 | Duplicate Rows (%)     | 0.0          |
| 7 | Size (Bytes)           | 1087470469.0 |
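
The overview figures are internally consistent: the reported number of cells, 5,022,315, equals 334,821 observations times 15 variables. How `AppDataDataset.structure` computes them is not shown in this notebook, so the following is only a minimal pandas sketch of how such an overview could be derived. The function name `structure_overview`, the toy DataFrame, and the choice to express the percentages on a 0 to 100 scale are assumptions for illustration.

```python
import pandas as pd


def structure_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarize the shape and quality of a DataFrame."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())        # cells holding NaN/None
    duplicates = int(df.duplicated().sum())     # rows identical to an earlier row
    stats = {
        "Number of Observations": n_rows,
        "Number of Variables": n_cols,
        "Number of Cells": n_cells,
        "Missing Cells": missing,
        # Assumption: percentages are on a 0-100 scale.
        "Missing Cells (%)": round(100 * missing / n_cells, 2) if n_cells else 0.0,
        "Duplicate Rows": duplicates,
        "Duplicate Rows (%)": round(100 * duplicates / n_rows, 2) if n_rows else 0.0,
        "Size (Bytes)": int(df.memory_usage(deep=True).sum()),
    }
    # Mixing ints and floats in one column makes pandas store it as float64,
    # so counts are rendered with a trailing ".0".
    return pd.DataFrame(
        {"Characteristic": list(stats.keys()), "Total": list(stats.values())}
    )


# Toy data (not the real AppData): the third row duplicates the second.
toy = pd.DataFrame(
    {"id": [1, 2, 2], "name": ["a", "b", "b"], "price": [0.0, 1.99, 1.99]}
)
overview = structure_overview(toy)
print(overview)
```

Note that `DataFrame.duplicated` only flags rows that match on every column, so an app id can repeat across rows without being counted as a duplicate row.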


## AppData Dataset Profile

In [6]:
dataset.info

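|   | Column       | Dtype          | Valid  | Null | Validity | Unique | Cardinality | Size      |
|---|--------------|----------------|--------|------|----------|--------|-------------|-----------|
| 0 | id           | int64          | 334821 | 0    | 1.0      | 302760 | 0.9         | 2678568   |
| 1 | name         | string[python] | 334821 | 0    | 1.0      | 302434 | 0.9         | 28546848  |
| 2 | description  | string[python] | 334821 | 0    | 1.0      | 297963 | 0.89        | 987808693 |
| 3 | category_id  | category       | 334821 | 0    | 1.0      | 26     | 0.0         | 336101    |
| 4 | category     | category       | 334821 | 0    | 1.0      | 26     | 0.0         | 337633    |
| 5 | price        | float64        | 334821 | 0    | 1.0      | 116    | 0.0         | 2678568   |
| 6 | developer_id | int64          | 334821 | 0    | 1.0      | 168841 | 0.5         | 2678568   |
| 7 | developer    | string[python] | 334821 | 0    | 1.0      | 168335 | 0.5         | 25693867  |
| 8 | rating       | float64        | 334821 | 0    | 1.0      | 52988  | 0.16        | 2678568   |
| 9 | ratings      | int64          | 334821 | 0    | 1.0      | 23167  | 0.07        | 2678568   |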
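
The profile columns appear to follow simple ratios: Validity matches Valid divided by the row count, and Cardinality matches Unique divided by Valid (for `id`, 302,760 / 334,821 is about 0.90). The Size of the fixed-width columns matches 8 bytes per row (334,821 × 8 = 2,678,568). The sketch below reproduces those definitions with pandas. It is an assumption about what `AppDataDataset.info` does, not its actual implementation, and `profile_columns` and the toy data are hypothetical.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column validity, cardinality, and memory size."""
    n_rows = len(df)
    rows = []
    for col in df.columns:
        series = df[col]
        valid = int(series.notna().sum())   # non-null values
        unique = int(series.nunique())      # distinct non-null values
        rows.append(
            {
                "Column": col,
                "Dtype": str(series.dtype),
                "Valid": valid,
                "Null": n_rows - valid,
                # Assumed definitions, inferred from the table above.
                "Validity": round(valid / n_rows, 2) if n_rows else 0.0,
                "Unique": unique,
                "Cardinality": round(unique / valid, 2) if valid else 0.0,
                # Bytes used by the column's values, excluding the index.
                "Size": int(series.memory_usage(index=False, deep=True)),
            }
        )
    return pd.DataFrame(rows)


# Toy data (not the real AppData): one repeated id and one missing category.
toy = pd.DataFrame(
    {"id": [1, 2, 2, 3], "category": ["Music", "Music", "Games", None]}
)
info = profile_columns(toy)
print(info)
```

A cardinality near 1 flags identifier-like columns such as `id` and `name`, while a value near 0 flags low-cardinality columns such as `category`.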


## AppData Dataset Summary

In [7]:
dataset.summary

|   | Category          | Examples | Apps  | Average Rating | Rating Count |
|---|-------------------|----------|-------|----------------|--------------|
| 0 | Medical           | 34573    | 33363 | 1.68           | 21600308     |
| 1 | Music             | 27086    | 26643 | 2.79           | 127712381    |
| 2 | Games             | 25697    | 21580 | 3.99           | 416590669    |
| 3 | Health & Fitness  | 23998    | 20778 | 3.26           | 97966543     |
| 4 | Shopping          | 23295    | 21524 | 3.44           | 335160759    |
| 5 | Education         | 23064    | 20554 | 3.36           | 77844099     |
| 6 | Social Networking | 22366    | 20704 | 2.68           | 109962345    |
| 7 | Utilities         | 16808    | 14181 | 3.53           | 131862891    |
| 8 | Photo & Video     | 16679    | 15555 | 3.42           | 262107178    |
| 9 | Entertainment     | 16620    | 15239 | 3.22           | 165525118    |
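
In the summary, Examples exceeds Apps in every category, which fits the profile above: only 302,760 of the 334,821 `id` values are unique, presumably because one app can be returned for more than one search term. A per-category summary of this kind can be sketched with a pandas `groupby`. The sketch assumes that Examples counts rows, Apps counts distinct ids, Average Rating is the row-level mean of `rating`, and Rating Count is the row-level sum of `ratings`; none of these definitions is confirmed by the notebook, and `summarize_by_category` and the toy data are hypothetical.

```python
import pandas as pd


def summarize_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-category counts and rating statistics."""
    summary = (
        df.groupby("category")
        .agg(
            examples=("id", "count"),       # rows returned for the category
            apps=("id", "nunique"),         # distinct app ids
            average_rating=("rating", "mean"),
            rating_count=("ratings", "sum"),
        )
        .sort_values("examples", ascending=False)
        .reset_index()
        .rename(
            columns={
                "category": "Category",
                "examples": "Examples",
                "apps": "Apps",
                "average_rating": "Average Rating",
                "rating_count": "Rating Count",
            }
        )
    )
    summary["Average Rating"] = summary["Average Rating"].round(2)
    return summary


# Toy data (not the real AppData): app 1 appears twice under Music.
toy = pd.DataFrame(
    {
        "id": [1, 1, 2, 3],
        "category": ["Music", "Music", "Music", "Games"],
        "rating": [4.0, 4.0, 3.0, 5.0],
        "ratings": [10, 10, 5, 20],
    }
)
result = summarize_by_category(toy)
print(result)
```

Because repeated ids are aggregated at the row level in this sketch, an app that appears more than once contributes to the average rating and rating count each time; deduplicating on `id` first would give app-level figures instead.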
