# AppData Acquisition
This notebook encapsulates the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 15 variables for 16 search terms, which loosely correspond to Apple's app category taxonomy.

|               |              |
|---------------|--------------|
| business      | music        |
| education     | photo        |
| entertainment | productivity |
| finance       | reference    |
| food          | shopping     |
| health        | social       |
| lifestyle     | travel       |
| medical       | utilities    |


The 15 AppData variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average customer rating of current release |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |
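
As a rough illustration of the table above, the 15 variables could be modelled as a typed record. This is a minimal sketch, not the package's actual data model: the class name `AppDataRecord` is hypothetical, and the Python types are assumptions inferred from the "Data Type" column (for example, identifiers are treated as strings because they are nominal).

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class AppDataRecord:  # hypothetical name, not part of the appstore package
    id: str                        # Nominal: App Id from the App Store
    name: str                      # Nominal
    description: str               # Text
    category_id: str               # Nominal (numeric identifier kept as a label)
    category: str                  # Nominal
    price: float                   # Continuous
    developer_id: str              # Nominal
    developer: str                 # Nominal
    rating: float                  # Interval: average since first release
    ratings: int                   # Discrete: count since first release
    rating_current_version: float  # Interval: average for current release
    ratings_current_version: int   # Discrete: count for current release
    released: datetime             # DateTime of first release
    released_current: datetime     # DateTime of current release
    version: str                   # Nominal: current version


# Field names in table order, useful for checking a scraped frame's columns.
APPDATA_VARIABLES = [f.name for f in fields(AppDataRecord)]
print(len(APPDATA_VARIABLES))
```

Storing identifiers as strings is a design choice that avoids arithmetic being applied to values that are only labels.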

In [1]:
from IPython.display import display, HTML

from appstore.container import AppstoreContainer
from appstore.data.acquisition.appdata.controller import AppDataController
from appstore.data.dataset.appdata import AppDataDataset

In [2]:
container = AppstoreContainer()
container.init_resources()
container.wire(packages=["appstore.data.acquisition", "appstore.data.dataset"])
repo = container.data.uow().appdata_repo

[08/30/2023 11:33:40 AM] [INFO] [MySQLDatabase] [connect] : Database is not started. Starting database...
[sudo] password for john: 
Starting MySQL...
 * Starting MySQL database server mysqld
   ...done.


In [3]:
# The 16 search terms listed in the table at the top of this notebook.
OLD_TERMS = ["health", "productivity", "social", "business", "education", "entertainment", "lifestyle", "medical",
             "finance", "food", "music", "reference", "photo", "shopping", "travel", "utilities"]
# Terms passed to the scraper in this run.
TERMS = ["games", "graphics", "developer", "news"]

In [4]:
controller = AppDataController()
controller.scrape(terms=TERMS)

[08/30/2023 11:33:46 AM] [INFO] [AppDataController] [_get_or_start_project] : 
Retrieved Project:
None
[08/30/2023 11:33:46 AM] [INFO] [AppDataController] [_get_or_start_project] : 

Started project for Games apps.
[08/30/2023 11:35:32 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Games	Pages: 10	Apps: 2000	Elapsed Time: 0:01:46.557864	Rate: 18.77 apps per second.
[08/30/2023 11:38:10 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Games	Pages: 20	Apps: 4000	Elapsed Time: 0:04:24.323921	Rate: 15.13 apps per second.
[08/30/2023 11:41:01 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Games	Pages: 30	Apps: 6000	Elapsed Time: 0:07:15.355799	Rate: 13.78 apps per second.
[08/30/2023 11:43:40 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Games	Pages: 40	Apps: 8000	Elapsed Time: 0:09:54.279226	Rate: 13.46 apps per second.
[08/30/2023 11:46:16 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Games	Pages: 50	Apps: 10000	El
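
The rate figures in the log lines above are consistent with the cumulative number of apps divided by the elapsed time in seconds. The sketch below reproduces that arithmetic for two of the logged entries. The helper `scrape_rate` is a hypothetical name introduced for illustration; how `AppDataController` actually computes the value is not shown in this notebook.

```python
from datetime import timedelta


def scrape_rate(apps: int, elapsed: timedelta) -> float:
    """Return apps per second, assuming rate = cumulative apps / elapsed seconds."""
    return apps / elapsed.total_seconds()


# Figures copied from the log: 2000 apps after 0:01:46.557864.
first = scrape_rate(2000, timedelta(minutes=1, seconds=46, microseconds=557864))
# 8000 apps after 0:09:54.279226.
fourth = scrape_rate(8000, timedelta(minutes=9, seconds=54, microseconds=279226))

print(f"{first:.2f} apps per second")
print(f"{fourth:.2f} apps per second")
```

Because the rate is cumulative, it falls from 18.77 to 13.46 apps per second as later pages take longer to retrieve.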

## AppData Dataset Overview

In [None]:
dataset = AppDataDataset()
dataset.structure

## AppData Dataset Profile

In [None]:
dataset.info()

## AppData Dataset Summary

In [None]:
dataset.summary