# AppData Acquisition
This notebook encapsulates the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 11 variables for each of 24 search terms, which loosely correspond to Apple's app category taxonomy.

|               |            |              |
|---------------|------------|--------------|
| books         | health     | productivity |
| business      | lifestyle  | reference    |
| catalogs      | magazines  | shopping     |
| education     | medical    | social       |
| entertainment | music      | sports       |
| finance       | navigation | travel       |
| food          | news       | utilities    |
| games         | photo      | weather      |


The 11 AppData variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | released                | DateTime   | Datetime of first release                  |
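
As a rough illustration of the table above, the 11 variables could be modelled as a typed record. This is only a sketch: `AppDataRecord` is a hypothetical name that does not come from the `appvoc` package, the Python types are assumptions (nominal identifiers are kept as strings so they are never treated as quantities), and the field values are placeholders rather than real App Store data.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class AppDataRecord:
    """Hypothetical record mirroring the 11 AppData variables (not part of appvoc)."""

    id: str              # Nominal: App Id from the App Store
    name: str            # Nominal: App name
    description: str     # Text: App description
    category_id: str     # Nominal: numeric category identifier, stored as a string
    category: str        # Nominal: category name
    price: float         # Continuous: app price
    developer_id: str    # Nominal: identifier for the developer
    developer: str       # Nominal: name of the developer
    rating: float        # Interval: average user rating since first release
    ratings: int         # Discrete: number of ratings since first release
    released: datetime   # DateTime: datetime of first release


# Placeholder values only, used to show the shape of one record.
record = AppDataRecord(
    id="0",
    name="Example App",
    description="Placeholder description.",
    category_id="0",
    category="Example",
    price=0.0,
    developer_id="0",
    developer="Example Developer",
    rating=0.0,
    ratings=0,
    released=datetime(2000, 1, 1),
)

print([f.name for f in fields(AppDataRecord)])
```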

In [1]:
from IPython.display import display, HTML

from appvoc.container import AppVoCContainer
from appvoc.data.acquisition.appdata.controller import AppDataController
from appvoc.data.dataset.appdata import AppDataDataset

In [2]:
container = AppVoCContainer()
container.init_resources()
container.wire(packages=["appvoc.data.acquisition", "appvoc.data.dataset"])
repo = container.data.uow().appdata_repo

In [3]:
# All 24 search terms.
TERMS = ["books", "business", "catalogs", "education", "entertainment", "finance", "food", "games", "health", "lifestyle", "magazines", "medical", "music", "navigation", "news", "photo", "productivity", "reference", "shopping", "social", "sports", "travel", "utilities", "weather"]
# Terms queued for this run: the same list with "books" omitted.
DECK = ["business", "catalogs", "education", "entertainment", "finance", "food", "games", "health", "lifestyle", "magazines", "medical", "music", "navigation", "news", "photo", "productivity", "reference", "shopping", "social", "sports", "travel", "utilities", "weather"]
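
`DECK` is `TERMS` with "books" removed. The notebook does not say why; one plausible reading is that "books" had already been scraped in an earlier run. Under that assumption, the deck could be derived rather than typed out by hand. `COMPLETED` is a hypothetical name introduced here for illustration.

```python
TERMS = [
    "books", "business", "catalogs", "education", "entertainment", "finance",
    "food", "games", "health", "lifestyle", "magazines", "medical",
    "music", "navigation", "news", "photo", "productivity", "reference",
    "shopping", "social", "sports", "travel", "utilities", "weather",
]

# Assumption: terms already processed in a previous run (hypothetical set).
COMPLETED = {"books"}

# Keep the original ordering of TERMS while skipping completed terms.
DECK = [term for term in TERMS if term not in COMPLETED]

print(len(TERMS), len(DECK))
```

Deriving the deck this way keeps the two lists from drifting apart when a term is added or renamed.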

In [4]:
controller = AppDataController()
controller.scrape(terms=DECK)

[08/31/2023 05:36:56 AM] [INFO] [AppDataController] [_get_or_start_project] : 
Retrieved Project:
None
[08/31/2023 05:36:56 AM] [INFO] [AppDataController] [_get_or_start_project] : 

Started project for Business apps.


[08/31/2023 05:38:16 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 10	Apps: 2000	Elapsed Time: 0:01:20.364541	Rate: 24.89 apps per second.
[08/31/2023 05:39:36 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 20	Apps: 4000	Elapsed Time: 0:02:40.138430	Rate: 24.98 apps per second.
[08/31/2023 05:40:05 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 30	Apps: 5988	Elapsed Time: 0:03:09.239488	Rate: 31.64 apps per second.
[08/31/2023 05:40:27 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 40	Apps: 7947	Elapsed Time: 0:03:31.686944	Rate: 37.54 apps per second.
[08/31/2023 05:40:36 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 50	Apps: 9898	Elapsed Time: 0:03:40.414890	Rate: 44.91 apps per second.
[08/31/2023 05:40:45 AM] [INFO] [AppDataController] [_update_report_stats] : Term: Business	Pages: 60	Apps: 11822	Elapsed Time: 0:03:49.163036	Ra
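
The rate reported in the log lines above appears to be the cumulative app count divided by the total elapsed time in seconds. That is inferred from the logged numbers, not from the controller's source. A minimal sketch that checks this reading against the first and third log lines, using a hypothetical helper `scrape_rate`:

```python
from datetime import timedelta


def scrape_rate(apps: int, elapsed: timedelta) -> float:
    """Cumulative apps per second (hypothetical helper, not part of appvoc)."""
    return apps / elapsed.total_seconds()


# Values copied from the first and third log lines above.
first = scrape_rate(2000, timedelta(minutes=1, seconds=20, microseconds=364541))
third = scrape_rate(5988, timedelta(minutes=3, seconds=9, microseconds=239488))

print(f"{first:.2f} apps per second")
print(f"{third:.2f} apps per second")
```

Because the rate is cumulative, the later, faster pages raise the reported figure only gradually.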

## AppData Dataset Overview

In [5]:
df = repo.getall()
dataset = AppDataDataset(df=df)


## AppData Dataset Profile

In [6]:
dataset.info()

TypeError: 'DataFrame' object is not callable

## AppData Dataset Summary

In [None]:
dataset.summary()