# AppData Acquisition
This notebook encapsulates the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 11 variables for 24 search terms, which loosely correspond to Apple's app category taxonomy.

|               |            |              |
|---------------|------------|--------------|
| books         | health     | productivity |
| business      | lifestyle  | reference    |
| catalogs      | magazines  | shopping     |
| education     | medical    | social       |
| entertainment | music      | sports       |
| finance       | navigation | travel       |
| food          | news       | utilities    |
| games         | photo      | weather      |


The 11 app variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first released   |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | released                | DateTime   | Datetime of first release                  |
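
The table above can be read as a record schema. The following is a minimal sketch of that reading, assuming Python types that roughly mirror the listed data types. The `AppRecord` class and `parse_record` helper are hypothetical and not part of the `appvoc` package, the ISO 8601 release date format is an assumption, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class AppRecord:
    """Hypothetical record mirroring the 11 AppData variables."""

    id: str             # Nominal: identifiers are labels, so kept as strings
    name: str           # Nominal
    description: str    # Text
    category_id: str    # Nominal
    category: str       # Nominal
    price: float        # Continuous
    developer_id: str   # Nominal
    developer: str      # Nominal
    rating: float       # Interval: average user rating
    ratings: int        # Discrete: number of ratings
    released: datetime  # DateTime of first release


def parse_record(raw: dict) -> AppRecord:
    """Coerce a raw mapping of values into a typed AppRecord."""
    return AppRecord(
        id=str(raw["id"]),
        name=str(raw["name"]),
        description=str(raw["description"]),
        category_id=str(raw["category_id"]),
        category=str(raw["category"]),
        price=float(raw["price"]),
        developer_id=str(raw["developer_id"]),
        developer=str(raw["developer"]),
        rating=float(raw["rating"]),
        ratings=int(raw["ratings"]),
        # Assumes an ISO 8601 timestamp without a trailing "Z".
        released=datetime.fromisoformat(raw["released"]),
    )


# Illustrative values only; not real App Store data.
sample = parse_record(
    {
        "id": "1000001",
        "name": "Example App",
        "description": "An example description.",
        "category_id": "6024",
        "category": "shopping",
        "price": "0.00",
        "developer_id": "2000002",
        "developer": "Example Developer",
        "rating": "4.5",
        "ratings": "1200",
        "released": "2020-01-15T00:00:00",
    }
)
print(sample.name, sample.rating, sample.released.year)
```

Keeping the identifier fields as strings reflects their nominal type: they label apps, categories, and developers rather than measure anything.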

In [1]:
from IPython.display import display, HTML

from appvoc.container import AppVoCContainer
from appvoc.data.acquisition.app.controller import AppDataController
from appvoc.data.dataset.app import AppDataDataset

ModuleNotFoundError: No module named 'studioai.visual'

In [None]:
container = AppVoCContainer()
container.init_resources()
container.wire(packages=["appvoc.data.acquisition", "appvoc.data.dataset"])
repo = container.data.uow().app_repo

In [None]:
# All 24 search terms covered by the acquisition pipeline.
TERMS = ["books", "business", "catalogs", "education", "entertainment", "finance", "food", "games", "health", "lifestyle", "magazines", "medical", "music", "navigation", "news", "photo", "productivity", "reference", "shopping", "social", "sports", "travel", "utilities", "weather"]
# Subset of TERMS to scrape in this run.
DECK = ["shopping"]
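
Because `DECK` selects the terms scraped in a given run, checking each entry against the 24 known terms can catch a typo before a long scrape begins. The following is a minimal sketch of such a check. The `validate_deck` helper is hypothetical and not part of `appvoc`.

```python
TERMS = [
    "books", "business", "catalogs", "education", "entertainment", "finance",
    "food", "games", "health", "lifestyle", "magazines", "medical",
    "music", "navigation", "news", "photo", "productivity", "reference",
    "shopping", "social", "sports", "travel", "utilities", "weather",
]


def validate_deck(deck, terms=TERMS):
    """Return the deck as a list if every entry is a known search term."""
    unknown = sorted(set(deck) - set(terms))
    if unknown:
        raise ValueError(f"Unknown search terms: {unknown}")
    return list(deck)


# Passes through unchanged because "shopping" is one of the 24 terms.
DECK = validate_deck(["shopping"])
```

A misspelled entry such as `"shoping"` would raise a `ValueError` naming the offending term instead of reaching the scraper.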

In [None]:
controller = AppDataController()
# Scrape app data for the search terms currently on deck.
controller.scrape(terms=DECK)

## AppData Dataset Overview

In [None]:
df = repo.getall()
dataset = AppDataDataset(df=df)


## AppData Dataset Profile

In [None]:
dataset.info()

## AppData Dataset Summary

In [None]:
dataset.summary()