# AppData Acquisition
This notebook encapsulates the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 15 variables for 16 search terms, which loosely correspond to Apple's App Store category taxonomy:

|               |              |
|---------------|--------------|
| business      | music        |
| education     | photo        |
| entertainment | productivity |
| finance       | reference    |
| food          | shopping     |
| health        | social       |
| lifestyle     | travel       |
| medical       | utilities    |


The 15 AppData variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first release    |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average user rating for current release    |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |
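For reference in later cells, the table above can be expressed as a typed record. This is a sketch only: the class name `AppDataRecord` is hypothetical, and the Python type chosen for each variable is an assumption rather than the pipeline's actual schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class AppDataRecord:
    """Hypothetical record mirroring the 15 AppData variables; Python types are assumptions."""
    id: int                          # Nominal: App Store app id
    name: str                        # Nominal: app name
    description: str                 # Text: app description
    category_id: int                 # Nominal: numeric category identifier
    category: str                    # Nominal: category name
    price: float                     # Continuous: app price
    developer_id: int                # Nominal: developer identifier
    developer: str                   # Nominal: developer name
    rating: float                    # Interval: average user rating since first release
    ratings: int                     # Discrete: number of ratings since first release
    rating_current_version: float    # Interval: average user rating for current release
    ratings_current_version: int     # Discrete: number of ratings for current release
    released: datetime               # DateTime: first release
    released_current: datetime       # DateTime: current release
    version: str                     # Nominal: current version string


# Sanity check: the record carries exactly the 15 variables in the table.
assert len(fields(AppDataRecord)) == 15
```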

```python
from IPython.display import display, HTML

from appstore.container import AppstoreContainer
from appstore.data.acquisition.appdata.controller import AppDataController
from appstore.data.analysis.appdata import AppDataDataset
```

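```python
# Build the dependency-injection container, initialise its resources, and wire
# the acquisition and analysis packages so their injected dependencies resolve.
container = AppstoreContainer()
container.init_resources()
container.wire(packages=["appstore.data.acquisition", "appstore.data.analysis"])
# Obtain the AppData repository from the container's unit of work.
repo = container.data.uow().appdata_repo
```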

```python
# One search term per category in the search-term table.
TERMS = ["health", "productivity", "social", "business", "education", "entertainment", "lifestyle", "medical",
         "finance", "food", "music", "reference", "photo", "shopping", "travel", "utilities"]
```

```python
# Scrape AppData for every search term.
controller = AppDataController()
controller.scrape(terms=TERMS)
```
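The notebook does not show how the controller obtains these variables. A minimal sketch, assuming the source is Apple's public iTunes Search API queried with `entity=software`: each result dictionary is projected onto the 15 AppData variables. The mapping `ITUNES_TO_APPDATA` and the function `to_appdata` are hypothetical illustrations, not part of `AppDataController`, and the source field names are those commonly returned for software results.

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical mapping from assumed iTunes Search API keys to AppData variable names.
ITUNES_TO_APPDATA = {
    "trackId": "id",
    "trackName": "name",
    "description": "description",
    "primaryGenreId": "category_id",
    "primaryGenreName": "category",
    "price": "price",
    "artistId": "developer_id",
    "artistName": "developer",
    "averageUserRating": "rating",
    "userRatingCount": "ratings",
    "averageUserRatingForCurrentVersion": "rating_current_version",
    "userRatingCountForCurrentVersion": "ratings_current_version",
    "releaseDate": "released",
    "currentVersionReleaseDate": "released_current",
    "version": "version",
}

DATETIME_FIELDS = {"released", "released_current"}


def parse_itunes_datetime(value: str) -> datetime:
    """Parse a UTC timestamp such as '2019-05-14T07:00:00Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def to_appdata(result: Dict[str, Any]) -> Dict[str, Any]:
    """Project one search result onto the 15 AppData variables; absent keys become None."""
    record: Dict[str, Any] = {}
    for source_key, target_key in ITUNES_TO_APPDATA.items():
        value = result.get(source_key)
        if target_key in DATETIME_FIELDS and value is not None:
            value = parse_itunes_datetime(value)
        record[target_key] = value
    return record
```

Returning `None` for absent keys keeps every record at the same 15 variables, so apps with no ratings yet still load into a rectangular table.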

## AppData Dataset Overview

```python
dataset = AppDataDataset()
dataset.structure
```
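The notebook does not document what `dataset.structure` reports. As a hedged illustration, a dataset-level overview of this kind can be computed directly from a pandas `DataFrame`; the function `structure` and the metrics it returns are assumptions, not the `AppDataDataset` implementation. The duplicate-id count is worth checking because an app returned for more than one search term may appear once per term.

```python
import pandas as pd


def structure(df: pd.DataFrame) -> pd.Series:
    """Hypothetical dataset-level overview: size, duplicate app ids, missing cells, memory."""
    return pd.Series(
        {
            "rows": len(df),
            "columns": df.shape[1],
            # Rows whose app id already appeared earlier in the frame.
            "duplicate_ids": int(df["id"].duplicated().sum()),
            "cells_missing": int(df.isna().sum().sum()),
            "memory_mb": df.memory_usage(deep=True).sum() / 1024**2,
        }
    )
```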


## AppData Dataset Profile

```python
dataset.info
```
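A per-variable profile complements the overview by showing the type, completeness, and cardinality of each column. The sketch below is an assumption about what such a profile might contain; `profile` is a hypothetical helper, not the `dataset.info` implementation.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical column profile: dtype, completeness, and cardinality per variable."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            # Share of missing values, as a percentage rounded to one decimal place.
            "null_pct": df.isna().mean().mul(100).round(1),
            # Distinct non-missing values per column.
            "unique": df.nunique(),
        }
    )
```

High `unique` counts are expected for `id` and `name`, while `category` should show at most one value per search term.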

## AppData Dataset Summary

```python
dataset.summary
```
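A summary of this kind typically reports descriptive statistics for the quantitative variables (price, ratings, and rating counts) and aggregates them by category. The helpers `summary` and `summary_by_category` below are a hypothetical sketch using pandas `describe` and named aggregation; they are not the `AppDataDataset.summary` implementation.

```python
import pandas as pd

# Quantitative AppData variables (Continuous, Interval, and Discrete in the variable table).
NUMERIC = ["price", "rating", "ratings", "rating_current_version", "ratings_current_version"]


def summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical summary: count, mean, std, min, quartiles, and max per numeric variable."""
    present = [c for c in NUMERIC if c in df.columns]
    return df[present].describe().T


def summary_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-category aggregate: app count, mean rating, median price."""
    return df.groupby("category").agg(
        apps=("id", "count"),
        mean_rating=("rating", "mean"),
        median_price=("price", "median"),
    )
```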