# AppData Acquisition
This notebook encapsulates the acquisition of the core descriptive, rating, and app data, collectively termed AppData. The data acquisition pipeline extracted 15 variables for each of 16 search terms, which loosely correspond to Apple's app category taxonomy.

|               |              |
|---------------|--------------|
| business      | music        |
| education     | photo        |
| entertainment | productivity |
| finance       | reference    |
| food          | shopping     |
| health        | social       |
| lifestyle     | travel       |
| medical       | utilities    |


The 15 AppData variables are:

| #  | Variable                | Data Type  | Description                                |
|----|-------------------------|------------|--------------------------------------------|
| 1  | id                      | Nominal    | App Id from the App Store                  |
| 2  | name                    | Nominal    | App Name                                   |
| 3  | description             | Text       | App Description                            |
| 4  | category_id             | Nominal    | Numeric category identifier                |
| 5  | category                | Nominal    | Category name                              |
| 6  | price                   | Continuous | App Price                                  |
| 7  | developer_id            | Nominal    | Identifier for the developer               |
| 8  | developer               | Nominal    | Name of the developer                      |
| 9  | rating                  | Interval   | Average user rating since first released   |
| 10 | ratings                 | Discrete   | Number of ratings since first release      |
| 11 | rating_current_version  | Interval   | Average customer rating of current release |
| 12 | ratings_current_version | Discrete   | Number of user ratings for current release |
| 13 | released                | DateTime   | Datetime of first release                  |
| 14 | released_current        | DateTime   | Datetime of current release                |
| 15 | version                 | Nominal    | Current version of app                     |
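
The variable table above can be mirrored as a typed record. The following is a minimal sketch in plain Python: the class name `AppDataRecord`, the constant `APPDATA_VARIABLES`, and the choice of Python types for each measurement level are assumptions made for illustration, not part of the `appstore` package.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class AppDataRecord:
    """Hypothetical record type mirroring the 15 AppData variables."""

    id: str                        # Nominal: App Id from the App Store
    name: str                      # Nominal: app name
    description: str               # Text: app description
    category_id: str               # Nominal: numeric category identifier, kept as a label
    category: str                  # Nominal: category name
    price: float                   # Continuous: app price
    developer_id: str              # Nominal: developer identifier
    developer: str                 # Nominal: developer name
    rating: float                  # Interval: average rating since first release
    ratings: int                   # Discrete: number of ratings since first release
    rating_current_version: float  # Interval: average rating of current release
    ratings_current_version: int   # Discrete: number of ratings for current release
    released: datetime             # DateTime: first release
    released_current: datetime     # DateTime: current release
    version: str                   # Nominal: current version


# Variable names in table order, derived from the record definition.
APPDATA_VARIABLES = [f.name for f in fields(AppDataRecord)]
```

Storing identifiers as strings rather than integers is a design choice in this sketch: nominal values carry no arithmetic meaning, so treating them as labels avoids accidental numeric operations.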

In [1]:
from IPython.display import display, HTML

from appstore.container import AppstoreContainer
from appstore.data.acquisition.appdata.controller import AppDataController
from appstore.data.analysis.appdata import AppDataDataset

In [2]:
container = AppstoreContainer()
container.init_resources()
container.wire(packages=["appstore.data.acquisition", "appstore.data.analysis"])
repo = container.data.uow().appdata_repo

In [3]:
TERMS = ["health", "productivity", "social", "business", "education", "entertainment", "lifestyle", "medical",
         "finance", "food", "music", "reference", "photo", "shopping", "travel", "utilities"]

In [4]:
# Uncomment to run the scrape for all search terms.
# controller = AppDataController()
# controller.scrape(terms=TERMS)

## AppData Dataset Overview

In [5]:
dataset = AppDataDataset()
dataset.structure

|   | Characteristic         | Total     |
|---|------------------------|-----------|
| 0 | Number of Observations | 475132    |
| 1 | Number of Variables    | 11        |
| 2 | Number of Cells        | 5226452   |
| 3 | Size (Bytes)           | 962579922 |
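
The figures in the structure output are related by simple arithmetic: the cell count is the number of observations times the number of variables. The short check below uses only the values reported above; the variable names are illustrative. Note that the output reports 11 variables rather than the 15 listed earlier, and the notebook does not say which columns the dataset omits.

```python
# Values copied from the dataset.structure output.
observations = 475_132
variables = 11
cells_reported = 5_226_452
size_bytes = 962_579_922

# Cells should equal observations x variables.
cells = observations * variables
print(cells == cells_reported)

# Average memory footprint per observation, in bytes (roughly 2 KB).
bytes_per_observation = size_bytes / observations
print(round(bytes_per_observation))
```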


## AppData Dataset Profile

In [7]:
dataset.info()

AttributeError: 'AppDataDataset' object has no attribute 'info'

## AppData Dataset Summary

In [8]:
dataset.summary

|   | Category          | Examples | Apps  | Average Rating | Rating Count |
|---|-------------------|----------|-------|----------------|--------------|
| 0 | Finance           | 58428    | 58428 | 1.98           | 129336005    |
| 1 | Shopping          | 40903    | 40903 | 2.28           | 117443339    |
| 2 | Medical           | 34203    | 34203 | 1.57           | 9257369      |
| 3 | Social Networking | 32855    | 32855 | 1.77           | 44232524     |
| 4 | Music             | 30630    | 30630 | 2.45           | 69215558     |
| 5 | Health & Fitness  | 29584    | 29584 | 3.06           | 41880978     |
| 6 | Education         | 29184    | 29184 | 2.63           | 35866830     |
| 7 | Business          | 27099    | 27099 | 2.22           | 41241961     |
| 8 | Games             | 25246    | 25246 | 3.74           | 213422210    |
| 9 | Reference         | 22072    | 22072 | 2.2            | 18447863     |
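
A per-category summary of this shape could be produced by grouping records by category and aggregating. The sketch below is an assumption about how the columns might be derived (Examples as the row count, Apps as the number of distinct ids, Average Rating as the mean of `rating`, Rating Count as the sum of `ratings`); it is not the implementation behind `dataset.summary`. The function `summarize` and the toy rows are hypothetical and synthetic.

```python
from collections import defaultdict

# Synthetic toy rows: (category, app id, rating, ratings). Not real AppData.
rows = [
    ("Finance", "a1", 4.0, 100),
    ("Finance", "a2", 0.0, 0),
    ("Music", "b1", 3.0, 50),
]


def summarize(rows):
    """Aggregate rows per category, sorted by number of examples (descending)."""
    groups = defaultdict(list)
    for category, app_id, rating, ratings in rows:
        groups[category].append((app_id, rating, ratings))

    summary = []
    for category, items in groups.items():
        summary.append({
            "Category": category,
            "Examples": len(items),                                   # row count
            "Apps": len({app_id for app_id, _, _ in items}),          # distinct ids
            "Average Rating": round(sum(r for _, r, _ in items) / len(items), 2),
            "Rating Count": sum(n for _, _, n in items),              # total ratings
        })
    return sorted(summary, key=lambda s: s["Examples"], reverse=True)


summary = summarize(rows)
for row in summary:
    print(row)
```

Under this reading, apps with no ratings contribute a rating of 0 to the mean, which would be one possible explanation for category averages well below typical star ratings; the notebook itself does not confirm this.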
