# AppStore 
As of 2022, Apple's App Store was home to some 1.76 million apps and over 460,000 games. The aim of this exercise is to obtain app rating and review data for selected categories to support sentiment analysis, product analytics, and opportunity discovery and selection. Data acquisition centers on three entities, AppData, Rating, and Review, which are described below.

## AppData
The AppData entity encapsulates the core data for each app and is defined as follows.  

| #  | attribute    | type  | description                             | API Field                            |
|----|--------------|-------|-----------------------------------------|--------------------------------------|
| 1  | id           | int   | Unique Apple app identifier             | trackId                              |
| 2  | name         | str   | Name of the app                         | trackName                            |
| 3  | description  | str   | App description                         | description                          |
| 4  | category_id  | int   | Four-digit category identifier          | primaryGenreId                       |
| 5  | category     | str   | Category name                           | primaryGenreName                     |
| 6  | price        | float | Price of the app                        | price                                |
| 7  | rating       | float | Average user rating                     | averageUserRating                    |
| 8  | ratings      | int   | Number of user ratings                  | userRatingCount                      |
| 9  | developer_id | int   | Developer identifier                    | artistId                             |
| 10 | developer    | str   | Developer name                          | artistName                           |
| 11 | released     | str   | Date of initial release                 | releaseDate                          |
| 12 | source       | str   | Host from which the data were obtained  | n/a (value is `itunes.apple.com`)    |
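The API Field column matches field names returned by Apple's iTunes Search API. Assuming the scraper receives one JSON object per app from that API, mapping a record onto the AppData model could be sketched as follows. `AppData` as a dataclass and `from_itunes_record` are hypothetical names, and defaulting missing rating fields to zero for unrated apps is an assumption, not documented pipeline behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class AppData:
    """One app row, mirroring the AppData attribute table (hypothetical class)."""
    id: int
    name: str
    description: str
    category_id: int
    category: str
    price: float
    rating: float
    ratings: int
    developer_id: int
    developer: str
    released: str
    source: str = "itunes.apple.com"


def from_itunes_record(record: Mapping[str, Any]) -> AppData:
    """Map one iTunes Search API result dict onto AppData.

    Assumption: unrated apps may omit averageUserRating / userRatingCount,
    so those default to 0 here.
    """
    return AppData(
        id=int(record["trackId"]),
        name=record["trackName"],
        description=record.get("description", ""),
        category_id=int(record["primaryGenreId"]),
        category=record["primaryGenreName"],
        price=float(record.get("price", 0.0)),
        rating=float(record.get("averageUserRating", 0.0)),
        ratings=int(record.get("userRatingCount", 0)),
        developer_id=int(record["artistId"]),
        developer=record["artistName"],
        released=record["releaseDate"],
    )
```

Keeping the mapping in one function means a renamed or missing API field fails in a single, obvious place rather than deep inside the persistence code.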

The data acquisition pipeline searches the App Store with the following terms, each corresponding to a target category, and persists the resulting app data in an RDBMS table.

1. business
2. education
3. entertainment
4. health
5. lifestyle
6. medical
7. productivity
8. social_networking
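Assuming the scraper queries Apple's public iTunes Search API at `https://itunes.apple.com/search`, a per-term query URL could be built as below. `build_search_url` is a hypothetical helper; `term`, `country`, `media`, `entity`, and `limit` are documented Search API parameters. The API returns at most 200 results per request, so collecting category-scale volumes would need more than one request per term.

```python
from urllib.parse import urlencode

# Public iTunes Search API endpoint (assumed to be what the scraper uses).
SEARCH_ENDPOINT = "https://itunes.apple.com/search"


def build_search_url(term: str, country: str = "us", limit: int = 200) -> str:
    """Return a Search API URL that looks up apps (software) matching `term`."""
    params = {
        "term": term,
        "country": country,
        "media": "software",   # restrict results to apps
        "entity": "software",  # iPhone apps
        "limit": limit,        # the API caps this at 200
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```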


### Imports

```python
from IPython.display import display_html

from aimobile.scraper.appstore.container import AppStoreContainer
from aimobile.scraper.appstore.service.appdata import AppStoreScraper
```

```python
DIRECTORY = "data/appstore/archive/"
TERMS = ["business", "education", "entertainment", "health", "lifestyle", "medical", "productivity", "social_networking"]
```

### Dependencies

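```python
# Build the dependency-injection container, initialise its resources,
# and wire the modules that receive injected dependencies.
container = AppStoreContainer()
container.init_resources()
container.wire(
    modules=[
        "aimobile.scraper.appstore.container",
        "aimobile.scraper.appstore.repo.datacentre",
        "aimobile.scraper.appstore.service.reviews",
        "aimobile.scraper.appstore.service.appdata",
    ]
)
```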
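The container calls above appear to follow the API of the `dependency-injector` package (`init_resources()` and `wire(modules=...)`). As a library-free illustration of the underlying idea only, the sketch below shows a container exposing a lazily created singleton provider, so every caller that asks for the data centre receives the same instance. `Singleton`, `HypotheticalDataCentre`, and `HypotheticalContainer` are hypothetical names and do not reproduce `AppStoreContainer`.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Singleton(Generic[T]):
    """Provider that builds one instance on first call and reuses it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None

    def __call__(self) -> T:
        if self._instance is None:
            self._instance = self._factory()
        return self._instance


class HypotheticalDataCentre:
    """Stand-in for the real DataCentre; illustrative only."""


class HypotheticalContainer:
    """Callers ask the container for dependencies instead of constructing them."""

    def __init__(self) -> None:
        self.datacentre = Singleton(HypotheticalDataCentre)
```

Centralising construction this way lets tests swap a provider for a fake without touching the code that consumes it.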

### DataCentre 
This master repository holds the AppData, Rating, and Review data. DataCentre implements the Unit of Work pattern, providing transaction control over the underlying repositories.
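A minimal sketch of the Unit of Work idea, assuming an SQLite backend (the RDBMS behind DataCentre is not specified here): writes made inside the `with` block are committed together on success and rolled back together if any exception escapes. `UnitOfWork` is a hypothetical class, not the DataCentre API.

```python
import sqlite3
from types import TracebackType
from typing import Optional, Type


class UnitOfWork:
    """Commit all writes made in the block together, or roll them all back."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        if exc_type is None:
            self.connection.commit()
        else:
            self.connection.rollback()
        return False  # never swallow the exception


# Usage: both inserts are committed as a single transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appdata (id INTEGER PRIMARY KEY, name TEXT)")
with UnitOfWork(conn):
    conn.execute("INSERT INTO appdata VALUES (?, ?)", (1, "first"))
    conn.execute("INSERT INTO appdata VALUES (?, ?)", (2, "second"))
```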

```python
datacentre = container.datacentre.repo()
```

### AppData Scraper
The AppStoreScraper extracts App Store app data according to the AppData model. The following code iterates through the search terms above, scrapes the app data for each, and stores it in the appdata RDBMS table.

```python
scraper = AppStoreScraper()

# For each search term, scrape matching apps and persist them to the appdata table.
for term in TERMS:
    scraper.search(term=term)
    datacentre.appdata_repository.save(term=term, directory=DIRECTORY)
```
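The internals of `save()` are not shown. Because the same app can plausibly be returned by more than one search term, one reasonable design, sketched here assuming an SQLite table keyed on the App Store `id`, is to upsert rows so that re-scraping an app replaces its earlier row instead of duplicating it. `save_appdata` and the column subset are hypothetical.

```python
import sqlite3
from typing import Any, Iterable, Mapping

# Hypothetical subset of the AppData columns, for brevity.
COLUMNS = ("id", "name", "category_id", "category", "rating", "ratings")


def save_appdata(conn: sqlite3.Connection, rows: Iterable[Mapping[str, Any]]) -> None:
    """Upsert app rows keyed on the App Store id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS appdata ("
        "id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER, "
        "category TEXT, rating REAL, ratings INTEGER)"
    )
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT OR REPLACE INTO appdata ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, (tuple(row[c] for c in COLUMNS) for row in rows))
```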


### App Data Repository
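AppData is aggregated and summarized by App Store category rather than by search term. The summaries below show the app count, average rating, average rating count, and total rating count per category. Because a search can return apps outside the targeted category, the results include categories such as Games and Music that were not among the search terms.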
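The `summarize()` implementation is not shown, but the figures are consistent with a plain group-by: for example, Medical's App Count × Average Rating Count (138,680 × 203.74284) reproduces, to rounding, its Total Rating Count of 28,255,057. A pandas sketch, assuming a DataFrame with the `id`, `category`, `rating`, and `ratings` columns of the AppData model; this `summarize` is a hypothetical re-creation, not the repository's code.

```python
import pandas as pd


def summarize(appdata: pd.DataFrame) -> pd.DataFrame:
    """Aggregate app rows by category, ordered by app count (hypothetical)."""
    summary = (
        appdata.groupby("category")
        .agg(
            **{
                "App Count": ("id", "count"),
                "Average Rating": ("rating", "mean"),
                "Average Rating Count": ("ratings", "mean"),
                "Total Rating Count": ("ratings", "sum"),
            }
        )
        .reset_index()
        .rename(columns={"category": "Category"})
        .sort_values("App Count", ascending=False)
        .reset_index(drop=True)
    )
    return summary
```

Computing the average rating count as a mean and the total as a sum over the same rows is what makes the two columns agree with App Count in the output.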

```python
df1 = datacentre.appdata_repository.summarize()
df2 = df1.sort_values(by="Average Rating", ascending=False)
df3 = df1.sort_values(by="Average Rating Count", ascending=False)
```

```python
# Render the three summaries side by side as inline HTML tables.
df1_styler = df1.style.set_table_attributes("style='display:inline'").set_caption("By App Count")
df2_styler = df2.style.set_table_attributes("style='display:inline'").set_caption("By Average Rating")
df3_styler = df3.style.set_table_attributes("style='display:inline'").set_caption("By Average Rating Count")
display_html(df1_styler.to_html() + df2_styler.to_html() + df3_styler.to_html(), raw=True)
```

By App Count:

| #  | Category          | App Count | Average Rating | Average Rating Count | Total Rating Count |
|----|-------------------|-----------|----------------|----------------------|--------------------|
| 0  | Medical           | 138680    | 1.542831       | 203.74284            | 28255057           |
| 1  | Health & Fitness  | 67879     | 2.743961       | 1189.693396          | 80755198           |
| 2  | Social Networking | 63761     | 1.7792         | 993.99724            | 63378258           |
| 3  | Education         | 27807     | 2.562092       | 2281.806236          | 63450186           |
| 4  | Business          | 27534     | 2.676952       | 2321.213627          | 63912296           |
| 5  | Lifestyle         | 21507     | 3.085214       | 3687.584833          | 79308887           |
| 6  | Games             | 19887     | 4.08629        | 12061.642178         | 239869878          |
| 7  | Productivity      | 15179     | 3.107982       | 4786.305751          | 72651335           |
| 8  | Utilities         | 12559     | 2.800103       | 3567.250418          | 44801098           |
| 9  | Entertainment     | 10278     | 3.329343       | 10063.506227         | 103432717          |

By Average Rating:

| #  | Category          | App Count | Average Rating | Average Rating Count | Total Rating Count |
|----|-------------------|-----------|----------------|----------------------|--------------------|
| 6  | Games             | 19887     | 4.08629        | 12061.642178         | 239869878          |
| 19 | Music             | 2467      | 3.750132       | 24057.367248         | 59349525           |
| 22 | Graphics & Design | 570       | 3.711632       | 5776.491228          | 3292600            |
| 10 | Finance           | 6720      | 3.635785       | 9655.258482          | 64883337           |
| 25 | Developer Tools   | 83        | 3.592212       | 540.710843           | 44879              |
| 23 | Weather           | 281       | 3.543433       | 22075.295374         | 6203158            |
| 20 | Book              | 1468      | 3.419606       | 13599.675068         | 19964323           |
| 14 | Food & Drink      | 3794      | 3.392027       | 14815.091724         | 56208458           |
| 9  | Entertainment     | 10278     | 3.329343       | 10063.506227         | 103432717          |
| 13 | Photo & Video     | 5309      | 3.287068       | 26043.188359         | 138263287          |

By Average Rating Count:

| #  | Category          | App Count | Average Rating | Average Rating Count | Total Rating Count |
|----|-------------------|-----------|----------------|----------------------|--------------------|
| 13 | Photo & Video     | 5309      | 3.287068       | 26043.188359         | 138263287          |
| 19 | Music             | 2467      | 3.750132       | 24057.367248         | 59349525           |
| 23 | Weather           | 281       | 3.543433       | 22075.295374         | 6203158            |
| 11 | Shopping          | 6180      | 3.053258       | 19214.631715         | 118746424          |
| 14 | Food & Drink      | 3794      | 3.392027       | 14815.091724         | 56208458           |
| 15 | Travel            | 3371      | 2.824528       | 13979.663305         | 47125445           |
| 20 | Book              | 1468      | 3.419606       | 13599.675068         | 19964323           |
| 6  | Games             | 19887     | 4.08629        | 12061.642178         | 239869878          |
| 18 | News              | 2569      | 2.456827       | 10863.087193         | 27907271           |
| 9  | Entertainment     | 10278     | 3.329343       | 10063.506227         | 103432717          |
