# AppStore 
As of 2022, Apple's App Store was home to some 1.76 million apps and over 460,000 games. This exercise collects app metadata, rating, and review data for selected categories to support sentiment analysis, product analytics, and opportunity discovery and selection. Data acquisition centers on three entities: AppData, Rating, and Review, which are described below.

## AppData
The AppData entity encapsulates the core data for each app and is defined as follows.  

| #  | Attribute    | Type  | Description                                                           | API Field         |
|----|--------------|-------|-----------------------------------------------------------------------|-------------------|
| 1  | id           | int   | Unique Apple app identifier                                           | trackId           |
| 2  | name         | str   | Name of the app                                                       | trackName         |
| 3  | description  | str   | Description of the app                                                | description       |
| 4  | category_id  | int   | Four-digit category identifier                                        | primaryGenreId    |
| 5  | category     | str   | Category name                                                         | primaryGenreName  |
| 6  | price        | float | Cost of the app                                                       | price             |
| 7  | rating       | float | Average user rating                                                   | averageUserRating |
| 8  | ratings      | int   | Number of user ratings                                                | userRatingCount   |
| 9  | developer_id | int   | App developer identifier                                              | artistId          |
| 10 | developer    | str   | App developer name                                                    | artistName        |
| 11 | released     | str   | Date of initial release                                               | releaseDate       |
| 12 | source       | str   | Host from which the data were obtained (constant: itunes.apple.com)   | n/a               |
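
As a rough illustration of how the attributes above could line up with the API fields, the sketch below defines a dataclass and a mapping helper. The class name `AppDataRecord`, the `from_api_result` method, the sample values, and the zero defaults for missing price and rating fields are all assumptions made for illustration; the project's own AppData class is not shown in this notebook.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AppDataRecord:
    """Hypothetical record mirroring the AppData attribute table."""

    id: int
    name: str
    description: str
    category_id: int
    category: str
    price: float
    rating: float
    ratings: int
    developer_id: int
    developer: str
    released: str
    source: str = "itunes.apple.com"

    @classmethod
    def from_api_result(cls, result: Mapping[str, Any]) -> "AppDataRecord":
        """Build a record from one API result dictionary."""
        return cls(
            id=int(result["trackId"]),
            name=result["trackName"],
            description=result["description"],
            category_id=int(result["primaryGenreId"]),
            category=result["primaryGenreName"],
            # Assumption: fall back to 0 when price or rating fields are missing.
            price=float(result.get("price", 0.0)),
            rating=float(result.get("averageUserRating", 0.0)),
            ratings=int(result.get("userRatingCount", 0)),
            developer_id=int(result["artistId"]),
            developer=result["artistName"],
            released=result["releaseDate"],
        )


# Made-up sample values, shaped like a single API result.
sample = {
    "trackId": 123456789,
    "trackName": "Example App",
    "description": "An example.",
    "primaryGenreId": 6007,
    "primaryGenreName": "Productivity",
    "price": 0.0,
    "averageUserRating": 4.5,
    "userRatingCount": 1200,
    "artistId": 987654321,
    "artistName": "Example Developer",
    "releaseDate": "2020-01-15T08:00:00Z",
}
app = AppDataRecord.from_api_result(sample)
print(app.name, app.category, app.rating)
```

Keeping the field-name translation in one place means the rest of the pipeline only ever sees the attribute names from the table.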

The data acquisition pipeline will obtain app data for the following categories and persist them in an RDBMS table.

1. business
2. education
3. entertainment
4. health
5. lifestyle
6. medical
7. productivity
8. social_networking
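
The persistence step might look something like the sketch below, which uses an in-memory SQLite database as a stand-in for the RDBMS. The table name `appdata`, the reduced column set, the `save_apps` helper, and the sample row are assumptions for illustration; the project's actual schema and database are not shown here.

```python
import sqlite3
from typing import Any, Dict, List

# Assumption: table name, columns, and types are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS appdata (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    category_id  INTEGER,
    category     TEXT,
    price        REAL,
    rating       REAL,
    ratings      INTEGER
)
"""

UPSERT = """
INSERT OR REPLACE INTO appdata
    (id, name, category_id, category, price, rating, ratings)
VALUES
    (:id, :name, :category_id, :category, :price, :rating, :ratings)
"""


def save_apps(conn: sqlite3.Connection, rows: List[Dict[str, Any]]) -> int:
    """Insert or replace rows keyed on the app id; return the table row count."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM appdata").fetchone()[0]


conn = sqlite3.connect(":memory:")
row = {
    "id": 123456789,  # made-up sample values
    "name": "Example App",
    "category_id": 6007,
    "category": "Productivity",
    "price": 0.0,
    "rating": 4.5,
    "ratings": 1200,
}
save_apps(conn, [row])
count = save_apps(conn, [row])  # Same id again: replaced, not duplicated.
print(count)
```

Keying the table on the app id lets a term be scraped again without creating duplicate rows, which matters when the same app appears under several search terms.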


### Imports

In [9]:
from aimobile.container import AIMobileContainer
from aimobile.data.acquisition.appstore.appdata.controller import AppStoreAppDataController

In [10]:
TERMS_COMPLETE = ["health", "productivity", "social", "business", "education", "entertainment", "lifestyle", "medical"]
TERMS = ["finance", "food", "music", "reference", "photo", "video", "shopping", "travel", "utilities"]

### Dependencies

In [11]:
container = AIMobileContainer()
container.init_resources()
container.wire(packages=["aimobile.data.acquisition.appstore"])

### AppData Acquisition
General app metadata (name, price, and description) and rating statistics are obtained here.
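
The controller's request logic is not shown in this notebook. As a minimal sketch of what a single term query could look like, the example below builds a URL for the public iTunes Search API on the same host. The `build_search_url` helper is hypothetical, and the default `limit` of 200 is chosen to mirror the `page_size` reported in the project summary; whether the controller uses this endpoint, and how it pages through results, is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://itunes.apple.com/search"


def build_search_url(term: str, limit: int = 200) -> str:
    """Return a search URL for software matching the given term."""
    params = {"term": term, "media": "software", "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("utilities")
print(url)
```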

In [12]:
controller = AppStoreAppDataController()
controller.scrape(terms=TERMS)
controller.summary()


[05/01/2023 10:00:45 AM] [INFO] [AppStoreAppDataController] [_scrape] : Project for finance is complete. Skipping ahead to next project
[05/01/2023 10:00:45 AM] [INFO] [AppStoreAppDataController] [_scrape] : Project for food is complete. Skipping ahead to next project
[05/01/2023 10:01:21 AM] [INFO] [AppStoreAppDataController] [_update_report_stats] : Term: Music	Pages: 220	Apps: 34954	Elapsed Time: 0:00:36.192947	Rate: 965.77 apps per second.
[05/01/2023 10:02:54 AM] [INFO] [AppStoreAppDataController] [_update_report_stats] : Term: Music	Pages: 230	Apps: 36327	Elapsed Time: 0:02:08.789749	Rate: 282.06 apps per second.
[05/01/2023 10:04:25 AM] [INFO] [AppStoreAppDataController] [_update_report_stats] : Term: Music	Pages: 240	Apps: 37794	Elapsed Time: 0:03:39.774487	Rate: 171.97 apps per second.
[05/01/2023 10:05:58 AM] [INFO] [AppStoreAppDataController] [_update_report_stats] : Term: Music	Pages: 250	Apps: 39203	Elapsed Time: 0:05:12.974996	Rate: 125.26 apps per second.
[05/01/2023 10:

Project(host='itunes.apple.com', controller='AppStoreAppDataController', term='utilities', status='complete', page_size=200, pages=85, vpages=83, apps=16650, started=datetime.datetime(2023, 5, 1, 19, 14, 1, 68879), updated=datetime.datetime(2023, 5, 1, 19, 31, 1, 35705), completed=datetime.datetime(2023, 5, 1, 19, 31, 13, 767190), id=7)
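
The rate figures in the log above are consistent with the cumulative app count divided by the elapsed time of the current run. One reading is that the count includes apps collected before the run resumed (the first Music line already reports 220 pages after 36 seconds), which would explain why the reported rate falls so sharply. The `scrape_rate` helper below is hypothetical and only reproduces that arithmetic from the logged values.

```python
from datetime import timedelta


def scrape_rate(apps: int, elapsed: timedelta) -> float:
    """Cumulative apps divided by elapsed seconds, rounded to two decimals."""
    return round(apps / elapsed.total_seconds(), 2)


# Values copied from the first two Music log lines above.
first = scrape_rate(34954, timedelta(seconds=36, microseconds=192947))
second = scrape_rate(36327, timedelta(minutes=2, seconds=8, microseconds=789749))
print(first, second)
```

If that reading is right, a per-interval rate (new apps divided by time since the previous report) would give a steadier picture of throughput than the cumulative figure.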

In [13]:
repo = container.data.project_repo()
repo.getall()

| # | host             | controller                | term      | status   | page_size | pages | vpages | apps   | started             | updated                    | completed                  | id |
|---|------------------|---------------------------|-----------|----------|-----------|-------|--------|--------|---------------------|----------------------------|----------------------------|----|
| 0 | itunes.apple.com | AppStoreAppDataController | finance   | complete | 200       | 144   | 133    | 26629  | 2023-05-01 04:18:55 | 2023-05-01 09:24:06.675565 | 2023-05-01 09:24:25.231438 | 0  |
| 1 | itunes.apple.com | AppStoreAppDataController | food      | complete | 200       | 144   | 133    | 26629  | 2023-05-01 08:57:55 | 2023-05-01 09:24:06.675565 | 2023-05-01 09:24:25.231438 | 0  |
| 2 | itunes.apple.com | AppStoreAppDataController | music     | complete | 200       | 935   | 541    | 108396 | 2023-05-01 09:24:47 | 2023-05-01 11:48:01.704748 | 2023-05-01 11:48:31.568525 | 1  |
| 3 | itunes.apple.com | AppStoreAppDataController | reference | complete | 200       | 1126  | 584    | 116951 | 2023-05-01 11:49:01 | 2023-05-01 14:32:20.048435 | 2023-05-01 14:32:27.055998 | 2  |
| 4 | itunes.apple.com | AppStoreAppDataController | photo     | complete | 200       | 354   | 267    | 53471  | 2023-05-01 14:33:02 | 2023-05-01 15:36:45.637017 | 2023-05-01 15:36:55.421950 | 3  |
| 5 | itunes.apple.com | AppStoreAppDataController | video     | complete | 200       | 419   | 308    | 61627  | 2023-05-01 15:37:32 | 2023-05-01 16:52:33.717146 | 2023-05-01 16:52:56.028587 | 4  |
| 6 | itunes.apple.com | AppStoreAppDataController | shopping  | complete | 200       | 528   | 447    | 89482  | 2023-05-01 16:53:36 | 2023-05-01 18:31:42.507489 | 2023-05-01 18:32:04.767282 | 5  |
| 7 | itunes.apple.com | AppStoreAppDataController | travel    | complete | 200       | 224   | 171    | 34254  | 2023-05-01 18:32:50 | 2023-05-01 19:12:53.240372 | 2023-05-01 19:13:14.221534 | 6  |
| 8 | itunes.apple.com | AppStoreAppDataController | utilities | complete | 200       | 85    | 83     | 16650  | 2023-05-01 19:14:01 | 2023-05-01 19:31:01.035705 | 2023-05-01 19:31:13.767190 | 7  |
