# 1. <a id='toc1_'></a>[AppStore Exploratory Data Analysis](#toc0_)
As of 2022, Apple's App Store was home to some 1.76 million apps and over 460,000 games. For this effort, app, rating, and review data were obtained from the Apple [App Store](https://www.apple.com/app-store/) for the following nine search terms:
1. business
2. education
3. entertainment
4. health
5. lifestyle
6. medical
7. productivity
9. social_networking

Three datasets comprise the App Store data collection: 
- **AppData**: the core dataset containing app name, description, category, the number of ratings, and average ratings;
- **Rating**: rating histogram, and review count data used to prioritize the targeting and collection of review data; and,
- **Review**: Customer reviews of selected apps available in the Apple App Store.

We kick off the exploratory data analysis with an examination of the AppData and Rating datasets. With this foundation, an exploratory text analysis of the Review dataset will give a more nuanced reading of the voice of the mobile app customer: their satisfaction, their sentiment, and their needs, met and unmet. After some dependency housekeeping, the remainder of this section is organized as follows.

**Table of contents**<a id='toc0_'></a>    
- 1. [AppStore Exploratory Data Analysis](#toc1_)    
  - 1.1. [AppData](#toc1_1_)    
    - 1.1.1. [AppData Overview](#toc1_1_1_)    
    - 1.1.2. [AppData Univariate Analysis](#toc1_1_2_)    
      - 1.1.2.1. [Nominal Variables](#toc1_1_2_1_)    
      - 1.1.2.2. [AppData Name](#toc1_1_2_2_)    
      - 1.1.2.3. [AppData Description](#toc1_1_2_3_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

**Dependencies** 

In [1]:
import os

import pandas as pd
from IPython.display import HTML

from aimobile.container import AIMobileContainer
from aimobile.data.analysis.eda import EDA

container = AIMobileContainer()
container.init_resources()
container.wire(packages=["aimobile.data.acquisition"])
pd.set_option('display.max_colwidth', None)


<a id='appdata'></a>

## 1.1. <a id='toc1_1_'></a>[AppData](#toc0_)
The AppData dataset holds the core descriptive and aggregate rating data for each app, as follows:

| #  | attribute     | type  | description                                  | API Field         |
|----|---------------|-------|----------------------------------------------|-------------------|
| 1  | id:           | int   | Unique Apple App Identifier                  | trackId           |
| 2  | name:         | str   | Name of the app.                             | trackName         |
| 3  | description:  | str   | Description                                  | description       |
| 4  | category_id:  | int   | Four digit category identifier               | primaryGenreId    |
| 5  | category:     | str   | Category name                                | primaryGenreName  |
| 6  | price:        | float | Cost of the app                              | price             |
| 7  | rating:       | float | The user average rating                      | averageUserRating |
| 8  | ratings:      | int   | The rating count                             | userRatingCount   |
| 9  | developer_id: | int   | The app developer identifier                 | artistId          |
| 10 | developer:    | str   | The app developer name                       | artistName        |
| 11 | released:     | str   | The date of initial release                  | releaseDate       |
| 12 | source:       | str   | The host from which the data were obtained.  | itunes.apple.com  |



### 1.1.1. <a id='toc1_1_1_'></a>[AppData Overview](#toc0_)
Let's instantiate an EDA object with the appdata from the appdata repository, and get a sense of the overall profile of the data.

In [2]:
uow = container.data.uow()
appdata = uow.appdata_repo.getall()

In [3]:
appdata_eda = EDA(data=appdata)
appdata_eda.overview

Unnamed: 0,Unnamed: 1
Number of Variables,15.0
Number of Observations,153356.0
Number of Cells,2300340.0
Missing Cells,0.0
Missing Cells (%),0.0
Duplicate Rows,200.0
Duplicate Rows (%),0.13
Size (Bytes),565828162.0


The AppData contains a bit over 153,000 app records, described by 15 variables, for a total of about 2.3 million data cells. Let's examine the variables, data types, validity, and cardinality of the dataset.
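For readers without access to the `EDA` class, the overview metrics can be approximated with plain pandas. This is a minimal sketch, not the `EDA` implementation: the `overview` helper and the toy frame are hypothetical, and it assumes duplicates are counted as exact repeated rows and size is measured with `memory_usage(deep=True)`.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> pd.Series:
    """Dataset-level profile (hypothetical helper, not the EDA class)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    n_missing = int(df.isna().sum().sum())
    # Assumption: a duplicate is a row identical to an earlier row.
    n_dupes = int(df.duplicated().sum())
    return pd.Series({
        "Number of Variables": n_cols,
        "Number of Observations": n_rows,
        "Number of Cells": n_cells,
        "Missing Cells": n_missing,
        "Missing Cells (%)": round(100 * n_missing / n_cells, 2) if n_cells else 0.0,
        "Duplicate Rows": n_dupes,
        "Duplicate Rows (%)": round(100 * n_dupes / n_rows, 2) if n_rows else 0.0,
        "Size (Bytes)": int(df.memory_usage(deep=True).sum()),
    })


# Toy data standing in for the AppData frame.
toy = pd.DataFrame({"id": [1, 2, 2, 3], "name": ["a", "b", "b", None]})
profile = overview(toy)
print(profile)
```

The number of cells is simply observations times variables, which is a quick sanity check on any figures quoted in the text.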

In [4]:
appdata_eda.summary

Unnamed: 0,Column,Dtype,Valid,Missing,Validity,Unique,Cardinality,Size
0,id,int64,153356,0,1.0,151002,0.98,1226848
1,name,object,153356,0,1.0,150877,0.98,12967494
2,description,object,153356,0,1.0,149167,0.97,486371916
3,category_id,int64,153356,0,1.0,26,0.0,1226848
4,category,object,153356,0,1.0,26,0.0,10318505
5,price,float64,153356,0,1.0,97,0.0,1226848
6,developer_id,int64,153356,0,1.0,98556,0.64,1226848
7,developer,object,153356,0,1.0,98410,0.64,11969426
8,rating,float64,153356,0,1.0,33459,0.22,1226848
9,ratings,int64,153356,0,1.0,13548,0.09,1226848


The AppData summary reveals several observations / insights as we prepare for the univariate analysis:

1. Data validity is 100%, revealing no missing data.
2. The cardinality of the id, name, and description variables is below 1, which suggests some degree of duplication among these variables.
3. Similarly, developer and developer id have different unique value counts, hinting at data quality/cleaning issues.
4. Our nine search terms returned apps across 26 categories.
5. Category id and category share the same cardinality.
6. Source has a cardinality of 1 and can be ignored.
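The validity and cardinality figures behind these observations can be reproduced with plain pandas. The sketch below assumes validity is the share of non-missing values and cardinality is the number of unique values divided by the number of rows, which is consistent with the summary table above; `summarize` and the toy data are hypothetical. It also shows one way to probe observation 3 by listing developer ids that map to more than one developer name.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column profile (hypothetical helper, not the EDA class)."""
    n = len(df)
    out = pd.DataFrame({
        "Dtype": df.dtypes.astype(str),
        "Valid": df.notna().sum(),
        "Missing": df.isna().sum(),
        "Unique": df.nunique(),
    })
    out["Validity"] = (out["Valid"] / n).round(2)      # share of non-missing values
    out["Cardinality"] = (out["Unique"] / n).round(2)  # unique values per row
    out["Size"] = df.memory_usage(deep=True, index=False)
    return out.rename_axis("Column").reset_index()


# Toy data: developer_id 10 appears under two spellings of the same name.
toy = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "developer_id": [10, 10, 20, 20],
    "developer": ["Acme", "Acme Inc.", "Bolt", "Bolt"],
})
summary = summarize(toy)
print(summary)

# Developer ids associated with more than one developer name.
names_per_id = toy.groupby("developer_id")["developer"].nunique()
inconsistent = names_per_id[names_per_id > 1]
print(inconsistent)
```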

Yet, as we engage in the exploration and discovery effort, it is essential that the data types are appropriate at the variable level. As such, the following variables will be converted to categorical.

- id
- name
- category_id
- category 
- developer_id
- developer

The description variable will be converted to pandas 'string' dtype.

In [5]:
category_vars = ['id', 'name', 'category_id', 'category', 'developer_id', 'developer']
str_vars = ['description']
appdata_eda.astype(vars=category_vars, dtype='category')
appdata_eda.astype(vars=str_vars, dtype='string')
del appdata_eda.summary
appdata_eda.summary

Unnamed: 0,Column,Dtype,Valid,Missing,Validity,Unique,Cardinality,Size
0,id,category,153356,0,1.0,151002,0.98,6048552
1,name,category,153356,0,1.0,150877,0.98,17602670
2,description,string,153356,0,1.0,149167,0.97,486371916
3,category_id,category,153356,0,1.0,26,0.0,154636
4,category,category,153356,0,1.0,26,0.0,156168
5,price,float64,153356,0,1.0,97,0.0,1226848
6,developer_id,category,153356,0,1.0,98556,0.64,3515448
7,developer,category,153356,0,1.0,98410,0.64,10466253
8,rating,float64,153356,0,1.0,33459,0.22,1226848
9,ratings,int64,153356,0,1.0,13548,0.09,1226848
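Comparing the two summaries, the low-cardinality columns shrink considerably as categoricals (category drops from 10,318,505 to 156,168 bytes), while the near-unique id and name columns grow (id rises from 1,226,848 to 6,048,552 bytes). A categorical stores one small integer code per row plus a lookup table of the distinct values, so the saving depends on how often values repeat. The sketch below illustrates this with plain pandas on hypothetical toy data; it does not use the `EDA.astype` method.

```python
import pandas as pd

# Toy frame: 'id' is unique per row, 'category' repeats four labels.
toy = pd.DataFrame({
    "id": range(1000),
    "category": ["Business", "Education", "Health", "Medical"] * 250,
    "description": ["An example app description."] * 1000,
})

before = toy.memory_usage(deep=True, index=False)

converted = toy.astype({
    "id": "category",
    "category": "category",
    "description": "string",
})
after = converted.memory_usage(deep=True, index=False)

# Side-by-side size in bytes for each column.
comparison = pd.DataFrame({"before": before, "after": after})
print(comparison)
```

For identifier columns, the categorical dtype is then a statement about semantics (ids are labels, not quantities) rather than a memory optimization.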


Data type conversion complete. Let's take a look at a few sample sets.

In [6]:
appdata_eda.sample()[['id','description']]

Unnamed: 0,id,description
65272,1460656181,【即构科技提供实时音视频云服务支持】 即构科技致力为各个行业提供一站式实时音视频解决方案，ZegoLive解决方案实现了实时音视频直播，多人连麦互动，自带美颜滤镜，直播超低延迟，能够稳定应对全球高并发。
74064,1180725952,"Discover exclusive freelance opportunities in retail at rates starting up to $22/hr with the best brands in fashion and beauty such as Céline, Givenchy, Gucci, and Loewe — just to name a few! Getting started is as simple as:  1. Download the app  2. Build your profile for our brand partners to review and set your work preferences  3. Pick from a wide range of shifts posted daily  4. Get paid! Why work on the POURED app?  1. PREMIUM PAY: Start at rates from $17-22/hour  2. OPPORTUNITY: Exclusive access to premium brands + popups. The only introduction you’ll ever need to take the next step in your career  3. FLEXIBILITY: Build your own schedule. Work as much or as little as you’d like when you’d like  4. ZERO CONSTRAINTS: Work with multiple stores at once to boost your resume and experience  5. GROWTH: Opportunity to transition into a full-time role with a brand Available Positions:  1. Sales Associate  2. Greeter  3. Stock Associate  4. Key Holder Now filling roles in New York, Los Angeles, Orange County, Chicago, Houston & more! QUESTIONS? Visit us at poured.app or send us an email at hello@poured.app"
124773,1558172672,"أنشئ عیادتك الالكترونية الخاصة معنا لتسهيل الوصول لمراجعيك من خلال الطب الاتصالي بكل سهولة وخصوصية.  تطبيق لآن للأطباء يسمح لك كطبيب بتقديم : - الاستشارات الطبية عن بعد ( صوتية، مرئية، محادثة) مع مراجعيك بكل خصوصية. - صرف وصفات طبية بكل سهولة. - كتابة ملاحظاتك وتقاريرك الطبية الخاصة بك و الخاصة بمراجعيك بكل مرونة. - إمكانية الاطلاع على الملف الطبي قبل دخول الاستشارة.  نرحب بك للانضمام معنا من خلال التواصل مع فريقنا لدعمك في فتح عيادتك الخاصة.  Email : info@laancare.com  Website : Laancare.com Social media: @laanApp Start your own medical clinic to provides convenient high-quality and privacy healthcare to your patients. Laan Doctors App provide you with: - Easy virtual medical consultation with your patients via ( video, call, chat). - Prescribe a medication to your patients. - Write a medical reports for your patients. - Get an access to your patient's medical records. Now is the time to join us to reach your patients anywhere anytime. Email : info@laancare.com  Website : Laancare.com Social media: @laanApp"
971,1529962914,"If you want to lose weight, achieve nutrition goals, or simply eat healthily, Oatsy is the app for you! Oatsy lets you track nutrition quicker and easier than ever before. Log food, water, and exercise with ease, and use Oatsy to become your healthiest self! Count calories, track macro & micronutrients, follow diet plans, discover healthy recipes and get daily life score recommendations. Oatsy can help you reach your goals in your way. REACH YOUR GOALS • Advanced food diary - Easily record the food you eat throughout the day with the largest verified nutrition database in the world. • Set goals - Enter a weight loss or weight gain goal and we'll suggest a game plan for your calorie budget. Eat smarter and achieve a calorie deficit. • Custom diet plans - Select the diet plan that fits your personal goals. Choose from Balanced Dieting, Intermittent Fasting, 5:2 Fasting, and Keto Burn / Keto diet. Manage carbs, protein, and fat. • Barcode scanner - Use our scanner to instantly log foods with a barcode. • Helpful graph charts - Check out your day-by-day weight, calorie intake, and exercise journal, to stay on track with your goals. • FitScore life score - Helps you understand your health activity and track your BMI. Remind you if you're low on carbs or need more exercise. • Apple health integration - Sync your steps counting data directly with Apple health. TRACK NUTRITION • Full nutrition macro tracker & calorie counter - Detailed daily nutrition analysis. Count calories and see full details of the foods you ate - from Calories, Carbs, Fat, Protein to Dietary Fiber, Sugars, Saturated / Unsaturated fat, Sodium, Cholesterol, and Potassium. • Artificial Intelligence Food Auto-Scanner - Simply take a picture of your meal and let Oatsy's revolutionary scanner detect and track the food. • Custom food - Efficiently track daily calorie intake by creating your custom dishes and recipes. 
• Food ranking - Check out foods that help you achieve your goals or accommodate food restrictions, such as weight loss, bodybuilding, digestion, diabetes, high blood pressure, and detox. • Healthy recipes - Instantly copy popular recipes to your custom food list, or share your favorite recipe with the world! Log in healthy portions, and know all the macro breakdown. The ultimate eating guide with vegan recipes included! • Color-Coded rating system - Show foods that you should eat more or eat less. • Restaurant Foods Menu - Add food items from popular restaurants so you don't have to worry about nutrition accuracy. Accurate calorie calculator from a variety of sources. • Daily nutrition summary - The only calorie counter that shows you all the micro & macronutrients for free. Stop paying to count calories! • Water Tracker - Stay hydrated and stay healthy. Track water intake with an easy tap. TRACK EXERCISE • Track fitness activities - Select from a variety of activities and workouts to log your exercise. Lose weight and stay healthy. • Custom exercises - Don't see your activity on the list? Create a custom exercise to add to your log. • Count steps - Integrated with Apple Health to automatically count your steps from FitBit and other fitness trackers. ENGAGE WITH THE COMMUNITY • Follow your friends' calorie-counting journeys. Keep up to date with the latest news in nutrition. Learn from nutritionists. • Share exercise pics, weight progress, healthy recipes, workout videos, and more. Don't just eat it, track it! Download the Oatsy calorie counter free today and get started with your healthy diet!"
111332,590259379,"VidistarViewer™ offers medical professionals instant access to their medical images and reports from their mobile Apple device(s). With VidistarViewer™, users may view still images, video clips, patient information, diagnostic reports, and review their diagnostic DICOM studies (i.e. echos, vascular studies, OBGYN, nuclear, cath, angio, etc.). Review Studies from Anywhere VidistarViewer™ includes similar functionality to VidiStar's web-solutions, and allows users to review studies and reports regardless of location. Engage Your Patients Use your iPad to show your patient their diagnostic study and enhance their engagement in their plan of care by educating them on their condition, thus heightening their understanding and furthering your patient-relationship."


### 1.1.2. <a id='toc1_1_2_'></a>[AppData Univariate Analysis](#toc0_)
#### 1.1.2.1. <a id='toc1_1_2_1_'></a>[ID](#toc0_)

In [7]:
appdata_eda.describe(x='id')

Unnamed: 0,count,unique,top,freq
id,153356,151002,1041591359,4


Combined with the summary above, the id variable has a cardinality ratio of 0.98, with at most 4 occurrences of any single id. Let's take a closer look at the frequency distribution.

In [8]:
id_value_counts = appdata_eda.value_counts(x="id", threshold=2)
id_value_counts = id_value_counts['count'].value_counts().to_frame().reset_index()
id_value_counts.columns = ['Number of Occurrences', 'Number of App Ids']
id_value_counts

Unnamed: 0,Number of Occurrences,Number of App Ids
0,2,1838
1,3,255
2,4,2


The above indicates 1,838 app ids that occur twice in the dataset, 255 app ids that are present three times, and 2 app ids that are present four times.
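The table above is a "frequency of frequencies": first count how often each id occurs, then count how many ids share each occurrence count. A minimal pandas-only sketch on hypothetical toy ids, assuming the `threshold=2` argument used earlier keeps ids that occur at least twice:

```python
import pandas as pd

# Toy ids: 101 and 104 occur twice, 102 three times, 103 once.
ids = pd.Series([101, 101, 102, 102, 102, 103, 104, 104], name="id")

# Step 1: occurrences per id.
occurrences = ids.value_counts()

# Step 2: keep repeated ids only (assumed meaning of threshold=2).
repeated = occurrences[occurrences >= 2]

# Step 3: number of ids at each occurrence count.
freq_of_freq = (
    repeated.value_counts()
    .sort_index()
    .rename_axis("Number of Occurrences")
    .reset_index(name="Number of App Ids")
)
print(freq_of_freq)
```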

#### 1.1.2.2. <a id='toc1_1_2_2_'></a>[AppData Name](#toc0_)

In [9]:
appdata_eda.describe(x="name")
name_value_counts = appdata_eda.value_counts(x="name", threshold=2)
name_value_counts['count'].value_counts().to_frame()

Unnamed: 0,count,unique,top,freq
name,153356,150877,Withings Thermo,4


Unnamed: 0,count
2,1957
3,258
4,2


The cardinality for the name variable looks conspicuously similar to that of the id variable.
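One way to test whether the repeated names are simply the repeated ids is to cross-tabulate the two duplication flags and to look for names attached to more than one id. The sketch below uses plain pandas on hypothetical toy data; on the converted AppData frame, where these columns are categorical, the `groupby` would likely need `observed=True`.

```python
import pandas as pd

# Toy data: id 1 is a repeated record; "Beta" is one name shared by two ids.
toy = pd.DataFrame({
    "id": [1, 1, 2, 3, 4],
    "name": ["Alpha", "Alpha", "Beta", "Beta", "Gamma"],
})

# Flag every row whose id (or name) appears more than once.
dup_id = toy["id"].duplicated(keep=False)
dup_name = toy["name"].duplicated(keep=False)

# Rows: is the id repeated? Columns: is the name repeated?
overlap = pd.crosstab(
    dup_id, dup_name, rownames=["id repeated"], colnames=["name repeated"]
)
print(overlap)

# Names attached to more than one distinct id.
ids_per_name = toy.groupby("name")["id"].nunique()
shared_names = ids_per_name[ids_per_name > 1]
print(shared_names)
```

Rows with a repeated name but a unique id are the interesting ones: they indicate distinct apps sharing a name rather than the same record collected more than once.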

#### 1.1.2.3. <a id='toc1_1_2_3_'></a>[AppData Description](#toc0_)

In [10]:
appdata_eda.describe(x="description")
desc_value_counts = appdata_eda.value_counts(x="description", threshold=2)
desc_value_counts['count'].value_counts().to_frame()

Unnamed: 0,count,unique,top,freq
description,153356,149167,"De app is een onmisbare tool voor ouders, spelers, trainers, coaches en managers. \n\nDe app bevat onder meer:\n\n- Altijd het laatste clubnieuws\n- Uitgebreide wedstrijddetails, trainingen, scheidsrechters en aanwezigheid\n- Een slimme persoonlijke timeline\n- Gast-modus\n- Agenda-synchronisatie\n- Toewijzen van taken via wedstrijddetails voor teamondersteuning\n- Pushberichten voor clubnieuws\n- Bier/ limonade-pot\n- Wedstrijdschema\n- Trainingsschema",134


Unnamed: 0,count
2,2518
3,333
4,31
5,16
6,9
8,8
7,4
19,3
9,3
10,3


In [11]:
pd.set_option('display.max_colwidth', None)
uow.appdata_repo.get(id='1208362996')['description']

0    A daily journal for self-compassion.\n\nBe your own best friend, cheerleader, spiritual guru, etc. with our carefully crafted daily journal prompts. It’s an easy way to incorporate self reflection into your life, and reflect on all the things that matter. \n\nYour diary is not just about writing what you ate for lunch. Instead, you can befriend yourself with thought-provoking prompts from day one. Reflect on your hopes and dreams, and really listen to yourself like a compassionate friend. If you’ve ever struggled to keep up with a diary, our daily journal prompts can remove the struggle of thinking of things to say, and make this journaling habit easy to maintain.\n\nEach journal prompt comes with “Dive Deeper” questions to help you get the most out of your self care journal:\nE.g. For the writing prompt: “Love is…”, reflect on Dive Deeper questions like “How do you show love towards others? How do others show love towards you? What was your first memory of feeling loved? Does lov