# Week 2 - Google Analytics and Python

## Section 1: Introduction to Python
Python was created by Guido van Rossum and first released in 1991 as a general-purpose, high-level programming language. It has become extremely popular over the past decade because of its intuitive nature, flexibility, and versatility. According to Developer Nation's recent survey of 30,000 developers, Python is among the top three programming language choices of 2023, and it was rated the most popular language for data science, machine learning, and artificial intelligence.

I hope this class shows you the charm of Python and motivates you to keep learning this great programming language and its associated libraries for data science and machine learning.

### 1.1 Variables/Data structure
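A variable is a name that refers to a value. You create one with the equals sign, which in Python means "assignment" (not mathematical equality): `x = 30` stores the value 30 under the name x. Variable names can be almost anything, as long as:
- they contain only letters, numbers, or underscores;
- the first character is not a number;
- the name is not a reserved keyword (such as `if`, `for`, or `class`).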

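Names are case-sensitive, so `total` and `Total` are two different variables. Within these rules, it is recommended that you consistently use one of these naming styles:
- camel case (e.g., variableName)
- snake case (e.g., variable_name), which Python's official style guide, PEP 8, recommends for variable and function names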
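
The naming rules above can be checked from Python itself. This minimal sketch relies on the built-in `str.isidentifier()` method and the standard-library `keyword` module; the helper `is_valid_variable_name` and the candidate names are hypothetical, chosen only for illustration.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() enforces the character rules (letters, digits, underscores,
    # no leading digit); iskeyword() rejects reserved words such as "for".
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["variable_name", "variableName", "2nd_place", "total-sales", "for"]:
    print(candidate, is_valid_variable_name(candidate))
```

Here `2nd_place` fails because it starts with a number, `total-sales` because `-` is not allowed, and `for` because it is a keyword.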



### 1.2 Data Types & Operators
The primary data types in Python are integers, floats, and strings. Values of these types can be grouped into collections such as lists, tuples, and dictionaries.

```python
x = 30              # integer
y = 1.23            # float
weather = "cloudy"  # string
```
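
To see these types side by side, the sketch below reuses the three variables above and adds one tuple, one list, and one dictionary (the extra values are made up for illustration). The built-in `type()` function reports the type of any value.

```python
x = 30                               # integer
y = 1.23                             # float
weather = "cloudy"                   # string
point = (3, 4)                       # tuple: ordered, cannot be changed after creation
readings = [420, 380, 390]           # list: ordered, can be changed
person = {"name": "Ann", "age": 31}  # dictionary: key-value pairs

# type() reports the data type; __name__ gives its short name, e.g. "int"
for value in [x, y, weather, point, readings, person]:
    print(type(value).__name__)
```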


Operators are symbols that perform operations on values. The arithmetic operators below work on integers and floats.


```python
A = 2
B = 3
C = A + B  # the addition operator: +
print(C)
```
```
5
```


```python
A = 4
B = 3
C = A - B  # the subtraction operator: -
print(C)
```
```
1
```


```python
A = 3
B = 2
C = A * B  # the multiplication operator: *
print(C)
```
```
6
```


```python
A = 9
B = 3
C = A / B  # the division operator: / (always returns a float, hence 3.0)
print(C)
```
```
3.0
```


```python
A = 7
B = 2
C = A // B  # the floor division operator: // divides, then rounds down to a whole number
print(C)
```
```
3
```
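
Floor division pairs naturally with the modulo operator `%`, which returns the remainder. The sketch below also shows that "rounds down" means toward negative infinity, which matters for negative numbers.

```python
A = 7
B = 2
print(A // B)        # 3: the quotient, rounded down
print(A % B)         # 1: the remainder
print(divmod(A, B))  # (3, 1): quotient and remainder at once
print(-A // B)       # -4: rounds down toward negative infinity, not toward zero

# Quotient and remainder always reconstruct the original number
print((A // B) * B + A % B == A)  # True
```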


```python
A = 9
B = 2
C = A ** B  # the exponent operator: ** raises the first value to the power of the second
print(C)
```
```
81
```
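
When several operators appear in one expression, Python applies them in order of precedence: `**` first, then `*`, `/`, `//`, `%`, then `+` and `-`. Parentheses override that order. A short sketch:

```python
print(2 + 3 * 4)    # 14: multiplication happens before addition
print((2 + 3) * 4)  # 20: parentheses are evaluated first
print(2 * 3 ** 2)   # 18: the exponent happens before multiplication
print(2 ** 3 ** 2)  # 512: ** groups right to left, i.e. 2 ** (3 ** 2)
print(10 - 4 - 3)   # 3: other operators group left to right, i.e. (10 - 4) - 3
```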


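### 1.3 Lists, Dictionaries, and DataFrames
The way data is organized is called its data structure. Python has built-in structures such as lists and dictionaries, and the pandas library adds the DataFrame.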
#### 1.3.1 List
Lists are ordered, changeable collections of items, written inside square brackets.

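```python
# lists
fruits = ["apple", "pear", "orange", "banana"]  # a list of strings

first_fruit = fruits[0]  # Python is zero-based: the first item has index 0
print(first_fruit)       # apple

fruits.append("watermelon")  # add an item to the end of the list
print(fruits)                # ['apple', 'pear', 'orange', 'banana', 'watermelon']

fruits.remove("pear")  # remove the first item equal to "pear"
print(fruits)          # ['apple', 'orange', 'banana', 'watermelon']
```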
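
Lists support more ways of accessing items than a single index. A short sketch of negative indexing, slicing, `len()`, membership tests with `in`, and replacing an item:

```python
fruits = ["apple", "pear", "orange", "banana"]

print(fruits[-1])        # banana: negative indexes count from the end
print(fruits[1:3])       # ['pear', 'orange']: from index 1 up to, but not including, 3
print(len(fruits))       # 4: number of items
print("pear" in fruits)  # True: membership test

fruits[0] = "kiwi"       # lists are mutable, so an item can be replaced
print(fruits)            # ['kiwi', 'pear', 'orange', 'banana']
```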

#### 1.3.2 Dictionary
Dictionaries store key-value pairs: each value is retrieved by looking up its key.

```python
data = {
    "calories": 100,
    "duration": 10,
}

print(data["calories"])  # look up a value by its key
```
```
100
```


```python
# the value assigned to a key can be a list
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

print(data["calories"][1])  # key lookup, then list index 1
```
```
380
```
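
Dictionaries are read and changed through their keys. This sketch adds a key (the `pulse` values are hypothetical), uses `.get()` to look up a key that may be missing, and loops over key-value pairs with `.items()`.

```python
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

data["pulse"] = [110, 117, 103]  # add a new key (hypothetical values)
print(data.keys())               # the keys currently in the dictionary
print(data.get("steps"))         # None: .get() avoids a KeyError for a missing key
print(data.get("steps", []))     # []: or returns the default you supply

for key, values in data.items():  # loop over key-value pairs
    print(key, values[0])         # first measurement stored under each key
```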


#### 1.3.3 Dataframe
A pandas DataFrame is a two-dimensional data structure, like a table with rows and columns.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# load the data into a DataFrame object
df = pd.DataFrame(data)

print(df)
```
```
   calories  duration
0       420        50
1       380        40
2       390        45
```


```python
# select one column of a DataFrame (returns a pandas Series)
df["calories"]
```
```
0    420
1    380
2    390
Name: calories, dtype: int64
```

```python
# select one row by its index label
df.loc[0]
```
```
calories    420
duration     50
Name: 0, dtype: int64
```

```python
# boolean indexing: keep only the rows where calories > 400
df[df["calories"] > 400]
```
```
   calories  duration
0       420        50
```
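
A few more common DataFrame operations, sketched on the same data: creating a derived column, selecting by position with `.iloc` (as opposed to `.loc`, which selects by index label; here the two happen to coincide), and computing summary statistics. The column name `cal_per_min` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Derived column: arithmetic on whole columns is applied row by row
df["cal_per_min"] = df["calories"] / df["duration"]

print(df.iloc[-1])            # select a row by position (the last row)
print(df.loc[1, "calories"])  # select one value by row label and column name
print(df["calories"].mean())  # a summary statistic for one column
print(df.describe())          # count, mean, std, min, quartiles, max per column
```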


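### 1.4 Conditional Statements & Control Flow Statements

As we write programs, we often need to carry out specific actions only when certain conditions are met. Conditions are written with comparison operators (==, !=, <, <=, >, >=), and conditional statements (if, elif, else) evaluate whether those conditions are True or False to decide which code runs.

Comparison operators can be combined with the logical operators and, or, and not. Control flow statements such as for loops repeat a block of code, for example once for each item in a list.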

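```python
# if / elif / else statement
A = 10
B = 20
if A < B:
    print("A is less than B")
elif A > B:
    print("A is greater than B")
else:
    print("A is equal to B")

# for loop: run the indented block once for each item
iterable = [1, 2, 3]
for x in iterable:
    print(x)
```
```
A is less than B
1
2
3
```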
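
The sketch below combines comparisons with and, or, and not, and adds a `while` loop, another control flow statement that repeats as long as its condition is True. The function name `walk_advice` and its rule are hypothetical.

```python
def walk_advice(temperature, is_raining):
    """Return advice using comparison and logical operators (hypothetical rule)."""
    if temperature > 60 and not is_raining:
        return "Good weather for a walk"
    elif is_raining or temperature < 40:
        return "Stay inside"
    else:
        return "Bring a jacket"

print(walk_advice(72, False))  # Good weather for a walk
print(walk_advice(72, True))   # Stay inside
print(walk_advice(50, False))  # Bring a jacket

# while loop: repeats as long as its condition is True
count = 3
while count > 0:
    print("Countdown:", count)
    count -= 1  # without this line the loop would never end
```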


### 1.5 Functions
Functions are named, reusable blocks of code that carry out a specific set of actions when they are invoked (called). For example, print() is a built-in function. You can call functions in multiple ways.
- The most intuitive way is to write the function name, followed by parentheses.
- Functions that belong to a particular object, called methods, are called with "dot notation": the object, a period, then the function name. For example,
```
target_object.function_name()
```
- Many functions need input values to work with. The values you pass in are called "arguments"; the names that receive them in the function definition are called "parameters". Arguments go inside the parentheses that follow the function name. For example,
```python
print("Hello!")
```
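
A few more calls, as a sketch: `len()` and `round()` are built-in functions called by name, while `upper()` and `append()` are methods called with dot notation. The variable values are made up for illustration.

```python
fruits = ["apple", "pear", "orange"]

# Calling a function by name, with one argument in the parentheses
print(len(fruits))        # 3

# Some functions take several arguments, separated by commas
print(round(3.14159, 2))  # 3.14

# Dot notation: upper() is a method that belongs to string objects
weather = "cloudy"
print(weather.upper())    # CLOUDY

# append() is a method of list objects; it changes the list in place
fruits.append("banana")
print(fruits)             # ['apple', 'pear', 'orange', 'banana']
```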

#### 1.5.1 Create Your Own Function
To create your own function, use the def keyword, followed by the function name, its parameters in parentheses, and a colon. The indented lines below it form the function body: the commands that run each time the function is called.
```python
def name(parameters):
    # code to carry out the desired actions
    pass
```
Functions often also use the **return** keyword to pass an expression, variable, or value back to the code that called them once they have finished running. A function without a return statement returns `None`.
```python
def name(parameters):
    # code to carry out the desired actions
    return desired_expression
```

If your function returns a value, you can store it by assigning the result of the call to a variable:

```python
returned_value = function_name(argument1, argument2)
```
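
As a concrete example, the function below computes calories burned per minute from the workout data used earlier; the name `calories_per_minute` is hypothetical. The second function, `greet` (also hypothetical), shows what a call returns when there is no `return` statement.

```python
def calories_per_minute(calories, duration):
    """Return the calories burned per minute of exercise."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return calories / duration

rate = calories_per_minute(420, 50)  # store the returned value in a variable
print(rate)                          # 8.4

def greet(name):
    print("Hello,", name)  # no return statement

result = greet("Ann")  # prints: Hello, Ann
print(result)          # None: a function without return gives back None
```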


## Section 2: Google Analytics Data Collection

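Google Analytics 4 (GA4) reports combine dimensions, which are attributes of your data (such as city or date), with metrics, which are quantitative measurements (such as activeUsers). To query the GA4 Data API from Python, you need the Google Analytics Data client library and a service-account key file with access to the property. More information about dimensions and metrics can be found [here](https://support.google.com/analytics/answer/9143382?hl=en#zippy=%2Cattribution%2Cdemographics%2Cecommerce%2Cevent%2Cgaming%2Cgeneral%2Cgeography%2Clink%2Cpage-screen%2Cplatform-device%2Cpublisher%2Ctime%2Ctraffic-source%2Cuser%2Cuser-lifetime%2Cvideo%2Cadvertising%2Cpredictive%2Crevenue%2Csearch-console%2Csession).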

```python
# Run once to install the Google Analytics Data API client library:
# !pip3 install google-analytics-data
```

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    RunReportRequest,
)
import os
import pandas as pd

# Path to the service-account key file; the client library reads this variable
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "apt-port-251804-905e08b9e9e3.json"


def sample_run_report(property_id="424145747"):
    """Runs a simple report on a Google Analytics 4 property."""
    client = BetaAnalyticsDataClient()

    request = RunReportRequest(
        property=f"properties/{property_id}",
        # Other dimensions, e.g. Dimension(name="browser"), can be added to this list
        dimensions=[Dimension(name="city"), Dimension(name="date")],
        metrics=[Metric(name="activeUsers")],
        date_ranges=[DateRange(start_date="2024-01-01", end_date="today")],
    )
    return client.run_report(request)


def response_to_df(response):
    """Convert a report response into a pandas DataFrame.

    Note: every dimension and metric value is returned as a string.
    """
    # Column names: dimensions first, then metrics (same order as in each row)
    columns = [header.name for header in response.dimension_headers]
    columns += [header.name for header in response.metric_headers]

    rows = []
    for row_data in response.rows:
        row = [val.value for val in row_data.dimension_values]
        row += [val.value for val in row_data.metric_values]
        rows.append(row)
    return pd.DataFrame(rows, columns=columns)


response = sample_run_report(property_id="424145747")
df = response_to_df(response)

print(df)
```

```
               city      date activeUsers
0         (not set)  20240203           3
1            Dallas  20240221           2
2         (not set)  20240124           1
3         Abbeville  20240203           1
4         Abbeville  20240204           1
5       Bloomington  20240124           1
6            Dallas  20240216           1
7            Dallas  20240220           1
8        Richardson  20240120           1
9        Richardson  20240124           1
10       Richardson  20240130           1
11       Richardson  20240203           1
12       Richardson  20240221           1
13  University Park  20240202           1
```
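
Because the API returns every value as a string, a natural next step is to convert the columns to proper types and summarize them. This sketch uses a few rows copied from the report above; the name `users_by_city` is hypothetical. One caveat: summing daily activeUsers counts a user who returns on several days once per day, so the total is an upper bound on distinct users, not a distinct-user count.

```python
import pandas as pd

# A few rows copied from the report printed above; like the API response,
# every value starts out as a string.
df = pd.DataFrame({
    "city": ["Dallas", "Dallas", "Dallas", "Richardson", "Richardson"],
    "date": ["20240221", "20240216", "20240220", "20240120", "20240124"],
    "activeUsers": ["2", "1", "1", "1", "1"],
})

# Convert strings to proper types: dates arrive in YYYYMMDD form
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df["activeUsers"] = pd.to_numeric(df["activeUsers"])

# Sum of the daily activeUsers counts per city, largest first
users_by_city = df.groupby("city")["activeUsers"].sum().sort_values(ascending=False)
print(users_by_city)
```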


## Section 3: Statistical Tests

### 3.1 Independent t-test (two-sample t-test)
Used to compare the means of two independent groups.

### 3.2 Paired t-test
Used to compare the means of the same group at two different times or under two different conditions.
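
The notebook's Google Analytics example tests two independent groups, so here is a separate minimal sketch of the paired test using SciPy's `stats.ttest_rel`. The before/after numbers are hypothetical, made up only for illustration. A paired t-test is equivalent to a one-sample t-test on the per-item differences, which the last two lines check.

```python
from scipy import stats

# Hypothetical data: daily active users on the same five pages,
# measured before and after a redesign (made-up numbers).
before = [12, 15, 9, 20, 14]
after = [14, 18, 10, 22, 15]

# ttest_rel pairs before[i] with after[i]
t_stat, p_value = stats.ttest_rel(before, after)
print(f"T-statistic: {t_stat:.3f}, P-value: {p_value:.3f}")

# Same statistic as a one-sample t-test of the differences against 0
differences = [b - a for b, a in zip(before, after)]
print(stats.ttest_1samp(differences, 0).statistic)
```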

```python
from scipy import stats

# The API returns metric values as strings, so convert them to integers first
df["activeUsers"] = df["activeUsers"].astype(int)

# Independent two-sample t-test: Richardson vs. all other cities
group_a = df.loc[df["city"] == "Richardson", "activeUsers"]
print(group_a)
group_b = df.loc[df["city"] != "Richardson", "activeUsers"]
print(group_b)

# Perform the t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Output the results
print(f"T-statistic: {t_stat}, P-value: {p_value}")
```

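```
8     1
9     1
10    1
11    1
12    1
Name: activeUsers, dtype: int32
0     3
1     2
2     1
3     1
4     1
5     1
6     1
7     1
13    1
Name: activeUsers, dtype: int32
T-statistic: -1.035098339013531, P-value: 0.3210327709321358
```

With a p-value of about 0.32, well above the conventional 0.05 threshold, this test gives no evidence that Richardson's daily active-user counts differ from those of the other cities. With only 5 and 9 observations, however, the test has very little power to detect a difference.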
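
To see what `stats.ttest_ind` computes by default (Student's t-test, which assumes equal variances), the sketch below recomputes the statistic from the pooled-variance formula $t = (\bar{x}_a - \bar{x}_b) / (s_p\sqrt{1/n_a + 1/n_b})$, using the group values printed above. Because Richardson's values have zero variance and both samples are small, the equal-variance assumption is doubtful here; Welch's t-test (`equal_var=False`) relaxes it and is shown for comparison. The helper `pooled_t_statistic` is hypothetical.

```python
import math
from scipy import stats

# Daily activeUsers values printed above (Richardson vs. all other cities)
group_a = [1, 1, 1, 1, 1]
group_b = [3, 2, 1, 1, 1, 1, 1, 1, 1]

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance (equal_var=True)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    # Sample variances divide by n - 1
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(pooled_t_statistic(group_a, group_b))  # same statistic as stats.ttest_ind
# Welch's t-test drops the equal-variance assumption
print(stats.ttest_ind(group_a, group_b, equal_var=False))
```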
