In [35]:
import pandas as pd
import sqlite3

### Create a connection to the database using the library sqlite3

In [36]:
con = sqlite3.connect('../data/checking-logs.sqlite')

In [37]:
cursor = con.cursor()
cursor.execute('SELECT name FROM sqlite_master WHERE type="table"')
print(cursor.fetchall())

[('pageviews',), ('checker',), ('deadlines',), ('datamart',), ('test',), ('control',)]


## Using only one query for each of the groups, create two dataframes: `test_results` and `control_results` with the columns `time` and `avg_diff` and only *two rows*

- `time` should have the values: `after` and `before`
- `avg_diff` contains the *average delta* among all the users for the time period ***before*** each of them made their first visit to the page and ***after***ward
- only take into account the users that have observations before and after

### `test` table info

In [38]:
pd.read_sql('SELECT * FROM test LIMIT 2', con)

Unnamed: 0,index,uid,labname,first_commit_ts,first_view_ts
0,0,user_17,project1,2020-04-18 07:56:45.408648,2020-04-18 10:56:55.833899
1,1,user_30,laba04,2020-04-18 13:36:53.971502,2020-04-17 22:46:26.785035


### `deadlines` table info

In [39]:
pd.read_sql('SELECT * FROM deadlines LIMIT 2', con)

Unnamed: 0,index,labs,deadlines
0,0,laba04,1587945599
1,1,laba04s,1587945599


---
### Creating  `test_results` table

> We want to calculate the `average delta` (first commit - deadline) ***before*** that timestamp and ***after*** that timestamp (`first_view_ts`)

In [40]:
query = """
SELECT
    'before' AS time,
    AVG(delta_before) as before
FROM
    (SELECT
        CAST((julianday(t.first_commit_ts) - julianday(d.deadlines, 'unixepoch')) * 24 AS INTEGER) as delta_before
    FROM
        test t
    LEFT JOIN
        deadlines d
    ON
        t.labname = d.labs
        WHERE
        t.labname != 'project1'
        AND
        t.first_commit_ts < first_view_ts
    )

UNION

SELECT
    'after' AS time,
    AVG(delta_after) as after
FROM
    (SELECT
        CAST((julianday(t.first_commit_ts) - julianday(d.deadlines, 'unixepoch')) * 24 AS INTEGER) as delta_after
    FROM
        test t
    LEFT JOIN
        deadlines d
    ON
        t.labname = d.labs
        WHERE
        t.labname != 'project1'
        AND
        t.first_commit_ts > first_view_ts
    )
"""
test_results = pd.read_sql(query, con)
test_results

Unnamed: 0,time,before
0,after,-103.40625
1,before,-60.5625


---
### Creating  `control_results` table

In [41]:
query = """
SELECT
    'before' AS time,
    AVG(delta_before) as before
FROM
    (SELECT
        CAST((julianday(t.first_commit_ts) - julianday(d.deadlines, 'unixepoch')) * 24 AS INTEGER) as delta_before
    FROM
        control t
    LEFT JOIN
        deadlines d
    ON
        t.labname = d.labs
        WHERE
        t.labname != 'project1'
        AND
        t.first_commit_ts < first_view_ts
    )

UNION

SELECT
    'after' AS time,
    AVG(delta_after) as after
FROM
    (SELECT
        CAST((julianday(t.first_commit_ts) - julianday(d.deadlines, 'unixepoch')) * 24 AS INTEGER) as delta_after
    FROM
        test t
    LEFT JOIN
        deadlines d
    ON
        t.labname = d.labs
        WHERE
        t.labname != 'project1'
        AND
        t.first_commit_ts > first_view_ts
    )
"""
control_results = pd.read_sql(query, con)
control_results

Unnamed: 0,time,before
0,after,-103.40625
1,before,-99.464286


---
## Closing connection

In [42]:
con.close()

## Did the hypothesis turn out to be true and the page does affect the students’ behavior?

In short, `yes`.\
Our hypothesis saying *the page has positive effect on students activity on projects* is *true*, because the difference in deltas between test and control groups significantly decreased. As a result control group has almost no after-before difference.