<h1 style="display:none;">Test</h1>
# Introduction to Databases: Query and Web Application


## Web and REST Applications: Introduction

### Overview

__Simple Web Application__
<br><br><br>
<img src="../images/webapp.jpeg">
<br><br>
- The web browser (Chrome, Internet Explorer/Edge, Firefox, Safari, ...)
    - Retrieves files via Hypertext Transfer Protocol (HTTP) from a web server
    - Common file types
        - HTML
        - JPEG
        - CSS
        - JavaScript
    - Assembles the files into web pages that it renders.
    - Some of the JavaScript files make [Representational State Transfer (REST)](https://en.wikipedia.org/wiki/Representational_state_transfer) API calls that
        - Retrieve data (JSON) over HTTP that plugs into the web pages.
        - Create, update, and delete data by sending HTTP requests.
    - [AngularJS](https://angularjs.org/) is a common library and framework for building browser UIs.
    
    
- The [web application server](https://en.wikipedia.org/wiki/Application_server#Application_Server_definition)
    - Receives HTTP requests.
    - Executes programs/functions that implement application logic, and may access one or more databases.
    - Returns response codes and optional JSON data.
    - NodeJS, J2EE, etc. are examples of web application servers.
    
    
- Mobile devices also use the REST API. The mobile application may be
    - Simply a browser.
    - A [native mobile application](https://www.techopedia.com/definition/27568/native-mobile-app) installed on the device.
<br><br>   

__Slightly More Realistic Web Application__

- Web application software is usually very complex.

- This diagram is also simplistic

<br><br>
<img src="../images/realwebapp.jpeg">
<br><br>
- The system and deployment topology is usually very complex.

- This diagram is also simplistic

<img src="../images/webapptopo.jpeg">

### REST

<br><br>
<img src="../images/rest1.jpg" width="125%">
<br><br>

- In this example, there is a set of _resources_
    - A collection/set of device descriptions, e.g. model, owner, purchases, ...
    - Device description instances, e.g. device 1, device 2, ...
    
    
- The REST versions of the create, retrieve, update, and delete operations are:
    - POST (create)
    - GET (retrieve)
    - PUT (update)
    - DELETE (delete)
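
As a rough illustration of this mapping, the sketch below pairs each operation with its HTTP method and, as an added assumption that the diagram does not show, the SQL statement that typically implements it on the database side. The `CRUD_TO_REST` table, the `describe` helper, and the `/devices/1` path are hypothetical names chosen for this example.

```python
# Illustrative mapping only; these names are not part of any library.
CRUD_TO_REST = {
    "create": {"http_method": "POST", "sql": "INSERT"},
    "retrieve": {"http_method": "GET", "sql": "SELECT"},
    "update": {"http_method": "PUT", "sql": "UPDATE"},
    "delete": {"http_method": "DELETE", "sql": "DELETE"},
}


def describe(operation, resource_path):
    """Return a one-line summary of how an operation is expressed in REST."""
    entry = CRUD_TO_REST[operation]
    return f"{entry['http_method']} {resource_path} -> SQL {entry['sql']}"


# Retrieving one device description instance.
print(describe("retrieve", "/devices/1"))
```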
    
### Our Mission and 1st Set of Projects

- Application
    - Implement a simple REST interface for [Lahman's Baseball Database](http://www.seanlahman.com/baseball-archive/statistics/)
    - Build a simple UI, but this will be a minor focus.
    
- REST API implementation and web UI implementation are complete courses themselves.


- Focus on read-GET-SELECT.


- Come to understand
    - Relational selection, projection, and join in a simple example context.
    - Relational data modeling and notation.
    - SQL SELECT and JOIN
    - SQL keys and indexes
    - SQL views
    - GROUP BY
    - ORDER BY
    - HAVING


- My sample code will use [Django](https://www.djangoproject.com/), but I will provide help if you choose to use JavaScript/NodeJS, J2EE/Tomcat, etc.
<br><br>
<img src="../images/django-architecture.jpg">
    

## Scenario I -- Basic Player Information

### Overview

<img src="../images/scenario1topo.jpeg">

#### Logical System Structure

- The system will have the following subsystems
    - Web content (HTML, CSS, JPEG, ...) on the filesystem
    - A Django (or NodeJS, J2EE, ...) server that delivers static content and implements the REST API.
    - MySQL Server database with tables for the application's data.
    - Web browser interface.
    
    
- The physical topology is that all software runs on a single user machine, i.e. your development laptop.

#### Functionality and Use Cases

- Initially the application is _read only._
    - Users may query and view information about players and statistics.
    - All data is preloaded into the database using data import capabilities.
    
    
- Some examples of supported queries are:
    - Find a player by ID and display basic name, birth country, etc.
    - Find all players matching a search condition, e.g. name, country of birth, etc.
    - Display a player's career statistics and averages.
    
    We will expand the supported use cases incrementally, learning new data modeling, relational and SQL concepts.


### Data Model

#### Conceptual Data Model

<img src="../images/scenario1-conceptual1.jpeg">

- There are three entities
    - _Master_ represents information about an individual in the database.
    - _Appearances_ represents information about a person's appearances in games.
    - _Batting_ represents information about a player's batting for teams and seasons.
   

- Reminder
<br><br>
<img src="../images/conceptuallogicalphysical.jpeg">


- How do you identify entity types/entities? From http://www.agiledata.org/essays/dataModeling101.html
    - "An entity type, also simply called entity (not exactly accurate terminology, but very common in practice), is similar conceptually to object-orientation’s concept of a class – an entity type represents a collection of similar objects.  An entity type could represent a collection of people, places, things, events, or concepts. Examples of entities in an order entry system would include Customer, Address, Order, Item, and Tax. If you were class modeling you would expect to discover classes with the exact same names. However, the difference between a class and an entity type is that classes have both data and behavior whereas entity types just have data. 
    - "Ideally an entity should be normal, the data modeling world’s version of cohesive. A normal entity depicts one concept, just like a cohesive class models one concept. For example, customer and order are clearly two different concepts; therefore it makes sense to model them as separate entities." 


- How do you identify relationships?
    - "In the real world entities have relationships with other entities.  For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. Place, live at, and are part of are all terms that define relationships between entities.  The relationships between entities are conceptually identical to the relationships (associations) between objects."  
 

#### Logical Data Model

##### Overview

- The logical data model requires adding:
    - Attributes
    - Primary Keys
    - Foreign Keys
    
##### Attributes

Identifying attributes (http://www.agiledata.org/essays/dataModeling101.html)

- "Each entity type will have one or more data attributes.  For example, 
    - ... [a] Customer entity has attributes such as First Name and Surname and ... 
    - the TCUSTOMER table had corresponding data columns CUST_FIRST_NAME and CUST_SURNAME (a column is the implementation of a data attribute within a relational database). 
    
    
- Attributes should also be cohesive from the point of view of your domain, something that is often a judgment call. ... ... 
    - we decided that we wanted to model the fact that people had both first and last names instead of just a name (e.g. “Scott" and “Ambler" vs. “Scott Ambler")
    - we did not distinguish between the sections of an American zip code (e.g. 90210-1234-5678).
    
    
- Getting the level of detail right can have a significant impact on your development and maintenance efforts.
    - Refactoring a single data column into several columns can be difficult, ...
    - over-specifying an attribute (e.g. having three attributes for zip code when you only needed one) can result in overbuilding your system and hence you incur greater development and maintenance costs than you actually needed.
    

- In our scenario,
    - We were given the data, which partially defined the attributes.
    - We could have, and will, re-factor how the given data fits into a good data model.

<br><br>
    

<img src="../images/masterlogical.jpeg">
<br>

<img src="../images/battinglogical.jpeg" width="80%">

<img src="../images/appearanceslogical.jpeg">

##### Keys and Primary Keys

Ramakrishnan and Gehrke, 2.4.1, 3.2

_Relational Theory_

An (entity) key is a set of attributes that uniquely identifies an entity in an entity set. Entity keys can be _super,_ _candidate_ or _primary._
- _Super key:_ A set of one or more attributes that together uniquely identify an entity in an entity set.
- _Candidate key:_ A minimal super key, meaning that removing any attribute leaves a set that is no longer a super key. An entity set may have more than one candidate key.
- _Primary key:_ A candidate key chosen by the database designer to uniquely identify entities in the entity set.
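
To make these definitions concrete, here is a minimal sketch that tests them against a handful of rows. The helper names (`is_super_key`, `candidate_keys`) and the sample rows are invented for illustration. Keep in mind that a key is a property of the domain, not of one data set: sample data can show that a set of attributes is _not_ a key, but it cannot prove that it always will be.

```python
from itertools import combinations


def is_super_key(rows, attributes):
    """True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, all_attributes):
    """Return the minimal super keys, smallest attribute sets first."""
    found = []
    for size in range(1, len(all_attributes) + 1):
        for attrs in combinations(all_attributes, size):
            # A superset of a known candidate key is a super key, but not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found


# Invented rows: player p1 has three stints in 2001, two of them with team AAA.
rows = [
    {"playerID": "p1", "yearID": 2001, "stint": 1, "teamID": "AAA"},
    {"playerID": "p1", "yearID": 2001, "stint": 2, "teamID": "BBB"},
    {"playerID": "p1", "yearID": 2001, "stint": 3, "teamID": "AAA"},
    {"playerID": "p1", "yearID": 2002, "stint": 1, "teamID": "AAA"},
    {"playerID": "p2", "yearID": 2001, "stint": 1, "teamID": "AAA"},
]
attributes = ["playerID", "yearID", "stint", "teamID"]

print(candidate_keys(rows, attributes))
```

All four attributes together form a super key for these rows, but that set is not minimal, so it is not reported as a candidate key.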

In our data model,
- _Master_ primary key is _playerID_
- _Batting_ is more complicated.
    - _playerID_ does not uniquely identify a row. Players play for many years.
    - _(playerID, yearID)_ does not uniquely identify a row. A player could get traded, and play for more than a single team in a year.
    - No problem, we can use _(playerID, yearID, teamID)._ But, a player can have more than one stint with a team in a year.
    - The answer is _(playerID, yearID, stint)._
        - How do I know this? I understand baseball.
        - What if you or I do not understand the domain? We are typically working with a domain expert and these decisions are part of collaborative design in the logical modeling phase.
- _Appearances_ primary key is _(playerID, yearID, teamID)._

_An aside:_ I ran the following queries for batting
```
-- (1) What is the maximum number of rows for a given playerID? Also, look up the names.
SELECT playerID,
    (SELECT nameLast FROM Master WHERE Master.playerID = Batting.playerID) AS nameLast,
    (SELECT nameFirst FROM Master WHERE Master.playerID = Batting.playerID) AS nameFirst,
    COUNT(*) AS row_count
FROM Batting
GROUP BY playerID, nameFirst, nameLast
ORDER BY row_count DESC LIMIT 1;

-- (2) What is the maximum number of rows if I try playerID and yearID for a primary key? Also, look up the names.
SELECT playerID,
    (SELECT nameLast FROM Master WHERE Master.playerID = Batting.playerID) AS nameLast,
    (SELECT nameFirst FROM Master WHERE Master.playerID = Batting.playerID) AS nameFirst,
    COUNT(*) AS row_count
FROM Batting
GROUP BY playerID, yearID
ORDER BY row_count DESC LIMIT 1;

-- (3) Same question using playerID, yearID, teamID. Also, look up the names.
SELECT playerID,
    (SELECT nameLast FROM Master WHERE Master.playerID = Batting.playerID) AS nameLast,
    (SELECT nameFirst FROM Master WHERE Master.playerID = Batting.playerID) AS nameFirst,
    COUNT(*) AS row_count
FROM Batting
GROUP BY playerID, yearID, teamID
ORDER BY row_count DESC LIMIT 1;

-- (4) Same question using playerID, yearID, stint. Also, look up the names.
SELECT playerID,
    (SELECT nameLast FROM Master WHERE Master.playerID = Batting.playerID) AS nameLast,
    (SELECT nameFirst FROM Master WHERE Master.playerID = Batting.playerID) AS nameFirst,
    COUNT(*) AS row_count
FROM Batting
GROUP BY playerID, yearID, stint
ORDER BY row_count DESC LIMIT 1;
```

These queries returned the following information.

| Query No. | Possible Key             | playerID  | last name | first name | row count |
|-----------|--------------------------|-----------|-----------|------------|-----------|
| 1         | playerID                 | mcguide01 | McGuire   | Deacon     | 31        |
| 2         | playerID, yearID         | chouife01 | Chouinard | Felix      | 5         |
| 3         | playerID, yearID, teamID | chouife01 | Chouinard | Felix      | 3         |
| 4         | playerID, yearID, stint  | zay01     | Zay       | William    | 1         |

What do the queries do? For each possible key combination, the query
- Groups the rows by that combination of columns.
- Counts the number of rows in each group, i.e. the rows that share the same key values.
- Returns the largest count.
- Provides information about one of the rows with the largest count.

Do not worry if you do not understand these queries, _you will!_ But, the queries verify that _(playerID,yearID,stint)_ uniquely identifies a row/entry: the largest row count for that combination is 1.
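
If you want to experiment with this check without the full database, the following sketch runs the same kind of GROUP BY query against a tiny in-memory table. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the rows and the `max_rows_per_key` helper are invented for illustration, so the counts differ from the table above.

```python
import sqlite3


def max_rows_per_key(conn, table, key_columns):
    """Largest number of rows that share one combination of key values."""
    columns = ", ".join(key_columns)
    # Table and column names are trusted constants here, never user input.
    sql = (
        f"SELECT COUNT(*) AS row_count FROM {table} "
        f"GROUP BY {columns} ORDER BY row_count DESC LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, stint INTEGER, teamID TEXT)"
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?)",
    [
        ("p1", 2001, 1, "AAA"),
        ("p1", 2001, 2, "BBB"),
        ("p1", 2001, 3, "AAA"),
        ("p1", 2002, 1, "AAA"),
        ("p2", 2001, 1, "AAA"),
    ],
)

possible_keys = [
    ("playerID",),
    ("playerID", "yearID"),
    ("playerID", "yearID", "teamID"),
    ("playerID", "yearID", "stint"),
]
results = {key: max_rows_per_key(conn, "Batting", key) for key in possible_keys}
for key, count in results.items():
    print(key, count)
```

A count of 1 means that no two rows in the sample share the same values for that combination of columns.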

##### Foreign Keys

Ramakrishnan and Gehrke, section 3.2.2

"In the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key or a unique key in the first table." (https://en.wikipedia.org/wiki/Foreign_key)

There are at least two perspectives on _foreign key:_
1. Foreign keys implement an _integrity constraint,_ which we will cover later. A tuple in one table can exist only if its foreign key matches a primary key in another table.
2. Foreign keys define _Relationships._ I can use a foreign key to find tuples in two different tables that are related.

In our simple example
- Batting.playerID is a foreign key for Master.playerID.
- Appearances.playerID is a foreign key for Master.playerID.
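
Both perspectives can be tried out in a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL, and the tables are cut down to a few columns with invented rows, so treat it as an illustration rather than the course schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Master (playerID TEXT PRIMARY KEY, nameLast TEXT)")
conn.execute(
    """
    CREATE TABLE Batting (
        playerID TEXT NOT NULL,
        yearID INTEGER NOT NULL,
        stint INTEGER NOT NULL,
        H INTEGER,
        PRIMARY KEY (playerID, yearID, stint),
        FOREIGN KEY (playerID) REFERENCES Master (playerID)
    )
    """
)
conn.execute("INSERT INTO Master VALUES ('p1', 'Example')")
conn.execute("INSERT INTO Batting VALUES ('p1', 2001, 1, 10)")

# Perspective 1: integrity constraint. A Batting row for an unknown player is rejected.
try:
    conn.execute("INSERT INTO Batting VALUES ('nobody', 2001, 1, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("Insert for unknown player rejected:", rejected)

# Perspective 2: relationship. The foreign key lets us find related tuples.
related = conn.execute(
    "SELECT Master.nameLast, Batting.yearID, Batting.H "
    "FROM Master JOIN Batting ON Master.playerID = Batting.playerID"
).fetchall()
print(related)
```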

We will see this in more detail in future lectures.


#### SQL Data Model (Our Physical Model)

The physical model requires that we add the following:


- The create table DDL statement
    - Table names
    - Column names
    - Column data types
    
    
- Instead of drawing a diagram, we will define the physical model directly in SQL DDL.

```
CREATE TABLE `Master` (
  `playerID` varchar(255) NOT NULL,
  `birthYear` int(11) DEFAULT NULL,
  `birthMonth` int(11) NOT NULL,
  `birthDay` int(11) DEFAULT NULL,
  `birthCountry` varchar(255) DEFAULT NULL,
  `birthState` varchar(255) DEFAULT NULL,
  `birthCity` varchar(255) DEFAULT NULL,
  `deathYear` varchar(255) DEFAULT NULL,
  `deathMonth` varchar(255) DEFAULT NULL,
  `deathDay` varchar(255) DEFAULT NULL,
  `deathCountry` varchar(255) DEFAULT NULL,
  `deathState` varchar(255) DEFAULT NULL,
  `deathCity` varchar(255) DEFAULT NULL,
  `nameFirst` varchar(255) NOT NULL,
  `nameLast` varchar(255) NOT NULL,
  `nameGiven` varchar(255) DEFAULT NULL,
  `weight` int(11) DEFAULT NULL,
  `height` int(11) DEFAULT NULL,
  `bats` varchar(255) DEFAULT NULL,
  `throws` varchar(255) DEFAULT NULL,
  `debut` varchar(255) DEFAULT NULL,
  `finalGame` varchar(255) DEFAULT NULL,
  `retroID` varchar(255) DEFAULT NULL,
  `bbrefID` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`playerID`),
  KEY `player_idx` (`playerID`),
  KEY `name_l` (`nameLast`),
  KEY `name_f` (`nameFirst`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `Appearances` (
  `yearID` int(11) NOT NULL,
  `teamID` varchar(255) NOT NULL,
  `lgID` varchar(255) DEFAULT NULL,
  `playerID` varchar(255) NOT NULL,
  `G_all` int(11) DEFAULT NULL,
  `GS` varchar(255) DEFAULT NULL,
  `G_batting` int(11) DEFAULT NULL,
  `G_defense` int(11) DEFAULT NULL,
  `G_p` int(11) DEFAULT NULL,
  `G_c` int(11) DEFAULT NULL,
  `G_1b` int(11) DEFAULT NULL,
  `G_2b` int(11) DEFAULT NULL,
  `G_3b` int(11) DEFAULT NULL,
  `G_ss` int(11) DEFAULT NULL,
  `G_lf` int(11) DEFAULT NULL,
  `G_cf` int(11) DEFAULT NULL,
  `G_rf` int(11) DEFAULT NULL,
  `G_of` int(11) DEFAULT NULL,
  `G_dh` varchar(255) DEFAULT NULL,
  `G_ph` varchar(255) DEFAULT NULL,
  `G_pr` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`yearID`,`teamID`,`playerID`),
  UNIQUE KEY `ux` (`playerID`,`teamID`,`yearID`),
  KEY `player_idx` (`playerID`),
  KEY `year_idx` (`yearID`) USING BTREE,
  CONSTRAINT `playerID` FOREIGN KEY (`playerID`) REFERENCES `Master` (`playerID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `Batting` (
  `playerID` varchar(255) NOT NULL,
  `yearID` int(11) NOT NULL,
  `stint` int(11) NOT NULL,
  `teamID` varchar(255) DEFAULT NULL,
  `lgID` varchar(255) DEFAULT NULL,
  `G` int(11) DEFAULT NULL,
  `AB` int(11) DEFAULT NULL,
  `R` int(11) DEFAULT NULL,
  `H` int(11) DEFAULT NULL,
  `2B` int(11) DEFAULT NULL,
  `3B` int(11) DEFAULT NULL,
  `HR` int(11) DEFAULT NULL,
  `RBI` int(11) DEFAULT NULL,
  `SB` int(11) DEFAULT NULL,
  `CS` int(11) DEFAULT NULL,
  `BB` int(11) DEFAULT NULL,
  `SO` int(11) DEFAULT NULL,
  `IBB` varchar(255) DEFAULT NULL,
  `HBP` varchar(255) DEFAULT NULL,
  `SH` varchar(255) DEFAULT NULL,
  `SF` varchar(255) DEFAULT NULL,
  `GIDP` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`playerID`,`yearID`,`stint`),
  CONSTRAINT `batting_player` FOREIGN KEY (`playerID`) REFERENCES `Master` (`playerID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
```

- The core SQL column types are
<br><br>
<img src="../images/datatypes.jpg">


- All database management systems significantly extend the set of data/column types.


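- The type and length options play a significant role in the database management system's optimization of storage use, which we will cover in future lectures.
    - INT(8), INT(11), TINYINT, ... (in MySQL, the number in parentheses for integer types is only a display width; the storage size depends on the type, e.g. TINYINT vs. INT)
    - VARCHAR(16), VARCHAR(1024), ...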
    

- If we have the DDL defined and have created the tables, we can _reverse engineer_ the physical model.


- The line endings in the diagram have very precise meanings, which we will cover in future lectures.


<img src="../images/scenario1physical.jpeg">
    

#### Web Resource Model

We plan to start by retrieving and displaying information about players. Our web resource model (URLs) will be
- `/players?<query>` to find players matching a template.
- `/players/playerID` to find a specific player.
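
A minimal sketch of how a server might tell the two URL forms apart is shown below. The `parse_player_url` helper is hypothetical, and the idea that query parameters are named after columns (e.g. `nameLast`) is an assumption made for this example, not something fixed by the resource model.

```python
from urllib.parse import parse_qsl, urlparse


def parse_player_url(url):
    """Classify a players URL as a single instance or a template query."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if "players" not in segments:
        raise ValueError("Not a players resource: " + url)
    rest = segments[segments.index("players") + 1:]
    if rest:
        # /players/playerID identifies one specific player.
        return {"kind": "instance", "playerID": rest[0]}
    # /players?<query> carries a template of column=value pairs.
    return {"kind": "collection", "template": dict(parse_qsl(parsed.query))}


print(parse_player_url("http://localhost:8000/baseball/api/players/napolmi01"))
print(parse_player_url(
    "http://localhost:8000/baseball/api/players?nameLast=Williams&birthCountry=USA"
))
```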
    
_Example_

GET http://localhost:8000/baseball/api/players/napolmi01

Returns

```
{
    "playerID": "napolmi01",
    "birthYear": 1981,
    "birthMonth": "10",
    "birthDay": 31,
    "birthCountry": "USA",
    "birthState": "FL",
    "birthCity": "Hollywood",
    "deathYear": "",
    "deathMonth": "",
    "deathDay": "",
    "deathCountry": "",
    "deathState": "",
    "deathCity": "",
    "nameFirst": "Mike",
    "nameLast": "Napoli",
    "nameGiven": "Michael Anthony",
    "weight": 225,
    "height": 73,
    "bats": "R",
    "throws": "R",
    "debut": "2006-05-04",
    "finalGame": "2016-10-02",
    "retroID": "napom001",
    "bbrefID": "napolmi01"
}
```

<img src="../images/playergetpostman.jpeg">


### Scenario I -- Implementation

#### A Warning

<img src="../images/simplecode1.jpeg">

#### Supporting the Path /players/playerID

##### Data Access

The basic pattern is _retrieve_by_id_ with parameters:
- The resource type (table)
- The column name for the primary key (for player this is a simple column)
- Key value

So,
- We have to map the request GET /api/players/napolmi01 to retrieve_by_id("Master","playerID","napolmi01")
- And then to SELECT * FROM Master WHERE playerID='napolmi01'
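
The two mapping steps can be sketched as follows. The `RESOURCE_TABLES` lookup and the `map_get_to_query` helper are hypothetical names. Unlike the literal SELECT above, this sketch builds the statement with a `%s` placeholder, the parameter style that PyMySQL accepts, so the key value is passed separately from the SQL text.

```python
# Assumed lookup from resource name to (table, primary key column).
RESOURCE_TABLES = {"players": ("Master", "playerID")}


def map_get_to_query(path):
    """Map a path such as /api/players/napolmi01 to retrieve_by_id arguments and SQL."""
    segments = [s for s in path.split("/") if s]
    resource, key_value = segments[-2], segments[-1]
    table, key_column = RESOURCE_TABLES[resource]
    # Table and column names come from our own lookup, not from the request.
    sql = f"SELECT * FROM {table} WHERE {key_column} = %s"
    return (table, key_column, key_value), sql, (key_value,)


args, sql, params = map_get_to_query("/api/players/napolmi01")
print(args)
print(sql, params)
```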

Some simple code is:

```
import json

import pymysql.cursors


def connect():
    # Connection settings for the local development database.
    connection = pymysql.connect(host='localhost',
                                 user='dbuser',
                                 password='dbuser',
                                 db='lahman2016',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    return connection


def disconnect(c):
    c.close()


def retrieve_by_id(table, attribute, id):
    # Returns the matching row as a dict, or {"data": "Not Found"}.
    connection = None
    result = {"data": "Not Found"}
    try:
        print("DEBUG: table = ", table)
        print("DEBUG: attribute = ", attribute)
        print("DEBUG: id = ", id)

        connection = connect()

        with connection.cursor() as cursor:
            # Read a single record. The key value is passed as a query
            # parameter so that the driver escapes it. Table and column
            # names cannot be parameters, so they must come from our own
            # code and never from user input.
            sql = "SELECT * FROM " + table
            sql = sql + " WHERE "
            sql = sql + attribute + "=%s;"
            print("DEBUG: SQL = ", sql)
            cursor.execute(sql, (id,))
            row = cursor.fetchone()
            if row is not None:
                result = row
            print(result)
    except Exception as e:
        print("Something happened: ", e)
    finally:
        # Close the connection only if it was actually opened.
        if connection is not None:
            disconnect(connection)

    return result

#print("Ran")
# And a test would be
#player = retrieve_by_id("Master", "playerID", "napolmi01")

#print("\n\n Test Result ")
#print("The player with playerID = napolmi01 is")
#print(json.dumps(player, indent=4, sort_keys=True))
```


##### Business Logic

- The following are some examples of shortcuts.


- Business logic should not be handling [Django HttpRequest](https://docs.djangoproject.com/en/2.0/ref/request-response/)


- Business logic is independent of the protocol for reaching it (the binding). The binding could be many things.
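
One way to picture this separation is the sketch below. All names (`get_player`, `fake_retrieve`, `http_binding`) and the status-code convention are assumptions made for illustration. The business function takes plain Python values and a data access function, so it never sees an HTTP request object. The binding is the only part that knows about methods and paths, and a Django view could play the same role.

```python
def get_player(player_id, retrieve):
    """Business logic: plain values in, plain dict out, no HTTP types."""
    if not player_id:
        return {"status": 400, "body": {"message": "playerID is required"}}
    row = retrieve("Master", "playerID", player_id)
    if row is None:
        return {"status": 404, "body": {"message": "Not Found"}}
    return {"status": 200, "body": row}


def fake_retrieve(table, attribute, key):
    """Stand-in for the real data access function, backed by a dict."""
    data = {
        ("Master", "playerID", "napolmi01"): {
            "playerID": "napolmi01",
            "nameLast": "Napoli",
        }
    }
    return data.get((table, attribute, key))


def http_binding(method, path):
    """One possible binding: translate an HTTP-style call into business logic."""
    if method != "GET":
        return {"status": 405, "body": {"message": "Method not allowed"}}
    player_id = path.rstrip("/").split("/")[-1]
    return get_player(player_id, fake_retrieve)


print(http_binding("GET", "/api/players/napolmi01"))
```

Because `get_player` does not depend on the binding, the same function could be called from a command-line tool or a test without any web framework installed.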

_Poorly designed business logic_

```
from django.http import JsonResponse
import os
# retrieve_by_id is the data access function shown earlier. In a Django
# project it would be imported from the data access module, e.g.
#from . import dataccess


# Placeholder error body. The notes do not show this helper, so this
# minimal version is an assumption.
def error():
    return {"message": "Unsupported request"}


# Entry point for this module.
# Examines the request and routes to proper handler.
# This is really pretty generic and does not need to be in type
# specific Python files, e.g. players.py
#
# request is of type https://docs.djangoproject.com/en/2.0/ref/request-response/
def index(request):

    if request.method == 'GET':
        rsp = handle_get(request)
    else:
        # POST, PUT and DELETE are not supported yet because the
        # application is read only. Any other method is also an error.
        rsp = error()

    return JsonResponse(rsp)


# This currently only works for /players/playerID
# We will need to generalize for queries and for other resources
#
# Note that this function works directly on the Django HttpRequest,
# which is the shortcut that the bullets above warn about.
def handle_get(request):

    # The path to the resource will be of the form
    # /players/playerID
    try:
        print("Request = ", request)
        print("Request.GET = ", request.GET)
        # Strip a trailing slash so that basename returns the ID.
        id = os.path.basename(request.path.rstrip("/"))
        print("ID = ", id)
        result = retrieve_by_id("Master", "playerID", id)
        if result is None:
            # JsonResponse needs a dict, so report a missing player explicitly.
            result = {"data": "Not Found"}
    except Exception as e:
        print("Something happened: ", e)
        result = {"message": "Why me?"}

    return result
```