<h1 style="display:none;">Test</h1>
# Introduction to Databases: Query and Web Application Continued


## Overview
  
- Continue building the back-end application, queries and data model that support the baseball application.


- Last lecture we implemented the path /players/playerID


- This lecture we will add support for
  - Finding players that match search conditions.
  - Retrieving a user-defined subset of the attributes.
  - For example,
      - "Return players with last name Williams and who threw left handed."
      - "I only want to see the player ID, last name, first name, throwing hand and birth year."
<br><br>
<img src="../images/williamsl.jpeg" width="50%">


- We will also demonstrate the value of keys and indexes.
          

## Relational Theory

### Overview

Ramakrishnan and Gehrke, section 4.2

- The operations on an Entity Set are:
    - Common set operations:
        - Union: $\cup$
        - Intersection: $\cap$
        - Difference: $-$
    - Projection: $\pi$
    - Selection: $\sigma$
    - Cartesian Product: $\times$
    - Join: $\bowtie$
    - Rename/Alias
    
    
- Projection produces a new relation that
    - Has the same rows as the original relation
    - But contains only the requested columns/fields
    
    
- Selection produces a new relation that
    - Has the same columns as the original relation
    - But only contains rows with column values matching a predicate.
    
    
- _Adding support for the new functions requires applying projection and selection._ We need to
    - Select the players that match the selection predicate
    - Only return/display the columns that the user (or client application) requested.


### Selection

- The comparison operators that can appear in a selection predicate are: $\lt, \gt, =, \ne, \ge, \le$


- The selection operator is $\sigma$


- $\sigma(Players)$ selects all of the rows/tuples in the player relation.


- The predicate/condition is a "subscript," e.g.
    - $\sigma$<sub>playerID=napolmi01</sub>$(Players)$ selects all players with _playerID=napolmi01_
    - $\sigma$<sub>((nameLast=Williams)$\land(throws=L))\lor(birthYear\ne1914)$</sub>$(Players)$ selects all players who satisfy either condition:
        - Last name Williams and threw left-handed
        - Birth year not equal to 1914
        
        
- This notation is a little clunky. You can think of the selection predicate as being similar to the condition inside an _if()_ statement when you loop through an array testing for a match.


- __Note:__ We also needed $\sigma$ to implement GET /players/< playerid >.
    - I glossed over this fact.
    - There is nothing special about primary keys (or any keys) relative to selection syntax.
    - _Keys have a profound impact on performance and data integrity, however._
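
The _if()_ analogy above can be sketched in a few lines of Python. This is only an illustration, not how a database evaluates $\sigma$: the `select` helper is a hypothetical name, and the three in-memory rows are a stand-in for the Players relation that reuses a few column values from query results shown in these notes.

```python
# Hypothetical in-memory stand-in for the Players relation.
# Only a few columns are kept; values mirror results shown in these notes.
players = [
    {"playerID": "napolmi01", "nameLast": "Napoli", "throws": "R", "birthYear": 1981},
    {"playerID": "willite01", "nameLast": "Williams", "throws": "R", "birthYear": 1918},
    {"playerID": "williac01", "nameLast": "Williams", "throws": "L", "birthYear": 1917},
]


def select(relation, predicate):
    """Selection: keep every column, but only rows where predicate(row) is true."""
    result = []
    for row in relation:
        if predicate(row):  # the subscript on sigma plays the role of this test
            result.append(row)
    return result


# sigma_{playerID=napolmi01}(Players)
by_id = select(players, lambda r: r["playerID"] == "napolmi01")

# sigma_{(nameLast=Williams) AND (throws=L)}(Players)
lefty_williams = select(
    players, lambda r: r["nameLast"] == "Williams" and r["throws"] == "L"
)

print([r["playerID"] for r in by_id])
print([r["playerID"] for r in lefty_williams])
```

Note that selecting by key uses exactly the same mechanism as any other predicate; the key only guarantees that at most one row comes back.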
    

### Projection

- The projection operator is $\pi$


- The requested columns are subscripts on the operator, e.g. $\pi$<sub>$nameLast,nameFirst,throws$</sub>$(Players)$ returns a table
    - Containing the $nameLast, nameFirst, throws$ column values, in that order
    - For all tuples in the $Players$ table.
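
A minimal sketch of projection over in-memory rows, under the same caveats as any toy example: `project` is a hypothetical helper, and the rows are a small stand-in for the Players table built from values that appear in results in these notes.

```python
# Hypothetical in-memory stand-in for the Players table.
players = [
    {"playerID": "napolmi01", "nameLast": "Napoli", "nameFirst": "Mike", "throws": "R"},
    {"playerID": "willite01", "nameLast": "Williams", "nameFirst": "Ted", "throws": "R"},
    {"playerID": "williac01", "nameLast": "Williams", "nameFirst": "Ace", "throws": "L"},
]


def project(relation, columns):
    """Projection: keep only the requested columns, in the requested order."""
    result = []
    for row in relation:
        projected = {c: row[c] for c in columns}
        # A relation is a set of tuples, so duplicates collapse into one.
        if projected not in result:
            result.append(projected)
    return result


# pi_{nameLast,nameFirst,throws}(Players)
names = project(players, ["nameLast", "nameFirst", "throws"])
hands = project(players, ["throws"])

print(names)
print(hands)
```

The duplicate check follows the set-based theory. SQL behaves differently by default: a SELECT keeps duplicate rows unless you ask for DISTINCT.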

### This is an Algebra

- The operators $\sigma$ and $\pi$ operate on relations/tables and produce relations/tables.


- This means that you can compose them into more complex expressions, just as in any algebra.


- For example
    - T<sub>$1$</sub>$ = \pi$<sub>$nameLast,nameFirst,throws$</sub>$($$\sigma$<sub>((nameLast=Williams)$\land(throws=L))\lor(birthYear\ne1914)$</sub>$(Players)$$)$
    - T<sub>$2$</sub>$ = \sigma$<sub>((nameLast=Williams)$\land(throws=L))\lor(birthYear\ne1914)$</sub>$(\pi$<sub>$nameLast,nameFirst,throws$</sub>$(Players))$
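    - Are both well-formed algebraic expressions.
    - __And__ $T$<sub>$1$</sub>$ = T$<sub>$2$</sub> whenever $\pi$ returns all columns that $\sigma$ tests, because $\sigma$ and $\pi$ commute under that condition.
    - As written, $T$<sub>$2$</sub> projects away $birthYear$ before $\sigma$ tests it, so strictly it cannot be evaluated. Apply $\sigma$ first (as in $T$<sub>$1$</sub>) whenever the predicate tests a column that $\pi$ drops.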
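
To make the composition concrete, here is a small Python sketch. The helpers `select` and `project` and the sample rows are assumptions for illustration only. It shows that the two orders agree when the predicate only tests projected columns, and that a predicate on $birthYear$ cannot run once $birthYear$ has been projected away.

```python
# Hypothetical in-memory stand-in for the Players relation.
players = [
    {"nameLast": "Napoli", "nameFirst": "Mike", "throws": "R", "birthYear": 1981},
    {"nameLast": "Williams", "nameFirst": "Ted", "throws": "R", "birthYear": 1918},
    {"nameLast": "Williams", "nameFirst": "Ace", "throws": "L", "birthYear": 1917},
]


def select(relation, predicate):
    """Selection: rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Projection: requested columns only, duplicates removed."""
    result = []
    for row in relation:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


def lefty_williams(row):
    # Tests only nameLast and throws, both of which survive the projection.
    return row["nameLast"] == "Williams" and row["throws"] == "L"


columns = ["nameLast", "nameFirst", "throws"]

t1 = project(select(players, lefty_williams), columns)  # pi(sigma(Players))
t2 = select(project(players, columns), lefty_williams)  # sigma(pi(Players))
print(t1 == t2)

# A predicate on birthYear fails after birthYear has been projected away.
try:
    select(project(players, columns), lambda r: r["birthYear"] != 1914)
    can_swap = True
except KeyError:
    can_swap = False
print(can_swap)
```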


### Keys

- "_Data integrity_ is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data."


- _Relational (integrity) constraints_ are central to the value of relational databases.
    - The database designer _declares (defines)_ integrity constraints.
    - The database management system _rejects_ any _create, update_ or _delete_ operation that would result in a constraint violation.


- _Keys_ are a core _(integrity) constraint_ enforcing _data integrity._
    - A _super key_ is a combination of columns with the property that no two rows have the same values for the fields of the super key.
    - A _candidate key_ is a minimal _super key_, that is, removing any column from the key definition means that the key no longer uniquely identifies a row.
    - The _primary key_ is a candidate key subjectively chosen as the "best key" for uniquely identifying the tuples.
    
    
- You will sometimes hear the term _functionally determines._
    - Assume we have a relation $R(a,b,c,d,e)$.
    - If $k = (a,b)$ is a key, then we can say that $k$ _functionally determines_ $(c,d,e).$
    - Given values $(x,y)$ for $(a,b)$ we can functionally return $(c,d,e).$ The function is
    
    $\pi$<sub>$c,d,e$</sub>$(\sigma$<sub>$(a=x)\land(b=y)$</sub>$(R))$


- Consider the following snapshot of the CS courses table

<img src="../images/L4_courses.jpeg">

<br>
- The underlying relation $C$ has the following fields
    - callNumber
    - courseTitle
    - courseNumber
    - courseSection
    - term
    - year
    - instructor
    - days
    - time
    
    
- Two candidate keys are
    - _(callNumber)_
    - _(courseNumber, courseSection, year, term)_
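
The definitions of super key and candidate key can be checked mechanically against a snapshot. The sketch below is an illustration under loud assumptions: the field names come from the list above, but every value in `courses` is invented (it is not the data in the screenshot), and `is_superkey` and `is_candidate_key` are hypothetical helper names.

```python
# Hypothetical rows. Field names match the relation C; all values are invented.
courses = [
    {"callNumber": 10001, "courseNumber": "X100", "courseSection": 1, "term": "Spring", "year": 2018},
    {"callNumber": 10002, "courseNumber": "X100", "courseSection": 2, "term": "Spring", "year": 2018},
    {"callNumber": 10003, "courseNumber": "X100", "courseSection": 1, "term": "Fall", "year": 2018},
    {"callNumber": 10004, "courseNumber": "X200", "courseSection": 1, "term": "Spring", "year": 2018},
    {"callNumber": 10005, "courseNumber": "X100", "courseSection": 1, "term": "Spring", "year": 2017},
]


def is_superkey(rows, columns):
    """True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, columns):
    """True if columns form a super key and dropping any one column breaks it."""
    if not is_superkey(rows, columns):
        return False
    for dropped in columns:
        smaller = [c for c in columns if c != dropped]
        if smaller and is_superkey(rows, smaller):
            return False  # a smaller super key exists, so this one is not minimal
    return True


print(is_candidate_key(courses, ["callNumber"]))
print(is_candidate_key(courses, ["courseNumber", "courseSection", "year", "term"]))
print(is_candidate_key(courses, ["callNumber", "year"]))
```

A snapshot can only refute a key, never prove one. A key is a constraint on every legal state of the table, which is why the designer declares it and the database enforces it.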
    

## SQL and Implementation

### Overview


<img src="../images/L4_search_page.jpeg">

- Provide three UI controls on the baseball dashboard
    - Select the resource to query (Players, Batting, Appearances)
    - Find a player if you know the player ID
    - Create a query to locate players based on columns and values.
    
    

<img src="../images/L4_webapp_e2e.jpeg">

- Web browser front-end
    - Included only to provide motivation and a sense of what happens end to end.
    - I will try to provide sample code and support. This is not a UI class.
    - I will cut corners.
    
    
- We have seen the GET /api/players/napolmi01


- There are two approaches to surfacing a query path
    - /api/players?last_name="Williams"&first_name="Ted". This option only supports
        - $=$ comparison operator
        - $\land$ boolean operator
    - /api/players?q='< complex query expression >'&f='< requested properties >'. The application may choose to 
        - Support a subset of the columns and a subset of operators/column.
        - Map non-intuitive column names into more user friendly property names.
        
        
- The backend may also provide a path /api/players/query_options that returns a JSON document describing
    - Queryable attributes
    - Allowed operators/attribute
    - Friendly column names
    - etc.
    
    
- Enabling form-based or user-defined queries may expose the application to [SQL injection attacks](https://en.wikipedia.org/wiki/SQL_injection), which we will cover later.
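
As a rough sketch of the first approach (equality comparisons joined with AND), the function below turns a query string into a WHERE clause plus a separate list of values. Everything here is an assumption for illustration: the name `where_from_query_string`, the `QUERYABLE` mapping from friendly property names to column names, and the use of unquoted values in the URL.

```python
from urllib.parse import parse_qsl

# Hypothetical mapping from friendly property names to table columns.
# Only attributes listed here may be queried.
QUERYABLE = {
    "last_name": "nameLast",
    "first_name": "nameFirst",
    "throws": "throws",
}


def where_from_query_string(query_string):
    """Turn 'a=x&b=y' into ('colA = %s AND colB = %s', ['x', 'y'])."""
    clauses = []
    params = []
    for name, value in parse_qsl(query_string):
        if name not in QUERYABLE:
            raise ValueError("Unsupported query attribute: " + name)
        # The column name comes from our own mapping, never from the client.
        clauses.append(QUERYABLE[name] + " = %s")
        # The value travels separately so the connector can escape it.
        params.append(value)
    return " AND ".join(clauses), params


where, params = where_from_query_string("last_name=Williams&first_name=Ted")
print(where)
print(params)
```

Keeping client-supplied names behind an allow-list and client-supplied values behind placeholders is one common way to reduce the injection risk mentioned above. The same `QUERYABLE` dictionary could also be returned to the front-end as the description of queryable attributes.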

        

### SELECT

#### Overview

- MySQL [SELECT syntax](https://dev.mysql.com/doc/refman/5.7/en/select.html)
    - Other relational database management systems are very similar.
    - And there is a core standard for the [SQL language.](https://www.iso.org/committee/45342/x/catalogue/p/1/u/0/w/0/d/0)

```
SELECT
    [ALL | DISTINCT | DISTINCTROW ]
      [HIGH_PRIORITY]
      [STRAIGHT_JOIN]
      [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
      [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr [, select_expr ...]
    [FROM table_references
      [PARTITION partition_list]
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}
      [ASC | DESC], ... [WITH ROLLUP]]
    [HAVING where_condition]
    [ORDER BY {col_name | expr | position}
      [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
    [PROCEDURE procedure_name(argument_list)]
    [INTO OUTFILE 'file_name'
        [CHARACTER SET charset_name]
        export_options
      | INTO DUMPFILE 'file_name'
      | INTO var_name [, var_name]]
    [FOR UPDATE | LOCK IN SHARE MODE]]
```

- The SELECT statement is complex because it is the implementation foundation for
    - Selection
    - Projection
    - Join
    - Cross-product
    - Alias/renaming.
    

- The easiest way to get started is with examples and practice.

#### GET by Primary Key



In [6]:
import pymysql.cursors
import json


def connect():
    connection = pymysql.connect(host='localhost',
                                 user='dbuser',
                                 password='dbuser',
                                 db='lahman2016',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    return connection

def disconnect(c):
    c.close()

def pretty_print(r):
    r = json.dumps(r, indent=4, sort_keys=True)
    print("Result = ", r)

connection = None
try:
    print("Get by primary key.")
    sql1 = "SELECT * FROM Master WHERE playerID = 'napolmi01';"
    print("SQL statement = ", sql1)

    connection = connect()

    with connection.cursor() as cursor:
        # Execute the SQL statement.
        print("Connected.")
        cursor.execute(sql1)

        # playerID is the primary key, so at most one row matches.
        result = cursor.fetchone()

        pretty_print(result)

except Exception as e:
    # Report the actual error instead of hiding it.
    print("Something bad happened:", e)
finally:
    # Only close the connection if connect() succeeded.
    if connection is not None:
        disconnect(connection)
        print("Disconnected")


Get by primary key.
SQL statement =  SELECT * FROM Master WHERE playerID = 'napolmi01';
Connected.
Result =  {
    "bats": "R",
    "bbrefID": "napolmi01",
    "birthCity": "Hollywood",
    "birthCountry": "USA",
    "birthDay": 31,
    "birthMonth": 10,
    "birthState": "FL",
    "birthYear": 1981,
    "deathCity": "",
    "deathCountry": "",
    "deathDay": "",
    "deathMonth": "",
    "deathState": "",
    "deathYear": "",
    "debut": "2006-05-04",
    "finalGame": "2016-10-02",
    "height": 73,
    "nameFirst": "Mike",
    "nameGiven": "Michael Anthony",
    "nameLast": "Napoli",
    "playerID": "napolmi01",
    "retroID": "napom001",
    "throws": "R",
    "weight": 225
}
Disconnected


In [10]:
connection = None
try:
    print("Get by primary key.")
    # The %s placeholders keep the values separate from the SQL text.
    sql2 = "SELECT * FROM Batting WHERE yearID = "
    sql2 = sql2 + "%s AND playerID=%s and stint=%s;"
    print("SQL statement = ", sql2)

    connection = connect()

    with connection.cursor() as cursor:
        # Execute the SQL statement with its parameter values.
        print("Connected.")
        cursor.execute(sql2, ('1959', 'willite01', 1))

        # (playerID, yearID, stint) is the key, so at most one row matches.
        result = cursor.fetchone()

        pretty_print(result)

except Exception as e:
    # Report the actual error instead of hiding it.
    print("Something bad happened:", e)
finally:
    # Only close the connection if connect() succeeded.
    if connection is not None:
        disconnect(connection)
        print("Disconnected")

Get by primary key.
SQL statement =  SELECT * FROM Batting WHERE yearID = %s AND playerID=%s and stint=%s;
Connected.
Result =  {
    "2B": 15,
    "3B": 0,
    "AB": 272,
    "BB": 52,
    "CS": 0,
    "G": 103,
    "GIDP": "7",
    "H": 69,
    "HBP": "2",
    "HR": 10,
    "IBB": "6",
    "R": 32,
    "RBI": 43,
    "SB": 0,
    "SF": "5",
    "SH": "0",
    "SO": 27,
    "lgID": "AL",
    "playerID": "willite01",
    "stint": 1,
    "teamID": "BOS",
    "yearID": 1959
}
Disconnected


- Database connectors usually support _parameterized query statements._
    - The SQL string contains %s placeholders wherever a value belongs.
    - A tuple (v1, ..., vn) supplies the values that the connector escapes and inserts into the SQL statement sent to the database.
    - This is generally the preferred approach. Some connectors also support [_prepared statements_](https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html), which most databases support.
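
The sketch below shows why parameters are preferred. To stay self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption: `sqlite3` writes placeholders as `?` where pymysql uses `%s`, and the two-row Batting table is a cut-down copy of values shown in these notes.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, stint INTEGER, "
    "HR INTEGER, PRIMARY KEY (playerID, yearID, stint))"
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?)",
    [("willite01", 1959, 1, 10), ("willite01", 1960, 1, 29)],
)

# The SQL text and the values travel separately.
sql = "SELECT HR FROM Batting WHERE yearID = ? AND playerID = ? AND stint = ?"
row = conn.execute(sql, (1959, "willite01", 1)).fetchone()
print(row)

# A hostile value is compared as plain data. It cannot change the statement.
hostile = "willite01' OR '1'='1"
rows = conn.execute(
    "SELECT HR FROM Batting WHERE playerID = ?", (hostile,)
).fetchall()
print(rows)

conn.close()
```

Had the hostile string been concatenated into the SQL text, the WHERE clause would have matched every row.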

#### More General SELECT

##### Find All Players with Last Name Williams Who Bat with One Hand and Throw with the Other, and Who Were Born before 1950

In [18]:
import pymysql.cursors
import json


def connect():
    connection = pymysql.connect(host='localhost',
                                 user='dbuser',
                                 password='dbuser',
                                 db='lahman2016',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    return connection

def disconnect(c):
    c.close()

def pretty_print(r):
    r = json.dumps(r, indent=4, sort_keys=True)
    print("Result = ", r)
    
def execute_query(q):

    connection = None
    try:
        print("Executing query = ", q)

        connection = connect()

        with connection.cursor() as cursor:
            # Execute the SQL statement.
            print("Connected.")
            cursor.execute(q)

            # A general query may match many rows, so fetch all of them.
            result = cursor.fetchall()
            return result
    except Exception as e:
        # Report the actual error instead of hiding it.
        print("Something bad happened:", e)
        return [{"Error": str(e)}]
    finally:
        # Only close the connection if connect() succeeded.
        if connection is not None:
            disconnect(connection)
            print("Disconnected")
        
        
q = "SELECT * FROM Master WHERE "
q = q + "nameLast='Williams' and NOT ( bats = throws ) "
q = q + "AND birthYear < '1950';"
r = execute_query(q)
pretty_print(r)

Executing query =  SELECT * FROM Master WHERE nameLast='Williams' and NOT ( bats = throws ) AND birthYear < '1950';
Connected.
Disconnected
Result =  [
    {
        "bats": "R",
        "bbrefID": "williac01",
        "birthCity": "Montclair",
        "birthCountry": "USA",
        "birthDay": 18,
        "birthMonth": 3,
        "birthState": "NJ",
        "birthYear": 1917,
        "deathCity": "Fort Myers",
        "deathCountry": "USA",
        "deathDay": "16",
        "deathMonth": "9",
        "deathState": "FL",
        "deathYear": "1999",
        "debut": "1940-07-15",
        "finalGame": "1946-04-22",
        "height": 74,
        "nameFirst": "Ace",
        "nameGiven": "Robert Fulton",
        "nameLast": "Williams",
        "playerID": "williac01",
        "retroID": "willa103",
        "throws": "L",
        "weight": 174
    },
    {
        "bats": "L",
        "bbrefID": "williar01",
        "birthCity": "Somerville",
        "birthCountry": "USA",
        "birthDay

__Comments__


- The [MySQL WHERE condition syntax](https://dev.mysql.com/doc/refman/5.7/en/expressions.html) (which is remarkably unhelpful) is

```
expr:
    expr OR expr
  | expr || expr
  | expr XOR expr
  | expr AND expr
  | expr && expr
  | NOT expr
  | ! expr
  | boolean_primary IS [NOT] {TRUE | FALSE | UNKNOWN}
  | boolean_primary

boolean_primary:
    boolean_primary IS [NOT] NULL
  | boolean_primary <=> predicate
  | boolean_primary comparison_operator predicate
  | boolean_primary comparison_operator {ALL | ANY} (subquery)
  | predicate

comparison_operator: = | >= | > | <= | < | <> | !=

predicate:
    bit_expr [NOT] IN (subquery)
  | bit_expr [NOT] IN (expr [, expr] ...)
  | bit_expr [NOT] BETWEEN bit_expr AND predicate
  | bit_expr SOUNDS LIKE bit_expr
  | bit_expr [NOT] LIKE simple_expr [ESCAPE simple_expr]
  | bit_expr [NOT] REGEXP bit_expr
  | bit_expr

bit_expr:
    bit_expr | bit_expr
  | bit_expr & bit_expr
  | bit_expr << bit_expr
  | bit_expr >> bit_expr
  | bit_expr + bit_expr
  | bit_expr - bit_expr
  | bit_expr * bit_expr
  | bit_expr / bit_expr
  | bit_expr DIV bit_expr
  | bit_expr MOD bit_expr
  | bit_expr % bit_expr
  | bit_expr ^ bit_expr
  | bit_expr + interval_expr
  | bit_expr - interval_expr
  | simple_expr

simple_expr:
    literal
  | identifier
  | function_call
  | simple_expr COLLATE collation_name
  | param_marker
  | variable
  | simple_expr || simple_expr
  | + simple_expr
  | - simple_expr
  | ~ simple_expr
  | ! simple_expr
  | BINARY simple_expr
  | (expr [, expr] ...)
  | ROW (expr, expr [, expr] ...)
  | (subquery)
  | EXISTS (subquery)
  | {identifier expr}
  | match_expr
  | case_expr
  | interval_expr
```


- All RDBs have similar and approximately equivalent functionality and syntax.


- "How do you get to be good at this? The same way you get to Carnegie Hall. __Practice.__"
    - There are some good online tutorials.
    - We will go through a lot of examples in class.
    - CAs and I will help if you have trouble on homework or take home exams.
    

- The following SQL statement illustrates the condition syntax

```
SELECT * FROM Master WHERE nameLast='Williams' and NOT ( bats = throws ) AND  (deathYear - birthYear) > 80;
```

- The basic element is $term$  $operator$  $term$ or $operator(column)$
    - A term is a literal, column name or computed expression.
    - The operators are
        - $\lt$, $\leq$, $=$, $\gt$, $\ge$, $<>$ and have the obvious meanings.
        - $LIKE$ returns true if the string operand matches a [pattern.](https://dev.mysql.com/doc/refman/5.7/en/pattern-matching.html)
        - $term$ $BETWEEN$ $low$ AND $high$ returns true if the $term$ is in the range (both endpoints included).
    - There are additional operators, and different RDBMSs have different extensions and modifications.
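
Here is a small runnable sketch of these operators. It is an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL (placeholders are `?` rather than `%s`), the four rows are a cut-down copy of values shown in these notes, missing death years are stored as NULL rather than "", and `ids` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Master (playerID TEXT PRIMARY KEY, nameLast TEXT, "
    "bats TEXT, throws TEXT, birthYear INTEGER, deathYear INTEGER)"
)
conn.executemany(
    "INSERT INTO Master VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("aardsda01", "Aardsma", "R", "R", 1981, None),
        ("napolmi01", "Napoli", "R", "R", 1981, None),
        ("williac01", "Williams", "R", "L", 1917, 1999),
        ("willite01", "Williams", "L", "R", 1918, 2002),
    ],
)


def ids(where, params=()):
    """Return the playerIDs matching a WHERE clause, in a stable order."""
    sql = "SELECT playerID FROM Master WHERE " + where + " ORDER BY playerID"
    return [row[0] for row in conn.execute(sql, params)]


# BETWEEN includes both endpoints.
between = ids("birthYear BETWEEN ? AND ?", (1917, 1918))

# LIKE with % matches any suffix.
like = ids("nameLast LIKE ?", ("Aard%",))

# A computed expression used as a term, combined with NOT and AND.
computed = ids("NOT (bats = throws) AND (deathYear - birthYear) > ?", (82,))

print(between)
print(like)
print(computed)
conn.close()
```

Rows with a NULL death year drop out of the last query because any comparison with NULL is not true.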
    
    

In [20]:
q = "SELECT * FROM Master WHERE nameLast LIKE 'aard%';"
r = execute_query(q)
pretty_print(r)

Executing query =  SELECT * FROM Master WHERE nameLast LIKE 'aard%';
Connected.
Disconnected
Result =  [
    {
        "bats": "R",
        "bbrefID": "aardsda01",
        "birthCity": "Denver",
        "birthCountry": "USA",
        "birthDay": 27,
        "birthMonth": 12,
        "birthState": "CO",
        "birthYear": 1981,
        "deathCity": "",
        "deathCountry": "",
        "deathDay": "",
        "deathMonth": "",
        "deathState": "",
        "deathYear": "",
        "debut": "2004-04-06",
        "finalGame": "2015-08-23",
        "height": 75,
        "nameFirst": "David",
        "nameGiven": "David Allan",
        "nameLast": "Aardsma",
        "playerID": "aardsda01",
        "retroID": "aardd001",
        "throws": "R",
        "weight": 215
    }
]


### Project Clause (the Clause that Follows SELECT)

- AKA _Select Expression_


- The basic syntax is $term, term,...$ where $term$ is a column name or function of column names.


- $term $ $AS $ $alias$ renames a term (result column) in the result table.



In [28]:
q = "SELECT nameLast as last_name, nameFirst as first_name, "
q = q + "(deathYear - birthYear) AS lifespan, " 
q = q + "CONCAT(nameGiven,' ',nameLast) as greatest_hitter_ever FROM Master "
q = q + "WHERE nameLast='Williams' AND nameFirst='Ted';"
r = execute_query(q)
pretty_print(r)

Executing query =  SELECT nameLast as last_name, nameFirst as first_name, (deathYear - birthYear) AS lifespan, CONCAT(nameGiven,' ',nameLast) as greatest_hitter_ever FROM Master WHERE nameLast='Williams' AND nameFirst='Ted';
Connected.
Disconnected
Result =  [
    {
        "first_name": "Ted",
        "greatest_hitter_ever": "Theodore Samuel Williams",
        "last_name": "Williams",
        "lifespan": 84.0
    }
]


A more readable form of the query:
```
SELECT
    nameLast AS last_name,
    nameFirst AS first_name,
    (deathYear - birthYear) AS lifespan,
    CONCAT(nameGiven,' ',nameLast) AS greatest_hitter_ever
FROM
    Master
WHERE
    nameLast='Williams' AND nameFirst='Ted';
```

### Keys and Indexes

- Relational theory treats keys as an _integrity constraint._
    - The ICs ensure that create, update and delete operations do not violate data integrity.
    - This is a very important capability, and RDBs have advanced IC functionality, which we will see in future lectures.
    
    
- For reads, most RDBs implement keys using an _index._
    - A table can have many indexes.
    - The indexes optimize both constraint implementation, _and SELECT performance._
    - An index may or may not enforce uniqueness.
    
    
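- The CREATE statement below declares
    - A _primary key_ on _playerID,_ which enforces uniqueness and optimizes SELECT.
    - A _non-unique index_ on _nameLast,_ which optimizes SELECT performance.
    - An extra index `player_idx` on _playerID,_ which is redundant because the primary key already indexes that column.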
        
```    
CREATE TABLE `Master` (
  `playerID` varchar(255) NOT NULL,
  `birthYear` int(11) DEFAULT NULL,
  `birthMonth` int(11) NOT NULL,
  `birthDay` int(11) DEFAULT NULL,
  `birthCountry` varchar(255) DEFAULT NULL,
  `birthState` varchar(255) DEFAULT NULL,
  `birthCity` varchar(255) DEFAULT NULL,
  `deathYear` varchar(255) DEFAULT NULL,
  `deathMonth` varchar(255) DEFAULT NULL,
  `deathDay` varchar(255) DEFAULT NULL,
  `deathCountry` varchar(255) DEFAULT NULL,
  `deathState` varchar(255) DEFAULT NULL,
  `deathCity` varchar(255) DEFAULT NULL,
  `nameFirst` varchar(255) NOT NULL,
  `nameLast` varchar(255) NOT NULL,
  `nameGiven` varchar(255) DEFAULT NULL,
  `weight` int(11) DEFAULT NULL,
  `height` int(11) DEFAULT NULL,
  `bats` varchar(255) DEFAULT NULL,
  `throws` varchar(255) DEFAULT NULL,
  `debut` varchar(255) DEFAULT NULL,
  `finalGame` varchar(255) DEFAULT NULL,
  `retroID` varchar(255) DEFAULT NULL,
  `bbrefID` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`playerID`),
  KEY `player_idx` (`playerID`),
  KEY `name_l` (`nameLast`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```

- The SQL Workbench is a good way to view and edit indexes.

<img src="../images/L4_index_workbench.jpeg">


Execution Examples:

<img src="../images/L4_index_performance.jpeg">

- SELECTs that cannot use an index have to scan the table, which is $O(N).$


- SELECTs that can use an index are
    - $O(\log N)$ if it can use a tree index.
    - $O(1)$ on average if it can use a hash index.
    
    
- We will cover indexes, performance implications and  implementation in future lectures.


- The obvious question is, "Why not index everything?" Indexes take up storage, and there is a [space-time tradeoff.](https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff)
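
You can ask the database whether a query will use an index. The sketch below is an illustration that uses Python's built-in `sqlite3` and its `EXPLAIN QUERY PLAN` statement as a stand-in for MySQL, where the corresponding tool is `EXPLAIN`. The cut-down table and the `plan` helper are assumptions for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Master (playerID TEXT PRIMARY KEY, nameLast TEXT, nameFirst TEXT)"
)
# Same idea as KEY name_l (nameLast) in the CREATE statement above.
conn.execute("CREATE INDEX name_l ON Master (nameLast)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# nameLast is indexed, so the engine can search the index.
indexed = plan("SELECT * FROM Master WHERE nameLast = 'Williams'")

# nameFirst has no index, so the engine must scan the whole table.
unindexed = plan("SELECT * FROM Master WHERE nameFirst = 'Ted'")

print(indexed)
print(unindexed)
conn.close()
```

The first plan reports a search using `name_l`; the second reports a scan of the table.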

## Scenario I -- Implementation Continued


### Overview

We will use Django and MySQL to implement the following GET REST paths for each resource.


- _/api/resource_name/primary_key_value_ to get by primary key.


- _/api/resource_name/query_format_ will return a JSON object describing allowable queries, which we can use to configure the web front-end for user-defined queries.


- _/api/resource?_ with the following HTTP query parameters
    - f='name1, name2, name3, ...' defining the fields (Projection)
    - A Select expression, which takes one of the following forms:
        - name1=value1&name2=value2&...
        - q='< query string >'
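
As a rough sketch of how the _f_ parameter could become the select expression, the function below maps friendly field names to columns and aliases them back. The name `select_list_from_f` and the `FIELDS` mapping are hypothetical and shown only to illustrate the idea. A real implementation would take the mapping from the resource's metadata.

```python
# Hypothetical mapping from friendly field names to Master columns.
FIELDS = {
    "player_id": "playerID",
    "last_name": "nameLast",
    "first_name": "nameFirst",
    "throws": "throws",
    "birth_year": "birthYear",
}


def select_list_from_f(f):
    """Turn 'last_name, throws' into 'nameLast AS last_name, throws AS throws'."""
    names = [n.strip() for n in f.split(",") if n.strip()]
    if not names:
        return "*"  # no projection requested: return every column
    parts = []
    for name in names:
        if name not in FIELDS:
            raise ValueError("Unsupported field: " + name)
        # Column names come from our own mapping, never from the client.
        parts.append(FIELDS[name] + " AS " + name)
    return ", ".join(parts)


select_list = select_list_from_f("last_name, first_name, throws")
print("SELECT " + select_list + " FROM Master")
```

Field names cannot be passed as query parameters the way values can, so checking them against a fixed mapping is the safeguard here.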
        


### Implementation Approach

- Building the solution is the core of HW1 and the foundation for HW2.


- We will go through elements of the implementation in class, and I will handle as much of the non-DB enablement functions as possible, e.g. AngularJS.


### Find by Primary Key

Well, we know the implementing function looks something like


In [33]:
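def find_by_primary_key(resource, key_name, key_value):
    cnx = connect()
    try:
        cursor = cnx.cursor()
        # NOTE: string concatenation is shown for simplicity only. It is open
        # to SQL injection; prefer %s placeholders for values.
        q = "SELECT * FROM " + " " + resource + " "
        q = q + " WHERE " + key_name + " = '" + key_value + "';"
        print("Query = ", q)
        cursor.execute(q)
        # The key identifies at most one row.
        r = cursor.fetchone()
        return r
    finally:
        # Always release the connection, even if the query fails.
        disconnect(cnx)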

# Just test code below so function executes in Jupyter for presentation.
r = find_by_primary_key("Master","playerID","willite01")
pretty_print(r)

Query =  SELECT * FROM  Master  WHERE playerID = 'willite01';
Result =  {
    "bats": "L",
    "bbrefID": "willite01",
    "birthCity": "San Diego",
    "birthCountry": "USA",
    "birthDay": 30,
    "birthMonth": 8,
    "birthState": "CA",
    "birthYear": 1918,
    "deathCity": "Inverness",
    "deathCountry": "USA",
    "deathDay": "5",
    "deathMonth": "7",
    "deathState": "FL",
    "deathYear": "2002",
    "debut": "1939-04-20",
    "finalGame": "1960-09-28",
    "height": 75,
    "nameFirst": "Ted",
    "nameGiven": "Theodore Samuel",
    "nameLast": "Williams",
    "playerID": "willite01",
    "retroID": "willt103",
    "throws": "R",
    "weight": 205
}


- There are a few issues that we need to work out:
    - Error handling and error codes.
    - Better data mapping, specifically
        - "" is an artifact of data import. Text files cannot represent NULL.
        - Dates: 
            - birth and death handled differently from finalGame and debut.
            - finalGame and debut are VARCHAR in database, not MySQL DATETIME type.
    - Compound keys are a more complex problem
        - /api/players/napolmi01 is fine for single column keys.
        - What do we do for the compound key (playerID, yearID, stint) for Batting?
        - We could use query params, but this would result in inconsistent URL patterns for resources.
        - [Resource instances having a unique ID relative](https://cloud.google.com/apis/design/resource_names) to the set of resources is a best practice.
        - We will use a delimiter and arbitrarily choose "-". This yields /api/batting/willite01-1960-1.
    - We will handle some of the issues, but this is not a course on Python, web UI, HTTP/HTML types, etc.


- There are [design patterns](https://en.wikipedia.org/wiki/Software_design_pattern) for dealing with these (and other) issues.
    - We will use some elements of some design patterns, but will not be rigorous.
    - A simplified [data access object pattern](https://en.wikipedia.org/wiki/Data_access_object) will be useful because our projects will access multiple datasources in the future.
    

<img src="../images/L4_BO_DO.png" width="60%">

- The _business object_ implements the application's behavior, correctness, etc.


- The _data access object_ isolates business logic from
    - Schema change and evolution.
    - Differences between databases. Business logic developers focus on the application, not specifics of
        - MySQL versus Oracle versus DB2.
        - Data implementation choices, e.g. relational versus key-value.
    - There are frameworks for the DAO pattern, e.g.
        - [ADO.NET](https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/ado-net-overview)
        - [OData](http://www.odata.org/)
        - [Django Models](https://docs.djangoproject.com/en/2.0/topics/db/models/)
        
        
- Again, this is not a web application or design patterns class. We will use simple approaches.
        

In [37]:
# Implement a very simple data access object pattern
# We will not use types, classes, metadata, etc.
# We will simply use dictionaries.
import json
import pymysql.cursors
from datetime import datetime

def debug_message(s, o):
    print(str(datetime.now()) + ": " + s)
    if o is not None:
        print(json.dumps(o, indent=2, sort_keys=True))

# Security is a complex topic, which we will cover later.
# NEVER put security credentials in files/code.
def connect():
    connection = pymysql.connect(host='localhost',
                                 user='dbuser',
                                 password='dbuser',
                                 db='lahman2016',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    return connection

def disconnect(c):
    c.close()


# This is an abstraction. We map
# - entity_set to table.
# - key to the primary key: This will come in as a string, and
#   may map to a compound key.
#
def find_by_id(entity_set, key):
    connection = None
    result = {"data": "Not Found"}

    try:
        mapped_info = map_entity_set_key(entity_set, key)
        if mapped_info is None:
            # Unknown entity set: there is nothing to query.
            return result

        connection = connect()

        with connection.cursor() as cursor:
            sql = generate_select_statement(mapped_info)
            debug_message("SQL = " + sql, None)
            cursor.execute(sql)
            result = cursor.fetchone()
            debug_message("Result = ", result)
    finally:
        # Only close the connection if connect() succeeded.
        if connection is not None:
            disconnect(connection)

    return result


# We will handle this extensibly with metadata later. For now
# we simply hard code. Returns a dictionary with the table, the
# key columns, the key values and the value types.
#
def map_entity_set_key(entity_set, key):

    r = {}
    done = False

    if (entity_set == "players"):
        r = {
            "table": "players",
            "columns" : ["playerID"],
            "values": [key],
            "types": ["s"]
        }
        done = True

    if (entity_set == "batting"):
        s = key.split("-")
        print("s = ",s)
        r = {
            "table": "batting",
            "columns" : [ "playerID", "yearID", "stint" ],
            "values" : [s[0], s[1], s[2]],
            "types" : ["s", "s", "i"]
        }
        done = True

    if done == False:
        r = None

    return r

# Input is the mapping produced by map_entity_set_key
# (table, columns, values, types).
# Output is a query string.

def generate_select_statement(map_info):
    s = "SELECT * FROM " + map_info.get("table") + " WHERE "
    w = ""

    columns = map_info.get("columns")
    values = map_info.get("values")
    types = map_info.get("types")

    print("columns = ", columns)

    for i in range(0,len(columns)):
        c = columns[i]
        v = values[i]
        t = types[i]

        if w != "" :
            w = w + " AND "

        w = w + c + "="
        if t == "s":
            w = w + "'" + v + "'"
        else:
            # Integer column: check that the value really is an integer.
            w = w + str(int(v))

    return s+w

e1 = find_by_id("batting","willite01-1960-1")
e2 = find_by_id("players","willite01")
debug_message("\n\nFind by key willite01-1960-1 in batting returned ", e1)
debug_message("\n\nFind by key willite01 in players returned ", e2)

s =  ['willite01', '1960', '1']
columns =  ['playerID', 'yearID', 'stint']
2018-01-15 14:51:48.947187: SQL = SELECT * FROM batting WHERE playerID='willite01' AND yearID='1960' AND stint=1
2018-01-15 14:51:48.948139: Result = 
{
  "2B": 15,
  "3B": 0,
  "AB": 310,
  "BB": 75,
  "CS": 1,
  "G": 113,
  "GIDP": "7",
  "H": 98,
  "HBP": "3",
  "HR": 29,
  "IBB": "7",
  "R": 56,
  "RBI": 72,
  "SB": 1,
  "SF": "2",
  "SH": "0",
  "SO": 41,
  "lgID": "AL",
  "playerID": "willite01",
  "stint": 1,
  "teamID": "BOS",
  "yearID": 1960
}
columns =  ['playerID']
2018-01-15 14:51:48.949766: SQL = SELECT * FROM players WHERE playerID='willite01'
2018-01-15 14:51:48.950639: Result = 
{
  "bats": "L",
  "bbrefID": "willite01",
  "birthCity": "San Diego",
  "birthCountry": "USA",
  "birthDay": 30,
  "birthMonth": 8,
  "birthState": "CA",
  "birthYear": 1918,
  "deathCity": "Inverness",
  "deathCountry": "USA",
  "deathDay": "5",
  "deathMonth": "7",
  "deathState": "FL",
  "deathYear": "2002",
  "debut

Now you can see why people use frameworks.
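
One of the things a framework does for you is keep values out of the SQL text. As a sketch of what that looks like for the mapping dictionary used above, the function below returns the statement and its parameters separately. The name `generate_parameterized_select` is hypothetical and is not part of the lecture code; the `%s` placeholder style is the one pymysql uses.

```python
def generate_parameterized_select(map_info):
    """Return (sql, params): the values stay out of the SQL text."""
    columns = map_info["columns"]
    values = map_info["values"]
    types = map_info["types"]

    # One placeholder per key column, joined with AND.
    where = " AND ".join(c + " = %s" for c in columns)

    # Convert integer-typed values; int() raises ValueError on bad input.
    params = [int(v) if t == "i" else v for v, t in zip(values, types)]

    sql = "SELECT * FROM " + map_info["table"] + " WHERE " + where
    return sql, params


# Same shape as the dictionary produced for the batting resource above.
mapped = {
    "table": "batting",
    "columns": ["playerID", "yearID", "stint"],
    "values": ["willite01", "1960", "1"],
    "types": ["s", "s", "i"],
}

sql, params = generate_parameterized_select(mapped)
print(sql)
print(params)
```

With pymysql the pair would be passed as `cursor.execute(sql, params)`. The table and column names still come from the hard-coded mapping, so only the key values originate from the URL.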
