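# Distribution Transparency in Distributed Database Management Systems (DDBMS)

Distribution transparency hides from users and programmers how data is fragmented and where the fragments are stored, so that a distributed database can be treated as a single logical database. How much is hidden is commonly described at three levels:
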
1. **Fragmentation Transparency**
   - This is the highest level of transparency.
   - The end user or programmer does not need to know that a database is partitioned.
   - Neither fragment names nor fragment locations are specified prior to data access.

2. **Location Transparency**
   - Exists when the end user or programmer must specify the database fragment names but does not need to specify where those fragments are located.

3. **Local Mapping Transparency**
   - This is the lowest level of transparency.
   - The end user or programmer must specify both the fragment names and their locations.
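
The three levels differ only in how much of the fragment-to-site mapping the user writes into the query and how much the DDBMS looks up for itself. Below is a minimal Python sketch under stated assumptions. `CATALOG` is a hypothetical distributed data dictionary, and the helper functions are hypothetical names made up for illustration. The fragment and node names come from Example 1, and the `NODE` syntax is the illustrative form used in Case 3. The sketch rewrites a request made at each level into the same fully specified, local-mapping form.

```python
# Hypothetical distributed data dictionary: global table -> [(fragment, node)].
CATALOG = {"EMPLOYEE": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")]}


def resolve_fragmentation_transparent(table: str) -> list[tuple[str, str]]:
    """Level 1: the user names only the global table; fragments and nodes are looked up."""
    return list(CATALOG[table])


def resolve_location_transparent(fragments: list[str]) -> list[tuple[str, str]]:
    """Level 2: the user names the fragments; only the node of each one is looked up."""
    node_of = {frag: node for pairs in CATALOG.values() for frag, node in pairs}
    return [(frag, node_of[frag]) for frag in fragments]


def resolve_local_mapping(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Level 3: the user supplies fragments and nodes; nothing is looked up."""
    return list(pairs)


def to_node_query(pairs: list[tuple[str, str]], predicate: str) -> str:
    """Emit one node-qualified SELECT per fragment, combined with UNION."""
    return "\nUNION\n".join(
        f"SELECT * FROM {frag} NODE {node} WHERE {predicate}" for frag, node in pairs
    )


predicate = "EMP_DOB < '1960-01-01'"
level1 = to_node_query(resolve_fragmentation_transparent("EMPLOYEE"), predicate)
level2 = to_node_query(resolve_location_transparent(["E1", "E2", "E3"]), predicate)
level3 = to_node_query(resolve_local_mapping([("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")]), predicate)

print(level1 == level2 == level3)  # True
print(level1)
```

All three requests end up as the same query. The difference is who supplies the mapping: at higher levels of transparency, more of it comes from the dictionary lookup and less from the user.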


## Example 1: EMPLOYEE Table
To illustrate the use of various transparency levels, consider an `EMPLOYEE` table with the following attributes:
- `EMP_NAME`
- `EMP_DOB`
- `EMP_ADDRESS`
- `EMP_DEPARTMENT`
- `EMP_SALARY`
- `EMP_LOCATION`

EMP_NAME | EMP_DOB    | EMP_ADDRESS   | EMP_DEPARTMENT | EMP_SALARY | EMP_LOCATION
-------- | ---------- | ------------- | -------------- | ---------- | ------------
Alice    | 1958-05-15 | 123 Main St   | Sales          | 70000      | New York
Bob      | 1970-11-20 | 456 Oak Ave   | HR             | 60000      | New York
Carol    | 1955-02-10 | 789 Pine Ln   | Finance        | 80000      | New York
David    | 1962-08-03 | 101 Elm Rd    | Marketing      | 75000      | Atlanta
Eve      | 1959-12-28 | 202 Maple Dr  | IT             | 85000      | Atlanta
Frank    | 1975-04-12 | 303 Cedar Ct  | Sales          | 65000      | Atlanta
Grace    | 1968-09-25 | 404 Birch St  | HR             | 62000      | Miami
Henry    | 1957-06-18 | 505 Willow Pl | Finance        | 82000      | Miami
Ivy      | 1980-01-05 | 606 Spruce Wy | IT             | 78000      | Miami

### Fragmentation by Location
The `EMPLOYEE` rows are horizontally fragmented by `EMP_LOCATION`, and each fragment is stored at a different site:
- New York (Fragment E1)
- Atlanta (Fragment E2)
- Miami (Fragment E3)

E1 (NY):

EMP_NAME   | EMP_DOB      | EMP_ADDRESS   | EMP_DEPARTMENT   | EMP_SALARY  | EMP_LOCATION
---------- | ------------ | ------------- | ---------------- | ------------| ------------
Alice      | 1958-05-15   | 123 Main St   | Sales            | 70000       | New York
Bob        | 1970-11-20   | 456 Oak Ave   | HR               | 60000       | New York
Carol      | 1955-02-10   | 789 Pine Ln   | Finance          | 80000       | New York

E2 (ATL):

EMP_NAME   | EMP_DOB      | EMP_ADDRESS   | EMP_DEPARTMENT   | EMP_SALARY  | EMP_LOCATION
---------- | ------------ | ------------- | ---------------- | ------------| ------------
David      | 1962-08-03   | 101 Elm Rd    | Marketing        | 75000       | Atlanta
Eve        | 1959-12-28   | 202 Maple Dr  | IT               | 85000       | Atlanta
Frank      | 1975-04-12   | 303 Cedar Ct  | Sales            | 65000       | Atlanta

E3 (MIA):

EMP_NAME   | EMP_DOB      | EMP_ADDRESS   | EMP_DEPARTMENT   | EMP_SALARY  | EMP_LOCATION
---------- | ------------ | ------------- | ---------------- | ------------| ------------
Grace      | 1968-09-25   | 404 Birch St  | HR               | 62000       | Miami
Henry      | 1957-06-18   | 505 Willow Pl | Finance          | 82000       | Miami
Ivy        | 1980-01-05   | 606 Spruce Wy | IT               | 78000       | Miami
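
A horizontal fragmentation is normally expected to be complete (every row lands in some fragment), disjoint (no row lands in two), and reconstructible (the union of the fragments gives back the original table). The sketch below checks all three properties for this example. It uses Python's built-in `sqlite3` module and keeps every fragment in one in-memory database. That setup is a simplification that stands in for three separate sites.

```python
import sqlite3

# Rows of the EMPLOYEE table from Example 1.
ROWS = [
    ("Alice", "1958-05-15", "123 Main St", "Sales", 70000, "New York"),
    ("Bob", "1970-11-20", "456 Oak Ave", "HR", 60000, "New York"),
    ("Carol", "1955-02-10", "789 Pine Ln", "Finance", 80000, "New York"),
    ("David", "1962-08-03", "101 Elm Rd", "Marketing", 75000, "Atlanta"),
    ("Eve", "1959-12-28", "202 Maple Dr", "IT", 85000, "Atlanta"),
    ("Frank", "1975-04-12", "303 Cedar Ct", "Sales", 65000, "Atlanta"),
    ("Grace", "1968-09-25", "404 Birch St", "HR", 62000, "Miami"),
    ("Henry", "1957-06-18", "505 Willow Pl", "Finance", 82000, "Miami"),
    ("Ivy", "1980-01-05", "606 Spruce Wy", "IT", 78000, "Miami"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NAME TEXT, EMP_DOB TEXT, EMP_ADDRESS TEXT, "
    "EMP_DEPARTMENT TEXT, EMP_SALARY INTEGER, EMP_LOCATION TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Horizontal fragmentation: each fragment is a selection on EMP_LOCATION.
FRAGMENTS = {"E1": "New York", "E2": "Atlanta", "E3": "Miami"}
for name, city in FRAGMENTS.items():
    conn.execute(f"CREATE TABLE {name} AS SELECT * FROM EMPLOYEE WHERE 0")  # same columns, no rows
    conn.execute(f"INSERT INTO {name} SELECT * FROM EMPLOYEE WHERE EMP_LOCATION = ?", (city,))

# Completeness: the fragment sizes add up to the size of EMPLOYEE.
sizes = {name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0] for name in FRAGMENTS}

# Disjointness: no employee is stored in more than one fragment.
members = [{r[0] for r in conn.execute(f"SELECT EMP_NAME FROM {name}")} for name in FRAGMENTS]
disjoint = all(a.isdisjoint(b) for i, a in enumerate(members) for b in members[i + 1:])

# Reconstruction: the union of the fragments gives back the original table.
rebuilt = sorted(conn.execute("SELECT * FROM E1 UNION ALL SELECT * FROM E2 UNION ALL SELECT * FROM E3"))
original = sorted(conn.execute("SELECT * FROM EMPLOYEE"))

print(sizes)                          # {'E1': 3, 'E2': 3, 'E3': 3}
print(disjoint, rebuilt == original)  # True True
```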

### Query Cases
Suppose the end user wants to list all employees with a date of birth prior to *January 1, 1960*. The following cases illustrate how queries differ based on the level of distribution transparency supported:


#### Case 1: Fragmentation Transparency
- The query conforms to a non-distributed database query format:

```sql
SELECT *
FROM EMPLOYEE
WHERE EMP_DOB < '1960-01-01';
```

**Output:**

EMP_NAME | EMP_DOB    | EMP_ADDRESS   | EMP_DEPARTMENT | EMP_SALARY | EMP_LOCATION
-------- | ---------- | ------------- | -------------- | ---------- | ------------
Alice    | 1958-05-15 | 123 Main St   | Sales          | 70000      | New York
Carol    | 1955-02-10 | 789 Pine Ln   | Finance        | 80000      | New York
Eve      | 1959-12-28 | 202 Maple Dr  | IT             | 85000      | Atlanta
Henry    | 1957-06-18 | 505 Willow Pl | Finance        | 82000      | Miami
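
One common way to approximate fragmentation transparency is to expose the global table name as a view defined over the fragments. The sketch below does this with Python's `sqlite3` module. All fragments sit in one in-memory database, so the sketch shows only the naming aspect. A real DDBMS would also have to send each part of the query to the site that holds the fragment.

```python
import sqlite3

COLUMNS = ("EMP_NAME TEXT, EMP_DOB TEXT, EMP_ADDRESS TEXT, "
           "EMP_DEPARTMENT TEXT, EMP_SALARY INTEGER, EMP_LOCATION TEXT")

# Fragment contents copied from E1 (NY), E2 (ATL) and E3 (MIA).
FRAGMENTS = {
    "E1": [("Alice", "1958-05-15", "123 Main St", "Sales", 70000, "New York"),
           ("Bob", "1970-11-20", "456 Oak Ave", "HR", 60000, "New York"),
           ("Carol", "1955-02-10", "789 Pine Ln", "Finance", 80000, "New York")],
    "E2": [("David", "1962-08-03", "101 Elm Rd", "Marketing", 75000, "Atlanta"),
           ("Eve", "1959-12-28", "202 Maple Dr", "IT", 85000, "Atlanta"),
           ("Frank", "1975-04-12", "303 Cedar Ct", "Sales", 65000, "Atlanta")],
    "E3": [("Grace", "1968-09-25", "404 Birch St", "HR", 62000, "Miami"),
           ("Henry", "1957-06-18", "505 Willow Pl", "Finance", 82000, "Miami"),
           ("Ivy", "1980-01-05", "606 Spruce Wy", "IT", 78000, "Miami")],
}

conn = sqlite3.connect(":memory:")
for name, rows in FRAGMENTS.items():
    conn.execute(f"CREATE TABLE {name} ({COLUMNS})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

# The global name EMPLOYEE is a view over the fragments; UNION ALL suffices
# because the fragments are disjoint.
conn.execute(
    "CREATE VIEW EMPLOYEE AS "
    "SELECT * FROM E1 UNION ALL SELECT * FROM E2 UNION ALL SELECT * FROM E3"
)

# The Case 1 query, unchanged: it never mentions a fragment or a site.
rows = conn.execute("SELECT * FROM EMPLOYEE WHERE EMP_DOB < '1960-01-01'").fetchall()
names = sorted(row[0] for row in rows)
print(names)  # ['Alice', 'Carol', 'Eve', 'Henry']
```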

#### Case 2: Location Transparency
- Fragment names must be specified, but the location is not specified:

```sql
SELECT *
FROM E1
WHERE EMP_DOB < '1960-01-01'
UNION
SELECT *
FROM E2
WHERE EMP_DOB < '1960-01-01'
UNION
SELECT *
FROM E3
WHERE EMP_DOB < '1960-01-01';
```

**Output:**

(Same as Fragmentation Transparency)

EMP_NAME | EMP_DOB    | EMP_ADDRESS   | EMP_DEPARTMENT | EMP_SALARY | EMP_LOCATION
-------- | ---------- | ------------- | -------------- | ---------- | ------------
Alice    | 1958-05-15 | 123 Main St   | Sales          | 70000      | New York
Carol    | 1955-02-10 | 789 Pine Ln   | Finance        | 80000      | New York
Eve      | 1959-12-28 | 202 Maple Dr  | IT             | 85000      | Atlanta
Henry    | 1957-06-18 | 505 Willow Pl | Finance        | 82000      | Miami
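
With location transparency, the user rather than the DDBMS is responsible for naming every fragment. The `sqlite3` sketch below keeps all fragments in one in-memory database, which is a simplification. It runs the Case 2 query, then runs the same query with the `E3` branch left out. The point is that omitting a fragment raises no error. The answer is simply incomplete, with no warning.

```python
import sqlite3

COLUMNS = ("EMP_NAME TEXT, EMP_DOB TEXT, EMP_ADDRESS TEXT, "
           "EMP_DEPARTMENT TEXT, EMP_SALARY INTEGER, EMP_LOCATION TEXT")

# Fragment contents copied from E1 (NY), E2 (ATL) and E3 (MIA).
FRAGMENTS = {
    "E1": [("Alice", "1958-05-15", "123 Main St", "Sales", 70000, "New York"),
           ("Bob", "1970-11-20", "456 Oak Ave", "HR", 60000, "New York"),
           ("Carol", "1955-02-10", "789 Pine Ln", "Finance", 80000, "New York")],
    "E2": [("David", "1962-08-03", "101 Elm Rd", "Marketing", 75000, "Atlanta"),
           ("Eve", "1959-12-28", "202 Maple Dr", "IT", 85000, "Atlanta"),
           ("Frank", "1975-04-12", "303 Cedar Ct", "Sales", 65000, "Atlanta")],
    "E3": [("Grace", "1968-09-25", "404 Birch St", "HR", 62000, "Miami"),
           ("Henry", "1957-06-18", "505 Willow Pl", "Finance", 82000, "Miami"),
           ("Ivy", "1980-01-05", "606 Spruce Wy", "IT", 78000, "Miami")],
}

conn = sqlite3.connect(":memory:")
for name, rows in FRAGMENTS.items():
    conn.execute(f"CREATE TABLE {name} ({COLUMNS})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

CASE2 = """
SELECT * FROM E1 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '1960-01-01'
"""
# The same query with the E3 branch forgotten: still valid SQL, just incomplete.
MISSING_E3 = """
SELECT * FROM E1 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '1960-01-01'
"""

full = sorted(row[0] for row in conn.execute(CASE2))
partial = sorted(row[0] for row in conn.execute(MISSING_E3))
print(full)     # ['Alice', 'Carol', 'Eve', 'Henry']
print(partial)  # ['Alice', 'Carol', 'Eve']
```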

#### Case 3: Local Mapping Transparency
- Both the fragment name and its location must be specified:

```sql
SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '1960-01-01'
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '1960-01-01'
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '1960-01-01';
```

**Output:**

(Same as Fragmentation and Location Transparency)

EMP_NAME | EMP_DOB    | EMP_ADDRESS   | EMP_DEPARTMENT | EMP_SALARY | EMP_LOCATION
-------- | ---------- | ------------- | -------------- | ---------- | ------------
Alice    | 1958-05-15 | 123 Main St   | Sales          | 70000      | New York
Carol    | 1955-02-10 | 789 Pine Ln   | Finance        | 80000      | New York
Eve      | 1959-12-28 | 202 Maple Dr  | IT             | 85000      | Atlanta
Henry    | 1957-06-18 | 505 Willow Pl | Finance        | 82000      | Miami

**Note on the `NODE` keyword:**

The `NODE` keyword (or a similar construct) indicates the location, or site, where a fragment is stored.

- In the local mapping transparency example (Case 3):
    - E1 NODE NY means fragment E1 is located at the NY node (New York).
    - E2 NODE ATL means fragment E2 is located at the ATL node (Atlanta).
    - E3 NODE MIA means fragment E3 is located at the MIA node (Miami).

- The `NODE` keyword is *not* standard SQL. It only illustrates how a system might express local mapping transparency; other systems use different keywords or different methods altogether.

- Some systems might use:
    - Server names.
    - IP addresses.
    - Logical names defined in a distributed data dictionary.
    - A connection string.
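
`NODE` is illustrative syntax, so it cannot be run directly. SQLite's `ATTACH DATABASE` offers a rough local analogy instead. In the Python sketch below, each attached in-memory database stands in for one node, and a schema-qualified name such as `ny.E1` plays the role of `E1 NODE NY`. This is an assumption made for illustration, not a distributed system, and only three columns are kept to keep the sketch short. SQLite also resolves an unqualified table name when that name is unique across all attached databases. That behaviour loosely mimics location transparency.

```python
import sqlite3

# Assumption: each attached database is a "node"; only three columns are kept.
NODES = {
    "ny": ("E1", [("Alice", "1958-05-15", "New York"),
                  ("Bob", "1970-11-20", "New York"),
                  ("Carol", "1955-02-10", "New York")]),
    "atl": ("E2", [("David", "1962-08-03", "Atlanta"),
                   ("Eve", "1959-12-28", "Atlanta"),
                   ("Frank", "1975-04-12", "Atlanta")]),
    "mia": ("E3", [("Grace", "1968-09-25", "Miami"),
                   ("Henry", "1957-06-18", "Miami"),
                   ("Ivy", "1980-01-05", "Miami")]),
}

conn = sqlite3.connect(":memory:")
# ATTACH is not allowed inside an open transaction, so attach every node
# before the INSERTs below implicitly start one.
for node in NODES:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {node}")
for node, (fragment, rows) in NODES.items():
    conn.execute(f"CREATE TABLE {node}.{fragment} (EMP_NAME TEXT, EMP_DOB TEXT, EMP_LOCATION TEXT)")
    conn.executemany(f"INSERT INTO {node}.{fragment} VALUES (?, ?, ?)", rows)

# Local mapping: every fragment is qualified with the node that holds it.
LOCAL_MAPPING = """
SELECT * FROM ny.E1  WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM atl.E2 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM mia.E3 WHERE EMP_DOB < '1960-01-01'
"""
# Location transparency (analogy): fragment names only; SQLite finds the node.
LOCATION = """
SELECT * FROM E1 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '1960-01-01'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '1960-01-01'
"""

local_rows = sorted(conn.execute(LOCAL_MAPPING))
location_rows = sorted(conn.execute(LOCATION))
print([r[0] for r in local_rows])   # ['Alice', 'Carol', 'Eve', 'Henry']
print(local_rows == location_rows)  # True
```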



## Example 2: EMP and DEPT Tables

### Tables:
- EMP(name, ecode, dcode, age, city)
- DEPT(dcode, division, budget)

The `EMP` table is horizontally fragmented by `city` into three fragments:

* EMP1 (Kolkata):

name  | ecode | dcode | age | city
----- | ----- | ----- | --- | -------
Amit  | 101   | D1    | 30  | Kolkata
Bina  | 102   | D2    | 28  | Kolkata
Cintu | 103   | D4    | 35  | Kolkata

* EMP2 (Delhi):

name  | ecode | dcode | age | city
----- | ----- | ----- | --- | -----
Dina  | 104   | D1    | 32  | Delhi
Esha  | 105   | D2    | 29  | Delhi
Fani  | 106   | D3    | 31  | Delhi

* EMP3 (Mumbai):

name  | ecode | dcode | age | city
----- | ----- | ----- | --- | ------
Frank | 107   | D1    | 33  | Mumbai
Hari  | 108   | D5    | 27  | Mumbai
Isha  | 109   | D6    | 36  | Mumbai



The `DEPT` table is horizontally fragmented by `budget` into two fragments:

- DEPT1 (Budget < 1000000):

dcode | division | budget
----- | -------- | ------
D1    | 1        | 500000
D2    | 4        | 600000
D3    | 5        | 700000

- DEPT2 (Budget >= 1000000):

dcode | division | budget
----- | -------- | -------
D3    | 2        | 1500000
D4    | 4        | 2000000
D6    | 5        | 1800000


**Query:**

List the name and dcode for all employees who work in a department in division 4.

**Level 1 Query** - Fragmentation Transparency:

```sql
SELECT name, dcode
FROM EMP
WHERE dcode IN (SELECT dcode FROM DEPT WHERE division = 4);
```

**Explanation:**

The subquery selects the `dcode` of every department in division 4.
The outer query returns the `name` and `dcode` of each employee whose `dcode` appears in that list.
No explicit join is written: the `IN` subquery acts as a semi-join between `EMP` and `DEPT`, and the query refers only to the global table names.

**Output:**

name  | dcode
----- | -----
Bina  | D2
Cintu | D4
Esha  | D2
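
To check the Level 1 query against the sample rows, the sketch below loads every fragment row of Example 2 into two ordinary tables, `EMP` and `DEPT`, using Python's `sqlite3` module. A single local database stands in for the DDBMS's global view. The sketch also shows that the `IN` subquery behaves as a semi-join: an explicit `JOIN` returns the same rows once duplicates are removed with `DISTINCT`. `DISTINCT` matters here because the sample lists `D3` in both `DEPT` fragments.

```python
import sqlite3

# EMP and DEPT rows gathered from all of their fragments in Example 2.
EMP = [
    ("Amit", 101, "D1", 30, "Kolkata"), ("Bina", 102, "D2", 28, "Kolkata"),
    ("Cintu", 103, "D4", 35, "Kolkata"), ("Dina", 104, "D1", 32, "Delhi"),
    ("Esha", 105, "D2", 29, "Delhi"), ("Fani", 106, "D3", 31, "Delhi"),
    ("Frank", 107, "D1", 33, "Mumbai"), ("Hari", 108, "D5", 27, "Mumbai"),
    ("Isha", 109, "D6", 36, "Mumbai"),
]
DEPT = [
    ("D1", 1, 500000), ("D2", 4, 600000), ("D3", 5, 700000),
    ("D3", 2, 1500000), ("D4", 4, 2000000), ("D6", 5, 1800000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (name TEXT, ecode INTEGER, dcode TEXT, age INTEGER, city TEXT)")
conn.execute("CREATE TABLE DEPT (dcode TEXT, division INTEGER, budget INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?, ?)", EMP)
conn.executemany("INSERT INTO DEPT VALUES (?, ?, ?)", DEPT)

# Level 1 query: a semi-join written with IN.
semi_join = conn.execute(
    "SELECT name, dcode FROM EMP "
    "WHERE dcode IN (SELECT dcode FROM DEPT WHERE division = 4) ORDER BY name"
).fetchall()

# The same result with an explicit JOIN; DISTINCT guards against an employee
# matching several DEPT rows that share a dcode.
join = conn.execute(
    "SELECT DISTINCT e.name, e.dcode FROM EMP e JOIN DEPT d ON e.dcode = d.dcode "
    "WHERE d.division = 4 ORDER BY e.name"
).fetchall()

print(semi_join)          # [('Bina', 'D2'), ('Cintu', 'D4'), ('Esha', 'D2')]
print(semi_join == join)  # True
```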

**Level 2 Query** - Location Transparency:

```sql
SELECT e.name, e.dcode
FROM EMP1 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode

UNION

SELECT e.name, e.dcode
FROM EMP2 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode

UNION

SELECT e.name, e.dcode
FROM EMP3 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode;
```

**Explanation of the SQL:**

*Subquery for division 4 departments:*

`SELECT dcode FROM DEPT1 WHERE division = 4 UNION SELECT dcode FROM DEPT2 WHERE division = 4` creates a temporary result set containing the department codes from both `DEPT1` and `DEPT2` that belong to division 4. Both fragments must be listed because `DEPT` is fragmented by budget, so a division 4 department can be stored in either one.

*Joins:*

Each `SELECT ... FROM EMPn ... JOIN ...` block joins one employee fragment (`EMP1`, `EMP2` or `EMP3`) with the subquery result on the `dcode` column.

*Unions:*

The `UNION` operators combine the results of the three employee-fragment queries into one result set.


**Output:**

name  | dcode
----- | -----
Bina  | D2
Cintu | D4
Esha  | D2
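
The claim that Levels 1 and 2 return the same answer can be checked mechanically. The sketch below uses Python's `sqlite3` module and keeps all five fragments in one in-memory database as a simplification. It defines `EMP` and `DEPT` as views over the fragments, runs the Level 1 query against those views, runs a generated copy of the Level 2 query against the fragments, and compares the two answers.

```python
import sqlite3

# Fragments of Example 2, copied from the tables above.
FRAGMENTS = {
    "EMP1": [("Amit", 101, "D1", 30, "Kolkata"), ("Bina", 102, "D2", 28, "Kolkata"),
             ("Cintu", 103, "D4", 35, "Kolkata")],
    "EMP2": [("Dina", 104, "D1", 32, "Delhi"), ("Esha", 105, "D2", 29, "Delhi"),
             ("Fani", 106, "D3", 31, "Delhi")],
    "EMP3": [("Frank", 107, "D1", 33, "Mumbai"), ("Hari", 108, "D5", 27, "Mumbai"),
             ("Isha", 109, "D6", 36, "Mumbai")],
    "DEPT1": [("D1", 1, 500000), ("D2", 4, 600000), ("D3", 5, 700000)],
    "DEPT2": [("D3", 2, 1500000), ("D4", 4, 2000000), ("D6", 5, 1800000)],
}
COLUMNS = {"EMP": "name TEXT, ecode INTEGER, dcode TEXT, age INTEGER, city TEXT",
           "DEPT": "dcode TEXT, division INTEGER, budget INTEGER"}

conn = sqlite3.connect(":memory:")
for name, rows in FRAGMENTS.items():
    base = name.rstrip("0123456789")  # EMP1 -> EMP, DEPT2 -> DEPT
    conn.execute(f"CREATE TABLE {name} ({COLUMNS[base]})")
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)

# Global names for the Level 1 query: views over the fragments.
conn.execute("CREATE VIEW EMP AS SELECT * FROM EMP1 UNION ALL SELECT * FROM EMP2 UNION ALL SELECT * FROM EMP3")
conn.execute("CREATE VIEW DEPT AS SELECT * FROM DEPT1 UNION ALL SELECT * FROM DEPT2")

DIV4 = ("SELECT dcode FROM DEPT1 WHERE division = 4 "
        "UNION SELECT dcode FROM DEPT2 WHERE division = 4")
LEVEL1 = "SELECT name, dcode FROM EMP WHERE dcode IN (SELECT dcode FROM DEPT WHERE division = 4)"
# The Level 2 query: one join per EMP fragment, combined with UNION.
LEVEL2 = " UNION ".join(
    f"SELECT e.name, e.dcode FROM {emp} e JOIN ({DIV4}) d ON e.dcode = d.dcode"
    for emp in ("EMP1", "EMP2", "EMP3")
)

level1 = sorted(conn.execute(LEVEL1))
level2 = sorted(conn.execute(LEVEL2))
print(level2)            # [('Bina', 'D2'), ('Cintu', 'D4'), ('Esha', 'D2')]
print(level1 == level2)  # True
```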

**Exercise Queries**

1. The end user wants to list the name and age of all employees who work in a department whose budget is 500,000.




```sql
-- Level 1 (fragmentation transparency)
SELECT name, age
FROM EMP
WHERE dcode IN (SELECT dcode FROM DEPT WHERE budget = 500000);

-- Level 2 (location transparency)
-- Only DEPT1 is queried: a budget of 500000 satisfies DEPT1's condition
-- (budget < 1000000) and can never appear in DEPT2 (budget >= 1000000).
SELECT e.name, e.age
FROM EMP1 e
JOIN (SELECT dcode FROM DEPT1 WHERE budget = 500000) d ON e.dcode = d.dcode

UNION

SELECT e.name, e.age
FROM EMP2 e
JOIN (SELECT dcode FROM DEPT1 WHERE budget = 500000) d ON e.dcode = d.dcode

UNION

SELECT e.name, e.age
FROM EMP3 e
JOIN (SELECT dcode FROM DEPT1 WHERE budget = 500000) d ON e.dcode = d.dcode;
```

Output (Q1):

| name  | age |
|-------|-----|
| Amit  | 30  |
| Dina  | 32  |
| Frank | 33  |
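
The Level 2 answer for Q1 skips `DEPT2` entirely. The sketch below checks that this shortcut is safe. It uses the same simplified in-memory `sqlite3` setup: fragments as tables, `EMP` and `DEPT` as views over them. It confirms that both levels agree and that `DEPT2` has no row with a budget of 500000.

```python
import sqlite3

# Fragments of Example 2, copied from the tables above.
FRAGMENTS = {
    "EMP1": [("Amit", 101, "D1", 30, "Kolkata"), ("Bina", 102, "D2", 28, "Kolkata"),
             ("Cintu", 103, "D4", 35, "Kolkata")],
    "EMP2": [("Dina", 104, "D1", 32, "Delhi"), ("Esha", 105, "D2", 29, "Delhi"),
             ("Fani", 106, "D3", 31, "Delhi")],
    "EMP3": [("Frank", 107, "D1", 33, "Mumbai"), ("Hari", 108, "D5", 27, "Mumbai"),
             ("Isha", 109, "D6", 36, "Mumbai")],
    "DEPT1": [("D1", 1, 500000), ("D2", 4, 600000), ("D3", 5, 700000)],
    "DEPT2": [("D3", 2, 1500000), ("D4", 4, 2000000), ("D6", 5, 1800000)],
}
COLUMNS = {"EMP": "name TEXT, ecode INTEGER, dcode TEXT, age INTEGER, city TEXT",
           "DEPT": "dcode TEXT, division INTEGER, budget INTEGER"}

conn = sqlite3.connect(":memory:")
for name, rows in FRAGMENTS.items():
    base = name.rstrip("0123456789")  # EMP1 -> EMP, DEPT2 -> DEPT
    conn.execute(f"CREATE TABLE {name} ({COLUMNS[base]})")
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)

conn.execute("CREATE VIEW EMP AS SELECT * FROM EMP1 UNION ALL SELECT * FROM EMP2 UNION ALL SELECT * FROM EMP3")
conn.execute("CREATE VIEW DEPT AS SELECT * FROM DEPT1 UNION ALL SELECT * FROM DEPT2")

LEVEL1 = ("SELECT name, age FROM EMP "
          "WHERE dcode IN (SELECT dcode FROM DEPT WHERE budget = 500000)")
# Level 2 as in the exercise: only DEPT1 is consulted.
LEVEL2 = " UNION ".join(
    f"SELECT e.name, e.age FROM {emp} e "
    "JOIN (SELECT dcode FROM DEPT1 WHERE budget = 500000) d ON e.dcode = d.dcode"
    for emp in ("EMP1", "EMP2", "EMP3")
)

level1 = sorted(conn.execute(LEVEL1))
level2 = sorted(conn.execute(LEVEL2))
# Skipping DEPT2 is safe: its fragment condition (budget >= 1000000) rules out 500000.
dept2_matches = conn.execute("SELECT COUNT(*) FROM DEPT2 WHERE budget = 500000").fetchone()[0]

print(level1)                           # [('Amit', 30), ('Dina', 32), ('Frank', 33)]
print(level1 == level2, dept2_matches)  # True 0
```

This kind of pruning relies on the fragment conditions. If `DEPT` were fragmented on some other attribute, the Level 2 query would have to include both `DEPT` fragments.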

2. The end user wants to list the name, age and city of all employees who work in a department in division 4 and are younger than 40.

```sql
-- Level 1
SELECT name, age, city
FROM EMP
WHERE dcode IN (SELECT dcode FROM DEPT WHERE division = 4)
AND age < 40;

-- Level 2
SELECT e.name, e.age, e.city
FROM EMP1 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode
WHERE e.age < 40

UNION

SELECT e.name, e.age, e.city
FROM EMP2 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode
WHERE e.age < 40

UNION

SELECT e.name, e.age, e.city
FROM EMP3 e
JOIN (
    SELECT dcode FROM DEPT1 WHERE division = 4
    UNION
    SELECT dcode FROM DEPT2 WHERE division = 4
) d ON e.dcode = d.dcode
WHERE e.age < 40;
```

Output (Q2):

| name  | age | city    |
|-------|-----|---------|
| Bina  | 28  | Kolkata |
| Esha  | 29  | Delhi   |
| Cintu | 35  | Kolkata |
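
As with the earlier queries, both Q2 formulations can be run side by side. The sketch below uses the same simplified single-database `sqlite3` setup, with fragments as tables and `EMP`/`DEPT` as views over them. The Level 2 text is generated from a template rather than copied line by line.

```python
import sqlite3

# Fragments of Example 2, copied from the tables above.
FRAGMENTS = {
    "EMP1": [("Amit", 101, "D1", 30, "Kolkata"), ("Bina", 102, "D2", 28, "Kolkata"),
             ("Cintu", 103, "D4", 35, "Kolkata")],
    "EMP2": [("Dina", 104, "D1", 32, "Delhi"), ("Esha", 105, "D2", 29, "Delhi"),
             ("Fani", 106, "D3", 31, "Delhi")],
    "EMP3": [("Frank", 107, "D1", 33, "Mumbai"), ("Hari", 108, "D5", 27, "Mumbai"),
             ("Isha", 109, "D6", 36, "Mumbai")],
    "DEPT1": [("D1", 1, 500000), ("D2", 4, 600000), ("D3", 5, 700000)],
    "DEPT2": [("D3", 2, 1500000), ("D4", 4, 2000000), ("D6", 5, 1800000)],
}
COLUMNS = {"EMP": "name TEXT, ecode INTEGER, dcode TEXT, age INTEGER, city TEXT",
           "DEPT": "dcode TEXT, division INTEGER, budget INTEGER"}

conn = sqlite3.connect(":memory:")
for name, rows in FRAGMENTS.items():
    base = name.rstrip("0123456789")  # EMP1 -> EMP, DEPT2 -> DEPT
    conn.execute(f"CREATE TABLE {name} ({COLUMNS[base]})")
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)

conn.execute("CREATE VIEW EMP AS SELECT * FROM EMP1 UNION ALL SELECT * FROM EMP2 UNION ALL SELECT * FROM EMP3")
conn.execute("CREATE VIEW DEPT AS SELECT * FROM DEPT1 UNION ALL SELECT * FROM DEPT2")

DIV4 = ("SELECT dcode FROM DEPT1 WHERE division = 4 "
        "UNION SELECT dcode FROM DEPT2 WHERE division = 4")
LEVEL1 = ("SELECT name, age, city FROM EMP "
          "WHERE dcode IN (SELECT dcode FROM DEPT WHERE division = 4) AND age < 40")
LEVEL2 = " UNION ".join(
    f"SELECT e.name, e.age, e.city FROM {emp} e "
    f"JOIN ({DIV4}) d ON e.dcode = d.dcode WHERE e.age < 40"
    for emp in ("EMP1", "EMP2", "EMP3")
)

level1 = sorted(conn.execute(LEVEL1))
level2 = sorted(conn.execute(LEVEL2))
print(level1)            # [('Bina', 28, 'Kolkata'), ('Cintu', 35, 'Kolkata'), ('Esha', 29, 'Delhi')]
print(level1 == level2)  # True
```

Every employee in the sample is younger than 40, so the `age < 40` condition does not change the result here. Testing that filter would require at least one row with an age of 40 or more.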