## Different Data Formats

* we store data in many different file and data formats, such as
    * CSV (comma-separated values)
    * JSON (JavaScript Object Notation)
    * XML
    * ...

* we can read these files directly using `open` and `read`, but parsing their contents by hand quickly becomes tiresome.

* Python has dedicated modules to access and modify data in these formats

* for most of them we just import the matching module (for example `csv` or `json`) and use its functions.


### CSV files

* a CSV file typically holds records whose fields are separated by commas
* the first row holds the field names (the header)
* each following row is one record
* the fields of each record are separated by commas
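One detail the bullets above leave out: a field that itself contains a comma is usually wrapped in double quotes (the convention described in RFC 4180). A minimal sketch, using a made-up row, of why a plain `split(',')` breaks on such a field while the standard `csv` module handles it:

```python
import csv
import io

# a hypothetical CSV row whose title contains a comma, so it is quoted
row_text = '"Kane, and Abel",Jeffrey Archer,499,4.7\n'

naive = row_text.strip().split(',')
parsed = next(csv.reader(io.StringIO(row_text)))

print(naive)   # the quoted title is split into two broken fields
print(parsed)  # ['Kane, and Abel', 'Jeffrey Archer', '499', '4.7']
```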


#### We can read it directly using open and read
* it is just a normal file

In [1]:
def read_file(filename):
    with open(filename, 'r') as f:
        return f.read()

In [4]:
data = read_file('book.csv')
data

'Title, Author, Price, Rating\nThe Accursed God, Vivek Dutta Mishra, 399, 4.5\nKane and Abel, Jeffrey Archer, 499, 4.7\nBrethren, John Grisham, 300, 4.1\nManas, Vivek Dutta Mishra, 299,4.7'

### We would prefer to get each record and field separately

In [5]:
def read_csv(file):
    header=None
    records=[]
    with open(file, 'r') as f:
        line = f.readline() #read header line
        header = line.strip().split(',')
        while True:
            line=f.readline() 
            if not line:
                break
            else:
                record = line.strip().split(',')
                records.append(record)

    return records,header
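The same idea can be written more compactly by looping over the file object line by line and stripping the spaces around each field. This is a sketch, not the notebook's own function: `read_csv_compact` is a hypothetical name, and it takes any file-like object so it can be tried on an in-memory `io.StringIO` instead of `book.csv`.

```python
import io

def read_csv_compact(f):
    """Return (records, header) from a simple comma-separated file object."""
    # first line is the header; strip the newline and spaces around names
    header = [c.strip() for c in f.readline().split(',')]
    records = []
    for line in f:               # iterating a file yields one line at a time
        if line.strip():         # skip blank lines
            records.append([v.strip() for v in line.split(',')])
    return records, header

sample = io.StringIO("Title, Author, Price\nManas, Vivek Dutta Mishra, 299\n")
records, header = read_csv_compact(sample)
print(header)   # ['Title', 'Author', 'Price']
print(records)  # [['Manas', 'Vivek Dutta Mishra', '299']]
```

Like the original, this still assumes no field contains a quoted comma.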

In [7]:
records,header= read_csv('book.csv')

print(header)
for record in records:
    print(record)

['Title', ' Author', ' Price', ' Rating']
['The Accursed God', ' Vivek Dutta Mishra', ' 399', ' 4.5']
['Kane and Abel', ' Jeffrey Archer', ' 499', ' 4.7']
['Brethren', ' John Grisham', ' 300', ' 4.1']
['Manas', ' Vivek Dutta Mishra', ' 299', '4.7']


### read each record as a dictionary

In [11]:
def read_csv_dict(filename):
    with open(filename) as f:
        columns=[ c.strip() for c in  f.readline().strip().split(',')]
        records=[]
        while True:
            data = f.readline()
            if not data:
                break
            fields= data.strip().split(',')
            record={}
            for i in range(len(columns)):
                record[columns[i]]=fields[i]
            records.append(record)

    return records
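Inside the loop, the index-based `for i in range(len(columns))` can be replaced by `zip`, which pairs each column name with the field in the same position; `dict(zip(...))` then builds the record in one step. A small sketch using one record from the file (with the spaces already stripped):

```python
columns = ['Title', 'Author', 'Price', 'Rating']
fields = ['Brethren', 'John Grisham', '300', '4.1']

# zip pairs columns[i] with fields[i]; dict() turns those pairs into a record
record = dict(zip(columns, fields))
print(record)
# {'Title': 'Brethren', 'Author': 'John Grisham', 'Price': '300', 'Rating': '4.1'}
```

Note that `zip` stops at the shorter sequence, so a row with missing fields silently yields a smaller dict instead of raising an `IndexError`.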

In [12]:
books= read_csv_dict('book.csv')

for book in books:
    for field,value in book.items():
        print(f'{field}={value}')
    print()

Title=The Accursed God
Author= Vivek Dutta Mishra
Price= 399
Rating= 4.5

Title=Kane and Abel
Author= Jeffrey Archer
Price= 499
Rating= 4.7

Title=Brethren
Author= John Grisham
Price= 300
Rating= 4.1

Title=Manas
Author= Vivek Dutta Mishra
Price= 299
Rating=4.7



### we can make it easier by using the `csv` module

In [13]:
import csv

In [17]:
def read_csv2(file):
    records=[]
    header=None
    with open(file) as f:          # open the file name that was passed in
        reader=csv.reader(f)
        for row in reader:
            if header is None:     # the first row is the header
                header=row
            else:
                records.append(row)

    return records,header

In [18]:
records,header=read_csv2('book.csv')

print(header)

for record in records:
    print(record)

['Title', ' Author', ' Price', ' Rating']
['The Accursed God', ' Vivek Dutta Mishra', ' 399', ' 4.5']
['Kane and Abel', ' Jeffrey Archer', ' 499', ' 4.7']
['Brethren', ' John Grisham', ' 300', ' 4.1']
['Manas', ' Vivek Dutta Mishra', ' 299', '4.7']
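The leading spaces in `' Author'`, `' 399'` and so on come from the space after each comma in `book.csv`. `csv.reader` accepts a `skipinitialspace=True` option that ignores whitespace immediately following the delimiter. A sketch on an in-memory copy of two rows of the file:

```python
import csv
import io

text = "Title, Author, Price, Rating\nBrethren, John Grisham, 300, 4.1\n"

# skipinitialspace=True drops the blank right after each comma
reader = csv.reader(io.StringIO(text), skipinitialspace=True)
header = next(reader)       # the first row holds the field names
records = list(reader)      # the remaining rows are the records
print(header)   # ['Title', 'Author', 'Price', 'Rating']
print(records)  # [['Brethren', 'John Grisham', '300', '4.1']]
```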


In [19]:
def read_csv_dict2(file):
    with open(file) as f:
        header=[c.strip() for c in f.readline().split(',')]

        reader=csv.DictReader(f, header)
        records=[]
        for row in reader:
            records.append(row)

    return records



In [21]:
books = read_csv_dict2('book.csv')

for book in books:
    print(book)

{'Title': 'The Accursed God', 'Author': ' Vivek Dutta Mishra', 'Price': ' 399', 'Rating': ' 4.5'}
{'Title': 'Kane and Abel', 'Author': ' Jeffrey Archer', 'Price': ' 499', 'Rating': ' 4.7'}
{'Title': 'Brethren', 'Author': ' John Grisham', 'Price': ' 300', 'Rating': ' 4.1'}
{'Title': 'Manas', 'Author': ' Vivek Dutta Mishra', 'Price': ' 299', 'Rating': '4.7'}
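When no field names are passed, `csv.DictReader` uses the first row of the file as the keys, so the header does not have to be read by hand. Combined with `skipinitialspace=True`, this gives clean keys and values. A sketch, again on an in-memory sample rather than `book.csv`:

```python
import csv
import io

text = ("Title, Author, Price, Rating\n"
        "The Accursed God, Vivek Dutta Mishra, 399, 4.5\n"
        "Manas, Vivek Dutta Mishra, 299,4.7\n")

# fieldnames omitted: DictReader takes them from the first row
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
books = list(reader)        # each row is a dict keyed by column name
for book in books:
    print(book['Title'], book['Price'])
```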


## JSON data format

* one of the most popular data formats
* extensively used for
    * transferring data between services

* it comes from JavaScript (JSON stands for JavaScript Object Notation) and is native to it

* its syntax looks very similar to Python dict and list literals.


### JSON format


1. we represent an object (record) as a set of key-value pairs, just like a Python dictionary


```json
{
"title":"The Accursed God",
"author":"Vivek Dutta Mishra",
"price": 299
}
```

2. we represent a list using []

```json
[ 
    {
        "title":"The Accursed God",
        "author":"Vivek Dutta Mishra",
        "price": 299
    },
    {
        "title":"Manas",
        "author":"Vivek Dutta Mishra",
        "price": 199
    }
]
```
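To see how these two rules map onto Python, the list example above can be parsed with `json.loads` (described in the cells that follow): JSON objects become `dict`, arrays become `list`, strings become `str`, and whole numbers become `int`.

```python
import json

text = '''
[
    {"title": "The Accursed God", "author": "Vivek Dutta Mishra", "price": 299},
    {"title": "Manas", "author": "Vivek Dutta Mishra", "price": 199}
]
'''

books = json.loads(text)           # JSON array -> Python list
print(type(books))                 # <class 'list'>
print(type(books[0]))              # <class 'dict'>  (JSON object -> dict)
print(books[1]['title'], type(books[1]['price']))  # Manas <class 'int'>
```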

#### To work with JSON we use the `json` module
    

In [22]:
import json

### Important JSON functions

* we need functionalities to convert
    * python data to json
    * json data to python


#### 1. convert JSON to Python: load or loads

* `load` reads JSON from an open file object and converts it to Python objects such as
    * list
    * dict

* `loads` is similar to `load` except it works with a JSON string that you already have.

#### 2. convert Python objects to JSON: dump or dumps

* `dump` writes Python data (a combination of dicts and lists) to an open file object
* `dumps` creates a JSON string from Python data.
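The four functions pair up: `dumps`/`loads` work with strings, while `dump`/`load` work with an open file object. A sketch of both round trips; an `io.StringIO` stands in for a real file so the example needs no disk access:

```python
import io
import json

book = {"title": "Manas", "author": "Vivek Dutta Mishra", "price": 199}

# string round trip: Python -> JSON text -> Python
text = json.dumps(book)
assert json.loads(text) == book

# file round trip: dump writes into the file object, load reads it back
f = io.StringIO()
json.dump(book, f)
f.seek(0)                 # rewind before reading what was just written
print(json.load(f))
```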

In [24]:
#dir(json)

### Converting Python Objects to JSON

* by default it supports the built-in types (dict, list, tuple, str, int, float, bool, None), for example
    * dict to represent an object
    * list/tuple to represent a collection
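Because JSON has only one sequence type, a tuple is written as a JSON array and comes back as a `list`; likewise `None`, `True` and `False` become `null`, `true` and `false`. A short sketch:

```python
import json

data = {"ratings": (4.5, 4.7), "sequel": None, "available": True}

text = json.dumps(data)
print(text)  # {"ratings": [4.5, 4.7], "sequel": null, "available": true}

back = json.loads(text)
print(back["ratings"], type(back["ratings"]))  # [4.5, 4.7] <class 'list'>
```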

#### Here we have a list of dict representing books

* this is a python object

In [44]:
def print_books(books):
    for field in books[0].keys():
        print(field.ljust(20),end="")
    print()
    print('-'*80)
    for book in books:
        for value in book.values():
            print(str(value).ljust(20),end="")
        print()





In [45]:
books= read_csv_dict('book.csv')
print_books(books)

Title               Author              Price               Rating              
--------------------------------------------------------------------------------
The Accursed God     Vivek Dutta Mishra  399                 4.5                
Kane and Abel        Jeffrey Archer      499                 4.7                
Brethren             John Grisham        300                 4.1                
Manas                Vivek Dutta Mishra  299                4.7                 


#### we can convert it to json string

In [36]:
books_json = json.dumps(books)

In [40]:
print(books_json)
print(type(books_json))

[{"Title": "The Accursed God", "Author": " Vivek Dutta Mishra", "Price": " 399", "Rating": " 4.5"}, {"Title": "Kane and Abel", "Author": " Jeffrey Archer", "Price": " 499", "Rating": " 4.7"}, {"Title": "Brethren", "Author": " John Grisham", "Price": " 300", "Rating": " 4.1"}, {"Title": "Manas", "Author": " Vivek Dutta Mishra", "Price": " 299", "Rating": "4.7"}]
<class 'str'>


### we can also save it to a file using json.dump

* note: json.dump expects a Python object (list, dict), not a JSON string
* if you already have a JSON string, you can save it directly using open/write
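To see why `json.dump`/`json.dumps` should be given Python objects rather than text that is already JSON: a string is itself a valid JSON value, so it gets quoted and escaped a second time. A short sketch:

```python
import json

already_json = '{"title": "Manas"}'

# serializing the string encodes it as a JSON string literal
double = json.dumps(already_json)
print(double)  # "{\"title\": \"Manas\"}"

# decoding once gives back the original text, not a dict
print(type(json.loads(double)))  # <class 'str'>
```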



In [42]:
with open('books.json', 'w') as f:
    json.dump(books,f)

#### both dump and dumps accept an `indent` argument to produce properly indented output


In [43]:
with open('books.json',"w") as f:
    json.dump(books,f,indent=4)
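`indent` sets the number of spaces per nesting level, and `json.dumps` takes the same argument, which is handy for printing. For example:

```python
import json

book = {"title": "Manas", "price": 199}
print(json.dumps(book, indent=4))
# {
#     "title": "Manas",
#     "price": 199
# }
```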

### We can also read the data from a json file as python objects

In [46]:
with open('books.json') as f:
    new_books=json.load(f)


print_books(new_books)


Title               Author              Price               Rating              
--------------------------------------------------------------------------------
The Accursed God     Vivek Dutta Mishra  399                 4.5                
Kane and Abel        Jeffrey Archer      499                 4.7                
Brethren             John Grisham        300                 4.1                
Manas                Vivek Dutta Mishra  299                4.7                 


### the json module can't serialize objects of user-defined classes by default

* we need to convert them to dicts ourselves or define a custom encoder.

In [53]:
class Book:
    def __init__(self,title,author,price,rating):
        self.title=title
        self.author=author
        self.price=price
        self.rating=rating

def print_books_list(books):
    columns=["Title","Author","Price","Rating"]
    for column in columns:
        print(column.ljust(25),end="|")
    print()
    print('-'*104)

    for book in books:
        print(f'{book.title.ljust(25)}|{book.author.ljust(25)}|{str(book.price).rjust(25)}|{str(book.rating).ljust(25)}|')
    
    print('-'*105)

In [54]:
books=[
    Book("The Accursed God","Vivek Dutta Mishra",299,4.6),
    Book("Manas","Vivek Dutta Mishra",199,4.7),
    Book("Rashmirathi","Ramdhari Singh Dinkar",99,4.7),
]

In [55]:
print_books_list(books)

Title                    |Author                   |Price                    |Rating                   |
--------------------------------------------------------------------------------------------------------
The Accursed God         |Vivek Dutta Mishra       |                      299|4.6                      |
Manas                    |Vivek Dutta Mishra       |                      199|4.7                      |
Rashmirathi              |Ramdhari Singh Dinkar    |                       99|4.7                      |
---------------------------------------------------------------------------------------------------------


### Trying to save these objects directly with json.dump/json.dumps fails

In [56]:
json.dumps(books)

TypeError: Object of type Book is not JSON serializable
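Besides writing a custom encoder class, `json.dumps` and `json.dump` accept a `default` function that is called for any object the module cannot serialize by itself. Passing the built-in `vars` (which returns an object's `__dict__`) is a quick way to handle simple classes like `Book`; the class is redefined here so the sketch runs on its own.

```python
import json

class Book:
    def __init__(self, title, author, price, rating):
        self.title = title
        self.author = author
        self.price = price
        self.rating = rating

books = [Book("Manas", "Vivek Dutta Mishra", 199, 4.7)]

# default is only called for objects json can't handle natively (here: Book)
print(json.dumps(books, default=vars))
# [{"title": "Manas", "author": "Vivek Dutta Mishra", "price": 199, "rating": 4.7}]
```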

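#### objects of user-defined classes keep their attributes in a built-in dictionary, `__dict__`

* we can use it to convert a Python object to JSON

* we can use it to convert JSON back to a Python object

class BookEncoder(json.JSONEncoder):
    def default(self,obj):
        # encode Book objects through their attribute dictionary
        return obj.__dict__ if isinstance(obj, Book) else super().default(obj)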
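A `JSONEncoder` subclass is passed to `json.dumps` through the `cls` parameter. In this sketch (classes repeated so it runs on its own), `default` returns the attribute dictionary for `Book` instances and defers to the base class for anything else, which raises the usual `TypeError`.

```python
import json

class Book:
    def __init__(self, title, author, price, rating):
        self.title = title
        self.author = author
        self.price = price
        self.rating = rating

class BookEncoder(json.JSONEncoder):
    def default(self, obj):
        # called only for objects the standard encoder can't serialize
        if isinstance(obj, Book):
            return obj.__dict__
        return super().default(obj)  # raises TypeError for other types

books = [Book("Rashmirathi", "Ramdhari Singh Dinkar", 99, 4.7)]
print(json.dumps(books, cls=BookEncoder))
```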

In [57]:
books_dict = [book.__dict__ for book in books]

books_json= json.dumps(books_dict, indent=3)

print(books_json)

[
   {
      "title": "The Accursed God",
      "author": "Vivek Dutta Mishra",
      "price": 299,
      "rating": 4.6
   },
   {
      "title": "Manas",
      "author": "Vivek Dutta Mishra",
      "price": 199,
      "rating": 4.7
   },
   {
      "title": "Rashmirathi",
      "author": "Ramdhari Singh Dinkar",
      "price": 99,
      "rating": 4.7
   }
]
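Going the other way, `json.loads` accepts an `object_hook` function that receives every decoded JSON object as a `dict`; returning `Book(**d)` rebuilds the instances, assuming the JSON keys match the `__init__` parameter names exactly (as they do in the output above). `dict_to_book` is a hypothetical helper name used only for this sketch.

```python
import json

class Book:
    def __init__(self, title, author, price, rating):
        self.title = title
        self.author = author
        self.price = price
        self.rating = rating

def dict_to_book(d):
    # each JSON object arrives as a dict; its keys become keyword arguments
    return Book(**d)

books_json = '[{"title": "Manas", "author": "Vivek Dutta Mishra", "price": 199, "rating": 4.7}]'
books = json.loads(books_json, object_hook=dict_to_book)
print(type(books[0]).__name__, books[0].title, books[0].price)  # Book Manas 199
```

Since `object_hook` is called for every JSON object, including nested ones, this simple version only suits flat records like these.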
