## 4.6 Files

Every computer system uses files to save things from one computation to the next.

Python provides many facilities for creating and accessing files.


<b style="color:blue">open()</b> returns a file object and is most commonly used with two arguments: `open(filename, mode)`.

```python
f = open('workfile', 'w')
```    

* The first argument is a string containing the filename. 


* The second argument is another string containing a few characters describing the way in which the file will be used:

  * **Writing**: `'w'` opens the file for writing only (an existing file with the same name will be erased).

  * **Reading**: `'r'` opens the file for reading only.

  * **Reading and writing**: `'r+'` opens the file for both reading and writing.

  * **Appending**: `'a'` opens the file for appending; any data written to the file is automatically added to the end.
  
Here we illustrate some of the basic ones:

### 4.6.1 Writing


<b style="color:blue">open</b> creates a file named **kids.txt**, using the argument <b style="color:blue">w</b> to indicate that the file is to be opened for **writing**.

 * An existing file with the same name will be erased.

In [None]:
nameHandle = open('kids.txt', 'w')

for i in range(2):
    name = input('Enter name: ')

    nameHandle.write(name + '\n') # the string '\n' indicates a new line character.

nameHandle.close()

**name + '\n'**: the string **'\n'** indicates a **new line** character.
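The same write can also be sketched with a `with` statement, which closes the file automatically when the block ends, even if an error occurs inside it. This is a minimal sketch, not part of the original example: the names are fixed instead of typed in, and the file goes into a temporary directory (an assumption made here so that `kids.txt` is left untouched).

```python
import os
import tempfile

# Hypothetical location: a temporary directory, so no real file is overwritten.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'kids_demo.txt')

names = ['David', 'Andrea']

# The file is closed automatically when the with-block ends.
with open(path, 'w') as nameHandle:
    for name in names:
        nameHandle.write(name + '\n')

print(nameHandle.closed)   # True: no explicit close() was needed

with open(path, 'r') as nameHandle:
    content = nameHandle.read()
print(content)
```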

In [6]:
nameHandle = open('kids.txt', 'w')

for i in range(2):
    if i==0:
        name='David'
    elif i==1:
        name='Andrea'
        
    nameHandle.write(name + '\n') # the string '\n' indicates a new line character.

nameHandle.close()

In [7]:
!dir kids.txt

 Volume in drive F is cmh
 Volume Serial Number is 9C25-3306

 Directory of F:\SEU\SEE\PySEE\home\notebook

2019/03/21  09:52                15 kids.txt
               1 File(s)             15 bytes
               0 Dir(s)  103,862,226,944 bytes free


In [None]:
%load kids.txt

### 4.6.2 Reading

<b style="color:blue">open</b> the file for **reading**, using the argument <b style="color:blue">r</b>.

`'r'` is assumed if the mode is omitted.

In [11]:
nameHandle = open('kids.txt', 'r')
#nameHandle = open('kids.txt')  # 'r' is assumed if the mode is omitted

for line in nameHandle:
    print(line)
    
nameHandle.close()

David

Andrea



Notice the blank line after each name. Each line read from the file still ends with the newline character **'\n'**, and `print` **adds** a second newline of its own after whatever it prints.

We could have avoided printing the extra **new line** by writing
```python
print(line[:-1])
```
which **slices** the line to drop the trailing **'\n'** before printing.

In [5]:
nameHandle = open('kids.txt', 'r')
for line in nameHandle:
    print(line[:-1])  # same as print(line[:len(line)-1]); drops the trailing '\n'
nameHandle.close()

David
Andrea
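Slicing with `line[:-1]` assumes every line ends with `'\n'`. If the last line of a file has no trailing newline, the slice removes a real character instead. A hedged sketch of two alternatives, `rstrip('\n')` and `print(..., end='')`, using an in-memory file (`io.StringIO`, an assumption made here to keep the example self-contained):

```python
import io

# In-memory stand-in for a file whose last line has no trailing '\n'.
fake_file = io.StringIO('David\nAndrea')

sliced = []
stripped = []
for line in fake_file:
    sliced.append(line[:-1])            # blindly drops the last character
    stripped.append(line.rstrip('\n'))  # drops '\n' only if it is there

print(sliced)    # ['David', 'Andre']  <- the final 'a' is lost
print(stripped)  # ['David', 'Andrea']

# Alternatively, keep the line as it is and tell print not to add a newline.
fake_file.seek(0)
for line in fake_file:
    print(line, end='')
```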


#### Writing and Reading

In [16]:
nameHandle = open('kids.txt', 'w')
nameHandle.write('Michael\n')
nameHandle.write('Mark\n')
nameHandle.close()

nameHandle = open('kids.txt', 'r')
for line in nameHandle:
    print(line[:-1])
nameHandle.close()

Michael
Mark


In [None]:
# %load kids.txt
Michael
Mark

#### Reading and writing

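`'r+'` opens an existing file for both reading and writing. Writing starts at the beginning of the file and overwrites what is already there, without erasing the rest of the file.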

In [13]:
nameHandle = open('kids.txt', 'r+')
nameHandle.write('David\n')
nameHandle.close()
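To see what `'r+'` does to existing content, the sketch below writes a replacement that has the same length as the first line, so the result is easy to predict. The temporary file and the name `Raphael` are illustrative assumptions. A shorter or longer string would overwrite only part of the first line or run into the second one, because `'r+'` overwrites characters in place rather than inserting them.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'kids_demo.txt')  # hypothetical file

with open(path, 'w') as f:
    f.write('Michael\nMark\n')

# 'r+' keeps the existing content and starts writing at the beginning.
with open(path, 'r+') as f:
    f.write('Raphael\n')   # same length as 'Michael\n'

with open(path, 'r') as f:
    after_rplus = f.read()
print(repr(after_rplus))   # 'Raphael\nMark\n'

# By contrast, 'w' erases the file before writing.
with open(path, 'w') as f:
    f.write('Raphael\n')

with open(path, 'r') as f:
    after_w = f.read()
print(repr(after_w))       # 'Raphael\n'
```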

In [None]:
%load kids.txt

### 4.6.3 Appending

<b style="color:blue">open</b> the file for **appending** (instead of writing) by using the argument <b style="color:blue">a</b>.

In [None]:
nameHandle = open('kids.txt', 'a') # argument 'a' -  appending

nameHandle.write('David\n')
nameHandle.write('Andrea\n')

nameHandle.close()

nameHandle = open('kids.txt', 'r')
for line in nameHandle:
    print(line[:-1])
nameHandle.close()

Some of the common operations on files are summarized in the figure below.

![Figure412](./img/figure412.jpg)
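As a rough illustration of such operations, the sketch below exercises a few standard file-object methods (`writelines`, `read`, `readline`, `readlines`). It is not a transcription of the figure, and the temporary file is an assumption made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'ops_demo.txt')  # hypothetical file

with open(path, 'w') as fh:
    # writelines writes each string as given; it does not add '\n' itself
    fh.writelines(['David\n', 'Andrea\n', 'Mark\n'])

with open(path, 'r') as fh:
    whole = fh.read()        # the entire file as one string

with open(path, 'r') as fh:
    first = fh.readline()    # the next line, including its '\n'
    rest = fh.readlines()    # the remaining lines as a list of strings

print(repr(whole))  # 'David\nAndrea\nMark\n'
print(repr(first))  # 'David\n'
print(rest)         # ['Andrea\n', 'Mark\n']
```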

### 4.6.4  Files in a specific `encoding`

`Text encoding` is a sufficiently complex topic that there is no one-size-fits-all answer. The right answer for a given application depends on factors such as:

* how much control you have over the text encodings used

* whether avoiding program failure is more important than avoiding data corruption or vice-versa

* how common encoding errors are expected to be, and whether they need to be handled gracefully or can simply be rejected as invalid input
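One place where these trade-offs show up in code is the `errors` argument, which both `open()` and `bytes.decode()` accept. The sketch below decodes GBK bytes as UTF-8 under three standard error handlers. Which handler is appropriate is exactly the application-specific judgment described above, so this is an illustration rather than a recommendation.

```python
data = '中文-gbk'.encode('gbk')   # bytes that are not valid UTF-8

# 'strict' (the default): fail loudly instead of risking corrupted data
try:
    data.decode('utf-8', errors='strict')
    strict_failed = False
except UnicodeDecodeError as err:
    strict_failed = True
    print('strict :', err)

# 'replace': keep going, marking undecodable bytes with U+FFFD
replaced = data.decode('utf-8', errors='replace')
print('replace:', ascii(replaced))   # ascii() shows U+FFFD as the escape \ufffd

# 'ignore': keep going and silently drop undecodable bytes
ignored = data.decode('utf-8', errors='ignore')
print('ignore :', ignored)           # -gbk
```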


#### 1 gbk

GBK (汉字内码扩展规范, Chinese Internal Code Extension Specification) is a character encoding for Chinese characters that extends GB 2312.

In [None]:
fname="./code/python/gbk.txt"
f = open(fname,'w',encoding="gbk")
f.write('中文-gbk')
f.close()

Open the GBK file with UTF-8 encoding. The GBK bytes are not valid UTF-8, so reading raises a `UnicodeDecodeError`:

In [None]:
f = open(fname,'r',encoding="utf-8")
line=f.readline()
print(line)
f.close()

Open the GBK file with GBK encoding. The text is read back correctly:

In [None]:
f = open(fname,'r',encoding="gbk")
line=f.readline()
print(line)
f.close()

#### 2 utf-8

In [None]:
fname="./code/python/utf-8.txt"
f = open(fname,'w',encoding="utf-8")
f.write('中文-utf-8')
f.close()

Open the UTF-8 file with GBK encoding. The mismatch either raises a `UnicodeDecodeError` or produces garbled text:

In [None]:
f = open(fname,'r',encoding="gbk")
line=f.readline()
print(line)
f.close()

Open the UTF-8 file with UTF-8 encoding. The text is read back correctly:

In [None]:
f = open(fname,'r',encoding="utf-8")
line=f.readline()
print(line)
f.close()
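The reason a file written in one encoding cannot be read reliably in the other is visible at the byte level. A minimal sketch using `str.encode` and `bytes.decode`, which apply the same codecs that `open(..., encoding=...)` uses:

```python
text = '中文'

gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')

print(gbk_bytes)    # b'\xd6\xd0\xce\xc4'          (2 bytes per character here)
print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'  (3 bytes per character here)

# Decoding with the encoding that produced the bytes restores the text.
roundtrip_ok = (gbk_bytes.decode('gbk') == text
                and utf8_bytes.decode('utf-8') == text)
print(roundtrip_ok)  # True
```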

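#### 3 The default encoding

If `encoding` is omitted, `open()` uses a platform-dependent default (the locale's preferred encoding). This is commonly UTF-8 on Linux and macOS but may be GBK (cp936) on a Chinese-language Windows system. Pass `encoding` explicitly when the file must be portable.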


In [None]:
fname="./code/python/default.txt"
f = open(fname,'w')
f.write('中文default')
f.close()

In [None]:
f = open(fname,'r')
line=f.readline()
print(line)
f.close()
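To check which encoding `open()` picks when none is given, the text file object reports it in its `encoding` attribute, and `locale.getpreferredencoding(False)` shows the locale's preference. A small sketch (the temporary file is an assumption for the example; the printed values depend on the platform and Python configuration):

```python
import locale
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'default_demo.txt')  # hypothetical file

# Encoding preferred by the current locale; varies between systems.
locale_encoding = locale.getpreferredencoding(False)
print(locale_encoding)

# No encoding argument: the file object records what open() chose.
with open(path, 'w') as f:
    chosen_encoding = f.encoding
print(chosen_encoding)

# Passing encoding explicitly makes the result independent of the platform.
with open(path, 'w', encoding='utf-8') as f:
    f.write('中文default')
with open(path, 'r', encoding='utf-8') as f:
    line = f.readline()
```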

### Further Reading

**Python Tutorial 7.2**. Reading and Writing Files  

 * https://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
  
 
**Processing Text Files in Python 3**

  * http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html
  

### Binary Data Files

**J. M. Hughes. Real World Instrumentation with Python: CHAPTER 12 Reading and Writing Data Files**

  *  [UnitA-2: Reading-and-Writing-Data-Files-Binary-Data-Files](./UnitA-2-Reading-and-Writing-Data-Files-Binary-Data-Files.ipynb)
 

  
    


# Table Data (File), Dictionary and List

|name      |  age   |   city|
|---------:|------:|-------:|
|zhangsan  |  28    |  nanjing|

```python
data table  -> dict
     column -> key (string)
       row  -> value (list)
```
In relational database terms:

```python
data table  -> relational database table
     column -> field
       row  -> record
```

## Creating data table dicts from sequences

```
dict {field1:[],field2:[]....}
```
The data table is a dict: each key is a field name and each value is the list of entries in that column.

In [None]:
fields=['name','age','city']
rows=[['Zhangsan',28,'Nanjing'],['Lishi',18,'Beijing']]
datatable={}

# 1 create the dict of  data table
for key in fields:
    datatable[key] = []
print(datatable)  

# 2 set the value list of key
for r in rows:
    for i in range(len(fields)):
        datatable[fields[i]].append(r[i])
print(datatable)

print("\n",fields)
for r in range(len(rows)):
    currow=[]
    for i in range(len(fields)):
        currow.append(datatable[fields[i]][r])
    print(currow)
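
The same column-oriented dict can be built more compactly with `zip`, which transposes the rows into columns. This is an alternative sketch, not a replacement for the step-by-step loops above:

```python
fields = ['name', 'age', 'city']
rows = [['Zhangsan', 28, 'Nanjing'], ['Lishi', 18, 'Beijing']]

# zip(*rows) yields one tuple per column: ('Zhangsan', 'Lishi'), (28, 18), ...
datatable = {field: list(column) for field, column in zip(fields, zip(*rows))}
print(datatable)

# Transposing back recovers the rows from the columns.
rebuilt_rows = [list(row) for row in zip(*(datatable[f] for f in fields))]
print(rebuilt_rows)
```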


## Creating dicts from the file of table data

dict {field1:[],field2:[]....}

datatable is a dict

In [20]:
%%file ./data/personrecords.txt
name        age
zhangsan    28
lishi       18 

Overwriting ./data/personrecords.txt


In [21]:
fields=[]
datatable={}

personrecords=open('./data/personrecords.txt','r')

# 1 get string of field(column)
fields=personrecords.readline().split()
print(fields)

# 2 create the dict of  data table
for key in fields:
    datatable[key] = []
print("dict for datatable:{field1:[],field2:[]....}")
print(datatable)

# 3 read each record into the value list of its key
for line in personrecords:
    currowrecord=line.split()
    for i in range(len(fields)):
        datatable[fields[i]].append(currowrecord[i])

personrecords.close()

print(datatable)

recordCount=len(datatable[fields[0]])
print("\n",fields)
for r in range(recordCount):
    currow=[]
    for i in range(len(fields)):
        currow.append(datatable[fields[i]][r])
    print(currow)    

['name', 'age']
dict for datatable:{field1:[],field2:[]....}
{'name': [], 'age': []}
{'name': ['zhangsan', 'lishi'], 'age': ['28', '18']}

 ['name', 'age']
['zhangsan', '28']
['lishi', '18']
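
Note that every value read from the file is a string (`'28'`, not `28`). If numeric work is needed, the column has to be converted explicitly. A minimal sketch, starting from the dict printed above:

```python
# The column-oriented table as read from the file: all values are strings.
datatable = {'name': ['zhangsan', 'lishi'], 'age': ['28', '18']}

# Convert the 'age' column from strings to integers.
datatable['age'] = [int(age) for age in datatable['age']]
print(datatable)

average_age = sum(datatable['age']) / len(datatable['age'])
print(average_age)   # 23.0
```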


### Add the field `city` to the table file

```python
list[dict]: [{field1:value,field2:value,...},{field1:value,field2:value,...},...]
```
The data table is a list; each row is a dict.

In [None]:
%%file ./data/personrecords.txt
name        age      city
zhangsan    28      nanjing
lishi       18      shanghai

In [23]:
datatable=[]

personrecordsfile=open('./data/personrecords.txt','r')

# 1 get string of field(column)
fields=personrecordsfile.readline().split()
print(fields)

# 2 read each record into a dict: key is the field string
for line in personrecordsfile:
    currowrecord=line.split()
    # 2.1 init dict
    rowrecord={}
    for i in range(len(fields)):
        # 2.2 add key:value to dict
        rowrecord[fields[i]]=currowrecord[i]
    # 2.3 add dict to list: datatable
    datatable.append(rowrecord)

personrecordsfile.close()

for item in datatable:
    print(item)
    
for item in datatable:
    print(item['name'])    

['name', 'age', 'city']
{'name': 'zhangsan', 'age': '28', 'city': 'nanjing'}
{'name': 'lishi', 'age': '18', 'city': 'shanghai'}
zhangsan
lishi
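
Each row dict can also be built in one step with `dict(zip(fields, values))`, which pairs every field name with the value in the same position. A sketch using an in-memory file (`io.StringIO`, an assumption made here so the example does not depend on `./data/personrecords.txt`):

```python
import io

# In-memory stand-in for the table file.
tablefile = io.StringIO(
    'name        age      city\n'
    'zhangsan    28      nanjing\n'
    'lishi       18      shanghai\n'
)

fields = tablefile.readline().split()

# One dict per row: field name -> value in the same position
datatable = [dict(zip(fields, line.split())) for line in tablefile]

for item in datatable:
    print(item)

# Rows can then be selected by field value.
in_nanjing = [item['name'] for item in datatable if item['city'] == 'nanjing']
print(in_nanjing)   # ['zhangsan']
```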


## csv.DictReader

The csv module implements classes to read and write tabular data in CSV format.

https://docs.python.org/3.7/library/csv.html

In [25]:
%%file ./data/personrecords.csv
name,age
zhangsan,28
lishi,18 

Overwriting ./data/personrecords.csv


In [26]:
import csv
filename="./data/personrecords.csv"
csvfile = open(filename, 'r', newline='')  # the csv docs recommend newline='' for csv files
reader = csv.DictReader(csvfile)
for line in reader:
    name = line['name']
    age=line['age']
    print(name,age)
csvfile.close()

zhangsan 28
lishi 18 
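
The csv module also provides `csv.DictWriter` for the opposite direction, writing a list of dicts as a CSV file. A minimal round-trip sketch (the temporary file and the sample records are assumptions for the example):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'personrecords_demo.csv')  # hypothetical file

fields = ['name', 'age']
records = [{'name': 'zhangsan', 'age': 28}, {'name': 'lishi', 'age': 18}]

# newline='' lets the csv module control line endings itself
with open(path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fields)
    writer.writeheader()        # writes the header line: name,age
    writer.writerows(records)

with open(path, 'r', newline='') as csvfile:
    readback = [dict(row) for row in csv.DictReader(csvfile)]

print(readback)   # values come back as strings: '28', not 28
```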


### Our own DictReader

In [27]:
def ourDictReader(file):
    records=[]
    # header line: drop the trailing '\n', then split into field names
    fields=file.readline().rstrip('\n').split(',')
    print(fields)

    for line in file:
        # drop the trailing '\n' so it does not end up in the last value
        currowrecord=line.rstrip('\n').split(',')
        rowrecord={}
        for i in range(len(fields)):
            rowrecord[fields[i]]=currowrecord[i]
        records.append(rowrecord)
    return records

filerecords=open('./data/personrecords.csv','r')
reader=ourDictReader(filerecords)
filerecords.close()
for line in reader:
    print(line)
    print(line['name'],line['age'])


['name', 'age']
{'name': 'zhangsan', 'age': '28'}
zhangsan 28
{'name': 'lishi', 'age': '18 '}
lishi 18 
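
Splitting on `','` is enough for the simple file above, but it is not a full CSV parser. A hedged sketch of one difference: a quoted field that itself contains a comma is handled by `csv.DictReader` but broken apart by a plain `split(',')`. The sample data and `io.StringIO` are assumptions for the example.

```python
import csv
import io

data = 'name,city\n"Zhang, San",nanjing\n'

# csv.DictReader understands the quotes around "Zhang, San"
csv_rows = [dict(row) for row in csv.DictReader(io.StringIO(data))]
print(csv_rows)      # [{'name': 'Zhang, San', 'city': 'nanjing'}]

# A plain split treats the comma inside the quotes as a separator
second_line = data.splitlines()[1]
naive_values = second_line.split(',')
print(naive_values)  # ['"Zhang', ' San"', 'nanjing']
```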

