## Lab 5. Work with files

In this tutorial, you'll learn file handling in Python: opening a file, reading from it, writing to it, closing it, renaming and deleting files, and other file methods.

We use files to store data, either temporarily or permanently. A file is a collection of data stored on a disk as one unit and identified by a filename.

### Types of File

   1. **Text file**: usually used to store character data, for example test.txt

   2. **Binary file**: used to store binary data such as images, video files, and audio files


### File Path

A file path defines the location of a file or folder in the computer system. There are two ways to specify a file path.

   1. **Absolute path**: which always begins with the root folder
   
   2. **Relative path**: which is relative to the program's current working directory

The absolute path includes the complete directory list required to locate the file.

For example, D:/user/Pynative/data/sales.txt is an absolute path to sales.txt. All of the information needed to find the file is contained in the path string.

The part of a filename after the last period (.) is called the file's **extension**, and it indicates the type of file. For example, project.pdf is a PDF document.
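
The path terms above can be checked with the standard pathlib module. This is only a minimal sketch: PureWindowsPath is used so that the D:/ example is interpreted the same way on any operating system, and the relative path data/sales.txt is a hypothetical example that does not come from the text.

```python
from pathlib import PureWindowsPath

# Pure paths only manipulate path strings; they never touch the disk.
abs_path = PureWindowsPath('D:/user/Pynative/data/sales.txt')
rel_path = PureWindowsPath('data/sales.txt')  # hypothetical relative path

print(abs_path.is_absolute())  # True
print(rel_path.is_absolute())  # False
print(abs_path.name)           # sales.txt
print(abs_path.suffix)         # .txt

pdf_suffix = PureWindowsPath('project.pdf').suffix
print(pdf_suffix)              # .pdf
```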

### Read File

To read or write a file, we need to open that file. For this purpose, Python provides a built-in function open().

Pass the file path and the access mode to open(file_path, access_mode). It returns a file object, which is used to read or write the file according to the access mode.

The access mode states the purpose of opening the file. For example, 'r' is for reading and 'w' is for writing.


In [None]:
# Opening the file with relative path for writing
fp = open(r'sample.txt', 'w')
for i in range(5):
    fp.write(f'Line {i} \n')

fp.close()


In [None]:
# Opening the file with relative path for reading
fp = open(r'sample.txt', 'r')
print(fp.read())

fp.close()

Line 0 
Line 1 
Line 2 
Line 3 
Line 4 



The following table shows different access modes we can use while opening a file in Python.
<img src="table_61.png">
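
As a rough illustration of how a few common modes behave, the sketch below uses 'w', 'a', 'r', and 'x'. It does not reproduce the table, and the temporary directory and the file name modes_demo.txt are assumptions made only so that the example does not touch existing files.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

with open(path, 'w') as fp:   # 'w' creates the file or truncates it
    fp.write('first\n')

with open(path, 'a') as fp:   # 'a' appends to the end of the file
    fp.write('second\n')

with open(path, 'r') as fp:   # 'r' opens the file for reading only
    content = fp.read()

print(content)

exclusive_failed = False
try:
    open(path, 'x')           # 'x' refuses to overwrite an existing file
except FileExistsError:
    exclusive_failed = True

print(exclusive_failed)
```

The with statement closes the file automatically when the block ends, so no explicit close() call is needed.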

### Move File Pointer

The seek() method moves the file pointer (cursor) to the specified position. The pointer determines where the next read or write happens in the file.

The position (index) of the first character in files is zero, just like the string index.

In [None]:
f = open("sample.txt", "r")
# move to offset 11 (offsets start at 0)
f.seek(11)
# read from offset 11 to the end of the file
print(f.read())
f.close()

ne 1 
Line 2 
Line 3 
Line 4 



The tell() method returns the current position of the file pointer, measured from the beginning of the file.

In [None]:
f = open("sample.txt", "r")
# read first line
f.readline()
# get current position of file handle
print(f.tell())

f.close()

# Output 9

9
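
The outputs shown above (9 from tell(), and text starting at "ne 1" after seek(11)) are consistent with a platform that stores each line ending as two bytes (\r\n), as Windows does. With single-byte \n endings the same code would likely give 8 and start at "e 1". The sketch below avoids that ambiguity by working in binary mode, where offsets are plain byte counts. The temporary file name is a hypothetical choice.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')  # hypothetical name

# Write the same five lines in binary mode so every line is exactly 8 bytes.
with open(path, 'wb') as f:
    for i in range(5):
        f.write(f'Line {i} \n'.encode('ascii'))

with open(path, 'rb') as f:
    f.readline()                 # consume b'Line 0 \n'
    pos_after_line = f.tell()    # 8
    f.seek(11)                   # byte offset 11 is the 'e' of 'Line 1'
    chunk = f.read(3)            # b'e 1'

print(pos_after_line, chunk)
```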


### Copy Files

There are several ways to copy files in Python. The shutil.copy() function copies the source file's content to the destination file.

Example

In [None]:
import shutil

src_path = r"sample.txt"
dst_path = r"copied.txt"
shutil.copy(src_path, dst_path)
print('Copied')

Copied


In [None]:
with open("copied.txt", 'r') as f:
    print(f.readlines())


['Line 0 \n', 'Line 1 \n', 'Line 2 \n', 'Line 3 \n', 'Line 4 \n']


### Rename Files

In Python, the os module provides the functions for file processing operations such as renaming, deleting the file, etc. The os module enables interaction with the operating system.

The os module provides the rename() function, which renames a file from the old name to the new name, as shown below.

In [None]:
import os

# Relative paths, resolved against the current working directory
old_name = "copied.txt"
new_name = "copied_new.txt"

# Renaming the file
os.rename(old_name, new_name)

### Delete Files

In Python, the os module provides the remove() function to delete the file at a given path.

In [None]:
import os

# remove the file using a relative path
os.remove(r"copied_new.txt")
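
os.remove() raises FileNotFoundError when the file does not exist, for example when the cell is run twice. One possible way to handle that is sketched below. The helper remove_if_exists and the temporary file name are hypothetical and are not part of the os module.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'to_delete.txt')  # hypothetical name
with open(path, 'w') as fp:
    fp.write('temporary data\n')

def remove_if_exists(file_path):
    """Delete file_path and report whether a file was actually removed."""
    try:
        os.remove(file_path)
        return True
    except FileNotFoundError:
        return False

first = remove_if_exists(path)    # True: the file existed
second = remove_if_exists(path)   # False: it is already gone
print(first, second)
```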

### Working With Bytes

A byte consists of 8 bits, and each bit is either 0 or 1. A byte can be written in different notations, such as binary, octal, or hexadecimal. Files are stored on disk as sequences of bytes.

When we open a file in text mode, its contents are decoded from bytes into a string object. When we open a file in binary mode, its contents are returned as a bytes object without decoding.

Now let's see the steps to write bytes to a file in Python.

   1. Open the file in binary write mode using wb.

   2. Specify the contents to write in the form of bytes.

   3. Use the write() function to write the byte contents to a binary file.


In [None]:
bytes_data = b'\x21'

with open("test.txt", "wb") as fp:
    # Write bytes to file
    fp.write(bytes_data)
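
To see the difference between the two modes, the same byte can be read back in binary mode and in text mode. This is a small sketch that writes to a temporary file with a hypothetical name rather than to test.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'bytes_demo.txt')  # hypothetical name

with open(path, 'wb') as fp:
    fp.write(b'\x21')            # 0x21 is the ASCII code of '!'

with open(path, 'rb') as fp:
    raw = fp.read()              # bytes object, no decoding

with open(path, 'r', encoding='ascii') as fp:
    text = fp.read()             # str, decoded from the same byte

print(raw, text)                 # b'!' !
```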

**Task 1: Create a text file. Count the number of lines in the file**

In [None]:
my_file = open('laba_5.txt', 'w')
for i in range(5):
    my_file.write(f'Line {i} \n')
my_file.close()


In [None]:
my_file = open('laba_5.txt', 'r')
print(my_file.read())

my_file.close()

Line 0 
Line 1 
Line 2 
Line 3 
Line 4 
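
The cells above create and print the file but do not count its lines. One possible way to do the counting is sketched below. It writes the same five lines to a temporary file (the name count_demo.txt is an assumption) so that it can run on its own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'count_demo.txt')  # hypothetical name
with open(path, 'w') as f:
    for i in range(5):
        f.write(f'Line {i} \n')

# Iterating over a file object yields one line at a time,
# so the whole file never has to be held in memory.
with open(path, 'r') as f:
    line_count = sum(1 for _ in f)

print(line_count)  # 5
```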



**Task 2: Search for a specific string in a text file**

Use the file read() method and the str.find() method to search for a string in a text file. find() returns the index of the first match.

In [None]:
my_file = open('laba_5.txt', 'r')
print(my_file.read().find(' 3'))
# print(my_file.read(28))
my_file.close()


28
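
The result 28 is a character offset into the whole file. If a line number is more useful, the search can be done line by line instead. The helper find_lines below is a hypothetical sketch, and the temporary file only mirrors the contents of laba_5.txt.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'search_demo.txt')  # hypothetical name
with open(path, 'w') as f:
    for i in range(5):
        f.write(f'Line {i} \n')

def find_lines(file_path, needle):
    """Return (line_number, column) for each line containing needle."""
    hits = []
    with open(file_path, 'r') as f:
        for line_number, line in enumerate(f, start=1):
            column = line.find(needle)   # -1 means "not found"
            if column != -1:
                hits.append((line_number, column))
    return hits

hits = find_lines(path, ' 3')
missing = find_lines(path, 'Tomsk')
print(hits, missing)   # [(4, 4)] []
```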


**Task 3: Count Number of Files in a Directory**

Use the os.listdir() and os.path.isfile() functions of the os module to count the number of files in a directory.

In [None]:
import os
file_list = os.listdir('/content/sample_data')
file_list

['anscombe.json',
 'README.md',
 'california_housing_train.csv',
 'california_housing_test.csv',
 'mnist_train_small.csv',
 'mnist_test.csv']

In [None]:
len(file_list)

6

In [None]:
n = 0
path = '/content/sample_data'
# count only regular files, skipping subdirectories
for name in file_list:
    if os.path.isfile(os.path.join(path, name)):
        n += 1
n

6
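
In the sample directory both counts are 6 because it contains no subfolders. The sketch below builds a small temporary directory (all names in it are made up for illustration) to show that len(os.listdir()) also counts folders, while the os.path.isfile() check does not.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.csv'):                 # hypothetical file names
    with open(os.path.join(demo_dir, name), 'w') as f:
        f.write('demo\n')
os.mkdir(os.path.join(demo_dir, 'subfolder'))   # a directory, not a file

file_count = sum(
    1 for name in os.listdir(demo_dir)
    if os.path.isfile(os.path.join(demo_dir, name))
)
entry_count = len(os.listdir(demo_dir))

print(file_count, entry_count)  # 2 3
```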

**Task 4: List Files in Directory with Extension txt** 
    
1. Import glob module

    The glob module, part of the Python Standard Library, is used to find the files and folders whose names follow a specific pattern. The searching rules are similar to the Unix Shell path expansion rules.
    
    
2. Construct a pattern to search for the files having the specific extension

    For example, directory_path/*.txt to list all text files present in a given directory path. Here the * means file name can be anything, but it must have a txt extension.
    
    
3. Use glob() method

    The glob.glob(pathname) function returns a list of files that match the path and pattern specified in the pathname argument. In this case, it will return all text files.


In [None]:
import glob
file_list = glob.glob('/content/sample_data/*.json')
file_list

['/content/sample_data/anscombe.json']

In [None]:
file_list = glob.glob('/content/sample_data/*.csv')
file_list

['/content/sample_data/california_housing_train.csv',
 '/content/sample_data/california_housing_test.csv',
 '/content/sample_data/mnist_train_small.csv',
 '/content/sample_data/mnist_test.csv']

In [None]:
file_list = glob.glob('/content/sample_data/*.md')
file_list

['/content/sample_data/README.md']

**Task 5: List all files in the directory**

There are multiple ways to list the files in a directory. The following four functions can be used:
1. os.listdir('dir_path'): Returns the list of files and directories present in the specified directory path.
2. os.walk('dir_path'): Recursively lists all files in the directory and its subdirectories.
3. os.scandir('path'): Returns directory entries along with file attribute information.
4. glob.glob('pattern'): Returns the files and folders whose names follow a specific pattern.

In [None]:
file_list = os.listdir('/content/sample_data')
file_list

['anscombe.json',
 'README.md',
 'california_housing_train.csv',
 'california_housing_test.csv',
 'mnist_train_small.csv',
 'mnist_test.csv']

In [None]:
for i in os.walk('/content/sample_data'):
  print(i,'\n')

('/content/sample_data', [], ['anscombe.json', 'README.md', 'california_housing_train.csv', 'california_housing_test.csv', 'mnist_train_small.csv', 'mnist_test.csv']) 



In [None]:
for i in os.scandir('/content/sample_data'):
    print(i)

<DirEntry 'anscombe.json'>
<DirEntry 'README.md'>
<DirEntry 'california_housing_train.csv'>
<DirEntry 'california_housing_test.csv'>
<DirEntry 'mnist_train_small.csv'>
<DirEntry 'mnist_test.csv'>


In [None]:
file_list = glob.glob('/content/sample_data/*')
file_list

['/content/sample_data/anscombe.json',
 '/content/sample_data/README.md',
 '/content/sample_data/california_housing_train.csv',
 '/content/sample_data/california_housing_test.csv',
 '/content/sample_data/mnist_train_small.csv',
 '/content/sample_data/mnist_test.csv']
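
Because /content/sample_data has no subdirectories, the recursive behaviour of os.walk() is not visible above. The following sketch builds a small temporary tree (the folder and file names are assumptions) and collects every file path relative to the root.

```python
import os
import tempfile

root_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(root_dir, 'nested'))      # hypothetical subfolder
for rel in ('top.txt', os.path.join('nested', 'inner.txt')):
    with open(os.path.join(root_dir, rel), 'w') as f:
        f.write('demo\n')

all_files = []
# os.walk yields (directory path, subdirectory names, file names) per folder.
for dir_path, dir_names, file_names in os.walk(root_dir):
    for file_name in file_names:
        full_path = os.path.join(dir_path, file_name)
        all_files.append(os.path.relpath(full_path, root_dir))

all_files.sort()
print(all_files)
```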

**For each of tasks 6 to 8, use 2 different file formats, for example: .csv, .xml, .html, .json, .yaml**

**Task 6: Create a list and write it to a file**

list => .csv

In [None]:
list1 = [1 , 2 , ' Tomsk ', 10 , ' TPU ', 'Happy New Year']
my_file = open('laba_5.csv', 'w')
for i in range(len(list1)):
    my_file.write(str(list1[i]))
my_file.close()

In [None]:
my_file = open('laba_5.csv', 'r')
print(my_file.read())
my_file.close()

12 Tomsk 10 TPU Happy New Year
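
The output above has no delimiters, so the individual items cannot be recovered from the file. A possible alternative is the standard csv module, which inserts the commas and can read the row back. This is a sketch only, and it writes to a temporary file with a hypothetical name rather than to laba_5.csv.

```python
import csv
import os
import tempfile

list1 = [1, 2, ' Tomsk ', 10, ' TPU ', 'Happy New Year']
path = os.path.join(tempfile.mkdtemp(), 'list_demo.csv')  # hypothetical name

# newline='' lets the csv module control line endings itself.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerow(list1)

with open(path, 'r', newline='') as f:
    row = next(csv.reader(f))

# csv stores everything as text, so the numbers come back as strings.
print(row)  # ['1', '2', ' Tomsk ', '10', ' TPU ', 'Happy New Year']
```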


list =>.xlsx

In [None]:
import xlsxwriter

list1 = [ 1 , 2 , ' Tomsk ', 10 , ' TPU ', 'Happy New Year']

# open a new file for writing
workbook = xlsxwriter.Workbook('laba_5.xlsx')
# create a worksheet in it
worksheet = workbook.add_worksheet()
# write list item i into every column j of row i
for i in range(len(list1)):
  for j in range(len(list1)):
    worksheet.write(i,j, list1[i])

# save and close
workbook.close()

In [None]:
import pandas as pd

df = pd.read_excel('/content/laba_5.xlsx')
df

Unnamed: 0,1,1.1,1.2,1.3,1.4,1.5
0,2,2,2,2,2,2
1,Tomsk,Tomsk,Tomsk,Tomsk,Tomsk,Tomsk
2,10,10,10,10,10,10
3,TPU,TPU,TPU,TPU,TPU,TPU
4,Happy New Year,Happy New Year,Happy New Year,Happy New Year,Happy New Year,Happy New Year


**Task 7: Create a dictionary and write it to a file**

dict => .json

In [None]:
import json
dict1 = {'City': 'Tomsk', 'Univercity':' TPU', 'Name':'Nikita'}
dict1_json = json.dumps(dict1)
my_file = open('laba_5.json', 'w')
my_file.write(dict1_json)
my_file.close()

In [None]:
my_file = open('laba_5.json', 'r')
print(my_file.read())

{"City": "Tomsk", "Univercity": " TPU", "Name": "Nikita"}
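
json.dump() and json.load() work on file objects directly, which avoids the intermediate string. The sketch below round-trips the same dictionary through a temporary file (the file name is an assumption) and checks that nothing was lost.

```python
import json
import os
import tempfile

dict1 = {'City': 'Tomsk', 'Univercity': ' TPU', 'Name': 'Nikita'}
path = os.path.join(tempfile.mkdtemp(), 'dict_demo.json')  # hypothetical name

with open(path, 'w') as f:
    json.dump(dict1, f, indent=2)   # indent only affects readability

with open(path, 'r') as f:
    loaded = json.load(f)

print(loaded == dict1)  # True
```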


dict => .xml

In [None]:
import xml.etree.ElementTree as ET

items = [
    {"first_name": "Nikita", "last_name": "Yamkin", "city": "Tomsk"},
    {"first_name": "Sergey", "last_name": "Sidorov", "city": "Sochi"},
]

root = ET.Element('root')

for i, item in enumerate(items, 1):
    person = ET.SubElement(root, 'person' + str(i))
    ET.SubElement(person, 'first_name').text = item['first_name']
    ET.SubElement(person, 'last_name').text = item['last_name']
    ET.SubElement(person, 'city').text = item['city']

tree = ET.ElementTree(root)
tree.write('laba_5.xml')

In [None]:
datasource = open('/content/laba_5.xml')
doc = ET.parse(datasource)
root = doc.getroot()
for elem in root:
  for subelem in elem:
    print(subelem.text)

Nikita
Yamkin
Tomsk
Sergey
Sidorov
Sochi


**Task 8: Create a set and write it to a file**

set => .yaml

In [None]:
import yaml

set1 = {'Tomsk', 'TPU', 'Nikita', 'Yamkin'}

my_file = open('laba_5.yaml', 'w')

doc = yaml.dump(set1, my_file)

my_file.close()

In [None]:
my_file = open('laba_5.yaml', 'r')

file_set = yaml.load(my_file, Loader = yaml.FullLoader)

print(file_set)

{'TPU', 'Tomsk', 'Nikita', 'Yamkin'}


set => .txt

In [None]:
my_file = open('laba_5.txt', 'w')
for i in set1:
  my_file.write(i)
my_file.close()

In [None]:
my_file = open('laba_5.txt', 'r')
print(my_file.read())

TPUTomskNikitaYamkin
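
Written this way, the items run together and their order depends on the set's iteration order, which can differ between runs. One way to keep the set recoverable is to write one item per line, as in the sketch below. The temporary file name is hypothetical.

```python
import os
import tempfile

set1 = {'Tomsk', 'TPU', 'Nikita', 'Yamkin'}
path = os.path.join(tempfile.mkdtemp(), 'set_demo.txt')  # hypothetical name

# Sorting gives a stable file; the newline acts as the separator.
with open(path, 'w') as f:
    for item in sorted(set1):
        f.write(item + '\n')

with open(path, 'r') as f:
    restored = {line.rstrip('\n') for line in f}

print(restored == set1)  # True
```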
