# Importing `-view_district.csv`

The file can be downloaded from https://data.go.th/dataset/view_district

## Introduction
This notebook documents the steps taken to import the `-view_district.csv` file. The import first failed with what looked like an encoding problem, but the file turned out to be an Excel workbook saved with a `.csv` extension.

## Initial Attempt
Tried to load the file with `pd.read_csv`, which assumes UTF-8 encoding by default.

In [1]:
import pandas as pd

try:
    df = pd.read_csv('-view_district.csv')
except UnicodeDecodeError as e:
    print('Need to detect encoding')

Need to detect encoding


## Encoding Detection
Utilized the `chardet` library to detect the file's encoding.

In [2]:
import chardet

with open('-view_district.csv', 'rb') as file:
    result = chardet.detect(file.read())
    encoding = result['encoding']
    print(f'Detected encoding: {encoding}')

Detected encoding: Windows-1254
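
Windows-1254 is a Turkish code page, an unlikely match for a Thai dataset, so the detected value is best treated as a guess. Before trying more encodings it can help to check whether the file is text at all. The sketch below is a minimal heuristic and not part of the original notebook: the helper name `looks_binary` and the 1024-byte sample size in the usage comment are assumptions chosen for illustration.

```python
def looks_binary(sample: bytes) -> bool:
    """Return True if the sample contains a NUL byte.

    Hypothetical helper: text in UTF-8 or a single-byte Thai code page
    normally contains no NUL bytes, so finding one suggests a binary format.
    UTF-16 and UTF-32 text also contain NUL bytes and would be flagged too.
    """
    return b"\x00" in sample


# Usage sketch against the file from this notebook (assumes it is present):
# with open('-view_district.csv', 'rb') as file:
#     print(looks_binary(file.read(1024)))
```

A `True` result does not identify the format; it only suggests that further encoding guesses are unlikely to help.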


## Adjusting Encoding
Attempted to load the file with several encodings, including 'utf-8', 'ISO-8859-11', and 'TIS-620', without success. The cell below shows the 'utf-8' and 'ISO-8859-11' attempts.

In [3]:
try:
    df = pd.read_csv('-view_district.csv', encoding='utf-8')
except UnicodeDecodeError:
    try:
        df = pd.read_csv('-view_district.csv', encoding='ISO-8859-11')
    except Exception as e:
        print(f'Failed to load with both utf-8 and ISO-8859-11. Error: {e}')

Failed to load with both utf-8 and ISO-8859-11. Error: 'charmap' codec can't decode byte 0xfe in position 28: character maps to <undefined>
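
The nested `try` blocks above can also be written as a loop over candidate codecs. The following is a minimal sketch under stated assumptions: `first_working_encoding` is a hypothetical helper, and it works on raw bytes with the standard library rather than calling pandas. A codec that decodes without error is only a candidate, since some single-byte codecs accept almost any input.

```python
from typing import Iterable, Optional


def first_working_encoding(data: bytes, encodings: Iterable[str]) -> Optional[str]:
    """Return the first codec name that decodes `data` without error, else None."""
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes; try the next one
        return name
    return None


# Thai text encoded as TIS-620 is not valid UTF-8, so the second candidate is returned.
thai_bytes = "เชียงของ".encode("tis_620")
print(first_working_encoding(thai_bytes, ["utf-8", "tis_620"]))

# Byte 0xfe, the byte named in the error above, is rejected by both codecs.
print(first_working_encoding(b"\xfe", ["utf-8", "iso8859_11"]))
```

When every plausible codec fails, as happened here, the file may not be text at all.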


## Manual Inspection
Conducted a manual inspection of the file's bytes to infer the file format.

In [4]:
with open('-view_district.csv', 'rb') as file:
    byte_content = file.read(500)  # Read the first 500 bytes for inspection
    print(byte_content)

b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x03\x00\xfe\xff\t\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\xca\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\xfe\xff\xff\xff\x00\x00\x00\x00\xc8\x00\x00\x00\xc9\x00\x00\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xf
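
The leading bytes of a file can be compared against known file signatures. The first eight bytes printed above, `D0 CF 11 E0 A1 B1 1A E1`, match the signature of the OLE2 Compound File Binary format used by legacy Office files. The sketch below illustrates that check; the helper name `sniff_format`, the returned labels, and the short signature table are assumptions for illustration and cover only two formats.

```python
# Well-known magic numbers; the labels are illustrative, not an official naming.
SIGNATURES = {
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2 (legacy Office, e.g. .xls)",
    b"PK\x03\x04": "zip (e.g. .xlsx)",
}


def sniff_format(header: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


# The first eight bytes printed above, padded with zeros for the example.
header = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + bytes(8)
print(sniff_format(header))
```

To run it against the real file, pass the result of `file.read(8)` from a file opened in `'rb'` mode.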

## Successful Import
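The first eight bytes, `D0 CF 11 E0 A1 B1 1A E1`, are the signature of the OLE2 Compound File Binary format used by older Microsoft Office documents such as `.xls` workbooks. pandas reads `.xls` files through the `xlrd` library, so the file imports with `pd.read_excel` despite its `.csv` extension.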

In [7]:
df = pd.read_excel('-view_district.csv')

print('Success! The file is read as an Excel document.')
df.head()

Success! The file is read as an Excel document.


| Unnamed: 0 | ชื่ออำเภอ (ภาษาอังกฤษ) | ชื่ออำเภอ (ภาษาไทย) | รหัสอำเภอ |
|---|---|---|---|
| 0 | Chiang Khong | เชียงของ | 57030000 |
| 1 | Chiang Khwan | เชียงขวัญ | 45180000 |
| 2 | Chiang Klang | เชียงกลาง | 55090000 |
| 3 | Chiang Saen | เชียงแสน | 57080000 |
| 4 | Chiang Yuen | เชียงยืน | 44050000 |


## Export

In [None]:
# Exporting the DataFrame to a new CSV file with proper encoding for future use
df.to_csv('view_district_converted.csv', index=False, encoding='utf-8')
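
One way to confirm that an exported file is genuine UTF-8 text is to read it back and compare. The sketch below uses only the standard library and a temporary file holding two rows copied from the preview above, so it does not touch the real export; the file name `roundtrip_check.csv` is hypothetical. With pandas available, reading `view_district_converted.csv` back with `pd.read_csv` would serve the same purpose.

```python
import csv
import os
import tempfile

# Header and two rows copied from the preview above.
rows = [
    ["ชื่ออำเภอ (ภาษาอังกฤษ)", "ชื่ออำเภอ (ภาษาไทย)", "รหัสอำเภอ"],
    ["Chiang Khong", "เชียงของ", "57030000"],
    ["Chiang Khwan", "เชียงขวัญ", "45180000"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_check.csv")  # hypothetical file name
    # Write the rows as UTF-8 text, then read them back with the same encoding.
    with open(path, "w", newline="", encoding="utf-8") as out_file:
        csv.writer(out_file).writerows(rows)
    with open(path, newline="", encoding="utf-8") as in_file:
        read_back = list(csv.reader(in_file))

# True when the Thai text survives the round trip unchanged.
print(read_back == rows)
```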

## Conclusion
This notebook outlined the steps used to import the `-view_district.csv` file. What looked like an encoding problem was a legacy Excel workbook saved with a `.csv` extension, and inspecting the raw bytes identified the actual format after encoding detection and trial decoding had failed.