In [1]:
# !pip install "git+https://github.com/tranngocminhhieu/geo-api-vietnam.git#egg=geoapivietnam"
import pandas as pd
from geoapivietnam import Correct, GetLocation

## Correct data
The `correct_province` and `correct_district` methods quickly correct spelling errors in province and district names. They accept many misspelling variations.

In [2]:
Correct().correct_province('hcm')

'Hồ Chí Minh'

In [3]:
Correct().correct_province('Vùng Tàu')

'Bà Rịa - Vũng Tàu'

In [4]:
Correct().correct_province('Hanoi')

'Hà Nội'

In [5]:
Correct().correct_district('Hồ Chí Minh', 'Nhà Bé')

'Huyện Nhà Bè'

In [6]:
Correct().correct_district('Kien Giang', 'Phù Quốc')

'Thành phố Phú Quốc'
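
The library's matching rules are not shown in this notebook, but the general idea of tolerating missing diacritics and small typos can be sketched with the standard library. The snippet below is an illustration only, not the implementation used by `geoapivietnam`: `PROVINCES`, `ALIASES`, `normalize`, and `correct_province_sketch` are hypothetical names, and the province list is deliberately tiny.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical, deliberately tiny reference data for illustration only.
PROVINCES = ["Hà Nội", "Hồ Chí Minh", "Bà Rịa - Vũng Tàu", "Kiên Giang", "Bắc Giang"]
ALIASES = {"hcm": "Hồ Chí Minh"}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, and drop everything that is not alphanumeric."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # 'đ' has no Unicode decomposition, so it is mapped by hand.
    no_marks = no_marks.replace("đ", "d").replace("Đ", "D")
    return "".join(ch for ch in no_marks.lower() if ch.isalnum())


def correct_province_sketch(raw: str) -> Optional[str]:
    """Return the best-matching canonical province name, or None."""
    key = normalize(raw)
    if key in ALIASES:
        return ALIASES[key]
    by_key = {normalize(name): name for name in PROVINCES}
    if key in by_key:
        return by_key[key]
    # Fall back to fuzzy matching on the normalized keys.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=0.6)
    return by_key[close[0]] if close else None


print(correct_province_sketch("Hanoi"))
print(correct_province_sketch("Vùng Tàu"))
```

Normalizing both the input and the reference names means that most accent mistakes become exact matches, and fuzzy matching is only needed for genuine typos or partial names.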

## Get location

This is the main feature of the module. Provide a `search_term` and you will get a location object with the following properties:

- `address`: The correctly formatted address.
- `province`: The province name, correctly spelled.
- `district`: The district name, correctly spelled.
- `ward`: The ward name, correctly spelled.
- `latitude`: Latitude of the address.
- `longitude`: Longitude of the address.
- `source`: Where `original_address` came from: Excel, SQLite, GeoPy, or the Google Maps API.
- `original_address`: The address as returned by that source.
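
As a mental model, the result can be thought of as a small record type. The dataclass below is a hypothetical stand-in, not the library's actual class. Its field types are inferred from the example outputs in this notebook, where `ward`, `latitude`, and `longitude` are sometimes `None`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LocationSketch:
    """Hypothetical record mirroring the properties listed above."""
    address: str
    province: str
    district: Optional[str] = None
    ward: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    source: Optional[str] = None
    original_address: Optional[str] = None


# Values taken from an Excel-sourced result shown later in this notebook.
loc = LocationSketch(
    address="Thành phố Thủ Đức, Hồ Chí Minh",
    province="Hồ Chí Minh",
    district="Thành phố Thủ Đức",
    source="Excel",
    original_address="Thành phố Thủ Đức, Hồ Chí Minh",
)
print(asdict(loc))
```

Code that consumes a location should therefore be prepared for missing coordinates or a missing ward.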

### Quick start

To get started quickly, create a `GetLocation()` instance and assign it to a variable.

In [7]:
get_location = GetLocation()

Use the `get_location()` method to search. You can enter any **address** or **coordinates** in Vietnam.

In [8]:
my_location = get_location.get_location(search_term='21.0088528, 105.7439194')

You will get a location object like this.

In [9]:
my_location

address: Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội, province: Hà Nội, district: Quận Nam Từ Liêm, ward: Phường Tây Mỗ, latitude: 21.0090732, longitude: 105.7423219, source: GeoPy, original_address: Vinhomes Smart City, Phường Tây Mỗ, Quận Nam Từ Liêm, Thành phố Hà Nội, Việt Nam
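
Because `search_term` can be either an address or a coordinate string, it can help to validate coordinate strings before sending them to a geocoder. The helper below is a minimal sketch under the assumption that coordinates are written as `'latitude, longitude'` in decimal degrees, as in the example above. `parse_coordinates` is a hypothetical name and is not part of `geoapivietnam`.

```python
import re
from typing import Optional, Tuple

# Matches "lat, lon" in decimal degrees, with optional sign and whitespace.
_COORD_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_coordinates(search_term: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if the term looks like 'lat, lon', else None."""
    match = _COORD_RE.match(search_term)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid ranges for degrees.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


print(parse_coordinates("21.0088528, 105.7439194"))
print(parse_coordinates("phu quoc kien giang"))
```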

You can retrieve each part by accessing the corresponding attribute of the location.

In [10]:
my_location.address

'Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội'

In [11]:
my_location.province

'Hà Nội'

In [12]:
my_location.district

'Quận Nam Từ Liêm'

In [13]:
my_location.ward

'Phường Tây Mỗ'

In [14]:
my_location.latitude

21.0090732

In [15]:
my_location.longitude

105.7423219

In [16]:
my_location.source

'GeoPy'

Use the `json_data` attribute if you need the data as a dict.

In [17]:
my_location.json_data

{'address': 'Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội',
 'province': 'Hà Nội',
 'district': 'Quận Nam Từ Liêm',
 'ward': 'Phường Tây Mỗ',
 'latitude': 21.0090732,
 'longitude': 105.7423219,
 'source': 'GeoPy',
 'original_address': 'Vinhomes Smart City, Phường Tây Mỗ, Quận Nam Từ Liêm, Thành phố Hà Nội, Việt Nam'}
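
Since `json_data` is a plain dict, it can be serialized with the standard `json` module. The sketch below copies the dict shown above into a local variable rather than calling the library. Passing `ensure_ascii=False` keeps the Vietnamese characters readable instead of escaping them.

```python
import json

# Copied from the `json_data` output above.
json_data = {
    "address": "Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội",
    "province": "Hà Nội",
    "district": "Quận Nam Từ Liêm",
    "ward": "Phường Tây Mỗ",
    "latitude": 21.0090732,
    "longitude": 105.7423219,
    "source": "GeoPy",
    "original_address": "Vinhomes Smart City, Phường Tây Mỗ, Quận Nam Từ Liêm, Thành phố Hà Nội, Việt Nam",
}

# ensure_ascii=False writes 'Hà Nội' as-is rather than as \uXXXX escapes.
text = json.dumps(json_data, ensure_ascii=False, indent=2)
restored = json.loads(text)
print(text)
```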

### Advanced usage

Pass the following parameters to `GetLocation` to make lookups more efficient.

- `database`: Path to the SQLite database file; the default is `'../data/data.db'`. It stores the results of queries that were resolved with **GeoPy** or the **Google Maps API** for the first time (this applies to `get_location()` only). For later queries, the module looks up this history first to improve performance.
- `force_data_excel`: Use this Excel file to force a specific location for a given query. Just pass a file name in this parameter; the module will create an Excel file with a ready-made template for you to fill in.
- `google_maps_api_key`: If the location cannot be found in SQLite or GeoPy, the module falls back to the Google Maps API to continue the search.
- `print_result`: A few of the small methods print information when used; set this to `False` to turn that off.
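
The caching behaviour described for `database` can be illustrated with a small read-through cache. This is a sketch under stated assumptions, not the library's schema or code: the `cache` table, `cached_lookup`, and `fake_geocoder` are hypothetical, and an in-memory database stands in for `data.db`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a file such as data.db
conn.execute(
    "CREATE TABLE IF NOT EXISTS cache (search_term TEXT PRIMARY KEY, result TEXT)"
)


def cached_lookup(conn, search_term, fetch):
    """Return the stored result for search_term, calling fetch only on a miss."""
    row = conn.execute(
        "SELECT result FROM cache WHERE search_term = ?", (search_term,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    result = fetch(search_term)
    if result is not None:
        # Store the result so that later queries skip the remote call.
        conn.execute(
            "INSERT OR REPLACE INTO cache (search_term, result) VALUES (?, ?)",
            (search_term, json.dumps(result, ensure_ascii=False)),
        )
        conn.commit()
    return result


calls = []


def fake_geocoder(term):
    """Hypothetical stand-in for a remote geocoder; records each call."""
    calls.append(term)
    return {"province": "Hà Nội", "source": "GeoPy"}


first = cached_lookup(conn, "21.0088528, 105.7439194", fake_geocoder)
second = cached_lookup(conn, "21.0088528, 105.7439194", fake_geocoder)
print(len(calls))
```

The second call is served from the database, so the stand-in geocoder runs only once.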

Locations are searched in the following order:
1. Excel
2. SQLite
3. GeoPy
4. Google Maps
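
The search order above amounts to a fallback chain: try each source in turn and stop at the first hit. The sketch below shows that pattern with stub sources. `first_match` and all of the stubs are hypothetical and only mimic the ordering, not the library's internals.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], Optional[Dict[str, str]]]


def first_match(
    search_term: str, sources: List[Tuple[str, Lookup]]
) -> Optional[Dict[str, str]]:
    """Query each source in order and return the first non-empty result."""
    for name, lookup in sources:
        result = lookup(search_term)
        if result is not None:
            return {**result, "source": name}
    return None


# Stub sources for illustration only.
excel_rows = {"10.883449, 106.781429": {"province": "Hồ Chí Minh"}}
sqlite_rows: Dict[str, Dict[str, str]] = {}


def geopy_stub(term: str) -> Optional[Dict[str, str]]:
    return {"province": "Hà Nội"} if term.startswith("21.") else None


def google_stub(term: str) -> Optional[Dict[str, str]]:
    return None


sources = [
    ("Excel", excel_rows.get),
    ("SQLite", sqlite_rows.get),
    ("GeoPy", geopy_stub),
    ("Google", google_stub),
]

print(first_match("10.883449, 106.781429", sources))
print(first_match("21.0088528, 105.7439194", sources))
```

Putting the cheapest and most authoritative sources first means the paid or rate-limited services are only reached when everything else misses.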


In [18]:
with open('../data/google_maps_api_key.txt', 'r') as f:
    google_maps_api_key = f.read().strip()  # drop any trailing newline from the key file

In [19]:
get_location = GetLocation(database='../data/data.db', force_data_excel='../data/force_geo_location.xlsx', google_maps_api_key=google_maps_api_key)

In [20]:
my_location = get_location.get_location('10.883449, 106.781429')

In [21]:
my_location

address: Thành phố Thủ Đức, Hồ Chí Minh, province: Hồ Chí Minh, district: Thành phố Thủ Đức, ward: None, latitude: None, longitude: None, source: Excel, original_address: Thành phố Thủ Đức, Hồ Chí Minh

### Some small methods

The `GetLocation()` class also has some smaller methods, each of which queries a single source, that you may need.

In [22]:
get_location.excel_get_location('10.883449, 106.781429')

address: Thành phố Thủ Đức, Hồ Chí Minh, province: Hồ Chí Minh, district: Thành phố Thủ Đức, ward: None, latitude: None, longitude: None, source: Excel, original_address: Thành phố Thủ Đức, Hồ Chí Minh

In [23]:
get_location.sqlite_get_location('21.0088528, 105.7439194')

address: Phường Tây Mỗ, Quận Nam Từ Liêm, Hà Nội, province: Hà Nội, district: Quận Nam Từ Liêm, ward: Phường Tây Mỗ, latitude: 21.0090732, longitude: 105.7423219, source: GeoPy (SQLite), original_address: Vinhomes Smart City, Phường Tây Mỗ, Quận Nam Từ Liêm, Thành phố Hà Nội, Việt Nam

In [24]:
get_location.geopy_get_location('phu quoc kien giang')

address: Thành phố Phú Quốc, Kiên Giang, province: Kiên Giang, district: Thành phố Phú Quốc, ward: None, latitude: 10.2153093, longitude: 103.9880443, source: GeoPy, original_address: Thành phố Phú Quốc, Tỉnh Kiên Giang, Việt Nam

In [25]:
get_location.google_get_location('Lai Châu, Lai Châu')

address: Thành phố Lai Châu, Lai Châu, province: Lai Châu, district: Thành phố Lai Châu, ward: None, latitude: 22.3862227, longitude: 103.4702631, source: Google, original_address: Lai Châu, Lai Chau, Vietnam

## Get valid district

This feature finds the district from two inputs: the province and a search term containing district information.

In [26]:
district = get_location.get_valid_district(province='Hà Nội', search_term='21.0088528, 105.7439194')

Quận Nam Từ Liêm district match with Hà Nội province perfect! (GeoPy (SQLite))


In [27]:
district

'Quận Nam Từ Liêm'

### Work with a DataFrame

For example, suppose we have the following DataFrame.

In [28]:
df = pd.DataFrame({'province':['Bắc Giảng', 'Kien Giang', 'Thua Thien - Hue', 'Kiên Giang'],
                   'geo_district':['21.2586332292748, 105.976210111432',
                                   '10.1487906083935, 103.998819739228',
                                   '16.3668339187152, 107.704997146754',
                                   '10.2262425, 103.9725849']})

In [29]:
df

| | province | geo_district |
|---|---|---|
| 0 | Bắc Giảng | 21.2586332292748, 105.976210111432 |
| 1 | Kien Giang | 10.1487906083935, 103.998819739228 |
| 2 | Thua Thien - Hue | 16.3668339187152, 107.704997146754 |
| 3 | Kiên Giang | 10.2262425, 103.9725849 |


As a first step, clean the `province` column.

In [30]:
df.province = df.province.apply(Correct().correct_province)

In [31]:
df

| | province | geo_district |
|---|---|---|
| 0 | Bắc Giang | 21.2586332292748, 105.976210111432 |
| 1 | Kiên Giang | 10.1487906083935, 103.998819739228 |
| 2 | Thừa Thiên Huế | 16.3668339187152, 107.704997146754 |
| 3 | Kiên Giang | 10.2262425, 103.9725849 |


Then use `apply` with a `lambda` to create the `district` column.

In [32]:
df['district'] = df.apply(lambda x: get_location.get_valid_district(province=x.province, search_term=x.geo_district), axis=1)

Thành phố Bắc Giang district match with Bắc Giang province perfect! (GeoPy)
Thành phố Phú Quốc district match with Kiên Giang province perfect! (Google)
Thành phố Huế district match with Thừa Thiên Huế province perfect! (GeoPy)
Thành phố Phú Quốc district match with Kiên Giang province perfect! (Google)


In [33]:
df

| | province | geo_district | district |
|---|---|---|---|
| 0 | Bắc Giang | 21.2586332292748, 105.976210111432 | Thành phố Bắc Giang |
| 1 | Kiên Giang | 10.1487906083935, 103.998819739228 | Thành phố Phú Quốc |
| 2 | Thừa Thiên Huế | 16.3668339187152, 107.704997146754 | Thành phố Huế |
| 3 | Kiên Giang | 10.2262425, 103.9725849 | Thành phố Phú Quốc |
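
Row-wise lookups may hit remote services, so a single failing row can abort the whole `apply`. One possible safeguard is to wrap the lookup so that an exception yields a default value instead. The sketch below uses plain Python and a stub in place of `get_valid_district`; `safe_call` and `flaky_lookup` are hypothetical names.

```python
import functools


def safe_call(func, default=None):
    """Wrap func so that any exception is reported and replaced by default."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            print(f"lookup failed for {args} {kwargs}: {exc}")
            return default
    return wrapper


def flaky_lookup(province, search_term):
    """Stub standing in for a district lookup that can fail."""
    if not search_term:
        raise ValueError("empty search term")
    return f"district of {province}"


safe_lookup = safe_call(flaky_lookup)
rows = [("Bắc Giang", "21.25, 105.97"), ("Kiên Giang", "")]
districts = [safe_lookup(province=p, search_term=t) for p, t in rows]
print(districts)
```

With a DataFrame, the wrapped function can be used inside the same `df.apply(lambda x: ..., axis=1)` call, leaving `None` in rows that failed so they can be retried later.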
