Heri-Graphs: A Dataset Creation Framework for Multi-modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media
This is the Code and Dataset for the Paper 'Heri-Graphs: A Dataset Creation Framework for Multi-modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media' published in ISPRS International Journal of Geo-Information showing the collection, preprocessing, and rearrangement of data related to Heritage values and attributes in three cities that have canal-related UNESCO World Heritage properties: Venice, Suzhou, and Amsterdam.
Bai, N., Nourian, P., Luo, R., & Pereira Roders, A. (2022). Heri-Graphs: A Dataset Creation Framework for Multi-Modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media. ISPRS International Journal of Geo-Information, 11(9), 469. MDPI AG. Retrieved from http://dx.doi.org/10.3390/ijgi11090469
@article{Bai2022HeriGraphs,
title={Heri-Graphs: A Dataset Creation Framework for Multi-Modal Machine Learning on Graphs of Heritage Values and Attributes with Social Media},
volume={11},
ISSN={2220-9964},
url={http://dx.doi.org/10.3390/ijgi11090469},
DOI={10.3390/ijgi11090469},
number={9},
journal={ISPRS International Journal of Geo-Information},
publisher={MDPI AG},
author={Bai, Nan and Nourian, Pirouz and Luo, Renqian and Pereira Roders, Ana},
year={2022},
month={Aug},
pages={469} }
or
Nan, Bai, Pirouz, Nourian, Renqian, Luo, & Ana, Pereira Roders. (2022, May 17). Heri_Graphs: arXiv supplementary material for HeriGraph (v1.1). Zenodo. https://doi.org/10.5281/zenodo.6556244
@software{nan_bai_2022,
author = {Nan, Bai,
Pirouz, Nourian,
Renqian, Luo, and
Ana, Pereira Roders},
title = {Heri\_Graphs: arXiv supplementary material for HeriGraph},
month = may,
year = 2022,
publisher = {Zenodo},
version = {v1.1},
doi = {10.5281/zenodo.6556244},
url = {https://doi.org/10.5281/zenodo.6556244}
}
Dataset (numpy) Summary (updated 19 September 2023)
The following sections about the workflow can be skipped for those who only intend to use the provided datasets.
deep_translator == 1.7.0
facenet_pytorch == 2.5.2
fastai == 2.5.3
flickrapi == 2.4.0
matplotlib == 3.5.1
networkx == 2.6.3
numpy == 1.22.2
opencv-python == 4.5.5.62
osmnx == 1.1.2
pandas == 1.4.0
pillow == 9.0.1
places365 (please download the repository places365
and put under the root as ./places365
)
scipy == 1.8.0
scikit-learn == 1.0.2
torch == 1.10.2+cu113
torchvision == 0.11.3+cu113
transformers == 4.16.2
WHOSe_Heritage (please download the repository WHOSe_Heritage
and put under the root as ./WHOSe_Heritage
)
This project provides a workflow to to construct graph-based multi-modal datasets HeriGraph concerning heritage values and attributes using data from social media platform Flickr. The workflow is illustrated as follows:
To protect the privacy and copyright of Flickr users, only the final processed (stored) datasets (thus no raw images) will be provided in this repository.
The users are invited to collect and construct datasets of the provided case study cities or any other new [city]
for their own interests.
Three cities related to UNESCO World Heritage and Historic Urban Landscape were selected as case studies: Amsterdam, the Netherlands (Seventeenth-Century Canal Ring Area of Amsterdam inside the Singelgracht), Suzhou, China (Classical Gardens of Suzhou), and Venice, Italy (Venice and its Lagoon).
The data of each case study has been put in a different folder, such as: ./Amsterdam
, ./Suzhou
, and ./Venezia
.
Without further explanation, all the codes and data introduced below will be coresponding to and stored in the respetive folder.
For constructing your own dataset
with any other [city]
, build an individual folder ./[city]
, and record the GEO-locations [city_lat]
, [city_lon]
and diameter [city_radius]
of the demanded area.
Case Study City | World Heritage Name | Latitude | Longitude | Diameter |
---|---|---|---|---|
Amsterdam (AMS) | Seventeenth-Century Canal Ring Area of Amsterdam inside the Singelgracht | 52.365000N | 4.887778E | 2 km |
Suzhou (SUZ) | Classical Gardens of Suzhou | 31.302300N | 120.631300E | 5 km |
Venice (VEN) | Venice and its Lagoon | 45.438759N | 12.327145E | 5 km |
[city] | World Heritage status of [city] | [city_lat] | [city_lon] | [city_radius] |
As the final outcome of this project, datasets for multi-modal machine learning on multi-graphs are provided for each [city]
.
The components of the datasets are respectively saved in ./dataset/[city]/
, ready to be used for multiple tasks.
The merging and saving of datasets could be found following ./Dataset_Saving.ipynb
.
File Name | Column Size | Description | Notation |
---|---|---|---|
Visual_Features.csv | 984 | Visual Features extracted | Xvis |
Textual_Features.csv | 776 | Textual Features extracted | Xtex |
Value_Labels.csv | 26 | Soft and Hard Labels for Heritage Values together with confidence scores | YHV|KHV |
Attribute_Labels.csv | 19 | Soft and Hard Labels for Heritage Attributes together with confidence scores | YHA|KHA |
Edge_List.csv | 18 | Adjacency information of Multi-graphs with three types of links (currently only available on Google Drive) | A, ATEM, ASOC, ASPA |
The complete processed dataset could be obtained through the following Google Drive Link.
(updated 19 September 2023)
A numpy array version of the final dataset (currently only available for Venice) is available under ./dataset_np/[city]/
, which is more efficient and compact than the csv version, especially for the link data as it uses the Scipy sparse matrices.
The saving process of numpy version dataset could be found following ./Dataset_Saving_np.ipynb
, ./HeriGraph_Construction_Venezia(sparse)_np.ipynb
and ./HeriGraph_Construction_Venice_Large(sparse)_np.ipynb
.
File Name | Array Size | Description | Notation |
---|---|---|---|
Visual_Features.npy | (*, 984) | Visual Features extracted | Xvis |
Textual_Features.npy | (*, 776) | Textual Features extracted | Xtex |
labels.npz | - | Values and Attributes combined, with respective keys "VAL_LAB" and "ATT_LAB" | YHV|KHV, YHA|KHA |
node_types.npy | (*, ) | A boolean variable indicating if the sample only has visual features and no textual features (0) or has both features (1) | |
train_val_test_idx.npz | - | Three arrays of indices indicating the training set, validation set, and test set for future tasks, with respective keys "train", "val", "test" | |
A_simp.npz | - | Sparse matrix marking the links of the simplified composed graph | A |
A_SOC.npz | - | Sparse matrix marking the weights of social links | ASOC |
A_SPA.npz | - | Sparse matrix marking the weights of social links | ASPA |
A_TEM.npz | - | Sparse matrix marking the weights of social links | ATEM |
To download the numpy version of Venice directly, use this Google Drive link.
To download the numpy version of Venice-XL directly, use this Google Drive Link.
For loading the numpy version of dataset in Pytorch-Geometric library, check the upcoming repository Stones_Venice in a follow-up project.
Apply for your own API key from Flickr APP Garden, and save the [api_key]
and [api_secret]
for later usage of API whenever requested.
The code to download raw data as IDs of Flickr posts and to save images are given in ./[city]/save_image.py
.
Input the respective [api_key]
, [api_secret]
,[city_lat]
, [city_lon]
, and [city_radius]
to run the code.
A restriction of maximum 5000
IDs has been given to the API to keep datasets comparable to each other.
The downloaded metadata will be saved as ./[city]/data_storage/images/photos_sizes.csv
, and the images of which the owner allowed to download with candownload==True
flag will be saved in ./[city]/data_storage/images/150/
and ./[city]/data_storage/images/320/
, respectively, for the Large Square - url_q
(150×150 px) and Small 320 - url_n
(320×240 px) versions of the original image.
To collect large datasets without the restriction of 5000
IDs, follow ./Venezia/collect_data.py
to save all the IDs and metadata, and follow ./Venezia/save_image_all.py
to download the images in the folder.
Input the respective [api_key]
, [api_secret]
, the range of minumum and maximum
[city_lat]
and [city_lon]
as bounding box of the region, the size of the grid (default 20
), and radius of inquiry in the grid (default 0.3
) to run the code.
The IDs will be collected in a 20 by 20 grid with the name of ./[city]/data_storage/photo_ids_{}_{}.csv
, while the summarized metadata will be saved in ./[city]/data_storage/photos_last.csv
.
All the saved images will be stored in the folder ./[city]/data_storage/images/grid/
with the Large Square - url_q
(150×150 px) version of the original image.
Note that Flickr API might return an error code during the data inquiry. Run the both codes interatively to continue collecting data until the total amount is satisfied.
The 512-dimensional vector of hidden visual features, 365-dimensional scene category predictions, and 102-dimensional scene attribute predictions could be obtained following ./Places_Prediction.ipynb
.
The results will be saved as ./[city]/data_storage/IMG_pred_150.csv
(150×150 px small images only), and ./[city]/data_storage/IMG_pred.csv
(images of both sizes for comparison of confidence and/or consistency).
The 3-dimensional vector of face prediction in images could be obtained following ./Face_Detection_in_Images.ipynb
.
The results will be saved as ./[city]/data_storage/Face_preds.csv
.
The final merged visual features data (982-dimensional) are provided in ./dataset/[city]/Visual_Features.csv
, which is effectively a 984-column table.
Column Index | Name | Description | Data Type | Notation |
---|---|---|---|---|
0 | ID | Unique Image Index from Flickr | String | - |
1 | IO_Type | Indoor/Outdoor Scene | String | - |
2-513 | Vis_Feat_[i] | Last 512-dimensional Hidden Layer of ResNet-18 pretrained on PlacesCNN as Visual Feature | Float | Hv |
514-516 | Face_[*] | Number of faces, confidence of face prediction, proportion of faces in the image | Float | F |
517-881 | SCE_[*] | Smoothened/Filtered 365-dimensional scene category prediction Logit | Float | σ(5)(Ls) |
882-983 | ATT_[*] | Smoothened/Filtered 102-dimensional scene attribute prediction Logit | Float | σ(10)(La) |
The data cleaning of textual data, and the 3-dimensional vector of original language of posts could be obtained following ./Dataset_Cleaning_and_Merging_[city].ipynb
.
The results will be saved as ./[city]/data_storage/metadata.csv
in post level and ./[city]/data_storage/sentences.csv
in sentence level.
The 768-dimensional vector of BERT [CLS] token could be obtained following ./bert_inference_HeriGraph.ipynb
.
The results will be saved as ./[city]/data_storage/metadata_bert.csv
in post level and ./[city]/data_storage/sentences_bert.csv
in sentence level.
The final merged textual features data (771-dimensional) are provided in ./dataset/[city]/Textual_Features.csv
, which is effectively a 776-column table.
Column Index | Name | Description | Data Type | Notation |
---|---|---|---|---|
0 | index | Unique Image Index from Flickr | String | - |
1 | text_bool | Whether the original post has a valid textual data (as a filter) | Boolean | - |
2 | revised_text | The processed and filtered textual data of the post as combination of description, title, and tags. | String | S |
3-4 | num_sent/ text_len | Number of sentences and number of words in the revised text | Integer | - |
5-772 | BERT_[i] | The 768-dimensional output vector of [CLS] token | Float | HB |
773-775 | English/ Local_Lang/ Other_Lang | Detected original language in the posts | Boolean | O |
The temporal features about the timestamps of the posts in their unique week counts could be obtained following ./Dataset_Cleaning_and_Merging_[city].ipynb
.
The results will be saved in ./[city]/data_storage/metadata.csv
.
The social features about the social relations of the post owners could be obtained following ./Social_Links_of_Interests.ipynb
.
Input the [api_key]
and [api_secret]
to activate the queries of the public contacts and public groups of the Flickr users.
The information will be respectively saved as ./[city]/data_storage/contacts.csv
, ./[city]/data_storage/interest.csv
, and ./[city]/data_storage/friendship.csv
, while the final merged social information is saved as ./[city]/data_storage/social_links.csv
.
The spatial features about the locations of the posts and their connectivity in geographical network could be obtained following ./Geographical_Graph_Construction.ipynb
.
Input the respective [city_lat]
, [city_lon]
, and [city_radius]
to run the code.
The spatial network information will be saved respectively as ./[city]/data_storage/GEO_nodes.csv
showing the intersections in spatial network, ./[city]/data_storage/GEO_edges.csv
showing the connectivity of spatial nodes with travel time information, and ./[city]/data_storage/GEO_node_dist.csv
showing the travel time between any two nodes.
The geo-node assigned to each post will be recorded in ./[city]/data_storage/GEO_metadata.csv
.
This project applied the heritage value definition in UNESCO WHL with regard to ten Outstanding Universal Value selection criteria plus one additional "other" class, which is introduced and trained in WHOSe_Heritage.
The predicted labels on heritage values by BERT could be obtained following ./bert_inference_HeriGraph.ipynb
.
The results will be saved as ./[city]/data_storage/metadata_bert.csv
in post level and ./[city]/data_storage/sentences_bert.csv
in sentence level.
The predicted labels on heritage values by ULMFiT could be obtained following ./ulmfit_inference_HeriGraph.ipynb
.
The results will be saved as ./[city]/data_storage/metadata_ulmfit.csv
in post level and ./[city]/data_storage/sentences_ulmfit.csv
in sentence level.
The comparison of the both models for performance, coherence, and consistency on both post level and sentence level could be obtained following ./Diagram_Values.ipynb
.
The final merged heritage value label data (11-dimensional) are provided in ./dataset/[city]/Value_Labels.csv
, which is effectively a 26-column table.
A sample is considered as labelled
if the average top-3 confidence of both BERT and UMLFiT models is larger than 0.75
and the Jaccard Index of such top-3 predictions is larger than 0.5
.
This leads to around 40-50%
texual samples as labelled (thus around 10-35%
of all data samples in each city).
Users are invited to adjust the thresholds of labelled data to experiment on the effects.
Column Index | Name | Description | Data Type | Notation |
---|---|---|---|---|
0 | index | Unique Image Index from Flickr | String | - |
1 | text_bool | Whether the original post has a valid textual data (as a filter) | Boolean | - |
2-12 | Criteria_[i]/ Others | The average predicted soft label of post text concerning heritage values in terms of OUV. | Float | YHV |
13-18 | max_[i]_val/ max_[i]_col | The predicted hard top-3 labels of heritage values | Float/ String | - |
19-20 | max_[i] | The top-k confidence of averaged soft label prediction | Float | - |
21-22 | conf_[i] | The average model confidence of BERT and ULMFiT for their top-k predictions | Float | κHV(0) |
23-24 | same_[i] | The model agreement/consistency of BERT and ULMFiT for their top-k predictions in terms of Jaccard Index | Float/ Boolean | κHV(1) |
25 | labelled | Whether the sample should be considered as "pseudo-labelled" data | Boolean | - |
A demo of labelled heritage values can be seen with the following diagram:
This project applied the heritage definition by Veldpaus (2015) and Ginzarly et al. (2019), keeping a nine-class category of depicted scenery of an image.
A few models have been trained on the data presented by Ginzarly et al. (2019) in Tripoli, Lebanon.
The training process together with hyper-parameter tuning with grid search cross validation with scikit-learn library could be found in ./Machine_Learning_Models_on_Heritage_Attributes_Tripoli.ipynb
.
The trained ensemble VOTE and STACK classification models are saved in the folder ./Tripoli/model_storage/
respectively under ./Tripoli/model_storage/vote_classifier.joblib
and ./Tripoli/model_storage/stack_classifier.joblib
.
The predicted labels on heritage attributes by both classifiers could be obtained following ./Machine_Learning_Models_on_Heritage_Attributes_Tripoli.ipynb
.
The results will be saved as ./[city]/data_storage/IMG_pred_150_cat.csv
.
The comparison of the both models for performance, coherence, and consistency could be obtained following ./Diagram_Attributes.ipynb
.
The final merged heritage attribute label data (9-dimensional) are provided in ./dataset/[city]/Attribute_Labels.csv
, which is effectively a 19-column table.
A sample is considered as labelled
if the average top-1 confidence of both VOTE and STACK models is larger than 0.7
and the top-1 predictions is same
.
This leads to around 35-50%
samples as labelled.
Users are invited to adjust the thresholds of labelled data to experiment on the effects.
Column Index | Name | Description | Data Type | Notation |
---|---|---|---|---|
0 | ID | Unique Image Index from Flickr | String | - |
1-9 | [various names] | The average predicted soft label of post image concerning heritage attributes in terms of depicted scenes. | Float | YHA |
10-11 | category[-/_id] | The predicted hard top-1 labels of heritage attributes | String | - |
12-15 | category/ cat_id_[model] | The top-1 hard label prediction of VOTE and STACK models | String | - |
16 | conf | The average model confidence of VOTE and STACK for their top-1 predictions | Float | κHA(0) |
17 | category_same | The model agreement/consistency of VOTE and STACK for their top-1 predictions | Boolean | κHA(1) |
18 | labelled | Whether the sample should be considered as "pseudo-labelled" data | Boolean | - |
A demo of labelled heritage attributes can be seen with the following diagram:
The graph construction process for the Multi-Graphs, the three subgraphs with Temporal, Social, and Spatial links, as well as the simple composed graphs could be obtained following ./HeriGraph_Construction_[city].ipynb
.
The Edge Lists that could be directly used by NetworkX or other softwares to construct graphs are provided in ./dataset/[city]/Edge_List.csv
provided in the zip
file shared on Google Drive, which is effectively a 16-column table.
The columns of [Temporal/Social/Spatial]_Similarity
are the edge weight for each type of subgraphs, and the column One_Edge
is the adjacency indicator for the composed simple graph.
Column Index | Name | Description | Data Type | Notation |
---|---|---|---|---|
0 | 0 | Unique Image Index from Flickr for Node 0 | String | v0 |
1 | 1 | Unique Image Index from Flickr for Node 1 | String | v1 |
2-3 | Week_[i] | The timestamp ID in week level | String | ti |
4 | dist | The temporal distance of two nodes | String | - |
5 | Temporal_Similarity | The edge weight of temporal links | Float | ATEM |
6-7 | User_[i] | The user ID in Flickr | String | ui |
8 | relationship | The strength level of relationship of two users | Integer | - |
9 | Social_Similarity | The edge weight of social links | Float | ASOC |
10-11 | GEO_[i] | The GEO-location ID in the spatial network | String | υi |
12 | geo_distance | The spatial distance of two nodes in terms of travel time | Float | we |
13 | Spatial_Similarity | The edge weight of spatial links | Float | ASPA |
14 | One_Edge | The adjacency indicator for the composed simple graph | Boolean | A |
15 | Same_Node | Whether the two nodes are the same one | Boolean | - |
The statistics of generated graphs following the standard of PyTorch-Geometric could be found in the following table:
Name | #graphs/ subgraphs | #nodes | #edges | #features | #classes/ tasks |
---|---|---|---|---|---|
HeriGraph | 3 | ~3271.7 | ~907,393.3 | - | - |
└─Amsterdam | 3+1 | 3727 | 1,271,171 | 1753 | 11+9 |
└─Suzhou | 3+1 | 3137 | 916,496 | 1753 | 11+9 |
└─Venice | 3+1 | 2951 | 534,513 | 1753 | 11+9 |
This project applied the pretrained models of the following projects which are openly released on GitHub or published as python packages. Part of the codes are adpated from the original codes.
Face Recognition Using Pytorch
The workflows and datasets of this paper can be used under the Creative Common License (Attribution CC BY 4.0). Please give appropriate credit, such as providing a link to our paper or to this github repository. The copyright of all the downloaded and processed images belongs to the image owners.