We present a robust and general place name extraction method from tweet texts, named GazPNE2. It fuses deep learning, global gazetteers (i.e., OpenStreetMap and GeoNames), and pretrained transformer models (i.e., BERT and BERTweet), requiring no manually annotated data. It can extract place names at both coarse (e.g., country and city) and fine-grained (e.g., street and creek) levels and place names with abbreviations (e.g., ‘tx’ for ‘Texas’ and ‘studemont rd’ for ‘studemont road’).
The data we used to evaluate our approach are described below.
Download the trained model and unzip the files into the model folder.
Java and Python 3.7 are required.
conda create -n gazpne2 python=3.7
conda activate gazpne2
pip install -r requirements.txt
tar -xzvf BERTweet_base_fairseq.tar.gz
On the first run, the pretrained BERT models will be automatically downloaded and cached on the local drive.
A snippet of example code is shown below.
from main import GazPNE2
gazpne2 = GazPNE2()  # Loading the models takes around 30 seconds
tweets = ["Associates at the Kuykendahl Rd & Louetta Rd. store in Spring, TX gave our customers a reason to smile",\
"Rockport TX any photos of damage down Corpus Christi Street and Hwy 35 area? #houstonflood"]
# It is faster to input multiple tweets at once than one single tweet multiple times.
locations = gazpne2.extract_location(tweets)
print(locations)
'''This will output:
{0: [{'LOC': 'Kuykendahl Rd', 'offset': (18, 30)}, {'LOC': 'Louetta Rd', 'offset': (34, 43)},
{'LOC': 'Spring', 'offset': (55, 60)}, {'LOC': 'TX', 'offset': (63, 64)}],
1: [{'LOC': 'Corpus Christi Street', 'offset': (38, 58)}, {'LOC': 'Hwy 35', 'offset': (64, 69)},
{'LOC': 'Rockport', 'offset': (0, 7)}, {'LOC': 'TX', 'offset': (9, 10)}, {'LOC': 'houston', 'offset': (78, 84)}]}
'''
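The returned dictionary maps each tweet's index to the detected place names and their character offsets. A minimal post-processing sketch, using the sample output above, collects just the name strings per tweet:

```python
# Minimal post-processing sketch: gather the place-name strings per tweet.
# `locations` is copied from the sample output above (tweet 0 only).
locations = {
    0: [{'LOC': 'Kuykendahl Rd', 'offset': (18, 30)},
        {'LOC': 'Louetta Rd', 'offset': (34, 43)},
        {'LOC': 'Spring', 'offset': (55, 60)},
        {'LOC': 'TX', 'offset': (63, 64)}],
}

names_per_tweet = {idx: [hit['LOC'] for hit in hits]
                   for idx, hits in locations.items()}
print(names_per_tweet[0])  # ['Kuykendahl Rd', 'Louetta Rd', 'Spring', 'TX']
```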
Execute the command below in case of a Java error.
spack load openjdk
To extract locations from a txt file, execute the following command. Each line of the txt file corresponds to one tweet message.
python -u main.py --input=0 --input_file=data/test.txt
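This mode presumably reads one tweet per line and passes them to the extractor in a single batch, which is faster than per-tweet calls. A rough sketch of the equivalent Python (the read_tweets helper is hypothetical; only GazPNE2.extract_location comes from the snippet above):

```python
# Hypothetical sketch of what --input_file does: read one tweet per line
# and extract locations from all tweets in a single batch.
def read_tweets(path):
    """Return the non-empty lines of a txt file as a list of tweet strings."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Batch call (commented out, since it requires the downloaded models):
# from main import GazPNE2
# locations = GazPNE2().extract_location(read_tweets("data/test.txt"))
```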
To evaluate on our manually annotated dataset (3,000 tweets), execute the following command.
python -u main.py --input=2
To evaluate on the 19 public datasets, execute the following command. You will get results for only some of the datasets, since several are not publicly available.
python -u main.py --input=4
Datasets [a,b,c] can be obtained from https://rebrand.ly/LocationsDataset.
Datasets [e,f] can be obtained from https://revealproject.eu/geoparse-benchmark-open-dataset/.
Datasets [g,h] can be obtained by contacting the author of the data.
If you use the code, please cite the following publication:
@article{hu2022gazpne2,
  title={GazPNE2: A general place name extractor for microblogs fusing gazetteers and pretrained transformer models},
  author={Hu, Xuke and Zhou, Zhiyong and Sun, Yeran and Kersten, Jens and Klan, Friederike and Fan, Hongchao and Wiegmann, Matti},
  journal={IEEE Internet of Things Journal},
  volume={9},
  number={17},
  pages={16259--16271},
  year={2022},
  publisher={IEEE}
}