
Commit d8ee771

Import the quickstart_package.zip contents and provided datasets
0 parents

13 files changed

Lines changed: 344591 additions & 0 deletions


README.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# SIGMOD20 Programming Contest | Quick start package for Entity Resolution

The Quick Start Package for Entity Resolution can be used as a starting code base for the SIGMOD 2020 Programming Contest.

## Prerequisites

- Python 3.*
- pip

## Installing

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Install virtualenv to create an isolated environment:
```
$ pip install virtualenv
```

Create a new virtual environment named venv and activate it:
```
$ virtualenv venv
$ source venv/bin/activate
```

N.B. Using virtualenv is not mandatory, but it is recommended.

Install the requirements:

```
$ pip install -r requirements.txt
```

## Running

Run the project:
```
$ python main.py
```

This command produces a CSV file (the submission) in the output directory ("outputh_path") and prints intermediate results to the shell.

N.B. By default, the program runs on a mock dataset. To use a different dataset, edit the value of the "dataset_path" variable.
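The two steps the README describes (read the specs under "dataset_path", write pairs to a CSV under "outputh_path") can be sketched as follows. This is not the contents of main.py; it is a minimal sketch assuming the layout of the mock dataset in this commit (`<dataset_path>/<source>/<n>.json`). The helper names `load_specs` and `write_submission`, the spec-id format `<source>//<n>`, the file name `submission.csv`, and the CSV header `left_spec_id,right_spec_id` are all hypothetical choices made for illustration.

```python
import csv
import json
import os
from typing import Dict, Iterable, Tuple


def load_specs(dataset_path: str) -> Dict[str, dict]:
    """Load every <source>/<n>.json file under dataset_path.

    Hypothetical id scheme: "<source>//<n>", e.g. "www.sourceA.com//1".
    """
    specs: Dict[str, dict] = {}
    for source in sorted(os.listdir(dataset_path)):
        source_dir = os.path.join(dataset_path, source)
        if not os.path.isdir(source_dir):
            continue  # skip plain files such as labelled data or archives
        for filename in sorted(os.listdir(source_dir)):
            if not filename.endswith(".json"):
                continue
            spec_id = f"{source}//{filename[:-len('.json')]}"
            with open(os.path.join(source_dir, filename), encoding="utf-8") as f:
                specs[spec_id] = json.load(f)
    return specs


def write_submission(pairs: Iterable[Tuple[str, str]], output_path: str,
                     filename: str = "submission.csv") -> str:
    """Write matching pairs to a CSV file and return the file path.

    Hypothetical header: "left_spec_id,right_spec_id".
    """
    os.makedirs(output_path, exist_ok=True)
    out_file = os.path.join(output_path, filename)
    with open(out_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["left_spec_id", "right_spec_id"])
        writer.writerows(pairs)
    return out_file
```

Pointed at the `dataset` directory of this commit, `load_specs` would skip the labelled-data and specs files (they are not directories) and return the six mock specs, keyed `www.sourceA.com//1` through `www.sourceB.com//3`.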

dataset/large_labelled_data

Lines changed: 297652 additions & 0 deletions
Large diffs are not rendered by default.

dataset/medium_labelled_data_v1

Lines changed: 46666 additions & 0 deletions
Large diffs are not rendered by default.
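The labelled files are not rendered here, so their format is not visible in this commit. Assuming they are CSV files with a `left_spec_id,right_spec_id,label` header and a 0/1 match label (an assumption; check the real header first), a matcher's output can be scored with precision, recall and F1, a common way to evaluate entity resolution. In this sketch, predicted pairs without a label are ignored, because their truth is unknown; `load_labelled` and `evaluate` are hypothetical helpers.

```python
import csv
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def load_labelled(path: str) -> List[Tuple[str, str, int]]:
    """Read labelled pairs; assumes a left_spec_id,right_spec_id,label header."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["left_spec_id"], row["right_spec_id"], int(row["label"]))
                for row in csv.DictReader(f)]


def normalize(pair: Pair) -> Pair:
    """Treat (a, b) and (b, a) as the same unordered pair."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def evaluate(predicted: Iterable[Pair],
             labelled: Iterable[Tuple[str, str, int]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted pairs against labelled pairs."""
    positives: Set[Pair] = set()
    negatives: Set[Pair] = set()
    for left, right, label in labelled:
        (positives if int(label) == 1 else negatives).add(normalize((left, right)))
    # Only predictions that carry a label can be judged right or wrong.
    judged = {normalize(p) for p in predicted} & (positives | negatives)
    tp = len(judged & positives)
    fp = len(judged & negatives)
    fn = len(positives - judged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```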

dataset/sigmod_dataset_specs

8.13 MB
Binary file not shown.

dataset/www.sourceA.com/1.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
{
  "<page title>": "Panasonic Lumix DMC-XS1 16.1 Megapixel Compact Camera | Buy at sourceA.com",
  "brand": "Panasonic",
  "resolution": "16 MPX",
  "optical zoom": "5x"
}

dataset/www.sourceA.com/2.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
{
  "<page title>": "Nikon Coolpix S3600 Digital Camera (Silver) - Buy at sourceA.com",
  "producer": "Nikon Coolpix",
  "total pixels": "20",
  "battery": "Battery included, 2 x Li-Ion"
}

dataset/www.sourceA.com/3.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
{
  "<page title>": "Nikkor Lenses pack + Panasonic Lumix DMC-XS1 - Buy at sourceA.com",
  "producer": "Nikon Coolpix",
  "total pixels": "20",
  "battery": "Battery included, 2 x Li-Ion"
}

dataset/www.sourceB.com/1.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "<page title>": "Panasonic DMC XS1 Camera | Pict Bridge - 1280 x 720 video | Price comparison offers on sourceB.com",
  "manufacturer": "Panasonic",
  "resolution": "16.1 megapixels",
  "zoom": "8x",
  "battery": "Li-Ion rechargeable battery"
}

dataset/www.sourceB.com/2.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
{
  "<page title>": "Nikon Coolpix P900 White Camera Bundle Pack, 16 Megapixel, GPS, GLONASS | Price comparison offers on sourceB.com",
  "manufacturer": "Nikon",
  "lcd screen size": "3''",
  "wi-fi": "yes"
}

dataset/www.sourceB.com/3.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "<page title>": "Panasonic DMC TZ80EG-K Camera | Pict Bridge - 1280 x 720 video | Price comparison offers on sourceB.com",
  "manufacturer": "Panasonic",
  "resolution": "16.1 megapixels",
  "zoom": "8x",
  "battery": "Li-Ion"
}
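The six mock specs show the core difficulty: sources name the same attribute differently (`brand`, `producer`, `manufacturer`) and format model codes differently (`DMC-XS1` vs `DMC XS1`). A minimal sketch of one simple heuristic, which is not the quick start's own method: match specs from different sources when their brand agrees and their page titles share a token mixing letters and digits. The synonym list `BRAND_KEYS` is read off these six files only, and `canonical_brand`, `model_tokens` and `match` are hypothetical names; spec ids are assumed to look like `<source>//<n>`.

```python
import re
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical synonym list taken from the mock specs; real sources use many more.
BRAND_KEYS = ("brand", "producer", "manufacturer")


def canonical_brand(spec: dict) -> Optional[str]:
    """Return the lowercased first word of the first brand-like attribute."""
    for key in BRAND_KEYS:
        value = spec.get(key)
        if isinstance(value, str) and value.strip():
            return value.split()[0].lower()
    return None


def model_tokens(title: str) -> Set[str]:
    """Tokens that mix letters and digits, e.g. 'DMC-XS1' -> {'xs1'}."""
    tokens = re.split(r"[^0-9a-z]+", title.lower())
    return {t for t in tokens if re.search(r"[a-z]", t) and re.search(r"[0-9]", t)}


def match(specs: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Pair specs from different sources with equal brand and a shared model token."""
    keys = {spec_id: (canonical_brand(spec), model_tokens(spec.get("<page title>", "")))
            for spec_id, spec in specs.items()}
    pairs = []
    for left, right in combinations(sorted(specs), 2):
        if left.split("//")[0] == right.split("//")[0]:
            continue  # only compare specs coming from different sources
        (brand_l, models_l), (brand_r, models_r) = keys[left], keys[right]
        if brand_l and brand_l == brand_r and models_l & models_r:
            pairs.append((left, right))
    return pairs
```

On the six specs above, this heuristic pairs only `www.sourceA.com//1` with `www.sourceB.com//1`. `www.sourceA.com//3` shares the `xs1` token, but its `producer` field says Nikon, so the brand check rejects it.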
