Commit

update notebook
GreatYYX committed May 1, 2017
1 parent ddcebfc commit 2b4bd44
Showing 2 changed files with 13 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/notebook_basic.ipynb
@@ -29,7 +29,7 @@
"```\n",
"git clone https://github.com/usc-isi-i2/rltk.git\n",
"```\n",
"2. Create virtual environment (conda should be installed) and install dependencies.\n",
"2. Create virtual environment and install dependencies ([Conda](https://github.com/conda/conda) should be installed).\n",
"```\n",
"conda-env create .\n",
"source activate rltk_env\n",
17 changes: 12 additions & 5 deletions docs/notebook_linkage.ipynb
@@ -9,7 +9,7 @@
"source": [
"# Record Linkage\n",
"\n",
"Here, let me show you how to do link the actors name from Princeton University Art Museum (PUAM) to Getty Union List of Artist Names (ULAN)."
"Here, let me show you how to link the actor names from Princeton University Art Museum (PUAM) to Getty Union List of Artist Names (ULAN)."
]
},
{
@@ -40,10 +40,10 @@
"source": [
"## Prepare data\n",
"\n",
"First and the most important step is preparing data. Except the two candidate datasets (format in json_line/csv/text), you need manually mark some postive and negative pairs of these two datasets. Here, [labeled_puam.jsonl](../examples/puam/labeled_100.jsonl) is a 100 lines labeled paris.\n",
"First and the most important step is preparing data. Besides of the two candidate datasets (format in json_line/csv/text), you need manually mark some postive and negative pairs of these two datasets. Here, [labeled_puam.jsonl](../examples/puam/labeled_100.jsonl) is a 100 lines labeled paris.\n",
"\n",
"## Get file iterator of datasets\n",
"Candidate sets will be streamed as FileIterator in RLTK."
"Candidate sets should be streamed as FileIterator in RLTK."
]
},
{
@@ -167,7 +167,7 @@
"editable": true
},
"source": [
"For testing purpose, I create a [blocking_100.jsonl](../examples/puam/blocking_100.jsonl) file which contains first 100 lines of object from `blocking.jsonl`.\n",
"For testing purpose, I pick out the first 100 lines of object from the output file `blocking.jsonl` and named it to [blocking_100.jsonl](../examples/puam/blocking_100.jsonl).\n",
"\n",
"## Compute vectors and make prediction\n",
"\n",
@@ -202,6 +202,13 @@
"source": [
"tk.predict(model, feature_path='feature.jsonl', predict_output_path='predicted.jsonl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you get the the predicted result of linkage in [predicted.jsonl](../examples/puam/predicted.jsonl)."
]
}
],
"metadata": {
@@ -224,5 +231,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 0
}
