Open Data Tool User Guide

irwink edited this page Jan 3, 2020 · 17 revisions

Version 1.0 December 20, 2013

Contents

Introduction

[Installation](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-installation)

[Uninstall](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-uninstall)

[Using the WPSS Open Data Tool](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-using-the-wpss-open-data-tool)

[Profiles](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-profiles)

[Results Window](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-results-window)

[Command Line Interface](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-command-line-interface)

[Troubleshooting](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-troubleshooting)

[Test Cases](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-test-cases)


Introduction

The PWGSC WPSS Open Data Tool provides a method to perform tests on datasets and dictionaries being submitted for the Open Data Registry to ensure they are compliant with Open Data standards. The WPSS Open Data Tool reviews all of the documents and analyses each one for Treasury Board compliance.

Tool Limitations

The output of the Open Data Tool is in English only. The tool uses third-party software components, and the source of each component is available only in the language in which it was authored.

Installation

The WPSS Open Data Tool requires a Perl distribution to be installed on the workstation and .pl files to be associated with the Perl interpreter. The WPSS Open Data Tool has been tested with Strawberry Perl 5.18.1 and ActivePerl 5.14. Other versions of Strawberry Perl or ActivePerl, or other Perl installations, may not work as expected or may be missing required modules.

System Requirements

  • Microsoft Windows XP, 7, 10,
  • Java runtime environment 1.8.0 (other versions may not work),
  • Python version 2.7.6,
  • Strawberry Perl (32 bit) 5.18.1 or newer (does not work with Perl 5.25 or later),
  • Only one installation of Perl on the system. Multiple installations may cause problems.
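
The Strawberry Perl version constraint listed above (5.18.1 or newer, but earlier than 5.25) amounts to a simple range check. The following is a minimal Python sketch for illustration only and is not part of the tool. The function name `perl_version_supported` is hypothetical, and the sketch assumes the version is supplied as a dotted string such as `5.18.1`.

```python
def perl_version_supported(version):
    """Return True if a dotted Perl version string falls inside the range
    stated in the system requirements: at least 5.18.1 and below 5.25."""
    parts = tuple(int(p) for p in version.strip().lstrip("v").split("."))
    # Pad to three components so that "5.20" compares like "5.20.0".
    parts = (parts + (0, 0, 0))[:3]
    return (5, 18, 1) <= parts < (5, 25, 0)


if __name__ == "__main__":
    for candidate in ("5.14.2", "5.18.1", "5.24.4", "5.26.0"):
        print(candidate, perl_version_supported(candidate))
```

This check covers only the Strawberry Perl requirement. It does not detect multiple Perl installations or verify the Java and Python requirements.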

If you do not have Perl or Python installed, you will need to install them manually. You can find installations at:

For the Perl and Python installs, accept the default settings during the installation process.

Remove Existing .pl File Associations

Before installing the WPSS Open Data Tool, remove any existing .pl file type associations. The installation of Strawberry Perl will create a new association to ensure the proper execution of Perl applications.

To remove the .pl file association:

  1. Go to Start > Settings > Control Panel.
  2. Click Folder Options.
  3. Click the File Types tab.
  4. In the Registered file types list, locate and click the .pl entry.
  5. Click Delete.
  6. Click OK.

Installing the WPSS Open Data Tool

To install the WPSS Open Data Tool, double-click the WPSS_Install.exe file and follow the instructions on the screen.

WPSS Validation Tool Install Icon

The default installation folder for the WPSS Validation Tool is C:\Program Files\WPSS_Tool.

Uninstall

To remove the WPSS Open Data Tool from a workstation, run the uninstall script.

To uninstall the Open Data Tool:

Go to Start > Programs > WPSS_Tool > Uninstall.

Uninstall path

Uninstalling Perl

To remove the Perl installation:

  1. Go to Start > Settings > Control Panel.
  2. Click Add or Remove Programs.
  3. Locate the Strawberry Perl installation and click Remove.

Uninstalling Python

To remove the Python installation:

  1. Go to Start > Settings > Control Panel.
  2. Click Add or Remove Programs.
  3. Locate the Python installation and click Remove.

Using the WPSS Open Data Tool

To start the PWGSC WPSS Open Data Tool:

  1. Go to Start > Programs > WPSS_Tool.
  2. Click the Open_Data_Tool icon.

Alternatively, using Windows Explorer, navigate to the C:\Program Files\WPSS_Tool folder and double-click the open_data_tool.pl file.

The main window consists of two tabs:

  • Open Data is for entering the dictionary, data and resource file references for review.
  • Configuration is for configuring the WPSS Open Data Tool profile.

Open Data Tab

To analyse the documents identified for the Open Data Registry, enter the URLs for the dictionary, data and resource files into the WPSS Open Data Tool. Use the Open Data tab to enter this information.

Open Data tab

Configuration Tab

The Configuration tab enables you to select the option for the analysis. There is currently only one profile to select.

Configuration tab

Profiles

You can save the open data details in a configuration profile, either to share them or to reload them easily the next time you use the WPSS Open Data Tool.

To save the data profile:

  1. Go to File > Save Open Data Config.
  2. Select a folder and file name for the configuration file.
  3. Click OK.

To best manage the configuration files, it is suggested that you save the configuration files to the C:\Program Files\WPSS_Tool\profiles folder.

You can load a previously saved configuration. Loading a saved configuration file populates the Open Data tab fields. Once loaded, you can modify the information if required.

To open a saved profile:

  1. Go to File > Load Open Data Config.
  2. Locate the folder and file.
  3. Click OK.

Results Window

The Results Window includes three tabs containing the output of an individual analysis from the WPSS Open Data Tool. The output in each tab includes a header that lists information about the analysis, including the time and date when it started.

Open Data results window

Crawled URLs tab – Provides a list of the URLs the WPSS Open Data Tool analysed. It lists the referrer page to indicate how the crawler reached a particular page. Use this tab to monitor the WPSS Open Data Tool to ensure it is actively crawling and analysing the files.

Open Data tab – Lists the documents that contain open data check violations.

Document List tab – The Open Data Tool writes information to this tab after completing the site analysis. It contains the sorted list of documents found and reviewed.

Stopping the Analysis

If you need to stop the analysis while it is running, in the Results Window, go to Options > Stop Crawl. This stops the WPSS Open Data Tool after processing the current document. The results include a note at the bottom of each output tab in the Results Window indicating that the analysis was aborted.

Reporting Passes and Fails

The default behaviour of the analysis tools is to report only URLs that fail checks. To see both passes and fails, in the WPSS Open Data Tool window, go to Options > Report Fails and Passes. The URLs of documents that pass checks are then recorded in the results output as well. To see only failed pages, go to Options > Report Fails Only.

Saving Results

To save the analysis results, in the Results Window, go to File > Save As. Select the file name and folder path in the file chooser dialog box. The results are stored in a number of files, one for each result tab. Each file name contains a suffix identifying the report type.

For example, if you save the results in the file od_results, the actual files are:

  • od_results_crawl.txt
  • od_results_od.txt
  • od_results_urls.txt

It is suggested that you use the same name for the results as the data site profile name, and save the results in the \WPSS_Tool folder.
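
The suffix naming shown in the example above can be sketched in a few lines of Python, for instance to locate the result files from a script. This is an illustration only. The suffixes are copied from the example file names in this guide, and the function name `result_file_names` is hypothetical.

```python
# Suffixes copied from the example file names listed in this guide.
RESULT_SUFFIXES = ("crawl", "od", "urls")


def result_file_names(base_name):
    """Return the result file names expected for a given base name."""
    return ["{}_{}.txt".format(base_name, suffix) for suffix in RESULT_SUFFIXES]


if __name__ == "__main__":
    for name in result_file_names("od_results"):
        print(name)
```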

Command Line Interface

The WPSS Open Data Tool is available through the graphical user interface (GUI) and from the command prompt.

To access the command line version:

  1. Go to Start > Programs > Accessories > Command Prompt.

  2. Change to the Program Files\WPSS_Tool directory.

  3. Run the program open_data_tool.pl with the command:

     open_data_tool.pl -cli -o <profile file>

…where <profile file> is the path to the data file containing the information to be analysed.

Status and Progress

As the command line WPSS Open Data Tool runs and analyses documents, the URLs of the documents appear in the console window. Use this to monitor the WPSS Open Data Tool to ensure it is actively crawling and analysing the data files.

Viewing Results

The results are stored in a number of files, one for each result tab. Each file name contains a suffix identifying the report type. For example, if you save the results in the file od_results, the actual files are:

  • od_results_crawl.txt
  • od_results_od.txt
  • od_results_urls.txt

It is suggested that you use the same base name for the results as the data site profile name, and save the results in the \WPSS_Tool folder.

Language Switching

You can toggle the language of the analysis results using a command line option.

Option Result
-eng Analysis results are provided in English
-fra Analysis results are provided in French

If no language selection is made, the language is determined from the operating system.
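
When the command line version is driven from a script, the options described in this section can be assembled into an argument list. The sketch below is a hedged illustration and not part of the tool. It assumes that the options are written with a plain hyphen (`-cli`, `-o`, `-eng`, `-fra`), that the language option can follow the profile argument, and that Perl is invoked explicitly rather than through the .pl file association. The function name `build_command` and the profile name `my_profile.txt` are hypothetical.

```python
def build_command(profile_file, language=None):
    """Build the argument list for a command line run of the tool.

    language is None (let the tool decide), "eng" or "fra".
    """
    command = ["perl", "open_data_tool.pl", "-cli", "-o", profile_file]
    if language is not None:
        if language not in ("eng", "fra"):
            raise ValueError("language must be 'eng' or 'fra'")
        # Assumed placement: the language option is appended last.
        command.append("-" + language)
    return command


if __name__ == "__main__":
    print(build_command("my_profile.txt", language="fra"))
```

The resulting list could be passed to `subprocess.run` with the working directory set to the WPSS_Tool installation folder.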

Troubleshooting

Command Line Interpreter Error

The WPSS Open Data Tool uses a number of Perl modules. These modules may have errors that cause the program to fail. You may encounter the following message:

Perl error message

Message: Perl Command Line Interpreter has encountered a problem and needs to close.
Cause: An error in the supporting Perl modules.
Correction: Restart the WPSS Open Data Tool. If the problem persists, send an email to PlanificationWeb-WebPlanning@tpsgc-pwgsc.gc.ca.

500 Internal Server Error

When analysing data, the WPSS Open Data Tool may report a “500 Internal server error”. This may be due to a limitation on the Web server and not the WPSS Open Data Tool. The problem may be due to the Web server not handling the “Range” setting in the HTTP GET operation. The WPSS Open Data Tool sets a size limit on GET operations to avoid getting extremely large documents.
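
For context, a size-limited GET request carries a Range header asking the server for only the first part of a document. The sketch below builds such a request with Python's standard library without sending it. It is not the tool's own code, which is written in Perl, and the exact header value the tool sends is not documented here. The function name, the URL, and the 1024-byte limit are assumptions chosen for illustration.

```python
from urllib.request import Request


def size_limited_request(url, max_bytes):
    """Build (but do not send) a GET request for at most max_bytes bytes."""
    # Byte ranges are inclusive, so the last byte index is max_bytes - 1.
    return Request(url, headers={"Range": "bytes=0-{}".format(max_bytes - 1)})


if __name__ == "__main__":
    request = size_limited_request("http://example.com/data.csv", 1024)
    print(request.get_method(), request.get_header("Range"))
```

A server that cannot process the Range header may answer such a request with an error, which matches the symptom described above.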

To avoid this error, you can change the WPSS Open Data Tool configuration file setting so that the “Range” setting is not included in an HTTP GET operation.

Using a simple text file editor, such as WordPad:

  1. Open the file c:\Program Files\wpss_tool\conf\wpss_tool.config.

  2. Locate the following lines in the file:

    # Max User Agent Size limits the size of files accepted in a GET
    # request. A value of 0 means we can accept documents of any size.
    # A value of 0 also removes the Range field from the HTTP header.
    #
    #User_Agent_Max_Size 0
    
  3. Remove the leading ‘#’ character from the #User_Agent_Max_Size 0 line, so that it reads User_Agent_Max_Size 0.

This functionality is not available through the user interface, only by directly editing the configuration file.
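
If the same edit has to be applied on several workstations, the manual steps above can be scripted. The following Python sketch operates on the configuration text only. The function name `enable_unlimited_get_size` is hypothetical, and the sketch assumes the commented-out line looks exactly like the excerpt shown in step 2.

```python
import re

# Matches the commented-out setting as shown in the excerpt in step 2.
# The optional \r tolerates Windows line endings read in binary-safe mode.
_COMMENTED_SETTING = re.compile(
    r"^([ \t]*)#[ \t]*(User_Agent_Max_Size[ \t]+0[ \t]*\r?)$", re.MULTILINE
)


def enable_unlimited_get_size(config_text):
    """Return config_text with the 'User_Agent_Max_Size 0' line uncommented."""
    return _COMMENTED_SETTING.sub(r"\1\2", config_text)


if __name__ == "__main__":
    sample = (
        "# A value of 0 also removes the Range field from the HTTP header.\n"
        "#\n"
        "#User_Agent_Max_Size 0\n"
    )
    print(enable_unlimited_get_size(sample))
```

Explanatory comment lines are left untouched because they do not match the setting name. It is advisable to keep a backup copy of the configuration file before writing the changed text back.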

Test Cases

This section describes the criteria the WPSS Open Data Tool uses to verify the content submitted for the Open Data initiative.

OD_1 - Open data URL

Description: Open data URL.
Checks: That the dataset file (dictionary, data and resource) exists.
Does Not Check: That the dataset file is available to the public. For example, available on the Internet.

OD_2 - Character encoding requirements

Description: Character encoding requirements.
Checks: That the content of the dataset file is UTF-8 encoded.
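
A UTF-8 check of this kind can be reproduced before submission by attempting to decode the raw bytes of a file. This is a minimal Python sketch of the idea, not the tool's implementation, and the function name `is_utf8` is hypothetical.

```python
def is_utf8(content):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_utf8("données".encode("utf-8")))
    print(is_utf8("données".encode("latin-1")))
```

Plain ASCII content also passes, because ASCII is a subset of UTF-8.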

OD_3 – Mark-up language requirements

Description: Mark-up language requirements.
Checks:
  • The dataset dictionary file has an acceptable mime-type of TXT or XML.
  • The dataset file has an acceptable mime-type of CSV or XML.
  • The data files have content.
  • The CSV data files are well formed. That is, properly delimited and quoted.
  • The TXT dictionary files have content.
  • The XML dictionary files have content.
  • The XML data files are well formed with nested and closed tags.
Does Not Check:
  • That the XML dictionary files contain terms and definitions (there is no fixed schema for XML dictionary files).
  • Whether XML dictionary files contain duplicate terms.
  • Whether, if multiple XML dictionary files are specified, they all contain the same terms, with no extra or missing terms.
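
The XML well-formedness check can be approximated locally with a standard XML parser, which rejects unclosed or incorrectly nested tags. The sketch below uses Python's standard library and is only an illustration of the idea. The function name `is_well_formed_xml` is hypothetical, and the tool's own parser may differ in detail.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as XML with properly nested, closed tags."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed_xml("<data><row>1</row></data>"))
    print(is_well_formed_xml("<data><row>1</data>"))
```

An empty document is also rejected, which lines up with the requirement that files have content.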

OD_CSV_1 – CSV Data

Description: CSV Data.
Checks:
  • That CSV data files have the same number of fields in each row.
  • If at least 25% of the fields on the first row of a CSV data file match the data dictionary terms, that all fields on that row match the dictionary terms.
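
The two CSV checks described above can be sketched as follows. This is a hedged approximation, not the tool's code. The function name `check_csv` is hypothetical, and the sketch assumes that header fields are compared with dictionary terms by exact, case-sensitive match after trimming surrounding spaces.

```python
import csv
import io


def check_csv(text, dictionary_terms):
    """Return a list of problems found in CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["no rows"]
    problems = []
    expected = len(rows[0])
    # Check 1: every row has the same number of fields as the first row.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append(
                "row {}: {} fields, expected {}".format(number, len(row), expected)
            )
    # Check 2: if at least 25% of first-row fields are dictionary terms,
    # treat the row as a header and require every field to be a term.
    header = [field.strip() for field in rows[0]]
    matches = [field for field in header if field in dictionary_terms]
    if header and len(matches) / len(header) >= 0.25:
        for field in header:
            if field not in dictionary_terms:
                problems.append("header field not in dictionary: " + field)
    return problems


if __name__ == "__main__":
    print(check_csv("id,name\n1,Ann,extra\n", {"id", "name"}))
```

When fewer than 25% of the first-row fields match, the sketch does not treat the row as a header and reports no header problems.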

OD_TXT_1 – TXT Dictionary

Description: TXT Dictionary.
Checks:
  • The TXT dictionary files are coded such that terms are followed by one blank line before the definition.
  • The TXT dictionary files are coded such that the terms consist of exactly one line of text.
  • The TXT dictionary files with multiple terms have at least two blank lines between a definition and a subsequent term.
  • The TXT dictionary files do not have duplicate terms.
  • If multiple TXT dictionary files are specified, they all contain the same terms. That means no extra terms or missing terms.
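
The layout rules above imply a simple parsing scheme: entries are separated by two or more blank lines, and within an entry one blank line separates the single-line term from its definition. The Python sketch below follows that reading of the rules. It is an assumption-based illustration, not the tool's parser, and the function name `parse_txt_dictionary` is hypothetical.

```python
import re


def parse_txt_dictionary(text):
    """Parse TXT dictionary text into (terms, problems).

    terms maps each term to its definition; problems lists rule violations.
    """
    terms, problems = {}, []
    normalized = text.replace("\r\n", "\n").strip("\n")
    # Entries are separated by at least two blank lines.
    for entry in re.split(r"\n(?:[ \t]*\n){2,}", normalized):
        # Inside an entry, the first blank line separates term and definition.
        parts = re.split(r"\n[ \t]*\n", entry, maxsplit=1)
        term = parts[0].strip()
        if len(parts) != 2:
            problems.append("no definition after a blank line for: {!r}".format(term))
        elif "\n" in term:
            problems.append("term spans more than one line: {!r}".format(term))
        elif term in terms:
            problems.append("duplicate term: {!r}".format(term))
        else:
            terms[term] = parts[1].strip()
    return terms, problems


if __name__ == "__main__":
    sample = "Term A\n\nDefinition of A.\n\n\nTerm B\n\nDefinition of B.\n"
    print(parse_txt_dictionary(sample))
```

One limitation of this sketch: if only one blank line separates a definition from the next term, the next term is absorbed into the definition instead of being reported. Comparing the key sets returned for several dictionary files would cover the last check in the list.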

OD_API_1 - Open data URL

Description: Open data URL.
Checks: If the API URL exists.
Does Not Check: That the API URL is available to the public. For example, available on the Internet.

OD_API_2 - Character encoding requirements

Description: Character encoding requirements.
Checks: That the content of the dataset file is UTF-8 encoded.

OD_API_3 – Mark-up language requirements

Description: Mark-up language requirements.
Checks:
  • If the API has an acceptable mime-type (JSON or XML).
  • That the API has content.
  • That JSON content is well formed (i.e. properly delimited and quoted).
  • That XML content is well formed (i.e. nested and closed tags).
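
The JSON part of this check can be approximated by parsing the API response with a JSON parser. The sketch below uses Python's standard `json` module and is an illustration only. The function name `is_well_formed_json` is hypothetical, and the tool's parser may be stricter or more lenient in edge cases.

```python
import json


def is_well_formed_json(text):
    """Return True if text parses as JSON (properly delimited and quoted)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed_json('{"name": "value", "items": [1, 2]}'))
    print(is_well_formed_json("{'name': 'value'}"))
```

Empty content fails the parse, which also covers the requirement that the API has content.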

TP_PW_OD_CSV_1 – CSV Header Row

Description: CSV header row.
Checks: That CSV data files have a header row containing data dictionary terms.
