Open Data Tool User Guide
Version 1.0 December 20, 2013
- [Installation](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-installation)
  - [System Requirements](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-system-requirements)
  - [Remove Existing .pl File Associations](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-remove-existing-pl-file-associations)
  - [Installing the WPSS Open Data Tool](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-installing-the-wpss-open-data-tool)
- [Uninstall](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-uninstall)
  - [Uninstalling Perl](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-uninstalling-perl)
  - [Uninstalling Python](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-uninstalling-python)
- [Using the WPSS Open Data Tool](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-using-the-wpss-open-data-tool)
  - [Open Data Tab](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-open-data-tab)
  - [Configuration Tab](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-configuration-tab)
- [Profiles](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-profiles)
- [Results Window](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-results-window)
  - [Stopping the Analysis](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-stopping-the-analysis)
  - [Reporting Passes and Fails](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-reporting-passes-and-fails)
  - [Saving Results](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-saving-results)
- [Command Line Interface](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-command-line-interface)
  - [Status and Progress](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-status-and-progress)
  - [Viewing Results](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-viewing-results)
  - [Language Switching](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-language-switching)
- [Troubleshooting](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-troubleshooting)
  - [Perl Command Line Interpreter Error](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-perl-command-line-interpreter-error)
  - [500 Internal Server Error](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-500-internal-server-error)
- [Test Cases](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-test-cases)
  - [OD_1 - Open data URL](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_1---open-data-url)
  - [OD_2 - Character encoding requirements](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_2---character-encoding-requirements)
  - [OD_3 – Mark-up language requirements](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_3--mark-up-language-requirements)
  - [OD_CSV_1 – CSV Data](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_csv_1--csv-data)
  - [OD_TXT_1 – TXT Dictionary](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_txt_1--txt-dictionary)
  - [OD_API_1 - Open data URL](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_api_1---open-data-url)
  - [OD_API_2 - Character encoding requirements](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_api_2---character-encoding-requirements)
  - [OD_API_3 – Mark-up language requirements](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-od_api_3--mark-up-language-requirements)
  - [TP_PW_OD_CSV_1 – CSV Header Row](https://github.com/wet-boew/wet-boew-wpss/wiki/Open-Data-Tool-User-Guide#wiki-tp_pw_od_csv_1--csv-header-row)
The PWGSC WPSS Open Data Tool provides a method to perform tests on datasets and dictionaries being submitted for the Open Data Registry to ensure they are compliant with Open Data standards. The WPSS Open Data Tool reviews all of the documents and analyses each one for Treasury Board compliance.
The output of the Open Data Tool is in English only. The tool uses third-party software components, and each component is available only in the language in which it was authored.
The WPSS Open Data Tool requires a Perl distribution to be installed on the workstation, with .pl files associated with the Perl interpreter. The WPSS Open Data Tool has been tested with Strawberry Perl 5.18.1 and ActivePerl 5.14. Other versions of Strawberry Perl or ActivePerl, or other Perl installations, may not work as expected or may be missing required modules. The system requirements are:
- Microsoft Windows XP, 7, or 10,
- Java runtime environment 1.8.0 (other versions may not work),
- Python version 2.7.6,
- Strawberry Perl (32 bit) 5.18.1 or newer (does not work with Perl 5.25 or later),
- Only one installation of Perl on the system. Multiple installations may cause problems.
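The Perl version constraint in the list above can be checked mechanically. The following is a minimal sketch, assuming the version is available as a dotted string such as `5.18.1`; the function names are hypothetical and are not part of the WPSS Open Data Tool.

```python
import re


def parse_version(text):
    """Turn a dotted version string such as '5.18.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def perl_version_supported(version):
    """Return True for versions from 5.18.1 up to, but not including, 5.25."""
    return (5, 18, 1) <= parse_version(version) < (5, 25)


# Print the verdict for a few sample version strings.
for candidate in ("5.18.1", "5.24.4", "5.26.0", "5.14.2"):
    print(candidate, perl_version_supported(candidate))
```

The check encodes only the range stated in the requirements list. It does not cover the ActivePerl 5.14 configuration mentioned as tested, and it cannot detect multiple Perl installations.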
If you do not have Perl or Python installed, you must install them manually. Installers are available at:
- Python 2.7.6 available from http://python.org/ftp/python/2.7.6/python-2.7.6.msi
- Strawberry Perl (32 bit) available from http://strawberryperl.com/releases.html
For the Perl and Python installs, accept the default settings during the installation process.
Before installing the WPSS Open Data Tool, remove any existing .pl file type associations. The installation of Strawberry Perl will create a new association to ensure the proper execution of Perl applications.
To remove the .pl file association:
- Go to Start > Settings > Control Panel.
- Click Folder Options.
- Click the File Types tab.
- In the Registered file types list, locate and click the .pl entry.
- Click Delete.
- Click OK.
To install the WPSS Open Data Tool, double-click the WPSS_Install.exe file and follow the instructions on the screen.
The default installation folder for the WPSS Validation Tool is C:\Program Files\WPSS_Tool.
To remove the WPSS Open Data Tool from a workstation, run the uninstall script.
To uninstall the Open Data Tool:
- Go to Start > Programs > WPSS_Tool > Uninstall.
To remove the Perl installation:
- Go to Start > Settings > Control Panel.
- Click Add or Remove Programs.
- Locate the Strawberry Perl installation and click Remove.
To remove the Python installation:
- Go to Start > Settings > Control Panel.
- Click Add or Remove Programs.
- Locate the Python installation and click Remove.
To start the PWGSC WPSS Open Data Tool:
- Go to Start > Programs > WPSS_Tool.
- Click the Open_Data_Tool icon.
Alternatively, using Windows Explorer, navigate to the C:\Program Files\WPSS_Tool folder and double-click the open_data_tool.pl file.
The main window consists of two tabs:
- Open Data is for entering the dictionary, data and resource file references for review.
- Configuration is for configuring the WPSS Open Data Tool profile.
To analyse the documents identified for the Open Data Registry, enter the URLs for the dictionary, data and resource files into the WPSS Open Data Tool. Use the Open Data tab to enter this information.
The Configuration tab lets you select the profile used for the analysis. Currently, only one profile is available.
You can save the open data details in a configuration profile for sharing, or easy access if you want to use the WPSS Open Data Tool again.
To save the data profile:
- Go to File > Save Open Data Config.
- Select a folder and file name for the configuration file.
- Click OK.
To keep configuration files organized, it is suggested that you save them in the C:\Program Files\WPSS_Tool\profiles folder.
You can load a previously saved site configuration. Loading a saved configuration file loads the Open Data tab fields. Once loaded, you can modify the information if required.
To open a saved profile:
- Go to File > Load Open Data Config.
- Locate the folder and file.
- Click OK.
The Results Window includes three tabs containing the output of an individual analysis from the WPSS Open Data Tool. The output in each tab includes a header that lists information and the time and date when the analysis started.
- Crawled URLs tab – Provides a list of the URLs the WPSS Open Data Tool analysed. It lists the referrer page to indicate how the crawler reached a particular page. Use this tab to monitor the WPSS Open Data Tool to ensure it is actively crawling and analysing the files.
- Open Data tab – Lists the documents that contain open data check violations.
- Document List tab – The Open Data Tool writes information to this tab after completing the site analysis. It contains the sorted list of documents found and reviewed.
If you need to stop the analysis while it is running, in the Results Window, go to Options > Stop Crawl. This stops the WPSS Open Data Tool after processing the current document. The results include a note at the bottom of each output tab in the Results Window indicating that the analysis was aborted.
By default, the analysis tools report only URLs that fail checks. To see both passes and fails, in the WPSS Open Data Tool window, go to Options > Report Fails and Passes. The URLs of documents that pass checks are then also recorded in the results output. To see only failed pages, go to Options > Report Fails Only.
To save the analysis results, in the Results Window, go to File > Save As. Select the file name and folder path in the file chooser dialog box. The results are stored in a number of files, one for each result tab. Each file name contains a suffix identifying the report type.
For example, if you save the results as od_results, the files created are:
- od_results_crawl.txt
- od_results_od.txt
- od_results_urls.txt
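If you process saved results with a script, the file names can be derived from the base name you chose. The following is a minimal sketch based only on the three suffixes listed above; the function name is hypothetical.

```python
# Suffixes taken from the example file names above, one per result tab.
REPORT_SUFFIXES = ("crawl", "od", "urls")


def result_file_names(base_name):
    """Return the result file names created for a given base name."""
    return [f"{base_name}_{suffix}.txt" for suffix in REPORT_SUFFIXES]


print(result_file_names("od_results"))
```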
It is suggested that you use the same name for the results as the data site profile name, and save the results in the \WPSS_Tool folder.
The WPSS Open Data Tool is available through both the graphical user interface (GUI) and the command prompt.
To access the command line version:
- Go to Start > Programs > Accessories > Command Prompt.
- Change to the Program Files\WPSS_Tool directory.
- Run the program open_data_tool.pl with the command:

    open_data_tool.pl -cli -o <profile file>

  where <profile file> is the path to the data file containing the information to be analysed.
As the command line WPSS Open Data Tool runs and analyses documents, the URLs of the documents appear in the console window. Use this to monitor the WPSS Open Data Tool to ensure it is actively crawling and analysing the data files.
The results are stored in a number of files, one for each result tab. Each file name contains a suffix identifying the report type. For example, if you save the results as od_results, the files created are:
- od_results_crawl.txt
- od_results_od.txt
- od_results_urls.txt
It is suggested that you use the same base name for the results as the data site profile name, and save the results in the \WPSS_Tool folder.
You can toggle the language of the analysis results using a command line option.
Option | Result |
---|---|
-eng | Analysis results are provided in English |
-fra | Analysis results are provided in French |
If no language selection is made, the language is determined from the operating system.
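To run the command line version from a script, you could assemble the command as in the sketch below. It makes several assumptions: the options are written with plain hyphens, as in the language table above; the script is invoked through `perl` rather than the .pl file association; and the language option may follow the profile file. The function names and the sample profile path are hypothetical.

```python
import subprocess

# Language options from the table above.
LANGUAGE_OPTIONS = {"english": "-eng", "french": "-fra"}


def build_command(profile_file, language=None):
    """Build the argument list for a command line run of the tool."""
    command = ["perl", "open_data_tool.pl", "-cli", "-o", profile_file]
    if language is not None:
        # Without a language option the tool uses the operating system language.
        command.append(LANGUAGE_OPTIONS[language])
    return command


def run_tool(profile_file, language=None, tool_dir=r"C:\Program Files\WPSS_Tool"):
    """Run the tool from its installation folder and return its exit code."""
    return subprocess.run(build_command(profile_file, language), cwd=tool_dir).returncode


# Hypothetical profile path, used only to show the resulting command.
print(build_command(r"profiles\my_dataset.txt", language="french"))
```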
The WPSS Open Data Tool uses a number of Perl modules. These modules may have errors that cause the program to fail. You may encounter the following message:
Message | Perl Command Line Interpreter has encountered a problem and needs to close. |
Cause | An error in the supporting Perl modules. |
Correction | Restart the WPSS Open Data Tool; if the problem persists send an email to PlanificationWeb-WebPlanning@tpsgc-pwgsc.gc.ca. |
When analysing data, the WPSS Open Data Tool may report a “500 Internal server error”. This may be due to a limitation on the Web server and not the WPSS Open Data Tool. The problem may be due to the Web server not handling the “Range” setting in the HTTP GET operation. The WPSS Open Data Tool sets a size limit on GET operations to avoid getting extremely large documents.
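To illustrate what the "Range" setting means, the sketch below builds an HTTP GET request with and without a Range header. It is not the tool's own code: the `bytes=0-N` header value, the example URL and the size limit are assumptions chosen for illustration.

```python
import urllib.request


def build_get_request(url, max_size):
    """Build a GET request; a max_size of 0 means no Range header is sent."""
    request = urllib.request.Request(url, method="GET")
    if max_size > 0:
        # Ask the server for only the first max_size bytes of the document.
        request.add_header("Range", f"bytes=0-{max_size}")
    return request


limited = build_get_request("http://example.com/data.csv", 10_000_000)
unlimited = build_get_request("http://example.com/data.csv", 0)
print(limited.get_header("Range"))
print(unlimited.get_header("Range"))
```

A server that does not handle the Range header correctly may reject the first request while accepting the second.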
To avoid this error, you can change the WPSS Open Data Tool configuration file so that the “Range” setting is not included in an HTTP GET operation.
Using a simple text file editor, such as WordPad:
- Open the file c:\Program Files\wpss_tool\conf\wpss_tool.config.
- Locate the following lines in the file:

    # Max User Agent Size limits the size of files accepted in a GET
    # request. A value of 0 means we can accept documents of any size.
    # A value of 0 also removes the Range field from the HTTP header.
    #
    #User_Agent_Max_Size 0

- Remove the leading ‘#’ character from the #User_Agent_Max_Size 0 line so that it reads User_Agent_Max_Size 0.
This functionality is not available through the user interface, only by directly editing the configuration file.
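If you need to apply this change on several workstations, the edit can be scripted. The following is a minimal sketch that works on the text of the configuration file; the function name is hypothetical, and reading and writing the file is left to you.

```python
import re


def enable_unlimited_size(config_text):
    """Uncomment the '#User_Agent_Max_Size 0' line and leave other lines alone."""
    pattern = r"(?m)^#[ \t]*(User_Agent_Max_Size[ \t]+0)[ \t]*$"
    return re.sub(pattern, r"\1", config_text)


sample = (
    "# A value of 0 also removes the Range field from the HTTP header.\n"
    "#\n"
    "#User_Agent_Max_Size 0\n"
)
print(enable_unlimited_size(sample))
```

Only the setting line matches the pattern, so the explanatory comment lines keep their leading ‘#’ character.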
This section describes the criteria the WPSS Open Data Tool uses to verify the content submitted for the Open Data initiative.
OD_1 - Open data URL
Description | Open data URL. |
Checks | That the dataset file (dictionary, data and resource) exists. |
Does Not Check | That the dataset file is available to the public (for example, on the Internet). |
OD_2 - Character encoding requirements
Description | Character encoding requirements. |
Checks | That the content of the dataset file is UTF-8 encoded. |
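You can run a similar encoding check yourself before submitting a file. The following is a minimal sketch, not the tool's own implementation; the function name is hypothetical.

```python
def is_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text encoded two ways: only the first is valid UTF-8.
print(is_utf8("Données ouvertes".encode("utf-8")))
print(is_utf8("Données ouvertes".encode("latin-1")))
```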
OD_3 – Mark-up language requirements
Description | Mark-up language requirements. |
Checks | The dataset dictionary file has an acceptable mime-type of TXT or XML. |
 | The dataset file has an acceptable mime-type of CSV or XML. |
 | The data files have content. |
 | The CSV data files are well formed, that is, properly delimited and quoted. |
 | The TXT dictionary files have content. |
 | The XML dictionary files have content. |
 | The XML data files are well formed with nested and closed tags. |
Does Not Check | That the XML dictionary files contain terms and definitions, as there is no fixed schema for XML dictionary files. |
 | Whether XML dictionary files contain duplicate terms. |
 | That, if multiple XML dictionary files are specified, they all contain the same terms with no extra or missing terms. |
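The content and XML well-formedness checks can be approximated with a standard XML parser. The following is a minimal sketch, not the tool's own implementation; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_content(text):
    """Return True if the file text contains anything other than whitespace."""
    return bool(text.strip())


def xml_is_well_formed(text):
    """Return True if the text parses as XML with properly nested, closed tags."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(xml_is_well_formed("<data><row>1</row></data>"))
print(xml_is_well_formed("<data><row>1</data>"))
```

The second document fails because the row element is never closed.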
OD_CSV_1 – CSV Data
Description | CSV Data. |
Checks | That CSV data files have the same number of fields in each row. |
 | If at least 25% of the fields in the first row of a CSV data file match the data dictionary terms, that all fields in that row match the dictionary terms. |
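The two CSV checks can be sketched as follows. This is an illustration under stated assumptions, not the tool's own logic: fields are compared to dictionary terms by exact, case-sensitive match, and all names and sample data are hypothetical.

```python
import csv
import io


def rows_with_wrong_field_count(csv_text):
    """Return the 1-based numbers of rows whose field count differs from row 1."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])
    return [number for number, row in enumerate(rows, start=1) if len(row) != expected]


def unmatched_header_fields(first_row, dictionary_terms):
    """Apply the 25% rule to the first row.

    Returns None when fewer than 25% of the fields match a dictionary term,
    so the row is not treated as a header. Otherwise returns the fields that
    do not match any term (an empty list means every field matches).
    """
    terms = set(dictionary_terms)
    matched = [field for field in first_row if field in terms]
    if not first_row or len(matched) / len(first_row) < 0.25:
        return None
    return [field for field in first_row if field not in terms]


print(rows_with_wrong_field_count("a,b,c\n1,2,3\n4,5\n"))
print(unmatched_header_fields(["id", "name", "colour", "size"], ["id", "name", "size"]))
```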
OD_TXT_1 – TXT Dictionary
Description | TXT Dictionary. |
Checks | The TXT dictionary files are coded such that terms are followed by one blank line before the definition. |
 | The TXT dictionary files are coded such that the terms consist of exactly one line of text. |
 | The TXT dictionary files with multiple terms have at least two blank lines between a definition and a subsequent term. |
 | The TXT dictionary files do not have duplicate terms. |
 | If multiple TXT dictionary files are specified, they all contain the same terms, with no extra or missing terms. |
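The layout rules for TXT dictionary files can be sketched as a small parser. This is one possible reading of the rules above, not the tool's own parser: it assumes a definition contains no blank lines of its own, and the function name and sample text are hypothetical.

```python
import re

BLANK_LINE = r"\n[ \t]*\n"


def parse_txt_dictionary(text):
    """Return (entries, problems); entries is a list of (term, definition)."""
    entries, problems, seen = [], [], set()
    # Entries are separated by two or more blank lines.
    blocks = re.split(r"\n(?:[ \t]*\n){2,}", text.strip("\n"))
    for block in blocks:
        # Within an entry, one blank line separates the term from its definition.
        parts = re.split(BLANK_LINE, block, maxsplit=1)
        term = parts[0].strip()
        if len(parts) < 2:
            problems.append(f"{term!r}: no blank line followed by a definition")
            continue
        definition = parts[1].strip()
        if "\n" in term:
            problems.append(f"{term!r}: term is longer than one line")
        if re.search(BLANK_LINE, definition):
            problems.append(f"{term!r}: blank line inside definition, next term may be too close")
        if term in seen:
            problems.append(f"{term!r}: duplicate term")
        seen.add(term)
        entries.append((term, definition))
    return entries, problems


sample = "Term A\n\nDefinition of A.\n\n\nTerm B\n\nDefinition of B.\n"
print(parse_txt_dictionary(sample))
```

To compare several dictionary files, you could parse each one and compare the sets of terms.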
OD_API_1 - Open data URL
Description | Open data URL. |
Checks | That the API URL exists. |
Does Not Check | That the API URL is available to the public (for example, on the Internet). |
OD_API_2 - Character encoding requirements
Description | Character encoding requirements. |
Checks | That the content of the dataset file is UTF-8 encoded. |
OD_API_3 – Mark-up language requirements
Description | Mark-up language requirements. |
Checks | That the API has an acceptable mime-type (JSON or XML). |
 | That the API has content. |
 | That JSON content is well formed (i.e. properly delimited and quoted). |
 | That XML content is well formed (i.e. nested and closed tags). |
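The API checks can be approximated as shown below. This is a minimal sketch: the guide names only "JSON or XML", so the exact MIME type strings in the set are assumptions, and the function names are hypothetical.

```python
import json

# Assumed MIME type strings; the guide itself only says "JSON or XML".
ACCEPTED_API_TYPES = {"application/json", "application/xml", "text/xml"}


def api_type_accepted(content_type):
    """Check a Content-Type header value, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_API_TYPES


def json_is_well_formed(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(api_type_accepted("application/json; charset=utf-8"))
print(json_is_well_formed('{"records": [1, 2, 3]}'))
print(json_is_well_formed('{"records": [1, 2, 3'))
```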
TP_PW_OD_CSV_1 – CSV Header Row
Description | CSV header row. |
Checks | That CSV data files have a header row containing data dictionary terms. |