w3act is an annotation and curation tool for building web archive collections

w3act

w3act is an annotation and curation tool for web archives

How to install and use

Requirements

To install you need:

  • W3ACT sources
  • Play Framework
  • PostgreSQL
  • Java
  • Maxmind GeoIP2 database
  • Whois gem
  • Maven

Download

| Version | Size | Tool | Link |
|---|---|---|---|
| v1.0 | 221 KB | W3ACT source code | download-w3act |
| v2.2.0 | 108.1 MB | Play Framework | download-play |
| v9.3.1 | 51.6 MB | PostgreSQL database | download-db |
| v1.6.0_33 | 178 MB | Java Developers Kit (e.g. JDK 6) | download-java |
| v0.7.0 | 13.6 MB | Maxmind GeoIP2 database | download-geoip |
| v1.7.9 | 12.9 MB | Whois mapping between domain and country | [download-whois] |
| v3.1.1 | 2.8 MB | Maven tool | [download-maven] |

Install instructions

Please refer to the installation instructions of each associated tool.

Whois

To install the Whois lookup functionality:

Download the JRuby JARs from [download-whois]. Extract the ZIP and, in the jruby-1.7.9 folder, execute:

gem install whois

This downloads the whois gem for JRuby.

Then copy the JRuby JARs to the "lib" folder of the project. Both jruby.jar and jruby-complete-1.7.9.jar are needed. Download the [ukwa-whois] Maven project and compile it with the command:

mvn clean install

Create the JAR package:

mvn package

This produces jruby-whois-3.4.2.2-SNAPSHOT.jar. Copy the generated JAR to the "lib" folder of the project.

To use the application in production mode:

Configuration details

The production configuration file prod.conf should include the necessary database entries for PostgreSQL, or import them from application.conf:

For H2

db.default.driver=org.h2.Driver
db.default.url="jdbc:h2:mem:play;DB_CLOSE_DELAY=-1"

For PostgreSQL

db.default.driver=org.postgresql.Driver

To connect to the database 'w3actprod' as user 'training':

db.default.url="postgres://training:(password)@127.0.0.1/w3actprod"
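
As a rough sketch, the PostgreSQL entries above could be written to prod.conf from the shell. The helper name write_prod_conf, its arguments, and the include of application.conf are assumptions for illustration and are not part of W3ACT:

```shell
# Hypothetical helper: generate a minimal prod.conf for PostgreSQL.
# Arguments: <conf file> <db user> <db password> <db name>
write_prod_conf() {
  conf_file="$1"   # e.g. conf/prod.conf
  db_user="$2"     # e.g. training
  db_pass="$3"     # the database password
  db_name="$4"     # e.g. w3actprod
  mkdir -p "$(dirname "$conf_file")"
  cat > "$conf_file" <<EOF
# Assumption: reuse application.conf and override only the database entries.
include "application.conf"
db.default.driver=org.postgresql.Driver
db.default.url="postgres://${db_user}:${db_pass}@127.0.0.1/${db_name}"
EOF
}
```

For example, `write_prod_conf conf/prod.conf training "$DB_PASSWORD" w3actprod` would write the file, where DB_PASSWORD is an environment variable you set yourself.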

To add and activate a Travis CI application profile, add a new configuration file:

conf/travis-ci.conf

This file overrides the default application.conf database (PostgreSQL) with the H2 one. Then edit .travis.yml to pass the new config to play, i.e. change this line:

script: play-${PLAY_VERSION}/play test

to this:

script: play-${PLAY_VERSION}/play -Dconfig.file=conf/travis-ci.conf test
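
The Travis CI profile described above can be sketched as two small shell helpers. The function names are hypothetical, and the contents of travis-ci.conf (an include of application.conf followed by the H2 settings shown earlier) are an assumption about how the override is done:

```shell
# Hypothetical helper: write a Travis CI profile that switches the database to H2.
write_travis_conf() {
  conf_file="${1:-conf/travis-ci.conf}"
  mkdir -p "$(dirname "$conf_file")"
  cat > "$conf_file" <<'EOF'
# Assumption: reuse application.conf and override only the database entries.
include "application.conf"
db.default.driver=org.h2.Driver
db.default.url="jdbc:h2:mem:play;DB_CLOSE_DELAY=-1"
EOF
}

# Hypothetical helper: make the Travis "script:" line pass the new config to play.
use_travis_conf() {
  travis_yml="${1:-.travis.yml}"
  # \$ matches a literal dollar sign; the braces are literal in a basic regex.
  sed 's|play-\${PLAY_VERSION}/play test|play-${PLAY_VERSION}/play -Dconfig.file=conf/travis-ci.conf test|' \
    "$travis_yml" > "$travis_yml.tmp" && mv "$travis_yml.tmp" "$travis_yml"
}
```

Running `write_travis_conf` and then `use_travis_conf` from the project root would apply both steps with the default paths.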

Open a terminal and execute the following command:

play clean stage

This command creates a BAT file for Windows or an SH file for Linux, which can then be used to start the application.

Note that using "play start" instead can leave processes running when you close the application. It also creates a RUNNING_PID file in the root directory of the project, which must then be removed as well.

If you run the application on Windows, use the "tasklist" command to list processes. To kill a process, e.g. the one with PID 1304, use the "taskkill /pid 1304 /F" command.
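
On Linux, the cleanup after "play start" described above might look like the following sketch. The helper name stop_play_app is hypothetical. It assumes the RUNNING_PID file contains only the process ID:

```shell
# Hypothetical helper: stop an app started with "play start" and
# remove the leftover RUNNING_PID file from the project root.
stop_play_app() {
  pid_file="${1:-RUNNING_PID}"
  if [ ! -f "$pid_file" ]; then
    echo "no $pid_file found; nothing to stop"
    return 0
  fi
  pid=$(cat "$pid_file")
  # Only signal the process if it is still alive.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
  fi
  rm -f "$pid_file"
}
```

Call `stop_play_app` from the project root, or pass the path to the PID file as the first argument.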

Execution steps for Linux

The [RHEL installation] wiki page describes the exact commands, with comments, for deployment on Linux.

To use the application in development mode:

Open a terminal (a command prompt on Windows) and run the following command:

play run

Start a browser and open the URL:

localhost:9000/actdev/

Testing

play test

Documentation

A description of the domain object model and user flows can be found in the [wiki].

The initial permissions and roles, defined according to the requirements document, are in initial-data.yml.

Develop

Build status is tracked on Travis CI: [build-status]

Requirements

To build you require:

  • Git client
  • Java Developers Kit (e.g. JDK 6)
  • Play Framework
  • PostgreSQL database

To use the recommended IDE you require:

  • Eclipse Kepler Service Release 1 with m2eclipse plugin [eclipse]

To set up a Java project with the W3ACT sources, use the command:

play eclipse

Troubleshooting

Getting SQL errors in browser

To solve this problem, adapt the paths in the cleanup script to your installation and execute it:

cleanup.bat

The following sometimes also helps:

play clean

Alternatively, manually delete all "target" folders in your project.
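
On Linux, deleting the "target" folders can be sketched with find. The helper name remove_target_dirs is hypothetical. Note that it deletes every directory named "target" below the given root, so check the path before running it:

```shell
# Hypothetical helper: delete every "target" directory below a project root.
remove_target_dirs() {
  project_root="${1:-.}"
  # -prune stops find from descending into a directory it is about to delete.
  find "$project_root" -type d -name target -prune -exec rm -rf {} +
}
```

Running `remove_target_dirs` from the project root clears the build output; files that merely contain "target" in their name are left alone.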

[build-status]: https://travis-ci.org/ukwa/w3act
[wiki]: https://github.com/ukwa/w3act/wiki
[eclipse]: http://eclipse.org/eclipse
[download-whois]: http://www.jruby.org/files/downloads/1.7.9/index.html
[RHEL installation]: https://github.com/ukwa/w3act/wiki/Installation-instructions
[ukwa-whois]: https://github.com/ukwa/jruby-whois/blob/master/src/main/java/uk/bl/wa/whois/JRubyWhois.java
[download-maven]: http://maven.apache.org/download.cgi

Help with submodules for W3ACT Source

$ git submodule init
$ git config -l
$ git submodule update

Running a Multi-service Integration Test

If you place a copy of a recent W3ACT database dump at integration-test/pgdump/w3act.pgdump, you should be able to use the provided Docker Compose file to build and run your development version.

** NOTE that we cannot include a copy of the W3ACT database here as it contains personal information from third parties! **

First set up the database:

# Run PostgreSQL in the background:
docker-compose up -d postgres
# To restore from the pgdump file
docker-compose up pg_restore

Then build and run your version of W3ACT:

docker-compose build
docker-compose up w3act

Note that right now the Docker Compose setup does not include OutbackCDX, Wayback or pdftohtmlex. As a result, some pages will render slowly (because they fail to reach these services), and the parts that depend on these additional services will not work.
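
Because the database dump is not shipped with the repository, it may help to check for it before starting the containers. The following is a minimal sketch of the database setup steps above. The wrapper name setup_test_db is hypothetical, and the service names are the ones used in the commands above:

```shell
# Hypothetical wrapper around the database setup steps.
# It refuses to start if the database dump is missing.
setup_test_db() {
  dump="${1:-integration-test/pgdump/w3act.pgdump}"
  if [ ! -f "$dump" ]; then
    echo "database dump not found: $dump" >&2
    return 1
  fi
  docker-compose up -d postgres   # run PostgreSQL in the background
  docker-compose up pg_restore    # restore from the pgdump file
}
```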

Using the API from Curl

Login, then download via the API:

$ curl -c cookie.jar -i --data "email=user@example.org&password=PASS" https://localhost:9000/act/login
$ curl -o 42.json -b cookie.jar https://localhost:9000/act/api/targets/42
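
For repeated use, the two calls above could be wrapped in shell functions. This is only a sketch: the function names and the ACT_BASE_URL variable are hypothetical, and it swaps --data for curl's --data-urlencode so that special characters in the password are encoded:

```shell
# Hypothetical helpers wrapping the login and download calls.
ACT_BASE_URL="${ACT_BASE_URL:-https://localhost:9000/act}"

# Log in and store the session cookie in cookie.jar.
# Arguments: <email> <password>
act_login() {
  curl -c cookie.jar -i \
    --data-urlencode "email=$1" \
    --data-urlencode "password=$2" \
    "$ACT_BASE_URL/login"
}

# Download one target as <id>.json using the saved session cookie.
# Arguments: <target id>
act_get_target() {
  curl -o "$1.json" -b cookie.jar "$ACT_BASE_URL/api/targets/$1"
}
```

After `act_login user@example.org PASS`, calling `act_get_target 42` would issue the same request as the second command above.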