Ondřej Košarko edited this page Oct 27, 2017 · 4 revisions

Configuration

See the piwik section in the configuration. There are three trackers (site_ids): oai, bitstreams and views (statistics.api). Bitstreams are the actual file downloads; those are logged by *BitstreamReader. Views are "page views" logged by the tracking code in the footer. These two are displayed to users either from the statistics menu or as a PDF report. OAI tracks access to the machine-to-machine OAI-PMH endpoint.

In piwik you have to create one or more users with an auth token that can read from and write to these sites, and then paste the tokens into the configuration.
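
To check that a token really can read a given site, one option is to call the Piwik HTTP Reporting API with it. The following is a minimal sketch that only builds such a request URL. The token value is a placeholder, the choice of the `VisitsSummary.get` method is an assumption made for illustration, and the base URL and site id 4 are borrowed from the examples further down this page.

```python
from urllib.parse import urlencode


def reporting_api_url(base_url, token_auth, id_site,
                      method="VisitsSummary.get", period="day", date="yesterday"):
    """Build a Piwik Reporting API URL that authenticates with token_auth."""
    query = urlencode({
        "module": "API",
        "method": method,          # assumption: any read-only report would do
        "idSite": id_site,
        "period": period,
        "date": date,
        "format": "JSON",
        "token_auth": token_auth,  # the token created for the piwik user
    })
    return base_url.rstrip("/") + "/index.php?" + query


# Placeholder token; use the one generated for your piwik user.
url = reporting_api_url("http://ufal.mff.cuni.cz/piwik/", "HYPOTHETICAL_TOKEN", 4)
print(url)
```

Opening the printed URL in a browser or with curl should return JSON for a token with view access. What an error response looks like depends on the Piwik version.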

With 2017.04 there are two new configuration options:

access for dspace users

If you want to enable access to piwik statistics, create a group called "statistics_viewers" and add the users that should have access (hint: this can be the authenticated/anonymous group).

Managing piwik database

Outlier in "Visits Over Time" (new bot or whatever)

Still on the web page, open the particular day. If you are lucky, the first page of the "Visitor Log" contains an entry with a lot of actions; open the visitor profile to get the ID and IP address. Otherwise use the SQL below.

details about piwik backend: https://developer.piwik.org/guides/persistence-and-the-mysql-backend

Connect with mysql -u user -p'password'; the user and password are in /var/www/config/config.ini.php

mysql crash course:

show databases;
use piwik_db;
show tables;
show columns from piwik_log_link_visit_action;

Finding visitors with a lot of actions

-- hex(idvisitor) is what you see in the visitor profile; idvisitor itself is stored as binary
-- get the top 15 visitor ids for a site (4 is repository downloads in this case)
select hex(idvisitor),count(*) as count from piwik_log_link_visit_action where idsite=4 group by idvisitor order by count DESC LIMIT 15;
-- or, for a particular date or date range, filter on the server_time column, e.g.
select hex(idvisitor),count(*) as count from piwik_log_link_visit_action where idsite=4 and server_time between '2016-04-08 00:00:00' and '2016-04-08 23:59:59' group by idvisitor order by count DESC LIMIT 15;
-- or group by date
select hex(idvisitor),count(*) as count, year(server_time), month(server_time),day(server_time) from piwik_log_link_visit_action where idsite=4 group by idvisitor, year(server_time), month(server_time), day(server_time) order by count DESC LIMIT 15;
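
The relation between the binary idvisitor column and the hex string shown in the visitor profile can be sketched in Python. The helper names are hypothetical. The example id is the one used in the delete statements on this page; it is 16 hex digits, so 8 bytes.

```python
def to_binary(visitor_hex):
    """Equivalent of MySQL unhex(): hex string -> raw bytes as stored in idvisitor."""
    return bytes.fromhex(visitor_hex)


def to_hex(raw):
    """Equivalent of MySQL hex(): raw bytes -> uppercase hex string."""
    return raw.hex().upper()


raw = to_binary("bd6611fbe712b84b")
print(len(raw))     # 8
print(to_hex(raw))  # BD6611FBE712B84B
```

MySQL's hex() returns uppercase digits and unhex() accepts either case, so an id copied from the visitor profile can be pasted into unhex('...') as-is.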

deleting the records

delete from piwik_log_link_visit_action where idsite=4 and idvisitor=unhex('bd6611fbe712b84b') and server_time between '2016-04-08 00:00:00' and '2016-04-08 23:59:59';
-- also clean the log_visit table (note: this removes all visits of that visitor on the site, not only those from the given day)
delete from piwik_log_visit where idsite=4 and idvisitor=unhex('bd6611fbe712b84b');

drop precomputed stats for the year_month you've touched

drop table piwik_archive_blob_2016_04;
drop table piwik_archive_numeric_2016_04;
Query OK, 0 rows affected (0.12 sec)
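
If the deleted records span more than one month, every affected year_month has its own pair of archive tables. A small sketch that lists them for a date range, assuming the `piwik_` table prefix and the `archive_blob_YYYY_MM` / `archive_numeric_YYYY_MM` naming seen above (the function name is hypothetical):

```python
from datetime import date


def archive_tables(start, end, prefix="piwik_"):
    """Return the archive table names for every month between start and end, inclusive."""
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        suffix = f"{year:04d}_{month:02d}"
        tables.append(f"{prefix}archive_blob_{suffix}")
        tables.append(f"{prefix}archive_numeric_{suffix}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return tables


# Print the drop statements for the day used in the examples on this page.
for table in archive_tables(date(2016, 4, 8), date(2016, 4, 8)):
    print(f"drop table {table};")
```

The sketch only prints the statements; review them before pasting them into the mysql client.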

recompute:

root@piwik:/var/www# ./console core:archive --force-all-periods=315576000 --force-date-last-n=20 --url=http://ufal.mff.cuni.cz/piwik/ --force-idsites=4
  1. --force-all-periods: limits archiving to websites with some traffic in the last [seconds] seconds (roughly the last 10 years in the example).
  2. --force-date-last-n: the script calls the API with period=lastN (here, the last 20 days/weeks/months/years).
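
The value 315576000 passed to --force-all-periods is ten years expressed in seconds, counting 365.25 days per year. A short sketch of the arithmetic, in case a different window is wanted (the helper name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def years_to_seconds(years, days_per_year=365.25):
    """Convert a number of years to whole seconds for --force-all-periods."""
    return round(years * days_per_year * SECONDS_PER_DAY)


print(years_to_seconds(10))  # 315576000
```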