
Metrics & Pageviews take very long #66

Closed
SoftCreatR opened this issue Aug 24, 2020 · 21 comments

@SoftCreatR

I've been analyzing a relatively large website since Aug 21, and it already takes ages for the stats to load.

The metrics took 15.91s to load and the pageviews 19.68s. At this rate, after a week it will take more than 30 seconds, which is way too long.

@mikecao
Collaborator

mikecao commented Aug 24, 2020

That sounds really long. Those queries should be using indexed columns, so they shouldn't take anywhere near that long. What kind of server is it running on?

@SoftCreatR
Author

Umami is hosted on a small Hetzner Cloud machine (CX21). I'm using MySQL as the database.

@mikecao
Collaborator

mikecao commented Aug 24, 2020

I'm running on a much smaller server (though with fewer users) and I haven't seen queries take that long. Check your database and make sure the indexes were created.
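
If it helps, these are the standard MySQL statements for checking which indexes actually exist on the pageview table (nothing umami-specific about them):

show index from pageview;

-- or, to see the full table definition including its indexes:
show create table pageview;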

@SoftCreatR
Author

SoftCreatR commented Aug 24, 2020

Indexes are there.

[screenshot: indexes on the pageview table]

@mikecao
Collaborator

mikecao commented Aug 24, 2020

You can try running an EXPLAIN on the query to see what's going on. This is the metrics query for MySQL:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    date_trunc('hour', created_at),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=?
    and created_at between ? and ?
  group by 1, 2
) t

Replace the ? with parameters.

@mikecao
Collaborator

mikecao commented Aug 24, 2020

Also, what is the date range on that query?

@SoftCreatR
Author

SoftCreatR commented Aug 24, 2020

Running the query directly, it takes "just" 7.125s:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    date_trunc('hour', created_at),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=2
    and created_at between from_unixtime(1597788000) and from_unixtime(1598392799)
  group by 1, 2
) t

Explained:

id=1  select_type=PRIMARY  type=ALL  rows=242784
id=2  select_type=DERIVED  table=pageview  type=ref
      possible_keys=pageview_created_at_idx, pageview_website_id_idx
      key=pageview_website_id_idx  key_len=4  ref=const  rows=242784
      Extra=Using where; Using temporary; Using filesort

> Also, what is the date range on that query?

7 days is set in the UI. However, the first entry was created on 2020-08-20 03:14:25.

@mikecao
Collaborator

mikecao commented Aug 25, 2020

Try creating a new composite index and see if that helps:

create index pageview_website_id_created_at_idx on pageview(website_id, created_at);
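
If the optimizer still isn't picking it up, an EXPLAIN should tell you. As a sanity check (not a permanent fix), you can also force it with MySQL's FORCE INDEX hint on a stripped-down version of the metrics query, for example:

select count(*)
from pageview force index (pageview_website_id_created_at_idx)
where website_id=2
  and created_at between from_unixtime(1597788000) and from_unixtime(1598392799);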

@SoftCreatR
Author

Still no change.

metrics: 10.96s
pageviews: 19.03s

@mikecao
Collaborator

mikecao commented Aug 25, 2020

@SoftCreatR do you think you can send me a MySQL dump of your data? I only have PostgreSQL data. I ran a test with 500K records and it returned in 30ms. I want to see if there is something MySQL-specific going on.

@mikecao
Collaborator

mikecao commented Aug 25, 2020

Ok, I mocked up a database in MySQL with a million records. Even this simple query takes 5 seconds:

select session_id,
  count(*) c
from pageview
where website_id=1
group by 1

The same query in Postgres takes 13ms. It seems MySQL has issues with the group by and count in this query. We may need someone with more MySQL experience to chime in here.
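
One guess on my part: since the query only touches website_id and session_id, a covering index on exactly those two columns might let MySQL answer it from the index alone instead of reading rows. A sketch:

create index pageview_website_id_session_id_idx on pageview(website_id, session_id);

-- if the index is picked up, EXPLAIN should show "Using index" in the Extra column
explain
select session_id,
  count(*) c
from pageview
where website_id=1
group by 1;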

@SoftCreatR
Author

I guess the problem here is the numeric grouping (group by 1). I'm not sure, but it could be that indexes don't have that much effect in this case.
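
One way to rule that out would be to run the same query with the column named explicitly. It is semantically identical, just without the positional grouping:

select session_id,
  count(*) c
from pageview
where website_id=1
group by session_id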

@mikecao
Collaborator

mikecao commented Aug 25, 2020

I'll keep trying with different queries, but it seems really odd that such a simple query is so slow.

@mikecao
Collaborator

mikecao commented Aug 25, 2020

@SoftCreatR try adding this index:

create index pageview_website_id_session_id_created_at_idx on pageview(website_id, session_id, created_at);

@SoftCreatR
Author

No change. I'll send you a dump of my DB. Maybe that helps.

@mikecao
Collaborator

mikecao commented Aug 27, 2020

@SoftCreatR Getting closer to a solution. I've got the main queries down to less than 2 seconds.

@mikecao
Collaborator

mikecao commented Aug 28, 2020

@SoftCreatR try pulling the latest build. I made some improvements. And make sure you create this index:

create index pageview_website_id_session_id_created_at_idx on pageview(website_id, session_id, created_at);

@SoftCreatR
Author

Created the index, pulled the latest commits, rebuilt it. Result:

[screenshot: metrics and pageviews load times after the rebuild]

@SoftCreatR
Author

It's even worse: it now consumes all available RAM and CPU, so it's completely unusable for me at the moment.

@SoftCreatR
Author

SoftCreatR commented Aug 28, 2020

So, after some analysis, three queries are fired:

select sum(t.c) as "pageviews",
  count(distinct t.session_id) as "uniques",
  sum(case when t.c = 1 then 1 else 0 end) as "bounces",
  sum(t.time) as "totaltime"
from (
  select session_id,
    DATE_FORMAT(created_at, '%Y-%m-%d %H:00:00'),
    count(*) c,
    floor(unix_timestamp(max(created_at)) - unix_timestamp(min(created_at))) as "time"
  from pageview
  where website_id=2
    and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
  group by 1, 2
) t

Execution time (Avg): 10.375s
Explanation:

id=1  select_type=PRIMARY  type=ALL
      rows=460892  r_rows=192252.00  filtered=100.00  r_filtered=100.00
id=2  select_type=DERIVED  table=pageview  type=ref
      possible_keys=pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx
      key=pageview_website_id_session_id_created_at_idx  key_len=4  ref=const
      rows=460892  r_rows=950738.00  filtered=100.00  r_filtered=88.77
      Extra=Using where; Using index; Using temporary; Using filesort

select DATE_FORMAT(convert_tz(created_at, '+00:00', '+02:00'), '%Y-%m-%d') t,
  count(distinct session_id) y
from pageview
where website_id=2
  and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
group by 1
order by 1

Execution time (Avg): 2.750s
Explanation:

id=1  select_type=SIMPLE  table=pageview  type=ref
      possible_keys=pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx
      key=pageview_website_id_session_id_created_at_idx  key_len=4  ref=const
      rows=460860  r_rows=843947.00  filtered=100.00  r_filtered=100.00
      Extra=Using where; Using index; Using filesort

select distinct url x, count(*) y
from pageview
where website_id=2
  and created_at between TIMESTAMP'2020-08-21 22:00:00' and TIMESTAMP'2020-08-28 21:59:59.999000'
group by 1
order by 2 desc

Execution time (Avg): 18.004s
Explanation:

id=1  select_type=SIMPLE  table=pageview  type=ref
      possible_keys=pageview_created_at_idx, pageview_website_id_idx, pageview_website_id_session_id_created_at_idx
      key=pageview_website_id_idx  key_len=4  ref=const
      rows=460709  r_rows=950401.00  filtered=100.00  r_filtered=88.77
      Extra=Using where; Using temporary; Using filesort
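
One thing I noticed: the third query is the only one whose plan never shows "Using index", so it has to fetch every matching row. Just a guess, but a covering index that also includes url (assuming the url column fits within MySQL's index length limit) might at least avoid those row lookups:

create index pageview_website_id_created_at_url_idx on pageview(website_id, created_at, url);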

I've updated the dump that I sent you earlier.

@mikecao
Collaborator

mikecao commented Aug 29, 2020

@SoftCreatR The data dump was very useful; I was able to fix a couple of bugs with it. For example, the website's domain field should be just the domain, without the http:// prefix. So thanks for that.

Using the new dump, I made a few small improvements. Here are the numbers from the details page:

[screenshot: query timings from the details page]

A little higher than I'd like, but not terrible. I think I understand what is going on with your server, though: you are running it against a live site, while I am just using a local MySQL instance with no incoming traffic. With the number of hits you are getting, these queries are probably consuming all of your resources. You might need a much stronger server, or to set up a read replica.

Originally I cached some data in localStorage, which saved a few queries, but I removed it due to GDPR concerns. It would probably help a lot in your case. I can add it back and make it configurable. What do you think? Since you're EU-based, it's probably more of a concern for you.
