This repository has been archived by the owner on Dec 5, 2022. It is now read-only.

Statement load graph and Top statements feature implemented #44

Closed
wants to merge 20 commits

Conversation

s-soroosh
Contributor

First of all, please update the pgobserver.yaml file.
Then you need to update your schema: if the tables already exist, just run sql/patch/00_STAT_STATEMENTS_DATA.sql.

Note: I made some changes in StatStatementsGatherer to gather statement calls incrementally, so we are able to filter them over different time windows. I have also added ssd_user_id to stat_statements_data, so it is possible to filter queries by user_id.

return getTop10Interval(order, "('now'::timestamp - %s::interval)" % ( adapt("%s hours" % ( hours, )), ), hostId, limit)


def getStatLoad(hostId, days='8'):
Contributor

I got an exception from this function:

File "/ssd/code/temp/pgo-psycho-ir/frontend/src/topstats.py", line 115, in getStatLoad
cur.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/psycopg2/extras.py", line 120, in execute
return super(DictCursor, self).execute(query, vars)
ProgrammingError: can't execute an empty query
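One defensive pattern for this failure mode is to check the generated SQL before handing it to the cursor; a minimal sketch with hypothetical names, not code from this PR:

```python
# Hypothetical guard (illustrative only): when settings such as disabled
# aggregations leave the query builder with an empty string, fail with a
# clear message instead of psycopg2's "can't execute an empty query".
def safe_execute(cur, sql, params=None):
    if sql is None or not sql.strip():
        raise ValueError("refusing to execute an empty query")
    return cur.execute(sql, params)
```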

@kmoppel
Contributor

kmoppel commented Feb 17, 2015

After commenting out the "if not tplE._settings['run_aggregations']:" I got:

File "/ssd/code/temp/pgo-psycho-ir/frontend/src/topstats.py", line 115, in getStatLoad
cur.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/psycopg2/extras.py", line 120, in execute
return super(DictCursor, self).execute(query, vars)
DataError: numeric field overflow
DETAIL: A field with precision 6, scale 2 must round to an absolute value less than 10^4

SET client_min_messages = warning;

ALTER TABLE stat_statements_data
ADD COLUMN ssd_user_id integer DEFAULT 0;
Contributor

"ADD COLUMN + DEFAULT 0" in one statement is also problematic with regard to existing DBs. In our company, for example, we have gigabytes of old statement data, and it makes no big sense to overwrite all of that. Doing "SET DEFAULT" in a separate command would alleviate the problem.

Contributor Author

Do you mean I should first add a nullable column, then update the field, and then make it NOT NULL?
If not, please give me a simple code snippet of the implementation.
Thanks.
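For reference, the separate-command approach could be sketched like this (a sketch only, using the column and table names from the patch above; the rationale assumes PostgreSQL versions before 11, where ADD COLUMN with a DEFAULT rewrites the whole table):

```sql
-- Add the column without a default: existing rows stay NULL and the
-- table is not rewritten.
ALTER TABLE stat_statements_data
    ADD COLUMN ssd_user_id integer;

-- Set the default in a separate command: it applies only to new rows,
-- leaving the gigabytes of historical data untouched.
ALTER TABLE stat_statements_data
    ALTER COLUMN ssd_user_id SET DEFAULT 0;
```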

@kmoppel
Contributor

kmoppel commented Feb 17, 2015

The feature of adding a userId to stat_statements_data is currently problematic: it would break the intended use of working frontend functions like getStatStatements() and getStatStatementsGraph(), as they're meant to work across all users, i.e. one would have to change them as well.

And also, configuring the "interested users" in the config.yaml file is not "scalable" in my opinion, as changing them would require an application restart. So all in all I'm not sure it makes sense to add this "userId" to the picture. Also, so far all of PgObserver's data is intentionally unpersonalized. Your comments on the idea are welcome.

@s-soroosh
Contributor Author

Thank you for the code review. I will have the problems solved within a day.

But just about the userId: before I added the user id field, I analyzed pg_stat_statements and noticed that many internal queries are executed, persisted, and shown in pg_stat_statements.
So I thought we might need a field to make sure we are monitoring only the statements executed by a specific user.

About the intention to work across all users, I think there is no problem: with aggregation functions we can still generate aggregated statistics.

I agree with you about the "interested users", but for this version I just wanted feedback on whether this approach is usable or not. If it is OK, I will move them to a more configurable and scalable place (a table with an entry form in the frontend).

Please let me know your thoughts.

@kmoppel
Contributor

kmoppel commented Feb 18, 2015

Thanks for the background!

But OK, I think if this "user" filtering concept is redesigned to use a table (similar to sproc_schemas_monitoring_configuration), the patch would be acceptable. Still, I think it would be best to keep the actual stat_statements_data table as it is, and just filter out the "listed users" in the data-gathering query on the Java side.

@s-soroosh
Contributor Author

OK, I will add a configuration table.
About stat_statements_data: without the change I think we will lose some flexibility. For example, we could not filter queries by user. Assume you see that query1 is called very often, but it may be called by different users.
By clicking on it you could figure out how many calls come from each user (I have not implemented this feature yet, but I think it is feasible with a user id).

I agree about the username: I will change the implementation to persist the username instead of the user id, which is more meaningful.

Please let me know your thoughts.

@s-soroosh
Contributor Author

"""

After commenting out the "if not tplE._settings['run_aggregations']:" I got:

File "/ssd/code/temp/pgo-psycho-ir/frontend/src/topstats.py", line 115, in getStatLoad
cur.execute(sql)
File "/usr/local/lib/python2.7/dist-packages/psycopg2/extras.py", line 120, in execute
return super(DictCursor, self).execute(query, vars)
DataError: numeric field overflow
DETAIL: A field with precision 6, scale 2 must round to an absolute value less than 10^4
"""

I could not reproduce this error.
Could you please let me know how it happens, or give me a unit test that fails under your conditions?

@kmoppel
Contributor

kmoppel commented Feb 19, 2015

The problem seems to be that you're summing absolute values instead of diffs between timestamps, and I get an overflow on real-life data. Something like "COALESCE(ssd_total_time - lag(ssd_total_time) OVER w, 0::bigint) AS delta_total_time" needs to be added to the innermost query.
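In Python terms, the point is that pg_stat_statements counters are cumulative, so per-interval load has to be computed by differencing consecutive snapshots (the role of the lag() snippet above) rather than summing the absolute values; a sketch with hypothetical names:

```python
# Sketch (illustrative, not PgObserver code): turn cumulative counter
# samples into per-interval increments. The first sample, and any drop
# caused by a stats reset, contributes 0 - mirroring COALESCE(..., 0).
def snapshot_deltas(values):
    """values: cumulative ssd_total_time samples ordered by timestamp."""
    deltas = []
    prev = None
    for v in values:
        deltas.append(0 if prev is None or v < prev else v - prev)
        prev = v
    return deltas
```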

@kmoppel
Contributor

kmoppel commented Feb 19, 2015

Also, now that I think about it, this kind of "load" is not too useful either. It's probably better to derive and graph the average query execution time from the pg_stat_statements data.

@s-soroosh
Contributor Author

pg_stat_statements is not time aware, so we cannot find, for example, the number of calls in the last hour.
So if these kinds of graphs are not useful, I will change the gatherer to gather only the interested users and remove user_id from stat_statements_data.

@kmoppel
Contributor

kmoppel commented Feb 24, 2015

ok, that should make it passable :)

But about your comment, "pg_stat_statement is not time aware, so we cannot find for example the number of calls in last 1 hour"... well, true, none of the data provided by PG system views is time aware; that's why we append the "timestamp" columns to all data saved to the PgObserver datastore. We already use the change in the calls count to display the avg. stat_statement runtime on the "/perfstatstatements" screen, for example.
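The avg-runtime derivation described above can be sketched as follows (a hypothetical helper, not the actual /perfstatstatements code):

```python
# Sketch (illustrative only): the average runtime between two timestamped
# snapshots of a statement's cumulative counters is the change in total
# time divided by the change in the number of calls.
def avg_runtime_ms(prev_total_ms, prev_calls, cur_total_ms, cur_calls):
    dcalls = cur_calls - prev_calls
    if dcalls <= 0:  # no new calls (or a counter reset) in the interval
        return 0.0
    return (cur_total_ms - prev_total_ms) / dcalls
```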

@kmoppel
Contributor

kmoppel commented Dec 2, 2015

Hey Soroosh, I resuscitated this feature now and re-implemented it in a simpler form, using also some bits from your PR, so thank you! Please check #62 for details.

@kmoppel kmoppel closed this Dec 2, 2015