
Mean vs p99
Nikolay: Hello, hello, this is
Postgres.FM.
I'm Nik, Postgres.AI, and as usual
with me, Michael, pgMustard.
Hello, Michael.
Michael: Hello, Nik.
How's it going?
Nikolay: Everything is all right.
A lot of work.
There is progress on many fronts.
So, yeah, excited to discuss a
quite important topic.
And maybe let's see what we can
learn from this discussion.
I don't know, like mutually learn
or something.
Michael: Yeah, I'm glad you think
it's interesting.
Nikolay: It's interesting because
it's always with us.
When we deal with performance,
we need to work with many numbers
and this topic pops up all the
time.
Michael: Yeah.
So specifically, the topic is when
we're looking at latencies, or
monitoring, or troubleshooting,
maybe prioritization, we
often need to look at queries by
some measure of their duration.
Like often, and I see this in blog
posts, in tools, various places,
often ordered by the mean or the
average of their latencies.
Very easy to get from pg_stat_statements
for example, it has
a dedicated column, but in a lot
of other places, a lot of other,
you know, different types of monitoring
tools and I'd say more
on the application developer side.
I'm increasingly seeing people
move to monitoring the
p95, p99 type latencies, so percentile-based,
kind of the tail
latency, not the average, so
I'm wondering about that.
And yeah, I saw recently another
feature request to have the
ability to see these in pg_stat_statements
and I've realized it's
something we can approximate, but
it's not something we can get
from that, so I wrote a post about
it recently, and I'm glad you
find it interesting too.
Nikolay: Yeah.
So that's an important topic, I think,
because we all work with
averages or mean values all the
time.
And it's not only about Postgres,
right?
The mean value, the meaning of average,
is often misunderstood, and it's
often misleading.
I remember the idea that in any
organization, if you check what
the mean or average salary is,
most people might find out: wow,
the average salary is much above my own salary,
what's happening here?
Well, what's happening is that a couple
of dudes have huge salaries, that's
why, right?
So, when we come back to Postgres, back to queries:
you see the average or mean time is
like 1 millisecond.
Oh, not bad, not bad, right?
But how many queries are much above that?
It might be a lot.
And I must note that mean_exec_time and mean_plan_time,
these columns are redundant.
They could be avoided, because we can always calculate them by dividing
the total by the number of calls, right?
Or by the number of plans, in the case of planning time.
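As a quick sketch of that redundancy (the snapshot numbers below are made up), the cumulative mean is just total divided by calls, and deltas between two snapshots give you the mean for a specific window, which the stored column cannot:

```python
# Hypothetical pg_stat_statements snapshots for one query (made-up numbers).
snap1 = {"total_exec_time": 12_500.0, "calls": 10_000}  # earlier snapshot
snap2 = {"total_exec_time": 13_100.0, "calls": 10_400}  # some time later

# The mean_exec_time column is always derivable: total / calls.
cumulative_mean_ms = snap2["total_exec_time"] / snap2["calls"]

# Deltas between snapshots give the mean for just that window.
window_mean_ms = (snap2["total_exec_time"] - snap1["total_exec_time"]) / (
    snap2["calls"] - snap1["calls"]
)
print(round(cumulative_mean_ms, 3), window_mean_ms)  # 1.26 1.5
```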
By the way, I read your article, and I noticed you optimistically
involve planning metrics.
Yeah.
Unfortunately, in 99% of cases, we don't see them because they are
off by default.
And I remember last year, or the year before, it took some effort
for us to prove that basically you can turn them on, but unfortunately,
as we had an episode about this, the performance
cliff will be 2 times closer to you if you turn it on.
But again, the default recommendation is to turn it on, because
it's very valuable information.
So again, back to averages, we could always calculate them dynamically
or something, it's not a big deal.
But I guess it was just a matter of convenience to add them and
keep them already pre-calculated in pg_stat_statements and other
similar extensions and so on.
But would be really great to see percentiles, right?
Or better to see whole histogram analysis of what's happening.
But unfortunately, I guess it's a very complex topic.
I mean, I know pg_stat_monitor implements this, but we need to
check overhead.
Michael: That's the critical part.
And also, just it's more columns.
We've got already a lot of columns in pg_stat_statements.
So it has additional overhead both to tracking, but also in terms
of user experience.
And I mean, I think at this point we have so many columns that
why not have a few more?
But it is a problem.
Like, with just-in-time compilation, we've got
a bunch of additional columns for every single part of that, and
there has been discussion about whether that should be
collapsed down, because do we really need granular
reporting there? But yeah, I would be in favor of some histograms.
I also haven't yet seen any analysis of...
So pg_stat_monitor is an alternative to pg_stat_statements, relatively
new in the grand scheme of things, probably a few years old at
this point.
But I don't see it in many installations, and I haven't yet seen
anybody publish benchmarks on its performance relative to it.
Nikolay: It was created in Percona and I remember I discussed
it with Peter Zaitsev who was the founder of Percona when this
extension was created and they said instead of contributing to
pg_stat_statements, they wanted a separate, independent release
cycle.
And as we know, pg_stat_statements is part of the set of so-called
contrib modules.
It means that, basically, their release cycle is Postgres's
own release cycle, so once per year.
And if you want to release something a few times per year, it's
hard.
And then I know the original creator left Percona, and I don't
know what's happening with the development of this extension, but
when it was created, our team did a first look, some analysis,
and the early versions looked promising but
challenging in terms of performance overhead.
And of course, if we talk about RDS and others, they don't have
it.
I saw it on maybe only 1 or 2 platforms.
So yeah, the performance overhead is a big question.
That being said, I don't know the current performance overhead.
And it would be great to have histograms and so on, right?
So percentiles.
So yeah, there are 2 additional aspects here in
my approach.
First is that in many cases, I don't care about the average at all,
because, as we also discussed several times, depending
on our goals, it might be relevant or not, important or
not.
For example, if we target optimizing how much resource utilization
happens on this server, we care about total time and total values,
because this is what exactly shows the picture. A query
might have a very tiny average and a tiny standard deviation, so
all query executions are similar and planning
occurrences are similar, but there are so many of them, like 10,000 QPS,
queries per second.
This is going to load our server a lot, and we need to optimize
this query in spite of good values in terms of average.
The total will be high.
So averages, as I feel it, you mostly need them when
you target delivering a good experience to your users.
Michael: I think so, yeah.
You're completely right.
Obviously, if we're trying to reduce utilization, there's no
point looking at things by how slow they are on average, or even
at the extremes, at the p99 level, because if they don't run
that often, they're not things taking the most time.
So, ordering by totals, whether that's time, and I still like
to do time, or some combination of buffers; if, for example, the
resource you're constrained by is I/O, which could be the
case, or if you're paying by I/O, like if you're on Aurora
or something, ordering by something else could make sense.
But yeah, totals always make sense for those cases.
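The "10,000 QPS with a tiny average" case can be sketched with hypothetical numbers: a 1 ms query at huge volume dominates total time, even though ordering by mean would never surface it.

```python
# Hypothetical per-query stats (not real pg_stat_statements output).
queries = [
    {"query": "frequent 1 ms lookup", "calls": 10_000_000, "mean_ms": 1.0},
    {"query": "rare slow report", "calls": 500, "mean_ms": 900.0},
]
for q in queries:
    q["total_ms"] = q["calls"] * q["mean_ms"]  # total time spent in this query

top_by_mean = max(queries, key=lambda q: q["mean_ms"])["query"]
top_by_total = max(queries, key=lambda q: q["total_ms"])["query"]
print(top_by_mean)   # the rare report "wins" by mean
print(top_by_total)  # the frequent query wins by total
```

So for resource-utilization goals, the frequent query is the one worth optimizing first, despite its harmless-looking average.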
But yeah, this is for those other
cases where you care about
user experience, perhaps, like
how long a dashboard's taken to
load for a large customer or the
boss has complained or that
kind of thing.
Or we've talked a few times about
the issues with long running
queries.
So you might be investigating what
queries do I even have that
run long sometimes?
So I think that's the other use
case I thought of as like, why
might you want to order by some
version of either average or
percentile?
Nikolay: Yeah, let me argue with
myself.
My favorite thing.
So yeah, when we optimize for resource
consumption, lower resource
consumption, we look at totals,
but at the second step, we should
look at the average, because average
and frequency, multiplied,
bring us to the total, right?
So we need to go down and see is
it about too frequent query
or is it about not very frequent
but very slow query and decide
which optimization methods to apply,
right?
And in this case, I'm wondering,
does it really matter to understand
that, like, to look at standard
deviation, basically, right?
To think about percentiles?
Michael: I think you're conflating
2 things here, looking at
something versus ordering by it.
And yes, when you're looking at
resource consumption, you want
to look at average time sometimes,
but you don't order by it.
Crucially, you need to order by
total time.
And the same is true for standard
deviation, I think.
I'm talking about adding maybe
1, maybe 2, depending on exactly
what percentile you want, adding
a couple of standard deviations
to the mean to order by it.
Not necessarily to look at it,
but just to get the initial prioritization
order, which things should I start
looking at, what are my top
10, because the order will be different,
assuming you have some
variance in the distribution
of your queries and some
of them have a longer tail than
others; the order will be different
if you order by mean plus a couple
of standard deviations than
it would be if you just order by
mean.
So it's about the prioritization
order.
And then you care about looking
at a lot of the other columns,
but you don't order by them.
So I think that's a subtle difference.
Nikolay: Yeah, yeah, yeah, I agree.
Order by standard deviation, it's
interesting.
Michael: I'm joking.
I'm joking.
Yeah.
Nikolay: Yeah.
I understand you are thinking about
ordering, for example, average
plus some factor times standard
deviation.
This is a smart idea, I think.
And before we move to that idea,
which is explained in your blog
post.
Michael: Yeah.
Nikolay: So, again, okay:
we order by total time, and
then we look to decide which
approach to use for optimization.
And still, maybe there it also makes
sense to have
percentiles, for example, because
we would understand: oh, this
query has this average, which is
not that bad, but, for example,
we don't have percentiles in pg_stat_statements, no histogram.
So, but we see: oh, the standard deviation
is so huge, it means that
some occurrences are okay, and some
are not at all.
Right?
And this brings us to the idea:
oh, it means that we have instability
here, right?
Depending on maybe data volumes,
or maybe we were blocked by
someone.
By the way, I would rather prefer
to have this information for,
you know, different metrics,
not only time, right?
Yeah.
But for hits and reads, this is
where I would really love to
see that piece, standard deviation
or, even better, percentiles.
Michael: Oh my goodness.
I'm just imagining the number of
columns at that point.
Nikolay: Yeah, because with the
pg_stat_statements approach, it
always concerns me how much of
that slowness was associated
with the query being blocked by another
query, just wait time.
We don't know, and it's hidden here.
Michael: Yeah, I mean, it's a good
point that these aggregates
are always going to hide information.
You know, it's the nature of aggregations
is we don't get all
the information.
It's the trade-off, right?
We get some useful information
on aggregate, but we don't get
all of the information.
I quite often see people include
min, mean, and max in their
queries when they're querying pg_stat_statements.
So you get kind of the full distribution.
I could imagine adding a couple,
you know, like either side of
the mean to say what's the mean
minus the standard deviation,
what's the mean plus the standard
deviation.
Nikolay: Yeah, but without standard
deviation, with just min, mean, and max,
you can have a million
calls and just 1 of them
was so far out that it
affected your whole picture, while standard
deviation brings some good
sense, right?
Michael: Yeah, I mean, I guess
min is possibly more interesting
than max, but yeah of course sometimes
there'll be 1 parameter
that causes the query to run very
differently.
Nikolay: If 0 rows are returned, we have
a min close to 0; it doesn't
make much sense, right?
Michael: Good point, yeah.
But by including more, you get a
better picture of the distribution.
You get more points on the
graph to get an idea of quite
how this is distributed: is that
max an outlier, or are there
quite a few queries running nearly
that long? So without
some measure close to p95, p99,
that kind of thing, I think it's
really hard to know that. And I
think possibly what's happening
at the moment is quite a few of
these dashboards, monitoring
tools, diagnosis tools will give
you an order when you look
at it by mean. That's fine, it's
useful, but it could be more useful,
and you could maybe start at the
more problematic queries if
you ordered them by a higher percentile
and then started at the
top from there.
Nikolay: Yeah.
Because it's like we say:
oh, on average, we deliver
everything in like 10 milliseconds,
everything is good.
Or 1 millisecond. Like, for
our queries, we want to be below
10 milliseconds ideally, and definitely
not 100 milliseconds, that's
too much.
So we say it's good, but what if
almost 50% of that was 1 millisecond,
and almost 50% of that was something
much slower?
Michael: What did you say the average
was?
Nikolay: My math is going to be
really bad.
I was trying to say you
might have 2 segments, or clusters,
of cases.
1 is very close to 0 and the other
is very bad.
And
Michael: They call it a bimodal
distribution.
Yeah.
Perhaps because of a different
plan for different
amounts of data, maybe, like
Nikolay: Parameters, maybe, or
maybe locking issues.
And then you observe something
in the middle.
Well, okay, maybe 50-50
is a bad example, because then
the mean lands right in the middle.
What if, like, 70% are very
close to 0 and 30% are very far from
0, and those users really struggle with
performance, right?
And we see the average is not bad; it's
going to be shifted towards
0, because it's the average.
So the average will tell us it's
not that bad.
We can leave it.
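That 70/30 case can be sketched with made-up latencies (70% of calls at 1 ms, 30% at 100 ms): the mean is pulled toward the fast cluster and looks tolerable, while the tail percentile exposes what the slow group actually experiences.

```python
import statistics

# Hypothetical bimodal latency sample: 70% of executions take ~1 ms,
# 30% take ~100 ms (e.g. a different plan, or lock waits).
latencies_ms = [1.0] * 700 + [100.0] * 300

mean_ms = statistics.mean(latencies_ms)
p95_ms = statistics.quantiles(latencies_ms, n=100)[94]  # 95th percentile

print(mean_ms)  # 30.7 -- shifted toward the fast cluster, looks "okay"
print(p95_ms)   # 100.0 -- what the slow 30% of executions actually see
```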
Michael: It's very, yeah, it's
very easy to imagine distributions
that on average are less than some
threshold that we might not
care about, like 50 milliseconds
or something.
Maybe we set our threshold there.
On average, they're 40, 45.
But for a non-zero, you know, like
a decent number of executions, maybe at
p95 or p99, and we're talking
about either 5% or 1% of the time,
it's running for hundreds of milliseconds.
And we've talked in the
Nikolay: past about how...
Michael: Or even seconds, yeah,
exactly.
So those are noticeable to human
durations, even though on average
it's running in 45 milliseconds.
So it's very easy, I think, to
hide things that on average look
like they aren't perceptible
to humans, but if the tail
is long enough, then for quite a
lot of people... let's say it's 5% of people,
that's 1 in 20 times somebody's
loading that dashboard or using
that feature.
1 in 20 sounds quite often.
Nikolay: Yeah, yeah.
Yeah, if it's 1 in a thousand,
well, things
happen, right?
Yeah, we can say, okay,
we have a very large user base,
some of them, like, we need to
take care of everyone, but we
cannot, like, it happens, right?
But this approach works only if
your p99 is good.
Michael: Yes.
Nikolay: Okay, 1% is not that good,
okay, but if it happens with
20% of your users, they suffer,
and you missed this fact just
looking at the average, it's not
good, it's not good.
Michael: Yeah, exactly.
Well, imagine, going back to our example:
if you've got 2 queries that
look the same in terms of
average, let's say one's 45 milliseconds
on average and the other's
50 milliseconds on average, but
the one that's 45 has a p99 that's
10 times that, 450 milliseconds,
and the one that's 50 milliseconds
has a p99 of only 60 milliseconds,
so only a little bit higher.
I would much, much, much rather
have the one with the higher p99
ranked higher in that case, even though it has
a lower average duration.
So that's kind of the whole point.
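Michael's example can be sketched like this; the stddev values are hypothetical, chosen so that the approximation roughly reproduces the p99s he describes, and 2.326 is the z-score of the 99th percentile of a normal distribution:

```python
# Hypothetical stats for the two queries; the stddev values are invented
# so that mean + z * stddev roughly matches the p99s from the example.
Z_P99 = 2.326  # 99th-percentile z-score of a normal distribution

queries = [
    {"name": "A", "mean_ms": 45.0, "stddev_ms": 150.0},  # long tail
    {"name": "B", "mean_ms": 50.0, "stddev_ms": 5.0},    # tight spread
]
for q in queries:
    q["approx_p99_ms"] = q["mean_ms"] + Z_P99 * q["stddev_ms"]

top_by_mean = max(queries, key=lambda q: q["mean_ms"])["name"]
top_by_approx_p99 = max(queries, key=lambda q: q["approx_p99_ms"])["name"]
print(top_by_mean, top_by_approx_p99)  # B A
```

Ordering by the approximation puts the long-tailed query first, which is exactly the ranking change the blog post is after.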
Nikolay: Yeah, let's explain the
idea you had.
The idea is simple.
We don't have buckets, we don't
have percentiles, we don't have
a histogram, we cannot draw it, but
we have standard deviation.
And the idea, as I heard
it or read it: in all the
analysis we've done over the
last years with pg_stat_statements
and other things, pg_stat_kcache,
pg_wait_sampling maybe, and so
on, we always look at throughput
and average latency, that's it.
Yeah, we know that, as you mentioned,
as the article mentioned,
from the application side, some
monitoring systems bring
you p99, p99.5, which is great, 2
values even better than just
1 here.
Yeah.
And sometimes colorful, good-looking
graphs.
But in Postgres we somehow don't
do it.
But we have standard deviation,
and in which Postgres monitoring
is standard deviation really appreciated?
I don't know.
Michael: I don't know any, yeah.
Nikolay: We don't have p99 or p95,
p90, nothing like that.
But what if we make an assumption
about the distribution?
Well, it's your assumption that it's normal;
maybe it's not normal, like
half-normal.
But the idea is: let's take the average
and the standard deviation and
some multiplier, so we have a formula,
average plus some multiplier
times standard deviation, and we
have a kind of p-something,
p90-something, kind of, right?
Michael: Yeah, I called it approximate,
only because it
really does assume a normal distribution.
Like, not that standard deviation
only makes sense there, but
you can only approximate percentiles this way
if you assume a normal distribution, and
I made the case that I think it
would be quite rare for you to
have queries, or at least problematic
queries, that are normally
distributed.
I suspect a lot of the problematic
ones are going to have long
tails but it definitely isn't a
safe assumption that they're
all going to be normally distributed,
but once you factor that
in, I think this is still useful.
I think queries that have a low
average and a high standard deviation,
where this ordering difference,
where this change makes a difference,
are probably going to have, well,
are definitely going to have
high p99s as well.
It's still directionally correct,
I think I used that phrase,
so it still points you probably
in the right direction.
This order is probably not going
to be that different to the
order by true p99.
So it's a step in the right direction,
it's not necessarily accurate,
but it's directionally correct.
Nikolay: Yeah, I agree.
And since I suck at statistics,
as every listener should already
understand by this point,
I talked to my daughter, who is
studying data science at UCSD
right now.
And she told me: oh, this is quite
well known; for a normal distribution
it's just the standard deviation times 2 to
both sides, and this is how you can
get p90, I should say, or p95, I
forgot.
So this is reasonable, she said, but then,
of course, it's not a normal distribution,
and it's a big assumption.
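For a truly normal distribution, those multipliers are known exactly; a short check with Python's statistics module shows where numbers like 1.645 and 2.326 come from:

```python
from statistics import NormalDist

# For a normal distribution, each percentile sits a fixed number of
# standard deviations above the mean; these are the multipliers behind
# the "mean + k * stddev" approximation.
for p in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(p)  # z-score of the p-th percentile
    print(f"p{int(p * 100)} = mean + {z:.3f} * stddev")
```

This prints multipliers of about 1.282 for p90, 1.645 for p95, and 2.326 for p99; the "times 2" rule of thumb lands between p95 and p99.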
And what I thought, idea number
1 I had, and maybe I will encourage
some students, or some people who are willing
to do something here:
what if we had some raw data from
somewhere?
I don't know who can afford logging
all queries with durations.
When we talked last week, we discussed
how dangerous it can be
to log all queries.
But imagine we collected some raw
data and we could see what's present
in pg_stat_statements.
And what I'm trying to say is, we
could understand the nature of the
distribution, maybe by observing
many, many cases from OLTP,
like web apps, mobile apps.
And then we could probably find
some heuristic-based multiplier,
right?
Which would be closer to
reality, to avoid this
assumption that it's
normal. But it's a big piece of work.
I know you'd need to collect
a lot of data, process it,
and so on.
What do you think?
Michael: Well, it would be
different for different queries,
unless we find out that it isn't.
I think it's a tricky
problem unless you actually approximate
it while you're monitoring.
Nikolay: Look, I understand what
you said.
So the assumption that
it's kind of normal
sounds to me like a fragile assumption.
But if we learn how some OLTP system,
social media or something,
with all these queries, like selecting
users, selecting something,
joining some tables, and so on, how
these queries behave, at least,
I don't know, a few dozen of them,
then this practice-learned multiplier,
I'm pretty sure, will
be very close in another system,
you know?
Michael: Yeah, actually it's a
good point.
You could get more accurate with
the number, like the exact multiplier,
to get a measure more likely to be the
true p99.
It might turn out that the number
I've suggested based on normal
distribution is actually more likely
to be a p96 than a p99.
Or p97 instead of p99.
Does it make that much difference?
No, no.
Nikolay: If it's just order by,
no.
I agree with you.
And what I'm proposing breaks Pareto
principle, right?
Michael: Yeah, it's a lot
of work for arguably not that
much additional value. I think
if someone's going to put a lot
of effort into this, it would be
much better to actually track
this and put it into pg_stat_statements.
Nikolay: This will again be an overhead
question.
It would be great to have, but
again, I understand.
What I'm proposing is going to
be outside of the p80 of the effort
we're going to have.
Or p20.
I like it.
p80 of the value, p20 of...
Okay, so another thing, another
thought I have.
I noticed over time that when I run some benchmarks, very often
with pgbench, I'm absolutely unsatisfied with...
So I use, and we have it in automation as well, and all my team
members use it, the lowercase -r option to report results at the end
for each statement used in transactions, right?
Sometimes we use multiple files with different transactions, with
some weights assigned to them.
That can be achieved with the -f option and an @ sign.
It's a great feature, by the way, underappreciated.
But anyway, I see these reports and we discuss them:
oh, this looks great; oh, this looks not that great.
But I realized it's not enough.
I want histograms.
Right.
Michael: Yeah.
Nikolay: And to mitigate it, at least somehow, we started to
use the capital -P option.
For example, -P 30 means that progress will be
reported every 30 seconds.
Oh,
Michael: interesting.
Different P notation.
Yeah.
Nikolay: Yeah, yeah.
So it will be reported as progress.
So it reports latencies every 30 seconds, for
Michael: example.
Nikolay: Nice.
Or every 10 seconds if the benchmark is very fast, or every minute
if it's longer, it depends.
And from there, we already understand: oh, the distribution is like
this, right?
So we can get a feel for the distribution.
It reports standard deviation as well, by the way, but it's
still not enough.
There is another aspect here: it's not only about the distribution,
p90, p95, it's also about the distribution over time,
because there might be a spike of latency, right?
True.
So, yeah, yeah.
But still, what I was trying to say: if there's
an idea to implement histograms or percentiles,
maybe it's better to start in pgbench.
Yeah, not in pg_stat_statements first.
Michael: Yeah, interesting idea.
I hadn't thought, I get where you're coming from.
I think it'd be easier, right?
You don't worry so much about overhead.
Nikolay: Well, you worry about
overhead, but this is a tool which
is used to make decisions and so on.
Well, pgbench also is used.
But anyway, maybe it's easier because
it's less risky.
The idea is it's not production.
So...
Michael: Are you thinking it's
more like, are you thinking educationally?
You could see the idea that this
is a useful thing, and once people
start seeing it, like, oh, this
is really helpful, why can't
I have it elsewhere?
So it kind of builds.
Nikolay: I think most people
understand that this is helpful,
but it's just risky, because pg_stat_statements
has overhead.
We know it.
And it's probably increasing, because
more and more things are
being added.
And then again, pgbench is not about
production, usually, it depends
of course.
And I think-
Michael: pgbench is not about-
Nikolay: It's not used in production,
it's a learning tool for
experiments.
Yeah, pg_stat_statements is for production,
so it's harder to bring something
heavy there.
In pgbench it could be optional,
and we'd see how it works;
it's easier, in my opinion.
And also, first of all,
it's needed there.
It's needed.
It's very useful to have it there;
it would be very useful.
Do you know if sysbench brings
histograms?
Michael: No, I don't know.
Nikolay: I don't remember.
Yeah, it brings percentiles.
Michael: Yeah.
How many buckets or which?
I don't know.
Just report a couple.
Nikolay: It brings p95.
Yeah.
And I don't know, I don't see any
more.
I don't know.
Anyway, it's already better than
just standard deviation, I guess,
right?
But I agree with your point about
ordering by, like, mean plus maybe
just 2 times standard deviation.
Michael: Okay, yeah, in the blog
post I did consider rounding
to just 2 instead of using the
number that's actually the approximation,
exactly because at
that point, I'm saying: look,
this is approximate already, why
are you multiplying by such a
specific number? But then what are
we approximating? It's like
a different p-value, and I
thought it would
just be confusing to people reading
the results, rather than the people
looking at the query.
I figured, if you
put these into dashboards and
things, you're going to have more
people reading the dashboards than
looking at the queries behind
the dashboards.
So I figured I'd rather have something
confusing in the query
than something confusing in the
column header.
Nikolay: Makes sense, makes sense,
Yeah.
Michael: So yeah, quite strange
to use such a precise figure
to calculate an approximation.
Nikolay: You know what, I will
think about it; next time we
do some reports or
dashboards, maybe we will implement
this approach.
We'll call it p-what?
Michael: I suggested calling
it approx_p99, just so that
it's really clear.
Like, you're leading with the word approximate.
Nikolay: So it's a mysterious percentile.
Yes.
Michael: And even if you don't
include it, like, it's probably confusing
to look at a dashboard that doesn't
include it in the results
but still includes all the other
columns you're going to include;
even just ordering by it is,
I think, useful.
Nikolay: Ordering, in monitoring,
means another chart:
top-N, and top-N by what? This is
the ordering.
And instead of top-N by average,
or in addition to top-N
by average?
This would be helpful to bring
a sense of how users feel about
your service, right?
Michael: I really respect monitoring
tools and reports that
try to limit the number of pages
or dashboards they show,
but most I come across don't
seem to try to limit it too
much, so I think: what's
another chart? But if you are worried
about quite how many charts you
have, I might argue, well, my argument
is I'd rather have this than the
one ordered by mean, which
pretty much all of them include.
So if you include that one, I would
rather have this one.
Nikolay: Yeah.
And do you think it's worth keeping
a chart for the average?
For the pure average?
Michael: Again, it depends on the
constraints, right?
If you've already got a hundred
charts and it's
99 versus 100, why not include
the average still?
But if you're talking about only
having 2, then 2 versus 3 is
quite a big difference.
So I think it depends on the context.
Nikolay: And this chart is going
to be regular, like non-stacked,
because you cannot sum it up;
averages are not a cumulative thing,
right?
Yeah.
Yeah, and probably it should include
only, like, the top 10 or something.
Michael: Well, it's not really
a chart, it's just like a top
end.
Nikolay: Ah, table.
Michael: Yeah, like a table.
Yeah, I think so.
Nikolay: A chart also matters; like,
again, a table is good, but it
loses historical information.
Michael: Yeah, okay, so maybe your
chart is, yeah, your most
important queries looking at their
p99 over time, but then you
can include the mean on the same
chart, can't you?
I've seen those in monitoring tools
where you're looking at the
same...
Nikolay: An interesting thing: for
the chart we can get the average
for a specific period just by dealing
with total and calls.
We have the delta for total and the delta
for calls, so we can have the pure average
for a specific period of time.
But what about standard deviation?
We don't have it.
Michael: You can calculate it using
deltas, but well, the same
way pg_stat_statements recalculates
it every time it has a new
query come in.
Nikolay: Oh, yeah.
Michael: So, indeed.
Yeah, I was looking at the source
code and thought: oh, that's
clever.
Nikolay: Yeah, yeah, yeah, yeah,
yeah, exactly.
Otherwise, yeah, exactly.
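The trick being referred to can be sketched as follows; this mirrors the Welford-style running update used in the pg_stat_statements source (one pass per execution, no per-call history), though the variable names and the sample execution times here are just illustrative:

```python
import math

def update(calls, mean, sum_var, exec_time_ms):
    """One execution arrives: update the count, the running mean, and the
    running sum of squared deviations (Welford's algorithm), as
    pg_stat_statements does for its mean and stddev columns."""
    calls += 1
    old_mean = mean
    mean += (exec_time_ms - old_mean) / calls
    sum_var += (exec_time_ms - old_mean) * (exec_time_ms - mean)
    return calls, mean, sum_var

calls, mean, sum_var = 0, 0.0, 0.0
for t in [1.0, 2.0, 100.0, 1.5, 2.5]:  # made-up execution times, ms
    calls, mean, sum_var = update(calls, mean, sum_var, t)

stddev = math.sqrt(sum_var / calls)  # population stddev over all calls
print(round(mean, 2), round(stddev, 2))  # 21.4 39.3
```

Because only calls, mean, and the squared-deviation sum are kept, the stddev can be recomputed after every execution without storing individual timings, which is why snapshot deltas can recover it for a window too.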
Michael: But you also have the
option of snapshotting pg_stat_statements.
Some people do that, right?
Reset snapshot, like some monitoring
tools or some providers
reset it regularly.
Nikolay: It's not a good idea.
Because to do it often you're going
to have troubles under load
because it doesn't come for free
to add new entries.
Michael: But then we have deltas.
Deltas is a good idea.
Nikolay: Yeah, yeah, yeah.
Well, and the standard deviation can
also be computed for a specific
period of time.
And in this case, you can have
this metric with the, like, magic, mysterious
percentile.
And you can have a chart of, for
example, 10, the top 10 queries.
Yeah, and maybe also average of
everything.
Michael: Well, it's a good point;
there's at least 1 major caveat here, and
I don't even know if I mentioned
this in the blog post:
a lot of people have a timeout,
often like 30 seconds, like statement_timeout.
A query that hits it won't be registered.
So it does quite seriously affect
those metrics, like the max,
the standard deviation, and
Nikolay: Affects in which way?
It's not registered, so you miss
it there.
Michael: You miss it.
You miss an outlier.
So it's bringing the average down.
It's bringing the p99
Nikolay: looks more positive than
it is.
Yes.
But you know, if a query was
cancelled, the job is not done,
so we don't count it.
As I remember, this is another
point pg_stat_monitor aims to solve.
Yes.
Like, it's a common problem with
pg_stat_statements and all its
friends, like pg_stat_kcache, that
cancelled queries are not counted.
And I think also a big missing
piece here, honestly, is that
in pg_stat_database we have TPS,
2 metrics, committed and rolled back,
right?
But we don't have QPS, we don't
have latencies or standard deviation;
they should be there, actually.
This is another idea: to register
these metrics, calls,
mean, max, min,
and standard deviation,
at the database level, for everything,
regardless of the statements.
And maybe involving errored-out
queries. Maybe.
I'm not sure about this, because
if we involve them,
the values observed by pg_stat_statements
and at this high level
won't match at all.
Michael: Yeah.
I was unsure what ideal looks like
here.
I did wonder, in an ideal
world, would you add the timeout?
Like, would you add a query at
30 seconds?
Let's say 1 got cancelled at 30
seconds.
Would you add an instance of a
query that ran 30 seconds, and
add that to the average, add that
to the standard deviation?
Would you want that or would you
not want that?
Nikolay: It's easy.
Let's just have another pg_stat_statements.
We'll call it pg_stat_error_statements
and have a whole set of
metrics.
I'm joking, of course.
Michael: Well, it's not a bad idea,
though, right?
Like how many of each error type
do we have?
Like it's
Nikolay: kind of cool.
And could it be cancelled before
statement_timeout?
It can be cancelled due to deadlock_timeout, for example, or just
something else happens, I don't know.
A kill.
A term signal, a pg_cancel_backend, for example, came and cancelled us.
So yeah, and it would be great to count all those things like
buffer hits, reads,
Michael: and so on.
So yeah, that's the long and short of it.
Was there anything else
Nikolay: you wanted to talk about?
I feel we went to some very dark corner of Postgres observability,
you know?
Michael: Yeah, yeah, yeah.
It definitely seems less explored than I was expecting.
As your daughter said, this is not novel, this is not
an out-there revelation.
I was just surprised when I saw the feature request for it, and
I've heard it a few times; I was thinking,
why don't more people have that in their queries?
Why don't I see this in blog posts or in monitoring tools?
Nikolay: And also for buffer operations.
And WAL operations.
Michael: Well, I know why we don't have that.
It's because the standard deviation is not even available there.
Nikolay: Well, it should be available.
Let's add it.
Michael: So, yeah, my summary is I'd love to see these in pg_stat_statements.
But in the meantime, like you mentioned, if you're doing a consulting
report or if you've got a dashboard internally consider adding
this like or replacing the ordering by mean.
Nikolay: Just don't call it fake p95, call it mysterious p-n.
Michael: Fake news p99, yeah.
I liked your mysterious, what was
it, MMP?
Nikolay: Magic P.
Michael: Yeah.
Nikolay: Okay.
Michael: Yep, absolutely.
Nikolay: Good.