
Mean vs p99
Nikolay: Hello, hello, this is
Postgres.FM.
I'm Nik, Postgres.AI, and as usual
with me, Michael, pgMustard.
Hello, Michael.
Michael: Hello, Nik.
How's it going?
Nikolay: Everything is all right.
A lot of work.
There is progress on many fronts.
So, yeah, excited to discuss a
quite important topic.
And maybe let's see what we can
learn from this discussion.
I don't know, like mutually learn
or something.
Michael: Yeah, I'm glad you think
it's interesting.
Nikolay: It's interesting because
it's always with us.
When we deal with performance,
we need to work with many numbers
and this topic pops up all the
time.
Michael: Yeah.
So specifically, the topic is when
we're looking at latencies, or
monitoring, or troubleshooting,
maybe prioritization, we
often need to look at queries by
some measure of their duration.
Like often, and I see this in blog
posts, in tools, various places,
often ordered by the mean or the
average of their latencies.
Very easy to get from pg_stat_statements
for example, it has
a dedicated column, but in a lot
of other places, a lot of other,
you know, different types of monitoring
tools and I'd say more
on the application developer side.
I'm increasingly seeing people
move to monitoring the
p95, p99 type latencies, so percentile-based,
kind of the tail
latency, not the average, so
I'm wondering about that.
And yeah, I saw recently another
feature request to have the
ability to see these in pg_stat_statements
and I've realized it's
something we can approximate, but
it's not something we can get
from that, so I wrote a post about
it recently, and I'm glad you
find it interesting too.
Nikolay: Yeah.
So that's an important topic, I think,
because we all work with
averages or mean values all the
time.
And it's not only about Postgres,
right?
The mean value, the meaning of average,
is often misunderstood, and it's
often misleading.
I remember the idea that in any
organization, if you check what
the mean or average salary is,
most people might find out: wow,
the average salary is much above my own salary,
what's happening here?
Well, what's happening is that a couple
of dudes have huge salaries, that's
why, right?
So, when we come back to Postgres, back to queries:
you see the average or mean time is
like 1 millisecond.
Oh, not bad, not bad, right?
But how many queries are much above that?
It might be a lot.
And I must note that mean_exec_time and mean_plan_time,
these columns are redundant.
They could be avoided, because we can always calculate them by dividing
the total by the number of calls, right?
Or by the number of plans, in the case of planning time.
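As a quick sketch of that redundancy (the snapshot numbers below are made up), the cumulative mean is just total divided by calls, and deltas between two snapshots give you the mean for a specific window, which the stored column cannot:

```python
# Hypothetical pg_stat_statements snapshots for one query (made-up numbers).
snap1 = {"total_exec_time": 12_500.0, "calls": 10_000}  # earlier snapshot
snap2 = {"total_exec_time": 13_100.0, "calls": 10_400}  # some time later

# The mean_exec_time column is always derivable: total / calls.
cumulative_mean_ms = snap2["total_exec_time"] / snap2["calls"]

# Deltas between snapshots give the mean for just that window.
window_mean_ms = (snap2["total_exec_time"] - snap1["total_exec_time"]) / (
    snap2["calls"] - snap1["calls"]
)
print(round(cumulative_mean_ms, 3), window_mean_ms)  # 1.26 1.5
```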
By the way, I read your article, and I noticed you optimistically
involve planning metrics.
Yeah.
Unfortunately, in 99% of cases, we don't see them because they are
off by default.
And I remember last year, or the year before, it took some effort
for us to prove that basically you can turn them on, but unfortunately,
as we had an episode about this, the performance
cliff will be 2 times closer to you if you turn it on.
But again, the default recommendation is to turn it on, because
it's very valuable information.
So again, back to averages, we could always calculate them dynamically
or something, it's not a big deal.
But I guess it was just a matter of convenience to add them and
keep them already pre-calculated in pg_stat_statements and other
similar extensions and so on.
But would be really great to see percentiles, right?
Or better to see whole histogram analysis of what's happening.
But unfortunately, I guess it's a very complex topic.
I mean, I know pg_stat_monitor implements this, but we need to
check overhead.
Michael: That's the critical part.
And also, just it's more columns.
We've got already a lot of columns in pg_stat_statements.
So it has additional overhead both to tracking, but also in terms
of user experience.
And I mean, I think at this point we have so many columns that
why not have a few more?
But it is a problem.
Like, with just-in-time compilation, we've got
a bunch of additional columns for every single part of that, and
there has been discussion about whether that should be
collapsed down, because do we really need granular
reporting there? But yeah, I would be in favor of some histograms.
I also haven't yet seen any analysis of...
So pg_stat_monitor is an alternative to pg_stat_statements, relatively
new in the grand scheme of things, probably a few years old at
this point.
But I don't see it in many installations, and I haven't yet seen
anybody publish benchmarks on its performance relative to it.
Nikolay: It was created in Percona and I remember I discussed
it with Peter Zaitsev who was the founder of Percona when this
extension was created and they said instead of contributing to
pg_stat_statements, they wanted a separate, independent release
cycle.
And as we know, pg_stat_statements is part of the set of so-called
contrib modules.
It means that, basically, their release cycle is Postgres's
own release cycle, so once per year.
And if you want to release something a few times per year, it's
hard.
And then I know the original creator left Percona, and I don't
know what's happening with the development of this extension, but
when it was created, our team did a first look, some analysis,
and the early versions looked promising but
challenging in terms of performance overhead.
And of course, if we talk about RDS and others, they don't have
it.
I saw it on maybe only 1 or 2 platforms.
So yeah, the performance overhead is a big question.
That being said, I don't know the current performance overhead.
And it would be great to have histograms and so on, right?
So percentiles.
So yeah, there are 2 additional aspects here in
my approach.
First is that in many cases, I don't care about the average at all,
because, as we also discussed several times, depending
on our goals, it might be relevant or not, important or
not.
For example, if we target optimizing how much resource utilization
happens on this server, we care about total time and total values,
because this is what exactly shows the picture. A query
might have a very tiny average and a tiny standard deviation, so
all query executions are similar and planning
occurrences are similar, but there are so many of them, like 10,000 QPS,
queries per second.
This is going to load our server a lot, and we need to optimize
this query in spite of good values in terms of average.
The total will be high.
So averages, as I feel it, you mostly need them when
you target delivering a good experience to your users.
Michael: I think so, yeah.
You're completely right.
Obviously, if we're trying to reduce utilization, there's no
point looking at things by how slow they are on average, or even
at the extremes, at the p99 level, because if they don't run
that often, they're not things taking the most time.
So, ordering by totals, whether that's time, and I still like
to do time, or some combination of buffers; if, for example, the
resource you're constrained by is I/O, which could be the
case, or if you're paying by I/O, like if you're on Aurora
or something, ordering by something else could make sense.
But yeah, totals always make sense for those cases.
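The "10,000 QPS with a tiny average" case can be sketched with hypothetical numbers: a 1 ms query at huge volume dominates total time, even though ordering by mean would never surface it.

```python
# Hypothetical per-query stats (not real pg_stat_statements output).
queries = [
    {"query": "frequent 1 ms lookup", "calls": 10_000_000, "mean_ms": 1.0},
    {"query": "rare slow report", "calls": 500, "mean_ms": 900.0},
]
for q in queries:
    q["total_ms"] = q["calls"] * q["mean_ms"]  # total time spent in this query

top_by_mean = max(queries, key=lambda q: q["mean_ms"])["query"]
top_by_total = max(queries, key=lambda q: q["total_ms"])["query"]
print(top_by_mean)   # the rare report "wins" by mean
print(top_by_total)  # the frequent query wins by total
```

So for resource-utilization goals, the frequent query is the one worth optimizing first, despite its harmless-looking average.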
But yeah, this is for those other
cases where you care about
user experience, perhaps, like
how long a dashboard's taken to
load for a large customer or the
boss has complained or that
kind of thing.
Or we've talked a few times about
the issues with long running
queries.
So you might be investigating what
queries do I even have that
run long sometimes?
So I think that's the other use
case I thought of as like, why
might you want to order by some
version of either average or
percentile?
Nikolay: Yeah, let me argue with
myself.
My favorite thing.
So yeah, when we optimize for resource
consumption, lower resource
consumption, we look at totals,
but at the second step, we should
look at the average, because average
and frequency, multiplied,
bring us to the total, right?
So we need to go down and see is
it about too frequent query
or is it about not very frequent
but very slow query and decide
which optimization methods to apply,
right?
And in this case, I'm wondering,
does it really matter to understand
that, like, to look at standard
deviation, basically, right?
To think about percentiles?
Michael: I think you're conflating
2 things here, looking at
something versus ordering by it.
And yes, when you're looking at
resource consumption, you want
to look at average time sometimes,
but you don't order by it.
Crucially, you need to order by
total time.
And the same is true for standard
deviation, I think.
I'm talking about adding maybe
1, maybe 2, depending on exactly
what percentile you want, adding
a couple of standard deviations
to the mean to order by it.
Not necessarily to look at it,
but just to get the initial prioritization
order, which things should I start
looking at, what are my top
10, because the order will be different,
assuming you have some
variance in the distribution
of your queries and some
of them have a longer tail than
others; the order will be different
if you order by mean plus a couple
of standard deviations than
it would be if you just order by
mean.
So it's about the prioritization
order.
And then you care about looking
at a lot of the other columns,
but you don't order by them.
So I think that's a subtle difference.
Nikolay: Yeah, yeah, yeah, I agree.
Order by standard deviation, it's
interesting.
Michael: I'm joking.
I'm joking.
Yeah.
Nikolay: Yeah.
I understand you are thinking about
ordering, for example, average
plus some factor times standard
deviation.
This is a smart idea, I think.
And before we move to that idea,
which is explained in your blog
post.
Michael: Yeah.
Nikolay: So, again, okay:
we order by total time, and
then we look to decide which
approach to use for optimization.
And still, maybe there it also makes
sense to have
percentiles, for example, because
we would understand: oh, this
query has this average, which is
not that bad, but, for example,
we don't have percentiles in pg_stat_statements, no histogram.
So, but we see: oh, the standard deviation
is so huge, it means that
some occurrences are okay, and some
are not at all.
Right?
And this brings us to the idea:
oh, it means that we have instability
here, right?
Depending on maybe data volumes,
or maybe we were blocked by
someone.
By the way, I would rather prefer
to have this information for,
you know, different metrics,
not only time, right?
Yeah.
But for hits and reads, this is
where I would really love to
see that piece, standard deviation
or, even better, percentiles.
Michael: Oh my goodness.
I'm just imagining the number of
columns at that point.
Nikolay: Yeah, because with the
pg_stat_statements approach, it
always concerns me how much of
that slowness was associated
with the query being blocked by another
query, just wait time.
We don't know, and it's hidden here.
Michael: Yeah, I mean, it's a good
point that these aggregates
are always going to hide information.
You know, it's the nature of aggregations
is we don't get all
the information.
It's the trade-off, right?
We get some useful information
on aggregate, but we don't get
all of the information.
I quite often see people include
min, mean, and max in their
queries when they're querying pg_stat_statements.
So you get kind of the full distribution.
I could imagine adding a couple,
you know, like either side of
the mean to say what's the mean
minus the standard deviation,
what's the mean plus the standard
deviation.
Nikolay: Yeah, but without standard
deviation, with just min, mean, and max,
you can have a million
calls and just 1 of them
was so far out that it
affected your whole picture, while standard
deviation brings some good
sense, right?
Michael: Yeah, I mean, I guess
min is possibly more interesting
than max, but yeah of course sometimes
there'll be 1 parameter
that causes the query to run very
differently.
Nikolay: If 0 rows are returned, we have
a min close to 0; it doesn't
make much sense, right?
Michael: Good point, yeah.
But by including more, you get a
better picture of the distribution.
You get more points on the
graph to get an idea of quite
how this is distributed: is that
max an outlier, or are there
quite a few queries running nearly
that long? So without
some measure close to p95, p99,
that kind of thing, I think it's
really hard to know that. And I
think possibly what's happening
at the moment is quite a few of
these dashboards, monitoring
tools, diagnosis tools will give
you an order when you look
at it by mean. That's fine, it's
useful, but it could be more useful,
and you could maybe start at the
more problematic queries if
you ordered them by a higher percentile
and then started at the
top from there.
Nikolay: Yeah.
Because it's like we say:
oh, on average, we deliver
everything in like 10 milliseconds,
everything is good.
Or 1 millisecond. Like, for
our queries, we want to be below
10 milliseconds ideally, and definitely
not 100 milliseconds, that's
too much.
So we say it's good, but what if
almost 50% of that was 1 millisecond,
and almost 50% of that was something
much slower?
Michael: What did you say the average
was?
Nikolay: My math is going to be
really bad.
I was trying to say you
might have 2 segments, or clusters,
of cases.
1 is very close to 0 and the other
is very bad.
And
Michael: They call it a bimodal
distribution.
Yeah.
Perhaps because of a different
plan for different
amounts of data, maybe, like
Nikolay: Parameters, maybe, or
maybe locking issues.
And then you observe something
in the middle.
Well, okay, maybe 50-50
is a bad example, because then
the mean lands right in the middle.
What if, like, 70% are very
close to 0 and 30% are very far from
0, and those users really struggle with
performance, right?
And we see the average is not bad; it's
going to be shifted towards
0, because it's the average.
So the average will tell us it's
not that bad.
We can leave it.
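That 70/30 case can be sketched with made-up latencies (70% of calls at 1 ms, 30% at 100 ms): the mean is pulled toward the fast cluster and looks tolerable, while the tail percentile exposes what the slow group actually experiences.

```python
import statistics

# Hypothetical bimodal latency sample: 70% of executions take ~1 ms,
# 30% take ~100 ms (e.g. a different plan, or lock waits).
latencies_ms = [1.0] * 700 + [100.0] * 300

mean_ms = statistics.mean(latencies_ms)
p95_ms = statistics.quantiles(latencies_ms, n=100)[94]  # 95th percentile

print(mean_ms)  # 30.7 -- shifted toward the fast cluster, looks "okay"
print(p95_ms)   # 100.0 -- what the slow 30% of executions actually see
```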
Michael: It's very, yeah, it's
very easy to imagine distributions
that on average are less than some
threshold that we might not
care about, like 50 milliseconds
or something.
Maybe we set our threshold there.
On average, they're 40, 45.
But for a non-zero, you know, like
a decent number of executions, maybe at
p95 or p99, and we're talking
about either 5% or 1% of the time,
it's running for hundreds of milliseconds.
And we've talked in the
Nikolay: past about how...
Michael: Or even seconds, yeah,
exactly.
So those are noticeable to human
durations, even though on average
it's running in 45 milliseconds.
So it's very easy, I think, to
hide things that on average look
like they aren't perceptible
to humans, but if the tail
is long enough, then for quite a
lot of people... let's say it's 5% of people,
that's 1 in 20 times somebody's
loading that dashboard or using
that feature.
1 in 20 sounds quite often.
Nikolay: Yeah, yeah.
Yeah, if it's 1 in a thousand,
well, things
happen, right?
Yeah, we can say, okay,
we have a very large user base,
some of them, like, we need to
take care of everyone, but we
cannot, like, it happens, right?
But this approach works only if
your p99 is good.
Michael: Yes.
Nikolay: Okay, 1% is not that good,
okay, but if it happens with
20% of your users, they suffer,
and you missed this fact just
looking at the average, it's not
good, it's not good.
Michael: Yeah, exactly.
Well, imagine, going back to our example:
if you've got 2 queries that
look the same in terms of
average, let's say one's 45 milliseconds
on average and the other's
50 milliseconds on average, but
the one that's 45 has a p99 that's
10 times that, 450 milliseconds,
and the one that's 50 milliseconds
has a p99 of only 60 milliseconds,
so only a little bit higher.
I would much, much, much rather
have the one with the higher p99
ranked higher in that case, even though it has
a lower average duration.
So that's kind of the whole point.
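Michael's example can be sketched like this; the stddev values are hypothetical, chosen so that the approximation roughly reproduces the p99s he describes, and 2.326 is the z-score of the 99th percentile of a normal distribution:

```python
# Hypothetical stats for the two queries; the stddev values are invented
# so that mean + z * stddev roughly matches the p99s from the example.
Z_P99 = 2.326  # 99th-percentile z-score of a normal distribution

queries = [
    {"name": "A", "mean_ms": 45.0, "stddev_ms": 150.0},  # long tail
    {"name": "B", "mean_ms": 50.0, "stddev_ms": 5.0},    # tight spread
]
for q in queries:
    q["approx_p99_ms"] = q["mean_ms"] + Z_P99 * q["stddev_ms"]

top_by_mean = max(queries, key=lambda q: q["mean_ms"])["name"]
top_by_approx_p99 = max(queries, key=lambda q: q["approx_p99_ms"])["name"]
print(top_by_mean, top_by_approx_p99)  # B A
```

Ordering by the approximation puts the long-tailed query first, which is exactly the ranking change the blog post is after.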
Nikolay: Yeah, let's explain the
idea you had.
The idea is simple.
We don't have buckets, we don't
have percentiles, we don't have
a histogram, we cannot draw it, but
we have standard deviation.
And the idea, as I heard
it or read it: in all the
analysis we've done over the
last years with pg_stat_statements
and other things, pg_stat_kcache,
pg_wait_sampling maybe, and so
on, we always look at throughput
and average latency, that's it.
Yeah, we know that, as you mentioned,
as the article mentioned,
from the application side, some
monitoring systems bring
you p99, p99.5, which is great, 2
values even better than just
1 here.
Yeah.
And sometimes colorful, good-looking
graphs.
But in Postgres we somehow don't
do it.
But we have standard deviation,
and in which Postgres monitoring
is standard deviation really appreciated?
I don't know.
Michael: I don't know any, yeah.
Nikolay: We don't have p99 or p95,
p90, nothing like that.
But what if we make an assumption
about the distribution?
Well, it's your assumption that it's normal;
maybe it's not normal, like
half-normal.
But the idea is: let's take the average
and the standard deviation and
some multiplier, so we have a formula,
average plus some multiplier
times standard deviation, and we
have a kind of p-something,
p90-something, kind of, right?
Michael: Yeah, I called it approximate,
only because it
really does assume a normal distribution.
Like, not that standard deviation
only makes sense there, but
you can only approximate percentiles this way
if you assume a normal distribution, and
I made the case that I think it
would be quite rare for you to
have queries, or at least problematic
queries, that are normally
distributed.
I suspect a lot of the problematic
ones are going to have long
tails but it definitely isn't a
safe assumption that they're
all going to be normally distributed,
but once you factor that
in, I think this is still useful.
I think queries that have a low
average and a high standard deviation,
where this ordering difference,
where this change makes a difference,
are probably going to have, well,
are definitely going to have
high p99s as well.
It's still directionally correct,
I think I used that phrase,
so it still points you probably
in the right direction.
This order is probably not going
to be that different to the
order by true p99.
So it's a step in the right direction,
it's not necessarily accurate,
but it's directionally correct.
Nikolay: Yeah, I agree.
And since I suck at statistics,
as every listener should already
understand by this point,
I talked to my daughter, who is
studying data science at UCSD
right now.
And she told me: oh, this is quite
well known; for a normal distribution
it's just the standard deviation times 2 to
both sides, and this is how you can
get p90, I should say, or p95, I
forgot.
So this is reasonable, she said, but then,
of course, it's not a normal distribution,
and it's a big assumption.
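For a truly normal distribution, those multipliers are known exactly; a short check with Python's statistics module shows where numbers like 1.645 and 2.326 come from:

```python
from statistics import NormalDist

# For a normal distribution, each percentile sits a fixed number of
# standard deviations above the mean; these are the multipliers behind
# the "mean + k * stddev" approximation.
for p in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(p)  # z-score of the p-th percentile
    print(f"p{int(p * 100)} = mean + {z:.3f} * stddev")
```

This prints multipliers of about 1.282 for p90, 1.645 for p95, and 2.326 for p99; the "times 2" rule of thumb lands between p95 and p99.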
And what I thought, idea number
1 I had, and maybe I will encourage
some students, or some people who are willing
to do something here:
what if we had some raw data from
somewhere?
I don't know who can afford logging
all queries with durations.
When we talked last week, we discussed
how dangerous it can be
to log all queries.
But imagine we collected some raw
data and we could see what's present
in pg_stat_statements.
And what I'm trying to say is, we
could understand the nature of the
distribution, maybe by observing
many, many cases from OLTP,
like web apps, mobile apps.
And then we could probably find
some heuristic-based multiplier,
right?
Which would be closer to
reality, to avoid this
assumption that it's
normal. But it's a big piece of work.
I know you'd need to collect
a lot of data, process it,
and so on.
What do you think?
Michael: Well, it would be
different for different queries,
unless we find out that it isn't.
I think it's a tricky
problem unless you actually approximate
it while you're monitoring.
Nikolay: Look, I understand what
you said.
So the assumption that
it's kind of normal
sounds to me like a fragile assumption.
But if we learn how some OLTP system,
social media or something,
with all these queries, like selecting
users, selecting something,
joining some tables, and so on, how
these queries behave, at least,
I don't know, a few dozen of them,
then this practice-learned multiplier,
I'm pretty sure, will
be very close in another system,
you know?
Michael: Yeah, actually it's a
good point.
You could get more accurate with
the number, like the exact multiplier,
to get a measure more likely to be the
true p99.
It might turn out that the number
I've suggested based on normal
distribution is actually more likely
to be a p96 than a p99.
Or p97 instead of p99.
Does it make that much difference?
No, no.
Nikolay: If it's just order by,
no.
I agree with you.
And what I'm proposing breaks Pareto
principle, right?
Michael: Yeah, it's a lot
of work for arguably not that
much additional value. I think
if someone's going to put a lot
of effort into this, it would be
much better to actually track
this and put it into pg_stat_statements.
Nikolay: This will again be an overhead
question.
It would be great to have, but
again, I understand.
What I'm proposing is going to
be outside of the p80 of the effort
we're going to have.
Or p20.
I like it.
p80 of the value, p20 of...
Okay, so another thing, another
thought I have.
I noticed over time that when I run some benchmarks, very often
with pgbench, I'm absolutely unsatisfied with...
So I use, and we have it in automation as well, and all my team
members use it, the lowercase -r option to report results at the end
for each statement used in transactions, right?
Sometimes we use multiple files with different transactions, with
some weights assigned to them.
That can be achieved with the -f option and an @ sign.
It's a great feature, by the way, underappreciated.
But anyway, I see these reports and we discuss them:
oh, this looks great; oh, this looks not that great.
But I realized it's not enough.
I want histograms.
Right.
Michael: Yeah.
Nikolay: And to mitigate it, at least somehow, we started to
use the capital -P option.
For example, -P 30 means that progress will be
reported every 30 seconds.
Oh,
Michael: interesting.
Different P notation.
Yeah.
Nikolay: Yeah, yeah.
So it will be reported as progress.
So it reports latencies every 30 seconds, for
Michael: example.
Nikolay: Nice.
Or every 10 seconds if the benchmark is very fast, or every minute
if it's longer, it depends.
And from there, we already understand: oh, the distribution is like
this, right?
So we can get a feel for the distribution.
It reports standard deviation as well, by the way, but it's
still not enough.
There is another aspect here: it's not only about the distribution,
p90, p95, it's also about the distribution over time,
because there might be a spike of latency, right?
True.
So, yeah, yeah.
But still, what I was trying to say: if there's
an idea to implement histograms or percentiles,
maybe it's better to start in pgbench.
Yeah, not in pg_stat_statements first.
Michael: Yeah, interesting idea.
I hadn't thought, I get where you're coming from.
I think it'd be easier, right?
You don't worry so much about overhead.
Nikolay: Well, you worry about
overhead, but this is a tool which
is used to make decisions and so on.
Well, pgbench also is used.
But anyway, maybe it's easier because
it's less risky.
The idea is it's not production.
So...
Michael: Are you thinking it's
more like, are you thinking educationally?
You could see the idea that this
is a useful thing, and once people
start seeing it, like, oh, this
is really helpful, why can't
I have it elsewhere?
So it kind of builds.
Nikolay: I think most people
understand that this is helpful,
but it's just risky, because pg_stat_statements
has overhead.
We know it.
And it's probably increasing, because
more and more things are
being added.
And then again, pgbench is not about
production, usually, it depends
of course.
And I think-
Michael: pgbench is not about-
Nikolay: It's not used in production,
it's a learning tool for
experiments.
Yeah, pg_stat_statements is for production,
so it's harder to bring something
heavy there.
In pgbench it could be optional,
and we'd see how it works;
it's easier, in my opinion.
And also, first of all,
it's needed there.
It's needed.
It's very useful to have it there;
it would be very useful.
Do you know if sysbench brings
histograms?
Michael: No, I don't know.
Nikolay: I don't remember.
Yeah, it brings percentiles.
Michael: Yeah.
How many buckets or which?
I don't know.
Just report a couple.
Nikolay: It brings p95.
Yeah.
And I don't know, I don't see any
more.
I don't know.
Anyway, it's already better than
just standard deviation, I guess,
right?
But I agree with your point about
ordering by, like, mean plus maybe
just 2 times standard deviation.
Michael: Okay, yeah, in the blog
post I did consider rounding
to just 2 instead of using the
number that's actually the approximation,
exactly because at
that point, I'm saying: look,
this is approximate already, why
are you multiplying by such a
specific number? But then what are
we approximating? It's like
a different p-value, and I
thought it would
just be confusing to people reading
the results, rather than the people
looking at the query.
I figured, if you
put these into dashboards and
things, you're going to have more
people reading the dashboards than
looking at the queries behind
the dashboards.
So I figured I'd rather have something
confusing in the query
than something confusing in the
column header.
Nikolay: Makes sense, makes sense,
Yeah.
Michael: So yeah, quite strange
to use such a precise figure
to calculate an approximation.
Nikolay: You know what, I will
think about it; next time we
do some reports or
dashboards, maybe we will implement
this approach.
We'll call it p-what?
Michael: I suggested calling
it approx_p99, just so that
it's really clear.
Like, you're leading with the word approximate.
Nikolay: So it's a mysterious percentile.
Yes.
Michael: And even if you don't
include it, like, it's probably confusing
to look at a dashboard that doesn't
include it in the results
but still includes all the other
columns you're going to include;
even just ordering by it is,
I think, useful.
Nikolay: Ordering, in monitoring,
means another chart:
top-N, and top-N by what? This is
the ordering.
And instead of top-N by average,
or in addition to top-N
by average?
This would be helpful to bring
a sense of how users feel about
your service, right?
Michael: I really respect monitoring
tools and reports that
try to limit the number of pages
or dashboards they show,
but most I come across don't
seem to try to limit it too
much, so I think: what's
another chart? But if you are worried
about quite how many charts you
have, I might argue, well, my argument
is I'd rather have this than the
one ordered by mean, which
pretty much all of them include.
So if you include that one, I would
rather have this one.
Nikolay: Yeah.
And do you think it's worth keeping
a chart for the average?
For the pure average?
Michael: Again, it depends on the
constraints, right?
If you've already got a hundred
charts and it's
99 versus 100, why not include
the average still?
But if you're talking about only
having 2, then 2 versus 3 is
quite a big difference.
So I think it depends on the context.
Nikolay: And this chart is going
to be regular, like non-stacked,
because you cannot sum it up;
averages are not a cumulative thing,
right?
Yeah.
Yeah, and probably it should include
only, like, the top 10 or something.
Michael: Well, it's not really
a chart, it's just like a top
end.
Nikolay: Ah, table.
Michael: Yeah, like a table.
Yeah, I think so.
Nikolay: A chart also matters; like,
again, a table is good, but it
loses historical information.
Michael: Yeah, okay, so maybe your
chart is, yeah, your most
important queries looking at their
p99 over time, but then you
can include the mean on the same
chart, can't you?
I've seen those in monitoring tools
where you're looking at the
same...
Nikolay: An interesting thing: for
the chart we can get the average
for a specific period just by dealing
with total and calls.
We have the delta for total and the delta
for calls, so we can have the pure average
for a specific period of time.
But what about standard deviation?
We don't have it.
Michael: You can calculate it using
deltas, but well, the same
way pg_stat_statements recalculates
it every time it has a new
query come in.
Nikolay: Oh, yeah.
Michael: So, indeed.
Yeah, I was looking at the source
code and thought: oh, that's
clever.
Nikolay: Yeah, yeah, yeah, yeah,
yeah, exactly.
Otherwise, yeah, exactly.
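The trick being referred to can be sketched as follows; this mirrors the Welford-style running update used in the pg_stat_statements source (one pass per execution, no per-call history), though the variable names and the sample execution times here are just illustrative:

```python
import math

def update(calls, mean, sum_var, exec_time_ms):
    """One execution arrives: update the count, the running mean, and the
    running sum of squared deviations (Welford's algorithm), as
    pg_stat_statements does for its mean and stddev columns."""
    calls += 1
    old_mean = mean
    mean += (exec_time_ms - old_mean) / calls
    sum_var += (exec_time_ms - old_mean) * (exec_time_ms - mean)
    return calls, mean, sum_var

calls, mean, sum_var = 0, 0.0, 0.0
for t in [1.0, 2.0, 100.0, 1.5, 2.5]:  # made-up execution times, ms
    calls, mean, sum_var = update(calls, mean, sum_var, t)

stddev = math.sqrt(sum_var / calls)  # population stddev over all calls
print(round(mean, 2), round(stddev, 2))  # 21.4 39.3
```

Because only calls, mean, and the squared-deviation sum are kept, the stddev can be recomputed after every execution without storing individual timings, which is why snapshot deltas can recover it for a window too.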
Michael: But you also have the
option of snapshotting pg_stat_statements.
Some people do that, right?
Reset snapshot, like some monitoring
tools or some providers
reset it regularly.
Nikolay: It's not a good idea.
Because to do it often you're going
to have troubles under load
because it doesn't come for free
to add new entries.
Michael: But then we have deltas.
Deltas is a good idea.
Nikolay: Yeah, yeah, yeah.
Well, and the standard deviation can
also be computed for a specific
period of time.
And in this case, you can have
this metric with the, like, magic, mysterious
percentile.
And you can have a chart of, for
example, 10, the top 10 queries.
Yeah, and maybe also average of
everything.
Michael: Well, it's a good point;
there's at least 1 major caveat here, and
I don't even know if I mentioned
this in the blog post:
a lot of people have a timeout,
often like 30 seconds, like statement_timeout.
A query that hits it won't be registered.
So it does quite seriously affect
those metrics, like the max,
the standard deviation, and
Nikolay: Affects in which way?
It's not registered, so you miss
it there.
Michael: You miss it.
You miss an outlier.
So it's bringing the average down.
It's bringing the p99
Nikolay: looks more positive than
it is.
Yes.
But you know, if a query was
cancelled, the job is not done,
so we don't count it.
As I remember, this is another
point pg_stat_monitor aims to solve.
Yes.
Like, it's a common problem with
pg_stat_statements and all its
friends, like pg_stat_kcache, that
cancelled queries are not counted.
And I think also a big missing
piece here, honestly, is that
in pg_stat_database we have TPS,
2 metrics, committed and rolled back,
right?
But we don't have QPS, we don't
have latencies or standard deviation;
they should be there, actually.
This is another idea: to register
these metrics, calls,
mean, max, min,
and standard deviation,
at the database level, for everything,
regardless of the statements.
And maybe involving errored-out
queries. Maybe.
I'm not sure about this, because
if we involve them,
the values observed by pg_stat_statements
and at this high level
won't match at all.
Michael: Yeah.
I was unsure what ideal looks like
here.
I did wonder, in an ideal
world, would you add the timeout?
Like, would you add a query at
30 seconds?
Let's say 1 got cancelled at 30
seconds.
Would you add an instance of a
query that ran 30 seconds, and
add that to the average, add that
to the standard deviation?
Would you want that or would you
not want that?
Nikolay: It's easy.
Let's just have another pg_stat_statements.
We'll call it pg_stat_error_statements
and have a whole set of
metrics.
I'm joking, of course.
Michael: Well, it's not a bad idea,
though, right?
Like how many of each error type
do we have?
Like it's
Nikolay: kind of cool.
And could it be cancelled before
statement_timeout?
It can be cancelled due to deadlock_timeout, for example, or just
something else happens, I don't know.
A kill.
A term signal, a pg_cancel_backend, for example, came and cancelled us.
So yeah, and it would be great to count all those things like
buffer hits, reads,
Michael: and so on.
So yeah, that's the long and short of it.
Was there anything else
Nikolay: you wanted to talk about?
I feel we went to some very dark corner of Postgres observability,
you know?
Michael: Yeah, yeah, yeah.
It definitely seems less explored than I was expecting.
As your daughter said, this is not novel, this is not
an out-there revelation.
I was just surprised when I saw the feature request for it, and
I've heard it a few times; I was thinking,
why don't more people have that in their queries?
Why don't I see this in blog posts or in monitoring tools?
Nikolay: And also for buffer operations.
And WAL operations.
Michael: Well, I know why we don't have that.
It's because the standard deviation is not even available there.
Nikolay: Well, it should be available.
Let's add it.
Michael: So, yeah, my summary is I'd love to see these in pg_stat_statements.
But in the meantime, like you mentioned, if you're doing a consulting
report or if you've got a dashboard internally consider adding
this like or replacing the ordering by mean.
Nikolay: Just don't call it fake p95, call it mysterious p-n.
Michael: Fake news p99, yeah.
I liked your mysterious, what was
it, MMP?
Nikolay: Magic P.
Michael: Yeah.
Nikolay: Okay.
Michael: Yep, absolutely.
Nikolay: Good.