Michael: Hello and welcome to Postgres.FM,
a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard,
and this is my co-host Nikolay,

founder of Postgres.AI.

Hey Nikolay, what are we going
to be talking about today?

Nikolay: Hi Michael, let's talk
about planning time and what

can increase it, potentially.

Michael: Nice. And this was your
suggestion.

Why is it something you're thinking
about at the

moment?

Nikolay: Well, we keep making circles
around this topic all the time,

because basically almost all of
our customers, which are mostly

fast-growing startups, reach the
point where partitioning is needed,

where LockManager contention happens,
and where the number of indexes

becomes a concern. And many of them
still don't use prepared statements,

mostly because PgBouncer didn't
support them until very recently, right?

This means that planning time, having
gone basically unnoticed for quite

a long time, suddenly becomes an issue.

And we often show them, oh, it
looks like planning time is an

issue in these cases.

And actually, with partitioning
there's always a question, especially

for time series: how big should
partitions be? Which can be translated

to how many partitions you will have.

And if you partition not well,
you may end up having many thousands

of partitions.

They will be very small.

But if you don't have prepared
statements, you can end up having

a huge problem with planning time.

And planning time can exceed execution
time for simple primary

key lookups, right?

This is not fun.
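
For illustration, this is the kind of
picture you might see on a heavily
partitioned table without prepared
statements (table name and numbers
are hypothetical):

    EXPLAIN ANALYZE
    SELECT * FROM events WHERE id = 12345;

    -- Index Scan using events_p0042_pkey on events_p0042
    --   (actual time=0.035..0.037 rows=1 loops=1)
    -- Planning Time: 42.318 ms   <- dominates
    -- Execution Time: 0.080 ms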

Michael: Yeah, it's not the only
case of that, right?

But it definitely makes sense that
it's a common one.

Nikolay: Yeah. Basically, last
week we discussed with a couple

of companies how to properly implement
partitioning.

And this question was in the center,
what size of partitions

to choose?

And the rule of thumb is like,
don't let them be larger than

100 gigabytes, right?

Because this is the golden rule
of partitioning.

It's time to apply partitioning
when there's a risk a table will

exceed 100 gigabytes.

But it also can be applied to partitions.

Also, they should be balanced in
terms of size.

This is usually a good idea to
do.

But on the other hand, if you make
them tiny, you end up having

a lot of them and it's not good.

Why is it not good?

Because of planning time, mostly.

This is not the only reason why
you don't want a lot of partitions,

but this is one of the biggest reasons.

If you have, for example, 10,000
partitions, you might have

dozens of milliseconds or like
maybe a second of planning time

for each query.

If prepared statements are not used,
every time a SQL query is

executed, planning happens
in Postgres, right?

Michael: Yeah, and you're talking
about the best case.

Planning time can expand depending
on the complexity of the query.

You're talking about for the simplest
possible query.

Nikolay: Very simple.

B-tree index scan.

Michael: Yeah.

Is it worth us explaining what
are we talking about?

What is planning time?

Nikolay: Yeah, let's do it.

So when a query is executed, there
are stages, right?

Do you remember the stages?

Planning means, well, the planner
relies on statistics

and considers...

The goal of planning is to
choose the best plan as fast as

possible.

It might be wrong.

Very often it's wrong, actually.

Not very.

It happens.

For me, it's every day that I deal
with such cases, because I usually

see problems, not ideal situations.

This is when my team and I get
involved: when something is wrong.

So it needs to choose the kind of
optimal plan as fast as

possible.

And then when a plan is chosen,
execution happens.

Executor executes according to
the plan.

And there are like many, many,
many possible plans.

Many possible plans.

Sometimes I think an infinite number
of possible plans.

Right?

Michael: Well, it depends, right?

Like If you're doing a single table
select and there's no indexes,

there is maybe only one possible
plan.

But once you include...

Nikolay: What about parallelization?

Then there's a couple of possible plans,
right?

Michael: Yeah, but all I mean is
as the...

It gets important, like, for example,
when you start joining

multiple tables and you start to
realize that you have multiple

scan choices, multiple join orders,
multiple join types that

you could...

Nikolay: Multiple indexes are present.

The planner needs to choose among
them properly.
So, yeah.

Michael: And these things are combinatorial.

These things multiply to create
the total number of possible

plans.

And as the number of joins, for
example, increases,

that can explode exponentially.

So there are some safeguards in
Postgres, too.

Nikolay: There's also a genetic
optimizer, right?

Michael: Yes.

So because it can...

Well, I guess the simple version
is that the planner isn't always

looking for the best plan in those
cases.

It's sometimes looking for a good
enough plan in a reasonable

time frame.

Because if we're taking way longer
to plan the query, if we take

multiple seconds to plan a query
that's probably going to execute

in a few milliseconds, it's probably
a complete waste of time.

So there's a balancing act here
for complex queries.

But for simple queries, it doesn't
make sense to be paying dozens

of milliseconds for a 0.2 millisecond
primary key lookup.

Nikolay: Right. So the process
of finding the ideal plan is called

optimization, plan optimization.

And the planner deals with basically
several things.

First of all, decisions of course
heavily depend on the version

of Postgres you use, because improvements
are being made all the time.
Second of all, it uses the statistics
currently present in Postgres.

That's why it's important to keep
autovacuum tuned so it maintains

the statistics.

And this is not only pg_statistic,
it's also the estimated number of

tuples and pages in pg_class, reltuples
and relpages: the number of tuples

and pages the table or index has.

This is super important.
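
For reference, these planner inputs
are easy to inspect; a minimal sketch
(table name hypothetical):

    -- Row and page estimates the planner reads from pg_class:
    SELECT relname, reltuples, relpages
    FROM pg_class
    WHERE relname = 'events';

    -- Column-level statistics from pg_statistic, via the readable
    -- pg_stats view:
    SELECT attname, n_distinct, null_frac
    FROM pg_stats
    WHERE tablename = 'events';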

And finally, it also takes into
account the settings Postgres

currently has.

An interesting point, as I've also
mentioned many, many times, and those

who have listened to us for a while
know it, right?

It's interesting that the planner
has no idea about the real

physical capabilities of your machine.

It doesn't know what type of processor
you have, how many cores,

how fast disks are, or how many
gigabytes of RAM you provided.

It has no idea.

It relies on the settings.

First of all, costs:
random_page_cost, seq_page_cost,

cpu_tuple_cost, and so on.

And also settings like work_mem,
and an estimate of available

memory called effective_cache_size.

And you can put any number there,
even if you don't have those

gigabytes of RAM.

This is bad because if you forgot
to tune your server, Postgres

might not know that you have a
lot of RAM, like a terabyte of

RAM, but Postgres still thinks
you have only 1 gigabyte, for

example.

It's easy to be in this situation
and it's bad.

But at the same time, it's good
for us who deal with a lot of

experiments, because it means that
we can fool Postgres and

say, oh, you know, we have the
same number of gigabytes as production

has.

And this helps us to reproduce
the problem in lower environments

when we have a much weaker machine.

This is great for experimentation.

So good and bad.

So that's it.

This is everything that matters
when optimization happens, when

the planner chooses the plan the
executor will be using.
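
As an illustration, these planner inputs
can be inspected and, for experiments,
overridden per session; a minimal sketch
(values are illustrative, not
recommendations):

    -- Cost constants the planner relies on (defaults shown):
    SHOW random_page_cost;      -- 4
    SHOW seq_page_cost;         -- 1
    SHOW cpu_tuple_cost;        -- 0.01

    -- Memory-related inputs:
    SHOW work_mem;              -- 4MB by default
    SHOW effective_cache_size;  -- 4GB by default

    -- For experiments, override per session, e.g. to mimic a
    -- production box with far more RAM than the test machine:
    SET effective_cache_size = '512GB';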

Michael: Yeah, anybody that's familiar
with Explain, I've had

success describing planning time
as the time it takes Postgres

to come up with that explain plan.

That's what it's doing.

And you can even see this.

By default, EXPLAIN on its own
doesn't show planning time.

But if you run EXPLAIN with the
SUMMARY parameter,

it will show you planning
time as part of that.

So it's quite a cool way of thinking.

Yes, exactly.

At the bottom.

Nikolay: Or if you ask the executor
to be involved and you run EXPLAIN

ANALYZE, then SUMMARY is
turned on automatically, right?

Michael: Yeah.

And you'll see the planning time

and the execution time then.

And the planning time is roughly
the time it takes to run EXPLAIN

on its own, and planning plus execution
time is roughly the time taken

to run EXPLAIN ANALYZE.

So that's quite a cool way of thinking
about the difference.
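
In SQL terms, with the same hypothetical
table (timings illustrative):

    -- Planning time only, without executing the query:
    EXPLAIN (SUMMARY)
    SELECT * FROM events WHERE id = 12345;
    -- ...plan...
    -- Planning Time: 0.210 ms

    -- With ANALYZE, SUMMARY is on automatically, so both appear:
    EXPLAIN ANALYZE
    SELECT * FROM events WHERE id = 12345;
    -- ...plan with actual timings...
    -- Planning Time: 0.215 ms
    -- Execution Time: 0.045 ms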

Nikolay: Yeah.

And worth mentioning, again, those
who have listened to us for long

know it very well: the word buffers.

It was implemented a few versions
ago, in Postgres 13, I think.

You can use EXPLAIN (BUFFERS), even
without ANALYZE, for only the

planning phase, and you can see
how many buffers were involved

in planning.

And this is super cool, because
usually bad timing is because

of a lot of I/O, which is buffers,
right?

I/O is the number one reason why
Postgres can be slow at anything.
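
A sketch of what that looks like on
Postgres 13 or newer (output
illustrative):

    EXPLAIN (BUFFERS, SUMMARY)
    SELECT * FROM events WHERE id = 12345;
    -- ...plan...
    -- Planning:
    --   Buffers: shared hit=1528
    -- Planning Time: 18.744 ms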

Michael: Yeah.

And you might think like, what's
Postgres reading during planning

time?

Nikolay: Yeah.

Yeah.

Actually, you know what, I described
a picture which is maybe

true in 99% of cases, but I now remember
a case where something

else was happening.

It was some interesting query.

I don't remember how many tables,
but definitely it was a join,

maybe a couple of joins.

And I remember planning time was
completely out of normal.

It was like half a minute or so.

Michael: I think I remember this
case.

Was it the merge-join?

Nikolay: Exactly.

Yeah.
Well, the chosen plan was not a merge join.

Merge join was not considered the
best plan, it was not chosen,

but it was the reason why planning
was so slow.

Considering the merge join path, the
planner used not only what I

just described, but additional
information.

It tried to find the minimum and
maximum values for some subsets

of data: the edges, real values
from the table.

So during planning,
it needed real values.

And this, due to some reason, was
super slow.

So planning took many, many,
many seconds, exceeding statement_timeout

and failing.

It was extremely hard to diagnose,
actually.

And I actually...

Michael: Do you remember why?

Nikolay: Yeah.

Why what?

Michael: Why it was hard to diagnose?

Nikolay: Because EXPLAIN doesn't
show other options.

It shows only the winner.

Michael: Well, and also extra reasons
it was difficult.

I remember talking to you at the
time, auto_explain doesn't show

planning time, and therefore, when
you're trying to diagnose...

It's not

Nikolay: a problem for us because,
well, we didn't use it at

that time for this project.

We didn't use auto_explain at all,
but we had DBLab Engine installed

and I knew when these problems
started happening, so I quickly

rewound, created a snapshot and
a clone.

So yeah, I worked on the clone
and reproduced the problem easily.

Interestingly, like an hour later,
it was different.

The statistics changed somehow
and the planner stopped considering

the merge path, the merge join path.

Yeah, so it auto-healed, but what
happened an hour earlier was super

scary, right?

But thanks to DBLab Engine, we
easily reproduced it.

So I agree with you: auto_explain
not showing planning time

at that time was an issue, but not
in this particular project.

What was an issue: you reproduce
it, you see super slow planning

time, but you have no idea what's
happening.

And you try to understand... if only
we had a way to see the second-best,

third-best plans.

In this case, yeah, the second best
probably would be merge join.

It lost eventually, but while
considering it, the Postgres

planner spent a lot of time,
because it needed real values.

I think what helped in that case,
actually, is a friend's advice,

because Kukushkin first said: have
you tried SET enable_mergejoin

to off?

This is weird.

But once I did it, planning time
went to almost zero.

Like, boom.

It was an interesting case.
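
For the record, the workaround was
simply this (the code fetching actual
values lives in the planner's selectivity
functions, in get_actual_variable_range(),
if memory serves):

    -- Penalize merge joins for this session only, so the planner
    -- discards that path almost immediately:
    SET enable_mergejoin = off;
    -- ...run or EXPLAIN the problematic query here...
    RESET enable_mergejoin;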

And of course, when you try to
dig deeper, you can use perf

or eBPF and understand what's happening
under the hood, and in

which functions Postgres spends
time.

And you quickly find the merge join
path.

And then read the code, because
it's open source, this is cool.

Read the code and understand that
Postgres, when costing a merge join,

sometimes needs real values from
the table.
Wow.

It was exotic, honestly.

Michael: Yeah, that is exotic,
but it's also very helpful in

understanding how the optimizer
is working.

Because it isn't exhaustively calculating
every plan that is

possible.

It can give up on plans once it
gets to a sufficiently high cost.

Because the way enable_mergejoin =
off works is by penalizing

merge joins.

It's not saying you can't ever
have a merge join.

It penalizes them cost-wise, at which
point the planner gives up

on that path.

Nikolay: Yeah, it discards it very
quickly because the cost is already

super high.

Michael: But I don't think that's
how everybody thinks of how

the planner works.

They think it's going all the way
to the end of each option.

But it can abort early.

And this is one of those trade-offs
it's making.

It will sometimes make a mistake
because it seems like it's gonna

be an expensive plan, but then
it isn't.

So it isn't always looking for,
in my opinion, the fastest plan,

it's looking for a fast enough
plan.

Like that's quite like, well, maybe
I'm wrong in that.

Like it is looking for the fastest,
but it uses these shortcuts

to avoid spending excessive time
to do so.

Nikolay: Yeah, yeah, exactly.

As usual, I like exotic and hard
problems, but let's maybe switch

to simple things.

Michael: Yeah, good idea.

Nikolay: Yeah, you sent me a very
good post by depesz, Hubert

Lubaczewski, depesz, right?

And it was 3 years ago.

He was curious: is it really true
that Postgres doesn't allow

more than 65,000 partitions?

And it turned out it's not so.

He reached 100,000 partitions.

But at the same time, Hubert checked
the planning time: how exactly

planning time depends
on the number of partitions.

And this is back to

Michael: the initial question,
the initial reason you were looking

into planning time.

Nikolay: Exactly.
Yeah.

We don't want a lot of partitions,
many thousands of them, because

planning time can be already high
— dozens of milliseconds, maybe

hundreds of milliseconds, and so
on.

Basically, the growth is linear.

If you double the number of partitions,
you double planning time.

Likely.

So of course it depends on the
query you use, I guess.

But the simple rule is just if
you have a lot of partitions,

you have bad planning time.

It means we need to be...

I also think Postgres versions
matter a lot here, many optimizations

were made, right?

Michael: Well, I'm aware of one.

So, depesz's test: he says in the
comments that he tested on

version 14, which is important
because there's a commit in version

15, from a couple of years ago,
from David Rowley, which in

the release notes said: improve
planning time for queries

referencing partitioned tables.

This change helps when only a few
of many partitions are relevant,

which is the case here where we're
doing a primary key lookup.

Like, if we're just searching for
a single row in a single partition,

then of all the 100,000 partitions,
only one is relevant to

this query.
We can prune the rest at planning time.

In those cases, planning time should
be much reduced as of version

15 onwards.
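
A minimal sketch for checking this
yourself (names hypothetical):

    CREATE TABLE measurements (
        id         bigint,
        created_at timestamptz,
        PRIMARY KEY (id, created_at)
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE measurements_2024_01 PARTITION OF measurements
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    CREATE TABLE measurements_2024_02 PARTITION OF measurements
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
    -- ...repeat for as many partitions as you want to test...

    -- With the partition key in WHERE, only one partition survives
    -- pruning; watch Planning Time as the partition count grows:
    EXPLAIN (SUMMARY)
    SELECT * FROM measurements
    WHERE id = 1 AND created_at = '2024-01-15';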

Nikolay: Well, you know, I used
a different table structure and

queries, but it was also a primary
key lookup.

But basically, I reproduced the
same experiment using our AI

system.

I'm just checking right now: for
10,000 partitions, depesz

had like 40 milliseconds or
so, right?

30, 40 milliseconds.

It's pretty much the same as what
we observed with our AI system.

But I also used so-called shared
environment experiments, so

I was not alone on the machine.

So that's why I involved buffers
in my analysis.

Because this linear time growth,
according to my benchmarks,

is explained by linear growth of
buffer hits associated with

planning.

So if you have 10,000 partitions,
it might be several dozen

milliseconds.

So what do you think?

What are you looking for?

Michael: I was looking at depesz's
post just to see if he gave

enough details around how, like,
did he pick the best of, you

know, sometimes when you're doing
this kind of experiment, you

would run the query a few times
and pick the fastest or the average.

He

Nikolay: did.
Several.

I didn't.
Yeah.

He used several, not many, I think
3 or maybe 5 runs only.

I used only one run for now.

I think it's a good idea to convert
this into a more full-fledged

experiment, maybe on a separate machine,
and maybe even with pgbench,

with many, many concurrent SELECTs.

The idea is, again, that it's about
buffers: many, many buffer hits

for planning, and this explains
the growth of planning time.
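
Something like this would do it; a
hypothetical custom pgbench script,
with the shell invocations left in
comments:

    -- pk_lookup.sql (pgbench custom script; table name hypothetical)
    \set id random(1, 1000000)
    SELECT * FROM events WHERE id = :id;

    -- Then compare query protocols:
    --   pgbench -n -f pk_lookup.sql -c 16 -T 60 -M simple   testdb
    --   pgbench -n -f pk_lookup.sql -c 16 -T 60 -M prepared testdb
    -- With -M simple, every execution pays the planning cost again;
    -- with -M prepared, planning is amortized across each
    -- connection's executions.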

Michael: Yeah, in your experiment,
I like how clearly

linear the chart with buffers is,
whereas the one with timing

looks like it might
be linear, but there's enough

noise that it's tricky to see.

Although there are a couple of
dips in the buffers chart, so

there's something interesting going
on there.

Nikolay: Yeah, I'm also interested.

We also discussed before this call
that it really matters whether

it's the first query after you just
connected, even without

prepared statements involved.

If you run the very first query,
you pay an extra penalty, because

in this connection, information
from the system catalogs is not

cached at all.

And this adds extra
buffer hits during planning.

Michael: Yeah, that's what we thought
it was.

I don't know for sure.

And it seems really interesting
like what it could be that's

going on there.

The reason I thought it might be
that is I've seen in the past

when running experiments, if I
run ANALYZE (nothing to do with

EXPLAIN in this case, just gathering
new statistics for the

table), I've seen planning buffers
spike on the next query execution

and then come back down on the
second execution.

So that makes me think it might
be statistics related.

But yeah, I've seen VACUUM drastically
reduce buffers as well,

not VACUUM ANALYZE, just VACUUM
on its own, and definitely not

VACUUM FULL.

So whilst buffers are much more
reliable and stable than timings

for this kind of experiment, there
are a few things that can

affect buffers still.

Nikolay: Yeah, yeah, yeah.

So anyway, I couldn't reproduce
the VACUUM thing and the ANALYZE

thing, but I can imagine that with
a complex query, rebuilding statistics

leads to many more hits of pg_statistic,
for example, of the newly

collected data.

Especially if you have a lot of
buckets, if default_statistics_target

is increased and so on.

It's a lot of additional data to hit.

I don't know, it's interesting.

But anyway, back to partitioning.

The current recommendation would
be, let's not allow the number

of partitions to grow to many,
many thousands, unless we are

okay with increased planning time.

We can be okay with increased planning
time if we use... well, first

of all, a pooler matters.

The situation where it's the
very first query in a just-established

session should be very rare, right?

It's good if it's really rare,
and a pooler helps.

Even an application-side pooler helps.

And second, of course, prepared
statements just help us shave

off a lot of planning time.
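
At the SQL level it looks like this;
most drivers do the same thing via
the extended protocol (names hypothetical):

    PREPARE pk_lookup (bigint) AS
        SELECT * FROM events WHERE id = $1;

    EXECUTE pk_lookup(12345);  -- the first few runs get custom plans
    EXECUTE pk_lookup(67890);  -- then Postgres may switch to a cached
                               -- generic plan, skipping most planning

    -- Postgres 12+ lets you force the behavior:
    SET plan_cache_mode = force_generic_plan;  -- use with care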

You remember, Dirk from Adyen discussed
how even if you use

prepared statements, there still
might be interesting problems,

and we had a very interesting episode
number 100 about that, right?

It was an interesting discussion,
but that's already an advanced

topic. At bare minimum, it's
good if you know planning time

is going up.

It's good to consider prepared
statements, and PgBouncer

supports them now, so that's great.

And I must say, like, dozens of
milliseconds is already super

slow overhead for planning.

If we aim to have all queries fast...
fast for me means in OLTP,

as usual.

This was episode number 1:
what is a slow query?

A slow query is one that exceeds 100
milliseconds, 200 milliseconds,

because of human perception:
at 200 milliseconds,

most people already see it's quite
slow.

And if one HTTP request consists of multiple SQL statements, this

means we should be below, like, 10 milliseconds already.

But if planning time adds 50 milliseconds, for example, if we

have 20,000 partitions, for example, wow, it's not a good situation

to be in, right?

So prepared statements should be a good tool to fight this

problem.

Or just keep partition numbers low.

Michael: Well, a few times you've said, and I completely agree

with you, that our primary server's CPU is our most limited resource,

and planning time is CPU.

It's a pure use of that, and if we can shave it, why not?

Nikolay: It's CPU, but I insist it's again buffers.

It's just buffer hits; it's related to memory. But yes, it's

CPU, yes.

But still, it's I/O.

I consider I/O in a wide sense: when we involve communication

with memory, I also consider it I/O.

Not disk I/O, of course, in this case.

Most of the time.

It can be buffer reads as well, but in rare cases.

Sometimes we see buffer reads involved in planning as well.

Michael: You mean like shared read rather than shared hit?

Nikolay: Yeah, I call them shared buffer reads because...

Michael: Shared...

Nikolay: Yeah, we discussed it.

I consider this bad naming because it's I/O operations, not an amount

of data.

Right?

Right.

Yeah.

So, what else?

Indexes, right?

If we have a lot of indexes, planning time keeps increasing and

increasing as well.

And I suspect it's not even linear, but it's worth double-checking.

So if we have, for example, 10 indexes, that's some penalty, and

planning time for a primary key lookup can already be around 1

millisecond, or half a millisecond.

But if we go, like if our table has many columns, we are tempted

to have dozens of indexes.

And that's really bad.

And also, not only does planning time itself increase due to buffer

hits (again, buffer hits also grow here), but also locking.

When planning happens, Postgres applies an access share lock to

each object involved.

All indexes of a table are locked with access share locks.

And this is quite expensive, I would say.

If you have thousands of queries per second for this primary

key lookup, it might be a problem,
especially if you exceed 15

indexes, because the total number
of objects locked then goes above

16, which is the threshold: above
16 relation locks, Postgres

cannot use fast-path locking in
the LockManager anymore.

And we discussed this as well.

And this is the number.
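
This mechanism is visible directly in
pg_locks, which has a fastpath column:

    -- How many relation locks took the fast path vs the main
    -- LockManager (run while the workload is active):
    SELECT fastpath, count(*)
    FROM pg_locks
    WHERE locktype = 'relation'
    GROUP BY fastpath;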

Michael: Is that number being increased
in newer versions?

Nikolay: I'm not sure.

There are discussions to make these
things configurable.

Actually, I think I raised this
question.

It turned out not to be so simple.

I'm not sure.

I think in Postgres 17, nothing happened
with this.

So what helps here?

Like, add more CPU, more cores,
or decrease the number of indexes,

or again, prepared statements, right?

Prepared statements.

What else?

What else helps in this
situation?

Yeah, I don't know.

Michael: Maybe even denormalizing.

If you've got a table with 15 indexes,
why do you have 15 indexes?

Nikolay: Or, this also helps: decrease
the QPS of this kind with caching

on the application side.

Say you had 5,000 queries per second;
I see it all the time.

On every click, the application
reads from the users table, checking

who this is.

Nothing changed.

So cache it.

Why are you reading this all the
time?

Or Rails: recently we discussed
this with one customer.

Rails code is constantly bombarding
Postgres system catalogs,

checking pg_class, pg_attribute,
and so on.

Checking the schema all the time,
like thousands of times per

second.

Why?

On replicas, nothing changed.

Well, it's changing, but not that fast.

So there's a lack of caching everywhere,
and this can quickly become a

problem, especially if you don't
use prepared statements, again.

But it's interesting that these
problems can combine and

amplify.

If you have, say, 10 indexes, it's
enough to have 3 partitions.

Even 2 partitions is enough, if
you haven't checked partition

pruning and some queries involve
both partitions.

You already have a situation where
22 objects need to be locked

with access share locks, and LockManager
contention might strike

you.

Jeremy Schneider had a great overview
of various work in this

area from various companies.

It's partly about execution time,
but mostly about planning time,

because planning time is worse:
this is when Postgres

locks all the objects.

During execution, it locks only
what is really needed.

So yeah, in the worst case, you have
partitioning, maybe not many partitions,

but also indexes, and then a lot
of QPS, and it can bring your

server down completely because
of LockManager lightweight lock

contention.

Michael: Yeah, I think you've talked
about people who could be

doing things for good reasons and
still getting caught out.

But I think there are also a couple
of cases, like around

partitioning, where people run a
lot of queries without the partition

key being supplied in them.

That kind of thing can...

Nikolay: How to catch those situations?

What would you do?

How to verify that?

For example, imagine we've already
prepared a good partitioning

approach, with a zero-downtime
conversion, everything.

And we need to double-check that
all queries will have partition

pruning involved.

How to check it?

Michael: Holistically...
Good question.

Well, actually, that brings me
back to an idea.

We've got pg_stat_statements, and
we can now look at some planning-

related data as of version 13.

It was a big, big release.

Nikolay: It's not on.

You need

Michael: to enable it.
Yeah.
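
A sketch of turning it on and using it,
assuming pg_stat_statements is already
preloaded:

    -- in postgresql.conf:
    --   pg_stat_statements.track_planning = on

    -- then the planning columns added in Postgres 13 are populated:
    SELECT queryid,
           plans,
           round(mean_plan_time::numeric, 2) AS mean_plan_ms,
           round(mean_exec_time::numeric, 2) AS mean_exec_ms
    FROM pg_stat_statements
    ORDER BY total_plan_time DESC
    LIMIT 10;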

Why is it off?

Nikolay: Good question.

Since it's off, many people, including,
actually, myself, still

have some fear that the observer
effect might strike in

some cases.

So yeah, honestly, yesterday we
discussed with 1 of our clients,

we discussed that we should definitely
consider enabling it.

It's a good thing to have.

Michael: It would be good to hear
what the observer effect is

in that case.

Nikolay: I think we had experiments
checking this with synthetic

benchmarks, and this overhead was
very low, but it doesn't

eliminate... synthetic benchmarks
don't eliminate all the fears,

right?

Michael: Of course, of course.

Nikolay: So yeah, I'm curious whether
people have planning time...

Well, I know heavily loaded projects,
huge clusters, with planning

time enabled.

Michael: So you mean pg_stat_statements.track_planning?

Nikolay: Not planning time, sorry.

Yes, yes.

It's not only time, it's also I/O,
right?

I mean, buffers for the planning phase.

They are fine, but I'm curious
about the opposite: people who were

not fine and decided to keep it
off. Otherwise I will call it

another default bullshit.

Michael: Right, and it's...

Nikolay: We have a collection of
them, like random_page_cost or

what.

Michael: I think this one's even
worse, because pg_stat_statements

isn't on by default.

So people already have to turn
that on.

And they

Nikolay: already agreed to pay the
overhead price.

Michael: Yeah, but also it means
that in a default installation,

like, if Postgres shipped with this
parameter on by default,

it would still be off, because
pg_stat_statements isn't

on.

So I think this one might be one of
the easier ones to change.

Nikolay: Yeah, yeah.

Well, we need to check the discussion.

I might ask our AI to check what the
discussion was and provide a summary.

Michael: But you asked how to monitor
for that.

I would expect to see issues there
quite quickly.

Nikolay: I'm interested in such
issues.

So anyway, I agree this is important,
observability for planning

overhead.

Michael: Also just P95, P99, like
query latencies, I would expect

to suffer.

Nikolay: But I think also I have
in mind an approach we discussed

with a couple of clients actually
over the last few weeks.

If you enable partitioning, it's
worth checking with auto_explain

in lower environments: enabling
it for everything and just having

it enabled while exercising the functionality.

It's about testing coverage, right?

If you cannot reproduce the whole
workload, of course, there

are chances that you miss something.

But it's good, then you can just
analyze logs and see if there

are plans which involve multiple
partitions.

But you need to partition properly
as well.

You need to have multiple partitions
for all tables which are

partitioned.
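
A sketch for a lower environment
(session-level here; the same settings
also work in postgresql.conf):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log every statement's plan
    SET auto_explain.log_format = 'json';   -- easier to analyze in bulk
    -- then scan the logs for plans touching more than one partition
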
Michael: Yeah, true.

And if you were doing this and
you had thousands of partitions,

that would be a lot of information
in the logs.

And still no planning time information.

Nikolay: Yeah, yeah.

But we are interested in the plan
structure in this case.

And in lower environments, if it's
a smaller

database, for example, it's not
thousands of partitions.

For example, we say, okay,
we want to have at least 10 partitions

for each partitioned table, and then
we just check the plans, and with

some analysis, we
analyze them.

And it's a separate question how
to analyze them, but anyway,

we can holistically find bad
plans.

But I like your idea about pg_stat_statements
as well.

I wish it had, you know, more
information about plans,

not only timing and buffers, but,
for example, how many objects

were locked or involved
in the plan, or, I don't

know, some additional things
here.
For example, just was partition
pruning involved?

How many times was it involved
in plans?

You know, there's a column called
plans: how many plans.

Something similar for partition
pruning is the question.

Because partitioning is hard.

It has so many surprising aspects,
I would say.

So many surprising aspects.

Michael: I have some sympathy with
them.

I think pg_stat_statements already
has a lot of columns.

Nikolay: Yes, I agree.

Still not enough.

We have interesting work happening
in the area of monitoring,

and we discussed not only pg_stat_statements,
but query analysis.

And the holistic table of query
analysis is so wide, because

we take every metric and have 3
derivatives for each metric:

time-based differentiation, calls-based,
and also percentage,

except for the column calls.

For that column, it doesn't make
sense:

calls per calls is always 1.

So it's like 4 times more, minus
1.

It's so many columns to present.

Yeah, we don't need them all the
time, but I can find cases where

almost every
one of them is useful.

So query analysis becomes harder
and harder, wider and wider

in terms of how many metrics we
have.

And it's still not enough.

That's it.

So yeah, good.

I think we covered this topic quite
well.

Michael: I have one last question
for you.

Do you ever change people's
from_collapse_limit or join_collapse_limit?

Nikolay: Oh, interesting.

Honestly, over the last, for example,
12 months, I don't remember

such cases at all.

Somehow.

In the past I had cases where we
considered it, and I think we even

applied it, but it was so long
ago.

Michael: I think the default is
quite good.

The default for both of them is
8.

I think it works pretty well.

Nikolay: It works pretty well.

And also, I don't know, when
I see 12 tables involved in

a plan's joins, I'm already like,
oh, the genetic optimizer is

already here, right?

It has to hit the threshold
at 12, no?

Michael: Yeah, I only learned this
today.

But I think because of from_collapse_limit
and join_collapse_limit,

GEQO never kicks in.

Nikolay: Wow, okay.

I have old memory.

Okay.

Michael: Me too.

Like, well, or maybe I misunderstood
originally. But unless you

raise those above the geqo_threshold...
This is my new understanding

based on reading the docs:

unless you raise them above the
geqo_threshold, it will never

kick in.
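
For reference, the relevant knobs
and their defaults:

    SHOW from_collapse_limit;  -- 8 by default
    SHOW join_collapse_limit;  -- 8 by default
    SHOW geqo_threshold;       -- 12 by default

    -- With both collapse limits below geqo_threshold, join problems
    -- normally stay too small for GEQO to engage; raising them can
    -- change that, and can balloon planning time:
    SET join_collapse_limit = 16;  -- illustrative, not a recommendation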

Nikolay: Well, yeah.

Somehow recently I mostly deal
with different problems, where only

a few tables are involved in the plans.

Makes sense.

Yeah, but my experience is not
everything, right?

So I'm sure there are cases where
the collapse limits matter and are

worth considering; it just didn't
happen to me recently.

Michael: No, I was just interested
from, like... it's one of the

things we look out for with high
planning time. It's one of the things

we recommend looking at, in case
someone has tweaked them. For

example, if somebody's increased
them too much, that can massively

balloon planning time. Interesting.
But I wanted to know, like,

in the real world, for you,
is it something you see much?

It sounds like not.

Nikolay: Yeah, cool.

Well, there might be more cases
we haven't even seen yet.

Michael: Oh, of course, always.

Nikolay: Always, yeah.

So because Postgres planner is
complex.

It's not as complex as SQL Server
or Oracle planners.

Michael: Not yet, no.

Well, hopefully never.

Nikolay: Already complex enough,
I would say.

So yeah.

Good.

Oh, by the way, do you know, if
we use pg_hint_plan...

Michael: Oh yeah.

pg_hint_plan.

Nikolay: Right.

Because we tell the planner what
to do directly.

Does it help fight high planning
time?

Michael: I don't know.

Nikolay: Good question.

It would

Michael: make sense that it would,
but I don't know where it's

hooking in.

Nikolay: Yeah.

Yeah, I don't see it used in production
at all, only in non-production.

So I also

Michael: don't see.

Okay, good.

Question to our listeners.

Nikolay: Yeah, it's homework.

Michael: And also, the other thing
I wanted to ask is whether anybody is

using some of the more complex
extensions that hook into planning,

like, for example, Citus or Timescale.

Yeah, I've seen more planning time
issues there.

But it just feels like a coincidence
at the moment.

I haven't looked into whether it's
definitely causal rather than

just, you know, what's the correlation
versus

Nikolay: causation.

But

Michael: yes, that's the one.

So I don't know if it's just coincidence,
but I tend to see more

planning related issues with the
more complex extensions.

Nikolay: Maybe some extensions
help improve planning time, not

make it worse.

Michael: Well, you mentioned pg_hint_plan;
that would be the one that would

make most sense for that.

Nikolay: Well, yeah, it's just
a random guess.

I'm not sure at all.

Worth checking.

Michael: Me too.

Well, let us know.

There are definitely people out there
who know more

Nikolay: about it.

Let's do a summary.

So: try to avoid a high partition
count, try to avoid a lot of

indexes, and especially try to avoid
the combination of these 2

problems; and use prepared statements.

Right?

Michael: Yeah.

You don't think avoiding
lots of joins as well?

Nikolay: Simple ones?

Maybe, yeah.

Again, like, somehow I deal with
very simple plans lately.

I don't know.

Yeah, good point as well.

Good point.

I try to avoid a lot of tables,
many joins, while still expecting

good performance, because
I always hunt for a single index

scan, or better, an index-only scan,
always.

And if you have even one join, even
2 tables, you cannot build an

index on 2 tables.

Indexes always belong
to their own tables.

And materialized views, as we maybe
discussed, are out of scope for

my recommendations, always.

Regular materialized views.

So that's why I try to think about
how we can simplify things and,

okay, accept nested loops and so on,
but make queries sub-millisecond.

But yeah, with many, many tables,
planning time might be huge.

Anyway, I wanted 3 lessons: a low
index count; a low partition

count, but not super low,
so partitions stay relatively

small; and prepared statements.

And I think these 3 might take
years to implement in large projects.

Michael: Yeah, true.

Nikolay: Yeah, unfortunately.

Michael: Cool. Nice one,

Nikolay, thanks so much.

Nikolay: Bye-bye.

Michael: Catch you next week.

Some kind things our listeners have said