102 Query optimization

Nikolay and Michael discuss the next step of query optimization work — the difficult topic of how to verify your changes won't make other things worse.

Michael: Hello and welcome to Postgres FM,
a weekly show about all things PostgreSQL.

I am Michael, founder of pgMustard.

This is my co-host Nikolay, founder of Postgres.ai.

Hey, Nikolay, what are we talking about?

Nikolay: Hello.

Let's talk about query optimization, like the
second step, or third step, after you've identified a problem.

Maybe third step, right?

First step: you found some queries which don't behave well.

Second step: you found some optimization idea. And what's next?

Michael: Yeah, exactly.

How do we know whether it'll be safe to make the change we are
thinking about, whether it'll make a difference, that kind of thing.

Nikolay: Yep.

And this is one of the topics our listeners requested.

Michael: Yeah, exactly.

I think they phrased it quite nicely.

It was like a 102, I think they said, like the American
numbering, in terms of university courses, right?

Is that how it works?

101 is the intro and then 102 is the next step along.

Nikolay: You're probably a better person to explain this, but yeah.

I've seen 101 everywhere.

Right, Right.

So, where to start?

Let's first take a step back and think about
how we verify our optimization ideas, as usual.

I have a strong opinion here, because I see that
the usual approach is either to rely on experience or to test
right on production. The usual problem is that

non-production environments have different data, a different
database size; everything is different, sometimes different hardware,

but actually hardware is less important, much less important.

We can tune our Postgres on weak hardware to behave, in terms
of plan choices, exactly like on production.

It's possible. But the data, statistics, and planner
settings, of course, these things are the most important.

So the question is how you verify your ideas.

how do you do it?

And the answer is...



Michael: Well, I'm interested.


Well, so I think you've probably gone really advanced there already,

talking about how hardware is less important.

I think the first lesson is that testing in
as similar a setup as possible is really useful.

And maybe there are some exceptions to that, but by similar,
the first thing I always try to encourage

people to do is make sure the quantity of data is similar.

So, and I guess we're testing two things, right?

The first thing we want to test is: for the single query I was
trying to optimize, or the single problem I was trying to solve,

does this solve that problem?

And then there's a secondary level: does it introduce other
problems, or does it work well amongst the whole load of the system?

Nikolay: Right, right, right.

So we want our non-production Postgres to behave,
ideally, in exactly the same manner as production would.

Actually, production itself can change.

Yesterday, it chose one plan.

Today, there's no guarantee it chooses the same plan.

And this is quite well known in Postgres.

And unfortunately, Postgres doesn't provide a good way to solve this problem.

I mean, to freeze the plan choice and avoid
sudden plan flips for critical queries.

But that's not our topic today.

I mentioned it just to give the
feeling that planner behavior is not stable.

So achieving the same planner behavior in a non-production
environment is quite a challenge, right?

But there, there are specific tricks that can help us.

For example, of course, ideally we want the same data,

because the same data means the statistics content is the same.

So we are fine.

There are tricks to export and import the statistics
content, even if you don't have the data.

There are some not very popular, but interesting extensions.

I think they came from Japan.

I don't remember the names, but we will provide links.

It's possible to export statistics.

But this approach has limitations.

It's similar to hypothetical indexes and hypothetical partitioning.

A very interesting approach, and sometimes it's good. But what if we want the true behavior,
so the executor really executes our plans and we can see the behavior of Postgres for our query?
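Hypothetical indexes, mentioned in passing here, are provided by the HypoPG extension; a minimal sketch, assuming the extension is installed and using a hypothetical table name:

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index; nothing is built on disk.
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (without ANALYZE) considers hypothetical indexes,
-- so we can see whether the planner would choose it.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Clean up all hypothetical indexes.
SELECT hypopg_reset();
```

Since the index is never built, this only shows plan choice, not real execution behavior, which is exactly the limitation being discussed.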

In this case, we want the same data.

Of course, there are several levels
of how close we can be to the same.

We can just take it as is,

at the physical level.

This is an absolutely good approach.

We just grab the same pages of data.

If we have a managed service, we just clone it.

Most managed services allow cloning.

It takes time, it costs money, but it'll give us the same data.

And so we can see the behavior there. But sometimes we cannot copy at the physical level,

so we can only copy at the logical level. But, as you said,
the number of rows, for example, will be the same.

So we're good,

Michael: A problem, not really in the topic of
performance, but a problem I see quite a few customers

coming across, is access to data.

Giving developers access to production data can be problematic from a

privacy point of view.

What do you do about that?

Nikolay: Very problematic. And in my opinion, there
are two problems that access of an arbitrary

developer in a company to production can bring us.

First problem is security, and the second problem, I think,
is even more important. Security we can handle:

we can ask them to sign something, to do security trainings, a lot of things.

But the second problem is more interesting.

Developers want to develop, so they add a lot of
instability if we allow developers to go to production.

It can introduce challenges in terms of stability, because we want
to have an established process of development, testing, and deployment.

So we don't want developers to go and check,
for example: oh, I have a great index idea,

I will create it right on the production
primary node, and I will check.

It's not good.

But security is also a concern.

But even if we don't talk about security, which is a
very big concern: if we need to use a logical copy of

production, for example, the question is, will the plan be the same?

Row counts are the same, right?

Statistics should be the same.

And, for example, we took the same, even the same, hardware,
and we put the same Postgres settings into the configuration.

The question is: does it guarantee that the planner
will behave in exactly the same manner?

Michael: So the place my head's
going to is ANALYZE and its sampling.

When you run ANALYZE, it takes samples of tables, right?

So maybe that's not the only way it could be
different, but that's one way it could be.

Nikolay: I agree.



There is some instability there. But also, if you have a
lot of bloat, for example, relpages is different.

So on your new node, on your
logical clone, relpages will be smaller.

And this directly affects planner behavior.

So it can choose a different plan, and we can see different
nodes in the plan, different execution nodes, and a different plan overall,

when you compare production and this logical replica.

Unfortunately, I don't know a solution
to this problem, but it's quite an interesting problem.
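The relpages number discussed here lives in pg_class; a quick way to compare what the planner sees on production versus a clone (table name hypothetical):

```sql
-- Run on both environments and diff the output: the planner's
-- cost estimates are driven directly by these numbers.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'orders';
```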


Michael: And actually not just bloat.

Also, I guess insert order, right?


Nikolay: Physical distribution?

Well, you mean buffer numbers are probably different.

This I understand.

For example, we need to read a thousand rows.

We had a thousand buffer reads and hits on our source, but we have
only a hundred, or much fewer hits and reads, on our

replica, because of more compact storage of these rows, for example.

This is possible, right?

But the question is: will the structure of the plan be
the same? And that's where, unfortunately, we cannot guarantee it.

Usually it's very close, but unfortunately relpages
is different, and it directly affects planner behavior.

Michael: But we are talking about edge cases, right?

Not the majority of the time.

If we're talking about simple OLTP-type
queries, does it use the index, does it not?

The kind of things people are probably most worried about,

they're likely to be easily replicated once
you have a similar sample of data.

Nikolay: Right, right.

I agree. And sometimes, for example, we might have multiple
production environments, and they are slightly different.

So things can be very interesting in various cases,
but at least we can be as close as possible if we

aim to do it.

We say: okay, we have a billion rows in a table.

Let's have a billion rows in that table in our
non-production environment, and then we can see the plans.

Now, back to the topic.

If we have a query and the plan, and
now we want to ensure that it's a good idea and we need

to approve it, of course a direct comparison before and after

is already good, right?

So we have one query, and we see that after our change, for example,
we create an index or we do something else, we change the

query somehow, we directly see that the I/O numbers improved.

So the conclusion is: let's do it.
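The direct before/after comparison described here is usually done with EXPLAIN; a minimal sketch, with hypothetical table and index names:

```sql
-- Before: note the "Buffers: shared hit=... read=..." lines.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- The change under test:
CREATE INDEX CONCURRENTLY orders_customer_id_idx
  ON orders (customer_id);

-- After: rerun and compare buffer numbers, not just timing.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```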

But as you mentioned in the very beginning, of course there
are some cases when our change can affect other queries.

And the question is how to find that holistically, how to test it holistically.

And this is an unsolved problem.

First of all, let's consider one case where it happened.

I had it, and I wrote an article about it.

We will provide a link.

It's an interesting case where optimization of an index led to

all updates becoming worse in terms of execution.

The idea was simple.

For example, we have some SELECT. It's quite slow, or maybe not slow,
but we see that it uses some index which we can reduce in size significantly

by adding a WHERE clause. We add the WHERE clause, everything's fine.

This SELECT has better performance because the index size is
smaller, and so on, because now it's partial and smaller, right?

But then we deploy it and suddenly see that the
latency of all updates, on average, got worse.


Because we lost HOT updates. Adding some column to an
index definition, even if it's only in the WHERE clause, leads to

different behavior of updates, because HOT updates, heap-only
tuple updates, are possible only if no indexes mention, in their

definitions, the column which we are changing.

So if we change some column, a usual case, for example, is updated_at, a
timestamp: we usually tend to change it when we do updates, right?

Because that's the semantics of this column.

But if we have an index on it, no such update
can be heap-only.

Heap-only tuple updates are much faster than regular updates, because
during these updates Postgres doesn't touch all indexes.

It doesn't need to change all indexes.

But with regular updates, Postgres needs to update each
index, and this is called write amplification.

The index write

amplification issue,

one of those which Uber highlighted in their
article. So losing HOT updates may be painful, so...
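A sketch of the gotcha just described, with hypothetical names: referencing a column even in a partial index's WHERE clause disables HOT updates for changes to that column.

```sql
-- updated_at appears only in the predicate, not in the key columns,
-- yet updates that change updated_at can no longer be heap-only.
CREATE INDEX recent_orders_idx
  ON orders (customer_id)
  WHERE updated_at > '2022-01-01';

-- Watch the HOT ratio before and after deploying such an index:
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
```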

Michael: Especially on a heavily loaded system, for sure.

The other thing, the place I am starting to
see some progress on in terms of tooling, is people even

being aware that some of these things are problems.

So pganalyze added an index advisor

in the last year or two.

And one of the things that it points out is
an estimate of write performance overhead.

So not only can indexes be a problem in terms of preventing
HOT updates, they also add write overhead on every new row, for example.

Nikolay: Yeah, any index.

If you create one more index, you definitely add overhead.

And we can actually estimate it; we can measure it.

And this is interesting, and it's definitely possible to improve tooling here.

I don't see the best tools here yet.

I made a couple of attempts to create tools.

One of the

ideas was to create a benchmarking tool which would be
used for any change related to the database.

It was called Nancy bot.

And the idea was: we replay workload before and after our change.

We collect all metrics, for example, from pg_stat_statements,
from logs, from auto_explain, everything, and then we

provide a direct comparison before and after, and we see
exactly which queries degraded and which queries improved.

And this was a great idea.

We even had a good interface.

But then I realized that practically it's not possible
to run full-fledged benchmarks every time a developer makes

any change, because it just doesn't work economically.

It's too expensive.

You need a separate machine, or a couple of
machines, sometimes several in a batch.

But sometimes we ran several experiments, like: what if we do
this, this, this? Like we have 10 options and we executed them.

Then questions arise.

We executed them in parallel, not
sequentially, because sequentially is too slow

if each experiment takes 10 minutes.

You want to execute in parallel, but then questions arise:

are all virtual machines the same, or are there some deviations?
For example, one of them got a worse disk or CPU or something.

So we

had to...

Michael: The noisy neighbor problem.


Nikolay: Right, right, right.

So we added a lot of logic, like micro-benchmarks, to ensure
that some baseline is met, and so on and so on.

And the complexity grew.

But the main problem was you just need to pay a lot to do such experiments.

Some companies can afford it. Also, one
of the biggest problems was how to replay workload.

There was pgreplay, and pgreplay-go.


Just maybe yesterday I saw an article from Franck Pachot
about how to run pgreplay, or pgreplay-go actually, for YugabyteDB.

But collecting queries is also a challenge.

And I did it several times.

So the bottom line:

it's quite complex.

It's possible to do it.

Some companies have actually implemented something, but
it's very hard to create a universal tool here.

But still there is a need here; there is a need.

We want to cover all changes with some testing, right?

But the good news here is that we don't need full-fledged benchmarks.

We can do it in a shared environment.

We can focus on I/O metrics and run many experiments on just one machine.

This is what we do with Database Lab and thin clones and database branching.

And it's possible.

So we can have thin clones.

Of course, they have different timing, because of a different
file system, for example ZFS, or anything else.

Btrfs is also an option.

And then you run multiple experiments right on
one machine, and you don't care about timing.

You care about plan structures and

I/O numbers, and you can verify many things.

So the next question is: what to verify?


We have our query we want to optimize.

We found some index.

We verified that the query is improved.

How to check other queries?

Well, some new tooling needs to be built here, right?

And I think it consists of two areas.

The first area is reactive.

We want to grab the content of pg_stat_statements periodically
from production and have some regression testing.

We can reduce the scope:

we can parse queries, already-normalized queries,

using the pg_query library from Lukas Fittl,

to identify which tables are involved. And if we consider an
optimization for a particular table, we can find only those

queries from the top hundred or top 500 by total time or by calls.

Somebody needs to decide what is the most important metric and
which queries are most critical for our application, but we

can find those queries which potentially might be affected.
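A sketch of that reactive step: pulling candidate queries from pg_stat_statements. The text filter is a crude stand-in for proper parsing with pg_query, and the table name is hypothetical.

```sql
-- Top statements by total execution time that touch the changed table.
-- Requires pg_stat_statements in shared_preload_libraries;
-- total_exec_time is the Postgres 13+ name (total_time before that).
SELECT queryid, calls, total_exec_time,
       shared_blks_hit + shared_blks_read AS buffers_touched,
       query
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
ORDER BY total_exec_time DESC
LIMIT 100;
```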

Of course, we can also

think about triggers and various dependencies, foreign keys, and so on.

So this is tricky.

But anyway, we can find them automatically, and using a
thin clone, using a single connection, sequentially,

we can test many queries. But the problem will
be how to find parameters: which parameters to use for those

queries. And this is an unsolved problem.

As I already mentioned, this is a
big problem: which parameters to test.

Michael: In reality, what do you see?

Like in reality, what I see people doing, and I think it's probably at
a smaller scale to you, is that they reason about the potential

effects and discuss it amongst the team

during the PR process, during their discussions. Then they
deploy to production and monitor.

Maybe they provision a little bit...

Nikolay: Purely reactive approach, right?

So we deploy and see. Yeah, this is possible.

First of all, of course, code review helps.

And having a bunch of very experienced Postgres experts helps a lot, right?

They can say: oh, you know what,

this is a good idea, but what will happen?

What overhead comes from it?

Let's think about it.

Of course, we can manually analyze and try to predict, and
if you have 10-plus years of experience, it'll work quite well,

unless you are very tired, very busy, have days off, and so on.

So I still think about a fully automatic approach. But
yes, it's possible, and we can check metrics and have some alerts.

For example, if updates are very important
for us: if the ratio of HOT updates drops

after our release, it can be somehow flagged, alerted, and so on.

This is a good approach.

Yes, but it's purely reactive.

And actually most projects don't even have this.

So I agree, this could be a good step towards it.

But back to the fully automated fashion: which queries to check? We
need to check queries from production for regression testing.

And that's possible.

And in a couple of cases we already have something
like this: with our Database Lab Engine, everything is done in CI/CD,

and users see the difference and
potential degradation before we deploy it.
It's, it's very good, but additionally we
need to think about absolutely new queries.

For example, if it's a new feature, new feature branch
and it introduces some new queries or Rewrite or for

existing queries, production doesn't have these queries.

So second part of it is even more challenging.

We need to test absolutely new queries here.

And I'm just explaining our thoughts; we are moving
in this direction with Database Lab.

I'm excited that in the future we will have fully automated testing of
all things related to databases, even for those who are not experts at all.

And when experts are involved, they have a lot of artifacts and
metrics to analyze and understand much quicker what to

do and how to fix it, so everyone can move faster,

with better quality, and so on.

So right now, without this tooling, the idea is: we need to
have code review for every database schema change.

We need to test the change itself, but
also think about how it'll affect the workload.

Michael: Yeah.

And I guess as a first step: some people
don't even check the locks involved, right?

That's, I guess, maybe 101 rather than
102, but making sure that it's not a blocking change.

I know this has got a lot better in more recent Postgres versions.

Nikolay: Well, right.

I don't even discuss it, considering it too trivial.

So: index creation should use CONCURRENTLY, and updates should be in batches.

And this is actually what we solve with Database Lab: we
have fully automated verification that no exclusive locks

last more than a specified number of seconds.

For example, one second. Yeah.

This is possible to fully automate for the change itself.
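A minimal sketch of those two habits, with hypothetical names:

```sql
-- Build the index without holding an exclusive lock on the table:
CREATE INDEX CONCURRENTLY orders_status_idx ON orders (status);

-- For DDL that must take a strong lock, fail fast rather than
-- queueing behind long transactions and blocking other sessions:
SET lock_timeout = '1s';
ALTER TABLE orders ADD COLUMN note text;
-- On timeout, the deploy tooling retries instead of blocking traffic.
```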

We spent some time understanding how to test
the change itself, and we've mostly solved it.

We can put it into your CI/CD. But now the
question is, I think, a bigger one.

For example, some developer created something but forgot an
index. How to see that it would be better if we deploy it?

And how to see it without involving very, very
expensive and very experienced DBAs every time, right?

So we should involve them only if we have a hard question, not a simple
question like a forgotten LIMIT, or an index which degrades our other queries.

Michael: Yeah.

And you mentioned pgreplay-go.

That's from GoCardless, the company I used to work at.

Nikolay: right.


Michael: So I think that's how I remember it being used most.

It was big changes, it was risky things.

It was things that people already knew were potentially dangerous.

Nikolay: Right.

For example, big functionality:

we deployed big functionality, or we had
some refactoring over several months.

Michael: A large migration, that kind of thing.

Nikolay: Right.

Well, in my opinion, benchmarks with multiple sessions should
be applied very rarely, when something big changes: either a

big new release happens, or migration to a new major Postgres version,
or a change of infrastructure, or a change of operating system.

It happens a couple of times per quarter maximum, usually, right?

Often much less, normally.

It doesn't depend on the size of the company, actually, because anyway

it's a very expensive change, usually.

And it's good; it's for these that we need benchmarks, full-fledged benchmarks.

So, not every day. But application changes might
happen every day, sometimes several times per day.

And they might change some small part of the database.

The good news is that it's a small part of the database, right?

We don't perform huge refactoring every day.

So if we change only one table, or two tables,

we should be able to test it in a more lightweight manner.

This is why I'm such a big fan of thin cloning, and I'm so excited to see others

also think about database branching. Neon, the new database, says
you can test a lot of things in CI/CD.

It'll be cheap.

Actually, Aurora, as we discussed, has thin clones, but it's not cheap.

You need to pay for compute power for each clone separately.

I hope the new one

Michael: Interesting.

Nikolay: will be different: running on one machine, you pay for a single
machine but run many, many tests at the same time on that one machine.

Otherwise it doesn't scale,

because you sometimes need dozens of clones.

If that's happening in parallel in a big organization,
it can be very costly, so you will turn it off and you

will be without testing, returning to bad times again.

But the advice right now is: yes, have database
reviews, try to automate some things.

Learn your pg_stat_user_tables.

Right, because it shows you how many updates
and HOT updates you have: tuple statistics.

It's very good to understand it.

I don't agree with the Datadog approach, though.

If you go to the database dashboard in Datadog (they have
a lot of improvements lately, over the last year or two),

check what they present as tuple statistics.

No, no.

Throughput is TPS.

That is throughput,

our global throughput. Tuple statistics are important,
but they should be detailed, per table. But it's off topic anyway.

When you change something for a table, understanding the
tuple statistics from pg_stat_user_tables is a good thing to have.

And you can see: oh, I lost HOT updates.



Michael: How would your advice change
if we're talking about smaller teams?

Let's say you're a six-person startup. You've got
a decent amount of data, but not, you know, billions

of rows. And maybe the team is currently prioritizing moving fast, and
if they have a few problems that they have to fix, that's okay.

Does anything change?

Because this feels like a very safe approach;
the kind of tooling we've discussed seems very much

optimized for trying to avoid a problem at almost any cost.

Nikolay: Well, yeah, good question.

First of all, I would anyway change the mindset
to use buffers instead of timing.

Even if you are small, because it will save you so much.

Of course, there is such a thing as premature
optimization, when you try to optimize too early, too much.

But still, if you know that you will use some table
in a queue-like pattern:

insert, update, delete; insert, update, delete.

It'll bloat a lot, and you need some experiments anyway. This
experiment, by the way, can still be a single-user, single-session

experiment in a shared environment, because you don't actually
need pgbench to utilize all CPUs. You don't need

some kind of background workload, unless
that workload also deals with your table.

What you do need is to think about the future: what will happen in one
or two years with this table, how big it'll be. And run some benchmark

with one session to fill it, and then run one more session next to it

to update it heavily, and so on, and see what the
bloat will be, and whether you have good indexes here or not.
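A sketch of that single-session experiment (all names and sizes hypothetical): fill the table to its projected future size, simulate the churn, then check the damage.

```sql
-- A hypothetical queue table, filled to projected future size.
CREATE TABLE job_queue (
  id      bigserial PRIMARY KEY,
  payload text,
  done    boolean NOT NULL DEFAULT false
);
INSERT INTO job_queue (payload)
SELECT 'job ' || i FROM generate_series(1, 10000000) AS i;

-- Simulate the insert/update/delete churn of the queue pattern.
UPDATE job_queue SET done = true WHERE id <= 5000000;
DELETE FROM job_queue WHERE done;

-- Check bloat: dead tuples, and how many pages the table still occupies.
VACUUM (VERBOSE) job_queue;
SELECT relpages, reltuples
FROM pg_class WHERE relname = 'job_queue';
```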

If you focus on buffers, you will quickly... This mindset
change is very important, because you will start comparing,

like in your tool, pgMustard: you compare,
for example, rows to buffers, buffer numbers to get one row.

To get one row, we deal with a thousand buffers.

It doesn't feel good at all.

And if in one year it will be even worse,

probably we should prepare for growth right here, in this area.

So we need to test it.

We need to test our queries and see what will
happen if we have more data, for example,

Michael: Yeah. The only other thing I would add
is that I see a couple of extreme failure cases here.

I see the failure case where people have not
really started adding many indexes at all.

They almost have no indexes, and then they
start to scale and they have every problem.
Nikolay: An index on every column, right?

Michael: Or, yeah, exactly,

the other extreme, where they've added
an index every time they've seen a problem.

Or, yeah.

I've not seen an every-column case; I've heard about them.

But definitely, you know, every time they see a
potential for a multi-column index, they add one.

So a single table, even a table of six columns, could have 20 indexes on it.

Different orders, different combinations.

So I've seen both of those failure cases, and I guess if you know you're
in one of those, then it kind of changes your approach quite a lot.

If you're in the "we already have too many indexes" case, maybe you
need to be a bit aware of that and be looking at reducing them.

If you know you're in the "we don't have enough" case, maybe you've got a
little bit more leeway to not have to worry as much about the impact of adding one.

Nikolay: You need to understand your workload
to know what to test. This is true, but:

change your mindset to buffers, and just experiment a lot.

This is a good thing: experiment, experiment, experiment.

And then you will see. Because if
everything is based on human decisions, based on gut, right,

in that case, at some point you will start skipping some checks, because
you already know that it's okay. But sometimes you'll be wrong.

Anyway, that is a bad thing,

Michael: Yeah, in fact, the buffers thing came up.

Just the other day I saw a really interesting blog post from a
company that moved from RDS for Postgres to Aurora for Postgres.

They, in my opinion, must not have had enough indexes.

They must have been doing too much I/O, because their costs
maybe tripled, or something, according to the graphs.

And it was really interesting to see that they thought
this simple migration would be straightforward. But actually, because

they hadn't optimized for buffers, hadn't been, you know,
quite efficient on the I/O front, it hit them with a big bill.

Nikolay: Yeah.

On Aurora, optimization is not only about performance; it's about money as well.

And I/O-centric optimization is the best thing there,
because you want to focus on reduction of I/O. I don't remember,

is it about writes or reads?

I remember there was some distinction.

It was a detail I already forgot.

But either writes or reads, if you have a lot of them,

you need to pay for them at Aurora.

So you definitely want to reduce them, and you want
your buffer pool to have better efficiency,

a better ratio of hits versus reads. And you
can check your queries and optimize them, and so on.

And if, for example, your new index,

besides adding overhead, broke all your HOT updates,

so the HOT updates ratio became zero

(this is what happened to me a couple of times),

in this case you'll start paying more. So
you optimized one query, but not only do

you have worse performance for other queries; you're also paying more.

It's not good.

So my advice is: know your workload.

Learn it from pg_stat_statements, for example, right?

Focus on buffers,

not timing alone.

Optimizing timing is our final goal for optimization,
but it should not be used during the process of optimization.

It should only be checked at the beginning and at the end.

Inside the process of optimization, we should be focused
on buffers. And finally: experiment a lot, experiment a lot.


Just be ready to loan and reset a lot and so on.

That's why I think Nik is good and as I've said, Aurora has think
cloning so you can run additional load and test it there, then stop it.

I guess they have also like per minute or even per second billing ass.

In general has it two notes.

So they are per second.

So it's can be very efficiently.

You, we will need to wait some time, like minutes to
provision a node, but then at least you can quickly stop it.


But pure thing close, which are I, I, I name them local Think loans.

So we have one DM and many CL inside it.

So we pay it for only for one dm.

It's the.

For experimenting, so having such a platform
for experimenting is a good, it's a good thing.

Anyway, general device has experimented a lot.

Michael: Yeah.

Sounds great.

Anything else you wanted to add?

Nikolay: Now, as usual: subscribe, like, share, provide feedback.

Feedback is very important.

Sharing is also very important, in your social networks,
working groups, Discord, Telegram, whatever you use.

Please share and bring us more ideas.

Actually, we have a line, a queue, of ideas anyway, but we want more.

Michael: Yeah, exactly.

It's not always that a request comes in and we do the exact topic.

Sometimes it gives us an idea for a slightly different one, or we
put two together and that makes a good episode, that kind of thing.

We'll see.

Nikolay: Exactly.


Michael: Thank you so much, everybody.

Take care.

Nikolay: Bye.

