Nikolay: Hello, hello, this is
PostgresFM.

I don't remember the number of
episodes, Michael, do you remember?

Michael: No, we must be getting
close to 170 though.

Nikolay: Yeah, some really big
number.

Not as big as your max_connections,
right?

Michael: I like it, good segue.

Nikolay: Yeah, my name is Nik,
PostgresAI, and as usual here

is Michael, pgMustard.

Hi Michael.

Michael: Hello Nik.

Nikolay: So we are going to discuss
really high max_connections

in general, and maybe we will touch
a little bit on connection pooling

systems, and maybe we will touch
per-user connection limits,

because setting those is possible
too.

Yeah.

So in general, like, is it a bad idea
to raise max_connections?

And is it a good mitigation action
when you have problems, this

kind of action?

What's the ideal max_connections?

Michael: And why do you, I think
you've seen some cases recently

where people are raising it and
you think it's not a good idea.

So is it worth kind of jumping
straight into some recent experiences?

Like why is this on your mind at
the moment?

Nikolay: Yeah, so I proposed this
topic after a tweet I saw where

it was shared that there was a
pattern where deployments, I guess

schema migrations or DDL, were failing,
but only during particular

hours of every working day, or
maybe every day overall.

And then they realized that at
the same time, during a couple

of hours, some data pipelines were
running.

And they just raised max_connections
and that's it.

Actually, I'm not sure it was
max_connections, because the tweet

doesn't say which database system
it is.

I just assumed it's Postgres because
Postgres is default choice

right now.

I might be wrong.

Maybe it's different databases.

Michael: I took a note and it said
the solution was, 1, separate

connection pool for migrations,
and 2, better cross-team communication.

Which I thought was interesting.

But yeah, maybe separate connection
pool for migrations is where

you're drawing that conclusion
from.

Nikolay: Well, this doesn't sound
like a mitigation to me.

It sounds like isolation of the problem.

Yeah, well, I might be wrong.

So again, like, there are several
possible options here, but I

bet what happened is these pipelines
produced long-running transactions,

right?

And these long running transactions
blocked some DDLs, ALTER TABLE,

add column, for example.

And this DDL blocked any queries
to this table.

And the number of active sessions
spiked and reached

max_connections.

Actually, if you start isolating,
everyone will suffer anyway, right?

So this idea that isolation somehow
helped... like, if this

theory is right, then a separate
connection pooler for migrations

and pipelines doesn't help at all.

Because locks are global, right?

You don't lock a table
only in the context of some

connection pool, you lock it globally
in this database, right,

in this logical database.

Michael: Yeah, so if it really
did mitigate the issue, either

it was the other part of their
solution, which is the better

cross-team communication, or the theory's
wrong, and actually it

was a case of these migrations
taking up a few slots, a few connections

that the application
expected to be able

Nikolay: to use.

Migration needs only 1 connection,
right?

Michael: Yeah, true.

Nikolay: So how could pipelines,
data pipelines take a few connections?

Yeah.

I don't understand how they could
exhaust it, reach max_connections,

at all.

Michael: Well, should we talk about
the general case a bit more

then?

Nikolay: Yeah, so again, like this
general case is described

and we discussed it a few times.

Everyone should understand that
any ALTER statement, it requires

a lock, exclusive lock on the table.

Even, well, there are exceptions,
for example...

No exceptions, actually.

ALTER TABLE is a serious thing.

Even if it's super fast, you still
need a lock.

And if you, with default settings,
fail, like if your session

fails to acquire this lock, it
will start waiting.

It will wait forever with default
settings.

Default settings means lock_timeout
0, statement_timeout 0,

transaction_timeout 0,
idle_in_transaction_session_timeout 0.

Everything 0 means forever, infinite,
right?

Waiting infinitely.
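For illustration, a quick way to check these settings (note that
transaction_timeout only exists in Postgres 17 and newer):

    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('lock_timeout',
                   'statement_timeout',
                   'idle_in_transaction_session_timeout',
                   'transaction_timeout');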

So this means it will be waiting,
waiting, waiting, but any query,

including any SELECT to this table,
will need to wait behind

us in the line.

So there's a queue forming there,
waiting queue, lock queue,

blocking queue.

And actually, you might see multiple
queues, and it forms trees,

and actually not just trees, it's
forests, so it's beautiful.

Like, it can look really great,
like forest of blocking or locking

trees, but this is eventually what
accumulates a lot of active

sessions, can overwhelm your server
basically.

Even if it's not reaching
max_connections, it can lead to high

CPU consumption, of course, and so
on.

But it's very common to see that
max_connections is reached because of

too many active sessions.

And most of them are just waiting.

So when you reach max_connections,
any additional attempt to

connect will see an error,
right?

Too many connections.

Yeah.

Too many clients or how's it?

I don't remember.

Too many clients I think.

Michael: Yeah.

I can't remember either.
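For reference, the message Postgres returns in this case is:

    FATAL:  sorry, too many clients already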

I remember back in the day when
a lot of people used Heroku Postgres.

This used to come up all the time
because they were much stricter

on how many connections they would
let you have.

And these days, more people are on
things like RDS, which has much,

much higher default max_connections
than Heroku ever had.

I see much, much less of people
hitting this.

I think it's probably causing other
issues, I'm sure we'll get to

that, but it's interesting that
that used to come up more in

various online forums.

Nikolay: Yeah, so in the past,
so we need to distinguish the

states of sessions or connections.

Connections and sessions or backends
all are synonyms in this

context of Postgres.

So we need to distinguish active,
idle, and idle in transaction,

3 main states.

And you can see the state by selecting
from pg_stat_activity.
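For illustration, a quick breakdown of sessions by state:

    SELECT state, count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY state
    ORDER BY count(*) DESC;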

So before Postgres, I think 12,
13, 14, so

Michael: 5 years ago.

Yeah, is this the Andres Freund
work?

Nikolay: Right, right.

Yeah.

A couple of articles published
originally on the Citus blog, then

on the Microsoft blog, since Citus
is at Microsoft Azure now, right?

So a couple of good articles, excellent
articles.

First researching and then discussing
optimization.

So the usual fear was that idle connections
consume memory, but the articles

proved that the main problem is
how work with snapshots

is organized, and idle connections,
for example, if you just add

1,000 idle connections, it can hurt
your performance.

And I demonstrated it with benchmarks
easily as well.

So it was quite easy to see. But
if you have, I think,

14 plus, maybe 13 plus... if you
have this Postgres, that's basically

everyone right now.

Michael: Oh true, all supported
Postgres versions now, that's

incredible.

Nikolay: Because 13... we have a
couple of cases where 13 is still

in production, and there are talks to
upgrade it, but basically right

now, if you are on a supported version,
idle connections don't

hurt performance anymore.

Too much.

Michael: Or anywhere near as much.

Nikolay: Yeah, exactly.

Yeah.

So having 1,000-2,000 additional connections,
it's not a big deal

these days.

And also processors improved.

I really like Graviton4.

It's like, it's amazing.

And it handles a lot of connections,
even active ones, very well.

I have my neural network
trained on what to expect from,

like, 500 connections, for example,
on a machine with fewer than a

hundred vCPUs, right?

I can imagine what should happen,
but on new hardware, I see

it much better.

It handles more active sessions
much better.

Michael: When you say your neural
network, do you mean your brain?

Nikolay: Yeah, yeah, yeah, my LLM
internally, yeah.

So I just... well, I expected this,
but wow, guys, you are fine actually,

you are not down.

I would expect you to be down by
this point, right?

So we just see in our monitoring
what people experience, what

their clusters experience.

And like, wow... this is when they
check what hardware it is: oh,

it's actually

Graviton4 on Amazon, which is great.

I'm impressed.

And Postgres.

And

Michael: newer Postgres.

Yeah, okay.

But you're talking about lots and
lots of active sessions rather

than idle.

Nikolay: Additionally, on larger
machines, RDS and Aurora RDS,

they just set really high max_connections.

I'm really curious how this decision
was made.

And it's like we see 2,500, 5,000...

It's very common.

I can already, like, looking at the
settings, I can tell you, oh,

this is RDS, even not seeing the settings
which start with "rds.".

Just looking at this profile of
settings, I already realize,

oh, this is RDS, or this is, like,
Cloud SQL.

Like there are patterns of some
decisions.

You know, random_page_cost 4 and
max_connections 5000.

Oh, very likely it's RDS.

Michael: RDS, yeah.

That's funny.

Nikolay: So this gives us the opportunity
to observe how clusters

behave with high max_connections.

And old arguments don't work anymore.

So old arguments were like, guys,
this is bad.

You just have 2000 connections,
1900 of which are idle.

Why do you do this?

Because you just basically pay a
big tax performance-wise.

It hurts
all the latencies. But now

it's not so.

And the only argument left in my
mind is that still, if the number

of active sessions exceeds the
number of vCPUs significantly,

then it hurts performance globally
and all queries are processed

for a much longer time, latencies
increase, and none of them have

a chance to finish in time,
within statement_timeout, which is, for

example, 30 seconds.

So instead it would be better if
we reach max_connections sooner

and just tell some clients to retry
later, like, too many clients,

but the clients that got in would have much
higher chances to complete execution

inside statement_timeout.

So we have limited resources, CPU
and disk I/O.

Michael: Yeah, to make sure I
understand, are we talking about,

if you allow

too many connections, even if most
of them are going to be idle

most of the time, you run the risk
that a bunch of them will

become active at the same time.

Yeah, OK.

Nikolay: Exactly.

So if some wave of requests, unexpected
or expected, but poorly

planned, happens, if we have low
max_connections, we will have

a limited number of backends which
will do the job inside the statement

timeout, if it's limited.

If the statement timeout is unlimited,
it's a different story.

It's also a very wrong state for
OLTP.

We also discussed it many times.

But if we have max_connections at
10, 20, 50x compared

to the number of vCPUs, and
this wave comes...

All of them try to execute, but
none of them will succeed because

they will all bump into the statement
timeout.

Or if the statement timeout is
not set, they will bump into some

different timeout on HTTP level,
application level, or something

like 1 minute or something.

So nobody succeeds, like
we took all the chances away from

everyone.

That's a problem.

If we leave it low, some of them will
succeed.

Some of them will need to retry
because of too many clients, and

we will need to fix the problem.

And of course, if there is a connection
pooler, it's great, because the

connection pooler will take a lot
of headache away from Postgres backends.

Michael: Yeah well I wanted to
ask you about this actually because

I think sometimes when one of us
says connection pool, people

can assume either direction,
right? They can assume

database-side or application-side.

And a lot of the time I see people
scaling their app and having

application side pooling.

And that's how, I think,
a lot of the Heroku

issues happened.

Michael: A lot of the people running
out of connections on Heroku because

it's so easy to spin up new versions
of your app and scale the

app horizontally.

But each 1 comes with another 10
connections.

Another 10 connections.

They're using poolers, but it's
application side, so there's

no, they don't use anything on
the database.

Nikolay: Those poolers are too
far from the database.

And exactly, this is a pattern
I saw so many times.

Especially, imagine e-commerce.

And they have microservices, many
clusters; in some clusters we

see it.

Okay, max_connections is high because
they don't use PgBouncer.

They say, okay, we have a Java application
and all the connection

pooling is on our side.

HikariCP, for example, right?

There's such a connection pooler.

And all great, but I cannot convince
them to start using PgBouncer.

Not bad, okay, max_connections
we keep somewhat high because we

have a lot of idle connections.

Why?

Because there are many application
nodes.

And then before Black Friday (which
happened last week, but this

story was many, many years ago)...

Before Black Friday, the infrastructure
guys who are responsible

for application nodes decided to
double capacity.

This is in their hands, their decision,
okay, we need to be better

prepared, let's just double number
of application nodes from

100 to 200, for example, or something.

And nobody thought about the configuration
of those poolers.

So they were set to, like, 40, right?

40 connections.

So it was, for example, I don't
know, like 20 nodes times 40,

it's 800, for example, right?

Maximum number of connections,
800.

But they doubled it, it became
1,600.

I think numbers were higher in
that case.

And nobody talked to the database team,
like, it's just application

nodes, they're stateless,
let's add them.

And this is how you suddenly see,
oh, many more idle connections.

The actual load didn't increase.

We don't need them; like, the number of
active sessions didn't increase.

But since capacity was doubled,
the number of idle connections

doubled, obviously.

And you cannot solve this problem
having the connection pooler on the

application side at all.

Well, you can, but you need to
somehow remember, every time you

add or remove an application node,
to rethink the pool configuration

for all of them, to maintain the
same overall number.

The sum of connections should be
maintained, right?

Michael: So you mean if we double
the number of pools, we can

halve how many connections each
pool has?

Yes.

Yeah, but then it seems brittle
to me to then need to remember

that.

Nikolay: Yeah, there are 2 numbers.

There is like minimum and maximum
number.

I'm talking mostly about like minimum
number.

These connections are open and
they're maintained.

It doesn't go down below that,
right?

So if we doubled capacity, but
the actual workload didn't increase

because it's not Black Friday yet,
why do we open more connections

to Postgres?

It's not right.

And a good solution here is to
have an additional pooler closer

to the database, where it will
be configured according to the database

situation, not the situation with application
nodes and the need

to increase the fleet.

Yeah, so connection pooler helps
and usually we say like, okay,

if you have PgBouncer, each PgBouncer
can handle 10,000 connections

incoming, something like this,
thousands of them.

And it should open only 100 connections
to the database.

This depends on the number of vCPUs.

We still say take the number of vCPUs
times some constant.

Usually it was 2, 3, 4.

Now we say 10.

Because we see Postgres behavior
improved, hardware improved.

Okay, 10.

But not more.

If you have 96 vCPUs,
OK, roughly 1,000.

If you have 32 vCPUs, I mean,
max_connections should

be, OK, 320.

It should not be 5000.
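As a rough sketch of this sizing idea, a PgBouncer config for a
hypothetical 32-vCPU server might look like this (names and numbers
are illustrative, not a recommendation for any particular setup):

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_port = 6432
    pool_mode = transaction
    max_client_conn = 10000   ; thousands of incoming client connections
    default_pool_size = 320   ; ~10x vCPUs actual connections to Postgres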

I'm very curious how the RDS team made
the decision.

Because I guess they...

I have several theories here.

One theory is there are so many clients
without pooling and this

was just a need.

And they were like, okay, let's do it.

Michael: It's just moving the problem,
isn't it?

Instead of people thinking, oh, what,
I don't have enough connections,

they're going to hit

Nikolay: performance issues.

Yeah, maybe they have some additional
reasoning, which I don't

get.

I'm very curious.

And another thing: maybe RDS proxy
was not so good, and is still

not so good.

We saw it in action, RDS proxy,
it's a very weird proxy compared

to PgBouncer behavior.

You see idle connections more.

I don't remember all the details,
but every time we touch it,

I always say, like, why is it behaving
so?

So what is this?

It's very strange.

It's very different.

There are no, like, very good multiplexing
capabilities like in

PgBouncer.

You have 10,000 connections incoming,
maximum, and you transform

them to maximum 100 backends.

PgBouncer is very reliable and
stable at solving this task.

RDS proxy, not so much.

So maybe they had issues with RDS
proxy and they needed to raise

max_connections because it was
not solving this task somehow.

Michael: Interesting theory.

If anybody knows more, let us know.

Nikolay: Yeah, I'm very curious.

So this is my approach right now
to max_connections: at maximum

10x the vCPUs, ideally lower, so the
database can breathe under heavy

load.

Michael: And it lowers the risk, I
think, is that... yeah.

Nikolay: Yeah, so just be honest
with yourself: how many resources

do you have?

Do you have 1000 cores?

Maybe yes.

In this case, no problem.

Go for it.

Because it sits all the time like
no problem, no problem, but

then we have a problem, and we have
a huge spike and an

unresponsive database, right? The system,
I mean the database system.

Michael: So it's picking what you
want your failure

case to be, right? Do you want your
failure case to be some people

can't access it, or do you want your
failure case to be nobody can

do anything?

Nikolay: Yeah and also if some
attempt to connect and execute

a query from the application backend fails,
the backend should have retry logic.

Michael: Yes, it

Nikolay: should be.

Connections and query execution, both.

So it should be handled there properly. Additionally, if users

immediately see "too many clients" errors, okay, maybe you should

additionally adjust your application code.

So, the ideal state is retry logic with exponential

backoff and jitter.

So, this is like a more scientific approach.

There is a good article, an Amazon article about this.

It's not rocket science, but it's good.

There is some science behind it.

Exponential backoff and jitter.
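The retry itself lives in application code; just to illustrate the
delay schedule, here is capped exponential backoff with jitter,
assuming a hypothetical 0.5-second base and a 30-second cap:

    SELECT n AS attempt,
           round((least(30, 0.5 * 2 ^ n) * random())::numeric, 2) AS sleep_seconds
    FROM generate_series(1, 6) AS n;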

So retries can happen quite fast, and this adds some resiliency to

the system a little bit.

Of course, if you have 500 max_connections and all the time all

of them are busy, yeah, the users will notice this,

right?

But still those who are still working, they will have more chances

to complete.

And this is, I think, super important.

Better than sitting there welcoming a performance cliff, basically.

Yeah.

But another thing which I'm curious about is how to demonstrate

the problem.

Naive attempts failed so far, so I don't have a good article

or demonstration of how this performance cliff behaves.

Naming it a performance cliff may be wrong as well.

But I feel it's somehow... maybe not a cliff, but somehow too

high max_connections increases the risk of being down for everyone.

And how to demonstrate it, I'm not sure.

So these considerations are more theoretical.

So we have many clients who listened to us and agreed.

But max_connections, to change it, you need a restart.

So they still sit with 5,000.

That's it.

At the same time, I'm not super worried.

Well, I still believe we need to go down with max_connections

and plan a restart and do it.

Unfortunately, some companies are afraid of restarts and try

to minimize them.

It's another question, right?

How to stop fearing restarts and do minor upgrades more often

and so on.

But on the other hand, OK, if you have too high max_connections, it's also

good, because in our monitoring we put active session history

analysis, wait event analysis, in the center right now.

And it's quite great to see very colorful, beautiful graphs: if

we have a lot of max_connections, a lot of areas with different

colors. It's so good. It was a joke, right? So, okay, I mean, if

you have max_connections high, your graphs look better, you

know.

Michael: oh I see what you mean
yeah

Nikolay: The database probably is not responsive,
right, but the graphs look

really great.

Michael: Okay, well, I'm interested
in the pooler, like, I do think

we do need to justify why
you should have an extra, an extra

thing running; like, it's more complexity
for people to justify,

right?

It's another layer if we're saying
you should use PgBouncer.

And added latency, like maybe not
much, but like there's a bit

of extra, right?

There's 1 extra hop.

So I do think it is sensible to
like have to demonstrate what

are, like what is the advantage.

Nikolay: Yeah, yeah, well, it's
quite easy to demonstrate the

benefit of having a PgBouncer in
between.

There are certain types of workload
where it will be very noticeable

that PgBouncer improves things.

And also if you have slow clients,
it improves things, because

backends are not busy transferring
data; like, it's offloaded

to PgBouncer.

Michael: Yeah, okay, that's a good
1.

Nikolay: Yeah, yeah.

But also, if you take a lot of fast
SELECTs, then with PgBouncer

it will be much better than without
PgBouncer.

Because, for example, you achieve
1 million TPS with select-only

pgbench, right?

And PgBouncer helps Postgres to
communicate and start working

on the next execution.

So you can feel it, and you can
raise the number of clients more easily

and scale better.

This is a simple benchmark, actually.

Usually, in this stress load testing,
we start with 1 client,

2 clients, 4 clients, and so on,
using the parameters -c

and -j, lowercase, in pgbench;
we match them because

there is interesting logic not
everyone understands from the beginning

about how these parameters are connected
to each other.
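A typical run of this kind of stress test might look like this
(scale and durations are just placeholders):

    pgbench -i -s 100 bench
    pgbench -S -c 16 -j 16 -T 60 bench
    pgbench -S -c 32 -j 32 -T 60 bench
    # ...keep doubling -c/-j until TPS stops growing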

And then we just grow.

But usually, when we grow to the
number of vCPUs and go beyond

that, TPS numbers go down and latency
goes up.

And this is the ideal state
for a direct-connection benchmark:

we just scale until the peak, which
matches the number of vCPUs.

If the peak comes earlier, there is
some interesting bottleneck,

I bet.

Some lightweight lock or something.

And then if you introduce PgBouncer, this peak should shift

to the right and become higher.

Yeah, that's how you can see it.

Michael: Well, is that an easier
way to sell it then?

Instead of like trying to convince
people to use it because it

reduces the risk around exhaustion
of resources.

Actually just try and sell it on
the increased performance.

Nikolay: There's no problem selling PgBouncer.

Everyone wants it.

Almost everyone.

Michael: Well, I don't, I'm interested.

Nikolay: Yeah, Yeah.

Yeah.

Well, I talked about that company.

There I couldn't sell it.

These days I see everyone understands
much more easily that connection

pooling is needed.

And also keep in mind that I just
said a lot of very fast SELECTs.

If you have long running statements,
quite long, for example,

10 milliseconds or 100 milliseconds,
it's quite long.

If it's 100 milliseconds, it
means execution takes 100 milliseconds,

including planning time.

It means that 1 backend can do
only 10 QPS, or TPS.

It's not a lot at all.

And that's it.

And communication overhead, which
PgBouncer will take on its

shoulders, will not help a lot
in this case.

It will help only if you have tons
of fast SELECTs.

In this case, this overhead will
be comparable to execution time,

right?

Or maybe even higher somehow.

Sub-millisecond SELECTs, this
is what we need.

And guys can tell you, you know
what?

We don't have so many.

Depending on the application, but sometimes
we see average query

latency exceeding 1 millisecond
significantly.

In this case, PgBouncer will help,
but not so much.

Well, again, depending on the client,
sometimes clients are slow,

sometimes they're under your control
and still slow.

But anyway, just from my experience
lately, dealing with a

lot of startups, including AI startups
which grow really fast:

Everyone understands connection
pooling is needed.

Usually it's a trade-off, like
if it's RDS, what to do, because

RDS proxy is managed, a no-brainer,
but it lacks good behavior,

as I said, and it also lacks pause/resume,
which is weird because

there's blue-green deployments,
but RDS proxy doesn't have pause/resume.

It means that there is no 0 downtime
possibility here to maintain

connections, right?

And PgBouncer has it, but you
need to deal with it yourself.

And if you go to global database
in Aurora, it

sticks you to
RDS proxy more.

Michael: So what are their blue-green
deployments doing behind

the scenes, if it's not that?

Nikolay: I don't know about blue-green
deployments.

They do logical replication and
everything.

This is it.

Yeah.

But this switchover, if it goes through RDS
proxy... RDS proxy doesn't have

pause/resume, I know that, right?

Michael: Yeah, interesting.

Nikolay: Well, maybe. I haven't
checked for a couple of months, maybe

my info is outdated, but I didn't see
any news about pause/resume.

I think the switch is still, like,
interrupting current execution,

so it's near 0 downtime, not fully
0 downtime.

Michael: So when you were talking,
going back a bit, I'm still

a bit confused, you were talking
about folks not being able to

demonstrate a kind of a theoretical
limit of what's the added

risk, why would you need to demonstrate
it?

Like if you're seeing everybody
using a database side pool like

PgBouncer already, who is
that demonstration for?

Nikolay: So let's distinguish 2
things here.

First is how to demonstrate the
need in connection pooling.

This is a slightly different topic
than reconfiguration of

max_connections.

Michael: Yeah, but very related,
no?

Nikolay: Related, but I find it
easier to convince people that connection

pooling is needed, compared to "let's
reduce max_connections".

Michael: Okay, so people are like,
we'll have a pooler, but we'll

also still have 10,000 max_connections.

Interesting.

Nikolay: Well, it might be just
because it's painful to restart

for them.

That's it, honestly.

Michael: Yeah.

Yeah.

And if most things are going through
the pooler and the pooler

has a lower limit, then actually
they might not get near their

max_connections.

It's just a number, not actually
how many connections they have.

Nikolay: Yeah, I'm checking blue-green
deployments once again,

and I don't see pause/resume, and
I think switchover allows connections

in both environments: it allows write
operations on one cluster and

only read-only on the other cluster.

And then you basically need
to take care of everything yourself.

This is an undercooked situation, I think.

They should have pause/resume
in RDS proxy or just ship PgBouncer

in a managed form as some other
managed platforms do, right?

Yeah, true.

That's it.

In this case, it would be possible
to achieve fully 0 downtime.

And we know our standard
criticism of blue-green deployments:

that it's not really blue-green
deployments, because you

lose one environment after switchover,
which is super strange.

So it's not reversible.

Michael: It's not reversible, yeah.

I don't think you lose it, but
it's not reversible, yeah.

Nikolay: Yeah, so again, I don't
have problems convincing people about

the need for a pooler, and again,
it's easy to demonstrate

that a pooler brings significant benefits to
certain types of workload.

And by
the way, we could emulate slow

clients.

In this case it would be even more
interesting to see how, with

PgBouncer, Postgres behaves much
better: higher TPS, healthier

state, fewer active sessions and
so on.

But when we move to the topic,
OK, max_connections, forget about

pooler.

How can I see that lower max_connections
is somehow beneficial?

Here I have a hard time right now.

If anyone has ideas let's work
together and create some benchmarks,

demonstration tests, experiments.

Michael: So maybe coming full circle,
the thing that triggered

this for you was the discussion
of it in the context of migrations,

and I mean specifically, like, schema
changes, and data pipelines

with maybe long-running transactions,
maybe things that take

heavy locks.

Nikolay: So, long-running transactions,
because if you acquire

a lock, it's released only at the end,
so it comes down to long-running transactions.

Michael: Yeah, I was thinking on
the read-only ones, at least

there's, I mean there are still
massive implications, but it's

not as devastating unless they're
combined, right, unless you

have both.

Nikolay: Read-only transactions
also acquire locks.

Michael: Yes, yeah, of course.

But they're fine until they block

Nikolay: a...

Modification.

Exclusive lock coming from ALTER TABLE
cannot be acquired because

of an ongoing access share lock from some
ongoing SELECT, which probably

isn't finished; or there are data
pipelines, they love complex

long transactions, and some brief
SELECT which you executed at

the beginning of the transaction holds
this access share lock till the

very end of the transaction.

And it blocks ALTER TABLE.

It's just ALTER TABLE should have
a low lock_timeout and retries.
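A minimal sketch of that, with a hypothetical table and column;
whatever runs the migration is expected to catch the error and retry:

    SET lock_timeout = '2s';                   -- don't sit in the lock queue for long
    ALTER TABLE orders ADD COLUMN note text;   -- hypothetical table and column
    -- on "ERROR:  canceling statement due to lock timeout", retry a bit later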

Michael: Yes, so that's the...

Nikolay: That's the approach.

This is a good mitigation, better than

trying to properly schedule things,
because next time, instead of the data

pipeline, it can be something else.

At any time, Postgres can start an
aggressive autovacuum process,

which will be freezing your tuples
to prevent transaction ID

wraparound, and this will block
your DDL.

If it hasn't happened to you yet... either
it happened but you didn't

notice it yet, which is very common,
I think.

Some outage, "we tried everything",

"okay, Postgres is weird".

Michael: Or even not even an outage,
maybe just some customers

getting some slow, like things
running slowly.

Nikolay: Yeah, like 5 seconds or
10 seconds latency spike and

we don't understand why.

Or you just got lucky, and it awaits
you next week.

So everyone must understand this
topic and implement low lock_timeout

and retries, until Postgres
has something like CONCURRENTLY

for all operations.

Which will make things interesting,
because CONCURRENTLY means

you lose the transactional behavior
of DDL.

So it's very interesting.

The future is interesting because
I think ALTER TABLE should

have something like CONCURRENTLY,
or with retries, but it cannot

be inside a transaction block in
this case.

So it's such an interesting topic, right?

Michael: Yeah, you always combine
these, lock timeout and retries,

and it's only just struck me that
that might sound like a lot of

work to people; retries
can feel like a lot of work.

Actually, the important part is
the lock timeout.

Retries, it's

Nikolay: like, whoa.

That's a good point.

Yesterday, we opened the gates for the
first clients of our copilot

product, PostgreSQL Copilot.

And we had a new client, which
actually originally came

as a consulting client, but they use
our product as part of the consulting,

and we found a lot of issues
as usual. It's great.

And observing their situation,
I think it's RDS or Aurora, I

don't remember exactly, but
I noticed lock_timeout was very

low.

It was very, very low.

It's so interesting, 100 milliseconds.

I haven't talked to them yet.

I will soon.

But it's an interesting situation.

Global lock timeout is very low.

So I even started to think, okay,
lock_timeout, does it affect

only DDL or row-level locks as
well?

What do you think?

Michael: Good question.

I don't know.

Nikolay: Yeah, it affects all.

Yeah.

Row-level locks as well.

So if you update a row but don't
say COMMIT yet, so it's like a multi-statement

transaction.

You updated the row and sit inside
the transaction.

Another transaction just fails
after 1 second.

Oh, not after 1 second, after 100
milliseconds.
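A minimal two-session sketch of this, with a hypothetical accounts table:

    -- session 1
    BEGIN;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- row lock held, no COMMIT yet

    -- session 2
    SET lock_timeout = '100ms';
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;
    -- ERROR:  canceling statement due to lock timeout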

And this makes me think, okay,
something is not right here.

You know, deadlock_timeout is 1
second by default.

Yeah.

But lock_timeout is 100 milliseconds.

So deadlock detection never happens.

Michael: Yeah, in that case.

Nikolay: Yeah.

How is it possible?

I'm very curious how this application
feels, right?

And I will be talking to them soon,
I hope.

And I think maybe I'm just overlooking something,
maybe there are some

settings at session level or user
level.

So this global setting maybe is overridden.

Because it's a very interesting
situation to keep it so low and

global. Yeah, but what you say is basically,
okay, forget about read-write,

let's just set lock_timeout to 1 second
or 2 seconds, and maybe only

for DDL, right?

Michael: Yeah, that's all I'm thinking:
to start with,

like, add lock timeouts to your,
just to your migrations.

And then if they start failing...
because they won't always, like,

depending on your lock timeout
and depending on your, like, well,

especially if you design them in
a way that is not super...

You know, if you're not doing full
table rewrites and things.

Nikolay: You are right, but I can
tell you what happens next.

The next pipeline runs for 10 minutes,
and then it fails.

And like, fuck, damn, we need to
retry the whole thing.

So retries will be long, big loops.

Michael: Within the pipeline, and
within hours.

Nikolay: Yeah, yeah, yeah.

That's why I say do retries right
there.

The most radical, extreme way
to do retries is inside transactions:

have sub-transactions.

This is the only case when I can
say sub-transactions are reasonable.

Well I think some financial applications
might need it as well,

but in the general case I tend to say
avoid sub-transactions.

But if you have a complex DDL transaction,
well, not only DDL, it can be

DDL, DML or something, and then
inside it you want to have retries,

and you cannot lose...

You can, but it's too costly, you
will need to retry a lot of

stuff.

You don't want to lose what you
have so far in that transaction.

OK, you can use savepoints there,
but you need to double-check

that savepoints don't go too deep
in terms of nesting, 64, right?

And also you need to check there
are no long-running transactions

that will affect the health, or
you will see subtrans-related

wait events, like SubtransSLRU.

So in this case, you just retry,
and if retry fails, you retry.

If you hit a lock timeout, if
your local DDL fails, you don't

lose everything.

You lose only back to the latest
savepoint, and then retry again.

Usually it's done using PL/pgSQL,
with BEGIN ... EXCEPTION WHEN ...

blah, blah ... END blocks.

It will create sub-transactions
implicitly for you.

And then you can retry there, inside
this exception

block.
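A minimal sketch of that pattern, with a hypothetical table and column;
each failed attempt only rolls back its own implicit sub-transaction:

    DO $$
    DECLARE
      attempt int := 0;
    BEGIN
      LOOP
        attempt := attempt + 1;
        BEGIN
          SET LOCAL lock_timeout = '200ms';
          ALTER TABLE orders ADD COLUMN note text;  -- hypothetical DDL
          EXIT;                                     -- lock acquired, DDL done
        EXCEPTION WHEN lock_not_available THEN
          IF attempt >= 10 THEN
            RAISE;                                  -- give up after 10 attempts
          END IF;
          PERFORM pg_sleep(0.1 * attempt);          -- brief pause before retrying
        END;
      END LOOP;
    END
    $$;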

And it's good, it's good.

I mean, this is a good thing to
have.

And in this case, retries are super
local, and the whole pipeline

of deployment won't fail.

If everything is all right.

Of course, if there is a long transaction
running 2 hours, those

retries probably won't last long
enough.

Michael: Yeah, it fails anyway.

Nikolay: So that's why I connected
these 2 words.

Michael: Oh, I definitely think
lock timeout and retries

is great. I'm just thinking
there is an intermediate

step if people want to get a lot
of the benefit without all of

the work.

Nikolay: Yeah, well, I agree, at least
lock timeout. This is

already a safeguard from
downtime, basically, at

least partial downtime. And here it's
interesting: usually we want, like,

shift-left testing, you
know, we want to test and find

bugs earlier.

Ideally, the developer who wrote
the code finds bugs immediately,

right?

This is shifted to the left.

Here, with retries, we want to shift
to the right.

Because if we shift them to, like, testing
pipelines, yeah, it's

too much, it makes the retry heavier, right?

Michael: Yeah, these things are
also, I think, on the testing

front, these are the hardest things
to test as well, right?

Because they're dependent on other
activity.

Nikolay: On the situation.

Michael: And very few people have
good test setups that have

concurrent activity.

You know, it's not common.

Nikolay: Yeah, this is on one hand.

On the other hand, it's not rocket
science at all, and you just

need to avoid 2 things.

I thought about this, and you know,
like in DBLab, for example,

in our tool for database branching,
we eventually developed a methodology

where we don't need background
workload, because it's very uncertain:

one day it has a long-running transaction,
another day something else; we should

be prepared for everything.

This is our approach.

And we just decided to do 2 things.

One thing is that we need lock timeout
and retries.

This is one thing.

Just to be prepared that the lock cannot
be acquired.

Again, sometimes the lock cannot be acquired
because of autovacuum.

Michael: Oh yeah, in anti-wraparound mode you mean?

Nikolay: Yeah, and it can run... if it's a huge table, it can

run for hours sometimes.

Michael: If it's

Nikolay: throttled, it can run hours.

And another thing is that we should not keep exclusive locks

for long.

That's it.

"For long" means, like, for many seconds.

They should be brief.

So you need to know how to acquire locks gracefully: not waiting for

long and blocking others, and with retries.

And also, once you've acquired the lock, you don't hold it for

long. That's it.

There are exceptions, though.

There are exceptions.

For example, if you create a new table and you load data into it,

well, technically nobody is working with this table yet.

So if you created the table, you own the lock on it, exclusive

lock.

Inside the same transaction, you can load data.

Why not?

In this case you are breaking rule number 2.

You hold the lock too long, but it's harmless.
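For illustration, with hypothetical table names:

    BEGIN;
    CREATE TABLE orders_archive (LIKE orders);        -- we own the exclusive lock
    INSERT INTO orders_archive SELECT * FROM orders;  -- long load, but nobody else
    COMMIT;                                           -- uses the new table yet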

So, but yeah, there are exceptions, but in general these 2 rules

serve you great.

Like, they are helpful if you keep them in mind.

Graceful acquisition, and don't hold it too long if it's a very

heavy lock, the heaviest lock, exclusive lock.

Michael: Sounds good.

Anything else you want to touch on before we call it a day?

Nikolay: I think that's it, I hope it was helpful.

Michael: Yep, nice.

And I'm interested to hear if anyone gets in touch about any

of this as well.

Nikolay: Oh yeah, I'm curious.

Maybe someone from RDS will comment on this.

Why is max_connections so high by default?

Michael: Yeah, it could be.

Good.

All right.

Thanks Nikolay.

Take care.

You too.
