
Top ten dangerous issues
Nikolay: Hello, hello, this is
Postgres.FM.
I'm Nik, Postgres.AI, and as usual,
my co-host is Michael, pgMustard.
Hi, Michael, how are you doing?
Michael: I'm good, how are you?
Nikolay: Great, everything's all
right.
A lot of bugs to fix and incidents
to troubleshoot, to perform
root cause analysis as we say,
RCA.
Michael: It sounds related to our
topic today maybe.
Nikolay: Oh yeah, maybe yes.
So the topic I chose is... I'm still not 100% sure how to name it properly, so let's decide together.
But the situation is simple for
me, relatively.
So we help a lot of startups, and
at some point I decided to
focus only on startups.
Having raised a few startups myself and helped many startups,
I know how it feels to choose technology
and grow, grow, grow
until some problems start hitting
you.
This is exactly the usual path: people choose RDS or CloudSQL or Supabase, anything.
And they don't need to hire DBAs,
DBREs, and they grow quite
well until a few terabytes of data
or 10,000 TPS, that kind
of scale.
And then problems pop up here and
there, and sometimes they come
in batches, you know, like not
just 1 problem, but several.
And here usually, for us it's good,
they come to us.
I mean, at Postgres.AI we still have a consulting wing, quite strong and growing.
And we helped more than 20 startups
over the last year, which
I'm very proud of.
And I collected a lot of case studies,
so to speak.
And I decided to build some classification of problems that feel not good at a very high level, for example at CTO level or even CEO level, where they might start thinking: is Postgres the right choice, or is it giving us too much headache?
And it's not about like, oh, out
of disk space suddenly, or major
upgrades requiring some maintenance
window.
Although this also can cause some
headache.
But it's more about problems like
where you don't know what to
do, or you see it requires a lot
of effort to solve it properly.
Michael: Yeah, I've had a sneak
peek of your list, so I like
how you've described it.
I also like the thought process
of whether it hits the CTO or
the CEO, and I was thinking, let's
say you have a non-technical
CEO, if they start hearing the
word Postgres too often it's probably
a bad sign.
Ideally you might mention it once
every few years when you do
a major version upgrade, but then
nothing bad happens and they
don't hear it again for a few years.
But if they're hearing, you know,
if it's getting to the CEO
that Postgres is causing problems
over and over again, the natural
question is going to be, is there
an alternative?
What could we do instead?
Or, you know, is this a big problem?
So I guess it's these kind of dangers,
not just to the startup,
but also to Postgres' continued
use at that startup.
Nikolay: Yeah, I like the word
dangerous here because when you
deal with some of these problems,
it might feel dangerous to
have Postgres for them.
It's bad.
Like I would like if things were
better.
So I have a list of 10 items we can discuss. The list is unordered, and I'm going to post it to my social networks so folks can discuss. And I sincerely think this list is useful.
If you're a startup, it's great to just use this checklist to see how your cluster or clusters are doing and whether you are ready. So, a Postgres growth readiness checklist.
And it's interesting that I didn't include vertical and horizontal scaling there. I did include it indirectly; we will touch on it.
But obviously this is the most discussed topic, the biggest danger: how Postgres scales, as a cluster, single primary and multiple standbys, and how far we can go. We know we can go very far, very, very far on a single cluster.
At some point, microservices, or maybe sharding; it's great. But we had a great episode with Lev Kokotov of PgDog, and one of the items I have today resonates with what he said during that episode.
So anyway, let's exclude vertical
and horizontal scaling and
talk about stuff which kind of
sounds boring.
My first item is heavy lock contention.
This is very popular.
Maybe 50% of companies that come
to us have this issue.
Somehow.
So at some point I decided to start telling everyone: if you have queue-like workloads, and/or if you don't know how dangerous it is to change schema in Postgres (just adding a column can be a problem, right? We discussed it, I think, many times), you are not ready to grow, and sooner or later it will hit you.
And it will hit you as a spike
of active sessions.
And we know some managed Postgres platforms provoke you to have a huge max_connections.
Michael: Max connections,
Nikolay: yeah.
On RDS, like 5,000 or 2,500.
Why do they do this?
Easier for them.
But it's dangerous because it additionally creates a kind of performance cliff.
Michael: Yeah, it's another version of these cliffs isn't it?
Nikolay: We
Michael: had another good episode recently.
Nikolay: Yeah, I plan to research this a little bit; probably we will publish something in this area to prove that it's not good. It's still not good even if you have Postgres 14 plus, which has great optimizations for a large number of idle connections; it's still not good.
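For reference, a quick way to see how many of those connections are actually doing work versus sitting idle (a sketch; on a pooled setup you would expect mostly idle sessions on the server side):

```sql
-- Count client connections by state; compare the total against max_connections.
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;

SHOW max_connections;
```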
Michael: And there have been some improvements. Like, I know a very good engineer who took Postgres down by adding a column with a default, I think it was. But that was many years ago; there have been improvements in recent years, and some DDL changes are less dangerous than they were.
Nikolay: Yeah, there are several levels.
Michael: Yes, yeah, of course.
And if they get stuck, if they don't have a lock_timeout, for example. In fact, we're probably going to be pointing to episodes on every single one of these bullet points, but we had one on zero-downtime migrations, I think that's probably the best for this, and we had a separate one on queues actually, didn't we?
Nikolay: So yeah.
Yeah.
So definitely there are solutions here, and you just need to deploy them proactively. It's interesting that I see some companies grow quite far without noticing this problem, for example with DDL. It's like going to a casino: you can win, you can win, and then sometimes, boom, you lose. Because if you deploy some DDL and you get blocked, you can block others and it can be a disaster. We discussed it several times. And if you had a hundred successful deployments, it doesn't mean you will keep winning, right? So it's better to have protection in place. And it concerns me; I have a feeling we should implement this in Postgres itself. Like ALTER TABLE CONCURRENTLY or something like this, which would itself perform these retries with a low lock_timeout.
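A minimal sketch of that retry pattern, assuming a hypothetical table and column; real migration tools add logging, backoff, and more careful error handling:

```sql
-- Try the DDL with a low lock_timeout and retry a few times instead of
-- queueing behind a long-running transaction and blocking everyone else.
DO $$
BEGIN
  FOR i IN 1..5 LOOP
    BEGIN
      SET LOCAL lock_timeout = '100ms';
      EXECUTE 'ALTER TABLE orders ADD COLUMN note text';
      RETURN;  -- success
    EXCEPTION WHEN lock_not_available THEN
      PERFORM pg_sleep(1);  -- someone held the lock; wait and retry
    END;
  END LOOP;
  RAISE EXCEPTION 'could not acquire lock after 5 attempts';
END $$;
```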
Michael: Yeah, it's tricky, isn't it? But I agree. Then again, people still need to know that it exists to actually use it, because I think the main issue here is people not realizing that it can be a problem. And the fact that it probably hits users. Let's say you've got a statement_timeout: when are you actually going to notice that users have been waiting for it? Are you going to notice that spike on your monitoring? I'm not sure; it depends how many users actually got stuck waiting behind it and had slow queries. And it's going to be hard to reproduce; you might not know why it was that.
Nikolay: log_lock_waits is off by default, so you don't see who blocked you. And it might be autovacuum running in its aggressive mode, or it can be another session's long-running transaction which holds an ACCESS SHARE lock on the table, so you cannot alter it. And boom, you block others. So this is like a chain reaction. And yeah, it's not good.
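Two things that help here, as a sketch: turning on log_lock_waits, and checking blockers live with pg_blocking_pids():

```sql
-- Log any session that waits on a lock longer than deadlock_timeout (1s by default).
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();

-- Who is blocked right now, and by which PIDs?
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```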
And queue-like workloads are the same: at some smaller scale, you don't see problems at all. Then you occasionally experience them. But if you grow very fast, you will start hitting these problems very badly. And they look like spikes of heavy lock contention, or just lock contention ("heavy lock" and "lock" in Postgres terminology are the same), and it doesn't look good. And the suggestion is so simple. It's funny that we talk a lot, and people who come to us actually mention they watch the podcast, and I say, okay, you have queue-like workloads? Just take care of indexes, take care of bloat, maybe partitioning, but most importantly, SKIP LOCKED.
That's it.
This is the solution, but we spend hours discussing details. Because when you get to reality, it's not easy to learn this; there are objections sometimes, but this is what we do: we work with those objections and help to implement it, right? So yeah, for everything we had an episode. There are episodes for everything.
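A minimal queue-consumer sketch, with a hypothetical jobs table, showing the FOR UPDATE SKIP LOCKED pattern mentioned above:

```sql
-- Each worker grabs a small batch of pending jobs without blocking other workers.
WITH next_jobs AS (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'processing'
FROM next_jobs
WHERE jobs.id = next_jobs.id
RETURNING jobs.id;
```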
So this was number one: heavy lock contention. And I chose the most popular reasons; of course there are other reasons. But in my view, DDL and queue-like workloads are the biggest ones.
Okay, the next one is boring, super boring: bloat control and index management.
We had episodes about it, maybe
several actually.
But, again, managed Postgres platforms don't give you tools. For example, RDS: they did a great job in autovacuum tuning, but only half of it. They made it very aggressive in terms of resources, like throttling: they gave it a lot of resources. But they don't adjust scale factors. So autovacuum visits your tables not often enough for OLTP. So bloat can accumulate and so on, and they don't give you the tools to understand the reasons for the bloat.
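A sketch of the missing half of that tuning, per-table scale factors (the table name and values are illustrative, not a universal recommendation):

```sql
-- Make autovacuum visit a hot table after ~1% of rows change instead of the
-- default 20% (autovacuum_vacuum_scale_factor = 0.2).
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.01
);
```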
I'm thinking about it, and I think it's tricky, and it's also a problem of the Postgres documentation, because it lacks clarity on how to troubleshoot the reasons for bloat. We always say "long-running transaction", but not every transaction is harmful. For example, at the default transaction isolation level, Read Committed, a transaction is not that harmful if it consists of many small queries. If it's a single long query, it holds a snapshot, and that's harmful. So I guess with observability we should shift fully from long-running-transaction language to xmin-horizon language, and discuss that.
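A rough sketch of looking at it in xmin-horizon terms: what is holding the horizon back, whether snapshots, replication slots, or prepared transactions:

```sql
-- Oldest snapshots held by sessions (long single queries count here too).
SELECT pid, age(backend_xmin) AS xmin_age, xact_start, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

-- Replication slots and prepared transactions can hold the horizon back as well.
SELECT slot_name, xmin, catalog_xmin FROM pg_replication_slots;
SELECT gid, prepared FROM pg_prepared_xacts;
```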
Anyway, I can easily imagine, and I observe, how people think: oh, MongoDB doesn't have this stuff. Or some other database system, they don't have the problem with bloat. Or indexes, indexes, oh. Actually, with indexes, my true belief is that degradation of index health happens in other systems as well. We also discussed it. So they need to be rebuilt.
Michael: I was listening to a SQL
Server podcast just for fun
the other day and they had the
exact same problem.
But in the episode where we talked
about index maintenance, I
think it came up that even if you're
really on top of autovacuum,
even if you have it configured
really nicely, there can still
be occasions where you get some
bloat.
If you have a spike, or you have a large deletion, there are a few cases where you can end up with sparsely populated indexes that can't self-heal. For example, if you've got an ordered key, even a UUIDv7 index, and then you have a session that deletes some old data and it's not partitioned, then you've got a gap in your index.
So there's a bunch of reasons why
they can get bloated anyway,
even if you're on top of autovacuum.
So I think this is 1 of those ones
that, yes, autovacuum fixes
most of the problems, but you probably
still want to have some
plan for index maintenance anyway.
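A sketch of what such an index maintenance plan boils down to (the index name is hypothetical; REINDEX ... CONCURRENTLY is available since Postgres 12):

```sql
-- Rebuild a bloated index without blocking reads or writes.
-- Decide which indexes need it using a bloat estimate or pgstattuple first.
REINDEX INDEX CONCURRENTLY orders_created_at_idx;
```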
Nikolay: Yeah, so there are certain things that are not automated by Postgres itself, or by Kubernetes operators, or by managed service providers; well, some of them automated some things, but not everyone, not everything. Even upgrades: there's a lack of automation there too. We can mention the lack of automation around ANALYZE, but fortunately future Postgres versions will definitely be fine, because dump and restore of statistics is finally implemented and goes into Postgres 18, which is super great news. Anyway, lack of automation might feel like, oh, this is a constant headache, but it's solvable. It's solvable. It requires some effort, but it's solvable.
Okay. Next thing: let's talk about lightweight lock contention. So we talked about heavy lock contention, or just lock contention. Lightweight lock contention also feels like pain, of various kinds. Lightweight locks can be called latches; they're in-memory. When some operations on the buffer pool happen, for example, Postgres needs to acquire lightweight locks, or when working with WAL or various shared data structures. I can also mention the lock manager. So things like LWLock:LockManager, or buffer mapping, or the subtransaction SLRU and multixact SLRU.
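A crude way to see whether these wait events dominate, as a sketch; sampling pg_stat_activity repeatedly (or using an extension like pg_wait_sampling) does the job:

```sql
-- Snapshot of what active sessions are waiting on right now.
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid()
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
```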
When you hear these terms, for me, imagine them written in a bloody font, you know, with red drops of blood, because I know so many projects that suffered big pain, big incidents. So for me, these are bloody terms, you know, because it was a lot of pain sometimes.
For example, you know I'm a big fan of subtransactions, right? My natural advice is just to eliminate them all. Well, over time I've become softer; I say, okay, you just need to understand them and use them very carefully. But LockManager, a couple of years ago, remember Jeremy Schneider posted...
Michael: Yeah, great post.
Nikolay: Horror stories, and we discussed it as well. So this kind of contention might hit you, and it usually feels like a performance cliff: all good, all good, boom.
Michael: Right, it's... what is, or was, is it changing in 18? It was a hard-coded limit, right?
Nikolay: 16, you mean, for fast path? Also, SLRU sizes are now configurable, I think in 17 already. Well, nice, good, but not always enough. Because okay, you can buy some time, but still there is a cliff, and if you're not far from it, again, boom.
Or this, I recently saw it: remember we discussed 4 million transactions per second? And we discussed that first we found pg_stat_kcache was an issue, it was fixed, and then pg_stat_statements. Okay, pg_stat_statements: if the transactions are super fast, it brings an observer effect. And we see it in newer Postgres versions as LWLock:pg_stat_statements, because finally the code is properly instrumented; it's wrapped and it's visible in wait event analysis when observing just that activity.
So I saw it recently at one customer, some share of lightweight lock waits around pg_stat_statements, so we needed to discuss what was happening. It happens only when you have a lot of very fast queries, but it can be a problem as well.
But yeah, performance cliffs: it requires some practice to understand where they are. It's hard because you need to understand how to measure usage, how to understand the risks of the situation. This requires some practice.
Michael: I think this is 1 of the
hardest ones.
I think this is 1 of the hardest
ones to see coming.
Nikolay: After all our stories with LWLock:LockManager, every time I see some query exceed 1,000 QPS, queries per second, I'm already thinking, okay, this patient is developing some chronic disease, you know.
Michael: Okay, that's another one I haven't heard. We've done several rules of thumb before, but that's another good one: a thousand queries per second for a single query, check.
Nikolay: It's also very relative to how many vCPUs we have. If we have fewer, it can hit faster. Although we couldn't reproduce exactly the same behavior as we see on huge machines, like 196 cores; we couldn't reproduce that behavior on 8-core machines at all. So yeah, it's for the big boys only, you know, or maybe for adults. So young projects don't experience these problems.
Michael: That's a good point, actually. The startups that have hit this, that you've written about, have tended to be further along in their journeys: huge, but still growing quickly, and it's an even bigger problem at that point. But yeah, good point. Should we move on?
Nikolay: Yeah, so the next one is our usual suspect, right? It's wraparound of the 4-byte transaction ID and multixact ID. So many words have already been said about this. It just bothers me that monitoring usually doesn't cover, for example, multixact IDs. And people still don't have alerts and so on. So it's sad. Yeah, they're easy to create these days.
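A sketch of the alert being described, covering both transaction IDs and multixact IDs; the point is to alert well before the ~2 billion limit:

```sql
-- Wraparound headroom per database: how old are the oldest unfrozen XIDs/multixacts?
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```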
Michael: I get the impression, though... I mean, there were a few high-profile incidents that got blogged about. I think, yes, yeah, exactly. And I feel like I haven't seen one in a long while, and I know there are a lot of projects that are having to, you know... I think Adyen have spoken about how, if they weren't on top of this, it would only be a matter of hours before they'd hit wraparound; it's that kind of volume. So they're really having to monitor and stay on top of it all the time. But I haven't heard of anybody actually hitting this for quite a while.
Do you think... I wondered if, for example, there were some changes to, I think it was, autovacuum: I think it kicks in to do an anti-wraparound vacuum differently now, or it might be a lighter type of vacuum that it runs. I think I remember Peter Geoghegan posting about it, something like that. Do you remember a change in that area?
Nikolay: I don't remember, honestly.
I just know this is still a problem.
Again, at CTO level, it feels like,
how come Postgres still has
4 byte transaction IDs and what
kind of risks I need to take
into account.
But you are right, managed Postgres
providers do quite a good
job here.
They take care.
I had a guest at Postgres TV, Hannu Krosing, who talked about how to escape from it in a non-traditional and, in his opinion (and actually my opinion as well), better way, without single-user mode. And since he is part of the CloudSQL team, it also shows how much effort managed Postgres providers put into this area, realizing this is a huge risk.
Michael: Yeah, and it's not even,
even if it's a small risk,
the impact when it happens is not
small.
So it's one of those ones where...
Nikolay: Absolutely, good correction. It's low risk, high impact, exactly.
Michael: Yes, yes.
So I think the cases that were
blogged about were hours and hours
possibly even getting, was it even
a day or 2 of downtime for
those organizations?
And that was, that is then, I mean,
you're talking about dangers,
right?
Nikolay: That's... global downtime, the whole database is down.
Michael: Exactly.
People, you're gonna lose some
customers over that, right?
Nikolay: Yeah. Unlike that, the next item, a 4-byte integer primary key, is still a thing, you know. I was surprised to have this case recently, and it was overlooked by our tooling.
Michael: Oh, really?
Nikolay: I couldn't believe it, like, how come? Yeah, because it was a non-traditional way to have this.
Michael: Go on.
Nikolay: Well, first of all, it was a sequence which was used by multiple tables. Yeah, one for all of them. And somehow it was defined in a way that our report in postgres-checkup didn't see it, so when it came I was like, how come? This is an old friend, or old enemy; not friend, enemy.
Michael: Old enemy.
Nikolay: I haven't seen you for so many years. And you look different, you know, because of the multiple tables, but still, it's not fun. And this causes partial downtime, because some part of the workload cannot work anymore. You cannot INSERT.
Michael: Yeah.
Nikolay: Yeah.
So, by the way, I also learned
that if you just do in place ALTER
TABLE for huge table, it not so
dumb as I thought.
I checked source code, I was impressed.
And this code is from 9 point something,
maybe even before.
So if you ALTER TABLE, ALTER COLUMN
to change from int4 to int8,
it actually performs a job like
similar to VACUUM FULL.
Recreating indexes.
And you don't have bloat.
I expected like 50% bloat, you
know.
Michael: Oh, why?
Nikolay: Because I thought it would just rewrite the whole table. I was mistaken; it's quite smart. Of course it's a blocking operation, it causes downtime to perform it, but you end up with quite a clean state of the table and indexes. Not just quite clean; it's fresh.
Michael: Yeah, so that is a table
rewrite, no?
Nikolay: Yes, well, you are right. I was thinking about a table rewrite as a very dumb thing, like creating new tuples and deleting the other tuples.
Michael: Got it, got it, got it.
Nikolay: But there is a proper mechanism of table rewrite in the code. Now I've finally seen it; I'm still learning, you know, sorry.
Michael: You might end up with
some padding issues if you had
it optimized well before, but yeah.
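For reference, the rewrite being discussed looks like this (hypothetical names); it takes an ACCESS EXCLUSIVE lock and rewrites the table and its indexes, so it means downtime, but it leaves a freshly packed table behind:

```sql
-- Blocking int4 -> int8 conversion; the table and all its indexes are rewritten.
ALTER TABLE orders
    ALTER COLUMN id TYPE bigint;
```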
Nikolay: Yeah, it also feels like Postgres could eventually implement some kind of reshape, because there are building blocks in the code already. I see them: first in an offline style, to change column order, and then, if you want it, in a fully online style, if pg_squeeze goes to core, right? Yeah, it would be great. I'm just connecting the dots here, and it could be very powerful in like 3 to 5 years maybe. But it's a lot of work, additionally. So to all those who are involved in moving these huge building blocks, I have huge respect. So okay.
Michael: And I think, if you know what you're doing, this is one that's easier to recover from.
I assume like with the sequence,
for example, you can handle
it multiple ways, but you can set
the sequence to like negative
2 billion and normally you've got
a good start.
Nikolay: Everyone thinks they're smart, and this is the first thing I always hear when we discuss this. It was the same in the past: let's use negative values. Of course, if you can use negative values, do it. Because we know Postgres integers are signed integers, so we use only half of the 4-byte capacity; half of it is 2.1 billion, roughly. So you have 2.1 billion more, but it's not always possible to use them.
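Where negative values are acceptable, the emergency move being discussed is just restarting the sequence at the bottom of the int4 range (the sequence name is hypothetical):

```sql
-- Buys roughly another 2.1 billion values on a signed 4-byte column.
ALTER SEQUENCE orders_id_seq MINVALUE -2147483648 RESTART WITH -2147483648;
```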
But this is an old, old story that still makes some people nervous, and I think it's good to check in advance.
Michael: So much better, so much
better to have alerts when you're
getting...
Nikolay: I saved several companies, big ones, from this, just by raising it. And I know in some companies it was like one year or a few years of work to fix it.
Michael: So what was the problem
before?
Was it looking at columns instead
of looking at sequences?
Or what was the-
Nikolay: No, sequences are always 8 bytes; it was always so. The problem was with the report, I honestly don't remember. There was some problem with the report. It was not the standard way, not just CREATE TABLE where you have a default with a sequence and you see it. Something else, some function, I don't remember exactly.
Okay.
But usually our report catches such things. Or you can just check yourself: if you have 4-byte primary keys, it's time to move to 8 bytes, or to UUID version 7, right? Maybe. Okay, that's it about this.
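A rough catalog query for that check, as a sketch; postgres-checkup does a more thorough job, including how sequences are attached:

```sql
-- Find primary key columns that are still 4-byte integers.
SELECT c.relname AS table_name, a.attname AS column_name
FROM pg_index i
JOIN pg_class c     ON c.oid = i.indrelid
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = ANY (i.indkey)
JOIN pg_type t      ON t.oid = a.atttypid
WHERE i.indisprimary
  AND t.typname = 'int4'
ORDER BY 1;
```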
Then let's heat up the situation: replication limits. So in the beginning I mentioned vertical and horizontal scaling, and usually people say there's not enough CPU or disk I/O and we need to scale, and you can scale read-only workloads by having more read-only standbys. But it's hard to scale writes, and this is true. But it's also true that at some point... and Lev mentioned it in our PgDog episode, he mentioned that at Instacart they had it, right? At something like 200 megabytes per second of WAL generation, or already over 100; I don't remember exactly what he mentioned, but somewhere in that area.
Michael: It was a lot.
Nikolay: Well, just 10 WAL files per second, for example, gives you 160 megabytes. Well, on RDS it's fewer files, because they have 64-megabyte WALs; they raised the size 4 times. Anyway, at 100 or 200, 300 megabytes per second, you start hitting problems with single-threaded processes.
Actually, it surprises me that we don't monitor this. Any Postgres monitoring with low-level access should have it; RDS doesn't have it, but with low-level access we should see CPU usage for every important single-threaded Postgres process, right? WAL sender, WAL receiver, logical replication worker, maybe checkpointer as well.
That would
Michael: be helpful.
Nikolay: Of course, because it's
dangerous to grow at scale when
you hit 100% of a single vCPU.
And then you need to either vertically
or horizontally scale
or start saving on WAL generation.
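A rough psql sketch for measuring the WAL generation rate being discussed, by diffing two LSN samples:

```sql
-- Sample the current WAL position, wait a minute, and see how much was written.
SELECT pg_current_wal_lsn() AS lsn_before \gset
SELECT pg_sleep(60);
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before')
       ) AS wal_per_minute;
```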
Fortunately, in pg_stat_statements,
we have WAL metrics, 3
columns since Postgres 13.
But unfortunately, this is query
level.
What we need, we need also table
level to understand which tables
are responsible for a lot of WAL.
And pg_stat_activity lacks it.
I think it's a good idea to implement.
If someone wants hacking, this
is a great idea.
Add 3 more columns, WAL-based,
WAL-related metrics to pg_stat_all_tables,
pg_stat_sys_tables, and
user tables.
It would be great.
Also, maybe pg_stat_databases, like
a global view of things, how
much WAL.
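Meanwhile, the query-level view that does exist (Postgres 13+, with pg_stat_statements installed) looks like this, as a sketch:

```sql
-- Top WAL producers by query.
SELECT queryid, left(query, 60) AS query, wal_bytes, wal_fpi, wal_records
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```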
Michael: Yeah, I've got a vague memory there were a couple of WAL-related new views, new system views, introduced.
Nikolay: Oh yes, but it's about... Are you talking about pg_stat_io? No. The WAL-related stuff in Postgres 13 went into EXPLAIN and into pg_stat_statements. This is what happened.
Anyway, this is not an easy problem to solve. It's easy to check if you have access; unfortunately, if you're on managed Postgres, you don't have access, so they need to check what's happening, especially on standbys. And also, it makes sense to tune compression properly, because compression can eat your CPU. Remember we discussed WAL compression: I always said, let's turn it on. Now I think: let's turn it on unless you have this problem, in this case with the WAL sender. Yeah, you need to check how much of that CPU is WAL compression. And also we have new compression algorithms implemented in fresh Postgres versions, lz4 and zstd, so we can choose a better one. For example, lz4 should provide, as I remember from some benchmarks I saw (I didn't do it myself yet), a compression ratio similar to pglz, the default compression, but it takes much less CPU, like 2 to 3 times less. So it's worth choosing that.
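A sketch of that choice; the lz4 and zstd values require Postgres 15+ and a build with the corresponding libraries:

```sql
ALTER SYSTEM SET wal_compression = 'lz4';  -- or 'zstd', 'pglz', 'on', 'off'
SELECT pg_reload_conf();
```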
Michael: I looked at it briefly, just for our own use, for plan storage. And we even got better performance as well. So it won on both counts for us.
Nikolay: Well, you probably saw
what I saw, Small Datum blog,
right?
No?
OK, maybe not.
Let's move on, because there are
more things to discuss.
Michael: Nice.
Nikolay: Design limits.
So some people already think what
they will do when their table
reaches 32 terabytes.
Michael: Yeah, I guess this and
the last 1 both feel like adult
problems again, right?
Like, There aren't too many small
startups hitting the Instacart
level of WAL generation or 32
terabytes of data.
Nikolay: So yeah, it's really big clusters. But we can start thinking about them earlier and be better prepared. Maybe sometimes without spending too much time, because, you know, if you spent too much time thinking about how you will bring statistics to the new cluster after a major upgrade, well, it's already implemented in 18, so just a couple of years more and everyone forgets about this problem.
Michael: It's the kind of thing where it's useful, if you've got a to-do list you're going to be using in a couple of years' time, or a calendar you know you're going to get an alert from, to just put a reminder in for like a year or two's time, just to check.
Nikolay: We also mentioned a few times during the latest episodes that this bothers me a lot. First I learned CloudSQL has this limit, then I learned RDS also has this limit: 64 terabytes per whole database. What's happening here? These days that's already not a huge database.
Michael: but again in a couple
of years who knows they might
have increased that
Nikolay: Yeah, well, I think it's solvable, of course. I guess this is the limit of a single EBS volume, or of the disk on Google Cloud, PD-SSD or whatever they call it. So yeah, it's solvable, I think, right? With some tricks. But these days it doesn't feel huge. 64 feels like a big database, right? But when we say...
Michael: to most of us,
Nikolay: But we have stories of 100-plus-terabyte databases, and all of them are self-managed.
I think everyone has
Michael: all of them were sharded.
All of those were sharded.
Nikolay: Yeah, 100%, yeah, we had
a great episode, 100 terabytes,
where it was Adyen, who else?
Michael: We had Notion and we had
Figma.
Nikolay: Figma, right. And Notion and Figma, they are on RDS, but it's sharded; a single cluster that size is impossible on RDS, right? And I think Adyen has 100-plus terabytes. Well, they have 100-plus terabytes, but it's self-managed. Yes. Because on RDS it's impossible, not supported.
Michael: Well, and they shard it
also.
Nikolay: Yeah, yeah, yeah.
But yeah, large companies like
that, they're always like some
parts are sharded, some are not.
So anyway, when you have 10 to
20 terabytes, it's time to think
if you are on RDS or CloudSQL,
is it like how you will grow
5X?
Because if you're 20 terabytes
to grow 5X, 100, it's already
not possible with a single cluster,
right?
Another reason to think about splitting
somehow.
So, okay.
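A trivial sketch of the check implied here, comparing current size against the provider's ceiling and the expected growth:

```sql
-- Current database sizes; multiply by the expected growth factor and compare
-- with the platform's per-instance limit (e.g. 64 TB).
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```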
Then a few more items. So, data loss. Data loss is a big deal if you poorly designed backups or HA solutions; yeah, it can be. Let's join this with poor HA choices leading to failures like split-brain. So: data loss, split-brain.
Actually, I thought we had a discussion about this. There is an ongoing discussion in the project called CloudNativePG, where I raised the topic of split-brain and demonstrated how to get it, a couple of weeks ago. And the good news, as I see it, is that they decided to implement something and move in a direction similar to Patroni, because when a network partition happens and the primary is basically alone, it's bad, because it remains active. And as I demonstrated, some parts of the application might still talk to it. This is classical split-brain.
And based on that discussion, it triggered something I had never actually thought deeply about: is split-brain just a variant of data loss?
Michael: Well, I guess you technically might not have lost the data. It's still there, you just have 2 versions. Which one is correct?
Nikolay: It's worse than data loss. It's worse than data loss. It's worse. Because now you have 2 versions of reality. And it's bad.
With data loss, you can apologize and ask: bring your data back again, please. And in some cases we allow some data loss. Of course, data loss is a really sad thing to have. But sometimes we have it, like, officially: some data loss might happen, the risk is very low, and at maximum this number of bytes, for example, but it might happen. With split-brain it's worse. You need to spend a lot of effort to merge the realities into one.
Michael: Most cases of data loss I've seen have tended to be at least 2 things gone wrong, like user error, or some way that a node has gone down; but quite often it's user error, an accidental DELETE without a WHERE clause, or dropping a table in one environment.
Nikolay: Well, this is like higher
level data loss.
Michael: Well, but that can cause the low-level data loss if you then also don't have tested backups, and it turns out you didn't have a good one; so it's often the combination of the 2 things.
Nikolay: For me it's still... yeah, well, yes, if backups are missing, it's bad, you cannot recover. But data loss for me, classical, lower-level, is: the database said commit successful, and then my data is gone.
Michael: So yeah. That's scary and dangerous for a CTO.
Nikolay: It undermines trust in Postgres again, right? Yeah, yeah. If procedures are leading to data loss and also split-brains.
Michael: Is it actually happening
often or is it more the CTO
thinking, I don't want to choose
this technology because it could
happen?
Nikolay: It depends on the project and the level of control. I'm pretty confident that in many, many, many web and mobile app, OLTP-style projects, unless they are financial or something: in social networks, social media, maybe even e-commerce and so on, data loss sometimes happens unnoticed. Yeah. If you have asynchronous replicas and a failover happened, well, in the process of failover, Patroni by default with asynchronous replicas allows up to 1 megabyte; it's written in the config, up to 1 megabyte of data loss, officially. Maybe it's in bytes.
Right, so 1 megabyte is possible to lose. And who will complain if it's social media comments or something; we store comments, we lost some comments, maybe nobody noticed. But if it's a serious project, it's better not to allow it. Or split-brain. Yeah, anyway, this is mostly not about Postgres itself; Postgres does quite well here. It's a question mostly for everything around Postgres, the infrastructure, and if it's managed Postgres, for their infrastructure: how they guarantee there is no data loss.
And last time we discussed the problem with synchronous_commit, and we discussed in detail how right now Postgres doesn't do a good job of revealing proper LSNs on standbys, right? So even Patroni can have data loss in the case of synchronous_commit with remote_write. We discussed it. Okay, anyway, this definitely feels like something for improvement here.
Good. Corruption. My general feeling is that people don't realize how many types of corruption might happen. And it remains unnoticed in so many cases. When you start talking about it, people's reaction sometimes is: wait, what? So yeah, corruption, at various levels.
Michael: So actually, maybe this, in terms of the list where we started... the point of the topic was kind of dangers, right? Is this one of those ones where, if it silently happened for a while, it suddenly becomes a complete loss of trust in the underlying system?
Nikolay: Yeah, as usual, I can rant a little about how Postgres defaults are outdated. We know data checksums were only recently enabled by default. Yeah, great change, like a month or 2 ago, right? It will only be in 18. It should have been done 5 years ago.
We saw evidence.
Michael: Some, yeah.
Nikolay: Yeah, many managed Postgres providers did it, like RDS. That's great.
Michael: Which is great.
And that kind of is then the default
for a lot of people.
Nikolay: Yeah, but it also doesn't guarantee that you don't have corruption. You need to read all the pages from time to time, right? And do they offer something, some anti-corruption tooling? They don't, nobody does. Okay, they enabled it, so what? This is just a small piece of the whole puzzle.
Michael: Yeah, and amcheck improving
as well.
I think, is it in 18?
Nikolay: Yeah, that's great.
Michael: In indexes as well.
Nikolay: Yeah, big.
Michael: In fact, I think we, did
we have an episode on amcheck?
I feel like it came up at least
once or twice, maybe index maintenance.
Nikolay: I cannot remember all
our episodes, so many.
Michael: Yeah, me neither.
Nikolay: But it's great to see the progress in the area of amcheck. But again, it's a small piece of the puzzle, and it has settings, options to choose from. So it's not trivial to choose among the options.
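A sketch of proactive checking with amcheck (object names are hypothetical; verify_heapam requires Postgres 14+, and these checks read a lot of data, so schedule them carefully, ideally on a standby or clone):

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify a B-tree index; heapallindexed = true also checks every heap tuple is indexed.
SELECT bt_index_check('orders_pkey'::regclass, true);

-- Scan a table's heap for corrupted tuples (Postgres 14+).
SELECT * FROM verify_heapam('orders');
```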
And also again, like a couple of
episodes ago, we discussed support
and how difficult it is to sometimes
understand what's happening.
Right.
Yeah.
And right now I have a case where a very big platform doesn't help to investigate corruption. And nobody has the ability to investigate it but them, and they are not helping. So it was one of the reasons I was provoked to talk about that in that episode. It's bad. It looks really bad. Like: you guys are responsible for that corruption case, and you're not doing a great job. And I think it's a problem of the industry, and we discussed it already, so let's not repeat it.
But in general, I think if you are a CTO, or a leader who decides priorities, my big advice is to take this list and check it, evaluate the situation for yourself. Better to let us do it, of course, but you can do it yourself. And then plan some proactive measures, because corruption testing can be done proactively even on RDS. If it happens, you need support, of course, because sometimes it's low-level and you don't have access, but at least you will feel in control of it, right? Of corruption. So, anti-corruption tooling is needed. This is what I feel. That's it.
That's all my list.
I'm sure it's lacking something, like security, more security-related stuff, for example, as usual. What do you think, was it good?
Michael: Yeah, you see a lot more of these things than I do, obviously, but I think it's a really good list. And it's like that with checklists, right? Of course you could go on forever with things to be scared of, but this feels like, if you ticked all of these off, you'd be in such a good position versus most. And obviously things can still go wrong, but these are, even if they're not the most common, some of the things that could cause the biggest issues, the things that are most likely to get onto the CEO's desk or into the inbox. So yeah, this feels like, if you're on top of all of these things, you're going to go a long, long way before you hit issues.
Nikolay: I have a question for you. Guess: among these 10 items, which one have I never had in my production life?
Michael: Oh, maybe...
Wait, give me a second.
Nikolay: It's tricky, right?
Michael: Transaction ID wraparound?
Nikolay: Exactly, how, yeah, I never had it.
I only-
Michael: It's rare.
Nikolay: Well, it's rare, yeah. Let's cross it off, no problem. Yeah, I only found a way to emulate it, which we did multiple times, but I never had it in reality, in production. So yeah, everything else I have had. Yeah. And not just once.
Michael: I nearly guessed too quickly that it was going to be
Split Brain, but then I was like, no wait, read the whole list.
But I'm guessing you had Split Brains like a couple of times,
maybe maximum?
Nikolay: Yeah, Replication Manager, Split Brain as a service,
yes.
Michael: Okay, yeah, pre-Patroni days.
Yeah, makes sense.
All right, nice 1.
Thanks so much, Nikolay.
Nikolay: Thank you.