Postgres FM | Transcript: LWLocks

October 17, 2025 • 38 Minutes

LWLocks

Michael: Hello and welcome to PostgresFM,
a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard,
and as usual I'm joined by

Nick, founder of Postgres.AI.

Hey Nick, how's it going?

Nikolay: Hello Michael, going great,
how are you?

Michael: I am good also.

What are we talking about this
week?

Nikolay: We're talking about...

What's written on the cover?

Michael: Are we gonna write LWLocks
or lightweight locks as a full

word?

What do you think?

Nikolay: I like the shorter version
of course.

There's also a question to write
3 uppercase letters or everything

lowercase.

Yeah, lightweight locks, it's like,
it's even hard to pronounce.

So bottleneck locks also not good,
right?

LWLocks.

I like some systems call it latches,
you know?

Michael: Latches, yeah.

Yeah.

Why doesn't Postgres do it?

Nikolay: Maybe because of Linux
or I don't know.

Michael: Naming things is hard.

Nikolay: Yeah, yeah, yeah.

Because we have confusion sometimes
when we say just locks.

You know, like when we say backups
or logical backups dumps,

here also locks, there are 2 types
of locks.

There are more types

Michael: of locks, right?

Nikolay: But there are 2 big, major
types.

Yeah, categories.

Yeah.

Is it category or type in pg_stats,
in pg_locks?

Michael: Good point.

Yeah, maybe types.

I don't know.

Another loaded term, types is another
loaded term.

Nikolay: Or mode, or maybe more.
Let me just check right

now. Maybe it's called mode in pg_locks.
I'm constantly confused

about it. Yeah, you know, there
are some terms like class,

type, mode; they
are quite abstract, right?

It's called mode in pg_locks.
It's called mode.

Michael: Cool.

Nikolay: But pg_locks is about
heavyweight locks. We are talking

today about lightweight locks, right?

Michael: Yeah, we did a whole
episode that we just called

locks, because that's generally
what people call the heavyweight

locks, yeah.

Nikolay: If there is no additional
word, it means heavyweight

locks. And why so?
Because heavyweight locks, you

can name them just locks because
they are closer to the user, right?

You can see them in pg_locks, for
example. Sometimes,

not always, but sometimes you can
acquire them directly using

just the LOCK SQL command, right? Just
LOCK with a table name, or SELECT ... FOR

UPDATE, and you sit in a transaction
having acquired some locks, right?

Michael: Yeah I also think in general
more people come across

them because they affect you at
earlier stages in project life

cycle.

Like you don't have to be at such
extreme scale to start being

affected by them or having to be
aware of them.

Nikolay: Yeah, I agree.

In general case.

Because in some cases, for example,
if you have read-only workloads,

that's it.

In read-only, you don't like, you
have heavyweight locks, but

you won't notice them because they're
like ACCESS SHARE lock,

that's it.

Right.

But if you have really a lot of
TPS, you might start observing

some lightweight locks.

Yeah,

Michael: I was thinking even like
schema changes though, like

even in read-only.

Nikolay: This was an edge case.

Michael: Yeah, okay, fine.

Nikolay: In general, I agree with
you, heavy locks, you bump

into them sooner.

Schema change is a great example.

Michael: Cool.

So where did you, I mean, starting
with the difference between

locks and lightweight locks is
probably great.

Nikolay: Yeah, let's talk more
about differences, because there

are important differences to understand
and feel and take into

account when you develop things
or tune, optimize, scale, migrate.

So heavyweight locks are acquired
during SQL operations, and

they are acquired on database
objects, like relation-level

locks.

By the way, this is super confusing.

You know, yeah, I'll be talking
about heavyweight locks a little

bit, trying to make it shorter,
but as you know, I write again

almost every day.

I skip some weekends, but I write
Postgres Marathon posts again,

And for many days already I've been sitting
between heavyweight locks and lightweight

locks, researching 1 of them, LWLock:LockManager, right?

So relation-level locks: it's quite a
confusing name because, in the

documentation, they
are called table-level locks.

Yeah.

Which is misleading because they
are also this type of like the

same thing and inside documentation
it's already, it becomes

clear that indexes are also involved
and materialized views and

views.

And so all the relations, sequences
are also relations, right?

Michael: Yes.

I never really thought of it like
that.

Nikolay: Well, if you check
pg_class and relkind, I think S

is sequence.

Maybe I should check again.

Should we develop a habit to check
things right online?

Michael: Why not?

Nikolay: Yeah, so I will be checking,
but meanwhile, you can

acquire locks, heavyweight locks
on tables, on indexes, on database,

right?

Like even higher level, you can
lock the whole database.

And we know the recent problem
when Recall.ai blog post, right?

Our clients, they posted about
database level lock acquired when

NOTIFY happens to establish sequential
NOTIFY events at commit

time.

And also row level locks.

Yeah.

Tuple or row level.

Let's leave it for another time.

So you can acquire locks on database
objects.

These are heavyweight locks.

Documentation is also confusing
because it says explicit locking.

Although most of the cases where
you have it, it's implicit locking.

You say alter table and you don't
say lock table, you say alter

table.

So it's actually implicit.

Well, this, I have also always
like some shift in my brain when

I need to Google documentation
for lock or heavyweight locks.

I just remember, I need to search.

I need to ignore the fact that
I'm going to look at explicit

locking documentation, although
I need implicit locking documentation,

right?

Michael: By the way, before we
move on, I checked pg_class and

you're right, sequences are in
there, and weirdly the relkind

is capital S. All the others are
lowercase, I think. Well, the ones

I can see anyway.

Nikolay: Does it mean something?

Michael: Don't know.

Nikolay: Yeah.

Oh, by the way, explicit locking
documentation, it mentions that

you can lock indexes with ACCESS SHARE
lock, for example.

But you cannot do it explicitly.

You cannot say LOCK with an index name.

So I'm pretty sure you cannot do
it with sequences as well.

Yeah, anyway, so these are heavyweight
locks, right?

So you basically, your actions,
I mean your SQL, this is what

directly creates heavyweight locks.

And why is it needed?

Because we need to, we are not
working in single user mode.

We need to protect resources from
concurrent operations, reading,

writing, changing.

And usually we don't need to protect
from reading, but while

somebody is reading, another backend
usually shouldn't modify it,

right?

Or for example, if you read from a
table, dumping it for example,

another session cannot modify it, like
add a column, for example;

it cannot run DDL, right?

Michael: Or drop it, for example.

Nikolay: And the important thing
about heavyweight locks compared

to lightweight locks is to understand
that once a lock is acquired,

it can be released only at the
very end of the transaction, commit

or rollback.

That's it.

Only 2 options to release this
lock.

You cannot release it midway.

Right, and this is super important
for understanding always.

It means that transactions should
be shorter.

Right, so your actions won't affect
others.

Like, or chances to affect others
would be lower.

Right?

This is...

Michael: Or time that it affects
others is lower, right?

Nikolay: Yeah, yeah, yeah.

Michael: Because you will affect
people just for less time and

there's a point at which that becomes
unnoticeable or acceptable.

Nikolay: Yeah, yeah.

Or won't affect at all if they
come a little bit later.

But if you change something or
even if you read something and

keep a transaction open for hours,
it means nobody can modify this

table, no DDL is possible, autovacuum
cannot do some things

and so on, like it's bad.

Michael: Yeah, it's worse than
that, isn't it?

I know we've talked about this
before, but if DDL comes along

and doesn't have a lock_timeout,
then naturally you can suddenly

be down.

Yeah.
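
Editor's sketch: the pattern the hosts refer to here (DDL with a short lock_timeout, retried on failure) could look roughly like the following. This is a minimal illustration, not anyone's production tooling. `LockTimeout` and `attempt_ddl` are hypothetical names: `attempt_ddl` is assumed to open a transaction, set a short `lock_timeout`, run the DDL, and raise `LockTimeout` when the lock cannot be taken in time.

```python
import random
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for the error a driver raises when lock_timeout fires."""


def run_ddl_with_retries(attempt_ddl, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call attempt_ddl() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. Re-raises LockTimeout if the
    last attempt also fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_ddl()
            return attempt
        except LockTimeout:
            if attempt == max_attempts:
                raise
            # Back off with jitter so that the DDL is not sitting in the
            # lock queue (and blocking others) while the blocker is still there.
            sleep(base_delay * attempt * (1 + random.random()))
```

The point of the design is that each failed attempt gives up its place in the lock queue quickly, so other sessions are blocked only for the length of the timeout rather than for as long as the blocking transaction lives.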

Nikolay: Yeah.

Yeah.

So because, yeah, this is also
a good point.

So heavyweight locks have this
property: once they are

acquired,

release happens only at the very
end.

And what you said is also a good point.

Also, there is the LockManager.

By the way, I couldn't
find a definition of LockManager.

Nowhere, nowhere.

Like it's like, it's obvious, right?

Even in the source code, it's not
defined, which is interesting.

So LockManager is responsible for
managing locks, heavyweight

locks, right?

And backends can form a queue of
waiting for a lock acquisition.

So if I'm waiting to acquire lock,
some other backends can be

waiting and they like ask where
is the end of the line and go

there, right?

So it's just natural, like in the
order of first, like in natural

order, right?

Lightweight locks are different: they are
acquired and released

very quickly.

I think the documentation, or the source code,
mentions it's like dozens

of operations.

There is also the underlying concept
of spinlocks, which take only a

few operations, like a few instructions
only.

Lightweight locks are bigger, but
still very fast as well.

Acquired and released, and they
don't wait until the end of transaction

because they work in lower level
of abstraction.

It's not closer to users, closer
to resources like memory.

So their main purpose is to protect
some physical resources like

parts of the memory, shared buffers,
and so on.

Right?

OK.

Yeah.

Like these things.

So they can be acquired and released
quickly.

There are only 2 types, exclusive
lock and share lock, unlike heavyweight

locks.

Heavyweight locks have a list.

And interesting relationships between
different ones, right?

Here it's only share and exclusive,
shared and exclusive.

Shared locks don't conflict, shared
lightweight locks don't conflict.

But exclusive lock cannot be acquired
while share lock is still

running, lasting.

Right?

Share lock should be released first,
then only you can acquire

exclusive lock, because exclusive
lock is needed to modify the

resource, right?

Share lock is needed to protect
for reading.

It's saying I'm reading, don't
change it because I'm still reading.

And when I'm done you can modify
it, right?

So this is lightweight locks.

And that's why they are lightweight,
because they are much, much

shorter living, right?

So these are main differences between
them.

What else?
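
Editor's sketch: the shared/exclusive rule described above can be modelled in a few lines. This is a toy model for illustration only and is not how PostgreSQL implements LWLocks (there is no waiting, no queue, and no atomic operations here); the class and method names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ToyLWLock:
    """Toy shared/exclusive lock: tracks holders, never blocks."""
    shared_holders: int = 0
    exclusive_held: bool = False

    def try_acquire(self, mode: str) -> bool:
        """Return True if the lock was taken, False if the caller would have to wait."""
        if mode == "shared":
            # Readers only conflict with a writer.
            if self.exclusive_held:
                return False
            self.shared_holders += 1
            return True
        if mode == "exclusive":
            # A writer conflicts with any reader and any other writer.
            if self.exclusive_held or self.shared_holders > 0:
                return False
            self.exclusive_held = True
            return True
        raise ValueError(f"unknown mode: {mode}")

    def release(self, mode: str) -> None:
        if mode == "shared":
            self.shared_holders -= 1
        elif mode == "exclusive":
            self.exclusive_held = False
        else:
            raise ValueError(f"unknown mode: {mode}")
```

Two shared acquisitions succeed together, an exclusive attempt fails until both are released, and while the exclusive lock is held no shared attempt succeeds.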

Michael: Maybe types of lightweight
locks?

Well, forgive the word types, but
you know, what should people

be aware of? Because I've only really
come across 1 type, because

that's the type that seems to cause
the problems.

But what should people be aware
of at least?

Nikolay: Yeah so types, modes, I struggle, I'm mixing these terms

and it's really hard to distinguish
between them.

So if we talk about types, exclusive
and shared, we just covered

it.

If we talk about different kinds
or, let's say wait events

we observe in pg_stat_activity.

pg_stat_activity is the main cumulative
statistics system view.

Everyone should think, like, learn
about it, right?

It's super important because it
shows what's currently happening

in database.

And there are 2 columns called
wait_event_type and wait_event.

Also, by the way, slightly confusing
because the word type is

there and so on.

I would prefer like, it would be
good to name that thing like

classes maybe or category.

I don't know because type word
is so overused or like overloaded.

Right.

Anyway, wait event type can be,
I think there are fewer than

10 classes, 10 types.

And the 2 of them which are most interesting
today are Lock, meaning

heavyweight lock,

and LWLock.

And wait event type LWLock, you
can check documentation.

There are many, many, many, many
dozens of wait events for LWLock,

meaning that we have a lot of kinds
of LWLock.

These kinds, like again, types
are only like exclusive and shared,

but these kinds, it's classification
with respect to the resource

we are locking.

For example, LockManager itself,
although the main purpose of

LockManager is to handle, to manage
heavyweight locks.

When it does it, it does it using
a piece of memory, shared memory.

Special piece of shared memory
called, well, it's called the

main lock table, right?

It's a big piece of memory which
is partitioned into 16

partitions, starting with Postgres, I
think, 9.2, or 8.2 actually; it was

very long ago.

NUM_LOCK_PARTITIONS is
16.

And when new information about a
heavyweight lock needs to

be written there, the partition
of this main lock table where

it needs to be written has
to be locked by a lightweight

lock, right?

To ensure nobody else is writing
to it.

So, LockManager can have up to
16 lightweight locks, which are

seen as LWLock:lock_manager.

16 because we have 16 partitions
of this main lock table in memory.

And how it works: for
example, for relation-level

locks, based on the relation,
there's a hash function which

determines which partition
to use, right?

Michael: Yeah.

Nikolay: The same table or index
will always go to the same partition

of all of those 16 partitions.

Right?
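
Editor's sketch: the mapping described above (hash the lock's identity, take it modulo 16) can be illustrated as follows. Note the assumptions: `zlib.crc32` is only a stand-in for PostgreSQL's internal hash function, and the pair of numeric OIDs is a simplified stand-in for the internal lock tag, which to our understanding identifies a relation by numeric IDs rather than by its name. The function name is made up for this example.

```python
import struct
import zlib

NUM_LOCK_PARTITIONS = 16  # the constant mentioned in the episode


def lock_partition(database_oid: int, relation_oid: int) -> int:
    """Map a relation to one of the 16 lock-table partitions.

    Assumption: crc32 over two packed 32-bit OIDs replaces the real
    hash over the full lock tag. Only the shape of the idea is shown.
    """
    tag = struct.pack("<II", database_oid, relation_oid)
    return zlib.crc32(tag) % NUM_LOCK_PARTITIONS
```

Because the hash is deterministic, the same relation always lands in the same partition, which is why heavy lock traffic on a single hot table or index concentrates on a single partition's lightweight lock.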

Michael: So this was a long time
ago.

So back in the day, I'm guessing
this was 1 thing, and there

was probably too much contention.

Nikolay: Before 8.2.

Yeah.

Michael: Yeah.

Okay.

That's what you're talking about.

Great.

Nikolay: Yeah.

And this just to like, there's
a confusion because there was

another 16.

Yes.

Michael: That's what I was thinking.

Nikolay: It's a different constant,
and that

behavior changed in Postgres
18.

This behavior hasn't changed.

Yeah, fastpath changed.

This hasn't.

This still is 16 partitions and
if you have a lot of heavy lock

acquisition attempts for the same
relation, a lot I mean like

thousands or maybe tens of thousands
per second, a lot, really

a lot, then exclusive lightweight
locks on the same partition

will be competing.

So when we try to take a heavyweight

lock on some index or table, but
that partition is already locked

by an exclusive lightweight lock from
a different backend trying to

write its own heavyweight lock information,

we need to wait a little bit.

And this little wait will be seen
as wait event type LWLock and

wait event lock manager.

Right?

Is it clear?

Because we are like, we have weird
combination of heavyweight

locks and lightweight locks in the
same topic here.

Michael: Especially because the
lightweight lock manager is actually

looking at heavyweight locks.

That's the confusing part for sure.

But I was just looking up in the
docs, in the table of all of

the...

They've called them types, wait
events of type LWLock, and

you're right, it's such a long
list.

I think there might be 50 or more.

Nikolay: You will see checkpoint,
autovacuum there, but isn't

it great that we don't observe
them?

It means it's quite well optimized,
right?

Like for example, yeah, yeah, you'll
find a lot of stuff.

I see many of them, I do observe,
not just a lock manager.

We saw many of them in production
and yeah.

Usually the rule is if you see
the Lock wait event type, it means

you need to go and think how to
redesign your application.

Because classic example is, for
example, we are doing some billing

system and we have a single account
which needs to be updated

for each transaction people do.

I mean, financial transaction.

And this is a classic example where
you shoot yourself in the

foot, because updating the same
row will be a hotspot.

And you will see a lot of, you
see heavyweight lock contention

because many, many backends try
to update the same row.

Right.

Yeah.

So, or you mentioned a very good
example.

If you do DDL without lock_timeout
and retries, you also can

have a chain of waiting backends,
which just wait until your

DDL finishes, but it itself is
waiting for some other SELECT.

And I see examples in blog
posts where

people try to explain
this problem, but many of them

involve updates, deletes.

No, just SELECT, DDL, and many
other SELECTs.

You don't need even to update anything
or INSERT or DELETE.

That's it.

Just SELECTs and DDL.

And we see a lock wait event in
pg_stat_activity.

It's bad.
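
Editor's sketch: the SELECT, then DDL, then more SELECTs pile-up described above can be shown with a tiny queue model. It assumes only two lock modes and a simplified rule (a new request waits if it conflicts with any current holder or with any earlier waiter); nothing is ever released, so it is a snapshot of one moment. The function and session names are made up for this example.

```python
ACCESS_SHARE = "AccessShareLock"          # what a plain SELECT takes
ACCESS_EXCLUSIVE = "AccessExclusiveLock"  # what many ALTER TABLE forms take


def conflicts(a: str, b: str) -> bool:
    """Only two modes are modelled: ACCESS EXCLUSIVE conflicts with everything."""
    return ACCESS_EXCLUSIVE in (a, b)


def simulate(requests):
    """requests: list of (session, mode) in arrival order, all on one table.

    Returns (granted, waiting) as lists of session names.
    """
    granted, waiting = [], []
    for session, mode in requests:
        blocked_by_holder = any(conflicts(mode, m) for _, m in granted)
        blocked_by_waiter = any(conflicts(mode, m) for _, m in waiting)
        if blocked_by_holder or blocked_by_waiter:
            waiting.append((session, mode))
        else:
            granted.append((session, mode))
    return [s for s, _ in granted], [s for s, _ in waiting]


granted, waiting = simulate([
    ("long_select", ACCESS_SHARE),
    ("alter_table", ACCESS_EXCLUSIVE),
    ("select_1", ACCESS_SHARE),
    ("select_2", ACCESS_SHARE),
])
print(granted)  # ['long_select']
print(waiting)  # ['alter_table', 'select_1', 'select_2']
```

No write is involved anywhere: one long-running SELECT plus one DDL statement is enough to make every later SELECT on that table wait.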

But when we talk about...

Yeah, and for Lock you saw, like,
relation, object, page

(page is also interesting), tuple, virtualxid;
that's interesting.

Advisory locks is kind of a different
thing.

But for LWLock, we have a lot,
and among them, there is lock manager.

I like the approach, which I think
RDS started, maybe not RDS,

but they use it a lot and I also
started using it.

We usually take wait_event_type and
wait_event, the 2 columns from pg_stat_activity,

and write them with a colon in
between.

So it becomes LWLock:LockManager.

Just in, you know, like in texts
where we discuss problems and

do RCA or something.

Yeah, it's just convenient.

Michael: Like a naming convention.
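
Editor's sketch: the colon convention is easy to apply when summarizing samples of pg_stat_activity. The query below uses the two real columns; the helper names `label` and `summarize` are made up for this example, and fetching the rows with a database driver is left out.

```python
from collections import Counter

# A query a monitoring script might run to take one sample.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
"""


def label(wait_event_type: str, wait_event: str) -> str:
    """Join the two columns with a colon, e.g. LWLock:LockManager."""
    return f"{wait_event_type}:{wait_event}"


def summarize(rows):
    """rows: iterable of (wait_event_type, wait_event) tuples from one sample."""
    return Counter(label(t, e) for t, e in rows)
```

Counting labels across repeated samples gives a rough picture of where backends spend their waiting time.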

Nikolay: And I wanted to highlight
that this problem is related

to both heavyweight locks and lightweight
locks: in its name

we have the word lock twice.

LWLock:lock_manager.

First time it's about lightweight
lock but in lock manager it's

about heavyweight lock.

That's why the lock is encountered
twice, 2 times.

What else?

We have observed other
kinds of lightweight lock problems.

For example,

Michael: yeah.

Well, I've spotted 1 in the list,
SubtransSLRU.

Nikolay: Yeah, this is my favorite
1.

Although I must admit, I haven't
touched this topic for a few

years.

Yeah, yeah.

I touched it heavily in 2021 when
GitLab had the problem.

Yeah.

And studied it.

And yeah, since then... SLRU
is simple least recently

used. These are small caches, and Postgres
has multiple of them.

I think since Postgres 12 or 13,
we have pg_stat_slru system

view, where you can see like counters
of work of those SLRUs,

But also SLRU mechanism got some
handles, I mean, settings, GUC,

GUCs, right?

People say GUCs.

You can change them and increase
it and yeah, to postpone this

performance cliff.

So I haven't seen them often since
then.

Like there are customers who usually
read and come, we help like,

we help easily, like just try to
get rid of sub-transactions,

although, like I still think by
default you should avoid sub-transactions,

but in some cases I already see
they can be used in safe way.

You know, you need to just understand
the limits and then you

can use them.

For example, again, DDL, sometimes
complex changes of schema.

You don't want to lose part of
schema.

And this approach with attempts
to acquire lock, how can you do

it inside transaction, you need
sub-transaction, right?

Yeah.

Because if attempt fails, you don't
want to lose everything.

You want to lose only the last
step, right?

And this is exactly where I think
it's worth thinking about

using a sub-transaction, but you need
to understand the details.

For example, you don't want to
have a long transaction running

on the primary in parallel.

To other table, by the way, not
to any table.

And replicas which receive a lot
of transactions per second

might go down because
of the use of sub-transactions,

and you can see SubtransSLRU
because the SLRU has overflowed.

And again, when it has overflowed, we see
contention on lightweight lock acquisition,

and we see
it in pg_stat_activity and

wait event analysis as
LWLock:SubtransSLRU.

Right.

There are other SLRUs mentioned, right,
like notify SLRU and, I'm

pretty sure, MultiXact offset and
MultiXact member SLRUs.

Speaking of them, we had an episode
about the case from Metronome,

right?

Michael: Yeah, true.

Nikolay: Yeah, it was a great blog
post.

Like this is a great example how
company can share with community

what happened and others benefit.

Since then we had another client,
new client we had, which came

to us with very same problem related
to MultiXact member.

Michael: Wow.

Member exhaustion.

Nikolay: Yeah, exactly.

Wow.

So it's also observed like a lightweight
lock, MultiXact blah,

blah.

There are, there's a bunch of MultiXact
lightweight locks you

can see in the table.

Another 1 is a very popular 1 is
LWLock:buffer_mapping.

So usually it's called a buffer
thrashing, right?

When the buffer pool
is not big enough, a lot

of eviction happens and new pages
are loaded all the time.

And of course, when
it's happening, to protect memory

Postgres needs to use an exclusive
lightweight lock when writing

is happening and a shared lightweight lock
when reading is happening.

And this is exactly when we can
see some backends are a little

bit waiting for other backends,
right?

And this is how it's seen.

Michael: So like if somebody's
limited by the amount of shared

memory, or like shared buffers.

Nikolay: Yeah, solution is simple.

We need to increase the buffer
pool.

We need to fight bloat because
this is what like increases this

problem.

You need to get rid of unused indexes
and other things because

they also contribute to it.

Right.

Unused ones.

Of course.

Michael: Like, I was just thinking
they would be evicted and

not.

Nikolay: Think better when you
change something with insert or

non-HOT update.

Michael: All indexes need to

Nikolay: be loaded to be changed.

Yeah.
Yeah.

And they contribute to this spam
coming to the buffer pool and

also to WAL, but it's a different
story.

So we, that's why I think people
underestimate how bad bloat

is.

I feel like we have a new wave of companies coming to us for

consulting, which I call AI companies.

They are probably quite old companies, but they have made the transition

to AI.

And they have increasing data volumes, increasing workloads,

and they underestimate the problem of bloat and unused indexes

and index write amplification, all this spam coming to memory

and WAL and it means backups, replication.

It's like, this thing is like multi-sided.

Yeah.

Michael: Like cascades, doesn't it?

Nikolay: Yeah.

Yeah, yeah.

It does, and also it's not a performance cliff which you

suddenly see, like, oh, we have a problem. It's slowly, slowly, slowly

growing.

Michael: Or like you sink into it slowly, like sand or like

a swamp.

Nikolay: Yeah, yeah. And then you need to increase instance size

or think about sharding and so on.

By the way, I like sharding a lot, but I think in many cases

it's just hiding the problem.

It's just distributing the problem, and you just pay to not

solve problems.

For business it's sometimes a valid approach, right?

You just don't have time to solve this.

But we also can just implement automated procedures to reduce

the amount of trash you have, right?

Michael: Yeah, I hear about it all of the time as well, even

at smaller companies that just upgrade their instance, especially

at the lower sizes.

They just don't wanna throw engineering time

Nikolay: at it.
I'm very surprised.

I recently started asking directly on the very first call when

we have consulting like guys, do you care about bloat and index

health?

And I usually hear no.

And then It's our job to explain why, right?

Michael: So it's easy.

Yeah, bloat is not the problem.

Bloat is like, well, it might be the root cause, but it's not

the problem users see.

They don't see.

They're not.

Nikolay: Well, unused indexes also they don't see.

Michael: Yeah, yeah.

Nikolay: They created some indexes and forgot about it, right?

Or redundant indexes.

Same problem.

I mean, similar in this case, but then boom, like buffer mapping,

LWLock:buffer_mapping.

Why?

We don't have enough memory.

Michael: Yeah.

Also, this might be part of it.

I think bloat used to be worse.

Like, I remember, before a few optimizations...

Nikolay: 14, 14.

Michael: Yeah, before then you could come across, especially,

indexes that were like 99.9%... You know, in extreme cases you could

come across very, very, very bloated indexes. It's just much less

likely now, so I think it's

Nikolay: less likely but not much

Michael: yeah well okay

Nikolay: well only deduplication doesn't solve the problem when...

Like Postgres B-tree doesn't have merge.

Michael: But it does have bottom-up deletion.

It does a lot less splitting.

Nikolay: Well, if you have, in the middle of the B-tree, a half-empty page,

this space won't be used if you write, for example, an incrementing

timestamp.

You're always writing to the end.

In the middle, nobody will write, so you have this bloat which

won't be eliminated.

Michael: True.

Until like, yeah, anyway, let's get back to

Nikolay: the topic.

Until reindex.

Michael: Well, I was going to say until like logical replication

upgrade or something like that.

Nikolay: Well, yeah, yeah, yeah, Anyway, when you rebuild index

anyway, right?

Michael: Yeah.

Nikolay: By the way, let me advertise something.

We have open source component, which we just recently developed.

It's now entering beta stage and it aims to like to automatically

rebuild indexes on any platform, just reach out to me in any

way.

I will share details because we are looking for more cases before

we move on and make it more publicly available.

I don't advertise it because I want to understand the use cases

people have in terms of...

Anyway, if you want to rebuild indexes in an automated fashion,

we have an open source component for you.

Fresh 1, very interesting, not only for B-tree, but for any index,

almost any.

I think BRIN is not supported.

All others are supported.

Michael: I've never seen a bloated BRIN index.

Nikolay: Have you?

Yeah, good question.

It can degrade a lot if you modify.

Michael: We've got another good episode on that actually.

Minmax-multi, I think, is...

Nikolay: On every sentence we say we had an episode already,

right?

Anyway, yeah, we're looking for early adopters for this small

tool.

Michael: Cool, I'll put it in the show notes.

Nikolay: Yeah, yeah.

Which is open source.

Yeah.

Okay.

WAL write is another 1 we see often.

Michael: Oh really?

Nikolay: Yeah.

Again, like when somebody hasn't dropped unused and redundant

indexes and they write a lot of... Oh, I forgot to mention, of

course you can find queries which contribute to

this buffer thrashing, right?

And maybe get rid of them or make them cause smaller

storms, right?

So you can optimize queries sometimes and, and, and avoid that

LWLock:buffer_mapping contention.

Yeah.

So about WAL writes, same thing, like if you have a lot of indexes

which contribute to WAL writes, or you have like very frequent

checkpoints and you have a random access pattern of writes.

When you write to the same page often, but between those writes

to the same page you have a checkpoint, it means this page will

go as a full-page write to WAL multiple times, over and over.

You have a lot of WAL and this can be also an issue.

And if disks are, maybe disk I/O is saturated as well.

Yeah, these things.

Michael: Yeah.

Or IOPS.

Nikolay: Yeah.
There are, there are several things there.

Like there is IO:WALWrite, I think, and LWLock:WALWrite.

I don't remember off the top of my head, but there are interesting

nuances there.

I probably should cover it some day in Postgres Marathon, because

sometimes you are waiting on disks, but sometimes you are waiting

on locking internal structures, lightweight lock.

So if WAL buffers is like, there is a quite small amount of

WAL buffers in the memory, so if it's already fully written,

it needs to go to disk, probably you are waiting on disk.

But if you are writing to...

Yeah, so it should be checked in
detail when we have it.

There are several wait events there.

Another interesting thing which has been popping
up recently is SyncRep.

LWLock:SyncRep.

Synchronous replication, when the
primary cannot continue because

it waits for confirmation from replicas,
synchronous replicas.

Michael: So, okay, and you're seeing
a lot of those.

Nikolay: Not a lot, but this started
happening more and more

if you use synchronous replication,
Quorum commit.

Michael: What?

Yeah.

I don't...

I come across a lot of people that
think they're going to use

synchronous replication but then
end up not doing it.

Do you see it quite commonly used?

Nikolay: Let's say so.

Big old clusters are on async.

All new clusters should be on synchronous
replication, although

there is a bunch of issues with
it.

And there was a great talk a few
months ago presented by Alexander

Kukushkin about misunderstanding
of synchronous replication and

various anomalies you can experience
in current implementation

because it's actually not synchronous
replication.

This is the thing.

Because when commit happens, main
thing, when commit happens

on the primary, it actually happens.

It already happened.

Commit happened.

But we just are locked, by the
way, on heavy lock, right?

Heavyweight lock.

Our transaction is locked.

And we are waiting for 1 of replicas
to confirm.

Or this is actually a lightweight
lock, SyncRep.

Yeah, this is it.

Michael: This is it.

Nikolay: We are locked and we are waiting.

And when a replica confirms, this
lightweight lock is released.

This is a special case when we
need to wait for something outside,

which will help us unlock.

Michael: So I have watched that
talk, and I remember a really

good slide in it with like all
of the hops, like a really good

diagram of what actually happens.

So yeah, I'll include a link to

Nikolay: that.

Yeah, and how to troubleshoot
LWLock:SyncRep is

not fully understood.

There are interesting new cases
which are not covered in articles

and talks.

I think more materials are coming.

I know about some.

Michael: Okay.

Can you share them with me?

Nikolay: Well, it hasn't happened yet.

Check out the upcoming PGConf.EU.

Michael: Great.

So, yeah.

How would you feel about calling the episode there and actually

then talking about specifically the lock manager issues in a

different episode?

Nikolay: Sounds good.

Good.
Nice, because there are interesting answers inside.

Yeah, great. Let's call it a day for today.

I think we covered 1 and a half percent because it's huge.

The list is huge.

Some of them I haven't seen ever.

Michael: But I think that's useful.

I think it's useful to kind of, for people to get a grasp on

like which ones are they most like?

Nikolay: Oh, the main thing I always mention when we talk about

lightweight locks and wait event analysis in general: the RDS documentation

has a great collection of how-to-style troubleshooting

documents for many wait events, including many lightweight locks,

not all of them, only a subset, but it's great documentation.

I hope it will be improved over time, extended, right?

Michael: Yeah.

Yeah.
It's

Nikolay: good.

I know a lot of effort was invested into building it by many people.

I recently reread the blog post by Jeremy Schneider about how

it was done over a couple of years.

So it was a huge effort. That's why it's so good. Short, yeah,

short documents, but so much wisdom inside.

Michael: Yeah Done over a long period of time, but also by very

good people, like people that really know, you know, stuff.

Nikolay: So basically do this, this, this, like list of mitigation

action items.

But behind each step, many RCAs, right?

Yeah.

Cases, case studies; so much time was spent just to write 1 line on

what to do.

Or what to check, or how to change, how to improve.

That's a great example of documentation.

Michael: Thanks so much, Nikolay.

Thank you.

Look forward to talking again soon.

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Host

Nikolay Samokhvalov

Founder of Postgres AI
