
LWLocks
Michael: Hello and welcome to PostgresFM,
a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard,
and as usual I'm joined by
Nick, founder of Postgres.AI.
Hey Nick, how's it going?
Nikolay: Hello Michael, going great,
how are you?
Michael: I am good also.
What are we talking about this
week?
Nikolay: We're talking about...
What's written on the cover?
Michael: Are we gonna write LWLocks
or lightweight locks as a full
word?
What do you think?
Nikolay: I like the shorter version
of course.
There's also a question to write
3 uppercase letters or everything
lowercase.
Yeah, lightweight locks, it's like,
it's even hard to pronounce.
So bottleneck locks also not good,
right?
LWLocks.
I like some systems call it latches,
you know?
Michael: Latches, yeah.
Yeah.
Why Postgres doesn't do it?
Nikolay: Maybe because of Linux
or I don't know.
Michael: Naming things is hard.
Nikolay: Yeah, yeah, yeah.
Because we have confusion sometimes
when we say just locks.
You know, like when we say backups
or logical backups dumps,
here also locks, there are 2 types
of locks.
There are more types
Michael: of locks, right?
Nikolay: But there are big 2 major
types.
Yeah, categories.
Yeah.
Is it category or type in pg_stats,
in pg_locks?
Michael: Good point.
Yeah, maybe types.
I don't know.
Another loaded term, types is another
loaded term.
Nikolay: Or mode or maybe more
let me let me just check right
now maybe it's called mode in pg_locks
I'm constantly confused
about yeah there are you know there
are some terms like class
type mode like they can be like
they are quite abstract right
it's called mode in pg_locks
it's called mode
Michael: cool
Nikolay: but pg_locks is about
heavyweight locks we are talking
today about lightweight locks right
Michael: yeah we did we did a whole
episode that we just called
locks because that's generally
what they referred to the heavyweight
locks yeah
Nikolay: if there is no additional
word it means heavyweight
locks so yeah and and why why so
because heavyweight locks you
can name them just locks because
they are closer to User, right?
You can see them in pg_locks, for
example, you can sometimes, not,
not always, but sometimes you can
acquire them directly using
just lock SQL command right just
lock table name or SELECT for
UPDATE and you sit in transaction
being acquired some locks right.
Michael: Yeah I also think in general
more people come across
them because they affect you at
earlier stages in project life
cycle.
Like you don't have to be at such
extreme scale to start being
affected by them or having to be
aware of them.
Nikolay: Yeah, I agree.
In general case.
Because in some cases, for example,
if you have read-only workloads,
that's it.
In read-only, you don't like, you
have heavyweight locks, but
you won't notice them because they're
like ACCESS SHARE lock,
that's it.
Right.
But if you have really a lot of
TPS, you might start observing
some lightweight locks.
Yeah,
Michael: I was thinking even like
Schema changes though, like
even in read-only.
Nikolay: This was edge case.
Michael: Yeah, okay, fine.
Nikolay: In general, I agree with
you, heavy locks, you bump
into them sooner.
Schema change is a great example.
Michael: Cool.
So where did you, I mean, starting
with the difference between
locks and lightweight locks is
probably great.
Nikolay: Yeah, let's talk more
about differences, because there
are important differences to understand
and feel and take into
account when you develop things
or tune, optimize, scale, migrate.
So heavyweight locks are acquired
during like SQL operations and
they are acquired for like Database
objects, like Relational
level locks.
By the way, this is super confusing.
You know, yeah, I'll be talking
about heavyweight locks a little
bit, trying to make it shorter,
but as you know, I write again
almost every day.
I skip some weekends, but I write
Postgres Marathon posts again,
And many days I already sit in
between heavyweight locks and lightweight
locks and research 1 of LWLock:LockManager, right?
So relation level locks, it's quite
confusing name because they
are called in documentation, they
are called Table level locks.
Yeah.
Which is misleading because they
are also this type of like the
same thing and inside documentation
it's already, it becomes
clear that indexes are also involved
and materialized views and
views.
And so all the relations, sequences
are also relations, right?
Michael: Yes.
I never really thought of it like
that.
Nikolay: Well, if you check
class and reltype, I think S
is sequence.
Maybe I should check again.
Should we develop a habit to check
things right online?
Michael: Why not?
Nikolay: Yeah, so I will be checking,
but meanwhile, you can
acquire locks, heavyweight locks
on tables, on indexes, on database,
right?
Like even higher level, you can
lock the whole database.
And we know the recent problem
when Recall.ai blog post, right?
Our clients, they posted about
database level lock acquired when
NOTIFY happens to establish sequential
NOTIFY events at commit
time.
And also row level locks.
Yeah.
Tuple or row level.
Let's leave it for another time.
So you can acquire locks on database
objects.
These are heavyweight locks.
Documentation is also confusing
because it says explicit locking.
Although most of the cases where
you have it, it's implicit locking.
You say alter table and you don't
say lock table, you say alter
table.
So it's actually implicit.
Well, this, I have also always
like some shift in my brain when
I need to Google documentation
for lock or heavyweight locks.
I just remember, I need to search.
I need to ignore the fact that
I'm going to look at explicit
locking documentation, although
I need implicit locking documentation,
right?
Michael: By the way, before we
move on, I checked pg_class and
you're right, sequences are in
there, and weirdly the rel kind
is capital S all the others are
lowercase I think well the ones
I can see anyway.
Nikolay: Does it mean something?
Michael: Don't know.
Nikolay: Yeah.
Oh, by the way, explicit locking
documentation, it mentions that
you can lock indexes with ACCESS SHARE
lock, for example.
But You cannot do it explicitly.
You cannot say lock an index name.
So I'm pretty sure you cannot do
it with sequences as well.
Yeah, anyway, so these are heavyweight
locks, right?
So you basically, your actions,
I mean your SQL, this is what
directly creates heavyweight locks.
And why is it needed?
Because we need to, we are not
working in single user mode.
We need to protect resources from
concurrent operations, reading,
writing, changing.
And usually we don't need to protect
from reading, but while
somebody is reading, another Backend
shouldn't modify it usually,
right?
Or for example, if you read from
table, dumping it for example,
other Session cannot modify, like
add a Column, for example,
cannot run DDL, for example, right?
Michael: Or drop it, for example.
Nikolay: And the important thing
about heavyweight locks compared
to lightweight locks is to understand
that once a lock is acquired,
it can be released only in the
very end of Transaction, Commit
or Rollback.
That's it.
Only 2 options to release this
lock.
You cannot release it midway.
Right, and this is super important
for understanding always.
It means that Transactions should
be shorter.
Right, so your actions won't affect
others.
Like, or chances to affect others
would be lower.
Right?
This is...
Michael: Or time that it affects
others is lower, right?
Nikolay: Yeah, yeah, yeah.
Michael: Because you will affect
people just for less time and
there's a point at which that becomes
unnoticeable or acceptable.
Nikolay: Yeah, yeah.
Or won't affect at all if they
come a little bit later.
But if you change something or
even if you read something and
keep Transaction open for hours,
it means nobody can modify this
table, no DDL is possible, autovacuum
cannot do some things
and so on, like it's bad.
Michael: Yeah, it's worse than
that, isn't it?
I know we've talked about this
before, but if DDL comes along
and doesn't have a lock_timeout,
then naturally you can suddenly
be down.
Yeah.
Nikolay: Yeah.
Yeah.
So because, yeah, this is also
a good point.
So heavy locks, they have this
ability, like this property to
be acquired.
Release happens only in the very
end.
And what you say also good point.
They also, there is a LockManager.
There's, By the way, I couldn't
find definition of LockManager.
Nowhere, nowhere.
Like it's like, it's obvious, right?
Even in the source code, it's not
defined, which is interesting.
So LockManager is responsible for
managing locks, heavyweight
locks, right?
And backends can form a queue of
waiting for a lock acquisition.
So if I'm waiting to acquire lock,
some other backends can be
waiting and they like ask where
is the end of the line and go
there, right?
So it's just natural, like in the
order of first, like in natural
order, right?
So unlike lightweight locks, lightweight
locks acquired and released
very quickly.
I think documentation, source code,
I mentioned it's like dozens
of operations.
Unlike There is underlying concept
of spin locks, which like
few operations only, like few instructions
only.
Lightweight locks are bigger, but
it's very fast as well.
Acquired and released, and they
don't wait until the end of transaction
because they work in lower level
of abstraction.
It's not closer to users, closer
to resources like memory.
So their main purpose is to protect
some physical resources like
parts of the memory, shared buffers,
and so on.
Right?
OK.
Yeah.
Like these things.
So they can be acquired and list
quickly.
There are only 2 types, exclusive
lock and share lock, unlike heavyweight
locks.
Heavyweight locks have a list.
And interesting relationships between
different ones, right?
Here it's only share and exclusive,
shared and exclusive.
Shared locks don't conflict, shared
lightweight locks don't conflict.
But exclusive lock cannot be acquired
while share lock is still
running, lasting.
Right?
Share lock should be released first,
then only you can acquire
exclusive lock, because exclusive
lock is needed to modify the
resource, right?
Share lock is needed to protect
for reading.
It's saying I'm reading, don't
change it because I'm still reading.
And when I'm done you can modify
it, right?
So this is lightweight locks.
And that's why they are lightweight,
because they are much, much
shorter living, right?
So these are main differences between
them.
What else?
Michael: Maybe types of lightweight
locks?
Well forgive the word types but
you know what what should people
be aware of Because I've only really
come across 1 type because
that's the type that seems to cause
the problems.
But what should people be aware
of at least?
Nikolay: Yeah so types, modes, I struggle, I'm mixing these terms
and it's really hard to distinguish
between them.
So if we talk about types, exclusive
and shared, we just covered
it.
If we talk about different kinds
or, let's say wait events
we observe in pg_stat_activity.
pg_stat_activity is the main cumulative
statistics system view.
Everyone should think, like, learn
about it, right?
It's super important because it
shows what's currently happening
in database.
And there are 2 columns called
wait event type and wait event.
Also, by the way, slightly confusing
because the word type is
there and so on.
I would prefer like, it would be
good to name that thing like
classes maybe or category.
I don't know because type word
is so overused or like overloaded.
Right.
Anyway, wait event type can be,
I think there are less than
10 class, 10 types.
And 2 of them which are most interesting
today is lock, meaning
heavyweight lock.
And LWLock.
And wait event type LWLock, you
can check documentation.
There are many, many, many, many
dozens of wait events for LWLock,
meaning that we have a lot of kinds
of LWLock.
These kinds, like again, types
are only like exclusive and shared,
but these kinds, it's classification
with respect to the resource
we are locking.
For example, LockManager itself,
although the main purpose of
LockManager is to handle, to manage
heavyweight locks.
When it does it, it does it using
a piece of memory, shared memory.
Special piece of shared memory
called, well it's called like
main lock table, right?
It's a big piece of memory which
is segmented partition to 16
partitions starting Postgres I
think 9.2 or 8.2 actually it was
very long.
num lock partitions
16.
And when a new information about
heavyweight lock is needed to
be written there, The partition
of this main lock table where
it needs to be written, it needs
to be locked by lightweight
lock, right?
To ensure nobody else is writing
to it.
So, LockManager can have up to
16 lightweight locks, which are
seen as LWLock:lock_manager.
16 because we have 16 partitions
of this main lock table in memory.
And how it works based on, for
example, for relational level
locks, based on the relation name,
there's a hash function which
understands which partition to
use, determines which partition
to use, right?
Michael: Yeah.
So
Nikolay: The same table or index
will always go to the same partition
of all of those 16 partitions.
Right?
Michael: So this was a long time
ago.
So back in the day, I'm guessing
this was 1 thing, and there
was probably too much contention.
Nikolay: Before 8.2.
Yeah.
Michael: Yeah.
Okay.
That's what you're talking about.
Great.
Nikolay: Yeah.
And this just to like, there's
a confusion because there was
another 16.
Yes.
Michael: That's what I was thinking.
Nikolay: It's a different constant,
which, which changed that
behavior changed in, in Postgres
18.
This behavior hasn't changed.
Yeah, fastpath changed.
This hasn't.
This hasn't.
This still is 16 partitions and
if you have a lot of heavy lock
acquisition attempts for the same
relation, a lot I mean like
thousands or maybe dozens of thousands
per second, a lot, really
a lot, then exclusive lightweight
locks on the same partition
will be competing.
And while you are waiting, like
while we try to establish heavyweight
lock to some index or table, but
that partition is already locked
by exclusive lightweight locks from
different backends attempt
write heavyweight lock information
about it.
We need to wait a little bit.
And this little wait will be seen
as wait event type LWLock and
wait event lock manager.
Right?
Is it clear?
Because we are like, we have weird
combination of heavyweight
locks and lightweight locks in the
same topic here.
Michael: Especially because the
lightweight lock manager is actually
looking at heavyweight locks.
That's the confusing part for sure.
But I was just looking up in the
docs, in the table of all of
the...
They've called them types, wait
events of type LWLock, and
you're right, it's such a long
list.
I think there might be 50 or more.
Nikolay: You will see checkpoint,
autovacuum there, but isn't
it great that we don't observe
them?
It means it's quite well optimized,
right?
Like for example, yeah, yeah, you'll
find a lot of stuff.
I see many of them, I do observe,
not just a lock manager.
We saw many of them in production
and yeah.
Usually the rule is if you see
lock wait event type, it means
you need to go and think how to
redesign your application.
Because classic example is, for
example, we are doing some billing
system and we have a single account
which needs to be updated
for each transaction people do.
I mean, financial transaction.
And this is a classic example when
you shoot yourself into the
foot because updating the same
row will be like hotspot.
And you will see a lot of, you
see heavyweight lock contention
because many, many backends try
to update the same row.
Right.
Yeah.
So, or you mentioned a very good
example.
If you do DDL without lock timeout
and retries, you also can
have a chain of waiting backends,
which just wait until your
DDL finishes, but it itself is
waiting for some other SELECT.
And people, I see examples in blog
posts, people, like I see
examples, people try to explain
this problem, but many of them
involve updates, deletes.
No, just SELECT, DDL, and many
other SELECTs.
You don't need even to update anything
or INSERT or DELETE.
That's it.
Just SELECTs and DDL.
And we see a lock wait event in
pg_stat_activity.
It's bad.
But when we talk about...
Yeah, and for lock you saw like
it's a relation, object, page,
also page interesting, tuple, virtualxid,
that's interesting.
Advisory locks is kind of a different
thing.
But for LWLock, we have a lot,
and among them, there is lock manager.
I like the approach, which I think
RDS started, maybe not RDS,
but they use it a lot and I also
started using it.
We usually take WaitEventType and
WaitEvent to columns from pg_stat_activity
and write them with a colon in
between.
So it becomes LWLock:LockManager.
Just in, you know, like in texts
where we discuss problems and
do RCA or something.
Yeah, it's just convenient.
Michael: Like a naming convention.
Nikolay: And I wanted to highlight
that this problem, which related
to both heavyweight locks and lightweight
locks, in the name of
it, we have the word lock twice.
LWLock:lock_manager.
First time it's about lightweight
lock but in lock manager it's
about heavyweight lock.
That's why the lock is encountered
twice, 2 times.
What else?
We have other, we observed other
types of lightweight locks problems.
For example,
Michael: yeah.
Well, I've spotted 1 in the list,
SubtransSLRU.
Nikolay: Yeah, this is my favorite
1.
Although I must admit, I haven't
touched this topic for a few
years.
Yeah, yeah.
I touched it heavily in 2021 when
GitLab had the problem.
Yeah.
And studied it.
And yeah, and since then, SLRU,
it's simple, at least recently
used, it's small caches, Postgres
have multiple of them.
I think since Postgres 12 or 13,
we have pg_stat_slru system
view, where you can see like counters
of work of those SLRUs,
But also SLRU mechanism got some
handles, I mean, settings, GUC,
GUCs, right?
People say GUCs.
You can change them and increase
it and yeah, to postpone this
performance cliff.
So I haven't seen them often since
then.
Like there are customers who usually
read and come, we help like,
we help easily, like just try to
get rid of sub-transactions,
although, like I still think by
default you should avoid sub-transactions,
but in some cases I already see
they can be used in safe way.
You know, you need to just understand
the limits and then you
can use them.
For example, again, DDL, sometimes
complex changes of schema.
You don't want to lose part of
schema.
And this approach with attempts
to acquire lock, how can you do
it inside transaction, you need
sub-transaction, right?
Yeah.
Because if attempt fails, you don't
want to lose everything.
You want to lose only the last
step, right?
And this is exactly where I think
it's worth thinking about to
use some transaction, but you need
to understand like details.
For example, you don't want to
have a long transaction running
on the primary in parallel.
To other table, by the way, not
to any table.
And replicas which receive a lot
of like transactions per second
because they might be down because
of the use of sub-transaction
and you can see subtrans SLRU
because SLRU is overflown.
And again when it's overflown the
lightweight locks acquisition
like we see contention and we see
it as in pg_stat_activity and
wait event analysis as
LWLock:SubtransSLRU
Right.
There are other SLRUs, right, mentioned
like notify SLRU, I'm
pretty sure MultiXact upset,
MultiXact member SLRU.
Speaking of them, we had an episode
about the case from Metronome,
right?
Michael: Yeah, true.
Nikolay: Yeah, it was a great blog
post.
Like this is a great example how
company can share with community
what happened and others benefit.
Since then we had another client,
new client we had, which came
to us with very same problem related
to MultiXact member.
Michael: Wow.
Member exhaustion.
Nikolay: Yeah, exactly.
Wow.
So it's also observed like a lightweight
lock, MultiXact blah,
blah.
There are, there's a bunch of MultiXact
lightweight locks you
can see in the table.
Another 1 is a very popular 1 is
LWLock:buffer_mapping.
So usually it's called a buffer
thrashing, right?
When we let like the buffer pool
is not big enough and a lot
of eviction happens and new pages
are loaded all the time and
and we see when of course when
it's happening to protect memory
Postgres needs to use exclusive
lightweight lock when writing
happening, shared lightweight lock
when reading happening.
And this is exactly when we can
see some backends are a little
bit waiting for other backends,
right?
And this is how it's seen.
Michael: So like if somebody's
limited by the amount of shared
memory, or like shared buffers.
Nikolay: Yeah, solution is simple.
We need to increase the buffer
pool.
We need to fight bloat because
this is what like increases this
problem.
You need to get rid of unused indexes
and other things because
they also contribute to it.
Right.
Unused ones.
Of course.
Michael: Like, I was just thinking
they would be evicted and
not.
Nikolay: Think better when you
change something with insert or
non-HOT update.
Michael: All indexes need to
Nikolay: be loaded to be changed.
Yeah.
Yeah.
And they contribute to this spam
coming to the buffer pool and
also to WAL, but it's a different
story.
So we, that's why I think people
underestimate how bad bloat
is.
I feel it like we have a new wave of companies coming to for
consulting to us, which I call AI companies.
They probably are quite old companies, but they have the transition
to AI.
And they have increasing data volumes, increasing workloads,
and they underestimate the problem of bloat and unused indexes
and index write amplification, all this spam coming to memory
and WAL and it means backups, replication.
It's like, this thing is like multi-sided.
Yeah.
Michael: Like cascades, doesn't it?
Nikolay: Yeah.
Yeah, yeah.
It doesn't, and also it's not performance cliff which you like
suddenly see and oh we have a problem it's slowly slowly slowly
like growing
Michael: or like you sink into it slowly like like sand or like
a swamp
Nikolay: yeah yeah And then you need to increase instance size
or think about sharding and so on.
By the way, I like sharding a lot, but I think in many cases
it's just hiding the problem.
It's just distributing the problem and you just you pay to not
to solve problems.
For business it's sometimes a valid approach, right?
You just you don't have time to solve this.
But we also can just implement automated procedures to reduce
the amount of trash you have, right?
Michael: Yeah, I hear about it all of the time as well, even
at smaller companies that just upgrade their instance, especially
at the lower sizes.
They just don't wanna throw engineering time
Nikolay: at it.
I'm very surprised.
I recently started asking directly on the very first call when
we have consulting like guys, do you care about bloat and index
health?
And I usually hear no.
And then It's our job to explain why, right?
Michael: So it's easy.
Yeah, bloat is not the problem.
Bloat is like, well, it might be the root cause, but it's not
the problem users see.
They don't see.
They're not.
Nikolay: Well, unused indexes also they don't see.
Michael: Yeah, yeah.
Nikolay: They created some indexes and forgot about it, right?
Or redundant indexes.
Same problem.
I mean, similar in this case, but then boom, like buffer mapping,
LWLock:buffer_mapping.
Why?
We don't have enough memory.
Michael: Yeah.
Also, this might be part of it.
I think Bloat used to be worse.
Like I remember in the
Nikolay: before a
Michael: few optimizations
Nikolay: 14-14.
Michael: Yeah before then we you could come across especially
indexes that were like 99.9% you know in extreme cases you could
come across very very very bloated indexes it's just much less
likely now so I think it's
Nikolay: less likely but not much
Michael: yeah well okay
Nikolay: well only deduplication doesn't solve the problem when...
Like Postgres B-tree doesn't have merge.
Michael: But it does have bottom up deletion.
It does like a lot less spitting.
Nikolay: Well, if you have in the middle of B-tree, half empty page,
This space won't be used if you write, for example, it's an incremental
timestamp.
You're writing to the end always.
In the middle, nobody will write, so you have this Bloat which
won't be eliminated.
Michael: True.
Until like, yeah, anyway, let's get back to
Nikolay: the topic.
Until reindex.
Michael: Well, I was going to say until like logical replication
upgrade or something like that.
Nikolay: Well, yeah, yeah, yeah, Anyway, when you rebuild index
anyway, right?
Michael: Yeah.
Nikolay: By the way, let me advertise something.
We have open source component, which we just recently developed.
It's now entering beta stage and it aims to like to automatically
rebuild indexes on any platform, just reach out to me in any
way.
I will share details because we are looking for more cases before
we move on and make it more publicly available.
I don't advertise it because I want to understand the use cases
people have in terms of...
Anyway, if you want to rebuild indexes in an automated fashion,
we have an open source component for you.
Fresh 1, very interesting, not only for B-tree, but for any index,
almost any.
I think BRIN is not supported.
All others are supported.
Michael: I've never seen a bloated BRIN index.
Nikolay: Have you?
Yeah, good question.
It can degrade a lot if you modify.
Michael: We've got another good episode on that actually.
MaxMulti I think is...
Nikolay: On every sentence we say we had an episode already,
right?
Anyway, yeah, we're looking for early adopters for this small
tool.
Michael: Cool, I'll put it in the show notes.
Nikolay: Yeah, yeah.
Which is open source.
Yeah.
Okay.
WAL write is another 1 we see often.
Michael: Oh really?
Nikolay: Yeah.
Again, like when somebody hasn't dropped unused and redundant
indexes And they write a lot of, oh, I forgot to mention, of
course you can write, you can find queries which contribute to
this buffer thrashing, right?
And maybe get rid of them or make it less, causing less smaller
storms, right?
So you can optimize queries sometimes and, and, and avoid that
LWLock:buffer_mapping contention.
Yeah.
So about WAL writes, same thing, like if you have a lot of indexes
which contribute to WAL writes, or you have like very frequent
checkpoints and you have a random access pattern of writes.
When you write to the same page often, but between those writes
to the same page you have checkpoint, It means this page will
go as a full page right to WAL multiple times over and over.
You have a lot of WAL and this can be also an issue.
And if disks are, maybe disk I/O is saturated as well.
Yeah, these things.
Michael: Yeah.
Or IOPS.
Nikolay: Yeah.
There are, there are several things there.
Like there is I/O WALWrite, I think, and LWLock WALWrite.
I don't remember from top of my head, but there are interesting
nuances there.
I probably should cover it some day in Postgres Marathon, because
sometimes you are waiting on disks, but sometimes you are waiting
on locking internal structures, lightweight lock.
So if WAL buffers is like, there is a quite small amount of
WAL buffers in the memory, so if it's already fully written,
it needs to go to disk, probably you are waiting on disk.
But if you are writing to...
Yeah, so it should be checked in
detail when we have it.
There are several wait events there.
Also interesting thing which pops
up recently is SyncRep.
LWLock SyncRep.
Synchronous replication, when the
primary cannot continue because
it waits confirmation from replicas,
synchronous replicas.
Michael: So, okay, and you're seeing
a lot of those.
Nikolay: Not a lot, but this started
happening more and more
if you use synchronous replication,
Quorum commit.
Michael: What?
Yeah.
I don't...
I come across a lot of people that
think they're going to use
synchronous replication but then
end up don't.
Do you see it quite commonly used?
Nikolay: Let's say so.
Big old clusters are on async.
All new clusters should be on synchronous
replication, although
there is a bunch of issues with
it.
And there was a great talk a few
months ago presented by Alexander
Kukushkin about misunderstanding
of synchronous replication and
various anomalies you can experience
in current implementation
because it's actually not synchronous
replication.
This is the thing.
Because when commit happens, main
thing, when commit happens
on the primary, it actually happens.
It already happened.
Commit happened.
But we just are locked, by the
way, on heavy lock, right?
Heavyweight lock.
Our transaction is locked.
And we are waiting for 1 of replicas
to confirm.
Or this is actually a lightweight
lock sync replica.
Yeah, this is it.
Yeah, this is it.
Michael: This is it.
We are
Nikolay: locked and we are waiting.
And when a replica confirms, this
lightweight lock released.
This is a special case when we
need to wait for something outside,
which will help us unlock.
Michael: So I have watched that
talk, and I remember a really
good slide in it with like all
of the hops, like a really good
diagram of what actually happens.
So yeah, I'll include a link to
Nikolay: that.
Yeah, and this how to troubleshoot
LWLock:SyncRep is like, it's
not fully understood.
There are interesting new cases
which are not covered by in articles
and talks.
I think more materials are coming.
I know about some.
Michael: Okay.
Can you share them with me?
Nikolay: Well, it hasn't happened yet.
Check out the upcoming PGConf.EU.
Michael: Great.
So, yeah.
How would you feel about calling the episode there and actually
then talking about specifically the lock manager issues in a
different episode
Nikolay: Sounds good.
Good.
Nice because there are interesting answers inside.
Yeah, great Let's call it a day for today.
I think we covered 1 and a half percent because it's huge.
The list is huge.
Some of them I haven't seen ever.
Michael: But I think that's useful.
I think it's useful to kind of, for people to get a grasp on
like which ones are they most like?
Nikolay: Oh, main thing, always I mention when we talk about
lightweight lock and actually wait event analysis, RDS documentation
has great list of like knowledge and how to style troubleshooting
documents for many wait events, including many lightweight locks,
not all of them, only subset, but it's great documentation.
I hope it will be improved over time, extended, right?
Michael: Yeah.
Yeah.
It's
Nikolay: good.
I know a lot of effort was invested to building by many people
I recently reread the blog post by Jeremy Schneider how
it was done during a couple of years.
So it was huge effort That's why it's so good Short yeah, but
yeah short documents, but so so much wisdom inside
Michael: Yeah Done over a long period of time, but also by very
good people, like people that really know, you know, stuff.
Nikolay: So basically do this, this, this, like list of mitigation
action items.
But behind each step, many RCAs, right?
Yeah.
Cases, case studies, it's so much time paid to just write 1 line
what to do.
Or what to check, or how to change, how to improve.
That's a great example of documentation.
Michael: Thanks so much, Nikolay.
Thank you.
Look forward to talking again soon.