Logical replication common issues

Nikolay: Hello, hello, this is
PostgresFM, your favorite podcast

about Postgres.

And my name is Nikolay.

My co-host is Michael.

Michael: Hello, Nikolay.

Nikolay: Hi, Michael.

And we have a very good guest today,
Sai.

Sai: Hey, Nik and Michael.

Nikolay: Founder of PeerDB.

We are very glad to have you here.

And we just discussed that we are
not inviting guests just for

guests.

We choose topic first, so it's
topic-centric discussion.

But when we discussed that we need to talk about logical,
and I also saw your blog post published recently,

my first thought was that you're a great person to have
for this discussion, right?

So that's why we invited you.

Glad you found time to join.

Sai: Thanks Nik and Michael for
inviting me.

And yeah, logical replication and
logical decoding have been my
life for the past 7, 8 months.

And we are trying to get into as
much depth as possible to, you

know, understand how it works and
probably down the line contribute

upstream as well.

But yeah, I'm very excited to be
here.

Nikolay: Cool.

I know Michael has a lot of questions,
so I probably will be

less active this episode.

But I will have questions as well
and maybe comments.

Michael: Yeah, I definitely have
questions.

But yeah, I completely echo everything
Nikolay said.

And yeah, we did a kind of a basics
intro episode to logical,

where Nikolay also talked us through
how to handle scale in terms

of major upgrades, at least.

So that's a common use case for
logical.

But one thing we didn't even touch on, which was the topic of one
of your recent blog posts — I listened back, and I think protocol
versions got a single mention, kind of an off-the-cuff mention
at one point in that 30, 40-minute episode.

So that's where I was thinking of starting: it'd be awesome
to hear a little bit about the protocol versions and why
you've been thinking about them recently.

Sai: Yeah, absolutely.

Thanks, Michael.

So the thing is, logical replication has this concept of protocol
versions that you can specify as part of the subscription, or,
if you are an external client reading the replication slot, as
part of the START_REPLICATION API, which lets you read the
replication slot. And there are four versions that Postgres
supports now, as of PostgreSQL 16.

The first version is the default: it decodes only committed
transactions, so it doesn't start the decoding process before
the transaction commits.

The second is the more advanced option, which does logical
decoding of transactions that are not yet committed — it decodes
in-flight transactions. That is number two.

The third lets you decode two-phase commit transactions.

And the fourth lets you decode in-flight transactions and apply
them in parallel on the target, which is the subscriber —
which is PostgreSQL, basically.
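
(For reference, a rough sketch of how an external client picks a protocol version when reading the slot over the replication protocol — the slot and publication names here are placeholders:)

    -- pgoutput protocol versions, roughly:
    --   1: decode only committed transactions (default)
    --   2: stream in-progress transactions (PG 14+)
    --   3: also decode two-phase / prepared transactions (PG 15+)
    --   4: stream in-progress transactions for parallel apply (PG 16+)
    START_REPLICATION SLOT "my_slot" LOGICAL 0/0
        (proto_version '2', streaming 'on', publication_names 'my_pub');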

Now, in that blog you're talking about, we compared version 1 and
version 2, because they are the more common ones. Two-phase
commits are not very common, which is why we didn't go into
that piece.

But the biggest difference, the impact of version 2, is that it
improves logical decoding performance, because you're actually
decoding while the transaction is happening — you're not letting
the transaction finish and only then starting to decode, but
rather decoding as the transactions go on.

So the idea is you're giving more time to the decoder, the WAL
sender process, to perform the decoding, and this has a lot of
benefits. Number one, it reduces the slot growth. In that blog
we talk about a situation where we have two long-running
transactions that are being committed.

And with version one, the decoding
process doesn't start and the

slot keeps growing, growing, growing.

And only after the transactions
are committed, the decoding starts.

And then for a few minutes until
the decoding is finished, the

slot size remains the same.

Right, now this is with version 1.

With two, what happens is as the
transactions happen, decoding

starts.

Right?

Like, so you do see the decoder
doing its job.

And as in when the transactions
finish, the slot immediately

falls.

And the advantage of this — so the summary was, with version 1,
the slot growth of, I think we showed, 5, 6 gigabytes remained
for 5, 6 minutes until the entire decoding finished.

But with version 2, it remained there only for an instant, because
the decoding was already happening as the transaction was going on.

So this can have a tremendous impact in use cases which have
long-running or sizable transactions, and that is very common.
We were recently working with a fintech customer where they had
sizable transactions — hundreds of operations happening in a
transaction — and these transactions were interleaved. And the
advantage with version 2 is it would have helped them a lot,
because as the operations are happening, the decoder just
decodes them.

With version 1, the problem that was happening was there was one
large long-running transaction which took like an hour, and then
every time the decoder needed to decode committed transactions it
was decoding this long-running transaction also. So the order of
time is more quadratic, basically, with version 1. Because WAL is
sequential — you keep writing, writing, writing, and as and when
there is a commit, the decoder starts working. The problem with
version 1 is that even though the long-running transaction is not
yet getting decoded to completion, the decoder process still
decodes it again for other transactions.

But with version 2 this problem wouldn't have happened, because
once this long-running transaction is committed, it's already
consumed, it's already decoded, if that makes sense.

So the high-level summary is version 2 is very helpful when there
are long-running transactions that are interleaved.

Nikolay: So the lesson is upgrade
guys, right?

Just use the latest PostgreSQL version.

Sai: Version 2 comes with PG 14;
version 4 is in PG 16.

Nikolay: Sorry.

I meant a small thing.

I've noticed you mentioned a subscriber which is PostgreSQL.
This phrase attracted my attention. It's not always PostgreSQL.

But the parallelized apply of long transactions — it's done
only for PostgreSQL subscribers, not for others, right?

Sai: Correct.

Exactly.

So the fourth version, which does
like, you know, parallel apply

of these in-flight transactions
is more relevant for PostgreSQL

to PostgreSQL replication.

The apply piece is PostgreSQL, and that's standard logical
replication. And logical decoding is more like a subset of
logical replication, where external clients like PeerDB can read
the slot and replicate to homogeneous or heterogeneous targets.

Nikolay: In your experience, does it mean that there's no sense
in having this feature for other targets, like Snowflake or others?

Sai: The parallel apply feature,
you mean?

Nikolay: Right.

Sai: Yeah.

Good question.

So that is the onus of the ETL
tool.

So in PeerDB we have an option to do parallel apply or not do
parallel apply.

So it's like a single flag that
we provide.

And the difference is with parallel
apply, we cannot guarantee

the order of transactions across
tables.

Nikolay: So foreign key and referential
consistency, similar

to multiple slots in regular Postgres
to Postgres.

Correct.

Sai: Very, very similar.

Right.

And then both, I mean, customers
do it, but the advantage of

parallelism is it will improve
latency, right?

Like replication, latency reduces,
lag reduces, etc.

But it doesn't guarantee, you know,
consistency across tables.

But the sequential does that, right?

So we have customers doing both.

Nikolay: Yeah, that's interesting.

I know in many cases people think, okay, we work at like half of
our capacity with a single slot, no worries.

If TPS grows and we know the bottleneck is on the subscriber,
we will just use multiple slots.

But then they realize that with foreign
keys, you need to basically

agree that they are broken temporarily
on the subscriber and

can cause a lot of troubles if
you point the application to such

nodes, right?

That's interesting.

Yeah, absolutely.

Makes sense.

Sai: And I did want to call out, for that there is a setting for
disabling foreign keys completely — a setting called
session_replication_role. You can set it to replica, and that is
pretty standard; that's the reason Postgres gave this setting.
So you set it to replica and it will disable foreign keys and
triggers on the target.
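
(A minimal sketch of that setting, run in the session that applies changes on the target:)

    -- Skips firing of triggers, including foreign key constraint triggers,
    -- for this session; the built-in logical replication apply worker
    -- runs in this mode.
    SET session_replication_role = 'replica';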

Nikolay: But by default,
if you use multiple slots, it's

disabled.

Sai: By default, is it like disabled?

Nikolay: Yes.

I know it from recent work using it for zero-downtime upgrades.

Michael: Got it.

Nikolay: It's indeed disabled and
it's interesting.

So you need to think maybe you
shouldn't use it.

I mean maybe go back to a single
slot and just choose a different

time for your workload.

I mean, work on weekends, guys,
right?

Lower traffic and you can afford
working with a single slot.

I mean, this works only if logical is used for a temporary time,
like for upgrades — then we don't need it.

Exactly.

Sai: It really depends on the use case.

Nikolay: Yeah, yeah.

But if it's for a long time, maybe forget about foreign keys.

And you said latency, but it's also about throughput.

We need to process more bytes per unit of time.

Sai: Absolutely.

Nikolay: Lags can be nasty.

I don't like dealing with logical
decoding in Postgres 11, 12.

It's not pleasant at all.

So these improvements are exciting
and everyone should upgrade.

That's what I see here. You just
uncovered a lot of details.

That's great.

Sai: Yeah, absolutely. And the tricky thing — I mean, the beauty
of Postgres — is that this is already built into Postgres core.
Logical replication does this by default. It's just a setting you
need to set to say, hey, I want to use version 2, and it will be
significantly more performant and reduce replication slot growth.
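
(As a minimal sketch of that setting on the subscriber side — subscription, connection, and publication names here are placeholders:)

    -- streaming = on (PG 14+): decode in-progress transactions as they happen.
    -- streaming = parallel (PG 16+): additionally apply them in parallel.
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=source-host dbname=app user=replicator'
        PUBLICATION my_pub
        WITH (streaming = on);

    -- Or change it on an existing subscription:
    ALTER SUBSCRIPTION my_sub SET (streaming = parallel);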

But if it's an ETL tool that's using logical decoding, the
transaction logic needs to be managed by the ETL tool.

Because as you're reading like
in-flight transactions, we need

to keep track whether this transaction
is committed or not, and

then only we should push it to
the target.

It becomes tricky for ETL tools, which we are working on in
PeerDB now as we speak.

But the beauty of Postgres is that
it just gives you the setting

out of the box and you just need
to upgrade your Postgres version

to 14, 15, and 16.

Nikolay: And with this, in 16 we
have an example.

I mean, the developers of these
tools like your company, they

now have an example of Postgres to Postgres
native logical replication to show how

it can be implemented.

So the reference example exists.

So it's good.

Yeah, that's great.

Sorry if my comment interrupted you.

No, not at all.

Michael: This is a really good
diversion.

But we've also, we've talked about
3 things already, right?

We've talked about throughput,
we've talked about lag, and we've

talked about slot size growth.

And I feel like when you said initially,
the performance of the

version 2 protocol can be better,
I'm guessing we're talking

mostly in terms of that lag metric.

Like we can start processing things
faster, therefore the lag

is lower, makes total sense.

And when you said the slot size
growth is reduced, I think I

Understand where you're coming
from.

I think the specific benchmark in the blog post shows that the
growth is the same, but then it kind of reduces quicker.

But when you mentioned the interleaved long-running transactions,
I guess in a less synthetic workload, where you've got lots of
potentially overlapping long-running transactions, the peak will
be lower with the version 2 protocol.

Correct.

Sai: Exactly right. Because in the scenario we did, it was just
two transactions that are long-running — we just had two of them,
and they ended at the same time. But in real-world workloads it
can be very arbitrary, and with version 2, as the slot consumption
is faster and the slot size falls quicker, the cumulative effect
on slot growth can be very significant.

Michael: Nice.

Sai: If that makes sense. And that
is the next benchmark we want

to do, where in the real world benchmark,
I want to see how does

the peak slot size like compare
with version 1 and version 2.

We are actually building that feature — we have a few customers
we are design partnering with to implement it.

And we know what peak slot size we are seeing with version 1.

And with this design partnership, we will get to know, okay,
what is the peak slot size?

My like understanding is it will
fall very quickly because like,

you know, you're giving more time
to the decoder, right?

Like it's more efficient.

And with these long running transactions,
it cannot be quadratic,

like the decoding process, right?

But that's a very good question you pointed out, Mike.

And that is the next benchmark
we want to do.

Michael: Nice.

So then the question becomes, are there any workloads where you'd
stick with version 1?

And I'm thinking now that it must only be if you had, let's say,
a lot of long-running transactions that end up getting aborted
or rolled back instead of committed.

I'm thinking that might be the only case where you might prefer
to carry on using the version 1 protocol — or are there other
cases where you might prefer to carry on using the v1 protocol
over the v2 one, as long as you're on 14 and above?

Sai: Even when the transactions
are rolled back the WAL is not

removed.

So it still needs to be like decoded
basically, right?

Like for future transactions.

That's exactly what happened in
a customer scenario where there

was a long running transaction,
which the customer killed basically.

It was running for a few hours, right?

But still, that impacted other transactions, because Postgres is
not yet smart enough to remove that WAL.

Nikolay: This is my favorite use
case.

So you can overload WAL sender.

If you just do a very simple trick: you create a table with, like,
a million or ten million rows, and then delete rows in a
transaction and roll it back.

Massive delete rollback, massive
delete rollback.

I like this workload in many cases,
this included.

And this makes WAL sender consume
100% CPU very quickly.

Because this transaction spams
WAL, writing xmax constantly

for a lot of tuples.

And then saying, okay, this xmax,
this transaction ID was rolled

back, so it doesn't mean...

Like they're still alive, right?

The next transaction comes, does
the same, and the WAL sender

is becoming crazy.
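
(A sketch of that pathological workload, if you want to reproduce it — the table name and row count are arbitrary:)

    CREATE TABLE spam AS
        SELECT g AS id, md5(g::text) AS payload
        FROM generate_series(1, 10000000) g;

    -- Repeat this in a loop: each round stamps xmax on every tuple and
    -- writes it to WAL, which the walsender still has to decode.
    BEGIN;
    DELETE FROM spam;
    ROLLBACK;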

And I think there should be some
way to optimize decoding to

understand it, maybe.

Because if you start processing...

Ah, when it starts processing,
it's not known that this transaction

is already rolled back.

But maybe there is some opportunity to optimize, because if it's
already known that it's rolled back, maybe we can skip it better
or something — I don't know, I don't know the details here.

Sai: That's a very good point, Nik. I was recently chatting with
one of the committers, and we don't even skip rows, is my
understanding — every time, we try to decode. So the immediate
optimization is: if there is an entry in the WAL from a
rolled-back transaction, then we don't decode that entry in
the WAL.

But you bring up a great point
here.

So there is a lot of scope of improvement
there.

Nikolay: Yeah, and this led us to the conclusion that we won't be
able to deal with logical at even a few modifying transactions
per second. It was a wrong conclusion, because this was a
pathological workload.

And in production, I think it's a very rare case when a WAL
sender is hitting 100% of a single core.

Or it's not — I'm not sure.

Because the problem is you cannot scale it.

You see a WAL sender hitting 100% of a single core CPU, constantly
or from time to time, and you think, okay, I will try to
distribute the workload among 2 slots, or 4 slots, 8 slots.

But this doesn't help.

All of them are hitting 100% of
different cores and that's it.

So this can be bottleneck, but
it looks like it's very rare right

now in production.

Sai: Yeah, absolutely.

And that is another optimization that the community could do:
currently, if there are multiple slots, the decoding process runs
for every slot. What if there were a way in which the decoder runs
for just 1 slot and the other slots reuse those decoded changes,
if that makes sense?

Nikolay: Yeah, exactly.

The same work that's done multiple
times.

Everyone is processing everything.

This is definitely where there is opportunity for improvement.

But I think people just don't see
this as a bottleneck in production

often.

That's why it's not optimized yet.

Sai: In production, what we see is these one-off batch operations,
where customers delete a bunch of data or do a COPY with millions
of rows, and there is a WAL spike.

And the recommendation that we give is: guys, have more disk —
faster and larger disk — because logical replication can only
scale up to 25,000 to 30,000 messages per second, basically.

So have a larger disk, so that once this spike falls, it'll
catch up.

Okay, sure.

At that time, there is more latency.

Nikolay: It's a good CPU if it does 25,000, 30,000 messages per
second.

Because for PgBouncer on modern Intel and AMD, I didn't see more
than 20,000 messages per second.

We can compare these, like, also in messages per second.

PgBouncer processes transactions
just by passing them to backends,

Postgres backends and returning
the result.

And logical replication is also similar — some messages, just
some messages, right?

A different type of work, because decoding maybe is more
consuming, but 30,000 sounds very good.

Sai: Yeah, 100%, but the thing is it has to be done well — I mean,
it has to be done properly. Because one of the tips that we follow,
which I have seen lacking in a few ETL tools, is to always be
consuming the slot. We cannot give up on consuming the slot; we
need to constantly consume the slot and flush the slot, and not
lag. And the thing is, if you give up and start reading it again,
it doesn't read from the confirmed flush LSN, it reads from the
restart LSN.

And sometimes that restart LSN
can be very like old.

And I don't want to get into when
Postgres updates the restart

LSN because it's a more complicated
thing.

But the idea is when I give up the connection and reacquire the connection

again, it starts from the restart LSN, which can increase my

decoding times a lot.

So this is very common.

When we initially built PeerDB, if the slot size was 150, 200 gigs,
we were giving up the connection periodically.

And whenever the slot was that big, for like 10, 15 minutes it was
just stuck reading WAL.

It was not even getting one change.

So that is one tip I recommend, where you always consume the replication

slot.
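
(For watching this, a rough sketch of the usual query on the source — restart_lsn is where decoding resumes after a reconnect, confirmed_flush_lsn is what the consumer has acknowledged:)

    SELECT slot_name,
           restart_lsn,
           confirmed_flush_lsn,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots;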

Nikolay: I was trying to recall another problem I saw when I

inspected what's happening if you start using multiple slots.

And it's kind of an interesting problem when you start publishing

changes to a single slot you use for all tables, for example.

For example, for upgrade, it makes sense.

And this is great.

It works very well.

But if you start distributing tables among multiple slots, you

need to alter the table, I think.

Anyway, you need a lock for each table individually in the group.

So you divide groups, and then you need to start publishing changes.

And this is a similar lock to DDL, basically. No, not similar —
it's not an exclusive lock, but still.

I remember it got stuck. So it's not as bad as altering a table
and adding a column — it couldn't complete, but at least it
didn't block others.

Sai: Right.

And was this while creating the multiple slots, or was this just

like creating publications?

Nikolay: Creating a publication.

So you specify tables for which you need to create a publication.

And this needs to acquire locks on all of them.

It can be successful for some of them, not successful for others.

It's a single operation, so you're waiting.

And this is not good.

But not as bad as DDL, which can block selects, which come after.
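
(A small sketch of how to keep that wait bounded — publication and table names are placeholders; with a lock_timeout, CREATE PUBLICATION fails fast instead of queueing behind long transactions:)

    SET lock_timeout = '3s';
    CREATE PUBLICATION pub_group_1 FOR TABLE orders, order_items, customers;
    -- If it times out, retry later, or add tables one at a time:
    -- ALTER PUBLICATION pub_group_1 ADD TABLE shipments;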

We discussed many times how dangerous it can be under heavy load.

Yeah, that's interesting.

Anyway, multiple slots are tricky.

This is the lesson here.

So you need to test and learn before using them.

Michael: Yeah, absolutely.

Nikolay: Good.

What's next?

Michael: Well, before we move on from that, I'm interested in: are
there any other hard limits? You mentioned that rule of thumb,
25,000 to 30,000 messages per second.

Are there any other rules of thumb or hard limits, or anything
people could be thinking, oh, we've probably not tuned it enough,
because we're not getting that much throughput or that low lag,
or something like that?

Nikolay: I guess this number depends
on the, on the core, on

the type of CPU you have.

If it's old, it can be 10,000 for
example.

Sai: Yeah.

A hundred percent.

And then, you know, this number that I shared is more on the
higher end, in a non-local environment where the source and the
target are not local — they are in the same region, but on
different boxes.

So network latency is a very big factor.

One thing we do want to see is what the logical replication
performance is when both are on the same local server, which
doesn't happen in the real world.

So whatever I'm saying is network latency bound. Because it's
single-threaded, and the network latency kicks in, it can only
scale up to, say, 20 to 30 Mbps — 30 to 40 Mbps if it's done well.
So those are also some things to keep in mind.

Now, coming to gotchas, a few things that I did make a note of
about logical replication. First is it doesn't support replication
of DDL commands, which includes adding columns, dropping columns,
adding new tables, creating indexes, truncating tables — these
are not supported out of the box.

And this is one of the common concerns that we hear from customers,
because in the real world, you add tables, you have these Django
migrations which add a bunch of indexes, etc., right?

Michael: Partitions we talked about
last time.

Sai: Yeah, but they don't do that
also.

Michael: So yeah, when you say
customers raise it as a concern,

is that before starting or like
realizing it later on?

Where does that pop up for you?

Sai: It happens both ways, right?

Like because I mean, obviously,
Postgres logical replication

is the easiest way to like, you
know, replicate databases, right?

They start off and then there's
a column that is added and then

logical replication breaks.

And now, the good thing is you can add the column manually on the
target and it would continue. But this becomes difficult in
production, and that's when they reach out, saying, hey, you guys
are also using logical decoding and you support Postgres as a
target — and that's how they come to us.
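
(A sketch of that manual fix — table and column names are placeholders; the idea is to mirror the schema change on the subscriber so the apply worker can continue:)

    -- On the subscriber (target), add the column that already exists on the publisher:
    ALTER TABLE public.orders ADD COLUMN discount numeric;
    -- The subscription then resumes applying the pending changes.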

And we do support schema changes,
but only add columns and drop

columns.

We don't yet support creating tables
and indexes and truncate

and all of that yet, basically.

And sometimes it happens up front. They're like, okay, we keep
adding tables, we keep adding columns very often, and this
doesn't work for us, so we want another solution.

Michael: Yeah, nice.

A couple of the other things that
I see people confused or struggling

with are things like sequence synchronization.

I guess that's just a one-off task
at the beginning.

Do you see anything else catching
people out?

Sai: That's a good question, right?

Like I think DDL commands is number
1, replication slot growth

issues is number 2, which we talked
a lot about basically.

Yeah.

It's like, and, you know, always
keep reading the slot, avoid

long running or like open transactions,
you know, monitor slot

growth, right?

Like, fourth is, you know, use protocol versions — upgrade to the
latest Postgres versions and start using versions 2, 3, 4, which
are faster.

The third thing that we see is
Postgres logical replication doesn't

replicate toast columns out of
the box.

You need to set up replica identity
full to make that happen.

And replica identity full could
be expensive if you are not on

Postgres 16 because for updates
and deletes it could lead to

like a sequential scan.
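
(A minimal sketch — the table name is a placeholder:)

    -- Log the full old row (including TOASTed columns) for UPDATE and DELETE:
    ALTER TABLE public.documents REPLICA IDENTITY FULL;

    -- Check what a table currently uses:
    -- d = default (primary key), f = full, n = nothing, i = index
    SELECT relname, relreplident FROM pg_class WHERE relname = 'documents';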

Nikolay: And with this, what happened — CTID was used, or is used?
Like, why is it better in 16?

Sai: It's using indexes now.

It's like using indexes basically.

Nikolay: Okay.

Sai: And the thing is,

Nikolay: If you have them.

Sai: Yeah, if you have them, and if you have the right indexes.

And another thing is, some customers have primary keys, but they
have toast columns. We still need to do replica identity full
there, and that helps. So replica identity full with primary keys
is more efficient.

And there is a good blog that the
Zalando guys wrote, which

I really liked.

And that is something that I refer
to customers where, hey guys,

like you can add this.

I think it increased like CPU and
IO by 30%.

That was their analysis.

But that is about like toast columns
where you need to have replica

identity full and the impact of
that can change based on scenarios.

Michael: Yeah.

Awesome.

I didn't know about that one.

I would love to read that blog
post.

If you send it to me afterward,
I'll include it in the show

notes as well.

Sai: Absolutely.

And the fourth thing that we have seen is that logical
replication — at least logical decoding — did not support virtual
and generated columns. If there are generated columns, they
didn't show up in WAL decoding.

And the way we easily solve that is on the target: customers set
up a virtual column, or they use dbt or some transformation tool
to backfill this.

So that is number four.
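
(A sketch of that workaround on the target, assuming the source table has a generated column that never shows up in the decoded stream — the table, column names, and expression are placeholders:)

    -- Recreate it as a generated column on the target, so it is computed
    -- locally instead of being expected from replication:
    ALTER TABLE public.line_items
        ADD COLUMN total numeric GENERATED ALWAYS AS (price * quantity) STORED;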

And then last but not least, the fifth one that we see — I'm still
trying to fully understand it — is slot invalidation issues, where
the restart LSN becomes null. I have seen that happen in two
scenarios.

The first scenario is there is a setting called
max_slot_wal_keep_size, which limits the amount of WAL the slot
can retain. This setting was added in 13, and it lets you
safeguard against out-of-storage issues. So as and when the slot
hits, say, over 2 terabytes — or whatever the setting you have —
it invalidates the slot. In that scenario the restart LSN becomes
null, the slot gets invalidated, and you need to restart
logical replication.
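
(A minimal sketch of that setting and of how an invalidated slot shows up:)

    -- Cap how much WAL a slot may retain (added in PG 13; -1 means no limit):
    ALTER SYSTEM SET max_slot_wal_keep_size = '2TB';
    SELECT pg_reload_conf();

    -- Once a slot falls behind the limit it is invalidated: wal_status
    -- shows 'lost' and restart_lsn is reset to NULL.
    SELECT slot_name, wal_status, safe_wal_size, restart_lsn
    FROM pg_replication_slots;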

The second scenario we have seen is that sometimes it happens
randomly, unseen, and we are still figuring out why that can
happen.

We reached out to the community, and we heard that it could be
because of cloud providers' Postgres — because cloud providers
have a bunch of backup operations, some forks of Postgres. And
then it could be... I mean, the community doesn't say that; their
point is, can we try to reproduce this on vanilla Postgres, not
cloud? And that was not easy for us.

But we have seen that, out of the 7 to 8 months, 1 to 2 times the
slot randomly got invalidated, and we don't have an RCA on when
that can happen.

And immediately we check, hey, is max_slot_wal_keep_size set to
something low? But it's still set to minus 1, it's disabled — and
still we run into these issues.

So I'm on that quest of figuring
out when this happens because

it's a scenario that we need to understand better.

Michael: And that's terrifying.

And yeah, how do you even begin
to go about reproducing that?

Good luck.

Sai: Exactly. So we were able to
reproduce this on one of the cloud

providers.

They're like for high throughputs,
like over 50K transactions

per second, every 15 to 16 hours,
it was like getting invalidated.

Right?

Michael: 50,000 per second and
every 15 to 16 hours.

Sai: It's a lot.

Michael: That's a lot of messages.

Sai: That's not real.

I mean, I don't think that happens
often, but yeah.

Michael: Right, wow.

Cool.

I don't feel like we talked about
use cases much.

Is that because these aren't use-case dependent?

I guess other than people using two-phase commit, do you see any
differences between people that are using logical replication for
analytics databases versus... do you see any difference between
use cases, basically?

Sai: Good question.

Like I mean, logical replication
is very powerful.

I mean, it supports like, you know,
migration, online migrations,

number 1, then it supports like
online upgrades, right, like

which Nik mentioned, right, like
number 2.

Number 3, it also supports like
HA and backup scenarios.

So I have seen some customers who
use logical decoding or

logical replication to have HA
across regions or in like hybrid

environments.

Like, okay, I'm running stuff in the cloud and I still want to
have a backup or a replica in my on-premise, self-hosted
environment, right?

And the only way it's possible
is via logical replication because

cloud providers don't give access
to like WAL, right?

That is number 3, where like HA
and backups is another use case,

right?

4 is replication to like non-Postgres
like targets or even Postgres

targets for like workload isolation,
use-case isolation, where,

okay, I have my OLTP database running
on Postgres.

Now I want to pipe this to another
database.

It could be Snowflake or Postgres for analytics, or for search I
want to pipe this into something like ClickHouse or Elasticsearch,
for optimizing search.

Yeah, logical replication is very powerful, and I believe that
it's going to be the future for Postgres. Down the line it can
open up active-active use cases also, where you have cross-region
setups — I recently read a blog where folks were using logical
replication for active-active. So it opens up a plethora of use
cases and makes Postgres more powerful and extensible. And yeah.

Michael: Yeah, I read that blog
post too.

I think it was a version 16 change that made it possible to not
replicate changes that were made by logical replication, so you
could kind of set up logical replication in both directions.
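
(For reference, a sketch of that 16 option — names are placeholders; origin = none tells the subscription to skip changes that reached the publisher via another subscription, which is what prevents the infinite loop in a two-way setup:)

    CREATE SUBSCRIPTION sub_from_region_b
        CONNECTION 'host=region-b dbname=app user=replicator'
        PUBLICATION pub_region_b
        WITH (origin = none, copy_data = off);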

It seems super dangerous to me.

Like, okay, cool.

Would you recommend it at the moment
or do you think we need

more features before that?

Sai: I think it's more, I mean,
it's more like intense than that,

right?

Because like conflict resolution
and all of this is tricky, right?

Like, and, you know, I mean, if
it's like very workload specific,

where like, okay, I don't touch
the same rows, right?

Like, and, you know, maybe there
it could work.

But I mean, out of the box, like
implementing that is tricky,

and it requires more like effort.

And maybe we go there in the future,
right?

Because we are seeing a few cases
with customers where they want

Active-Active and there is not
an out-of-the-box solution.

Nikolay: Why do people need Active-Active?

Sai: Good question.

I think I have like a lot of thoughts
here.

So like the thing is, I think it
really helps with like HA, right?

Like, for example, I remember this scenario at Microsoft where
customers had a bunch of SQL Server active-active setups across
regions. This was a bank, and one of the regions went down — and
every second is like thousands of dollars. And they immediately
pivoted to the other region, and it kind of worked seamlessly.

Nikolay: But doesn't it mean that
all clusters need to perform

the same writes?

And I didn't buy this idea that
replaying changes through logical

replication is less expensive than
applying them initially, which

was advertised in the BDR documentation.

I didn't buy this.

I hope to find time to test it properly and write a blog post
about that. So far I haven't found time for this.

It's an interesting topic because
like in BDR, it means like, for

example, you have multiple regions,
4 or 8.

I know such cases as well.

And they suffer because everyone
needs to perform the same writes.

And if there is a spike in one region, it's not a resilient
system at all.

I don't understand this.

Sai: Yeah.

A hundred percent.

I agree.

I think, I mean, that's the reason
there is a big opportunity

there.

And I recently saw one company, pgEdge, who are trying to do
something like this.

Nikolay: Yeah, new wave of this.

I know every CTO is dreaming about
this, I know it.

Because we are like, we work in
multiple regions.

Maybe we should first check that
all our physical standbys are

in different availability zones
than primaries, right?

Sai: There are more basic things.

Nikolay: Yeah, but multiple regions
is a great thing to have,

but still I'm very skeptical.

But I see in DBAs' minds there is a mind shift as well. Like, 10
years ago all DBAs said you don't need this. But things are
changing.

So in Postgres 16 there is this feature to avoid loops, infinite
loops, right? It's interesting.

So I don't understand the use cases, and how we can scale writes
if we don't split them, like in sharding. We split them in
sharding, and that's great. Here, everyone needs everything.

Well, I'm very skeptical. I don't understand this part of the
landscape.

Sai: Yeah, 100%.

And like, I think it becomes very
critical in tier 0 use cases,

guys.

I mean, not like, I mean, tier 1,
tier 2, tier 3, I think it's

kind of lesser, but like, it's
more these tier 0 where like,

it's like a Chase bank or something
like that right? Like when

it becomes like hyper.

Nikolay: But do you think it's possible to build a good system?

Sai: No, I don't think so — I mean, with Postgres I am also very
skeptical, but I think there is an opportunity there.

And the community will be very critical here, right? I don't
think that it can happen just with logical replication.

Nikolay: Maybe if we build a system categorizing data into tiers
as well, and replicating writes only for the most critical data
between regions, right?

Michael: We're probably tiptoeing
into a different topic here.

It's

Nikolay: not different.

It's not different.

Many people think logical leads
to multi-master, definitely,

like so-called old term.

Michael: But I think the tier 0
use case feels like it's a long

way from being supported by native
Postgres.

There are a lot of providers out
there for that kind of thing.

And I think the general term for it is distributed SQL — DistSQL
is what I've seen it being referred to as.

But I would warn people against using logical for this in the
short term, like anytime soon.

Yeah.

Nikolay: Okay.

Sai: I agree.

Nikolay: Okay.

Then what other questions do you have?

Michael: I didn't have anything
else.

I wanted to thank Sai.

Did you have anything else Nikolay?

Nikolay: Well no.

Thank you so much.

It was interesting.

Thank you for coming.

Sai: Absolutely, guys.

I really enjoyed chatting with
you and thanks for inviting me.

Nikolay: I hope you won't stop
posting interesting technical

posts.

Sai: No, no, no.

So content is our currency, guys.

I mean, the thing is, for me, I'm
building the company because

I'm very curious, right?

Like, now the thing that is haunting me is that slot invalidation —
I'm not understanding why it can happen.

So, because of curiosity, we will be publishing a lot of content,
but

Nikolay: yeah, benchmarks, graphs,
data and so on.

Reproducible also.

Yeah, that's great.

Thank you so much.

Michael: And also, if any of our
listeners have seen that as

well, have any theories or have
a reproduction case for it, let

us know.

Nikolay: Mm-hmm.

Absolutely.

Michael: Good.

Wonderful.

Thank you.

Thank you both.

Take care.

Sai: Thanks, guys.

Bye-bye.

Bye-bye.

Creators and Guests

Sai Srirampur
Guest
CEO and Co-founder at PeerDB. Ex-Microsoft - Headed Solutions Engineering for Postgres on Azure. Ex-Citus (acquired by Microsoft) - Early Engineer. IIIT-H Alumnus. He/His
