
Snapshots
Michael: Hello and welcome to Postgres.FM,
a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard,
and as usual, this is my
co-host Nikolay, founder of Postgres.AI.
Hey Nikolay, what are we talking
about today?
Nikolay: Hi Michael.
We are talking about snapshots
and not only snapshots, but backups
of large databases and RPO, RTO.
Actually, more RTO.
RTO means how much time it would
take for us to recover from
some disaster from backups.
So, is it minutes, hours, days?
For example, let's take some hypothetical database of 10 terabytes and consider what we would expect from a good backup system in various managed Postgres situations, and also self-managed, in the cloud, maybe bare metal as well. Okay, pardon the bird sounds.
Michael: It's a nice sound.
Nikolay: It's spring. I've got to close the door; they decided to have a nest again here.
This is what they try to do every time.
Michael: So yeah, good topic.
Why is it on your mind at the moment?
Nikolay: Well yeah, it's a good
question.
I just observe larger and larger databases, and I've also observed different situations with managed services recently, and I see that it's really a big pain when you reach some level of scale, maybe not 10 terabytes, maybe 20 or 50 terabytes. And at this level, it becomes a big pain to do various operations, including backups, I mean, recovery first of all, but backups too.
We will talk about them as well,
but also provisioning of new
nodes, provisioning of replicas
or clones.
It becomes more and more difficult.
And the only solution I see is
snapshots.
And we have 2 big, I mean, 2 most popular backup systems right now in the Postgres ecosystem.
They are pgBackRest and WAL-G,
in my opinion.
All others became less popular.
This is my perception.
I might be wrong.
I have only anecdotal data to prove
it.
But it's time for both of them
to seriously think about cloud
snapshots in cloud environments.
When I say cloud snapshots, let's clarify: I mean snapshots of EBS volumes in AWS, of persistent disks (PD SSD or others) and Hyperdisk in Google Cloud, and alternatives in other clouds. I have less experience with Azure, so let's try to cover all 3, but again, I have much less experience with Azure. My main experience is with GCP and AWS.
So these snapshots, I think, must be used by those who create and maintain large database systems, and especially if you are a managed Postgres provider, it's inevitable that you should rely on these disk snapshots for restoration.
And, of course, I think AWS RDS
Postgres and Google Cloud SQL,
they already do it.
I suspect Azure also should do
it, but we have many more managed
Postgres providers, right?
Michael: So many now, yeah.
Nikolay: Right, and also we have
some people who self-manage
their Postgres clusters.
And also we have, of course, many
users who use RDS or Cloud
SQL or Azure.
And I think it's good to understand
some details about how backups
of large databases are done and
the role of snapshots in those
backups.
Michael: So yeah, I understand the attraction for huge, huge databases in terms of recovery time, but are there also benefits for smaller databases as well? Or are we just fine recovering from a normal backup, pgBackRest-style or WAL-G as you say? Are there speed advantages for smaller sizes as well?
Nikolay: That's a good question. Well, let me just walk us through the steps of backup and recovery.
Michael: Good idea.
Nikolay: With WAL-G or pgBackRest, let's forget about delta backups. Let's consider the basic, most fundamental process. So first of all, we need to take a copy of the data directory and bring it to the backup archive. It's usually in an object store like AWS S3 or Google Cloud Storage, GCS. Right. And in Azure, it's Blob Storage, or whatever they call it. And this copy of the data directory is not consistent by default. And it's okay. We can have an inconsistent copy. We can copy the data directory manually with rsync, with cp. It doesn't matter. It will take time to copy. And usually, roughly, for a terabyte it's roughly 1 hour or half an hour. If it's faster, you're lucky, you have good infrastructure. If it's slower than 1 hour, something should be tuned. But if you have 10 terabytes, we're already talking about 5 to 10 hours. If you have 50 terabytes, it's more than 1 day to copy, right? And it's inconsistent, but it's fine, because we wrap this process with 2 calls, pg_start_backup and pg_stop_backup.
These 2 important calls tell Postgres that it should keep the WALs that will be used in recovery. It's understood that the directory copy is not consistent, but we have enough WALs to reach the consistency point. So when we recover this full backup, it will be copied back to some normal disk, to another machine, right? And we fetch enough WALs and replay some WALs to reach the consistency point.
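As a rough illustration, here is a minimal sketch of that wrapping, assuming Postgres 15+ function names (older versions, and this conversation, use pg_start_backup and pg_stop_backup) and hypothetical paths; non-exclusive backups require start and stop to run in the same session, hence a single psql invocation:

    psql <<'SQL'
    SELECT pg_backup_start('manual_base_backup');
    -- the copy below may be inconsistent; that's fine, WAL kept from this point fixes it
    \! rsync -a /var/lib/postgresql/data/ /mnt/backup/base/
    SELECT * FROM pg_backup_stop();  -- returns the backup label to store alongside the copy
    SQL

In practice, pgBackRest and WAL-G orchestrate exactly this for you.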
And here's a problem. When you recover, sometimes you think, oh, how long will it take to reach the consistency point? I have a recipe in my set of how-to articles on how to understand the current position in terms of LSN and how much is left to reach the consistency point, because while those WALs are applied, any connection attempt will be rejected, right? Postgres needs to reach the consistency point to open the gates and accept connections.
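For a rough idea of what such a recipe can look like (a hedged sketch, not the exact how-to; assumes Linux and shell access to the data directory), you can compare the consistency target reported by pg_controldata with the WAL segment the startup process is currently replaying:

    # LSN Postgres must reach before it starts accepting connections
    pg_controldata "$PGDATA" | grep 'Minimum recovery ending location'
    # WAL segment currently being replayed, visible in the startup process title
    ps -o args= -C postgres | grep 'recovering '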
And I think there was some work to improve logging of this process. Maybe it will happen in Postgres 18. I saw some discussions and some patches, I think. But before that, while we still don't have transparent logging of that process, I have some recipes, so you can use them to be less anxious, because sometimes you wait many, many minutes and don't understand, and it's not fun, right? And that's it, this is the full backup. But it's only part 1, because it will reach the consistency point and it will represent the database at some time in the past.
But we also need to be able to reach the latest state of the database. And for that, we need the second part of the backup archive: the WAL archive. Write-ahead log files are 16 MB by default; on RDS they are tuned to 64 MB. Most clusters are at the default 16 MB. This small file is also compressed in one way or another, so it takes not 16 but, say, 7 or 10 or 5 MB, depending on the nature of the data inside it.
And then when we recover, we reach the consistency point, and we start replaying WALs. Postgres starts replaying WALs according to the restore_command configuration, which just fetches the next WAL, and Postgres applies it. And there's prefetching, because, of course, object storage is good with parallelization. It's not a good idea to use just a single process fetching files sequentially. We can ask to fetch 10 files in parallel to move faster, because in a heavily loaded system sometimes we see the source, the production, can generate many, many WALs per second. And recovery can be like dozens of WALs per second, depending on hardware and so on. So it's worth fetching very fast.
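As a hedged illustration, the restore_command in postgresql.conf typically looks like one of these (check each tool's documentation for the exact syntax and for prefetch or parallelism settings such as WAL-G's download concurrency or pgBackRest's archive-async and process-max; the stanza name is a placeholder):

    restore_command = 'wal-g wal-fetch %f %p'
    # or, for pgBackRest:
    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'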
And this process can take long, of course. After we did the full backup, many WALs can be archived until the latest point. And you can restore to the latest point or to some point in time. It's called point-in-time recovery, right? You can configure: I need this timestamp or this LSN, and you need to wait.
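A hedged sketch of what configuring that target looks like on modern Postgres (12 and newer), with hypothetical values; you also create a recovery.signal file in the data directory before starting:

    recovery_target_time = '2025-01-15 06:00:00+00'
    # or: recovery_target_lsn = '2A/5D8C4000'
    recovery_target_action = 'promote'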
So, of course, it's good if you have quite frequent full backups. It means that this additional WAL replay, because there is WAL replay to reach the consistency point, but there is additional WAL replay to reach the latest or desired point in time, will be much smaller. But usually, to replay a whole day of WAL takes not a whole day, definitely. It takes maybe an hour or 2 hours, it depends a lot on the particular case. But usually it's much faster than it took to generate those WALs. And to replay 7 days of WALs, well, it might take up to 1 day, maybe less, it depends again. So people need to configure the frequency of full backups and understand that we might need to replay up to the whole period between 2 backups' worth of WAL. Right. This affects recovery time as well. RTO, recovery time objective. Right.
And to improve that situation, both pgBackRest and WAL-G support delta backups. Instead of copying the whole 10 terabytes or 50 terabytes to GCS or S3, they copy only the delta compared to the previous backup. And then recovery looks like: restore the full backup and apply delta backups. It's just diffs of files, basically, which is supposed to be faster than WAL replay. And also much cheaper, because you don't need to keep, for example, 7 days of full backups; maybe you keep only 1 full backup plus deltas. It's already an optimization of budget and also an optimization of our RTO, because replaying WAL is slower than applying those deltas. However, we still need to have the first full backup fetched and applied. And it's definitely slower than if we had this full backup made recently, so we just apply it and so on.
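As a hedged sketch of how that is usually invoked (stanza name and step count are placeholders; verify the flags against your tool versions):

    # pgBackRest: differential or incremental backup relative to an earlier one
    pgbackrest --stanza=main --type=diff backup
    pgbackrest --stanza=main --type=incr backup
    # WAL-G: backup-push produces delta backups automatically when this is set
    WALG_DELTA_MAX_STEPS=6 wal-g backup-push "$PGDATA"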
Anyway, if you use pgBackRest or WAL-G, they basically ignore the existence of another alternative. Instead of copying data directly to object storage, you can use cloud-native capabilities and snapshot the disk. Snapshotting a disk also brings data to object storage in the background, but it's on the shoulders of the cloud provider, AWS or GCP or Azure. And they have this automation, they use it in many processes, and these days it's already quite reliable in most cases. This is, by the way, an objection I hear: copying the whole directory to object storage, that's reliable; what will happen with those snapshots, who knows? Well, yes, okay. But a lot of mission-critical systems have already used them for many years, right? And it's also copied to object storage, but done in the background. You create a snapshot, and it takes only minutes for a large volume, like 10 terabytes, 50 terabytes.
And there's also a concept of delta snapshots. It's incremental. So there's one snapshot, and then additional snapshots are done faster and take less space in object storage. So you pay less. And again, this is the responsibility of a cloud provider like AWS or GCP. It works. Then you can recover: snapshot restore takes only a few minutes. This is the magic. Restore takes only a few minutes even for, like, 50 terabytes; it can take like 10 to 15 minutes.
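For a concrete flavor, a hedged AWS CLI sketch (IDs, zone, and volume type are placeholders; the pg_start_backup/pg_stop_backup wrapping still applies on the Postgres side):

    # create a snapshot; the call returns quickly, upload continues in the background
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pg data volume"
    # later: restore it as a new volume and attach it to the recovery instance
    aws ec2 create-volume --availability-zone us-east-1a --snapshot-id snap-0123456789abcdef0 --volume-type gp3
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf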
But, speaking about GCP and AWS here, there is a concept of lazy load. So it looks like the disk already exists and the data is already there, but in the background it's still pulling data from object storage. And that's why recovery is fast, but the disk is slow. I mean, you try to SELECT from a table, but it's slow. And this concept is very well known to people who use RDS, for example, or work with EBS volume snapshots. It might be bad in some cases, and there is a trade-off. You may say, okay, I have a recovery time of a few minutes only, RTO is perfect in our case, but we will be slow initially. We will need to warm up the disks. Or you can say, okay, RTO is bad, but we are fast from the very beginning. Well, we need to warm up the buffer pool and page cache anyway, but that's a different story. It usually takes only minutes, not hours, definitely.
So that's the story about snapshots. And I just observe that people have a tendency to like snapshots and prefer them both for DR and for cloning nodes, for purposes of creating replicas, standbys, or forking, anything, once they reach a level like 10-20 terabytes. Before that, it's not that obvious that snapshots are beneficial. Because, okay, we can wait a few hours. If you have 5 terabytes, okay, we can wait 5 hours or 2 and a half hours for a backup. We can wait a couple of hours to recover, not a big deal. But once we have 10, 20, 30, 50 terabytes, it already exceeds 1 day. It exceeds 8 hours of work, a working day, right? So it already becomes a two-day job for someone. Yeah. To control this process, right? This is, I think, a psychological threshold here.
Michael: But I'm just thinking there are a couple of other use cases where that time does matter. Like if you decide that you don't want to go down the logical replication route for major version upgrades, for example, and you're willing to take a certain amount of downtime, one of the steps is to clone your database, right? Do the upgrade on the clone and then switch over. On a lot of cloud providers, that would be quicker, less time.
Nikolay: Why do you need a clone?
Michael: I think if you do it in place then you're down the whole
time.
Whereas if you do it on a clone and then transfer any changes
across that customers have made in the meantime, you might be
able to minimize downtime.
Nikolay: How do you transfer changes?
Michael: Script.
Let's say it's not super busy.
Nikolay: It's an interesting approach.
Yeah.
Sounds like reinventing the wheel of logical replication. If you have a complex system, how will you understand the diff, the changes?
Michael: I'm only thinking, I'm thinking mostly about quiet systems where it's unlikely that there are that many changes, or maybe 0. Like if you're doing it at a quiet time and it's, like, an internal system.
Nikolay: Well, for me there is in-place upgrade, and there is the zero-downtime approach. And logical has variations, actually. But what you propose, it's kind of like... It breaks my brain a little bit.
Michael: Well, all I'm thinking is, I can imagine cases where being able to take a clone in a minute or 2, instead of it being 10 or 20 minutes, is actually quite attractive, for the same reason you mentioned about 8 hours. If something's gonna take 20 minutes, I'd move on to a different task. But if it's only gonna take 1 or 2 minutes, I might only check my emails, you know. Like, there's a very different...
Nikolay: It might be that you need to do some research on the fork, on the clone. I call it clone, some people call it fork. You need to verify, for example, how long it will take to upgrade, an in-place upgrade. For testing, it's good to have an identical copy of the same machine, just run it all and see the log. How else can you check it, right? And if forking takes a day...
Michael: Well...
Nikolay: It's frustrating, yeah, it's frustrating. But if it takes only dozens of minutes, it's good. The only catch here, again, is that there will be lazy load. Lazy load, right? So you need to understand: okay, you have a fork, it pretends it's up and running. It's up and running, but it's slow.
Michael: Does that invalidate the test then?
Nikolay: Yeah, some tests will
be invalidated for sure.
Maybe not upgrade.
For upgrade, for a test of a major upgrade in place with hard links, I would in this case warm up only the system catalog part. Because this is what pg_upgrade does: it dumps and restores system catalogs. Then I would ignore the remainder of the data, knowing that it will be handled by hard links. So this is my warm-up. It's super fast.
So this is a perfect example then. But we need to understand the details. So we forked during 10-20 minutes, I mean we restored a snapshot, we have up-and-running Postgres, we dump the schema a couple of times, well, 1 time is enough actually. And then we run pg_upgrade -k, or --link, and see what happens, how long it takes for us. And we are not worried that data is actually still in GCS or S3 or Blob Storage. We know it doesn't matter in this particular case.
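A hedged sketch of that test on the restored clone, with hypothetical paths and versions; dumping the schema warms up the system catalogs, then pg_upgrade runs with hard links:

    pg_dumpall --schema-only > /tmp/schema.sql   # touches the system catalogs, our warm-up
    /usr/lib/postgresql/17/bin/pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link --check
    # if the checks pass, rerun without --check and measure how long it takes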
But if it's something else, for example, you want to check how long it will take to run pg_repack or VACUUM FULL or pg_squeeze on a large table without additional traffic, that's also useful data. You know that additional traffic will affect timing in a negative way, but you want to measure at least in isolation.
Michael: Or even adding a new index, like...
Nikolay: How long would that take? Yeah. In this case, the fact that we have a lazy load problem with snapshot restore will affect your experiment for sure. You will have bad timing; in production you will have good timing; and you will regret you did the experiment at all. You will stop trusting those experiments if you don't understand this lazy load problem. If you understand lazy load, you first warm up the particular table, just doing, I don't know, SELECT everything from it, right? And then create the index and measure, right?
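A minimal sketch of that, with a hypothetical table and column:

    psql -c "SELECT count(*) FROM orders;"   # full sequential read first, to defeat lazy load
    time psql -c "CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);"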
Michael: So yeah, just to get an idea, how much slower are we talking? Is there some rough... Let's say you SELECT *, how much slower would you expect it to be while it's still lazy loading versus on production?
Nikolay: Oh, it's a tough question. I don't have good recent examples. The first time I heard about lazy load was 2015 or something. That was the first time I was working with RDS. This is when the importance of working with clones started to form in my mind. I was super impressed at how fast we can clone a terabyte database and start working and experimenting with it, but I was upset that it was slow, so I reached out to support and they explained lazy load. And their articles, by the way, AWS explains lazy load quite well. There are explanations at various levels, I think, both for EBS, so basic-level EBS volumes, and also for RDS. But I must admit, Google Cloud doesn't do a good job here in terms of documenting this. I know it exists. I know they admit it exists, but it's not well documented. There's a good opportunity to catch up here in terms of documentation and understanding.
As for the question of duration, I don't know. Definitely it can be frustrating. SELECT count(*) from a table, for example, if it's a sequential scan, will have the same effect for warming up, right? And it will not spam your output. Well, a different way not to spam your output is to just SELECT from the table without specifying columns, and maybe also somehow so that the rows are not returned...
Michael: EXPLAIN ANALYZE.
Nikolay: Yeah, by the way, yes, yes. So your timing will be really high in that EXPLAIN ANALYZE, if timing is enabled, I hope it's enabled. So I think, like, SELECT count(*) is already super slow in Postgres, we know it, because it's row storage and all the things, MVCC and so on. But here it can be an order of magnitude slower.
Michael: Makes sense.
Nikolay: Maybe 2 orders of magnitude.
I don't have numbers here in my
head.
Michael: Yeah, I was just trying to understand if it was 10% or double or an order of magnitude.
Nikolay: No, no, no. It's really fetching data from object storage in the background. So it can be frustrating, this I can tell you.
But again, if you aim to warm up a node just restored from a snapshot, a Postgres node, you can benefit from the fact that object storage is good in terms of parallelization of operations. You can warm it up with not just a single connection to Postgres. You can run 10 or 20 parallel warm-up queries. And this is the recommendation on how to warm up because of lazy load.
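One hedged way to do that from the shell (GNU xargs assumed): generate one sequential scan per table and run 10 of them in parallel; pg_prewarm is an alternative if the extension is installed:

    psql -AtX -c "SELECT format('SELECT count(*) FROM %I.%I;', schemaname, relname) FROM pg_stat_user_tables" \
      | xargs -d '\n' -P 10 -I{} psql -c "{}"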
By the way, AWS also offers, a few years ago they implemented it, I think it's called fast snapshot restore, FSR or something. I might be mistaken. But if my memory is right, they support up to 50 snapshots marked as available for fast restore in a region, I think, for a particular account. So you can mark some snapshots, and the lazy load problem will disappear for those snapshots. This is a good thing to have. And I'm not aware of a similar thing for Google Cloud or Azure.
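The AWS feature being recalled here is Fast Snapshot Restore (FSR); a hedged CLI sketch of enabling it for one snapshot in one Availability Zone (IDs are placeholders):

    aws ec2 enable-fast-snapshot-restores \
      --availability-zones us-east-1a \
      --source-snapshot-ids snap-0123456789abcdef0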
But this means that, once per, I don't know, week, you have a good snapshot which you can restore very fast and then replay WALs. So you have super fast recovery for experiments. It's already super cool. Right.
Michael: What about the actual recovery, like the RTO discussion? What are you seeing people with dozens of terabytes opt for in terms of that trade-off? Are they happy to accept that things will be slower, it's kind of like a brownout situation for a while, but they're back online at least? Or do they prefer a longer RTO, but everything works exactly as expected once it is back online?
Nikolay: Well, I think nobody is happy when recovery takes 10, 20, 30 hours. Nobody. Of course. Right? So this becomes a problem and the characteristics of the system become very bad. It becomes not acceptable for the business to learn that we will be down a whole day or 2 just to recover from backups. But before the business realizes it, operations engineers realize it, because it's really hard to experiment.
Michael: Well, the cases where this happens are super rare, right? We're not talking about failover; all of these setups would generally have high availability anyway, right? So we're talking only about a very specific class of recovery, where you have to recover from backups and there's no failover option, which I'm guessing is pretty rare. People have to think about it, but they don't face it very often.
Nikolay: 10-plus years ago, cloud environments were super unreliable and failovers happened very often. Right now, at least for Google Cloud and AWS, I see nodes can be running for years without disruption. So it's much better right now in terms of lower risks. Still, we have a habit of considering cloud environments as not super reliable. And, for example, it also depends on particular instance classes and sizes, I think. If you have a small, very popular instance family, very small, migration might happen often.
For example, we were down, because we are on GCP, I mean Postgres.AI. We were down because the Kubernetes cluster was migrated again, like all nodes were migrated. And then we had a problem, let me admit. We had a problem with backups, because after a reconfiguration half a year ago, a .history file for some timeline, which was from the past but for us currently was in the future, was left in the pg_wal subdirectory. And after such a migration, caused by the migration of VMs in Kubernetes, we couldn't start. Because promotion happened, the timeline changed, and there was basically a rogue file from a previous life corresponding to the same timeline number. It was a nasty incident. I think we were down like an hour until we figured out the misconfiguration, unfortunately.
Yeah, it's a bad problem. But it was good that it happened on a Saturday, so not many users of our systems noticed. And I must thank a couple of folks from my team who jumped very quickly on Zoom and helped to diagnose and fix it.
So yeah, back to the questions. In my opinion, again, snapshots are inevitable to use if you reach the dozens-of-terabytes scale. And again, there are several levels of trust here. First level: are they consistent? The answer is, we don't care, because we have pg_start_backup, pg_stop_backup. Second question: are they reliable, and what about possible corruption and so on? The answer is: test them, and test backups. You must test backups anyway, because otherwise it's a Schrödinger backup system. We don't know if it will work. We must test. And once a large enough number of tests show positive results, trust will be built. Right?
You know, at the beginning of the year I announced that every managed Postgres provider must consider, or I highly recommend they consider, pg_wait_sampling and pg_stat_kcache as very important pieces of observability to extend what we have with pg_stat_statements. Even if you are RDS and have Performance Insights, still consider adding pg_wait_sampling, because this will add very flexible tooling for your customers.
Now I say, for backups, of course, the big guys like RDS already use snapshots, 100%. We know it indirectly, from my own suffering from lazy load in 2015 or 2016. And now, all competitors of RDS, or alternatives to RDS, please consider snapshots, right? Because I know some of you don't use them yet, and it's worth it, because your customers grow, and once they reach 10, 20, 30, 40, 50 terabytes, they will go to RDS or self-managed and stop paying you, 100%, because it's a big pain. We just described this pain.
And now, those who self-manage, also consider snapshots, right? But most importantly, developers of WAL-G and pgBackRest, consider supporting snapshots, cloud disk snapshots, as an official alternative to the full backups and delta backups you already have. In this case it would become native. And yeah, I think it's worth doing. Maybe other types of snapshots too. Like an external tool for snapshotting: if it exists in the infrastructure and is natively supported by a backup tool, I think it's a good idea. Then the backup tool would perform all the orchestration and skip some of the full backups.
It can be a mix. The ideal backup system for me would be a mix of occasional full backups, a continuous WAL stream, and snapshots as the main tool for provisioning nodes and the main strategy for DR. But not the only one. So full backups would still exist, occasionally; again, I would still have full backups. By the way, both snapshots and full backups must be done on replicas, and I see on some systems they do it on the primary. They must be done on replicas, because it's a big stress for the primary. Even a snapshot is a big stress for the primary, as I learned. So, yeah. It's good to do it on a replica.
And pg_start_backup, pg_stop_backup support so-called non-exclusive backups, so it can be done on replicas, on physical replicas. And even if they are lagging for some time, it doesn't matter, because even a lag of a few minutes nobody will notice; we have a continuous WAL stream, and replaying a few minutes of WAL is very fast.
Michael: Yeah, I saw in RDS's documentation
that for multi-availability
zone deployments, they take the
backups from the standby.
Nikolay: Absolutely, this is what should happen. Yeah. And snapshots should also be done from there. I saw, well, maybe it was specific cases, but I saw some cases when a snapshot of the primary's disk affected performance, because in the background data is being sent to object storage, and it definitely generates some read disk I/O on our disks.
Michael: So I'm curious though: you mentioned that in an ideal world you have a mix; what's the benefit of having both? Like, once you move to snapshots, what's the benefit of also having traditional pgBackRest-style backups as well?
Nikolay: It's a good question. I remember reading Google Cloud documentation many years ago, and it said: when we create a snapshot of a disk, this snapshot is not guaranteed to be consistent if the application is running and writing. And then suddenly that sentence was removed from the Google Cloud documentation. It's a matter of trust, right? So now I trust Google Cloud snapshots and EBS volume snapshots in AWS, yes, quite a lot. I have trust, but still some part of me is, like, paranoid, you know. So I would keep full backups, not very frequent. I would keep them just to be on the safe side. But maybe over time trust will be complete, if you have a lot of data proving snapshot creation and recovery is very reliable.
Well, consistency, by the way, doesn't matter. The problem I mentioned in the documentation, it's not about whether snapshots are consistent or not. Honestly, I don't even know, and I don't care, because I have pg_start_backup and pg_stop_backup, and I'm protected from an inconsistent state; Postgres will take care of it. But it's a matter of trust: lack of transparency in documentation, understanding what's happening, and so on. Only obtaining experience over time can bring trust and the understanding that the system works fine. And regular testing; testing should be automated, recovery testing of backups, and so on. So now I think it doesn't matter if they are consistent or not. It matters if they work, right? If we can recover and then reach the consistency point by our means, I mean, by what Postgres has. And that's it.
But still, what if I see snapshots created, but I cannot use them? And also, bugs happen, people change systems. I mean, cloud providers are also being developed all the time, and new types of disks and other changes are introduced all the time. So what if snapshots don't work? If I have a very reliable testing process, if I test, for example, all snapshots, I restore from them automatically all the time, and I see that over the last few months, every day, or maybe every few hours, we created a snapshot and we tested them all, maybe at some point I will say, okay, full backups by pgBackRest or WAL-G are not needed anymore. But for now, my mind is in a state of paranoia.
Michael: Very reasonable.
Sounds good?
Nikolay: Yeah.
One interesting thing here is that if you use snapshots, there is a catch for large databases. EBS volumes and persistent disks in Google Cloud are limited to 64 terabytes. I think some types of EBS volumes are limited to 128 terabytes, if I'm not mistaken, so the limit has increased. But 64 terabytes, if you experience problems at the level of 10, 20, 30, 50 terabytes, 64 terabytes is not far away already.
Michael: Yeah, that's true.
Nikolay: And then the question will be what to do, because if you use a regular full backup by means of pgBackRest or WAL-G, you can have multiple disks and, again, it doesn't matter. Thanks to pg_start_backup and pg_stop_backup, you can copy data from multiple disks, right? It can be LVM with multiple disks organized to give you more disk space, right?
Michael: Like tablespaces?
Nikolay: No, no, no, not tablespaces. A tablespace is a different thing, it's at the Postgres level. If you need to, you can combine multiple disks using LVM2 and have one big, basically, logical volume for Postgres.
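A hedged LVM2 sketch with placeholder device names (striping across the two disks is optional):

    pvcreate /dev/nvme1n1 /dev/nvme2n1
    vgcreate pgvg /dev/nvme1n1 /dev/nvme2n1
    lvcreate --name pgdata --extents 100%FREE --stripes 2 pgvg
    mkfs.ext4 /dev/pgvg/pgdata
    mount /dev/pgvg/pgdata /var/lib/postgresql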
And for regular backups, when pgBackRest or WAL-G copies data to object storage, it doesn't matter that many underlying disks contribute to building this big logical volume, because in Postgres we have only files up to 1 gigabyte, so each huge table is split into 1-gigabyte files, and they are compressed and copied by our backup tool. But if you start using snapshots, it gets interesting.
Michael: Well, also, people at that scale, thinking of the companies we talked to for our hundredth episode, a lot of them are considering sharding by that point anyway. So I guess in some cases that...
Nikolay: Or deleting data, right? That's my favorite advice. Just delete something and keep sleeping well. Yeah.
But, well, the point is, it's not a problem, again, it's not a problem that you will create multiple snapshots. If those snapshots are created between pg_start_backup and pg_stop_backup, you will be able to recover from them. The question will only be: will LVM survive this? And this is a question not about Postgres, it's a question about the cloud provider and the snapshot orchestration. And this topic already goes beyond my current understanding of things here. I can just suggest, I can recommend: okay, use snapshots, but if you reach the limit and you need to start using LVM, it's a different story. So you need to check how LVM will survive snapshot restore...
Michael: And test.
Nikolay: I think it's definitely
solvable because like RDS did
solve it, right?
So
Michael: yeah, good point.
Nikolay: Right.
Michael: Well, that's actually probably a really good place to... I don't know how you're doing for time, but it makes me think about the likes of Aurora, or people doing innovations at the storage layer, where this kind of problem just doesn't exist anymore, you know, they've already got multiple copies. Well, I guess we do still need disaster recovery, you know, in case all of Aurora's down. We need a way of recovering. But they are promising to handle that for us in this distributed world, right? That's part of the promise of keeping copies at the storage layer.
Nikolay: Yeah, well, there are definitely new approaches where data is stored on object storage originally, not only backups; it becomes the primary storage, and then there is a bigger orchestration to bring data closer to compute. But it's all hidden from the user. It's an interesting approach. But there are also doubts that this approach will win in terms of market share. Right, so it's interesting. It's definitely an area where I don't know many things. Maybe our listeners will share some interesting ideas and we can explore them. And maybe we should invite someone who can talk about this more.
Michael: Yeah, sounds good.
Nikolay: Yeah, yeah.
But anyway, I think snapshots are great. I think they can be great for smaller databases as well. Small, I mean still 1 terabyte, because instead of 1 hour, it will be a minute. Right?
Michael: I think that might be...
I'm curious about that area, because
I think there are a lot
more databases in that range of
hundreds of gigabytes to a couple
of terabytes than there are companies
having to deal with dozens
of terabytes.
But if we can make their developer
experiences much better...
Nikolay: I agree. A few terabytes, I would say, is already mid-market in terms of database size. So it's very common to have 1 or 2 terabytes. And snapshots can be beneficial, because operations speed up significantly and RTO is improved, if you remember about lazy load and know how to mitigate it or accept it.
Michael: Yeah.
Nikolay: Good.
Good.
Michael: Nice one, Nikolay.
Nikolay: Thank you for listening. Again, if you have ideas in this area, I will be happy to learn more, because I'm learning all the time myself, and I think Michael is in the same shoes, right?
Michael: Yeah, I find this stuff interesting. I have to deal with it a lot less than you, which I'm grateful for. But yeah, definitely let us know, I guess via YouTube comments or on various social media things.
Nikolay: Yeah good.
Michael: Nice one.
We'll catch you next week, Nikolay.
Nikolay: Bye-bye.