
Snapshots
Michael: Hello and welcome to Postgres.FM,
a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard,
and as usual, this is my
co-host Nikolay, founder of Postgres.AI.
Hey Nikolay, what are we talking
about today?
Nikolay: Hi Michael.
We are talking about snapshots
and not only snapshots, but backups
of large databases and RPO, RTO.
Actually, more RTO.
RTO means how much time it would
take for us to recover from
some disaster from backups.
So, is it minutes, hours, days?
For example, let's take some hypothetical database of 10 terabytes and consider what we would expect from a good backup system in various managed Postgres situations, and also self-managed, in the cloud, maybe bare metal as well. Okay, pardon the bird sounds.
Michael: It's a nice sound.
Nikolay: It's spring. I've got to close the door; they decided to have a nest again here.
This is what they try to do every time.
Michael: So yeah, good topic.
Why is it on your mind at the moment?
Nikolay: Well yeah, it's a good
question.
I just observe larger and larger databases, and I've also observed different situations with managed services recently, and I see that it's really a big pain when you reach some level of scale, maybe not 10 terabytes, maybe 20 or 50 terabytes. And at this level, it becomes a big pain to do various operations, including backups, I mean, recovery first of all, but backups too.
We will talk about them as well,
but also provisioning of new
nodes, provisioning of replicas
or clones.
It becomes more and more difficult.
And the only solution I see is
snapshots.
And we have 2 big, I mean, 2 most popular backup systems right now in the Postgres ecosystem.
They are pgBackRest and WAL-G,
in my opinion.
All others became less popular.
This is my perception.
I might be wrong.
I have only anecdotal data to prove
it.
But it's time for both of them
to seriously think about cloud
snapshots in cloud environments.
When I say cloud snapshots, let's clarify: I mean snapshots of EBS volumes in AWS, of persistent disks (PD SSD or others) and Hyperdisk in Google Cloud, and alternatives in other clouds. I have less experience with Azure, so let's try to cover all 3, but again, I have much less experience with Azure. My main experience is with GCP and AWS.
So these snapshots, I think, must be used by those who create and maintain large database systems, and especially if you are a managed Postgres provider, it's inevitable that you should rely on these disk snapshots for restoration.
And, of course, I think AWS RDS
Postgres and Google Cloud SQL,
they already do it.
I suspect Azure also should do
it, but we have many more managed
Postgres providers, right?
Michael: So many now, yeah.
Nikolay: Right, and also we have
some people who self-manage
their Postgres clusters.
And also we have, of course, many
users who use RDS or Cloud
SQL or Azure.
And I think it's good to understand
some details about how backups
of large databases are done and
the role of snapshots in those
backups.
Michael: So yeah, I understand the attraction for huge, huge databases in terms of recovery time, but are there also benefits for smaller databases as well? Or are we just fine recovering from a normal backup, pgBackRest-style or WAL-G as you say? Are there speed advantages for smaller sizes as well?
Nikolay: That's a good question. Well, let me just walk us through the steps of backup and recovery.
Michael: Good idea.
Nikolay: With WAL-G or pgBackRest, let's forget about delta backups. Let's consider the basic, most fundamental process. So first of all, we need to take a copy of the data directory and bring it to the backup archive. It's usually in an object store like AWS S3 or Google Cloud Storage, GCS. Right. And in Azure, it's Blob Storage, or whatever they call it. And this copy of the data directory is not consistent by default. And it's okay. We can have an inconsistent copy. We can copy the data directory manually with rsync, with cp. It doesn't matter. It will take time to copy. And usually, roughly, for a terabyte it's roughly 1 hour or half an hour. If it's faster, you're lucky, you have good infrastructure. If it's slower than 1 hour, something should be tuned. But if you have 10 terabytes, we're already talking about 5 to 10 hours. If you have 50 terabytes, it's more than 1 day to copy, right? And it's inconsistent, but it's fine, because we wrap this process with 2 calls, pg_start_backup and pg_stop_backup.
These 2 important calls tell Postgres that it should keep the WALs that will be used in recovery. It's understood that the directory copy is not consistent, but we have enough WALs to reach the consistency point. So when we recover this full backup, it will be copied back to some normal disk, to another machine, right? And we fetch enough WALs and replay some WALs to reach the consistency point.
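As a rough illustration, here is a minimal sketch of that wrapping, assuming Postgres 15+ function names (older versions, and this conversation, use pg_start_backup and pg_stop_backup) and hypothetical paths; non-exclusive backups require start and stop to run in the same session, hence a single psql invocation:

    psql <<'SQL'
    SELECT pg_backup_start('manual_base_backup');
    -- the copy below may be inconsistent; that's fine, WAL kept from this point fixes it
    \! rsync -a /var/lib/postgresql/data/ /mnt/backup/base/
    SELECT * FROM pg_backup_stop();  -- returns the backup label to store alongside the copy
    SQL

In practice, pgBackRest and WAL-G orchestrate exactly this for you.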
And here's a problem. When you recover, sometimes you think, oh, how long will it take to reach the consistency point? I have a recipe in my set of how-to articles on how to understand the current position in terms of LSN and how much is left to reach the consistency point, because while those WALs are applied, any connection attempt will be rejected, right? Postgres needs to reach the consistency point to open the gates and accept connections.
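For a rough idea of what such a recipe can look like (a hedged sketch, not the exact how-to; assumes Linux and shell access to the data directory), you can compare the consistency target reported by pg_controldata with the WAL segment the startup process is currently replaying:

    # LSN Postgres must reach before it starts accepting connections
    pg_controldata "$PGDATA" | grep 'Minimum recovery ending location'
    # WAL segment currently being replayed, visible in the startup process title
    ps -o args= -C postgres | grep 'recovering '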
And I think there was some work to improve logging of this process. Maybe it will happen in Postgres 18. I saw some discussions and some patches, I think. But before that, while we still don't have transparent logging of that process, I have some recipes, so you can use them to be less anxious, because sometimes you wait many, many minutes and don't understand, and it's not fun, right? And that's it, this is the full backup. But it's only part 1, because it will reach the consistency point and it will represent the database at some time in the past.
But we also need to be able to reach the latest state of the database. And for that, we need the second part of the backup archive: the WAL archive. Write-ahead log files are 16 MB by default; on RDS they are tuned to 64 MB. Most clusters are at the default 16 MB. This small file is also compressed in one way or another, so it takes not 16 but, say, 7 or 10 or 5 MB, depending on the nature of the data inside it.
And then when we recover, we reach the consistency point, and we start replaying WALs. Postgres starts replaying WALs according to the restore_command configuration, which just fetches the next WAL, and Postgres applies it. And there's prefetching, because, of course, object storage is good with parallelization. It's not a good idea to use just a single process fetching files sequentially. We can ask to fetch 10 files in parallel to move faster, because in a heavily loaded system sometimes we see the source, the production, can generate many, many WALs per second. And recovery can be like dozens of WALs per second, depending on hardware and so on. So it's worth fetching very fast.
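As a hedged illustration, the restore_command in postgresql.conf typically looks like one of these (check each tool's documentation for the exact syntax and for prefetch or parallelism settings such as WAL-G's download concurrency or pgBackRest's archive-async and process-max; the stanza name is a placeholder):

    restore_command = 'wal-g wal-fetch %f %p'
    # or, for pgBackRest:
    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'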
And this process can take long, of course. After we did the full backup, many WALs can be archived until the latest point. And you can restore to the latest point or to some point in time. It's called point-in-time recovery, right? You can configure: I need this timestamp or this LSN, and you need to wait.
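A hedged sketch of what configuring that target looks like on modern Postgres (12 and newer), with hypothetical values; you also create a recovery.signal file in the data directory before starting:

    recovery_target_time = '2025-01-15 06:00:00+00'
    # or: recovery_target_lsn = '2A/5D8C4000'
    recovery_target_action = 'promote'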
So, of course, it's good if you have quite frequent full backups. It means that this additional WAL replay, because there is WAL replay to reach the consistency point, but there is additional WAL replay to reach the latest or desired point in time, will be much smaller. But usually, to replay a whole day of WAL takes not a whole day, definitely. It takes maybe an hour or 2 hours, it depends a lot on the particular case. But usually it's much faster than it took to generate those WALs. And to replay 7 days of WALs, well, it might take up to 1 day, maybe less, it depends again. So people need to configure the frequency of full backups and understand that we might need to replay up to the whole period between 2 backups' worth of WAL. Right. This affects recovery time as well. RTO, recovery time objective. Right.
And to improve that situation, both pgBackRest and WAL-G support delta backups. Instead of copying the whole 10 terabytes or 50 terabytes to GCS or S3, they copy only the delta compared to the previous backup. And then recovery looks like: restore the full backup and apply delta backups. It's just diffs of files, basically, which is supposed to be faster than WAL replay. And also much cheaper, because you don't need to keep, for example, 7 days of full backups; maybe you keep only 1 full backup plus deltas. It's already an optimization of budget and also an optimization of our RTO, because replaying WAL is slower than applying those deltas. However, we still need to have the first full backup fetched and applied. And it's definitely slower than if we had this full backup made recently, so we just apply it and so on.
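As a hedged sketch of how that is usually invoked (stanza name and step count are placeholders; verify the flags against your tool versions):

    # pgBackRest: differential or incremental backup relative to an earlier one
    pgbackrest --stanza=main --type=diff backup
    pgbackrest --stanza=main --type=incr backup
    # WAL-G: backup-push produces delta backups automatically when this is set
    WALG_DELTA_MAX_STEPS=6 wal-g backup-push "$PGDATA"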
Anyway, if you use pgBackRest or WAL-G, they basically ignore the existence of another alternative. Instead of copying data directly to object storage, you can use cloud-native capabilities and snapshot the disk. Snapshotting a disk also brings data to object storage in the background, but it's on the shoulders of the cloud provider, AWS or GCP or Azure. And they have this automation, they use it in many processes, and these days it's already quite reliable in most cases. This is, by the way, an objection I hear: copying the whole directory to object storage, that's reliable; what will happen with those snapshots, who knows? Well, yes, okay. But a lot of mission-critical systems have already used them for many years, right? And it's also copied to object storage, but done in the background. You create a snapshot, and it takes only minutes for a large volume, like 10 terabytes, 50 terabytes.
And there's also a concept of delta snapshots. It's incremental. So there's one snapshot, and then additional snapshots are done faster and take less space in object storage. So you pay less. And again, this is the responsibility of a cloud provider like AWS or GCP. It works. Then you can recover: snapshot restore takes only a few minutes. This is the magic. Restore takes only a few minutes even for, like, 50 terabytes; it can take like 10 to 15 minutes.
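For a concrete flavor, a hedged AWS CLI sketch (IDs, zone, and volume type are placeholders; the pg_start_backup/pg_stop_backup wrapping still applies on the Postgres side):

    # create a snapshot; the call returns quickly, upload continues in the background
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pg data volume"
    # later: restore it as a new volume and attach it to the recovery instance
    aws ec2 create-volume --availability-zone us-east-1a --snapshot-id snap-0123456789abcdef0 --volume-type gp3
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf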
But, speaking about GCP and AWS here, there is a concept of lazy load. So it looks like the disk already exists and the data is already there, but in the background it's still pulling data from object storage. And that's why recovery is fast, but the disk is slow. I mean, you try to SELECT from a table, but it's slow. And this concept is very well known to people who use RDS, for example, or work with EBS volume snapshots. It might be bad in some cases, and there is a trade-off. You may say, okay, I have a recovery time of a few minutes only, RTO is perfect in our case, but we will be slow initially. We will need to warm up the disks. Or you can say, okay, RTO is bad, but we are fast from the very beginning. Well, we need to warm up the buffer pool and page cache anyway, but that's a different story. It usually takes only minutes, not hours, definitely.
So that's the story about snapshots. And I just observe that people have a tendency to like snapshots and prefer them both for DR and for cloning nodes, for purposes of creating replicas, standbys, or forking, anything, once they reach a level like 10-20 terabytes. Before that, it's not that obvious that snapshots are beneficial. Because, okay, we can wait a few hours. If you have 5 terabytes, okay, we can wait 5 hours or 2 and a half hours for a backup. We can wait a couple of hours to recover, not a big deal. But once we have 10, 20, 30, 50 terabytes, it already exceeds 1 day. It exceeds 8 hours of work, a working day, right? So it already becomes a two-day job for someone. Yeah. To control this process, right? This is, I think, a psychological threshold here.
Michael: But I'm just thinking there are a couple of other use cases where that time does matter. Like if you decide that you don't want to go down the logical replication route for major version upgrades, for example, and you're willing to take a certain amount of downtime, one of the steps is to clone your database, right? Do the upgrade on the clone and then switch over. On a lot of cloud providers, that would be quicker, less time.
Nikolay: Why do you need a clone?
Michael: I think if you do it in place then you're down the whole
time.
Whereas if you do it on a clone and then transfer any changes
across that customers have made in the meantime, you might be
able to minimize downtime.
Nikolay: How do you transfer changes?
Michael: Script.
Let's say it's not super busy.
Nikolay: It's an interesting approach.
Yeah.
Sounds like reinventing the wheel of logical replication. If you have a complex system, how will you understand the diff, the changes?
Michael: I'm only thinking, I'm thinking mostly about quiet systems where it's unlikely that there are that many changes, or maybe 0. Like if you're doing it at a quiet time and it's, like, an internal system.
Nikolay: Well, for me there is in-place upgrade, and there is the zero-downtime approach. And logical has variations, actually. But what you propose, it's kind of like... It breaks my brain a little bit.
Michael: Well, all I'm thinking is, I can imagine cases where being able to take a clone in a minute or 2, instead of it being 10 or 20 minutes, is actually quite attractive, for the same reason you mentioned about 8 hours. If something's gonna take 20 minutes, I'd move on to a different task. But if it's only gonna take 1 or 2 minutes, I might only check my emails, you know. Like, there's a very different...
Nikolay: It might be that you need to do some research on the fork, on the clone. I call it clone, some people call it fork. You need to verify, for example, how long it will take to upgrade, an in-place upgrade. For testing, it's good to have an identical copy of the same machine, just run it all and see the log. How else can you check it, right? And if forking takes a day...
Michael: Well...
Nikolay: It's frustrating, yeah, it's frustrating. But if it takes only dozens of minutes, it's good. The only catch here, again, is that there will be lazy load. Lazy load, right? So you need to understand: okay, you have a fork, it pretends it's up and running. It's up and running, but it's slow.
Michael: Does that invalidate the test then?
Nikolay: Yeah, some tests will
be invalidated for sure.
Maybe not upgrade.
For upgrade, for a test of a major upgrade in place with hard links, I would in this case warm up only the system catalog part. Because this is what pg_upgrade does: it dumps and restores system catalogs. Then I would ignore the remainder of the data, knowing that it will be handled by hard links. So this is my warm-up. It's super fast.
So this is a perfect example then. But we need to understand the details. So we forked during 10-20 minutes, I mean we restored a snapshot, we have up-and-running Postgres, we dump the schema a couple of times, well, 1 time is enough actually. And then we run pg_upgrade -k, or --link, and see what happens, how long it takes for us. And we are not worried that data is actually still in GCS or S3 or Blob Storage. We know it doesn't matter in this particular case.
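A hedged sketch of that test on the restored clone, with hypothetical paths and versions; dumping the schema warms up the system catalogs, then pg_upgrade runs with hard links:

    pg_dumpall --schema-only > /tmp/schema.sql   # touches the system catalogs, our warm-up
    /usr/lib/postgresql/17/bin/pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link --check
    # if the checks pass, rerun without --check and measure how long it takes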
But if it's something else, for example, you want to check how long it will take to run pg_repack or VACUUM FULL or pg_squeeze on a large table without additional traffic, that's also useful data. You know that additional traffic will affect timing in a negative way, but you want to measure at least in isolation.
Michael: Or even adding a new index, like...
Nikolay: How long would that take? Yeah. In this case, the fact that we have a lazy load problem with snapshot restore will affect your experiment for sure. You will have bad timing; in production you will have good timing; and you will regret you did the experiment at all. You will stop trusting those experiments if you don't understand this lazy load problem. If you understand lazy load, you first warm up the particular table, just doing, I don't know, SELECT everything from it, right? And then create the index and measure, right?
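A minimal sketch of that, with a hypothetical table and column:

    psql -c "SELECT count(*) FROM orders;"   # full sequential read first, to defeat lazy load
    time psql -c "CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);"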
Michael: So yeah, just to get an idea, how much slower are we talking? Is there some rough... Let's say you SELECT *, how much slower would you expect it to be while it's still lazy loading versus on production?
Nikolay: Oh, it's a tough question. I don't have good recent examples. The first time I heard about lazy load was 2015 or something. That was the first time I was working with RDS. This is when the importance of working with clones started to form in my mind. I was super impressed at how fast we can clone a terabyte database and start working and experimenting with it, but I was upset that it was slow, so I reached out to support and they explained lazy load. And their articles, by the way, AWS explains lazy load quite well. There are explanations at various levels, I think, both for EBS, so basic-level EBS volumes, and also for RDS. But I must admit, Google Cloud doesn't do a good job here in terms of documenting this. I know it exists. I know they admit it exists, but it's not well documented. There's a good opportunity to catch up here in terms of documentation and understanding.
As for the question of duration, I don't know. Definitely it can be frustrating. SELECT count(*) from a table, for example, if it's a sequential scan, will have the same effect for warming up, right? And it will not spam your output. Well, a different way not to spam your output is to just SELECT from the table without specifying columns, and maybe also somehow so that the rows are not returned...
Michael: EXPLAIN ANALYZE.
Nikolay: Yeah, by the way, yes, yes. So your timing will be really high in that EXPLAIN ANALYZE, if timing is enabled, I hope it's enabled. So I think, like, SELECT count(*) is already super slow in Postgres, we know it, because it's row storage and all the things, MVCC and so on. But here it can be an order of magnitude slower.
Michael: Makes sense.
Nikolay: Maybe 2 orders of magnitude.
I don't have numbers here in my
head.
Michael: Yeah, I was just trying to understand if it was 10% or double or an order of magnitude.
Nikolay: No, no, no. It's really fetching data from object storage in the background. So it can be frustrating, this I can tell you.
But again, if you aim to warm up a node just restored from a snapshot, a Postgres node, you can benefit from the fact that object storage is good in terms of parallelization of operations. You can warm it up with not just a single connection to Postgres. You can run 10 or 20 parallel warm-up queries. And this is the recommendation on how to warm up because of lazy load.
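One hedged way to do that from the shell (GNU xargs assumed): generate one sequential scan per table and run 10 of them in parallel; pg_prewarm is an alternative if the extension is installed:

    psql -AtX -c "SELECT format('SELECT count(*) FROM %I.%I;', schemaname, relname) FROM pg_stat_user_tables" \
      | xargs -d '\n' -P 10 -I{} psql -c "{}"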
By the way, AWS also offers, a few years ago they implemented it, I think it's called fast snapshot restore, FSR or something. I might be mistaken. But if my memory is right, they support up to 50 snapshots marked as available for fast restore in a region, I think, for a particular account. So you can mark some snapshots, and the lazy load problem will disappear for those snapshots. This is a good thing to have. And I'm not aware of a similar thing for Google Cloud or Azure.
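The AWS feature being recalled here is Fast Snapshot Restore (FSR); a hedged CLI sketch of enabling it for one snapshot in one Availability Zone (IDs are placeholders):

    aws ec2 enable-fast-snapshot-restores \
      --availability-zones us-east-1a \
      --source-snapshot-ids snap-0123456789abcdef0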
But this means that, once per, I don't know, week, you have a good snapshot which you can restore very fast and then replay WALs. So you have super fast recovery for experiments. It's already super cool. Right.
Michael: What about the actual recovery, like the RTO discussion? What are you seeing people with dozens of terabytes opt for in terms of that trade-off? Are they happy to accept that things will be slower, it's kind of like a brownout situation for a while, but they're back online at least? Or do they prefer a longer RTO, but everything works exactly as expected once it is back online?
Nikolay: Well, I think nobody is happy when recovery takes 10, 20, 30 hours. Nobody. Of course. Right? So this becomes a problem and the characteristics of the system become very bad. It becomes not acceptable for the business to learn that we will be down a whole day or 2 just to recover from backups. But before the business realizes it, operations engineers realize it, because it's really hard to experiment.
Michael: Well, the cases where this happens are super rare, right? We're not talking about failover; all of these setups would generally have high availability anyway, right? So we're talking only about a very specific class of recovery, where you have to recover from backups and there's no failover option, which I'm guessing is pretty rare. People have to think about it, but they don't face it very often.
Nikolay: 10-plus years ago, cloud environments were super unreliable and failovers happened very often. Right now, at least for Google Cloud and AWS, I see nodes can be running for years without disruption. So it's much better right now in terms of lower risks. Still, we have a habit of considering cloud environments as not super reliable. And, for example, it also depends on particular instance classes and sizes, I think. If you have a small, very popular instance family, very small, migration might happen often.
For example, we were down, because we are on GCP, I mean Postgres.AI. We were down because the Kubernetes cluster was migrated again, like all nodes were migrated. And then we had a problem, let me admit. We had a problem with backups, because after a reconfiguration half a year ago, a .history file for some timeline, which was from the past but for us currently was in the future, was left in the pg_wal subdirectory. And after such a migration, caused by the migration of VMs in Kubernetes, we couldn't start. Because promotion happened, the timeline changed, and there was basically a rogue file from a previous life corresponding to the same timeline number. It was a nasty incident. I think we were down like an hour until we figured out the misconfiguration, unfortunately.
Yeah, it's a bad problem. But it was good that it happened on a Saturday, so not many users of our systems noticed. And I must thank a couple of folks from my team who jumped very quickly on Zoom and helped to diagnose and fix it.
So yeah, back to the questions. In my opinion, again, snapshots are inevitable to use if you reach the dozens-of-terabytes scale. And again, there are several levels of trust here. First level: are they consistent? The answer is, we don't care, because we have pg_start_backup, pg_stop_backup. Second question: are they reliable, and what about possible corruption and so on? The answer is: test them, and test backups. You must test backups anyway, because otherwise it's a Schrödinger backup system. We don't know if it will work. We must test. And once a large enough number of tests show positive results, trust will be built. Right?
You know, at the beginning of the year I announced that every managed Postgres provider must consider, or I highly recommend they consider, pg_wait_sampling and pg_stat_kcache as very important pieces of observability to extend what we have with pg_stat_statements. Even if you are RDS and have Performance Insights, still consider adding pg_wait_sampling, because this will add very flexible tooling for your customers.
Now I say, for backups, of course, the big guys like RDS already use snapshots, 100%. We know it indirectly, from my own suffering from lazy load in 2015 or 2016. And now, all competitors of RDS, or alternatives to RDS, please consider snapshots, right? Because I know some of you don't use them yet, and it's worth it, because your customers grow, and once they reach 10, 20, 30, 40, 50 terabytes, they will go to RDS or self-managed and stop paying you, 100%, because it's a big pain. We just described this pain.
And now, those who self-manage, also consider snapshots, right? But most importantly, developers of WAL-G and pgBackRest, consider supporting snapshots, cloud disk snapshots, as an official alternative to the full backups and delta backups you already have. In this case it would become native. And yeah, I think it's worth doing. Maybe other types of snapshots too. Like an external tool for snapshotting: if it exists in the infrastructure and is natively supported by a backup tool, I think it's a good idea. Then the backup tool would perform all the orchestration and skip some of the full backups.
It can be a mix. The ideal backup system for me would be a mix of occasional full backups, a continuous WAL stream, and snapshots as the main tool for provisioning nodes and the main strategy for DR. But not the only one. So full backups would still exist, occasionally; again, I would still have full backups. By the way, both snapshots and full backups must be done on replicas, and I see on some systems they do it on the primary. They must be done on replicas, because it's a big stress for the primary. Even a snapshot is a big stress for the primary, as I learned. So, yeah. It's good to do it on a replica.
And pg_start_backup, pg_stop_backup support so-called non-exclusive backups, so it can be done on replicas, on physical replicas. And even if they are lagging for some time, it doesn't matter, because even a lag of a few minutes nobody will notice; we have a continuous WAL stream, and replaying a few minutes of WAL is very fast.
Michael: Yeah, I saw in RDS's documentation
that for multi-availability
zone deployments, they take the
backups from the standby.
Nikolay: Absolutely, this is what should happen. Yeah. And snapshots should also be done from there. I saw, well, maybe it was specific cases, but I saw some cases when a snapshot of the primary's disk affected performance, because in the background data is being sent to object storage, and it definitely generates some read disk I/O on our disks.
Michael: So I'm curious though: you mentioned that in an ideal world you have a mix; what's the benefit of having both? Like, once you move to snapshots, what's the benefit of also having traditional pgBackRest-style backups as well?
Nikolay: It's a good question. I remember reading Google Cloud documentation many years ago, and it said: when we create a snapshot of a disk, this snapshot is not guaranteed to be consistent if the application is running and writing. And then suddenly that sentence was removed from the Google Cloud documentation. It's a matter of trust, right? So now I trust Google Cloud snapshots and EBS volume snapshots in AWS, yes, quite a lot. I have trust, but still some part of me is, like, paranoid, you know. So I would keep full backups, not very frequent. I would keep them just to be on the safe side. But maybe over time trust will be complete, if you have a lot of data proving snapshot creation and recovery is very reliable.
Well, consistency, by the way, doesn't matter. The problem I mentioned in the documentation, it's not about whether snapshots are consistent or not. Honestly, I don't even know, and I don't care, because I have pg_start_backup and pg_stop_backup, and I'm protected from an inconsistent state; Postgres will take care of it. But it's a matter of trust: lack of transparency in documentation, understanding what's happening, and so on. Only obtaining experience over time can bring trust and the understanding that the system works fine. And regular testing; testing should be automated, recovery testing of backups, and so on. So now I think it doesn't matter if they are consistent or not. It matters if they work, right? If we can recover and then reach the consistency point by our means, I mean, by what Postgres has. And that's it.
But still, what if I see snapshots created, but I cannot use them? And also, bugs happen, people change systems. I mean, cloud providers are also being developed all the time, and new types of disks and other changes are introduced all the time. So what if snapshots don't work? If I have a very reliable testing process, if I test, for example, all snapshots, I restore from them automatically all the time, and I see that over the last few months, every day, or maybe every few hours, we created a snapshot and we tested them all, maybe at some point I will say, okay, full backups by pgBackRest or WAL-G are not needed anymore. But for now, my mind is in a state of paranoia.
Michael: Very reasonable.
Sounds good?
Nikolay: Yeah.
One interesting thing here is that if you use snapshots, there is a catch for large databases. EBS volumes and persistent disks in Google Cloud are limited to 64 terabytes. I think some types of EBS volumes are limited to 128 terabytes, if I'm not mistaken, so the limit has increased. But 64 terabytes, if you experience problems at the level of 10, 20, 30, 50 terabytes, 64 terabytes is not far away already.
Michael: Yeah, that's true.
Nikolay: And then the question will be what to do, because if you use a regular full backup by means of pgBackRest or WAL-G, you can have multiple disks and, again, it doesn't matter. Thanks to pg_start_backup and pg_stop_backup, you can copy data from multiple disks, right? It can be LVM with multiple disks organized to give you more disk space, right?
Michael: Like tablespaces?
Nikolay: No, no, no, not tablespaces. A tablespace is a different thing, it's at the Postgres level. If you need to, you can combine multiple disks using LVM2 and have one big, basically, logical volume for Postgres.
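A hedged LVM2 sketch with placeholder device names (striping across the two disks is optional):

    pvcreate /dev/nvme1n1 /dev/nvme2n1
    vgcreate pgvg /dev/nvme1n1 /dev/nvme2n1
    lvcreate --name pgdata --extents 100%FREE --stripes 2 pgvg
    mkfs.ext4 /dev/pgvg/pgdata
    mount /dev/pgvg/pgdata /var/lib/postgresql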
And for regular backups, when pgBackRest or WAL-G copies data to object storage, it doesn't matter that many underlying disks contribute to building this big logical volume, because in Postgres we have only files up to 1 gigabyte, so each huge table is split into 1-gigabyte files, and they are compressed and copied by our backup tool. But if you start using snapshots, it gets interesting.
Michael: Well, also, people at that scale, thinking of the companies we talked to for our hundredth episode, a lot of them are considering sharding by that point anyway. So I guess in some cases that...
Nikolay: Or deleting data, right? That's my favorite advice. Just delete something and keep sleeping well. Yeah.
But, well, the point is, it's not a problem, again, it's not a problem that you will create multiple snapshots. If those snapshots are created between pg_start_backup and pg_stop_backup, you will be able to recover from them. The question will only be: will LVM survive this? And this is a question not about Postgres, it's a question about the cloud provider and the snapshot orchestration. And this topic already goes beyond my current understanding of things here. I can just suggest, I can recommend: okay, use snapshots, but if you reach the limit and you need to start using LVM, it's a different story. So you need to check how LVM will survive snapshot restore...
Michael: And test.
Nikolay: I think it's definitely
solvable because like RDS did
solve it, right?
So
Michael: yeah, good point.
Nikolay: Right.
Michael: Well, that's actually probably a really good place to... I don't know how you're doing for time, but it makes me think about the likes of Aurora, or people doing innovations at the storage layer, where this kind of problem just doesn't exist anymore, you know, they've already got multiple copies. Well, I guess we do still need disaster recovery, you know, in case all of Aurora's down. We need a way of recovering. But they are promising to handle that for us in this distributed world, right? That's part of the promise of keeping copies at the storage layer.
Nikolay: Yeah, well, there are definitely new approaches where data is stored on object storage originally, not only backups; it becomes the primary storage, and then there is a bigger orchestration to bring data closer to compute. But it's all hidden from the user. It's an interesting approach. But there are also doubts that this approach will win in terms of market share. Right, so it's interesting. It's definitely an area where I don't know many things. Maybe our listeners will share some interesting ideas and we can explore them. And maybe we should invite someone who can talk about this more.
Michael: Yeah, sounds good.
Nikolay: Yeah, yeah.
But anyway, I think snapshots are great. I think they can be great for smaller databases as well. Small, I mean still 1 terabyte, because instead of 1 hour, it will be a minute. Right?
Michael: I think that might be...
I'm curious about that area, because
I think there are a lot
more databases in that range of
hundreds of gigabytes to a couple
of terabytes than there are companies
having to deal with dozens
of terabytes.
But if we can make their developer
experiences much better...
Nikolay: I agree. A few terabytes, I would say, is already mid-market in terms of database size. So it's very common to have 1 or 2 terabytes. And snapshots can be beneficial, because operations speed up significantly and RTO is improved, if you remember about lazy load and know how to mitigate it or accept it.
Michael: Yeah.
Nikolay: Good.
Good.
Michael: Nice one, Nikolay.
Nikolay: Thank you for listening. Again, if you have ideas in this area, I will be happy to learn more, because I'm learning all the time myself, and I think Michael is in the same shoes, right?
Michael: Yeah, I find this stuff interesting. I have to deal with it a lot less than you, which I'm grateful for. But yeah, definitely let us know, I guess via YouTube comments or on various social media things.
Nikolay: Yeah good.
Michael: Nice one.
We'll catch you next week, Nikolay.
Nikolay: Bye-bye.