Is pg_dump a backup tool?
Michael: Hello and welcome to Postgres.FM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard, and I'm joined as usual by
Nikolay from Postgres.AI.
Hey Nikolay.
Nikolay: Hi Michael.
Michael: And today we are delighted to be joined by 2 excellent
guests who have each contributed a lot to Postgres over many
years now and who both recently published blog posts on the topic
we're going to be discussing.
Let me introduce you both quickly.
First we have Gülçin Yıldırım Jelínek, who co-founded the Prague
PostgreSQL Meetup and is a staff engineer at Xata.
Welcome Gülçin.
Gülçin: Hello, thank you for having me.
Michael: We're delighted to.
And we're also honoured to be joined by Robert Haas, long-serving
PostgreSQL major contributor and committer and VP Chief Architect
Database Service at EDB.
Welcome, Robert.
Robert: Hello, thank you for having me.
Michael: It's our pleasure as well.
So to kick us off, I've prepared a couple of questions to ask
each of you in turn, but I'd also like to encourage you to ask
each other questions as we go along.
Perhaps we can start with you, Gülçin.
What are your high-level thoughts on the topic of is pg_dump a
backup tool and why is it something you wanted to write about
recently?
Gülçin: It is funny because I didn't actually want to write about
pg_dump.
I just joined my current employer Xata and it was my first week.
And then I noticed something in the discord channel that we have.
Somebody's having an issue with pg_dump.
I was like, Oh, what's happening?
And I saw some parameter that I didn't recognize in
the error message: restrict_nonsystem_relation_kind.
I was like, I don't know this configuration option or anything.
And then I noticed it was actually introduced recently at that
time.
And I was like, oh, okay, why?
And then I checked, and it is related to a CVE.
I remember the number, CVE-2024-7348.
It doesn't matter, but there's a blog post about it, so you can
find it with this number.
And in there, it explains what this vulnerability is and
how people can actually use it to potentially compromise
your database, because it affects
pg_dump.
So people can actually create a non-temporary object in the database.
And then just before pg_dump begins, they replace this object with
a different thing, like a view or a foreign table, so people can
insert SQL there.
And then when pg_dump attempts to do the backup, it can run
the injected SQL code.
So why are we there?
Because it affects it.
And then I said, hey, this affects from Postgres 12 to 16, upgrade
the Postgres versions and test if pg_dump scripts are working,
review the user permissions, the standard recommendations when
this kind of thing happens.
And then when we were sharing this blog post on Twitter, I think
our marketing team wrote something like, okay, pg_dump, a tool to
back up Postgres databases.
It was the definition of that tool, basically.
And then I read through it and everything, and I noticed, oh,
people are saying, you know, the usual, when you say something
about pg_dump, it is not a backup tool.
And I was like, okay.
And then basically it kept going.
So I had to write another blog post to say, is it really, or
is it not?
Nikolay: Who first said this?
Gülçin: I don't know.
I didn't know this because there were so many.
And I know that because in Postgres community, whenever pg_dump
topic opens up, somebody will say, you know, pg_dump is not a
backup tool.
But then actually a few days before this discussion happened,
Peter Eisentraut committed a change, which will be in effect
in PG18, that tries to remove the backup terminology and kind
of converts to export so that people are not considering it as
a backup tool in a way.
So this, I think, made people more vocal, saying that,
look, this was how it was before, but not anymore, and you should
not say it.
And then I had to write another, I mean, I felt like I should
write something more about it to explain why we can or cannot
consider pg_dump a backup tool.
And in my opinion, it is a tool that can be used to backup a
database.
And it is a logical way of doing a backup.
You can call it a dump.
Maybe you can define, you know, Nikolay was asking, is it a backup
tool, yes or no, or define backup.
So it can be a backup.
In my case, when I was working as a DBA for a long time,
I was using it to back up databases.
It depends basically on the context of how you use it and the nuances
of how you can actually utilize this tool.
So yeah, that's where I stand today.
I don't agree that it is not a backup tool.
It can be a backup tool, but maybe later on in the
discussion we can discuss what the drawbacks are with it and
how regular backup tools that are outside of Postgres,
like pgBackRest or something, can help.
Nikolay: Can I jump straight away with a question?
Gülçin: Yeah, please.
Nikolay: Yeah, I also saw comments that maybe for very large
databases, like many terabytes and more, it's not a good
backup tool, but at least for small databases it's good, and also
for partial exports: you can export only 1 table.
Imagine we have a tiny database, just like, I don't know, like
100 rows, 1 table.
And I SELECT * from this table in psql and just make a picture
on my iPhone.
This is a backup, this picture. We can restore from it, right?
Gülçin: Well, maybe it's a snapshot, right? Yeah, why not?
Nikolay: Well, a dump is also a snapshot, right?
Gülçin: Yeah, and I don't really see why it can't be
called a backup.
Nikolay: Okay.
Gülçin: It is a moment and you can use that moment to do something
with it.
Michael: I thought it was a really good blog post Gülçin, I'll
share it in the show notes as well for anybody that hasn't seen
it.
And speaking of good blog posts on the topic, I think Robert
added a lot of good points as well.
Both of your blog posts included a lot of technical details,
like the technical aspects of why it could be considered
a backup tool, but also its many drawbacks
and why you might recommend using something else as a
general-purpose backup tool.
So Robert, how about yourself?
How would you summarize your high-level thoughts on the topic
and why it was something you wanted, or maybe didn't want, to
write about?
Robert: Well, I think it just kind of got under my skin, because,
you know, Gülçin's blog post was not the first time that I've
heard people using this "pg_dump is not a backup tool" line,
and to me that kind of came across as shouting at people without
necessarily giving, you know, a reasoning, right? The
documentation said for literally 20 years that pg_dump could be
used to make backups, or I don't remember exactly what the wording
was.
When I looked into the history in Git, I actually found that
the language that it's been changed to now with exporting the
database is very similar to the original language that was used
to describe pg_dump when that code was first added to PostgreSQL,
but there was a 2 decade period
in the middle when the documentation
said, hey, you can use this to
take backups.
And from my point of view, it doesn't
even matter whether that's
true or whether you think that's
true.
If the documentation said for 2
decades that X piece of software
could be used to do Y, then nobody
should get in trouble for
saying that.
Like, nobody should get called
out for saying that.
That just doesn't make any sense
to me.
Like, I mean, honestly, I think,
you know, some of us, myself included,
can be a little too eager to jump
on people's cases from time
to time.
And I don't think that's like good
for our community.
I think we wanna be the kind of
community where when people show
up we give them help, we give them
good advice, and we don't
come down on them like a ton of
bricks.
And Gülçin is not the only person
I've seen who seemed to me
to be kind of getting beaten up
a little bit.
And I was just like, why are we
doing this?
Like, clearly, pg_dump isn't right
for every purpose.
And there are lots of situations
where it's probably not what
you want.
But the tone is just baffling
to me, because it seemed very
hostile to me and I couldn't make
any sense of why we
should be that hostile about anything,
but especially why we
should be so hostile about that
in particular.
Gülçin: And to that I actually
have something to add, because
after this discussion started to
come up again, I was looking
at the groups where people
are actually using this
"it's not a backup tool" rhetoric.
And I've seen a few users that
are trying to get help from the
Postgres communities that we have
online, a lot of them.
And there was like, I noted 2 of
them for today.
1 of them is asking, can pg_dump
limit a backup by schema?
I mean, they're using this sentence
and there's somebody answering
directly:
it's not related, but pg_dump
is not a backup.
And then there's another user:
can someone send me the command
to take a backup of a partial database?
Which actually pg_dump can, right?
We can do the schema only, we can
do just the data, whatever,
or we can do a table, any type
of object.
And then the answer is like, there
is no such command.
The standard backup tools take
backups of the entire database
cluster.
So basically, it doesn't consider
it as a backup tool, even though
there is a pg_dump command that
can actually do what people are
asking.
So that's what I find very not
helpful, right?
We could just say to people, look,
this is this pg_dump command
that you can actually take this table that you want to take,
or selective restore, whatever you want to do with it, and help
people to the direction that they're actually trying to get there.
Instead, just saying, there is no such command.
It's not a backup tool anyway, because the standard backup tools
take the entire database cluster.
So that I don't find helpful, which is what Robert is saying.
That is not helpful at all.
You might not agree that it's a good tool to use as a backup
solution, and we can talk about where it could be improved or
why people should actually prefer backup solutions.
But this is still not helpful.
There's a tool that we were all using for a long time and it
can do all the things that these people were asking for.
So the nuance of the question matters, the context of it.
And that's where I am, basically.
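The partial dumps mentioned here map onto a handful of pg_dump options. As a rough sketch (the helper name `partial_dump_command` and the database, schema, and table names are made up for illustration; `--schema`, `--table`, `--schema-only`, and `--data-only` are standard pg_dump options), the command lines could be assembled like this:

```python
from typing import List, Optional


def partial_dump_command(
    dbname: str,
    schema: Optional[str] = None,
    table: Optional[str] = None,
    schema_only: bool = False,
    data_only: bool = False,
) -> List[str]:
    """Build (but do not run) a pg_dump command line for a partial dump."""
    if schema_only and data_only:
        # pg_dump itself rejects this combination.
        raise ValueError("schema_only and data_only are mutually exclusive")
    cmd = ["pg_dump"]
    if schema is not None:
        cmd.append(f"--schema={schema}")  # limit the dump to one schema
    if table is not None:
        cmd.append(f"--table={table}")  # limit the dump to one table
    if schema_only:
        cmd.append("--schema-only")  # object definitions, no rows
    if data_only:
        cmd.append("--data-only")  # rows, no object definitions
    cmd.append(dbname)
    return cmd


# Hypothetical names: database "shop", schema "sales", table "public.orders".
print(" ".join(partial_dump_command("shop", schema="sales")))
print(" ".join(partial_dump_command("shop", table="public.orders", data_only=True)))
print(" ".join(partial_dump_command("shop", schema_only=True)))
```

The function only builds the argument list, so it can be inspected or logged before anything touches a real database.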
Robert: And you know, if somebody asks about how to use pg_dump,
and you want to tell them, hey, here's how to do that thing with
pg_dump, but maybe you want to consider some other alternatives
instead.
Cool, like I got no problem with that.
That can be helpful advice.
But pretending that the thing that they're asking about
doesn't exist when it does, I just don't understand that
at all.
Michael: So yeah, I've definitely got some theories as to why
people are behaving and speaking like that.
But I do think...
Nikolay: 1 of those people is right here.
I can speak to him if you want.
Michael: I wanted to, yeah. I think you've got some
really good language around this, Nikolay, around logical backups
and physical backups, that really helps clarify, and I think if
people used that language in those sentences it would immediately
help with clarity and also limitations. But I'd love to hear your
high-level thoughts on the topic as well, and why it's
something you say.
Nikolay: So this statement, "pg_dump is not a backup tool",
is a reaction to the statement the documentation had for 20 years. And we
saw so many disastrous situations in many companies who tried
to rely on this as a backup tool while growing. So, actually,
it was not my statement, right?
I just picked it up, right?
I think Franck Pachot also mentioned it.
I'm not sure he was the first who reacted to Gülçin's article.
But I joined as well, and I'm sure in many places, not only Discord
or Slack or IRC, many people are picking up this motto
because it's painful to observe how many companies relied on
dumps as backups, right?
If we consider dumps to be
backups, okay, we can do that,
but there are limitations.
There is big power in this, not
only partial dumps.
You can take specific tables.
These days we have many managed
Postgres offerings and they don't
share backups with us.
If you want multi-cloud backup,
you must use pg_dump.
You cannot get data or copy or
something.
You cannot get physical backups
out of RDS, for example, right?
But this pain, observed for a couple
of decades, caused me to
join this movement saying that
pg_dump is not a backup tool.
At the same time, like
I told Michael, there is
a kind of professional
shift in my mind here.
Because when somebody says backups,
I envision only physical
backups.
Although there are logical backups,
of course.
And again, this is not my idea
to introduce this language.
I checked it in Oracle and MySQL
documentation.
I think maybe it's a good idea
to borrow this concept and mention
specifically SQL.
There are 2 kinds of backups, physical
and logical.
They all have pros and cons.
For example, logical backup: if
you rely on pg_dump as a backup
tool, for example for partial dumps
and escaping from RDS, those are good
pros, right?
Speaking of cons, it's always a
kind of snapshot.
It puts pressure on your database
in terms of the xmin horizon, affecting
autovacuum behavior, which is
unacceptable if you have 10-plus
terabytes and heavy load.
Also, 1 day, some bug or corruption
might happen, and you
simply cannot read your data at
the logical level.
While physical backups are not
affected.
They just copy files, right?
And like, there are many pros and
cons to compare, right?
And I like the idea to split language
between logical and physical.
And for me personally, when somebody
says backups without specification,
I still see by default physical
only.
Right.
Gülçin: If we are considering
corruption with logical backups,
the corruption can also be at the
physical level.
Nikolay: Right.
I'm okay with that, but I have
backup, I can restore and deal
with it, right?
Gülçin: Well, then actually you
can maybe keep this corruption
between your physical backups if
you didn't notice, if it's gone
unnoticed.
And then if you had the logical
backup on top of it, maybe, you
know, it could be another tool
to fight this physical corruption
that you have.
Nikolay: Yeah, what I'm trying
to say, if I have physical backups
with corruption, I will deal with
it and so on.
But if I have corruption which
prevents pg_dump from reading data,
it will just fail and I don't have
anything.
Robert: Yeah, so I'd just like
to make a couple of comments here.
I think 1 of the things that I
find really interesting is that
people who work for different companies
that all support and
use Postgres can have very different
experiences of some of this
stuff.
And I've seen that before with
other issues and I'm seeing it
here too.
Because my typical experience with
pg_dump is not the 1 that you
were describing at all.
In fact, since I've worked at EDB,
which is the whole of my professional
Postgres career, I've never had
that situation happen.
Like not once have I run into a
situation where a customer should
have been using something other
than pg_dump and they were just
using pg_dump and then they got
into trouble.
What happens to me rather frequently
is that someone has used
some other kind of backup and things
have gone really badly wrong
for some reason and pg_dump becomes
the way that we can help that
customer to get out from under
that problem.
So just as your experience with
the customers that you've worked
with is informing the way that
you view the issue.
I have a different set of experiences,
a very different set of
experiences from what it sounds
like.
And so this thing that to you feels
like, ah, this is the catastrophe.
We've got to steer clear of this.
In my experience, that's never
the problem.
It's always the thing we reach
for to get out from under the
problem.
And I really just want to highlight
that because I'm not saying
your experience isn't valid and
I hope you'll return the same
courtesy.
Gülçin: I actually partially understand
what Nikolay is trying
to say here, because before
EDB, before working with Robert,
I was working for 2ndQuadrant
and we were building our own
backup solution, Barman.
Nikolay: And
now EDB owns it.
Gülçin: And then I know, because I was
actually doing remote DBA work
and there were a lot of customers
with backup issues.
They had their own home-cooked scripts.
In the wrong hands, this can go
wrong, because there are some
things about pg_dump and restore
that you have to know.
How do you do the dump process?
How do you do restore?
Do you actually test these things?
Do you copy the whole directory
or do you consider it as just
some logs that we can actually
delete at some point and so on?
So if people don't know how to
put these things together
in a way that works,
then things
can go wrong.
And I've seen that things actually
went wrong.
That's why we were steering people,
you know, if you just do
regular backups and restores, use
this tool that we have or any
other tool that can be used for
backups and you can keep the
retention period, you can keep
your backups for X days, you can
restore them and test and you can
have continuous backups that
edit.
So it's not like partial, you know,
it can be just like a continuous
thing that you don't need to worry
and you can do point in time
recovery and so on and so on.
So I understand this rhetoric and
I was an advocate of it, but
then I also feel like it went too
far, saying, you know, this
is not usable,
and that I oppose, basically.
Nikolay: Yeah.
It's like pendulum.
I agree.
Yeah.
The start of this pendulum is these
20 years of documentation.
So you raised a very good point
about restore.
When I hear backup, full-fledged
backup, it's not only physical
to me, it's also verified.
And if we have physical backup
which we test, that's great.
While with dumps, I'm very curious,
Robert, didn't you
see an inability of pg_dump to read
some, I don't know, some database
which is corrupted, so we cannot
get a dump out of it?
But there's a second question here.
Okay.
Robert: That actually happens all
the time.
And one of the things that I often
end up helping people do is
fixing the database enough that
we can use pg_dump to get the
data out of it.
Because if the database has incurred
a lot of damage at a physical
level for some reason, we're never
going to be able to repair
that well enough to give confidence
that everything is the way
that it should be.
So a dump and restore in my professional
opinion is absolutely
essential in that situation to
get back to a clean state.
Now you are 100% correct that the
dump may also fail or the restore
may also fail, but those are problems
that we can understand
and fix.
We can look and say, ah, well you
have a pg_class entry, but
you're missing a pg_index entry,
so we need to create the one or
delete the other.
That's a problem where we can say,
ah-ha, that's something that
we as Postgres experts can look
into and understand what needs
to be done to bring this back to
a state where pg_dump is going
to run.
But the blocks being messed up
at a physical level, or out of
sync with each other because we've
had some time travel of some
kind or something like that, those
are problems we won't ever be able
to get out from under.
Does that make sense?
Nikolay: Yeah, it makes total sense.
And moreover, it's a very popular
approach to use pg_dump to test
physical backups to see that we
can read all except indexes.
For indexes, we use amcheck,
but to test physical backups,
we use pg_dump to /dev/null, for example,
just to see that there
is no corruption, that we can read
it for sure.
And the second, like you mentioned
restore.
I remember a couple of times I
saw a dump could not be restored
because of a unique key violation,
right?
Because of corruption of uniqueness
constraint.
Because some duplicates happened
and unique key didn't save us
due to some bugs or something.
Maybe somebody disabled something,
I don't know.
Or foreign keys, foreign keys as
well.
If you disable triggers, you can
corrupt your data easily, right?
You disable triggers, you load
something, you enable triggers,
and Postgres won't check it.
And then the dump
you can have, but you cannot restore
from it.
Right?
So yeah, we see some mutual points
definitely here.
And the question is just about
language I guess.
That's it.
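The check described above, dumping a restored physical backup to /dev/null to prove every table is readable and using amcheck for the indexes, could be scripted roughly as follows. This is a sketch under assumptions: the function names, the port, and the database name are hypothetical, the restored copy is assumed to be already running on that port, and `pg_amcheck` is the command-line front end to amcheck that ships with PostgreSQL 14 and later.

```python
import subprocess
from typing import List


def verification_commands(dbname: str, port: int = 5433) -> List[List[str]]:
    """Commands that read every table (pg_dump) and check relations (pg_amcheck)."""
    return [
        # Reads all table data; the output itself is thrown away.
        ["pg_dump", f"--port={port}", "--file=/dev/null", dbname],
        # pg_dump does not read indexes, so they are checked separately.
        ["pg_amcheck", f"--port={port}", f"--database={dbname}"],
    ]


def verify_restored_copy(dbname: str, port: int = 5433) -> bool:
    """Run each check in turn; return False as soon as one exits non-zero."""
    for cmd in verification_commands(dbname, port):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

A clean run only shows that the data can be read. As the unique-key and foreign-key examples above illustrate, a dump that reads fine can still fail on restore, so an actual test restore remains a separate step.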
Michael: Well, I think it's also
about experience. Nikolay, you
mentioned some disasters. Am I
right in understanding this
is folks who have come to you with
some issue and
it's not just that they're using
pg_dump as a backup tool, it's
their only form of backup?
And what kind of issues is that
causing?
Nikolay: Remember the first managed
service, managed Postgres
service created, popular at least.
It was called Heroku.
I think it still exists, but not
being actively developed these
days.
And they offer backups as dumps.
You can download them.
That's great actually.
If a managed service, Postgres service
provider allows you to download
backups, that's great.
But it was just backups.
And nobody does this.
I mean, none of the very popular
managed Postgres providers
does this.
They rely on physical backups these
days, right?
And also on snapshots and so on.
I mean, cloud snapshots, full disk
snapshots.
And this also shows the evolution of
the backup concept in many people's
minds, not only ours.
So I think it would be great just
to agree on the language and
discuss.
I'm okay to be alone thinking that
backup is just physical backup.
Backup could include both logical
and physical, and we could
clarify documentation and language
articles and so on.
And I see it's a pendulum, right?
Again, this is my point.
For too long, the documentation was claiming
this is a backup tool.
This language was super harsh.
And I remember I was trying to
explain at least a couple of times
in my life, I was trying to explain
to some customers with growing
Postgres databases, exceeding terabytes
and approaching 10 terabytes.
I'm saying, don't rely on pg_dump
as a backup solution, and they
just showed me the documentation, saying,
this is what
they say.
The vendor is saying this, right?
Robert: Yeah.
I mean, I think that there is a,
maybe a difference between something
that creates a backup and a backup
tool.
I mean, this does get down a little
bit to what you think words
mean, so it almost seems like a
silly thing to argue about, right?
But I think, you know, you asked
Gülçin at the beginning, like,
if I take a snapshot of all of
my data on a cell phone, is that
a backup?
And I think the answer is obviously
yes, but equally obviously,
that's a silly way to do a backup
because your restore procedure
is going to be very unpleasant,
which is not what you want.
I think sometimes when people talk
about a backup or a tool that
can take a backup or a backup tool,
sometimes they mean like,
can I get a copy of my data from
which I could recover?
Right, and that's 1 question.
And pg_dump will give you that,
right?
The other question, sometimes what
people mean, and they may have
some particular commercial product
in mind that offers a certain feature
set, their question
is, am I going to get this feature
set where, for example, my retention
times will be managed and my
actual process of orchestrating
the backup and orchestrating the
recovery will be managed?
And then the answer is no, pg_dump
is not going to do that for
you.
And you probably do want those
things in most cases.
So I don't know, like, I think
there's a lot of nuance that's
possible in the language here.
But for me, the important thing
is to make sure that we're clearly
able to explain what the benefits
and drawbacks of the different
approaches are rather than, you
know, spending too much time
fighting about the specific language,
which for me, it gets a
little bit silly.
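To make the "retention times will be managed" point concrete: pg_dump writes a dump and nothing more, so any script built around it has to decide for itself which old dumps to drop. A minimal sketch of such a rule, assuming a hypothetical helper `dumps_to_delete` and made-up file names, might look like this:

```python
from datetime import datetime, timedelta
from typing import Dict, List


def dumps_to_delete(
    dumps: Dict[str, datetime], now: datetime, retention_days: int
) -> List[str]:
    """Return names of dumps older than the retention window, oldest first.

    The newest dump is always kept, even if it is older than the window,
    so that pruning never removes the last available copy.
    """
    if not dumps:
        return []
    cutoff = now - timedelta(days=retention_days)
    newest = max(dumps, key=dumps.get)
    expired = [
        name for name, taken in dumps.items() if taken < cutoff and name != newest
    ]
    return sorted(expired, key=dumps.get)


# Hypothetical dump files and the time each one was taken.
existing = {
    "shop-2025-02-20.dump": datetime(2025, 2, 20),
    "shop-2025-03-01.dump": datetime(2025, 3, 1),
    "shop-2025-03-08.dump": datetime(2025, 3, 8),
}
print(dumps_to_delete(existing, now=datetime(2025, 3, 10), retention_days=7))
```

Dedicated backup tools handle this kind of bookkeeping, along with orchestrating the backup and the recovery, which is the part of the feature set pg_dump alone does not provide.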
Nikolay: I agree.
Yeah.
Michael: I agree as well, Robert.
In your blog post, you make a really
good case for the tone of
the statement being difficult,
and I think you actually use some
language that waters
it down a little bit, or explains
a little bit more. It doesn't take
many extra words to do
so. But I also wanted to ask, do
you see this problem in other
statements in the Postgres community?
Are there other things
people are saying that remind you
of the tone of this kind of
statement as well?
Robert: I don't have specific examples
in mind off the top of
my head, but definitely yes.
I mean, it's a chronic problem
on Hackers.
You know, I think I wrote a blog
post about the sort of tone
of dialogue in the Postgres community
towards the end of last
year.
And it's always a problem because
when you post your patch on
Postgres Hackers, you're essentially
soliciting review.
And people are rarely going to
write you a review where they're
like, you know what, this patch
is amazing and I love it.
I mean, it happens.
People actually do get those kinds
of reviews, and it's a great
day when you do.
But generally, when you're reviewing
a patch, you're picking
something that you actually like
and would like to see go forward.
And then you're saying the worst
things about it that you can
think of to say.
You're like, so here's all the
problems.
Here's all of the stuff that I
think needs to be better in order
for this to become part of the
product, which I hope it will,
but these things are the things
that I think need to be fixed
first.
And so what I see is that actually
for a lot of committers, in
particular, people's mental health
is not in a great place.
You know, I kinda thought my mental
health was not in a great
place around some of this stuff,
and then I talked to some other
people and found that they were
feeling worse about it than I
was feeling by like significant
margins.
And it's, in my opinion, it's rarely
because of bad intent.
I mean, obviously people get frustrated.
People say things that they shouldn't
have said or they don't
say it in the right way or they're
pissed off.
I mean, those things happen and
I don't wanna pretend like they
don't.
But I think very, very often it's
a case of the nature of the
workflow and the nature of the
process and the kind of engineering
that we're doing.
It's difficult and it's error-prone,
and even the absolute smartest
people in the community make all
kinds of mistakes, you know,
over and over again, right?
Like we were doing a rewrap of
a scheduled minor release that
happened last week.
We're doing that this week because
somebody committed a fix for
a bug and the fix contained another
bug.
And it doesn't matter who made
the mistake or who didn't catch
the mistake, that's not relevant.
It happens all the time.
And I think it's really challenging
to people because we work
in a very open environment where
everybody sees every email we
write, every patch we commit, every
patch we thought about committing.
You know, it's out there constantly
and you just realize that
there are so many ways for you
to screw up and every time you
make a mistake, everybody sees
it.
So I think it's a struggle for
everybody.
As far as I can tell, every single
person who works on Hackers
encounters this problem of getting
the tone right all the time.
And I am certainly not going to
sit here and pretend like I get
it right more often than average.
I think a lot of people would say
I am below average in that
way, but I can tell you I'm very
aware of the problem and I am
trying to figure out how to do
it better because at the end of
the day, it's not enough for us
to deliver great software.
We need to deliver great software
while also creating a community
that people want to participate
in.
And that applies for me, first
of all, to the developer community
because that's where I spend most
of my time, but I think it also
applies much more broadly
to the user community.
And I think that is part of the
reason this issue set me off
a little bit, because, you know,
it's the sort of thing that
I'm struggling, often in vain,
to do right on a daily basis.
But instead of being targeted at
other developers, who at least
kind of know that the negative
feedback is coming,
some of this felt to me like it
was targeted toward users who
don't realize that they're
about to get jumped on for,
you know, wading into a flame war
about whether something is or
isn't something, you know. And I
just don't want
anybody
to have that experience. I certainly
don't want users to have that experience.
Michael: I personally think that,
only from having you articulate
that, I've thought of 1 that
annoys me a little bit, and
that's the correction of people
pronouncing or spelling Postgres
wrongly, or missing the S off. It sometimes
happens when people are
new to the community, and immediately
they get jumped on.
I think, oh, come on, they're clearly
new.
So yeah, I can definitely see that.
Robert: It also happens a lot with
people based on their language
of origin.
Like the fact that we pronounce
it PostgreSQL, I believe that's
at least 1 of the canonical pronunciations,
that is much more
natural for somebody who learned
to speak English in the United
States than it is for somebody
who learned to speak English in,
for example, India, right?
Like it is English, but the way
that English is spoken in India,
it's a distinct dialect.
It has its own ways that people
say things, ways that people
communicate, characteristic patterns
of speech.
And that's not the only place,
certainly.
I think actually there are probably
other countries where
the problem is even more acute,
because English isn't even used
as a common language of communication
in many parts of the world.
But even when it is, it's not necessarily
the same as your English,
and people aren't necessarily going
to be, you know, starting from
the same point, right?
If I read a word that is unfamiliar
and my wife reads the same
word, we're likely to pronounce
it the same way in most cases.
But if a colleague from halfway
around the world reads the same
word, their instinct may not be
the same as mine.
And that's not necessarily a question
of me being right and them
being wrong.
That's a question of us having gone
to different schools.
We were taught different things.
Gülçin: Yeah, I think it also points
out to the wider problem
in many communities, like the longevity
of the projects will
depend on people.
And if you are hostile to people
or like, because we all come
from different parts of the world,
I didn't learn English until
I was like, you know, an older
kid.
And that is always a problem when
I give a talk or when I write
an email.
It is still in the back of my mind
that I try to correct myself,
I use multiple tools, I try to
present myself as good as I can,
but there are limits.
I still confuse the prepositions.
I use in and at all around,
randomly.
I could never fix this.
And that doesn't mean that I can't
contribute to the project,
and I could and I do.
And that's what I believe about
these little statements. Maybe
we took it to a philosophical approach,
though. It's not about
pg_dump, backup or not, or,
you know, saying Postgres,
but we should do better in how
we handle communication, because
this is the way that people interact
today and report issues.
And if you don't accept the problems,
well, people will not report
them, or they will not actually use
this and report back what they
use, so you don't actually
get the feedback from people.
Because you cut these channels
that people actually try to
use to communicate with you, instead of
opening all these channels.
We should actually amplify, we
should have more channels for
people to bring how they
interact with Postgres or the ecosystem
in general.
So that's where I was really impressed
by Robert's blog about
how open he was about this.
And I appreciate the efforts that
are going on towards this, because
when I started, I also felt scared,
almost reading some of the
emails.
I was like, I wouldn't want this
reaction to come to me, for
example.
So it shouldn't be like that.
Robert: And I think it's not just
an issue of dialect either.
You know, like that is definitely
part of it.
But 1 thing that I've noticed on
Hackers is that clarity and
extreme precision of expression
is very, very highly valued,
right?
Like someone can come along with
a worse idea and because they
explain it extremely clearly and
precisely either it gets accepted
or they get feedback on how it
should be changed or positive
comments.
Welcome to the community.
Hey, great to have you, right?
Somebody else writes a worse email
about a better idea, and it
actually gets a worse response.
And I do understand some of why
that happens, right?
We value people whose style of
expression is similar to our own,
where we feel like we can freely
and easily communicate with
those people, and everybody's busy,
so you don't wanna spend
a huge amount of time trying to
understand email A if you could
very quickly and easily understand
email B. But it's obviously
super off-putting to people when
you may have proposed something
that was actually great. And if
somebody had given you 5 minutes
of their time, they could have
understood exactly what you were
trying to say, but they just flip
through the email really fast.
And then they moved on because
they're busy.
And that's obviously going to be
demoralizing to people.
Michael: To play devil's advocate
a little bit, I personally
err on the side of being polite
and trying to be kind and trying
to be welcoming, but I also think
sometimes that approach doesn't
always land, people don't always
take the lesson from it or learn
from the statement or realise that
maybe what I'm really trying
to say or I'm not being clear enough,
that kind of thing.
And I do think, for example, with
the comment that we started
with, I feel like there's a certain
amount of trying to save
people from themselves or trying
to shock people, deliberately
trying to be provocative in order
to make people think, oh we
shouldn't only be relying on this
tool for this purpose or you
know we maybe I should be rethinking
my thoughts, you know that
it doesn't apply to all of these
cases, like mispronouncing the
project name. But I've seen this
specific comment come mostly
from consultants, some experienced
consultants, some who are
very kind and also involved in
like diversity initiatives.
I've definitely seen this from
people that you wouldn't necessarily
expect to be direct and unkind,
so, the exact phrase "pg_dump is
not a backup tool".
So I think that's coming from a
place of having seen people shoot
themselves in the foot and wanting
to save people from that and
wanting to be quite direct to avoid
it.
So I don't know for sure, but I
believe their intent is good,
but maybe they're deliberately
choosing to be provocative or
direct or I'm not sure.
I'm not sure.
Maybe I'm putting words in their
mouths basically.
Gülçin: I think it's like we are
not calling out people for just
saying, you know, this is not a
backup tool, because we understand
where they come from, because we
are in the same industry, working
for ages.
We know these people, we all had
the customer stories and so
on.
But I think the general idea from
here is that when somebody shares
a blog post, let's say we all wrote
about it.
I wrote 2 blog posts and Robert
wrote 2 more.
And we just got together and are talking
about it.
Let's say he's pointing out why
pg_dump is good at dependency
management, let's say.
We take it for granted, which I
wanted to bring up in today's
call to just actually showcase
that there are things we should
appreciate in this tool, why he
says it is an amazing tool.
Then towards this, somebody writes
like, but it is not a backup
tool.
Then I don't get it, because it's
not what the discussion is
about.
We are trying to discuss that there
are ways you can make this
tool in your tool set.
It's not the only tool.
There are professional solutions
for backing up your database
against disaster recovery, as we
mentioned, the retention and
the whole orchestration of the
database backups and recovery.
But when we are discussing this
tool specifically, which I feel
that is important here because
there's nuance to be discussed,
and just shutting down the discussion
saying, but it's not a
backup tool, this is where I feel
that this needs to be improved
better because then you don't really
contribute to this because
you need to say then why it is
not in this case, why don't you
agree with this?
Let's say, is it dependency management
thing is not for you or
why it could be improved?
You could say that pg_dump could
be improved because let's say
we could run vacuum after it, or
we can do, I don't know, like
do statistics better or something.
I mean, to contribute where a pg_dump
might have been improved,
because I've seen people like in
the discussions that they struggle
with mapping, let's say, pg_dump
options to pg_restore options
because they assume the order will
be the same and they don't
get it and so on.
So there are things maybe we could
get input from why people
complain about these things and
to improve.
That's where I go for issues.
I see these comments in the forums
and like, oh, OK, this is
a good idea.
Maybe I can actually talk about
this.
But then when we are discussing
this and coming with like, okay,
this is not a backup tool, it kind
of brings us back to 0 and
doesn't really improve anything.
Nikolay: But your second article
was basically agreeing that
it's not a backup tool.
Gülçin: No, in the sense that people
say, as I'm saying, as a
solution, if you want to orchestrate
your backups, use a, I don't
know, a tool that is like, you
know, Barman, pgBackRest or something.
But then another discussion we
have, why it can't be?
Why pg_dump?
We are discussing the, because
in the second blog post of Robert,
for example, he gives up like this,
you know, why it could be
a nice tool for these of the use
cases that he lists.
And they're getting the question
of again, that I don't agree
basically, like, okay, use a better
maybe solution if you are
managing production databases in
multiple environments that are
giant databases and you really
don't need to deal with, you know,
homegrown.
But still, historically, it
is still a tool that we use,
you know, it could be used for
different cases.
Nikolay: What, what I hear is you're
saying when people come
to you to comment to your first
blog post saying, I think it
was Franck Pachot and I'm joining
him, still joining.
And he said, pg_dump is not
a backup tool.
You think it's like shuts down
some discussion and so on.
But I just explained that this
is a pain from a lot of experience
and we are just reacting and what
I hear you still try to judge
him, right?
Let's just...
Gülçin: No, no, that's definitely
not for it.
I'm just saying we discussed that,
but the second blog post was
about, you know, there's backup
tools, you should use it.
But then when Robert was describing
a part of why pg_dump is
good, in my opinion, it was like
very valuable points.
And there it was not even relevant.
We were not even discussing, should
you use this tool or not?
And I'm not targeting anybody.
I'm not targeting anybody.
So be clear about it.
Nikolay: Yeah.
So the change happened only now.
It's in Postgres 18.
And recently I had discussion this
like claiming, oh, it's not
backup tool.
Somebody said, oh, what is this
about then?
And sending me a link to pg_dump
documentation.
So I think I would not judge people
who are saying pg_dump is
not a backup tool until we have
this change in documentation
and start recovering from this
stress we had 20 years.
This is my point.
I stay on this point very strong.
And common ground is let's start
distinguishing physical and
logical backups.
We can clarify this on documentation
as Oracle and MySQL did.
And there is already part of documentation
speaking of backups,
it describes dumps, I mean, pg_dump
and then file system snapshots
and then point-in-time recovery,
full-fledged backups.
And just if we clarify documentation
and I will stop seeing customers
sending me this link saying you're
wrong, this is documentation
saying you are wrong.
Robert: But like, I think, you
know, I don't know, like, if you
can't win an argument against a
documentation link, I don't know,
it feels like something's not right
there, you know, like, I'm
not trying to be harsh.
And I just feel like, you know,
if somebody hires you to give
them good advice, and you give
them advice that is actually good,
and their response is...
Nikolay: Robert, let me interrupt
you.
Sorry.
I'm just like, I feel judgment
in you and Gülçin's words.
Like, you tell me now how you want
to be welcoming, and now you
judge me like I cannot win.
I cannot win 2 things.
pg_dump is a backup tool.
Sometimes I cannot.
They say they trust documentation
more because many more minds
behind it.
And also statement_timeout, the
documentation says you cannot
set it to positive value, keep
it 0 globally because globally
it's a bad idea.
I already like some customers I
win, some customers I don't.
I'm not genius, right?
But I feel in both of you, I feel
judgment.
Why don't we stop judging people
and sentiment and so on?
I bring you like improvement.
Let's say there are 2 types of
backups, logical and physical.
And then we, we develop language
from there.
And this joins us.
Right?
When you judge people saying they
came to me with this statement,
or you say, you cannot win your
customer against documentation, this
splits us.
And I start fighting with you.
I don't want to fight with you.
Robert: But I mean, that's also my complaint about the language
that you were using.
So I don't know how to have this discussion without having opinions
about whether certain language is good or bad.
And I don't think, I mean, you can't, right? Like, we have to be
able to talk about what the language does and to what extent
it helps or hurts.
And yeah, of course, there's some judgment there. I don't know,
like, I definitely have been in the situation of having a customer
who wouldn't listen, and the frustration that you feel with that
situation feels very genuine to me. Like, I can totally imagine
that happening and being a bad experience, but I don't know.
I'm not even saying it's a bad thing that we changed the language
in the documentation.
I was only reacting against the sort of conclusory statement,
pg_dump is a backup tool, and now I don't want to talk about it
anymore. I think we should always be talking about it more. I think
we should be trying to, as you say, bring clarity to it and bring
precision to it.
Nikolay: I agree with you totally. Like, I hear you now well, and
I think we will stop saying this actually, if documentation will
be, it's already fixed, I think it can be fixed even better if
we say it's a logical backup tool, for example.
Everyone will be happy, I think, right?
And we will stop saying it's not a backup tool.
We will start saying it's not a physical backup tool, which is
obvious, right?
And this will join everything and so on, right?
I agree with your reaction, actually, which says this statement,
"it's not a backup tool", is like too far from balance, right?
It's off balance. I agree with this. So it's not a good statement,
actually, I admit. But again, it's a reaction to another not-good
statement which we had in documentation, which didn't say logical
backup, it said just backup. Okay.
Michael: We're pretty much out of time. Okay, I wanted to thank
you all for your thoughts on this. I think it is a difficult subject,
and I think actually it's really nice to have 3 people that all
care about educating folks and teaching people how to do things
well, with different opinions on how to do so or, you know, slightly
different approaches on how to do so. But as Nikolay says, as
Gülçin pointed out in her blog post, the language around this
has been changed in the documentation.
Robert, keep fighting the good fight on the tone of things on
Hackers.
Are there any last words anybody else wants to add?
Let's start with Gülçin, did you want to say anything else at the
end?
Gülçin: No, I'm happy that we are discussing it and I don't take
things personally.
I mean, we are here just to discuss technically why this could
be useful in some cases and why not.
And yeah, that was anyway the summary of what I said in the blog
post as well.
So if people would like to read it and comment on it, I'm happy
to discuss more.
Thanks.
Michael: Wonderful.
Well, we're looking forward to your future blog posts, whether
you want to write them or not.
Robert, any last words from you?
Robert: I just think, you know, on Nikolay's comment about making
the documentation better, what I would encourage, and of course
this is much longer than we can actually do in this forum, is,
you know, let's get down beyond the headline, right?
Like saying in the headline that it is a backup tool or that
it's not a backup tool, it's an export, it's a dump, it's a lot.
We got to get beyond that subject line and think about what we
say down deeper.
I think one of the areas where the Postgres documentation is sometimes
weak is it doesn't always do a good job listing pros and cons.
Pros and cons very often don't get listed for things.
So, you know, that's probably an area where we could
grow as a community.
Nikolay: Big time.
Michael: Brilliant.
Well thank you so much everybody and thanks Nikolay, catch you
next week.
Nikolay: Thank you.
Thank you for coming.
Gülçin: Thanks.
Bye-bye.
Robert: Ciao.
Nikolay: Bye.
Gülçin: Ciao.