Postgres Gotchas
Michael: Hello and welcome to PostgresFM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
This is Nikolay, founder of Postgres.AI.
Hello Nikolay.
Nikolay: Hi Michael, how are you?
Michael: I'm good, how are you?
Nikolay: Very good.
Let's talk about some crazy stuff.
Michael: I chose the topic this week and it was a listener suggestion,
it's Postgres gotchas and we've had a short discussion before
this call about what we should even count as gotchas.
So how do you define it, what do you think of when you hear Postgres
gotcha?
Nikolay: Well, if you, for example, run, I don't know, SELECT 5/2, what will you get?
Michael: Is this like data type casting?
Nikolay: Right, right. Well, it's not Postgres-related, I think it's SQL-related, but still, what do you expect to get? A normal person would expect 2.5, right? But you will get... and I just double-checked, because I'm always thinking: am I right? So you're getting 2. So, 5/2 is 2.
Michael: Oh wow, I actually thought it was going to be 3.
Nikolay: We probably have two things to discuss here. First, of course: integer divided by integer, right? The result should be integer, so there's no decimal part, there's no dot. But 2 or 3? Rounding or flooring, right?
Michael: Yeah, because rounding conventions, at least where I grew up, we generally round up from 0.5, but clearly not here. Anyway, I almost like this as the definition: something where you may expect one outcome, but a different outcome happens. That's...
Nikolay: ...unexpected behavior, basically, in other words.
Michael: Yeah, especially unexpected if you haven't read the
documentation.
Nikolay: Sometimes the documentation is also wrong or misleading, or sometimes it's just confusing.
Michael: Yeah fair enough.
I just wanted an excuse to list a few things that are made very clear in the documentation, but I know people get confused by them, because often they don't read it.
Nikolay: You know, I'm curious, I actually found a good example, right? It's probably a SQL gotcha, not a Postgres one, because I think it's defined by the standard. And this is super basic, but I'm wondering why indeed it's 2 and not 3. What I did, of course, was double-check: I checked the floor and round functions, and, as expected... I converted 5 to numeric, or just...
Michael: Cast with the double colon.
Nikolay: You're right, right. So the result is also numeric, and then we have, of course, 2.5000 or so. But then I checked: floor gives me 2, round gives me 3, as expected, because with 2.5, if you're rounding according to... I don't remember the names of these rules, but usually with 5 you go up, with 4 you go down, right? And round gives 3. But I was thinking, okay, why does 5 slash 2, integer and integer, give 2 and not 3?
Maybe because it uses floor. We could check the source code, of course. But the next thing I checked was the final thing I checked: 5 converted to numeric, divided by 2, gives you 2.5. And then you convert back to int, right? Again, double colon int. It gives 3. Amazing.
So how come we have 2 if we don't use explicit type conversion at all? Any ideas? It was a good gotcha to find. I always knew that it's 2, because I had demonstrated it. I just didn't think about why. Now I'm thinking about why. I have questions. Maybe there is some good, simple explanation, but I can easily see that it can be considered a gotcha.
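A quick illustration of the behavior discussed above, runnable in psql. The short explanation: integer division in Postgres truncates toward zero, while casting numeric to int rounds.

```sql
SELECT 5 / 2;                  -- 2: integer / integer truncates toward zero
SELECT floor(5::numeric / 2);  -- 2
SELECT round(5::numeric / 2);  -- 3: numeric rounds halves away from zero
SELECT (5::numeric / 2)::int;  -- 3: the cast to int rounds as well
```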
And let's agree on one more thing: we won't talk about nulls today.
Michael: I actually did have them
on my list, but not to go into
detail.
Nikolay: We had a whole episode, and that whole episode is basically saying: nulls are surrounded by gotchas, both standard-defined, global SQL gotchas and local Postgres-related gotchas as well, for example when you go to arrays, right? Let's just exclude it, because it's a whole massive area.
Michael: I think that's very sensible.
And if you haven't listened to
that episode, I'll link it up
in the show notes.
Nikolay: Consider it to be fully
included here.
Yeah.
Right.
Okay, let's maybe switch to your
ideas.
I know you have a list, pull something
from it.
Michael: Yeah, I've got a few that
I wanted to make sure we mentioned.
I'm going to start with some I consider more beginner-level gotchas, and then maybe we can work our way up a little bit. The first one I had is one I see all the time: in forums, every now and again on Reddit, quite a few times in the Postgres Slack, and on a few mailing lists.
People getting confused or annoyed when they name objects in Postgres and use mixed case, maybe camelCase, some way of naming a table, for example a two-worded name with capital letters for the two words and no space in between, that kind of thing. And then they realize the table is not called that behind the scenes, it's all lowercase. And if they've used an ORM or something to name these, the ORM often puts quotes around those cased names, so then they do become cased in the database. So I think this whole area is one big gotcha for beginners. And the people who set up the schema in the first place, if they do something like this, if they don't just use lowercase object names, can really make life painful for everybody that comes after them, having to double-quote everything.
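A minimal sketch of the case-folding behavior described here; the table name is made up for illustration:

```sql
CREATE TABLE AccountUsers (id int);    -- unquoted: folded to accountusers
SELECT * FROM AccountUsers;            -- works, also folded to accountusers
CREATE TABLE "AccountUsers" (id int);  -- quoted: case preserved, a second table
SELECT * FROM "AccountUsers";          -- from now on, quotes are required
```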
Nikolay: Yeah, especially Java
people like it.
Michael: Oh, really?
Nikolay: Yeah, I noticed. Usually, if we have a new customer and we start discussing their database, we see some camel-cased table names. And I'm already thinking: are you Java guys? Yeah, they are. Usually. Not always, but usually so.
And my previous startup was using camel style. And we just decided to keep it as is, because it was inherited from some... I think it was in MySQL originally. We started very lean, so we borrowed some schema from some open source project. And it had it. And, knowing this very well, I still decided to keep it.
But the problem with double quotes
is not like...
It's not a big deal, but when...
You know what comes with quotes
usually, right?
Backslashes.
Michael: Oh, having...
Yeah, having to escape them.
Nikolay: Yeah, and you cannot escape
from this.
You need to use it to escape.
Right?
I mean, yeah.
And backslashes, if you have automation in many languages, sometimes it's a nightmare. Right now we have an ongoing problem with backslashes and the escaping of quotes and double quotes in our AI system. It's a combination of Python and Bash, actually. And yeah, it drives us crazy sometimes. We solved some problems, but still not all of them. And when AI needs to talk about your table names, and you give it the schema with double quotes, and then you want to experiment in DBLab to find some plans or something, it sometimes breaks just because of problems with backslashes and passing these pieces between different components.
I would prefer keeping everything lowercase, that's it. In Postgres, right? Snake case, right?
Michael: Yeah.
Nikolay: And that's it.
And just forget about it.
Michael: Yeah, plays.
Pay for plays.
Nikolay: I wanted to mention the list of gotchas I saw. Actually, as we discussed right before this recording, I didn't know the word "gotcha" until 2006 or 2007, and this is exactly when this list was published. And I remember we had two lists, one for Postgres and one for MySQL. The MySQL one is outside of scope, right? Again, even more so than nulls. But this list, I'm looking at it right now, and first of all, it has survived... how many years? It's insane, 17 or 18 years. Wow. And I'm looking at it, it's quite a short list, at SQLinfo.de, right? And almost everything is crossed out. Good.
Michael: Yeah, I've never seen this page before. This is really cool. And yeah, maybe more than half of them have been crossed out.
Nikolay: Yeah, "count(*) is very slow". Let's just laugh at it.
Michael: Oh, and we had an episode
on that, right?
Nikolay: I think we should.
"Unicode means UTF-8" is left as well. Well, yeah. And constraint checking is interesting, but maybe let's not dive into it.
And finally, I wanted to highlight what else is left. Only four items. Five, okay, five. But one of them is upgrades, upgrade-related stuff.
And I was discussing this topic on Twitter, on X, over the last few days. I just checked the documentation of three major managed Postgres providers: RDS, Cloud SQL, and Azure Postgres. I saw that all of them say: okay, the upgrade is done, the major upgrade. Now it's time, don't forget, now it's time to run ANALYZE. And they say ANALYZE like that: they give you a snippet, "ANALYZE;", which means it's single-threaded as well. First of all, they don't include it: none of them include it in their automation. But we don't talk about managed Postgres providers today, we talk about Postgres. My question is: why doesn't pg_upgrade include it? It only prints some text, but who reads the text? Many people will say: for those who don't read, it's on their shoulders, it's their problem. But I cannot agree with that.
Michael: I could even make the case that the gotcha here is that statistics are not preserved on a major upgrade. If you asked somebody who didn't know whether to expect the statistics to be wiped or preserved, somebody who maybe doesn't understand how difficult it might be, the catalog-level reasons why they're not preserved, you'd be forgiven for thinking that in the most advanced open source database they might be preserved, even though it's complicated. And I think there has been some work. It should...
Nikolay: ...happen, and everyone agrees on it; it's just not solved.
Michael: Yeah, so I think that's the main gotcha. And then, obviously, the solution at the moment is we need to run ANALYZE.
Nikolay: Yeah, let's unpack it a little bit. So we're talking about statistics, which the planner requires to make the right decisions. If the database is not trivial, like containing 100 rows in each table and that's it... If it's not trivial, we need statistics. Without statistics, the database is slow.
And what happens... I think it happens in a zillion cases. It's hard to estimate, but when I posted this tweet, I received several replies immediately: oh, we got bitten by this badly. One person wrote: we learned this the hard way, obviously. And this is what happens all the time. And recently, during the summer, we also failed to help a customer with it. We discussed major upgrades, but we... From now on, I'm pointing it out: don't forget, don't forget, don't forget. But we discussed upgrades, we discussed some complicated stuff, and then they went to upgrade, and the next Monday they said... They are in Europe, so what was morning for me was already evening for them. And they said: we had a nasty database incident today, because guess what? We forgot to run ANALYZE.
And this is a huge gotcha, not only for managed services; they could do it, but I don't care too much about them. This is a gotcha of pg_upgrade. I think it's very well understood; there is work in progress for export and import of statistics, quite a long thread on hackers already and a commitfest entry. It's still not done, but there is hope. There is one more thing here. So I think pg_upgrade definitely should have it, and also pg_restore, but let's return to that in a few minutes. So, when someone says you need to recalculate statistics yourself after pg_upgrade... I definitely would expect pg_upgrade to do this, for sure. But okay, it just prints: do it.
pg_upgrade prints vacuumdb, I think, no? Or in stages, like vacuumdb --analyze-in-stages. It has vacuumdb --analyze-only and --analyze-in-stages. Two options. Analyze-only means no vacuum, just ANALYZE. In-stages means first it collects only 1 bucket for each table, for each column actually, right? Then some midway point, and then default_statistics_target, or the individual column-based setting for each column. default_statistics_target by default is 100, 100 buckets. This is what defines how much the Postgres planner will know about the distribution in each column, right? And it can be adjusted globally or for individual columns.
But...
Michael: Almost like a sample size; the sample size of what it samples increases.
Nikolay: Yeah.
And I quickly realized this is not what you want to do in OLTP, because in OLTP you either include ANALYZE inside the maintenance window, in which case there is no sense in running it three times, or you just... Actually, that's the only option. I don't trust opening the gates with weak statistics. It will be unpredictable, an incident again.
Michael: Yeah.
Yeah, risky.
But this feels like a topic that
might be worth a deeper dive.
Nikolay: Right.
Right.
Right.
I just wanted to point out that when people say ANALYZE, it's single-threaded, right? You expect Postgres to have a lot of parallelism implemented, but when you run ANALYZE, it's always single-threaded. This is unexpected as well, right?
Michael: Good one. Yeah, I like that.
Nikolay: It's kind of a gotcha inside a gotcha. We have a matryoshka-style gotcha here, right? Russian doll style.
Michael: What do you recommend doing? Like, just kicking off a bunch of queries, ANALYZE table, per table?
Nikolay: I recommend parallelization. I recommend vacuumdb.
Michael: Nice, okay.
Nikolay: And speed it up, go full speed inside a maintenance window. Again, it's a different story. But this vacuumdb, it has --jobs, or -j. You can define it. And usually we take as many jobs as we have cores on the server and go full speed inside the maintenance window. ANALYZE should be quite fast if you don't do vacuum.
There is also a bit of a gotcha in the vacuumdb options, because vacuumdb --analyze will also do vacuum. So there's...
Michael: There's also analyze-only, or...
Nikolay: There is --analyze-only, yeah, but it's easy to overlook.
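The vacuumdb variants being discussed, as shell commands; the --all scope and job count are illustrative and should match your server:

```bash
vacuumdb --all --analyze-in-stages   # ANALYZE 3 times: statistics target 1, 10, then full
vacuumdb --all --analyze-only -j 16  # one full-target ANALYZE pass, 16 parallel jobs
vacuumdb --all --analyze -j 16       # careful: this one runs VACUUM ANALYZE
```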
There's one more gotcha here, inside all this. When people go with manual single-threaded ANALYZE, or maybe multi-threaded, but explicit ANALYZE. It was interesting: someone said ANALYZE can be super expensive. I agree, but even if default_statistics_target is 1000, it's not that expensive. But back to this: can we throttle it? For example, you want it to go very slow, because autovacuum has... actually, doesn't it have...
Michael: vacuum_cost_delay also?
Nikolay: Yes, there are vacuum_cost_delay and vacuum_cost_limit. By default it's off, because... not the cost limit, the cost delay is 0, meaning no throttling at all. But autovacuum has a mirrored pair of these settings, and it's throttled. Long story short, autovacuum is usually throttled. And we actually usually fight a little bit with our customers to increase this, to make it even more aggressive. And managed providers have already done part of that job, which is good. But if we run it manually, it's not throttled. It will go full speed. I doubt you will need to throttle it, but some people want to: okay, let's throttle it. Question: how? We have vacuum_cost_limit and vacuum_cost_delay. We can set vacuum_cost_delay from the default 0 to some value. But is it about ANALYZE, or is it only about vacuum? Based on just the naming, it's not about ANALYZE. It's about vacuum only.
Michael: Right, yeah, that would be my guess. But by the line of your questioning, I'm guessing it does affect ANALYZE.
Nikolay: Right.
I go to the documentation and I see vacuum_cost_limit and vacuum_cost_delay, I think, and I see the description. It talks only about vacuum, it doesn't talk about ANALYZE. So I conclude: it's only about vacuum. But some part of my very long-term memory tells me: is it really so? And then Sadeq Dousti, who helped me a lot, and is still helping sometimes when I write how-tos, reviewing them, I'm very thankful for that. I already have 94 how-tos. So I wrote a how-to on ANALYZE, and right now I need to add this: vacuum_cost_limit and vacuum_cost_delay affect ANALYZE. If you scroll up to the beginning of the section, you will see that it talks about both vacuum and ANALYZE. And also, using AI, I checked the source code and quickly found that there is a function, ExecVacuum, which handles both VACUUM and ANALYZE. And this confusion comes from the source code, propagates to the documentation, and then to our heads, right? This is a gotcha as well. So you can use vacuum_cost_limit and vacuum_cost_delay to throttle ANALYZE, although I usually don't do it, because I think we need to go full speed.
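A sketch of the throttling just described; the table name is hypothetical:

```sql
SET vacuum_cost_delay = '2ms';  -- default 0 disables cost-based throttling
SET vacuum_cost_limit = 200;    -- the default; sleep for the delay each time
                                -- this many cost units of page access accrue
ANALYZE some_big_table;         -- despite the setting names, this ANALYZE
                                -- now runs throttled
```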
Michael: And in the one example we're talking about here, we want to go faster, not slower.
Nikolay: Right, right.
But it's interesting, right?
So there is some small confusion.
I think there are many such confusions.
But they are slowly being fixed.
Michael: A great example of one where the documentation wasn't helpful in avoiding it. So yeah, nice.
I've got a few more beginner-friendly ones that maybe we could rattle through. I'm not even sure you'd consider them all gotchas, but one relates to time zones. I think that not only catches... Well, you don't think it's a gotcha?
Nikolay: I think we should exclude this one the same way, using the same approach as with nulls, because it's a huge bag of... yeah.
Michael: Cool.
And then another one that I see catch out... well, a couple that I see catch out beginners. Again, maybe this is another huge topic, but: ACCESS EXCLUSIVE locks, like just basically DDL blocking SELECTs, blocking inserts, blocking everything, all sorts. I think people don't realize, because if they're working on small databases at first, they start to grow in confidence about how to do things. And then it only catches them out once the startup has got really successful and the table is much bigger, that kind of thing.
Nikolay: Yeah, I think there are some memes with the Gauss distribution, or how is it called, the Gaussian distribution, the bell curve, right? On the left, the beginner: schema changes are painful, super painful, they require expertise. Then in the middle, you can imagine: oh, Postgres has transactional DDL, everything is fine, we just need to wrap everything into one transaction and rely on Postgres's great transactional DDL capabilities.
Michael: Yeah, or just add "concurrently" and think...
Nikolay: Yeah, MVCC works. Writes don't block SELECTs, SELECTs are always working. You can have a lot of exciting stuff in the middle. And then, on the right, you say: Postgres does have transactional DDL, but you cannot use it.
Michael: Well, the one on the right has to be the same as the one on the left, doesn't it? Schema changes are painful.
Nikolay: Yes, they are painful. Transactional DDL cannot be used in many, many cases, and you can shoot yourself in the foot very well, many times. So this is it, I think. And there are a lot of gotchas there.
Michael: Another one that's not necessarily... I'm not sure if this counts as a gotcha, I'd be interested in your opinion, but I have seen it catch people out, and I guess it's a little bit of a footgun. You see the memes about this as well, actually: DELETE or UPDATE without a WHERE clause, and then just suddenly seeing, you know, the whole table updated instead of "5 rows updated". But you think it's not Postgres. Why do you say it's not Postgres, out of interest?
Nikolay: Because the SQL standard defines this, and any relational database which follows the SQL standard to some extent has it.
Michael: Yeah. But I've seen clients catch these. But the ones that ship... well, psql doesn't. So I don't think it's unreasonable that Postgres does it this way, but I've definitely seen it catch people out.
Nikolay: I agree. Blame the SQL standard. And there is an extension to forbid it. Also, it was maybe one of the very first hacking sessions with Andrei: we implemented some setting, a GUC setting, in Postgres, which would allow administrators to... I don't remember if it was an error or a warning, to completely forbid it or just warn. I think it should be an error. But it was not accepted, of course, because I think it had been discussed many times.
I mean, it's possible. And right now, as I see it, I think PostgREST does it, right? PostgREST...
Michael: Oh, does it?
Nikolay: So usually, right now, it's solved... if there is some middleware, it's solved there.
So if you develop an API using PostgREST, for example... maybe I'm mistaken, but I think there is a way. If there is no way, it's a good idea to implement, actually.
Michael: Yeah, it would be, yeah.
Nikolay: Yeah, so if in the URL of your HTTP request to the RESTful API there is a command to run DELETE without any filters, I think it will not do it, it will not go to Postgres with it. But inside the database, I don't think it's going to change, right? And we tried, actually, so...
Michael: Yes... wait, "we tried", what do you mean?
Nikolay: We had a hacking session, yeah. We had a patch and submitted it, but there was only a small discussion, and chances are close to 0 that it will be successful.
Michael: Patch in psql or somewhere
else?
Nikolay: No, not in psql. We wanted to have a Postgres setting, a GUC setting, to just allow administrators to forbid unspecified... I don't remember how we called it, maybe "unspecified" DELETE and UPDATE, when you don't have a WHERE clause at all.
Just like...
Michael: Cool.
Nikolay: ...don't do it. There's TRUNCATE if you want to wipe a whole table. An UPDATE of the whole table, I don't know. But again, right now it's usually solved in middleware. You can have triggers or something to protect yourself if you want. Or this extension, if you're on self-managed Postgres. There is some extension for it.
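For what it's worth, an explicit transaction is the classic seatbelt here; "orders" is a hypothetical table:

```sql
BEGIN;
DELETE FROM orders;            -- oops, the WHERE clause was forgotten
SELECT count(*) FROM orders;   -- 0 rows left: not what we wanted
ROLLBACK;                      -- the explicit transaction saves us
```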
I wanted to mention one thing, stepping back to DDL. My favorite example: do you think that just adding a column cannot take your traffic down, your database down? It can, if it cannot acquire the lock. It starts blocking all queries to this table, even SELECTs. Like: I cannot acquire the lock, so everyone else has to wait. Even SELECTs, which come with an ACCESS SHARE lock: don't touch this table. That's it. And that's not good. And this is super unexpected, and a super big surprise.
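A common mitigation, sketched with a hypothetical table: cap how long the DDL may sit in the lock queue, and retry on failure:

```sql
SET lock_timeout = '2s';                  -- give up instead of queueing forever
ALTER TABLE orders ADD COLUMN note text;  -- needs ACCESS EXCLUSIVE; if it cannot
                                          -- get the lock within 2s, it fails fast
                                          -- instead of blocking even SELECTs behind it
-- on failure: wait, then retry during a quieter period
```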
Michael: Yeah, especially because you could even test that. You could even make a fork of production, or clone it, and run it on its own in isolation, where there is no lock, and it runs super fast. So you think this is safe. But even though it was super fast, it does require that heavy lock, and that blocks everything else. Definitely a gotcha. That's a great...
Nikolay: Yeah, yeah. And there are many others in the area of DDL, but this one I find affects every single project which has had at least some growth, like up to 1,000 TPS, where it's already noticeable. It's not noticeable in tiny projects.
Michael: Yeah, which I think is kind of part of the problem, right? Or at least makes it worse. Because anybody that gains experience on smaller projects, as they grow, gets some confidence that these things work, that they're fine, and then it changes at some point.
Nikolay: Exactly.
Michael: So, you've got a great article on this, actually, I think, that I can share. It's all about DDL changes, I think.
Nikolay: Yeah, this one is definitely included. I think this one has a separate blog post... I have a few about DDL. Okay, what else?
Michael: Well, in that area of making big changes, partly by mistake, you know, the DELETE or UPDATE without WHERE: do you think people would expect to be able to undo something? I'm not thinking just of people with experience of relational databases, but the fact that, if you do something like that by mistake, you have to go to your backups to get that data back. If I was in my early twenties, or in my teens: all other software I use has the ability to undo something if I do it by mistake. So it does feel slightly unexpected, potentially, if I've grown up in that world, that I can make one mistake and not be able to undo it.
Nikolay: And in what database can you? I mean, of course, there are, like...
Michael: Yeah, I don't know.
Nikolay: Time travel, or... If branching is supported, you can... But it's still not super simple, right? Well, yeah, I don't know. I mean, there are several projects which try to solve this problem; specific databases supporting branching, for example, or even others. Have you heard about this new company where Michael Stonebraker is participating?
Michael: I've
heard of it, I don't know much about it.
Nikolay: DBOps?
Or how is it called?
I don't remember.
Not DBOps, DBOS.
Michael: Yes.
Nikolay: So I wonder if this feature will be inside it. I don't know. For me, it's not expected. I cannot call this a gotcha, because it's a hard problem, and we never got used to having it, because it's rare. But if we go into young people's minds, there are many gotchas for them, right? Oh, this is not working as I expected. So naming is hard precisely because it should avoid gotchas. If you choose some name, you should think about other people and try to be very predictable. This is the art of naming. And we just discussed vacuum_cost_limit affecting ANALYZE. And in young minds the world of gotchas is much bigger, because they don't yet understand the logic, the common practice, in many places, right? And that's why there are many more unexpected situations.
Michael: Yeah, fair.
Nikolay: Yeah. By the way, if you run pg_dump on a very large database, it dumps one table after another, and then we deal with a snapshot, right? With a snapshot: it's repeatable read and a synchronized snapshot. Even if you use multiple workers for pg_dump, they are synchronized, reading from the same snapshot. But what if some table is dropped? Will it be present in the dump? I'm not sure, actually. I just think there is a gotcha
Michael: here.
Nikolay: If during the dump there are some schema changes, can we have some inconsistency in backups? Not in backups, in dumps.
By the way, this confusion was resolved recently, right? We didn't mention it in previous discussions of confusions, but the pg_dump page is not saying it's for backups anymore. So dumps are not backups.
Michael: That's coming.
Nikolay: Yeah, in 18. Okay, okay, okay. We will discuss this separately.
But this is a big confusion: the documentation says pg_dump is a backup tool. It's still so, even in 16. And now all the people who said pg_dump is not a backup tool have every right to say: we were right, because finally, in 18, it's changed. I think this patch won't be reverted, I hope so. But back to our topic: pg_dump can be expected to be a tool for backups just because of the documentation. Then you have a quite painful process of realizing it's very bad for backups. So it's a gotcha which is already fixed in the future, in Postgres 18. And in this same area: pg_restore doesn't run ANALYZE, which hurts every second attempt to restore a big database, even for experienced people like me.
Because, again, it's like with pg_upgrade, we discussed it. pg_restore: you restore a huge dump, you start working with it, but it's slow, and in the plans, where you expect a simple index scan, you see bitmap scans, bitmap index scan, bitmap heap scan, because there has been no vacuum, no ANALYZE. It also should run vacuum, to build the visibility maps. Or...
Michael: It might not be unreasonable to think that maybe backups would include the statistics, and then restore would restore the statistics.
Nikolay: For physical backups, I agree. For logical backups, I disagree. You don't include indexes, which are derived from the data itself. Statistics are also derived from the data. So you include a description, like CREATE INDEX. Maybe there should be some word like ANALYZE in there. I wanted to say CREATE STATISTICS, but that's taken already; there is such a command. And restore could run it.
If we use pg_restore -j 8, for example, 8 parallel jobs, in this case, why are we not recalculating statistics? Right now, everyone who deals with some automation needs this. For example, for DBLab Engine, we need it when provisioning is logical. So it's dump and restore, and we need to move as fast as possible: multiple jobs, multiple workers for dump, multiple workers for restore.
And then, of course, vacuumdb, in this case with --analyze, to have vacuum as well, and also -j. But people forget about this, and forgetting means you expect it will be fine. But if you don't run ANALYZE yourself, ideally using multiple workers, performance will be very bad, and people start thinking: oh, Postgres is slow. Have you heard about this? Like: I restore from a dump and it's slow.
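Roughly the sequence being described; the file name, database name, and job counts are illustrative:

```bash
pg_restore -j 8 -d mydb dump.custom  # parallel restore of a custom-format dump
vacuumdb -d mydb --analyze -j 8      # VACUUM ANALYZE in parallel: statistics
                                     # plus visibility maps; --analyze-only
                                     # would skip the vacuum part
```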
Michael: I have to admit I haven't
heard that complaint, no.
Nikolay: Okay, I'm hearing it, and I've been doing this myself for almost 20 years.
I know about this.
Michael: No, I'm not saying it doesn't exist, I just haven't, you know... Maybe it's just not the category of problem people post about. Maybe what happens when people experience it is they don't post, they just go try something else.
Nikolay: Yeah, it's not a big deal if you restore, and it's working, and you want to deal with it one day later, because autovacuum will probably, likely, do it. But if you expect to run fast queries immediately after restoration from a dump... I'm talking about something which is 100% very well known to everyone who is at least a little bit experienced. But that doesn't make it a non-gotcha; it's still a gotcha that pg_restore doesn't calculate statistics and doesn't run vacuum.
Michael: I mean, it's quite remarkable. I know we could go on with this list for quite a while, and there are varying levels of gotchas, but I like how short and brief the list you sent me is. I really think for many other databases it would be a much, much longer list. I listened to, well, you were describing your experience of a couple of days of trying to use MySQL at the beginning, and recently I was listening to an interview with Tom Lane, and he mentioned checking out MySQL at the beginning and not liking a few things about it. And that feels like a really common experience. And having worked with commercial databases like SQL Server and Oracle, which often move much faster in terms of adding features that enterprise clients really want: there are just so many more gotchas that result from processes like that. So I imagine... I could be wrong.
Nikolay: I think Postgres has lots
of gotchas, lots of them.
Michael: But compared to other
relational databases?
Nikolay: Yeah, it's a complex system,
all complex systems have
a lot of gotchas.
Some of them are just very niche,
very narrow and deep and so
on.
Some of them, nobody knows about.
For example, SELECT INTO. This is what you can write inside PL/pgSQL, right? So you write a function or a trigger, and you write SELECT INTO some variable, blah, blah, blah. And you think: okay, this SELECT, maybe it's a complex SELECT, actually, a subselect or, I don't know, joins and so on. But in the end you have a scalar or something, and you select it into a variable inside PL/pgSQL. And then you think: I want to understand the plan, or I want to just run it and troubleshoot, so you paste it into psql. What will happen, if you put SELECT INTO blah blah blah, and then some regular SELECT... FROM... WHERE... ORDER BY, you know?
Michael: Just an error? No? What happens?
Nikolay: It will create a table.
Michael: Oh, okay. Yeah.
Nikolay: So SELECT INTO, outside of PL/pgSQL, creates a table. This, I think, should just be removed, completely removed, deprecated. SELECT INTO creating a table, it's DDL. This is definitely a gotcha for everyone.
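The two meanings side by side, runnable in psql; the target table names are made up:

```sql
-- Inside PL/pgSQL, SELECT ... INTO assigns to a variable:
DO $$
DECLARE n bigint;
BEGIN
  SELECT count(*) INTO n FROM pg_class;
  RAISE NOTICE 'rows: %', n;
END $$;

-- The "same" statement in plain SQL is DDL and creates a table:
SELECT count(*) INTO scratch FROM pg_class;  -- creates table scratch
-- The modern, explicit equivalent:
CREATE TABLE scratch2 AS SELECT count(*) FROM pg_class;
```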
Michael: Yes, imagine all those parsers that just... I know this is flawed for multiple reasons, but: look at any query that starts with SELECT and say, oh, it's a read query.
Nikolay: Well, yeah. Usually we say SELECT FOR UPDATE is acquiring a heavy lock, blocking all writes. Yes, blocking all writes on these particular rows, not the whole table. But SELECT can create a table. And we have CREATE TABLE AS SELECT already, and this is what all the people who need it use, I think. So SELECT INTO is probably just some very old stuff that could potentially be removed, I think.
Michael: If it's not in the SQL standard...
Nikolay: Oh, it's a good point. If it is, no chances, right?
Michael: Yeah, well, probably not, no. Cool. Anything else you want to make sure we covered?
Nikolay: Maybe it's enough. I think we covered less than 1% of what we could. This is my feeling. What do you think?
Michael: Well, it depends what
you count as gotchas and how niche
you're willing to go.
It turns out the more you know,
the more gotchas you know about,
I guess.
Nikolay: So I could not resist, and I checked the documentation. The SQL standard uses SELECT INTO to represent selecting values into scalar variables of a host program, rather than creating a new table. So it's definitely a Postgresism, and I think it should be removed.
Michael: Okay, cool, good 1.
Nikolay: Good, okay. Good chat, maybe somewhat more entertaining than useful, but I enjoyed it. I enjoyed it.
Michael: Yeah, absolutely.
And feel free to send us any more that you've got, or that you...
Nikolay: Maybe we should think about specific areas, like we did for nulls, and explore some areas from this perspective: what can be unexpected?
Michael: Yeah, for sure. Thanks
so much.
Nikolay.
Catch you next week.
Nikolay: Thank you.
You too.