Health check
Nikolay: Hello everyone, this is
PostgresFM, a podcast about,
guess what, Postgres.
I'm Nikolay and hi Michael.
Michael: Hello Nikolay.
Nikolay: How are you doing today?
Michael: I'm good, thank you.
How are you?
Nikolay: Very good.
So we discussed before the call
that we have a few options for
topics.
And it was my choice, but somehow
you chose, right?
You chose the health check or checkup.
Postgres health checks and how
to check the health of Postgres
clusters periodically, why to do
so, what to look at and so on,
right?
Michael: Well, you said you had
2 ideas; 1 of them was this 1,
and I was like, I like that idea.
And then you explained the other
1.
I was like, I like that idea too.
So I think we've got lots of good
ones left to go.
Nikolay: Yeah.
We need to discuss the internal
policy.
How transparent should we be?
Right.
Michael: Yeah.
Well, and nice reminder to people
that you can submit ideas for
future episodes on our Google doc
or by mentioning us on Twitter
or anything like that.
Nikolay: Right.
Right.
That's good.
I know people go there sometimes,
and I think it's good to remind
everyone of this opportunity,
since it feeds our podcast with
ideas.
So, yeah, good point.
So, health check, where to start?
Goals, maybe, right?
Or...
Michael: Well, I think definitions,
actually.
I think, because I was doing a
little bit of research around
this, It is a slightly loaded term.
When you say health check, it could
be for example software just
checking that something's even
alive, but all the way through
to kind of consultancy services
around, you know, that kind of
thing.
Nikolay: SELECT 1 is working.
That's good.
Michael: Yeah, So what do you mean
when you say health check?
Nikolay: My perception of this
term is quite broad, of course.
Because in my opinion, what we
mean is not just looking at monitoring
and seeing everything is green,
no alerts, nothing like that.
No, no, no.
It's about something that should
be done not every day, but also
not once per 5 years, somewhat
more frequently, right?
Like for example, once per quarter
or twice per year.
And it's very similar to checking
health of humans or complex
technology like cars, right?
Sometimes you need to check.
And it doesn't eliminate the need
to have continuous monitoring.
Like, for example, in your car
you have various checks constantly
running, for example, tire
pressure.
By the way, I think for humans
it's a big disadvantage that we
don't have monitoring.
It's improving, for example, a
modern smartwatch can provide
you a heartbeat rate and so on.
But still, we're monitoring...
When I complain about how poor
some Postgres monitoring systems
are, I remind myself how poor human
body monitoring is right
now still.
And we can monitor technological
systems much better than non-technological.
So yeah, we have some things constantly
being checked, but still
we need sometimes to perform deeper
and wider analysis for different
goals.
First goal is to ensure that everything
currently is fine and
nothing is overlooked.
And second goal is to predict and
prevent some problems in the
nearest future.
And the third goal is things like
how we will live in the next
few years.
It involves capacity planning and
proper planning of growth,
and prediction, and so on.
So these are 3 key goals, very
roughly, right?
What do you think?
Michael: Yeah, I really like it,
and I think the car analogy
is particularly good in that,
I don't think this is true in the
US, but in the UK at least, once
a car is a certain age, like
3 years old or something, it's
required by law to have an annual
health check, we call it an MOT.
But it's not just what's wrong
with the car now that you need
to fix, it's also looking at
things that could go wrong and
trying to get them replaced
before they fail.
So there are certain parts
of the car that, if they haven't
been changed in a certain number
of years, have to be changed.
So I think that covers at least
2 of the goals: looking at what's
wrong now, but also, if we don't
do anything about it, what could
go wrong very soon so we should
get ahead of it, which is not
necessarily something monitoring
is always good at.
Nikolay: Yeah, well, in the US
it's very different.
No pressure at all.
You just change the oil and that's
it.
Everyone is on their own, as I
understand, at least in California.
And actually, I'm going to argue
with myself and with you a little
bit.
Better technology means less need
for health checks.
For example, I own a couple of
Teslas, and I was very concerned
that nobody is replacing anything
like filters and so on and
then I needed to replace
the windshield after some trip
to Nevada. And I asked like guys,
like check everything like brake
pads and so on because of course
I'm worried like it's already
like almost 2 years and nothing and they said everything
is fine we are going to replace
just the cabin filter and that's
it. No need to replace
anything because it's better technology.
So I think, of course, the amount
of work that needs to be done...
Let's go back to Postgres because
it's a PostgresFM podcast, as
I said in the beginning.
So, if we have very old Postgres,
it probably, it perhaps needs
more attention and work to check
and maintain good health.
Michael: I think there are 2 axes.
I think you're right that technology
plays into it, but I think
also simplicity versus complexity
plays a part as well.
And as Postgres gets more complex
or as your Postgres setup gets
more complex, I think actually
maybe there's more you need to
do, whereas on the car analogy,
I think one of the arguments for
things like electric cars is
they're simpler.
They have fewer parts.
They are easier to maintain.
So I think there's also an argument
for keeping your Postgres
setup.
This is one of the arguments for
simplicity in, well, we'll get
to some of the specifics, but on
the technology side, for example,
if we could get bigger than 32-bit
transaction IDs,
Nikolay: we
Michael: wouldn't have to monitor
as often for wraparound, it would
be less of an issue.
If we defaulted to int8 or bigint
primary keys instead of int4,
we wouldn't have as many of those
issues.
So, by certain choices, we can
simplify and prevent issues, but
also the technology.
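As an aside for readers, a minimal sketch of the kind of wraparound check being discussed, assuming direct SQL access to the cluster; the 2 billion denominator and ordering are illustrative, not a recommendation from the hosts:

```sql
-- Rough wraparound check: age() of datfrozenxid relative to the
-- ~2 billion usable 32-bit XID space. Thresholds are illustrative.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2000000000, 2) AS pct_towards_wraparound
FROM pg_database
ORDER BY 2 DESC;
```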
Nikolay: And defaults matter a
lot.
Defaults matter.
If, for example, all our application
code or frameworks default
to bigint for surrogate primary
keys, and Ruby on Rails made
this choice a few years ago, as
you said, then that's one fewer
report needed, right?
At least, yeah, we don't need
to check integer capacity
for primary keys.
And that's good.
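For illustration, one hedged way to sketch that integer-capacity check via sequences (a fuller check would also join sequences to the column types they feed):

```sql
-- Illustrative: how far each sequence has advanced toward the int4
-- limit (2^31 - 1). Only meaningful for sequences feeding int4 columns.
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 2) AS pct_of_int4_capacity
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY 4 DESC;
```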
But of course, the complexity of the
system can grow, and sometimes,
when checking Postgres health, we
go beyond Postgres, for example,
we check connection pooling and
how orchestration and automation
around Postgres are organized,
with the goal of understanding
HA, high availability, and disaster
recovery, these kinds of topics.
So and of course capacity planning.
We will grow a few years without
performance issues or not.
But anyway, we can add many things
like that.
For example, if the multithreading
effort started by Heikki a year
ago succeeds, we won't
be worried about single CPU core
capacity for system processes.
Or for example, there is an ongoing
discussion for pgBouncer that
threading is also a good thing,
so we won't be bound, the process
won't be bound by a single core,
and this is a hard wall to hit.
And Postgres processes can also
hit, some Postgres processes
can hit single core utilization
100% and even if you have 100
more cores, it doesn't help you,
right?
So I cannot say this
is simplicity, because the technology
is similar to an EV: there's
a lot of complex technology
behind it, but it's simplicity of
use, right?
Of usage, maybe.
Michael: Well yeah, I would actually
argue that let's say we
did get threading in Postgres in
a major version in a few years
time.
Actually, at first, you'd want
to be doing health checks more
often.
You'd be more nervous about things
going wrong because you've added
complexity and things that maybe
for over a decade have been
working great suddenly may not work as well or just their
patterns may change
Nikolay: Well, maybe, but let's
find some mutual ground
here: we both agree that
health checks are needed,
right?
At least right now.
Maybe in Postgres 25, it won't be
needed so much because it will
be an autonomous database.
Michael: Personally, I think
it's a really good way of keeping
a system reliable and avoiding
major issues, including downtime,
outages, and huge things.
But yeah, "needed" is an
interesting word.
A lot of the people I see
hitting issues are just living
in a reactionary world.
A lot of them are startups.
Yeah.
But once like once things are working
well and you've got, like
maybe you're hitting like an inflection
point or you need to
capacity plan or you're in a large
established business where
outages can be really expensive,
it seems wild to me to not be
doing semi-regular or if not very
regular health checks.
Nikolay: Right.
Yeah, I think you're very right.
There are two categories of people
who, for example, go to us for
health checks.
One is probably the bigger category
of people.
People who knew that it's good
to have a health check, but they
only come to us when something happens,
something bad happens,
something already hitting.
And in this case, well, it's actually
like a psychological thing,
right?
For example, for a CTO, the Postgres
database is just one of the things
in infrastructure.
And only when it hurts do we
realize, okay, let's check what
else can hurt soon, what other
things can be a problem in
this area, Postgres, right?
This is a very good moment to perform
a health check.
Another category is, I would say,
wiser guys.
They do health checks before a launch
or in the initial stage,
but at the same time it's harder
to work with them because usually
the project is smaller, and it's harder
to predict workload patterns,
growth rates, and so on.
But this is better, of course,
to start checking health initially
and then return to it after some
time.
Michael: Yeah, it can go hand in
hand with, like, stress testing
as well.
Can't it?
Like before I see a lot of people
doing this kind of thing before
big events, like in the US you've
got Black Friday, Cyber Monday
peak times where you know there's
going to be increased traffic
or increased load.
Before that, there's a lot of work
that people are doing often
to just make sure they can.
Is that a good time to do a health
check?
Nikolay: Right.
Well, yes, but I would distinguish
these topics.
Load testing and preparation for
some events like Black Friday
is something else.
Health checks, as I see them after
doing them for many years, are
quite an established field.
They include a review of settings,
then bloat, let's take bloat first,
both table and index B-tree, then
index health, and we had episodes
about both bloat and index health
and maintenance, then query analysis,
like do we have outliers that are
attacking us too much in various
metrics, I mean queries in
pg_stat_statements and other
extensions.
Then static analysis, for example,
you mentioned int4 primary
keys, lack of foreign keys, and
so on.
If we look for the first time,
there are a lot of findings.
If we look a second time, maybe
developers have added a lot of
new stuff since last time, right?
And other areas as well.
And capacity planning can be strategic,
like many years, or it
can be before some specific event
like Black Friday for e-commerce.
In this case, it may involve additional
actions like let's do
load testing, but I would say health
check, normal health check,
is a simpler task than proper load
testing.
That's why I'm trying to say it's
a separate topic, because if
load testing is involved, a health
check suddenly becomes much bigger.
You can think about it, if you
go, like in the US, if you go
to a doctor, there is an annual health
checkup, covered by insurance
always, right?
So, it's like your primary care
physician doing some stuff, discussing
with you, checking, many things
actually, but it's not difficult.
And then if there is some question,
you go to a specific doctor.
And load testing is a specific
doctor, I would say.
Michael: Yeah, I was just thinking
about timing, if it's more
likely that people start the process
then.
And just like maybe just before
they go on a holiday maybe they
go for their annual checkup or
like I don't know if there's like
times where people are more likely
to go for those than others
but yeah should we get into some
more specifics
Nikolay: Yeah, well, speaking of
timing, to finish this topic: sometimes
people just have periodical, you
know, life cycle episodes
in their work, right?
For example, this financial year,
right?
And you plan finances, you spend
them, and so on.
You manage them.
Similar proper capacity planning
and database health management
should be also like periodical.
You would just have some point
of time, maybe after some events,
for example, after Black Friday,
you think, okay, next year,
how we expect our growth will look
like, how we expect our incident
management will look like.
And this is a good time to review
the health, understand ongoing
problems and try to predict future
problems.
You also perform capacity planning
to understand do we need to
plan big changes, like architectural
changes, to sustain the
big growth if we predict it in
a couple of years.
In this case probably you need
to think about microservices or
sharding or something like that.
And health check can include this,
like overview.
And this is a big distinction from
monitoring, because monitoring
very often has very small retention.
You see only a couple of weeks
or a couple of months, that's
it.
For an existing evolved system, you
need a few years of observation,
data points, better to have them,
right?
To understand what will happen
in the next years.
And this is when like a health check
can help and capacity planning.
Maybe load testing as well, but
it's additionally, I would say.
Yeah, specific topics.
So for me specifically, it usually
starts with the tool.
We actually have this tool, postgres-checkup,
and it has a big
plan.
The tool itself implements, I
would say, less than 50% of
the plan.
The plan was my vision of a health
check several years ago.
I did conference talks about it.
At that time, I remember I performed
really heavy health checks
for a couple of really big companies.
It took a few weeks of work.
It's like a lot, like deep and
wide, and like 100 pages report,
PDF of 100 pages, it's like interesting.
Executive summary alone was like
5 or 6 pages, like quite big.
Wow.
Yeah, yeah, well, it's a serious
thing.
But it was for a company that
was worth more than $10 billion.
And the Postgres databases, a few
clusters, were at the center
of this serious business,
so it was worth it.
And then this tool, this was the vision,
and we implemented in this
tool, we implemented like almost
half of this vision.
And for me everything starts with
version.
Simple thing.
Let's check version.
Major and minor.
And maybe history of it.
Because both things matter a lot.
If your current version is 11,
you're already out of what?
Out of normal life, right?
Because the community won't help you,
and bugs won't be fixed unless
you pay a lot.
Michael: And out of security patches
as well.
Nikolay: Right, yeah, that includes
security also, bugs.
If you're on 12, you have time
until November, right?
So, major version is important,
and also you are missing a lot
of good stuff.
For example, from one recent experience,
if you're on 12, it
not only means that you will soon
be running unsupported
Postgres, but also that you lack
good things like WAL-related metrics
in pg_stat_statements and EXPLAIN
plans, which were added in the next
version, 13, right?
You lack a lot of improvements
related to logical decoding and
replication.
Probably you need to move data
to Snowflake, or from Postgres
to Postgres, and you miss a lot
of good stuff.
So the major version also matters a
lot, but the minor version may be
the number 1 thing.
If you also lag in minor version,
obviously you probably miss
some important fixes and security
patches.
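A quick way to see both numbers at once, as a hedged aside (the encoding example is mine, not from the episode):

```sql
-- server_version_num encodes major and minor: e.g. 160002 = 16.2.
-- Compare the minor part against the latest point release for your major.
SELECT current_setting('server_version')     AS version,
       current_setting('server_version_num') AS version_num;
```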
So here our tool is using why-upgrade
at depesz.com, with links to explain
what changed...
Yeah, it's a great tool, but now
we're already moving towards
combining...
It's a huge list sometimes, right?
If you're lagging 10 minor versions
behind, it's more than a
couple of years of fixes.
And you're like, oh, what's important
here?
Security, but maybe not only security.
So, yeah, we are moving towards
trying to summarize this and
understand particular context.
Of course, guess what helps?
LLM helps here.
Yeah, I think in the near
future we will have interesting
things here, like combining our
observations with the why-upgrade
diff into a summary of what matters
for you, almost part of an
executive summary that can be read
by a CTO, with links.
Yeah.
Michael: I hope you're right long
term, but I think at the moment,
the release notes are important
enough like the risk of missing
something very important is still
very high.
Like I think sometimes the LLMs
at the moment still, they can
be good for like idea generation,
but if you want an exhaustive
list of what's very important,
they can miss things or make things
up.
So I would still not rely on them
for things like reading, which
like even minor release notes can
include things that are very
important to do, like maybe a re-indexing
is needed and things
like that.
Nikolay: Yeah, I agree with you,
but I think you are criticizing
a different idea than I'm describing.
First of all, the health checks
we are doing, and I think will
still be doing in the next few
years, involve a lot of manual
steps for serious projects.
So they, of course, involve a DBA
who is looking at everything.
But an LLM can help here.
You usually have time pressure.
You need to do this work and a
lot of other stuff.
And a changelog with 200 items,
for example.
Of course you check them, but I
bet no DBA spends significant
time checking properly.
They say they check, but I don't
fully trust humans here either.
Michael: So sorry, I think including
them in a report is different
to reading them before upgrading.
That's a really good point.
Nikolay: Right, so what I'm trying
to say, we try to convince
clients that it's important to
upgrade, both minor and major.
And while explaining, we provide
the full list of changes,
why-upgrade is great for that.
But we also want to bring the most
sound arguments, right?
And an LLM helps to highlight them.
It was my long-term vision a few
years ago.
I thought: you take the why-upgrade
output, and it would be great to
understand, say, there is an item
about GiST indexes, or GIN indexes,
for example.
We have the schema here, we can check
if we have GIN indexes, right?
Does it affect us?
And if it does, this should be
highlighted.
This is the idea.
And LLM is good here.
It's like it just performs some
legwork here.
I mean, not only LLM, some automation
with LLM.
And it just helps DBAs not to miss
important things, you know?
But of course, again, I'm sure
it should be combined, like ideally
it should be the work from automation
and humans.
Okay, that's too much on versions.
Actually, it's usually the smallest
item, right?
We won't have time for everything
else at this rate. Okay, let's
speed up a little bit.
Michael: I think we've mentioned
a bunch of them already, haven't
we? Like, you've done...
Nikolay: Yeah, but right.
We had episodes separately, but
they are super important.
Settings go usually second.
Settings.
It's a huge topic.
But, from my experience: if I see
autovacuum settings at their
defaults, or checkpoint settings,
first of all max_wal_size,
in my opinion the most important
one, at its default of 1 gigabyte,
or logging settings at their defaults.
What else?
These kinds of things.
Michael: We've done a whole episode
on the default configuration.
Nikolay: Our favorite, random_page_cost being 4.
Recently we discussed it.
These are strong signs to me that
we need to pay attention here.
Even if I didn't see the database
yet, I already know that this
is a problem.
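One lightweight way to spot still-at-default settings, offered here as an illustrative sketch rather than the hosts' exact method:

```sql
-- Settings changed from compiled-in defaults. Anything important that
-- is NOT listed here (max_wal_size, random_page_cost, autovacuum_*,
-- logging settings) is still at its default and deserves review.
SELECT name, setting, unit, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```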
Michael: So this is really useful
for that first health check
you ever do.
Maybe you're a new DBA or new developer
on a team.
When you're doing a second health
check or subsequent ones, are
you diffing this list of settings
and seeing what's changed in
the last year?
Nikolay: Good question.
We have reports that can help us
to understand that, for example,
checkpoint needs reviewing, reconsideration,
checkpoint settings,
or autovacuum needs reconsideration.
So yeah, you're right.
I'm just coming here with like,
I first met this database and
reviewed it.
But yeah, second is easier usually,
but if it grows fast, it
can be challenging as well.
So sometimes we need to revisit.
Yeah, but settings, it's a huge
topic.
I just wanted to highlight the
easy, low-hanging fruit we usually
have.
Default is a low-hanging fruit.
For example, the autovacuum scale
factors: the defaults are like
10-20%, and for any OLTP database
I would say that's too high, way
too high.
So many things like that.
And logging settings, if I see
logging settings are default,
I guess people don't look at databases
properly yet and we need
to adjust them and so on.
But second time is easier usually,
I agree.
Second time you just revisit things
that change a lot.
So, bloat estimates: tables and
indexes.
Quite easy, right?
Very roughly, we usually don't want
anything to be higher than 40-50%.
Depends, of course, but this is
normal.
And converting to number of bytes
always helps.
Like, people say, oh, my database
is 1 terabyte, and I have like
200 gigabytes of table and index
bloat.
Wow.
Of course we have this asterisk
like comment, remark that it's
an estimate, it can be very very
wrong.
We discussed, we had an episode
about bloat, and autovacuum
as well, very related topic here.
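As a hedged aside, full bloat-estimate queries are long; a much lighter proxy (not the estimate itself, and subject to the same "can be very wrong" caveat) is the dead-tuple ratio:

```sql
-- Lightweight proxy, not the full estimate query: dead-tuple ratios per
-- table. Large tables approaching the 40-50% zone deserve a closer look.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```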
And then from table and index
bloat we go to index health, which
of course includes index bloat,
but also unused indexes, redundant
indexes, rarely used indexes,
there's a rarely-used report,
and also invalid indexes and
unindexed foreign keys.
What else?
Maybe that's it.
Michael: Are you checking overlapping
indexes?
Nikolay: Redundant, we call that
redundant.
So yeah, and we start talking about
index maintenance, and if it's
the first time, I usually explain
why it's so important to implement
index maintenance and why we should
understand that we do need
to rebuild indexes from time to
time.
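For two of the checks just listed, invalid and never-used indexes, a minimal sketch might look like this (assuming self-managed SQL access; constraint-backing indexes would need excluding before any drop):

```sql
-- Invalid indexes (e.g. from a failed CREATE INDEX CONCURRENTLY) plus
-- never-used ones. Verify replicas and statistics age before dropping.
SELECT indexrelid::regclass AS index_name, 'invalid' AS reason
FROM pg_index
WHERE NOT indisvalid
UNION ALL
SELECT indexrelid::regclass, 'never used'
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```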
Michael: Are you doing any corruption
checks?
Or I'm guessing maybe not in this
case.
Nikolay: Oh, that's a great question
actually.
I think we can include this, but
a proper check is quite heavy.
Michael: Yeah.
And one of the nice things about
this tool is it's like a lot of
these, it's been designed to be
lightweight in this case, right?
Nikolay: Yeah, it's very lightweight.
It was battle-tested in very heavily
loaded clusters.
And it limits itself with a statement
timeout of 30 seconds, or
even 15, I don't remember off the
top of my head, but if something
takes too long, we just lose some
reports to the statement timeout.
We don't want to be a problem.
Observer effect.
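The self-limiting idea amounts to something like this at the session level; the exact values here are illustrative, and the lock_timeout line is my own added precaution, not something stated in the episode:

```sql
-- The checkup session caps its own runtime so it can't become the problem.
SET statement_timeout = '30s';
SET lock_timeout = '100ms';  -- assumption: an extra observer-effect guard
```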
Michael: Yeah.
Nikolay: So, yeah.
And if we talk about indexes, okay,
index maintenance, quite
straightforward topic already,
quite well-studied.
And of course, we don't forget replicas,
and we also don't forget
about the age of statistics, because
if the statistics were reset yesterday,
we cannot draw proper conclusions,
right?
Maybe these indexes simply were
not used in the last 24 hours.
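Checking that statistics age is a one-liner, sketched here for illustration:

```sql
-- If stats_reset is recent, "unused index" conclusions are unreliable.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```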
But okay, anyway, we had the index
maintenance episode.
Yes.
Michael: I have a couple of questions.
I have a couple of questions on
the cluster health.
So, replication, are you looking
at delay at all?
Or what are you looking at when
it comes to replication health?
Nikolay: That's a good question.
I doubt we have replication lags
being checked by this tool at
all.
It's usually present in monitoring.
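For completeness, lag is typically read from the primary along these lines; this is a generic monitoring sketch, not part of postgres-checkup:

```sql
-- Byte lag per standby, as seen from the primary.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```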
So as part of a broader health
check, we use not only the
postgres-checkup report, but we
also talk with the customer and
check what they have
in monitoring, and sometimes
we install our own secondary
monitoring, and gather at
least 1 day of observations and
include everything.
And also discussion matters.
This is probably something automation
won't have anytime soon:
live discussion, trying to understand
pain points, like what
bothers you, what issues came up
recently, what issues do you expect
to happen soon.
This live discussion at a high level
may help prioritize further
research.
So proper health check includes
both automated reports and these
kinds of...
So there we can check replication,
but usually if people don't
mention replication lags being
an issue, they have it in monitoring,
we just skip this, usually.
Sometimes they mention, for example,
a lot of WAL is generated
and we say, okay, let's go to pg_stat_statements
and check WAL
metrics, try to understand why
we generate so much WAL, optimize
some queries, enable WAL compression
if it's not enabled, and
so on and so on.
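That "go to pg_stat_statements and check WAL metrics" step can be sketched as a query like the following, assuming Postgres 13+ where the WAL columns exist:

```sql
-- Top WAL producers (pg_stat_statements, Postgres 13+).
SELECT queryid, calls, wal_bytes, wal_fpi,
       left(query, 60) AS query_head
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```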
Also, disk, let's just check what
kind of disks you have.
It's a big topic.
Michael: Yeah, the last question I
had was another thing that I imagine
is very important to check the
first time you want to know the
overall system health, or at least
setup health, but might be
tricky to automate or quickly run
a query for, is what's the
health of the backup process or
Nikolay: the risk?
You're raising questions.
Yeah, if you check the plan, which
is in the readme of the postgres-checkup,
this is included there,
and the checkbox is not checked,
of course.
Michael: Oh, okay.
Nice.
Nikolay: Yeah, we thought about
automating this, but we have
so many different situations, different
backup tools used, different
managed providers and so on.
Every time it's quite different.
We have a wide range of situations.
Michael: Yeah, I wasn't actually
thinking about postgres-checkup
at all, I was thinking more about
health check.
As part of a health check, you'd
probably want to check you can
restore.
Nikolay: Did we have an episode
about backups and disaster recovery
at all?
If not, let's just plan it.
Because it's not a topic for 1
minute, it's a huge topic and
probably...
Michael: Of course.
Nikolay: Yeah, maybe number 1 topic
for DBA, because it's the
worst nightmare if backups are
lost and data loss happened, right?
Obviously, we'll include this in
the consideration, unless it's a
managed service.
If it's a managed service, we have
people who are responsible for
it, right?
Well, it brings us to this topic,
managed versus self-managed.
Because, for example, for HA, since
not long ago both RDS and GCP
Cloud SQL allow you to initiate
a failover to test it.
So, you rely on them in terms of
HA, in terms of if 1 node is
down, a failover will happen.
But you can also test it, and this
is super important.
Backups, it's like a black box, we
cannot access files as we discussed
last time.
But you can test restoration, right?
And it's fully automated.
And maybe you should do it if you
have a serious system running
on RDS, because as I understand,
RDS itself doesn't do it for
you.
They rely on some global data,
as I understand.
They don't test every backup.
Michael: All I mean is if I was
a new DBA going into a new company
and I want, like 1 of my first
tasks was to do a health check,
this, that would be 1 of the things
that I would think of as
including as in that health check.
Maybe a new consultant would as
well.
Nikolay: DR and HA, two fundamental
two-letter terms, right?
You definitely, if you're starting
working with some database
seriously, you definitely need
to allocate a good amount of time
for both these topics.
Like good means at least a couple
of days for both to deeply
understand what...
For example, if we talk about backups,
you need to understand
RPO and RTO, real and desired, and
check whether there is a document
where they are defined, and procedures
to maintain these objectives.
These are super serious topics
and for any infrastructure engineer
I think it's quite clear, for any
SRE it should be quite clear
that these are two SLOs that you
need to keep in mind and always
revisit and maintain and have monitoring
that covers them and
so on.
But this is again, like this is
maybe...
In health check, we just need to
check and understand if there
is a problem here.
To study this topic is similar
to a lot of testing topics.
It's like a special doctor.
So yeah.
What else?
To wrap up, maybe a couple of final
things.
I like to check query analysis
in table form.
Table form is maybe quite exotic,
because people are used
to graphs, of course, historical
data, that's great.
But when I look at tabular data for
pg_stat_statements with all these
metrics and derived metrics, it
gives me a very good feel for
the workload.
For understanding that, it's great.
And also static analysis.
Michael: Ordered by what?
Always by total time?
Total time, I think, because we're
looking at system-level health.
Nikolay: Yeah, yeah, yeah.
Total time is default for us because
we talk about resources,
but if you aim to analyze the situation
for end-users, you need
to order by mean time, and sometimes
you need to order by different
metrics.
For example, as we discussed
earlier, the WAL metrics.
Sometimes temporary files, for
example.
If we generate a lot of temporary
files, we want to order by
temporary files.
In this case, sometimes we just
generate additional reports manually.
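The resource-centric default ordering described here can be sketched as follows (Postgres 13+ column names; swap the ORDER BY for the other metrics mentioned):

```sql
-- Top queries by total execution time (resource-centric view); reorder
-- by mean_exec_time, wal_bytes, or temp_blks_written as needed.
SELECT queryid, calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       temp_blks_written
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```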
Michael: Yeah, makes sense.
Nikolay: And actually that's it.
Not difficult, right?
Michael: No.
Do you see a lot of people doing
this in-house?
I know a lot of consultancies,
including yourselves, offer this
as a service.
Do you see the pros and cons of
those?
Nikolay: Well, usually in-house,
if you do it forever, you don't
see problems, or you don't prioritize
them well.
Like: I saw we have bloat, but
life goes on somehow, we are not
down.
I saw we have, for example, a 5
terabyte unpartitioned table, but
it's kind of fine, maybe, and so
on.
But when external people come,
at least temporarily, or for example
you hire a new DBA, DBRE, this
is a good point to revisit and
reprioritize things and avoid problems
in the future.
So that's why fresh blood, fresh
look is good here.
But usually people get a lot of
value during the first health
check, then smart people implement
periodical health checks.
And then just, I would say, in
1 or 2 years, if things change
significantly, this is when you
need to talk again.
Because if the health check was
full-fledged in the first year,
the next 1 or 2 years will be fine,
and your team will be handling
all the problems, solving them,
knowing what to do.
Yeah, nice one.
But honestly, give me any database,
I will find problems easily.
I mean, actual live production
database, not some synthetic.
Anyway, like, summary is don't
postpone this.
Perform health check, plan it and
do.
Check your health.
Michael: I had one last thing to
mention.
I met a few people at PG Day Paris,
so hi to everyone who's listening
from there.
And if you're watching on YouTube
and just listening to the audio,
we do also have an audio-only version
in the show notes.
So you don't have to waste your
precious bandwidth and data on
watching us.
And if you listened to the audio
and didn't know we had a YouTube,
then now you know that too.
Nikolay: Right, right.
We also have a transcript if you
like reading better and I think
it's quite good.
It's quite good.
Actually, guys who are listening,
please check transcript and
let us know if you think it's good
to read or it's hard to read.
I would like to know what you think
about the quality of transcripts.
I was thinking about improving
this and having some...
Not a book, but a kind of collection
of the discussions we've had over
almost the last 2 years, just as
a set of texts.
Because some people just like reading,
right?
So if people find the quality good,
I think it's worth investing
in organizing it better, and maybe
providing some more links, and
pictures maybe, and so on.
So this is the idea we discussed
some time ago.
I think maybe I will pursue it
soon.
But I need to know what people
think about the quality of recent
transcripts we have on PostgresFM.
Michael: What's the best way to
contact you for that?
Nikolay: YouTube comments, Twitter,
all Twitter, right?
Michael: Yeah, yeah.
Nikolay: I think, yeah.
Michael: Nice.
Nikolay: Good.
Michael: Yeah.
Thanks so much, Nikolay.
Nikolay: Thank you, Michael.
Michael: Catch you next week.
Bye.
Bye-bye.
See you next week.
Bye.