Health check
Nikolay: Hello everyone, this is
PostgresFM, a podcast about,
guess what, Postgres.
I'm Nikolay and hi Michael.
Michael: Hello Nikolay.
Nikolay: How are you doing today?
Michael: I'm good, thank you.
How are you?
Nikolay: Very good.
So we discussed before the call
that we have a few options for
topics.
And it was my choice, but somehow
you chose, right?
You chose the health check or checkup.
Postgres health checks and how
to check the health of Postgres
clusters periodically, why to do
so, what to look at and so on,
right?
Michael: Well, you said you had
2 ideas; 1 of them was this 1,
and I was like, I like that idea.
And then you explained the other
1.
I was like, I like that idea too.
So I think we've got lots of good
ones left to go.
Nikolay: Yeah.
We need to discuss the internal
policy.
How transparent should we be?
Right.
Michael: Yeah.
Well, and nice reminder to people
that you can submit ideas for
future episodes on our Google doc
or by mentioning us on Twitter
or anything like that.
Nikolay: Right.
Right.
That's good.
I know people go there sometimes,
and I think it's good to remind
everyone of this opportunity,
since it feeds our podcast with
ideas.
So, yeah, good point.
So, health check, where to start?
Goals, maybe, right?
Or...
Michael: Well, I think definitions,
actually.
I think, because I was doing a
little bit of research around
this, It is a slightly loaded term.
When you say health check, it could
be for example software just
checking that something's even
alive, but all the way through
to kind of consultancy services
around, you know, that kind of
thing.
Nikolay: SELECT 1 is working.
That's good.
Michael: Yeah, So what do you mean
when you say health check?
Nikolay: My perception of this
term is quite broad, of course.
Because in my opinion, what we
mean is not just looking at monitoring
and seeing everything is green,
no alerts, nothing like that.
No, no, no.
It's about something that should
be done not every day, but also
not once per 5 years, somewhat
more frequently, right?
Like for example, once per quarter
or twice per year.
And it's very similar to checking
health of humans or complex
technology like cars, right?
Sometimes you need to check.
And it doesn't eliminate the need
to have continuous monitoring.
Like, for example, in your car
you have various checks constantly
running, for example, tire
pressure.
By the way, I think for humans
it's a big disadvantage that we
don't have monitoring.
It's improving, for example, a
modern smartwatch can provide
you a heartbeat rate and so on.
But still, we're monitoring...
When I complain about how poor
some Postgres monitoring systems
are, I remind myself how poor human
body monitoring is right
now still.
And we can monitor technological
systems much better than non-technological.
So yeah, we have some things constantly
being checked, but still
we need sometimes to perform deeper
and wider analysis for different
goals.
First goal is to ensure that everything
currently is fine and
nothing is overlooked.
And second goal is to predict and
prevent some problems in the
nearest future.
And the third goal is things like
how we will live in the next
few years.
It involves capacity planning and
proper planning of growth,
and prediction, and so on.
So these are 3 key goals, very
roughly, right?
What do you think?
Michael: Yeah, I really like it,
and I think the car analogy
is particularly good in that,
I don't think this is true in the
US, but in the UK at least, once
a car is a certain age, like
3 years old or something, it's
required by law to have an annual
health check, we call it an MOT.
But it's not just what's wrong
with the car now that you need
to fix, it's also looking at
things that could go wrong and
trying to get them replaced
before they fail.
So there are certain parts
of the car that, if they haven't
been changed in a certain number
of years, have to be changed.
So I think that covers at least
2 of the goals: looking at what's
wrong now, but also, if we don't
do anything about it, what could
go wrong very soon so we should
get ahead of it, which is not
necessarily something monitoring
is always good at.
Nikolay: Yeah, well, in the US
it's very different.
No pressure at all.
You just change the oil and that's
it.
Everyone is on their own, as I
understand, at least in California.
And actually, I'm going to argue
with myself and with you a little
bit.
Better technology means less need
for health checks.
For example, I own a couple of
Teslas, and I was very concerned
that nobody is replacing anything
like filters and so on and
then I needed to replace
the windshield after some trip
to Nevada. And I asked like guys,
like check everything like brake
pads and so on because of course
I'm worried like it's already
like almost 2 years and nothing and they said everything
is fine we are going to replace
just the cabin filter and that's
it. No need to replace
anything because it's better technology.
So I think, of course, the amount
of work that needs to be done...
Let's go back to Postgres because
it's a PostgresFM podcast, as
I said in the beginning.
So, if we have very old Postgres,
it probably, it perhaps needs
more attention and work to check
and maintain good health.
Michael: I think there are 2 axes.
I think you're right that technology
plays into it, but I think
also simplicity versus complexity
plays a part as well.
And as Postgres gets more complex
or as your Postgres setup gets
more complex, I think actually
maybe there's more you need to
do, whereas on the car analogy,
I think one of the arguments for
things like electric cars is
they're simpler.
They have fewer parts.
They are easier to maintain.
So I think there's also an argument
for keeping your Postgres
setup.
This is one of the arguments for
simplicity in, well, we'll get
to some of the specifics, but on
the technology side, for example,
if we could get bigger than 32-bit
transaction IDs,
Nikolay: we
Michael: wouldn't have to monitor
as often for wraparound, it would
be less of an issue.
If we defaulted to int8 or bigint
primary keys instead of int4,
we wouldn't have as many of those
issues.
So, by certain choices, we can
simplify and prevent issues, but
also the technology.
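As an aside for readers, a minimal sketch of the kind of wraparound check being discussed, assuming direct SQL access to the cluster; the 2 billion denominator and ordering are illustrative, not a recommendation from the hosts:

```sql
-- Rough wraparound check: age() of datfrozenxid relative to the
-- ~2 billion usable 32-bit XID space. Thresholds are illustrative.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2000000000, 2) AS pct_towards_wraparound
FROM pg_database
ORDER BY 2 DESC;
```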
Nikolay: And defaults matter a
lot.
Defaults matter.
If, for example, all our application
code or frameworks default
to bigint for surrogate primary
keys, and Ruby on Rails made
this choice a few years ago, as
you said, then that's one fewer
report needed, right?
At least, yeah, we don't need
to check integer capacity
for primary keys.
And that's good.
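For illustration, one hedged way to sketch that integer-capacity check via sequences (a fuller check would also join sequences to the column types they feed):

```sql
-- Illustrative: how far each sequence has advanced toward the int4
-- limit (2^31 - 1). Only meaningful for sequences feeding int4 columns.
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 2) AS pct_of_int4_capacity
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY 4 DESC;
```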
But of course, the complexity of the
system can grow, and sometimes,
when checking Postgres health, we
go beyond Postgres, for example,
we check connection pooling and
how orchestration and automation
around Postgres are organized,
with the goal of understanding
HA, high availability, and disaster
recovery, these kinds of topics.
So and of course capacity planning.
We will grow a few years without
performance issues or not.
But anyway, we can add many things
like that.
For example, if the multithreading
effort started by Heikki a year
ago succeeds, we won't
be worried about single CPU core
capacity for system processes.
Or for example, there is an ongoing
discussion for pgBouncer that
threading is also a good thing,
so we won't be bound, the process
won't be bound by a single core,
and this is a hard wall to hit.
And Postgres processes can also
hit, some Postgres processes
can hit single core utilization
100% and even if you have 100
more cores, it doesn't help you,
right?
So I cannot say this
is simplicity, because the technology
is similar to an EV: there's
a lot of complex technology
behind it, but it's simplicity of
use, right?
Of usage, maybe.
Michael: Well yeah, I would actually
argue that let's say we
did get threading in Postgres in
a major version in a few years
time.
Actually, at first, you'd want
to be doing health checks more
often.
You'd be more nervous about things
going wrong because you've added
complexity and things that maybe
for over a decade have been
working great suddenly may not work as well or just their
patterns may change
Nikolay: Well, maybe, but let's
find some mutual ground
here: we both agree that
health checks are needed,
right?
At least right now.
Maybe in Postgres 25, it won't be
needed so much because it will
be an autonomous database.
Michael: Personally, I think
it's a really good way of keeping
a system reliable and avoiding
major issues, including downtime,
outages, and huge things.
But yeah, "needed" is an
interesting word.
A lot of the people I see
hitting issues are just living
in a reactionary world.
A lot of them are startups.
Yeah.
But once like once things are working
well and you've got, like
maybe you're hitting like an inflection
point or you need to
capacity plan or you're in a large
established business where
outages can be really expensive,
it seems wild to me to not be
doing semi-regular or if not very
regular health checks.
Nikolay: Right.
Yeah, I think you're very right.
There are two categories of people
who, for example, go to us for
health checks.
One is probably the bigger category
of people.
People who knew that it's good
to have a health check, but they
only come to us when something happens,
something bad happens,
something already hitting.
And in this case, well, it's actually
like a psychological thing,
right?
For example, for a CTO, the Postgres
database is just one of the things
in infrastructure.
And only when it hurts do we
realize, okay, let's check what
else can hurt soon, what other
things can be a problem in
this area, Postgres, right?
This is a very good moment to perform
a health check.
Another category is, I would say,
wiser guys.
They do health checks before a launch
or in the initial stage,
but at the same time it's harder
to work with them because usually
the project is smaller, and it's harder
to predict workload patterns,
growth rates, and so on.
But this is better, of course,
to start checking health initially
and then return to it after some
time.
Michael: Yeah, it can go hand in
hand with, like, stress testing
as well.
Can't it?
Like before I see a lot of people
doing this kind of thing before
big events, like in the US you've
got Black Friday, Cyber Monday
peak times where you know there's
going to be increased traffic
or increased load.
Before that, there's a lot of work
that people are doing often
to just make sure they can.
Is that a good time to do a health
check?
Nikolay: Right.
Well, yes, but I would distinguish
these topics.
Load testing and preparation for
some events like Black Friday
is something else.
Health checks, as I see them after
doing them for many years, are
quite an established field.
They include a review of settings,
then bloat, let's take bloat first,
both table and index B-tree, then
index health, and we had episodes
about both bloat and index health
and maintenance, then query analysis,
like do we have outliers that are
attacking us too much in various
metrics, I mean queries in
pg_stat_statements and other
extensions.
Then static analysis, for example,
you mentioned int4 primary
keys, lack of foreign keys, and
so on.
If we look for the first time,
there are a lot of findings.
If we look a second time, maybe
developers have added a lot of
new stuff since last time, right?
And other areas as well.
And capacity planning can be strategic,
like many years, or it
can be before some specific event
like Black Friday for e-commerce.
In this case, it may involve additional
actions like let's do
load testing, but I would say health
check, normal health check,
is a simpler task than proper load
testing.
That's why I'm trying to say it's
a separate topic, because if
load testing is involved, a health
check suddenly becomes much bigger.
You can think about it, if you
go, like in the US, if you go
to a doctor, there is an annual health
checkup, covered by insurance
always, right?
So, it's like your primary care
physician doing some stuff, discussing
with you, checking, many things
actually, but it's not difficult.
And then if there is some question,
you go to a specific doctor.
And load testing is a specific
doctor, I would say.
Michael: Yeah, I was just thinking
about timing, if it's more
likely that people start the process
then.
And just like maybe just before
they go on a holiday maybe they
go for their annual checkup or
like I don't know if there's like
times where people are more likely
to go for those than others
but yeah should we get into some
more specifics
Nikolay: Yeah, well, speaking of
timing, to finish this topic: sometimes
people just have periodical, you
know, life cycle episodes
in their work, right?
For example, this financial year,
right?
And you plan finances, you spend
them, and so on.
You manage them.
Similar proper capacity planning
and database health management
should be also like periodical.
You would just have some point
of time, maybe after some events,
for example, after Black Friday,
you think, okay, next year,
how we expect our growth will look
like, how we expect our incident
management will look like.
And this is a good time to review
the health, understand ongoing
problems and try to predict future
problems.
You also perform capacity planning
to understand do we need to
plan big changes, like architectural
changes, to sustain the
big growth if we predict it in
a couple of years.
In this case probably you need
to think about microservices or
sharding or something like that.
And health check can include this,
like overview.
And this is a big distinction from
monitoring, because monitoring
very often has very small retention.
You see only a couple of weeks
or a couple of months, that's
it.
For an existing evolved system, you
need a few years of observation,
data points, better to have them,
right?
To understand what will happen
in the next years.
And this is when like a health check
can help and capacity planning.
Maybe load testing as well, but
it's additionally, I would say.
Yeah, specific topics.
So for me specifically, it usually
starts with the tool.
We actually have this tool, postgres-checkup,
and it has a big
plan.
The tool itself implements, I
would say, less than 50% of
the plan.
The plan was my vision of a health
check several years ago.
I did conference talks about it.
At that time, I remember I performed
really heavy health checks
for a couple of really big companies.
It took a few weeks of work.
It's like a lot, like deep and
wide, and like 100 pages report,
PDF of 100 pages, it's like interesting.
Executive summary alone was like
5 or 6 pages, like quite big.
Wow.
Yeah, yeah, well, it's a serious
thing.
But it was for a company that
was worth more than $10 billion.
And the Postgres databases, a few
clusters, were at the center
of this serious business,
so it was worth it.
And then this tool, this was the vision,
and we implemented in this
tool, we implemented like almost
half of this vision.
And for me everything starts with
version.
Simple thing.
Let's check version.
Major and minor.
And maybe history of it.
Because both things matter a lot.
If your current version is 11,
you're already out of what?
Out of normal life, right?
Because the community won't help you,
and bugs won't be fixed unless
you pay a lot.
Michael: And out of security patches
as well.
Nikolay: Right, yeah, that includes
security also, bugs.
If you're on 12, you have time
until November, right?
So, major version is important,
and also you are missing a lot
of good stuff.
For example, from one recent experience,
if you're on 12, it
not only means that you will soon
be running unsupported
Postgres, but also that you lack
good things like WAL-related metrics
in pg_stat_statements and EXPLAIN
plans, which were added in the next
version, 13, right?
You lack a lot of improvements
related to logical decoding and
replication.
Probably you need to move data
to Snowflake, or from Postgres
to Postgres, and you miss a lot
of good stuff.
So the major version also matters a
lot, but the minor version may be
the number 1 thing.
If you also lag in minor version,
obviously you probably miss
some important fixes and security
patches.
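A quick way to see both numbers at once, as a hedged aside (the encoding example is mine, not from the episode):

```sql
-- server_version_num encodes major and minor: e.g. 160002 = 16.2.
-- Compare the minor part against the latest point release for your major.
SELECT current_setting('server_version')     AS version,
       current_setting('server_version_num') AS version_num;
```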
So here our tool is using why-upgrade
at depesz.com, with links to explain
what changed...
Yeah, it's a great tool, but now
we're already moving towards
combining...
It's a huge list sometimes, right?
If you're lagging 10 minor versions
behind, it's more than a
couple of years of fixes.
And you're like, oh, what's important
here?
Security, but maybe not only security.
So, yeah, we are moving towards
trying to summarize this and
understand particular context.
Of course, guess what helps?
LLM helps here.
Yeah, I think in the near
future we will have interesting
things here, like combining our
observations with the why-upgrade
diff into a summary of what matters
for you, almost part of an
executive summary that can be read
by a CTO, with links.
Yeah.
Michael: I hope you're right long
term, but I think at the moment,
the release notes are important
enough like the risk of missing
something very important is still
very high.
Like I think sometimes the LLMs
at the moment still, they can
be good for like idea generation,
but if you want an exhaustive
list of what's very important,
they can miss things or make things
up.
So I would still not rely on them
for things like reading, which
like even minor release notes can
include things that are very
important to do, like maybe a re-indexing
is needed and things
like that.
Nikolay: Yeah, I agree with you,
but I think you are criticizing
a different idea than I'm describing.
First of all, the health checks
we are doing, and I think will
still be doing in the next few
years, involve a lot of manual
steps for serious projects.
So they, of course, involve a DBA
who is looking at everything.
But an LLM can help here.
You usually have time pressure.
You need to do this work and a
lot of other stuff.
And a changelog with 200 items,
for example.
Of course you check them, but I
bet no DBA spends significant
time checking properly.
They say they check, but I don't
fully trust humans here either.
Michael: So sorry, I think including
them in a report is different
to reading them before upgrading.
That's a really good point.
Nikolay: Right, so what I'm trying
to say, we try to convince
clients that it's important to
upgrade, both minor and major.
And while explaining, we provide
the full list of changes,
why-upgrade is great for that.
But we also want to bring the most
sound arguments, right?
And an LLM helps to highlight them.
It was my long-term vision a few
years ago.
I thought: you take the why-upgrade
output, and it would be great to
understand, say, there is an item
about GiST indexes, or GIN indexes,
for example.
We have the schema here, we can check
if we have GIN indexes, right?
Does it affect us?
And if it does, this should be
highlighted.
This is the idea.
And LLM is good here.
It's like it just performs some
legwork here.
I mean, not only LLM, some automation
with LLM.
And it just helps DBAs not to miss
important things, you know?
But of course, again, I'm sure
it should be combined, like ideally
it should be the work from automation
and humans.
Okay, that's too much on versions.
Actually, it's usually the smallest
item, right?
We won't have time for everything
else at this rate. Okay, let's
speed up a little bit.
Michael: I think we've mentioned
a bunch of them already, haven't
we? Like, you've done...
Nikolay: Yeah, but right.
We had episodes separately, but
they are super important.
Settings go usually second.
Settings.
It's a huge topic.
But, from my experience: if I see
autovacuum settings at their
defaults, or checkpoint settings,
first of all max_wal_size,
in my opinion the most important
one, at its default of 1 gigabyte,
or logging settings at their defaults.
What else?
These kinds of things.
Michael: We've done a whole episode
on the default configuration.
Nikolay: Our favorite, random_page_cost being 4.
Recently we discussed it.
These are strong signs to me that
we need to pay attention here.
Even if I didn't see the database
yet, I already know that this
is a problem.
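One lightweight way to spot still-at-default settings, offered here as an illustrative sketch rather than the hosts' exact method:

```sql
-- Settings changed from compiled-in defaults. Anything important that
-- is NOT listed here (max_wal_size, random_page_cost, autovacuum_*,
-- logging settings) is still at its default and deserves review.
SELECT name, setting, unit, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```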
Michael: So this is really useful
for that first health check
you ever do.
Maybe you're a new DBA or new developer
on a team.
When you're doing a second health
check or subsequent ones, are
you diffing this list of settings
and seeing what's changed in
the last year?
Nikolay: Good question.
We have reports that can help us
to understand that, for example,
checkpoint needs reviewing, reconsideration,
checkpoint settings,
or autovacuum needs reconsideration.
So yeah, you're right.
I'm just coming here with like,
I first met this database and
reviewed it.
But yeah, second is easier usually,
but if it grows fast, it
can be challenging as well.
So sometimes we need to revisit.
Yeah, but settings, it's a huge
topic.
I just wanted to highlight the
easy, low-hanging fruit we usually
have.
Default is a low-hanging fruit.
For example, the autovacuum scale
factors: the defaults are like
10-20%, and for any OLTP database
I would say that's too high, way
too high.
So many things like that.
And logging settings, if I see
logging settings are default,
I guess people don't look at databases
properly yet and we need
to adjust them and so on.
But second time is easier usually,
I agree.
Second time you just revisit things
that change a lot.
So, bloat estimates: tables and
indexes.
Quite easy, right?
Very roughly, we usually don't want
anything to be higher than 40-50%.
Depends, of course, but this is
normal.
And converting to number of bytes
always helps.
Like, people say, oh, my database
is 1 terabyte, and I have like
200 gigabytes of table and index
bloat.
Wow.
Of course we have this asterisk
like comment, remark that it's
an estimate, it can be very very
wrong.
We discussed, we had an episode
about bloat, and autovacuum
as well, very related topic here.
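As a hedged aside, full bloat-estimate queries are long; a much lighter proxy (not the estimate itself, and subject to the same "can be very wrong" caveat) is the dead-tuple ratio:

```sql
-- Lightweight proxy, not the full estimate query: dead-tuple ratios per
-- table. Large tables approaching the 40-50% zone deserve a closer look.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```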
And then from table and index
bloat we go to index health, which
of course includes index bloat,
but also unused indexes, redundant
indexes, rarely used indexes,
there's a rarely-used report,
and also invalid indexes and
unindexed foreign keys.
What else?
Maybe that's it.
Michael: Are you checking overlapping
indexes?
Nikolay: Redundant, we call that
redundant.
So yeah, and we start talking about
index maintenance, and if it's
the first time, I usually explain
why it's so important to implement
index maintenance and why we should
understand that we do need
to rebuild indexes from time to
time.
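For two of the checks just listed, invalid and never-used indexes, a minimal sketch might look like this (assuming self-managed SQL access; constraint-backing indexes would need excluding before any drop):

```sql
-- Invalid indexes (e.g. from a failed CREATE INDEX CONCURRENTLY) plus
-- never-used ones. Verify replicas and statistics age before dropping.
SELECT indexrelid::regclass AS index_name, 'invalid' AS reason
FROM pg_index
WHERE NOT indisvalid
UNION ALL
SELECT indexrelid::regclass, 'never used'
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```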
Michael: Are you doing any corruption
checks?
Or I'm guessing maybe not in this
case.
Nikolay: Oh, that's a great question
actually.
I think we can include this, but
a proper check is quite heavy.
Michael: Yeah.
And one of the nice things about
this tool is it's like a lot of
these, it's been designed to be
lightweight in this case, right?
Nikolay: Yeah, it's very lightweight.
It was battle-tested in very heavily
loaded clusters.
And it limits itself with a statement
timeout of 30 seconds, or
even 15, I don't remember off the
top of my head, but if something
takes too long, we just lose some
reports to the statement timeout.
We don't want to be a problem.
Observer effect.
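The self-limiting idea amounts to something like this at the session level; the exact values here are illustrative, and the lock_timeout line is my own added precaution, not something stated in the episode:

```sql
-- The checkup session caps its own runtime so it can't become the problem.
SET statement_timeout = '30s';
SET lock_timeout = '100ms';  -- assumption: an extra observer-effect guard
```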
Michael: Yeah.
Nikolay: So, yeah.
And if we talk about indexes, okay,
index maintenance, quite
straightforward topic already,
quite well-studied.
And of course, we don't forget replicas,
and we also don't forget
about the age of statistics, because
if the statistics were reset yesterday,
we cannot draw proper conclusions,
right?
Maybe these indexes simply were
not used in the last 24 hours.
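Checking that statistics age is a one-liner, sketched here for illustration:

```sql
-- If stats_reset is recent, "unused index" conclusions are unreliable.
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```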
But okay, anyway, we had the index
maintenance episode.
Yes.
Michael: I have a couple of questions.
I have a couple of questions on
the cluster health.
So, replication, are you looking
at delay at all?
Or what are you looking at when
it comes to replication health?
Nikolay: That's a good question.
I doubt we have replication lags
being checked by this tool at
all.
It's usually present in monitoring.
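For completeness, lag is typically read from the primary along these lines; this is a generic monitoring sketch, not part of postgres-checkup:

```sql
-- Byte lag per standby, as seen from the primary.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```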
So as part of a broader health
check, we use not only the
postgres-checkup report, but we
also talk with the customer and
check what they have
in monitoring, and sometimes
we install our own secondary
monitoring, and gather at
least 1 day of observations and
include everything.
And also discussion matters.
This is probably something automation
won't have anytime soon:
live discussion, trying to understand
pain points, like what
bothers you, what issues came up
recently, what issues do you expect
to happen soon.
This live discussion at a high level
may help prioritize further
research.
So proper health check includes
both automated reports and these
kinds of...
So there we can check replication,
but usually if people don't
mention replication lags being
an issue, they have it in monitoring,
we just skip this, usually.
Sometimes they mention, for example,
a lot of WAL is generated
and we say, okay, let's go to pg_stat_statements
and check WAL
metrics, try to understand why
we generate so much WAL, optimize
some queries, enable WAL compression
if it's not enabled, and
so on and so on.
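That "go to pg_stat_statements and check WAL metrics" step can be sketched as a query like the following, assuming Postgres 13+ where the WAL columns exist:

```sql
-- Top WAL producers (pg_stat_statements, Postgres 13+).
SELECT queryid, calls, wal_bytes, wal_fpi,
       left(query, 60) AS query_head
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```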
Also, disk, let's just check what
kind of disks you have.
It's a big topic.
Michael: Yeah, the last question I
had was another thing that I imagine
is very important to check the
first time you want to know the
overall system health, or at least
setup health, but might be
tricky to automate or quickly run
a query for, is what's the
health of the backup process or
Nikolay: the risk?
You're raising questions.
Yeah, if you check the plan, which
is in the readme of the postgres-checkup,
this is included there,
and the checkbox is not checked,
of course.
Michael: Oh, okay.
Nice.
Nikolay: Yeah, we thought about
automating this, but we have
so many different situations, different
backup tools used, different
managed providers and so on.
Every time it's quite different.
We have a wide range of situations.
Michael: Yeah, I wasn't actually
thinking about postgres-checkup
at all, I was thinking more about
health check.
As part of a health check, you'd
probably want to check you can
restore.
Nikolay: Did we have an episode
about backups and disaster recovery
at all?
If not, let's just plan it.
Because it's not a topic for 1
minute, it's a huge topic and
probably...
Michael: Of course.
Nikolay: Yeah, maybe number 1 topic
for DBA, because it's the
worst nightmare if backups are
lost and data loss happened, right?
Obviously, we'll include this in
the consideration, unless it's a
managed service.
If it's a managed service, we have
people who are responsible for
it, right?
Well, it brings us to this topic,
managed versus self-managed.
Because, for example, for HA, since
not long ago both RDS and GCP
Cloud SQL allow you to initiate
a failover to test it.
So, you rely on them in terms of
HA, in terms of if 1 node is
down, a failover will happen.
But you can also test it, and this
is super important.
Backups, it's like a black box, we
cannot access files as we discussed
last time.
But you can test restoration, right?
And it's fully automated.
And maybe you should do it if you
have a serious system running
on RDS, because as I understand,
RDS itself doesn't do it for
you.
They rely on some global data,
as I understand.
They don't test every backup.
Michael: All I mean is if I was
a new DBA going into a new company
and I want, like 1 of my first
tasks was to do a health check,
this, that would be 1 of the things
that I would think of as
including as in that health check.
Maybe a new consultant would as
well.
Nikolay: DR and HA, two fundamental
two-letter terms, right?
You definitely, if you're starting
working with some database
seriously, you definitely need
to allocate a good amount of time
for both these topics.
Like good means at least a couple
of days for both to deeply
understand what...
For example, if we talk about backups,
you need to understand
RPO and RTO, real and desired, and
check whether there is a document
where they are defined, and procedures
to maintain these objectives.
These are super serious topics
and for any infrastructure engineer
I think it's quite clear, for any
SRE it should be quite clear
that these are two SLOs that you
need to keep in mind and always
revisit and maintain and have monitoring
that covers them and
so on.
But this is again, like this is
maybe...
In health check, we just need to
check and understand if there
is a problem here.
To study this topic is similar
to a lot of testing topics.
It's like a special doctor.
So yeah.
What else?
To wrap up, maybe a couple of final
things.
I like to check query analysis
in table form.
Table form is maybe quite exotic,
because people are used
to graphs, of course, historical
data, that's great.
But when I look at tabular data for
pg_stat_statements with all these
metrics and derived metrics, it
gives me a very good feel for
the workload.
For understanding that, it's great.
And also static analysis.
Michael: Ordered by what?
Always by total time?
Total time, I think, because we're
looking at system-level health.
Nikolay: Yeah, yeah, yeah.
Total time is default for us because
we talk about resources,
but if you aim to analyze the situation
for end-users, you need
to order by mean time, and sometimes
you need to order by different
metrics.
For example, as we discussed
earlier, the WAL metrics.
Sometimes temporary files, for
example.
If we generate a lot of temporary
files, we want to order by
temporary files.
In this case, sometimes we just
generate additional reports manually.
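The resource-centric default ordering described here can be sketched as follows (Postgres 13+ column names; swap the ORDER BY for the other metrics mentioned):

```sql
-- Top queries by total execution time (resource-centric view); reorder
-- by mean_exec_time, wal_bytes, or temp_blks_written as needed.
SELECT queryid, calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       temp_blks_written
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```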
Michael: Yeah, makes sense.
Nikolay: And actually that's it.
Not difficult, right?
Michael: No.
Do you see a lot of people doing
this in-house?
I know a lot of consultancies,
including yourselves, offer this
as a service.
Do you see the pros and cons of
those?
Nikolay: Well, usually in-house,
if you do it forever, you don't
see problems, or you don't prioritize
them well.
Like: I saw we have bloat, but
life goes on somehow, we are not
down.
I saw we have, for example, a 5
terabyte unpartitioned table, but
it's kind of fine, maybe, and so
on.
But when external people come,
at least temporarily, or for example
you hire a new DBA, DBRE, this
is a good point to revisit and
reprioritize things and avoid problems
in the future.
So that's why fresh blood, fresh
look is good here.
But usually people get a lot of
value during the first health
check, then smart people implement
periodical health checks.
And then just, I would say, in
1 or 2 years, if things change
significantly, this is when you
need to talk again.
Because if the health check was
full-fledged in the first year,
the next 1 or 2 years will be fine,
and your team will be handling
all the problems, solving them,
knowing what to do.
Yeah, nice one.
But honestly, give me any database,
I will find problems easily.
I mean, actual live production
database, not some synthetic.
Anyway, like, summary is don't
postpone this.
Perform health check, plan it and
do.
Check your health.
Michael: I had one last thing to
mention.
I met a few people at PG Day Paris,
so hi to everyone who's listening
from there.
And if you're watching on YouTube
and just listening to the audio,
we do also have an audio-only version
in the show notes.
So you don't have to waste your
precious bandwidth and data on
watching us.
And if you listened to the audio
and didn't know we had a YouTube,
then now you know that too.
Nikolay: Right, right.
We also have a transcript if you
like reading better and I think
it's quite good.
It's quite good.
Actually, guys who are listening,
please check transcript and
let us know if you think it's good
to read or it's hard to read.
I would like to know what you think
about the quality of transcripts.
I was thinking about improving
this and having some...
Not a book, but a kind of collection
of the discussions we've had over
almost the last 2 years, just as
a set of texts.
Because some people just like reading,
right?
So if people find the quality good,
I think it's worth investing
in organizing it better, and maybe
providing some more links, and
pictures maybe, and so on.
So this is the idea we discussed
some time ago.
I think maybe I will pursue it
soon.
But I need to know what people
think about the quality of recent
transcripts we have on PostgresFM.
Michael: What's the best way to
contact you for that?
Nikolay: YouTube comments, Twitter,
all Twitter, right?
Michael: Yeah, yeah.
Nikolay: I think, yeah.
Michael: Nice.
Nikolay: Good.
Michael: Yeah.
Thanks so much, Nikolay.
Nikolay: Thank you, Michael.
Michael: Catch you next week.
Bye.
Bye-bye.
See you next week.
Bye.