pg_flight_recorder
Michael: Hello and welcome to Postgres.FM,
a weekly show about
all things PostgreSQL.
I'm Michael, founder of pgMustard
and I'm joined as always by Nik,
founder of PostgresAI.
Hey Nik.
Nikolay: Hi Michael.
Michael: And we have a special
guest, David Ventimiglia, Solutions
Architect at Supabase and the
creator of pg_flight_recorder,
which we're talking about today.
Welcome, David.
David: Thank you so much.
Good morning, and I guess good
afternoon and good evening.
It's nice to meet you.
Michael: Yeah, I think we've got
all three bases covered.
Where would you like to start,
perhaps, with why another tool
in this area?
What's the origin story?
What's the motivation?
David: The motivation — so, as a
solutions architect at Supabase,
my job evidently is to help our
customers, and often they come
to us and say: I've got this
problem.
My database is slow, or this query
is behaving weirdly.
Please help us.
And then we try to bring whatever
tools we have to bear on the
subject.
And to be honest, often we're starting
out with no idea what
the problem is.
And our beautiful customers cannot
be relied upon to, you know,
relay all the information perfectly.
And I just needed more.
I was telling Nik this at one point:
some people say less is more.
I say more is more.
I needed more data and I just wasn't
getting it.
And I did the usual thing that
people would do these days.
Version 0 of this, maybe 4 or 5
months back, was a rush job
with our good friend Claude Code.
And I just cobbled something together
to get the data that I
needed for a particular customer.
And we got the data that we needed
and we were able to get through
that particular instance rather
swimmingly.
And I was pleased by the outcome,
and I thought, you know, let's
try to turn this into something
a little better, something a
little more real.
As a sidebar, I think most of us
are familiar with pg_wait_sampling,
which is an excellent extension,
but sadly is not available on all
managed Postgres services.
Even Supabase...
Nikolay: Cloud SQL
David: is the only one.
Cloud SQL — but even Supabase,
and Alexander Korotkov, who wrote
it, is on our staff, we don't have
that extension on the platform.
And there's, you know, there's
some resistance within managed
platforms to get new extensions
added.
I mean, there's a rich and vibrant
ecosystem of extensions, and
that's one of the strengths of PostgreSQL,
but a weakness is getting
those into all of the places where
you need them.
So I needed something
that was a poor person's substitute
for pg_wait_sampling.
That's really how it started out:
just a worse version of
pg_wait_sampling, written in SQL
and PL/pgSQL.
And then you know how these things
go.
It just took off from there.
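A minimal sketch of that starting point — hypothetical names, not David's actual code — is a poor person's wait-event sampler over pg_stat_activity:

```sql
-- A poor person's pg_wait_sampling: hypothetical names, not the
-- actual pg_flight_recorder schema.
CREATE TABLE IF NOT EXISTS wait_samples (
    sampled_at      timestamptz NOT NULL DEFAULT now(),
    pid             int,
    wait_event_type text,
    wait_event      text,
    query           text
);

CREATE OR REPLACE FUNCTION take_wait_sample() RETURNS void
LANGUAGE sql AS $$
    INSERT INTO wait_samples (pid, wait_event_type, wait_event, query)
    SELECT pid, wait_event_type, wait_event, left(query, 120)
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid();  -- don't sample the sampler itself
$$;
```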
Nikolay: Yeah.
There's a lot to unpack here, actually.
And I agree with you, starting
from the end of your intro.
I keep saying extensions are not
extending us anymore, because
in the reality of managed Postgres,
we are limited to only the set
of extensions that are present.
There is pg_tle, which we should
discuss separately — I think it's
a great idea; actually, you told
me about it.
There's a lot happening there too.
Let me tell my story.
I think every experienced Postgres
DBA at least once wrote this kind
of snapshot tool for pg_stat_statements
and other pg_stat views, because
they hold only cumulative statistics.
Some numbers are growing, that's
it.
We need persistent storage.
Usually for bigger clusters we
have monitoring.
But I remember I was working with
a really big company called
Chewy.
And I remember they had great monitoring
already.
Some clusters were on RDS already,
so they also had Performance
Insights.
Still, I wrote my snapshot tooling,
because I didn't fully trust
the monitoring tools.
I also wanted to verify, and some
details were missing because
they didn't capture all metrics.
I needed specific metrics and so
on.
Meanwhile, there are some projects
which implement this idea,
but they are extensions, so they
are not always available.
pg_profile, I think — there is such
a thing, right?
So that's why many, many DBAs,
maybe not all, but many DBAs at
least once wrote some snapshotting
tool.
And I remember our checkup tool
was also snapshotting.
The first versions of the checkup
tool were a shell script.
So we snapshotted it.
You have two snapshots; now you
need a diff.
And since we were in Bash, we didn't
want to do the diff there, so we
sent these snapshots back to the
observed Postgres to do the diff.
So I saw a lot of solutions which
try to make this data persistent,
take snapshots, and then show
everything without the need to
set up full-fledged monitoring.
And even if monitoring exists,
sometimes we still need an additional
lightweight solution.
That's one thing to warm us up on
why it's needed.
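The snapshot-and-diff pattern Nikolay describes, sketched with a made-up table name and assuming pg_stat_statements is installed:

```sql
-- Hypothetical snapshot table over the cumulative pg_stat_statements
-- counters; the diff of two snapshots gives per-interval deltas.
CREATE TABLE IF NOT EXISTS pgss_snapshots AS
    SELECT now() AS taken_at, queryid, calls, total_exec_time
    FROM pg_stat_statements
    WITH NO DATA;

-- Take a snapshot (run periodically):
INSERT INTO pgss_snapshots
SELECT now(), queryid, calls, total_exec_time FROM pg_stat_statements;

-- Diff the first and last snapshots:
SELECT b.queryid,
       b.calls - a.calls                     AS calls_delta,
       b.total_exec_time - a.total_exec_time AS exec_ms_delta
FROM pgss_snapshots a
JOIN pgss_snapshots b USING (queryid)
WHERE a.taken_at = (SELECT min(taken_at) FROM pgss_snapshots)
  AND b.taken_at = (SELECT max(taken_at) FROM pgss_snapshots)
ORDER BY exec_ms_delta DESC
LIMIT 10;
```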
Another thing is that I guess Supabase
has a lot of clusters.
Some of them are,
David: we have a few.
Nikolay: Yeah.
Yeah.
Like millions, right?
It's quite unique, interesting
story.
And many of them are small and
you cannot justify full-fledged
monitoring, right?
David: And getting smaller, yeah.
Nikolay: Getting smaller, yeah,
because people just experiment
so much, right?
And they need so many databases.
David: Yeah, we are now, I mean,
we have a skewed distribution,
of course.
We have a few giant customers,
but we have lots of medium-sized
customers and many more small ones
and millions of tiny ones.
And now with the AI builders, we
are shoveling millions of nano
instances into the AI builder furnaces.
Like where are these databases
going?
What's the long-term prognosis
for these tiny databases?
Who knows, who cares?
That's a completely different animal.
But yes, it's a vibrant ecological
niche that we've developed here.
Michael: Yeah.
On that note, who's this tool for?
Of that distribution, are there
some for whom it's not appropriate
and some for whom it's ideal?
Where does it fit?
David: Yeah, that's a great question
Michael.
Again, as Nik indicated — you know,
time is a flat circle — all
these things have happened before
and will happen again.
You know, versions of this have
been written before and versions
of this will be written in the
future.
There's nothing really that profound
about this, but this is
the tool that I needed right now
at this time for the reasons
that Nik just described.
Who this tool would be for, I think,
would be startups, SMBs, builders —
sort of the canonical Supabase
customer.
Those who are starting out, you
know, building a business, building
a backend, building a project,
they need PostgreSQL and off they
go.
You know, we at Supabase — I don't
think it's any big surprise —
we don't really have that many
migrations onto Supabase.
I mean, we would love to have more;
anybody who's willing to bring
giant workloads over to Supabase,
come on over.
But we know that databases are
infamously sticky tools anyway.
People don't really migrate that
often.
And they're probably less likely
to migrate giant workloads over
from Oracle or Microsoft SQL Server
to Supabase, although we
are entertaining that option.
But if we did do those things,
those folks would probably come
over with DBAs, database experience,
database expertise.
So our sort of customer portfolio
doesn't really reflect that.
Even our largest customers, I would
say, may have 3 or 4 or 5 years
of experience with PostgreSQL now,
by dint of hard effort, but they
all started out small.
Every one of our large customers
was a little acorn that grew into
a giant oak.
And we try to make Supabase easy,
and we do.
It's certainly easy to get into.
I've used this analogy too many
times, but it's like the car
dealership.
You can drive it off the lot in
5 minutes, but actually operating
it, especially at scale, is something
different.
And — Nik knows this, Michael,
you know this — we would all
benefit from more and better
automation, and it's coming.
It's coming from within the community
and it's coming from Supabase.
We will be able to help these customers
more seamlessly and operate
their databases in the future.
But right now what we need is tooling
to help customers as they
grow and as they scale.
So in a nutshell, like who's this
for?
People who are not DBAs, not database
experts.
They just want to run a business.
They want to grow that business
and they want some tools to help
them.
That's it.
Nikolay: Some of them start with
some very small database instance,
paying $25 or some very low number
of bucks, right?
And it's hard to justify paying
$150 or $400 or $500 right away
for a full-fledged monitoring solution.
And then you need to spend time
there and so on.
It cannot be justified easily.
And also I wanted to mention, it's
quite elastic.
So if you just inject this tool
inside your Postgres database,
it starts collecting inside —
self-observed.
And you pay a little bit for those
megabytes per day, I don't know.
Since I helped rewrite its storage,
it's quite efficient.
I used this approach again for
PgQ — rotation of partitions and
truncates — so it's very efficient
and so on.
I'm just saying: you pay a little
bit, and it's self-observed, right?
And when I was thinking about the
pros and cons of self-observed
versus externally observed, ideally
we need both, actually, because
you cannot install an agent right
on an RDS or Supabase machine.
So if you observe it from outside
with an external monitoring tool
and something bad happens, maybe
you don't have connectivity, right?
While this thing, sitting inside,
still keeps observing, right?
David: That's right.
Nikolay: At the same time, if everything
is down, you don't see,
you cannot reach the data, right?
So like external tools also have
benefits.
They have both pros and cons if
you think about it.
It's interesting.
So my realization is that even bigger
clusters should maybe have
a small black box, a flight recorder,
right?
While we have a full-fledged solution
outside — both remote telemetry
and something internal, right?
David: Yeah, that's right.
And I think I landed on the name
pg_flight_recorder.
And then at some point I had some
reservations, because I thought,
sure, but in the event of a crash,
maybe the data aren't available
and it's not really that useful.
But then, I mean, if an actual
airplane crashes, that airplane
also is not really useful either.
That airplane is dead.
No one...
Nikolay: ...will be using it.
Just a side note, I just learned
David has a PhD in astrophysics,
so this name is not a random thing,
I guess, right?
David: And a master's in aerospace
engineering.
But at every turn, I was trying
to do something else.
And I was trying to get away from
computers.
And it just, I just kept getting
sucked back in.
But I grew up in the 70s, when it
seemed like airplanes were crashing
all the time, when they weren't
being hijacked.
Mercifully, that doesn't really
seem to happen all that often.
I'm not a pilot, but it's my
understanding that actual flight
recorders are useful far beyond
crash investigation.
They're useful for optimization,
for troubleshooting in-flight
incidents.
And so I think the nature of this
is hopefully a little bit more
like that.
Nik, you would know better than
I would, but I have the feeling
that in reality, databases don't
really actually crash all that
often.
What they do is they exhibit behavior,
and we want to be able
to investigate that behavior.
And that's what this helps us do.
Briefly, about the tool itself:
again, all it really does is
take snapshots of wait events.
That's how it started.
It's like pg_ash.
Nikolay: We developed it in parallel,
actually.
Yeah.
So when I told David that there
should be something small which
self-observes, David just sent
me the link — it was already done.
It was interesting that we had
parallel courses of development
of pg_ash and pg_flight_recorder.
They're very similar in this case.
David: Yeah, very similar, in that
it captures active session
history, wait events — and then,
of course, nature abhors a vacuum,
and idle hands are the devil's
workshop.
You know, with the tools available,
I couldn't resist the urge
to just keep pouring more into
it.
So it records lock activity and
checkpointer activity and background
activity and IO stats and statement
stats and...
Nikolay: Config changes, right?
David: Config changes as well.
And the conjecture is that there's
some value, if not in capturing
everything, then in having an
opinionated, curated set of many
things that are captured simultaneously,
in a correlated fashion — so that
maybe you experience a checkpoint
storm, and then you notice that
there has been a config change
recently, and you're able to bring
these things together.
So that's the idea.
As Nik indicated, when we talked
about this, version 0 was done.
Again, it's a pretty simple tool.
I had a few guiding principles,
one of which was sort of the
Hippocratic oath: try to do no harm.
I put a lot of effort into making
this safe to run.
Nikolay: Statement timeouts and
so on, right?
David: Yeah — statement timeouts,
circuit breakers, graceful degradation
of some of the components, dozens
(too many) configuration settings,
but then configuration profiles
that capture those to make it
easy to use.
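One such guardrail, as a hedged sketch (the sampler name is the hypothetical one from earlier): set tight timeouts in the same session right before each sampling statement, so the recorder is cancelled rather than left queuing behind a lock:

```sql
-- Bound every sampling statement; if it can't run quickly, cancel it
-- instead of letting the recorder itself become the problem.
SET statement_timeout = '250ms';
SET lock_timeout = '50ms';
SELECT take_wait_sample();  -- hypothetical sampler from earlier
```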
I think I got three quarters of
the way there, or maybe 50% of
the way there, but there were still
some improvements to be made,
among which the storage engine,
which we can thank Nik for rewriting
using PgQ or essentially the engine
that is part of PgQ.
Correct, Nik?
Nikolay: Yeah — partitions, rotation,
I think daily partitions.
There is also a roll-up of all
data, to keep it less precise:
not raw, but aggregated.
Everything is already implemented.
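A rough sketch of the truncate-based rotation idea — hypothetical schema, not PgQ's actual one: partition by a repeating slot such as day of week, and truncate a slot before reusing it, so old data disappears without DELETEs or bloat:

```sql
-- Seven reusable slots keyed by day of week; truncating a slot before
-- reuse drops old data instantly, with no DELETEs and no bloat.
CREATE TABLE samples (
    sampled_at timestamptz NOT NULL,
    dow        int NOT NULL DEFAULT extract(dow FROM now())::int,
    payload    text
) PARTITION BY LIST (dow);

CREATE TABLE samples_p0 PARTITION OF samples FOR VALUES IN (0);
-- ... samples_p1 through samples_p6 likewise ...

-- On rollover, before writing the first sample of the reused day:
TRUNCATE samples_p0;
```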
And I remember I was brainstorming
with Claude Code about what kind
of storage we should choose, because
I think originally you used a lot
of JSON, right?
It's quite bloated, in my opinion,
sometimes.
David: Well, it wasn't JSON, but
originally I was using SKIP LOCKED
and unlogged tables, in a vain
attempt to mitigate dead tuples
and bloat.
But if you're not diligent, they're
still there.
And so that's why you had to rewrite
the engine.
Nikolay: Also, the data format
is interesting.
I had multiple ideas, and I have
a brainstorming document where
I was thinking about what to choose.
Some ideas compressed the data
quite a lot, but they were hard
to deal with, because the data was
encoded so much that it became
inconvenient.
So I made a trade-off choice: it
should be human-readable even in
raw form, although I did apply
some tricks from pg_ash as well —
for example, timestamps are relative
to, I think, 2020 or something.
Like a Unix timestamp, but shifted,
so we have capacity until the
end of the century.
And you waste as few bytes as
possible — a very compact format.
Plus this PgQ-style rotation.
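The epoch-shifting trick, sketched: store seconds since a custom 2020 epoch in a 4-byte integer instead of an 8-byte timestamptz; a signed int4 of seconds from 2020 only overflows around 2088:

```sql
-- Encode: seconds since a custom 2020 epoch fit in a 4-byte int.
SELECT (extract(epoch FROM now())
        - extract(epoch FROM timestamptz '2020-01-01 00:00:00+00'))::int
       AS secs_since_2020;

-- Decode: reverse the shift.
SELECT timestamptz '2020-01-01 00:00:00+00'
       + make_interval(secs => 123456789) AS decoded_ts;
```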
And also worth mentioning: there
is a soft requirement, pg_cron.
It's not a hard requirement, but
it's highly recommended, because
this is how it ticks, right?
David: That's correct.
So again, it's a simple tool.
It's two sort-of packages — two
simple install scripts, two schemas —
one of which is required and one
of which is optional.
The required part is the data model:
the tables, the views, and the
functions to record those data.
And then the other optional piece
is a set of functions for analyzing
those data.
But again, they could be analyzed
in raw form in whatever way
you like.
But as Nik indicated, somebody
has got to generate the ticks.
Somebody has got to force the samples,
and that could be pg_cron, it
could be an outside scheduler —
somebody's got to do it.
The sort of default way is with
pg_cron.
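The default wiring, as a hedged example (the sampler function is the hypothetical one above); pg_cron 1.5+ accepts second-level intervals like '1 second' alongside standard cron syntax:

```sql
-- Schedule the tick; job name and sampler are hypothetical.
SELECT cron.schedule(
    'flight-recorder-tick',
    '1 second',
    $$SELECT take_wait_sample()$$
);
```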
Nikolay: And pg_cron is available
everywhere, right?
David: Yeah, pg_cron sort of snuck
in before the sort of iron
curtain started to drop on extensions,
maybe.
So it's in a lot of places at least,
you know.
Michael: Yeah, maybe that's true.
I got the impression it solved
such a useful problem and was
from such a reputable author that
people trusted it, and also thought
it was simple enough that they
could maintain it.
I understand why managed service
providers don't just offer any
extension, but if you think about
how much work it would be to
maintain pg_cron if the author
ditched it — actually, it's not
huge, and it's...
Nikolay: ...so useful.
And it should be in core.
David: There's all of that.
That is true.
Nikolay: I wish pg_cron was in core.
And we would have, for example,
automated new partition creation
out of the box without any extensions.
Magic.
Such a simple thing, actually, right?
Michael: One thing about pg_cron,
from a database perspective: I
think once per second is the finest
you can schedule.
So what do you use, David, when
you're using this with people?
Do you use pg_cron with a one-second
tick, or do you suggest
something else?
David: So far, I have used it just
with pg_cron.
That resolution has been sufficient
so far, because with the customers
I've worked with, the resolution
before was, I guess, infinite, in
that they didn't have this at all.
So it's just worth having the data.
And a finer resolution, I haven't
encountered a demand for that
or a need for that yet, although
certainly plausible.
But again, yeah, that has worked
well so far.
And again, this is just a simple
tool.
The idea, the objective anyway,
is sort of set-it-and-forget-it.
Install the tool, then forget that
you've installed the tool.
Because, by virtue of Nik's hard
effort, it is safe to run.
So just forget that it's there.
And then you have an incident,
and then you think, oh wait, I
have pg_flight_recorder.
Let's find out what happened.
Nikolay: And just point your AI
at the data — maybe a dump of
that data or something — and that's
it.
And the second package you mentioned
has interesting functions,
like what_happened_at or something,
right?
It's based on function names.
I already see your thought: oh,
AI should guess, right?
You designed it for that, so it's
self-explanatory, right?
So it's great.
But I also wanted to ask about
pg_cron a little bit.
Since version 1.5, I remember, the
lowest interval is once per second,
and I use that for pg_ash; but I
guess you use it by default at a
much lower frequency, especially
for ASH data, right?
Maybe once per 30 or 60 seconds —
but it's tunable, right?
David: Everything is tunable.
You know, default sample collection
is, I think I have it set
to once per second, but There are
aggregates that are taken at
a coarser resolution.
There are roll-ups that are taken
at a coarser resolution.
Data are archived at a coarser
resolution.
Then there's retention for the
core tables, as I think that my
default is 7 days for the aggregates
of the 7 days.
And then for, for snapshots, I
think it's by default 30 days,
but all of these are configurable.
So there are a few different cadences
happening.
And again, as you indicated, on
the analyze side there is a wall
of functions, appropriately named,
meant to be understandable by
humans and AI alike, so that they
can use these functions to analyze
the data.
But then, of course, it's always
available to be analyzed in
raw form as well.
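Illustrative call shapes only — guessed from the function names mentioned in the conversation, not a confirmed API:

```sql
-- Hypothetical call shapes, inferred from the names above:
SELECT * FROM flight_recorder.what_happened_at('2025-06-01 14:03+00');
SELECT * FROM flight_recorder.incident_timeline(
    '2025-06-01 13:30+00', '2025-06-01 14:30+00');
```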
Nikolay: Yeah, I can share an
interesting story from PgQ about
function names.
When I was recently developing
client libraries for PgQ, Claude
Code made the same mistake multiple
times, because there is a function
force_tick — but it doesn't tick;
it just shifts the pointer.
Then you need to run the ticker
in a separate transaction.
And Claude couldn't get it, because
it's actually a confusing name.
And I had a huge flashback to
15-plus years ago, when I made
the same mistake manually, without
AI, because it was just as confusing
to me back then.
So I renamed that function to
force_next_tick.
Then you immediately understand
this is about the next tick:
you're not ticking, you're just
preparing that job.
And looking at your functions —
what_happened_at, incident_timeline —
I'm just thinking, this is self-explanatory.
Maybe long, but everyone will
understand what it's for.
So it's worth it.
David: Yeah.
Yeah.
That's right.
And if there are too many functions,
then, again, we can use
AI to paw our way through and figure
out which ones to use.
Just a final thought.
It's meant for a few things — not
just incident response, but
also capacity planning, blast-radius
evaluation.
You know, where I intend to go
with this — just getting back to
Supabase briefly: I have lots of
customers that I have to help, I
should say I'm blessed with helping,
and so many of them I want to
get to early.
It's, you know, again we talked
about this, but all of these
databases are small.
They start out small anyway.
I would say that from a certain
point of view — the point of view
of scale — many of them are sort
of doing things wrong, but that's
okay, because they're small.
With a small database, you can
do everything wrong and it's fine,
no problems.
But it's when you start to scale
that you need to think about
this.
So it's like exercise.
It's something you have to get
into the habit of doing early
and often, even though you don't
want to, and maybe you would
benefit from a personal trainer
and some encouragement to get
you on the path early, so that it
pays dividends when you're old,
like I am — and when you're a big
database, like some of these
eventually will become.
Nikolay: I also wanted to mention:
by default, it's consuming
up to a couple of gigabytes for
those 7 days, right?
But again, it's tunable if you
want less.
David: Yeah, it's tunable.
It's a few gigabytes.
Nik, I think you and I benchmarked
this.
With the new storage engine, I
think we estimate, on the happy
path, maybe around 20 gigabytes
for the month.
Nikolay: It depends.
David: One of my goals is to draw
more attention to this so that
I can get feedback and improve it.
And there is some low-hanging fruit
to be plucked insofar as data
retention goes.
Nikolay: I think it depends on
how many queries you have in
pg_stat_statements — by default,
up to 5,000.
And also, you collect data about
indexes and tables, so it depends
on how many tables and indexes
you have.
I think you have limits there,
but still, it depends a lot on
the cardinality of these things.
About use cases: I used it recently
for benchmarking PgQ.
For me it's so natural — I've
already had multiple projects
like this.
I injected both pg_ash and pg_flight_recorder,
because pg_ash has higher frequency
and more detail in the ASH data,
while pg_flight_recorder brings
a lot of other stuff, right?
So I just injected it into a
synthetic database, provisioned
with pg_cron configured.
And I just asked AI, of course,
to do it: inject it, then don't
forget to dump after each run,
and then visualize it.
That's it — only three sentences.
David: Yeah, exactly.
Nikolay: And this is how I created
beautiful-looking visualizations.
I actually asked it to animate
the benchmarks, because it's great
to watch how the lines move.
And this is what brought PgQ good
attention, because this data is
easy to understand, right?
So, for example, how much WAL was
generated.
Lots of stuff: what was the behavior
of the checkpointer, or autovacuum,
and so on?
Michael: That was going to be my
next question, actually, on the
WAL front.
You mentioned a few minutes ago
how you originally went with
unlogged tables.
Does that mean these are now logged
and there is...
Nikolay: ...WAL generated?
Yeah, it was my decision.
First of all, an important limitation
of all those tools which tick on
pg_cron, write something, and are
PL/pgSQL only: they're primary-only.
But we live in this situation —
strange for me, an old DBA — where
a lot of clusters are single-node.
I even have cases where clients
come with 10 or 15 terabytes
on a single node, and they are fine.
Cloud resources became quite
reliable.
For some, I actually think it's
okay, because backups matter more
than HA: they are fine with some
downtime, but not with paying for
an additional couple of nodes and
so on.
So anyway, this is primary-only,
because we cannot write to...
Michael: Can I add something to that?
Because you say single node, but
I think that's slightly simplistic
because a lot of ones I see, they're
HA, but the replicas are
not read replicas, they're like failover
replicas.
Nikolay: Shadow standby node.
Michael: I wouldn't call that a
one-node cluster, but still,
you only need to...
Nikolay: ...monitor the primary.
Yeah, you're right actually, you're
right.
But in this case you are not
interested in it, because you don't
have any workload on that standby —
a hidden standby, right?
Michael: Exactly, exactly.
Nikolay: So you are interested
in the primary, and we see so many
projects reaching dozens of terabytes
already which are single-node.
You inject it, and we need to
understand: okay, it's self-recording,
so it's going to produce some writes.
If it's an unlogged table and it
crashes, the data is gone.
That's the key idea.
We cannot afford...
David: At least where the data
lands initially, those data would
be gone, right?
Back when they were unlogged tables.
Nikolay: You'd need to snapshot it;
you'd need external means.
So to understand an incident after
a crash, we should use regular
tables.
And now that we've redesigned the
storage, it's not super expensive.
Of course, there is some WAL to
be written and some data storage
to be paid for.
And of course, a little bit of
shared buffers is occupied by our
data.
If you have replicas, it goes to
the replicas too.
Maybe that's not a bad thing: if
it's for the health of the primary,
we can pay this price.
Michael: What are we roughly talking
about?
You mentioned a few gigabytes, up
to maybe 20 gigabytes — that
kind of amount for storage.
What are we talking about in terms
of WAL generation by default,
just to give people a rough idea?
Nikolay: I don't remember.
I thought about baby clusters once —
like 1 gigabyte; the free tier
is up to 1 gigabyte, right?
So I thought they should be able
to afford this, maybe tuned a
little to a lower frequency or
something retention-wise.
But it's...
Michael: Storage now, or WAL?
Nikolay: Both, both.
Michael: Okay.
Nikolay: They are connected, actually.
If you need to write...
Michael: You mean on Supabase?
Nikolay: Anywhere — any Postgres.
If you need to write 100 megabytes
to storage, you will — very roughly —
produce close to 100 megabytes
of WAL, because it's the same data.
Yes, in a different form, but it's
the same data, right?
If you need to write 100 times
more, expect roughly 100 times
more WAL.
David: Yeah, order of magnitude,
it would be about the same.
Nikolay: Yeah, very roughly.
Of course, full-page writes,
compression — it can vary quite
a bit.
Michael: I also think of them very
differently, because WAL I think
of as megabytes per second, always —
always with a time component,
if that makes sense.
So it's a constant amount that
we're generating; over a month or
whatever it is, that's a few
gigabytes.
But I guess that doesn't actually
add up to very much per second.
Nikolay: I think we should expect
something like 100 to a few
hundred megabytes per day, with
a lot of queries and indexes
and so on.
Yeah.
David: I mean, if you like ballpark
math: if it were 30 gigabytes
of data per month, that would be
roughly, by the power of arithmetic,
a gigabyte per day.
Nikolay: It's very stable.
David: And dividing by the 86,400
seconds in a day, you could figure
out the per-second rate of WAL
generation.
Nikolay: Kilobytes already, maybe,
right?
And it's very stable, because it
depends only on this cardinality.
If you have spikes of workload,
it doesn't affect the amount of
data these snapshots write to WAL
and the data directory, right?
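(Worked through: 30 GB a month is roughly 1 GB a day, and 1 GB spread over the 86,400 seconds in a day comes to roughly 12 KB per second of WAL on average.)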
David: It's a baseline.
So just, you know, you're paying
maybe $25 to Supabase, maybe
pay $26 and just pay for a little
bit more storage.
Nikolay: Not hundreds more, as
you would pay if you installed
full-fledged monitoring.
David: And it's, to first order,
only for the primary node.
Again, all of these small projects
are starting out with only
one node anyway.
I mean, life is complicated, and
these people are just trying
to get a business off the ground
and a job done.
They're not thinking about multiple
nodes, especially early on, but
the three of us know that they
will need data to guide them on
their journey.
So this is just part of that.
I mean, Michael, you work in the
observability space.
If we dilate this out to a wider
view, this is just another entry
in the observability space — maybe
a new generation 0.5 of observability
tools for PostgreSQL.
And there's pg_ash, and there's...
Nikolay: So it's not only about
observability.
Actually, "new kind of breed" is
the phrase you used, I think,
in our discussions.
So I have pg_ash, you have pg_flight_recorder.
I'm also trying to revive PgQ in
this very format: pg_cron and PL/pgSQL
only.
That's it.
So it can be installed anywhere
and just tick.
I also have Lindex, not yet released,
which rebuilds indexes on pg_cron.
That's it: you inject PL/pgSQL
and tick on pg_cron.
It's super simple.
I'm already thinking about a tool
for automated partition creation
without heavy tools.
I don't know — it should be easy
to use.
But then you mentioned pg_tle.
David: Yes.
Nikolay: So can you maybe elaborate
a little bit why pg_tle?
Why not just a single SQL or
PL/pgSQL file?
David: It's both.
So, for those who don't know, TLE
is trusted language extensions —
I have my own view on that.
I regard it as just a little bit
of extra housekeeping associated
with a simple SQL install file.
TLEs sort of dress up SQL and
PL/pgSQL code as if it were a kind
of managed extension, but they can
be installed without superuser
privileges.
pg_flight_recorder offers both:
simple install scripts — you can
use psql to install it — but it's
also available as a trusted language
extension that can be installed
through, I think, dbdev, because
that's how some people want to be
able to install things.
Nikolay: Just to track like metadata.
David: Yeah — it makes housekeeping
a little bit easier.
You can slide pg_flight_recorder
in with an install and if you
don't like it, you can uninstall
it in a very managed fashion.
So that's all that's meant there.
But yeah, TLE is a very lightweight
way to have managed extensions,
and Flight Recorder offers that
as well.
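The TLE route, sketched: register the install SQL with pg_tle, then create it like any extension. pgtle.install_extension is pg_tle's real API; the version string and body below are placeholders:

```sql
-- Register the extension body with pg_tle, then create it normally.
SELECT pgtle.install_extension(
    'pg_flight_recorder',
    '0.1.0',
    'in-database flight recorder',
$_pgtle_$
    -- the extension's install SQL goes here
$_pgtle_$
);
CREATE EXTENSION pg_flight_recorder;
```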
Nikolay: Yeah, I actually wish
pg_cron and TLE were both
inside Postgres itself.
And we would say something like
CREATE PACKAGE or something,
I don't know.
And it's just a bunch of SQL and
PL/pgSQL — just glue code, or maybe
PL/Python if you want, anything —
and it can just be installed
anywhere, with versioning and so
on, with tracking of CVEs, if
there are any, and so on.
Who knows.
Yeah.
But extensions just don't feel
like part of the extensibility
of Postgres to me anymore.
This is my honest feeling lately.
David: Well, what also worries
me is: with major version upgrades,
let alone minor versions, who's
testing all of these extensions?
Nikolay: But this question is also
applicable to any regular
backend code.
You use some libraries; you just
import them into your code, include
them somehow, and that's it.
And versions also matter there,
and it's on your shoulders.
It should be on your shoulders.
This idea of "we're not providing
some extensions because we would
need to maintain them" — give it
to the shoulders of the users.
That's a different part of the
thing.
David: That's in tension with
products which telegraph, or
advertise, that they're easy to
use: you don't need to worry, we
will handle it for you.
Nikolay: Yeah.
But I think it's great to have
flexibility, so people can choose...
But they need to be responsible
for upgrades and part of the
maintenance, as they already are
for libraries in Go — any language,
right?
TypeScript and so on.
So there's something here.
And I think it was the AWS folks
who created pg_tle, right?
So definitely this project was
created with the realization that
something is limiting people here —
let's bring something.
That's a great idea.
And I wish... I mean...
David: Michael, do you experience
that at all with your customers —
their challenges?
I know you work in a slightly
different space, but you must
certainly encounter this as well:
tensions with extensions, with
managed database providers.
Michael: Yeah — I don't speak to
people all the time about this
kind of thing, but I get the
impression that people are looking
for a little bit of advice, almost,
from their managed service provider
on which extensions they should
trust, which are the best at what
they do.
Often there's a choice of two or
three, and they kind of want their
managed service provider to pick
one and say: this is the one we
suggest, or this is the one we
support.
I feel like there's a little bit
of that going on as well.
I don't know if it's king-making
or something, but people saying:
this is the one everyone is using.
When people come to Postgres for
the first time, they ask: which
backup tool should I use, which
monitoring tool, what's everyone
else using?
And there's no official answer;
there's barely any kind of extension
management system.
There have been about three or
four created, and I think there
is still — is it PGXN? — that's
probably the most used.
But it doesn't have reviews; it
doesn't have a lot of the things
people are looking for: which ones
are actually used, which ones
people actually like, which ones
have a good track record when it
comes to major versions, or maybe
no CVEs or very few — that kind
of thing.
So I think trust is a big part of
it, and also people want a shortcut
to which of these they should
be using.
Nikolay: I actually agree.
Supabase's database.dev is another
attempt to have this registry
of extensions, right?
David: It is yet another attempt —
one that seems to be somewhat
honored in the breach — but it's
yet another attempt.
Michael, I definitely take your
point that there is a need, and
I would say a growing need, for,
if not king-making, at least someone
to offer guidance and to bless
these things.
You would know as well as I would:
the sort of persona of database
operators definitely does seem
to be changing.
I mean, there once was a time when
databases were an arena for
people to sort of develop and then
project expertise, which is
certainly true.
But more and more, I encounter
people — customers, Supabase
users — who are very candid.
They will say to me or to us, I
don't know what I'm doing.
I'm not a DBA.
Some of them will say, I'm not
even a tech.
I'm not even technical.
I'm a founder.
And I just vibe coded my way into
this.
And, you know, there's less of
an urge now than there was in
the past to sort of burnish your
credentials as a database expert.
People are, quite happily, very
candid, and they will say:
I am not a database expert at all,
so please, can you help us?
Can you offer guidance?
If you tell me what extension to
install, I'll install it.
So there's growing need for those
kinds of tools and for greater
and better automation.
That can be a topic for another
session.
That's where my mind is.
Nikolay: And I actually agree about
this kind of authority — what's
good, what's reliable.
Sometimes I have cases where we
have huge self-managed Postgres
clusters, and when I'm saying we
should add some extension, I say:
it's available on this managed
platform, so it's reliable — you
know, let's add it.
It helps me speed things up.
So, yeah.
Michael: I had a couple of last
things it'd be great to get
your thoughts on.
One is: is there anything we haven't
talked about that's one of your
favorite features of the tool?
I've seen there are quite a few
in there.
And then also: is there anything
missing that you really want
to add?
David: In reverse order.
Things that I want to add: I would
really like to fortify this against
observer effects, against deleterious
consequences.
I already know that there are some
important fixes to be made
and I'm committed to doing that.
My tender hope is that people will
use this to some degree, so
that I can get feedback.
If it needs to be strengthened,
I will strengthen it.
I will pour effort into it to make
sure that happens.
So I wouldn't say there are features
that I want to add.
It's probably already bloated in
terms of features anyway.
So maybe I'll just stop in terms
of adding features.
In terms of favorite features:
capacity planning, Nik, you know,
is another worthwhile endeavor
besides incident management.
And it's something that is sorely
needed for Supabase customers.
There are functions within Flight
Recorder to help you project
your capacity needs.
Like everything else, those functions
will be strengthened and
improved, but I really would like
to exercise those.
I'd rather this tool be used to
forestall problems than to
investigate them.
Let's just not have problems at
all.
Nikolay: You touched on several
things.
I wish we had a separate episode
on each — actually, how we would
approach incident response and
RCA with this tool, step by step
in particular; it's possible.
And also capacity planning, I agree.
And one thing related to observer
effect: everyone who is using
this new breed of tools ticking
on pg_cron should remember
that pg_cron records logs, right?
And you need to clean them up.
And I think we need to team up
and bring some pull requests to
make this configurable and make
it the Unix way, the Linux way:
when everything is cool, don't say
anything and don't log anything,
because it works, right?
Like some levels, right — a warning
or error level for each job
in pg_cron.
And another thing is verbosity:
pg_cron's logging also depends on
the tick frequency, and it can
produce a lot of bloated logs
right in Postgres as well.
Especially if we go to sub-second
frequency, which was just implemented
for PgQ — running a single stored
procedure, ticking 10 times within
one second.
Yeah, I don't know if that's needed
for pg_flight_recorder; maybe
not at this point.
It's too much precision, right?
Yeah, anyway.
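The cleanup Nikolay refers to: pg_cron keeps one row per run in cron.job_run_details, so a frequent tick needs pruning — for example, with one more (daily) cron job:

```sql
-- Prune pg_cron's run history daily, keeping two days.
SELECT cron.schedule(
    'prune-cron-history',
    '0 3 * * *',
    $$DELETE FROM cron.job_run_details
      WHERE end_time < now() - interval '2 days'$$
);
```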
David: Anyway, yes.
So there is a soft dependency on
pg_cron.
pg_cron is also maybe a little overly
chatty.
My kitchen wall clock just silently
ticks away.
It doesn't generate a daily journal
of the fact that it ticked.
That's what I would like.
This is what we have to live with.
So yeah.
Nikolay: Let's create pull requests.
I can create them, or you can,
and we'll just support each other.
Maybe the pg_cron maintainers
will agree that it's...
Actually, I opened an issue.
I didn't see feedback from them,
but it's definitely an issue.
It was an issue before we started
creating these tools.
I have other places where pg_cron
was chatty in the logs.
Yeah, anyway.
And a final thing for me: a good
place to start is benchmarking.
If you do benchmarks, just inject
these tools — pg_flight_recorder —
and ask your AI to visualize the
resulting data.
Before destroying the instance or
the database, just dump that
data and visualize it.
It's so easy these days.
David: It is.
Yeah.
The nature of these tools is changing
and this makes it super
easy.
That's what I learned as well.
But yeah, that's it.
It's not very profound, it's not
very complicated, but we need
more and better automation in
the community.
This is just one small contribution.
Many more to come, I think, from
you two — and hopefully me and
others in the community.
So looking forward to that.
Michael: Nice one, David.
And just to check, what's the license
here?
David: It's as generous as I could
make it.
Michael: Nice.
So, open and permissive.
David: Open, yeah, exactly.
Michael: Supabase style.
David: Supabase style, it is.
I work at Supabase, but this belongs
to, if anybody, to the
community.
Michael: Wonderful.
Lovely to meet you.
David: You as well.
Very kind.
I appreciate it.
Nikolay: Thank you.