RegreSQL
Michael: Hello and welcome to PostgresFM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard, and I'm joined as usual by
Nik, founder of PostgresAI.
Hey, Nik.
Nikolay: Hi, Michael.
Michael: And we also have Radim Marek with us, who is a consultant
and Postgres enthusiast, whose BoringSQL site we have mentioned
many times on the show.
Welcome, Radim.
Radim: Hello guys.
Thank you for having me.
Nikolay: Thank you for coming.
Michael: Yeah, absolutely.
So we wanted to talk about a newish project you have kicked off
called RegreSQL, which is about regression testing SQL queries.
Very cool.
It got quite a lot of uptake on Hacker News, a lively discussion
on there.
Congrats.
But also, I'd be interested in discussing with you about how
you got into the topic, like regression testing in general.
Where would you like to start?
Radim: Okay, so that's a tough topic to start with.
So effectively it wasn't my project, or I didn't start it as my
project; it actually started as a complementary project to a book
called The Art of PostgreSQL, which I still believe is one of the
best things you can do to learn how to work with Postgres.
Nikolay: Dimitri, right?
Radim: Yes, Dimitri.
And I've been trying to get in touch with him forever, like for
the last 4 years, but I haven't been able to, so maybe if he hears
this podcast I will have a chance to talk to him.
So I started with the project because I was a bit worried about
how people deal with the SQL queries in their code.
Because usually it's either generated by an ORM and you don't see
it, or it's a string hard-coded in, in my case, a Go file, or you
can see it in TypeScript files.
And all the testing around this originally kind of bothered me,
because you have infrastructure which is difficult to start, set
up, and everything.
And I said there must be a simpler way.
Something where actually running a simple query doesn't take
minutes or seconds to get all the infrastructure spun up and torn
down.
And I found RegreSQL, which does this job.
And I tried to fix up the project.
And fast forward, I think, 4 years, because it was actually quite
a long time ago.
I found other use cases, because I'm working on some educational
products and I needed to test whether users are hitting the
metrics that I'm interested in.
And then I kind of figured out, yes, this is actually where it
can come in, because it was comparing the output, the correctness
of the query, and that was only a step away, because then I
realized, okay, every single incident is based on increased time
of SQL queries, and it was just, you know, a small step to go
from something that is unpredictable, because time is
unpredictable, to something predictable.
And this is where I started.
So I started committing, I started
adjusting it.
And right now I'm actually working
on a version 2.0, which is
a big update in terms of functionality
and in terms of developer
experience.
Nikolay: But how do you approach the testing?
So I understand the tool.
Regression testing is super important these days, especially
because there are no more excuses not to have 100% test coverage,
and proper coverage, not just good-looking coverage.
And performance testing in CI, and even before your AI assistant
commits and pushes code, it should probably have happened before
that.
How do you approach the fact that
we need a lot of data to have
proper plans?
Radim: Okay, so this is effectively a never-ending topic: how
developers can even get to something which is representative of
their database.
I actually have a name for this one; I think it came up somewhere
on Hacker News with somebody from Supabase.
We arrived at a topic which was called: it works on my database.
That's the same issue.
But to answer your question, as I said, initially I naively
thought I could use timing, but timing is all over the place and
that wasn't working.
So the only way to approach this one is buffers and cost.
Nikolay: Let's pause here because I think it's super important.
I think the industry doesn't understand it yet.
We've chatted about buffers with Michael for 2 years.
Finally, Postgres 18 has it by default in EXPLAIN ANALYZE; this
is great, like absolutely great.
So, the industry still doesn't realize how important it is to
shift focus from timing to buffers, right?
Why is timing not enough?
Why is it not so good when you have to control performance?
What's your point of view on this?
Radim: Okay, so timing.
Timing is a perfect metric because this is what you see if you
come in in the middle of an incident.
Timing is actually the easiest metric to spot.
Queries are timing out; hopefully all of us have, you know,
statement timeouts, transaction timeouts, everything in place.
Nikolay: And humans need good timing.
This is our final goal, right?
Radim: Exactly.
But the problem is, A, if you want something fast, EXPLAIN alone
doesn't give you timing, because that's just, you know, what the
planner thinks.
And EXPLAIN ANALYZE can't give you something that is predictable.
So if you want to do regression testing, you can't rely on it if
you are running on, I don't know, an underutilized AMD machine or
an ARM machine.
Or, you know, everybody these days has Apple Silicon Macs and
they are super fast.
Their I/O is fast.
So timing is generally, that's the-
Nikolay: Doesn't work in CI.
Radim: It doesn't, yes.
Well, it doesn't even work in production because you will get
a big spread of queries.
You still have outliers.
One customer can hit 10,000 rows.
Another customer hits 5 rows.
How do you...
Nikolay: What about the lock wait on lock acquisition as well?
Yeah, yeah.
That's great.
Yeah.
Michael: I think it's even worse in CI, because flaky tests are a
nightmare.
So even if, for example, we accepted double the time, if we said
fail this test if it takes double the time, and that worked 99%
of runs, then when it fails for a bad reason, when it fails having
picked the same plan, no indexing changes, nothing has actually
changed, but there was just something else happening on the
machine at that moment, and it kind of blitzed past its p99, we
got that one time it was slow, and the test fails, that's not a
flaky test, that's a test that sometimes fails when it shouldn't,
like a false positive.
And those are always the nightmare tests.
Nikolay: Before we proceed to the solution, we all agree, buffers,
right?
Still, let's pause maybe; I have a crazy idea I've been thinking
about for several years.
Maybe it's time to implement it.
It's probably a better topic for the hacking sessions with Kirk
and Andrey that we usually have on Wednesdays, live.
But I'm very interested to hear your opinion.
What if we still keep analyzing timing, but imagine, you know,
wait events, right?
They are usually used in aggregated form, like how many sessions
are in which state, aggregated, and so on.
What if that analysis were brought to EXPLAIN ANALYZE, to
understand how exactly that timing was spent?
Was it waiting on heavyweight lock acquisition?
Or was I/O too slow this time?
In this case, wouldn't it help to structure the whole latency, to
split it into pieces, and in the CI context set aside the pieces
which are unpredictable: okay, this doesn't matter, waiting time,
it doesn't matter, waiting on lock acquisition.
But here we see a lot of actual CPU, non-waiting time, which
usually shows up as null, right?
But there is a trick there.
But anyway, wouldn't it help to still keep timing as the main
metric for regression testing in CI for performance?
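For reference, the aggregated form of wait event analysis being referred to looks like a quick sample of what active sessions are waiting on; the per-query breakdown inside EXPLAIN ANALYZE is only an idea here, not something Postgres does today.

    -- Sample current wait events across active sessions (aggregated form).
    -- A NULL wait_event means the backend is on-CPU rather than waiting.
    SELECT wait_event_type,
           wait_event,
           count(*) AS sessions
    FROM   pg_stat_activity
    WHERE  state = 'active'
    GROUP  BY wait_event_type, wait_event
    ORDER  BY sessions DESC;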
Radim: I believe in theory that would work, but for me the
complexity behind this is already covered at the cost and buffer
level.
So I'm not sure the benefit would actually be better metrics.
Because what I found out is, and we can get to the discussion,
you know, buffers are fixed.
If you have a predictable data set that you start with, and you
have an ideal-state scenario.
I think that's important, the ideal state, because production is
never in an ideal state.
You will have bloat, you will have, so, you know, my regression
testing is never-
Nikolay: Always changing.
Radim: Exactly, everything is changing, so it depends.
But if you want to do regression testing, you just need a metric,
one or two of them, which are predictable.
Buffers are always predictable.
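For reference, the counters being discussed are the ones EXPLAIN (ANALYZE, BUFFERS) prints; in Postgres 18, BUFFERS is on by default with ANALYZE, while on older versions you spell it out. The table and filter here are hypothetical.

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM   orders
    WHERE  customer_id = 42;

    -- Look for lines like:
    --   Buffers: shared hit=128 read=12
    -- On the same data snapshot these counts stay stable across runs,
    -- unlike execution time.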
Nikolay: Yeah, I'm just so interested, because you came and are
talking about what I have basically been shouting about at every
event.
Buffers, I even have a t-shirt with EXPLAIN (ANALYZE, BUFFERS),
where BUFFERS is bold.
Michael: By the way, it can depend a little bit, unfortunately.
It depends what you mean by buffers, but if you include planning
buffers, which are sometimes reported, there are states where,
for example, if autovacuum just ran, or if ANALYZE has just run,
you might end up with a different number of planning buffers than
if it hasn't just run or if it hasn't run recently.
So there are slight variations.
So yeah, it might not be perfect.
But cost is an interesting one, for sure.
Nikolay: So, okay, what I'm going to do, I will postpone my
comments about why not only buffers, and maybe we'll return to
timing and discuss it later, but let's agree with you that buffers
are the thing that doesn't depend on the state so much.
So we just check hits and reads of shared buffers, also temporary
buffers, like everything.
It's I/O.
Databases are all about I/O.
Right?
But still the question remains about your approach: you need a
lot of data to reproduce the executor behavior so you will see a
proper number of buffers, right?
You cannot do it with a small database.
Radim: Yes, that's the "it works on my database" part, because
developers will start with hundreds of rows.
The cardinality of the data is going to be very small.
And I haven't seen a single company that didn't have a problem
with how to get a representative set of data onto developers'
machines.
Well, I know there are attempts, but if you take an average
software team, they usually have huge issues with this one.
So my issue, and that's actually part of the reaction to Michael's
comment, is that I always have to start with a clean slate.
So for me, this is not production, so I can count on the fact
that vacuum ran and the statistics are up to date.
The original solution, which I described, fixtures, is the obvious
one, so, you know, I will always have to improve the ways of
building the fixtures.
But the way to do it, and this is actually my roadmap towards
version 2.0, is snapshot building.
You've probably seen, well, you actually have seen, because you
have discussed it, that article about instant database clones.
That article wasn't a coincidence; it actually came from my need
to get a fast starting state for the regression testing, and I
needed something that works.
This is why I had to dig, and then, same as you guys, I was
surprised it's already there, you know, because it wasn't
something I was aware actually existed in version 18, but suddenly
everything was done.
So it's not for production use, which you mentioned correctly,
because there are still issues.
But if you have a fixed, easy-to-get starting point, if you
imagine you have a predictable build, which actually is coming,
it's already committed, but it's coming in version 2.0 as part of
the developer experience.
And you have fresh statistics, you can run the query.
And the Postgres configuration is actually a very interesting
part, because you can change the costs, how they are calculated,
but you will always get the same buffers.
Exactly.
Nikolay: If you talk about hits plus reads probably the distribution
will be different.
Also hits might be hitting the same buffer multiple times.
So there are nuances there.
Radim: But the total is always the same.
And, Nik, you said it: all of us came from spindle disks.
So you remember times when you'd try to hit Ctrl+C on a query and
it was just flushing WAL files to disk.
And we can say that I/O is cheap these days, but it's not.
It's still the key metric that you have to watch.
And I/O, wherever there's contention, will affect time.
So for me, certainly buffers are number one.
There is nothing more predictable.
Cost is very close.
And I actually did, well, I haven't done scientific research on
the topic, but if you take one version of Postgres and you run it
on different architectures with the same settings, the variance
is surprisingly low.
Nikolay: Not all settings.
There is a specific set of settings, because you can have much
less memory, shared buffers can be very small; you just need to
set a proper effective_cache_size, all the proper planner
settings, and work_mem.
We have an article about it.
And then I agree with you, it will be the same planner behavior
for the same version.
That's it.
It's great.
It's magic.
You can have a Raspberry Pi and reproduce the behavior of a huge,
like 800-core, machine.
It's insane.
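A minimal sketch of that idea, with hypothetical values: match the planner-relevant settings to what production reports, and even a tiny machine will produce the same plans and costs.

    -- Hypothetical values; copy the real ones from production.
    ALTER SYSTEM SET effective_cache_size = '96GB';
    ALTER SYSTEM SET random_page_cost = 1.1;
    ALTER SYSTEM SET seq_page_cost = 1.0;
    ALTER SYSTEM SET work_mem = '64MB';
    SELECT pg_reload_conf();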
Radim: Exactly.
This is what I did.
I actually took my old Raspberry
Pi, I took Hetzner, I took a GCP machine, and the difference was
something around 2%.
So right now I have a threshold of 10% by default, but I'm looking
forward to actually making it configurable.
That's great.
And let's let people decide.
And then the answer to all your initial questions is I don't have
to care about timing, because in production, if you have 1,000
transactions per second, I/O will get saturated, you know, and
that will naturally raise the timing.
So if you care about buffers, you don't have to care about timing,
because timing is just an extrapolation of problems that happened
before the query actually returns.
Nikolay: Some people say, like, let's do proper full-fledged
benchmarking just to study the behavior of the planner for a few
queries.
It's a wrong idea, because you have so many moving parts around.
You just narrow down to a single plan, a single session, right?
And focus on what matters.
But here, like my...
I have...
I had that question before.
So you mentioned CREATE DATABASE in Postgres 18, you mean the
strategy, right?
When you can do a copy-on-write clone, when you create a database
from a template database, but not in the regular way, basically
very fast because of copy-on-write.
Does it mean you use ZFS or something, Btrfs, or what?
Radim: Well, it depends what you want.
I'm not forcing people to use anything.
I'm just offering an option if you want it fast, because, you
know, another part of the frustration was that everything in your
integration tests that touches the database is slow.
Yeah.
And the way people normally use it, it's slow, but, you know,
running most of the queries takes literally milliseconds.
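A minimal sketch of the Postgres 18 fast-clone approach being discussed, assuming a filesystem that supports copy-on-write file cloning (for example Btrfs, XFS with reflink, or APFS); the database names are hypothetical.

    -- file_copy_method is new in Postgres 18; 'clone' asks the OS to
    -- copy-on-write clone the files instead of physically copying them
    -- (it can also be set in postgresql.conf).
    SET file_copy_method = clone;

    CREATE DATABASE regresql_run_42
        TEMPLATE regresql_base      -- hypothetical pre-built snapshot database
        STRATEGY = file_copy;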
Nikolay: Yeah, yeah.
And we have Neon branching, we have other copy-on-write branching,
so that's great.
And here, if you compare... do you know about DBLab or not?
Radim: Yes, I know.
Nikolay: Yeah, so I'm comparing the various approaches, and this
approach, when we have multiple logical databases inside one
physical instance, is great, because this is exactly what we try
to achieve with DBLab, but not many people realize what you just
said.
And the thing is that if you allocate this machine, you can have
a huge database.
With this wonderful Postgres 18 feature, if you use ZFS, we don't
care about time anymore, almost.
So we focus on buffers.
Postgres doesn't know which file system we have.
And then you can achieve a constant price for many, many tests in
parallel.
Unlike Neon or Aurora thin cloning, where you need to pay.
If you run 10 tests in parallel, for example, you can have 10
ideas from an LLM, you need to pay 10 times more for compute.
Yes, storage is solved, but compute is not.
Here, with CREATE DATABASE, 10 databases in parallel, it's good.
The only thing compared to DBLab is that you cannot have a major
upgrade inside a clone and have independent versions in parallel.
But maybe there are some more differences; in general, it's great
that it comes to Postgres naturally.
This feature is a great feature.
I hope it will get traction in CI environments.
Radim: I think it's actually great to have this natively.
I'm not saying it's perfect.
It's just an option.
Nobody is forced into third-party solutions; everybody will pick
whatever they want and, you know, I'm not against Neon or
anything.
I think you evaluate your requirements and you decide what's the
best approach you want to take.
Nikolay: Well, with Neon it's also the same: you just pay for
compute, but you can also study planning and behavior, put it
into CI.
The thing is that you need to focus on buffers, because cold
startup will be different, right?
But we don't care anymore if we focus on buffers.
That's the magic, right?
That's great.
And cost, also cost.
Because this is what says how the planner thinks.
The only problem is there are some queries which fall outside of
this methodology.
For example, people who deal with RLS know this problem, right?
If you have a query counting a million rows and there is a hidden
filter involving current_setting(), it's going to be executed for
each row, and it's not buffers.
It's pure CPU time.
But these are exceptions, right?
We have only a limited number of them, right?
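A hypothetical illustration of that RLS pattern (table and policy invented for the example): the current_setting() call in the policy is evaluated for each row, so the cost is mostly CPU and barely shows up in buffer counts.

    CREATE TABLE events (id bigint, tenant_id text, payload jsonb);
    ALTER TABLE events ENABLE ROW LEVEL SECURITY;
    ALTER TABLE events FORCE ROW LEVEL SECURITY;

    CREATE POLICY tenant_isolation ON events
        USING (tenant_id = current_setting('app.tenant_id', true));

    -- Counting a million rows re-evaluates the policy filter per row:
    SET app.tenant_id = 'acme';
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM events;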
Radim: Exactly.
I would say that before hitting that problem, there's the 99% of
problems we can solve much more easily.
Nikolay: I agree.
And still you can troubleshoot
it on this machine because getting
those million rows is much faster
if you have this copy-on-write
and your methodology.
I mean, what you said, it doesn't
mean you should stop thinking
about timing, right?
You still can think about timing,
right?
Radim: You can, and it will hit you.
That's what's going to hit you in production at the end of the
day, because this is what the user perceives when the page is
rendering, waiting for the output or something, but it's not
predictable.
Nikolay: Yeah.
The problem is we gave this tool, DBLab, to many hundreds of
engineers, I think thousands already, and noticed that some
people expect it to be a full reproduction of production behavior.
They say, why is my index being created 3 times slower than in
production?
Well, because it's quite a different environment, right?
So there is some adjustment needed in the mindset to test like
this, right?
Well, okay, enough about thin cloning and branching.
Let's talk about tooling.
Michael: Yeah, I had a question going back to cost tolerance.
You mentioned allowing for 10% higher costs, and that confused me
slightly, because I was thinking on CI we'd always be running on
effectively a clone of the same database, the static state you
mentioned.
Why would the cost... why do we need to allow for 10% higher
costs, potentially?
Radim: That was initially my naive protection, effectively.
Before I did the benchmarking, I said, you know, 10%; literally,
there was no measuring, I just said 10% sounds reasonable.
When I did the measuring, I found, as we discussed, you know,
there is slight variance, but I think on average it was below 2%.
Nikolay: I would like to see details
here because I think there
should be 0 difference.
Let's talk, I'm going to follow
up with you on this because there
are some cases which I don't know,
I'm super interested to understand.
Radim: And yeah.
I will just, you know, finish the topic, because, you know, I did
extensive testing in terms of establishing it, but I haven't dug
into the reason, because, you know, some were ARM machines, some
were, you know, Raspberry Pis, so that wasn't comparable, and
that was where the difference came from.
Some queries were at 0.
Nikolay: But Planner doesn't know
about hardware.
It doesn't know anything about
hardware.
Radim: We actually had this discussion at the Prague Postgres
meetup 2 months ago, that it would actually be very interesting
to dig into the differences between ARM and Intel processors.
What it actually is, whether there is some default somewhere; I
have, for example, seen that I/O startup time, again, it doesn't
involve buffers, but I/O startup times on cold I/O are much slower
on ARM in general.
I believe there might be something else.
Nikolay: Different code paths.
Exactly.
Right, right.
This I can imagine, yeah.
But it's not about hardware, just
code paths.
The architecture might matter.
Yeah, that's interesting.
Yeah.
What's also interesting here, I think we touched an interesting
topic.
We could zoom in there, because it's super interesting.
When you say 10%, I think it's because in CI you need to define
some thresholds to decide when it's failed and when it's passed,
right?
When you work manually, you have some thresholds in your mind
that you don't realize, but then you need to code them, basically.
And this is a challenge, right?
Radim: Exactly.
This is where it started from.
And this is actually where I hit a wall with the original release
of the tool, because people didn't understand, you know, you said
it, people didn't understand buffers, why they matter so much.
People didn't understand why something was changing.
And I had to start thinking again.
And, you know, if you go to the original blog post, there is now
a notice that I had to take the fixtures out of the tests, because
people were kind of linking it together and they were getting
different data sets and they didn't use it correctly.
My fault was that I didn't specify it properly; that was the issue
with the blog post.
But this is why version 2.0 is actually heading in this direction,
because we have touched on the topic.
You can have multiple snapshots, and those snapshots will give
you a different experience, but the same baseline on a given
snapshot should be stable, or more or less stable, over time.
And then it depends, because your business will change, business
requirements will change.
For example, yesterday I committed a change which is a selective
update.
So you can literally link a change in a baseline to a commit, so
you can blame what actually caused that historical trend change
on a query.
So this kind of adds another dimension, because suddenly you don't
just have a git blame for who broke something.
You can actually trace back how it affected the performance.
And there was one of the posts, Michael, you had on your blog,
about an optimization from 7 milliseconds to something, where you
hit the index scan.
And that was actually a big inspiration, because I said, you know,
this is actually very important, because the fact that the query
takes 7 milliseconds, that doesn't matter.
But if you save so many buffers, you should be able to trace it
back, even for the positive outcome.
So at one point, there can be a layer which will take all those
optimizations.
And this is where we get to a very interesting topic, and that's
AI.
Nikolay: Right, experiments, right?
Radim: Experiment, well, I wouldn't
call it experiments.
I call it set reality because I
believe most people-
Nikolay: I mean, code is written
by AI only right now, right?
Radim: Yes.
Nikolay: Yeah, yeah.
And how do we control it, right?
So as I said, we need not only 100% coverage, but real tests
should be inside it, right?
Including performance, I agree.
Yeah.
Radim: So, yeah, we had a discussion with Michael in London during
a PGDay.
If you have context which AI doesn't have at one point, you can
then see RegreSQL in a very different light, because it knows the
query, it knows the baseline, it has a snapshot, and it has
context, which is the schema.
Nikolay: Exactly.
And statistics.
Radim: Exactly.
You have statistics.
So you have suddenly much more
data, which you need.
And I had to, you know, all of
us, all of us have to think about
how AI is going to change our work.
Nikolay: I've been thinking about it every day since 2017.
Radim: Exactly.
I have a weekly session which I
do and you know, I'm trying to
digest the news and do this 1.
But there are things that just confirmed that this is the right
direction, because the joke is that 84% of developers use some
sort of assisted coding or vibe code or whatever; we all do it.
Yes.
But the same 84% of people don't write SQL queries anymore.
Nikolay: Yes.
Radim: And if they were bad at writing SQL queries, you know how
LLM SQL queries look without context.
I actually found it, you know, this is the professional
deformation you have when you look into how the planner works.
I found it very funny, because it doesn't differentiate whether
you're running Postgres 12, 14, or 18; it will mix all that
advice, it kind of reads the blog posts, which just tell you
something wrong, and people will just go with it.
And if you put this in the context of 10 times more pull requests,
who's going to review them?
And now you see where all this is coming from, because if you
don't have time to review it, if you don't have the human aspect
of reviewing it, and if you don't have the DBA knowledge (I think
the lost DBA knowledge is actually kind of key here), you need
something that will govern that your tests are fine.
You can write unit tests, integration tests on a couple of rows,
but how do you do it if you have to return 5,000 rows?
This is very difficult, and this is what RegreSQL has actually
been doing ever since the start.
You can take 5,000 rows and compare them.
And it can suggest to you, you know, the order is wrong, or maybe,
you know, there is a wrong field, and it will notice it.
So it actually sits somewhere between a unit test and an
integration test; it can measure or control that your data and
your contract are still valid.
And this is where I believe the direction of the tool actually
has a big future.
Nikolay: That's absolutely great.
Everything you say makes total sense, absolute sense.
Yeah.
It's great that we think in the same direction.
That's all I can say.
Radim: Well, thank you.
But I would say, you know, if you have experience dealing with
production systems, this is what you will effectively arrive at.
Yeah.
Because you have to generalize,
you have to find underlying issues.
Nikolay: We need guardrails and protection, automated protection,
at a larger scale.
We cannot say, oh, let's not run those pipelines, right?
Because they're expensive and long.
We need to make them cheap and fast, radically, right?
So we can run like 50 of them, right?
And check all the things and throw away all the hallucinated
stuff, right?
This is the way.
Yeah, this is the way.
And then it comes down to many details, including, for example,
that there are some hard problems which come from the production
side, not from the development side.
For example, plan flips, you know.
But that's a different problem, if we split these problems.
Anyway, I agree the biggest danger to performance right now is AI.
But it's also great.
But, like, I mean, it's the wild west right now, right?
So we need some practices and tools to be established.
Radim: But just to put it in contrast, I actually believe most
issues are actually very simple, because most incidents, you know,
I'm not sure if I'm the only one, but most incidents you see start
with something literally simple: a missing index.
Or somebody rewrites an index without understanding how it should
be rewritten.
And, you know, they obviously verify it locally.
It works fine because they have a small database, but they don't
notice that the query exploded 100 times in terms of buffers
retrieved, and that will have an impact.
Well, 100 times is the easy case; usually you can survive it.
Nikolay: Well, another important point I noticed here: if you say
1,000 buffers, people don't understand you.
If you say 8 megabytes, they start understanding how much it is,
right?
So you just multiply by the block size.
And this is magic.
Such simple magic, you know.
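The arithmetic being described: multiply the buffer count by the block size, which is 8 kB by default.

    -- 1,000 buffers x 8 kB block size ~ 8 MB of data touched
    SELECT pg_size_pretty(1000 * current_setting('block_size')::bigint);
    -- -> 8000 kB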
Radim: It is.
And, you know, I don't want this to sound like a promotion, but
I'm building something called labs.
So, you know, we both have labs of some sort.
Nikolay: For experimentation, that's right.
Yeah.
Radim: Because for me, and you mentioned it before, developers
want the same environment as production and they don't understand
why it's different.
And how many of us, I think the 3 of us, are lucky because we had
access to really busy production environments and we know what it
looks like.
But if you take an average developer, during their career they
will never actually touch an environment that has more than 10
TPS and maybe a gigabyte of data; they simply don't see it.
This is why, you know, if you trace the BoringSQL blog, everything
comes down to: this is fine on your database, but it will eat you
alive in production, because it won't work.
Nikolay: Yes, I compare it to a bicycle.
It's hard to become a DBA because you need to learn, but where do
you learn?
If you don't have access to terabyte scale and heavy workloads,
if the only practice you have is production, it's basically
learning how to ride a bicycle on a highway.
Some people survived, and they are called DBAs now.
Others say the database is slow, it's dangerous, don't touch it.
If you have an environment where you can safely make mistakes...
I saw magic when people got access to terabyte-scale clones and
started to do really crazy stuff and realized it was wrong, but
nobody was affected.
And this is how learning actually works, you know.
So yeah, what you do basically 100% resonates with what's in my
mind.
Radim: And then take whatever you do, because I'm actually
following your posts on LinkedIn about self-driving Postgres.
Now, people won't write SQL queries.
They will not, let's forget that it's a regression testing
framework, they will not have this experience, and they will
never hit the issue of Postgres being badly tuned.
Yeah.
That suddenly increases the gap, and sure, there will be AI which
will help, but you still need a manual or human operator at one
point, or somebody who will be able to diagnose what went wrong.
Nikolay: Because real testing shifted
to production that's why.
And we just need to shift it left
again.
Yeah.
Radim: Exactly.
So I think it's all about clarity and giving people... sure, there
will be, I think we will never have 10 times more DBAs.
We will probably have 10 times fewer DBAs.
But we need to make sure the new people will actually learn those
skills, and learn them predictably, without that fear, without,
you know, being scared, and who knows what will happen in 5 to 10
years.
Nikolay: Yeah, I agree.
I agree.
And also it resonates a lot with what happened to code and
containers.
Containers are the way to have reproducible tests.
It's the standard, right?
We just need the same for data.
And copy-on-write is the key.
But the methodologies should also be adjusted: buffers.
So everything fits together.
And I agree, DBAs are like sysadmins.
It's in the past.
The people who build stuff need to learn Postgres and know how to
operate bigger databases, coming from various angles.
I have front-end engineers who are already starting to understand
things.
Or Supabase, Supabase is a great example.
It attracted a lot of front-end engineers to learn SQL and so on,
right?
That's great, I think.
Radim: And that's the boring part, you know, I have to make the
joke, because, you know, for me literally the direction where I'm
going is that those boring things have been with us for 40 years,
and they will be with us for quite some time.
And they just work; those skills are reusable, no matter if you
read about them via an LLM or not.
Somebody needs to understand them.
Nikolay: Yeah, yeah.
So can you explain what your tool
does?
So assuming we understood this,
that we need to focus on buffers
during optimization and establish
baselines focusing on buffers
and cost.
And we managed to achieve the same
data using this feature in
Postgres 18.
What does your tool do and who
should use it?
Radim: Okay, so the main goal of the tool is to find all your SQL
queries and help you in some way to write a plan, because, you
know, queries need parameters, there needs to be a way.
So you need to write fixtures or provide your dumped, anonymized
database.
That's your choice.
I'm not actually forcing you into anything.
You have to get it to a predictable state.
That's your snapshot.
Then you run it on your queries, and based on your plans, each
query can have 1 to n plans.
That means the variations.
So you can have an LLM generate plans for you, you know, using
random values.
You can have fuzzy-generated plans, and that means every single
plan will execute that query once.
What the tool does is it captures the expected output; the
original tool did it also, it captures the data.
So what you have right now is a JSON with all the data that the
query produced.
This is your baseline, and that's the contract for correctness of
the queries.
Then you have the functionality for baselines.
Right now version 1.0 does EXPLAIN only, but version 2.0 will have
full EXPLAIN ANALYZE, because that's what I need for buffers.
And it records that data.
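A rough illustration of the kind of numbers such a baseline can record per query; this is not RegreSQL's actual format, and the table and filter are hypothetical.

    -- Capture the executed plan as JSON, including cost and buffer counters.
    EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
    SELECT id, total
    FROM   orders
    WHERE  customer_id = 42;

    -- In the JSON output, the fields worth baselining include
    -- "Total Cost", "Shared Hit Blocks", "Shared Read Blocks",
    -- "Temp Read Blocks" and "Temp Written Blocks" on the Plan node.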
Nikolay: Yeah.
What about other options, newer EXPLAIN options like MEMORY and
SERIALIZE?
Radim: That is interesting, because, again, this is part of the
2.0 roadmap, because the snapshot has to be consistent.
You have to have the data, and now you even have to have the
configuration.
So initially I wasn't saving all the configuration, all the
metadata about where it's running and how it's running, but now
I'm actually saving this as part of the snapshot.
So if something changes, like if you change a configuration
setting, say the cost of a sequential scan, it will warn you: you
changed something, and it will probably give you wrong results.
So you still have to interpret the data.
At one point you arrive at a scenario where you have a good commit
and you can distribute the snapshots for testing.
So ideally, you can have a couple
of versions of the snapshots.
You can have a small snapshot,
which will have 100 megabytes,
200 megabytes.
You can have a mid-size 5 gigabyte
snapshot, which you can use.
And then you can have a large snapshot,
50 gigabytes or something.
And then every time you run the tests, you will get your report
of what has actually changed.
And that change is based on the cost and the buffers.
Nikolay: Yeah, that's great.
And you also capture the version, I think, right?
And settings.
Yes.
And some statistics and the structure of tables, which indexes we
have, and so on, right?
We basically met at some point; it resonates a lot, but you come
more from development, I come more from production.
And in production, what also matters often, including for buffers,
for example, if you think about index-only scans, heap fetches
matter a lot, right?
And this is the state of production, which can be bad or good.
And capturing that state from production, some statistics, might
make sense as well.
Do you see how it could somehow play a role, or not?
For example, when this table was last vacuumed, how many dead
tuples it has, or how much bloat has accumulated, something like
these things, you know, or is it outside of your scope?
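The kind of production state being referred to can be sampled like this, for example which tables were last vacuumed and how many dead tuples they carry (relevant to heap fetches in index-only scans):

    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_vacuum,
           last_autovacuum,
           last_analyze
    FROM   pg_stat_user_tables
    ORDER  BY n_dead_tup DESC
    LIMIT  10;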
Radim: I have spent so much time thinking about this topic.
And if you asked an LLM, it would say, yes, that's your natural
direction.
But I actually proactively said no to this, because it is a
different domain.
You know, you have to guarantee what's broken on the development
side.
You have to make sure that the contract, which is also
performance, or buffers, is the same or better, and you track it,
but you can't reproduce production.
Because in production you can have latency, you can have a replica
which runs in a different availability zone, you can have network
issues, you can have, you know, so many issues that can happen in
production.
And there are tools that I effectively can't compete with.
And I think that wouldn't be my goal.
And my angle is effectively education.
For me, it stops with CI.
Nikolay: Yeah, it again resonates 100%, because what you do is
you say, let's split the system into pieces and study each piece
separately.
And when we understand each piece's behavior fully, then we can
go and increase complexity.
And this is why you shouldn't use pgbench to study plans, right?
Because pgbench is
Radim: very cool
Nikolay: to test the whole system.
And also that's why you shouldn't bring too much from production.
Study your system first, how it works, and then let's go to production,
understanding our system and knowing what it's capable of already.
Then we bring complexity.
This is like one of the oldest scientific methods, René Descartes,
right?
Let's split the pieces, study each piece, and then compose them
to have the whole understanding.
Without this split, it's super hard and you always have too many
moving pieces, right?
So I like this, but still I'm from production, so I care about
the state of production, you know, it's different.
Yeah.
Cool.
Michael: I think we're talking about different things there, and
I really liked that.
This is about regression testing.
This is about not breaking things ourselves through development.
This is about not breaking correctness, whether that's a human
changing a query or an LLM chatbot changing a query.
So correctness is of primary importance; that makes sense.
And ideally not affecting performance, but we can't measure
production performance in test, in development.
So all we can do is get a proxy for it.
I don't think we can anticipate every single potential performance
issue in development that could hit production.
Nikolay: We've reproduced really difficult problems.
Yeah, with point-in-time recovery, sometimes you can reproduce
really hard problems at the macro level.
Michael: Reproduction is different from anticipation though.
Anticipating... so to get it into your regression tests you would
have to anticipate every possible issue, and I think personally I
really like the direction of let's do one thing at a time, let's
tackle the common things, let's look at the costing.
Nikolay: I'm not arguing with the whole concept, I'm just saying
that that is also possible, it's just much, much heavier, much
more expensive, time consuming.
In development we need to move faster, to write a lot and check
dozens of ideas at the same time, in parallel.
This is what we need, right? And this is a perfect approach.
Michael: So the last thing I wanted to say is that I personally
think this hasn't been a topic of particular interest to people
for many years.
I've expected it to be, like I thought it should be.
I've worked at a company previously that made tools in this area
that weren't very successful, and I was actually pleasantly
surprised by how much positivity and interest there was in the
Hacker News discussion and in your revamp of this tool.
So for me, it actually feels like people are ready for this topic
for some reason.
I don't know what changed.
Nikolay: What is writing there?
Who knows?
Michael: Do you think that's what it is, Radim, from your perspective?
Radim: I think this goes to... there's not one answer.
First, going back one topic: what percentage of the incidents you
deal with are really production-based?
Like, no matter if it's human or LLM, you know, I'll just be
guessing: 90% of all incidents are change management that didn't
go well and something happened.
So this is why, with regression testing, as you mentioned,
Michael, you can't measure production, but if you take a snapshot
of something which is now working, you can actually do something
that is measurable.
You measure whether you're improving on performance.
So, you know, sure, the contract is important, but actually if
you start from a 0.0 baseline, you can technically say we on
average go up or down in terms of performance.
This is very easy to measure.
And, you know, for example, you know how many times you
second-guess when you upgrade from Postgres 17 to 18, what's the
impact?
You can suddenly put a number on it, and if you somehow run all
your queries and get that number, you can put that number into a
bucket.
And the second one is, no matter what you do, I was asking myself,
you know, why did nobody think about this before?
I actually don't understand why nobody did this, because it's
predictable.
The metrics were there, but nobody did it.
And from my experience, the frustration I had was that all the
incidents were very easily avoidable, because the metrics were
not within a 10% margin, they were at 1000%.
Every single incident started with a 1000% difference.
Nikolay: I also have one question about this tooling.
So how do you capture queries to test?
Because, for example, if we have an ORM, it produces various
queries.
We can see pg_stat_statements, normalized
queries in production,
but for each query we might have
multiple plans depending on
parameters.
So how to find those queries?
I have my own methodology.
It's actually described in some
articles.
We could collaborate maybe a little
bit on it, but what's your
approach?
Radim: This is the difficult part.
I'm not going to lie.
I come from a Go background, and ORMs are not that popular in Go,
I would say.
Frameworks like sqlx and things like this actually encourage you
to write SQL queries by hand.
And this is actually where it started.
So, effectively, my first commits were about supporting multiple
queries in one file.
So I have a different base.
There are even frameworks like sqlc, I think, that generate code
based on the queries, or vice versa.
But that would keep the target market very small.
So the obvious next step is the support of the different
frameworks.
So I think in my article I did SQLAlchemy.
That was one that actually allows you to save the queries.
And this, if you remember, is where we talked about the plans.
And plans are that variation of arguments.
Nikolay: Right.
Radim: So you see, you have an artificially generated snapshot,
which is fixed, and then you need to generate the plans.
And plan variation: if you write one plan, your query testing is
not going to amount to much.
So you need to generate them, and this is actually where an LLM
can help you.
Because suddenly you have a tool which can create the variations,
and even hallucinate data.
And you will be testing queries with... I would actually be quite
happy to have hallucinated parameters, because they will show you
whether something is changing or not.
Nikolay: It's a proactive approach, actually, because my approach
is to take everything from production, because we deal with really
large systems and we need to understand, when we, for example,
perform a major upgrade, that there are no plan flips, or that
the plan flips are positive ones.
But what you say is, let's use an LLM to generate parameters which
will probably happen next month, right?
Like some edge cases we haven't met before.
I like this a lot as well.
Yeah.
Radim: Yeah.
So you have to use what you are afraid of.
So if we are afraid of randomized queries, then why not use them?
And effectively say, yeah, I need a variation of a thousand plans.
Nikolay: Yes, again, the problem is, again, I have a slightly
different opinion here, because I'm coming from production
experience.
I would say we can generate thousands and thousands of various
cases, maybe 1,000,000 queries to test, but it will increase the
duration and the price we pay for CI and testing, right?
So we probably need to prioritize some queries.
How to prioritize?
We need to study production.
No?
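One way to do that prioritization, as discussed: take the normalized queries doing the most buffer I/O from pg_stat_statements in production (assuming the extension is installed).

    SELECT queryid,
           calls,
           shared_blks_hit + shared_blks_read AS total_buffers,
           total_exec_time,
           left(query, 80) AS query_sample
    FROM   pg_stat_statements
    ORDER  BY shared_blks_hit + shared_blks_read DESC
    LIMIT  20;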
Radim: Exactly.
Okay.
You need to understand your business
requirements, you need to
understand your outliers, you need
to understand your smallest
customers, you need to understand
your biggest customers, you
need to understand the patterns
which are happening.
Nikolay: That's also an interesting comment, because I was
thinking about this tooling.
Oh, by the way, is the collection of parameters right now just an
idea, not implemented yet?
Radim: No, that's already there...
The queries are already parameterized.
Nikolay: Yeah, I understand the picture, but if I already have a
large 10 TB database with a complex workload and I want to start
using your tool, will it help me to create a set of queries,
example queries, which will be tested?
Radim: Queries have to be extracted from somewhere.
So no, I would say the answer is no, because right now I assume
you have the queries or a way to capture the queries, either by
having them in the SQL files or by being able to have a CI task
that will generate those SQL files from your ORM.
Nikolay: Yeah, I would like to have an additional call with you
to discuss this, because I have an approach; maybe we could
connect the dots here and see if it can land, because this
methodology probably works with your tool naturally, you know,
how to inject this approach into an existing big project.
We deal mostly with projects that have already started, hit walls,
and come to us for help.
So I understand completely how to start from scratch, but when
you have something already, it can be difficult there.
Another thing I wanted to point out, from what I hear, you say
smaller, medium-sized, bigger.
Sometimes we have only one database and that's it.
Some systems, like a SaaS, are installed only once.
But sometimes we develop software which has many installations,
small ones, big ones.
So it's interesting.
I think the approach where you define some reference databases
works better when we have multiple installations of various sizes?
Is that what you think?
Radim: Yes, as I said, there's no universal answer for how to
generate it.
And many people actually wrote to me about fixtures being
completely wrong.
But what I'm trying to get across is that I don't care about the
fixtures.
I care about the way to generate a snapshot.
And if you have the same cardinality and same data, you can then
measure correctness.
So obviously the investment in a tool like RegreSQL is a big one.
You have to start using it, and you will get the dividends months
later.
You might get some dividends as you go, but the true value will
only come later.
So the developer experience and then the whole life cycle, that's
maybe a version 3.0, because, you know, there needs to be much
more support.
So the tool itself is not as important as the whole experience
and the process, because it's the methodology, exactly.
Nikolay: So what I'm actually interested in, let me share, to be
honest with you, what I have in my mind listening to you.
For example, GitLab, they use our approach.
Many engineers say it's great and we have a case study published.
Everything is great.
But listening to you, I realize we actually always cared about
the biggest installation, GitLab.com, which is huge.
We focused on solving that problem.
We solved it, not fully, but we solved how to test queries there.
It can be proven, always.
But now I'm thinking, actually, GitLab has hundreds of thousands
of other installations, and they have reference architectures.
We actually should care about that as well.
So it's interesting, because we could generate different sizes of
databases and cover them by testing as well.
So it's interesting, because I was always thinking about the
biggest one, you know, because the problems are usually there.
Right.
Radim: Exactly.
You hear about the biggest ones.
This is where, you know, production breaks, the incident starts,
the query which was fine...
It sounds similar, but it's a very different problem.
But you need to find some sort of boundary; sometimes you can't
solve all the issues and maybe you need to find a different
solution.
Nikolay: Right, right, right, yeah.
Michael: So, just out of interest, Nik, what would you expect to
break on a smaller instance?
Is it that it's just a different shape, for example, on
GitLab.com's main instance maybe there's tons of one type of
object, and in a different customer's installation there's a
different distribution and therefore plans are different?
Or is it like...
Nikolay: I can tell you.
So what we think about is, we have various kinds of testing, like
real data testing.
It's like the ultimate, final testing, with basically production
clones, which only a few people can have access to.
But also, you cannot download this database due to its size and
also restrictions, PII and so on.
But if you need to care about multiple installations, in this
case the direction DBLab is moving in is having full-fledged,
Git-like database branching, but also cloning between multiple
setups.
And we already run on MacBook.
So I'm thinking, if you care about smaller databases as well,
developers could basically download a reference database and
iterate much faster, because I hate how CI is organized right now.
If AI is hallucinating and I hit it in production, it's an
absolute nightmare.
But hitting problems even in GitLab CI or GitHub Actions, it's
already too late for me.
I want to iterate much faster.
I want all the tests to be run before the push, git push, right?
In this case, if we have a DBLab of a smaller size, you download
it and, like, don't hunt only for the biggest case: medium size,
small size.
I have like a terabyte or so here.
I can have a good example database on my laptop and iterate much
faster, and before the push I already have much better code, you
know.
Michael: Yeah, so catch more issues
earlier and only catch the...
Nikolay: Shift left what you can
shift left.
Michael: Yeah, that makes sense.
Nikolay: Okay.
So it's saying let's test earlier.
Ideally, developers should test.
Usually they don't have time, but for AI I have no excuses.
AI should test everything.
I pay for it, right?
Tokens, not time, right?
That's it.
I think it's a perfect time to
think about all levels, and if
you can shift left to your laptop
to your machine, you should
do it.
So future DBLab versions will be able to clone data between
machines, to pass snapshots between 2 DBLabs.
You can have it on your machine and just extract it, then do
something, push it back and share some tests with your colleagues,
and so on.
Yeah, anyway, I keep pushing my own agenda all the time, because
it's actually my topic.
I'm so glad you came, actually.
Radim: There are actually so many topics, and I think we could go
on for another hour, because, you know, there's, for example, the
support I think I pushed last night, along the same lines: you
take the snapshot, you run it against a migration, either a SQL
file or a command.
So you can actually trigger a third-party system and then run the
same tests, just to test what the migration looks like.
You can start there; there are so many angles on this topic alone.
Nikolay: Oh, yes.
Radim: You know, I don't say I have answers for everything, but I
just feel like I hit a topic.
Nikolay: Let's have part 2 about testing of schema changes.
Let's have part 2 just about that, because it's a big topic.
This is where, as you said, change management, this is where the
danger usually comes from in incidents, right?
So let's just discuss it separately
because I think it's worth
discussing.
Michael: Radim, is there anything on the regression testing side
that you wish we had asked about or that you wanted to mention
before we wrap up?
Radim: I think we have covered most of the things.
Nikolay: Where to start for people?
Radim: Where to start for people?
Nikolay: You know, first steps.
Radim: Follow, first step, follow the project.
I think that's the first good step.
I will definitely make it public and do some, you know, sessions
around the version 2.0 because this is where the core developer
experience will come and I will actually need feedback on that.
So I'm actually open to any feedback.
You can email me at any point or open a GitHub issue.
I would be actually grateful to see that.
But I would say pick your 10 queries.
Don't start with everything.
Find a way to pick 10 queries which you believe are, maybe not
small, I would say representative of something that's breaking.
Try with them, try to create the plans, try the tool, how it
works, what the flow might be, and try to live with it.
And then have a rule that every single handwritten or LLM-written
query needs to be covered.
If you start with that, you will start seeing value in a couple
of weeks, a couple of months, because you will have that check
mark that everything is fine.
And then in the future, hopefully all the features and the
developer flow will support it so that you get much more value
and you actually get your time back in terms of safer production.
Nikolay: Confidence level builds up.
Yeah, I mean, a small set of queries, but it's a good idea to
understand which queries matter most, based on production or
something.
And you have a website, boringsql.com, so this is where you have
the blog and so on, and the project you're talking about is
RegreSQL, right?
Radim: Exactly.
Nikolay: Yeah, great, great.
Cool.
Radim: Right now it's not featured there; I'm preparing a section.
Nikolay: You have a huge roadmap I guess.
Radim: You can't even imagine, and this is actually why I love it.
Nikolay: I can.
Yeah, that's great, that's great.
And the project is in Go, so that's great.
Yeah, good, good, I think, finishing words, right?
Unless Michael has something.
Michael: No, I'll link to everything in the show notes if anybody
needs anything that we've mentioned, and yeah, thanks so much for
joining us, Radim, it's been great.
Radim: Yeah, thank you, it was very interesting and I loved
talking about it, because, you know, it's an important topic.
Nikolay: Thank you for coming.
Radim: Okay, thank you guys.