Postgres FM | Transcript: Modern SQL

February 2, 2024 • 32 Minutes

Modern SQL

Michael: Hello and welcome to
PostgresFM, a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard.

And today I am joined by Markus
Winand, creator of the popular

sites, 'Use the index, Luke,' Modern
SQL, author of SQL Performance

Explained.

Performance trainer and consultant.

And according to your LinkedIn,
SQL Renaissance ambassador.

It's a pleasure to have you here
today, Markus.

Thank you for joining us.

Markus: Yes, thank you.

Michael: Awesome.

Well, I suggested the topic this
week, but it is very much your

topic.

It's modern SQL.

And I'd love for you to give us an
intro into what you mean by modern

SQL and also how you came to be
such an ambassador for it.

Markus: So, it basically goes back
when I realized that many,

many people think that the old
SQL, so what is more technically

described as SQL-92, is pretty
much the only SQL.

Like we have joins, yeah, we have
them for decades and people

think that's it because that's
what they learned like decades

ago and so on.

And they didn't realize that the
story didn't stop there.

It went on.

And we got what I refer to as more
modern features, which are

not rooted in the relational idea,
like window functions or recursive

queries, or more modernly like
JSON support, document support

at all.

We have XML for quite a while in
SQL.

This is what I kindly refer to
as, yeah, well, modern SQL, as

opposed to the grandparents' SQL,
everybody has learned like decades

ago.

Michael: So do you see this more
as an issue with people that

have been around for a few decades
and got stuck in their ways?

Or do you see it also with new
folks coming along and getting

taught by those people or the lessons
they're learning only include

the most basic, rudimentary SQL?

Markus: I would draw the line a
little bit differently there.

I think the problem is more related
to ORM users and SQL users.

So you see SQL users, there are
also 2 kinds of users, those

who just learned it once and never
got an update.

That definitely exists.

But I think the most problematic
area is actually in the ORM

world or rooted in the ORM world,
where the people just really

think because the ORM tool is doing
the best because well, they

are specialized in using SQL, all
sorts. The ORM tool.

So this is probably the best you
can do.

But that's unfortunately totally
wrong.

So the whole idea that SQL is just
about relational things is

just outdated.

That was right in '92 but it was
already wrong in '99 and somehow

all the people managed to miss
that message.

They didn't get the memo, so to
say.

Michael: Yeah, that's such an interesting
point.

I hadn't put 2 and 2 together with...

I've listened to your talks before
on this around the change

from being relational only to being
able to handle other data

types or other data models.

But the R in ORM, is that what
you mean by the relational part

of it?

Markus: Yeah, also the whole idea
that the database is just for

persistency.

Look at Java, JPA, the Java Persistence
API.

Just the idea that the database
just stores data.

That's a wrong idea for SQL.

It might be a right idea for some
other systems.

But for SQL, the fundamental idea
of SQL is that we have a separation

of the persistent data layout and
the transient data layout that

we need for 1 query right now.

Because 1 other thing that I like
to put emphasis on is that

SQL is not a query language.

SQL is more like a transformation
language.

If you look at query languages,
Go to any web page, open the

CSS and look at the selectors of
CSS.

This is a query language.

A query language can pick out 1
small piece of something larger,

and that's it.

But now look at SQL.

With SQL, we are most of the time
focusing on transforming data,

transforming the persistent data
we have stored on the disk to

last forever, more or less, to
answer a question we have at hand

right now.

Michael: Yeah.

Markus: That might be a different
question tomorrow.

It's most certainly a different
question.

And the whole idea of SQL is that
we normalize, yeah, there is

this bad word, normalization, we
normalize the data persistently

on the disk so that the shape of
the data does not change often.

Yeah, the shape not, the data itself,
yeah, the contents of the

table, but not the shape.

Not so often.

Now compare how static is the on-disk
schema, if you have probably

normalized it, compared to the
questions you try to answer with

those data.

You get every day new questions
from marketing or from reporting

or from whatever.

And the idea of SQL is to have
a flexible transformation between

the rather static schema you have
on disk, because once you have

collected terabytes of data on
disk, then it's hard to change

The model you have there.

No matter how you do it, it will
be hard, compared to the dynamic

we have on the question side.

SQL is the glue between these two
layers.

Michael: One thing you mentioned
there is normalization being a

bad word.

I'm curious what you meant by that.

Markus: Yeah, with normalization
we have all of us have learned

these levels.

First Normal Form, second normal
form, third normal.

Okay, of course, that's totally
correct and useful.

But to another extent, for the
layman people, just causing more

confusion than it is helpful.

What I like to explain normalization
is just, okay, don't focus

on the third normal or on any of
those.

Just try to figure out in which
tables would the data feel comfortable.

And naturally there are some things
like if you have a one-to-many

relationship, then it needs to
be two tables in SQL.

Just keep it on that level.

Don't go into the scientific definitions
of all of these six normalization

levels and some extras.

Try to keep it simple and you will
still get the gain from the

normalization.

So that you have a rather static
schema on disk.

Michael: I really like in what
tables would the data feel comfortable.

I'm going to steal that.

I think that should be on T-shirts.

If you make them, I'll buy them.

Awesome.

Right back to the modern SQL topic
a little bit more.

You mentioned that a lot of people
are stuck a few decades ago

with very basic SQL.

The counter argument might be,
well, they're solving their problems.

They can do a lot of things.

What are the big downsides or limitations
for those folks who

haven't explored some of the newer
features that have been added.

Markus: I'll give you an example.

With SQL '92, it was quite common
to do self-joins.

It was required to do self-joins
because some problems could

only be solved by using a self-join.

But self-joins are troublesome
in two ways.

First of all, they are complex.

And second of all, they are slow,
because they are touching every

data item twice.

Michael: And

Markus: now we have pretty much
for every situation, we have

some other syntax, most importantly
window functions.

Window functions are one of the big
self-join killers.

And now they become, actually the
syntax is more concise.

It's more clear actually when you
look at it, rather than scratching

yourself this way, you just scratch
yourself this way.

So it's actually better to read
the query.

It's actually also better to write
the query once you know the

syntax.

And it will perform better.

And you will have less maintenance
nightmares because you don't

have the redundancy in the code.

Like with a self join, if you add
a condition to one of the tables,

you might need to add it to the
other one as well or to the join

condition.

And this is easy to forget, then
you get wrong results.

Michael: Yeah, I love those.

So clarity and performance.

So clarity not just for yourself
writing things, but also for

people coming back to it, reading
it yourself in future.

And performance is the man after
my own heart on that front.

I think a lot of people assume
that, or looking at new features,

they assume they'll get new functionality
and forget that those

other two things are the bigger part.

And because of the way SQL works,
we likely could have done it

before.

That's a really good point.

Markus: But it's actually true
because SQL 92 was from one perspective,

it was quite complete.

Michael: So it

Markus: was completing the relational
idea.

And of course, you can solve almost
everything.

Asterisk recursion.

Almost everything with the old
SQL.

And that also makes it difficult
to learn new stuff because,

well, I can solve it already.

So why do I need to learn something
new that worked for me like

decades?

It's still working.

So what is it about?

And that's, of course, a little
bit more difficult to explain

people.

Michael: Well, I love how simply
you did it.

So clarity of what's happening
when you look at the SQL and performance.

What can they what choices
does the planner have when executing

that?

Yeah, really cool.

And also, I guess, edge cases as
well.

These features can be designed
to be more...

Well, we've done whole episodes
on handling nulls.

And they can have thought about
those more than, or if you have

to roll your own versions of these
features, you have to handle

nulls yourself, which can be painful,
to say the least.

Markus: So one of my favorite features
in this regard is the is

not distinct from operator.

It's the null-safe operator so
that if you have nulls on

both sides then you still get true.

Yeah, you could say it's a modern
SQL feature.

According to my definition, it
was added after SQL '92.

And if you look at on my website,
Modern SQL, you can check out

which systems support that operator.

And yeah, unfortunately, not so
many.

Michael: Interesting.

Markus: Although I consider it
something quite essential.

But still, also the vendors didn't
get the memo that there is

something new they need to catch
up with.

Michael: There I ask, I actually
haven't used it.

Is it in PostgreSQL, do you remember?

Markus: It is there.

Yes.

It does work syntactically, you
get the right results.

Michael: Okay.

Markus: But there are some gotchas
when it comes to optimization.

Michael: Interesting performance.

There we go.

Markus: So I have had just like
a month ago, I've had a case

where I would like using it, but
it didn't work out for performance

reasons.

Michael: That's so good to know.

That's very, very interesting.

I love those charts on Modern SQL.

When you look up one of these
newer features, and you have the

charts of one row per database,
per DBMS, and then a timeline of

whether it supported it, at which
point it supported it, whether

there are any caveats.

And every time I look at that,
I think, wow, this is impressive.

But also, this must be so much
work to maintain.

I was gonna ask, how do you do
it?

Markus: Yeah, I'm happy you asked
that because this is one of the

untold stories.

So I'm just a one-man show, and doing
the job site as a side project,

actually.

I have kind of a day job as well.

Yeah.

So the way I make it nowadays is
that everything is covered by

tests, by tests that I can run
automatically.

I test very deeply, like I always
check for the null case.

I always check for collations.

I always check for this and that
and a lot of, down to the SQL

state.

And yeah, well, then I get the
result.

And if everything is going fine,
then updating the website for

a new version takes me like maybe
an hour of work.

It really depends on the system.

Like I'm also featuring Google
BigQuery, which is a cloud-only

system.

You can imagine that I have higher
latencies compared to the

VM on my box.

So for that, it takes a little
bit longer to run the tests.

But after that, it's basically
just saying make and everything

is rendered again and updated again.

And yeah, well, that kind of works.

And now I'm working on getting
actually more features in there.

Michael: Oh, nice.

You mean, so what's coming up?

Markus: Yeah, so I try to catch
up, like, of course, I like to

focus on the modern SQL features.

Michael: Yeah.

Markus: And I also like to focus
on features that are already

implemented in some systems, because
this is more valuable, of

course.

And I also have always an eye on
the new features added by some

systems.

Like if we talk about Postgres,
I think it was even in 15 that

we got the functionality of the
unique constraints where we can

choose how it should treat the
null values.

Michael: So this is whether you
can determine that you are only

allowed to have a single null value.

Exactly.

Markus: it's the nulls distinct.

Let me just check up my own.

Michael: Oh, I like it.

Markus: Nulls distinct.

So yeah, it was introduced with
15.

So now you can choose for a unique
constraint if it has a nullable

column, whether one null should rule
out other nulls or not.

And that was now added to the new
standard of SQL we have in

the year 2023, so last year, we
got a new standard and it's in

there.

But even a year before it was put
into the standard, it arrived

in Postgres.

Michael: Oh, nice.

Markus: So, this is kind of the
features I try to catch up

with the new stuff introduced in
the standard as well as in newer

releases of the systems.

Michael: Awesome.

I watched a talk of yours, I think
it was about 5 years ago now,

though.

So I've only heard your opinions
of like the previous SQL standards.

not the latest one.

What were your main thoughts on
2023?

Markus: So 2023, it took quite
a long while.

The one before was 2016.

So that's longer than intended.

So that there's a target of like
4 to 5 years, something like

that.

Oh, really?

Cool.

The big thing in the latest release
of the standard is the query

language for querying graphs in
a more native syntax.

So if you know the Cypher query
language, then you get the idea.

It's basically about drawing with
SQL a path through a graph

that you want to match on.

And there is now a standard of
its own next to SQL, which is

basically independent of SQL.

So it's a new language, a new international
standard that defines

the query language, generally the
language to work with such

databases, like how to insert something
into a graph and how

to query the graph.

So this is kind of independent
of SQL, except that the same people

who do the SQL standard also do
this new standard.

And obviously, we would like to
have that capability also in

SQL.

So additionally, besides that entirely
new standard, there is

now a new part of the SQL standard
because the SQL standard is

actually made up of several parts
and now there's a new part

so that we can use the query functionality
part of the new standard

inside SQL in our queries.

Basically in the FROM clause, we
have match operations and we

can use that SQL-art kind of querying
directly in SQL.

And that's most certainly the killer
feature of 2023.

But of course, it may take a while
until vendors catch up.

There are some commercial implementations
are available, but

they predate the standard.

Therefore, they don't implement
the standard down to the last

sentence.

So I'm curious how this evolves
over time and if the standard

can still have the power to unify
these languages.

That's something that we have to
watch out for.

But besides that one mega topic,
and that's probably the reason

why it took so long.

Besides that, we have got many
small things like the null handling

that I was mentioning.

One really nice thing is that we
can now put into literal numeric values.

So if you write a number like 1000,
you can now put in the underscore

as a separator for let's say thousands
of digits.

So that you can easily spot when
it's a million or a billion

what it is, so that you can make
it more readable.

You see, it doesn't change anything
what you can do with the

language.

It just makes it nicer and more
maintainable.

And there are also, in respect
to JSON, there are also some additions,

actually quite fundamental additions.

Now, the standard has an actual
JSON type.

It didn't have one before.

The 2016 standard introduced a
lot of JSON functionality.

Most importantly, the JSON query
language.

That's a query language, the JSON
parse language, but it didn't

introduce a type for JSON.

And now with 2023, they have added
the type for JSON, along with

some other simplifications and
extensions.

So there are quite a few small
enhancements that just make everyday's

life easier.

But the big thing is I think that
the property graph language.

Michael: Yeah, makes sense.

I imagine that's gonna be a heck
of a lot of work for folks to

implement.

But well,

Markus: If it's gonna be implemented
at all.

True.

Because the SQL standard, it has
the core functionality which

every product is supposed to implement
and by testing it, I know

it's not working out.

And on top of that, it has optional
features.

And all the advanced features are
basically optional features.

So vendors can choose to either
implement it or not, but if

they implement it, then they should
follow the standard.

Michael: Got it.

Okay.

And I think I read somewhere, but
maybe this is misguided or

outdated, that vendors can claim
that they support SQL if they

support SQL 92.

But how does it work for the non-optional
features for the core?

Yeah.

Markus: True.

The short answer is not at all.

Okay.

Yeah.

And that's part of the story behind
modern SQL and the charts

you mentioned.

I think a standard is nice.

Okay.

But if nobody cares about it, then
it doesn't have any power

to unify the language.

But as soon as you see the differences
and the mistakes the products

have, then there is a motivation,
then there's an outside pressure

to actually comply with the standard.

And I can see it, I've actually
noticed it today, that when I

publish something on my website,
the vendors are looking there.

Michael: Nice.

Markus: And they are questioning
when I say, okay, it doesn't

work in that product, then they
ask, hey, I thought this is supposed

to work, why not?

And then I can show them the test
case I have and that it doesn't

work.

And some of them fix those issues
quite quickly, like in 2 weeks.

Michael: Brilliant.

Markus: So sometimes it just happens.

I publish a new article mentioning
10 systems and 1 of them is

behaving differently, maybe wrong.

And just coincidentally, 2 weeks
later, there's a new release

by that system that behaves correctly.

So that happened, yeah, that happens.

Michael: Would you like to give
any kudos or praise to the folks

that are managing to turn those
around so quickly?

Markus: In that case, it was the
H2 database.

The H2 database, yeah.

On the other hand, of course, there
are systems that have longer

cycles and then it takes longer
or maybe happens never.

But on the other hand, if I look
at Oracle, if I may talk about

Oracle briefly.

Michael: Yeah, yeah, please do.

Markus: The latest release, the
23 release, finally fills the

gaps that are there for decades.

Like they finally introduced a Boolean
type.

They have ignored it for so many
decades and have just said,

okay, make it a char 1, why?

No, and that's fine.

And now ultimately, there's movement
coming in there.

Even from Oracle.

The whole industry has actually
much more movement than like

10 years before.

Like that MySQL is getting Window
function support and recursion

support and all of that was something
I didn't expect actually.

When I was starting with modern
SQL in 2015, it was rather static.

There were a few products doing
the modern stuff and there were

a few products not doing the modern
stuff.

These were the two things.

And I did not expect the non-modern
stuff to catch up.

It was more like, yeah, well, they
will stick there forever.

But then ultimately, they started
to get a lot of dynamics into

the language support that they
offer.

Michael: Nice, and we all benefit,
right?

And thanks for bringing up Oracle.

I think they're a great case.

I think they often are one of the
first to support some of these

newer things, even if they have
had in the past some of these

gaps.

It definitely gets brought up quite
a lot when people choose,

like when people mention why they
chose PostgreSQL, standard

compliance and things working as
expected, which aren't exactly

the same thing, but quite close
to the same thing, gets brought

up quite often as one of the reasons
they trust Postgres over other,

at the time, open-source databases,
but even yeah, commercial

databases.

So it is important and we all benefit
ultimately if everybody

can work together on these things.

If it works as expected.

That's cool.

Other than your site, of course,
being very important in the

shift, what else do you think might
have helped with this change?

Markus: I think the MySQL move
is actually pretty important because

now the most commonly used database
actually in the world, besides

SQLite of course, but even SQLite
has moved.

Those systems supporting these
features means these are not exotic

features of a few vendors.

I think as soon as this is there
in the MySQL documentation,

then it becomes more official being
SQL rather than just being

Postgres or Oracle or whatever.

I think the fact that MySQL 8
got all of some of these modern

features makes quite a big difference
how people look at these

features.

Michael: Yeah, that was a huge,
huge release.

I remember MySQL, it was one of the
8 releases that included and

this is not related to SQL, well
not related to SQL standard

at all, but they got explain, analyze as well, which was a

big step forward in the things
I look at.

So yeah, huge, huge release.

And we're going to get hate mail
for this, but perhaps Oracle

deserves some credit for that now
being the stewards of the MySQL

codebase and project.

Markus: Yeah, I was not expecting
that when MySQL ended up at

Oracle.

I was rather expecting it like
being just come over to the real

database like this kind of thing.

But that didn't happen.

And if we now look at MariaDB,
which is basically the continuation

of the old team and at MySQL, then
we can see, well, they diverge

quite a lot.

And while MySQL was rather focusing
on getting the basics right.

MariaDB is more focusing on getting
features in.

And on the long run, I think the
basics are worth it.

And I think we can see that with
Postgres because in Postgres

the basics were right from the
beginning, I would say, at least

most of them, and that pays off
on the long run.

So I'm also happy to see MariaDB
being still there.

It's another competitor, especially
against MySQL in the open

source world.

But on the long run, let's see
how long they can survive.

Michael: Yeah, and as users, we
benefit from the competition,

right?

Like, obviously as community members,
I'm definitely very biased,

but I do like, I want there to
be competitors.

It would be, I don't think Postgres
as a project would benefit

from being the only relational
database standing in 20 years

time.

Historically, I don't think that's
been the place where most

innovation happens or the most
like customer benefit happens.

So, yeah, I love to see it and
appreciate your work on this as

well.

So I did.

I didn't want to minimize that.

I do think you have had an impact,
but it's cool to see other

reasons as well.

Markus: Of course, there are many
reasons for that.

Yeah.

You also see the noSQL movement,
remember?

Yeah.

In the meanwhile, most of the noSQL
vendors that are still there

have some QL, whether they claim
it is SQL or it is just some

similar QL.

Then we have new vendors that start
up with new SQL dialects,

you know, DuckDB and so on, that
also bring fresh ideas into

the SQL dialect.

I think some of the ideas they
are bringing up into their own

dialect are actually worth thinking
about bringing into the standard

and maybe other products like in
either way.

So, as I said, there's a lot of
dynamics at the moment.

Michael: Let's go back to what
you would like to see.

Let's say if somebody's like, well,
you know what, I do mostly

just use the SQL I learned when
I first learned.

Yeah, okay, maybe I do a bit of
JSON stuff here and there.

I've learned, I've had to use a
lateral join or window function

once, but you know, that's about
it.

What should they be doing?

How could they go about learning
more?

Markus: Well, this is pretty much
the same for the last, not

20, but 10 years.

The most undervalued feature of
modern SQL is and always was

window functions.

If you don't master window functions
yet, then well, stop listening

now, we will wait for you and learn
window functions, yeah?

And then continue listening.

So this is really the single most
important and powerful feature.

That's definitely my recommendation
if you are not into that

yet.

Other than that, let me give you
1 problem I see more often nowadays,

to change the mind about how to
use new features, because you

we're mentioning JSON as well.

Yeah.

I see JSON used in a way that
I'm not sure it's the best way.

Michael: Okay, great.

Markus: Quite often.

And the way I'm explaining it is
that data types like JSON or

JSONB, or even like Boolean, yeah,
they have several use cases,

actually 3 use cases.

A data type can be used in the
persistent schema.

It's in a great table, you can see
a data type.

But also important, a data type
can be used during the transformation.

And I'll give you an example there.

Like the Boolean type, we use most
of the time just during the

transformation.

Because in the WHERE clause, in
the ON clause, in the WHEN clause

of case, and so on, and so on,
and so on, we have Boolean expressions

and we're using Boolean values
just during the transformation.

It's not coming from the base table
from the disk.

It's not going up to the client
through the wire, just for the

transformation.

That's the second use case for
data types.

And the third use case is the API
layer.

So meaning how do we communicate
with the client?

And of course we think about a
tabular representation.

Yeah, of course, of course, of
course, and maybe a normalized

But here, I say, no, the normalization
is fine for the disk,

but it's not so much fine for the
API layer.

And here I love JSON.

Yeah.

Because if you think of a tag system,
you have some entities

and you can add tags on them.

Then obviously the normalized way
to do it is to have 2 tables.

I still think that's a good approach
in many, many cases.

But on the API layer, if you join
them, you will get multiples

of the entity data, which is meaningless.

That doesn't make sense.

But here comes basically the JSON
array_agg() and you aggregate

the tags into a JSON field that
you then give to the client API.

And then you just keep the 1 row
of the entity and have the list,

a list, yeah, of text there.

And here you see, this is where
I personally think that that's

the most important use case for
JSON.

It's not about storing JSON on
disk.

Yeah, that can make sense sometimes
as well, no question.

Yeah.

But the main use case I see for
JSON is on the API side.

Michael: Yeah.

Or for what it was originally intended,
right?

Markus: Yeah, maybe.

But what I see the people when
they say, yeah, we're using JSON,

what they mean is they store it
on disk.

Michael: Yes, yes.

Good point.

Awesome.

Well, is there anything you'd point
people at?

Have you given a talk on anything
like that before?

Any good materials for people?

Markus: Not so recently, you know,
there was this corona

break.

So no conferences.

I'm still not talking a lot in
public like there was another

podcast recorded a German one like
a few months back, but that's

it.

So follow my blog, modern-sql.com
and also use the indexloop.com.

This is where I publish my updates.

And what you can expect is basically
these matrices, who can

do what, use cases for the features
that I test.

That's of course also important.

What is the benefit of knowing,
okay, that feature does work

in that system, but I don't know
what to use that feature for.

Yeah, well, that's what you can
expect on a more or less regular

basis like once a month or twice
a month, something like that.

Michael: Nice, and I think I just,
might have even been this

morning, was reading your half
year update as well, which was

cool.

So yeah, highly recommend those
resources.

We've shouted them out a few times
on the podcast in previous

episodes, and your book as well,
SQL Performance Explained was

fantastic.

So thank you for that.

Actually on that note, when's Modern
SQL the book coming?

Markus: Yeah, well, I started in
2015 with the intention to make

it a book.

Michael: Oh really?

Markus: Yes, that was the original
intention.

So when I did the other book, the
SQL Performance Explained,

it also started as a blog and then
was later on put into a book.

And the blog is still available
for free on use-the-index-loop.com.

And the idea was to do the same
thing with Modern SQL, but then

happened what I've already told,
something which I thought was

static became very dynamic.

Michael: Yeah.

Markus: And now, yeah, it's hard
to catch up, but there is progress

and there will be a book.

Nice.

I'm still working on it, I'm using
the draft as material in my training.

So it is getting better but there
is no definite timeline yet

for it.

Michael: As should always be the
way with things in the future.

Well, good luck with that.

Thank you so much for coming on.

It's been a pleasure having you.

Best of luck.

Markus: Yeah, thank you and thanks
for having me.

Bye-bye.

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Guest

Markus Winand

SQL Renaissance Ambassador. Author, Trainer, Coach. PhD in Common Sense. Creator of use-the-index-luke.com and modern-sql.com

Modern SQL

Creators and Guests

Some kind things our listeners have said