Modern SQL
Michael: Hello and welcome to
PostgresFM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
And today I am joined by Markus
Winand, creator of the popular
sites, 'Use the index, Luke,' Modern
SQL, author of SQL Performance
Explained.
Performance trainer and consultant.
And according to your LinkedIn,
SQL Renaissance ambassador.
It's a pleasure to have you here
today, Markus.
Thank you for joining us.
Markus: Yes, thank you.
Michael: Awesome.
Well, I suggested the topic this
week, but it is very much your
topic.
It's modern SQL.
And I'd love for you to give us an
intro into what you mean by modern
SQL and also how you came to be
such an ambassador for it.
Markus: So, it basically goes back
when I realized that many,
many people think that the old
SQL, so what is more technically
described as SQL-92, is pretty
much the only SQL.
Like we have joins, yeah, we have
them for decades and people
think that's it because that's
what they learned like decades
ago and so on.
And they didn't realize that the
story didn't stop there.
It went on.
And we got what I refer to as more
modern features, which are
not rooted in the relational idea,
like window functions or recursive
queries, or more modernly like
JSON support, document support
at all.
We have XML for quite a while in
SQL.
This is what I kindly refer to
as, yeah, well, modern SQL, as
opposed to the grandparents' SQL,
everybody has learned like decades
ago.
Michael: So do you see this more
as an issue with people that
have been around for a few decades
and got stuck in their ways?
Or do you see it also with new
folks coming along and getting
taught by those people or the lessons
they're learning only include
the most basic, rudimentary SQL?
Markus: I would draw the line a
little bit differently there.
I think the problem is more related
to ORM users and SQL users.
So you see SQL users, there are
also 2 kinds of users, those
who just learned it once and never
got an update.
That definitely exists.
But I think the most problematic
area is actually in the ORM
world or rooted in the ORM world,
where the people just really
think because the ORM tool is doing
the best because well, they
are specialized in using SQL, all
sorts. The ORM tool.
So this is probably the best you
can do.
But that's unfortunately totally
wrong.
So the whole idea that SQL is just
about relational things is
just outdated.
That was right in '92 but it was
already wrong in '99 and somehow
all the people managed to miss
that message.
They didn't get the memo, so to
say.
Michael: Yeah, that's such an interesting
point.
I hadn't put 2 and 2 together with...
I've listened to your talks before
on this around the change
from being relational only to being
able to handle other data
types or other data models.
But the R in ORM, is that what
you mean by the relational part
of it?
Markus: Yeah, also the whole idea
that the database is just for
persistency.
Look at Java, JPA, the Java Persistence
API.
Just the idea that the database
just stores data.
That's a wrong idea for SQL.
It might be a right idea for some
other systems.
But for SQL, the fundamental idea
of SQL is that we have a separation
of the persistent data layout and
the transient data layout that
we need for 1 query right now.
Because 1 other thing that I like
to put emphasis on is that
SQL is not a query language.
SQL is more like a transformation
language.
If you look at query languages,
Go to any web page, open the
CSS and look at the selectors of
CSS.
This is a query language.
A query language can pick out 1
small piece of something larger,
and that's it.
But now look at SQL.
With SQL, we are most of the time
focusing on transforming data,
transforming the persistent data
we have stored on the disk to
last forever, more or less, to
answer a question we have at hand
right now.
Michael: Yeah.
Markus: That might be a different
question tomorrow.
It's most certainly a different
question.
And the whole idea of SQL is that
we normalize, yeah, there is
this bad word, normalization, we
normalize the data persistently
on the disk so that the shape of
the data does not change often.
Yeah, the shape not, the data itself,
yeah, the contents of the
table, but not the shape.
Not so often.
Now compare how static is the on-disk
schema, if you have probably
normalized it, compared to the
questions you try to answer with
those data.
You get every day new questions
from marketing or from reporting
or from whatever.
And the idea of SQL is to have
a flexible transformation between
the rather static schema you have
on disk, because once you have
collected terabytes of data on
disk, then it's hard to change
The model you have there.
No matter how you do it, it will
be hard, compared to the dynamic
we have on the question side.
SQL is the glue between these two
layers.
Michael: One thing you mentioned
there is normalization being a
bad word.
I'm curious what you meant by that.
Markus: Yeah, with normalization
we have all of us have learned
these levels.
First Normal Form, second normal
form, third normal.
Okay, of course, that's totally
correct and useful.
But to another extent, for the
layman people, just causing more
confusion than it is helpful.
What I like to explain normalization
is just, okay, don't focus
on the third normal or on any of
those.
Just try to figure out in which
tables would the data feel comfortable.
And naturally there are some things
like if you have a one-to-many
relationship, then it needs to
be two tables in SQL.
Just keep it on that level.
Don't go into the scientific definitions
of all of these six normalization
levels and some extras.
Try to keep it simple and you will
still get the gain from the
normalization.
So that you have a rather static
schema on disk.
Michael: I really like in what
tables would the data feel comfortable.
I'm going to steal that.
I think that should be on T-shirts.
If you make them, I'll buy them.
Awesome.
Right back to the modern SQL topic
a little bit more.
You mentioned that a lot of people
are stuck a few decades ago
with very basic SQL.
The counter argument might be,
well, they're solving their problems.
They can do a lot of things.
What are the big downsides or limitations
for those folks who
haven't explored some of the newer
features that have been added.
Markus: I'll give you an example.
With SQL '92, it was quite common
to do self-joins.
It was required to do self-joins
because some problems could
only be solved by using a self-join.
But self-joins are troublesome
in two ways.
First of all, they are complex.
And second of all, they are slow,
because they are touching every
data item twice.
Michael: And
Markus: now we have pretty much
for every situation, we have
some other syntax, most importantly
window functions.
Window functions are one of the big
self-join killers.
And now they become, actually the
syntax is more concise.
It's more clear actually when you
look at it, rather than scratching
yourself this way, you just scratch
yourself this way.
So it's actually better to read
the query.
It's actually also better to write
the query once you know the
syntax.
And it will perform better.
And you will have less maintenance
nightmares because you don't
have the redundancy in the code.
Like with a self join, if you add
a condition to one of the tables,
you might need to add it to the
other one as well or to the join
condition.
And this is easy to forget, then
you get wrong results.
Michael: Yeah, I love those.
So clarity and performance.
So clarity not just for yourself
writing things, but also for
people coming back to it, reading
it yourself in future.
And performance is the man after
my own heart on that front.
I think a lot of people assume
that, or looking at new features,
they assume they'll get new functionality
and forget that those
other two things are the bigger part.
And because of the way SQL works,
we likely could have done it
before.
That's a really good point.
Markus: But it's actually true
because SQL 92 was from one perspective,
it was quite complete.
Michael: So it
Markus: was completing the relational
idea.
And of course, you can solve almost
everything.
Asterisk recursion.
Almost everything with the old
SQL.
And that also makes it difficult
to learn new stuff because,
well, I can solve it already.
So why do I need to learn something
new that worked for me like
decades?
It's still working.
So what is it about?
And that's, of course, a little
bit more difficult to explain
people.
Michael: Well, I love how simply
you did it.
So clarity of what's happening
when you look at the SQL and performance.
What can they what choices
does the planner have when executing
that?
Yeah, really cool.
And also, I guess, edge cases as
well.
These features can be designed
to be more...
Well, we've done whole episodes
on handling nulls.
And they can have thought about
those more than, or if you have
to roll your own versions of these
features, you have to handle
nulls yourself, which can be painful,
to say the least.
Markus: So one of my favorite features
in this regard is the is
not distinct from operator.
It's the null-safe operator so
that if you have nulls on
both sides then you still get true.
Yeah, you could say it's a modern
SQL feature.
According to my definition, it
was added after SQL '92.
And if you look at on my website,
Modern SQL, you can check out
which systems support that operator.
And yeah, unfortunately, not so
many.
Michael: Interesting.
Markus: Although I consider it
something quite essential.
But still, also the vendors didn't
get the memo that there is
something new they need to catch
up with.
Michael: There I ask, I actually
haven't used it.
Is it in PostgreSQL, do you remember?
Markus: It is there.
Yes.
It does work syntactically, you
get the right results.
Michael: Okay.
Markus: But there are some gotchas
when it comes to optimization.
Michael: Interesting performance.
There we go.
Markus: So I have had just like
a month ago, I've had a case
where I would like using it, but
it didn't work out for performance
reasons.
Michael: That's so good to know.
That's very, very interesting.
I love those charts on Modern SQL.
When you look up one of these
newer features, and you have the
charts of one row per database,
per DBMS, and then a timeline of
whether it supported it, at which
point it supported it, whether
there are any caveats.
And every time I look at that,
I think, wow, this is impressive.
But also, this must be so much
work to maintain.
I was gonna ask, how do you do
it?
Markus: Yeah, I'm happy you asked
that because this is one of the
untold stories.
So I'm just a one-man show, and doing
the job site as a side project,
actually.
I have kind of a day job as well.
Yeah.
So the way I make it nowadays is
that everything is covered by
tests, by tests that I can run
automatically.
I test very deeply, like I always
check for the null case.
I always check for collations.
I always check for this and that
and a lot of, down to the SQL
state.
And yeah, well, then I get the
result.
And if everything is going fine,
then updating the website for
a new version takes me like maybe
an hour of work.
It really depends on the system.
Like I'm also featuring Google
BigQuery, which is a cloud-only
system.
You can imagine that I have higher
latencies compared to the
VM on my box.
So for that, it takes a little
bit longer to run the tests.
But after that, it's basically
just saying make and everything
is rendered again and updated again.
And yeah, well, that kind of works.
And now I'm working on getting
actually more features in there.
Michael: Oh, nice.
You mean, so what's coming up?
Markus: Yeah, so I try to catch
up, like, of course, I like to
focus on the modern SQL features.
Michael: Yeah.
Markus: And I also like to focus
on features that are already
implemented in some systems, because
this is more valuable, of
course.
And I also have always an eye on
the new features added by some
systems.
Like if we talk about Postgres,
I think it was even in 15 that
we got the functionality of the
unique constraints where we can
choose how it should treat the
null values.
Michael: So this is whether you
can determine that you are only
allowed to have a single null value.
Exactly.
So
Markus: it's the nulls distinct.
Let me just check up my own.
Michael: Oh, I like it.
Markus: Nulls distinct.
So yeah, it was introduced with
15.
So now you can choose for a unique
constraint if it has a nullable
column, whether one null should rule
out other nulls or not.
And that was now added to the new
standard of SQL we have in
the year 2023, so last year, we
got a new standard and it's in
there.
But even a year before it was put
into the standard, it arrived
in Postgres.
Michael: Oh, nice.
Markus: So, this is kind of the
features I try to catch up
with the new stuff introduced in
the standard as well as in newer
releases of the systems.
Michael: Awesome.
I watched a talk of yours, I think
it was about 5 years ago now,
though.
So I've only heard your opinions
of like the previous SQL standards.
not the latest one.
What were your main thoughts on
2023?
Markus: So 2023, it took quite
a long while.
The one before was 2016.
So that's longer than intended.
So that there's a target of like
4 to 5 years, something like
that.
Oh, really?
Cool.
The big thing in the latest release
of the standard is the query
language for querying graphs in
a more native syntax.
So if you know the Cypher query
language, then you get the idea.
It's basically about drawing with
SQL a path through a graph
that you want to match on.
And there is now a standard of
its own next to SQL, which is
basically independent of SQL.
So it's a new language, a new international
standard that defines
the query language, generally the
language to work with such
databases, like how to insert something
into a graph and how
to query the graph.
So this is kind of independent
of SQL, except that the same people
who do the SQL standard also do
this new standard.
And obviously, we would like to
have that capability also in
SQL.
So additionally, besides that entirely
new standard, there is
now a new part of the SQL standard
because the SQL standard is
actually made up of several parts
and now there's a new part
so that we can use the query functionality
part of the new standard
inside SQL in our queries.
Basically in the FROM clause, we
have match operations and we
can use that SQL-art kind of querying
directly in SQL.
And that's most certainly the killer
feature of 2023.
But of course, it may take a while
until vendors catch up.
There are some commercial implementations
are available, but
they predate the standard.
Therefore, they don't implement
the standard down to the last
sentence.
So I'm curious how this evolves
over time and if the standard
can still have the power to unify
these languages.
That's something that we have to
watch out for.
But besides that one mega topic,
and that's probably the reason
why it took so long.
Besides that, we have got many
small things like the null handling
that I was mentioning.
One really nice thing is that we
can now put into literal numeric values.
So if you write a number like 1000,
you can now put in the underscore
as a separator for let's say thousands
of digits.
So that you can easily spot when
it's a million or a billion
what it is, so that you can make
it more readable.
You see, it doesn't change anything
what you can do with the
language.
It just makes it nicer and more
maintainable.
And there are also, in respect
to JSON, there are also some additions,
actually quite fundamental additions.
Now, the standard has an actual
JSON type.
It didn't have one before.
The 2016 standard introduced a
lot of JSON functionality.
Most importantly, the JSON query
language.
That's a query language, the JSON
parse language, but it didn't
introduce a type for JSON.
And now with 2023, they have added
the type for JSON, along with
some other simplifications and
extensions.
So there are quite a few small
enhancements that just make everyday's
life easier.
But the big thing is I think that
the property graph language.
Michael: Yeah, makes sense.
I imagine that's gonna be a heck
of a lot of work for folks to
implement.
But well,
Markus: If it's gonna be implemented
at all.
True.
Because the SQL standard, it has
the core functionality which
every product is supposed to implement
and by testing it, I know
it's not working out.
And on top of that, it has optional
features.
And all the advanced features are
basically optional features.
So vendors can choose to either
implement it or not, but if
they implement it, then they should
follow the standard.
Michael: Got it.
Okay.
And I think I read somewhere, but
maybe this is misguided or
outdated, that vendors can claim
that they support SQL if they
support SQL 92.
But how does it work for the non-optional
features for the core?
Yeah.
Markus: True.
The short answer is not at all.
Okay.
Yeah.
And that's part of the story behind
modern SQL and the charts
you mentioned.
I think a standard is nice.
Okay.
But if nobody cares about it, then
it doesn't have any power
to unify the language.
But as soon as you see the differences
and the mistakes the products
have, then there is a motivation,
then there's an outside pressure
to actually comply with the standard.
And I can see it, I've actually
noticed it today, that when I
publish something on my website,
the vendors are looking there.
Michael: Nice.
Markus: And they are questioning
when I say, okay, it doesn't
work in that product, then they
ask, hey, I thought this is supposed
to work, why not?
And then I can show them the test
case I have and that it doesn't
work.
And some of them fix those issues
quite quickly, like in 2 weeks.
Michael: Brilliant.
Markus: So sometimes it just happens.
I publish a new article mentioning
10 systems and 1 of them is
behaving differently, maybe wrong.
And just coincidentally, 2 weeks
later, there's a new release
by that system that behaves correctly.
So that happened, yeah, that happens.
Michael: Would you like to give
any kudos or praise to the folks
that are managing to turn those
around so quickly?
Markus: In that case, it was the
H2 database.
The H2 database, yeah.
On the other hand, of course, there
are systems that have longer
cycles and then it takes longer
or maybe happens never.
But on the other hand, if I look
at Oracle, if I may talk about
Oracle briefly.
Michael: Yeah, yeah, please do.
Markus: The latest release, the
23 release, finally fills the
gaps that are there for decades.
Like they finally introduced a Boolean
type.
They have ignored it for so many
decades and have just said,
okay, make it a char 1, why?
No, and that's fine.
And now ultimately, there's movement
coming in there.
Even from Oracle.
Even from Oracle.
The whole industry has actually
much more movement than like
10 years before.
Like that MySQL is getting Window
function support and recursion
support and all of that was something
I didn't expect actually.
When I was starting with modern
SQL in 2015, it was rather static.
There were a few products doing
the modern stuff and there were
a few products not doing the modern
stuff.
These were the two things.
And I did not expect the non-modern
stuff to catch up.
It was more like, yeah, well, they
will stick there forever.
But then ultimately, they started
to get a lot of dynamics into
the language support that they
offer.
Michael: Nice, and we all benefit,
right?
And thanks for bringing up Oracle.
I think they're a great case.
I think they often are one of the
first to support some of these
newer things, even if they have
had in the past some of these
gaps.
It definitely gets brought up quite
a lot when people choose,
like when people mention why they
chose PostgreSQL, standard
compliance and things working as
expected, which aren't exactly
the same thing, but quite close
to the same thing, gets brought
up quite often as one of the reasons
they trust Postgres over other,
at the time, open-source databases,
but even yeah, commercial
databases.
So it is important and we all benefit
ultimately if everybody
can work together on these things.
If it works as expected.
That's cool.
Other than your site, of course,
being very important in the
shift, what else do you think might
have helped with this change?
Markus: I think the MySQL move
is actually pretty important because
now the most commonly used database
actually in the world, besides
SQLite of course, but even SQLite
has moved.
Those systems supporting these
features means these are not exotic
features of a few vendors.
I think as soon as this is there
in the MySQL documentation,
then it becomes more official being
SQL rather than just being
Postgres or Oracle or whatever.
I think the fact that MySQL 8
got all of some of these modern
features makes quite a big difference
how people look at these
features.
Michael: Yeah, that was a huge,
huge release.
I remember MySQL, it was one of the
8 releases that included and
this is not related to SQL, well
not related to SQL standard
at all, but they got explain, analyze as well, which was a
big step forward in the things
I look at.
So yeah, huge, huge release.
And we're going to get hate mail
for this, but perhaps Oracle
deserves some credit for that now
being the stewards of the MySQL
codebase and project.
Markus: Yeah, I was not expecting
that when MySQL ended up at
Oracle.
I was rather expecting it like
being just come over to the real
database like this kind of thing.
But that didn't happen.
And if we now look at MariaDB,
which is basically the continuation
of the old team and at MySQL, then
we can see, well, they diverge
quite a lot.
And while MySQL was rather focusing
on getting the basics right.
MariaDB is more focusing on getting
features in.
And on the long run, I think the
basics are worth it.
And I think we can see that with
Postgres because in Postgres
the basics were right from the
beginning, I would say, at least
most of them, and that pays off
on the long run.
So I'm also happy to see MariaDB
being still there.
It's another competitor, especially
against MySQL in the open
source world.
But on the long run, let's see
how long they can survive.
Michael: Yeah, and as users, we
benefit from the competition,
right?
Like, obviously as community members,
I'm definitely very biased,
but I do like, I want there to
be competitors.
It would be, I don't think Postgres
as a project would benefit
from being the only relational
database standing in 20 years
time.
Historically, I don't think that's
been the place where most
innovation happens or the most
like customer benefit happens.
So, yeah, I love to see it and
appreciate your work on this as
well.
So I did.
I didn't want to minimize that.
I do think you have had an impact,
but it's cool to see other
reasons as well.
Markus: Of course, there are many
reasons for that.
Yeah.
You also see the noSQL movement,
remember?
Yeah.
In the meanwhile, most of the noSQL
vendors that are still there
have some QL, whether they claim
it is SQL or it is just some
similar QL.
Then we have new vendors that start
up with new SQL dialects,
you know, DuckDB and so on, that
also bring fresh ideas into
the SQL dialect.
I think some of the ideas they
are bringing up into their own
dialect are actually worth thinking
about bringing into the standard
and maybe other products like in
either way.
So, as I said, there's a lot of
dynamics at the moment.
Michael: Let's go back to what
you would like to see.
Let's say if somebody's like, well,
you know what, I do mostly
just use the SQL I learned when
I first learned.
Yeah, okay, maybe I do a bit of
JSON stuff here and there.
I've learned, I've had to use a
lateral join or window function
once, but you know, that's about
it.
What should they be doing?
How could they go about learning
more?
Markus: Well, this is pretty much
the same for the last, not
20, but 10 years.
The most undervalued feature of
modern SQL is and always was
window functions.
If you don't master window functions
yet, then well, stop listening
now, we will wait for you and learn
window functions, yeah?
And then continue listening.
So this is really the single most
important and powerful feature.
That's definitely my recommendation
if you are not into that
yet.
Other than that, let me give you
1 problem I see more often nowadays,
to change the mind about how to
use new features, because you
we're mentioning JSON as well.
Yeah.
I see JSON used in a way that
I'm not sure it's the best way.
Michael: Okay, great.
Markus: Quite often.
And the way I'm explaining it is
that data types like JSON or
JSONB, or even like Boolean, yeah,
they have several use cases,
actually 3 use cases.
A data type can be used in the
persistent schema.
It's in a great table, you can see
a data type.
But also important, a data type
can be used during the transformation.
And I'll give you an example there.
Like the Boolean type, we use most
of the time just during the
transformation.
Because in the WHERE clause, in
the ON clause, in the WHEN clause
of case, and so on, and so on,
and so on, we have Boolean expressions
and we're using Boolean values
just during the transformation.
It's not coming from the base table
from the disk.
It's not going up to the client
through the wire, just for the
transformation.
That's the second use case for
data types.
And the third use case is the API
layer.
So meaning how do we communicate
with the client?
And of course we think about a
tabular representation.
Yeah, of course, of course, of
course, and maybe a normalized
1.
But here, I say, no, the normalization
is fine for the disk,
but it's not so much fine for the
API layer.
And here I love JSON.
Yeah.
Because if you think of a tag system,
you have some entities
and you can add tags on them.
Then obviously the normalized way
to do it is to have 2 tables.
I still think that's a good approach
in many, many cases.
But on the API layer, if you join
them, you will get multiples
of the entity data, which is meaningless.
That doesn't make sense.
But here comes basically the JSON
array_agg() and you aggregate
the tags into a JSON field that
you then give to the client API.
And then you just keep the 1 row
of the entity and have the list,
a list, yeah, of text there.
And here you see, this is where
I personally think that that's
the most important use case for
JSON.
It's not about storing JSON on
disk.
Yeah, that can make sense sometimes
as well, no question.
Yeah.
But the main use case I see for
JSON is on the API side.
Michael: Yeah.
Or for what it was originally intended,
right?
Markus: Yeah, maybe.
But what I see the people when
they say, yeah, we're using JSON,
what they mean is they store it
on disk.
Michael: Yes, yes.
Good point.
Awesome.
Well, is there anything you'd point
people at?
Have you given a talk on anything
like that before?
Any good materials for people?
Markus: Not so recently, you know,
there was this corona
break.
So no conferences.
I'm still not talking a lot in
public like there was another
podcast recorded a German one like
a few months back, but that's
it.
So follow my blog, modern-sql.com
and also use the indexloop.com.
This is where I publish my updates.
And what you can expect is basically
these matrices, who can
do what, use cases for the features
that I test.
That's of course also important.
What is the benefit of knowing,
okay, that feature does work
in that system, but I don't know
what to use that feature for.
Yeah, well, that's what you can
expect on a more or less regular
basis like once a month or twice
a month, something like that.
Michael: Nice, and I think I just,
might have even been this
morning, was reading your half
year update as well, which was
cool.
So yeah, highly recommend those
resources.
We've shouted them out a few times
on the podcast in previous
episodes, and your book as well,
SQL Performance Explained was
fantastic.
So thank you for that.
Actually on that note, when's Modern
SQL the book coming?
Markus: Yeah, well, I started in
2015 with the intention to make
it a book.
Michael: Oh really?
Markus: Yes, that was the original
intention.
So when I did the other book, the
SQL Performance Explained,
it also started as a blog and then
was later on put into a book.
And the blog is still available
for free on use-the-index-loop.com.
And the idea was to do the same
thing with Modern SQL, but then
happened what I've already told,
something which I thought was
static became very dynamic.
Michael: Yeah.
Markus: And now, yeah, it's hard
to catch up, but there is progress
and there will be a book.
Nice.
I'm still working on it, I'm using
the draft as material in my training.
So it is getting better but there
is no definite timeline yet
for it.
Michael: As should always be the
way with things in the future.
Well, good luck with that.
Thank you so much for coming on.
It's been a pleasure having you.
Best of luck.
Markus: Yeah, thank you and thanks
for having me.
Bye-bye.