Michael: Hello, hello and welcome
to Postgres.FM, weekly show

about all things PostgreSQL.

I am Michael, founder of pgMustard.

I'm joined as usual by Nikolay,
founder of Postgres.AI.

Hey Nikolay.

Nikolay: Hey Michael.

How are you?

Michael: Nope, not falling for
that.

And we are joined by a wonderful
guest, Lukas Eder, founder of

Data Geekery and the creator of
jOOQ, a very popular framework.

Thank you so much for joining us,
Lukas.

Lukas: Thank you for having me,
Michael.

Michael: It's our pleasure.

So, how do you describe jOOQ and
what's it helping folks with?

Lukas: So, jOOQ is an internal
domain-specific language written

in Java where it models the SQL
language directly as a Java API.

This helps Java developers to write
SQL queries more efficiently

and more correctly within their
Java applications.

There's a code generator that generates
the entire schema as

Java objects and you can interact
with those objects and create

type safe queries.

That's the base of it.
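As a rough sketch of what that looks like in practice — the AUTHOR table object and its columns here are hypothetical stand-ins for whatever the code generator would produce from a real schema:

```java
// Hypothetical generated schema objects (AUTHOR and its columns) plus the jOOQ DSL.
// The query is type-checked by the Java compiler and reads almost like SQL.
DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES);

Result<Record2<String, String>> result = ctx
    .select(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
    .from(AUTHOR)
    .where(AUTHOR.LAST_NAME.like("E%"))
    .orderBy(AUTHOR.LAST_NAME)
    .fetch();
```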

Michael: Yeah, it's super cool.

But I'm super impressed with how
much you focus on developer

experience in general.

I hear quite a few people say they
really love jOOQ and they

give about 15 different examples
of things they love about it.

But what are the main things you
think that people particularly

like about it?

Lukas: Well, the first thing is, and I've studied the market extensively before making jOOQ, the first thing is that it looks as much like SQL as is possible when you model an internal DSL API in Java.

So, of course, there's some limitations.

But when you have in mind that there's a GROUP BY clause, you just start typing GROUP BY where you expect it, and then it just starts compiling.

So there's no surprises.

There's almost no surprises in
terms of SQL syntax.

So if you know SQL, then you immediately
know the jOOQ API.

You don't really have to learn
it.

I guess that's one of the most popular things. But other than that,

it's also very nicely integrated
into the Java language or Kotlin

and Scala, that works as well.

And if you've ever coded PL/SQL in Oracle or PL/pgSQL in Postgres, you kind of like the way the SQL language is embedded in the procedural language.

So if you loop, if you iterate
over a result set in those procedural

languages, you have the same kind
of type safety and you have

the possibility, for instance,
to insert bind variables into

your statements at the right spot,
and you don't have to worry

about the infrastructure and the
logistics of connecting to the

database, and it just feels embedded.

And I think that's what a lot of
people really want to do, even

more than having the type safety,
is this feeling of embeddedness

into the target language.

And when you map your result sets
to Java objects, that also

feels very native.

So it feels like the database is part of Java, of the JVM.

Michael: Yeah.

Even to the point of getting autocomplete suggestions in IDEs. That is super nice.

Lukas: Yeah.

There are little details like this.

So when you say autocomplete, there's also the possibility to comment on your database tables and columns, and that translates to Javadoc.

So whatever you comment directly in the database is translated to documentation inside of your Java program. You hardly ever have to go to the database to see what's there.

Michael: Nice.

Lukas: Yeah.

Michael: In terms of the history, I did look it up, or at least the GitHub history. How long have you been working on it, and how does that feel?

Lukas: I think the first public version started in 2009.

There were prototypes before that, in 2008 I think, but 2009 was the first public version. It wasn't on GitHub then, I don't think GitHub was very popular at the time. It was Subversion back then.

That's how old it is already.

I had the idea already back during
my university studies at EPFL

in Lausanne.

There was a professor who said
you could model any kind of programming

language as an internal DSL.

I'm not sure if we used that term yet. I think the term DSL was coined more recently. But he said you could model any language in terms of an API if you just have Java interfaces that return interfaces that return interfaces.

And this kind of idea stuck with
me for all this time, but I

didn't have a use case back then.

And when I started working in the
industry, in 2006 was my first

Java job.

I did some PHP work before, but
Java started in 2006.

I've seen that all the companies that I applied to, and even the one that I worked at, implemented dynamic SQL libraries, because this was something that everyone needs, right?

So everyone has dynamic SQL queries.

When you have a UI with various
input boxes and they're all optional,

so you have to compose your SQL
queries in a dynamic way.

And everyone wants to avoid string
concatenation because it's

not safe and it's boring.
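A sketch of how that dynamic composition can look in jOOQ, assuming generated AUTHOR objects and optional firstName / lastName inputs from a UI (all names hypothetical):

```java
// Build a WHERE clause from optional inputs, with no string concatenation.
Condition condition = DSL.noCondition();
if (firstName != null)
    condition = condition.and(AUTHOR.FIRST_NAME.eq(firstName));
if (lastName != null)
    condition = condition.and(AUTHOR.LAST_NAME.eq(lastName));

ctx.selectFrom(AUTHOR)
   .where(condition)   // renders only the conditions that were actually added
   .fetch();
```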

So they did that for both: they had one for SQL and they had one for Hibernate query language, which is essentially the same thing, just more limited, but the same concept.

And no one actually thought about making this a true DSL. It was always just a query builder with some weird API.

So you had Java style methods where
you add clauses to the query

object.

And it didn't feel like SQL.

It felt like a tool to solve exactly this dynamic SQL problem, but you didn't enjoy using this thing.

And I think JPA still has this
thing with their criteria API,

which is still very useful if you
want to do dynamic JPQL.

But I've never heard anyone enjoy
using that API because it's

just, you have to learn it.

So you have to learn one more thing, and it only serves this one purpose, or at least that's what it feels like.

It serves this purpose of dynamic
JPQL.

And that's when I started prototyping
it.

And in the very beginning, I first had to implement the infrastructure. It wasn't a very DSL-style API either, so I created a query builder that could do some SELECT .. FROM .. WHERE with very simple conditions. But

then I started to really implement
this idea because I remembered,

okay, this professor, he had this
idea and I'm going to try this.

Is it really possible?

And it turns out it was.

And it's really crazy how many SQL statements we have in jOOQ today that all use this kind of approach, where you just start with the same object and then you start typing and you autocomplete your SQL syntax, and it's all one huge graph of methods that helps you construct your SQL queries.

So jOOQ has come a long way since 2009, I'd say.

Michael: Yeah, right.

I was looking in your documentation at how many different, you call them families of SQL dialects, you support.

So how many different databases,
but also the different versions

of different databases that may
or may not support different

syntax and SQL features.

How do you maintain that?

Lukas: Well, first off, luckily the database products are all very, very backwards compatible. If we take out one or two of them that we don't really care about as much, most of them really value backwards compatibility a lot.

So supporting new versions is just
looking at the new features

and adding support for them, but
the old stuff still works.

So at least that's already covered
there, more or less.

But other than that, I mean, supporting so many dialects per se is, yeah, a lot of work and a lot of testing, of course, a lot of automated regression testing to make sure nothing breaks.

And jOOQ really values backwards compatibility a lot, just like the database products.

So when you embed the jOOQ query into your database application,

you don't want stuff to break.

So if I'm not allowed to break
stuff, I'm probably not breaking

stuff.

So this kind of simplifies maintenance
as well, because stuff

that once works is probably not
going to break just like that.

I mean there's some internal refactorings
but there are no really

major, how to put it, major paradigm
shifts where everything

works in a different way now and
I was breaking everything.

So with automated regression tests,
it's not...

It's hard, but it's not that hard,
I think.

The hard part is just to find out
some weird syntax that implements

a standard feature on this particular
dialect.

And that's quite creative at times,
especially with the newer

dialects that have just been released
recently that are not that

mature yet. Or some dialects are very opinionated.

So you kind of have to think in
terms of why did they make those

decisions and how can I map standard
SQL onto this database?

But I think that's one of the things that people really like when

they work, especially when they
work with different database

products.

So if you have a company that works with maybe Postgres and SQL Server, you don't actually have to think about the differences all the time.

So you can just write substring in the jOOQ API, and you don't care if it's SUBSTR or whatever the database product calls it. It's just substring, and jOOQ will translate it for you.
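For example, the same jOOQ expression might render differently per dialect — roughly along these lines (column name invented):

```sql
-- What substring(TITLE, 2, 3) in the jOOQ API could become:

-- Standard SQL / Postgres:
substring(title from 2 for 3)

-- Dialects that only have the function-style variant:
substr(title, 2, 3)
```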

Nikolay: But there are differences
as well, right?

And JSON is a big topic, right?

Lukas: Like huge.

JSON?

Nikolay: Yeah.

Yeah.

Those functions and yeah.

Lukas: It's crazy.

I mean, Postgres was one of the first to actually support JSON, but it did that...

Nikolay: ...before standardizing stuff.

Lukas: Right. It has since been standardized, Oracle did that a couple of years ago, and now Postgres has started implementing those standards, from what I've seen.

I've still not engaged with that part yet, but you'd think, now that there's a standard, things would have stabilized; but still, every database product does it differently.

It's crazy.

The devil is in the details.

Nikolay: Yeah.

And do you sometimes need to check the standard, or do you not need it at all?

Lukas: I do it all the time.

If it's not about a vendor-specific feature, the jOOQ API really tries to follow the SQL standard in terms of syntax.

So a vendor-specific feature would be Postgres's ON CONFLICT clause, which I think was a mistake.

They should have done merge from
the beginning and not invent

something new, but now we have
both.

And in that case, jOOQ also models the ON CONFLICT part like Postgres did, and then SQLite copied it, and a couple of others, I think.
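For reference, the two upsert styles side by side (table and columns invented for the example):

```sql
-- Postgres's vendor-specific ON CONFLICT:
INSERT INTO account (id, balance)
VALUES (1, 100)
ON CONFLICT (id) DO UPDATE SET balance = EXCLUDED.balance;

-- The standard MERGE equivalent (in Postgres since version 15):
MERGE INTO account a
USING (VALUES (1, 100)) AS v (id, balance)
ON a.id = v.id
WHEN MATCHED THEN UPDATE SET balance = v.balance
WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (v.id, v.balance);
```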

But for instance, merge is a standard
SQL feature and many database

products have some extensions which
I then study, but I always

first look at the standard.

What does the standard do?

Because that's a thing that's going
to still be there in 50 years.

And different implementations might
have their quirks, which

sometimes I hope they deprecate
eventually and move towards the

standard.

Because usually I think the standard
is quite nicely...

Nikolay: Well, they behave differently,
so many people got used

to on conflict.

Lukas: Yeah, I get the point of
doing on conflict.

I mean, it's simpler for 80% of
the use cases, obviously.

Nikolay: Yeah, also MERGE only recently received support for the RETURNING clause, which is not standard at all, right? As I remember.

Lukas: The SQL standard has the data change delta table — the data changed out of the table — which was implemented by DB2.

And then some minor database products like H2 implemented it as well.

And it's not as powerful as RETURNING, especially because it only allows you to fetch either the data before or after the update for UPDATE statements, which are also part of MERGE.

You maybe want to return both versions, the version before and after the update.

With RETURNING, in principle, that's feasible. So Oracle implemented this recently.

I didn't follow — did Postgres allow for accessing both versions of the row?

Nikolay: I don't remember as well.

I only remember, well, if you DELETE, definitely you have the ability to return the old thing you deleted.
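For reference, in Postgres that looks like this (table names invented):

```sql
-- DELETE returns the deleted (old) rows:
DELETE FROM sessions
WHERE expires_at < now()
RETURNING *;

-- UPDATE ... RETURNING, by contrast, sees the new values:
UPDATE account
SET balance = balance - 10
WHERE id = 1
RETURNING id, balance;  -- the balance after the update
```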

Lukas: Well, there's only the old thing. The old thing is deleted.

Nikolay: Let me quickly check.

Lukas: Why not?

But with updates, I mean, SQL Server always supported both versions.

So before and after.

And DB2 with the standard syntax supports only one: you have to choose which one, but you can't return both.

And Oracle 23ai now supports both as well, like SQL Server.

Oracle uses the Postgres syntax, the non-standard syntax also

with the returning keyword, so maybe you can start from there

if you don't do it yet in Postgres.

Nikolay: I think in Postgres for updates we cannot return old

data.

We have access to it in triggers but it's different, right?

So only the new value can be seen there, but for deletes, it's

Lukas: probably more useful.

Nikolay: Well, I can imagine some cases when we want to report,

for example, what happened.

Lukas: Yeah.

When you do auditing, you want to have both.

Nikolay: Yeah.

Maybe let's add it.

Because

Lukas: The standard supports this data change delta table, and it accepts a MERGE statement as well. So in principle, in the standard, you could have RETURNING, just with a different syntax.

Nikolay: But it feels like a heavier approach in terms of manipulation.

RETURNING is just one line, right?

RETURNING *, that's it.

Lukas: Yes, yes.

Super easy.

It seems easier, yeah.

Also, there are tons of limitations.

I mean, you put your data change delta table inside of a SELECT query, but then you can hardly do anything with that SELECT query. For instance, I'm not sure exactly what is forbidden, but unions I think are forbidden, and joins are forbidden, so maybe even aggregations.

I'm not sure anymore, but you're very limited in what you can do.

So I'm not sure if there's any benefit with allowing that in

the SELECT statement as opposed to just returning.

But ultimately it's kind of the same thing as what Postgres does when you put DML statements in WITH. So that kind of behaves the same way, at least from a user perspective.
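A sketch of that Postgres form — a data-modifying statement inside WITH, queried like a table (names invented):

```sql
WITH deleted AS (
    DELETE FROM sessions
    WHERE expires_at < now()
    RETURNING user_id
)
SELECT user_id, count(*)
FROM deleted
GROUP BY user_id;
```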

Nikolay: Speaking of WITH, I know jOOQ supports CTEs.

Does it support recursive CTEs?

Yeah, yeah.

Yeah.

So everything, lateral join and so on, everything.

Yeah.

Cool.

Lukas: There's a high chance that if SQL supports something and it's not really very funky, like the MATCH_RECOGNIZE clause, then jOOQ will support it as well.

So MATCH_RECOGNIZE could be supported as well in jOOQ, but it's such an edge case, and still only supported in Oracle and some very esoteric database products that jOOQ doesn't support yet.

So I'm skipping this for now. But WITH — everyone uses WITH, so jOOQ supports it as well, including recursive WITH. I mean, there are some table-valued functions, like generate_series in Postgres, which have to be emulated elsewhere.

So if you ever — probably hardly anyone does that — migrate from Postgres to whatever, and you want to translate your old generate_series queries, then jOOQ will translate that to a recursive WITH.
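Roughly, such a translation could turn the Postgres form into a portable recursive CTE:

```sql
-- Native Postgres:
SELECT i FROM generate_series(1, 5) AS t (i);

-- A standard-SQL emulation via a recursive CTE:
WITH RECURSIVE series (i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM series WHERE i < 5
)
SELECT i FROM series;
```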

Michael: Cool.

Nice.

Nice.

In fact, I feel like you skipped over that a little bit, but that's such a cool feature that basically...

Lukas: Translation?

Michael: Yeah, exactly.

We can get access to standard features that Postgres hasn't implemented, or that whatever database we're using hasn't implemented yet, because you do the transformation, I guess.

Translation, did you say?

Lukas: Yeah, I call it translation.

So you can even use jOOQ as a pure translation product.

Some customers actually do that, mostly when they migrate from Oracle to Postgres: you can just embed jOOQ as a JDBC driver into your Java application.

And you don't actually have to use jOOQ directly.

So let's say this is a legacy application and it uses JDBC, or something JDBC-based like Hibernate or MyBatis, and you just put jOOQ in the middle as a translating layer, and probably 80% of your queries will work directly on Postgres.

Of course, there's always edge cases and you have to do testing,

but at least your migration project will go much, much faster

because you can directly use Oracle queries on those other database

products.

Like, for instance, Oracle has CONNECT BY, and that can be translated to WITH RECURSIVE queries.

That's very hard, so some cases don't work, but maybe many cases do.

And you don't have to actually look at those anymore.

Or Oracle's special outer join syntax can be translated to LEFT JOINs in standard SQL.

So your migration just goes much faster.
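A sketch of that kind of translation for a simple hierarchy (table and columns invented):

```sql
-- Oracle's hierarchical syntax:
SELECT employee_id, manager_id
FROM employee
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;

-- A standard-SQL translation:
WITH RECURSIVE tree AS (
    SELECT employee_id, manager_id
    FROM employee
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.manager_id
    FROM employee e
    JOIN tree t ON e.manager_id = t.employee_id
)
SELECT employee_id, manager_id FROM tree;
```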

You can also use it not as an embedded library: the website is free, or you use it locally as a CLI tool, where you have input files and jOOQ translates those directly.

And you can manually check if it's correct.

So translation is, I'd say, a minor use case.

A lot of companies work with only one database product, and maybe in 10 years they might reconsider their choices and migrate.

But even then, most companies stick with the database product they started with.

And so many jOOQ users don't actually use this feature, but some do.

Nikolay: May I ask a slightly off-topic question about JDBC, since you spent a lot of time with Java, obviously?

There is such a thing as extra_float_digits.

Right?

Lukas: Extra what?

Nikolay: extra_float_digits.

Okay, there is such a setting, and by default it's 0 in Postgres.

Or 1.

Yeah, no, 1. It's 1.

Lukas: I've never heard of it.

Nikolay: Okay, but yeah, I remember some bugs in my code, and I usually try to use pure SQL and psql, you know, if I can.

I remember that Java developers implemented the very same thing, same queries, and the behavior was very different.

And then I noticed that DBeaver — who else — and other Java-based IDEs also behaved the same way the Java application did.

It was not the same as I saw in psql.

It happened with these extra_float_digits.

If you didn't notice this, it's maybe some very edge corner case.

But I wonder why JDBC keeps a different default — these layers of different defaults can be very annoying.

But okay.

Lukas: Well, the main difference that you always see when you work with JDBC, as opposed to psql, is that in psql you usually don't work with bind variables, so you don't have the whole kind of effect that you get when you have bind variables — although in Postgres this might not be that big of a difference — but when you work with JDBC, you always have prepared statements and bind variables.

And let's say my background is
mostly Oracle.

So with bind variables, you have an execution plan cache, and the cache will store some assumptions about your bind variables, like a normal distribution and expectations regarding cardinalities, et cetera.

So if your dataset is skewed, your actual bind variable may be completely different from what Oracle expects.

And then the execution plan will be bad.

But from what I take, Postgres
doesn't have this problem because

it doesn't have this feature.

So this would be 1 of the main
differences.
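For context, the server-side equivalent of what a driver does with bind variables can be sketched in plain Postgres SQL (names invented):

```sql
-- A parameterized statement, prepared once:
PREPARE acc (bigint) AS
    SELECT * FROM account WHERE id = $1;

-- Executed with the value bound separately, not inlined as a literal:
EXECUTE acc (42);
```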

But you were referring to some logical error, not...

Nikolay: Yeah, default and values returned are different from what I see in psql.

Yeah, but

Lukas: I've never heard of this.

Nikolay: Well, okay, maybe it's
already fixed by the way.

It was like 5 years ago or so I
saw it.

So, yeah.

Well, on the topic of JDBC,

Michael: I mean, do you have any
issues with it?

It seems like it's a remarkable piece of software that's just chugging along, but equally it doesn't get much love.

I generally only hear negative things about it, but it must be amazing given how much it's used.

Lukas: I think it's the best API
that Java has, even better than

collections and everything else.

It's really — I mean, okay, maybe they stole it from Microsoft, from ODBC, it's kind of the same thing — but it's really the best thing that Java has, because it's such a good standard.

I mean, everything can build on
top of it and you don't even

have to write SQL queries.

So there's, for instance, Neo4j, which is a NoSQL database with their own query language, and you can use JDBC with that database as well, as long as you want to have some tabular results. And it's a very, very good abstraction of the network protocol layer.

And the reason why people hate it is that it's not targeted at developers.

So you can use it directly if you want, but it's a bit hard to use.

You have a lot of objects.

You have the connection object
and you have to create a statement

object and you have to remember
to close it.

And you have to remember to create
a result set object and close

that as well.

And it's not integrated with the
rest of the JDK libraries.

Like for instance, when you iterate
the result set, you have

your own API methods instead of
just a list or iterable.

So you can't use the for each loop
on a result set, for instance.
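A sketch of that boilerplate — connection, statement, and result set all managed by hand (URL, credentials, and schema invented; try-with-resources at least handles the closing):

```java
try (Connection con = DriverManager.getConnection(url, user, password);
     PreparedStatement stmt = con.prepareStatement(
         "SELECT id, name FROM author WHERE name LIKE ?")) {

    stmt.setString(1, "E%");

    try (ResultSet rs = stmt.executeQuery()) {
        // No for-each here: ResultSet is not Iterable
        while (rs.next()) {
            System.out.println(rs.getInt("id") + ": " + rs.getString("name"));
        }
    }
}
```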

So the ergonomics of JDBC are not up to date.

I once tried to convince the spec expert group — it's a mailing list — to update the API.

There are a couple of things — at the time it was Java 8, but even now — that you could improve without changing anything fundamental in JDBC.

For instance, you can skip the
statement part.

A lot of times you don't actually
have to think about preparing

the statement explicitly.

You have to do it when you want to reuse the prepared statement, so you can save some time in case there's a resource or some cached statement on the connection directly.

But a lot of times that's not the case in some database products, and a lot of times it doesn't even matter at all.

So you could just have a connection and run the query.

So that's a minor improvement that would just reduce one step for each statement you write.

Yeah, but Oracle couldn't allocate
any resources to such a project,

even if the people on the list
agreed that many ideas were reasonable.

So I guess that's it.

It's an amazing piece of software for integration products like jOOQ or Hibernate or MyBatis and everything else to build on top of, and all the drivers just work.

So a database company can just
publish a driver and it will work

in all kind of software.

For instance, if you work with DBeaver, it just works with all the database products, right?

So you don't have to think about how do I connect to this new database thing — DuckDB or whatever — it just works.

All these database products, they
just work because they all

use the same API in Java.

So it's really a wonderful piece
of technology, I think.

So I totally agree.

Nikolay: Yeah, this is exactly
where I saw this problem, comparing

to, I mean in DBeaver, comparing
to Java.

And I realized, okay, it's also
Java application, so yeah.

Good.

Lukas: Yeah, but that was probably just a driver bug.

I don't think it has to do with JDBC as an API.

I mean, obviously the drivers can
have bugs.

Nikolay: Right, right.

So you see bugs often, right?

I mean, in this...

Lukas: Yeah, not necessarily in
the drivers.

I mean, the drivers, they really
solve a very basic problem.

So it's mostly about network connections and these kinds of things that no one actually cares about, unless you have to optimize something, like multiplexing or whatever.

But I think this stuff kind of
just works.

And when I find bugs, it's more
SQL related.

So it's inside of the database — and I do find a ton of them, very rarely in Postgres, I have to say.

I think in my career I found 2 or 3 bugs in Postgres only, as opposed to hundreds.

Michael: We didn't pay you to say
that.

Lukas: No.

It's really very good.

It's surprisingly good.

I mean, also, absolutely no regressions
at all.

So if there's a bug, it's an edge
case that no one thought about,

then it's really a very weird kind
of thing.

Nikolay: Yeah.

By the way, how do you check regression?

Just conversion from...

Lukas: In the database products?

Nikolay: Yeah.

For jOOQ, for example, some code is written, and then do you just check the conversion to SQL, or do you check the results as well?

Lukas: No, you can't just check the SQL statement.

I mean, jOOQ's integration tests have a standard database, which has about 5 tables.

It's very simple.

So there's a bookstore with books
and authors, and it's not a

big database.

So it has 4 books and 2 authors,
4 languages, this kind of thing.

But then I just know there are
4 books and they have these titles

and every kind of query has to
return exactly the same result.

So there are so many queries being run in each database product — maybe about 10,000 queries, I think — and also update statements

and all kinds of statements that
make the assumption that if

you have this kind of input database,
then this must be the output.

Or also, if I create a table with jOOQ, then it has to have these

properties and throughout the API,
it has to be the same thing

for every database product.

Nikolay: Right.

I just wonder — earlier you mentioned that usually database systems maintain backwards compatibility, but sometimes it's broken.

For example, in Postgres 12, CTEs: before, they were materialized, like optimization fences.

So every step was materialized before 12, but in 12 the default behavior changed.

Right.
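For reference, since Postgres 12 the old fence behavior is still available explicitly (table and column invented):

```sql
-- Request the pre-12 behavior (CTE materialized as a fence):
WITH w AS MATERIALIZED (
    SELECT * FROM big_table WHERE key = 123
)
SELECT * FROM w;

-- Or force inlining:
WITH w AS NOT MATERIALIZED (
    SELECT * FROM big_table WHERE key = 123
)
SELECT * FROM w;
```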

So is it something that jOOQ should care about, or no?

Lukas: I don't think that's a logical regression.

I mean, it was a conscious decision, and it only affects the performance. There's no logical difference in the result.

So I mean, databases are allowed
to do these kinds of things.

I recall there was once a logical change in the UPDATE statement in Postgres, which was an incompatible change when you UPDATE a row.

So if you used the row syntax — you use parentheses and you UPDATE 2 columns at once — suddenly it required the ROW keyword, but only there.

This was very weird.

Nikolay: One more, recently — one more change related to subqueries.

I don't remember exactly; the demonstration involved generate_series, I remember.

And yeah, maybe it was up to 5 years ago, but there was some change which was logical as well.

But I understand — you're like the SQL standard: don't care about indexes, care only about results.

Yeah.

Right, I see, I see.

Lukas: No, I mean, jOOQ doesn't make any guarantees at all with respect to performance either.

So from a jOOQ perspective, jOOQ doesn't try to fix your SQL.

If you write a really crap query with, I mean, crap, what does

it even mean?

So if you write a bad query with a lot of nested subqueries,

et cetera, et cetera, in principle, there's nothing wrong with

it.

Right?

So there's, you will find hundreds of blog posts that, especially

from, from old times when optimizers were really bad still, where

people advised against using derived tables, then they advised

against using correlated subqueries, and then, but there's nothing

logically wrong with these things, right?

So you can nest as many levels as you want, as long as it's correct,

it's correct, and the optimizer should figure it out.

And who is jOOQ to judge you for your query style?

So jOOQ doesn't fix these things or even assert anything, and it would be completely wrong for jOOQ to do these kinds of things, because you want to write exactly this SQL, and jOOQ should render exactly this SQL.

You have your reasons, right?

So...

Nikolay: I understand.

And my question is, I'm curious, how do people usually approach

performance?

For example, if some query is written using jOOQ with a chain of SELECTs from blah blah, and then it turns out it's slow.

And then we find the fix, and we need to adjust something in the query, as if it were raw SQL.

Is it usually simple to adjust the jOOQ version of it?

Lukas: Yeah.

I mean, there's no difference.

I mean, you're probably going to use — I'm not sure anymore what the Postgres version is called — but you have your statistics views in your database, your performance schema, and you query those to see what went wrong, and then you make assumptions, you rewrite the query, and you test again.

Maybe even using a benchmark, of
course then the benchmark is

written in Java, but I don't see
a difference when you write

jOOQ query or native SQL query,
It's the same process.

Michael: I was watching a really good video by Kevin Davin, who did a talk on jOOQ, and one of the things that he really liked about the product was that it logs to the console the actual SQL statement that it ran.

Lukas: When you use debug-level logging.

Michael: Yeah, which is super helpful
for performance.

So if that was slow, you can then run it in your favorite editor, in psql, wherever, with EXPLAIN ANALYZE, et cetera.

BUFFERS.

Yeah, always BUFFERS, hot topic
this week.

But yeah, so that seems like a
really developer friendly way

of getting back to the SQL and
then diagnosing the performance

issue.

Lukas: This was a feature from
day 1.

My assumption was that when you
develop, you have debug mode

logging.

And when you go to production,
you switch that to info or even

warning.

So you're not going to have the
performance penalty of re-rendering

the query.

It's even formatted, so you can
actually see the query.

It's not just one line — when you execute the query, it's just one line of SQL string, to have it more compact.

But when you debug log it, the
version contains the bind variables.

So you can just copy paste the
whole query with the bind variables

in a formatted way and study it.

I thought that was what everyone
wanted.

And also, you get the first 5 records of the result set in the debug log, because that's probably also what you want, at least while you develop.

Michael: Yeah.

In a pretty little table.

Nikolay: It's nice.

Lukas: So you can immediately see
what's going on.

And even then — I mean, one of the reasons why I implemented it this way is that when you have a huge result set with a lot of columns, that bothers you while developing.

So you kind of start thinking,
do I really actually need all

these columns?

Because if you don't, then you
have a better debug log.

So I kind of think you have to
punish developers as early as

possible for their performance
problems.

I mean, jOOQ doesn't judge you.

It just gives you some tools.

Nikolay: Just giving you some convenience
looking at logs.

Okay.

Yeah, that's interesting.

So, reduce number of columns.

Good.

Michael: Changing tact a little
bit, I think you've got a really

interesting perspective on Postgres
from a broader landscape

of SQL dialects and databases in
general.

What kind of things do you see
that are relatively unique to

Postgres?

Or do you see people in the Postgres
world particularly using

or liking?

Lukas: I mean, the thing that is very unique to Postgres — well, there are 2 things, from my perspective.

The first one is that it's very developer-centric.

So you can see that with other
database products like Oracle,

DB2 or SQL Server, they're very
production centric, very operation

centric.

They used to base their entire
sales on these kinds of things.

So this means you have a lot of
production tools as well, which

are great actually.

So I'm still waiting for the like of Oracle Enterprise Manager, if you know it, for Postgres, where you can analyze production workloads and query what kind of workload you had 5 months ago — and it's still there.

I mean, you have to pay for these
kinds of extras, but it's still

there.

You can still analyze anomalies
in terms of performance, what

happened 5 months ago.

So these kinds of tools are a bit
lacking, but on the other hand,

because it's so developer focused,
you have this whole extension

system, which is unprecedented.

And I haven't seen it since either.

So anyone can extend anything in
any layer of the database product.

So you have these wonderful extensions
like PostGIS, for instance,
which is not really just an extension,
because it's also part
of the SQL standard, but it has
been implemented as an extension,
I guess, to show what is possible,
that such a huge additional
product can be made as an extension.

And you have indexes and data types
and all kinds of things that
other vendors have to either offer
out of the box or not have
at all.

You can't extend Oracle really.

So that's really something very,
very amazing.

But as I said, from the other perspective,
from an operations

perspective, this is something
that personally, I think Postgres

is still very much behind Oracle, for instance.

So I'm an Oracle guy.

I can't really comment on SQL Server or others, but the Oracle

database where I worked at before I made jOOQ, it could easily

handle very, very complex queries that produced 500 lines of

execution plan on billions of rows, and it would run in milliseconds.

And I don't think you can do that with Postgres.

I wouldn't risk it.

So we all remember the days when there was a hard limit on the

number of joins in Postgres, beyond which you didn't have any

smart optimizer anymore to reorder the joins into the right

order.

I forgot what the limit was, but I think you had to be very careful

if you had more than 10 joins or something like that, because

then it would just join from left to right syntactically, and

this is horrible.

Nikolay: join_collapse_limit, it should be 8 by default.

Lukas: Yeah, that's the join collapse limit.
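The settings under discussion can be sketched as follows; a minimal config fragment, with the current Postgres defaults shown in comments:

```sql
-- join_collapse_limit (default 8) caps how many FROM items the planner
-- will reorder when flattening explicit JOIN syntax; beyond it, joins
-- are largely planned in the order they are written.
SHOW join_collapse_limit;    -- 8
SHOW from_collapse_limit;    -- 8

-- For a query with many joins, raising it per session lets the planner
-- search more join orders, at the cost of longer planning time:
SET join_collapse_limit = 12;
```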

So, once you know these kinds of things, you obviously make the right

choices, and you probably won't run into big issues, but it's

just a very unique focus that I often missed in the past

from Oracle, for instance, which completely ignored developer experience

for decades.

It was like, you don't even have a Boolean type, right?

So this is hard for me to understand.

It's such a simple thing.

You think it's such a simple thing.

So very, very useful, but Oracle didn't implement it until recently.

So this is a focus that Postgres has and few of the others had;

even MySQL wasn't very developer-friendly from that perspective.

So this is very unique.

If developers were the only ones to choose the database product,

it would always be Postgres, because, I mean, you can just implement

everything with Postgres.

You can write stored procedures, you can have your own indexes,

you can extend it yourself.

Probably you don't have to because someone already wrote an extension

for whatever you need.

Oh, the second thing is, it's amazing how standards compliant

it is.

So a lot of times, Postgres really waits until something is standardized

before they implement it instead of innovating.

Nikolay: Right now we are waiting on UUID version 7 because of

standards.

It didn't get into version 17 of Postgres because the RFC was

not finalized.

Every library already started to support it, but Postgres decided

to wait for the RFC, which is a very conservative decision, right?

Lukas: Most of the times.

Yes.

So I think that's a good thing
in the end, because the standard,

in my opinion, in most parts is
well-written.

It's a very complicated document,
of course, but I think in terms

of syntax, it's kind of everything
feels like SQL.

So some database products think
they have to invent some syntax,
and it just doesn't look like SQL
anymore.

It's just very weird, funky thing
that doesn't fit the rest of

the language.

And the standard is very, very
consistent, I think, in most parts.

And to wait for the standard to
appear is a good thing.

So sometimes there's an exception
to the rule; as I said, ON CONFLICT
was one of these exceptions.

JSON as well, in which case I think
it was something everyone
needed and the standard was severely
lacking, so Postgres just did
their own thing.

So I kind of think a lot of people
regret the heterogeneity
of the JSON API.

So you have to always consult the
docs to see if something uses

a JSON path, which is a standard
thing in the JSON world, or

if it's just an array of path elements,
or if this function returns

a table or a set of something,
or whatever data structure.

So I think now with the JSON standard,
this is going to be much

cleaner once it's completely rolled
out.

So I'm not up to date on where
Postgres stands.

I'm still not supporting that syntax,
but from what I've seen,
it's going in the right direction.

Nikolay: Yeah, some parts support
it, some parts, I think, are
still pending.

Yeah.

Michael: Yeah, because I think
a lot went in in 17.

Nikolay: Right.

Lukas: Yeah.

And speaking of this, I'm really
curious: I saw a commit recently
regarding temporal primary keys.

Is that correct?

And it is absolutely not in the standard,
so this is very curious to me.

Are you aware of this commit?

Michael: I'm not.

Lukas: It's still on a branch.

I think it's not going to be merged
yet, but, I mean, Postgres
has these range types.

Right.

Nikolay: Right.

Lukas: And they're useful obviously,
but the SQL standard since
2011 has defined how you should build
temporal databases, even
bitemporal.

So you have system versioning and then logical versioning.

And this is something that is still lacking in Postgres.

And I think a lot of people would like to see that.

Nikolay: Yeah.

In Postgres, there is an opinion that it's in the past, because

it was

Lukas: in the past,

Nikolay: it was removed.

Michael: So that's a nice temporal joke.

Lukas: An extension I think?

Michael: It was in Postgres, the original Berkeley version.

Nikolay: This is where MVCC was inherited from, these versions

of tuples.

So time travel basically was supported very, very

long ago, but then it was removed.

So, an interesting idea.

Michael: I think for

performance reasons.

Nikolay: Yeah, So I think so.

But yeah, this is a super interesting, like specific topic.

Sometimes we do need to implement something with triggers and

so on, which is very...

Lukas: Yeah.

Yeah.

It's very hard to do manually.

I think the standard helps a lot with... I mean, you can

write an update statement and it's automatically transformed

into a delete and an insert.

So if you have to split the record logically, this is really

a pain to do manually, right?

So it's

Nikolay: It's not a super pain, because in Postgres, you can...

you know, I remember your very old post, "everything is a

table".

Or maybe it was a post of some 10 tips or something, and one of

the tips was: consider everything as a table.

In Postgres, you can collapse all columns into a single record

type and insert it, for example in JSON form.

Right now, I would insert it and I would have a shadow table.

I implemented this several times.

You have a shadow table which serves as a single log for all tables,

and a trigger to track changes.

Right.

And then you just track the timestamp and the action, like whether it was a delete

or an update, and who did it, and then the whole row you pack into a single

column, breaking first normal form.
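The shadow-table idea Nikolay describes fits in a few lines. A minimal, self-contained sketch using SQLite via Python's standard sqlite3, with the old row packed into a single text column; in Postgres you would more likely write a trigger function using to_jsonb(OLD), and the table and column names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts(id INTEGER PRIMARY KEY, name TEXT, balance INTEGER);

-- One shadow table serving as the change log: timestamp, action,
-- and the whole old row packed into a single column
-- (breaking first normal form).
CREATE TABLE audit_log(ts TEXT, action TEXT, old_row TEXT);

CREATE TRIGGER accounts_audit AFTER UPDATE ON accounts
BEGIN
  INSERT INTO audit_log
  VALUES (datetime('now'), 'UPDATE',
          OLD.id || ',' || OLD.name || ',' || OLD.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 'alice', 100)")
conn.execute("UPDATE accounts SET balance = 50 WHERE id = 1")

# The trigger captured the pre-update row:
rows = conn.execute("SELECT action, old_row FROM audit_log").fetchall()
print(rows)  # [('UPDATE', '1,alice,100')]
```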

Lukas: But then you denormalize in a very bad way, and secondly,

this sounds like a lot of work.

Nikolay: It's not a lot of work, it's like 3, 5 lines of code,

that's it.

But it's not possible in the jOOQ
free version, because triggers are
not supported, right?

Triggers are supported only in
the enterprise version.

Lukas: I mean, if you want to write
the trigger with jOOQ, yes,
then you have to use the commercial
editions, but hardly anyone
writes triggers with jOOQ.

I mean, this is only useful when
you support 5 database products
and you want to write the trigger
only once, so you write it
with jOOQ.

But I mean, jOOQ doesn't care if
there's a trigger on the database
table, right?

So if there's one, it's transparent
to jOOQ.

Yeah, okay.

You can't really say jOOQ doesn't
support them.

It's just, you can't create them
with jOOQ.

Nikolay: Great answer.

Free version, you mean, as opposed
to the enterprise version?

Lukas: Free version, yeah.

Nikolay: Yeah.

I'm very curious.

I'm not against this.

It's good.

I mean, it's great that you build
both, like, an open source product
on the Apache 2.0 license and a business.

I was just curious, like, have
you ever had thoughts about going

full closed source or fully open
source?

Lukas: Well, it was fully open source
in the beginning.

So when I started this, I had no
idea this could be a business.

I mean, I just had to try it and
then put it out there for people

to see and give me feedback.

Is this a good idea?

Should I do it like that?

And there was a very active early
community of people who worked
with the all-open-source jOOQ.

And then I started seeing more
and more banks and insurance
companies using it, but never contributing,
as you know the story goes.

And I thought it was my fault.

Nikolay: I get it.

Pong Jones, let's, let's generalize
Oracle users.

Lukas: Well, that was the choice
I made, but it doesn't matter.

You know, it doesn't matter what
they're using in the end.

I just never got any...

I mean, I got contributions, but
I didn't care about code.

I cared about money much more than
code.

So, to make it a sustainable
business, in the end, I thought...

There are models where you pay
for support, but in my opinion,

if you want to make money with
support, then you have to earn

your money by working for it.

So I earn money when I work, and
if I have a licensing model,
then I earn money when customers
work, right?

So I thought this was going to
scale much better.

Michael: I like that a lot.

I also think if you charge for
support, you're incentivized to

hurt the experience.

Lukas: I mean, you could say that,
but that's really how I think.

Michael: But I think it's not surprising
that, as somebody who focuses
so much on developer experience
and wants to keep pushing that
forward and making it good, you
don't want to be incentivized
to make it bad like that.

It makes perfect sense to me.

Lukas: Yeah.

I mean, that's my thought as well.

I don't want to name companies
or names, but I've heard some
open source products are very,
very unusable and you really need
the support.

Not all of them, of course.

I mean, Postgres and Linux are
examples where this isn't the case,
but there are some products that
are super complicated and they're
just pushed out there.

You're free to use them, but you
won't figure them out.

So you have to call someone even
to install it, and that's what
I'm thinking.

So in those cases, you need a sales
representative that pushes

the product down someone's throat
and they make a lot of money

with the support contract.

And I think it's less honest than
licensing, where people think:
okay, is this a good product?

Should I really spend money on it?

Because I have to pay up front,
right from day one, to use this
product.

And they're going to test it really,
really well, and there's
a lot of competition.

I mean, people could use Hibernate
or whatever, but people choose
jOOQ despite the cost, which in the
end is very cheap for a company.

So I think cost is never the issue,
but they're still going
to evaluate it, because they still
have to go through purchasing
and make a business case for the purchase.

So I really want to have a very,
very good experience.

And one part of this is that I also always
answered all the questions
on Stack Overflow within a short
period of time, because if I
answer it, it's going to be authoritative,
and I'm going to answer with
all the details and all the caveats
and all the edge cases.

So anyone else who has the question
will find this question
and the answer.

And they're not going to ask me
again.

Right.

So Google is my friend, and now
ChatGPT is my friend as well,
because ChatGPT can answer jOOQ
questions very well, because
of all the answers I've given on
Stack Overflow.

Nikolay: Yeah, but open knowledge,
right?

Lukas: Yeah.

I don't want to earn money with
knowledge.

I mean, I could sell books and
all these kind of things, but

it's just the wrong kind of effort,
I think.

Nikolay: Distraction, right?

Yeah.

From the core.

I wonder, for enterprise, like,
paid versions, is the source
available, or is it closed?

Lukas: Yeah, you get the source
code.

Nikolay: Well, that's great.

Lukas: And the right to modify,
so if you have a bug...

Nikolay: You can fix it yourself
and propose a fix.

Lukas: Or maybe it's not even a
bug, but you think it's a bug
and we disagree; you can fix it.

And some companies do that.

Nikolay: So this is what we like
in commercial software, like full
transparency, and you can see how
it works inside.

Lukas: I mean, most people don't
actually fix stuff, but maybe they
want to debug stuff to better understand
it.

And if you've ever worked with
a commercial JDBC driver, like
I do with the Oracle driver, it's
always like: oh my God, there's
a NullPointerException in the
middle of the driver and I have
no idea what I'm doing wrong from
an API perspective.

So I have to reverse engineer it.

Not with the code, but just
trial and error with API calls,
until I find out why the exception,
why the bug inside of the
driver, happens.

And with source code, it would
be much easier, right?

So bugs happen, but if you know
exactly why the bug happens,

then you have a much easier life
than if you have to guess.

Nikolay: Yeah.

Last question from me.

What do you think about nulls in
general?

Lukas: Nulls?

Nikolay: Nulls, yeah.

Lukas: Oh, they're a blessing and a curse.

I mean, I understand why they have
been introduced, because it

was easy to do, right?

And I mean, there are two takes here.

So nulls in most languages are
mostly a pain because they lead

to exceptions.

And null in SQL is mostly a pain
because it doesn't lead to exceptions.

It's just a value like many others,
but it's a special value

and it just propagates.

It's these kinds of philosophies,
and both are right and wrong.

But what else would you want to
do, right?

I mean, you kind of need an absent
value, right?

I'm not sure if the idea of the
unknown value is so useful, but

an absent value is very useful.

I mean, sometimes you just can't
normalize everything to
sixth normal form to avoid NULLs, right?

So it's very pragmatic.

Nikolay: It's very pragmatic indeed.

Yeah.

But in SQL, we have so many places
where people make mistakes

all the time.

Lukas: Yeah, SQL is weird about
it.

I mean, this unknown value... I used
to do a SQL training, and
about 30 minutes of it was
just NULLs and how it's different
here from there.

For instance, two NULLs are not distinct,
right?

Nikolay: Yes.

Lukas: But they're not the same
either.

So there are different terms in
the SQL standard: sameness and distinctness.

And then even operators are inconsistent.

So personally, I think Oracle did
the pragmatic thing: when you
concatenate a NULL value to a string,
you probably don't want
the result to be NULL, I think.

You would just want to ignore the
concatenation.

This is pragmatic, but it's very
illogical and leads to a ton
of side effects, which probably
means that it was a bad idea
to start with.
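These quirks are easy to demonstrate. A small sketch using Python's standard sqlite3: SQLite follows the standard here, so equality with NULL yields UNKNOWN and concatenation propagates NULL (unlike the Oracle behavior Lukas mentions), and SQLite spells the standard's IS NOT DISTINCT FROM as a plain IS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL is UNKNOWN, which the driver surfaces as None:
print(conn.execute("SELECT NULL = NULL").fetchone())   # (None,)

# ...yet DISTINCT treats two NULLs as "not distinct" and collapses
# them into one row:
print(conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT column1 "
    "FROM (VALUES (NULL), (NULL)))").fetchone())       # (1,)

# SQLite's IS plays the role of the standard's IS NOT DISTINCT FROM:
print(conn.execute("SELECT NULL IS NULL, 1 IS NULL").fetchone())  # (1, 0)

# Standard concatenation propagates NULL; Oracle instead treats
# NULL as an empty string here:
print(conn.execute("SELECT 'abc' || NULL").fetchone())  # (None,)
```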

Nikolay: And there are NULLs in
other languages, I mean, in regular
application languages.

And I remember how I struggled
trying to explain this three-valued
logic, and observing what
people do in regular
application code with NULLs.

They do things I would never do
in SQL, right?

So different philosophies, right?

Lukas: Yeah, there are different
philosophies.

But I'm not sure if the SQL philosophy
is really useful.

I mean, I'm trying to explain how
it works by explaining
the NOT IN predicate.

When you have a NULL in NOT IN,
everything falls apart.

I mean, it's very logical and consistent.

You can explain it with three-valued
logic, but it's never useful,
right?

This kind of invalidates the whole predicate.

Nikolay: Yeah, it's a bomb.

Lukas: It's a bomb?

And I mean, usually people don't
put a NULL value in a NOT IN
list, but what about a subquery?

So NOT IN (SELECT something), and
then you have a NULL value in
there, and you don't get results.

And it's not obvious from just
reading the code or the query.

Why?

It's horrible.

Horrible.
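The NOT IN trap Lukas describes can be reproduced in a few lines. A sketch with Python's standard sqlite3, where the table and column names are invented for illustration (Postgres behaves the same way), including the usual NOT EXISTS rewrite that sidesteps the problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders(customer_id INTEGER);
CREATE TABLE blocked(customer_id INTEGER);
INSERT INTO orders VALUES (1), (2);
INSERT INTO blocked VALUES (2), (NULL);
""")

# NOT IN against a set containing NULL: 1 <> NULL is UNKNOWN, so the
# predicate is never TRUE and zero rows come back.
not_in = conn.execute(
    "SELECT customer_id FROM orders WHERE customer_id NOT IN "
    "(SELECT customer_id FROM blocked)").fetchall()
print(not_in)  # []

# NOT EXISTS uses plain two-valued existence and returns the
# expected row.
not_exists = conn.execute(
    "SELECT customer_id FROM orders o WHERE NOT EXISTS "
    "(SELECT 1 FROM blocked b "
    " WHERE b.customer_id = o.customer_id)").fetchall()
print(not_exists)  # [(1,)]
```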

Nikolay: We had a whole episode about NULLs with Michael, long ago.

Yeah, let's stop, because it's definitely not going to go away.

And I'm quite sure I will make more mistakes with NULLs in

my life.

Lukas: But maybe the one thing I really think SQL should have done

differently, and there are some database products that actually

do it differently, is that the default for a column should be NOT NULL,

right?

So when you create a table and you don't write anything at

all, then it should be NOT NULL.

The fact that the default is nullable is wrong.

Nikolay: This forces you to define default.

Or, well, no.

Michael: The word "default" is confusing here.

You mean, if I create id int8, I shouldn't have to say NOT

NULL.

Lukas: Yes, you should have to say "I want this to be nullable",

because then it's a conscious decision and you know what you're

doing.

But the fact that, maybe by accident, you have something not null

or something nullable...

Yeah.

It's just wrong.

Nikolay: And this is in the standard, right?

Or... do you remember?

Lukas: I don't remember.

I think maybe the standard says it's implementation specific.

Nikolay: If it's in the standard...

Michael: We definitely can't change it now though.

Lukas: No, you can't.

But I remember Sybase is one of the database products that does it

differently, and ClickHouse as well.

Interesting.

And another one I forgot.

Nikolay: So in ClickHouse every column is by default NOT NULL?

Lukas: Yes.

Nikolay: That's cool.

Okay.

Lukas: Although they messed it up as well.

So you can still insert a NULL into a NOT NULL column.

You just get a 0.

Nikolay: What?

Lukas: It makes it worse.

Nikolay: Okay.

Okay.

Lukas: That's terrible.

Amazing.

I think this is a performance idea.

If you just skip the check, you
don't check for NULLs, then
you're faster, but whatever.

Michael: Lukas, I'm super conscious
of your time.

Are there any last things you wanted
to say?

Lukas: I hope there are many many
more exciting Postgres releases

in the near future.

I'm looking forward to all the
good stuff that's coming.

Especially if it's ever coming:
temporal tables from SQL:2011.

That would be very exciting.

Michael: Nice, okay cool.

Nikolay: Thank you for coming, I enjoyed it.

Lukas: Thank you for having me.

Nikolay: Thank you.

Michael: Yeah, very much so, and catch you
next week Nikolay.

Nikolay: Bye bye, have a great week.

Some kind things our listeners have said