Michael: Hello and welcome to Postgres.FM, a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard, and as usual, I'm joined

by Nikolay, founder of Postgres.AI.

Hey, Nikolay.

Nikolay: Hi, Michael.

Michael: And we have a special guest today, Franck Pachot, who

is a developer advocate now at MongoDB, formerly at YugabyteDB,

which is a distributed PostgreSQL database, also an AWS Data

Hero, and Oracle Certified Master.

So, welcome, Franck.

Franck: Hi.

Thanks.

Thanks for having me there.

Nikolay: And former Postgres blogger.

Franck: Yeah.

Nikolay: No?

Or yes?

Franck: Oh, yeah, yeah.

I will continue to blog about all databases, it's just that it

depends on the time I have.

Nikolay: Sounds good.

So I saw you are going to give a talk at some Postgres conference

in India, right?

PGConf India, I don't remember the name.

So still planning to do it, right?

Franck: Yeah.

And also Germany.

I just got the acceptance.

Nikolay: I'm very curious.

During daytime, you work using JSONs and these weird queries,

right, chains of something.

And then at weekend or something you present SQL talks.

How is it going to be played in your mind?

I'm very curious.

Franck: It's all about databases.

I mean, it's all the same.

Nikolay: All the same.

Franck: Yeah, you can do data modeling, document data modeling

on Postgres.

You can do it on Oracle.

You can do it on MongoDB.

You can normalize your data on SQL databases, on NoSQL databases.

The concepts are all the same.

Of course, there are little differences, like how NULLs are handled,

for example, or how you join or you don't join, but yeah.

Nikolay: NULLs, let's postpone.

It's a special topic.

It's not for the start.

Okay.

I remember a series of blog posts from Michael Stonebraker about

criticizing document databases for lack of normalization and

so on.

So you are saying now that it's totally possible to apply normalization

in document database.

Is this what you're trying to say?

Or maybe I'm getting it wrong.

Franck: I've also changed my mind, probably for 2 reasons.

First, the applications have changed.

I think the normalized model was really good for those monolithic

databases where you had the whole enterprise information

system in 1 database running all use cases.

And then you need a normalized way to structure the data that

is shared by the whole company and all kinds of users.

Today, it's a bit different.

You have multiple services, multiple microservices.

They might have different databases.

And then the concern of normalization may be different.

For example, if you consume data only to read it and not update

it, you can denormalize a bit more.

So that's 1 reason and I think the main reason is also the applications

have changed.

Today in application programming languages you use documents

in nested structures. Objects and object graphs look more

like documents so it's easier to move it to applications.

Nikolay: I don't get it because we had documents for forever.

For example, Codd designed relational model originally dealing

with banking systems, right?

In 60s, 70s, and it was not convenient to have nesting at that

point.

Before the relational model, we know there were, what's the name, like

network, and I forgot the names, but basically closer to...

Franck: Hierarchical models and network models.

Nikolay: Yeah, yeah, yeah, exactly.

And the idea was it's really inconvenient when we keep a document

as a whole and we need to split it into pieces and basically

divide and conquer, right?

We split into pieces and that's how we get flexibility and start

working.

And we had documents at that time as well, like invoices or transactions

like between financial institutions and so on.

So I don't see the big change, just amount of data and so on,

right?

And I don't fully understand why the idea of microservices or

something you, as I understand, you are bringing, like when we

have many, many databases, many services.

Why is it changing this?

Because in my head, it's vice versa.

If we have many services, we do need to structure and split into

more atomic pieces of our data, right?

And the article I mentioned, it's called "Schema Later Considered

Harmful."

After that post, actually, this is why I also named my subtransactions

blog post "Considered Harmful."

And some folks on Hacker News mentioned that there

is an article saying that "considered harmful"

titles are themselves considered harmful.

So it's basically like not a good way to name articles, but the

blog post is quite good.

Like, schema design, normalization still make sense.

If you don't do it, you deal with bad consequences later.

So please let me understand.

Franck: Yeah, but it depends on your use case.

And also, I've been working on relational databases

where you normalize. Basically, when I learned databases

at university, it was all about normalization.

And then when you start to work, you hear people talking about

denormalizing everything.

And of course, you just need to think about the access patterns.

Nikolay: Yeah, let me just add this.

Sorry for interrupting, but let me just add, I totally agree.

If we over-normalize, then we deal with the very simple fact that

you cannot create 1 index on 2 tables.

You want to, because, for example, when filtering on 1 table and filtering

on another table, you want a single index scan.

Definitely, this is what we do also.

My team and I, we do, during consulting practice, we say, okay,

here we do need to denormalize.

But my point is, if you take Mongo and other document databases,

they just provoke you to avoid normalization at all.

In relational data systems, we can...

Is it OK?

Am I wrong?

Franck: Yeah, for me, you are wrong.

And I think that's also 1 reason MongoDB was interested to have

a developer advocate coming from SQL databases, is that users

tend to think that they have to denormalize everything and to

put everything in 1 document, which is wrong.

The idea in MongoDB is to put together what you insert together

or what you query together, but in different documents if you

query differently.

Just to take an example, in an order entry system, you don't want

to put together the customers and the orders because you don't

want 1 document per customer where you just add orders that can

be a lot every year.

But the orders themselves, the orders and the order items which

we usually put in 2 tables in SQL databases just because they

have different cardinalities.

That's something you can put in a single document, because you

insert an order with all the items.

You have nobody who will just update
1 item of the order and

you query them together.

Of course, it depends on the system.

If you're in a system that analyzes
the order lines for marketing

purposes, by product, and you
don't care about the customer

or the order, then maybe the modeling
is different.

And this is where different use
cases are.

But it's not about putting everything
in 1 document.

And that's also why it's good to
do some design reviews.

Because it's easy for a developer
to start and put everything

in 1 document, just moving what
they have in Java to the database,

but it still needs design, and you still
need to think about what you

embed, like denormalize, or what
you reference, like you would

reference with foreign keys in
a SQL database.
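
The embed-or-reference choice described above can be sketched with plain Python dictionaries standing in for documents. This is only an illustration under stated assumptions: the field names (`customer_id`, `items`, `qty`, `unit_price`) and the `order_total` helper are hypothetical and do not come from the conversation or from any MongoDB API.

```python
# Hypothetical document shapes; plain dicts stand in for stored documents.

# Customers live in their own collection: they are queried on their own,
# and a customer's list of orders could keep growing every year.
customer = {"_id": "c1", "name": "Alice"}

# An order embeds its items (inserted and read together) and only
# references the customer by id, much like a foreign key would.
order = {
    "_id": "o1",
    "customer_id": customer["_id"],
    "items": [
        {"product": "keyboard", "qty": 1, "unit_price": 50},
        {"product": "cable", "qty": 2, "unit_price": 5},
    ],
}


def order_total(order_doc):
    """Sum the embedded items; no second lookup is needed."""
    return sum(item["qty"] * item["unit_price"] for item in order_doc["items"])


print(order_total(order))
```

A system that mostly analyzes order lines by product might instead keep each line as its own document, which is the "it depends on the access pattern" point made in the conversation.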

Nikolay: OK, I hear you.

I think I understand you.

But still, you say users have this
tendency to think.

For example, user Michael Stonebraker
says that he noticed that

maybe it's possible to normalize,
of course, but in relational

databases there is a big tendency
to normalize first and then

denormalize when needed.

In document store databases, there
is the opposite tendency.

Avoid normalization first and then
normalize when we have pain.

The whole article called "Schema
Later Considered Harmful."

I think, as I understand this article,
it's about that the relational

approach, direction of movement
is more beneficial in general

case than opposite.

What do you think?

Franck: Yeah, but remember that
relational databases were made

at a time where we were designing
the data before looking at

the use cases.

The normalization and the data
model doesn't care about the use

cases.

You just model the data.

You have orders, multiple order
items, an order belongs to a

customer.

You do a static model of your data,
and then you bring the application

use cases, and you can optimize
them with indexes, but you don't

change the data model for the use
cases.

But this is not really how applications
are developed today.

Today, applications come with a
main use case and want fast access

for this use case.

For another use case, they just
check if they can do it on the

same database, or maybe do some
event streaming, put that in

another database and do it elsewhere.

That really has changed.

Today, even applications that run
on SQL databases, I see people

starting a data model, knowing
the access patterns.

And then maybe you can denormalize.

For example, it's okay to denormalize
something that is not updated.

The big danger of denormalizing something
that may be updated is

that you have to update in multiple
places, which is a risk of

inconsistency if you forget 1,
and which is also a performance

issue, especially when you distribute,
then you have distributed

transactions at multiple places.

But data that you do not update,
and there is a lot of data that

we don't update, we just add a
new version of it.

For example, a customer is creating
a new order, you will not

update the order.

If we add a new item, that will
be a new order, but the existing

order has been validated.

You don't update this data later.

Usually you have a timestamp, and
even if you change something,

then you just add the new version.

So the applications have changed.

And I'm not saying that 1 is better
than the other.

But when we listen to the developers,
we see that they don't

want to build this ERD diagram.

That was never true.

So that's also something.

Nikolay: Nobody does that anymore,
building ERD diagrams.

Or only our AI system does, but
it's just a side function for

it.

But I don't understand why we cannot
do it on relational databases

and still have all the good stuff.

Because we have JSON, let's just
put it there and so on.

Franck: And it's very good to mix
both.

I've seen a lot of applications
on Oracle, on Postgres, on Yugabyte,

where it's a mix where you have
tables with columns because they

are updated, because you want indexes
on them, and you have a bunch

of metadata information that you
put in a JSON.

And that's also perfectly valid.
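
As a rough sketch of that mixed design, the snippet below uses Python's built-in `sqlite3` module so it runs without a server; in Postgres the `metadata` column would more likely be `jsonb`. The `products` table, its columns, and the index name are assumptions made up for the illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        price    INTEGER NOT NULL,  -- updated and filtered on: a real column
        metadata TEXT NOT NULL      -- rarely queried details, stored as JSON text
    )
    """
)
# The frequently filtered column gets an ordinary index.
conn.execute("CREATE INDEX products_price_idx ON products (price)")

conn.execute(
    "INSERT INTO products (id, name, price, metadata) VALUES (?, ?, ?, ?)",
    (1, "keyboard", 50, json.dumps({"color": "black", "layout": "ISO"})),
)

name, price, raw_metadata = conn.execute(
    "SELECT name, price, metadata FROM products WHERE price >= ?", (40,)
).fetchone()
metadata = json.loads(raw_metadata)  # the JSON part is decoded in the application
print(name, price, metadata["color"])
```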

Nikolay: And so what does Mongo
bring here, if we have it already?

Franck: I think the API is very
different.

Nikolay: Of course.

Franck: Yeah.

With MongoDB, you can really...

You have your object graph in the
application.

In JavaScript, it's even easier,
but in Java, in whatever, in

Python, and you just exchange
those documents with the database,

and they are stored as documents.

The big problem with SQL databases
is also something that has changed

when applications have changed. At
the time when everything was

done in the database, with stored
procedures or pre-compiled procedures

or whatever, then that was okay.

But with object-oriented programming,
you had this mismatch and

you need an object-relational mapping
to map from 1 to the other

if you don't want to do a bunch
of queries in text strings and

through JDBC.

So, what MongoDB brings at that
point is an API that really fits

with the programming language and
then it stores it as documents

rather than mapping that to relational
tables.

Nikolay: Yes, this is what EdgeDB
and EdgeQL are trying to solve.

They try to reinvent SQL to have
this, what you describe.

Yeah, but you mentioned OOP and
ER.

I think this is in the past already,
both.

No, I'm joking, I'm joking.

So for me it's like...

Franck: Then what do you use today if you...

I mean, applications are built
with objects?

Nikolay: Well, I personally am
a big fan of what guys

like Hasura, Supabase, and others
do with a thin layer providing APIs

right away without the need to
write this middleware.

It's great, this serves better
than object-relational mapping.

But people do object-relational
mapping, but at the same time,

I doubt a lot of guys who create
projects, they do actual OOP

with patterns and so on.

It's kind of like somehow not cool
anymore.

It's my perception.

I'm far from actual application
programming lately.

Franck: But it's also this old
debate where do you put your business

logic?

Ideally in a SQL database, you
put it in the database because

data is processed there, but then
you are constrained to specific

languages.

Nikolay: Well, right.

But if you put it to application,
you also have dependency on

this language you chose.

It's the same.

To me, the question about where
to put logic became much easier

to understand since like 10 years
ago when Angular and React,

they obtained, gained popularity
and a lot of logic and actually

Web 2.0, how many years ago already?

Like 20 years ago, right?

All these shifted a lot of client-oriented
logic to clients,

to front-end, right?

And this gave space to have logic
closer to data, like constraints

and what we usually do with triggers,
some dependencies, propagation

of changes or something.

It gives opportunity to keep it
in database where it should be

because otherwise, if you don't
do it closer to database, at

some point when company grows,
project grows, you add some other

tools or application layers or
something called, and you need

to re-implement the same logic
in different places.

And there is no strong guarantee
that it will be well maintained.

Franck: Yeah, but the problem is just, I totally
agree, and there are very

successful database-centric applications.

But what developers want, they
want to use Java, not PL/pgSQL

and not PL/SQL.

Just try to hire a
SQL developer or a PL/pgSQL developer,

that will be more difficult than
hiring a team of Java developers.

Nikolay: Right.

Right.

Michael wanted to ask something.

Michael: I don't know if this is
a change of topic, but I think

it's on the same path, which is
around developer experience.

And I know it's a subjective term,
but I do think when, at least

when Mongo went into the market,
but I think NoSQL databases

in general, they promised a few
things.

1 was a really good getting started
experience, a very quick,

easy, you don't have to think much
type, no schema to worry about,

and just get started.

And that's good for some things
and not so good in other ways.

But it also promised a couple of
other things.

And I think we can learn a lot
from these things in terms of

why was it popular?

Like, why did Mongo take off?

Why was NoSQL so popular for so
long?

It also promised kind of infinite,
or at least horizontal scalability.

And that's something we've historically
struggled with.

I know you worked on distributed
SQL, but it's something we've

historically struggled with in
the SQL world.

And then, yeah, I think that combination
of things seemed really

interesting to me.

And I wondered if you had opinions
on what is it about that developer

experience that really resonated
with people?

Franck: For me, that's really developer
experience where MongoDB

is really, was really successful.

The scalability, I don't really
know because I didn't use MongoDB

at that time, and then I've seen
scalability in SQL databases.

The scalability comes from the
data model where you can have

an easy sharding key.

Yeah.

As soon as you have an easy sharding
key, you can distribute

that on mostly all databases today.

On Postgres, you have multiple
options like Citus, like Aurora

Limitless, where if you have a
sharding key, you can distribute.

I don't think it's really the point
today.

The point is really developer
experience.

As you say, it's easy to start
and it's easy to integrate to

your programming language also.

Not having something else to learn,
a different language, but

also a different behavior, thinking
about what you need to lock,

thinking about foreign keys, thinking
about performance when

you read from multiple tables.

But being easy to start
is also a problem.

And basically I'm working in the
DevRel team where most of the

job is helping users, developers
to do some proper data modeling

design.

Because it's easy to start, which
is good when you start a proof

of concept, but at some point like
in any database, you need

to do some design.

And the more easy it is to start,
the more difficult it is to

realize that, okay, we are not
in a proof of concept anymore,

we'll put that in production.

It's an application that will evolve
in the coming years.

And then we need to look at the
design.

And this is one of the major activities
in the DevRel team.

It's not like being developer advocate
for Yugabyte was really

about awareness because it's a
new database, so you just need

to let people know it.

MongoDB, people know it.

You just need to make them successful
with maybe a bit more complex

use cases and do some data modeling.

Nikolay: So I have a question about
your personal experience

and this decision you made, obviously,
recently.

It feels like you switched teams,
like in soccer or football.

Right?

So my question was, any, like,
transfer cost?

Franck: Ah, that's a very good
question.

So let me explain how it was.

I was really happy at Yugabyte, about
the team, about the colleagues,

about the product.

I was really not looking for another
job.

And when other companies contacted
me, I was like, oh, sorry,

I'm happy where I am.

And when MongoDB contacted me,
it was more out of curiosity, like,

why is a NoSQL database interested
in my experience?

And this is why I started discussions
out of curiosity.

And then this is where I realized
that it was really an interesting

approach that's helping users on
document databases with the

knowledge of SQL databases, being
able to discuss with those

who use Postgres, who use MongoDB,
who have a new use case, they

want to know if they can do it
on both or one is better than the

other.

That was interesting.

And I was like, OK, I should think
about that.

And then, of course, there is an
offer that was interesting enough

to say, OK, why wait, just go
there?

But I could have the same offer
from Yugabyte.

So it's not really what makes the
decision.

Maybe it just pushes you to say,
why not now rather than waiting

6 months or 1 year?

But no, the point was really learning
something new.

I really like learning something
new.

And all the content I create is
about me learning.

Nikolay: Yeah, well, yeah.

My first reaction was, of course,
I became very upset.

And I started to think, is it like
sudden change of your views

or maybe you slowly became more
unsatisfied with state of relational

and SQL world and so on.

So I asked our AI assistant, and
as you know, we have all your

blog posts.

So I asked to research among blog
posts where you talked about

NoSQL and SQL, and to my surprise,
it said you had such posts

in the past and it's not a sudden
change of views.

So the result from AI was it's
not a sudden change of views.

But when I asked it to dig
deeper, it was obvious that

maybe the key reason was NULLs
in your past blog posts.

The key criticism point was how
NULLs behave.

And I was going to raise this.

I did it during the weekend and
I was going to discuss this but

as I already tweeted or x'd I don't
know how to say it.

Yesterday in the morning, my team
made a mistake, and I actually looked

at that merge request myself.
It was a non-NULL-safe operation,

leading to a nasty bug, which

led to multiple companies receiving
emails from us, actually

a few emails from us, with wrong
data.

And it was because of just a comparison
not accounting for three-valued

logic.

And I was bitten by this so many
times.

I had a startup where I was stuck,
my own startup, I was stuck

7 months without growth.

Although I knew there should be
growth, but there is no growth

and then I almost gave up. And then
I dug deeper into the code

and found this bug again, a non-NULL-safe
comparison. We fixed it,

and in a few weeks we had 80,000
registrations per day.

I almost gave up on that startup.

This was an all-or-nothing kind
of thing. You know, it's just

IS DISTINCT FROM or IS NOT DISTINCT FROM
or COALESCE, you can fix it in

multiple ways.

But if you overlook it, it's just a
single-line problem which

can cost you a lot of money and
time.

And like maybe whole startup can
depend on it as in my story.

So I'm definitely with you in the
criticism of NULL and NOT IN

with NULL values, right?
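
The kind of bug described here can be reproduced in a few lines. The sketch below uses Python's built-in `sqlite3` module so that it runs anywhere; SQLite spells the NULL-safe comparison `IS NOT`, where PostgreSQL would use `IS DISTINCT FROM`. The `users` table and the `ids` helper are invented for the illustration and are not the code from the incident.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, old_email TEXT, new_email TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, old_email, new_email) VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "a@example.com"),   # unchanged
        (2, "b@example.com", "b2@example.com"),  # changed
        (3, None, "c@example.com"),              # changed from NULL
    ],
)


def ids(where_clause):
    """Return the ids of the rows matching the given WHERE clause."""
    sql = f"SELECT id FROM users WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# NULL <> 'c@example.com' evaluates to NULL, so row 3 is silently dropped.
naive = ids("old_email <> new_email")
# NULL-safe comparison (IS DISTINCT FROM in PostgreSQL).
null_safe = ids("old_email IS NOT new_email")
# COALESCE works too, provided '' can never be a real value.
coalesced = ids("COALESCE(old_email, '') <> COALESCE(new_email, '')")
# NOT IN with a NULL in the list never evaluates to true.
not_in = conn.execute("SELECT 1 WHERE 1 NOT IN (2, NULL)").fetchall()

print(naive, null_safe, coalesced, not_in)
```

Only the naive query misses the row whose old value was NULL, which is how a one-line comparison can quietly hide real changes.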

Franck: I'm not really criticizing
it because I love the three-valued

logic.

I love nulls because I think I
understand it.

Nikolay: I also think I understand.

I also love exactly.

Yeah, yeah, yeah.

Franck: But it took me 20 years
to understand it.

And then I can understand that
a developer who already has a

lot of things to learn, does not
want to spend time on something

that looks like mathematics.

Nikolay: It's good.

It's kind of like I kind of came
from academia, right?

And I learned quickly during my
university time because I had

a very good professor, a big specialist
in databases, and I quickly

learned it.

But it took me 20 years to stop
liking it because I see reality

says nobody, nobody, like everyone
steps on this rake all the

time, including myself.

Franck: Yeah, you need to be pragmatic.

But also, you can also solve all
problems in SQL databases.

Just don't use NULL.

Just set all columns NOT NULL.

And that works.

And you were talking about normalization.

Just normalize a bit more.

If you are tempted to put a NULL
in a column, then it's probably

because this column belongs to another
table.

And then it will not be a NULL,
it will be the absence of a row

in another table.

Just go forward, full normalization,
and do not allow any NULL,

and that will work.

I mean...

Nikolay: It will work, but...

Franck: ...you will not have those
errors.

Maybe you will have some performance
issues.

Nikolay: Exactly.

Performance issues will be inevitable.

Franck: I see NULL like denormalization.

It's a shortcut that is easy.

It's so easy just to say, okay,
let's put a NULL because it doesn't

have a value.

If it doesn't have a value, it
should not have a row in the table.
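
A minimal sketch of that "no NULL, just no row" design, again with Python's built-in `sqlite3` module. The `employees` and `salaries` tables are hypothetical; the point is only that an unknown salary is represented by a missing row, and that every column can then be declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- A salary row exists only when the salary is actually known.
    CREATE TABLE salaries (
        employee_id INTEGER PRIMARY KEY REFERENCES employees (id),
        amount      INTEGER NOT NULL
    );
    INSERT INTO employees (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO salaries (employee_id, amount) VALUES (1, 100);
    """
)

# Reading salaries now needs a join, which is the performance cost mentioned above.
known = conn.execute(
    "SELECT e.name, s.amount FROM employees e "
    "JOIN salaries s ON s.employee_id = e.id ORDER BY e.id"
).fetchall()

# Storing an unknown salary as NULL is rejected by the schema.
try:
    conn.execute("INSERT INTO salaries (employee_id, amount) VALUES (2, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(known, null_rejected)
```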

Nikolay: Yeah.

I also remember, like, imagine
you have CTO or some leader who

understands NULLs.

Imagine all those poor application
developers who write Java,

JavaScript, doesn't matter, PHP
code, Ruby code, and this CTO

with this understanding of NULLs
in SQL constantly putting pressure

like, "Again, you used it wrong
in your code." And I was this

person. And right now I'm like, I
think NULLs is a good concept, but the world

says, please no, it just doesn't
work well. So that's why I say I
don't like them.

Franck: I will take another analogy.

I think the best editor is vi.

Because I had to learn it.

Nikolay: I also agree.

Franck: Yeah, we had to learn it.

Nikolay: Inside tmux.

Franck: It was hard to learn it,
we had to learn it.

But when you know it, you are very
efficient with it.

But I can understand that a junior
today does not want to learn

all those vi commands.

Same for null.

I mean, if you learn it and if
you spend all your life doing

SQL, then, yeah, it's good.

But that's not the reality.

Nikolay: Yeah, so back to Mongo,
and let's talk a little bit

about the alternative and if we
go out of SQL world, but stay

inside databases, what's happening
to nulls and empty values,

unknown values and so on.

Zeros, empty strings.

Should it be considered all the
same or no?

Franck: In SQL, for me in SQL it's
easy.

A null is a value that exists,
but you just don't know the value.

Your top manager has a salary,
but you don't know it.

So if you have to put all salaries
in a database, then you will

have a null.

And maybe you will put it 1 day,
just because you don't know

it yet at the time where you insert.

The problem is that null is used
for other things, for something

that doesn't exist.

You know, when in Excel we say
N/A, doesn't apply.

And if you use this as "doesn't
apply" in JavaScript, you're just

trying to store it and have the
same logic when you query the

database.

So MongoDB does that.

It's very similar to "does not exist."

You have those documents where
you can declare an attribute or

not.

And in most cases, if it's not
there, it's similar to null.

And if you want to say explicitly
it exists but I don't know

the value, then add something else,
like a boolean that says,

okay, we don't know it.
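
The missing-versus-null distinction can be sketched in plain Python, with dictionaries standing in for documents so that no database is needed. The three helper functions are hypothetical; they are meant to mirror, as far as I understand the MongoDB manual, the filters `{salary: null}` (matches explicit null and missing fields), `{salary: {$exists: false}}` (missing only) and `{salary: {$type: "null"}}` (explicit null only).

```python
# Plain dicts stand in for documents in a collection.
docs = [
    {"_id": 1, "name": "Ann", "salary": 100},
    {"_id": 2, "name": "Bob", "salary": None},  # field present, value null
    {"_id": 3, "name": "Cid"},                  # field absent
]


def is_null_or_missing(doc, field):
    """Models {field: null}: explicit null and absent fields both match."""
    return doc.get(field) is None


def is_missing(doc, field):
    """Models {field: {$exists: false}}: only absent fields match."""
    return field not in doc


def is_explicit_null(doc, field):
    """Models {field: {$type: "null"}}: only a stored null matches."""
    return field in doc and doc[field] is None


null_or_missing = [d["_id"] for d in docs if is_null_or_missing(d, "salary")]
missing = [d["_id"] for d in docs if is_missing(d, "salary")]
explicit_null = [d["_id"] for d in docs if is_explicit_null(d, "salary")]
print(null_or_missing, missing, explicit_null)
```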

Nikolay: Yeah.

By the way, you mentioned you like
it, it's a good concept, but

I'm thinking, so many caveats,
for example, if you take null

value and do plus 1, it will be
also null, like unknown, remains

unknown, because we don't know
what we're using.

Franck: If you don't know a value, then
you can add 1 and you still

don't know the value.

Nikolay: But at the same
time, if you use the aggregate SUM,

it's not like that. It uses 0 instead
of NULL, right?

Franck: Yeah, because it's defined as summing

the known values.

Nikolay: You cannot explain this,
it's not logical.

It's just as is, because sum is
just plus 1 argument, plus different,

just a sequence of plus operations,
right?

Franck: But...

Depends on how you define the aggregation.

If it's defined as the sum of the known
values...

Nikolay: If we have 3 rows, salary,
like $1, $2, and NULL dollars,

NULL, right?

Yeah.

If we just perform explicit summation,
the result will be

NULL.

But if we use SUM, it should be
the same result.

It will be not the same.

It will be 3.

Franck: Depends on how you define
it, but SQL defines that as

the sum of the values that you
know.

Nikolay: I apologize.

Franck: It gives you an idea, and it makes
sense.

I mean, if you have 1,000,000 rows
and you ask for a sum, you probably

don't want an unknown just because
1 is not known.

At least you know the sum of the
existing ones.
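
The 3-row example from this exchange can be checked directly. The sketch below uses Python's built-in `sqlite3` module; the `salaries` table is made up for the illustration, and PostgreSQL treats `+` and `SUM` over NULLs the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (amount INTEGER)")
conn.executemany(
    "INSERT INTO salaries (amount) VALUES (?)", [(1,), (2,), (None,)]
)

# Explicit addition: anything plus an unknown value stays unknown.
explicit = conn.execute("SELECT 1 + 2 + NULL").fetchone()[0]

# The aggregate is defined over the known (non-NULL) values only.
aggregated = conn.execute("SELECT SUM(amount) FROM salaries").fetchone()[0]

# COUNT(*) counts rows, COUNT(amount) counts known values.
counts = conn.execute("SELECT COUNT(*), COUNT(amount) FROM salaries").fetchone()

# With no known value at all, SUM returns NULL rather than 0.
sum_of_nothing = conn.execute(
    "SELECT SUM(amount) FROM salaries WHERE amount IS NULL"
).fetchone()[0]

print(explicit, aggregated, counts, sum_of_nothing)
```

So `SUM` skips NULLs rather than treating them as 0, which is why the last query still returns NULL when there is nothing known to add up.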

Nikolay: Let me apologize and explain
what's happening here.

I just flipped the board and made
you defend the SQL world, which

is interesting because it shows
that you have courage to become

specialist in both worlds.

This is interesting.

Franck: For me, I changed the company
and I help different users,

but I did not change what I think
about databases.

I mean, I've been working a lot
with Oracle, I still think it's

a very good database, but I can
understand that people want to

move out of it, and it's probably
not because of the features.

I like Postgres, but I also think
that there is something else

to do in the storage and to distribute
it.

I like YugabyteDB, but I also understand
that some people may

want to use something else.

Same for MongoDB.

I just want to help users when
I can help them.

And also something, especially
on Twitter, but we see a lot of

people comparing databases like
MySQL is better than Postgres

or Postgres is better than MySQL
or whatever.

And what I always say is that the
best database is the 1 that

you know.

If you know better how to administer
SQL Server on Windows,

then that's probably the best database
for you.

It's not for me.

And if you are more successful
with the NULL behavior in document

databases, then probably you should
use document databases.

So my goal is just to have people
be successful and use the right

database depending on what they
know.

The worst that you can do is work
with MongoDB and do the same

design as you did on the SQL database
or the opposite.

Putting everything in documents
in Postgres just because you have

learned MongoDB first, that will
probably not be good.

You need to understand how it works,
read an execution plan in

both cases, understand how the indexes
are used.

Michael: I kind of agree for products
where you're the only user.

Like, if I'm choosing between iOS
and Android, or, we were talking

before the call about macOS or
Windows, if I'm the only person

affected, I understand choosing
what I know best. But I feel like

with databases we're often choosing
for a team, for an organization,

for a company. And it's not just
what I know best. Even if I'm

the tech lead or, you know, even
if I am the decision maker,

I need to factor in: what does my
team know best?

What can we hire for most easily?

What's easiest to operate?

Or how long will this project last?

Is it a proof of concept project
or is it our main system? You

know, there's a bunch of other factors
I think are really important.

And you brought up
use cases at the beginning. I

think that's super important,
because we often do know the

use cases, we often do know the
access patterns, so picking the

1 that is best for that makes more
sense to me than which

1 I know best personally. But I
do take your point that if

you take that as an organization,
which 1 do you operationally

know best as an organization,
it does still fit. But

I do think there's some subtle
difference there. What do you think?

Franck: I think that there are
a lot of use cases that can be

successful on many databases.

Of course, there are some special
cases that are really pushed to

the maximum throughput needed,
where you really have to find

the right technology for it.

But let's say you have time series.

Time series coming from IoT and
you have queries on them.

Of course, you can use a time series
database, but you can also

do it on Postgres with a time series
extension or not.

And you can also do it on a document
database.

If you do it correctly, I think
you have a lot of choices for

many use cases.

And finally, the enterprises that
need a specific database because

of the very high scale of it, they
end up building their own database

or they tweak 1 database to
use it really like their own

database.

But I think you really have the
choice.

Many use cases, you can do that
on Postgres, you can do that

on Yugabyte, you can do that on
Oracle, you can do that on MongoDB,

you can do that on DynamoDB.

But if you do it in a database
where you don't know exactly how

NULL works or how isolation,
the ACID properties, the locks

work, then you will struggle. You can
be successful on any database

for many use cases, but you can
also be very bad on any database

if you don't care.

So it's more about the people,
I totally agree, not your personal

choice, about the people.

And I remember discussions when
I was doing consulting, I remember

discussing with a customer for
something where it would have

made sense to use stored procedures,
and they were going all

microservices, Java, all that.

And they just told me, yeah, but
if we do it in SQL, PL/SQL was

on Oracle at that time, we are
4 in the team who can do that

and maintain that.

And then if any problem is there,
we are 4 to be on call.

If we do it in Java, we have 200
developers in India, we have

200 developers in the US.

If there is a problem during the
night, they will manage it.

So the good choice, even if it's
not the best for performance,

for design, for whatever, the good
choice is also something

where you can sleep and have a
team that can manage it.

Nikolay: Well, right now, AI can
help you fix bugs, tests, and

so on.

Oh, yeah.

It's easier, right?

I have a couple of questions from
friends, and I think you know

them, but I'm not going to reveal
names.

First question, is MongoDB adding
SQL to the product?

Franck: I don't think this is on the roadmap
at all.

And I don't think people are asking
for that.

Let's look at another NoSQL database,
DynamoDB.

When DynamoDB added the SQL syntax
on top of it using PartiQL,

it was never used.

And the main reason was that users
were afraid of it because

with the document
API, they know what happens.

The big difference, I mentioned
the API, but there is a big difference

between NoSQL and SQL.

In SQL, you have a declarative
language where you don't know

how the data is accessed until
you read the execution plan.

Which is good because you have
an application that is independent

of the physical data model, but
it's also more difficult because

the developer has no idea how it
works in production before looking

at the execution plan.

And when looking at the execution
plan, the developer may have

to work a long time to understand
why the bad execution plan

is chosen.

Is it because of statistics, no
good index, whatever? It's kind

of complex.

With the NoSQL APIs, you code the
data access.

So it depends on the database.

For example, in DynamoDB, if you
want to use an index, you have

to query the index.

In MongoDB, you have this data
independence where you query on

the collection, and if the index
can be used, it can be used.

But you control the data that is
accessed.

For example, when you design your
documents, you design something

that is joined when you insert
it, not at run time, where a query

planner will decide if it starts
with 1 table or another table.

And it has some good and bad.

I remember in consulting, spending
a long time with developers,

looking at the execution plan and
they know their data and they

know their access pattern and they
immediately tell me, of course,

that's not the right execution
plan.

We must start with this table and
then look up into this one.

Okay, perfect.

I can use a hint, pg_hint_plan, for
example, in Postgres to validate

that it's a better execution plan.

And then the developer is happy.

Yeah, perfect.

I want that.

And then they're like, okay, but
it's not finished.

Now, we need to figure out how
to get the right execution plan

without the hint.

And with consulting, people were
paying by the day just to get the

right execution plan that they
knew initially was the best one.

With a NoSQL API, you are closer
to what happens physically and

then you have more control over that
and some developers prefer

that.
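
A minimal sketch of the point above about documents being "joined when you insert", not at run time. It uses plain Python dictionaries rather than any real database driver, and the order data and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical data: names and fields are illustrative only.

# Normalized shape: two "tables", joined when the query runs.
orders = [{"order_id": 1, "customer": "alice"}]
order_lines = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]


def read_order_normalized(order_id):
    """Join at read time: find the order, then collect its lines."""
    order = next(o for o in orders if o["order_id"] == order_id)
    lines = [
        {"sku": line["sku"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return {**order, "lines": lines}


# Document shape: the application does the "join" once, at insert time.
order_documents = {}


def insert_order_document(order_id, customer, lines):
    """Store the order with its lines already embedded."""
    order_documents[order_id] = {
        "order_id": order_id,
        "customer": customer,
        "lines": list(lines),
    }


def read_order_document(order_id):
    """Single lookup: there is no join order left to choose at read time."""
    return order_documents[order_id]


insert_order_document(
    1,
    "alice",
    [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
)
```

Both read functions return the same order here. The difference is where the access path is decided: in the normalized shape a planner (or this loop) picks it at read time, while in the document shape the developer fixed it when writing the insert.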

Nikolay: Next question was, what
do you think Postgres can or

should learn from MongoDB?

Maybe this, right?

Is it possible to...

Michael: I have 1 more.

I think they do major upgrades
really well.

Franck: Oh, yes.

Well...

Michael: But we can learn that
from a lot of databases.

Nikolay: Yeah, previous question
was because I had like maybe

outdated knowledge that many NoSQL
systems implemented some dialect

of SQL, for example, Cassandra
with CQL, right?

Franck: Yeah, but they...

Nikolay: Not used.

Franck: It's only syntax, it's
not SQL, it's not a declarative

language, it's just syntax.

I don't see the point.

If you have an API that is integrated
with your programming

language, why do you want to write
a string in Java that you

send to the database if you don't
have to?

In SQL, you have to do that because
you have this data independence

and a very different language.

But I don't really see the point.

But I forgot what you mentioned.

Nikolay: Yeah, I asked the reverse,
what Postgres could learn from MongoDB.

Michael answered upgrades.

I concur with you, definitely.

Franck: But that is related.

In SQL databases, in relational
databases, you have this data

independence, logical and physical
data independence, where you

query a logical model.

We were talking about normalization.

This is the logical model.

Maybe physically everything is
stored in 1 table.

You don't really care from the
relational SQL point of view.

But then to map the logical model
to the physical model, you

need a catalog, a dictionary.

And this is what is difficult during
upgrades, because you need

to change the catalog and the catalog
is shared.

You can shard the data, you can
distribute the data, but the

catalog must be shared because
they must use the same dictionary.

And that's easier with a NoSQL
database because you have much

less to share about the metadata,
because the catalog is in the

application.

The schema, we were talking about
schemaless or schema on read

or on write.

The big difference is that most
of the schema is in the application.

And then if you upgrade the application,
you have a new version

of the application, it knows the
new schema.

And the 2 versions can work together
if you take care that when

you read a document, you know how
to read it.
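
A minimal sketch of what "you know how to read it" can look like when the schema lives in the application. It assumes a hypothetical `schema_version` field and a hypothetical change from a single `name` field to separate first and last names. Neither comes from the conversation.

```python
def read_user(doc):
    """Return a user in the current (v2) shape, whatever version was stored.

    The "schema_version" field and both document shapes are assumptions
    made for this example.
    """
    version = doc.get("schema_version", 1)  # documents without the field are v1
    if version == 1:
        # v1 stored a single "name" string; split it into the v2 fields.
        first, _, last = doc["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    if version == 2:
        return {"first_name": doc["first_name"], "last_name": doc["last_name"]}
    raise ValueError(f"unknown schema_version: {version}")


# Documents written by the old and the new application version can coexist.
old_doc = {"name": "Ada Lovelace"}
new_doc = {"schema_version": 2, "first_name": "Ada", "last_name": "Lovelace"}
```

With a reader like this, the new application version can be rolled out while old documents are still in the collection, and there is no shared catalog to change in the database itself.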

Michael: Great answer.

Nikolay: Yeah, last question.

What do you think about systems
which are built on top of Postgres,

like FerretDB and DocumentDB recently
released by Microsoft?

Franck: That's a good point.

So, beyond the funny thing that
DocumentDB is an AWS database,

but the name belongs to Microsoft
because before putting a MongoDB-like

API on Cosmos DB, it was called
DocumentDB.

So, Microsoft did that multiple
times, put it in Cosmos DB to

see if it will be more popular.

So first, it's a mess.

Different APIs, similar names, you don't
know where the name comes

from, but I really like what the
FerretDB people are doing.

And for me, as a developer advocate,
I really like that there

is a MongoDB API on multiple databases.

In Oracle, you can also have a
MongoDB API.

The more you make it popular, the
more you help users to use

another API without changing the
database, that's perfect.

From a marketing point of view,
I don't think it's a big problem

either, because it's not only about
the API.

What I think the big customers
of MongoDB like about MongoDB

is that they have in front of them a company
that is doing only one thing.

The company is doing only MongoDB.

It's not like Oracle that has a
database, but also another database

and cloud and managed services and
software.

MongoDB is doing only MongoDB.

So if they use MongoDB on MongoDB, they have hundreds of people

doing support on it.

Nikolay: I cannot agree here because I remember MongoDB company,

it's called Mongo or MongoDB, sorry...

So I remember they also did some Postgres when they first released

the BI Connector.

Remember this story?

They used Postgres.

Franck: I have no idea.

Nikolay: To be able to use Tableau and other systems for data

analysis, BI and so on.

They needed to make some bridge to SQL world and they used Postgres

for that.

It was very interesting.

Franck: I have no idea.

My point was more like, you can do some MongoDB on Percona, you

can do some MongoDB on Oracle, you can do some MongoDB with FerretDB

on Azure.

And that can work.

But if you are a big customer and want support, you probably

want support from the original one.

Nikolay: I hear you speaking as a member of this team, new member

of this team, but I also must add a note that MongoDB

is not pure open source.

Franck: It is not pure open source, yeah.

Nikolay: Well, FerretDB is Apache 2.0, which is pure open source.

So this is one of...

Franck: Yeah, yeah, yeah.

Of course, I'm a big fan of open source.

I would prefer that it is open source, but I can also understand.

You know why they had to change the license?

Because AWS was taking everything.

And finally, today AWS is a major partner.

So it was probably a good move.

Probably today it could be open source.

But yeah, I can understand given the history that they want to

protect the managed service.

Nikolay: Open source is eating commercial software, clouds are

eating open source software.

Yeah, you remember this picture of a sequence of fish, right?

Yeah, okay, I think no more questions from me.

It was very super interesting and yeah, enjoy.

Thank you for coming.

Franck: Thank you very much.

I also really like what you do, how you can come up with so many

different topics every week.

I think you never missed a week for us.

So yeah, that's really nice.

Nikolay: Great.

Michael: Really kind of you, Franck.

Thank you for joining.

Franck: Thank you.

Nikolay: Have a great week.

Creators and Guests

Franck Pachot
Guest
Developer Advocate at MongoDB (formerly at YugabyteDB, distributed PostgreSQL), AWS Data Hero, Oracle Certified Master