NULLs: the good, the bad, the ugly, and the unknown

NULLs: the good, the bad, the ugly, and the unknown

Nikolay and Michael discuss NULLs — including some problems they can cause, a new feature coming in Postgres 15, and some learning resources we like.

Michael: Hello, and welcome to Postgres FM,
a weekly show about all things PostgreSQL.

I am Michael I'm, founder of pgMustard, and this
is my co-host Nikolay founder of Postgres AI.

Hey Nikolay what are we talking about today?

Nikolay: Hi, Michael.

So it's time to talk about various mistakes and I've noticed that
many people from my conference experience and so on, I noticed

that, uh, many people love to hear about other people mistake.

I don't know like why they love it so
much, but I I've noticed it's very popular.

When you talk about, we made this mistake, it was so painful and people
enjoy listening or watching, uh, maybe they want to learn something.

I don't know.

Michael: yeah, I love it.

And there are two main reasons.

One is the hope that by hearing about their mistake.

I hopefully can avoid making the exact same one, avoid
having to make that myself, uh, it's extra painful.

And then the second reason I love it is it just makes you
feel better when you make mistakes, knowing that these

amazing other people make mistakes all the time too.

So I think that's for me, there's that part of it as well.

Nikolay: Right.

And, and, it's already fifth episode of our podcast and I
I've noticed we have quite good numbers so far, and I wanted

to thank everyone who is sharing and liking and so on.

Please continue.

It's it helps we, we see this and we also seek comments in Twitter feedback.

It it's so great to see.

Michael: Yeah.

And thank you to everyone.

Who's suggested ways we can make it better as well, really appreciated.

Keep topic suggestions coming to, we'll try
to get to quite a few of them over time.

Nikolay: Oh, and I also also want to say big hello to
all those who are currently running or riding bicycle.

And so it's good.

I, I also need to do it.

Okay.

Uh, so, mistakes, if we talk about mistakes, uh, related to
sequel and specifically Pogs, I think the biggest, the most

painful, at least in my career, it it's always related to NULLs.

Now it, this is the most.

Uh, the, the, the thing I hate most in SQL, uh, in theory, in,
in, in like SQL model, if you think about SQL model in standard

in, in various database systems and specifically in Pogo.

So let's talk about NULLs today.

Michael: Awesome.

And one of the reasons they're so painful is
because the mistakes don't come back right away.

Right.

You don't necessarily hit an error message
or you don't, uh, you know, there's yeah.

That feels like they kind of, they aren't invisible for a while.

Um, so that,

Yeah.

that's a better word for it.

Yeah.

Nikolay: mm-hmm mm-hmm yeah, hidden, silent.

And so on.

This is a like, if, if of course, if you
see the error, you can, you can react.

But if you don't see error and NULLs are very, very tricky.

Let's start with maybe some news about Pogs 15.

There will be new capability that will allow people to
say that unique constraints should not distinguish NULLs.

So there is only one in universe for some unique constraint.

By default, all NULLs are distinguishable like they are all different.

So we cannot say this now is the same as that now, because now it means
unknown in sequel logic, sequel logic is based on three valued logic.

It's, it's worth understanding what three valued logic is.

And again, if you don't know, please check
out, Wikipedia or other basic articles.

It's very fundamental knowledge.

Everyone should possess.

So if you compare to NAS, uh, the result is always
unknown, which actually translates to now, if we like

we don't have unknown values, we have only now values.

So like NULL, compared to NULL, Kind of also
NULL, however, strictly speaking, it's unknown.

And that's why if you define a unique constraint, you
can insert as many nails in this column as you want.

This is by the way, uh, Big difference between unique constraint and primary
key is because primary key is, uh, completely forbid nows in, in columns that

are, that participate in primary key, but unique constraint allow, allow nows.

So, if it's, a one column unique constraint, you can insert as
many values of now as, as you want as a mineral having now in this

column, as you want, and this in some cases is not convenient to some
developers so that's why in Pogo 15, a longer waited feature appeared.

can So default defaulted behavior is the same.

You can sort as many notes as you want, but with new option, I
don't remember this option, but this option allows you to say.

They can be only one now here.

Right?

And this is like kind of, I, I like that
this is explicit option, not a, a setting.

You, you change like you change.

Of course it'll be, it'll be nightmare.

And it'll break all, all logic that standard dictates and so on.

It's good that it's explicit.

And, and, uh, for only for those who need it will have it,

Michael: Yeah, I saw it.

Nikolay: I guess.

Yeah.

if

Michael: Yeah.

I saw it.

Sorry.

I saw a good blog post on this by Ryan Lambert.

Um, maybe a month or so ago, I'll share it in the show notes.

I thought it was super interesting, but I was
also struggling to come up with a use case for it.

I didn't really understand why you would want to be able to determine
that there was exactly or sorry at most one, NULL value in a column.

Did you, did you ever come across it?

Nikolay: Well, uh, yeah, if, if you.

I remember some, somewhere in some articles probably or some discussions.

So we compared, uh, like when you start understanding three value
logic, you are kind of infected, you always think about, okay.

Okay.

We have no unknowns.

You start.

If you, if you are more, that waste guy, you
start teaching your, uh, backend and front end.

Uh, developers, colleagues, especially front end colleagues who
write a lot of JavaScript code that, you know, unknown is unknown.

You cannot return like, think it's, it's about it as
an, as an zero or about it as a, uh, empty string.

So, so, and at some point you probably want to create
some table and, uh, use a NULL as like unknown.

And you can imagine, for example, we.

Uh, for example, um, true, false and unknown,
and we want to restrict only three options.

It can be for example, two column, uh constrainted for each person,
we can say the, you can, for example, it's it can be a poll results.

So each person can say, I agree, I disagree.

Or I don't vote in this question, for example.

Right.

And, uh, each person can say only once.

So we can have personal ID and result, right.

And unknown result.

We can use nails to, to record, uh, uh, that, uh, the vote is like unknown,
but it's actually, it's breaking logic because in this case, the third

option, I, I, I, by purpose, I don't want vote is, is kind of known result.

So strictly speaking, I wouldn't do this.

I would use, for example, minus one or.

Some people may choose mouse and then they want to, uh, restrict,
uh, and say only one result for each person is possible.

In this case, you want unique constraint and in this case,
you would want to have only one now possible for one person,

something like this is what popped up in to my head initially,
like I can imagine this can be useful, but I would avoid it.

Michael: That makes sense.

I actually used to work at a company that did survey software and
we did have more than one negative value because we had, well, for

example, you might have seen the question and skipped it versus you
might never have been presented the question and those are different.

Um, so yeah, very interesting.

Nikolay: Various flavors of various meanings.

I, I I understand some, some people need it and Pogo this
is probably not the biggest problem with no sexually.

Let's about, uh, other issues.

For example, I had a bad situation like roughly 10
years ago, it was my third one already social media.

And I was completely sure that it's good and market
is ready we we should launch with great success.

And I remember we launched.

Slowly in, in spring first and everything was ready to to grow.

And, uh, we used a lot of mechanics, uh, all social you might, you
might hate it or not, but when you check friends, Uh, initiating

some invitations, uh, without, uh, lying of because, uh, we know
the case LinkedIn had and so on, but it's a different topic, so

everything was ready and I saw audience and, and like likes it.

But at some point during summer, we experienced a very low uh, I, I
didn't understand why, because I see like, like if we talk about, if

you talk to specific people, they are happy, they consider project.

They're interesting.

And uh, analytics shows us, they use this system to connect to friends and
so on, but somehow don't don't grow as I expected as I And my my experience

was already quite good because I, it was already a third project for I
like, I understand how people work and so on, but somehow it didn't grow.

And I remember in the, in early September,
I went to like digging what's happening.

I went to code and I saw some select, which.

I don't remember it was comparison or some
etic operation, but it was not NULL safe.

So the result was now, but developer obviously didn't expect
result was unknown, converted to implicitly converted to

now, but developer didn't expect now to be present there.

And I just put coalesce there.

and we started to grow like, like rocket during next several months.

so by the end of year, it was huge success.

So I just put one word coalesce that's there

Michael: Wow so what was this doing?

What part of it, like, was it rec, was it
about, um, suggesting people add others?

Or like how, how did that limit growth so

Nikolay: if, if you're interested in off topic for us, but
if you're interested, usually all social networks do like

they, they offer you Who, which friends already use this
service and when you do it, they suggest you to connect

Michael: So back to back to coalesce, my understanding of that
is it returns the first non-no, value that it that it sees.

So you can, for example, put coalesce and then, the result you're expecting

Nikolay: Zero for example, you expect, if, for
example, want to use amatic uh, to make them now safe.

You need to use coals, for example, uh, use zero when
there is no value, otherwise whole result will become.

Now if at least one now is present among, among, right.

And this is exactly what happened Because
of that, the thing didn't work at all.

And, and once we fixed it, it was like, it was so good.

I was happy.

I I uh, the loss, like, and it was estimate was like
$30,000 during several months loss of this, uh, missing one

Michael: And that's at a startup, right.

That could be, that could be bigger at a larger company.

Imagine Yeah.

Um, so it's significant and of course, silent
as we were talking about before it doesn't fail

Nikolay: It's just working but not working well
again, we had some registrations, but not enough.

You know what I thought about it, it's not have this person mistake.

I would do the same easily.

And even if you have 20 years experience writing SQL as I do.

I completely a hundred percent understand.

I realize it, that I, I still have high chances to write some not now
safe code . this is the thing that you need always to keep in mind.

And it's so hard to just forget about it and have it, uh, once
again, have not safe operations and it's just dangerous part of SQL,

Michael: So, I guess this takes us back
to why they're your least favorite part?

Nikolay: Right.

So I like the flexibility because, through
a false, like, it's not, it's not enough.

You need something like unknown and it was added by purpose, but, maybe in
future, we will have some better tools that will maybe it's already exists.

I don't know if our listeners can suggest something, maybe there
are tools that can put some warnings and say, you know, This

column might have nows, and this operation is not now safe.

Probably you will have unexpected results.

It would be good to have such something like
inci CD checked all the time we write code.

But if you use ORM, which dynamically construct SQL , it's probably not.

Well, it's an interesting topic, how to improve experience.

I'm kind percent sure.

We, we're not in the perfect position in terms of NAS in relational databases.

Michael: That makes complete sense and tooling sounds interesting.

I wonder if it could be like some even static analysis
on, I guess you might need to know that a column does

contain those, but knowing that it could is enough.

Right.

So that's yeah, it feels doable.

I'm not aware of anything myself.

So, so is the, the messages basically remember that, uh, something
could be, could be know and handle it appropriately, but try to

remember is the best and maybe, maybe useful for code reviews as well.

I think sometimes.

In teams that I see there's sometimes more people
that are comfortable reviewing backend code or

application code than there are reviewing database code.

And sometimes there's, you know, there's an expert on the team or somebody
who's better at this than most people, perhaps sometimes their code isn't

getting reviewed or, you know, just remembering to review SQL as well.

Nikolay: if you, if you are reviewer uh, of course this is one should
be, you should have in, in, in in your back of, of reviewer's tool

set, let's check if all operations are now safe, but I, I think it's
not we, we should uh, the path that leads us to better results by

Michael: Definitely not the perfect solution, but for now,
I'm looking forward to hearing if there are good tools for

this, but if there aren't and if neither of us are aware of
them for now, I guess that's what people need to be doing.

You know, the, the developer has to think about it and also the reviewer.

Nikolay: all right.

And we also discussed only operations in comparison, but
there are many, many, many more other thick places where

mask can, can, uh, shoot off your, uh, legs or, or arms.

I dunno.

Uh, For uh, aggregate.

Some aggregates ignores some aggregates don't ignores, uh, some aggregates.

Do it depending on situation.

For example, if you do some again, like summarizing
all values in a column, nows will be ignored.

Of course.

Right.

Although some is based on plus operator, right?

Michael: What's the, yeah.

What would the difference be if they weren't ignored?

It would return NULL guess

Nikolay: If you think about some as first
value plus second value, plus blah, blah, blah.

If now, as present results also be now as well,

Michael: Yeah.

I see what you mean.

Nikolay: but some actually does coalesce coma zero, right?

So like it ignores now.

So it's like now safe.

Right?

Michael: One people aren't probably aware of is, is count.

Nikolay: count is, is interesting count, uh, for column will also ignore now.

So it won't, it won't count, rose where you have now for this column.

But if you say count star, all rose will be counted.

And if you say count, table name from table.

It's possible because like, you can count the whole row considering
whole row, all its columns, as, as one single value it's POS magic you.

It's like record type again.

Mouse will be counted.

Michael: What about this is probably ridiculous and
definitely not a real case, but if every column in the

table was no, sorry, imagine a single row with every column.

No.

Nikolay: single

Michael: Would that be counted by table?

Okay.

Wow.

Nikolay: yes, yes.

Why?

No, because I also discussed now some time ago
that two years ago, it was a YouTube, life stream.

And I said, you know, of course, if we, if, if we
do this, what you describe it won't be counted.

I do it.

And I see it's counted.

Michael: That's so funny.

Nikolay: Like it's, it's so many unexpected parts when
behavior is not as like we could have some certifications,

like no expert in progress and also like array a aggregation.

What do you think will be counted or like used or

Michael: May it be a good quiz, wouldn't it?

Nikolay: You if you have a chain of questions,
uh, specifically boom, moving to very bad places.

so it would, it would be tricky to answer everything properly.

Michael: Yeah.

What grade would you give people at the end?

Would it be, a, B, C, D E or no, maybe

Nikolay: or unknown.

So what, what do you think about array aggregation array,

a G

Michael: I

Nikolay: aggregate

Michael: I'll be, I'm gonna guess I'm gonna play this
game and I am gonna say that it does not ignore Knowles.

Nikolay: Yes.

It'll take them and use as a members of array.

Exactly.

So you see like it's, it's, uh, , it's a very complex thing, just one word,

Michael: That was genuinely a 50, 50 guest.

So I could, I would definitely not uh, I didn't know that.

,
Nikolay: al also like, like, uh, disclaimer, might be wrong.

Michael: What percentage

Nikolay: as I, as I've said, as I've said, no matter how many
years experience you have, you cannot be a hundred percent, right.

All the time.

Uh, with

Michael: But the, the, crucial part here is not being unaware of which one
it is, it's remembering that it might not be null safe in the first place.

So there might be unexpected behaviors when
you're working with data that can include nulls.

So it's the same takeaway.

Isn't.

Remember that when you're working with data that contains null values
or is that I'm not even sure if that's the right way of saying it, that

contains nulls then remember to test check, put some test data in, add some
realistic test data and then check that your code's working as expected.

But the, the key part is remembering.

Nikolay: Sure.

I'm quite skeptical.

I think people really understand all the dangers related
to now only when they experience it in production.

So like learning curve, should bring you some more
real examples to hit you so badly that you understand.

Oh, next time I should be very.

Michael: Yeah.

Nikolay: also only recently I learned, for
example, there is Jason B set function.

And if you it's can be used to set some part
of Jason B value to some different value.

So not, not, not be need to.

Override everything.

So you can say along this path, I need to replace something.

And if you put, if you want, Jason also has nulls.

If you want to put now there and say Jason B provide some path array of
keys and then say, no, you will have whole value set to know unexpectedly.

So you need to use single quote now converted
to J and B instead of regular sequel now.

So it's like complete everywhere.

Michael: you say converted to do you mean
like the cast operator or what do you mean?

Nikolay: Yes.

Concussed to JS and B.

So now, uh, string converter to J and V.

So in this case, uh, it'll be as expected you say
you only change the part of the whole value, not the.

Michael: So if I'm a team lead and I've got a few junior
developers on my team and I'd like them to learn a few

things around this, are there any good starting places?

I, I know you said they probably need to learn through
experience, but if I want to try and avoid that, what can I do?

Nikolay: Oh, it's a good question.

And probably a hard one.

I will, of course do some training maybe related to NAS.

Of course making sure everyone, uh, understands what, three valid logic is.

And so some theoretical basics and then, uh, some examples,
I don't know, like it depends on the tools for education.

You have, of course, if you have some playground
where you some databases it's and you.

do some exercises.

It would be, it would be helpful to write
some code and see how dangerous now can be.

But, I don't have excellent, simple answer here again, as
I've said, I'm very skeptical and think only when you hit

only when it hits you very badly breaking some functionality
only, then you probably will learn that it's really D.

Michael: Yeah, I think I saw a really good article by hacky Benita that
included an issue they came across that I will share in the show notes.

And I think you shared one from as Jeff Davis.

Nikolay: Jeff Davis, very old article and excellent collection
of very unexpected examples, much more than we discussed here.

So yeah, let's also attach the show

Michael: For sure.

I know it's difficult to ask people to go away and read
stuff, but it feels like learning that there even are.

These problems gives you a chance of knowing.

I think probably sometimes people don't even realize this could be a problem.

I definitely only learn about coalesce quite late in.

I knew quite a lot about SQL sequel before learning that

Nikolay: Anti joinin example as well.

If you, if you, if you use a not in and then expect, then have
nail inside in, you can also have unexpected, it's quite popular.

I think all articles.

So we mentioned, uh, they, they should discuss this
example as anti joinin also like have some problems nails

Michael: Same with not exists.

I'm guessing.

Nikolay: Uh here's the trick.

I, I always forget about it.

Actually, I, as I remember, not in is dangerous, but not exist is not
something like this, but every time I deal with it, I'm refreshing memory

and either experimenting gum or checking some book posts or articles,
unfortunately again, it's a bag of knowledge you should carry all the time.

And it's easy to stop thinking about it, if you don't use it every day.

Michael: Always test, kids.

Nikolay: Yes.

So testing is perfect.

So if you, if you think about something just tested.

Michael: Yeah, actually one other, so in terms of learning
resources, there are a couple of others that I thought were great.

Probably always a good port of call is have you come
across Marcus Winand's modern SQL, uh, or modern sequel

Nikolay: long

Michael: I think that's, that's pretty great.

I'd be surprised if they don't have something around this.

Awesome.

Nikolay: Mm-hmm

Michael: Do you, uh, the book Art of PostgreSQL.

I think that's quite good in terms of like a mixture
of beginner friendly and quite advanced topics,

Nikolay: should, be somewhere behind

Michael: purple one.

Right.

Um, awesome.

So I think those, those stand out to me as particularly good places to learn.

also anything else on those you wanted to talk about?

Nikolay: Well, last thing I would like to say to add, you say
like tested that I say tested, but I think many times when people

start, it's hard for them to test properly because to create
proper test, you need to understand the depth of the issue.

And I wish some people, like, I so many times people go to some chats or some
places where they can get public help and ask questions and got reaction.

Like, oh, why you just don't test it?

I wish some people like it's can be considered as negative actually reaction,
but I agree that I don't know why the new people, why they don't ask.

Can you help me to create proper.

So I, I would, I would see myself,

Michael: Yeah.

Nikolay: So please help me to test it all, all the aspects of this problem.

This would be a great, uh, approach to ask questions.

When you start with some topic.

Michael: I completely agree.

I think we could definitely be a little bit friendlier in some
communities in the Postgres world to newcomers, but equally I do

also understand that we do get people that clearly haven't even used
Google or even looked in the docs at the most appropriate place.

So it, I think it goes both.

But for sure, teaching people how to, how to share an online snippet or how
to set a little test example in one of the, uh, fiddle tools or something like

that, just to demonstrate their problem a little bit clearer goes a long way.

And if you're asking for help, it definitely helps
to show that you've tried something yourself first.

But, I think it, I think we can all improve there for sure.

Nikolay: Yeah.

Okay,

Michael: Nice.

One.

Nikolay: I hope it was helpful to

Michael: I hope so, too.

Nikolay: basic material, but as I've said, even you have 20
years of experience, uh, it it's still, it's still hurts.

Michael: Hopefully this was a good public service announcement, at least.

And I think some of this is quite technical
and quite difficult to do without examples.

So the articles we share in the show
notes might be extra helpful for this one.

So, I'll make sure to make those nice and clear.

Nikolay: Oh, one more thing I wanted to mention.

I forgot, uh, since August 11.

We have, we are in better place . Why?

Because before SGEs 11.

Imagine if you have a billion row table and Romanian NA
so, and you add some column, which can be true, false,

and you have you think, okay, I want it to be strict.

I want to have not now there.

So, so only true or, and, or only false two

Michael: Exclusive luck, right.

Nikolay: you know, Yes, updating billion rows.

It's it's nightmare.

I don't want to go with it.

So I will consider NAS as false.

And this is what we always did because practically
it was a big headache for us to update billion rows.

So we started to treat NAS as false and because
practically it makes sense you avoid big problems, but

since post August 11, we can say default falls, that's it.

And it'll, it'll be fast since post 11.

Even you can say default falls.

Not now.

Michael: What does that mean Do you, oh, one minute.

So I, I think I understand.

So you could add.

Nikolay: you, you can, you get results.

You want only two options are possible.

All existing row, uh, have false.

and now that is prohibited in this column.

It's perfect.

By the way, the same trick allows you to redefine primary
keys since Paul, August 11, but it's different topic.

Let's discuss it another another day.

Probably.

Michael: Awesome.

Well, thanks again, everybody for listening.

keep the feedback coming and hope you have a good week.

Nikolay: Don't forget to share us as much as possible.

We like it a lot.

Thank you.

So.

Michael: Cheers.

Bye now.

Nikolay: See you.

Some kind things our listeners have said