Partitioning by ULID
Michael: Hello and welcome to
PostgresFM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
This is my co-host Nikolay, founder
of Postgres.ai.
Hey Nikolay, what are we talking
about today?
Nikolay: Let's continue the discussion
about UUIDs.
I chose this topic because I recently
implemented something myself
and also because some of my customers
raised this again.
So I think many people still have
questions.
I had, was it a solo recording
about UUID?
Okay, it was a solo episode.
But obviously I have fresh things
to discuss and you also have
something to discuss.
So let's return to this topic and
be more practical this time
and think about really, really
large datasets and how to work
with them more efficiently.
It means partitioning.
UUID plus partitioning.
Michael: So that's what you looked
into.
Nikolay: Yep.
Well, let's talk only about partitioning.
We are not sharding experts, so let's stick to partitioning.
Michael: But I would say, actually,
I think it's important to
mention that one of the big benefits
of UUIDs in general is that
they do suit a setup
where you're generating IDs in
multiple places.
So it is a natural fit in terms
of topic.
Nikolay: So basically you ditch
sequences.
Michael: I think this lets you avoid some of the drawbacks that we used to see with UUIDs. Some of the new implementations, some of the new standards, mean we don't have the drawbacks that typically come up in discussions of bigserial versus UUID, or ULID, as they're often called when they're sortable.
So yeah, you were looking into
this.
Nikolay: Never, ever integer for
primary keys, never.
Michael: Yeah.
Hopefully, people know that now.
You were looking into it for partitioning
recently then.
So what was the use case?
Nikolay: So the use case was our chatbot, a new AI chatbot we developed, and we needed storage for all the communication that happens in a public channel, because we have a paid version and a free version, and the free version obviously is recorded and shareable.
So basically, we created a chats table, and I originally used bigint primary keys, and then I was shamed by my team members, who said: come on, first of all we need to use Timescale there, and also numbers in URLs don't look great. We need a better idea there.
Michael: Is this because they would
be guessable, or is it a security
thing or something else?
Nikolay: Good question.
I have a long history with this.
I remember in my first social network we tried to hide the actual number of posts and users we had.
But then a competitor launched, and they didn't care about it at all.
And then I also stopped caring about it.
It's okay to have some numbers in the URL and share actual numbers with everyone.
But here I just think maybe it will become a shared dataset in the future.
So I don't want to have sequences.
And also maybe I don't want to share actual numbers.
I don't know; it's one of those things where you're not fully sure.
So I'm kind of okay with sharing numbers, right?
For example, we generate 5,200 messages per day with this bot.
People are using it. 72 people have joined the program, and many more are waiting on the waitlist.
It's not a lot at all, but I think it will explode soon.
And maybe I don't want competitors to see the actual numbers.
I don't know.
50-50 here.
Because I've had experience with both cases.
I've had experience sharing numbers and experience hiding numbers, and we used two mutually prime numbers to get not truly random, but random-looking IDs.
If you take a sequence value, multiply it by a very large integer, and then take the modulus with another integer, where those two integers are mutually prime, you get a rotation with no collisions: numbers that look random but are not actually random.
And nobody knows your actual ID number.
So this is how you can hide growth rates and the actual values of your business metrics, like how many posts or comments or anything else you have in your social network, for example.
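A minimal sketch of this trick in SQL; the constants here are illustrative, just mutually prime (2654435761 is odd, the modulus is 2^32, so their gcd is 1), which makes the mapping collision-free:

    -- id -> (id * K) % M is a bijection on [0, M) when K and M are coprime,
    -- so sequential IDs come out looking random, with no collisions
    SELECT id, (id * 2654435761) % 4294967296 AS public_id
    FROM generate_series(1, 5) AS id;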
Again, it's not my main reason.
I just think, okay, we need to
use a UUID probably, and then the
team said we also need to partition
it.
We already used TimescaleDB.
Let's just use it.
And then I think, okay, obviously I want my indexes to behave well.
If I use a regular UUID version 4, for example, which Postgres currently supports, I know we will have performance issues, because values land in very different, random places in the B-tree index.
Not purely random, but any insert disturbs some arbitrary part of the index.
And this is not good for performance, especially if you also want to show the last 25 items, for example, ordered by creation time.
You will deal with many more buffers than you should.
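A rough way to observe this difference yourself; a sketch, where uuid_generate_v7() is the SQL function discussed later in the episode:

    -- same row count, random keys vs time-ordered keys
    CREATE TABLE t_v4 (id uuid PRIMARY KEY);
    CREATE TABLE t_v7 (id uuid PRIMARY KEY);
    INSERT INTO t_v4 SELECT gen_random_uuid()  FROM generate_series(1, 1000000);
    INSERT INTO t_v7 SELECT uuid_generate_v7() FROM generate_series(1, 1000000);
    -- the v7 insert typically completes faster, and recency queries touch fewer buffers:
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t_v7 ORDER BY id DESC LIMIT 25;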
Michael: And a lot more write-ahead logging, right?
Because you've got full-page images for different parts of the index each time. B-trees are very optimized for adding things continuously at the end, on the right side, like timestamps; random UUIDs don't fit that pattern at all. But that takes us on to, I guess, some of the newer...
Nikolay: Right. And we also had a session on the PostgresFM YouTube channel with Kirk and Andrei, and I think it was Kirk who brought up this topic: let's implement ULID in Postgres.
And using Andrei's hands, because he's a true hacker, as usual in those sessions, he quite quickly implemented a patch and sent it to pgsql-hackers. Quite soon we realized there was already an ongoing RFC proposal to support versions 7 and 8 of UUID, so Andrei renamed the patch from ULID to UUID version 7, and then it was blocked by the idea: let's wait until the standard is finalized and fully approved.
You read this thread; there was also an idea to just avoid naming it version 7 completely.
Michael: Well, I thought that was a really neat solution.
Ultimately, what we want, well, what you want, what I would like, what I think lots of people would like, is a function within Postgres that generates values that can be put in a column of type UUID, but that, because they're based on a timestamp, increase over time.
So that comes with loads of benefits for updating B-trees, and some of the other benefits you mentioned already.
Now there are new specifications for UUIDs coming down the pipe that would suit that perfectly.
But somebody made a really good point.
I think it was Andres from Microsoft.
One of the first replies that Andrei got was: we could just name the function along the lines of what it does, rather than UUID v7.
I can't remember the exact name, but they came up with something very sensible, I'm sure.
And, yeah, that seemed really smart
to me.
Because then even if the spec changes
for v7, it doesn't matter.
We can implement those later or
have different functions for
those, but this one could still be
useful to users earlier, before
the spec finalizes, before everything
else gets agreed.
So I like that as a suggestion,
but I think it either got lost
or people are very happy waiting
another year.
Nikolay: Well, it's just that the RFC seems to be making good progress and should be finalized.
It's still not finalized; I checked a couple of days ago.
But let's explain once again what we want.
We want to take the good things from both worlds.
Growing integers, right? With the regular approach you have an id and a created_at.
created_at, a timestamptz, is 8 bytes, and the id is 8 bytes; if you use a 4-byte integer, you will still effectively use 8 bytes because of alignment padding, we know that.
And if you just insert, with created_at defaulting to now() and one row per transaction, or with clock_timestamp() instead of now() so you can insert many rows, then the order of id values and created_at values will be the same.
You can ORDER BY id DESC LIMIT 25, or ORDER BY created_at DESC LIMIT 25: same result.
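In other words, with a sequential key these two queries return the same rows; a sketch, assuming a chats table with a bigserial id and a created_at defaulting to clock_timestamp():

    SELECT * FROM chats ORDER BY id         DESC LIMIT 25;
    SELECT * FROM chats ORDER BY created_at DESC LIMIT 25;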
And this is good.
This is good benefit, and it's
also good for performance because
of locality of data, rows created
inside the same second go probably
in 1 or 2 pages only.
Like, very, it's packed, not distributed
sparsely, right?
And UUID is good because it doesn't require any sequences.
You can generate it on thousands of nodes.
That's the name: UUID, universally unique ID.
But the values go to different places in the B-tree, and you cannot order by them.
This is the main problem.
Michael: In like version 4 of the
specification.
Nikolay: Yeah, yeah, I'm talking about the current situation.
And the RFC is still not finalized, so this is also the current situation if you don't use drafts, right?
And this is a problem.
And we have a good blog post from Shopify.
It's MySQL, but it's very good.
They explain the economic benefits of switching to a different, newer type of UUID.
And also Hussein Nasser, sorry if I pronounce his name wrong, explains this Shopify case on YouTube very well for backend engineers, and why a better version of UUID is really better.
Very, very well. His explanation is such that anyone can understand this, maybe better than I just explained it, right?
Michael: Let's link it up.
I actually pulled out a quote from
that Shopify blog post.
It said: in one high-throughput system, they saw a 50% decrease in insert statement duration by switching from UUID v4 to ULID.
Nikolay: And you can order by them.
If you want the latest 25 posts, you need to order by something, right?
Michael: Even if you don't need that, there could be significant benefits because of
Nikolay: the B-trees.
Let's slowly move to partitioning.
If you don't have an orderable ID and you rely on a created_at column, you can have an index on created_at, order by it, and use an index scan.
Okay, it's okay. Not a perfect situation, because you have many more buffers in the plan, since you fetch from random places.
But if you want to partition such a table: in the URL you have a UUID, right?
How do you find which partition to deal with?
Michael: You could do like hash
partitioning.
Nikolay: Well, yeah, yeah.
But I forgot to say: I want to partition by time, because older data is used less often.
Michael: And maybe you want to phase it out over time.
You might even want to eventually drop it...
Nikolay: Exactly.
Or move it to a different tablespace, though that's not a super popular solution, of course.
Also, there are some ideas about being able to move it to object storage, which is currently supported only in Timescale Cloud.
I think one day it should be possible with regular vanilla PostgreSQL.
Michael: I think I read about another
provider doing it recently
too.
Nikolay: Interesting.
Send me a link.
I'm very interested in learning
about it.
I think moving older partitions, time-based partitions with old data, say two years old, to object storage, eventually moving towards bottomless Postgres, would be great.
But back to our question.
In the URL you have blah blah blah, in our case postgresql.org slash some UUID.
And we need to understand which
partition to deal with.
In our case it's TimescaleDB.
How to tell TimescaleDB which
partition it is?
If it's a regular UUID, I think it's
a nightmare.
Michael: Well, I saw the update
and the reply you got from, was
it James Sewell at Timescale?
Nikolay: A couple of guys from Timescale replied.
I really appreciate it, but that was already about this sortable UUID version 7.
That part is already solved.
I have this recipe in the Postgres Marathon series; it's recorded.
I had implemented it my own way, which was not efficient, and they provided very good advice on how to do it much more elegantly.
And I have it covered in my small article.
So ULID, it's similar to UUID version 7 or 8.
ULID actually stands for Universally Unique Lexicographically Sortable Identifier.
Michael: Yeah, probably like UUID.
But I admire the people that came up with ULID.
It's nice and simple and unique.
Nikolay: So I guess we are about to abandon this abbreviation altogether and just use UUID version 7 or 8, because of the standard.
Michael: Or maybe in the future we'll just say UUID, because there might not be that many benefits to using the fully random ones.
Sure, you get extra uniqueness, more bits assigned to the random part, but we already have so many.
I was looking up version 7, and I think the spec currently says 48 bits for the timestamp.
So if you're thinking in terms of bits
Nikolay: and bytes.
16 bytes.
Michael: Yeah.
So 16 bytes, or 128 bits.
48 of those are reserved for the timestamp component, a few bits specify the version and variant, and that leaves 74 bits for the randomness.
In a UUID v4 you get 122 bits for randomness, everything except the version and variant bits.
So there is less randomness possible in these ones.
Nikolay: There may be collisions, right?
Michael: Or just a slightly higher chance of them, right?
We're still talking about, you know, especially in your case, a chance of collisions that is basically zero, which is why these are useful.
Nikolay: Yeah, even if we get to 100 messages per second, we will be fine.
Michael: Yeah, exactly.
Nikolay: And we can generate them on basically any server.
Well, how the clocks are set will be a question, of course.
Michael: This needs to be UTC, right?
Nikolay: So this version 7 or ULID, or version 8 — versions 7 and 8 differ only in precision, right?
Michael: That was my understanding.
Nikolay: The idea is: take a timestamp, generate a regular UUID, or something like that, and then produce a new UUID whose first bytes correspond to the timestamp, so the values are sortable.
You can order by them, and with very, very high confidence you can say: this corresponds to the creation timestamp.
And actually, the interesting thing... So, Postgres is on pause, right? Waiting for the RFC to be finalized.
I actually started to doubt that Postgres 17 will get it.
The idea to change the name and provide something that doesn't depend on the RFC is good, but I guess the consensus, maybe not consensus, but what I saw in the pgsql-hackers mailing list, was that people decided to wait for the RFC.
And I checked the status: it's still waiting for some reviews, and it's not fully clear when it will be finalized.
So I guess Postgres 17 probably will not have it.
Michael: My experience with these things is that it depends on whether a couple of people get excited about it: if somebody ends up needing it, if somebody puts together a patch that can generally be agreed on, if somebody else is willing to review it, and somebody commits it, it could get in.
But it depends on at least a couple of people having the energy and
Nikolay: focus.
Well, I just remember Peter Eisentraut's comment that we should probably wait for the standard, and Postgres sticks very close to standards, you know, including the SQL standard.
So it's a very, very important comment.
My impression is that we're just waiting for the RFC.
But the good news is that you can generate these values in your application code yourself.
There are many, many implementations of ULID, or of this UUID version 7 — even though the RFC is not yet finalized, they're already called version 7 — and you can find a lot of them on GitHub.
But you can also generate it using PL/pgSQL.
Yeah.
And additionally, Daniel Vérité, I hope I pronounce it right, sorry guys if I pronounce names wrong, showed a proposal for how to generate it using PL/pgSQL.
And then, just looking at the code, I said: we don't actually need PL/pgSQL here, we can do this with a regular SQL function. And he answered: yes, just scroll down.
In the same GitHub gist, further down, there is a SQL function; it's quite simple, and I just started to use it.
So I generate UUID version 7 with a small SQL function, that's it.
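The widely shared pure-SQL implementation looks roughly like this; a sketch along the lines of that gist, overlaying the millisecond Unix timestamp onto a random UUID:

    CREATE OR REPLACE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
      -- start from a random v4 uuid (which already has the right variant bits),
      -- overlay the first 6 bytes with the Unix timestamp in milliseconds,
      -- then set two bits to turn the version nibble from 4 (0100) into 7 (0111)
      SELECT encode(
        set_bit(
          set_bit(
            overlay(uuid_send(gen_random_uuid())
                    placing substring(int8send(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                    from 1 for 6),
            52, 1),
          53, 1),
        'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;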
But then the very big question: do we really need a created_at timestamp?
If we have a timestamp already present in this UUID value, which is 16 bytes, and we don't need very high precision, maybe we can extract the timestamp back.
And the answer is yes.
I saw Andrei's comment in pgsql-hackers saying that the authors of the RFC proposal don't encourage it, extracting the timestamp back out of UUID version 7 values.
But for partitioning, we actually need it.
And we also need it if we want to just drop our created_at column. Because why keep it? We have created_at right there.
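Such an extraction function can be a one-liner; a sketch (the function name is ours): read the first 48 bits back as milliseconds since the Unix epoch:

    CREATE OR REPLACE FUNCTION uuid_v7_to_timestamptz(u uuid) RETURNS timestamptz AS $$
      -- the first 12 hex digits of a v7 uuid encode Unix time in milliseconds
      SELECT to_timestamp(
        ('x' || substring(replace(u::text, '-', '') FROM 1 FOR 12))::bit(48)::bigint / 1000.0
      );
    $$ LANGUAGE sql IMMUTABLE;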
Michael: I'm not sure we do need it for partitioning.
Maybe in Timescale, given the way you define a hypertable.
But, for example, if I set up a range partition using a ULID, I could tell it, maybe not even just a prefix: exact UUIDs, between these two UUIDs.
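A sketch of that idea with native declarative partitioning: compute a v7-style UUID at the very start of each month with a hypothetical helper, and use it as the range bound (table and function names here are illustrative):

    CREATE FUNCTION uuid_v7_boundary(ts timestamptz) RETURNS uuid AS $$
      -- lowest v7-style uuid whose first 48 bits encode the given timestamp
      SELECT (substr(h, 1, 8) || '-' || substr(h, 9, 4) || '-7000-8000-000000000000')::uuid
      FROM (SELECT lpad(to_hex((extract(epoch FROM ts) * 1000)::bigint), 12, '0') AS h) t;
    $$ LANGUAGE sql IMMUTABLE;

    CREATE TABLE chats_by_range (id uuid PRIMARY KEY, body text) PARTITION BY RANGE (id);

    CREATE TABLE chats_2024_01 PARTITION OF chats_by_range
      FOR VALUES FROM (uuid_v7_boundary('2024-01-01'))
                   TO (uuid_v7_boundary('2024-02-01'));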
Nikolay: Yeah, yeah, I agree.
Yes, I'm in the context of my particular recipe with TimescaleDB.
And TimescaleDB, of course, would like to have timestamps.
I created a recipe which looked quite weird, but it worked.
But then a couple of Timescale guys provided good advice on how to do it much better: just pass the time_partitioning_func option when you create the TimescaleDB hypertable, and that's it.
That function is our function which converts UUID version 7 values to timestamps.
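Roughly, the recipe looks like this; a sketch reusing the extraction function from above on a plain uuid-keyed chats table (check the TimescaleDB docs for the exact parameters):

    SELECT create_hypertable(
      'chats', 'id',
      time_partitioning_func => 'uuid_v7_to_timestamptz',
      chunk_time_interval    => INTERVAL '1 week');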
I strongly believe this is a very helpful function to have.
So we need to have it.
Even if the RFC authors think we don't need it, I think we need it.
I hear you, we can have ranges, yes.
But having this function is super helpful, because I can reconstruct timestamps, even with imperfect precision.
A timestamptz is 8 bytes. A UUID version 7 is 16 bytes, and we know not everything in those 16 bytes relates to the timestamp.
You said how many bits, I don't remember.
Michael: More than half is random, yeah.
Nikolay: Yes, so we lose precision.
But I don't need the milliseconds.
Michael: That's a good point.
I'm looking at the v7 spec; I suspect v8 allows more precision.
Nikolay: If you need better precision, use v8 and have better precision, paying some extra storage and memory costs, of course,
Michael: and buffers and
Nikolay: so on.
Michael: I think it's still the UUID format, right?
So I don't think you do pay extra for that precision.
But I think you get less randomness.
Nikolay: Ah, less randomness.
Michael: Fewer bits for the random part.
Nikolay: Well, I need to look at it then.
Yes, I only played with v7 and decided to use it, but maybe I should look at v8 and understand the trade-offs.
Michael: Well, good to understand at least.
Nikolay: Yeah, yeah, interesting.
So anyway, you can take UUID v7 right now, generate it in the application or using this quite simple SQL function, then join this with Timescale and get partitioning provided by TimescaleDB, which is fully automatic. Very good.
And that's great, right?
And it's sortable, and it's quite efficient.
Yes, it's 16 bytes versus 8, but with an 8-byte id plus an 8-byte created_at we also had 16 bytes.
And I decided not to create created_at at all.
Michael: Yeah, you said do we still need created_at?
And then you answered yes, but I thought you meant the other
way around.
So you mean now we don't need created_at at all?
Nikolay: Yeah, we can extract it and that's it.
So I don't need super precision in my case, it's just some messages.
And I'm fine with even second-level precision.
So yeah, it's a good question what precision I get when extracting the timestamp from this UUID version 7.
Michael: Well, there's an argument that, if you don't care about the spec, you could invent your own version that only goes down to second precision and gets even more randomness.
You could play with that a little bit.
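For example, a non-standard variant along those lines; a sketch, deliberately not spec-compliant: overlay only 32 bits of whole seconds and leave more of the random UUID untouched:

    CREATE FUNCTION uuid_generate_seconds() RETURNS uuid AS $$
      -- 4 bytes of Unix seconds up front, the rest of the random v4 uuid
      -- untouched; sortable to one-second precision only
      SELECT encode(
        overlay(uuid_send(gen_random_uuid())
                placing substring(int8send(extract(epoch FROM clock_timestamp())::bigint) FROM 5)
                from 1 for 4),
        'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;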
Nikolay: And adjust the function, feed it to Timescale, and so on.
This recipe already shows all the internals, right?
It's pure SQL.
So we can use it anywhere, in any Postgres flavor: RDS, Aurora, anywhere, right now.
So that's great.
Of course, on RDS you don't have TimescaleDB, right?
But you have the new sharding in Aurora, just released a month ago or so.
So yeah, I'm excited to see us shifting from regular numbers to these IDs, actually.
Yeah, it's good to see some numbers.
Let's provide some links to blog posts for those folks who want to explore the overhead.
Michael: Well, I think there's an interesting post by Brandur
that I'll link to.
Nikolay: Exactly, this is one of them.
Michael: Yeah.
Nice.
But they also mentioned a couple of downsides of ULIDs which I hadn't considered and which I think are worth mentioning.
Let's talk about them quickly.
One is that there's naturally less randomness, because we're taking up some of the bits with the timestamp.
We've talked about that one already.
But the second one is that you could end up producing a lopsided index.
If you delete data, that space won't be reused, just like with sequential IDs; with the more random UUID implementations, because inserts are spread out, that space would get reused.
So we've got kind of a different bloat problem, except if we partition, right?
If we're partitioning, and we're doing index maintenance and dropping partitions over time, we'll naturally get rid of that bloat.
But I thought it was a really good point that there are some downsides to this always-increasing idea as well.
They also mention at the end of the blog post that they went into the investigation expecting to come out on the side of using bigserial or bigints everywhere, and they actually ended up with these ULIDs, or performance-aware, intelligent UUIDs as they called them, which I thought was quite a nice phrase, as their favored approach.
Nikolay: I'm very sorry.
You said at the end of the blog post, and I see pictures of a walk.
I don't know how they're related at all.
No, no...
Michael: A bit above that, just at the end.
Nikolay: Of the title... I got distracted.
First of all, a very good title for the blog post: Identity Crisis.
So, ID crisis.
But yeah, these pictures of the San Francisco Bay Area and the walk down to Pacifica.
I know Brandur has already heard one of our episodes and highlighted a phrase of mine on Twitter, when I said: very good blog post, but completely wrong. Something like that.
Not completely, right, okay.
And I was right, actually.
I like speaking from experience.
And now I must say: you should not put such good pictures of the 101 highway in there, because they're super distracting.
Michael: Yeah, I actually remember that quote.
I think the words you used were something along the lines of: great post, wrong conclusion.
Nikolay: Ah, yes, exactly.
Right, right.
Michael: And then there must be a second one as well, because I think Crunchy Data, credit to them, listened to the episode where we were talking about random_page_cost and how it was still 4; they did some benchmarking and reduced it, I think, to 1.1, which is a much more SSD-friendly number.
So maybe they listened
Nikolay: to two of them.
And also, yeah, I learned from that session, which was full of numbers, that 1.0 is probably not better than 1.1, which is interesting, because they had numbers proving that 1.1 is better than 1.0.
So this is super interesting.
Next time I touch this area, I will pay attention to their numbers.
Michael: Nice.
Nikolay: Yeah, that's good.
Michael: Were there any other posts
or things that you've seen
that you wanted to draw people's
attention to?
Nikolay: I don't know.
There are many materials about it; I don't remember particular ones.
Let's just put something in the show notes.
It's an interesting topic, I think, very important for almost everyone.
Michael: Yeah.
There's an older one by Christophe Pettus that I thought was quite good, and then a more recent one by him as well, suggesting that we think about this age-old UUID-versus-serial question in two steps.
He suggested thinking, firstly: should our keys be random or should they be sequential?
That's a very good point.
Do you want them to be guessable? Do you want people to be able to infer things?
Actually, we didn't talk about this, but if you're using ULIDs and someone can see them, they also get a little bit of extra information: they can tell when the ID was created.
Is that a problem for you or not?
These are the questions you need to ask yourself.
So that's number one: should they be random or should they be sequential?
Nikolay: Do you want to hide it,
right?
Michael: Yeah.
And then the second one is: should they be 64 bits, or should they be larger than that?
And that's a second, separate question.
But I think Christophe might be
Nikolay: really good.
I want them to be shorter, not larger.
Michael: Okay.
But sorry, bits. Did I say bytes? I don't remember.
Anyway, the point is: how much randomness, how much entropy, do you need?
Nikolay: And that's collision risk, basically.
Michael: Yeah.
But these two questions are separate questions.
And Christophe made a really good point that they often get conflated: people who argue for UUIDs argue along one of these questions, and people who argue for bigints completely ignore that question and go down a completely different one.
So I think asking both of those questions is important before picking your ID.
Nikolay: We could maybe handle collisions with ON CONFLICT, adjusting the last bits slightly on conflict.
Yeah, well, I'm not sure.
I'm just thinking out loud.
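The thinking-out-loud idea in SQL; a sketch: detect the collision and let the application retry with a freshly generated value:

    INSERT INTO chats (id, body)
    VALUES (uuid_generate_v7(), 'hello')
    ON CONFLICT (id) DO NOTHING;
    -- if no row was inserted, generate a new id and retry in the application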
Michael: But yeah, in terms of the likelihood of collisions, I actually haven't done the math on how unlikely they are.
But I think the likelihood is pretty minuscule, at least in most deployments.
Nikolay: It depends on the volume you're trying to ingest into your database.
Michael: Yeah, of course.
Nikolay: Yeah.
But anyway, I think next time we build a new system, we need to think: should we use these int8 surrogate keys at all, or maybe it's time for UUID version 7 or 8 with partitioning?
Actually, my response to my team was: partitioning?
Come on, we know how to handle a billion rows easily in one physical table.
Let's just grow it to a billion.
And they said: no, no, no, let's do it.
It's so easy. Let's just do it normally, in a normal way.
Okay, okay.
We have partitioning.
We have UUID version 7.
A modern approach, you know.
Michael: Well, last question.
Last question from me: what chunk interval did you go for?
Nikolay: Oh, that's a good question, I actually need to check.
Because I remember that for development, for experimentation, I used one-hour chunks, very small, tiny ones.
Michael: So, even smaller in your test setup?
Nikolay: Well, maybe a minute, just for testing, but eventually I think it will be, I don't know, days or weeks. But it will be smaller chunks.
TimescaleDB is good with small chunks.
A lot of small chunks.
A lot of small chunks.
Michael: Oh really?
Nikolay: Not months.
Well, maybe months, actually.
It depends.
It's a good question; I need to reconsider it.
I still need to finalize this, but we can adjust it over time as well, I think.
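TimescaleDB lets you change this later; a sketch, assuming the hypertable from earlier (the new interval applies only to newly created chunks):

    SELECT set_chunk_time_interval('chats', INTERVAL '1 day');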
Michael: Yeah, I wouldn't be surprised
if you start off much
longer and then go smaller and
smaller.
Nikolay: Makes sense.
It depends.
If you want to test in advance how the system works with a big number of partitions, chunks of a hypertable, you probably want to go smaller and find problems earlier.
It's a problem when you want to go deeper, right?
But if you're building a normal system, of course, it makes sense.
I think the rule of thumb, and this is not about Timescale, setting Timescale aside, is: partition all tables which exceed 100 gigabytes, maybe actually 10 gigabytes.
This rule of thumb, so to speak, was raised by several folks.
I listen to them very carefully; their words matter a lot to me.
So why 100 gigabytes?
Why 100 gigabytes? Why?
It's just an empirical rule. For example, Alexander Kukushkin says those 100 gigabytes are based on Zalando's experience.
But then I realized I can build some theoretical basis and find a better threshold.
The theoretical basis is: how many XID-consuming transactions per second do you have? For example, 100.
So, how fast your XID consumption grows.
And then: how long does it take for autovacuum to process a single table?
And also: how long does it take to create an index on the largest tables?
Because while you create an index, you hold the snapshot horizon.
If it takes many hours, for example 5 hours to build an index on a 5-terabyte table, very arbitrary numbers, then during that whole time autovacuum cannot delete dead tuples in any table in your cluster, in your database.
And it means that if you have a lot of write activity during this window, you accumulate a lot of dead tuples.
So you can start comparing these numbers and understand: okay, I don't want more than, say, 100,000 writes to happen during index creation.
And if you split your physical table into smaller physical tables, partitions, or chunks in Timescale terms, index creation becomes faster.
And autovacuum is blocked for a shorter period of time.
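To make that comparison concrete with the arbitrary numbers above: at 100 XID-consuming transactions per second, a 5-hour index build holds the snapshot horizon for 18,000 seconds, so roughly 100 × 18,000 = 1,800,000 transactions run while autovacuum cannot clean up dead tuples anywhere in the database; cut the build time to 5 minutes per partition, and that window shrinks to about 30,000 transactions.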
Michael: But there are a couple of other things I've heard being used for this. One is the size of the cache: how many chunks do you want to fit in memory?
If your memory is smaller than your most recent chunk, that might not be optimal for performance; you might be better off with smaller chunks, so that the most recent ones are more likely to be in memory.
And then another one...
Nikolay: But it's very rough reasoning, I would say.
Michael: Yes, but it's the other direction, right?
One of them encourages you to partition earlier and earlier.
That one is in the same direction, isn't it? Earlier is better.
But you can go too far; you could have too many.
Let's say you want to regularly query the last day, but that's going to involve querying 24 partitions.
We've already talked in the past about problems when you query too many relations at the same time.
Nikolay: Yes: planning time, execution time, and lock manager contention.
A lot of stuff happens if you have a lot of partitions, especially if each partition has a lot of indexes.
So yeah, there's obviously a trade-off here, but Timescale is very good with a large number of partitions.
Michael: Okay, good.
Nikolay: Yeah, so we can go with daily partitions, it's okay.
And we have cases, my colleagues have observed cases, where we have dozens of terabytes of data with daily partitions, very small ones. Maybe not even daily, I don't remember the details, maybe 8-hour partitions, ingesting a lot per second.
Michael: Did they change the...
Remember when we talked about the lock manager issues, when you don't have partition pruning and you have lots of indexes on each partition?
Presumably that hard limit, 16, is still there in Timescale as well.
So you say it's really good with lots of partitions, but if you're querying too many of them at the same time, like if we're doing monthly reports against 31 or however many days, we're going to bump into those limits, right?
Nikolay: Definitely, but a monthly report is just a single query.
The problem is when you have thousands of such queries per second; then they start competing, and lock manager contention, lightweight lock contention, happens.
If it's something infrequent...
So the second thing to check is how many QPS you have.
Michael: Yeah, yeah.
So if it's dashboards or something. If it's monthly reports,
Nikolay: you're probably fine.
Nobody says we cannot have more than 15 indexes on a table, hitting 16 relations, the fast-path locking threshold.
It's okay to exceed it.
The only problem is when you need to exceed it a thousand times per second.
Yeah.
Michael: Cool.
Right.
Any last words on this one?
Nikolay: No, I think it was good.
A lot of things to understand and play with.
And I encourage folks to look at these new versions of UUID if they haven't yet.
Does it matter where we generate it, in the application or in the database?
Michael: I think the only risk is time zones.
Nikolay: Clocks can be off.
Michael: Yeah, true.
If they're off by seconds, then inserts will be going to different database pages.
It won't be
Nikolay: as bad.
And also, the ordering will be broken.
Michael: Yeah, so I guess it matters
a bit.
Nikolay: Yeah, I prefer generating them on the Postgres side, in the database.
But it's an interesting question, the pros and cons of database versus application side.
Anyway, I think Postgres will eventually have UUID version 7.
Not sure about 17, but it should be in 18 at least.
Only a couple of years of waiting.
But we don't need to wait, and we now know how to use both this new type of UUID and partitioning, with TimescaleDB or anything else.
Yeah, good?
Michael: Nice one.
Well, thank you, Nikolay, and catch
you next week.
Nikolay: Thank you.
Bye.