Partitioning by ULID
Michael: Hello and welcome to
PostgresFM, a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
This is my co-host Nikolay, founder
of Postgres.ai.
Hey Nikolay, what are we talking
about today?
Nikolay: Let's continue the discussion
about UUIDs.
I chose this topic because I recently
implemented something myself
and also because some of my customers
raised this again.
So I think many people still have
questions.
I had, was it a solo recording
about UUID?
Okay, it was a solo episode.
But obviously I have fresh things
to discuss and you also have
something to discuss.
So let's return to this topic and
be more practical this time
and think about really, really
large datasets and how to work
with them more efficiently.
It means partitioning.
UUID plus partitioning.
Michael: So that's what you looked
into.
Nikolay: Yep.
Well, let's talk only about partitioning.
We are not sharding experts, so let's stick to partitioning.
Michael: But I would say, actually,
I think it's important to
mention that one of the big benefits
of UUIDs in general is that
they do suit a setup
where you're generating IDs in
multiple places.
So it is a natural fit in terms
of topic.
Nikolay: So basically you ditch
sequences.
Michael: I think this lets you avoid some of the drawbacks that we used to see with UUIDs. Some of the new implementations, some of the new standards, mean we don't have the drawbacks that typically come up in discussions of bigserial versus UUID, or ULID, as they're often called when they're sortable.
So yeah, you were looking into
this.
Nikolay: Never, ever integer for
primary keys, never.
Michael: Yeah.
Hopefully, people know that now.
You were looking into it for partitioning
recently then.
So what was the use case?
Nikolay: So the use case was our chatbot, a new AI chatbot we developed, and we needed storage for all the communication that happens in a public channel, because we have a paid version and a free version, and the free version obviously is recorded and shareable.
So basically, we created a chats table, and I originally used bigint primary keys, and then I was shamed by my team members, who said: come on, first of all we need to use Timescale there, and also numbers in URLs don't look great. We need a better idea there.
Michael: Is this because they would
be guessable, or is it a security
thing or something else?
Nikolay: Good question.
I have a long history with this.
I remember in my first social network we tried to hide the actual number of posts and users we had.
But then a competitor launched, and they didn't care about it at all.
And then I also stopped caring about it.
It's okay to have some numbers in the URL and share actual numbers with everyone.
But here I just think maybe it will become a shared dataset in the future.
So I don't want to have sequences.
And also maybe I don't want to share actual numbers.
I don't know; it's one of those things where you're not fully sure.
So I'm kind of okay with sharing numbers, right?
For example, we generate 5,200 messages per day with this bot.
People are using it. 72 people have joined the program, and many more are waiting on the waitlist.
It's not a lot at all, but I think it will explode soon.
And maybe I don't want competitors to see the actual numbers.
I don't know.
50-50 here.
Because I've had experience with both cases.
I've had experience sharing numbers and experience hiding numbers, and we used two mutually prime numbers to get not truly random, but random-looking IDs.
If you take a sequence value, multiply it by a very large integer, and then take the modulus with another integer, where those two integers are mutually prime, you get a rotation with no collisions: numbers that look random but are not actually random.
And nobody knows your actual ID number.
So this is how you can hide growth rates and the actual values of your business metrics, like how many posts or comments or anything else you have in your social network, for example.
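A minimal sketch of this trick in SQL; the constants here are illustrative, just mutually prime (2654435761 is odd, the modulus is 2^32, so their gcd is 1), which makes the mapping collision-free:

    -- id -> (id * K) % M is a bijection on [0, M) when K and M are coprime,
    -- so sequential IDs come out looking random, with no collisions
    SELECT id, (id * 2654435761) % 4294967296 AS public_id
    FROM generate_series(1, 5) AS id;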
Again, it's not my main reason.
I just think, okay, we need to
use a UUID probably, and then the
team said we also need to partition
it.
We already used TimescaleDB.
Let's just use it.
And then I think, okay, obviously I want my indexes to behave well.
If I use a regular UUID version 4, for example, which Postgres currently supports, I know we will have performance issues, because values land in very different, random places in the B-tree index.
Not purely random, but any insert disturbs some arbitrary part of the index.
And this is not good for performance, especially if you also want to show the last 25 items, for example, ordered by creation time.
You will deal with many more buffers than you should.
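A rough way to observe this difference yourself; a sketch, where uuid_generate_v7() is the SQL function discussed later in the episode:

    -- same row count, random keys vs time-ordered keys
    CREATE TABLE t_v4 (id uuid PRIMARY KEY);
    CREATE TABLE t_v7 (id uuid PRIMARY KEY);
    INSERT INTO t_v4 SELECT gen_random_uuid()  FROM generate_series(1, 1000000);
    INSERT INTO t_v7 SELECT uuid_generate_v7() FROM generate_series(1, 1000000);
    -- the v7 insert typically completes faster, and recency queries touch fewer buffers:
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t_v7 ORDER BY id DESC LIMIT 25;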
Michael: And a lot more write-ahead logging, right?
Because you've got full-page images for different parts of the index each time. B-trees are very optimized for adding things continuously at the end, on the right side, like timestamps; random UUIDs don't fit that pattern at all. But that takes us on to, I guess, some of the newer...
Nikolay: Right. And we also had a session on the PostgresFM YouTube channel with Kirk and Andrei, and I think it was Kirk who brought up this topic: let's implement ULID in Postgres.
And using Andrei's hands, because he's a true hacker, as usual in those sessions, he quite quickly implemented a patch and sent it to pgsql-hackers. Quite soon we realized there was already an ongoing RFC proposal to support versions 7 and 8 of UUID, so Andrei renamed the patch from ULID to UUID version 7, and then it was blocked by the idea: let's wait until the standard is finalized and fully approved.
You read this thread; there was also an idea to just avoid naming it version 7 completely.
Michael: Well, I thought that was a really neat solution.
Ultimately, what we want, well, what you want, what I would like, what I think lots of people would like, is a function within Postgres that generates values that can be put in a column of type UUID, but that, because they're based on a timestamp, increase over time.
So that comes with loads of benefits for updating B-trees, and some of the other benefits you mentioned already.
Now there are new specifications for UUIDs coming down the pipe that would suit that perfectly.
But somebody made a really good point.
I think it was Andres from Microsoft.
One of the first replies that Andrei got was: we could just name the function along the lines of what it does, rather than UUID v7.
I can't remember the exact name, but they came up with something very sensible, I'm sure.
And, yeah, that seemed really smart
to me.
Because then even if the spec changes
for v7, it doesn't matter.
We can implement those later or
have different functions for
those, but this one could still be
useful to users earlier, before
the spec finalizes, before everything
else gets agreed.
So I like that as a suggestion,
but I think it either got lost
or people are very happy waiting
another year.
Nikolay: Well, it's just that the RFC seems to be making good progress and should be finalized.
It's still not finalized; I checked a couple of days ago.
But let's explain once again what we want.
We want to take the good things from both worlds.
Growing integers, right? With the regular approach you have an id and a created_at.
created_at, a timestamptz, is 8 bytes, and the id is 8 bytes; if you use a 4-byte integer, you will still effectively use 8 bytes because of alignment padding, we know that.
And if you just insert, with created_at defaulting to now() and one row per transaction, or with clock_timestamp() instead of now() so you can insert many rows, then the order of id values and created_at values will be the same.
You can ORDER BY id DESC LIMIT 25, or ORDER BY created_at DESC LIMIT 25: same result.
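In other words, with a sequential key these two queries return the same rows; a sketch, assuming a chats table with a bigserial id and a created_at defaulting to clock_timestamp():

    SELECT * FROM chats ORDER BY id         DESC LIMIT 25;
    SELECT * FROM chats ORDER BY created_at DESC LIMIT 25;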
And this is good.
This is good benefit, and it's
also good for performance because
of locality of data, rows created
inside the same second go probably
in 1 or 2 pages only.
Like, very, it's packed, not distributed
sparsely, right?
And UUID is good because it doesn't require any sequences.
You can generate it on thousands of nodes.
That's the name: UUID, universally unique ID.
But the values go to different places in the B-tree, and you cannot order by them.
This is the main problem.
Michael: In like version 4 of the
specification.
Nikolay: Yeah, yeah, I'm talking about the current situation.
And the RFC is still not finalized, so this is also the current situation if you don't use drafts, right?
And this is a problem.
And we have a good blog post from Shopify.
It's MySQL, but it's very good.
They explain the economic benefits of switching to a different, newer type of UUID.
And also Hussein Nasser, sorry if I pronounce his name wrong, explains this Shopify case on YouTube very well for backend engineers, and why a better version of UUID is really better.
Very, very well. His explanation is such that anyone can understand this, maybe better than I just explained it, right?
Michael: Let's link it up.
I actually pulled out a quote from
that Shopify blog post.
It said: in one high-throughput system, they saw a 50% decrease in insert statement duration by switching from UUID v4 to ULID.
Nikolay: And you can order by them.
If you want the latest 25 posts, you need to order by something, right?
Michael: Even if you don't need that, there could be significant benefits because of
Nikolay: the B-trees.
Let's slowly move to partitioning.
If you don't have an orderable ID and you rely on a created_at column, you can have an index on created_at, order by it, and use an index scan.
Okay, it's okay. Not a perfect situation, because you have many more buffers in the plan, since you fetch from random places.
But if you want to partition such a table: in the URL you have a UUID, right?
How do you find which partition to deal with?
Michael: You could do like hash
partitioning.
Nikolay: Well, yeah, yeah.
But I forgot to say: I want to partition by time, because older data is used less often.
Michael: And maybe you want to phase it out over time.
You might even want to eventually drop it...
Nikolay: Exactly.
Or move it to a different tablespace, though that's not a super popular solution, of course.
Also, there are some ideas about being able to move it to object storage, which is currently supported only in Timescale Cloud.
I think one day it should be possible with regular vanilla PostgreSQL.
Michael: I think I read about another
provider doing it recently
too.
Nikolay: Interesting.
Send me a link.
I'm very interested in learning
about it.
I think moving older partitions, time-based partitions with old data, say two years old, to object storage, eventually moving towards bottomless Postgres, would be great.
But back to our question.
In the URL you have blah blah blah, in our case postgresql.org slash some UUID.
And we need to understand which
partition to deal with.
In our case it's TimescaleDB.
How to tell TimescaleDB which
partition it is?
If it's a regular UUID, I think it's
a nightmare.
Michael: Well, I saw the update
and the reply you got from, was
it James Sewell at Timescale?
Nikolay: A couple of guys from Timescale replied.
I really appreciate it, but that was already about this sortable UUID version 7.
That part is already solved.
I have this recipe in the Postgres Marathon series; it's recorded.
I had implemented it my own way, which was not efficient, and they provided very good advice on how to do it much more elegantly.
And I have it covered in my small article.
So ULID, it's similar to UUID version 7 or 8.
ULID actually stands for Universally Unique Lexicographically Sortable Identifier.
Michael: Yeah, probably like UUID.
But I admire the people that came up with ULID.
It's nice and simple and unique.
Nikolay: So I guess we are about to abandon this abbreviation altogether and just use UUID version 7 or 8, because of the standard.
Michael: Or maybe in the future we'll just say UUID, because there might not be that many benefits to using the fully random ones.
Sure, you get extra uniqueness, more bits assigned to the random part, but we already have so many.
I was looking up version 7, and I think the spec currently says 48 bits for the timestamp.
So if you're thinking in terms of bits
Nikolay: and bytes.
16 bytes.
Michael: Yeah.
So 16 bytes, or 128 bits.
48 of those are reserved for the timestamp component, a few bits specify the version and variant, and that leaves 74 bits for the randomness.
In a UUID v4 you get 122 bits for randomness, everything except the version and variant bits.
So there is less randomness possible in these ones.
Nikolay: There may be collisions, right?
Michael: Or just a slightly higher chance of them, right?
We're still talking about, you know, especially in your case, a chance of collisions that is basically zero, which is why these are useful.
Nikolay: Yeah, even if we get to 100 messages per second, we will be fine.
Michael: Yeah, exactly.
Nikolay: And we can generate them on basically any server.
Well, how the clocks are set will be a question, of course.
Michael: This needs to be UTC, right?
Nikolay: So this version 7 or ULID, or version 8 — versions 7 and 8 differ only in precision, right?
Michael: That was my understanding.
Nikolay: The idea is: take a timestamp, generate a regular UUID, or something like that, and then produce a new UUID whose first bytes correspond to the timestamp, so the values are sortable.
You can order by them, and with very, very high confidence you can say: this corresponds to the creation timestamp.
And actually, the interesting thing... So, Postgres is on pause, right? Waiting for the RFC to be finalized.
I actually started to doubt that Postgres 17 will get it.
The idea to change the name and provide something that doesn't depend on the RFC is good, but I guess the consensus, maybe not consensus, but what I saw in the pgsql-hackers mailing list, was that people decided to wait for the RFC.
And I checked the status: it's still waiting for some reviews, and it's not fully clear when it will be finalized.
So I guess Postgres 17 probably will not have it.
Michael: My experience with these things is that it depends on whether a couple of people get excited about it: if somebody ends up needing it, if somebody puts together a patch that can generally be agreed on, if somebody else is willing to review it, and somebody commits it, it could get in.
But it depends on at least a couple of people having the energy and
Nikolay: focus.
Well, I just remember Peter Eisentraut's comment that we should probably wait for the standard, and Postgres sticks very close to standards, you know, including the SQL standard.
So it's a very, very important comment.
My impression is that we're just waiting for the RFC.
But the good news is that you can generate these values in your application code yourself.
There are many, many implementations of ULID, or of this UUID version 7 — even though the RFC is not yet finalized, they're already called version 7 — and you can find a lot of them on GitHub.
But you can also generate it using PL/pgSQL.
Yeah.
And additionally, Daniel Vérité, I hope I pronounce it right, sorry guys if I pronounce names wrong, showed a proposal for how to generate it using PL/pgSQL.
And then, just looking at the code, I said: we don't actually need PL/pgSQL here, we can do this with a regular SQL function. And he answered: yes, just scroll down.
In the same GitHub gist, further down, there is a SQL function; it's quite simple, and I just started to use it.
So I generate UUID version 7 with a small SQL function, that's it.
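The widely shared pure-SQL implementation looks roughly like this; a sketch along the lines of that gist, overlaying the millisecond Unix timestamp onto a random UUID:

    CREATE OR REPLACE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
      -- start from a random v4 uuid (which already has the right variant bits),
      -- overlay the first 6 bytes with the Unix timestamp in milliseconds,
      -- then set two bits to turn the version nibble from 4 (0100) into 7 (0111)
      SELECT encode(
        set_bit(
          set_bit(
            overlay(uuid_send(gen_random_uuid())
                    placing substring(int8send(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                    from 1 for 6),
            52, 1),
          53, 1),
        'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;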
But then the very big question: do we really need a created_at timestamp?
If we have a timestamp already present in this UUID value, which is 16 bytes, and we don't need very high precision, maybe we can extract the timestamp back.
And the answer is yes.
I saw Andrei's comment in pgsql-hackers saying that the authors of the RFC proposal don't encourage it, extracting the timestamp back out of UUID version 7 values.
But for partitioning, we actually need it.
And we also need it if we want to just drop our created_at column. Because why keep it? We have created_at right there.
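Such an extraction function can be a one-liner; a sketch (the function name is ours): read the first 48 bits back as milliseconds since the Unix epoch:

    CREATE OR REPLACE FUNCTION uuid_v7_to_timestamptz(u uuid) RETURNS timestamptz AS $$
      -- the first 12 hex digits of a v7 uuid encode Unix time in milliseconds
      SELECT to_timestamp(
        ('x' || substring(replace(u::text, '-', '') FROM 1 FOR 12))::bit(48)::bigint / 1000.0
      );
    $$ LANGUAGE sql IMMUTABLE;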
Michael: I'm not sure we do need it for partitioning.
Maybe in Timescale, given the way you define a hypertable.
But, for example, if I set up a range partition using a ULID, I could tell it, maybe not even just a prefix: exact UUIDs, between these two UUIDs.
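A sketch of that idea with native declarative partitioning: compute a v7-style UUID at the very start of each month with a hypothetical helper, and use it as the range bound (table and function names here are illustrative):

    CREATE FUNCTION uuid_v7_boundary(ts timestamptz) RETURNS uuid AS $$
      -- lowest v7-style uuid whose first 48 bits encode the given timestamp
      SELECT (substr(h, 1, 8) || '-' || substr(h, 9, 4) || '-7000-8000-000000000000')::uuid
      FROM (SELECT lpad(to_hex((extract(epoch FROM ts) * 1000)::bigint), 12, '0') AS h) t;
    $$ LANGUAGE sql IMMUTABLE;

    CREATE TABLE chats_by_range (id uuid PRIMARY KEY, body text) PARTITION BY RANGE (id);

    CREATE TABLE chats_2024_01 PARTITION OF chats_by_range
      FOR VALUES FROM (uuid_v7_boundary('2024-01-01'))
                   TO (uuid_v7_boundary('2024-02-01'));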
Nikolay: Yeah, yeah, I agree.
Yes, I'm in the context of my particular recipe with TimescaleDB.
And TimescaleDB, of course, would like to have timestamps.
I created a recipe which looked quite weird, but it worked.
But then a couple of Timescale guys provided good advice on how to do it much better: just pass the time_partitioning_func option when you create the TimescaleDB hypertable, and that's it.
That function is our function which converts UUID version 7 values to timestamps.
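Roughly, the recipe looks like this; a sketch reusing the extraction function from above on a plain uuid-keyed chats table (check the TimescaleDB docs for the exact parameters):

    SELECT create_hypertable(
      'chats', 'id',
      time_partitioning_func => 'uuid_v7_to_timestamptz',
      chunk_time_interval    => INTERVAL '1 week');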
I strongly believe this is a very helpful function to have.
So we need to have it.
Even if the RFC authors think we don't need it, I think we need it.
I hear you, we can have ranges, yes.
But having this function is super helpful, because I can reconstruct timestamps, even with imperfect precision.
A timestamptz is 8 bytes. A UUID version 7 is 16 bytes, and we know not everything in those 16 bytes relates to the timestamp.
You said how many bits, I don't remember.
Michael: More than half is random, yeah.
Nikolay: Yes, so we lose precision.
But I don't need the milliseconds.
Michael: That's a good point.
I'm looking at the v7 spec; I suspect v8 allows more precision.
Nikolay: If you need better precision, use v8 and have better precision, paying some extra storage and memory costs, of course,
Michael: and buffers and
Nikolay: so on.
Michael: I think it's still the UUID format, right?
So I don't think you do pay extra for that precision.
But I think you get less randomness.
Nikolay: Ah, less randomness.
Michael: Fewer bits for the random part.
Nikolay: Well, I need to look at it then.
Yes, I only played with v7 and decided to use it, but maybe I should look at v8 and understand the trade-offs.
Michael: Well, good to understand at least.
Nikolay: Yeah, yeah, interesting.
So anyway, you can take UUID v7 right now, generate it in the application or using this quite simple SQL function, then join this with Timescale and get partitioning provided by TimescaleDB, which is fully automatic. Very good.
And that's great, right?
And it's sortable, and it's quite efficient.
Yes, it's 16 bytes versus 8, but with an 8-byte id plus an 8-byte created_at we also had 16 bytes.
And I decided not to create created_at at all.
Michael: Yeah, you said do we still need created_at?
And then you answered yes, but I thought you meant the other
way around.
So you mean now we don't need created_at at all?
Nikolay: Yeah, we can extract it and that's it.
So I don't need super precision in my case, it's just some messages.
And I'm fine with even second-level precision.
So yeah, it's a good question what precision I get when extracting the timestamp from this UUID version 7.
Michael: Well, there's an argument that, if you don't care about the spec, you could invent your own version that only goes down to second precision and gets even more randomness.
You could play with that a little bit.
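For example, a non-standard variant along those lines; a sketch, deliberately not spec-compliant: overlay only 32 bits of whole seconds and leave more of the random UUID untouched:

    CREATE FUNCTION uuid_generate_seconds() RETURNS uuid AS $$
      -- 4 bytes of Unix seconds up front, the rest of the random v4 uuid
      -- untouched; sortable to one-second precision only
      SELECT encode(
        overlay(uuid_send(gen_random_uuid())
                placing substring(int8send(extract(epoch FROM clock_timestamp())::bigint) FROM 5)
                from 1 for 4),
        'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;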
Nikolay: And adjust the function, feed it to Timescale, and so on.
This recipe already shows all the internals, right?
It's pure SQL.
So we can use it anywhere, in any Postgres flavor: RDS, Aurora, anywhere, right now.
So that's great.
Of course, on RDS you don't have TimescaleDB, right?
But you have the new sharding in Aurora, just released a month ago or so.
So yeah, I'm excited to see us shifting from regular numbers to these IDs, actually.
Yeah, it's good to see some numbers.
Let's provide some links to blog posts for those folks who want to explore the overhead.
Michael: Well, I think there's an interesting post by Brandur
that I'll link to.
Nikolay: Exactly, this is one of them.
Michael: Yeah.
Nice.
But they also mentioned a couple of downsides of ULIDs which I hadn't considered and which I think are worth mentioning.
Let's talk about them quickly.
One is that there's naturally less randomness, because we're taking up some of the bits with the timestamp.
We've talked about that one already.
But the second one is that you could end up producing a lopsided index.
If you delete data, that space won't be reused, just like with sequential IDs; with the more random UUID implementations, because inserts are spread out, that space would get reused.
So we've got kind of a different bloat problem, except if we partition, right?
If we're partitioning, and we're doing index maintenance and dropping partitions over time, we'll naturally get rid of that bloat.
But I thought it was a really good point that there are some downsides to this always-increasing idea as well.
They also mention at the end of the blog post that they went into the investigation expecting to come out on the side of using bigserial or bigints everywhere, and they actually ended up with these ULIDs, or performance-aware, intelligent UUIDs as they called them, which I thought was quite a nice phrase, as their favored approach.
Nikolay: I'm very sorry.
You said at the end of the blog post, and I see pictures of a walk.
I don't know how they're related at all.
No, no...
Michael: A bit above that, just at the end.
Nikolay: Of the title... I got distracted.
First of all, a very good title for the blog post: Identity Crisis.
So, ID crisis.
But yeah, these pictures of the San Francisco Bay Area and the walk down to Pacifica.
I know Brandur has already heard one of our episodes and highlighted a phrase of mine on Twitter, when I said: very good blog post, but completely wrong. Something like that.
Not completely, right, okay.
And I was right, actually.
I like speaking from experience.
And now I must say: you should not put such good pictures of the 101 highway in there, because they're super distracting.
Michael: Yeah, I actually remember that quote.
I think the words you used were something along the lines of: great post, wrong conclusion.
Nikolay: Ah, yes, exactly.
Right, right.
Michael: And then there must be a second one as well, because I think Crunchy Data, credit to them, listened to the episode where we were talking about random_page_cost and how it was still 4; they did some benchmarking and reduced it, I think, to 1.1, which is a much more SSD-friendly number.
So maybe they listened
Nikolay: to two of them.
And also, yeah, I learned from that session, which was full of numbers, that 1.0 is probably not better than 1.1, which is interesting, because they had numbers proving that 1.1 is better than 1.0.
So this is super interesting.
Next time I touch this area, I will pay attention to their numbers.
Michael: Nice.
Nikolay: Yeah, that's good.
Michael: Were there any other posts
or things that you've seen
that you wanted to draw people's
attention to?
Nikolay: I don't know.
There are many materials about it; I don't remember particular ones.
Let's just put something in the show notes.
It's an interesting topic, I think, very important for almost everyone.
Michael: Yeah.
There's an older one by Christophe Pettus that I thought was quite good, and then a more recent one by him as well, suggesting that we think about this age-old UUID-versus-serial question in two steps.
He suggested thinking, firstly: should our keys be random or should they be sequential?
That's a very good point.
Do you want them to be guessable? Do you want people to be able to infer things?
Actually, we didn't talk about this, but if you're using ULIDs and someone can see them, they also get a little bit of extra information: they can tell when the ID was created.
Is that a problem for you or not?
These are the questions you need to ask yourself.
So that's number one: should they be random or should they be sequential?
Nikolay: Do you want to hide it,
right?
Michael: Yeah.
And then the second one is: should they be 64 bits, or should they be larger than that?
And that's a second, separate question.
But I think Christophe might be
Nikolay: really good.
I want them to be shorter, not larger.
Michael: Okay.
But sorry, bits. Did I say bytes? I don't remember.
Anyway, the point is: how much randomness, how much entropy, do you need?
Nikolay: And that's collision risk, basically.
Michael: Yeah.
But these two questions are separate questions.
And Christophe made a really good point that they often get conflated: people who argue for UUIDs argue along one of these questions, and people who argue for bigints completely ignore that question and go down a completely different one.
So I think asking both of those questions is important before picking your ID.
Nikolay: We could maybe handle collisions with ON CONFLICT, adjusting the last bits slightly on conflict.
Yeah, well, I'm not sure.
I'm just thinking out loud.
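The thinking-out-loud idea in SQL; a sketch: detect the collision and let the application retry with a freshly generated value:

    INSERT INTO chats (id, body)
    VALUES (uuid_generate_v7(), 'hello')
    ON CONFLICT (id) DO NOTHING;
    -- if no row was inserted, generate a new id and retry in the application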
Michael: But yeah, in terms of the likelihood of collisions, I actually haven't done the math on how unlikely they are.
But I think the likelihood is pretty minuscule, at least in most deployments.
Nikolay: It depends on the volume you're trying to ingest into your database.
Michael: Yeah, of course.
Nikolay: Yeah.
But anyway, I think next time we build a new system, we need to think: should we use these int8 surrogate keys at all, or maybe it's time for UUID version 7 or 8 with partitioning?
Actually, my response to my team was: partitioning?
Come on, we know how to handle a billion rows easily in one physical table.
Let's just grow it to a billion.
And they said: no, no, no, let's do it.
It's so easy. Let's just do it normally, in a normal way.
Okay, okay.
We have partitioning.
We have UUID version 7.
A modern approach, you know.
Michael: Well, last question.
Last question from me: what chunk interval did you go for?
Nikolay: Oh, that's a good question, I actually need to check.
Because I remember that for development, for experimentation, I used one-hour chunks, very small, tiny ones.
Michael: So, even smaller in your test setup?
Nikolay: Well, maybe a minute, just for testing, but eventually I think it will be, I don't know, days or weeks. But it will be smaller chunks.
TimescaleDB is good with small chunks.
A lot of small chunks.
A lot of small chunks.
Michael: Oh really?
Nikolay: Not months.
Well, maybe months, actually.
It depends.
It's a good question; I need to reconsider it.
I still need to finalize this, but we can adjust it over time as well, I think.
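TimescaleDB lets you change this later; a sketch, assuming the hypertable from earlier (the new interval applies only to newly created chunks):

    SELECT set_chunk_time_interval('chats', INTERVAL '1 day');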
Michael: Yeah, I wouldn't be surprised
if you start off much
longer and then go smaller and
smaller.
Nikolay: Makes sense.
It depends.
If you want to test in advance how the system works with a big number of partitions, chunks of a hypertable, you probably want to go smaller and find problems earlier.
It's a problem when you want to go deeper, right?
But if you're building a normal system, of course, it makes sense.
I think the rule of thumb, and this is not about Timescale, setting Timescale aside, is: partition all tables which exceed 100 gigabytes, maybe actually 10 gigabytes.
This rule of thumb, so to speak, was raised by several folks.
I listen to them very carefully; their words matter a lot to me.
So why 100 gigabytes?
Why 100 gigabytes? Why?
It's just an empirical rule. For example, Alexander Kukushkin says those 100 gigabytes are based on Zalando's experience.
But then I realized I can build some theoretical basis and find a better threshold.
The theoretical basis is: how many XID-consuming transactions per second do you have? For example, 100.
So, how fast your XID consumption grows.
And then: how long does it take for autovacuum to process a single table?
And also: how long does it take to create an index on the largest tables?
Because while you create an index, you hold the snapshot horizon.
If it takes many hours, for example 5 hours to build an index on a 5-terabyte table, very arbitrary numbers, then during that whole time autovacuum cannot delete dead tuples in any table in your cluster, in your database.
And it means that if you have a lot of write activity during this window, you accumulate a lot of dead tuples.
So you can start comparing these numbers and understand: okay, I don't want more than, say, 100,000 writes to happen during index creation.
And if you split your physical table into smaller physical tables, partitions, or chunks in Timescale terms, index creation becomes faster.
And autovacuum is blocked for a shorter period of time.
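To make that comparison concrete with the arbitrary numbers above: at 100 XID-consuming transactions per second, a 5-hour index build holds the snapshot horizon for 18,000 seconds, so roughly 100 × 18,000 = 1,800,000 transactions run while autovacuum cannot clean up dead tuples anywhere in the database; cut the build time to 5 minutes per partition, and that window shrinks to about 30,000 transactions.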
Michael: But there are a couple of other things I've heard being used for this. One is the size of the cache: how many chunks do you want to fit in memory?
If your memory is smaller than your most recent chunk, that might not be optimal for performance; you might be better off with smaller chunks, so that the most recent ones are more likely to be in memory.
And then another one...
Nikolay: But it's very rough reasoning, I would say.
Michael: Yes, but it's the other direction, right?
One of them encourages you to partition earlier and earlier.
That one is in the same direction, isn't it? Earlier is better.
But you can go too far; you could have too many.
Let's say you want to regularly query the last day, but that's going to involve querying 24 partitions.
We've already talked in the past about problems when you query too many relations at the same time.
Nikolay: Yes: planning time, execution time, and lock manager contention.
A lot of stuff happens if you have a lot of partitions, especially if each partition has a lot of indexes.
So yeah, there's obviously a trade-off here, but Timescale is very good with a large number of partitions.
Michael: Okay, good.
Nikolay: Yeah, so we can go with daily partitions, it's okay.
And we have cases, my colleagues have observed cases, where we have dozens of terabytes of data with daily partitions, very small ones. Maybe not even daily, I don't remember the details, maybe 8-hour partitions, ingesting a lot per second.
Michael: Did they change the...
Remember when we talked about the lock manager issues, when you don't have partition pruning and you have lots of indexes on each partition?
Presumably that hard limit, 16, is still there in Timescale as well.
So you say it's really good with lots of partitions, but if you're querying too many of them at the same time, like if we're doing monthly reports against 31 or however many days, we're going to bump into those limits, right?
Nikolay: Definitely, but a monthly report is just a single query.
The problem is when you have thousands of such queries per second; then they start competing, and lock manager contention, lightweight lock contention, happens.
If it's something infrequent...
So the second thing to check is how many QPS you have.
Michael: Yeah, yeah.
So if it's dashboards or something. If it's monthly reports,
Nikolay: you're probably fine.
Nobody says we cannot have more than 15 indexes on a table, hitting 16 relations, the fast-path locking threshold.
It's okay to exceed it.
The only problem is when you need to exceed it a thousand times per second.
Yeah.
Michael: Cool.
Right.
Any last words on this one?
Nikolay: No, I think it was good.
A lot of things to understand and play with.
And I encourage folks to look at these new versions of UUID if they haven't yet.
Does it matter where we generate it, in the application or in the database?
Michael: I think the only risk is time zones.
Nikolay: Clocks can be off.
Michael: Yeah, true.
If they're off by seconds, then inserts will be going to different database pages.
It won't be
Nikolay: as bad.
And also, the ordering will be broken.
Michael: Yeah, so I guess it matters
a bit.
Nikolay: Yeah, I prefer generating them on the Postgres side, in the database.
But it's an interesting question, the pros and cons of database versus application side.
Anyway, I think Postgres will eventually have UUID version 7.
Not sure about 17, but it should be in 18 at least.
Only a couple of years of waiting.
But we don't need to wait, and we now know how to use both this new type of UUID and partitioning, with TimescaleDB or anything else.
Yeah, good?
Michael: Nice one.
Well, thank you, Nikolay, and catch
you next week.
Nikolay: Thank you.
Bye.