Gapless sequences
Nikolay: Hello, hello, this is Postgres FM.
I'm Nik, PostgresAI, and as usual with me here is Michael,
pgMustard.
Hi, Michael.
Michael: Hello, Nik.
How's it going?
Nikolay: Very good.
How are you?
Michael: I am good, thank you.
Nikolay: We didn't record last week because I was on a trip in the Oregon forest, having some fun, mostly disconnected from the internet. Now that I'm back, you said we should discuss sequences somehow, right?
Michael: Yeah, so I was looking back through our listener suggestions. We've got a Google Doc where we encourage people to comment and add ideas for topics to discuss, and whenever I'm short of ideas, I love checking back through that.
And one of them from quite a long time ago caught my eye: the concept of gapless sequences. I guess this might be a couple of different things, but I found it interesting both from a theoretical point of view and in terms of practical solutions. It's also one of those things that catches most engineers' eyes. If you run a production Postgres, you will occasionally see an incrementing ID and then a gap in it, and you think: what happened there? So it's one of those things most of us have come across at some point and been intrigued by. And there are a few interesting causes of it.
Nikolay: The name "sequence" suggests it should be sequential. Why the gap? It's unexpected.
And by the way, this episode, is it number 163, or, because I missed last week, will it be number 164?
Michael: Do you know what, it would be quite funny, should we increment the episode count?
Nikolay: Yeah, what's the number? Because I was telling, yeah...
Michael: Either we do 164 now, skipping one, and then do 163 next week as a joke, because it's like committing out of order, or we just carry on increasing the number.
Nikolay: This is another anomaly you can observe sometimes: many users can take the next numbers from a sequence all the time, but at commit time the order can be different, of course, right?
Michael: Yeah, I forget the name for that phenomenon, but whatever it is, we should discuss it next week and have the...
Nikolay: It's not serializable, right? It's not serialized. So say you have two transactions, and you have, for example, a support system, a ticket tracking system, and you generate ticket numbers, you think, sequentially. One user came and opened a ticket but hasn't committed yet. Another user came and opened a ticket, and that ticket has a bigger number, the next one, right? And that one committed already, and then the first one commits, so you see one ticket created before the other. But at the same time, if you generate a timestamp automatically, with a created_at column defaulting to clock_timestamp() or something, and the INSERT happened at the same time as the sequence's nextval call, then in that case the created_at values will have the same order as the sequence values, the ID column values.
Right.
So there will not be a problem when you order those tickets, but normally it can be observed: oh, there is a ticket number 10, and then number 9 becomes visible later, because we don't see uncommitted writes, right? A write has to be committed first before it's visible to other transactions, other sessions. Yeah. But this is a different anomaly from gaps.
The gap anomaly is very well known because, for the sake of performance, the sequence mechanism, which has existed in Postgres for ages, just gives you the next number all the time, and of course if you, for example, decide to roll back your transaction, you lose that value. So this is the number one thing.
Michael: Yes, exactly. It's to allow for concurrent writes, isn't it? Imagine, within a microsecond, two of us trying to INSERT into the same table. If I am just before you and I get assigned the next value in a sequence, and then my transaction fails and is rolled back, you've already been assigned the value after me. So yeah, I think that's super interesting. I think that's probably the most common cause, or possibly not, but it's the one I always see given as the example of why gaps happen.
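A minimal sketch of the scenario just described (the table and column names are illustrative):

```sql
-- A rolled-back transaction still consumes a sequence value
CREATE TABLE tickets (
  id    bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  title text
);

BEGIN;
INSERT INTO tickets (title) VALUES ('will be rolled back');  -- consumes id 1
ROLLBACK;

INSERT INTO tickets (title) VALUES ('first committed row');  -- gets id 2
SELECT id, title FROM tickets;  -- only id 2 is visible: a gap at 1
```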
Nikolay: Yeah, so that you don't have to think about going back to previous values. This is your value, and it's fire and forget: the value is wasted and the sequence has shifted to a new value. Although you can reset it: there's nextval, there's setval, there's currval, and currval requires that nextval was called first in your session before you can use it, and then setval. So you can shift a sequence back if you want, but it's global for everyone.
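The three functions mentioned, in a quick sketch (the sequence name is made up):

```sql
-- nextval, currval, setval on a throwaway sequence
CREATE SEQUENCE s;

SELECT nextval('s');      -- 1: advances the sequence and returns the new value
SELECT currval('s');      -- 1: last value returned by nextval in THIS session;
                          -- errors if nextval has not been called here yet
SELECT setval('s', 100);  -- shift the sequence; the next nextval returns 101
SELECT nextval('s');      -- 101
```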
And also interesting: a sequence is considered a relation as well, right?
Michael: Yeah, we discussed this
recently, didn't we?
Yeah, yeah,
Nikolay: In pg_class you see relkind equals, you said, capital S, right?
Michael: Capital S. And by the way, I was wrong, it's not the only one with a capital. There is one other, do you know it?
Nikolay: No.
Michael: Indexes on partitions.
Okay.
Capital I.
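Those two relkind values can be seen directly in the catalog:

```sql
-- Sequences appear in pg_class with relkind 'S';
-- indexes on partitioned tables are the other uppercase kind, 'I'
SELECT relname, relkind
FROM pg_class
WHERE relkind IN ('S', 'I');
```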
Nikolay: Okay.
Michael: So we've got one cause already that you mentioned, transactions rolling back. I want to go through a bunch of other causes, but before that, should we talk about why you would even want a gapless sequence? We've got sequences, and sequences with the odd gap in them are fine for almost all use cases. Should we talk a little about why even bother? Why even discuss this? Why is it a problem?
Nikolay: Well, expectations, I
guess, right?
You might have expectations.
Michael: I think I've only got a couple here; I'm interested if other people have seen others. But one I've got is user-visible IDs that you want to mean something. There was a really good blog post on this topic by some folks at incident.io, actually old friends of mine from GoCardless days. They wanted incident IDs to increment by one for each of their customers, so they could refer to an incident ID: if a customer has had three incidents that year, the count is up to 3, and the fourth one gets assigned incident 4. If they want the IDs to mean something, it's not ideal to miss the odd one, and much worse to miss 10 or 20 in a row.
Nikolay: So they obviously have many customers, it's a multi-tenant system, right? And do they create a sequence for each customer?
Michael: Well they did initially.
Nikolay: Okay, yeah, I'm asking because I saw this in other systems. I remember the approach where we have a sequence just to support primary keys, unless we use the beautiful UUID version 7. Well, with some drawbacks, but overall it's winning these days, in my opinion. But for each customer, in the namespace of each client ID or organization ID or project ID, doesn't matter, we might want an internal ID, an internal ID which is local, right? And then we shouldn't use sequences; it would be overuse of them. Even if each customer has thousands or millions of rows, we can handle it, and collisions would happen only locally for this organization, project, or customer, right? Which is great. So, yeah. And for sequences, the only thing we care about is uniqueness, in my opinion.
Michael: Yeah, you're right, uniqueness, but that's the job of the primary key, right? It's also the fact that they only go up, I think.
Nikolay: Yeah, well, unless somebody rewinds them, right?
Michael: Setval.
Nikolay: Setval, exactly.
And the capacity, just forget about it, because it's int8, always, for any sequence. I noticed in some blog posts you shared with me, not this one, different ones, that they used int4 primary keys. I very much welcome this move, because these are our future clients. So, very good move: everyone, please use int4 primary keys, and later, if you're successful and have more money, you will pay us to fix that.
Yeah.
Michael: I like you flipping the
advice.
So wait, you said something interesting there. You said sequences are always int8. So even if I have an int4 primary key, the sequence behind it is int8?
Nikolay: A sequence is an independent object. Well, relatively independent, because there can be a dependency, which is also a weird thing: OWNED BY, right? It might be standalone, but it also might belong to a column of a table, with ALTER SEQUENCE ... OWNED BY some column, right? But overall it's just a special mechanism, int8 always, and it just gives you the next number, next number. That's it. Simple.
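The OWNED BY relationship can be sketched like this (table and sequence names are illustrative):

```sql
-- A sequence can be tied to a table column, so dropping the
-- column (or table) drops the sequence too
CREATE TABLE t (id int);
CREATE SEQUENCE t_id_seq OWNED BY t.id;
ALTER TABLE t ALTER COLUMN id SET DEFAULT nextval('t_id_seq');

-- Ownership can also be changed or removed later:
ALTER SEQUENCE t_id_seq OWNED BY NONE;  -- detach it again
```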
Michael: Yeah.
So yeah, by the way, I wasn't saying... incident.io did use sequences initially and it turned out to be a bad idea, but all I meant was that that's a use case for not just monotonically increasing IDs, but IDs that increase by exactly one each time. So that's one use case for the concept of gapless sequences.
And another one came up in the blog post by Sequin that I shared beforehand, and I'll link it up in the show notes again. That was the concept of cursor-based pagination. I think it's very similar to keyset pagination, but based on an integer only. For that use case, I guess it's most important that IDs only increase, but also that concept of committing out of order becomes important. If we read rows that are being inserted right now, there might be one that commits having started earlier than a second one that hasn't yet committed. The example they give is: we could see IDs 1, 2, and 4, and later 3 commits, but we only saw 1, 2, and 4 at the time of our read. So if we were paginating and the first page went up to 4, and then we only looked for IDs above 4, we've missed 3. So that's an interesting definition of a sequence where you don't want there to be gaps at any point, maybe.
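The pagination pattern being discussed looks roughly like this (the table and columns are hypothetical):

```sql
-- Cursor-based (keyset) pagination on an integer id.
-- The danger described above: if id 3 is still uncommitted while
-- 1, 2, and 4 are visible, a page ending at 4 will skip 3 forever
-- once it finally commits.
SELECT id, payload
FROM events
WHERE id > 4          -- "last seen id" from the previous page
ORDER BY id
LIMIT 100;
```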
Nikolay: You know, I'm looking at the documentation right now, and I think it would be great if this thing were called not a sequence but something like a generator, a number generator. Because "sequence" feels like it should be sequential and gapless; it's just a feeling, you know. This gives false expectations to some people, not everyone. Of course, the documentation says CREATE SEQUENCE "defines a new sequence generator", so generator is the better word there. And I think the documentation could be more explicit about the gaps to expect. In my practice, it happened more than once that people expected sequences to be gapless somehow, I don't know. A lot of new people are coming to Postgres.
Michael: All of us were new once,
right?
I definitely experienced this.
I think for us, moving on to a second cause of this, the reason we were getting gaps was INSERT ... ON CONFLICT. It was something around having new users that had been added by somebody else in the team. So the user had already been created behind the scenes because somebody invited them, and then, when they signed up, we were doing an INSERT ... ON CONFLICT DO UPDATE or something like that. As part of that, nextval was called just in case we needed to insert a new row, but we ended up not needing it, because it was an update instead. So you can get gaps through INSERT ... ON CONFLICT as well.
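A small sketch of that cause (the schema is illustrative):

```sql
-- INSERT ... ON CONFLICT evaluates the identity/sequence default even
-- when the statement ends up updating, so each conflicting upsert
-- burns one sequence value
CREATE TABLE users (
  id    bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  email text UNIQUE,
  name  text
);

INSERT INTO users (email, name) VALUES ('a@example.com', 'A');  -- id 1
INSERT INTO users (email, name) VALUES ('a@example.com', 'A2')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;         -- id 2 is wasted
INSERT INTO users (email, name) VALUES ('b@example.com', 'B');  -- id 3: gap at 2
```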
Nikolay: Yeah, and actually the documentation mentions it.
Oh, cool.
It mentions it, though I think it could still be mentioned more explicitly, maybe at the beginning and so on. And the thing is, someone might consider sequences not ACID, right? Because if a rollback happens, they don't roll back. For the sake of performance, obviously.
Michael: So it violates atomicity, does it?
Nikolay: Yes and no. If other writes are reverted, this change, that we advanced the sequence by one and shifted its position, is not rolled back. So our operation is only partially reverted, if we look at it strictly. For the sake of performance, it's pretty clear, but yeah, it's kind of not fully ACID, and that's okay; you just need to understand it, and that's it. But I can understand the feelings of people who come to Postgres now, expect it from the naming, and then, boom. It's a simple thing to learn.
Michael: Another case where naming
things is hard.
Nikolay: So, yeah, for me it's a number generator, huge capacity, 8 bytes, and it gives me a tool to guarantee uniqueness when we generate numbers. That's it. Very performant, very. I never worry about its performance, because rollback is not supported; that's it.
Let's go. But let's talk again: if we really need a gapless sequence, I would first ask, do we really need it, or can we be okay without it? If we really need it, I think we should go with specific allocation of numbers, maybe additional ones, not primary keys, right?
Michael: Yeah, well, personally I think this is a rare enough need that it's not needed by every project.
Nikolay: I don't think so either, right?
Michael: I've run plenty of projects that have not needed this feature, so I personally don't think there's a necessity to build it into Postgres core, as a gapless sequence type or something. But I do think it's interesting; it seems to come up from time to time. And there are neat enough solutions, at least at lower scales. I'm sure there is a solution at high scale as well, but there are simple enough solutions at lower volumes that I don't think there's a necessity for a pre-built solution that everyone can use.
Nikolay: A high-performance solution? It's impossible, because if there is a transaction which wants to write number 10, for example, but hasn't committed yet, and we want to write the next number, or also number 10, it depends on the status of that first transaction. We need to wait for it, right? It creates a natural bottleneck. And I cannot see how it can be done differently. We need to wait until that transaction finishes; we need to serialize these writes.
And again, for me, the only trick in terms of performance here is to use the fact that if we have a multi-tenant system, we can make these collisions very local to each project, organization, or tenant, right? So they compete only within this organization, and other organizations are separate in terms of these collisions.
Michael: And ultimately, then, it's about parallelizing writes, which I think is then sharding. So if you've got the multi-tenant system across multiple shards, you can then scale your write throughput. It feels to me like another case of that probably being the ultimate solution.
Nikolay: Well, if you have sharding and distributed systems, it's like...
Michael: I don't mean across shards.
Nikolay: Yeah, locally, locally.
Michael: Yeah, exactly. If you've got a tenant that's local and you can...
Nikolay: Because if you want a pure sequential gapless number generator for distributed systems, it's a whole new problem to solve; you basically need to build a service for it and so on. But again, you should think about it: okay, we will soon have thousands of new rows inserted per second, for example. What will happen? If collisions happen only within the boundaries of one tenant, project, or organization, doesn't matter, it's not that bad, right? They can afford inserting those rows sequentially, one by one; maybe some transactions will wait, but maybe just one. So maybe this affects our parallelization logic, saying: let's not deal with multiple tenants and multiple backends and transactions, let's do it in one transaction always. But if we write thousands of rows per second and they belong to different organizations, collisions won't happen, right? Because they don't compete. So this dictates how we could build a high-performance gapless sequence solution: we just have to avoid collisions between tenants, for example. That's it.
Michael: Yeah.
But we've jumped straight to the hardest part. Should we talk about a couple more of the situations where you might...
Nikolay: Oh, surprises! Yeah, so rollback is one thing which can waste your precious numbers. Another thing, which I learned about, forgot, and relearned when you sent me these blog posts: there is a hardcoded constant of 32 pre-allocated values. Actually, I think there is a constant and I think there is also some setting. Maybe I'm wrong, but there should be some setting. So you can say, I want to pre-allocate more.
Michael: Oh, I didn't come across that. So we've got SEQ_LOG_VALS, that's the hard-coded one, right?
Nikolay: Yeah, maybe I'm wrong, actually. So there are pre-allocated values. And can we control that? No, we cannot control it, right? It's 32. Ah, but there is cache, right? What is cache? When you create a sequence, you can specify the cache parameter as well.
Michael: Okay, so what does that control?
Nikolay: It controls exactly this. If you don't set it, it will be 32.
Michael: Oh, okay. So it's defined on a per-sequence basis.
Nikolay: Per sequence. You can say, I want 1,000 pre-allocated.
Michael: What if we set it to 1?
Nikolay: Well, only 1 will be pre-allocated, right?
1 is minimum, actually.
Michael: 1 is minimum.
Nikolay: Yeah.
Actually, it's also interesting, maybe I'm wrong, because there is also... Yeah, I'm confused. The documentation about this parameter says 1 is the default, but we know there is also the 32 hardcoded constant. Anyway, that hardcoded constant can be associated with a gap of 32. For example, when a failure happens, or you just fail over or switch over to a new primary, which should be a normal thing, right? You change something on your replica and switch over to it. This is when you might see a gap of 32, which is described in one of those articles. So I'm not sure about this cache parameter. Maybe if you change it, it's only a cache of pre-allocated values, and specifying it won't lead to bigger or smaller gaps; I'm not sure about that. So maybe there are two layers of implementation here. But based on the articles, we know there are gaps of 32, and this is just common, right?
And this is just common, right?
And interestingly, this is connected to recent discussions we had with one of our big customers who have a lot of databases. We discussed major upgrades. We have a zero-downtime, zero-data-loss, reversible upgrade solution which multiple companies use, and one of its most fragile parts is the switchover. During switchover to the logical replica, we do it basically without downtime, thanks to pause and resume in PgBouncer; Patroni also supports it. So we pause and resume, and between pause and resume, where small latency spikes in transaction processing happen, we redirect PgBouncer to the new server. And that server by default has sequence values corresponding to initialization, because logical replication in Postgres still doesn't support sequences; there is work in progress, and I think it's close. It doesn't replicate the values of sequences.
So the question is how to deal with it. There are two options. First, you can synchronize sequence values during the switchover, but it will increase the latency spike. We don't want that, because we achieved a spike of just a few seconds; it feels like really pure zero downtime, and if we start synchronizing sequences, the spike will grow. Especially since some customers had like 200,000 tables, it's insane. But okay, if it's only 1,000 tables... I thought, well, I don't want it. Actually, one of the engineers on the customer side said, you know what, this setval is not too long. If we quickly read and quickly adjust it, okay, maybe another second. And testing shows that, exactly, changing the position of sequences is super fast, actually. Yes, if you have hundreds of thousands of tables and sequences, it will be quite slow. But if it's only a few, you can do it quite quickly, and you can maybe also parallelize it, but that would make things more complicated.
But another solution, which I like much more: we just advance sequences beforehand, before switchover, with some significant gap. Like, check how many values you spend during a day or two: millions, ten million, and advance by that. We have enough capacity for a lifetime; 8 bytes is definitely enough. So, yeah, just bump it by like 10 million. That works when it's your own system, like 1,000 or 2,000 tables, just one system, and you know these big gaps are fine. But when you think about very, very different projects, thousands of clusters, you think, oh, maybe some of them won't be happy with big gaps, you know? This is a hard problem to solve.
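A rough sketch of the "advance everything by a big margin before switchover" idea; the 10-million margin is just the figure from the discussion, and looping over all sequences this way is an assumption about how one might script it:

```sql
-- Bump every sequence in the database far ahead of its current position,
-- so the logical replica cannot hand out already-used values
DO $$
DECLARE
  s record;
BEGIN
  FOR s IN
    SELECT schemaname, sequencename, COALESCE(last_value, 0) AS lv
    FROM pg_sequences
  LOOP
    PERFORM setval(format('%I.%I', s.schemaname, s.sequencename)::regclass,
                   s.lv + 10000000);  -- margin: tune to a day or two of traffic
  END LOOP;
END $$;
```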
Michael: And if you go back in
the other direction, let's say
you want to be able to fail back
quickly, that's another gap.
So each time you bounce back and
forth.
Nikolay: Yeah. Since our process is fully reversible, it's really blue-green deployments. Every time you switch, you need to jump, and we recommend jumping big. We have big gaps, and I would say you should be fine with them, but I can imagine...
Michael: Why not smaller gaps? Why not? Let's say it's a two-second pause.
Nikolay: Yeah, if you know there won't be spikes of writes right before the switchover, we can do that. But the risk of overlapping increases. If you did it wrong, after switchover some inserts won't work, because those sequence values were already used, right?
Michael: With duplicate key, yeah. What would the actual errors be? Duplicate key violations?
Nikolay: Yeah, so your sequence... But it will heal itself, right? Thanks to the nature of sequences, which waste numbers: insert, duplicate key violation, next value, oh, it works. It's funny.
Yeah, anyway, I always preferred to be on the safe side and do big jumps. But when you think about many, many clusters and the systems of many people, it's a different kind of problem to have, and so I'm just highlighting: gaps are fine, but what about big gaps? Sometimes they can look not good. In this case, we are still thinking maybe we should just implement two paths: by default we do a big jump, but if somebody is not okay with that, maybe they would prefer a bigger spike, or a bigger maintenance window, like, okay, up to 30 seconds or so, while we synchronize those sequences and don't allow any gaps. For me, naturally, knowing how sequences have worked for years, gaps should be normal, right?
Michael: Yeah, it's so interesting, isn't it, the trade-offs that different people want to make.
Nikolay: You know the solution to this?
Michael: Yeah. Pardon me?
Nikolay: You know the good solution to this?
Finally start supporting sequences in logical replication, that's
it.
Michael: Yeah, that would be... Well, yeah, and that might not be too far away, so yeah.
Nikolay: I think so, I think so. This work in progress has lasted quite some years. It's called logical replication of sequences, or synchronization of sequences to the subscriber. There have already been multiple iterations since 2014, I think. And it has chances to be in Postgres 19, but it requires reviews.
It's a great point for you to take Claude Code or Cursor and ask it to compile and test, and then think about edge cases, corner cases. Even if you don't know C, this is a great place to provide some review. You just need to be an engineer who writes some code; you will understand the discussion and comments, it's not that difficult. So I encourage our listeners to participate in reviews, maybe with AI, but there will still be value if you consider yourself an engineer. You will figure out over time what value you can bring. The biggest value in testing is to think about various edge cases and corner cases as a user, as a Postgres user, right? And try to test them, and AI will help you.
Michael: Yeah. Well, I also think we have several experienced Postgres C developers listening, and it's always a bit of a challenge to know exactly which changes are going to be the most beneficial to users, because you don't always get a representative sample on the mailing lists. Sometimes a lot of the people asking questions are at the very beginning of their journey; they haven't yet worked out how to look at the source code to solve problems. So you don't always get the slightly more advanced problems reported, because people can work around them, and I think this is one of those that people have just been working around for many years. A lot of consultancies deal with this in different ways, but it affects every major version; there is friction there. So if any experienced hackers are wondering which changes they could review that would have the biggest user impact, this feels like one.
Nikolay: This feels like something so many people want. Logical replication is used more and more, for blue-green deployments and so on. Let's include, by the way, the commitfest entry, so people can look at it and think about whether they can review it and help with testing.
In the past, I would think: okay, to test it, what do I need first of all? This is about logical replication and its behavior, so I need logical replication, setting up two clusters with logical replication. Oh, okay, I have better things to do, actually, right? Now you can just launch Claude Code or Cursor and say: I have Docker installed locally on my laptop; please launch two containers, different versions maybe, create logical replication, and let's start testing. And then, beyond containers: if containers work, you can say, okay, now I want some of them built locally from source code, and then the same thing. And you don't need to set up logical replication yourself, that's it. So these roadblocks can be eliminated by AI, and then you focus only on the use cases where this thing can be broken, and this is where you can start contributing. You just need to be a good Postgres user, that's it.
Michael: Yeah, nice.
Nikolay: Good. You just need to be able to distinguish a logical replica from a physical replica manually; that's the only thing you need to know to start. Yeah, good. Okay. So, are there any other cases where we can experience gaps?
Michael: Well, I only wanted to talk about two more things for sure. One is: why 32? Why do we pre-allocate these values? I think that's interesting. And two: what can you actually do about it? I thought the incident.io approach, especially at lower volumes, had some neat solutions. Those were the last two things on my list.
Nikolay: Well, we pre-log them for performance, right? Because technically a sequence is also a relation which stores its value in a page, and so on, right?
Michael: Well, I got the impression from a comment in the source code that it was... Let me read it exactly: "We don't want to log each fetching of a value from a sequence, so we pre-log a few fetches in advance. In the event of a crash we can lose (skip over) as many values as we pre-logged." So I got the impression it was to avoid spamming the WAL.
Nikolay: Yeah, it's an optimization technique, that's it.
Michael: So I could imagine a case where you'd want to pay that trade-off the other way around. And it's good to know, as you mentioned, that you can reduce it on a per-sequence basis.
Nikolay: I think it's different. I think what you can reduce is cache, but that's not the thing that goes to WAL. I'm not 100% sure here; I just think you still lose 32, because these are two different things: one is a hard-coded constant, the other is dynamically controlled by the user. But maybe I'm wrong again here. It's a good question to check, but it's a nuance. For me, a sequence always has gaps, that's it.
Michael: And that's okay. So, the last thing was solutions. I thought the incident.io one was really neat and also very, very simple. I like simple solutions that work for now; we can solve later problems later. And it was just to do a subquery reading the current max value and incrementing it by one, so not using sequences, of course.
Nikolay: Yeah, no sequences, it's just reading. It reminds me of the episode we had with Haki Benita, right? And the problems with get-or-create, or something like that, right? So basically we need to read the maximum value and use it plus one, but maybe others do the same thing in parallel. And how do you deal with performance in this highly concurrent environment? Again, the key for me is to narrow down the scope of collisions, that's it, so contention would be local to...
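Roughly the max-plus-one approach being described, scoped per tenant; the schema here is illustrative:

```sql
-- Per-tenant gapless numbering via max+1. Contention is limited
-- to the rows of a single tenant.
INSERT INTO incidents (tenant_id, local_id, title)
SELECT 42,
       COALESCE(MAX(local_id), 0) + 1,
       'db on fire'
FROM incidents
WHERE tenant_id = 42;

-- A UNIQUE (tenant_id, local_id) constraint plus a retry on
-- unique-violation makes this safe under concurrency.
```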
Michael: So there are multiple options, right? You could just implement, I say "just" as if it's simple, retries. If you expect collisions to be super uncommon, retries would be a solution. But the Sequin blog post actually goes into a bit of depth on how you could scale this if you are doing a lot per second. So that's an interesting solution; there's way too much code to go into now, but I'll link it up in the show notes.
But yeah, I did think there's a range of solutions. With a multi-tenant system, like incident.io for example, hopefully most organizations are not going to be creating thousands of incidents per day, never mind per second. So the chance of collisions or issues there is so low that it's almost a non-issue. Whereas for a different use case... I actually can't think of a use case needing a gapless sequence that inserts thousands of rows per second. So I just don't see that being a... well, I'd love to hear from people who have had to deal with that and what they did.
Nikolay: Thousands per second?
Michael: Yeah. For a gapless sequence, where it's important not to have gaps.
Nikolay: Yeah, yeah. Because if you have a lot of inserts, you have big numbers. So the desire to be gapless matters when we have only small numbers.
Michael: I think it's more important then, right?
Nikolay: Yeah, maybe. Also, the little gap of 32...
Michael: ...would disappear quickly. Imagine a gap of 32; it would disappear quite quickly.
Nikolay: If it's a big number, you stop paying attention, yeah, maybe.
Michael: And also, I don't think computers care about gaps. I think it's humans that care. Personally, I don't know.
Nikolay: Yeah. Well, with sequences, I remember it was 2005 or 2006 when we wanted to hide the actual numbers of users and things created in our social network. So we used two prime numbers and set the default to nextval from the sequence multiplied by one big number and then taken modulo the other number. It was like fake random, you know, to hide it. Of course, if you create a few things yourself and watch the numbers, you can quickly work out the logic; you can still hack it and understand the actual growth rates. But it's hard to understand the absolute values. Compare this to people who don't care: just one global sequence for all users, and okay, the number of posts is one million something, so this platform has one million posts. It gives signals to your competitors, right?
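A sketch of that obfuscation trick; the constants here are made up, the only requirement is that the multiplier is coprime with the modulus so the mapping is a permutation:

```sql
-- Fake-random public IDs derived from a plain sequence
CREATE SEQUENCE user_seq;
CREATE TABLE users (
  public_id bigint DEFAULT ((nextval('user_seq') * 387420489) % 1000000007)
);
-- Consecutive nextval values 1, 2, 3, ... map to scattered public_ids,
-- hiding the real count and growth rate (though the pattern is reversible
-- by anyone who collects enough samples)
```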
Michael: I learned today what that is generally called: the German tank problem. Have you heard of it? Maybe not the first, but the first famous case of this was, I think, in World War II: the Allies were using incrementing numbers found on German tanks to work out how many were being produced, what the production capacity was. It was a useful thing in the war. So yeah, this is older than computers.
Nikolay: Yeah, it reminds me how the guys from my former country went to your country to poison some guy, and their passport numbers were sequential. That's how they were tracked. So stupid, right? I mean, sometimes gaps are good, if you want to hide things. So if you build some system, maybe you actually want gaps.
Michael: Yeah, that's the next episode, a different episode.
Nikolay: How to build gaps.
Michael: Gapful sequences, yeah.
Nikolay: Some random gaps, so nobody understands how many.
Michael: Yeah, just UUID v4, right?
Nikolay: Random jumps. Yeah, so that's it.
I also wanted to mention that a sequence has a few more parameters you can specify, like min value and max value, and you can say it should loop. I don't know why; I never used it. It's called CYCLE. So you can specify from 1 to a thousand, and CYCLE.
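For illustration, a small cycling sequence:

```sql
-- A cycling sequence wraps back to MINVALUE after reaching MAXVALUE
CREATE SEQUENCE looper MINVALUE 1 MAXVALUE 3 CYCLE;
SELECT nextval('looper');  -- 1
SELECT nextval('looper');  -- 2
SELECT nextval('looper');  -- 3
SELECT nextval('looper');  -- 1 again
```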
Michael: So it couldn't be used for a primary key then, that one.
Nikolay: Yeah. I would just use the percent operator, modulo: divide by something and take the remainder, with the same effect. But...
Michael: Yeah, I guess it's similar to transaction IDs, if you think about how transaction IDs behave.
Nikolay: Wraparound, yeah. If you want to wrap around, go for it. I'm very curious about use cases for this; I never used it. But you can also specify the increment, a jump, and get only odd numbers, for example, right?
Michael: Yeah, or any positive step; that might be more common.
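The increment option mentioned here, as a sketch:

```sql
-- INCREMENT BY controls the step: e.g. only odd numbers
CREATE SEQUENCE odd_seq START 1 INCREMENT BY 2;
SELECT nextval('odd_seq');  -- 1
SELECT nextval('odd_seq');  -- 3
```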
Nikolay: We want to increment by random: that will be our random gaps to fool everyone. Okay, good. Enough about sequences. Thank you for the topic.
Michael: Likewise.
Good to see you and catch you soon.