Slow count
Nikolay: Hello, hello, this is PostgresFM, podcast about Postgres,
as you can guess.
My name is Nikolay from Postgres.AI, and as usual, my co-host,
not guest, co-host is Michael from pgMustard.
Hi, Michael.
Michael: Hello, Nikolay.
Nikolay: So you chose a topic, and it is?
Michael: Slow counts, Or like why is count slow and what can
we do about it?
Nikolay: How to make counts slow?
You just count.
So yeah, let's discuss.
I don't know.
For me it's super old topic and I'm not sure I will use some
fresh knowledge here.
I have a lot of fresh knowledge about Postgres, and I share it
constantly.
But this is something which is super old.
There are improvements we can discuss, like index-only scan.
We talked about them recently a lot.
But I guess my role will be to use my knowledge from, you know,
like 15 years ago mostly.
And just before we started this recording, I shared with you
how I a little bit helped PostgREST to get rid of count, right?
Because sometimes you don't need it.
So I'm going to stay in that area 10 years ago.
And If we have something new, I will be just putting more to-do
items to my to-do list to explore and that's it.
Maybe you know some things better than me.
Let's see.
Michael: Yeah, well, the PostgREST link is great, because the resource
I still like to point people to on this topic, the best thing
I've ever found on it, is a blog post by the PostgREST author,
Joe Nelson, on the Citus blog.
So I'll link that up for people; I think it explains it in a
lot of depth, but it's from 2016.
So quite a long time ago, 8 years at the point of recording.
And I was kind of surprised how many things have changed, but
also how little has changed in a way.
Like there are optimizations.
A lot of things that were kind of little gotchas back then aren't
true anymore.
There are a bunch of ways that it's faster now by default.
But the essence of it is still true, because of the way
Postgres does MVCC, when we do an aggregation.
Because it's Rowstore.
Yeah.
Basically, all the things that make it great as
a transactional database count against it,
pun not intended,
when it comes to aggregating across
a lot of data. And count is,
I think, at least in my experience,
the most common aggregation
people want to do in typical
use cases.
Nikolay: Yeah, that's a good point.
Since Postgres is a row store, it's
better at transactional access.
Imagine you have a table of users,
for example, as many SaaS
systems or e-commerce systems do,
like customers, users.
Since Postgres is mostly like OLTP,
row store, OLTP-focused database
system, it stores a tuple, a row.
If it's not too wide, it stores
it in 1 place, in 1 page, basically.
As we discussed many times, a page
is 8 kilobytes by default
in most cases, and very roughly,
if a row doesn't exceed 2 kilobytes,
it will be on the same page, and
that's it.
The mechanism called TOAST won't
be involved.
If it exceeds it, it will be chunked
to pieces and stored in an
additional table called the TOAST table.
So imagine we have a users table.
And then users can log in, so we
need to retrieve information
about this user.
And it's good because all information
about a single user is
stored on the same page, right?
If it's narrow, right?
As I said, it means that it's quick
to retrieve.
It's just 1 buffer hit or read.
Michael: Yes, extremely efficient
if we want to show
the profile of that user, if we
want their name, their age,
you know, loads of information about
them: a single page read to
return that information, maybe
a couple of reads once you look them
up in the index and then hit the heap
page.
Nikolay: So it's very I/O optimized
basically.
Efficient.
Efficient, exactly.
But when it comes to counting, or
calculating some sum or average,
minimum, maximum, any aggregate,
it's not good, because usually aggregates
deal with only a single
column or a few columns,
and we don't need all the other
columns this table has.
If we have a row store, as Postgres
is, in this case, it means that
we need to do much more I/O, many
more buffer operations, hits
and reads, to calculate aggregates
for a single column.
For example, count them, right?
And it means if you want, for example,
a very, very common
ask in startups, in social media,
in SaaS, in e-commerce: let's
just show how many new users we
have every day.
We need to count daily, right?
And this means, imagine if
we have only a few records in
an 8-kilobyte page, a few user records,
a few user tuples, because
we have many columns.
It means 1 buffer operation will
allow us to count only like
4 users, 6 users, and that's it.
Compared to a column store, where
each column is stored in separate
files: if this file is about
ID, it's only ID;
if it's about creation time, it's
only creation time, datetime.
In this case, counting is much
more efficient because 1 page
can bring us many more users, and
we can count them much faster.
This is quite a straightforward and
very simple, trivial comparison
of column store and row store, and
why in a column store aggregates,
and counts specifically, are much
faster than in a row store, just
because of I/O, right?
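A quick way to see this arithmetic on a real table is to look
at pg_class; the users table name here is hypothetical:

    -- relpages = 8 kB heap pages, reltuples = planner's row estimate
    SELECT relpages,
           reltuples::bigint AS approx_rows,
           round(reltuples / greatest(relpages, 1)) AS rows_per_page
    FROM pg_class
    WHERE relname = 'users';
    -- with wide rows, only ~4-6 tuples fit per page, so counting a million
    -- rows means touching on the order of hundreds of thousands of pages;
    -- a narrow column file would need far fewer reads for the same count.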
Michael: Yes, and that's before you
even start to think about the fact that,
because column stores are used
for these aggregation and analytics
use cases so much, they
also tend to have
more optimizations around these
aggregations than at
least the "old school" transactional
processing databases
have.
I think lines started getting blurred
a little bit with some
of the extensions, like some of
the newer extensions in Postgres
world that also have some of these
optimizations.
But yeah, I think that's a great
starting point as some of Postgres'
strengths working against it.
I think another 1 that is not just
RowStore versus ColumnStore
is because of how Postgres does
visibility.
I think if you look at
some other transactional processing
databases, they don't do
the same marking of pages as dead,
because they do undo logs,
basically.
They can also cheat a little bit
for certain counts.
Like, if you want to count all
of the users, if you have
some specific use cases, like let's
count all of the records
in this table, they can cheat
and do some of the solutions
that we're going to
Nikolay: discuss, maybe at the database
level.
Well, they're not cheating.
They put the new row version in the
same place as the old row version,
which means that if we already
have compactness of
storage, it's not lost.
Compared to Postgres, where new
version comes to a new place, maybe
a different page, in many cases it's
a different page.
Maybe it's a place which was taken
by another tuple which became
dead and then vacuumed, deleted
by a vacuum, right?
Or autovacuum.
It means that we might end up having
very scattered storage for
our tuples, especially if we need
to count only specific tuples.
We have the WHERE clause in our
SELECT COUNT.
It means that we need to, or we
might have an index scan for this,
but it's kind of random access,
right?
We need to count many tuples which
are stored in random places,
and we don't use partitioning.
Our table is 1 terabyte in size.
It's super bad case.
An UPDATE randomizes the storage,
right?
The new version is stored in random
locations.
And it becomes worse and worse
and worse.
It's very scattered, and we lose
compactness.
This is how MVCC in Postgres works,
unfortunately.
We oftentimes say it's more like
rollback-focused than commit-focused.
Because in case of rollback, the new
version becomes basically dead
eventually and cleaned up, and the
old version remains in the same
place, and we don't lose compactness.
If we have a commit, we move rows
all the time.
We move tuples, the physical versions
of rows, all the time, and we
lose compactness eventually.
And this is bad for count, bad
for other aggregates as well.
Michael: Yeah, so I think that's
like a good overall summary
of why count is slow, or at least how
Nikolay: it scales.
Michael: I think while it's not
bad on tiny datasets, it's fine,
right?
You probably won't notice it too
much, but as data scales, even
in the blog post I was reading,
they only did a dataset of a
million rows and already, even like
the fastest optimizations
at the time could only get an accurate
count in about 100, 200
milliseconds, something like that.
So we're already talking about
10x what you would consider a
fast query.
So even at a million records, which
is not many in many use cases,
we're talking about a potentially
meaningful length of time.
So yeah, I think, should we move
on to what we can do about it?
Nikolay: Let me share the history
of this blog post.
I might have somehow provoked that blog
post that Joe Nelson wrote, a little
bit, and I'm not taking full responsibility
for that blog post,
definitely.
But I remember he published it
in 2016, right?
And I started using it all the
time.
I shared, like, if you want to
understand count, just read that
blog post.
Of course, right now I would say
it's missing BUFFERS in the
EXPLAIN plans, for sure, because,
as we just discussed, I/O is always
reason number 1 of why things are
slow, right?
We should not talk about timing
only. We should talk about BUFFERS
as well all the time.
But back 1 year earlier, in 2015,
I started looking at PostgREST,
which right now is 1 of the main
building blocks
Supabase has.
And I was impressed: great,
we can ditch Django and Rails
applications completely,
at least in simple cases,
and do everything on the server side
and front end.
This is actually the same time
React became popular.
I think it was started in 2013,
'14, and so on.
So it skyrocketed in 2015.
And a lot of business logic went
to frontend, of course.
Also React Native.
Great.
So we need to get rid of middleware,
keep data-oriented business
logic in the database, keep UI-oriented
business logic on the
client side, and this is great.
So I found PostgREST, and, since I came
from already quite heavily loaded
systems, I quickly realized a few things.
I remember, for example, the first
time I saw an integer primary
key exhausted was in my own
project in 2008.
So I already understood certain data
volumes and how they work in
Postgres. And I quickly found that
PostgREST already had
some pagination, very good, but
on each GET request, you say,
I want only 25 items on my page,
and they also show you how many
overall items you have for this
request with all your filters
and so on.
And I thought, well, that's not
good.
I know Postgres has slow count,
right?
It's written in Haskell, so
I started digging inside the source
code, and I found that indeed it runs
count all the time, so I helped
them get rid of it by default.
I remember I created an issue,
let's get rid of count by default,
and they quickly agreed and removed
it.
So I'm happy I helped a little
bit to realize that count should
not be used by default.
And also at the same time, Joe
Nelson, the original creator
of PostgREST, started to dig into
this topic and found several
alternative ways how to make count
fast, including approximate
method.
I like this method.
It doesn't work all the time.
It works very well if you have
autovacuum well-tuned, so it
runs auto-analyze frequently, right,
because you need fresh statistics.
And it also works if you don't
play too much with filtering.
So, for example, if you have a
users table and you need to display
the overall count of rows in this table,
you can use the approximate
approach, which means, in a few words,
you just run EXPLAIN without
ANALYZE, just EXPLAIN
SELECT * FROM users, right?
And the planner will tell you how
many rows it expects.
And this is already an approximate
number you can use.
This is an old trick I first learned
maybe in 2004 or 2005 when I started
with Postgres.
We used it.
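As a sketch, the whole-table estimate can come straight from
pg_class, and the Postgres wiki has a well-known helper along
these lines for arbitrary queries (the users table and filter
here are hypothetical):

    -- whole-table estimate, maintained by (auto)ANALYZE and VACUUM:
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'users';

    -- estimate for an arbitrary query, parsed from the planner's row estimate:
    CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint
    LANGUAGE plpgsql AS $$
    DECLARE
      rec record;
      rows bigint;
    BEGIN
      FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
      END LOOP;
      RETURN rows;
    END;
    $$;

    SELECT count_estimate('SELECT * FROM users WHERE created_at >= now() - interval ''1 day''');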
Michael: I think, yeah, I think
this trick comes into its own
for things that are harder to find
out from the statistics or
from other methods.
I think, for example, if you
have dynamic filters that users
can set, or basically
if you need to be able to set
arbitrary WHERE clauses, or you
want to be able to join relations
and get estimates, then this
is an extremely powerful estimation
tool, because you're basically
using Postgres's planner to do
it.
Nikolay: But you need to make sure
it's up to date most of the
time.
Michael: Well, that's not even
enough though because we've had
plenty of episodes where we've
talked about times where the plan's
estimations can be way off.
Nikolay: So yeah, just have 2 columns
and 2 filters and don't
have multi-column statistics, that's
it.
Exactly.
Michael: It's a super easy way to throw it off.
Yeah, or in 2 different tables,
where you can't even do that.
So yeah, we've talked about
the limitations of that before.
But in the real world, and it's probably
a limitation of my experience,
given how popular these estimation
tools seem to be, I've seen
very few cases where
people would rather have an estimated
count than no count at all.
In lots of user facing applications,
I've seen people want an
exact count, or they don't want
to show it.
And there seems to be this...
Nikolay: I cannot agree.
For example, I...
Michael: Where do you see it used?
Nikolay: Well, for example, I had this.
We have a social network, for example,
and we just want to display
the total number of registered
users and the total number of,
I don't know, like groups created
or anything created, posts,
comments on the front page.
And it's just overall number.
Definitely this approach will work
there very well.
And we rounded it.
We don't show the exact number. Well, the
next idea: let's round it, but also,
if we know the current number and the
number 1 week ago, we can understand
the velocity, the speed of growth,
and we can also pretend it's
growing right now.
It's like fake, but this is what
social media do all the time,
right?
Michael: But this is kind of my
point that you wouldn't use the
actual...
So the number, when it's an estimate,
could go down.
Like even if it's the number
of users, in that case, you
wouldn't want to show a number
this week that was lower than
the number last week.
Nikolay: Right.
You're right.
Michael: So you still wouldn't use the
exact number.
So to be more specific, there's
some really cool algorithms.
There's a really popular extension
called HyperLogLog, extension
called Postgres HyperLogLog.
Yeah.
Nikolay: Count distinct.
Michael: Yes, so it's for estimating count
distinct, you're right.
So more specific than just count,
but it's available on pretty
much every, every cloud provider,
which makes me think it must
be pretty popular.
Like it must have been requested
on a bunch of these.
Nikolay: Not necessarily.
Things work in life differently.
It might be just, you know, like,
we don't know how popular it
is overall.
It might be just cool to have it
at some point.
You know, sometimes people have
high expectations.
And also in case of managed Postgres
services, in any areas of
business, any segments of market,
if some big player added it,
others also added it.
Michael: Or even if, let's say
only 1 big customer needs it or
wants it.
If they're big enough, maybe you
add it.
Nikolay: Yeah.
Also might happen.
Yeah.
Michael: So yeah, good point.
But I do think this whole category
of estimated counts is super
interesting.
And that feels like 1 big solution
that you can say, look, your
count is slow, 1 option is estimating.
And I think it's really interesting
intellectually, and I'd love
to hear from people that do use
it in production for real things.
But very often I see people say,
okay, that's cool, but what
are the other options?
What else can I do?
Nikolay: Let's finish with when
to apply this approximate approach.
I think if you need to count rows
in 1 table, it's easy, no filtering,
it's definitely an option.
But you need to make sure autovacuum
is configured to trigger,
like autovacuum_analyze_scale_factor
is configured properly,
so statistics are up to date most
of the time, I would say.
But also, if you have filters,
these filters should be single
column, no joins.
In this case, it's good.
In some cases, it can be not good
as well.
For example, if you need a daily count,
it might be off completely.
Or if your project is already old,
you have many years of data,
daily might also be challenging,
because we know by default Postgres
keeps statistics for only 100 buckets.
Michael: Oh, interesting.
Nikolay: I
Michael: see where you're coming
from.
Nikolay: But we can raise it.
We can raise it for specific tables,
even specific columns.
And this can help as well in this
case.
If we know we do daily or monthly
counts, we can change this
setting and have more precise statistics,
right?
And still use an approximate approach.
Make sense?
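For reference, a sketch of raising the per-column statistics
target (default_statistics_target is 100); the table and column
names are hypothetical:

    ALTER TABLE users ALTER COLUMN created_at SET STATISTICS 1000;
    ANALYZE users;  -- rebuild the histogram with more buckets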
Michael: Yeah.
I think it might even be simpler,
like it might be that most
people don't end up going this
route because they actually find
they can live with the trade-offs
of the other approaches.
Like it feels like a last resort,
estimation.
If you've got a slow count, maybe
you're thinking, well, there are ways
of making it faster, but they
come with certain trade-offs.
So maybe we should discuss those
first and then come back at
the end to say, if you can't live
with those trade-offs, and you still
want to display something or
show something, then you have to estimate.
Nikolay: For me it's vice versa
actually.
Approximate count is super easy
to implement.
I would consider it first.
Would it work well enough in our
case or we exclude it completely?
That's it.
Because it's super easy to implement.
But let's move on.
You already started talking about
HyperLogLog, right?
This is for distinct count and
this extension is indeed available
almost everywhere.
So if you have a count distinct,
why not consider it, right?
Because it might be very good,
fast, and so on.
And Joe Nelson in this old article
explains it.
So what I would do is consider
it in this case.
I would do some benchmarks with
current data volumes, but also
10x, for example, predicting future
growth, with a focus on I/O.
I definitely would look at buffers
numbers in the plans and understand
what about scalability
here.
If we scale 2x, 5x, 10x, will the
I/O numbers still be fine and lead
to acceptable latencies or not?
This is the question, right?
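A rough sketch of the HyperLogLog approach with the postgresql-hll
extension, using hypothetical events and daily_uniques tables;
check the extension docs for the exact functions your version ships:

    CREATE EXTENSION IF NOT EXISTS hll;

    -- one hll "sketch" of user ids per day, rolled up from a raw events table
    CREATE TABLE daily_uniques (
      day   date PRIMARY KEY,
      users hll NOT NULL
    );

    INSERT INTO daily_uniques
    SELECT created_at::date, hll_add_agg(hll_hash_bigint(user_id))
    FROM events
    GROUP BY 1;

    -- approximate distinct users per day
    SELECT day, # users AS approx_distinct_users FROM daily_uniques;

    -- approximate distinct users over a range: union the sketches, then count
    SELECT # hll_union_agg(users)
    FROM daily_uniques
    WHERE day >= date '2024-01-01';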
Michael: But this is, we're still
in estimation, right?
Nikolay: Yeah, but it's more like
it's smarter already, right?
Michael: Yeah, true.
Smarter than just relying on Postgres
statistics.
Right.
Yeah, true.
Nikolay: And next, it's interesting
that we're discussing an article
which is 9 years
old, or how many, 9, right?
Michael: Well, I actually did, as part
of my prep for this.
Conveniently, they provided
code snippets for each step,
and I did a few little
changes, but it was very, very
easy for me to run this on Postgres
17, actually, just because
I was doing some testing.
Nikolay: What changed?
Michael: Well, so the first thing
I noticed was his plans didn't
even have parallel.
Like, bear in mind, we're talking
about doing a count over a
million rows.
These days you just expect to see
the first query plan have a
gather and...
Nikolay: 3 workers by default,
basically.
Michael: Yeah, exactly.
And I didn't.
And I was like, wait, so his blog
post was pre-Postgres having
parallel workers.
And that was the first thing.
Nikolay: And it means roughly it should
be by default 3 times faster,
because the aggregation, the append,
or however it's called, is easy
in this case, it's just a sum of
numbers. That's it, right?
Michael: Yeah, count should parallelize
really well.
Because I don't know his hardware,
and I didn't go back and try
to get a really old version of
Postgres.
Actually, I probably could have
just tested by turning off parallelization.
But yeah, probably about 3 times
faster.
Nikolay: Good.
This is a good point.
Yeah.
Michael: Yeah, that was the first
thing I noticed.
And then there were a couple of
other differences in the nitty
gritty, but nothing like fundamental,
nothing like too big.
So most of the points still stood
really well, even though, as
I said, it was like a lot's changed,
but also not that much has
changed.
So yeah, the other 1 was, in the
past, you had to do a workaround:
you needed to use a subquery
trick for count distinct
to get an index-only scan.
So instead of being able to do
count distinct on a column and get
an index-only scan, you had to
do a select within that, a
select distinct, to get
the index-only scan. You don't
need to anymore.
That was quite cool to see.
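For illustration, the two forms Michael is comparing might look
like this (country is a hypothetical indexed column):

    -- the old workaround: distinct first, then count the subquery
    SELECT count(*) FROM (SELECT DISTINCT country FROM users) AS s;

    -- the direct form, which in his testing now also got an index-only scan
    SELECT count(DISTINCT country) FROM users;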
Nikolay: Yeah, that's a good point
as well.
Michael: But yeah, like a bunch...
So actually, index-only scans, we
should probably talk about that.
Nikolay: Yeah, we just had the
episode about it a month ago.
So it's very related to...
The ideal case for Postgres, in
its row-store situation, is an
index-only scan with low heap fetches,
ideally 0.
Which, again, leads
to the need to configure autovacuum
to make it work more frequently
and move on faster.
In this case, if it's an index-only
scan, this is the best we
can do with Postgres.
A parallel index-only scan, maybe,
as you said.
And this is good.
I mean, this is the best thing
you can see in plans: an index-only
scan with low heap fetches.
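A minimal sketch of checking for that plan shape, with
hypothetical names:

    CREATE INDEX ON users (created_at);

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM users
    WHERE created_at >= now() - interval '1 day';
    -- look for "Index Only Scan" with "Heap Fetches: 0" (or close to it);
    -- frequent autovacuum keeps the visibility map fresh so fetches stay low.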
The only better situation would
mean you need to start denormalization,
caching, and so on.
I would consider it as a heavy
solution to the problem.
Because this would give you a few
buffer operations to retrieve
your count.
If you have it fully denormalized,
maybe, for example, as a materialized
view.
I would not recommend using default
materialized views in complex projects
at all.
But say we have it already,
and we have a proper index on
it.
Finding proper count would be just
finding maybe 1 row, that's
it, 1 value.
It's like primary key lookup, right?
Or maybe just index lookup to find
1 record.
That's it.
It's super easy, but to achieve
that, first of all, it requires
effort.
You probably need triggers, or asynchronous
triggers somehow.
And second of all, if it's not a
materialized view, you need
something that supports partial refresh,
because a default materialized view
supports only full refresh,
which is not efficient in most
cases and can lead to bloat itself.
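A minimal sketch of the default materialized view variant, with
hypothetical names, just to show the full-refresh limitation:

    CREATE MATERIALIZED VIEW daily_signups AS
    SELECT created_at::date AS day, count(*) AS signups
    FROM users
    GROUP BY 1;

    -- a unique index is required for REFRESH ... CONCURRENTLY
    CREATE UNIQUE INDEX ON daily_signups (day);

    -- always recomputes the whole view, even if only today changed
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_signups;

    SELECT signups FROM daily_signups WHERE day = current_date - 1;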
Michael: So I've seen people implement
versions of this and use
off the shelf things that are kind
of in Postgres extensions
for this as well.
So this is actually the approach
I've seen most commonly
used for this, because the cost doesn't
scale with data volume.
I think it's really attractive
because even an index only scan,
if you double the data size, you're
going to roughly double the
time it takes to do the count.
So you're going to keep bumping
into this problem down the line.
Whereas...
Nikolay: Well, yeah, you're right.
If you double...
Yeah, even with the index-only
scan, we still need to...
If we, for example, if we forget
about structure completely,
talk about only leaf nodes, if
we have 2 times more values to
store, we need 2 times more pages,
right?
2 times more leaf nodes.
And it translates to 2 times more
I/O operations.
And of course...
Michael: And even if it's not 2
times, it's still like, it still
degrades over time.
And these counter-caches...
Nikolay: Well, not better than 2
times.
It can be even worse, but not better.
Because if you need to keep references
to 2 times more tuples,
you do need more space in leaf
nodes, right?
2 times more leaf nodes, at least,
maybe more actually, if we
have a rebalancing situation and
again, not compact storage of
tuples and so on.
Michael: Yeah.
So on the index-only scan, actually,
I think it's worth, before
we move on from that, thinking
about something, because I think
it goes back to the beginning
of the conversation.
You talked about rowstore versus
columnstore.
If we have an index only scan on
only a single, relatively small,
like let's say a primary key, for
example, that we can use to
count, it's ordered as well.
So it can help us with count and
count distinct.
It's almost columnstore-like in
a way, because if we only have
to search the index, we're searching
like a single column effectively.
So we're almost getting that columnstore
benefit without some
of the aggregation tricks that
some of these columnstores have.
Nikolay: Leaf nodes are doubly linked in
both directions, so we avoid
traversing the whole tree.
We just start at the beginning of
our range, the first value, and then
we just go scan leaves only, and
that's great.
I mean, you're right.
If it's an index-only scan, we don't
care about other columns.
But an interesting point: if we just
do SELECT count(*), Postgres will probably
choose some index, most likely the primary key,
right?
But if you count some column which
allows NULLs, and we had,
a very long time ago, an episode
about the dangers of NULLs,
this is one of the dangers, because
count, if the value is NULL,
doesn't count it.
Count doesn't count NULLs, right?
This thing might be surprising
in some cases, even for experienced
folks.
I surprise myself with this once
every couple of years at least.
Michael: I see this so often in
people talking from a computer
science perspective.
People asking, like, what's faster,
count(*) or count(1) or
count(ID)?
But I've never in practice seen
any use for anything other than
count(*).
Have you?
Nikolay: Well, in some cases I
specifically count not null values
in some column.
So I put the column there and that's
it.
In many cases I count with additional
filtering.
You know this, it's already not
new, it's like 20 years old,
the FILTER clause.
Like, you say count(*), this
is our overall count.
And then, instead of doing these
very heavy structures in SQL,
like CASE WHEN blah blah, CASE
WHEN ELSE, I don't like them,
you just say count(*) or count(column)
FILTER, and then in parentheses
a WHERE condition.
And this gives you the opportunity
to count many things in 1 go,
right?
You just have
Michael: 1 scan.
Nikolay: Yeah, so it's filtering
on the fly.
If it suits your filter conditions,
then it will increment your
additional counter.
So you have multiple results in
1 select, multiple numbers returned,
right?
That's a great thing.
And in there, sometimes I use a column,
understanding that NULLs
won't be counted in this approach.
For example, I want to understand
how many people filled this value;
again, it's a users table and we have,
for example, a Facebook ID.
It's NULL if Facebook
was not connected. In this
case, you say count(*), meaning the
overall number of users registered
daily, for example, and then you
say count(facebook_id), counting
only those people who connected
their Facebook accounts.
Makes sense, right?
Yeah.
In this case, you don't even need
filter.
So it's kind of just maybe syntax
sugar, right?
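A small sketch of that pattern, reusing the hypothetical users
table with a nullable facebook_id and a hypothetical country column:

    SELECT
      created_at::date AS day,
      count(*)            AS signups,        -- every row
      count(facebook_id)  AS with_facebook,  -- NULLs are not counted
      count(*) FILTER (WHERE country = 'DE') AS de_signups
    FROM users
    GROUP BY 1
    ORDER BY 1;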
Michael: Yeah, I have seen an argument
that * is misleading
because it doesn't mean what it
means in other contexts.
And maybe we'd have been better
off if it was like count and
then parentheses, but nothing in
the parentheses.
Nikolay: Arguable... I actually
agree with this.
Because if we propagate this logic
about nulls, the * should
mean none of the columns are null.
And this is always so.
Michael: Yeah, but all of the columns
are not null.
Nikolay: So let's talk about this
denormalized approach.
I think this is most powerful but
requires effort.
Michael: And well, it depends.
I think there's like some really
simple approaches that have
some big trade-offs and then there's
some more complicated approaches
that have fewer performance
trade-offs, but they're more
complex to set up.
Like, for example, adding
a trigger. Let's say you
know what you want to count, and
maybe you're doing
a bunch of reads but not many writes
to this table, and it's already
quite big.
So you're okay to pay a little
bit of overhead on each write
in order to maintain just, literally,
a counter.
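A minimal sketch of such a trigger-maintained counter (all names
hypothetical); note it funnels every write through one counter
row, which is exactly the hotspot discussed next:

    CREATE TABLE row_counts (
      table_name text PRIMARY KEY,
      cnt        bigint NOT NULL
    );
    INSERT INTO row_counts VALUES ('users', (SELECT count(*) FROM users));

    CREATE FUNCTION users_count_trg() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = 'users';
      ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET cnt = cnt - 1 WHERE table_name = 'users';
      END IF;
      RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$;

    CREATE TRIGGER users_count
    AFTER INSERT OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION users_count_trg();

    SELECT cnt FROM row_counts WHERE table_name = 'users';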
Nikolay: Yeah, I've done it many
times.
But you know what will happen,
right?
Under load.
Okay.
Especially if you have a foreign
key; as I remember, there are
issues because of multixact IDs and
contention.
And imagine you have count, like,
you know this popular anti-pattern
many people do in the beginning
of their career, when they try
to develop some accounting part
of their systems.
Michael: Like debit and credit.
Nikolay: Yeah, yeah.
And they have, how to say, a common
account or something,
counting all the money flow.
And this requires updates to a single
row in some table.
Any transaction, financial transaction,
triggers this update
synchronously.
And this quickly becomes a hotspot,
obviously, because if you
have 2 SQL transactions doing some
financial transactions that, via
the trigger, try to update
a single row, they compete.
I mean, this 1 will be blocked
until the other finishes
with commit or rollback, right?
And this doesn't scale at all.
Michael: Well, I've seen solutions
to that.
I've seen somebody, I think it
might have even been Tobias Petry,
It might have been one of his Twitter
tips.
All you do is you add more.
If you split-
Nikolay: Depends on the table,
right?
Michael: No, no.
So you have, say, 10 counts, keep
it small.
And each write triggers an update to
one of those, but you don't know
which one.
It picks one at random, updates it,
and then when you want to count
everything,
you have to sum all 10.
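A sketch of that split-counter idea (hypothetical names): spread
increments over N rows and sum them on read:

    CREATE TABLE users_count_slots (
      slot int PRIMARY KEY,
      cnt  bigint NOT NULL DEFAULT 0
    );
    INSERT INTO users_count_slots SELECT g, 0 FROM generate_series(0, 9) g;

    -- inside the trigger: bump a random slot instead of one shared row
    UPDATE users_count_slots
    SET cnt = cnt + 1
    WHERE slot = floor(random() * 10)::int;

    -- reading the total means summing all slots
    SELECT sum(cnt) AS users_count FROM users_count_slots;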
Nikolay: Right.
Michael: So yeah, it removes the
hot, like it spreads the
Nikolay: The heat.
Yeah, this is one approach.
Second approach, if you, for example,
have some...
Well, it's not easy, I would say.
And batching is already a good
thing, but it's also like, okay,
you basically reduce contention,
but you don't eliminate it.
Full elimination would require
an asynchronous trigger.
Michael: All I meant is it's quite
simple at low scale.
It only gets complex at higher
scales.
Nikolay: Yeah, I also did it.
Let's increment by 10 all the time,
or by 100, in jumps, but then it
becomes already not real-time;
it's okay.
But ideally, in heavily loaded
systems, it should be asynchronous.
So normal writes don't trigger
anything immediately.
I mean, they don't lead to increments
at all.
They just register an event, something
happened, right?
Or they just are there, and we
will retrieve them with selects,
and that's it.
Yeah, asynchronously we will do the
processing, with batches as well.
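One way this asynchronous batching can look, as a sketch (names
hypothetical, reusing the row_counts table idea from the earlier
sketch): writes only insert tiny delta rows, and a periodic job
folds them into the counter in one statement:

    CREATE TABLE users_count_pending (delta int NOT NULL);

    -- on the write path: cheap insert, no contention on a shared counter row
    INSERT INTO users_count_pending VALUES (1);

    -- periodic job (cron, pg_cron, a worker): drain and apply in one batch
    WITH drained AS (
      DELETE FROM users_count_pending RETURNING delta
    )
    UPDATE row_counts
    SET cnt = cnt + coalesce((SELECT sum(delta) FROM drained), 0)
    WHERE table_name = 'users';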
Michael: Do you know my favorite, the
favorite thing I've seen for this?
Well, other than maybe some
proprietary solutions, it's simple
roll-up tables.
So maintaining a count of older
stuff in bigger batches.
So you might do a yearly count, and
then a monthly count, then
a daily count, and then anything
since that day, you go through
all of the data.
But it means that over time, you
can roll things up into bigger
aggregates.
And I think that, to me, scales
really well.
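A sketch of a simple roll-up table for the daily-signups example
(names hypothetical): history is pre-aggregated, and only the
small live tail is counted on the fly:

    CREATE TABLE signup_rollup_daily (
      day     date PRIMARY KEY,
      signups bigint NOT NULL
    );

    -- nightly job: roll up yesterday
    INSERT INTO signup_rollup_daily
    SELECT current_date - 1, count(*)
    FROM users
    WHERE created_at >= current_date - 1
      AND created_at <  current_date
    ON CONFLICT (day) DO UPDATE SET signups = EXCLUDED.signups;

    -- total = rolled-up history + today's live tail
    SELECT (SELECT coalesce(sum(signups), 0) FROM signup_rollup_daily)
         + (SELECT count(*) FROM users WHERE created_at >= current_date)
      AS total_signups;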
Nikolay: Yes, but the main approach,
the naive, simple approach, let's do
synchronous updates and we will
immediately have a hotspot,
it's bad.
Then, okay, the asynchronous approach,
there are two sub-approaches.
One is polling: like every minute we
poll, look, oh, do we have some new
data? If we have, in one batch
we issue an update, every minute.
It's an okay approach, actually.
And a reduction of it is actually
a regular materialized view,
you just refresh it fully from
time to time.
It's, again, super simple,
relatively.
And it lacks partial updates, this
is probably very annoying.
So a materialized view is a very rough
approach, a regular materialized
view.
If it's pg_ivm, or however it's called,
it should be better.
I haven't tested it myself, I
have to confess.
I had big plans to test Hydra,
this new thing with DuckDB, right?
pg_duckdb or something.
Yeah.
This pg_ivm.
I have so many things on my plate
to test for maybe already not
months but years.
But unfortunately, I must confess,
customers with managed Postgres,
with their own needs, occupy our
time lately most of the time.
Because we try to do things which
don't require extensions that
RDS, CloudSQL, and others don't
have.
I wanted to mention PGQ.
PGQ is present in CloudSQL, right?
It's interesting.
So CloudSQL users can use it and
benefit from it, because it's
a very old and very well battle-tested
solution from Skype.
So imagine we have some events
we can register in PGQ; it
will take care of scalability, because
it has partitioning inside
and so on.
And then we just push events,
push events, and then
consumers of those events
increment the count, but maybe
in batches and so on.
So they issue just 1 update
to increment by some number,
right?
And this improves things a lot
and makes the system more resilient
because lags will be lower.
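A very rough sketch of that flow with PGQ; the function names
below are from the classic Skytools PGQ API (create_queue,
insert_event, and the consumer batch functions), so double-check
them against the version your provider ships:

    -- one-time setup
    SELECT pgq.create_queue('user_events');
    SELECT pgq.register_consumer('user_events', 'user_counter');

    -- producer side (application code or a trigger): just record the event
    SELECT pgq.insert_event('user_events', 'user_created', '{"user_id": 123}');

    -- consumer side, run in a loop or on a schedule:
    --   batch_id := pgq.next_batch('user_events', 'user_counter');
    --   events   := pgq.get_batch_events(batch_id);
    --   apply one UPDATE incrementing the counter by the batch size;
    --   pgq.finish_batch(batch_id);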
It can be not PGQ, but something
like Kafka or, I don't know,
Sidekiq, or what you have, like
Celery, if you're a Python guy,
right?
But this asynchronous nature
is really good, and nobody actually likes
delays like 1 hour, because
you start with 1 minute,
you say, okay, it's acceptable,
but the system grows.
And then you say, okay, we need
to reduce the frequency to once
per 5 minutes now.
But then it grows again, once per
10 minutes now.
But if you have asynchronous push,
like through an asynchronous
message queue or something like
an event processing system, in
this case, you can build a good
asynchronous kind of trigger,
right?
I wish Postgres had something like this
internally, and every managed service
provider supported it, so all our
customers would already have something.
But right now, unfortunately, it
requires effort, and you need
to build with things you use, sometimes
outside Postgres, like
Sidekiq or Kafka or something.
Michael: Yeah, I think we've mentioned
pretty much everything
from, like, vanilla Postgres without
extensions at the moment.
But there are a bunch of other
interesting projects.
I agree.
I think the DuckDB stuff, especially
that cross-company initiative
seems super interesting, but there's
at least a couple of projects
at the moment pushing some of these
analytical queries to a column-oriented
file.
Nikolay: But we need to keep in
mind that this also brings additional
challenges.
Again, this means you need denormalization,
because you have the original
table as a source of truth and you
need to bring data to an extra
table or something and maintain
this relationship all the time,
so it's consistent and not lagging
too much, right?
Michael: Yeah, but I think the
optimizations they add might then
be worth it in a lot of cases.
Like the column store and then
the vectorization, like there
are so many benefits once you pay
that, like if you're willing
to pay the upfront cost.
Nikolay: You can also do more; I
would say it's already a
traditional architecture when you
have an analytical system in addition
to Postgres.
It can be, I don't know, like it
can be even ClickHouse.
It can be Vertica if it's old or
these days Snowflake.
And we know PeerDB was recently
acquired by ClickHouse.
And it means, imagine we have our
OLTP storage with users table,
and then through logical replication,
it goes to ClickHouse,
for example, and there you can
do any counts with good speed,
right?
You can move analytical
workloads, which are
maybe even user-facing, to there,
right?
It's also worth considering, but
it moves us outside of Postgres,
and I don't like it personally.
Michael: To stay kind of within
Postgres, I think the work the
Citus team did was super interesting,
because I think they were
largely used for these large analytical
workloads where they
used the fact they had sharded
to multiple nodes to let you parallelize
some of these queries.
So it gained from some
of that extra parallelization.
And then the other company doing
some interesting stuff in the
space, I think is Timescale, who've
added some optimizations,
added some functions.
But I think...
Nikolay: Continuous aggregates
is a great thing.
Michael: Yes.
And continuous aggregates means
that they've done a lot of that
heavy lifting.
If you have access to Timescale,
if you're self-hosted or on
their cloud, then you don't have
to do all the complicated stuff.
Nikolay: And if you have not only
access to Timescale, but to
proper Timescale, because when
some company says we support Timescale,
but it's only, how is it called,
like community edition?
Michael: Apache.
Yeah, I get confused.
I think, actually, the community
edition is the one that you can't
use; it's the Apache 2 one that you can
use on other clouds.
Nikolay: Yeah, some clouds support
Timescale, but only this reduced
option.
But full-fledged Timescale is a
great thing to have.
Definitely compression plus continuous
aggregates.
We have it in a couple of places,
it's so great.
And this is it, like, it's a flavor
of Postgres, which I do like,
but it's not available if you're
already deep inside RDS or so,
you need to consider migration
and so on, yeah.
Timescale Cloud or self-hosted.
Michael: Exactly.
So, yeah, and then the last thing
to bring us back to the topic
we talked about last week is they
have an implementation of loose
index scan for count distinct.
You can use your own implementation
of loose index scan
to speed up some
of these.
Nikolay: Like in the Wiki, right?
Michael: Yeah, exactly.
Nikolay: With recursive.
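That wiki-style recursive CTE looks roughly like this (country is
a hypothetical column with an index on it):

    -- emulate a loose index scan: jump from one distinct value to the next
    WITH RECURSIVE t AS (
      (SELECT country FROM users ORDER BY country LIMIT 1)
      UNION ALL
      SELECT (SELECT country FROM users
              WHERE country > t.country
              ORDER BY country LIMIT 1)
      FROM t
      WHERE t.country IS NOT NULL
    )
    SELECT count(country) AS distinct_countries FROM t;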
Michael: Yeah, so that's pretty
much everything I had.
Nikolay: Yeah.
So I think we covered the basics.
If you want to deal with billions of rows and
hundreds of thousands of transactions per second,
it's not an easy topic, actually.
Michael: Thanks so much Nikolay.
Catch you next week.