[00:00:00] Michael: Hello, and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.AI. Hey, Nikolay, what are we talking about today?

[00:00:09] Nikolay: Hi Michael. Let's talk about partitioning.

[00:00:12] Michael: Absolutely, table partitioning in Postgres. We've talked about it in a few episodes, but it's great to do an episode completely on this. We've had requests for it and I'm excited to get to it. Where do you want to start? Perhaps with what partitioning is?

[00:00:26] Nikolay: Of course

[00:00:27] Michael: It's about splitting a table into multiple tables. And the beauty of it is that it's transparent to the application.

[00:00:34] Nikolay: Yeah, implicit, right?

[00:00:36] Michael: Yes, exactly.

[00:00:37] Nikolay: We could split it ourselves. We could say, we have table_name, and create table_name_1, table_name_2, but it wouldn't be interesting to rewrite our application. Partitioning does the same, but behind the scenes, right?
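
For illustration, a minimal sketch of what declarative range partitioning looks like. The table and column names here are hypothetical, and the monthly granularity is just an example:

-- Parent table; it stores no data itself, only routes rows to partitions.
CREATE TABLE measurements (
    id         bigserial,
    created_at timestamptz NOT NULL,
    value      double precision,
    PRIMARY KEY (id, created_at)   -- the partition key must be part of the primary key
) PARTITION BY RANGE (created_at);

-- Each partition is a regular table covering one slice of the range.
CREATE TABLE measurements_2024_01 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE measurements_2024_02 PARTITION OF measurements
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- The application keeps writing to the parent; rows are routed automatically.
INSERT INTO measurements (created_at, value) VALUES ('2024-01-15 12:00:00+00', 42);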

[00:00:51] Michael: Yeah, exactly. With lots of nice, cool features, and an ever-improving list of features to go with that. Did you look into the history of it at all? I mean, obviously you've lived the history of it.

[00:01:02] Nikolay: I lived through the history, yeah. I remember I was using inheritance-based partitioning and helped clients implement it. This actually was my first real consulting job in Silicon Valley, for one startup which I actually helped move from Heroku to RDS. They also had issues with large tables, and obviously I said, let's partition them. It was before Postgres 10, which implemented declarative partitioning, so quite long ago, and we did it with inheritance; a lot of PL/pgSQL code was written.

And at that time, the interesting part is that this was exactly the task where I realized that RDS's regular clones are very good for testing. So I had many, many cycles of testing to build a fully automated procedure: move old rows from the non-partitioned table to the partitioned table in batches, and have a view on top of it to provide full transparency for the application.

Everything is possible, but it took a lot of work. Inheritance-based partitioning maybe provides you more flexibility in some places, but it lacks automation. You need to write a lot of code. And of course, if you have clones, it's good, you can test it. But still, it's a lot of work. By that time I think pg_partman already existed, I'm not sure about RDS. But somehow I decided no. Anyway, around 2015, as we also discussed a couple of times, a lot of former or current Oracle users came to the Postgres community and the ecosystem, and they obviously raised the question: we cannot live without declarative partitioning anymore.

It's not good. Let's do it. And declarative partitioning was implemented, in Postgres 10, right? That's where the first version appeared.

But then in each version, a lot of new features related to partitioning were added. It means that if you run quite an old Postgres and you are considering partitioning, or considering using it more often, obviously the first thing I would do is consider a major upgrade to the latest version possible, because it would provide many more features and benefits in this area.

[00:03:24] Michael: Yeah. In fact, you mentioned it right at the beginning of that, but I think it's worth going back to it: the benefit here is that large tables can get difficult to manage. So they

[00:03:37] Nikolay: Yeah.

[00:03:37] Michael: become very...

[00:03:38] Nikolay: Sorry for interrupting. I just realized, since you asked about history: inheritance-based partitioning existed for a long time, and inheritance is a good concept. It's from the era when the object-oriented database concept was popular, late nineties, early 2000s. But declarative partitioning is much easier to use, and now of course I would not use inheritance-based partitioning at all, except in some very rare cases. But let's talk about why we need it at all. What do you think?

[00:04:11] Michael: Yeah, so I bump into different reasons that people want to do this, but the main one I see is maintenance. So easier and more parallelizable maintenance: that includes things like indexing, re-indexing, vacuum, analyze, and probably a bunch of things I'm forgetting. But in addition to that, if you have data that needs to be deleted, so old data or a certain client's data, you might want to partition based on how you want to delete it.

So with partitions we have the ability to detach the partition and drop the entire table, and not have to worry about the deletes and the impact of all of those.

[00:04:59] Nikolay: Or just truncate a particular partition, and that's it, because TRUNCATE is much faster than DELETE and the vacuuming that follows, because DELETE doesn't physically remove anything. It just marks tuples as deleted, and it's a heavy operation if you have millions and millions of rows.
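
To make the "drop instead of delete" idea concrete, a minimal sketch using the hypothetical partition names from above (DETACH ... CONCURRENTLY assumes Postgres 14 or newer):

-- Detach the old partition, then drop it; no massive DELETE, no dead tuples to vacuum.
ALTER TABLE measurements DETACH PARTITION measurements_2024_01 CONCURRENTLY;
DROP TABLE measurements_2024_01;

-- Or simply empty a particular partition in place; space is reclaimed immediately.
TRUNCATE TABLE measurements_2024_02;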

[00:05:16] Michael: Yeah, exactly. Although I think you'd still detach first, right, before truncating? But I'm...

[00:05:21] Nikolay: Well, nobody prevents you from truncating a particular partition directly. Why not? It's a good overview, but I would like to emphasize that most people think, many people, not most, I don't have statistics at hand, but usually people think partitioning is needed for query performance. It is, but not only: there are direct and indirect benefits for query performance if you use partitioning. The direct benefit is, of course, that indexes are smaller and faster. Good. But we all know that B-tree height grows very slowly, so even jumping from 1 million rows to 1 billion rows,

it's just a few extra hops in terms of index traversal. But the indirect benefits are much more interesting, so let's talk about them. You mentioned manageability, maintenance of partitions, exactly. If you need to vacuum a very large table, it will take hours. Sometimes half a day, sometimes almost a day.

I saw it; it very much depends on your system, on the power of your disks and so on. But it can take quite long, and it cannot be parallelized in terms of heap and TOAST table vacuuming, right? It's a single-threaded process and you cannot speed it up, even if you adjusted cost limit and cost delay for autovacuum workers so they go full speed.

Even if you don't care about negative effects, it's still slow because it's a single thread. If you have a partitioned table, with, for example, 10 partitions or a hundred partitions, you can benefit from many, many workers to do much more work faster, right? Then it's a question of your disk limits, saturation risks and so on, but you have flexibility.

You can tune it. So for indexes it's better, but as I remember, autovacuum, for one table, processes all indexes using a single worker; only manual VACUUM can process multiple indexes in parallel in modern Postgres, newer versions, right? So partitioning is much better for vacuuming.

It's straightforward because of parallelization. And I see, due to the defaults, we have only three workers by default usually, and it's so sad to see three workers on systems with 96 cores. I see it all the time. Somehow people think the default autovacuum is good. If you have such big systems, so many vCPUs or cores, consider adjusting the number of workers.

Do it earlier, because it requires a restart, right? Especially if you are thinking about partitioning, these things come together. If you are planning to use partitioning, you definitely need to reconsider your autovacuum tuning settings and have more workers. I usually recommend setting the max number of workers for autovacuum to up to 30% of vCPUs, and also increasing other things. So this is the number one thing, but close to it, you also mentioned index maintenance. Again, building an index on a very large table might take an hour or more, sometimes hours, like eight hours, I saw it. Also, it's not fun at all, because unlike regular vacuum, an index build or rebuild,

concurrently or not concurrently, doesn't matter, holds the xmin horizon.

[00:08:53] Michael: Yeah, good

[00:08:53] Nikolay: We discussed it a couple of times. There was an attempt to fix it in Postgres 14.0, but it was reverted in 14.4. All versions from 14.0 to 14.3 are considered dangerous because they can lead to index corruption. So if you are a Postgres 14 user, you should be at least on Postgres 14.4.

It was a very great optimization, not holding the xmin horizon when you build or rebuild an index, but unfortunately it was reverted. I don't know the plans for when it will be back. But what does it mean, holding the xmin horizon? It means that while you are building or rebuilding your index, there is an xmin horizon.

It means that there is some number, in terms of transaction ID, XID, corresponding to the moment when you started to build the index, and all tuples which became dead after that cannot be deleted by autovacuum or regular VACUUM, because the xmin horizon is in the past.

So while you are building an index on a very large table, vacuuming is less efficient and it cannot delete many dead tuples. If you enabled autovacuum logging, you can see some dead tuples are found but cannot be removed. This is exactly because the xmin horizon is in the past for them. And this is a problem leading to bloat, right?

Higher bloat than we could have.
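
One way to observe this while it is happening, a sketch using pg_stat_activity (a long-running index build will show up with an old backend_xmin):

-- Sessions holding back the xmin horizon, oldest first.
SELECT pid, state, age(backend_xmin) AS xmin_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;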

[00:10:19] Michael: Higher bloat, that can lead... You mentioned second-order effects, or indirect implications. So we have bloat causing slower query performance, but also vacuum getting behind can lead to things like the visibility map not getting updated, and then that can lead to plans no longer being able to do true index-only scans.

So that can impact query performance as well; there are these other effects too. So being able to parallelize vacuum and have more workers doing vacuum on each partition can be a benefit from that perspective too.
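
On the workers point, a sketch of the kind of tuning being described; the numbers are purely illustrative, not a recommendation for any particular system:

-- autovacuum_max_workers requires a restart to change, so do it ahead of partitioning.
ALTER SYSTEM SET autovacuum_max_workers = 16;            -- e.g. on a large many-core box
-- Let the extra workers actually use the capacity.
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();   -- then restart Postgres for the worker-count change to apply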

[00:10:52] Nikolay: Right. So one of the things: there is some rule some people follow, don't allow tables to grow above a hundred gigabytes; if you have a hundred gigabytes, you need partitioning already, right? If we have smaller tables, our autovacuum workers are feeling much better, removing garbage much faster.

Dead tuple garbage collection works more efficiently, and this affects query performance, definitely. It's very hard and tricky to measure. Sometimes people say, well, okay, how bad is that? Well, we can run some experiments and show degradation of some query performance, but it very much depends on the nature of the data and workload.

But I already feel it would be possible to develop some metric, for example, understanding the XID growth rate. Like, you are writing transactions, how many per second, your XID growth rate is this. And you could have some threshold, an SLO, saying we don't want to allow our horizon to be more than this number behind in terms of XIDs, or this number of minutes or seconds behind the current moment.

So we should not allow the xmin horizon to lag that much. From this you can go to the requirements of how big your partitions could be, should be, right? Because there is some math here. It's possible to develop it for a particular system; it's much harder to develop some methodology for an arbitrary system, right?
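
As for the hundred-gigabyte rule of thumb, checking which tables are approaching it is straightforward; a sketch:

-- Largest tables by total size (heap + indexes + TOAST).
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;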

[00:12:33] Michael: Good point. In fact, that probably leads...

[00:12:34] Nikolay: That's why a hundred gigabytes. Just a hundred gigabytes, it's a good rule, right? But we can do better in terms of explaining why, why this number.

[00:12:45] Michael: Yeah, it's a nice, memorable number for sure. While we're on the topic of when, I guess there's also the question of how, and I think...

[00:12:53] Nikolay: One second, one second. We didn't finish with why.

[00:12:56] Michael: okay. Continue.

[00:12:58] Nikolay: Another very big reason is data locality.

[00:13:01] Michael: Yeah. Okay. Yep.

[00:13:02] Nikolay: It's a huge reason, especially for data like time series data or time-decay data, where these rows are new, these are fresh and accessed more frequently, and these rows are quite old, and maybe soon they will not be needed at all.

We will consider pruning them. It can be regular time series, like measurements of temperature or something like metrics, or it can also be social media comments and posts and so on. Fresh data tends to have a usage pattern of being more frequently accessed.

And we would probably like to have fresh records in our shared buffers and the operating system cache and so on, and old records, only a few of them; we could probably allow a little bit higher latency for old data. It depends, of course, but if you think about a very large table, we have everything mixed: any given page can contain both new and old records.

[00:14:02] Michael: It's kind of like a loose clustering. We don't have index-organized tables in Postgres, but it's almost like, it's not an enforced clustering in the strict sense, but it's kind of a loose clustering. If you think about the time series example, things aren't guaranteed to be in order within a partition, but each partition will be kept separate.

[00:14:23] Nikolay: Yeah. So actually, if you, for example, consider indexes: you have an index on a table, on something like a date, a creation date, and you think, okay, it would be good to partition by days, or weeks, or months. You could have all your queries filtering on it somehow, and then, instead of partitioning the table, you could just split the index into 10 or a hundred or a thousand partial indexes, like partitioning at the index level. And it would work well if the only reason for partitioning was query performance, the direct benefit from smaller index size, but as we just discussed, there are bigger reasons than just smaller index size.

Yeah, so you need to split tables, definitely. Maybe one more reason why data locality is very important: consider that you have, I don't know, a lot of HOT updates inside pages and so on. It's not good to mix fresh and old rows in one page all the time, like mixing in a huge table, because you need to allocate more memory. For example, okay, I want all my fresh data to be in the buffer pool, closer to the application. In this case, you need all the pages of this table. But if you start splitting by date somehow, okay, fresh partitions have fresh data, old partitions have old data, and for fresh rows you have much fewer pages.

Right? I'm talking about cache efficiency here. And in this case also, you mentioned all-visible and all-frozen, these bits for each page in the visibility map. It's very good to have old data separated into old partitions, because updates or deletes happen very infrequently there, and inserts also.

And so autovacuum can skip many more pages, keeping them all-frozen, all-visible all the time, right? It's also about vacuum efficiency, but a different kind of vacuum work: not deleting dead tuples, but keeping the visibility map up to date. And this is super beneficial for query performance, because of the visibility map; we can see it for index-only scans as Heap Fetches in the plan. Heap fetches mean vacuum is lagging; the index-only scan can work, but heap fetches mean we need to consult the heap because a page is not marked all-visible, so we need to check it. But if it's an old partition, most of its pages are all-visible or all-frozen. Great. So the index-only scan is better.

How to measure it is also an interesting question. For a particular system, again, it's not trivial but doable; for an arbitrary system, to find some metric for when you need partitioning... So: at a hundred gigabytes, go with partitioning, and more workers for autovacuum. I've been thinking about it for many years, and these reasons are not well discussed, or discussed in a very dry manner. My main reason is indexing, and the second reason is data locality, and only then smaller index size and so on.
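
A rough way to see how well the visibility map covers a table, and therefore how cheap index-only scans can be, is to look at the planner statistics in pg_class; a sketch (the numbers are approximate and only refreshed by vacuum and analyze):

-- Fraction of pages marked all-visible, per table.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relkind = 'r' AND relpages > 0
ORDER BY relpages DESC
LIMIT 10;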

[00:17:49] Michael: Nice. I agree, but I also echo what you said: I see it brought up most commonly when it comes to query performance, and I think that's one of the least good reasons for doing it. But yeah, there are lots of good reasons, and I'm glad they have net positives there as well. There are also, of course, some downsides in terms of query performance. For example, the planner has to do some partition pruning, right?

And with excessive numbers of partitions... well, it is getting better in later versions of Postgres for sure, but at the thousands-of-partitions level, you can start to see some issues.

[00:18:26] Nikolay: Planning time increases a lot. Let me again, like, sorry, echo is a good word, I have an echo in my mind. I need to finalize my thought about why I put indexing and re-indexing in first place in terms of reasons for partitioning. In heavily loaded systems, we need index maintenance.

Always; it should be properly planned. Without index maintenance, you will have degradation over time, a hundred percent. It's not only about how Postgres MVCC is organized; in other systems indexes also need to be rebuilt quite often. Well, since Postgres 14 less often, due to a lot of B-tree optimizations like deduplication in 13 and 14, but still, index maintenance is a must-have, like a day-two DBA operation.

And this means, if you need to rebuild indexes and you have huge tables, you will have bloat just because you are doing this quite slowly, holding the xmin horizon. So index maintenance, we have had an episode about it, right? It means: do index maintenance, but also have fast autovacuum processing.

It can be done with partitioning. Index maintenance, vacuum, and partitioning: these three components should play together. This is why I put it in first place. I think it's an underestimated reason, not well discussed in the Postgres ecosystem, in blogs and so on.

[00:19:52] Michael: I think something we probably have to get to is how, like people might be wondering how to pick a partition key, and I think that's interesting, right? Like in terms of...

Have you seen it done badly, or what do you generally see people doing?

[00:20:08] Nikolay: There are some pains in Postgres partitioning. One of the pains is that the primary key and all unique keys, unique indexes, must include the partitioning key. I already mentioned the time when I implemented partitioning using inheritance, on that system migrated from Heroku. I remember the moment when I said, oh, we need to index this. We had some partitioning key, I don't even remember, was it range or list or hash? I think not hash, but probably range, I don't remember. And I said we need to index it as well, indexing it exactly as described in the partitions, with this expression, for example, the date extracted from a timestamp.

And I was told, I'm kidding, it's already partitioned by this, we don't need it, right? The Postgres planner already knows how to find the proper partition, and indexing on this value would be the same. So partitioning works like some kind of index as well: to locate the proper partition in the plan, we just use these conditions and that's it.

So you don't need to have an index on it, but at the same time, the primary key and unique keys must have it. And this is annoying. For example, you have a surrogate key, like an ID, some integer or UUID or ULID, any version, five, six, I don't know, anything. Now you consider your huge table to be partitioned by range using a timestamp.

For example, created_at. Now you need to redefine your primary key to have both the ID column and created_at. And this is super annoying, because now you think, oh, what about foreign keys which reference this primary key? They also need to be rebuilt. When you redefine a primary key, it's also not an easy job.
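
A sketch of that primary key redefinition on the original, not-yet-partitioned table, with hypothetical names; in a real zero-downtime migration the unique index is built concurrently first, and foreign keys referencing the old key need their own plan:

-- The index backing the new primary key must include the future partition key.
CREATE UNIQUE INDEX CONCURRENTLY orders_pkey_new ON orders (id, created_at);

-- Swap the constraint over to the new index (brief lock; referencing foreign keys
-- must be handled separately before the old constraint can be dropped).
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX orders_pkey_new;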

You need to think about whether you need to do it with zero downtime. Also, funny thing, I had a recent experience with partitioning and I realized: if you have a multi-terabyte table and your management tells you that you can allocate up to one hour of downtime, it's an interesting situation, because you tell them back, I need either 12 hours of downtime or zero. If we have 12 hours of downtime, I will design my changes in a blocking manner, not an online manner, so ALTER TABLE, CREATE INDEX, because CREATE INDEX is roughly two times faster than CREATE INDEX CONCURRENTLY. But if you tell me I have only one hour, it's nothing for me. I will go with online operations, and I just need more time to develop and test.

So this is the funny thing: you cannot be in the middle between a huge chunk of downtime and zero; one hour is not good. Okay, zero downtime means we need to do some preparations, and one of the preparations is to redefine the primary key and unique keys, all unique keys, you might have multiple ones, to include the partitioning key.

You asked actually a different thing: how to find a proper partitioning key. It's a tricky, very tricky question. And also granularity. How come we're discussing time-based partitioning and didn't mention Timescale,

right? Because if.

[00:23:24] Michael: Well, we haven't discussed how you create partitions. I think one of the main benefits of the way Timescale does things is it's automatic, exactly; everything's done for you. So we've already talked about partitioning being transparent to the user, but Timescale will almost make it transparent to you as the developer as well.

It's so good.

[00:23:43] Nikolay: If it's there for you to use.

[00:23:45] Michael: yeah,

[00:23:45] Nikolay: And also compressing old partitions, with like 50x compression, it's super cool, and it works with regular Postgres SELECTs. But if you are on RDS, you cannot use Timescale, because AWS and the Timescale company don't have a contract for that.

[00:24:01] Michael: Well, but pg_partman is like a... there are, uh...

[00:24:05] Nikolay: Okay, but sorry, it doesn't provide you compression, and it doesn't automate things in many cases. For example, I'm not sure about RDS, but on Google Cloud SQL you can read: okay, we have pg_partman, but we don't have the scheduler, this background worker or something. So automation should be handled by you; you need

to use pg_cron or some other cron jobs or something. So it's not full automation, it's just some functions, and I could write such functions myself with my own logic.
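
Where both extensions are available, the usual pattern is roughly the following; this is a sketch that assumes pg_partman and pg_cron are installed, and the job name and schedule are illustrative:

-- pg_partman's maintenance function pre-creates future partitions and applies retention;
-- scheduling it with pg_cron fills the "no built-in scheduler" gap described above.
SELECT cron.schedule('partman-maintenance', '0 * * * *',
                     $$SELECT partman.run_maintenance()$$);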

[00:24:35] Michael: But this is a big thing that can go wrong with partitioning, right? When I've seen blog posts that mention big issues and they've mentioned partitioning, it tends to be that a new partition failed to be created. And naturally, that's a downtime issue. Let's say it's time-based and you get to the new month, or whatever you're partitioning by, and that partition doesn't exist: you can't accept inserts all of a sudden.

[00:25:01] Nikolay: It's better to create partitions a little bit in advance, right? And to have some code that checks everything, and some monitoring in place. So definitely, declarative partitioning hides a lot of details from the user and automates many things, but still a lot of work needs to be done.

A lot of work: for example, exactly, partition creation, or detaching and destroying partitions. You also need to do it yourself, or use Timescale if you can afford it, if you have such a preference.
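
Pre-creating the next period is a single statement, so the check-and-create logic can live in whatever scheduled job you already run; a sketch with the hypothetical table from earlier, plus a default partition as a safety net:

-- Created ahead of time, so inserts for the new month never hit a missing partition.
CREATE TABLE IF NOT EXISTS measurements_2024_03
    PARTITION OF measurements
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Optional catch-all: rows outside every defined range land here instead of failing
-- (worth alerting on, since they usually indicate a missing partition).
CREATE TABLE IF NOT EXISTS measurements_default
    PARTITION OF measurements DEFAULT;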

[00:25:34] Michael: I was going to raise something else. I was speaking to somebody recently who's considering partitioning, getting to the point where they think it could be useful, and I've thought of a different way of thinking about your partition key, and that might be how you would like to delete data.

Eventually, if you want to. The time-based stuff's really good, because a lot of people will want to delete old data eventually, but that might not be the way you want to delete data in the future. For example, for this person, customer-based might make sense, because when a customer cancels they want to delete all of that data, and at that point it might make sense to think about it by customer instead of by time.

So it's really interesting in terms of locality as well, right? If you're running a SaaS app, chances are, even if you're doing larger queries, they're probably one account at a time. Nobody's looking at data across two accounts; it's all within a single account.

So depending on how you want to access your data, but I think crucially, depending on how you might want to delete the data, might be a really interesting way of thinking about that partition key.

[00:26:41] Nikolay: Yeah. Also, I know cases, but it was from sharding experience. But it's very close: partitioning can be considered like local sharding, to some extent. I know cases when people chose the sharding or partitioning key and implemented the mechanism in such a way that, for example... it was a huge system, first of all, a very big one.

And they chose a mechanism to move accounts which are not actively used to more archive-like partitions, to let them be evicted from the caches and so on. But those which are quite active were located in hot partitions and present in the working set of the database.

It's also possible, it's the same as you described, but not at the customer level, rather batches of customers, like subsets of customers, and all the data associated with an account is moved to some inactive partition. Once it becomes active, it transparently, automatically moves to a different partition.

It's quite an interesting approach, and if it's done in a non-blocking manner, in the background, it's interesting. I don't fully understand those guys because they live at a much bigger scale. It was Yandex

and their mail system, and it was about sharding, as I remember.

They lived like a year or so with Oracle plus Postgres, both sharded. And I was asking, have you already migrated to Postgres? And they said, no, we use both. Let me check you, what's your email? We will check. And they told me, oh, you are still on Oracle. Let me press a button. They pressed a button, and in one minute,

you are already on Postgres. If something is wrong in testing, we can move back. And I love this approach, because, for example, if you think about partitioning, you can implement it, it requires some time. But it's so cool if you can move transparently, migrate in batches, move back and forth, and have real testing.

New schema, old schema: you moved 10% of your data, you continue testing it. If something is wrong, within one hour you can move back and everything is on the old schema. This is a very solid, mature approach, right? But it requires engineering, of course. And yeah,

[00:29:08] Michael: I'm thinking about the I/O overhead of moving people backwards and forwards and backwards again. But of course, for the one direction it makes a lot of sense.

[00:29:16] Nikolay: Imagine partitions have many clients, but you can move one single client. It's interesting.

You can do it in batches. It's flexible. Something interesting was engineered there, and they tested it over many months, checking all parameters, with a lot of nodes involved. But with partitioning it's also possible. When I implement partitioning,

I consider this kind of transparent move in batches. Of course, everything should be online. But back to the key: it's a hard question. Things to consider: first of all, data locality, definitely. Sometimes people, understanding that they have time-decay or time series data, still choose list-based partitioning.

They create, for example, table partitions with IDs and list-based partitioning, to have better management of what is happening with partitions and so on, even though they could do it with range partitioning using time. So it depends.
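
For the customer-oriented case mentioned a bit earlier, list partitioning looks roughly like this; the tenant-based schema is hypothetical:

CREATE TABLE events (
    tenant_id  int         NOT NULL,
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (tenant_id, id)        -- includes the partition key
) PARTITION BY LIST (tenant_id);

-- One partition can hold a single big customer or a batch of smaller ones.
CREATE TABLE events_tenant_42   PARTITION OF events FOR VALUES IN (42);
CREATE TABLE events_small_batch PARTITION OF events FOR VALUES IN (1, 2, 3, 4, 5);

-- When customer 42 cancels, their data disappears in one cheap operation.
DROP TABLE events_tenant_42;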

[00:30:12] Michael: Does that help in terms of balancing, like keeping the partitions roughly the same size? Let's say you're a growing startup; each month you have more data. Yeah. So you can...

[00:30:21] Nikolay: But also, the number one question should be your workload. You need to analyze your workload and understand it: if some query doesn't have the partitioning key in the WHERE clause, for example, it means that it will probably need to scan all partitions. In some cases it's fine, but you should test and understand it, right?

Well, if it's some slow query anyway, but it benefits from some indexes, from an index-only scan, and we can accept the latency, like 10 seconds, maybe it's fine. But what will happen in a couple of years? So anyway, with partitioning, most often you need to adjust the application.

We say it's transparent, but to benefit from it, you probably need to adjust many queries so they can definitely benefit from it. For example, if you have a partitioning key, it should be present in most queries, right? And check the plans and so on.
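
Checking whether a given query actually prunes is just a matter of reading the plan; a sketch with the hypothetical measurements table:

-- With the partitioning key in the WHERE clause, the plan should touch one partition...
EXPLAIN (COSTS OFF)
SELECT count(*) FROM measurements
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';

-- ...without it, every partition gets scanned.
EXPLAIN (COSTS OFF)
SELECT count(*) FROM measurements WHERE value > 100;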

[00:31:15] Michael: Yeah, if not all, but yeah. That's cool.

[00:31:19] Nikolay: Mm-hmm.

And then granularity, right? So day partitions, or week partitions, or month partitions. If you check Timescale, in many cases they have very small chunks, they call them chunks, and many of them. But of course it depends on your Postgres version. You need to check what will happen if you have a thousand partitions.

If you go with, for example, day partitioning, in three years you will have roughly a thousand partitions, right? So what will happen with planning time? But if you prune old ones, maybe you will not reach a thousand partitions, right? But in some cases, on modern Postgres... I'm actually not sure, all production systems are lagging, so I don't have a good example of a large system running Postgres 15 with partitioning that I can touch.

So I've just learned that many improvements happened, but I haven't tested many of them yet. But planning time is maybe a concern, definitely.
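
Two quick checks here: how many partitions a table already has, and what planning time looks like for a typical query; a sketch, again with the hypothetical table (EXPLAIN ANALYZE reports both planning and execution time):

-- Number of direct partitions of the parent table.
SELECT count(*) AS partitions
FROM pg_inherits
WHERE inhparent = 'measurements'::regclass;

-- "Planning Time" in the output grows with the number of partitions the planner must consider.
EXPLAIN ANALYZE
SELECT * FROM measurements
WHERE created_at >= '2024-02-15' AND created_at < '2024-02-16';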

[00:32:15] Michael: I've seen slides and a couple of talks recently from, I think, Christophe Pettus, about running Postgres at scale, and I think the numbers he mentioned were in this region: in the past, it was around under a thousand partitions that would really cause you issues, but nowadays 25,000 seems to be a threshold that has been mentioned.

But yeah, I, again,

I don't have direct

[00:32:38] Nikolay: We have a trade-off here. We want to be below the magic number, a hundred gigabytes, for each partition, right? At the same time, we don't want the number of partitions to be so high that it affects planning time badly. So we need to test our system and decide where our golden middle ground is.

[00:32:56] Michael: And this is where the next step comes in, right? You've talked in the past about hundred-gigabyte partitions, and how big should it get before you should be thinking about sharding?

[00:33:05] Nikolay: Terabyte.

[00:33:06] Michael: Yeah. Okay.

[00:33:07] Nikolay: Well, we can also live with a 10-terabyte table partitioned into a hundred, well, probably a thousand partitions, and run it on one node, on a quite modern machine, and it should be very, very good. It's doable; definitely 10 terabytes is not a problem for modern Postgres and modern hardware. But if you have 50 terabytes, you should split it in one way or another.

But that's another topic, not partitioning anymore. Well, we didn't touch many, many, many topics, but I hope we delivered some interesting stuff today. So my number one reason is indexing and index maintenance, vacuuming as well, and data locality is the second one.

But the number one advice, as usual: experiment on clones. Check the plans. Experiment, experiment, experiment.

[00:33:57] Michael: Wonderful. Thank you, Nikolay. Thank you everybody, and see you...

[00:34:00] Nikolay: Thank you, Michael. Thank you, Michael. Bye-bye.

[00:34:02] Michael: Bye.
