[00:00:00] Michael: Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.AI. Hey, Nikolay, what are we talking about?

[00:00:10] Nikolay: Hi Michael. Your choice; I cannot even name it, so please help me.

[00:00:15] Michael: Yeah, let me read out the request. This one's based on a listener suggestion, so thank you so much for that. They were asking about special considerations for databases that are used in a read-only mode all day and then get an update at night with additional data. They suggested the focus therefore should be on performance during the read-only times.

So yeah, I think this is fascinating. I've only seen it a couple of times myself.

[00:00:45] Nikolay: Coronavirus data, right? In Great Britain. You mentioned this example, so we reload data nightly, for example, and then people just use the frozen state of data.

[00:00:56] Michael: I remember reading about that one. I'm not actually sure that strictly counts as one of these cases. I'm not sure, for example, if they take any downtime or any time between updates. I'm not even sure if this person is assuming we have downtime or not.

[00:01:11] Nikolay: At the very beginning, let's agree we don't consider any downtime as acceptable for us.

[00:01:16] Michael: Interesting. I like it.

[00:01:18] Nikolay: I just quit. If someone suggests downtime, I just quit; I don't work with them anymore at all.

[00:01:24] Michael: It makes this more interesting.

[00:01:26] Nikolay: Right? Let's say we don't allow downtime, but all data changes happen during a specific time period of the day, or

[00:01:35] Michael: Yep.

[00:01:35] Nikolay: periodically. And then we have a frozen state of data, and we want our read-only queries to be as fast as possible. This is our goal.

[00:01:46] Michael: Sounds perfect.

So yeah, kind of read-only. That's how I'm thinking about it. I know it's not strictly read-only, but that's how I'm thinking.

[00:01:53] Nikolay: I think we can split this problem into two parts. First, how to make read-only queries as fast as possible; second, how to refresh, update, or change data within the specified period of time, so that we again have the fastest possible read-only queries, right?

[00:02:14] Michael: Perfect. I have my notes organized exactly the same way.

[00:02:16] Nikolay: Oh, okay. Good, good.

[00:02:18] Michael: So which would you like to start with? I'm thinking the reads.

[00:02:21] Nikolay: Yeah.

[00:02:22] Michael: I don't know about you, but I thought about this from the perspective of: why do writes (inserts, updates, and deletes) cause problems?

And therefore, what problems don't we have, and what can we do differently? So do you want to try to think of what the highest-impact things are first, or just start listing things that we can do?

[00:02:45] Nikolay: Let's do it in random order and then try to summarize and find the most impactful ideas.

[00:02:53] Michael: Sounds perfect. Being performance focused myself, naturally the first thing I went to was indexes. One of the biggest downsides of creating lots of indexes in general is write overhead. So one of the things I was thinking was maybe the dial shifts a little bit:

We can afford a few more indexes than we would in a heavily updated system. That felt like maybe a slight shift: when people are thinking about their natural instincts, maybe you need to go against those a little bit, or change the balance there a little bit.

[00:03:34] Nikolay: So you basically propose to index all the columns.

[00:03:40] Michael: It crossed my mind that there are a few things deep in the weeds of indexes that might also be interesting. Firstly, maybe more indexes make sense. Secondly, maybe more multi-column indexes make sense, for the index-only scans we don't otherwise end up getting. I think you mentioned this previously, but the Heap Fetches number, right?

[00:04:02] Nikolay: Right, but these are two different things. First, let me comment here. In my practice I rarely see (I see it, but not frequently at all) people trying to minimize the number of indexes. Most often I just come and see a lot of unused, redundant indexes.

We had an episode about it. This is what people usually do: they already create more indexes than needed. But I agree with you, definitely we could afford more indexes if we know we are not going to run updates and writes during the daytime; I just don't think that in practice this is what stops someone from creating another index.

Sometimes it's the opposite and people try to avoid indexes at all, we just have a single index. It happens, but in many cases people just create a lot of indexes anyway. So this I agree with, but it's hard to measure. And I think the second item you mentioned is probably the most interesting one: making index-only scans run faster thanks to Heap Fetches being zero.

If you see Heap Fetches: 0 in the plan, it means that you are good. But before explaining this, we should probably remind those who are not Postgres experts how MVCC works, in a few words, right? So every UPDATE, every DELETE just marks your tuple as dead, because some transactions might still need it.

Then autovacuum cleans those tuples, marks index entries as dead as well, and cleans them too. And if we have a lot of deletes and updates (not just inserts), we probably start to have bloat issues in both tables and indexes, right? And before even considering index-only scans, we should probably think about the more fundamental problems: bloat and dead tuples.

So what do you think: if we say that before our read-only period starts, we would like to vacuum everything and probably get rid of bloat, we repack everything. If we have a lot of time, this would be ideal. We get rid of any bloat, and we know that all dead tuples are already deleted, so there is no work for autovacuum in terms of regular vacuuming, right?

[00:06:25] Michael: Yes, I think that's a really big one, actually. Because we have to keep it online, I think repacking makes sense. Before vacuum, right? There's no point in doing a vacuum and then repacking everything. Or is there? Am I missing something?

[00:06:37] Nikolay: Let's distinguish repacking indexes and repacking tables. For indexes it's like REINDEX, and the state will be fresh. Repacking tables is like a VACUUM FULL recreation, almost. Basically, repacking means you perform something like dump/restore or VACUUM FULL on the whole database, if you repack everything, and it's good.

We have a fresh state, we don't have dead tuples, and there is nothing to do for autovacuum or vacuum, except just one thing: the state of visibility maps, right?

[00:07:11] Michael: Well, will they change during... I was thinking of suggesting (potentially I've missed something) that I could imagine this being one of the few exceptions to the rule of not disabling autovacuum. I wondered about running our own vacuum at the end, just before the read-only period starts,

[00:07:32] Nikolay: to build visibility maps. Exactly.

[00:07:35] Michael: To update it, yes, exactly, as a final step. I was actually thinking VACUUM ANALYZE, because we might have added a lot of data. In fact, we've jumped ahead to the second topic, the bulk loading, but it's important for the index-only scan optimization. So the final step I was thinking would be a manual, well, not manual, but an automated VACUUM ANALYZE, not via autovacuum.

[00:08:01] Nikolay: Explicit vacuum, right.

[00:08:02] Michael: Explicit, that's a better word for it. Yeah.

[00:08:04] Nikolay: Which is also good in newer Postgres versions, because it can process indexes in parallel. Autovacuum still cannot do that: it always processes a table and all its indexes using just one worker. If you use explicit VACUUM, you can do it in parallel.
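
To illustrate, a minimal sketch of that final explicit step; the table name is hypothetical, and the PARALLEL option requires Postgres 13 or newer:

```sql
-- Explicit vacuum with parallel index processing; autovacuum never uses
-- parallel workers, but a manual VACUUM can.
VACUUM (ANALYZE, VERBOSE, PARALLEL 4) big_readonly_table;
```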

[00:08:21] Michael: Yeah. And the thing that ties this back to query performance is that that final vacuum marks pages as all visible,

[00:08:28] Nikolay: all visible

[00:08:29] Michael: Exactly. It updates the visibility map, so that when Postgres is planning queries, and when it's doing index-only scans...

Because we are not getting any new data, any updates, or any deletes during that period, we can't have any of those pages marked as modified again.
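
A quick sanity check of visibility-map coverage after that final vacuum, sketched with real pg_class columns (the ordering and LIMIT are arbitrary):

```sql
-- Fraction of pages marked all-visible per table; values close to 1.0 mean
-- index-only scans can skip heap visibility checks almost entirely.
SELECT relname,
       relpages,
       relallvisible,
       round(relallvisible::numeric / greatest(relpages, 1), 4) AS visible_frac
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 20;
```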

[00:08:46] Nikolay: Right. So the first thing I would like to have is zero bloat (almost zero, maybe) and zero dead tuples. If we could afford downtime, we could do it with VACUUM FULL, but we agreed to exclude this option, so we should run pg_repack. And for indexes, well, all situations are different, right?

We might see that we don't have enough time to do everything. In that case I would probably take care of indexes first of all, and accept some bloat and dead tuples; well, dead tuples maybe not, some bloat maybe yes. And also, if we have partitioning, we could just play with partitions, some rotation, to insert new data into a new partition and just get rid of an old partition: drop it and that's it. It depends, but in general I would like to see zero bloat, zero dead tuples, and visibility maps updated. And in some cases (I don't know the particular situation) maybe some writes are still happening, right? And autovacuum does multiple things; it has multiple goals.

Getting rid of dead tuples is one goal; keeping the visibility map updated is another. And the visibility map has two bits for each page, right? Not only all-visible, but also all-frozen. And who knows, maybe some writes will eventually trigger vacuum to run to prevent transaction ID wraparound. So it would probably be good to do VACUUM ANALYZE FREEZE,

to set the all-frozen bit for each page in the visibility map. And finally, it collects stats; that's the ANALYZE part of our explicit vacuum, so we have fresh statistics in pg_statistic, right? All these things are good. Maybe one more, as dessert, let's add one more thing on top, like a cherry on our cake.

pg_repack supports reordering the heap: we can define a specific physical order of tuples in the table. We can choose a specific order, but just one, of course. Some queries will be super fast because they will deal with fewer buffers, because they will be getting data in the proper order. Fewer buffers touched, so they will be super fast, and we can afford it if we repack.
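
A sketch of both steps, with hypothetical table and index names. Native CLUSTER rewrites the heap in index order but takes an ACCESS EXCLUSIVE lock, which is exactly why pg_repack's online ordered repack (its order-by option) is preferred when downtime isn't allowed:

```sql
-- Rewrite the table in index order (blocking; pg_repack does the
-- equivalent online).
CLUSTER orders USING orders_created_at_idx;

-- Then set all-visible and all-frozen bits and refresh planner statistics.
VACUUM (FREEZE, ANALYZE, VERBOSE) orders;
```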

[00:10:59] Michael: Yeah, absolutely. If you're familiar with other databases, this is what people often call index-organized tables, and it makes a lot of sense. The examples I've seen of these are often quite time-series-heavy use cases, so there's a chance your data will already be ordered.

If it is, if you've been able to insert it in order, then inserting your new data overnight (or whenever you do it) in an ordered fashion is also a way of potentially getting that clustering without having to do a very heavy, long process. So yeah, inserting it and making sure it ends up in order is another way to get there.

[00:11:38] Nikolay: Right. And since we don't have deletes or inserts during the day, we know it won't change. So for this frozen state it's a perfect situation.

[00:11:49] Michael: Yeah.

[00:11:49] Nikolay: This reminds us of BRIN indexes. We also had an episode about them, right? Okay.

[00:11:53] Michael: Well, I wanted to bring up BRIN indexes because I think this might be a benefit that we have here. I know we've talked about BRIN indexes and some of the improvements that have come in; was it Postgres 15 or 14?

[00:12:06] Nikolay: 14 maybe.

[00:12:07] Michael: Yeah.

But one of the downsides of BRIN indexes is that they degrade as data gets updated.

[00:12:14] Nikolay: In our case that doesn't matter. But also, they are slower than B-tree, so probably we don't need them, because we don't need to improve update performance. So I would still choose B-tree again,

[00:12:26] Michael: Well, it depends, right? They are also much, much smaller.

[00:12:31] Nikolay: Don't they spare our caches, right?

[00:12:34] Michael: Exactly. If we have a huge data set and RAM's expensive, right? There might be some...

[00:12:41] Nikolay: Yeah, several trade-offs here. Hard to say.

[00:12:45] Michael: Right. All I meant is that in general, day-to-day thinking, maybe you don't think of BRIN indexes that much, but if you're in this case, maybe you can think of them a little bit more as something that's an option for you.
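
A sketch for such a case, with a hypothetical time-series table; BRIN pays off when physical row order correlates with the indexed column, which is exactly what an ordered nightly load provides:

```sql
-- A BRIN index stores one summary per range of pages, so it is usually
-- orders of magnitude smaller than a B-tree on the same column.
CREATE INDEX measurements_ts_brin ON measurements
    USING brin (recorded_at) WITH (pages_per_range = 64);

-- Compare its size with the B-tree alternative:
SELECT pg_size_pretty(pg_relation_size('measurements_ts_brin'));
```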

The other index type that I wanted to bring up along those lines is another one that's expensive to update, which is GIN indexes. Because we don't have these updates... well, I wasn't considering that it would have to be online the whole time.

So eventually we are going to have to update it, or drop and recreate it.

[00:13:15] Nikolay: It's collecting a pending list and so on. We can definitely tune our GIN indexes to benefit from the read-only situation. It's a good point. So, yeah.

[00:13:26] Michael: You make a good point that we could tune it. Well, the overhead would be overnight, right? It would be while we're doing our updates, not during the day. So read performance is still good.

[00:13:35] Nikolay: Right. Yeah, definitely we can benefit from the read-only state.

If I remember correctly (and I quickly checked right now), it's the fastupdate option. The thing is that we want everything to be already in place, so we don't want additional internal lookups for SELECTs. We can say the speed of SELECTs is the most important thing for us.

Updates can be slow, it's fine. So we make the proper choices when we create the GIN index or rebuild it, and that's it.

[00:14:07] Michael: Yep.

[00:14:07] Nikolay: So fastupdate is off.

[00:14:09] Michael: Yeah, my understanding is that fastupdate won't make all updates slow, but it'll make some updates extremely slow.

[00:14:17] Nikolay: Right, right. When the pending list limit is reached; by default, if I remember, it's four megabytes. I had issues with it in some cases. It's not fun.
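
A sketch of the read-optimized GIN setup; the table and column are hypothetical, while fastupdate and gin_clean_pending_list are real GIN features:

```sql
-- With fastupdate off, every entry goes straight into the main GIN structure:
-- writes pay the cost up front, and SELECTs never scan a pending list.
CREATE INDEX docs_body_gin ON docs
    USING gin (to_tsvector('english', body)) WITH (fastupdate = off);

-- For an existing index, turn the pending list off; entries already queued
-- are merged by the next VACUUM, or immediately via gin_clean_pending_list.
ALTER INDEX docs_body_gin SET (fastupdate = off);
SELECT gin_clean_pending_list('docs_body_gin');
```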

[00:14:30] Michael: We have some blog posts on this. In fact, there's one that I was going to mention earlier by Haki Benita that talks about loading data in a sorted fashion, and there's a good one by, I think, Lukas Fittl talking about the GIN updates. So I'll link up both of those.

[00:14:43] Nikolay: But that's still too fine-grained tuning to me. What we discussed: the order of tuples, a fully vacuumed state, up-to-date visibility maps, zero risk that autovacuum starts working at some point; we don't want it to be working during our daytime at all. It's perfect.

We discussed visibility maps: all pages are marked all-visible, and it means that our index-only scans will be as fast as possible, because they will have Heap Fetches: 0 in plans. When we look at plans and see Heap Fetches non-zero,

it means the lookup is already not truly index-only: the heap was inspected to check visibility. But if our visibility map is up to date and no writes happen to the table, Heap Fetches is zero, and index-only scans will be super fast. In this case I would check my queries and rewrite them to use index-only scans everywhere I can.

Sometimes you need an index-only scan to find rows quickly, and then you fetch the rest separately, for example using WITH ... AS MATERIALIZED, so a materialized CTE as an optimization fence, or some other way: you just want to find rows using index-only scans, and then you can read all the columns you need for them and so on.

So I would check all the plans and ensure that I use index-only scans as much as possible and that Heap Fetches is indeed zero. This is super important, and, spoiler: I consider this probably the number one thing in our overall discussion. Heap Fetches: 0 and index-only scans.
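
A sketch of both checks; all table, column, and index names are hypothetical, and the two-step query assumes a suitable index so the CTE stays index-only:

```sql
-- Look for "Index Only Scan" and "Heap Fetches: 0" in the output
-- (assumes an index on (user_id, created_at)).
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, created_at
FROM events
WHERE user_id = 42;

-- Two-step pattern: find rows via an index-only scan in a fenced
-- (MATERIALIZED) CTE, then fetch the remaining columns.
WITH hits AS MATERIALIZED (
    SELECT id
    FROM events
    WHERE user_id = 42 AND created_at >= '2024-01-01'
)
SELECT e.*
FROM events e
JOIN hits USING (id);
```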

[00:16:26] Michael: Yeah, awesome. There are a couple of other things in terms of queries that you can do to favor index-only scans, like just checking that you actually need all the columns being requested. Especially if you're using an ORM or something, that can easily lead to all columns being requested when actually only a few are needed.

[00:16:46] Nikolay: This is what I meant, yeah. Or, if you do need these columns, okay: INCLUDE them, to have a covering index and have everything in the index.

[00:16:56] Michael: Yeah. Or even just a straight-up multi-column index.

[00:17:00] Nikolay: So I would fight hard to have a single index-only scan in the most critical queries. In this case I know this is the best performance Postgres can give me, right?
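
A sketch of the covering-index idea (names are hypothetical; INCLUDE requires Postgres 11 or newer):

```sql
-- Filter/sort columns go into the key; columns that are only read go into
-- INCLUDE, keeping the key small while the scan stays index-only.
CREATE INDEX orders_customer_covering
    ON orders (customer_id, created_at)
    INCLUDE (status, total_amount);
```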

[00:17:13] Michael: So here's a related topic, and I've got two more for you. This one is materialized views, or pre-aggregation in general.

[00:17:25] Nikolay: Ah, yeah.

[00:17:26] Michael: Well, they don't have to be materialized views as such. It could just be that we aggregate before even loading the data.

[00:17:35] Nikolay: In a broader sense,

[00:17:35] Michael: Yeah, exactly. The data isn't

[00:17:38] Nikolay: with index-only scans

[00:17:39] Michael: going to change, exactly. So it helps with index-only scans, right? Because if you have data in two tables and you create a materialized view (we talked about this before), you can now index on columns across multiple tables.

[00:17:53] Nikolay: Yeah. I know you wanted to get to the low-level DBA stuff, like backups and so on; let's do it slightly later. This is a great topic. As one of my old customers told me once: I love materialized views, but they feel like a huge hammer for jewelry work, so sometimes they don't work for fine jobs.

So imagine you created a lot of materialized views, prepared them, created indexes on them, and have index-only scans. But then my question is: what's the state of your buffer pool and file cache? Is it good? Because you've started to keep the same data multiple times, many copies of the same records.

Basically you have a denormalized situation, and you probably need more memory because you have more pages. It's a kind of bloat as well, because you could do much better if you avoided materialized views, for example. Maybe; it depends. There is no single answer.

[00:19:02] Michael: I think it depends a little bit on the number of similar queries that are hitting you. If you have a dashboard with the same things being asked of it over and over again, that's one thing. If you allow people to set all different filters, and the data can be sliced in any way they like, completely custom, maybe there's...

[00:19:23] Nikolay: Indexes as well, huh?

[00:19:26] Michael: Oh yeah, but what I mean is: if you don't allow that much customization, maybe you materialize everything and only load that data.

[00:19:34] Nikolay: So in some cases I probably would choose materialized views, but I would think twice, maybe even three or four times: do I need them? Because what I would do is try to understand my working set during the day: how many gigabytes or terabytes, how much memory I really need.

And I would probably use the pg_buffercache extension to inspect the current state of our shared buffers and understand which indexes are currently loaded and how much of them, and the same for tables. And from there, also check buffer pool efficiency from pg_stat_database and monitoring.

Everyone should have it. I would ensure that the cache hit ratio is more than 99%, so at least two nines we should have there. In this case it's good: okay, we have space, the buffer pool is big enough, and we have room for materialized views too, right?
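
A sketch of both inspections; it requires the pg_buffercache extension, and the 99% figure is the rule of thumb from the discussion:

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Which relations occupy shared buffers right now, and how much:
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;

-- Buffer cache hit ratio for the current database (aim for > 0.99):
SELECT datname,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```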

[00:20:34] Michael: Well, yeah. The main reason I wanted to bring it up was not that I thought it was necessarily a great idea; it's more that we don't have one of the big downsides in this use case. We don't have the fact that the data goes stale, so it just opens up the possibility that they might be a better idea here than they would be generally.

[00:20:51] Nikolay: Mm-hmm. But it is super important to understand the content of the buffer pool, and probably the page cache, and to understand how many pages are going to be evicted because there isn't enough space in the buffer pool, because when pages need to be evicted we have contention issues. For example, in Postgres 9.5 and 9.6 a lot of work was done specifically for select-only, read-only workloads.

And I remember an excellent small post from Alexander Korotkov, when he was working, by the way, with people from Percona and so on, towards 1 million TPS on one machine. It was long ago.

[00:21:30] Michael: We've shared it before. I'll link it up.

[00:21:31] Nikolay: Yeah, it's great, because it shows exactly how we can run pgbench with select-only workloads, prepared statements and so on. It shows that contention is possible not only when we change data, right? It's possible in a read-only state as well. And I would try to minimize it and keep our working set as stable as possible. In our preferable, ideal situation we have a lot of memory.

Our database is very small; memory is enough. Well, there will be a different question about cost optimization from our non-technical people, probably financial people and so on. But in this case we are good; we can afford a couple more materialized views and so on, right? But in reality we usually don't have enough memory.

The database is quite big and we try to optimize. In this case I would try to keep as much of our working set as possible in memory, and materialized views are our enemies here, right? They increase the demand for memory.

[00:22:33] Michael: Yeah, I had one more topic that I wanted to run past you on the query performance side, again thinking about the downsides of a high churn of our data, and that's replication. Can we afford more replicas, or more geo-distributed replicas, to get data closer to end users in this workload where, during the day, we don't have to worry about lag at all?

[00:23:00] Nikolay: You know what I like about sharding? If we have shards and each shard has one or two standby nodes, it's so much better compared to situations where we have 10 standby nodes and spend not only disk space but also a lot of RAM to store the same data many, many times just to scale reads.

It's so inefficient when you need many, many standby nodes. You spend a lot of money on what is, again, too-redundant storage; and not just storage, I mean memory too. So I don't like to have a lot of standby nodes, but sometimes we do need it.

And again, in the context of materialized views: since we're supposed to use physical replication, we will need to store them on all nodes and pay for the memory many times. So, you see, I don't actually like matviews, right?

[00:23:54] Michael: I've sensed a pattern. But what about the idea of reducing latency globally? Some of these use cases are within a single country, like the UK Covid dashboard; probably most of their traffic was coming from the UK.

[00:24:09] Nikolay: Ah, I understand you. You're probably trying to say: if we have a read-only state during the daytime, our replication lag is small, and we can bring servers closer to users. It's a perfect situation: even if the latency, the distance, is big, we don't care, because the data is already there.

That's a good point. Almost like edge computing, right?

[00:24:34] Michael: Yeah, exactly. I know we are again paying for RAM in multiple places, CPU in multiple places, but it feels like maybe that's a trade-off you'd be more willing to take, because you don't have the normal issues there.

[00:24:46] Nikolay: Yeah, so we could use... remember this project which runs Postgres right in the browser,

[00:24:51] Michael: Oh, right. Yeah.

[00:24:53] Nikolay: using WebAssembly and a virtual machine inside it. So it's...

[00:24:57] Michael: Or even something like YugabyteDB, right?

[00:24:59] Nikolay: Yeah, so it's a valid point and definitely worth considering, bringing data closer in the read-only case. For example, I remember AWS Aurora and its global database idea, read-only clusters and so on; in this context it makes more sense. But what did you want to tell me about backups?

[00:25:18] Michael: Well, do we need them? We can have a daily one, for example, as part of that nightly process. Do we need anything more than that?

[00:25:27] Nikolay: So, I would keep them. But there is a setting, I don't remember it off the top of my head... archive_timeout, maybe. The setting says: even if a WAL file (16 MB by default) is not filled yet, archive it on this timeout. In this case, of course, if we know there are not many writes and we don't need a very small recovery window, we should probably make it less frequent: one hour, for example, right?

So we archive WAL less frequently. But it's worth remembering that some SELECTs can lead to writes: they can set hint bits and write some WAL and so on. But if we vacuumed everything and propagated everything to replicas, with wal_log_hints, so hint bits on standby nodes also have everything from the primary, we should be good, and indeed we can archive less frequently. But I think it's a small question; it's just a matter of space occupied by backups. I don't think we will notice a performance overhead from archive_command working. It's not like checkpoints.
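
A sketch of that change; the one-hour value is the example from the discussion. Note that archive_timeout applies on reload, while wal_log_hints needs a server restart:

```sql
-- Switch (and archive) a WAL segment at most once per hour when few
-- writes are happening.
ALTER SYSTEM SET archive_timeout = '1h';
SELECT pg_reload_conf();

-- WAL-log hint-bit changes so standby pages match the primary;
-- this one takes effect only after a restart.
ALTER SYSTEM SET wal_log_hints = on;
```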

[00:26:37] Michael: Well, should we move on then? What about checkpoints? What would you do with those?

[00:26:42] Nikolay: Well, checkpoints: I think if writes are not happening at all, we don't care; we can keep default settings.

[00:26:48] Michael: Every five minutes, or

[00:26:50] Nikolay: I don't care. If all buffers are clean, no dirty buffers at all, or almost none, it won't be noticeable. It becomes noticeable only when writes are happening.

And if you remember the checkpoint tuning episode, we discussed a lot how to tune for heavy writes.

[00:27:08] Michael: I was thinking we could therefore make it a lot less frequent, but it doesn't matter. Yeah, that's a really good point.

[00:27:14] Nikolay: Exactly. If we checkpoint with very few dirty buffers, it'll be very fast, and that's it. So that's checkpoints; and autovacuum we already discussed. If we do the vacuuming, and even the freezing, ourselves, then in this case autovacuum won't trigger. That's it.

[00:27:32] Michael: Right. What about the overnight process? Should we get to that?

[00:27:35] Nikolay: Yeah, this is where things start to become less good-looking, because we probably don't have a lot of time to vacuum and repack our whole database. I would want to see the details of the process. In this case we need to make decisions involving trade-offs, weighing two or sometimes multiple choices and choosing the best one.

For example, we might say: okay, we don't have time to rebuild and repack our table using some clustering, or we do it infrequently, or we probably don't even touch some indexes. It depends on how narrow our window to apply changes is. But I would definitely consider partitioning here, because in this case we have more control and data locality and so on, and more control also in terms of how much work will be needed for vacuum.

Because if you have a huge table, you just know: okay, we update 10% of our huge table, which is 10 terabytes, every night. 10% means one terabyte; probably it's too much for a few hours. But anyway, we don't know which pages our updates will go to, right? They're distributed who knows how, and in this case we don't have control.

But if it's a partitioned table (we discussed this very recently), we insert new data into a new partition, or deal with fresh partitions. We have old partitions already in an almost frozen state; we touch them very rarely. In this case vacuum works much faster, and for rebuilding, we don't need to rebuild everything.

We need to rebuild only particular indexes and tables. So partitioning is our friend here, unlike materialized views. I'm joking; maybe materialized views are not that bad, but the approach Postgres currently has is outdated. We live in a time when we need much more powerful materialized views already, and there are projects trying to solve this, but I wish we already had it everywhere.

I mean, inside Postgres.

[00:29:45] Michael: In terms of that, you mentioned having a deadline, a certain compressed period of time where we need to do this. Maybe we don't have that...

[00:29:52] Nikolay: A couple of hours, for example, is our window. Mm-hmm.

[00:29:55] Michael: So it's an interesting point, because some of the things I was thinking of, I'm not sure they apply anymore. We've said we need to think about zero downtime, but could we do blue-green deployments, so we could have one cluster live...

[00:30:11] Nikolay: Or benefit from branches, maybe branches,

[00:30:14] Michael: Yeah, whatever you want to call it, where you flip the connection, so we could have been loading all this data in.

[00:30:21] Nikolay: Well, partitioning can provide this already for some cases. You just create a new partition, and you can switch to reading from it once you've filled it. Why not? It's almost blue-green.

[00:30:36] Michael: Yeah, interesting. I hadn't thought of it that way.

[00:30:38] Nikolay: It's like in PgQ: three partitions. One is being used right now, another we are working on, and the third one we used recently and will be processing. So, a rotation of partitions. This is from the queuing-in-Postgres approach used by Skype in PgQ, and we could do a similar thing here. It's similar to blue-green or branching.

But if you have branching, for example with Neon, you can use branches; or if you install Database Lab Engine on production, if you're okay with using ZFS there, or you can implement it with hardware as well. In this case you can have branches. But we can just have some rotation here, right? Why not? But what I also wanted to say: here I would tune checkpoints.

So one of the trade-offs we want to make here: we probably want to generate less WAL. Having less frequent checkpoints would be good: the distance between checkpoints increases, and fewer full-page writes happen.

But the price we pay here is a longer restore time after a crash, right? So I would increase max_wal_size and checkpoint_timeout a lot: checkpoint_timeout like 30 minutes, max_wal_size like a hundred gigabytes (checking free disk space, definitely), maybe even more. And we know that if we crash, we will be in recovery for like 10 or 15 minutes.

It's okay.

[00:32:05] Michael: Well, and it doesn't matter, right? Because we are already...

[00:32:10] Nikolay: It depends on the case, I don't know. Maybe it matters: if people continue using it and we are down for 15 minutes, maybe it's not acceptable. But if we can afford this risk, if we say, okay, a restoration time of 15 minutes is fine, then in this case we say checkpoint_timeout 30 minutes, max_wal_size a hundred gigs.

Again, I'm giving some arbitrary numbers; they should be tested for the particular situation. But then we produce fewer WAL records, and our write-intensive performance is better. Then, other choices: maybe I should drop indexes before I load. Yeah.
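
A sketch of the bulk-load window settings; the values are the arbitrary examples from the discussion, and both parameters apply on reload, no restart needed:

```sql
-- Before the nightly load: stretch checkpoints to cut full-page writes.
ALTER SYSTEM SET max_wal_size = '100GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
SELECT pg_reload_conf();

-- After the load: return to the previous, more conservative values.
ALTER SYSTEM RESET max_wal_size;
ALTER SYSTEM RESET checkpoint_timeout;
SELECT pg_reload_conf();
```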

[00:32:50] Michael: Yeah, so this is what I was thinking about with blue-green: we probably can't drop indexes before we insert,

[00:32:57] Nikolay: If it's a new partition, why not? We can create them after we've inserted.

[00:33:02] Michael: Yeah, that's the blue-green idea. But doesn't the partitioning have the same issue for you as the materialized views, though? In terms of buffer cache?

[00:33:11] Nikolay: This is new data, right?

[00:33:12] Michael: Sorry, so, yes. So you're saying put just the new data into...

[00:33:17] Nikolay: A partition doesn't have that issue. A materialized view is not an original source of data: it relies on tables and just copies the same data, transformed somehow, as a derivative, right?

[00:33:27] Michael: I completely misunderstood how you meant you were using partitioning. You meant basically a new partition per day. Makes sense.

[00:33:34] Nikolay: For example,

[00:33:35] Michael: Yeah.

[00:33:36] Nikolay: Or we can have, say, seven partitions, and when Monday starts we reuse the old Monday partition, for example, or have some rotation; it depends on the particular case. We can design something, or just use Timescale, and that's it. But the idea is that partitioning is super beneficial because, again, for vacuum, we have data locality.

Data is stored in a more compact way, less sparsely. So probably we don't even need to apply clustering, or reordering (it's called cluster, but it's reordering, right? And pg_repack has this option to cluster data). Maybe we don't even need it, because we know all the fresh data is in one partition already.

With daily partitions, vacuum is faster and indexes are under control. We don't need to rebuild the whole index; we rebuild just the index on our partition. All the others are already good: visibility maps are good, index health is good for all partitions. If tomorrow we'll still be using yesterday's partition, we keep it, but it's in a kind of frozen state.

So it's quite good; we definitely have quite a lot of stuff to use to optimize this case. And max_wal_size and checkpoint_timeout can be changed without a restart. So: when we prepare for bulk updates, do the bulk updates without indexes, or with as few indexes as possible, and create the indexes afterwards; adjust checkpoint_timeout and max_wal_size to move faster, to produce less WAL, and to put less pressure on backup and replication systems, and on disk, of course.

[00:35:17] Michael: Maybe drop constraints before doing the inserts?

[00:35:21] Nikolay: Maybe, but it depends again. If we do see some overhead from them, or from triggers, yeah, it's worth considering; but creating them also takes time, right?

[00:35:32] Michael: Yep, it's worth testing both, right? Test with and without.

[00:35:36] Nikolay: If it's a new partition, in this blue-green-like approach, then when we create the new partition nobody is reading from it yet, so we can lock it and create indexes without CONCURRENTLY.

[00:35:47] Michael: True.

[00:35:49] Nikolay: And constraints as well. We, we know it's already big, like okay, we have like 10 gigabytes partition, but we know nobody's using it yet, so we can move faster and regular. Create indexes like roughly two times faster when create index concurrently. And same with partitions. We don't need not valid and when valid two phase creation, we can just create blocking this petition, not taking into account others at all. Before we open the gates to it,

[00:36:17] Michael: One last idea from my side. If we can't afford to do all the maintenance every night, we could stagger it, right? We don't have to do the same every night: we could do a section of it on Mondays, a section on Tuesdays, a section on Wednesdays...

[00:36:32] Nikolay: Huh, yeah. Interesting. Or maybe we can also avoid duplicated actions and do some things less often. So instead of changing something every day, we change it once per week, although everything else is changing, like daily partitions and so on.

But something is changed only once per week, just to avoid... I don't know, it's a bit of a fantasy.

[00:36:58] Michael: Well, I guess it depends on the use case, right? But all I meant is, in terms of maintenance, if you get rid of bloat once per week, it doesn't mean you have to do a lot of work once per week. It could be that you do a little bit of work each night.

[00:37:10] Nikolay: Yeah. For example, we repack some table only once per week.

And then we know we've accumulated some changes, some bloat and so on, the ordering has degraded, and we process it in a heavier way just once per week. It's also an optimization, considering the trade-offs we have. But in an ideal world, we have enough time to insert our data and then say: okay,

repack everything, all tables, all indexes, with clustering, and then run vacuum (without FULL): VACUUM ANALYZE VERBOSE FREEZE on the whole database.

[00:37:50] Michael: Nice.

[00:37:51] Nikolay: And in the end we have a pretty ideal state, and that's it. And I would return max_wal_size to normal, just in case, not keeping it very, very big. Though maybe this dynamic play with max_wal_size and checkpoint_timeout is not really needed; maybe we can keep them quite large all the time.

[00:38:09] Michael: Brilliant. Did you have anything else?

[00:38:12] Nikolay: I don't think so.

I think it's enough. We've basically brainstormed and designed some product here.

[00:38:20] Michael: Yeah, I hope it was helpful for whoever requested this; it was definitely interesting for us to think it through.

[00:38:26] Nikolay: Maybe one last thought. All I explained comes from my experience, but I've never seen a case where I could afford to deal with the whole database like that. Usually only part of the database behaves in this pattern. So an additional question would be: if only some tables behave like this, what to do? There, things become much more complex, because the trade-offs get harder and so on.

[00:38:57] Michael: I'd say that becomes much more normal, though. I think this one's interesting, and I'd be really interested to hear from anybody who maintains a system like this: if there's anything we've forgotten or missed that works well in this situation, it'd be great to hear.

[00:39:12] Nikolay: Coronavirus database in Great Britain, for example. Right.

[00:39:15] Michael: Yeah.

[00:39:16] Nikolay: I think a lot of government data in many countries is quite static and gets refreshed periodically, so this pattern should be popular.

[00:39:25] Michael: Some huge data sets get handled like this: geographic ones, spatial ones. There are all sorts I've seen that only get a refresh once per day.

[00:39:34] Nikolay: Yeah, good. Okay, thank you for the idea. And thanks to our listener who gave us this idea.

[00:39:44] Michael: Yeah. And thank you, Nikolay. Thanks everyone for listening. See you next week.

[00:39:48] Nikolay: Thank you. Bye-bye.
