[00:00:00] Michael: Hello and welcome to PostgresFM, a weekly show about all things Postgres. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we going to talk about today?

[00:00:11] Nikolay: About parallelism, parallelism.

[00:00:14] Michael: Are you trying to say it multiple times at the same time?

[00:00:16] Nikolay: I did it sequentially.

[00:00:18] Michael: Nice.

[00:00:19] Nikolay: Do you know that parallelism works well with sequential operations? This is where it shines. Like, if something needs to be done sequentially, that's where parallelism can be applied.

[00:00:32] Michael: it makes sense. I guess

because

[00:00:33] Nikolay: it's like Right?

[00:00:35] Michael: I see what you mean. Like a dichotomy, almost. It feels like it shouldn't be the case, but it is very...

[00:00:40] Nikolay: It's not my thought, by the way. It's from Thomas Munro, who contributed a lot to the parallelism of various operations.

[00:00:47] Michael: This is one of those topics where we have had multiple requests. Not in parallel, I guess, but yeah. So people are keen to know how parallel query works, specifically the pros and cons. I think talking about parallelism overall makes sense as well, like a bit of an overview, so I'm glad you added those bits in. So yeah, looking forward to this one. Do you want to start with what it is, or a bit of history?

[00:01:09] Nikolay: Yeah, let's start with history. I must admit, I'm not a big expert on this topic. Of course I know many things, but I'm far from being an expert, and I don't see the bottom in some places, if you get what I mean. Like, in some places I see I need to dive deeper to see the bottom.

[00:01:27] Michael: Well, I think from previous conversations we've had that sometimes it's a sign that you are an expert if you realize quite how much there is that you still don't know.

[00:01:36] Nikolay: But this is one of the topics where I feel my weakness, so, like, disclaimer.

[00:01:42] Michael: Sure. Yeah. And I've only looked into this from... I've obviously done a bit of reading for this episode, but previously just looking into performance of queries in general, having to read around things, having to write some things for it as well. So yeah, mine is only surface level compared to the people that have written this.

But I think a lot of people don't even necessarily realize that Postgres can run queries in parallel, that it can do certain...

[00:02:05] Nikolay: It does, not just can. It does it all the time. And usually my experience is closer to OLTP, and it makes sense, because Postgres is the best OLTP database system. But some people, of course, do some heavy analytical kinds of queries. And of course, I remember how it first appeared in 9.6,

in 2016. Easy to remember: 9.6, 2016, right? And so on. But you need to remember that 9.7 became 10, so 2017; it was not enabled by default in 2016, in 9.6.

And I remember there were fears that in an OLTP context it's not a good idea to have it enabled, because it's more for analytical queries.

Let's enable it dynamically, for particular queries only, because of course it has some overhead.

[00:02:59] Michael: Yep.

[00:03:00] Nikolay: So for OLTP, we want only single-core operations. But eventually... right now we live with it enabled by default, and we see benefits and so on.

Of course, sometimes applications produce suboptimal queries, even sometimes having sequential scans instead of index scans, and in this case parallelism works well, improving the speed and so on. Unlike, for example, JIT, which we usually recommend disabling for OLTP,

this we usually recommend enabling, and it's interesting.
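
Since disabling JIT for OLTP comes up here, this is roughly what it looks like; a sketch only, and whether it helps depends on your workload, so measure first:

    SET jit = off;                 -- current session only, good for testing
    ALTER SYSTEM SET jit = off;    -- cluster-wide; then reload the config
    SELECT pg_reload_conf();       -- applies without a restart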

[00:03:37] Michael: I think it's a similar conversation, and the difference is that the defaults are better for the parallelism settings. The JIT one, basically, I don't think the cost estimations are quite right. I think maybe we would be able to leave that one on more if it had a higher cost threshold or some costing improvements.

But I think the parallelism costings are a bit better, and it tends to not cause as...

[00:04:04] Nikolay: What about planning time? Planning time is also affected, right?

[00:04:09] Michael: Yeah, good point. I'm not sure about that. I haven't seen many parallel queries that also explode planning time, so I don't think it's as bad as things like the number of joins. So yeah, I haven't seen it cause excessive planning time issues.

I don't know if you have.

[00:04:23] Nikolay: The reason I'm talking about this is that we definitely now live in a world where JIT probably should be disabled for OLTP cases, but parallelism should be enabled, sometimes tuned, and we see it works well. Every release we have improvements, and it works well, and it makes total sense to me.

By the way, we need to mention that we are not going to talk about parallel execution on multiple nodes, like, I don't know, Greenplum, Citus, all those systems. We talk only about a single node, Postgres inside one machine.

[00:04:56] Michael: Yep. Core Postgres.

[00:04:57] Nikolay: This is

important to distinguish, right? And also to frame what we have: we have it only for read operations, plus at a different level for some other operations, like vacuum (only manual, explicit vacuum) and pg_restore and so on. But for queries, it's only for reads. We still don't have it for writes, though, for example, parallel COPY would make total sense. It's still under development.

[00:05:25] Michael: Yeah, we do have it for similar things, like CREATE TABLE AS and CREATE INDEX, CREATE MATERIALIZED VIEW,

REFRESH MATERIALIZED VIEW.

[00:05:32] Nikolay: Oh, those are also writes,

yeah.

We can consider them DDL queries, but for DML queries it's only for reads, only for SELECTs.

[00:05:42] Michael: Yeah.

[00:05:43] Nikolay: SELECT queries. And this is since 2016, at first non-default; the next year it already became default, in version 10, the former 9.7.

Then it was developed better and better; every year we have a bunch of improvements.

[00:06:00] Michael: Yeah, I think it's interesting; there are a couple of things about those first few versions. In 9.6, when it was released, it was very limited in scope, but still useful for people. I think it only supported sequential scans and a couple of join operations. And it was, as you say, off by default, and that was controlled by a very important setting.

If we're going to be talking about this: max_parallel_workers_per_gather was set to zero. So that meant each query couldn't use any extra workers. Then in version 10, a couple of big improvements. One was that more scan types were added: index scans, index-only scans as well for B-tree indexes, and bitmap

heap scans as well; the heap part of those could be parallelized. Huge, huge improvements. And max_parallel_workers_per_gather was bumped to two, which meant we could get three processes by default on a query that supported parallelism. So, for one that suited it, we could get up to three cores working on our query at the same time.
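
For reference, these are the settings discussed here; the values in the comments reflect the defaults mentioned in the episode and the documentation:

    SHOW max_parallel_workers_per_gather;  -- 0 in 9.6, 2 since version 10
    SHOW max_parallel_workers;             -- pool shared by all queries (default 8)
    SHOW max_worker_processes;             -- overall cap (default 8); changing it requires a restart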

[00:07:06] Nikolay: Right, and let's maybe mention in this context... I remember in 2016, 2017, we played with some selects, trying to understand: is it worth having it enabled by default, even in 9.6? And I remember examples where we saw very clearly that the overhead brings some penalty, so it's better not to enable it. But then somehow I lost track.

And then we already live in a world where it's enabled everywhere and we see it's beneficial. And if we check examples, I think it's very doable to find examples where you disable parallelism for some query and you see that it's better in terms of, by the way, both time and buffers, maybe. Right?

Buffers? Well, eventually time. Time is the most important for the end user. Well, the user might care about buffers, but only those which produce the final result that goes to the user, because if it's like a terabyte to download, it's not good, right? But buffers here are also important, and the overhead, I think, exists in terms of both.

So it's good to check both, analyze, and make decisions based on that. But let's maybe mention why parallelism is important at all, philosophically and theoretically and so on.

[00:08:23] Michael: I have some... I think I know where you're going with this, but I have some others. Yeah, let's do the obvious ones first.

[00:08:28] Nikolay: We have a lot of CPUs. This is key.

[00:08:33] Michael: And a growing number. In the modern world it's quite common to have a lot of CPUs available, because people often scale up their database size, and especially in the cloud that comes with lots of CPUs.

[00:08:46] Nikolay: The key problem is that CPU development, processor development, went in a direction where one core cannot be improved as before. Like, we had this, how is it called, the law...

[00:09:01] Michael: Moore's law, I think.

[00:09:02] Nikolay: Right, right, right. So obviously the makers of CPUs, Intel, AMD, decided to

bet on the multi-core approach. And we now easily have hundreds of cores in the cloud. It's relatively cheap, especially on some kind of spot instances you can play with. And this means that applications need to be adjusted, because you cannot just say: okay, now we want to benefit from having multiple cores. Unfortunately, the application needs to be changed to benefit from it.

It cannot be done transparently in most cases. Sometimes it can be done in some trivial cases, when work can be parallelized somehow. And, well, again, it's a different topic, but some algorithms can be parallelized. I remember even some languages existed for this, like extensions to C and so on.

It's from my university days, 20 years ago; I've already forgotten everything. But obviously, to make Postgres benefit from multiple cores, it needed to be heavily reworked. By the way, I remember also in 9.6 a lot was done to reduce contention in shared buffers. That's also important; a lot of areas needed to be improved to benefit from multiple cores.

And of course, if you have a few users but many, many more cores, like you have only 10 sessions but 96 cores, okay, and three autovacuum workers by default, it sounds like most of the cores will be underutilized, and that's not good, right? This is the key: the development of modern hardware dictates how to build software, right?

[00:10:46] Michael: There's an opportunity there, right? Especially for things that are very easy to parallelize: by giving it access to three times the workers, we are not going to get a three-times improvement, because, as you said, there is some overhead, but we could get close. It's not uncommon, especially once you bump that number up, to get close to the number of workers

times the benefit. So there's huge potential. Not as much as in indexing, you know; whenever you see a blog post that says "how we sped up our database a thousand x in one easy step", that's never going to be parallelism. So that's...

[00:11:24] Nikolay: Well, unless you switch from one CPU to a thousand CPUs, scaling linearly, which can hardly be imagined. But yeah, interesting example: if you dump pgbench... you have the pgbench tables, you do some benchmark, you created a one-terabyte database, then obviously

all of that one terabyte will go to the pgbench_accounts table; the other tables are initially empty. And then you say: okay, I'm going to test the -j option of pg_dump, because pg_dump allows you to benefit from multiple cores. And you say: okay, I have 96 cores, so I expect... okay, not a hundred x, but at least 20x, I would already be almost happy.

Right? And you start dumping and see no benefit at all. Why? Because -j can only help when you have many tables. If all your database is a single table and the other tables are empty, it cannot parallelize it.

[00:12:29] Michael: But is that a practical limitation, because the work hasn't been done, or is it a theoretical limitation? Like, is there some...

[00:12:35] Nikolay: It's just how parallelization of pg_dump is implemented. It's implemented at the table level, so it cannot parallelize a single table, unless it's partitioned.

[00:12:46] Michael: Yeah, that makes sense. But then it's not really one table anymore. But yeah, it's a nice

link back to that.

[00:12:51] Nikolay: And a couple of years ago a special option for pgbench was added. So now, when you initialize a pgbench database, you can say you want pgbench_accounts partitioned, and then you can speed up your dump. And this is an interesting example showing that there are many places where parallelization can be tricky.

[00:13:13] Michael: Yeah. While we're talking about the parallel query side of it, it's obviously worth giving a shout-out to the people that did this. It's a huge amount of work to get it in at all, make it safe, make it perform. And then it improved a lot in versions 10 and 11, and it kept improving.

There were smaller performance improvements in 13, 14, even 15 recently. So it's something that went in and has continued to get better, which is normally a sign that people are using it and want it to be better. I'd say another reason it got added, this is like the same reason, but a second-order effect, was that this was around the time, I think, that people

were starting to think about alternatives to Oracle and SQL Server. So I think a lot of people migrating were probably struggling. This is a theory based on who was working on the feature and which consultancies they were working for. But I don't think this was a passion project for people who were excited about working on this.

I think it came from a real need. Exactly.

[00:14:17] Nikolay: Right. And let me finish with the reasons dictated by hardware. It's not only about CPU. In many articles I see people mention CPU, like, we have many cores, let's use them, but it's not only that. And of course it's very important that Postgres is still process-based, not thread-based.

That's also important, and this is how Postgres works: for parallelism it needs to run multiple processes and then find a way to communicate between them. But I want to highlight, to emphasize, a very important point. It's not only about CPUs; of course it's about CPUs and multiple cores and so on, but it's also about disks. This is often overlooked. When people explain why we need parallelism, they forget to mention that now we are on SSDs, and to get the maximum out of an SSD you need to run multiple threads or, I mean, multiple processes. You cannot get the full speed of an SSD if you read in a single thread.

You need like 16 threads to saturate disk I/O. For example, fio is a very good tool to benchmark disks; if you use a single thread in the benchmark, you will never reach saturation. You need like 8 or 16 threads. And this means that to go full speed, especially if you have heavy queries, you definitely need multiple threads or multiple processes, and communication between processes.

So disks, in my opinion, are very important.

[00:15:55] Michael: So here's a question: couldn't that be an argument for having more connections, like allowing more users, for example, rather than...

[00:16:05] Nikolay: It's both, of course. Well, if you have a lot of small queries in parallel, of course, that is a different area. This is what I mentioned about 9.6: there was a big improvement to reduce contention in how work with the buffer pool is organized.

I remember great work in a blog post from Alexander Korotkov: towards 1 million TPS on a single node. It was interesting that it was done around the same time when MySQL did it as well, and they

basically worked together. Right. That's interesting. But it's not related to the processing of a single query.

It's related to multi-user, multi-session processing. But parallelism for a single session is needed when, again, you have many cores and much fewer active sessions, and this is actually normal, much fewer. If right now you have more active sessions than cores, you're not in a good position already.

I mean, usually we have some room. If we don't have this room, if we have 96 cores and more than 96 active sessions, I suspect we saturated the CPU already, or most likely the CPU. It's not a good position. But usually we have some room, and why not use that room to speed up one query and reduce contention as well?

It'll be executed faster and we will have the average active session number going slightly down. It's good.

[00:17:39] Michael: Yeah, I was going to ask... so we've got a good note in the docs. I think, actually, the Postgres docs on parallelism are great; they go into far more detail than I was expecting when I first looked it up. I think they really do explain it well. A note that they make is that this isn't just about slow queries.

This isn't just about querying lots of data; it's querying lots of data and returning few of those rows. It's not as likely to make a big difference to your queries if you're having to return all the rows anyway, because the overhead of sending that data over the network is likely to dominate the time.

But if we're doing

aggregations,

[00:18:17] Nikolay: Well,

[00:18:17] Michael: Sure. But the big benefit here is in kind of aggregate queries, or heavy queries that are returning some summary of that information, or only returning a small part, which normally is the only justification for doing a heavy query, right?

If you're doing a heavy query and then not returning most of it,

it's bad, right?

[00:18:36] Nikolay: That's why I think buffers-first analysis of a single query matters a lot here. You need to pay attention to buffers and see what's inside: how many bytes, how many pages were involved in query processing, and compare parallel and non-parallel, keeping

[00:18:53] Michael: Yeah,

[00:18:53] Nikolay: this in mind. Right: you return just one row, like an average aggregate, but inside it needed to process a lot of buffers, and this explains why it's so heavy.

[00:19:06] Michael: So I was a bit surprised when you said... I get the impression you don't turn off parallelism for some of your, let's say, clients with a lot of OLTP; they don't. But would you, if you had the choice, do you...?

[00:19:17] Nikolay: It depends. I would probably do some benchmarks. I would analyze the content of pg_stat_statements, create some... I call it simulated workload or something; I already forgot what we call it, sorry, the word is out of my mind. So I would create some test set and just run multiple iterations, checking, probably even just in a single thread, as usual. I do it sometimes in shared environments where I don't care about time a lot. I just see how much data, how many buffers, were involved in processing without parallelism and with parallelism. It gives me an idea. Of course, it's not directly converted to time, but it will influence the resulting time the most.

And here, of course, with parallelism, we can understand: reading a thousand buffers using a single process versus three processes, or four workers, or more, it's different, right? So we need to understand that timing will be affected by parallelism, but still it's interesting to see the overhead and so on.

So buffers analysis is important, right?
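
A minimal sketch of this kind of comparison (the table and query are illustrative, not from the episode): run the query with parallelism, then force a non-parallel plan and compare the timing and buffer numbers:

    EXPLAIN (ANALYZE, BUFFERS) SELECT avg(balance) FROM accounts;

    SET max_parallel_workers_per_gather = 0;  -- force a non-parallel plan in this session
    EXPLAIN (ANALYZE, BUFFERS) SELECT avg(balance) FROM accounts;
    RESET max_parallel_workers_per_gather;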

[00:20:27] Michael: Yes, absolutely. In fact, there are a few tips. So when we're talking about this, we're probably talking about looking at explain plans; we get information about parallelism through EXPLAIN. We can

[00:20:39] Nikolay: buffers, right?

[00:20:39] Michael: EXPLAIN, yes, but initially just EXPLAIN: you can see whether the planner

[00:20:46] Nikolay: It's going to use it?

[00:20:47] Michael: Exactly. Not just whether it is going to use it, but how many workers it will request as well. So you can get some information about how much parallelism. It doesn't have to max out the option; it can choose anything between zero and your max_parallel_workers_per_gather,

for each Gather or Gather Merge. So yeah, there's that. But then additionally I wanted to say that VERBOSE is quite important here as well. If you want to see some of the per-process statistics, VERBOSE is quite a useful setting.
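
A sketch of what VERBOSE adds; the plan excerpt is illustrative, not real output:

    EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT count(*) FROM accounts;
    --  Gather  (Workers Planned: 2, Workers Launched: 2)
    --    ->  Partial Aggregate
    --          Worker 0:  actual time=... rows=...   (per-worker stats shown with VERBOSE)
    --          Worker 1:  actual time=... rows=...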

[00:21:22] Nikolay: Yeah, good point. So, let me finish my thought; probably I was not clear. Of course, if we need, for example, to deal with a thousand buffers... again, currently, if we forget about vacuum, and we should discuss vacuum, create index, and dump/restore separately, which have parallelization, as we mentioned, let's talk about them separately. For normal queries, DML queries, it's only about reads. So, for example, we need to deal with a thousand buffers, originally fetched from the heap using a sequential scan, or from an index using an index scan, an index-only scan, or also a bitmap scan.

We want to understand: okay, if we have three workers, will it be three times faster? And here it's interesting to see the overhead, and the BUFFERS option will also provide us some additional insight about the overhead of parallel versus non-parallel, and parallel with how many workers. And when the planner chooses a plan, it takes only two kinds of input besides the query itself: first, statistics for all tables involved, and second, planner settings, including work_mem. In the settings there is a special category of planner settings; there are some settings related to...

[00:22:39] Michael: Yeah, actually work_mem is a really good, important point here. If you are going to change your parallelism settings, you also need to consider your work_mem setting at the same

time.

Why am I saying that? Because most people realize that work_mem can be used multiple times in a query.

If you have multiple hashes or multiple sort operations, work_mem can be used multiple times. That's why it's a scary parameter to set, and why people often go relatively conservative, even though they want to bump it up so the operations can happen in memory. Once you introduce further parallelism, that multiplies.

Each parallel operation that's using work_mem can use it as many times as there are workers. So it multiplies quickly. If you're going to bump that number up, it's something to be aware of, and it might be an argument for reducing it.

Sometimes maybe you can get away with a higher work_mem setting for your entire workload if you reduce it. But if you do have the odd really big query that you need to run, you can set some of these settings locally, right? Like on a session level?
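
For example, a sketch of scoping these settings to a single transaction rather than the whole cluster (the values are illustrative):

    BEGIN;
    SET LOCAL work_mem = '256MB';                    -- only until COMMIT/ROLLBACK
    SET LOCAL max_parallel_workers_per_gather = 4;
    -- run the one-off heavy report here
    COMMIT;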

[00:23:45] Nikolay: So work_mem is a super interesting topic, and we need a couple of minutes to discuss it. But before that, let me mention what I understand about which read queries can be parallelized in Postgres. First of all, if you want to benefit the most, you need the most recent version, because each version had improvements: 10, 11, 12, all of them. Second, the easiest are the sequential scan, the index scan, and the bitmap scan. They all can be parallel, and it's quite obvious why; well, an index is more complex, but it still can be parallelized, all good. We can benefit a lot if we expect only a few users using our 96-core, or maybe 244-core AMD EPYC, CPU, so many cores, of course.

And if we have this beast running Postgres 15 and only 10 users doing some analytical queries, we definitely need to reconsider and increase max_parallel_workers_per_gather and the rest. There is one setting which requires a restart:

max_worker_processes. All others don't require a restart.

[00:25:00] Michael: Yeah.

[00:25:00] Nikolay: It's usually also increased for logical replication. But in this case, if you expect only a few users using some beast machine, you need to increase it a lot, and this requires a restart. Everything else can be adjusted later, dynamically, in a session, to experiment and so on.
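
A rough sketch of what such tuning might look like on a many-core box serving a few analytical users; the numbers are illustrative, not recommendations:

    ALTER SYSTEM SET max_worker_processes = 32;            -- requires a restart
    ALTER SYSTEM SET max_parallel_workers = 24;            -- reload is enough
    ALTER SYSTEM SET max_parallel_workers_per_gather = 8;  -- reload is enough
    SELECT pg_reload_conf();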

And of course, okay, these access method nodes, which live in our EXPLAIN plan, are relatively straightforward, but then the interesting ones are aggregates, joins, right? Those operations are interesting. In 10, merge join was supported, and later, a couple of years later, in 12 maybe, I don't remember exactly, hash join was supported as well. But they have some difficulties, right? And I also see in the release notes of 16, which is already in beta, right, the beta was released last week I guess, that full outer join can also be parallelized, and so on. It's interesting. And if you see it's possible for joins, it's also possible for aggregates.

It's also possible, but things become much more difficult there. And now, memory. Memory, even without parallelism, is so tricky to manage in Postgres. What does a backend developer expect? I say: okay, I have 600 gigabytes; 25% of that goes to shared buffers, and also 200 gigabytes, probably, or maybe 300 gigabytes, I'm ready to give to normal query processing. Everything else will be the second, underlying layer of caching, which is the page cache.

[00:26:41] Michael: Mm-hmm.

[00:26:42] Nikolay: I want some setting, I want to say: okay, 200 gigabytes can be used by Postgres. There is no such setting, and it's so hard to understand... if we set up a new server, it's so hard to predict how much memory it will be using.

It's super

[00:26:59] Michael: Oh, I think a lot of people don't even realize what happens if it goes wrong. Like, we are talking out of memory, like everything shuts down. That's the...

[00:27:10] Nikolay: It depends. We can enable swapping; some people advise enabling it, or enabling a small swap, for protection. But in general, yes, it's not a good position: either the server goes down, or you become very, very slow.

So, a latency spike; you have a latency spike. It's not good.

You definitely don't want the out-of-memory killer to be involved, or to be very slow. So you don't want to be out of memory. But how to properly plan it? Memory planning is a super difficult topic; maybe we should dedicate a special episode to it. But in a few words: you take work_mem.

Oh, of course, there is maintenance_work_mem, and autovacuum_work_mem, which by default is inherited from maintenance_work_mem. So you understand how many workers can build indexes, and this can be parallelized as well, which is good. Also, you know how many autovacuum workers you have, autovacuum_max_workers, which is three by default, not enough for modern hardware at all.

So you account for autovacuum_max_workers, you increase it; you can plan this part relatively easily, it's simple. But normal query processing, DML processing, is super complicated. So you take work_mem, and then you should understand that work_mem is not what a single session can consume; it can consume multiple times work_mem, because it's how much memory is allowed to be consumed by

sorting and join operations, hashing operations, and it becomes very complicated. One query can have multiple joins, so it can consume multiple times work_mem. For safety, you need to take work_mem and multiply it, for example, by two or three. But in reality it's never allocated as a whole;

usually it's some part of it. So you need to understand your workload on production to make proper decisions. And I see many servers are kind of over-provisioned, because they know that most of the queries are relatively light, so they don't even reach one work_mem for a single query.

So it's good to have max_connections... like, they allow quite a lot of max_connections. There are many reasons to try to reduce max_connections, especially for older Postgres. But in general, you need to take max_connections, multiply by work_mem, and also by two or three, because there is some uncertainty about how many megabytes can be used by a single session. But of course it's worth mentioning the default work_mem: if I'm not mistaken, it's four megabytes for vanilla Postgres.

[00:30:03] Michael: Yeah, it's too small.

[00:30:05] Nikolay: Right? So a hundred megabytes is good.

[00:30:07] Michael: Well, yeah, for big ones. But I think if I was setting defaults, I might try and argue for a bump to at least 16 megabytes. Like, I feel like there's almost no... like, even for tiny instances,

[00:30:20] Nikolay: It depends on your...

[00:30:20] Michael: But I know, it depends. I know, I know. We should probably go back to parallelism.

[00:30:24] Nikolay: Right. So I wanted to draw this picture really quickly, and then we add parallelism to it. It means that for our max_connections, which for example is 300 for our 96-core beast server... okay, we have a work_mem of a hundred megabytes, and for safety we multiply by two or three. That gives a hundred megabytes times roughly a hundred sessions;

it's 10 gigs already, and max_connections is 300. As I said, it's already too much, right?

So probably you should reduce work_mem. But in reality you see it's never reached; the real memory consumption is fine. So you try to increase it, to avoid temporary files. But what does parallelism bring to this picture? It brings more complexity, because now each session, out of our 300 or 500 max_connections, can have multiple parallel workers and consume even more work_mem, right? And for safety you need to multiply by the expected number of parallel workers on average, which is also hard to predict without observing real production behavior.

So you have two multipliers now.
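
A back-of-envelope version of those two multipliers, with purely illustrative numbers:

    per-backend worst case ≈ work_mem × (operations multiplier) × (1 + parallel workers)
                           ≈ 100MB × 3 × (1 + 2) = 900MB
    cluster worst case     ≈ 300 connections × 900MB ≈ 270GB

which is why sizing for the theoretical worst case is so wasteful in practice.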

[00:31:40] Michael: Yeah, and that was my point. I think people are often kind of aware of the status quo, but if they change this setting, they might not think to change the other one as well. So, yeah.

[00:31:50] Nikolay: So my main point is that... I do some consulting, my team does some consulting, and we say: okay, if you want to be super safe, you need to use these multiple multipliers. But you will end up having a lot of temporary files. Later, in iterations, you can adjust and try to get rid of temporary files, going maybe into unsafe space theoretically,

but practically you see that your workload is fine. I mean, you still have a lot of free memory, which is used by the page cache. Also, you don't need to steal all the gigabytes from the page cache. So this is the approach, the theoretically safe approach, but it's very wasteful in terms of memory, and it leads to overhead because of temporary files.

So parallelism definitely adds complexity to this picture. Oh, there is also a multiplier for hash... okay.

[00:32:50] Michael: Yeah, hash_mem_multiplier. But if you're thinking about work_mem, hopefully you're thinking about that as well, because it's in the docs in the same place and stuff. So yeah, that's two now by default. Another tip, right before we move off parallel query: if you have a query plan that's not parallel and you think it should be, one thing, for example, to be aware of is that user-defined functions by default are not marked parallel safe.

So that catches a lot of people out. You need to explicitly mark those as parallel safe.

[00:33:25] Nikolay: And they need to actually be

[00:33:26] Michael: Like the thing I've seen,

[00:33:27] Nikolay: safe as well, right?

Also, you probably deal with a very small table sometimes, a tiny table, a tiny index. There are two thresholds, one for tables and one for indexes, and below them it cannot be parallelized. You can adjust them and go lower if you think it's worth it, but it depends, again.

[00:33:49] Michael: Yeah, good point. So if you don't change any settings, you'll never see a parallel scan on a tiny table.
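
The two thresholds in question, with their documented defaults; lowering them is a per-session experiment, not a blanket recommendation:

    SHOW min_parallel_table_scan_size;          -- default 8MB
    SHOW min_parallel_index_scan_size;          -- default 512kB
    SET min_parallel_table_scan_size = '1MB';   -- session-level experiment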

[00:33:55] Nikolay: Oh, by the way, you know what, CPU is one reason, we discussed it: CPU development dictates software

development. SSD disks also. But memory... it's so affordable now, hundreds of gigabytes. Well,

it depends of course, but

right. If you're a startup which grows really quickly,

you have some money, but you cannot be slow, right?

So you sometimes prefer just scaling vertically and putting the whole database, or the major part of it, into memory, right? It's possible. I mean, in clouds it's definitely possible; it's one of the options we have, and it's quick. Of course it requires some additional budget.

And also, this is very closely connected to partitioning, because when I say put the whole database into memory, it might be not the whole database but the working set. And if you have tables partitioned, you probably have some old partitions which will be out of memory, and fresh partitions which are in memory, and you start benefiting from parallelism because you don't touch disk, and so on.

And work with memory, of course, can be parallelized well,

unlike magnetic disks, right?

[00:35:13] Michael: Is that a good transition onto things like parallelism for CREATE INDEX and...

[00:35:19] Nikolay: Yeah, now for DDL and others, yeah. So, for CREATE INDEX it's definitely a good thing, because, let's recall the, how do you call it, drama with Postgres 14. It means that when you build... and again it's related to partitioning, everything is related to partitioning. Everyone should partition tables which exceed a hundred gigabytes.

So if you build an index over a large table... and as we discussed in our episode about index maintenance, we definitely want to rebuild indexes from time to time, because their health tends

to degrade. Yeah, it degrades slowly in modern Postgres, like 14, 15, but things are much worse in older Postgres versions.

So while you rebuild an index, the xmin horizon is being held: autovacuum cannot delete freshly dead tuples, and bloat accumulates eventually. An attempt to improve this in Postgres 14 was reverted in 14.4. So it means that you want your index build to go faster, because if you move faster, concurrent write transactions produce fewer dead tuples.

[00:36:35] Michael: Mm-hmm.

[00:36:35] Nikolay: If you move faster, if you have room, of course, if you have headroom in terms of CPU and disk, right? So you move faster, fewer dead tuples are generated, and less bloat is produced eventually, when autovacuum cleans those dead tuples. Makes sense, right? So you want to go faster, but of course you don't want to saturate your IOPS limits, for example,

or disk throughput, and so on. And of course CPU: if you rebuild 10 indexes in parallel, and they are also parallelized with more workers, it's like dozens of backends working on index maintenance. You are probably going to saturate your CPU as well, which is also not good.

So this needs careful planning, probably during a weekend: moving fast enough but not saturating anything. Control of CPU and disk utilization here is the key. But it's a great feature, because you can rebuild an index faster, especially if it's a large one. It's good.
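
A sketch of a parallel index rebuild (the index and table names are illustrative; the values depend on available memory and cores):

    SET maintenance_work_mem = '2GB';
    SET max_parallel_maintenance_workers = 4;  -- parallel B-tree builds, Postgres 11+
    CREATE INDEX CONCURRENTLY accounts_balance_idx ON accounts (balance);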

Then, what else? Vacuum. Unfortunately, vacuum parallelization works only for indexes and only for explicit vacuum; autovacuum still cannot do it. Maybe I missed something, but it's only for your manually invoked vacuum runs. But if a table has multiple indexes, which is quite often the case, the indexes can be processed using additional workers, which is a really good thing.

I hope eventually autovacuum will be parallelized, even for a single table. But, you know, in general, again, partitioning is important, because you can parallelize processing using multiple workers if you deal with multiple partitions, even with autovacuum. In that case you don't care that this feature of the explicit VACUUM command applies only to specific tables and indexes.

We have many workers, we have a lot of partitions, we can process them using autovacuum, and that's it. This is the way it works in practice if you have a lot of cores and a lot of resources.
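
For the explicit vacuum case, a minimal sketch (the table name is illustrative; parallel index vacuuming is Postgres 13+):

    VACUUM (PARALLEL 4, VERBOSE) accounts;  -- indexes processed by up to 4 workers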

[00:38:42] Michael: Yeah, and I think defaults in these areas are still quite low for most people, like in terms of auto...

[00:38:48] Nikolay: ...vacuum settings. Typical.

[00:38:49] Michael: Yeah. Cool. What about analyze, actually? I didn't look it up.

[00:38:52] Nikolay: Again, I think analyze for a single table is not parallelized, as I remember. It might be. But the key here is, again, that we usually rely on autovacuum, so we need more workers. My recommendation is to consider: you have a number of cores; take something between 10% up to maybe 30 or 40% of them.

And this is because, if you think about it, any writes just leave marks, xmax marks or the like, and then we need to really process them later. And you need to allocate a significant amount of resources for that, to manage your writes, right? So you definitely need to increase autovacuum_max_workers. Three is not enough if you have dozens or hundreds of cores.

So, analyze: either you rely on your autovacuum_max_workers, and again partitioning is good, because analysis of a large table, especially if you globally increased default_statistics_target like some people do, it's not good. But there is also an important case when you want explicit analyze, especially after pg_upgrade. In this case you run the vacuumdb CLI tool, for example, with -j. And you are going to have a problem if you have huge tables, because vacuumdb again applies -j at the table level: it takes tables and runs analyze on them. So if you have partitioning, you can say: after pg_upgrade, for example, we have a maintenance window,

and we can use all cores; we can saturate resources, because we want to minimize the duration of the maintenance window. So with vacuumdb -j you take the number of cores, go full speed, and then you are limited again, similar to my example with pg_dump, by the largest table size.

That's why it's also important to partition, because in this case the physical tables will be much smaller and parallelization will work much better. So you need to split those huge tables into pieces.

[00:40:57] Michael: Nice. And we can link up our episode on partitioning as well.

[00:41:00] Nikolay: Right, it's very related to parallelization.

[00:41:02] Michael: Yeah. On that note, for further reading, or if anybody wants to learn more about this, where would you point them?

[00:41:08] Nikolay: Oh, by the way, it's not only related to parallelization; it's also related to DML, to reads. If you select from many partitions, you need constraint exclusion; you need to deal with as few partitions as possible. And also, parallel workers might definitely help you there as well, to read from different partitions and then combine the result.

[00:41:32] Michael: Yeah, I haven't actually looked into that, but it's still using the same worker processes, right? Like it's still...

[00:41:38] Nikolay: right?

[00:41:38] Michael: parallel scans.

[00:41:39] Nikolay: What I'm trying to say is that if you need to deal with many partitions in your query, multiple workers can be helpful.

[00:41:46] Michael: Mm-hmm. Yeah. True.

[00:41:47] Nikolay: It's very

related topic. Positioning is very related to parallels, obviously.

[00:41:51] Michael: Yeah.

[00:41:52] Nikolay: So, vacuum we covered, create index we covered, analyze covered. What else? Yeah.

[00:41:57] Michael: I think, we should leave it there.

[00:41:58] Nikolay: No, no. pg_restore is important. pg_dump supports -j, but pg_restore also supports -j, and there have been improvements. I don't remember... if you check the release notes, in some version some fixes were done, so in newer versions it works better. Also, there is work with foreign data wrappers and so on; something can be parallelized there, but I don't remember the details

about that.

[00:42:19] Michael: I think postgres_fdw, for example, doesn't support it by default, but there is a fork of it by Swarm64. I think that was experimental when they got sold,

so I don't know any more about that. On that note, there is a really good talk by somebody that used to work at Swarm64, called Sebastian, that I'll link up as well. It's on parallelism in Postgres, dos and don'ts. It's

about an hour, I think.

[00:42:44] Nikolay: Oh, great. I don't remember it. Worth mentioning also: I would like to have COPY parallelized. Sometimes you need to deal with a single large table,

and parallelization applies only at a higher level, between tables. So if you have two tables, you can probably move roughly two times faster, but for a single table, parallel COPY would be an interesting feature.

And you can cook it yourself: dump and restore, dealing with snapshots and so on; knowing your IDs, you can split your table and so on. It's possible. But also there is CREATE TABLE AS, which is parallelized, right? And also materialized views, the creation of them, with the initialization of data.

[00:43:31] Michael: And refreshing materialized views as well.

[00:43:33] Nikolay: Maybe something else; I'm sure there is something else. Worth checking.

[00:43:38] Michael: Absolutely. Also, for anybody wanting a recap of the basics, I did also have a blog post for people that did want to increase max_parallel_workers_per_gather, and the basics around that. I think we've covered it all here, but that's a shorter summary.

[00:43:55] Nikolay: There are many, many answers there, but yeah, experiment.

[00:43:58] Michael: Wonderful. Well, thank you, Nikolay. Thank you to the people that requested this, and I'll see you next week.

[00:44:05] Nikolay: I have a strong feeling we didn't cover everything, but I hope we did a good overview.

[00:44:10] Michael: Yeah, absolutely.

[00:44:11] Nikolay: And I expect many more features in the future, because it's obviously a huge topic and a very important one for modern hardware. Thank you so much. Thank you, all listeners. See you next time. Bye.

[00:44:22] Michael: See you. Bye.
