[00:00:00] Michael: Hello and welcome to Postgres FM. My name is Michael, founder of pgMustard, and this is Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today?

[00:00:08] Nikolay: Hi Michael. Uh, you chose the topic, under-indexing. You said this is the most popular case you see, like, when tables lack indexes. Tables lack indexes, not people.

[00:00:19] Michael: Yes. Well, last week we discussed over-indexing and the problems that can come about because of that. And that made me think: wait, why are we covering that case when we haven't covered the case I tend to see more often? And which, I think, at least in terms of raw numbers of companies or people, more people are in, even if they don't think they are, and that might be an important thing. I think you said you were going to do a poll, or might be about to do a poll, on whether people consider that they are under-indexed or over-indexed. I think a lot of people have added a lot of indexes, but either some of those are not necessarily precise or optimal, or there are some duplicates in there. Maybe they have a lot of indexes, but they could still afford to have a few more precise ones to really help their workload.

[00:01:12] Nikolay: Well, yeah, I don't have poll results right now. But, right, we think over-indexing comes after you are in the previous position, under-indexing, and then over-indexing, and then precise indexing, like an evolution of the state of your index set. So, yeah, I actually don't know, and the poll is not representative; even if a few hundred people answer, we still don't know. And of course, from my experience, when people ask us to help, in consulting practice, we already see a lot of effort to fix problems, and obviously sometimes too many indexes were created. Obviously, this is a later state, not the initial one.

It's already some developed situation. So, yeah, I agree with you that under-indexing is also an interesting topic, maybe more basic, but worth discussing.

[00:02:11] Michael: Yeah, and I've actually just realized, I probably should have realized this earlier, that it's possible to be both at the same time, right? Like, you could bring in a consultant, they remove... 10 indexes and add two. Like, it's possible that you're missing the perfect indexes, but also have way too many indexes.

So maybe it's a bit oversimplistic for me to say. Finding

[00:02:30] Nikolay: It's interesting that if you come to some project, removing indexes is easier. It's a simple task, actually: you check duplicates, you check redundant and unused indexes, right? And this is easier than finding proper indexes when some tables are under-indexed, because you need to analyze the workload, and this is more complex research, basically, than just checking for redundant or unused indexes. I mean, those types of research are also not super trivial, they have some complexities, we discussed them, but finding proper indexes may be a more difficult problem. What do you think?

[00:03:20] Michael: Finding them is interesting. I think there are some pretty simple cases of finding that you're under-indexed as well. But clearing up, especially now that we have DROP INDEX CONCURRENTLY, is quite easy; there's not much risk to cleaning up indexes, whereas adding an index, the downsides aren't the same. If you're dropping an index, as long as you're confident it's not being used,

you're only seeing upsides, whereas adding an index, you have to convince people it's worth it as well, maybe. To be honest, I'm not sure we're ever in a situation where we're really having to choose between these two; normally it's a case of a quite clear-cut win. But on that, before we dive too much into specifics, should we go through a few of the signs that you might be in a situation where you are, what we're calling, under-indexed?

[00:04:11] Nikolay: Sequential scan. The main first sign: you see a sequential scan. And actually, if you see it, maybe the first thing you need to do is not just go and create an index. First you need to check your Postgres settings, because I still see many managed Postgres service providers doing a bad job

preconfiguring Postgres clusters. I see random_page_cost being 4 in so many places still. They give you an SSD disk and keep random_page_cost at 4, which is for magnetic disks. And this is the first thing you need to check if you see sequential scans more often than you expect, when you expect an index scan but you see a sequential scan.

Even if you don't have the index yet, it's worth checking this thing first, and only then start creating indexes, because otherwise you might see that you created an index but it's still not used, because if random_page_cost is 4, Postgres will prefer sequential scans more often.
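As a sketch, the check Nikolay describes takes a couple of statements; the 1.1 value below is a commonly recommended starting point for SSD storage, not something prescribed in the episode:

```sql
-- See the current value; 4 is the default, tuned for magnetic disks
SHOW random_page_cost;

-- For SSD/NVMe storage a value close to seq_page_cost (1.0) is common advice
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```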

[00:05:19] Michael: Yeah. I was actually quite surprised when somebody I know was migrating from a provider that tuned it a little, they tuned it down to 2.0, not as low as you might suggest, but they were moving to a provider who didn't suggest copying across the config, had the setting at 4.0, and saw no issues at all. So I do

[00:05:40] Nikolay: I have a fresh case. Let's drop some links. Crunchy Bridge, like,

[00:05:46] Michael: Yeah.

[00:05:47] Nikolay: I said, great, like, super cool that they provide a real superuser, unlike most of the others, but they also keep random_page_cost at 4. What's happening there?

[00:05:58] Michael: 0. I don't know. I think, I think they make a good point that it's, well, in a way, it's not that important, right? Like it's important for some subtle cases and for some cases where we're scanning a lot of

[00:06:08] Nikolay: Edge cases, probably, right?

[00:06:10] Michael: yeah, but for the, for the standard, like primary key lookups, it doesn't make a difference as soon as

[00:06:16] Nikolay: I agree, but edge cases happen more often than we would like them to.

[00:06:21] Michael: yeah. And as a managed provider, I

[00:06:23] Nikolay: A slightly more complex query and you already have this problem. I encounter this problem very often. Like, we start optimizing and, like, I myself forget about it; that's why I'm raising this as the first thing, because I've found myself in this position, spending some time and then realizing, oh, random_page_cost is 4, so many times already. I can count maybe 10 times during the last couple of years. So it's a lot already, and we should fix it first and then move forward.

[00:07:00] Michael: Going back to what you mentioned, sequential scans: do you mean, let's say, in the statistics we're looking at, like in the system views, looking at the percentage of sequential scans done on a table? Or do you mean looking at a single execution plan of a slow query and seeing it's using a

[00:07:18] Nikolay: Well, here you touch the topic of lacking the tools to analyze workload more holistically, because if we don't have auto_explain enabled, pg_stat_statements doesn't have plans, and you might guess that there is a sequential scan looking at, uh, buffer numbers, right? Compare them to calls and think, oh, so many buffers, maybe there's a sequential scan there. But of course, I'm mostly talking about EXPLAIN and analysis of a single query.

[00:07:49] Michael: But we can also look at, is it pg_stat_user_tables? We can look at the proportion of queries that use sequential scans versus index scans. That could be, like, if you're under-indexed, you might see

[00:08:00] Nikolay: Yeah, exactly. Actually, you are right. Yeah, at table level, not at query level, but at table level.
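As a sketch, the table-level check they mention might look like this (the 20-row limit is arbitrary):

```sql
-- Tables where sequential scans dominate; a high percentage on a large,
-- frequently queried table is a clue (not proof) of under-indexing
SELECT relname,
       seq_scan,
       idx_scan,
       seq_tup_read,
       round(100.0 * seq_scan / nullif(seq_scan + idx_scan, 0), 1) AS seq_scan_pct
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 20;
```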

[00:08:06] Michael: Exactly. That's one of the starting points I would look at for seeing if, thinking about it overall, somebody just hasn't added that many indexes.

That's going to be a real telltale sign there. But I was thinking of even higher-level things: maybe they're saturated on IOPS, or maybe they have high CPU all the time. There might be some system-wide effects of doing a lot of these table scans as well, right?

[00:08:34] Nikolay: Right, it's a good point. Analysis of pg_stat_user_tables is a good idea. But it's also worth remembering that some queries do need a sequential scan. For example, a dump, or some reading of the whole table. SELECT count(*) from... oh, SELECT count(*), maybe an index-only scan. It

[00:08:55] Michael: Yeah, but you're right, this is actually probably worth mentioning quite early on. I mean, I actually haven't looked at the percentage recently, and it will depend on, like, the size of the table, the number of columns and things. But if you're scanning more than, I'm going to put out a number here, and this is wrong, but it's roughly in the right area:

if you're scanning more than 20 percent of a table, chances are a sequential scan is going to be faster, or at least as fast as an index scan, and less expensive overall.

[00:09:28] Nikolay: Yeah, SELECT count... even if it's a whole-table scan, I mean, you need all rows, count all rows, an index-only scan will be much faster than counting with a sequential scan, obviously. So it really depends. That's why, honestly, I forgot about pg_stat_user_tables, because I don't use it too often.

[00:09:47] Michael: interesting.

[00:09:47] Nikolay: I prefer switching to the next layer. This is the very top level. Of course, if you go from top to bottom, it's great: you start with looking at tables, and each table participates in many queries. But I prefer going to the second layer, queries, so pg_stat_statements. And there we don't have a counter related to plans, and each query might have multiple plans depending on parameters, so auto_explain is good. But anyway, let's maybe skip it; we discussed query analysis in other episodes. The bottom line: sometimes you do need a sequential scan, for example if you dump, it's inevitable. But in many cases, if it's a sequential scan, you probably need an index, unless, as you said, we need to read too much of the table.

So, add the words cardinality and selectivity, these two words. They are important to understand. If selectivity is bad, it means cardinality is high and we read too many rows, so a sequential scan maybe is fine.

[00:10:53] Michael: And the other exception is tiny tables. Like, if you've got less than 100 rows in a table, like a lookup table, and you see like a

[00:11:00] Nikolay: How many rows did you mention? 10?

[00:11:02] Michael: I said 100, but let's say 10. Like, let's use a number that you can't, uh,

[00:11:06] Nikolay: 1,000 or just 10?

[00:11:07] Michael: 10 rows. Like, we're talking about a little lookup table, for example, and you see, you see 100 percent of scans of that table are sequential scans. Not a problem. Like, just not an issue.

[00:11:18] Nikolay: Right, right. But with 1,000 rows, maybe the scans already will be

[00:11:22] Michael: Yeah,

[00:11:23] Nikolay: Slightly, very, like, very small difference between,

[00:11:28] Michael: And actually, all of the things I mentioned are not a definite sign that you're under-indexed. Like, high IOPS, high CPU: they could just be system load. You could have perfect indexing and still have those issues. But they could be a sign. Like you said, a high number of sequential scans isn't a sign that you necessarily are under-indexed. But it might be. It's a clue rather than a piece of evidence.

[00:11:51] Nikolay: Right. But looking at a single plan, if you see a sequential scan when you know that you just read one row in a billion-row table, this is a strong sign that you need an index.

[00:12:04] Michael: exactly. If it's an important query, if it's a query you repeat over and over, or like, if it's not one you're running like once ever, and then never needing again.

[00:12:13] Nikolay: trade off, right, you think. If,

[00:12:16] Michael: But yeah, I've got a couple of others that might be useful to think about. So we've got high IOPS, high CPU, a low index scan ratio. Also, a low cache hit ratio could be a sign: if you're using indexes nicely, chances are you are going to see more things being read from cache.

And then a couple of other more obvious ones: slow deletes. If you're seeing those, there's a decent chance you're missing an index.

[00:12:41] Nikolay: Before that... this is already diving into some specific topic. I mean, it's not very uncommon, but before we consider this, how about mentioning that you might have an index scan, but it's still underperforming? And the sign of it is if you see a filter in the plans. Some rows get filtered out because the index

was created for different queries, basically. It works, but it finds too many rows, like, extra rows.

[00:13:13] Michael: Yeah, exactly. And this takes us back to the first thing you mentioned, which was the sequential scan. So if we're looking at a query that we think is slow, or that's maybe one of our most common queries, or we're sorting pg_stat_statements by total time and looking at the execution plan of one of those, and we see either a sequential scan with a large filter, as you were mentioning, or an index scan with a large filter, then there's a chance that's not optimal for our workload, and we'd be better off with either an index, in the case of a sequential scan, or a more precise index for that access pattern.
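A hypothetical illustration (table, column, and index names invented): an index created for a different query partially serves this one, which shows up as a Filter line in the plan:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'shipped' AND customer_id = 42;

-- Watch for output along these lines:
--   Index Scan using orders_status_idx on orders
--     Index Cond: (status = 'shipped')
--     Filter: (customer_id = 42)
--     Rows Removed by Filter: 98765   <- work the index didn't help with
```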

[00:13:48] Nikolay: Right, right, so I, I also don't like filtering.

[00:13:53] Michael: But there's something, again, you might say I'm diving too much into the weeds here, but there is something slightly different about these two. With a sequential scan, it can be really obvious: the rows removed by filter tends to be quite a large number. Not always the exact number being filtered, because chances are, if you're doing a large sequential scan on a modern version of Postgres, it will be parallelized at least.

So the number you're seeing will probably be 3 times lower than the actual number being filtered, because you have to multiply it by the number of, um, total threads, or I don't quite know how to describe it, the number of workers plus the main one, right? So what do you call those three combined?

[00:14:31] Nikolay: If you see two parallel workers, it means three workers already, right?

[00:14:35] Michael: Yeah, yes. So that number can already be big, and it sticks out in plans. Whereas sometimes with inefficient index scans, we might be, like, in a nested loop, for example, with thousands of loops, and that rows removed by filter number could say just one or two or five,

[00:14:56] Nikolay: the loop, right?

[00:14:58] Michael: but you have to multiply it by that number of loops to realize what the proportion is. The difference between rows removed by filter and the actual rows returned by that node is the giveaway there. And if you multiply it by the number of loops, then you get a really good idea of how inefficient this is being, and therefore, oh, could we think of a better way? The key takeaway, and you're probably not going to remember all of that if you're listening to this for the first time, is: please, please, please don't just see an index scan in your query plan and move on, assuming it's the most efficient thing possible.

[00:15:37] Nikolay: So a sequential scan is one sign; filtering in, like, not our index, someone else's index, is a different case. But there is also the case where you see, in the scan, maybe no filtering, but also a top-N sort or something, like in-memory sorting. You could do better if an index combined a couple of columns: one, for example, to support the filtering, the WHERE clause, right?

And the second to support the ORDER BY. For example, filter by date and then order by name, or filter by name, order by date. If you combine both columns properly, in the proper order, in the same single index, it will be just one single index scan and that's it. No in-memory sorting.

So this is yet another sign that probably you need to think about a different index, right?

[00:16:33] Michael: Yeah, absolutely.

[00:16:34] Nikolay: In addition to filters with large numbers.

[00:16:39] Michael: Yeah. And that's another good point, actually. If you consider the general access patterns of OLTP systems, the ones that we tend to see more of: I think people start with single-column indexes when they're learning about indexing, and there's only so efficient you can get with those.

Maybe that trade-off is perfect for your system. But chances are that you're going to have some access patterns that involve a couple of different columns, and for those, you're never going to be able to compete on efficiency with single-column indexes, even though Postgres does have some optimizations around that, compared with a precisely ordered multi-column index.

The key takeaway for me, which I still see people getting confused about, is that the order really matters. Which column you put first really matters, which column you put second really matters, and

[00:17:29] Nikolay: Let's consider this simple example, which is very, very common. You filter by city, for example, and order by, I don't know, by ID, in descending order. So you have two indexes: one on the city ID, or city name if you prefer, and the second on ID, the primary key.

[00:17:53] Michael: Mm-Hmm.

[00:17:53] Nikolay: These two indexes cannot work together well here, so Postgres will have to decide which index to choose, and then perform additional work: either filtering, if it chose the index on ID, or, if it chose the index on city ID, a top-N sort in memory. And in this case, you just need one index:

the index on city ID and ID. City ID first, for the filter, then ID, for the ORDER BY.

[00:18:24] Michael: And the opposite order, ID and then city, would be pretty useless for that query,

[00:18:32] Nikolay: Right, right. It'll be useless, because ORDER BY is executed after WHERE; WHERE comes before it. So we need city ID to be in the first place in this two-column index. Also, if you create this index, you can drop the single-column index on city ID.
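A sketch of the example they walk through, with invented table and index names:

```sql
-- The query: filter by city, order by id descending
-- SELECT * FROM users WHERE city_id = 5 ORDER BY id DESC LIMIT 10;

-- Two single-column indexes force either a Filter or an in-memory sort.
-- One two-column index, filter column first, serves both clauses:
CREATE INDEX users_city_id_id_idx ON users (city_id, id);

-- ...and the single-column index on city_id becomes redundant:
DROP INDEX CONCURRENTLY IF EXISTS users_city_id_idx;
```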

[00:18:51] Michael: Yes. So that, we can refer

[00:18:53] Nikolay: back to the previous episode; this is what people forget.

[00:18:56] Michael: The previous episode, yeah, for having too many indexes.

[00:19:01] Nikolay: Good, good. So we've discussed the basic things. Now let's move to the area you started to raise when you said foreign keys and so on. I would say there are many, many cases, and foreign keys is one of them, when we talk about: we need this index, but not too often. And this is difficult, because you need to decide. If you delete from a table and want it executed fast, but you do it only once per year, maybe you don't need to keep the indexes which support this delete, because the indexes occupy disk space, they slow down updates and inserts, as we discussed, and they even slow down selects, as we discussed (I mean planning time, those 16 locks), so execution is slowed down as well. So maybe you don't need those indexes, so every time you need to decide: okay, I have this report, for example, or this delete with foreign keys, a cascaded delete, right?

But it's executed once per what? Once per week or month or year. Do I need those indexes? Or maybe I need to create indexes and then drop them, I don't know, when we do some cleanup, some temporary indexes. So it's interesting, because there is a trade-off here, and a price you pay for indexes. I'm again echoing our previous episode, right?

But let's consider particular examples of some cases. Like, foreign key is an interesting one, right?

[00:20:39] Michael: Yeah, I don't think I said foreign key. I love that you jumped straight to it. I think I was giving specific

[00:20:44] Nikolay: support deletes, okay,

[00:20:46] Michael: Yeah, yeah. But maybe the mistake I made was saying slow deletes, because what I should have said is slow queries. Like, the first sign that you're missing an index is that the web page is slow for users. I know we're back to basics, but slow deletes are a subset of slow queries in general, right? The slow delete is one I mentioned partly because I think it doesn't fit into the other categories, because we can see a slow delete without a sequential scan:

we don't see the sequential scan in the execution plan. So it was more that it's

[00:21:24] Nikolay: You mean a cascaded delete. When we have a delete on some main table, it finds rows to delete using some index scan, but it also needs to propagate the deletion to the referencing tables, which have a foreign key to us. And in this case, if we don't have a special index on those tables, Postgres will need to perform a sequential scan on each such table to find all the rows which reference the rows we delete in the main table.

And this is a bad situation. Many DBA toolkits include a special report to find such under-indexed foreign keys. Usually on one side of a foreign key we have a primary key, right? That's okay. But on the other side we sometimes don't have an index. And if we think about it, this is an area where a reactive approach doesn't work well, because you will probably live quite long without encountering this plan at all.

And then someone decides to delete something. For example, delete some user and all the data this user produced. Very common case. And suddenly... if you don't use some asynchronous approach and you want to delete everything at once, maybe it's not a good idea, but that's a different story. In this case, you didn't delete anyone during the first couple of months of your project, and when you suddenly try to delete, it works very slowly. So proactive analysis of such situations, using that report,

and proactive creation of indexes, might be a good idea here. But it depends, again, a lot on your use case. You also can,
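One minimal version of such a report might look like this; as a simplification, this sketch only checks single-column foreign keys, while the DBA toolkits Nikolay mentions also handle multi-column cases:

```sql
-- Foreign keys whose referencing column has no index starting with it;
-- cascaded deletes on the referenced table will seq-scan these tables
SELECT c.conrelid::regclass AS referencing_table,
       a.attname            AS referencing_column,
       c.conname            AS constraint_name
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]
WHERE c.contype = 'f'
  AND cardinality(c.conkey) = 1
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indkey[0] = c.conkey[1]   -- an index leading with the FK column
      );
```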

[00:23:06] Michael: it. I was gonna, I was just gonna say that I think it surprises some people that Postgres doesn't index foreign keys by default. So you can define it as a referencing, but you, like, with a primary key, we get the index,, implicitly behind the scenes to guarantee uniqueness. but it means we can use it for lookups as well. But foreign keys, we don't get indexed by default. But some frameworks do. So, like... Oh, what are you going to say?

[00:23:30] Nikolay: Well, indexing foreign keys sounds strange to me, because, again, we have two ends of a foreign key, right? One end... indexing of one end,

[00:23:42] Michael: Adding an index to the referencing column. But Django, for example, if I add a second table with, well, it's not called that in the Django world, but if I add this into my model, I'll get an index on that referencing column added for me, on the original table. And I think it's quite a sensible default for OLTP.

[00:24:05] Nikolay: Yeah, yeah. But I think this is also a case when we need some proactive analysis and an understanding of trade-offs. There are many such cases, a lot of them. For example, we have monthly reports, and we didn't prepare indexes for such reports.

Or, I don't know, for example, we have very high flexibility in our UI, allowing people to filter on too many things. In the extreme case, when users work with data, we allow them to order by anything, to filter by any column, and combinations of them. And this is the worst situation when we try to understand which indexes we need.

For example, we analyzed last week and we're doing fine, but tomorrow somebody will filter on something new which wasn't used before, and this is not covered by any indexes we have, right? So I'm trying to say we have situations when it's hard to predict all possible cases, and things like ORMs or GraphQL give power to users, but this power needs to be supported by indexes. And you can say, with very high probability, that if you take some project which says, we give full freedom, full flexibility to our users, most likely this project is under-indexed, because you can find situations where we don't have an index and we have a sequential scan, for example,

[00:25:56] Michael: Yeah,

[00:25:58] Nikolay: what to do in this case,

[00:26:00] Michael: So, I mean, we're getting close to the point where we're saying what to do when you're almost optimally indexed and you've got a decision to make. I personally think that most people aren't in that situation, and the overhead of adding another index is going to be worthwhile. But, as we discussed in the previous episode, if you take that to its logical conclusion, you're going to end up in a situation where you probably have more indexes than is optimal.

But maybe you're dealing with more complex situations than I am. I think a lot of applications are quite simple. A lot of applications only have a certain number of access patterns per table, and once you've got a few indexes that are really precisely for certain features, you're probably going to be fine until you add a new feature, or until you scale beyond a point where, like we talked about in the partitioning episode, you have to make a bigger change.

So yeah, I think adding a new feature is a good case for reconsidering. But maybe I misunderstood what you were saying; I had a little bit of trouble in the middle there.

[00:27:08] Nikolay: Right, so let me repeat. If you have some application with strict usage patterns, you understand exactly how users can use it, which queries they can produce under the hood. This is okay: you can build a proper set, you can find a very good set of indexes, and then just slightly adjust it, revisit it from time to time.

But if you build some platform on top of Postgres, for example an extreme case like a no-code platform, which provides a very flexible way for users, or even developers, and you build some REST API, GraphQL or something, and you give full freedom: filter by anything, order by anything, all columns are possible. Like, build a table with a hundred columns and use all of them to filter and order by, and do it on the client side, from a browser or something.

This is an extreme situation, and in advance you don't know which indexes you will need, right? This is a bad situation, and obviously you are under-indexed, because you cannot create an index on each column, and then two-column indexes on each combination. It's a bad idea.

[00:28:25] Michael: Yeah, well, this reminds me. I think when we tweeted, probably the last episode, it must have been the last episode, somebody replied saying they'd come across a client recently that had 50 indexes on one specific table, is that a new record? And then somebody else replied with one of Heap Analytics' blog posts.

So Heap are kind of in that situation, right? They let people dump, like, analytics data into Postgres, and they don't know what people are going to want to look up and order by and sort. So that's effectively a custom analytics product on top of Postgres, right? So I'd probably direct them at this Heap post.

[00:29:01] Nikolay: it was, it was not many columns situation. 100, 000? I don't remember. I don't... Millions of indexes, uh, partial ones. So, they indexed like a few, some rows, not whole table, some rows of it. And they also decided to go with create index without concurrency. They explained why, I don't, I don't remember. It's interesting article, but it's a special case. It's

[00:29:35] Michael: think it's because you can't do more than one create index concurrently at the same time.

[00:29:40] Nikolay: Uh, yeah, maybe because of that, right? Mm hmm.

[00:29:43] Michael: Which, when you're doing millions, is a problem.

[00:29:45] Nikolay: All right. All right. Interesting. Yeah. But it's not, uh, what I'm talking about when we have many columns and we need to expose this table via REST API or GraphQL and allow end users to choose which, like, filtering and ordering and this, this is bad. I mean, this is not, this is hard problem to solve.

[00:30:08] Michael: I'm not as familiar with that as it sounds like you are, but there was a blog post not that long ago, from Cybertec I think, talking about a new extension, I believe, that was to do with this kind of thing. I think the example they gave was for an online shop that wanted to let you filter by any category.

Like, you want to pick men's t-shirts and then colour, and at each stage they wanted to give you a count of how many there were in those categories. And it was like a new algorithm for getting those counts quickly on any custom dataset. So I'm guessing there's something in that.

[00:30:56] Nikolay: It's called faceted search,

[00:30:58] Michael: That's it, yeah.

[00:30:59] Nikolay: Right, right.

[00:31:00] Michael: Is that related?

[00:31:02] Nikolay: Well, yes, but if you're building an e-commerce system and need this, it's a solvable problem. I'm talking about a platform, like, a platform for people who don't want to dive into index details. Think about Supabase, for example, right? But maybe even more, people who don't understand databases and don't understand indexes, and they just need to visually create tables and then build some mobile app or something.

This is not good. You probably still need to think about how to restrict the freedom, and think about indexes. Without understanding indexes, you probably won't achieve good performance if you have a lot of data.

[00:31:46] Michael: So you're saying as a platform provider, how much should you index things by default, and how much should you let the users have, like, power to do that kind of thing?

[00:31:54] Nikolay: Yeah, yeah, yeah. So maybe you need to restrict the paths and maybe if you have GraphQL, maybe you need to restrict the possibilities it provides.

[00:32:02] Michael: or... Yeah, I mean this is just a strict trade off, right, either you restrict things and try and give a certain amount of performance as a minimum, or you accept poor performance, or you have something in between, like, like I've seen one product for example, Land on saying, if a query takes more than, like, if a user is changing a dashboard and it takes more than a few seconds to load, say to the user, could you restrict it a little bit more?

Like, could you filter down to a smaller date range or something else, to give it a chance of doing a faster scan? So there are user-facing things you can do, I think, in those situations. But we're in strict trade-off land, aren't we?

[00:32:46] Nikolay: Right, right. I'm just trying to say: if you want to avoid the indexing topic, you probably won't solve the problem. You need to understand indexes, and you need to make some decisions, probably leading to some restrictions.

[00:32:59] Michael: Yeah, I

[00:33:00] Nikolay: this, yeah. Yeah, good. And what else? Like how, uh, how to build indexes faster maybe?

[00:33:06] Michael: Honestly, I think this is probably, I think this is enough. There's so many articles I see go into kind of the different index types and they go into partial indexing and index on expressions. It gets complicated fast, but honestly, the basics are Well, I guess they're the basics for a reason, but they, if you, I think you can get a long, long way just with single and multi column B tree indexes.

But yeah, obviously everything else can be left for another day.

[00:33:30] Nikolay: Right, or gene index for some more complex data like JSON or arrays. But again, like, let's, let's finish with a reminder that if you decided to add an index, also consider the cost, the price you are going to pay for it. It's not only the storage, it's negative effect of existing, on existing queries. It will be small, but in some cases it's significant, especially if you had hot updates.

Keep only tuple updates and then you create index and lose them. Updates can be significantly slower and heavily loaded system may be very noticeable.

[00:34:07] Michael: And if you're on an even moderately loaded system, you probably don't want to just create an index; you probably want to do CREATE INDEX CONCURRENTLY.
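For example (index and table names invented):

```sql
-- Plain CREATE INDEX takes a lock that blocks writes for the whole build.
-- The concurrent variant allows writes, at the cost of a slower build:
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);

-- If a concurrent build fails, it leaves an INVALID index behind that
-- still has to be maintained; find such leftovers and drop them:
SELECT indexrelid::regclass
FROM pg_index
WHERE NOT indisvalid;
```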

[00:34:16] Nikolay: All right, but we're going to discuss this next time, how to speed up and do it right and so on. Okay, let's do

[00:34:22] Michael: That's how to slow it down, but how to keep it, how to keep it safe.

[00:34:26] Nikolay: right, two times slower, right. Okay, good.

[00:34:30] Michael: Nice one. Thanks so much, Nikolay. Thanks everyone.

Catch you next week.

[00:34:34] Nikolay: bye.
