[00:00:00] Nikolay: Hello, hello. This is PostgresFM episode number 64. This is Nikolay and Michael. Hi, Michael.

[00:00:06] Michael: Hello, Nikolay.

[00:00:08] Nikolay: And we are going to talk about Postgres 16 yet another time. The release happened last Thursday, a few days ago, which is good. What do you think in general?

[00:00:19] Michael: Oh, I'm really happy, and I think it's always great to see another major version. The team's really impressive, actually, shipping every year consistently, pretty much exactly every 12 months even. It's very impressive, and lots of good things again. A couple of hundred features, we definitely can't list them all.

I have seen and heard people's opinions that there weren't necessarily any huge features in this one. But I personally really like that they are continuing to improve some of the things that Postgres is best at. Continuing to improve on the reliability, with a few little features that help with that.

Continuing SQL standard compliance, and performance, obviously my favourite topic. It's often a big reason for choosing Postgres, so I don't want to discount that these are important changes, even if they aren't necessarily brand spanking new shiny features.

[00:01:18] Nikolay: Yeah, right. I'm in the camp who says there are no breakthrough features in this release. Obviously, there are several very good ones. I wish I already had all of them on my production systems, but in general, yes, it's like a convenience release. A lot of hard work, of course. You mentioned like 200 changes, and by the way, that's a normal number for recent years: every major release has roughly 200-something changes.

But I would like to say that being without breakthrough features is also good, because it means it's more reliable. DBAs should love it, because you get a lot of improvements. And imagine if, for example, the threads idea was already implemented in this release. I would probably wait until 16.10 or 16.20 before trying it on production. But with this, the system is becoming mature. The features it already provides, they are improving. Like logical replication. It was added many, many years ago. Six, right? Or how many years ago? In Postgres 10, right? I think so.

[00:02:41] Michael: Wow, six years ago now.

[00:02:43] Nikolay: Right.

And it was not super complete in the beginning, of course. It was just logical decoding originally, right? And now, over the last few years, a lot of very good improvements. Again, there are still several things on the to-do list, and I hope 17 will have more, like replication of DDL. But this release adds two important improvements for logical replication and decoding, and it becomes maybe more than two, right?

Yeah, definitely more than two. And it becomes more mature, which means that upgrading should be less risky, and we just benefit in terms of performance, features, and capabilities. Like replication from physical standbys, super good, I think. So my point is, this is not a breakthrough release, but it should be adopted with less effort.

And sooner, I think. I wish the upgrade process was simpler, we discussed it many times. Of course, an upgrade is always a task, but if you already use, for example, Postgres 14 and you go to 16, or 15 to 16, it's just a lot of good things to have. And the risks are not as high as they would be if it brought, say, threads, or something like changing 50 percent of the source code. Yeah.

[00:04:24] Michael: I love seeing it that way. I hadn't considered it through that lens, but yeah, great point.

[00:04:29] Nikolay: Yeah, DBAs are conservative guys, usually. They prefer reliable software to run. So I expect 16 should be very reliable, because it's just polishing things and improving many, many small pieces.

[00:04:45] Michael: Yeah, well, even developers are quite keen on their database "just working", in inverted commas. So I think there is a big selling point of Postgres that it has been around for so long. It's run by sensible people, smart, sensible people who are also risk averse. So yeah, that's cool.

And as you say, logical replication is continuing to improve. Two headline features, or two out of the seven or eight features that get called out at the top of the release notes, are logical replication related. Would you like to start with those?

[00:05:23] Nikolay: Yeah, let's discuss it. First of all, performance improvements for large transactions, right? They can be processed in parallel mode on the subscriber, which is an improvement that is transparent for you: if you use logical replication and you have large transactions, you just benefit from it, that's it.
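A minimal sketch of how this is switched on, with hypothetical connection, subscription, and publication names; in Postgres 16 large in-progress ("streamed") transactions can be applied by parallel workers:

```sql
-- PostgreSQL 16: apply large streamed transactions in parallel on the
-- subscriber. Connection string and names are hypothetical.
CREATE SUBSCRIPTION big_tx_sub
    CONNECTION 'host=primary dbname=appdb user=replicator'
    PUBLICATION app_pub
    WITH (streaming = parallel);

-- The number of workers available for this is capped by the new GUC
-- max_parallel_apply_workers_per_subscription (default 2).
```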

And this is good, because usually, if we have a bottleneck on the subscriber side, as we discussed in the last episode, which was about logical replication. By the way, it's natural for us to start with logical because we are still in that context. So if you have bottlenecks on the subscriber side, the usual trick is to use multiple slots and multiple publication-subscription pairs, but it has downsides, as we discussed.

Foreign key violations are normal in this case, or you need to say, I want foreign keys to be maintained all the time, but then you will have lags, so it's quite tricky. And if you use a single slot and single subscription, obviously you don't have a lot of choice in how to speed it up. But now, for large transactions, it's already improved.

Automatically, transparently for you. It's a good thing. Another thing is that you can logically replicate from physical standbys, which is a very, very good thing to have, because we usually say it's a risk to have higher load on the primary. Of course, if it's just one WAL sender and you have many CPUs, it won't saturate your CPU or disk. But there is another risk: running out of disk space on the primary. It can be mitigated using a relatively new setting.

I don't remember the name, but you can say: I don't want to exceed some specific threshold for my slots. So, yeah, but still, I just don't want to mess with the primary in any way. And I can now move this to secondaries. It's very good for logically replicating to other systems, such as Snowflake or ClickHouse, for analytical needs.
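A sketch of what this looks like, with a hypothetical slot name; the setting whose name isn't recalled above is likely max_slot_wal_keep_size, added in Postgres 13:

```sql
-- On a PostgreSQL 16 physical standby (the primary must run with
-- wal_level = logical); slot name is hypothetical:
SELECT pg_create_logical_replication_slot('analytics_slot', 'pgoutput');

-- Cap how much WAL slots may retain, to avoid running out of disk;
-- the threshold here is just an example value:
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();
```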

[00:07:43] Michael: Exactly, that's the use case I have heard for it. Somebody asked me if this was possible a couple of years ago, and I had to say, unfortunately not. They wanted to replicate to, I think it was actually Redshift, just for analytical purposes, and they didn't want to put that extra load on their primary, but they had a replica.

A replica doing nearly nothing, you know, serving some read queries. I think they had it set up asynchronously; they didn't mind if their analytics were out of date by however much their lag was. It didn't matter that much. So that feels like a really good use case for this.

[00:08:18] Nikolay: It's also related to how logical replication survives switchover or failover, but it's not that. I mean, I saw this work is still in progress, and we discussed that Patroni implements some workaround to survive failovers. But that's about the primary: if you have logical slots on the primary, how do you not lose them if a failover happens?

This is the question. And I think here we also have another question: if this secondary is down, we don't want to start from scratch. Oh, two more features about logical replication I now remember. One is related to the ability to filter by origin, right? It's like multi-master is native now.

[00:09:05] Michael: Yeah. Well, be careful. I saw a very excited blog post about this, saying multi-primary is now possible, and showing how you can set up writes from two different nodes synchronizing to each other via logical replication, because you can now filter... I'm forgetting the wording they used for it, but anything that came in on that node can get replicated.

Yes, origin, perfect. But anything that came in via replication, you don't send out again. So if you were told it by a different node, you don't replicate it further. But there's a ton of issues with that, aren't there?

[00:09:40] Nikolay: Without this feature, you have an infinite loop risk. Right? So, like, you replicated something, it was replayed, written to WAL, and it got replicated back, and you have an infinite loop. So this feature allows you to break this loop and replicate only your local changes, not those which were already replayed by you from a different origin.
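A minimal sketch of the loop-breaking option, with hypothetical names: the subscriber asks only for changes that originated locally on the publisher.

```sql
-- PostgreSQL 16: only replicate changes that originated locally on the
-- publisher, skipping rows that arrived there via replication.
-- Names and connection string are hypothetical.
CREATE SUBSCRIPTION node_b_from_a
    CONNECTION 'host=node_a dbname=appdb user=replicator'
    PUBLICATION app_pub
    WITH (origin = none);  -- the default is "any"
```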

And do you think I was wrong in saying it's not a big release? Because imagine, this could be presented as: multi-master support is here. Why not?

[00:10:18] Michael: I don't think it's a good idea, and I know you're joking, or at least teasing me, but I think it's...

[00:10:23] Nikolay: I'm partially joking.

[00:10:25] Michael: I think it's dangerous, because I think people would actually use it for that, and then they would end up having loads of issues around data synchronization. Basically, I think there was a good blog post by Crunchy Data, who showed the most basic version: updating a row on one of them, leaving the transaction open, just to make it really easy to demonstrate.

Updating the same row, but with a different update, on the second primary, then letting them both go through. And you end up, ironically, I guess, with each one getting the other one's change, but not their own. So, yeah, I think there are problems that people wouldn't anticipate.

[00:11:08] Nikolay: Can we say now that Postgres natively supports multi-master topologies?

[00:11:17] Michael: With an asterisk? That would be my argument.

[00:11:22] Nikolay: All right. Okay. Well, I think multi-master is an intentional split-brain.

[00:11:28] Michael: Okay, there you go. Mm hmm.

[00:11:29] Nikolay: And still we have the question of how fast it is to replay logical replication. If, for example, some update changed thousands of tuples, how much cheaper is it, in terms of I/O and CPU, on the subscriber side compared to the publisher side? It's not a trivial, but also not a super difficult benchmark to do. I wish I had checked it already. It's an interesting question, because I remember BDR claimed that replaying is cheaper. We discussed it. And if it's indeed cheaper in some cases, then in some cases it's good to have such topologies and a multi-master approach.

Bidirectional logical replication.

[00:12:21] Michael: And if a customer asked you whether they should set up a multi-primary setup with Postgres 16 tomorrow, what would you say?

[00:12:33] Nikolay: Well, right now I will stick with my regular answer: you don't want multi-master. And then we dive into details and explain why. But still, we can already say Postgres supports such topologies.

[00:12:48] Michael: Okay.

[00:12:49] Nikolay: Okay, let's change the topic. Also logical. There is an interesting feature, related to what we discussed: in the last episode I explained how to convert a physical replica to a logical replica.

And interestingly, in Postgres 16 you can specify that you want to initialize the tables in your logical replica using binary format.

[00:13:18] Michael: I missed that.

[00:13:19] Nikolay: I never tried it, but as I understand, it's a similar thing. I mean, bloat will be preserved, and it's faster.

I mean, indexes are just copied as files, you don't need to wait for your subscriber node to rebuild them, right? That is my understanding of this feature. Again, I haven't tried it. I tried a different recipe, involving recovery_target_lsn, which we discussed last time, but it's interesting.

Like, I think I will use it in many cases, because it's now natively supported and I can just provision my logical replica. Preserving all the bloat, right? That's the downside: you don't lose the bloat, so you're losing that benefit. But the good thing is that it's faster than the dump-restore approach, which is the default.
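For reference, a minimal sketch with hypothetical names. Per the Postgres 16 release notes, the option makes the initial table synchronization use COPY in binary format (which speeds up the data transfer); it is not a physical copy of data or index files:

```sql
-- PostgreSQL 16: initial table synchronization can copy rows in binary
-- format (the "binary" option existed since 14, but previously applied
-- only to streamed row changes, not to the initial COPY).
-- Names and connection string are hypothetical.
CREATE SUBSCRIPTION fast_init_sub
    CONNECTION 'host=primary dbname=appdb user=replicator'
    PUBLICATION app_pub
    WITH (binary = true);
```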

[00:14:16] Michael: Natively supported, right? It just feels like there are fewer moving parts on your side. Fewer things that you have to do...

[00:14:24] Nikolay: just one sql command and you have it, it's good. So maybe enough, logical today.

[00:14:31] Michael: That's a good segue. You mentioned the parallelisation, and it's not the only additional performance improvement around that. There were some planner improvements to allow parallelisation of more join types: full joins, and right outer joins, I think?
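A quick sketch of what this unlocks, using made-up tables: in Postgres 16 the executor can run full (and right) hash joins in parallel, so such a plan can now include a parallel hash join node.

```sql
-- Hypothetical tables; in PostgreSQL 16 a plan for this can contain
--   Gather -> Parallel Hash Full Join
-- whereas 15 and earlier had to run the full join in a single process.
EXPLAIN (COSTS OFF)
SELECT *
FROM orders o
FULL JOIN invoices i ON i.order_id = o.id;
```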

[00:14:50] Nikolay: Right. I'm curious why right, not left.

[00:14:53] Michael: I don't see many right joins in the wild, but plenty of full joins. Well, do you see more?

[00:15:01] Nikolay: I write them sometimes. Well, I...

[00:15:05] Michael: Oh, oh, really?

[00:15:06] Nikolay: Why not? It depends on your point of view, you know.

Some languages have right-to-left writing also; some people write with the left hand, some people write with the right hand. It depends. And sometimes, if you have quite a complex query, you might want to add one more table to the list of your source tables, and it might happen that the right join is more convenient for you. Of course, I agree, left is more popular because of the way we write queries. I write right queries, why not? I write all the types of queries. Yeah. Let's maybe move away from parallelisation.

Let's talk about performance. What else about performance?

[00:15:58] Michael: There's one more that's mentioned in the headline features, although I think it's possibly more on the reliability and maintenance front: improved performance of vacuum freezing.

[00:16:08] Nikolay: It's from Peter Geoghegan, about much fewer bytes written to WAL, right? Like a 5x improvement.

[00:16:18] Michael: yeah.

[00:16:20] Nikolay: And it's part of different work, part of a reconsideration of how to perform freezing more proactively, I think.

[00:16:31] Michael: Yeah, exactly. And this is why I think it's more reliability related: it's to avoid those heavy anti-wraparound vacuums that we've talked about several times, tripping you up on various things. So I think it's more around making vacuum more predictable and not as likely to hit us when we least expect it.
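A small sketch of how one might watch for that risk: checking how close each table is to forcing an aggressive anti-wraparound vacuum, which kicks in around autovacuum_freeze_max_age.

```sql
-- How old is each table's relfrozenxid? When the age approaches
-- autovacuum_freeze_max_age (default 200 million), autovacuum forces
-- an aggressive anti-wraparound vacuum.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```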

So I think that's really cool. It's a performance improvement in a way, but the benefits are further reaching, I think. And then the other big performance one I did want to call out was...

[00:17:07] Nikolay: I smiled a little bit because just this morning I added to my to-read list an article from AlloyDB, Google Cloud, about how they implemented adaptive autovacuum. Which is interesting to read about as well. I mean, AI performing vacuum. Many people dream about getting rid of vacuum completely, right?

But instead, let AI solve this problem rather than getting rid of it, I don't know. I need to read about it.

[00:17:44] Michael: I really like the approach Peter's taking, and presumably others as well. I think he actually explained this quite well around the indexing changes he was making, and he seems to be doing the same around vacuum, which is trying to improve the efficiency of how it works and when it works, sometimes doing a little bit more work up front to save lots of work in the future.

That, to me, seems so smart, and it makes so much sense for that to be in the database, as a deterministic approach. You know, if you can do a little bit of extra work now to save more expensive work in the future, maybe that's worth the overhead.

[00:18:30] Nikolay: Right. And in Postgres 13 and 14 there was the optimization related to B-tree, and the word deduplication, as I understand, appears here again. This optimization for WAL writes from freezing, it's also the idea of deduplication. Am I right or not?

[00:18:48] Michael: I've seen that word as well. I wish I understood it. Maybe we have to have Peter on at some point to ask him the details. But yeah, I tried reading up on some of this stuff and it started hurting my head.

[00:19:00] Nikolay: Well, many more very small, local optimizations happened as well. I like the idea of improving set_config performance. It's interesting if you make a lot of changes to GUCs, Postgres parameters. Maybe you have your own Postgres parameters, the ones containing a dot: you can say blah-blah dot blah-blah and assign some string to it. So this became much faster, maybe like 10x or more.

[00:19:36] Michael: Yeah, I saw your note about that.

[00:19:37] Nikolay: yeah, yeah. And also, uh,

[00:19:42] Michael: Who doesn't think there are enough Postgres parameters already and creates their own extra ones? That's some brave people.

[00:19:48] Nikolay: I do it all the time.

[00:19:49] Michael: What do you create?

[00:19:51] Nikolay: mean, first of all, extensions do it. I don't get your joke, so I'm

[00:19:55] Michael: Oh yeah, yeah, yeah.

[00:19:57] Nikolay: for example, auto explain, you should know about it. Also, sometimes I put a lot of settings there. Sometimes I prefer putting settings, application settings in a table, but sometimes I put them as G U C, GOOC, GOOC, GOOC, uh, just, I say, like, up.

Dot something equals something. For example, if you, if you use Postgres, it's natural for you, like, to put something to this, um, uh, GUK, uh, uh, parameter. It's, it makes some sense, uh, and, uh, it's good to have several choices, and you just choose what's more convenient and, uh, secure and efficient and so on.

Another very small but interesting optimization: the WAL receiver is smarter in terms of waking up. If you have a server which is not doing anything, the WAL receiver will be doing much less work. And also, promote_trigger_file was removed, to avoid some checks. So it looks like, I've read one article, I don't remember which, it looks like the idea to optimize these two parts is an attempt to save energy.

If you're not using some Postgres server, it should make fewer movements, right? They are eco-friendly changes. It's funny.
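For context, a sketch of what replaces the removed setting: with promote_trigger_file gone in 16, a standby is promoted explicitly instead.

```sql
-- promote_trigger_file is removed in PostgreSQL 16; promote a standby
-- explicitly, either from SQL:
SELECT pg_promote();
-- or from the shell: pg_ctl promote -D $PGDATA
```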

[00:21:24] Michael: Maybe we shouldn't be doing checkpoints every five minutes, yeah.

[00:21:27] Nikolay: Yeah, yeah. And also there's a timeout for when a new WAL file is created and filled with zeros, right? So maybe that's also interesting. I remember the discussion about log_checkpoints: it became the default a couple of versions and years ago, right? And I remember the discussion that it's bad, because it will produce records in the log, and if nobody is using the server, we are filling the log.

Why? But it was made the default because of the benefits of having these checkpoint log entries. In 16, as I remember, we can now see in the checkpoint log messages the LSN positions of the checkpoint itself and the redo starting point, which is sometimes good for troubleshooting.

Yeah. So there are a lot of small convenience things. I think it's a convenience release, a reliable convenience release. That's my impression, and it's not a bad thing at all. Right?

[00:22:38] Michael: Yeah. Well, one brand new thing is the pg_stat_io view that we've...

[00:22:45] Nikolay: Oh yeah. The author is Melanie Plageman, am I right? Yes.

[00:22:53] Michael: I have no idea if you're pronouncing it right, but yes.

[00:22:56] Nikolay: Right, I was at PGCon and was at her talk. It was a good talk, and congratulations, this is a very noticeable contribution, and people need it. I personally am going to use it all the time, and I hope observability tool developers will integrate this into their dashboards and so on.

For example, Netdata, and so on. This is a good thing to have. And it provides our favorite buffers, but globally, for the whole database, plus timings and many details. It's good. You understand who is doing the work. I/O work is the most expensive in databases, and in most cases, slow means a lot of work to be done, right?

A lot of I/O work in databases, because a database is an I/O-intensive application, right? So we are usually I/O-bound. And the main thing to understand about this new system view is that it's at the Postgres level, for the buffer pool; it's not about disk. It's about the buffer pool, so the abstraction is high.

It means that if you see a lot of I/O numbers, and it says reads, maybe it reads into the buffer pool from the page cache, because under the buffer pool we have the page cache. So maybe it's still all about memory; we don't know at this level. So "reads" is not disk reads; it's reads from the page cache into the buffer pool.
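A minimal sketch of querying the new view in PostgreSQL 16, keeping in mind the point above: "reads" are reads into shared buffers and may be served from the OS page cache.

```sql
-- pg_stat_io: I/O operation counts split by backend type, object, and
-- context. Reads/writes are counted at the buffer-pool boundary, so a
-- "read" may still be served from the OS page cache rather than disk.
SELECT backend_type, object, context, reads, writes, extends, hits
FROM pg_stat_io
ORDER BY reads + writes DESC NULLS LAST
LIMIT 10;
```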

[00:24:35] Michael: It could be either, but we don't know, right?

[00:24:38] Nikolay: Yeah, at this level we don't know, so we need to consult, for example, /proc/<pid>/io using our process IDs, or some different tool. But it's already very good; at this level, it's super important to have, and useful. Ah, and also it doesn't track some I/O: it tracks only I/O related to the buffer pool, so it doesn't track, for example, WAL writes or reads.

[00:25:01] Michael: Exactly. But it does for autovacuum, for example. You mentioned you're going to use this all the time. What kind of thing do you imagine yourself using it for?

[00:25:10] Nikolay: So, if we have a problem, in any analysis we need to divide it into segments. Whether it's performance optimization or anything else, you need to perform some segmentation to understand where to dive in further. And this is a perfect starting point. So I understand, okay, the database is slow; some developer says so, they like to say this.

The database is slow. We look at this, and it's better to draw it as a graph, a colorful graph with segments: okay, this I/O is from backends, this I/O is from the checkpointer, this I/O is from autovacuum workers, and so on. And I quickly understand where the most work is done. Is it the backends, the checkpointer, or where?

And then I can go to pg_statio_user_tables and pg_statio_user_indexes, or to pg_stat_statements. I can go down to pg_stat_kcache if I have it; that will be the physical level for queries. Or I can analyze wait events as well. I mean, this is a perfect starting point for performance troubleshooting.
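A sketch of that first segmentation step: aggregating pg_stat_io by backend type to see who is doing the I/O.

```sql
-- "Who is doing the work?" -- the first, wide-and-shallow segmentation step.
SELECT backend_type,
       sum(reads)   AS reads,
       sum(writes)  AS writes,
       sum(extends) AS extends
FROM pg_stat_io
GROUP BY backend_type
ORDER BY coalesce(sum(reads), 0) + coalesce(sum(writes), 0) DESC;
```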

[00:26:17] Michael: Makes sense. And also, I guess, you could rule things out. It probably will be an I/O problem, but if there's nothing lighting up here in your monitoring of this view, you can also rule out that it's I/O related, or maybe it's WAL related, but...

[00:26:33] Nikolay: Yeah, we discussed monitoring and troubleshooting, and we have some kind of runbook, and also a list of what monitoring should have. Troubleshooting runbook number one means you have 30 seconds and you need to understand which direction to dig. And I'm almost sure we will include pg_stat_io in the versions for PostgreSQL 16 and newer, as a starting point for this quick and shallow analysis. As wide as you can, and very shallow.

[00:27:06] Michael: Yeah.

[00:27:07] Nikolay: This should be a part of the methodology, I guess, for performance troubleshooting.

[00:27:12] Michael: Yeah. And in terms of tooling, the pganalyze team, Lukas and one of the developers, were involved in reviews of this, so I'm pretty sure they'll be adding it to pganalyze.

[00:27:25] Nikolay: Very likely. Yeah, very likely. Okay.

[00:27:31] Michael: The next thing I had, I think it's important, not because I've had many people talking about it for 16, but because people talked about it so much being reverted from 15: the SQL/JSON stuff, so the SQL standard constructors and identity functions. These were reverted from 15 because they weren't quite ready.

And it was a big deal, with lots of people saying they were disappointed. And now that they've been added, or at least most of them, I was a bit surprised that they didn't make it into a lot of the various write-ups I've seen on Postgres 16 already. So yeah, I think it's cool, and I think standards compliance is important.

I hear about it quite often when people say why they're moving to Postgres or why they're picking Postgres in the first place. So I know this work can sometimes be thankless, and I just wanted to say I appreciate all that hard work.
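A small sketch of the SQL/JSON syntax landing in 16, with made-up values: the SQL-standard constructors plus the IS JSON predicate.

```sql
-- SQL-standard JSON constructors and predicate, new in PostgreSQL 16:
SELECT JSON_OBJECT('name': 'Ada', 'id': 1) AS obj,    -- constructor
       JSON_ARRAY(1, 2, 3)                 AS arr,    -- constructor
       '{"a": 1}' IS JSON OBJECT           AS is_obj; -- predicate
```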

[00:28:26] Nikolay: Well, yeah, PostgreSQL in many cases is the de facto standard already, I mean, its syntax. And we know a couple of Postgres folks are members of the SQL standard committee, which is good. And of course, supporting the standard is always good. I work with JSON in a SQL context all the time.

I wish the syntax was less bloated, and there are ideas for how to change it, but it would require a change of grammar, a heavy change of grammar. But still, I agree with you, standard support is a good thing to have. On this note of syntax, I think a breakthrough change is that now we don't need to specify an alias for subqueries in FROM, right? It hits everyone who writes SQL; at some point you see this error for sure.

[00:29:21] Michael: A helpful error. It does tell you exactly what you're doing wrong, but...

[00:29:25] Nikolay: yeah, it says just add as alias. And

[00:29:27] Michael: Yeah,

[00:29:28] Nikolay: I think it's not standard, this change.

[00:29:32] Michael: I suspect the standard requires you to have one, but this has come up quite a lot in migrations from other databases that don't force you to do this, and it must be quite annoying.

[00:29:46] Nikolay: my point. Here we probably deviate from standard, but for the sake of easier migrations, it's a good thing to have. But I personally am going to stop writing those addresses. Closing parentheses and space underscore was always already, I have a habit of already writing this, but now I can drop this.

Well, not now; in a couple of years, when production systems will be on 16, I'm going to drop this habit, which is good. It's more convenient. Yeah. Slightly back to performance. I'm one of those guys who like to use some kind of newer, it's not new anymore, but newer SQL syntax.

For example, you can ORDER BY inside an aggregate function, which is interesting. Sometimes, for a DBA for example, you select from pg_stat_activity and you group somehow, for example by state: idle, active, idle in transaction. And you want a few examples of queries, so you cut them using the left() function, taking only, like, the first 10 characters.

And then you aggregate using string_agg or array_agg, and then you think, oh, okay, I don't want arbitrary examples. I want the longest-lasting queries, for example. So right inside string_agg or array_agg you write ORDER BY query_start or xact_start, and a limit can be applied using array indexes. It's different, but with ORDER BY inside aggregation you can specify the order. So, when I used it, I always understood that it's not going to benefit from any indexes; it will be done in memory, so it's not a very scalable approach. I mean, performance is not good; if I had a billion rows, I probably wouldn't do it.

But for pg_stat_activity, it's fine. So now the planner can use indexes to support this ORDER BY inside aggregate functions, which is good. And also DISTINCT inside: you can write DISTINCT inside it as well, like string_agg with DISTINCT. This support was always there, but it was not benefiting from indexes. Now it can benefit from indexes.
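A sketch of the pattern being described, against pg_stat_activity:

```sql
-- ORDER BY and DISTINCT inside aggregates. In PostgreSQL 16 the planner
-- can feed such aggregates pre-sorted input (e.g. from an index) instead
-- of always sorting in memory.
SELECT state,
       string_agg(left(query, 10), ', ' ORDER BY query_start) AS examples,
       array_agg(DISTINCT usename) AS users
FROM pg_stat_activity
GROUP BY state;
```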

[00:32:22] Michael: Yeah, these planner improvements get quite detailed, but all of these things add up, right? It's very cool that it can do that.

[00:32:32] Nikolay: What else? And 150 more changes, right?

[00:32:36] Michael: Yeah, right,

[00:32:37] Nikolay: Which we don't have time to cover. Yeah.

[00:32:41] Michael: What about upgrading in general? You mentioned updating production in a couple of years. I see some people upgrading every year, or at least most years, but they don't seem to be in the majority. Every couple of years does seem to be really popular. There is overhead to doing major version upgrades; I think that's the reason. Is there anything else you wanted to add there?

[00:33:06] Nikolay: About upgrades? It's a huge topic, we've already...

[00:33:08] Michael: No, no, about, like, should people upgrade, how often should they upgrade, is it okay to still have a Postgres 12 instance lying around and just keep...

[00:33:17] Nikolay: No, no, no. So, for minor upgrades, usually you should run the newest in most cases, but again, an enterprise approach should involve proper testing in lower environments. But for major versions... It depends on each case, but in general the old-school recommendation is to wait until, for example, 16.2, a couple or three minor versions in; it will already be the third minor version: 0, 1, 2, right?

And then upgrade, if you have some kind of critical system. But here again, my point is that for 16 we probably have a more stable release, because it's not as if half of the source code was rewritten. How Postgres is built was changed drastically, though: moving to Meson, a new build system, which is the same system for all operating systems; the same approach is now used for Linux, FreeBSD, macOS, and Windows. So it's a newer approach, and a lot of legacy was removed. But it's behind the curtain for users, right? You don't see this, because you just take the binaries.

[00:34:34] Michael: it's good for the Postgres developers and for people testing patches, mm

[00:34:38] Nikolay: Yeah, yeah. So, a more unified and more modern approach to building Postgres.

[00:34:45] Michael: hmm.

[00:34:46] Nikolay: Okay.

[00:34:47] Michael: Actually, one thing I really did want to say quickly, and I'm surprised we haven't had more requests for this: I think probably the biggest thing that's happened to Postgres in the past year is outside of Postgres itself, which is the pgvector extension. I suspect we're going to do a whole episode on it at some point.

[00:35:03] Nikolay: Yes, we should, we should.

[00:35:06] Michael: But it's cool that, because of the extensibility, these massive changes can happen outside of the major version release cycle. So yeah...

[00:35:15] Nikolay: I have a different opinion here. I think pgvector should be part of Postgres as soon as possible, but obviously it would mean that releases would happen only once per year, which for pgvector currently would be quite slow, because the pace of development is super high right now. But if it continues to be a third-party extension, it's also losing some attention. JSON, for example, was in core right away, and this was good.

[00:35:53] Michael: That's kind of why I wanted to bring it up, because it could easily have been something that was decided to be put into Postgres, but I imagine it would have taken an extra couple of years. And maybe it still will, maybe that is what will happen. But it's really cool that it can be out there in the wild, getting quick releases that aren't tied to the Postgres major version cycle. And if it is in all the major cloud providers already, it kind of is in Postgres, in a way, because anybody that self-hosts can install it, and anybody on cloud platforms can...

[00:36:25] Nikolay: here because, uh... If you check, for example, versions are maybe very old, for example, they upgrade slowly

and so on.

[00:36:36] Michael: good point.

[00:36:37] Nikolay: Also, some providers still don't support it. And if it goes to core, there are two paths: it can be a contrib module, an officially supported extension, which is already good, or it can go directly into core. And, you know, I imagine a lot of discussions where people are choosing between Postgres and a specialized vector database to store vectors, or, I don't know, Elastic or others. They all support vectors, either via third-party extensions or in core. I think support in core can be an argument in such choices. It's a very high-level argument, not diving into details, but some people might think, oh, if it's in core, it's better supported, you know, and you can try to convince them otherwise, but... So every provider, like RDS, Cloud SQL, adds support. By the way, Cloud SQL added it, right? There was a question recently.

[00:37:36] Michael: did, but I think not, not the latest version, not 0. 5. 0, which is an important release, so you do make a good point on... They might support it, but is it up to date?

[00:37:46] Nikolay: Without HNSW indexes, right? Which is what everyone wants right now, almost everyone. So, by the way, about pgvector, someone made a very good point on my Twitter when I was discussing it, saying that, you know, with HNSW indexes it breaks the rule that indexes should not change what is returned by a query. This approximate nearest neighbour search will produce different results.

Without the index you have one result; with the index you have a different result. And this is probably the kind of thing Postgres never had, I mean, never had in core.

It can, extensions can, yeah. So it's...
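For illustration, a minimal pgvector 0.5.0 HNSW sketch (table, column, and query vector are made up); it demonstrates the rule-breaking just discussed, since the approximate index may return a different set than an exact scan:

```sql
-- pgvector 0.5.0: HNSW index for approximate nearest-neighbour search.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigint PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- With the index this is approximate: it may return a different set of
-- rows than the exact, index-less scan would.
SELECT id FROM items ORDER BY embedding <-> '[1, 2, 3]' LIMIT 5;
```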

[00:38:26] Michael: I have heard that that is a reason for not including it. But yeah, we'll see, I guess.

[00:38:31] Nikolay: I think it should be in core, because Postgres is good at JSON; it should be good at vectors, obviously. But yeah, I understand some of the obstacles here.

[00:38:44] Michael: sorry for the detour, but I thought that was worth adding.

[00:38:47] Nikolay: It's a good topic, I think, for the future. It's very interesting and very attractive for many. Okay. Thank you for the interesting discussion, as usual. And let's remind our listeners that we need feedback, either in terms of subscriptions, likes, or maybe comments and suggestions on what to talk about. And we continue working on subtitles: all new episodes have them, and we are improving, so feel free to use them, and please also give us feedback, on Twitter or on YouTube or anywhere else. Thank you.

[00:39:32] Michael: Thank you, bye.

[00:39:34] Nikolay: Bye.
