Nikolay: Hello, hello, this is
Postgres.FM.

As usual, I'm Nik, PostgresAI,
and as usual, my co-host is

Michael, pgMustard.

Hi, Michael, how are you?

Michael: Hello, Nik, I'm good,
how are you?

Nikolay: Very good.

So, the topic you chose is to talk
about beyond Postgres, when

we should avoid using Postgres,
right?

Michael: Yeah, you put a shout-out on a few social networks asking people what kind of questions they'd like us to answer, and we had lots of good suggestions, as we have for many years now. One of them was particularly good, and I thought it was worth a whole episode: when not to use Postgres.

I think there's a growing trend, or a few popular blog posts, of people saying you should consider Postgres for most workloads these days, and I think it's still an interesting topic to discuss. Are there cases where it doesn't make sense? If so, what are those, and when does it make sense not to use Postgres?

I was interested in your take on some of these as well.

Nikolay: Yeah, well, classic example
is analytics.

Michael: Do you want to list a few and then discuss them in a bit more detail?

Nikolay: Yeah, let's create a frame
of this episode.

So analytics, embedded databases.

Michael: Yeah, so analytics, embedded, and I think storing large objects, there are some cases where

Nikolay: it makes sense.

Pictures in a database.

Michael: Exactly, especially larger
ones like videos, like very

large objects, 100%.

Nikolay: And just let's agree in
every area we can discuss pros

and cons of using Postgres because
some people definitely have

an opinion that it's not an excuse
to avoid Postgres.

With this in mind, let me add a topic like ML datasets and pipelines.

Right?

Yeah.

Machine learning and big data.

Michael: I think anything where
there's specialized databases

like search, vector databases.

Nikolay: Vectors, exactly.

Let's talk about vectors separately.

It's worth it.

Michael: And then one more, which, well, actually, I had two more on my list. One was potentially controversial. I wondered if there's a case when you're at extreme OLTP, write-heavy, very, very write-heavy, and, let's say, you've got institutional experience with Vitess. I think sticking with that for the moment makes a lot of sense. So for when not to use Postgres, I wondered: if a new project came along, starting with Vitess while we get these sharded OLTP Postgres solutions up and running, I think at the moment it maybe still makes sense to not use Postgres there.

Nikolay: Let's add then time series.

Yeah.

Let's discuss this area, time series,
and also data that can

be compressed really well.

This topic is close to analytics.

True,

Michael: yeah.

Michael: And then I have one more, but I guess it's kind of what I said just now. I think if you or your organization have tons of experience with another database, the argument for using Postgres for your next project is weaker.

I'm not sure it's really a case of when not to use Postgres. I think I could think of lots of counterexamples.

Well, this

Nikolay: is an orthogonal discussion.

You can say: if we already have a huge contract with Oracle, already signed for the next 5 years, it's not wise to start using Postgres, because that money would be spent for nothing, right?

There are many such reasons, right?

Michael: Should we stick to technical
ones then?

Nikolay: Yeah, like area, like
types of usage, area.

Queue-like workloads, I would add.

Michael: Yeah, interesting.

Nikolay: Yeah, yeah, yeah.

The last one is, like, kind of Kafka territory.

Or there are others of course.

Michael: Yeah.

All right, let's, should we start
with analytics then?

I feel like that's, I know we did
a whole episode, kind of a

whole episode on this.

Nikolay: Yeah, so row stores are not good for analytics, and SELECT count(*) will always be slow; you need denormalization or estimates, and estimates will be, well, not slow, but too rough and sometimes too wrong.

Yeah, it sucks.
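For context, the trade-off Nikolay describes can be sketched in two queries; the `orders` table here is hypothetical:

```sql
-- Exact count: scans the whole table (or an index), slow at scale.
SELECT count(*) FROM orders;

-- Planner's estimate: instant, read from statistics, but only as fresh
-- as the last ANALYZE / autovacuum run, so it can be rough or wrong.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```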

Michael: I think there's a scale,
I think we've talked about

this before, but there's a scale
up to which you'll be fine on

Postgres.

You could achieve better performance
elsewhere.

But if you have hybrid, a lot of
systems now are hybrid, right?

Like they have to be transactional,
but they have to provide

some analytics dashboard for users
or something like it, but

they still want real-time data.

They still want transactional processing; maybe 90 to 95, maybe even 99% of the workload is transactional, and there are only a few analytical queries from time to time.

I still think those make a ton of sense on Postgres.

Nikolay: Counts, everyone needs them, right?

Pure OLTP applications still need to show counts, or understand pagination, and still have counts.

If you think about social media, you will need to show how many likes, comments, reposts, and so on.

And to implement it purely in Postgres, a good approach to denormalization would be needed.

And also, there are so many rakes lying around here; you can easily step on them and end up with a hotspot.

You know, this classic hotspot: an accounting system, tracking balances, where, let's say, we have one whole balance and all transactions update a single row.

Yeah.

So this is where it can be an issue.
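The hot-row pattern, and one common way around it, looks roughly like this; the table names are hypothetical:

```sql
-- Every transaction updates the same row, so they serialize on its row lock:
UPDATE balances SET amount = amount + 100 WHERE account_id = 1;

-- One mitigation: split the balance across N slot rows and sum on read,
-- spreading the lock contention across slots.
UPDATE balance_slots
SET amount = amount + 100
WHERE account_id = 1
  AND slot = (floor(random() * 8))::int;  -- 8 slots as an example

SELECT sum(amount) AS balance FROM balance_slots WHERE account_id = 1;
```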

And yeah, but at the same time, there are many attempts to do

better and some attempts led to companies being acquired, right?

I mean, Crunchy.

There are many new ways of aiming to solve this problem, better

analytics for Postgres.

I see two big trends right now.

One trend: Sid, the founder of GitLab, recently had a post saying that Postgres and ClickHouse make a great couple of database systems. I don't remember the exact title of that post, but the idea is that they go together very well.

We also had Sai, founder of PeerDB, which was acquired by ClickHouse.

And I met with him last week, and we talked about, again, the same idea: that ClickHouse and Postgres are great together.

This is 1 direction.

Another direction says ClickHouse is very different. Not even just maintaining it; maintaining it is absolutely different, but using it also requires a different mindset and skills. So it's better to choose, for example, DuckDB, and do everything inside Postgres but in a smarter way.

This is what multiple companies worked on recently, and one of them, Crunchy, was acquired by Snowflake.

Michael: And we had it, we did an episode on pg_duckdb as well,

so like a slightly different approach on that.

But yeah, the Crunchy one's interesting because all the queries go through Postgres, but a lot of the data is stored in Iceberg or some other file format.

Yeah, exactly.

It's on the columnar side.

Yeah.

So yeah, this definitely feels like one of those ones.

There's also a third option, by the way: these massively parallel databases. Well, you spoke to Sai last week; I was at a UK event this week, and there was a presentation from the successors of the Greenplum project, the kind of open-source successors, which is called Cloudberry.

It looks like really interesting work.

But that's another way of doing some analytics from within Postgres,

kind of.

Nikolay: Yeah.

And from past experience, I remember cases when Postgres and Greenplum were combined in one project, and it was great; it was some bank, even quite a big one. But somehow I stopped looking at Greenplum quite long ago, I don't know.

There are also, of course, commercial databases.

I remember Vertica, there is Snowflake, super popular, it's a

major player in this area.

By the way, I would distinguish two areas of analytics.

One is internal needs: as a company, we need to understand how the business is doing and a lot of other stuff; we need a lot of reports.

And another need is we need to show our users some counts.

Like I said, social media.

So two big areas, I think, also.

Michael: Yeah, good point.

Nikolay: In the first case, users are internal.

In the second case, users are external.

I'm pretty sure there are a lot of mixed and additional cases. But I personally like these two directions.

Of course there are others; there is also Redshift on AWS.

Michael: Also originally based on Postgres, yeah.

Nikolay: So there are many options here.

Michael: So yeah, is the short version that at sufficient scale it probably doesn't make sense to be using Postgres for analytics, but that level is quite high?

Nikolay: Yeah, but I also see cases when companies go to Snowflake and then try to escape it.

Michael: Come back, yeah.

Nikolay: Okay.

So going to Snowflake, it's like going to Oracle, in my opinion.

Michael: You mean like in terms of financial?

Nikolay: In terms of vendor lock-in

Michael: and so on.

Nikolay: Because it's just purely commercial offering.

There are of course many tempting things there, features, performance,

integrations.

Michael: Nice product to use as well, yeah.

Developer friendly.

Nikolay: Yeah, well, users love it.

I agree.

But if we try to remain more on the open-source side, with less vendor lock-in, then it should be excluded.

Even ClickHouse is different: ClickHouse itself is open source.

Michael: Yeah.

Right.

Well, you mentioned time series
being quite close to this.

I feel like we should jump to that
next.

What do you reckon?

Nikolay: Well, TimescaleDB is great, but it's also kind of a lock-in, because it's not open.

Yeah.

Michael: Yeah.

So because of their license, other
cloud providers can't provide

Timescale as a service easily.

Or at least not the version with
lots of nice features.

Nikolay: Yeah, in Timescale Cloud,
I had a recent case where

we saw limitations again, very
badly.

Like CREATE DATABASE doesn't work. And moreover, there's a lack of observability tooling.

Like again, like I keep promoting
on this podcast, if guys who

build platforms listen to us, you
must add pg_wait_sampling.

Unless you are RDS, okay.

But even in case of RDS, we talked
about this.

It's great to have it in SQL context
and be able to combine wait

event analysis with regular pg_stat_statements
analysis.

And pg_stat_kcache is an additional, very good observability point. Because I had a case when the guys just compared everything and saw worse performance, working closely with Timescale.

But in the case of RDS, you see Performance Insights and understand where we wait, right?

In the case of Timescale, only rare collection of samples from pg_stat_activity is possible.

It's sometimes good enough, but
it's quite a rough tool to analyze

performance.

So yes, such things are lacking.

And unfortunately, more and more
I come to the conclusion that

when I recommend TimescaleDB to customers, it contradicts the idea that they want to stay on a managed service.

Michael: Yeah.

Yeah.

Because they're down to a single
choice.

Yeah.

Nikolay: Yeah.

That being said, Timescale Cloud even offered me a bounty if I convince someone to go to them.

And this is great, I love loyalty programs, but I need to be fair: some pieces, big pieces, are missing, unfortunately.

Michael: Yeah.

And again, Postgres, even without
Timescale, can be used for

good time series workloads up to
a certain point.

We're just talking about very high scale, where you want all the features like compression, continuous aggregates, automatic partitioning.

Nikolay: Straight to the point. By the way, for time series, ClickHouse is also still a good option, and there is also VictoriaMetrics, right?

Michael: Yeah.

Well, and I learned just yesterday that even Cloudberry has incrementally updated materialized views.

I need to look into it.

But that's quite cool.

And if you're like, maybe that
would be

Nikolay: What do you think about this? Wouldn't it be great to have in Postgres something like UPDATE MATERIALIZED VIEW, where you just define the scope?

And also concurrently.
Michael: We should do a whole episode.

I think there are several projects
that have started to look

into incrementally updating materialized
views.

And I think they're more complicated than I ever realized. It's like one of those topics: the more you learn about it, the harder you realize it is.

Nikolay: Right now I'm in a position
where most, not everyone,

but most of our customers are on
managed Postgres, so it's really

hard for me to look at extensions
which are not available on

RDS, CloudSQL and others.

Michael: I understand, I'm just thinking it's worth learning from the extensions as to what would be needed in core: what did they try, what was difficult about that. And it's not just extensions, right? There are whole companies that have been built on this premise. Is it called Materialize?

Or like what

Nikolay: was the- Yeah, yeah, yeah.

I haven't heard from them for a few years.

What's, like, I'm curious what's
happening there.

Lack of autonomous transactions
will be an issue,

Michael: right?

Yeah.

Nikolay: Or a queue-like tool inside Postgres, so updates would be propagated asynchronously through a queue.

If everyone had PgQ like CloudSQL has, from Skype, developed

20 years ago.

In this case, implementing incrementally
asynchronously updated

materialized views would be easier.

Michael: Well, yeah, async and
sync is...

Anyway, this is an

Nikolay: interesting topic.

Yeah, yeah, yeah.

And basically we just touched on queue-like workloads.

It's still hard.

Bloat is an issue, right?

We discussed it, I think.

Michael: I think, well, we've discussed
there are solutions,

right?

I actually think queues is one of the ones I was going to fight you hardest on. Like, I think there are ways to do it badly within Postgres, and again, at extreme scale especially, I think it wouldn't be smart to put it in.

Nikolay: I don't agree.

Skype had extreme scale.

Michael: Yeah, well, yeah.

Okay.

Nikolay: 20 years ago, 1 billion
users was a target.

Michael: Good point.

So maybe, actually, of all the ones that we added to potentially be on the list, that would be one where I think, if you manage it well, like PgQ at Skype did with partitions, it's actually not an excuse to not use Postgres, if that's the title.

Nikolay: Yeah, the benefits are serious.

The problem is, when I recommend keeping queue-like workloads inside Postgres, I say it assumes understanding the whole complexity and the hidden issues; you need a high level of understanding of what's happening.

But if you have it, it will be great. It will be great.

It's just not a small project, unfortunately, usually.
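The canonical building block for Postgres-based queues is `FOR UPDATE SKIP LOCKED`, which lets workers claim jobs without blocking each other; partitioning (as PgQ does) then keeps bloat under control. A sketch, assuming a hypothetical `jobs` table:

```sql
-- A worker atomically claims the oldest pending job; concurrent workers
-- skip rows that are already locked instead of waiting on them.
WITH next_job AS (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE jobs
SET status = 'running'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id;
```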

Michael: Yeah, good point.

Nikolay: And this recent case with
Notify as well, because it's

also sometimes used in such workloads.

Yeah, as a reminder, NOTIFY takes an exclusive lock, serializing all NOTIFYs, which makes it basically not scalable.

Yeah, yeah, all right.

Anyway, what's the answer? If you needed to create a project and had to think about analytics, like: okay, we will have terabytes of data very soon, fast growing. What do we choose for analytics? What do we choose for queue-like workloads, for time series? What are the choices you would make?

Michael: I think it depends a lot, as you already said with analytics.

Are we talking about a bank that is doing nightly loads of data and only cares about internal reporting, or are we talking about a user-facing web app, like a social media app, that has to do counts and various aggregations that are user-facing?

Nikolay: Well you need both.

You need to think about both and
what architecture choice would

you make in the beginning.

Yeah.

For a target of terabytes of data in, like, one year, for example. 10 terabytes in one year.

What would you do?

Would you choose to stay in Postgres?

Michael: I'm a big fan of simplicity.

I think I would stick with Postgres for as long as possible, until it was painful.
Nikolay: Okay, and re-engineer then.

Michael: Yeah, and I know that
would be painful, But that would

be my preferred option.

I think then I am tempted, I think it's quite new at the moment, but I am tempted by the work that Crunchy started on moving data out to Iceberg format and still querying it from Postgres.

Like I like the,

Nikolay: I

Michael: like that I can keep

Nikolay: queries going from the
same place.

But is it possible on Cloud SQL or RDS?

Michael: Not yet, right?

But I think it's quite early.

Like if I was starting a project
today, I would hope that they

caught up by then.

And if not, then ClickHouse, or whatever PeerDB is called now within ClickHouse; having that go out to an analytical system like ClickHouse makes a load of sense to me.

What do you think?

Nikolay: I would choose self-managed
Postgres 100%.

And I would use TimescaleDB full-fledged.

Michael: Got it, OK, yeah.

Nikolay: And then I would consider
DuckDB path as well, additionally,

at some point.

And queue-like workloads I would engineer carefully and squeeze out as much as Postgres can give.

Yeah.

And what else did we touch on?

Michael: So again, all of these
are actually sticking with Postgres

and then just at some point in
the future, you're going to have

to think about sharding if you

Nikolay: can't in the future.

Only one reason would make me do it. So in these three areas we just discussed, I wouldn't go away from Postgres.

Although I understand very well
customers who have and why they

say we need ClickHouse or something.

In my company, I would add ClickHouse
only if there is a strong

team which needs it.

And this is their choice; I delegate, and then I'm not involved in this decision.

But while this choice is still
mine I would stick to Postgres

and just make it work better.

And it can scale until IPO.

I saw it several times with several
companies.

Michael: Cool, makes sense.

So I wonder if this next one's going to be the first one where we both wouldn't use Postgres, which is storing large objects, like large files.

Nikolay: Well, definitely, yeah. The last time I tried to store a picture inside Postgres was probably 2006 or 2007, when I was just exploring, you know, like: oh, this is working, okay. But no, I don't even know what will happen; this piece of Postgres I touch super rarely, you know.

Michael: Yeah.

I think the one exception is text-based stuff, stuff that you might want to query, but even then you probably

Nikolay: want to

Michael: be doing...

PDFs.

Nikolay: PDFs, yeah.

Michael: But you know, it's like
some representation of the PDF,

not the actual PDF, but the text
extracted from it.

Oh, yes.

Nikolay: It's going to be...

Or it can be Markdown, and then we have Pandoc or something which converts it to both HTML and PDF.

This is what we do with our checkups.

Originally it's in markdown.

Michael: Yeah, and by the way, another possible exception, which I think is almost not worth discussing, is if you only need to store 5 pictures, maybe, you know. But I just don't see many cases like that.

Nikolay: Yeah yeah yeah.

Still cool.

Michael: All right.

Nikolay: Yeah, we recently implemented attachments in our own system, for pictures and various archives of logs, which customers upload, or we upload PDFs as well sometimes.

Of course we store them in GCS in a secure manner.

Not in Postgres, 100%, now.

Yeah, yeah.

I don't even think about it, yeah.

It's just an exercise for those who probably don't have other tasks.

Michael: Yeah, all right.

Michael: Another one that I think is maybe in between these two: advanced search. I still think there are search use cases where people that I respect and trust would still choose Elasticsearch over Postgres, even though they know that...

Nikolay: By the way, sorry for interrupting. I just realized you talked about blobs.

Yeah, but we also can. And this can be an issue with upgrades, I've heard, right? Major upgrades.

I don't see them at all recently.

So I mean, you can try to store them in Postgres, but then you have some operational issues additionally. Because not only do I not see them often, the people who design procedures and tools also don't see them often.

And it's kind of exotic to have blobs.

Michael: Yeah, so when you say
major upgrades, are we talking

about like the speed of the initial
sync?

Or are we talking about pg_dump?

If we go the dump-restore route, then actually it all has to get dumped out.

Nikolay: Yeah, yeah.

I just remember some notes about it that I saw, maybe on RDS, specifically about large objects. Maybe I'm wrong, actually.

Michael: I just remembered I'm actually using it, not for super large objects, but for files of several kilobytes. Like, we store query plans in JSON format or text format. Text format ones don't tend to get that massive, but JSON format ones can; we've actually seen a couple that were tens of megabytes, and I think one or two in the hundreds.

But...

Nikolay: We need to amend this
part of the episode.

You're talking about varlena types.

Yeah.

There is a special thing, large
object facility, special chapter.

Michael: Yes, sorry, that's a different,
yeah.

Nikolay: Yeah, as varlena everyone uses JSONs, large texts, everyone. Even bytea.

Michael: Okay, you mean there's
a specific issue with the thing

called large objects?

Nikolay: I cannot say I don't touch
large JSONs.

Of course, I touch them a lot.

We have them a lot.

And yeah, we talked about how they get TOASTed in Postgres.

Michael: Yeah, yeah.

Nikolay: Large JSONs, large XMLs
sometimes, right?

Texts, of course, large texts,
everything.

For example, our RAG system for the AI assistant has really large texts: chunks of source code, many of these discussions, kilobytes each.

Michael: And you put all of those
in Postgres?

Nikolay: Of course, because we
need to parse them and also full

text search and vectors, that's
everything in Postgres right

now.

Michael: Well, you could store
the vectors in Postgres without

storing the text in Postgres, but
the full text search makes

a lot of sense.

Nikolay: Yeah, I understand you,
but we do everything in Postgres,

even vectorization.

Yeah, okay, cool.

It doesn't scale well if you need
to deal with billions of vectors,

but millions is fine.

Michael: Yeah, makes sense.

Nikolay: So what I was talking about is the lo_create function, these things.

Michael: Yeah, I've not used that
and you're saying you don't

see it.

Nikolay: Yeah, this is what I don't
see.

Michael: Hopefully everyone got
the memo and no one's using it.

Yeah,

Nikolay: lo_from_bytea...

So yeah, I don't use those.

And again, the last time I touched that, it was so long ago.

Actual blobs.

lo_put, lo_get, these functions.

I have no idea, and I suspect something
will be broken if you

start using them.
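For reference, the large-object facility being described stores data in `pg_largeobject`, addressed by OID, unlike an ordinary TOASTed `bytea` column. A sketch; the OID shown is made up:

```sql
-- Large-object facility: returns the OID of a new large object.
SELECT lo_create(0);

-- Read and write by OID and byte offset (16400 is an illustrative OID):
SELECT lo_put(16400, 0, '\xdeadbeef');
SELECT lo_get(16400);

-- The more common alternative: an ordinary bytea column, TOASTed automatically.
CREATE TABLE attachments (id bigint PRIMARY KEY, body bytea);
```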

Some operations, like upgrades, maybe. Not broken, but you will need to take care of them. Some side effects, like tablespaces can have, you know. Also not used often recently; in a cloud context we don't often use tablespaces. But tablespaces might be a headache when you do some migrations, move your database from place to place, or upgrade, and so on.

Michael: Yeah, yeah.

Nikolay: Okay.

Yeah, good 1.

What else?

Michael: We haven't talked about
embedded databases yet on the

kind of tiny scale of things.

Nikolay: Yeah, I'm not an expert in embedded databases.

I thought SQLite is good.

Michael: Yeah. In this category we do now have PGlite, which looks like a very interesting project. But I think at the moment, unless I was doing some syncing with Postgres, unless I had a really good reason, I'd probably still default to SQLite, however the community pronounces it.

But actually I was going to include
in this topic, like even

browser local storage for example.

If you're wanting to do stuff client
side in like a browser app

or web app, it still makes sense
to use the local storage there,

right?

IndexedDB or whatever.

So there are a few embedded cases where I don't think it makes sense to use Postgres, or if you're going to try, maybe PGlite.

Nikolay: And can you remind me,
PGlite, what does it do?

Is it related somehow to WebAssembly?

I think yes, right?

Michael: I think it must be, right?

But I don't know enough about it.

Nikolay: Yeah, I'm checking: it's a complete WASM build of Postgres that's under 3 megabytes gzipped.

That's impressive.

Michael: Yeah, it's a cool project.

But I guess there's an argument
to say it's not actually Postgres.

Like it talks and behaves like
Postgres, but it's kind of its

own thing.

Nikolay: Well, we can talk about
many Postgres variants like

this, including Aurora and so on.

Some of them more Postgres, some
of them less.

Michael: Yeah.

But if the topic is like when not
to use Postgres, yeah I guess

Aurora.

I don't know if I'd count Aurora
as that or not.

Nikolay: I'm not sure I understand
what you mean.

Michael: If the only solution was to use Aurora, let's say there was one of these cases where it turns out Aurora was the absolute best, way better than native Postgres. I think I would count that as when not to use Postgres.

Because it's kind of Postgres-compatible, like Cockroach, or any of these kinds of compatible...

Nikolay: you know

Michael: Yeah, so you're right, it's kind of a scale; it's hard to draw a line where it is and isn't.

Nikolay: It's a spectrum of Postgres, yeah.

Michael: Yeah, yeah, cool, okay. But yeah, it feels like that's an easy one, right?

If you've got little devices or
little, you know, little sensors.

Nikolay: Yeah, default choice is
SQLite already.

And I like the idea of PGlite, and I know Supabase used it for the database.build project, which I like a lot.

Merging this with AI, right in the browser, you can create pet projects and explore; it's a very creative tool for bootstrapping a new project and thinking about how it could look. It has the ER diagram and you can iterate with the AI.

It's great.

And there, PGlite works really
well.

And I'm sure they already created
this ability to deploy, to

sync what you build to real Postgres
in Supabase somewhere,

right?

Michael: Well, I think that was
the main aim of the company behind

PGlite, called like ElectricSQL
or something.

Nikolay: Replication.

Michael: Yeah, exactly.

So the whole premise was local-first, if you've heard of local-first development. The idea is that apps like Linear, the task management tool, are lightning fast because they do everything locally and then sync.
Nikolay: Very thick client.

Very thick client.

Yes.

Basically like Git: when you type git clone and execute it, you get basically the whole repository.

It can live on your machine.

Yep.

In a distributed fashion, right?

Michael: Well, and it has to handle merges.

Nikolay: Yeah.

You can use there.

It's great.

It's a conflict.

I'm curious if we could explore branching in that area, because we are already very close to implementing synchronization between two DBLab Engines.

Yeah, but it's a different story.

Yeah.

I like the idea.

So it might be a foundation for... PGlite, I mean, might be a foundation for more apps which will live in the browser but then be synchronized with real Postgres.

Michael: Yeah or even desktop apps
like it doesn't have to be

browser-based.

Nikolay: Well, I guess the desktop apps are built on top of Electron, right?

Michael: Oh no yeah good point
good point.

Nikolay: And then if you don't
have internet connection you still

can work?

Yeah.

Like offline mode?

That's great.

I like the idea actually.

I like the idea.

So you have a Postgres mirror.

Well, it reminds me of multi-master replication.

This is complexity.

Michael: All the same problems,
like with merging and conflicts.

Nikolay: But at the same time, recent Postgres has this ability to create a subscription and avoid replication loops.

Michael: Yeah, true.

Nikolay: So with the origin option you can say: I want to replicate only the data which doesn't have an origin, which means it was born here. Local origin, basically, but it means "no origin" there somehow. The terminology is a little strange, as usual in Postgres, right? But it's a great ability to break the loops, infinite loops.
Yeah, it's

Michael: interesting.

Yeah, it fixes one of the problems, but it doesn't...

Nikolay: Fix all of them, no.

Michael: Yeah, exactly.

And, like, last-write-wins type things, yeah.

Nikolay: But if you need to have a very good server-side application and so on, you choose Postgres; but then you have these very thick clients, and you need to choose a database for them, and you choose SQLite; and then you need to synchronize between them somehow.

It's maybe even worse.

Michael: Even harder.

I think that's

Nikolay: the point.

Different data types and models.

Yeah.

Michael: Yeah.

Cool.

What about the specialized workloads
like vectors?

And I was going to bring up search
as well.

I think search is slightly easier.

Nikolay: Let's bring search.

Michael: I don't actually have,
I haven't written an app or been

involved in an application that
is heavily reliant on, like,

very advanced search features.

But the people I speak to that
have swear by how good ElasticSearch

is.

Nikolay: This is also what I see. I usually touch Elastic only when working with some logs, application logs, Postgres logs, through Kibana, so ELK, or however this stack is called.

But I also see many customers use Elastic and like it, and shift full-text search from Postgres. Okay, that's their choice, and I know the limitations of Postgres's native full-text search. I also don't understand ParadeDB, and I haven't seen benchmarks. The benchmark I saw was only in the beginning, when they didn't create an index on the tsvector column.
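For reference, the native setup he's comparing against is a GIN index on a `tsvector`, which is the step those early benchmarks reportedly omitted. A sketch, assuming a hypothetical `docs` table:

```sql
-- Keep a tsvector column maintained automatically (Postgres 12+ generated columns).
ALTER TABLE docs ADD COLUMN search tsvector
  GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;

-- Without this GIN index, every full-text query is a sequential scan.
CREATE INDEX docs_search_idx ON docs USING gin (search);

SELECT id
FROM docs
WHERE search @@ websearch_to_tsquery('english', 'postgres sharding');
```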

Michael: They made a really interesting
hire recently.

I saw, do you remember ZomboDB?

Nikolay: Hires, yes.

Michael: Do you remember?

Several hires,

Nikolay: Because they raise money,
yeah.

But where are benchmarks?

So I don't understand what's there
because I don't see benchmarks.

I tried recently because their
CEO, founder, Philip, approached

me.

Nice.

Maybe asking, I guess, for me to spread the word, but I cannot spread the word if I don't see numbers. If it's a company about performance, show numbers.

I might be missing something because...

Yeah, so.

Michael: The other product in this space for Postgres was an extension called ZomboDB, which kept an Elasticsearch index maintained, with the data originally coming from Postgres.

So I thought that was a really
fascinating way of having both.

A bit like when we talked about
analytics, like having the interface

be from Postgres, but the actual
query being run on something

that isn't Postgres.

So that was fascinating, and it was the founder of ZomboDB who recently joined ParadeDB.

So that seems interesting.

Nikolay: This whole story seems interesting. I don't understand it, because I cannot find numbers, while at the same time I see everyone mentioning them: a lot of blog posts, a lot of GitHub stars, a lot of noise. But where are the numbers and benchmarks? They removed them after the initial ones.

Michael: Yeah, so if you were to
try and stay within Postgres,

they seem like the obvious thing
to try.

But I still see people choosing
ElasticSearch, and I'm not sure

why.

Nikolay: Yeah, yeah.

Yeah, yeah.

Please, if someone listening to us can share benchmarks showing how ParadeDB behaves under load (some number of rows, some number of queries, latencies, ideally buffers used), I would appreciate it, because I'm still stuck on the question: what is this?

Michael: Cool.

What about vectors?

Nikolay: Vectors I have a picture
for you I saw yesterday near

my house.

These guys definitely, yeah, those
who can see YouTube, please

check this out.

So these guys are definitely experts
in vector storage.

Michael: This is Nikolay's joke,
it's like a removal company

that are called Vector Moving and
Storage.

Nikolay: I saw the "storage" part as well; I thought it was funny.

So we have turbopuffer, right?

Michael: Well, again, not Postgres,
right?

Nikolay: Not Postgres at all, and
not open source at all.

And not free

Michael: at all,

Nikolay: even though freemium.

And the data is in S3, and a new type of index is being used there. I already forgot the name. But yeah, it's not HNSW.

Michael: Oh, interesting.

Nikolay: Yeah, yeah, yeah.

Michael: I didn't know.

Nikolay: So HNSW doesn't scale beyond a few million rows. There's DiskANN, and we had the Timescale, now TigerData, guys on, who developed an advanced version of that. My perception: I don't see anything that scales to a billion rows at all. And turbopuffer says they scale to a billion rows, but as I understand it, it's a multi-tenant approach. So it's not 1 set of a billion vectors.

I also don't understand that, but I see some development. PlanetScale, for MySQL, implemented the same index. And that development started, I think, at Microsoft, maybe Microsoft in China, from what I saw. And interestingly, they chose Postgres for prototyping.

So this area is worth additional
research.

I started it and didn't have time,
unfortunately, but it's a

very interesting direction, what's
happening with vector search.

Because I think Postgres is losing
right now.

Michael: Well you say losing, I
think...

Nikolay: It's losing, 100%.

Michael: Well, bear with me.

I think there are a lot of use
cases that don't need the scale

that you're talking about.

And a lot of those are fine on
Postgres with pgvector.

But you're probably talking about the ones that then succeed and do really well and scale; they hit a limit relatively quickly, like within the first couple of years.

Nikolay: It's really hard to maintain a huge HNSW index, and latency-wise it's not good. As for turbopuffer, I'm not fully sold on the idea of storing everything on S3.

Speaking of S3: a few weeks ago AWS released S3 Vectors, and this might become mainstream. So S3 itself now supports vector indexes.

Have you heard about this?

No.

I think this might become mainstream.

But if a big change doesn't happen in the Postgres ecosystem, it will be worse than the case with full-text search and Elastic. It will be worse. What's the word... an alarmist, that's what I am today, right?

Michael: Well, this is the point
of the episode, right?

It's like, it's almost by design
that we're talking about the

weaknesses.

Nikolay: I was feeling so good saying I would choose Postgres for this, this, and this; I can rely on it. But here... we have 1.1 or 1.2 million vectors in our RAG system for Postgres knowledge.

Michael: Is that mostly because
of the mailing list?

Yeah.

Nikolay: The mailing lists, I think, are 70%. But we also have a lot of pieces of source code from various versions, and not only Postgres: PgBouncer and so on, plus documentation. It's a lot of stuff, also blog posts. And I don't feel well thinking about how to add more.

And we are going to add more.

We are going to do 10x at some
point.

Of course, we will check what TigerData has, but at the same time, I'm not feeling good about it.

In terms of...

Michael: Is the main issue latency? I'm not sure. Is it query latency? Or what's the main... okay, yeah.

Nikolay: Latency, index size, index
build time, all these things.

Interesting.

The ability to have an additional filter, which is something that HNSW still lacks, right?

Yeah.

Maybe.

Michael: I do remember seeing an update on the pgvector repo; I feel like they had something to address this, but I can't remember what.

Nikolay: I haven't touched this
topic for several months.

I might be already lagging in terms
of updates.

It's a very hot topic, of course,
very young as well, right?

Michael: Yeah, and not something
I'm experiencing again.

It's more something I'm observing.

So you're definitely way ahead
of me on this.

Nikolay: Yeah.

So I know companies that use it. We have several customers who are on Postgres but chose turbopuffer additionally.

And Linear, which you mentioned, for example. Cursor and Linear also chose turbopuffer. Notion chose turbopuffer to store vectors.

I'm just checking.

They have...

Michael: Some cool customers on this list, yeah.

Nikolay: Yeah.

And there are several more companies not mentioned here that we know of; they are our customers in terms of Postgres consulting. And it was super surprising to see that a massive migration of vectors is happening.

Some moving company called turbopuffer helped them move their

vectors to S3.

But yeah, it's interesting, and they use a different index, which is a younger idea. It's based on clustering and centroids of vectors. So ANN is implemented differently: not graph-based as in HNSW; instead you quickly find which cluster centroids are closest to your query vector and then work within those clusters. Quite a simple idea actually, but I guess there are many complexities in the implementation.
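The centroid idea Nikolay describes can be sketched roughly like this. This is a minimal toy example in Python with NumPy, not turbopuffer's actual algorithm; the dataset size, cluster count, and probe count are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 2,000 vectors of dimension 32.
vectors = rng.standard_normal((2_000, 32)).astype(np.float32)

# Step 1: cluster the vectors and keep the centroids
# (a few rounds of plain k-means; real systems are more refined).
n_clusters = 20
centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)].copy()
for _ in range(5):
    # Assign each vector to its nearest centroid (squared distances).
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    # Recompute each centroid as the mean of its cluster.
    for c in range(n_clusters):
        members = vectors[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

def ann_search(query, n_probe=3, k=10):
    """Approximate nearest neighbors: scan only the n_probe clusters
    whose centroids are closest to the query vector."""
    nearest_clusters = np.argsort(((centroids - query) ** 2).sum(axis=1))[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, nearest_clusters))
    d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(d)[:k]]

query = rng.standard_normal(32).astype(np.float32)
neighbors = ann_search(query)  # indices of ~k approximate nearest neighbors
```

Incidentally, pgvector's `ivfflat` index type uses this same centroid/cluster approach, with the number of clusters and probed clusters controlled by its `lists` and `probes` parameters.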

Michael: Yeah.

Well, and it's cheap, right?

Being on S3.

Nikolay: Oh, yes. And slow. But turbopuffer, I guess, has an additional layer that caches on regular disks closer to the database. So there's a caching layer, of course, but it's much, much cheaper.

Actually, this is another area: if you have hundreds of terabytes of data, tiered storage in Postgres is still not a fully solved problem.

Right.

Michael: Well, unless you shard, right?

And this is like

Nikolay: the next generation.

It's too expensive to shard and keep everything on disks, especially if there is some archive data which you touch very rarely. I would prefer to have it on S3. And Timescale Cloud, now TigerData, solved it in their solution. There was also an attempt to solve it from Tembo, which is not a Postgres company anymore.

Yeah.

pg_tier, right, it was called.

But I think this will be more and more needed over time.

Michael: Well, and it's a side effect of the Crunchy Data approach of putting things in Iceberg; that also solves the problem, right? You can archive the data from Postgres at that point. So it's a similar solution, isn't it?

Nikolay: So I guess these days I would explore S3 Vectors at this point.

If I needed to.

Maybe I will, actually.

Michael: Well, you are going to need to, it sounds like.

Nikolay: Well, yeah, yeah, yeah.

Our PostgreSQL infrastructure is mostly on CloudSQL... no, not CloudSQL, that was wrong. Google Cloud, so...

Michael: 1 level up.

Nikolay: Yeah, yeah, yeah.

But S3 is AWS, so it's... But it's really interesting. It should be cheap, and it should be interesting to explore. And it's a big challenge to the Postgres ecosystem. Or maybe an opportunity, if somebody creates a foreign data wrapper for it. Actually, why not? It should exist. It's a good project, by the way, right? So, an interface... There is a foreign data wrapper for S3 already, right?

Michael: I think so.

I think super...

I don't know, I'll check.

Nikolay: It should just be extended with vector functions, in my opinion. Okay, enough; we're in brainstorm mode already.

Thank you so much.

See you next week.

Michael: Thanks, and catch you next week.

Bye-bye.

Nikolay: See you next week.

Some kind things our listeners have said