pgvectorscale
Nikolay: Hello, hello, this is
PostgresFM, episode number 101.
My name is Nik Samokhvalov, founder
of Postgres.AI.
I thought about...
Since last time we had episode
number 100, and it was interesting
because we discussed databases
at the scale of 100 terabytes and
beyond.
Logically this episode should be
something very simple and beginner
entry level, because it's 101.
You know, in the US, 101 means
something like some introduction
to something, right?
But I guess we will dive quite
deep, I hope we will dive quite
deep into some details of some
things.
Very interesting to me personally.
But also, since a couple of days
ago, Timescale released an addition
to pgvector called pgvectorscale.
I asked developers to join and
they agreed.
So meet Mat Arye.
Hi, Mat.
Thank you
for coming.
Mat: Thank you for having us.
Nikolay: And John Pruitt.
Thank you for coming as well.
It's great to have you.
And let's start. It's kind of an interview, but maybe it will be more of an informal discussion, because, I don't know, we've known each other for some time, and why not, right?
Let's start from the beginning, maybe from some distance.
Why do we need vector search in Postgres at all, and maybe vector search in general?
Because recently, not long ago, LLMs became...
They're becoming bigger and bigger at a very high pace, in terms of how many dimensions, but also in terms of context window.
Gemini now supports 1 million tokens, and there are talks about 2 million.
Although I must admit, Gemini is
terrible in terms of reliability.
Being an AI startup, we have a lot of credits to spend.
And it's a very well-known problem with Gemini, 500 errors all the time.
It's making me very sad.
But a 1 million context window means that probably we don't need vector search at all and we can just put everything into 1 prompt, right?
There was such a crazy idea when contexts became really large.
Imagine if we have like a 100 million token context window.
You can just put everything into a single question.
Or is it an insane idea?
What do you think?
Mat: I think there are a few points to point out.
First of all, there's still the
bandwidth that you need to use
up to send the question to the
LLM, right?
So there is going to be a pure
networking limit at some point.
Another point is that even with
very big context windows or token
windows, yes, you can give the
LLM a lot of information, but
there's actually an open question
about whether it will use all
of the information it gets well.
And there's some academic research on LLMs that's actually showing that giving it less information but more relevant information in the prompt gets you better results than giving it all the information you have, right?
And so, I mean, with LLMs in general,
all of this is purely empirical
research.
There's actually not very good
scientific backing either way,
but with the LLMs that people have
tested, I know that huge prompts
are often not very good in terms
of performance, in terms of
the answers you get.
So in that sense, narrowing prompts down using either vector search or text search or hybrid search, whatever you want, in many cases gives you a better answer.
And apart from that, and this is a bit controversial, I'm actually not convinced that RAG is the killer app for vector databases.
There's a lot of other use cases
for vector databases, starting
from clustering, recommendation
engines, plain old search engines,
right?
A lot of other things that have nothing to do with RAG, and that increasing context windows won't, you know, help or hurt.
These are just orthogonal things.
And the basic thing that vector search gives you is semantic search.
There's a lot of applications for
that.
Nikolay: So not only RAG, right?
So interesting.
Yeah, that's...
Okay, so your answer has 2 points, to summarize.
First, we still need to narrow down and find...
If we put everything into 1 question, it will not be efficient.
I agree, with LLMs, sometimes if you put too much information, the answers are not good.
The quality of answers decreases for sure.
And second, it's not only about
RAG.
I would like to explore this.
Do you see some good examples beyond
retrieval-augmented generation?
Actually, while you think, let me just spend maybe half a minute, because I know our audience is mostly Postgres engineers, DBAs, and backend developers.
Of course, many have already heard about RAG, but just a small recap.
It's quite a simple thing.
Actually, you're right.
Maybe vector search is not even needed in RAG.
For example, we can use full text
search there.
So the idea is that when some request is coming to the LLM, before passing this request to the LLM, we have intermediate software which finds additional information which should be relevant.
Here vector search can be useful, but maybe not only vector search; for example, full-text search.
Then we augment this request with this information, and the LLM has some context.
Basically, it's similar to going to an exam with a lot of information in your pockets: when you need to answer, you can get this information out of your pockets and answer better.
Or use Wikipedia, for example.
Similar.
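To make the recap concrete, the retrieval step looks roughly like this in SQL; the docs table and embedding size are hypothetical, and the application computes the query embedding and pastes the returned chunks into the prompt:

```sql
-- Hypothetical schema: docs(id bigint, body text, embedding vector(1536)).
-- $1 is the embedding of the user's question, computed by the application.
-- The 5 nearest chunks by cosine distance are added to the LLM prompt.
SELECT body
FROM docs
ORDER BY embedding <=> $1
LIMIT 5;
```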
But you just told me that it's not only RAG.
What else?
What else for vector search particularly?
Mat: I mean, fundamentally, I think
any time that you are right
now using full text search or text
search, you could substitute
that with semantic search to give
you better results.
Let me just give a simple example
for people.
With the full text search, searching
for a query on car does
not return results that have truck
in the document, right?
Semantic search solves that because
it's kind of, I mean, that
sounds almost magical, but it's
kind of a search on meaning.
So things that are close together
in meaning get retrieved.
So you could use a totally different lexical structure, a totally different word, and somehow this type of search figures out that they are similar in the meaning space, if you will.
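To illustrate the car/truck point, here is a minimal sketch of the 2 kinds of queries side by side; the schema is hypothetical:

```sql
-- Full-text search is lexical: a query for 'car' will not match a
-- document that only mentions 'truck'.
SELECT id
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'car');

-- Semantic search ranks by distance in embedding space, so documents
-- about trucks can surface for a 'car' query, because the meanings
-- are close. $1 is the embedding of the query 'car'.
SELECT id
FROM docs
ORDER BY embedding <=> $1
LIMIT 10;
```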
John: Or you could use
it to
augment full-text search
or as a hybrid of the 2.
Nikolay: Have you seen good examples
of this?
Like I saw terrible techniques,
like for example, let's find
100 results from full-text search,
100 results from vector search,
and then combine them.
But I always have a question: what's the ranking system?
Because they're very different, and sometimes we want fresh information.
A specific question I would like to explore maybe later is how to reflect the freshness of data.
We can discuss it maybe later, because I also already want to discuss what you just released.
But did you see any good examples
of combination of full-text
search and semantic search like
vector search?
Mat: It's hard to say, because we are database providers, so we kind of see the...
We don't see the examples that our customers have.
We get told by our customers that, hey, this technique works, we find it useful, blah, blah, blah.
But we rarely actually see the results ourselves.
But anecdotally, a lot of people are using hybrid search.
And in the AI community, I think it's almost becoming the standard.
There's something that's potentially a problem for Postgres, which we can go into later, but that's what we're hearing a lot, at least.
But I will agree that this is totally all ad hoc, non-scientific, like, scratch my left ear with my right arm type thing, right?
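For readers who want to see the shape of the hybrid pattern being discussed, here is a minimal sketch that merges the top 100 from each search with reciprocal rank fusion, which is one common answer to the ranking question Nikolay raises; the schema, constants, and weighting are illustrative:

```sql
-- Hypothetical schema: docs(id, body, tsv tsvector, embedding vector(1536)).
-- $1 is the text query, $2 is its embedding. The RRF constant 60 is the
-- conventional choice from the reciprocal-rank-fusion literature.
WITH fts AS (
    SELECT id, row_number() OVER (ORDER BY ts_rank(tsv, q) DESC) AS r
    FROM docs, websearch_to_tsquery('english', $1) AS q
    WHERE tsv @@ q
    ORDER BY ts_rank(tsv, q) DESC
    LIMIT 100
), vec AS (
    SELECT id, row_number() OVER (ORDER BY embedding <=> $2) AS r
    FROM docs
    ORDER BY embedding <=> $2
    LIMIT 100
)
SELECT id, sum(1.0 / (60 + r)) AS rrf_score
FROM (SELECT * FROM fts UNION ALL SELECT * FROM vec) AS merged
GROUP BY id
ORDER BY rrf_score DESC
LIMIT 10;
```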
Nikolay: For me, almost all the examples I see look ugly, because, for example, if you think about pagination, there is no way to have pagination there.
Although if we go back to vector search, it's also problematic there, because to go to page number 100, you need to extract all 100 pages, and it's very inefficient performance-wise.
Let's discuss the project.
It's called pgvectorscale.
It looks like you just appended the suffix "scale" to pgvector.
It's a very interesting approach.
Let me ask you this.
It's very well known that TimescaleDB is a great thing for time series.
It has a free version you can host yourself, a community edition, I think it's called.
I doubt it's true open source, because it doesn't have an OSI-approved license.
So it's some specific thing.
Or maybe an Apache-licensed version exists, but it lacks a lot of good stuff.
For example, I think it lacks compression.
Maybe I'm wrong.
Is it?
Mat: Yes, it lacks compression.
So we have an Apache version, we
have a community version.
Nikolay: Right.
I think you can use the extended version, still not paying, if you host it yourself, but it cannot be considered true open source, because this license is already something like a hybrid and so on.
Mat: We call it source available.
Nikolay: Yeah, yeah, this is a
common term.
So then, Postgres itself has a very, very, very simple and permissive license.
That's why many commercial products are developed on top of it.
Then we have pgvector, which is a separate extension, which in my opinion should at some point go into core Postgres.
But there is a big question: how come?
Because it's breaking all the rules, because just by creating an index you might get results which differ from the original results without the index, right?
It's something we didn't have before, because indexes always just sped up queries.
Now we can change results, because it's approximate nearest neighbor search.
This project, I think it's...
Which license is it?
I don't remember, but I think...
Yeah.
It's also the PostgreSQL license, right?
And maybe that's why you also developed your product on top of it using the PostgreSQL license.
But it's very interesting, because before, Timescale had this major product everyone loved, compression especially, but only if you went that route, source available, or went to Timescale Cloud.
Now you release this thing under the PostgreSQL license, meaning that any cloud provider (most of them already have pgvector) should naturally already be considering adding your extension to their services.
I think, of course, you've thought about this very well.
So my question is: why did you change your approach compared to TimescaleDB?
Mat: Yeah, so I'll speak for myself,
maybe not for the company.
I think, look, we all love Postgres,
but I think Postgres needs
different things in different areas.
In the time series area, Postgres already has a lot of functionality built in, and we are building upon a well-known foundation, where I would argue that, given our broad definition of time series, in many ways Postgres already dominates the market.
So we felt that we could build a company while contributing some things to Postgres, but also, quite frankly, making a business for ourselves and making money in this area by preventing hyperscalers from making all the money off of our work, which is what the TSL license does.
It doesn't stop any individual
from benefiting from our work
or really any company, it only
stops the cloud providers from
making money off our work.
And that was done for pragmatic reasons, so that we could make a business.
In terms of vector, I think the whole vector market is much more nascent and much younger.
I don't think Postgres has won this market yet.
You can look at how many other
databases have gotten millions
of dollars in the past year to
develop their own solution.
This is a very hot, fast-moving
market where I don't think Postgres
yet has a particular...
It hasn't won, right?
And so, because we like Postgres so much, we wanted to help Postgres win.
Nikolay: There are many benchmarks where Postgres has very poor results with pgvector, on the ivfflat index, without HNSW and so on, right?
And they're still around.
Mat: I will say with pgvector 0.7,
things have improved on the
pgvector side quite a lot as well.
But yeah, and I think not only in terms of benchmarks, but in terms of actual usage.
You know, last episode you talked about 100 terabyte databases; we want to reach 100 terabyte databases that have vector data.
We have to do a lot of work to make that happen, right?
And I think that's the ultimate goal.
And it should be the ultimate goal of everybody in the community.
We thought that it was wiser to help the entire community succeed at this point in time.
Nikolay: Yeah, when I saw benchmarks, like 100,000 or 500,000 vectors, I was like, what's happening here?
People say, oh, it's already a lot.
It's not a lot.
Take any company, take their databases, and they already have so many data entries in their databases.
Out of those entries we can usually create vectors.
It means that we should already be speaking about billions, but it's not there yet, because it's problematic: vectors are large, building indexes is slow and takes a lot of time, and there are the latencies of search, of course.
Everything is quite slow.
I'm glad you did your benchmarks
starting with 50 million vectors,
already very good.
I'm super glad to hear that you
talk about terabytes of data.
By the way, some people say, okay, for us Postgres guys, not Timescale, regular Postgres guys, 1 terabyte is usually 1 billion rows.
With Timescale compression, it's many more; 1 terabyte should be maybe tens of billions of rows, right?
Tens of billions.
So it's a different order of magnitude.
But if we talk about pgvector,
50 million vectors.
Let's talk about...
This project brings 2 things to the table.
Maybe let's first close a non-technical question, because I cannot skip this question just sitting in my head.
So pgvector had the ivfflat index originally, then HNSW was added.
I remember Neon participated, starting with a separate project, then decided to contribute to pgvector.
So now it has 2 index types.
And pgvectorscale brings a third type of index, right?
Couldn't it have been a pull request to pgvector?
Mat: So we actually talked with
Andrew Kane about this.
The issue is that we are written in Rust, not C.
And Andrew thought that it would be better as a separate project.
I can't say I disagree.
I think either could have worked, but given the language difference, it makes some sense to have it as a separate project.
But we did all for...
Nikolay: Yeah, well, makes sense.
Makes total sense.
But why Rust?
Mat: Because I like to work quickly, and I thought Rust would allow me to do that more than other things.
I should say I'm 1 of the people that worked on compression at Timescale, and, you know, our compression system had its own data type on disk, and I remember the pain of having to make sure that the way you write data to disk works correctly.
And across platforms: big-endian, little-endian, all of these, the alignment issues.
It's just a lot of double-checking that you have to do to make sure everything is correct.
I wanted to avoid this.
And with Rust, a lot of this could be a lot simpler.
So, like...
Nikolay: Yeah, makes total sense.
Yeah.
Mm-hmm.
Right, right.
So, this makes total sense, but for the end user, it's interesting, especially those who self-manage.
You need to take Postgres, then you need to bring pgvector, then you need to bring pgvectorscale.
It's an interesting situation.
Mat: We do have pre-built packages.
Nikolay: This is interesting, but
now I understand better.
Let's talk about the index itself.
What can you tell us about this index compared to HNSW?
I know HNSW is an in-memory thing, while this one works with disk, so is it definitely the right thing to do if we're talking about terabytes of data and billions of rows and so on?
Mat: Yeah.
So I'm going to hand-wave a lot here, because there is a lot of history, but it turns out, for various reasons, that the best way to index vector data is using a graph structure, so a graph database.
You can imagine each vector is a node and is connected to other nodes that are close to it, right?
And then there are some longer edges.
And the traditional problem in these types of indexes is getting from 1 end of the graph to the other end.
So your starting point is very far from your query vector, which is what you're looking for.
It takes a lot of hops in these
graphs.
And HNSW and DiskANN are pretty
much different ways of solving
that basic problem.
HNSW solves this by introducing
layers into the graph, where
the top layer where you start only
has long distance edges.
So it kind of allows you to jump a long distance to get close to where your query is, and then you go down a level in order to make more fine-grained jumps.
And there are several levels.
By the time you get to the lowest
level, you are kind of in the
most fine-grained area of the graph.
And those top levels are the things
that help you solve this
long jump issue, if you would.
DiskANN doesn't use multiple graphs.
Instead it kind of uses a neat trick in the construction to...
How do I put this?
It puts in enough long edges to
be useful.
So the way it constructs the graph
is modified from the regular
way these types of graphs are usually
constructed, specifically
to inject these long edges into
the graph, which probabilistically
allows you to jump faster to where
you need to go.
And this kind of flat structure is what allows you to keep better locality of where you are.
So instead of using multiple levels, the flat structure allows you to make fewer hops in memory, which allows us to work on disk, on SSD rather than in RAM.
Because...
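For reference, setting this index up looks roughly like the following, based on the pgvectorscale README (the access method is named diskann; the table and dimensionality are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS vector;       -- pgvector, the base extension
CREATE EXTENSION IF NOT EXISTS vectorscale;  -- pgvectorscale builds on it

CREATE TABLE docs (
    id        bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    body      text,
    embedding vector(768)
);

-- The graph-based, disk-oriented index discussed above:
CREATE INDEX ON docs USING diskann (embedding);
```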
Nikolay: Yeah.
I noticed, by the way, maybe it's a side note, but I noticed that in your Postgres setup you use local SSD disks, right?
Not EBS volumes.
Is that fair?
I mean, they're ephemeral.
If you restart the machine, you
lose them.
So in your benchmark particularly,
I noticed this small point.
Mat: Our benchmarks very concretely benchmark Pinecone against self-hosted Postgres.
And most self-hosted Postgres people I know actually use NVMes, and they do like...
Nikolay: But not local NVMes.
Mat: But you can do backups other ways, right?
You could do streaming replication, you could do other things to get around this issue, so things don't go away.
Nikolay: I'm just very curious how much it helped, because of course latency is better, throughput is better.
Mat: I can tell you it helped immensely, and I can say that we are right now actively thinking about how to bring this kind of performance to the cloud as well.
Stay tuned, I think we'll have some exciting stuff coming out.
But yes, that's a very astute observation, because, you know, quite frankly, none of this works on EBS.
You need the random read performance of SSDs to make this work.
So, by the way, I would be shocked if Pinecone wasn't using...
Nikolay: ...local NVMes?
Mat: I don't know.
I have no idea what they use, but I would be shocked if this is EBS-based.
Nikolay: And for these 50 million
vectors, how big was the database?
Like in terms of...
John: The DiskANN index was about 40 gigs.
The entire table, I think, was around 250, somewhere between 250 and 300 gigs.
Nikolay: Right, and the machine definitely has less than that, so of course it was not cached.
I mean, we do need the disk.
Okay, and back to indexes.
DiskANN, and I saw you mentioned in the article and in the project itself the Microsoft work, but you modified this work, right?
But the question is: HNSW and this DiskANN are both approximate nearest neighbor algorithms.
Can we say 1 is better than the other for most workloads and datasets?
Or are there cases where 1 can win and cases where the other can win?
That's what I think.
John: The HNSW index supports concurrent index builds, which could make it faster, at least for building the index.
Nikolay: For building the index, okay, yeah.
But for search?
Mat: For search and accuracy, look, we haven't benchmarked everything.
On the things that we have benchmarked, we've seen higher throughput and higher accuracy.
The trade-off was always kind of better than HNSW, but, you know, we haven't benchmarked everything.
We've kind of concentrated on modern embedding systems.
So, for example, we haven't gotten to benchmark very low dimensional vectors like 128 or 256; the lowest thing we've benchmarked is 768.
So a lot of caveats, and I think, you know, the space is so new that people should actually test on their own data, but I can say that in terms of our research we haven't seen where we're worse yet.
Nikolay: People should always test
on their own data, even if
it's old school full-text search
and so on, because who knows
what you have, right?
Mat: Absolutely.
John: That's one thing that made
benchmarking so hard, especially
versus our competitors, because
a lot of the specialized vector
databases have very few parameters
that let you actually control
the index, whereas on Postgres
with both pgvector and pgvectorscale,
there are several different parameters
to tune the build of the
index, and then also query time,
plus all of the Postgres settings.
And if you're self-hosting, you've
also got OS and machine level
things to play with.
So there's just a mind-boggling
number of variables you could
play with if you had the time and
money to spend on it.
Nikolay: Yeah, and in pgvectorscale particularly, I also checked the source code and the documentation that we have so far, and I saw the parameters you can touch.
A couple of them at query time, and some can also be adjusted at build time.
What can you say about the list sizes and so on, like the query search list size?
And at build time, num_neighbors, how many neighbors?
Do you recommend trying to tune these now?
Mat: So we tried to put in sane defaults.
That is what we saw as the best combination.
But obviously, we wanted to let people also experiment.
In terms of build parameters, though, I would recommend keeping the defaults.
The runtime parameters are really where I think people should experiment, because that allows you, at query time, to make the trade-off between accuracy and speed, right?
And there's the rescore parameter, which I think is the 1 we recommend to actually tune.
I just realized today we probably should change the default; it's pretty low.
That was my fault.
But that is the parameter I think people should play with.
John: Yeah, that 1 seemed to have the most impact on both speed and accuracy at query time.
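For reference, the knobs being discussed look like this, with names as documented in the pgvectorscale README; the values shown are illustrative, not recommendations:

```sql
-- Build-time parameters (Mat recommends keeping the defaults):
CREATE INDEX ON docs USING diskann (embedding)
    WITH (num_neighbors = 50, search_list_size = 100);

-- Query-time parameters, settable per session or transaction; this is
-- where the accuracy/speed trade-off is made:
SET diskann.query_search_list_size = 100;
SET diskann.query_rescore = 400;  -- the rescore parameter discussed above

SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
```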
Nikolay: Right.
Okay, so 1 more question about
the index.
You call it StreamingDiskANN,
right?
Why streaming?
How does it differ from the original
Microsoft implementation?
Mat: Yeah, so I think in both the HNSW implementation and the original DiskANN implementation, you tell it ahead of time how many things you want returned.
And that just doesn't play very well with Postgres, quite frankly, because in the Postgres model for indexes, the rest of Postgres can always ask the index for the next tuple.
And this is done so that you can filter your results after index retrieval.
So for example, let's say I want to find the closest vectors to a given query that also meet some other criteria, say they belong to the business department or the engineering department, right?
The vector index only fetches the closest things to the query, and then what Postgres does is that after index retrieval it will filter on department equals engineering, right?
Now let's say your closest hundreds of vectors are all from the business department.
That means if that-
Nikolay: It will be fast.
Mat: What?
Nikolay: I mean, in this case, it will be fast because-
Mat: Right, but if your parameter says you're returning the closest 50 from the index, then no results will be returned at all.
Because the index returns 50 results, and those 50 results are then filtered by department equals engineering.
Nikolay: None of them match.
Different department, right, right, right.
A lot of work done, but 0 results, right?
Mat: Exactly.
And there are 0 results because there's this arbitrary limit that you give at the beginning of the query to tell it, hey, retrieve this many results.
Whereas in reality, Postgres has no idea how many results need to be retrieved in order to match both the query and the department equals engineering filter, right?
So what the streaming part of the algorithm does is remove that restriction and make the algorithm work the way Postgres expects other indexes to work.
So you can tell the index, hey, give me the next closest thing, then the next closest.
Actually, you could traverse the entire graph, your entire table, like that.
And that makes all of your queries that have a secondary filter completely accurate.
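The query shape being described is just a metadata filter plus an ORDER BY on distance; the difference is entirely in how many candidates the index is able to yield (schema hypothetical):

```sql
-- Postgres pulls candidates from the vector index in distance order and
-- applies the WHERE clause afterwards. With a fixed-k index, if the k
-- nearest vectors are all from 'business', this can return nothing; a
-- streaming index keeps yielding the next-closest tuple until the LIMIT
-- is satisfied.
SELECT id
FROM docs
WHERE department = 'engineering'
ORDER BY embedding <=> $1
LIMIT 10;
```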
Nikolay: And this secondary filter, it's a different column, for example, text or integer, right?
Or not?
Mat: Yeah, it's a different column.
Often it's JSONB, right?
So you could have an article, you could have a list of associated tags with it in JSONB, and you can say, hey, find me all the articles about cars that also come from the USA, or any other kind of metadata.
So it's this combination of semantic and metadata, which is actually incredibly common.
Nikolay: Right.
But there we usually have either a situation where we have a B-tree and a GIN index and Postgres needs to decide which 1 to use and then still apply a filter, or we need to use an additional extension like btree_gist, I'm bad with names as usual, and then try to achieve a single index scan.
But here it's not possible.
I mean, the ideal world is a single index scan without additional filtering.
Mat: Yes, that is the ideal world.
Right now, none of the indexes supports that.
And so the best you can do, the state of the art on Postgres right now, and I would argue the state of the art period, but we can leave that argument for another time, is to hope that you can retrieve from the vector index and then post-filter and still have accurate results.
Nikolay: I can imagine, for example, if we take this department name, put it inside the input text which builds the vector, and then additionally filter by an integer like department ID, it probably will work better in this case, right?
Because first we apply semantic search, mentioning in the query which department we want, but then we think, okay, this filter will do the final polishing.
But it sounds again like some ugly solution.
Mat: So...
Yeah, it's not only ugly.
I'm actually not sure it won't work, but, using my intuition, the department engineering in the text would kind of skew the semantics away from the actual thing in the text you're talking about.
And so I'm not sure that would combine well in this kind of semantic multi-dimensional space.
John: You could maybe
add a dimension and synthetically
set a value in the dimension to
represent which department it
is.
Nikolay: And 1 more dimension for time, like a timestamp.
This is so natural.
I'm very curious if there is some work already in this direction, because everyone needs creation time, like publication time or something, for each data entry.
I didn't see good discussions about that yet.
Because, for example, we loaded almost 1,000,000 entries from the Postgres mailing list archives.
It started working great, but when you see some discussion from Bruce Momjian from 2002 about something, I don't know, I remember just some cases, it's already not relevant at all.
You think, maybe I should delete old data, but still, it might be relevant.
What we did, an ugly solution: we return 1,000 or maybe 5,000 results, and then we dynamically apply some score; I think it's a logarithmic approach for time, so we add some penalty if the article, well, email, is very old.
It quickly becomes less relevant.
We just combine it with the score pgvector provides us, similarity or distance.
This works well, but sometimes you have 1 second latency on a 1,000,000 row dataset, and this is terrible and doesn't scale.
So this problem of additional dimensions, I think it's huge.
We need to extend the original vector with non-semantic dimensions, use them in filtering, and achieve a single index scan.
Good performance, right?
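The workaround Nikolay describes can be sketched like this; the schema, the penalty weight, and the exact logarithmic form are illustrative, not what Postgres.AI actually ships:

```sql
-- Over-fetch candidates by vector distance, then re-rank with a
-- logarithmic age penalty so that old emails sink unless they are a
-- very close semantic match.
WITH candidates AS (
    SELECT id, sent_at, embedding <=> $1 AS dist
    FROM emails
    ORDER BY embedding <=> $1
    LIMIT 1000
)
SELECT id,
       dist + 0.05 * ln(1 + extract(epoch FROM now() - sent_at) / 86400.0)
           AS score
FROM candidates
ORDER BY score
LIMIT 10;
```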
Mat: I would say that in the academic literature there is some progress being made.
There is the Filtered-DiskANN paper, which I believe is 2 years old, which is ancient in this space.
There was another paper from a group out of Berkeley talking about building filtering into these graph-based indexes as well.
But you've got to crawl before you can walk.
We just haven't had a chance to really implement these inside this index, but there is academic work in this area.
Nikolay: Right, for this algorithm.
John: You'd be able to signal that certain dimensions, if that's where you're putting your filters, would need to be exact versus approximate.
Nikolay: Right.
And do you think this DiskANN approach is better than HNSW for this particular problem?
Mat: For filtering, yes, I do.
Obviously, I'm biased, but I think the simplicity of going from multiple levels to a single level really helps in a lot of these things, just because there are a lot fewer edge cases to consider.
Nikolay: Let's not forget...
John: This helps as well, right?
Mat: Yeah, I think streaming was a lot easier to implement because it's a single level.
Nikolay: Interesting.
Let's not forget to talk about
compression, because I'm sure
we need it in vector world, in
vector search.
We need it.
I saw in recent pgvector there are ideas like, let's not use full 4-byte floats, let's use smaller types and so on, a kind of compression, so to speak.
But you're talking about real compression, right?
Maybe some experience from the TimescaleDB extension, or no, or is it different?
Because there it's time series; I remember the articles, the Timescale blog has excellent articles, but maybe it's different here, right?
Mat: I don't think I directly used any of the algorithms from the time series space for this, but the basic insight, that using the statistical properties of the vectors you have makes compression give you better results, is exactly what led to...
So we have an algorithm called Statistical Binary Quantization, SBQ, and it takes a quite simple but well-known algorithm called BQ and kind of adapts it to your dataset in a better way.
You know, we have a blog post about
it.
It's pretty simple.
It's pretty much using the means of each dimension and the standard deviations to better segment your space, if you will.
And yeah, honestly, we took a very experimental approach.
We took various datasets, we tried different things, and it turned out that this worked.
It added a few percentage points to the accuracy, and once you get into the 90s, a few percentage points is a pretty big deal.
And so, yeah, you get to read about it in our blog post.
The algorithm is fully explained there.
Nikolay: To make sure...
I'm just noticing, and...
John: The compression that Mat's talking about is happening in the index, and the TimescaleDB compression is, you know, converting row-based storage into columnar and then compressing each column in the heap.
We actually tried compressing the vectors in the heap with TimescaleDB compression, and seemingly random strings of vector numbers don't compress very well.
So that didn't help very much.
Nikolay: Because in TimescaleDB, 1 of the key ideas is that for time series, values are changing, not jumping, how to say.
So there are deltas, and these deltas are quite small and so on.
This is what I remember from those blog posts.
For vectors this doesn't work, I understand.
For index compression, how much can we expect in terms of compression ratio?
Mat: It uses 1 bit per dimension.
The uncompressed version is float4.
It's a very easy calculation: it's always a 32x compression ratio.
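A worked example for a common embedding size, just to make the arithmetic visible:

```sql
-- For 768-dimensional float4 vectors:
--   uncompressed: 768 dims * 4 bytes = 3072 bytes per vector
--   SBQ:          768 dims * 1 bit   =   96 bytes per vector
--   ratio:        3072 / 96          =   32x
SELECT 768 * 4 AS uncompressed_bytes,
       768 / 8 AS sbq_bytes,
       (768 * 4) / (768 / 8) AS compression_ratio;
```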
Nikolay: Good.
Understood.
Great.
So yeah, anything else in the technical area we should mention?
John: That helps with the size, but it also helps with performance, because you can fit more in.
Nikolay: Fewer buffers to load into the buffer pool.
This makes sense.
I remember your blog post also mentions storage costs; you compare with Pinecone in terms of how much you need to spend each month to store 50 million vectors.
The difference is very noticeable, I would say.
So yeah, this makes sense for sure.
Anything else, like technical stuff?
Anything maybe you're working on
right now?
Mat: I think, about pgvectorscale, that's about it.
We haven't talked about pgai, which is the other thing we announced this week.
Nikolay: That one is Python, untrusted, so it won't be possible to run it on managed services, as I understand, but it allows you to...
Or is it something different?
Because it makes calls to the OpenAI API or...
Mat: It does.
So, with untrusted languages, you can't allow users on clouds to write their own functions.
But if the functions are included inside an extension, that's fine.
And it's easy to see why, because most extensions are written in C, which is completely untrusted, right?
So the entire point of the pgai extension is specifically so that this can be run on the clouds.
Nikolay: So it limits capabilities, and a cloud vendor can verify that everything works well and that no bad calls can be made through it.
John: It's almost like whitelisting certain programs.
Nikolay: Interesting.
Makes sense.
So the idea is to make it really simple to create vectors from regular data types, transparently calling these APIs, just with SQL, right?
I think there are other similar implementations of this, but I like your idea of betting on cloud providers including this, even without providing untrusted language capabilities.
It's interesting.
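A minimal sketch of that pattern, assuming the openai_embed function name shown in pgai's README at the time; treat the exact signature and model name as illustrative:

```sql
-- Generate an embedding for a string directly from SQL; pgai calls the
-- OpenAI API under the hood, using a configured API key.
SELECT openai_embed('text-embedding-3-small', 'the quick brown fox');

-- Backfill a pgvector column from existing text data:
UPDATE docs
SET embedding = openai_embed('text-embedding-3-small', body)
WHERE embedding IS NULL;
```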
Mat: Yeah, there are a few things that do similar things, and I was always curious why they didn't just run the Python code to do this.
And so that's what we did.
A lot of people do very complicated stuff to get the same result.
I don't know why, but yeah, that's the way we went.
Nikolay: I like this approach very much myself.
I mean, I've done it a lot for many years, just select and call something externally.
But of course, there should be a huge warning sign: it doesn't scale well, because when Python calls out, you basically add latency to your queries on the primary.
Primary CPU is the most expensive resource you have.
If you want to generate a vector, you cannot do it on a replica, because you need to write it; you must do it on the primary.
While this query is running, it's running on the primary node, and offloading this work to Python application nodes makes total sense, because the database doesn't notice it at all, the primary doesn't notice.
You only talk to it when the result is already retrieved from OpenAI or another LLM provider.
This is handy, but very dangerous in terms of scalability and performance and future issues.
Mat: Completely agreed with you.
This is a way for people to get up and running quickly, to test and experiment.
And we do have another project
called pgvectorizer for when you
need to batch things up and scale
it and do it in the background.
We have all of that as well.
Nikolay: I haven't seen that, that's interesting to check as well.
John: Yeah, but all of that plumbing that you have to write to drag data out of the database and then send it off to an API and put it back in the database...
You end up writing so much code that you don't have to write when it's in the database, not to mention all of the bandwidth you're consuming and the latency there.
Nikolay: Right.
I remember I was copying images from S3 in SQL.
And it's similar, of course: you call some APIs from Postgres.
It's great, and it's an interesting direction.
With a warning, it's good for a start.
I think I don't have any more questions.
I'm looking forward to trying this
project.
We wanted to do it before this
call, like yesterday already,
but we had some technical difficulties.
I hope we will solve them soon
and try pgvectorscale.
For our case, I will tell you when
we have results.
Thank you.
So as a summary.
Mat: There's 1 more thing about
that.
This is a new project.
We're trying to be very responsive.
So if you run into any problems
at all, you know, file GitHub
issues or contact us on Discord.
We are more than happy and very
eager to talk to anybody.
This is a young project, as I said, so all feedback is very welcome.
Nikolay: Sure, yeah, that's a good
point.
Honestly, seeing this work in the Timescale company repository on GitHub makes me have good expectations that if some problems are encountered, they will be addressed, and so on.
As a summary, let's say everyone
who works with vectors in Postgres
should check out this new type
of index and compare.
This is great.
I think there are many people who are doing this right now, and I'm looking forward to results, in our case and other cases as well.
Thank you for the good project, very interesting.
Mat: Thank you.
Nikolay: Good luck with it.