Postgres FM | Transcript: Soft delete

June 28, 2024 • 38 Minutes

Soft delete

Nikolay: Hello, this is PostgresFM.

I don't remember the number of episodes.

This is Nikolay Samokhvalov from Postgres.AI and my co-host as

usual, Michael Christofides from pgMustard.

Hi, Michael.

Michael: Hello, Nikolay.

I think this is 103.

Nikolay: Okay, not bad.

So we discussed what?

Michael: Yeah, so it's one of those things that's come up in discussion

a few times.

And we said we should do an episode on that one day.

And so we finally got to it and it's soft deletes, which is a

pattern that a lot of us have come across whilst working with

databases more generally.

It's not specific to Postgres, but it's the idea of instead of

deleting data, not deleting it, but hiding it from the application

in some way.

Nikolay: Let me explain it for regular, like how it feels since

I've built three social networks in the past.

I used soft deletes a lot, of course.

And then like Postgres is not good with deletes due to how MVCC

is organized, vacuum and so on.

Deletes are not good, right?

It's like an old idea.

Although each update consists of delete and insert at a physical

level, right?

Unless it's a hot update.

So how it feels in the eyes of millions of users I know, and

it's not only about the projects I've built, like almost any

social media uses it.

You go and try to unregister your account, you press all the

buttons say delete me or something like everything or just I

don't know sometimes you need to write to support or something

but usually it's good services have this button ready for you

right So I want to delete my account fully.

And then you find out that, for example, depending on implementation,

then you find out that your page is, I don't know, some small

artifacts you can see and feel that your page was not fully deleted

or your account was not fully deleted.

One of these artifacts, not artifacts, signs that SoftDelete is

used is You can actually restore it, right?

Easy, right?

You can, for example, Facebook.

Try to delete yourself on Facebook.

It will give you many chances to restore, and even after you

delete it, you still can restore it.

So you can, like, one month later, you can just press a button

and restore.

If you, like, what users want when they are angry about some

service?

They don't want to deal with it
completely, like, something can

happen in their lives, right?

They can see having an account
as a problem, for example.

They think it's a mistake to post
such pictures or to do some

conversations.

It doesn't matter.
I want to delete it.

I'm a new person right now.

It's there, right?

So, delete.

But then the server says, you can
restore.

Did you delete my data or no?

And then compliance, we have also
interesting points.

In some cases, we have mandatory,
like demand to store data sometimes.

Right?

Michael: We have both sides, right?

Yeah, we have laws that demand
we have to store data for a certain

amount of time in some environment.

Nikolay: And vice versa.

Sometimes if user decides the service
must delete data.

So delete, the word delete has
many meanings, right?

It's not SQL statement delete.

When you delete your account, it
doesn't mean the service executed

actually delete.

Most likely they just executed
updates or delete with triggers

and so on.

We will discuss, dive into details.

But for users, it's really annoying
to feel lack of ability to

truly delete.

I must say, like I saw it and I
was on the other side of, I was

on both sides many times actually.

And I know how it feels when you
want to delete for sure, like

100% I won't delete my data, it's
my data deleted.

No, you can restore it means it
was not actually deleted.

Michael: Yeah, for sure.

Well, I think we can come back
to like,

Nikolay: technical

Michael: the user side.

Well, I think there's I think there's
so many parts of this that

are interesting.

1 is kind of, If you do decide
you want this as a provider, implementation

details are quite interesting and
there are different and Postgres

specific details are quite interesting
as well.

But from the usability side of
it, there are different implementations.

I think various companies and organizations
are implementing

features around deletion, around
kind of making sure people know

exactly what they're doing before
hitting delete, you know, write

the words of your account name
in here.

Everyone's come across these kind
of make sure you type out,

yes, I'm sure I really want to
delete my account and I will not

beg support to undelete it later.

Nikolay: We give you 1 month to
make, to reconsider it or something.

Michael: Exactly, Or we can recover
it, but only in the next

30 days.

Type them.

Yeah, exactly.

Nikolay: I remember another way,
another sign that you can see

that soft delete was applied is
instead of having 404 error,

404 error, you go to your X page,
like you deleted it, but you

see something like account deleted.

It means that records still exist
in database, right?

Just changed its status.

So status like active, deleted,
right?

Pending, like pending, pending,
it means you need to activate

it, like email activation or something,
phone activation.

Then it's active and then it's
deleted, lifecycle, right?

Michael: In my mind, that's kind
of like a bug though, right?

Like that's just a poorly implemented
soft deletion.

Nikolay: Yeah, but it's so common.

And I actually implemented, I implemented
soft deletes in many

various ways.

But honestly, this morning, thinking
like walking my dog, By

the way, hello to everyone who
is running, walking dog, like,

I don't know, like riding bicycle.

Yeah.

I know we promised to keep 30 minutes,
but it's good for you

to run longer and ride bicycle
longer and for your dog as well.

So walking my dogs I was thinking,
soft delete, soft delete.

I also chatted with our bot to
like to brainstorm what to discuss

and I realized that if I needed
to implement it right now, looking

like back to my 20 years of experience,
3 social networks and

many other systems and our customers,
all those ways to implement

soft delete, what would I choose
now?

I don't know.

Maybe by the end of this discussion,
I will have some clarity.

But it's so difficult.

It's not an easy topic, actually.

I expected a simple topic today.

Michael: I think it's simple if
you only consider a single case

at a time.

If you want to give generic advice,
I think it becomes more complicated.

And we also dove into the most
complicated version, which is

deleting a whole account which
has multiple things.

Nikolay: Yeah, let's talk about
only 1 table.

Yeah, propagation is a different
topic.

Consider only 1 table.

We have many choices.

For example, 1 I already discussed,
like status.

You have pending, active, deleted,
something, archived.

Or it can be a different column
is deleted by default.

It's false, you can set it to true.

Both are very similar approaches.

Both I don't like at all.

Michael: I quite like deleted at,
which is with it being null

by default, and then a timestamp
as to when it was deleted.

Right.

Nikolay: I would not choose these
approaches.

Neither of them.

Today.

I chose them in the past.

Yeah.

Today, no, thank you.

Why?

Because then you need to deal with,
like, optimizing your queries,

thinking about indexes.

Most of the time, queries work
with active data.

So you need to keep all your indexes
probably partial with the

WHERE clause having this filter
to deal only with active records.

And you accumulate like non-active
data set in the same table,

which is not useful at all, right?

So and also foreign keys.

Michael: Yeah, unique constraints.

There's so many downsides.

Nikolay: That's complexity.

Michael: Well, and you need to,
you need to adapt your, your

application to always view things
where deleted at is not.

Because otherwise people will see
deleted data.

So there is definitely complexity
to that, but implementation-wise,

I think it's on the simpler side.

I think there are some advantages.

Nikolay: I maybe would choose it
only if I know that deletes

will be much less than 1% of all
data.

But who knows in advance, right?

Maybe we will need to delete a
lot.

Sometimes, from social media experience,
spammers, in ways they

come, They can register a million
accounts and you need to delete

them.

At the same time, to follow your
procedures you developed, if

you developed a soft deletes approach,
you're deleting fake accounts

is also deletes, right?

So you need to still keep them
in the same table and it's like

a lot of records which will sit
on the disk and most importantly

in memory, in caches.

Michael: Let's talk about the use
cases for it because I think

they're different enough that I
think it will then become clearer

when we would use different ones.

Like, for example, I think the
primary use case, the only use

case I've ever used Soft Deletes
for in the past, I've always

been at companies and working on
products where we haven't done

soft deletes by default.

I've only ever had it implemented
it once for an application

because several times a year, we
were restoring data that had

been deleted accidentally or prematurely.

So it was a,

Nikolay: and you didn't have that
by slab to, to, to go back

in time.

Easily.

Michael: We could risk the solution.

We, we, this is actually on the
Microsoft SQL Server database

at the time.

And we would actually restore from
a backup.

That's what we'd end up doing.

Nikolay: It takes many hours.

Michael: The larger the database,
the longer it takes.

And it's painful and manual.

And we were doing this often enough
that we decided to implement

soft deletes.

But with a slight difference.

I think sometimes people describe
soft deletes as if it's like

permanently gonna be in that soft
deletion state we would do

it and then have a like a Cron
job that anything older than so

people would delete something by
mistake and very quickly email

support and say, I deleted the
wrong project in this case.

Nikolay: We need to restore.

Yeah.
And restore procedures is complex.

Michael: Yeah, but it was always
quickly.

It was always within a day

Nikolay: or 2.

It's quicker.

Michael: No, sorry.

So in this case, with this application,
people that delete things

accidentally, they knew pretty,
like almost straight away that

they had done it and that they'd
regretted doing it.

So we knew we could, for example,
have this data in in a soft

deleted fashion with deleted app
with a timestamp.

And then we could have a cron job,
hard delete, or actually delete

data that was more than like, that
was deleted app more than

30 days ago.

So in that case, it meant we could
keep the structure simple,

only add 1 column, update the application
only in a few places

to view the data that wasn't deleted
yet.

You know, so it was a relatively
simple case where we were having

to restore data from time to time.

Nikolay: It's a good point.

So I took extreme case when we
discuss accounts and it's like

sensitive to like personal feelings
are affected.

But you're describing basically
the same thing as emails in Gmail

and Recycle Bin or like how it's
called trash folder.

Michael: Yeah, like on a desktop.

Yeah.

Nikolay: Right.

So you put it there basically when
you do deletes and it sits

there for 30 days, right?

And then automatically hard deleted.

Michael: Exactly.

Nikolay: And you can restore it
anytime and users usually appreciate

this feature and consider it as
good if you can undo your action.

I agree.

Like when we talk about some objects
inside the account, right?

So some data.

Michael: The only difference is
in this case is we didn't implement

the ability to view what's in the...

Like in those 2 cases, what we
could have done.

And that's actually, potentially
would have been nice for users.

Nikolay: Yeah, actually in this
case, this approach, now I'm

thinking, yeah, in this case, I
would probably choose some status

approach because it's actually
the same data, we just don't show

it in 1 place, but we show it in
another place.

So it's just some category, that's
it.

Right?

You're right.

So in this case, I would prefer
actually status, maybe timestamp

of changes, but maybe I would put
timestamps to special different

like audit, like log style append
only table to track actions

when some action was done.

Yeah, but you're right, actually.

Maybe this case, let's call it
in-place soft-delete when we keep

data in the same table and so on.

Michael: I was thinking use-case-wise,
that's if you're implementing

it to avoid or to make restoration
easier or undelete easier.

Nikolay: And viewing deleted data,
if some in some cases like

trash, trash data, if you want
to allow users to review it, Why

not?

Michael: Yeah, well, that's the
other like, there's another use

case for soft deletes I've heard
and seen people write about

which is to give you the ability
to examine it or to audit it

or to look through why it was done
or who did what.

So I think there are some...

Exactly.

And I know we've had a whole episode
on audit, but it could be

not necessarily for regulatory
purposes, but it could be for

debugging purposes.

Nikolay: And there we can have
a hard delete button for users

as well.

If like only there, not in main
place, but for like if object

already soft deleted, we could
allow users to actually delete.

But let's talk about Postgres spec.

This is UX discussion.

What else?

Compliance.

Sometimes you do need to delete
data for sure, like keeping it

only in some backups and also have
some policy to delete there

as well.

I mean, backups should be deleted
as well at some point.

Sometimes you do need to keep information
and soft deletes help

to comply some policies and regulations.

Depending on law you need to follow,
like it's hard.

But honestly, I prefer, speaking
of law, I prefer when I need

to delete everything.

But for convenience and as a business
owner, it's actually much

more convenient if you still have
data all the time and you can

review what happened and restore
it?

Well, sometimes, depending on the
nature of your business.

I see more and more guys follow,
like, it's like some compliance

rules, but I see more and more
companies implement the rule that

Slack deletes messages after some
time and everything is deleted.

I don't know.

Maybe it's because of costs.

Maybe it's some compliance, actually.

I don't know.

Michael: Well, I think there was
an age where this going back

to kind of philosophical side of
things, but I think there was

a time where people, I remember
reading headlines like data is

the new oil and things like that
and people thought hoarding

as much data or gathering as much
data as they could would be

valuable in the future for analysis
reasons or for mining, that

kind of thing.

And I feel like with a lot of data
breaches, especially, sorry,

I think I've got a siren in my
background, with data breaches,

I still feel like things flipped
and people started to think

of having data as a liability.

And the more data they held, or
the more personal it was, the

more at risk they are of, or the
bigger target they become for

attacks.

I feel like that's changed even
within the last, well, even within

my career of how people view holding
data if you don't really

need it.

Nikolay: Yeah.

Also performance reasons.

If you keep everything, it's hard.

I see many successful e-commerce
companies, they do need to actually

delete data, and having soft deletes,
they at some point decide

to clean up and actually delete
data.

So in place of soft delete, you
just have a column or a couple

of columns, and then you have a
lot of partial indexes, obviously,

maybe.

Right?

Maybe.

Maybe not.

It depends on the distribution
and selectivity you have.

And how could we improve this?

What about having a special partition
or a set of partitions

and include this column to partition
key.

It would be transparent for application
if we update status setting

from active to deleted.

In this case, the row will be deleted
in the main partition,

like actual working partition and
it will go to this kind of

archive partition, right?

Michael: Yeah, it would be like
moved.

So I guess it's similar in terms
of like write-ahead log impact

and but actually could it be higher?

I guess you're forcing no hot updates
but I'm guessing you're

not going to get hot updates anyway
because it's the delete that.

Nikolay: Yeah.

You cannot have hot update here.

So 0 chances because it's actually
deleted, actually inserted.

I guess hat updates is a good thing
to have, but in this case

you should avoid partial...

Back to a single table and just
a column.

If you want hot updates to be used
when you update this column,

you cannot use this column in partial
indexes in the where clause

of those indexes.

You must avoid it.

It works well only if you have,
as I said, only low part of your

table of deletes.

Like deletes are rare.

Michael: Yeah.

So another, like another solution
I see mentioned that I've not

used is using a view.

So defining a view where deleted
at is, has no value for example.

Nikolay: Yeah, I don't know.

You know like when you're in love
with something many years and

then this love is converted to
hate.

This is my relationship with views
because my, yeah, my thesis,

martial arts master, I don't remember,
was about updatable views

and then updatable XML views and
like, oh, I spent so much time

and I worked with them so much
time.

Now the only views I can deal with
now these days myself in my

projects where I have full control.

Only Postgres views.

Michael: That's it.

Nikolay: Everywhere else I avoid
them because I don't want to

deal with dependency hell later.

Because it's just a lot of overhead.

You just change something or add
column or drop column.

You need to recreate your views
and logs and logs when you issue

DDL, like, I don't want this.

But Yes, views can be used in many
cases, but you cannot create

an index on a regular view.

Again, I'm speaking of performance.

Just views will hide it, right?

And that's it.

I mean, hide some rows.

Michael: Exactly.

But it also makes the implementation
a little bit easier because

you don't have to update.

You don't have to make the application
aware of your soft delete

implementation.

Nikolay: Well, yeah.

Michael: You could just point the
application at the view now.

The Application doesn't

Nikolay: see those rows at all,
right?

Michael: Should we move on to some
other, like...

Nikolay: Approach.

Separate table approach.

Michael: Yes.

Nikolay: So straightforward approach.

Let's just have the table with
same structure, right?

Maybe not the same set of indexes
because we don't need all of

them, maybe, right?

And just insert to that table,
maybe using some trigger when

delete happens.

Michael: Yeah, so the 2 approaches
I've had are a shadow table

for each table that you want to,
yeah.

I don't know if that's the correct
term.

Nikolay: And I also hated this.

Because DDL, maintaining schema.

Michael: But I did see a neat alternative
to this in a blog post

by Brander, who's got a couple
of blog posts about DDL.

Nikolay: So let's discuss why it's
bad, like in detail, just

to clarify.

Good idea.

First of all, you need to create,
if you have many tables where

you want soft delete to be implemented,
you always need additional

table.

It's just overhead.

And also you need, every time you
create a column, for example,

or a name column or something,
you also need not to forget to

do it in the shadow table.

And living with such schema many
years in 1 project, I must admit,

it's a huge headache.

People always forget it.

You can try to invent some procedures
and automation not to forget

it, but still, deployments to production
when shadow table DDL

was forgotten, quite significant
risk.

So It's a lot of objects to deal
with.

Of course, you can put it to some,
say, schema named shadow,

for example, and keep the table
names exactly the same as in

the original schema, public or
something new, if it's a new project.

So public schema is not popular
anymore, right?

So, and it's good because it means
convenience because exactly

the same table names, but it introduces
a lot of overhead, it

requires good culture of automation,
like tooling, not to be,

to be always in a consistent state.

Michael: Yeah, I think there was
a lot, there are a lot of difficulties

as part of this.

Obviously, it does come with some
benefits, like easier to restore

than the next, the alternative
way of doing this.

Nikolay: Let me guess about alternative
way.

I also implemented it.

I didn't read this post by Brandur,
but I can guess because I

also implement I think we walked
on the same paths, right?

So you create a single table with
probably like some ID maybe

some surrogate ID then table name
who deleted timestamp and then

just JSON or JSONB column where
you just put all the data from

any table.

Right.
And you can...

Exactly.

Okay.
Why not?

Preserving column names maybe even.

Right.

So you...

Michael: Yes.

I think they used like a to JSONB
function.

Nikolay: Right.

And you do everything in triggers,
so everything again is transparent.

When a delete statement happens,
your row from any table which

has this trigger is packed and
sent to this table.

Very convenient.

I like this schema.

Michael: It's super.

So again, this is optimized for
like, this is harder to restore

from, right, because it's now in
a difficult format.

And especially if like the schema
has changed in between.

Nikolay: Well, this is a price.

You have trade-off.

Either you survive any schema changes
easily, or you need to

deal with some mutations during
restore.

You need to understand, okay, this
delete happened A year ago,

we had a different schema.

What to do?

You do need to do this.

Or you deal with schema...

You need to maintain mirror schema
changes all the time.

I would prefer actually deal with...

Since restore...

Depending on the project, of course.

If restore is not super common,
I would prefer this schema.

Super convenient, super universal.

Michael: Yeah, me too.

I think there's a lot of benefits
here.

Not for the 1 case I did implement
soft deletes in the past,

where the whole point was to make
restoration easier in a very

small number of cases.

This case is very different to
that.

But yeah, I do think there are
some tricky parts of this, though,

I think, for example, the compliance
thing is tricky here.

Like, it's another place that you
need to remember to delete

data from in the future.

If you get a GDPR request, you
can't just search on normal tables

for that data.

You also have to search this delete
log.

And it might be tricky to search
because it's JSON B data with

like unstructured.

So there's a there I think there
are downsides.

Nikolay: Right.

Is it fair to call this soft delete?

Maybe it should be called just
archiving?

Michael: Good point.

There's no definition of soft delete
that I know of.

It's not part of the SQL standard.

So I think it's more around like
what are the, what's it for?

And maybe we've lost sight of like
why are we implementing this

feature?

If it's for easy undeletes, maybe
this isn't the right solution.

But if it's for debugging or if
it's for different purposes,

then yeah, archiving makes some
sense.

But I guess it is, it's deleting
it in a way that it's not deleted.

That is the, you know, you mentioned
the, as a user seeing some

signs that it wasn't deleted, this
avoids a lot of those, right?

Like it's not in the same table
anymore.

Nikolay: Yeah.

Application can, If we define the
trigger function with security

definer, we can revoke or just
do not grant any permissions to

application user to deal with schema
or this table.

And that's it.

So application won't be able to
reach the data at all.

At very low, like, I mean, at database
level, database guarantees,

it's not possible.

But at the same time, Trigger can
still insert it because it

has...

Trigger function has security definer,
meaning that it has permissions

to run queries which are inside
the function using the level

of access of the user who defined
this function.

Michael: Cool.
I didn't know you could do that.

That's very cool.

Nikolay: It's called security definer.

When you create a function, trigger
function, for trigger you

need to create a function, you
just say Security Definer.

And I like it because you can encapsulate
some actions which

normal user, application user,
When I say application user, I

mean database user which is used
by your application.

Usually there are some tables that
this user cannot access.

For example, this shadow table
or shadow schema with multiple

tables.

But I still prefer single table.

But you can still allow, for example,
specific insert encapsulated

in this function and just grant
permissions for specific action

indirectly via the security definer.

Michael: Nice.

I like it.

Nikolay: Yeah.

I use it all the time for many
cases.

So yeah, better control what can
be done, what cannot be done.

So you basically it's good for
security reasons.

It's good to forbid everything
first as first thing, and then

just allow some specific action
in very controlled manner.

Intrigue.

Michael: I feel like there's a
phrase for this, isn't it?

Like security by minimum permission
or something.

I can't remember what the phrase
is.

It's like a security principle
of least permission, give things

the least permission they need.

Right, right,

Nikolay: right.

Well, in former life, we called
it a blacklisting, white listing.

I don't know why now it's not possible
to tell this, right?

But it means that you will just
say nothing is possible except

this, right?

And yeah, it's good.

So yeah, it's a good approach,
but you are right, Restoration

will be tricky.

You need to develop specific procedures,
deal somehow with schema

evolution.

And in this case, I would say if
we talk about some account deletion,

I would do this.

If we talk about something like
we have some objects user can

create, modify, or delete, and
we want support undo for deletion,

Like in Gmail with emails, when
you just delete email, it goes

to trash, and then you can undo
or delete forever, right?

There's a button, delete forever.

This is hard to delete.

In this case, I would prefer maybe
still in place, like in the

same, like just a column in the
same table status or something.

Michael: Nice.

Nikolay: Right, right.

Both are, we can say, both are
soft deletes with their own pros

and cons.

Michael: Yeah, seems very reasonable.

Nikolay: Do you think they can
coexist in the same project?

Not because they are, like decisions
were made by different persons,

different

Michael: developers.

I think I genuinely think that
the end of Brando's blog post,

he mentioned switching to this
in all but a couple of tables.

So I think they are currently a
good example

Nikolay: of a project that

Michael: has both.

Nikolay: You mean switching to
JSON B Shadow Table?

Michael: Yes, instead of you calling
it in place.

I think they still implement in
place in a couple of places.

Nikolay: The best thing about this
single shadow table with JSON-B

colon is that it's so easy to attach
it to any table.

It's just a trigger.

That's it.

It will slow down, delete a little
bit.

That's it.

Michael: Yeah, because of the triggers.

You could use it for the hard delete
and then have it for around

for a while longer.

Nikolay: We discussed about compliance
and so on, but, and I

mentioned that for business owners,
like there are fears that

like something was deleted, hard
to recover.

It's very good for audit purposes.

For example, you have some account
in your application, and there

are some admins, or like owners,
managers, and there are some

regular users.

Some users do something, they deleted
something, and you just

need to not only see and undo it
but see when it was done by

whom like

Michael: I saw another person in
a hacker news thread talk.

Yeah.

Not just audit but also what really
happened.

So I saw somebody who had implemented
a chat system.

Let's say it's a support business
to business support system

and 1 of your internal works for
your company users said something

they regret saying as part of this
chat and then deleted 1 of

those messages.

It's really useful for an admin
to see the whole history or you

know, you mentioned social media
and a lot of social media sites

that you edit a message, keeping
the old version.

I edited
Nikolay: this update, not delete.

So this leads us to like time travel
basically.

We cannot say, name it soft updates,
right?

But tracking all versions, it actually
also possible they are

the same approach.

And I think I did it as well.

So when you have delete with this
trigger with security, security,

definer function, trigger function,
which just archives previous

record.

In the trigger function, you use
old key name.

This is just record, right?

And you convert it to JSONB and
insert, right?

Absolutely the same trigger can
be used for updates.

With any update, maybe I would
have some checks if some actual

data was changed, not just something
small was changed, right?

But actual data, like message was
changed, right?

Title, message.

New dot title doesn't equal old
dot title, and so on.

If something, just to avoid, you
know, like spam archiving, which

like useless records in this shadow
table.

So if I see something valuable
has been changed with this update,

again, I back old to JSONB and
sort the same, like the same trigger

basically.

Right?

Michael: Yeah, I'm not sure.

It depends.

It depends how and when you need,
how often you need to be viewing

those older ones, because it might
be tricky to show the admin

those if you, how are you going
to select that data back out

to show them the old versions?

I guess it's tricky.

Nikolay: I would, by the way, no,
for updates, I think This is

what I did actually.

For updates, I would archive probably
both old and new records.

Because just, yes, to be able to
show diff.

Yes, it's like too much data to
store, but why not?

In this case, I would extend shadow
table with additional JSONB

column.

Like, delete...

Records related to deletes won't
have it.

Like, they would have only nulls
in that column.

For updates, I would have 2 JSONB
values.

Michael: Interesting.

Nikolay: Old and new, basically.

And then we can build some diff
if needed.

Right?

Michael: Maybe.

I think this is getting beyond
the scope of what...

But this is interesting.

I think my...

Nikolay: No, it's useful and easy
to implement and super universal.

You can again attach it to any
table and have simple audit like

that.

Michael: But what I don't like
or understand is because this

is a single shadow table across
multiple, what are you querying?

Like, how are you looking for that
in a, let's say it was a month

ago, in your, how you're efficiently
finding that in that shadow

table?

What are you searching by?

Nikolay: Well, depending on the
application, right?

I would probably in this shadow
table, I would probably have

some columns which would allow
me to search faster, or I would

deal with searching inside JSON-B.

Why not?

Genetics?

Michael: Well, yeah, maybe.

This slows down updates.

Yeah, there's a lot of...

I think that it's a tricky one because
if it's a shadow table for

all tables, you'd like having user
ID or like having other columns

that you could search by might
not make sense for some of those.

Nikolay: You can have null there.

Michael: True, true, true.

Nikolay: Well, yeah, I agree with
that.

Especially if you add gin, insert
this implicit insert, which

happens during delete or update,
it will be not fast at all because

of gin.

So yeah, overhead.

This is the price.

You have it.

Michael: Yeah, I like that.

It's another trade-off.

It depends what you want, and it
depends how much data you have,

and it depends what proportion
is ever going to be deleted, or

what proportion is ever going to
be restored or looked into.

But yeah, I think it's really cool.

I'll link up the blog post we mentioned.

Any last things you wanted to add?

Nikolay: Actually you know what,
during this discussion, I was

curious what we have in PostgreSQL
system and actually we have

exactly this approach I just described.

We have a shadow table, single
one for many tables to track deletes

and updates.

It's called audit log and we have
data before and data after

2 JSONB columns.

Michael: Interesting.

Yeah,

Nikolay: and actually a lot of
additional stuff.

This is just to be able to show
admins of organizations what

happened.

Michael: In pgMustard, we have hard
deletes.

It's slightly different because
it's just query plans, right?

And you can always submit them
again.

But if you delete them, they're
gone.

So we decided not to at all.

Nikolay: Yeah, and now an application
cannot reach this part

of database at all.

It's only possible like implicitly,
as I described, via security

definers.

So quite a good approach, very
reliable.

Michael: Yeah.

Nice.

All right.

Well, thanks so much, Nikolay.

Catch you

Nikolay: next week.
Thank you.

Good chat.

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Host

Nikolay Samokhvalov

Founder of Postgres AI

Soft delete

Creators and Guests

Some kind things our listeners have said