Postgres FM | Transcript: Connections

September 1, 2023 • 28 Minutes

Connections

[00:00:00] Michael: Hello and welcome to Postgres fm. A week show about all things Postgres PostgreSQL. I'm Michael. This is my co-host, Nikolay. Hey Nikolay. What are we talking about today?

[00:00:09] Nikolay: Hi, connections.

[00:00:10] Michael: Yeah, so this was your choice. we've done an episode already on connection pooling, but this, we are thinking of covering more generally, the basics, a few interesting things, around connections. forward to this.

[00:00:24] Nikolay: Yeah, a few interesting things and a few boring things, and this topic was my choice, and I just see how many people struggle with trying to connect to freshly installed Postgres, for example. I also see how many people make mistake. How many managed Postgres providers continue opening port of the world and sometimes not enabling encryption in the beginning and then enabling it, but still, don't know, like, I don't like port being open to the world, but many do it for the sake of simplicity and like

[00:01:00] Michael: Yeah.

[00:01:01] Nikolay: user experience. Let's discuss this maybe first, like, why shouldn't port be open to the world? Heroku started, , very long ago, obviously, because, uh, this is the easiest way, right? They, they ob like, you know, uh, security by obs security, uh, is not security. So if you use some randomly generated Host name, And use randomly generated username, very long password like , strong password. It's hard to find this password with brute force, right?

[00:01:43] Michael: Yeah, but that's, that's the argument, isn't it? It's, depends what you're protecting against. One, one of them protects against fewer things, right? Of course there's the brute force vector, but there's also leaked credentials. You know, if somebody gets hold of the credentials and your Database is exposed to the public internet, then they're in. So, but, but if it's

[00:02:05] Nikolay: any, from anywhere.

[00:02:07] Michael: mm-hmm. Yeah. So I know there are additional things you can do to like limit, for example, the IP addresses , but it's nowhere near as as good. So if you can, definitely a good, good. I guess that's the same, advice with most security things, right? Trade-offs to security are often ux, but they're often worth it. There's also performance considerations.

[00:02:31] Nikolay: You can also change port from 5, 4, 3, 2 to something unexpected. Well, by the way, it's, it helps a little bit because, uh, I'm sure a lot of bad bots are scanning well known ports first, but still it's security biosecurity doesn't work well. Right. So, also encryption can be in place if, if this, if port is open to the world without encryption, it's absolutely huge, very bad mistake.

And with no services, which started from this approach, not having encryption, meaning that, anyone can steal your password and then access from anywhere. It's insane. But the best thing is to. Not open port to public IP addresses to public, public interfaces, not to, to open the port to the world. I mean, just listen addresses setting in.com. It should be limited to only local Host and or private networks, right?

[00:03:34] Michael: Yeah.

[00:03:35] Nikolay: This is the best security.

[00:03:36] Michael: Mm-hmm.

[00:03:37] Nikolay: Of course, it makes, , much harder to reach. And, of course you need like something like s ss h port forwarding or some jump Host or something to, reach it. But, uh, any client supports it or without client.

You can just in a separate term terminal, say Ss s h dash and t l I don't remember exactly. And, have this such portfolio. But of course it destroys user experience. I know this, I know this, like many, many developers will not be able to reach Postgres if you require this, but if you open the port, like all modern, how to say RDS by default doesn't do it right. Am I

[00:04:18] Michael: I didn't check.

[00:04:19] Nikolay: They have checkbox, but it's disabled. It's not checked by default. But Neon, Supabase, all these guys, they open port to the world. Is it okay? In my opinion, no, it's not. Okay. And some of them, some of these guys started with no encryption.

[00:04:37] Michael: But they have it now, right? Or

[00:04:40] Nikolay: yes, in weird way, but yes.

[00:04:43] Michael: okay.

[00:04:43] Nikolay: I mean the, I usually when I connect, I usually check it, selecting from PG status, ssl and seeing if my connection is encrypted. But sometimes it's not. It's shown and is not encrypted. But, uh, I think it's, there is still encryption. It's just till the proxy, but after it, after proxy to Postgres, there's no encryption anyway. I don't like, port open to the world. If everything else is, uh, okay. Still, I don't like it.

[00:05:09] Michael: going back to like regular Postgres. We have Ss s l mode and I was wondering if you thought, so the default is prefer, but there are stronger settings, I wondered. Yeah. Well, and so there's yeah. Required, there's verify ca and there's verify full.

[00:05:27] Nikolay: Don't ask me. I

[00:05:29] Michael: yeah. Yeah. Fair enough.

[00:05:30] Nikolay: I, I always forget details. I always check documentation. I just, I'm, I just wanted to point out, remember, Mongo by default opened the portal of the world and it, so like many years was nothing. And then a huge wave of ransomware stealing data and, blackmailing people. Started to be a huge problem because by default it was open to the world and many people used some weak password.

Of course, in the case of managed services, they dictate that password should be strong. They generated for you. But still, I, I don't like it port open to the world. Well, we can say, you know, we live in the world when secrets are public. For full transparency, we should publish all the data, you know, but I don't know.

I don't like it. it's enough. Let's switch

[00:06:20] Michael: I have a quick, one quick thing to add there. There is a good blog, the documentation's great on Ss ss L mode in terms of which levels allow you to do what. And there's a really good blog post that I came across today while looking into this by Andrew Kane, called Postgres, SS s L Mode Explained. So I'll link that up for anybody that wants to go into it.

[00:06:38] Nikolay: Good? Yeah. Well, if at least you have ss, SS L required so nobody can connect without it. If you have strong password and nobody can set weak password. It's already something good, even if you don't, uh, have like weird Host names trying to obscure this thing. Uh, yeah.

[00:07:04] Michael: You said something at the beginning, you said you see a lot of people struggling to connect. when do you see that? Where, because I don't see it as much. I.

[00:07:13] Nikolay: Well, like maybe you just, just don't deal with new people, new people to s for example, if you just install, uh, it on Mac or Ubuntu, on Makos, if you use, Homebrew, maybe it's easier to connect, but then it'll be harder to find how to, for example, started and so on. But you can Google it easy.

But for Ubuntu, for example, by default you need to memorize, this Sudu U Postgres, psql-fu. And I, I wanted to explain why is it solved? Because in some less common situations, people. Have issues connecting to local podcasts, but you should always be able to understand how to connect to local Postgres, even if it's running on weird port and it's installed at weird location using weird, username and password and everything.

I think it's also a good, practical exercise for people who. Work with Postgres, like here's the box. Postgres is installed. I won't tell you password, but you have to do rights. For example, you can scan everything, you can see all directories, files, everything. Try to connect as far as po as soon as possible and as fast as possible.

And uh, this is good exercise and , I think. people master this, uh, ability over time, but it probably should be just an exercise and, you just can learn faster. So, my usual approach, I checked with PSS command. I checked how Postgres is running and where, also maybe PG Confi and I try to understand which configuration files it is using.

Then we check maybe straight to P G H B conf, which is still separate configuration. Many people argue that, it should be part of, main configuration. Remember, we had recovery.com and then it, it was a huge improvement when it was merged with the main configuration P G H B A one, related to , which connections are allowed and using which methods of connection.

it's still a separate part. It cannot be, , configured using SQL. And it's a pt. Maybe someday it'll be merged as well. So you just check it. If you found the proper, configuration file using PS and then navigation and PG confi maybe, right? You found that you just check what is allowed there and if you see trust, you are lucky, but it's also bad practice.

Trust means, these connections don't require password at all. You just specify user and the way. Of connection and you're in. if you see peer, probably on the tu will be by default, right? For SREs, Linux user. Linux user named Postgres. Uh, in this case, if you connecting through that user, you also don't need password, right? 'cause it'll be like Gates are opening for this user. Automatically, that's why you say, so do you. Postgres psql-fu. In this case, you don't need to specify password at all. you're switching to Linux, username to Postgres, and then you just run psql-fu. And in this case, Postgres needs to have the same user, like names should match in operational system and then Postgres if they match.

No password is needed. This is the main rule you can use, uh, your own Linux user. You just need to create user in Postgres. For example, Michael, right? You have Linux user. Michael. If, Postgres also has, Database username, Michael and P G H B A Con has peer for Michael. For T C P connections, for example, you can go through local Host and connect without password or, , it can be not, uh, T C P connection, but Unix Domain Socket connection, which is by the way, if you, when you say so do the, you post peer skill, this is how you use, , connections through this unique socket, not via T C P.

And, uh, in this case, no password is needed and it's a good way for local connections. Uh, right.

[00:11:20] Michael: Yeah. Makes a lot of

[00:11:21] Nikolay: Otherwise you need passwords.

[00:11:24] Michael: yeah. Are, are you talking about development databases? Uh, or are you even including production things in that?

[00:11:32] Nikolay: Well, I'm definitely, this is not about managed Postgres databases because they take care about password generation and, uh, restricting the ways, limiting the ways you can. Connect to Postgres, but development databases or self-manage Postgres. And when you install it on fresh UND or Santos or something and you need to connect, in this case, you can be lost a little bit and you need to reach your P G H icon first and see what's happening there.

And maybe to adjust. If you adjust, you can reload configuration. Even without connection, you just need to send, sec hub, H H U P signal to Postmaster and to, to reload configuration. You just say kill dash capital h u p and process id and, uh, that's it. So you reloading configuration like that, or you can just start via system c t L or something depending on your system.

Right? And, uh, this is how you can also, Like this is a way to connect through locally. By the way, Unix Domain sockets are interesting thing, to understand like for new users, Postgres can work through G C P IP connections or through Unix. Uh, unique Socket Connections and the Un Socket connections are local only because it's, the way in Linux how processes can, can communicate locally, only

locally. So, and they're faster. By the way. It's easy to benchmark with Pitch Bench,

[00:13:00] Michael: Yeah, I saw, uh, in researching this, I saw a blog post by Bruce Mgen looking at exactly this, and his numbers were something like 35% faster, without ss s l and like 175% faster. Once you turned s ss l on that, I was expecting some difference, but I wasn't expecting that much difference.

[00:13:21] Nikolay: Yeah, well, I benchmarked myself multiple times. It's very, really easy to benchmark. You just, connect picture bench through, like, through so, and with, so it's like, it's strange, but you can say dash H mean, Host. You need to specify directory where socket is located. And then in inside directory you'll see some files starting dot s dot something and port number, and you specify Dash P.

Like with T c P connections, dash H is a directory. Dash P dash P is the support number. it's looking like, uh, T P I P connection, but it's all not T P I. It's Unix. Unix Domain Circuit Connection, and, if you just specify for PG Bench, you can compare dash h local Host and and dash h uh, directory where the socket is located.

so Unique Socket directory, it's, this is a setting in pg post scale.com and, uh, you'll see different numbers definitely in terms of. T p s and latencies. And this, what does this mean? This, this means that if we have a pull pooler Pooler, on, uh, the same Host as Postgres, right?

Running locally, , pitch bouncer, for example. We probably should make it connecting to Postgres using. This type unique Domain sockets. And also if you have not a huge application and, and somehow you like combine, application and Database on the same machine sometimes, like we have small project, right?

Probably we should also make it connecting through Unix, Domain sockets, not through T C P I P and this will be beneficial for latencies, numbers. You provided, sound very, very good, right?

[00:15:08] Michael: Yeah. Have you seen similar?

[00:15:10] Nikolay: Yeah. Similar. Yes, exactly. I was, I was expecting small difference, but uh, the numbers I saw in my benchmarks, I didn't remember them. They made me think, oh, uh, Unix circuit connections are much better in terms of performance, less overhead, obviously.

[00:15:26] Michael: Yeah, that makes sense.

[00:15:28] Nikolay: Mm-hmm.

[00:15:29] Michael: did you have anything else you wanted to share on? on the topic of like what to do if you can't connect, let's say for the, for people that you mentioned,

[00:15:37] Nikolay: Well, yeah, well, just, once again, some, for some people it's very strange to see that they need to provide, uh, Unix, uh, so directory as the h meaning Host, this is strange. Maybe it's not a perfect, uh, option naming,

right? Because maybe it should be, I, I, I don't know, like some other, parameter. So it's confusing a hundred percent.

So, and Port also, oh, this, uh, file name. Has port number also strange, but what else? I think that's it. Understanding this checking listing addresses and port and PostgreSQL dot com. Also checking Ps like where is everything is located. Pg conf config, uh, which provides you understanding where configuration files are located.

It should be enough. You start understanding where PGDATA is located and how it's running, sometimes, for example, Ubuntu, unlike RHEL approach, uh, red Hat and Santos approach, configuration is not inside PGDATA. It's moved to U T C. like more canonical style style of putting files.

In this case, you also need to figure it out and understand how to find configs. But, uh, if you have root access to the Host, you should be able to find how to connect to running Postgres. Otherwise, you need to study it a little bit,

[00:17:02] Michael: one more thing to add. Ba and I'm stealing this from a really good flow chart from a blog post by Letitia Ro a few years ago. she's mentioned that if you're not connecting with psql-fu try that as well. If you, so, if, if you're trying it to use a different tool and that you've got a connection issue.

There's a chance that it's the tool being buggy, for example. Um, so that, that, I think that's a good tip. And then the final one that she mentions is if you hit an error message, read the error message. Like Postgres errors in this area are really well worded. So I mean, this is good advice in general.

I do remember the first time, uh, I sat with a, a developer and they read an error message and I was like, what? You can read these and they're actually useful. it was, um, yeah, eyeopening.

[00:17:48] Nikolay: And it's translated to many languages sometimes you see, I, I see translations of it sometimes, so yeah, I agree. I I use it all the time. So if I have issues connecting, I downgrade myself to psql-fu very quickly, and, uh, because I trust a hundred percent, I know that it's reliable. And if it, if I cannot connect using psql-fu, that's a problem somewhere, maybe in between on the server side or, or like, depends, but yeah, this is probably the.

Way to study it. You need to be able to connect using psql-fu first, and then if you know how to connect to psql-fu you bring this understanding to any client application. By the way, just to remind that some people like. If we discuss connections, we should discuss that overhead of connections is high in Postgres because it's not thread based.

It's process based. Each connection means Backend on server side. I. And, , to create new connection, you need to create Backend. That's why pooling is needed. We discussed it and then Postgres 14, it was improved. So now we can have a lot of I connections. Also some, uh, managed service providers and probably you, if you manage Postgres yourself, also need to configure, Postgres to close connection if it's not used for a long, longer period of times.

And, , previously it was possible only if you have pitch bounce or something, because Postgres itself didn't have the setting. To disconnect idle connections. Now it has, I don't think like idle connection, time out or something,

[00:19:18] Michael: something, like that.

Yeah.

[00:19:19] Nikolay: out. Maybe. I, I always forget, setting names, but it's definitely possible in newer versions of Postgres and it's worth doing because who needs all those either connections, right?

and troubleshooting connections from server side. It's sometimes challenging as well because of course we have activity with a lot of data in it, so we have client address and so on. But if connections go through poolers, which installed on different server, for example, locally with Postgres in client address, we will see wrong IP address usually right.

Sometimes, uh, you can see real client address and application name. So the, just a possibility in pitch bounce as I remember. But sometimes you just don't understand who is connected, especially if you don't follow good practice and don't distinguish, uh, usernames. Sometimes people, like, for example, hundreds of engineers in one organization, they all use some like super user or some U user with elevated.

Privileges, but it's a single username. It's a bad practice, right? Because we, it's hard to trace them. You need to understand process id. Then you need to deal with who is connected with who start, who started, what parent process. Id, then who is running this shell. It's like, It's slow and it's bad.

So in my opinion, every person should have personal username. Of course managing this is challenging. That's why sometimes people involve LD A or something

[00:20:53] Michael: Uh, well, as well as users, right? Like also applications, user per application there.

[00:20:59] Nikolay: of applications, different applications should use different usernames. It, it's, um, traditional divide and concur approach. You need to divide all of them, and then you understand who is who, and then you can adjust settings. Because at at user level, you can adjust settings. It's also possible, and then you see everything in logs properly.

It's great. But I see sometimes people don't implement this, unfortunately.

[00:21:26] Michael: Yeah, I think there are like, it makes tons of sense, but there are also like interesting side effects from, like, for example, I think it's not necessarily obvious about like object ownership kind of consequences, that kind of thing.

but yeah, definitely good

[00:21:41] Nikolay: you can, you can always say, set role to something else if you have permissions, and then you can switch, pretend that you are something else, somebody else, for example, to run some d l if you, if needed. Yeah. And, uh, the side effects, uh, it's one of the biggest mistakes in my career, side effect related to, different settings users.

Had I already explained it, maybe once. We had situation when, a Log statement was said to all, and I thought, uh, it's global. But application user had d D L. Yeah. Yeah. And I, I, I made conclusion wrong, very wrong conclusion that we are logging all queries. It's, it was so stupid. So we had downtime and it was not good at all.

It was big, big company and, uh, yeah. it was. Huge. Maybe one of the huge biggest mistakes in my career related to cuss. So you always need, if you have multiple users, you always need to check if they have, user level settings selecting from PJ roles or something. Yeah. Or maybe backslash u plus return settings. I don't remember exactly, but I think yes. So it should, in psql-fu you connect and see everything. With plus. Yeah.

What else about connections?

[00:23:04] Michael: so you had a nice thing on your notes actually, how to connect if you don't know the password, which is something I didn't know. I had to Google, so I dunno if you wanted to cover that.

[00:23:14] Nikolay: Well, first of all, if you have, you need to check inspect pg h b a conf

if if you, if you have peer there or trust, you know, you don't need password at all and you,

you

can just.

[00:23:27] Michael: yeah. You mentioned trust earlier. That is the advice on how to temporarily

[00:23:33] Nikolay: Temporarily. Yeah. Well all you can. Mm, yeah. This is the way ex Exactly. So you, you just, uh, switch it. Then you can already change passwords and then, then you can already revert it and, and connect normally with password in secure manner. Right. I will do this probably just why not?

[00:23:54] Michael: Yeah, well top voted answer on Stack Overflow agrees with you.

[00:23:58] Nikolay: Ah, okay. So yeah, it's, of course, it's a lot of actions, unfortunately. I mean, you need to edit for P G H B, then you need to reload configuration to apply it or just restart post. 'cause if you, if it's not a problem for you. And then don't forget to revert once you have, uh, normal user with normal password. Right.

[00:24:21] Michael: Sounds good. The only last thing I had was you mentioned overhead from having long running connections. So, um, other than the memory footprint, is there anything else? Like, I know long running Transactions, for example, have big, big side effects, but I couldn't think of any myself for long running connections other than that.

memory

[00:24:41] Nikolay: Yeah, well there are co couple of articles from and Andress Fre, uh, about, uh, connection, overhead and memory. He, he claimed that memory overhead was not the big, biggest problem at all. And Snapshot. Snapshot how Pocus POCUS works with Snapshot. So, If we have a lot of idle, purely idle connections, not idle Transaction, but idle, not doing anything before Postgres 14 it, uh, it was a problem and it's easy to test PG bench.

You check t p s, then you add thousand of or 2000 idle connections. of course, you need to increase max connections setting, and you test again, and you see t p s drops like 20, 30% or something like that. In my experiment was sold with Postgres maybe. 11, 12, I don't remember

exactly and this, this is how I demonstrated, to some guys, uh, you don't want to set 2000 , max connections and then allow a lot of vitals and, uh, if it's newer, post version overhead is much smaller.

I honestly didn't check myself. I trust, claims. From, from, from blog posts and so on, right? It should be. So, and it's not directly about memory. Memory and memory also is a difficult thing to measure properly memory footprint. So, so I like maybe, uh, in newer Postgres versions, maybe if you have thousands of idol connections, extra, maybe it's not a huge problem at all if you have a lot of memory also, like who knows?

But, uh, I prefer limiting this. Uh, if you have a pool and, uh, you just don't allow this to happen.

[00:26:18] Michael: Yeah, makes sense. Wonderful. I suspect we can call it there.

[00:26:24] Nikolay: Yeah, I think that's it, right? So connections working with Postgres is not possible without connecting to it.

[00:26:31] Michael: Yep.

[00:26:32] Nikolay: Well, there is single user mode of course, but it's not pleasant to work with. And even if you, if you have Transaction idea up around situation as Hanno crossing from Google Cloud platform, teaches us there's a, a recipe to avoid single user mode, the use of single user mode.

He explained it in his perfect talk. do you Vacuum every day, right? So, yeah. So connections. Unnecessary connections means backends processes. and also by the way, terminating connections is not a huge problem. PG terminate Backend is not a huge problem normally. of course, kill is not a good idea because this causes post crash and recovery immediately globally.

Postmaster watches, to see. But if you need to disconnect someone, p. With process id. Yeah, that's it. Working with pg_stat_activity is essential. Uh, what else? Like, , don't use now with PG activity, now is a mistake. You will see negative values for connection duration or, or statement duration or Transaction duration.

You need to use clock timestamp instead because it's not a, it's not, it's not a snapshot. It's not a table. It's like Row by Row.

[00:27:50] Michael: Took me a while to get my head around that. I think you've got a good tweet or something we can link to on that.

[00:27:54] Nikolay: Yeah. But yeah, I think we kept this episode quite simple. I hope it was, it was useful for some folks, maybe not for experts. It's, it was trivial information, but I, I just felt it's needed because a lot of new users,

[00:28:09] Michael: Yeah, yeah, absolutely. Well, thank you Nikolay. Thanks everybody who's been sharing it and commenting like and that kind of thing, Appreciate it. Cheers.

[00:28:18] Nikolay: Yeah. Bye-bye. Thank you.

Creators and Guests

Host

Michael Christofides

Founder of pgMustard

Host

Nikolay Samokhvalov

Founder of Postgres AI

Connections

Creators and Guests

Some kind things our listeners have said