This blog post explains the costs of a PostgreSQL connection.
TL;DR: Keep the number of PostgreSQL connections low, preferably around
2 * cores + hdd spindles. More connections will only cause you more trouble.
Paying attention to the number of connections to PostgreSQL helped us a lot at 6Wunderkinder. We had trouble with PostgreSQL that manifested itself in two ways:
- We were running out of memory, which led to significantly more reads from disk and in turn slower response times.
- We were running out of file handles, which caused the database to crash.
In both cases we had a couple of hundred open connections to the database, and we were able to solve both problems by putting PGBouncer in front of the database.
PGBouncer is a connection pooler for PostgreSQL databases. We configured it to open at most 20 connections to the database while offering 1000 connections to clients. Apparently 20 connections are enough for us to get the work done. Depending on your situation, simply lowering max_connections might already be enough.
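For illustration, here is a minimal pgbouncer.ini sketch with that shape. The database name, host, and auth file are hypothetical placeholders, not our actual setup, and pool_mode is a choice you have to make yourself (transaction pooling reuses connections most aggressively but breaks session-level features such as prepared statements):

```ini
[databases]
; hypothetical database entry; adjust name/host/port to your setup
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000    ; connections offered to clients
default_pool_size = 20    ; actual connections to PostgreSQL
```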
The issue was solved, but I still wasn’t sure what was going on. So I decided to collect every piece of information I could find about the costs of a connection in PostgreSQL 9.3!
There are two different kinds of costs:
- resources necessary for global state:
  - lock table: lists every lock
  - procarray: lists every connection
- local data: resources for each connection, which is its own forked process:
  - work_mem: used for sort operations and hash tables; defaults to 1MB
  - max_files_per_process: PostgreSQL only cleans up open files once this limit is exceeded; defaults to 1000
  - temp_buffers: used only for access to temporary tables; defaults to 8MB
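These per-connection knobs live in postgresql.conf; the fragment below just restates the PostgreSQL 9.3 defaults mentioned above, it is not tuning advice:

```ini
# postgresql.conf excerpts (PostgreSQL 9.3 defaults)
work_mem = 1MB                  # per sort/hash operation, per connection
temp_buffers = 8MB              # per connection, for temporary tables
max_files_per_process = 1000    # files a backend may keep open
```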
According to the sources below, the memory footprint usually amounts to ~10MB per connection. A secondary effect is that the more memory the connections themselves need, the less is available for the cache, which means more pressure on it (and that was our problem).
In retrospect it sounds perfectly reasonable that reducing the number of connections helped us! Let's assume we have 370 connections:
10MB * 370 connections = 3700MB
10MB * 20 connections = 200MB.
Freeing ~3.5GB of memory is exactly what we saw when we switched to PGBouncer! We then saw the freed memory being put to use and performance improving.
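The arithmetic above can be sketched in a few lines; the 10MB-per-connection figure is the rough estimate from above, not an exact measurement:

```python
PER_CONN_MB = 10  # rough per-connection footprint, as estimated above

def connection_memory_mb(connections: int, per_conn_mb: int = PER_CONN_MB) -> int:
    """Estimated memory footprint of a given number of connections, in MB."""
    return connections * per_conn_mb

before = connection_memory_mb(370)  # 3700 MB
after = connection_memory_mb(20)    # 200 MB
print(f"freed: {before - after} MB")  # freed: 3500 MB, i.e. ~3.5GB
```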
The second issue we had makes sense too! Let’s again assume we have 370 connections:
1000 files per connection * 370 = 370,000 files
1000 files per connection * 20 = 20,000 files.
Running out of file handles is not surprising anymore, since PostgreSQL only cleans them up when it hits its limit. Until then it relies on the OS to handle this for it.
Digging into PostgreSQL was fun, and I hope it helps you when dealing with your database!
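The same back-of-the-envelope math shows how quickly the worst case blows past a system-wide file limit. The 1000-files figure is PostgreSQL's max_files_per_process default; the OS-wide limit of 65,536 below is purely illustrative, so check fs.file-max and ulimit -n on your own system:

```python
MAX_FILES_PER_PROCESS = 1000  # PostgreSQL default
OS_FILE_LIMIT = 65_536        # illustrative system-wide limit; check your OS

def worst_case_open_files(connections: int,
                          per_process: int = MAX_FILES_PER_PROCESS) -> int:
    """Upper bound on the files PostgreSQL backends may keep open."""
    return connections * per_process

for n in (370, 20):
    files = worst_case_open_files(n)
    print(n, files, "over limit" if files > OS_FILE_LIMIT else "fine")
```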
I gathered this information while working with Torsten on our database and want to thank Ryan for proofreading!
Get in touch with me if you want to share your thoughts!
Edit: There is a discussion on Hacker News.
Edit: I added Cybertec: max_connections - Performance impacts to the sources. It is an experiment demonstrating how idle connections hurt performance.
Sources:
- Heroku: Connection Limit Guidance
- PostgreSQL: Resource Consumption
- PostgreSQL: procarray.c
- Bruce Momjian: Inside PostgreSQL Shared Memory
- PostgreSQL: pg_locks
- PostgreSQL Wiki: PGBouncer
- PostgreSQL: Connections and Authentication
- PostgreSQL Wiki: Number of Database Connections
- Cybertec: max_connections - Performance impacts