Hans Hasselberg

Costs of a PostgreSQL connection

19 Feb 2014

This blog post explains the costs of a PostgreSQL connection.

TL;DR: Keep the number of PostgreSQL connections low, preferably around (2 * cores) + HDD spindles[9]. More connections will only cause you more trouble.

Background

Paying attention to the number of connections to PostgreSQL helped us a lot at 6Wunderkinder. We had trouble with PostgreSQL that manifested itself in two ways:

  1. We were running out of memory, which led to significantly more reads from disk and, in turn, slower response times.
  2. We were running out of file handles, which caused the database to crash.

In both cases we had a couple hundred open connections to the database, and we were able to solve both problems by putting PGBouncer[6] in front of the database. PGBouncer is a connection pooler for PostgreSQL. We configured it to allow 20 connections to the database while accepting 1000 connections from clients. Apparently 20 connections are enough for us to get the work done. Depending on your situation, it might even be enough to set max_connections[8] appropriately.
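
For illustration, here is a minimal pgbouncer.ini sketch along those lines; the database name, listen address, pool mode and auth settings are placeholders, not our actual configuration:

    ; hypothetical pgbouncer.ini -- names and paths are placeholders
    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    pool_mode = transaction    ; a server connection is only held for the duration of a transaction
    default_pool_size = 20     ; at most 20 server connections per database/user pair
    max_client_conn = 1000     ; clients may still open up to 1000 connections
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt

Note that default_pool_size limits server connections per database/user pair, so with several databases or roles the total can end up higher than 20.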

The issue was solved, but I still wasn’t sure what was going on. So I decided to collect every piece of information I could find about the costs of a connection in PostgreSQL 9.3!

Costs

There are two different kinds of costs:

  1. resources necessary for global state:
    • lock table[1][5]: lists every lock
    • procarray[1][3]: lists every connection
    • local data
  2. resources for each connection, each of which is served by its own forked process (you can inspect the relevant settings with the queries below):
    • work_mem[2]: used for sort operations and hash tables; defaults to 1MB
    • max_files_per_process[2]: PostgreSQL only closes files once it exceeds this limit; defaults to 1000
    • temp_buffers[2]: used only for access to temporary tables; defaults to 8MB
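
To get a feel for these numbers on your own database, you can query them from psql. A small sketch (the setting names come from the documentation[2][8]; pg_stat_activity and pg_locks are the standard system views):

    -- current values of the relevant settings
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('max_connections', 'work_mem',
                    'temp_buffers', 'max_files_per_process');

    -- how many backends are connected right now
    SELECT count(*) FROM pg_stat_activity;

    -- every lock currently held or awaited (the lock table from above)
    SELECT count(*) FROM pg_locks;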

According to [1], the memory footprint of a single connection usually amounts to ~10MB. A secondary effect is that the more memory the connections need, the more pressure is put on the cache, since less memory is available for it (which was our problem).

Fin

In retrospect it sounds perfectly reasonable that reducing the number of connections helped us! Let's assume we have 370 connections:

without PGBouncer:
10MB * 370 connections = 3700MB
with PGBouncer:
10MB * 20 connections = 200MB

Freeing ~3.5GB of memory was exactly what we saw when we switched to PGBouncer! We then watched the freed memory being put to use and performance improving.

The second issue we had makes sense too! Let’s again assume we have 370 connections:

without PGBouncer:
1000 files per connection * 370 = 370,000 files
with PGBouncer:
1000 files per connection * 20 = 20,000 files

Running out of file handles is no longer surprising, since PostgreSQL only closes them once it hits its own limit. Until then it relies on the OS to handle this for it.

Digging into PostgreSQL was fun, and I hope this helps you deal with your database!

Acknowledgements

I gathered this information while working with Torsten on our database and want to thank Ryan for proofreading!

Get in touch with me if you want to share your thoughts!

Edit: There is a discussion on Hacker News.

Edit: I added Cybertec: max_connections - Performance impacts[10] to the sources. It is an experiment demonstrating how idle connections hurt performance.
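
If you want to check how many of your connections are sitting idle (and for how long), a query along these lines works on PostgreSQL 9.2 and later, where pg_stat_activity has the state and state_change columns:

    -- connections that are currently idle, and the longest idle time
    SELECT count(*) AS idle_connections,
           max(now() - state_change) AS longest_idle
      FROM pg_stat_activity
     WHERE state = 'idle';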

Sources

  1. Heroku: Connection Limit Guidance
  2. PostgreSQL: Resource Consumption
  3. PostgreSQL: procarray.c
  4. Bruce Momjian: Inside PostgreSQL Shared Memory
  5. PostgreSQL: pg_locks
  6. PGBouncer
  7. PostgreSQL Wiki: PGBouncer
  8. PostgreSQL: Connections and Authentication
  9. PostgreSQL Wiki: Number of Database Connections
  10. Cybertec: max_connections - Performance impacts
