Today I Learned

A Hashrocket project

129 posts about #sql

Set foreign key to null on delete in #Postgres

Say you have a table with a foreign key:

id serial primary key
primary_image_id references images (id)

id serial primary key
post_id references posts (id)

If you attempted to delete an image from the images table while that image's ID is referenced in one of the posts you will receive an error preventing you from deleting the image. Postgres is trying to protect your data integrity by preventing foreign keys that point to records which don't exist.

To prevent errors and allow deleting records from the images table freely you must define the on delete strategy on the posts table.

One of the options available to you is set null so if you are creating a new table it will look like this:

create table posts (
  id int serial primary key,
  -- ...
  primary_image_id int references images (id) on delete set null

Now if the primary image is deleted it will set the primary_image_id to null.

This an alternative to on delete cascade which in this case will delete the post from the posts table and is not what we want.

Read the full documentation under '5.3.5. Foreign Keys'

Compute group proportions of a whole in SQL

It's easy with SQL to use aggregates to count members of a group. But what if you also want the proportion that group is of the whole. The answer is to use a window function combined with the aggregate. The window function is evaluated after the normal aggregation. i.e. it can reference the values of other post-aggregation rows.

select type, count(*), count(*)::numeric / sum(count(*)) over ()
from pets
group by type;
  type  | count |        ?column?        
 dog    |    13 | 0.52000000000000000000
 rabbit |     3 | 0.12000000000000000000
 cat    |     9 | 0.36000000000000000000

Pretty Printing JSONB Rows in PostgreSQL

Who needs a document store when you can just use PostgreSQL's JSONB data type? Viewing rows of JSONB output can be challenging though because it defaults to printing them as a single line of text.

> select '{"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}'::jsonb;
 {"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}
(1 row)

Fortunately, Postgres comes with a function for prettying up the format of the output of these rows -- jsonb_pretty

> select jsonb_pretty('{"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}'::jsonb);
 {                                 +
     "what": "is this",            +
     "nested": {                   +
         "items 1": "are the best",+
         "items 2": [              +
             1,                    +
             2,                    +
             3                     +
         ]                         +
     }                             +
(1 row)

h/t Jack Christensen

Write A Query Result To File With Postgres

Generally when writing a query in psql a statement will be terminated with a semicolon. An alternative approach is to end it with a \g instead. This will also send the query to the Postgres server for execution.

select 1 \g

If a filename is included after the \g, then the result of the query will be written to that file instead of output to the psql session.

> select 1, 2, 3 \g query_result.txt

If we cat that file, we can see the query result.

Time: 4.293 ms
> \! cat query_result.txt
 ?column? | ?column? | ?column?
        1 |        2 |        3
(1 row)

See man psql for more details.

Age in Postgres

On this day every year I get curious about how old I actually am. Postgres has some utilities to help me out with that, like subtraction of dates.

> select now() - '1978-3-24';
14245 days 17:39:55.623866

Ah yes, 14245 days on this earth, it really makes you think. What got returned anyways, that looks weird:

> select pg_typeof(now() - '1978-3-24');

Got it, that's an interval. It doesn't really tell me much, is this interval infinite?

> select isfinite(now() - '1978-3-24');

I can squint and read isfinite as infinite and suddenly my outlook changes, but only momentarily.

Postgres, just tell me how many years I've been alive ok? None of this days stuff.

> select age('1978-3-24'::date);
39 years

Create jsonb data structure from columns

You can create a jsonb object by casting a string version of the object to jsonb:

> select '{"a": 1, "b": 2}'::jsonb;
 {"a": 1, "b": 2}

When using values from columns to create a jsonb object its not practical to construct a string. In this case you want to use jsonb_build_object:

> select jsonb_build_object('a', a.x) from (select x from generate_series(1, 10) as x) as a;
 {"a": 1}
 {"a": 2}
 {"a": 3}
 {"a": 4}
 {"a": 5}
 {"a": 6}
 {"a": 7}
 {"a": 8}
 {"a": 9}
 {"a": 10}

jsonb_build_object is a variatic function and takes a variable number of arguments. So you can add additional values like this:

jsonb_build_object('a', 1, 'b', 2)

Remove key/value from jsonb data structure

If you store jsonb in your postgres database, you will find yourself manipulating json data periodically in postgres.

Removing a key/value from a jsonb object is very simple, just use the subtraction operator.

> select '{"a": 1, "b": 2}'::jsonb - 'a';
 {"b": 2}
(1 row)

Prepare, Execute, and Deallocate Statements

In PostgreSQL, you can prepare a named statement to be executed later using prepare.

> prepare column_names (text) as
    select column_name from information_schema.columns where table_name = $1;

These statements are kept around for the duration of the session. To see the available statements, check out the pg_prepared_statements view.

> select * from pg_prepared_statements;
     name     |                                  statement                                  |         prepare_time          | parameter_types | from_sql
 column_names | prepare column_names (text) as                                             +| 2017-03-10 15:01:09.154528-06 | {text}          | t
              |   select column_name from information_schema.columns where table_name = $1; |                               |                 |

To run a prepared statement, use execute with the name of the statement and any arguments.

> execute column_names('users');

You can also delete a statement with deallocate if you'd like.

> deallocate column_names;

New PostgreSQL 9.6 slice syntax

PostgreSQL arrays support slice operations.

jack=# select ('{a,b,c,d,e,f}'::text[])[2:4];

New in PostgreSQL 9.6 is slice syntax with implicit lower and upper bounds.

jack=# select ('{a,b,c,d,e,f}'::text[])[:4];
jack=# select ('{a,b,c,d,e,f}'::text[])[3:];

Previously, array_lower and array_upper was needed to slice from a particular index to the bounds of an array.

Between Symmetric in PostgreSQL

PostgreSQL's between construct allows you to make a comparison between two values (numbers, timestamps, etc.).

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between 3 and 6;

If you supply an empty range by using the larger of the two values first, an empty set will result.

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between 6 and 3;

Tacking symmetric onto the between construct is one way to avoid this issue.

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between symmetric 6 and 3;

BETWEEN SYMMETRIC is the same as BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.

group by 1, 2, 3 🔢

Postgres allow group by and order by to reference column order in a sql statement. This is particularly useful when an aggregate result needs to be referenced in the group by or order by statement.

-- group by aggregate
  (created_at at time zone 'UTC' at time zone 'America/Chicago')::date,
from posts
group by (created_at at time zone 'UTC' at time zone 'America/Chicago')::date
order by (created_at at time zone 'UTC' at time zone 'America/Chicago')::date


-- group by 1, 2, 3
  (created_at at time zone 'UTC' at time zone 'America/Chicago')::date,
from posts
group by 1
order by 1

Generating UUIDs With pgcrypto

If you check out the docs for the uuid-ossp extension, you'll come across the following message.

The OSSP UUID library... is not well maintained, and is becoming increasingly difficult to port to newer platforms.

A little bit later, it says:

If you only need randomly-generated (version 4) UUIDs, consider using the gen_random_uuid() function from the pgcrypto module instead.

So, if we are using the UUID data type and only need to generate random UUIDs, we can rely on the pgcrypto extension. It comes with the gen_random_uuid() function which generates random v4 UUIDs.

> create extension "pgcrypto";

> select gen_random_uuid();

> select gen_random_uuid();

Postgres Intersect

Today I got to see some great SQL debugging that included the use of intersect to compare two datasets.

intersect computes the set intersection of the rows returned by the involved select statements.

Here's a simple implementation:

dev=# select generate_series(1,1000) intersect 
  select generate_series(10,15);

Like a lot of things in Postgres, the order seems deterministic, but not predictable.



Today I learned that PostgreSQL INSERT/UPDATE/DELETE has a RETURNING clause that returns the computed values.

We can use it to get values that are generated by database such as sequences or defaults.

Check this out:



INSERT INTO users (email) VALUES ('');
-- INSERT 0 1

INSERT INTO users (email) VALUES ('') RETURNING *;
--  id |       email
-- ----+-------------------
--   2 |
-- (1 row)

h/t @joshuadavey

Aggregate Expressions with FILTER in Postgresql

Today I learned to use FILTER on aggregate expressions such as COUNT in postgresql.

See this example:

SELECT u.login,
       COUNT(1) FILTER (WHERE r.language ilike 'ruby') as ruby,
       COUNT(1) FILTER (WHERE r.language ilike 'javascript') as js,
       COUNT(1) FILTER (WHERE r.language ilike 'elixir') as elixir,
       COUNT(1) as all_langs
FROM users u
LEFT JOIN repositories r ON ( = r.user_id)

-- login | ruby | js | elixir | all_langs
-- bill  |    5 |  2 |      3 |         15
-- karen |    2 |  7 |      4 |         19
-- bob   |    9 |  1 |      2 |         23

h/t @joshuadavey

Migrate from MySQL to Postgres

Today I wanted to see if a large client app would see any performance improvement using Postgres vs. MySQL.

Luckily there is a great tool called pgloader that can quickly scan and import a MySQL DB right into Postgres.

Homebrew FTW!

$ brew install pgloader

Create an import script to define the connection strings to each of your databases and configuration the import options you want.

-- import_mysql.load
FROM mysql://root@localhost/db_name
INTO postgresql://localhost/db_name

 WITH include drop, create tables, no truncate,
      create indexes, reset sequences, foreign keys;

Run the import:

$ pgloader import_mysql.load

If you get this error:

An unhandled error condition has been signalled:
   MySQL Error [1055]: "Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'information_schema.rc.UPDATE_RULE' which is not functionally dependent on columns in GROUP
 BY clause; this is incompatible with sql_mode=only_full_group_by"

Then create a MySQL configuration file (if you don't have one already) and open it:

$ sudo cp $(brew --prefix mysql)/support-files/my-default.cnf /etc/my.cnf
$ sudo vim /etc/my.cnf

Make sure ONLY_FULL_GROUP_BY is not in the mode list:

# /etc/my.cnf

Now restart MySQL and try to import with pgloader again.

Checking Inequality

In most languages there is a != operator for checking inequality of two things.

Postgres also supports the synonymous <> operator for checking inequality.

> select 1 <> 1;

> select true <> false;

> select 'taco' <> 'burrito';

h/t Brian Dunn


Two swings at guassian distribution in Postgres

I've come across 2 ways to replicate a bell curve.

This is adding 3 random numbers together.

floor(random()*3)+floor(random()*3)+floor(random()*3) as y, 
from generate_series(1,10000000) group by y order by y;
 y |  count
 0 |  369905
 1 | 1111997
 2 | 2222763
 3 | 2593306
 4 | 2220452
 5 | 1111506
 6 |  370071
(7 rows)

Time: 18806.885 ms

Consistent distribution, but 18 seconds for 10 million rows!

The second method has a tail end of the curve that keeps going even as it tapers to 0. This uses the normal_rand function provided by postgres with arguments of quantity, median, and standard deviation.

select  x, count(*) 
(select abs(ceil(normal_rand(10000000,2,1))) as x) as y 
group by x order by count;
 x |  count
 0 |  214152
 1 | 1373200
 2 | 3414028
 3 | 3411424
 4 | 1359447
 5 |  214194
 6 |   13234
 7 |     319
 8 |       2
(9 rows)

Time: 3221.894 ms

I want the peak of the curve around 2 and 3 but for this method of Gaussian distribution I get results below 0 unless I use abs. And its fast, 3 seconds!

Ever so slightly faster intervals in postgres

When working with dates in postgres I often construct an interval like so:

> select (2 || ' days')::interval;
2 days

Which is fine! But there's an alternative that's ever so slightly faster.

> select 2 * '1 day'::interval;
2 days

The time difference can be seen over large quantities of data, like 10 million.

> select (x || ' days')::interval from generate_series(1, 10000000) as x;
Time: 10437.314 ms
 > select x * '1 days'::interval from generate_series(1, 10000000) as x;
Time: 7831.568 ms

Microbenchmarking in action! Remember, take all microbenchmarks with a grain of salt.

Postgres age function #postgresql

If you want to select records according to a specific interval, like Rails' ActiveSupport 1.year.ago PostgreSQL has you covered.

The age function returns an interval type and can be used in queries like so:

select * from sometbl where age(created_at) > '1 year';

By default the age function will use the current system time to calculate the age. If you want to calculate the age relative to a different time you can simply pass in another argument:

psql> select age(timestamp '2016-08-28', timestamp '1957-06-13');
 59 years 2 mons 15 days
(1 row)

You can also use the make_interval function to create intervals using numeric parameters:

psql> select make_interval(1,2,3,4,5);
 1 year 2 mons 25 days 05:00:00
 (1 row)

In a query it can be used like so:

select * from sometbl where age(created_at) > make_interval(1);

to select rows with created_at older than one year from now.

Read more about the age function and other cool date/time manipulations see the official documentation.

h/t Jack Christensen

Psql Command with Tuples Only

With psql, we can run any string command once and exit using the -c (command) flag. Here's an example:

$ psql hr-til_development -c 'select title, slug from posts limit 5';
                  title                   |    slug
 Start rails server in production mode    | 97e26f5f68
 Enable ScreenSharing from a terminal     | 55dfe1a1a3
 Variable Hoisting in JavaScript          | eeedb8dda0
 Three ways to compile Elixir             | f2a423d1e1
 Indent and de-indent line in INSERT MODE | 8de0fb5d93
(5 rows)

If you are parsing the result, the headers can get in the way. This is where the -t (tuples only) flag is useful. Same query:

$ psql hr-til_development -tc 'select title, slug from posts limit 5';
 Start rails server in production mode    | 97e26f5f68
 Enable ScreenSharing from a terminal     | 55dfe1a1a3
 Variable Hoisting in JavaScript          | eeedb8dda0
 Three ways to compile Elixir             | f2a423d1e1
 Indent and de-indent line in INSERT MODE | 8de0fb5d93

Notice the headers and row count are gone. Toggle this in a psql session with the \t command.

Grep your Database 🔎

Wading through a database full of text (as might support a CMS) and feeling lost? Sometimes you gotta ditch the queries and just start grepping.

This Stack Overflow post sums up the idea. Here's a version of the command we used today to pick through a text-rich database:

$ curl `heroku pg:backups public-url -rproduction` | pg_restore | egrep -o '[a-z]+\'

This found, via a regex match, all subdomains of our target URL, in any text field of the production database. Pretty awesome.

Treat null as if it is a known value #postgresql

When you query a table sometimes you want to check if a nullable field is not equal to a value. For example:

select * from sometable where afield != 'avalue';

However the query above will exclude rows where afield is null, so you would typically add that as an additional condition:

select * from sometable where afield is null or afield != 'avalue';

When you are doing it once it may be ok but as queries get bigger this makes the query messy and harder to read. Fortunately Postgres offers a more idiomatic way to check if a value does not equal something, including null values: is distinct from and is not distinct from.

select * from sometable where afield is distinct from 'avalue';

This query will return all the rows where afield is null or anything but avalue. Conversely:

select * from sometable where afield is NOT distinct from (select x from y limit 1);

will return all the values that are equal to the result of the subquery above and is useful when the result of the subquery could be null.

h/t Jack Christensen

Original docs:

`NULLIF` in Postgres

If there multiple values in a database column that both mean the the column for that row has no data, then the NULLIF function can be used in queries to normalize the values in that column.

Null char is 😵

chriserin=# \pset null 😵
Null display is "😵".
chriserin=# select nullif('NOTHING', 'NOTHING');
(1 row)

chriserin=# select nullif('something', 'NOTHING');
(1 row)

Nulls Last, Nulls First

Postgres query sorting has a great feature,nulls first/nulls last, that helps us specify where in a query result null values appear.

Nulls in Postgres are sorted as larger than any other value, so this is a descending sort containing null values (💀 == null):

example=# select * from users order by full_name desc;
 Tracy Jordan
 Liz Lemon
 Jack Donaghy

Reverse the null position with nulls last:

 example=# select * from users order by full_name desc nulls last;
 Tracy Jordan
 Liz Lemon
 Jack Donaghy
(6 rows)

Here's the opposite query, ascending:

example=# select * from users order by full_name asc;
 Jack Donaghy
 Liz Lemon
 Tracy Jordan
(6 rows)

Now, reverse the null position with nulls first:

example=# select * from users order by full_name asc nulls first;
 Jack Donaghy
 Liz Lemon
 Tracy Jordan
(6 rows)

Lots of ways to explore our data.

Assumed Radius Of The Earth In PostgreSQL

Using the earthdistance module, we can get the assumed radius of the earth (in meters).

> create extension cube;

> create extension earthdistance;

> select earth();

Create Database from a Template

Today I got to see Postgres' createdb invoked with a specified template, as such:

$ createdb -T prod_dump myapp_development

The -T feature flag corresponds to the TEMPLATE modifier in a SQL CREATE DATABASE statement. In the example above, this will create myapp_development that is a deep clone (structure and data) of prod_dump.

For more info, see:

$ man createdb


$ psql
user=# \h create database;

h/t Brian Dunn

Is the hstore extension enabled?

Have you already enabled the Hstore extension in Postgres?

Running the below command in the psql console will let you know what extensions have been installed.


Determining The Age Of Things

In PostgreSQL, we can determine the age of something (or someone) by passing a timestamp to the age function.

For instance, if we want to know how long it has been since y2k, we can run the following query:

> select age(timestamp '2000-01-01');
 16 years 4 mons 12 days

Additionally, if we want to know the amount of time between two dates, we can pass two timestamps to the age function.

For example, we can find out how old Prince lived to be by passing in the date of death and then date of birth:

> select age(timestamp 'April 21, 2016', timestamp 'June 7, 1958');
 57 years 10 mons 14 days

h/t Josh Davey

Dump And Restore A PostgreSQL Database

PostgreSQL comes with two command-line utilities for dumping and then restoring a database -- pg_dump and pg_restore, respectively.

Using the pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (the .dump extension is a standard convention):

$ pg_dump -Fc my_database > my_database.dump

Using the custom format option provides a couple benefits. The output is significantly compressed in comparison to a generic SQL dump. The dump and restoration is more flexible. Lastly, the dump can be performed in parallel if your machine has multiple cores to work with. Likewise, the restoration can be done in parallel with multiple jobs.

To restore the dump, create a fresh database and then use pg_restore:

$ createdb my_new_database
$ pg_restore -d my_new_database my_database.dump

Note: the dumped tables will depend on some user role. You will need to ensure that this role exists on the database cluster where the restore is happening. You can use the createuser command if necessary.

See the pg_dump docs and pg_restore docs for more details.

List Database Objects With Disk Usage

I'll often times use \d or \dt to check out the tables in my database. This shows the schema, object name, object type (e.g. table), and owner for each.

By adding the + to that meta-command, I can also see the disk usage for each database object.

Here is an example of look at all tables in a database with the additional Size (or disk usage) information:

> \dt+
                              List of relations
 Schema |        Name        | Type  |   Owner    |    Size    | Description
 public | amount_types       | table | jbranchaud | 16 kB      |
 public | ingredient_amounts | table | jbranchaud | 8192 bytes |
 public | ingredient_types   | table | jbranchaud | 16 kB      |
 public | ingredients        | table | jbranchaud | 48 kB      |
 public | recipes            | table | jbranchaud | 16 kB      |
 public | schema_migrations  | table | jbranchaud | 16 kB      |
 public | users              | table | jbranchaud | 16 kB      |

Add a Comment in Postgres

Today I learned about a neat Postgres feature, commenting. Here is a statement adding a comment to a table:

hr-til_development=#  comment on table developers is 'these are the authors';

Let's check it out:

hr-til_development=# \dt+ developers;
List of relations
-[ RECORD 1 ]----------------------
Schema      | public
Name        | developers
Type        | table
Owner       | jwworth
Size        | 16 kB
Description | these are the authors

According to the documentation, you can comment on many things. Use this wherever you feel a comment is necessary or appropriate.

h/t Chris Erin and Jack Christensen

Get The Size Of An Index

Want to get an idea of how much disk space that additional index is taking up? In PostgreSQL, you can query for it with the same methods discussed in Get The Size Of A Table and Pretty Print Data Sizes.

For instance, if I have a table with a users_pkey index and a users_unique_lower_email_idx index, I can check the sizes like so:

> select pg_size_pretty(pg_relation_size('users_pkey'));
 240 kB

> select pg_size_pretty(pg_relation_size('users_unique_lower_email_idx'));
 704 kB


Change The Current Directory For psql

When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative paths based on the current directory -- for instance, the \i meta-command for importing files.

You can change the current directory within a psql session using the \cd meta-command.

If my current directory is home and there is a sql directory in `home, these commands will do the following:

\! pwd
-- /home
\cd sql
\! pwd
-- /home/sql

The \cd meta-command even supports tab completion relative to the current directory.

You can also change to your home directory using just \cd.

Use Not Valid To Immediately Enforce A Constraint

When adding a constraint to a table, you can optionally include not valid. This tells Postgres that it doesn't need to enforce the constraint on existing records in the table. At least not immediately. This constraint will be enforced for any updates and subsequent insertions. Thus, you can immediately enforce the constraint while giving yourself time to clean up or massage any existing records that conflict with the constraint.

Here is an example of how you would add a constraint this way:

alter table boxes
add constraint check_valid_length
check (length > 0) not valid;

Eventually, you will want to ensure that all data in the table conforms to the constraint. Once you get to that point, you can mark the constraint as valid with a validate constraint command:

alter table boxes
validate constraint check_valid_length;

As long as all records are valid with respect to this constraint, it will be marked as valid.

h/t Chris Erin

Lower Is Faster Than ilike

There are a couple ways to do a case-insensitive comparison of data in PostgreSQL. One way is to use the ilike operator for comparison. Another way is to use the lower() function on both sides of the = operator for comparison. Using lower() is a bit faster than using ilike.

When comparing

select * from users where email ilike '';


select * from users where lower(email) = lower('');

we find (via explain analyze) that using lower() was taking around 12ms where as the ilike example was taking around 17ms.

We earn orders of magnitude in performance when adding a functional index that uses the lower() function like so:

create unique index users_unique_lower_email_idx on users (lower(email));

After adding this index, the example using lower() drops to around 0.08ms.

For the full example and explain analyze outputs, see this document.

Add ON DELETE CASCADE To Foreign Key Constraint

The alter table command lets you do quite a bit. But when it comes to altering existing constraints, there is not much you can do. If you want to add an on delete cascade to an existing foreign key constraint, you are going to need two statements.

The first statement will drop the constraint and the second statement will recreate it with the addition of the on delete clause. Furthermore, you'll want to do this in a transaction to ensure the integrity of your data during the transition between indexes.

Here is an example:


alter table orders
drop constraint orders_customer_id_fkey;

alter table orders
add constraint orders_customer_id_fkey
foreign key (customer_id)
references customers (id)
on delete cascade;



Print The Query Buffer In psql

I'll often be composing a PostgreSQL query in Vim and decide I want to give it a try in psql. I copy the relevant snippet of SQL to my system buffer and then paste into psql. I'm usually hit with a mess of text like this though:

jbranchaud=# create table nullable_fields (
jbranchaud(#   id serial primary key,
  first varchar,
  last varchar
  id serial primary key,
jbranchaud(#   first varchar,
  last varchar
  first varchar,
jbranchaud(#   last varchar
  last varchar
jbranchaud(# )

Yikes. That's not readable. Fortunately, psql provides a command for printing the current contents of the query buffer. By typing \p I'll see a more readable version of what I just pasted in.

jbranchaud-# \p
create table nullable_fields (
  id serial primary key,
  first varchar,
  last varchar

After taking another glance at the snippet of SQL, I decide to complete the query to create my new table.

jbranchaud-# ;

Using Expressions In Indexes With PostgreSQL

Though we usually see column names by themselves when defining an index, it is also possible to create an index with an expression.

Let's say I have a users table with an email column. Then I may end up creating an index like this

create index email_idx on users (email);

If I always perform queries on the email column with the lower() function, like this

select * from users where lower(email) = lower('');

then I will want to also create an index with that full expression -- lower(email)

I can do this with a statement like the following

create index lower_email_idx on users (lower(email));

Without an index that uses the full lower(email) expression, select statements like the one above will be forced to do full sequential scans instead of indexed scans.

List Various Kinds Of Objects

Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different types of (user-created) objects in our database.

  • \dt will list all the tables
  • \dE will list all the foreign tables
  • \di will list all the indexes
  • \ds will list all the sequences
  • \dv will list all the views
  • \dm will list all the materialized views

These can also be combined. For instance, to see all the tables and sequences, we can run \dts.

Creating Custom Types in PostgreSQL

PostgreSQL has support for creating custom types. When you need something more expressive than the built-in types and you don't want your data spread across multiple columns, you can instead create a custom type.

create type dimensions as (
  width integer,
  height integer,
  depth integer

This new type can then be used in the definition of a new table

create table moving_boxes (
  id serial primary key,
  dims dimensions not null

and when inserting data

insert into moving_boxes (dims) values (row(3,4,5)::dimensions);

See the create type docs for more details.

Show All Versions Of An Operator

We may be familiar with PostgreSQL's containment operator (@>). Maybe we've used it with an array before, so we understand the general idea. But now we are curious about what are the other types with which this containment operator can be used.

We can quickly find out the answer with the \do command in psql:

> \do @>
                               List of operators
   Schema   | Name | Left arg type | Right arg type | Result type | Description
 pg_catalog | @>   | aclitem[]     | aclitem        | boolean     | contains
 pg_catalog | @>   | anyarray      | anyarray       | boolean     | contains
 pg_catalog | @>   | anyrange      | anyelement     | boolean     | contains
 pg_catalog | @>   | anyrange      | anyrange       | boolean     | contains
 pg_catalog | @>   | jsonb         | jsonb          | boolean     | contains

The Left arg type and Right arg type columns tell us what we need to know.

This \do command can be used with any operator for a similar set of information.

h/t Bruce Momjian

Clear The Screen In psql

The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear from the shell, and then open a new psql session. This is unnecessary though. Instead, we can use the \! command to execute a shell command, in this case, the clear command.

> \! clear

This clears the screen in one step and keeps our current session running.

See the psql docs for more details.

Storing Emails With citext

Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn't care if they type or Both of those email addresses should be treated as equal and ultimately lead us to the same User record.

With the citext extension in PostgreSQL, we can create a column that acts as a case-insensitive text type. Any comparisons on a column of that type will internally have the lower function executed on the arguments.

The following example shows this in action:

create extension if not exists citext;

create table citext_emails (
  id serial primary key,
  email citext not null unique

insert into citext_emails (email) values ('');

select * from citext_emails where email = '';
--  id |      email
-- ----+------------------
--   1 |

See citext_emails.sql for a full example.

Renaming A Table

Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign key constraints. Just run a command like the following:

alter table ingredient_types rename to item_types;

Note that this may result in breaking a number of conventions. Foreign keys, sequences, and constraints with names eponymous to the original table will no longer follow convention despite the references being updated. These can be renamed as well if desired.

See renaming_table.sql for a full example.

See the alter table docs for more details.

Psql Help

Psql includes a built-in help command that is amazing.

With no arguments, it returns all the help:

app_development=# \h
Available help:

With a Postgres command as an argument, it returns a specific help:

app_development=# \h abort;
Command:     ABORT
Description: abort the current transaction

And with an asterisk, it prints the full description for all the help commands.

Plus we get this nicety:

Note: To simplify typing, commands that consists of several words do not have to be quoted. Thus it is fine to type \help alter table.

Write better SQL in your REPL today!

h/t Jack Christensen

Salt And Hash A Password With Postgres' pgcrypto

The pgcrypto extension that ships with PostgreSQL can be used to do a number of interesting things. This includes functions for doing salted password hashing. Using the crypt and gen_salt functions, we can securely store a user password and later compare it to plain-text passwords for authentication purposes.

create extension pgcrypto;

select crypt('pa$$w0rd', gen_salt('bf'));

> select (
    '$2a$06$Z7wmrkYMOyLboLcULUYzNe6nHUcWywSZTt6nSrT5Xdv/VLdJ4g99K' =
  ) as matched;

> select (
    '$2a$06$Z7wmrkYMOyLboLcULUYzNe6nHUcWywSZTt6nSrT5Xdv/VLdJ4g99K' =
  ) as matched;

The salt value is generated using the blowfish encryption algorithm (hence, the 'bf'). There is support for other algorithms such as md5.

See the pgcrypt documentation for more details.

Watch for database changes on #PostgreSQL

If you are trying to debug multi-threaded environments or a really fast job queue you might need to watch a table for changes.

In psql use the \watch command like so:

\watch 1 "select * from job_queue";

This will run the query every 1 second (you can change the second argument if you need it slower) and display the result as well as past results.

Types and type casting in #PostgreSQL

To see the type of column or any entity in PostgreSQL use pg_typeof. Here's an example:

select pg_typeof(array['thing']);


-- pg_typeof
-- ---------
-- text[]

To cast to another type use the :: operator:

select pg_typeof(array['thing']::varchar[]);


--      pg_typeof
-- -------------------
-- character varying[]

h/t Josh Branchaud

Quickly see the contents of a table in #PostgreSQL

Next time you want to see the contents of a table in Postgres' CLI don't type the whole:

select * from name_of_table;

Just use:

table name_of_table;

h/t Josh Branchaud

Terminating A Connection

Consider the scenario where you are trying to drop a database, but there are existing connections.

$ dropdb sample_db
dropdb: database removal failed: ERROR:  database "sample_db" is being accessed by other users
DETAIL:  There is 1 other session using the database.

If you don't know where these connections are, you can terminate them within a psql session. You just have to figure out the pid of those connections. In List Connections To A Database, I explained how to get at the pid values of connections. Using the pid value and pg_terminate_backend(), you can terminate a connection.

> select pg_terminate_backend(12345);

To terminate all connections to a particular database, use a query like the following:

select pg_terminate_backend(
from pg_stat_activity
where pg_stat_activity.datname = 'sample_db'
  and pid <> pg_backend_pid();

This excludes the current session, so you'll need to exit psql as well before dropping the database.


List Connections To A PostgreSQL Database

The pg_stat_activity table can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the process ids and usernames of all connection to your PostgreSQL server, run the following query:

> select pid, usename from pg_stat_activity;
  pid  |  usename
 57174 | jbranchaud
 83420 | jbranchaud

Include datname in the requested columns to figure out the database of each connection.

> select pid, usename, datname from pg_stat_activity;
  pid  |  usename   |  datname
 57174 | jbranchaud | hr_hotels
 83420 | jbranchaud | pgbyex

The results can be restricted to a particular database as necessary.

> select pid, usename from pg_stat_activity where datname = 'hr_hotels';
  pid  |  usename
 57174 | jbranchaud

Query Size of Postgres Array

Today I learned how to query the size of a Postgres array. There are (at least) two methods that work.

The first is array_length. This requires you to know the array dimension you seek; a useful filter, although most of the time this will probably be 1:

hr-til_development=# select title from posts
hr-til_development=# where array_length(slack_notified_at_likes_threshold, 1) = 1;
 Because Javascript
 Percent Notation
 DIY Grids for Designing UI in Illustrator
(3 rows)

When the dimension is 1, a more terse solution is cardinality:

hr-til_development=# select title from posts
hr-til_development=# where cardinality(slack_notified_at_likes_threshold) = 1;
 Because Javascript
 Percent Notation
 DIY Grids for Designing UI in Illustrator
(3 rows)


Default to Empty Array in Postgres

Today I added an array of integers to a Postgres table, and like all such migrations, it will default to null for each new record.

This was a problem, because I wanted to use Rails 4's built-in Postgres array support to make decisions based on that data. Ruby array methods like include? will raise NoMethodError: undefined method 'include?' for nil:NilClass if that array is ever nil, which it is by default.

This led me to learn how to set the default value to an empty array using Postgres array literal syntax. I was then able to include a not null constraint as an added benefit:

# db/migrate/20160211043316_add_slack_notified.rb

def up
  execute <<-SQL
    alter table posts
      add column slack_notified integer[] not null default '{}';

Sets With The Values Command

You can concisely create sets of values in PostgreSQL using the values command.

> values (1), (2), (3);

You can even create multiple columns of values.

> values (1, 'a', true), (2, 'b', false);
 column1 | column2 | column3
       1 | a       | t
       2 | b       | f

This is most often used with an insert command, but can be used on its own, as a subquery, within a CTE, etc.


Listing All Rows In A Table

In PostgreSQL, perhaps the more common way to list all rows in a table is with the following select command:

select * from bedding_types;

There is an alternative approach that also selects all rows from a table. It's essentially a shorthand -- the table command.

> table bedding_types;
 No Bed
 1 Full
 1 Double
 2 Double
 1 Twin
 2 Twins
 1 Queen
 2 Queen
 1 King
 2 Kings
 3 Kings
 Sofa Bed

h/t Jack Christensen

List Available Schemas

Use the \dn command within a psql session to list the available schemas. This will only included user created schemas. This means that schemas like public will be listed whereas schemas like information_schema and pg_catalog will not.

You can use \dnS to also list system schemas.


Be aware! Postgres rounds.

Yesterday, my pair and I created a test that calculated a value and compared that to the value of a calculation in the code we were testing. This worked out great except for one hitch, we were asserting about the derived value after it had been inserted into the database. What we didn't count on is that Postgres rounds. Check this out:

create table money (amount numeric(4, 2));
insert into money (amount) values (10.342) returning amount;

insert into money (amount) values (10.347) returning amount;

Postgres rounds!

Postgres permissions to insert, but not return.

Postgres permissions can be harsh. If you don't have the select permission on a table you might not be able to do some very rails like things that you think you ought to be able to do, take my friend Jimmy.

set role 'admin';
create role 'jimmy';
create table fruits (id serial primary key, name varchar);
grant insert on fruits to jimmy;
grant usage on fruits_id_seq to jimmy;

Given these permissions it's possible for Jimmy to make inserts into the fruit table like so:

set role 'jimmy';
insert into fruits (name) values ('Apple');

But Rails wants a little more, it wants to know the id of the thing that was just created which is a problem because Jimmy doesn't have select permissions.

set role 'jimmy';
insert into fruits (name) values ('Orange') returning id;
ERROR:  permission denied for relation fruits

Argh this complicates matters, but I'll relent and give Jimmy the appropriate permissions so that he can add records through ActiveRecord.

set role 'admin';
grant select on fruits to jimmy;
set role 'jimmy';
insert into fruits (name) values ('orange') returning id;

Add value to Postgres enum type, dropping tho...

An enum type in Postgres is a type that can restrict a value to only certain values. It can be defined like this:

create type fruit as Enum ('orange', 'apple');

We can now cast a string to a fruit.

chriserin=# select 'orange'::fruit;

Well... some strings...

chriserin=# select 'rasberry'::fruit;
ERROR:  invalid input value for enum fruit: "rasberry"

It's all good! We can add rasberry to the enum type.

chriserin=# alter type fruit add value 'rasberry';
chriserin=# select 'rasberry'::fruit;

Postgres allows you to add values to an enum type, but you can't drop values which is a bit of an inconvenience when creating up/down db migrations like you do in Rails or with a tool like tern.

Set A Seed For The Random Number Generator

In PostgreSQL, the internal seed for the random number generator is a run-time configuration parameter. This seed parameter can be set to a particular seed in order to get some determinism from functions that utilize the random number generator. The seed needs to be something between 0 and 1.

We can see this in action by setting the seed and then invoking random() a couple times. Doing this twice, we will see the reproducibility we can achieve with a seed.

> set seed to 0.1234;

> select random();

> select random();
(1 row)

> set seed to 0.1234;

> select random();

> select random();

The seed can also be configured with the setseed() function.

See the PostgreSQL docs for more details.

Shuffle an array in postgres

Somtimes an array is just too sequential

select ARRAY[1,2,3,4,5];
chriserin=# select ARRAY[1,2,3,4,5];
(1 row)

I want to shuffle this array into an unpredictable mess of numbers, and one way to do this in PostgreSQL is to turn these array elements into rows, order them randomly and then squash the rows back into an array like so:

chriserin=# select array_agg(foo.x) from (select unnest(ARRAY[1,2,3,4,5]) x order by random()) foo;
(1 row)

The key here is the order by random() clause, which is a special postgres case. order by 0.284325 which is a number that random() could return, is not valid sql. There are dragons in order by, read here for more info

Pretty Print Data Sizes

Use the pg_size_pretty() function to pretty print the sizes of data in PostgreSQL. Given a bigint, it will determine the most human-readable format with which to print the value:

> select pg_size_pretty(1234::bigint);
 1234 bytes

> select pg_size_pretty(123456::bigint);
 121 kB

> select pg_size_pretty(1234567899::bigint);
 1177 MB

> select pg_size_pretty(12345678999::bigint);
 11 GB

This function is particularly useful when used with the pg_database_size() and pg_relation_size() functions.

> select pg_size_pretty(pg_database_size('hr_hotels'));
 12 MB

Get The Size Of A Table

In Get The Size Of A Database, I showed a PostgreSQL administrative function, pg_database_size(), that gets the size of a given database. With the pg_relation_size() function, we can get the size of a given table. For instance, if we'd like to see the size of the reservations table, we can executing the following query:

> select pg_relation_size('reservations');

This gives us the size of the reservations table in bytes. As you might expect, the referenced table needs to be part of the connected database and on the search path.

See the Postgres docs for more details.

Get The Size Of A Database

If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.

> select pg_database_size('hr_hotels');

Just give it the name of the database and it will tell you how much disk space that database is taking up.

Checkout the Postgres docs for more details.

Say "midnight" in postgres like a rocket scientist

nasa=# select 'allballs'::time;
(1 row)

We are hopeful that the roots of this silly time literal go back to some NASA slang:

array indexes in postgres

Lets say I have an array of fruit types (Apple, Pear, Banana, Tangerine) and I want to turn each element in the array into a row I can do this.

select unnest(fruits) 
from (select Array['Apple', 'Pear', 'Tangerine', 'Banana'] fruits) as list;

which returns:

(4 rows)

But it would be nice to have an array index as column in case the order of the elements in this array had some meaning. Postgres has a function generate_subscripts to help you with array index elements.

select generate_subscripts(fruits, 1), unnest(fruits) from (select Array['Apple', 'Pear', 'Tangerine', 'Banana'] fruits) as list;

which returns:

 generate_subscripts |  unnest
                   1 | Apple
                   2 | Pear
                   3 | Tangerine
                   4 | Banana

Throw a minus 1 in there if you want to be base zero :).

Day Of Week By Name For A Date

In Day Of Week For A Date, I explained how to determine what day of the week a date is as an integer with PostgreSQL. This used the date_part() function. By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day today is, try a statement like the following:

> select to_char(now(), 'Day');

The Day part of the second argument is just one of many template patterns that can be used for formatting dates and times.

See Data Type Formatting Functions in the Postgres docs for more details.

Day Of Week For A Date

Given a date in PostgreSQL

> select '2050-1-1'::date;

you can determine the day of the week for that date with the date_part() function

> select date_part('dow', '2050-1-1'::date);

The days of week are 0 through 6, 0 being Sunday and 6 being Saturday.


pg Prefix Is Reserved For System Schemas

Have you ever tried to create a schema with pg_ as the first part of the name of the schema? If so, you probably didn't get very far. Postgres won't let you do that. It reserves the pg_ prefix for system schemas. If you try to create a schema in this way, you'll get an unacceptable schema name error.

> create schema pg_cannot_do_this;
ERROR:  unacceptable schema name "pg_cannot_do_this"
DETAIL:  The prefix "pg_" is reserved for system schemas.

Escaping A Quote In A String

In PostgreSQL, string (varchar and text) literals are declared with single quotes ('). That means that any string containing a single quote will need some escaping. The way to escape a single quote is with another single quote.

> select 'what''s up!';
 what's up!


Compute Hashes With pgcrypto

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. Any of these hashing functions can be applied to an arbitrary string using the digest function. Here are example of the md5 and sha1 algorithms:

> create extension pgcrypto;

> select digest('Hello, World!', 'md5');

> select digest('Hello, World!', 'sha1');

See the pgcrypto docs for more details.

Custom order #Postgres #SQL union queries

Consider the following data:

insert into users (email, first, last) values ('', 'New', 'User');
insert into users (email, first, last) values ('', 'Lemmy', 'Kilmister');
insert into users (email, first, last) values ('', 'Lemmy', 'Kilmister');

Assuming you want to match a user to the following data:

{ email: '', first: 'New', last: 'User' }

And you consider email a better indication of a match, and first and last name are a fallback. The following query will not yield the desired results despite the where conditions being ordered per the requirements:

select * from users where email = '' or (first = 'New' and last = 'User') limit 1;

This will result in:

1 | '' | 'New' | 'User'

To find by email first we will need to union two queries, rank the match then order by rank:

(select 1 as rank, id, email, first, last from users where email = '')
(select 2 as rank, id, email, first, last from users where first = 'New' and last = 'User')
order by rank
limit 1;

This will yield the correct result:

3 | '' | 'Lemmy' | 'Kilmister'

Creating Conditional Constraints in PostgreSQL

There are times when it doesn't make sense for a constraint to apply to all records in a table. For instance, if we have a table of pokemon, we may only want to apply a unique index constraint to the names of non-wild pokemon. This can be achieved in PostgreSQL with the following conditional constraint:

create unique index pokemons_names on pokemons (names)
where wild = false;

If we try to insert a non-wild pokemon with a duplicate name, we will get an error. Likewise, if we try to update a pokemon with a duplicate name from wild to non-wild, we will get an error.


Sleeping In PostgreSQL

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just want your queries to look very computationally expensive, PostgreSQL offers the pg_sleep function.

To sleep for 5 seconds, try the following:

> select now(); select pg_sleep(5); select now();
 2016-01-08 16:30:21.251081-06
(1 row)

Time: 0.274 ms

(1 row)

Time: 5001.459 ms
 2016-01-08 16:30:26.252953-06
(1 row)

Time: 0.260 ms

As you'll notice, the pg_sleep statement took about 5 seconds.


Is It Null Or Not Null?

In PostgreSQL, the standard way to check if something is NULL is like so:

select * as wild_pokemons from pokemons where trainer_id is null;

To check if something is not null, you just add not:

select * as captured_pokemons from pokemons where trainer_id is not null;

PostgreSQL also comes with ISNULL and NOTNULL which are non-standard ways of doing the same as above:

select * as wild_pokemons from pokemons where trainer_id isnull;
select * as captured_pokemons from pokemons where trainer_id notnull;

Restarting Sequences When Truncating Tables

PostgreSQL's truncate feature is a handy way to clear out all the data from a table. If you use truncate on a table that has a serial primary key, you may notice that subsequent insertions keep counting up from where you left off. This is because the sequence the table is using hasn't been restarted. Sure, you can restart it manually or you can tell truncate to do it for you. By appending restart identity to the end of a truncate statement, Postgres will make sure to restart any associated sequences at 1.

truncate pokemons, trainers, pokemons_trainers restart identity;

Two Ways To Compute Factorial in PostgreSQL

In PostgreSQL, there are two ways to compute the factorial of a number. There is a prefix operator and a postfix operator. The prefix operator is !! and can be used like so:

> select !!5;

The postfix operator is ! and can be used like so:

> select 5!;

See the mathematical functions and operators docs for more details.

Postgres, varchar, and silent white space trimming

Given this table:

# create table initials (monogram varchar(2) primary key);

Insert a three character monogram.

# insert into initials(monogram) values ('BPD');
ERROR:  value too long for type character varying(2)

Ok, cool. Thanks Postgres. But what if my string ends in a space?

# insert into initials(monogram) values ('BP ');

What? This caused me great pain when trying to do an insert. Here in my example I've used a CTE for brevity, but imagine you can't see the input table, and it contains thousands of records. My insert went something like this:

with input as (
        select 'BD '::text as data
insert into initials(monogram)
select data from input where not exists (select 1 from initials where monogram =
ERROR:  duplicate key value violates unique constraint "initials_pkey"
DETAIL:  Key (monogram)=(BD) already exists.

A spot check based on the error message turns up no results:

select count(*) from input where data = 'BD';
(1 row)

But of course 'BD ' = 'BD', if you keep in mind that Postgres will silently truncate trailing whitespace when casting to a varchar.

Postgres Arrays are One-Based

Today I was reminded that Postgres arrays are one-based. This realization came while using Ruby's times method (zero-based index) to generate SQL targeting a Postgres array (one-based index).

Take this table:

 name  |      pay_by_quarter
 Bill  | {10000,10000,10000,10000}
 Carol | {20000,25000,25000,25000}
(2 rows)

To select the first-quarter pay for all employees, use index 1.

my_database=# select pay_by_quarter[1] from sal_emp;
(2 rows)

Using your first Ruby index value 0 to produce pay_by_quarter[0] returns a sad, empty column.

PostgreSQL's Max Identifier Length Is 63 Bytes

In PostgreSQL, identifiers -- table names, column names, constraint names, etc. -- are limited to a maximum length of 63 bytes. Identifiers longer than 63 characters can be used, but they will be truncated to the allowed length of 63.

> alter table articles
    add constraint this_constraint_is_going_to_be_longer_than_sixty_three_characters_id_idx
    check (char_length(title) > 0);
NOTICE:  identifier "this_constraint_is_going_to_be_longer_than_sixty_three_characters_id_idx" will be truncated to "this_constraint_is_going_to_be_longer_than_sixty_three_characte"

Postgres warns us of identifiers longer than 63 characters, informing us of what they will be truncated to. It then proceeds to create the identifier.

If postgres is trying to generate an identifier for us - say, for a foreign key constraint - and that identifier is longer than 63 characters, postgres will truncate the identifier somewhere in the middle so as to maintain the convention of terminating with, for example, _fkey.

The 63 byte limit is not arbitrary. It comes from NAMEDATALEN - 1. By default NAMEDATALEN is 64. If need be, this value can be modified in the Postgres source. Yay, open-source database implementations.

See the postgres docs for more details.

Adding Composite Uniqueness Constraints (Postgres)

There are two ways in Postgres to create a composite uniqueness constraint; that is, a constraint that ensures that the combination of two or more values on a table only appear once. For the following two code snippets, assume that we have a table relating Pokemon and Trainers and that our domain restricts each Trainer to only having at most one of each Pokemon.

The first approach is to create a constraint directly on the table:

alter table pokemons_trainers
  add constraint pokemons_trainers_pokemon_id_trainer_id_key
  unique (pokemon_id, trainer_id);

The second approach is to create a unique index:

create unique index pokemons_trainers_pokemon_id_trainer_id_idx
  on pokemons_trainers (pokemon_id, trainer_id);

Export Query Results To A CSV

Digging through the results of queries in Postgres's psql is great if you are a programmer, but eventually someone without the skills or access may need to check out that data. Exporting the results of a query to CSV is a friendly way to share said results because most people will have a program on their computer that can read a CSV file.

For example, exporting all your pokemon to /tmp/pokemon_dump.csv can be accomplished with:

copy (select * from pokemons) to '/tmp/pokemon_dump.csv' csv;

Because we are grabbing the entire table, we can just specify the table name instead of using a subquery:

copy pokemons to '/tmp/pokemon_dump.csv' csv;

Include the column names as headers to the CSV file with the header keyword:

copy (select * from pokemons) to '/tmp/pokemon_dump.csv' csv header;


Know your HISTFILE

Today I learned that psql has a cool feature called the HISTFILE. It's a record of your commands in the REPL, named ~/.psql_history by default.

Building off Josh's post yesterday about ~/.psqlrc settings, add this there to get a unique log for each database:

\set HISTFILE ~/.psql_history-:DBNAME

Here's the tail of my TIL log:

$ tail ~./psql_history-hr-til_development
select * from developers limit 1;
select * from developers limit 2;
select * from developers limit 3;

This is just a text file; it can be edited. You could add a complex command, or meaningful reminder to yourself, and it will be available in psql when you traverse your history (↑ or CTRL-P).

Use A psqlrc File For Common Settings

There are a handful of settings that I inevitably turn on or configure each time I open up a psql session. I can save myself a little time and sanity by configuring these things in a .psqlrc dotfile that is located in my home directory. This will ensure my psql session is configured just how I like it each time I launch it. Here is what my ~/.psqlrc file currently looks like:

\x auto
\pset null 'Ø'

Create A Composite Primary Key

The unique identifier for a given row in a table is the primary key. Generally, a row can be uniquely identified by a single data point (such as an id), so the primary key is simply that single data point. In some cases, your data can be more appropriately uniquely identified by multiple values. This is where composite primary keys can lend a hand. Consider an example plane_tickets table where each ticket can be uniquely identified by the passenger and flight it is associated with:

create table plane_tickets (
  passenger_id integer references passengers not null,
  flight_id integer references flights not null,
  confirmation_number varchar(6) not null,
  seat_assignment varchar not null,
  primary key (passenger_id, flight_id)

Truncate Tables With Dependents

In Truncate All Rows, I talked about how postgres's truncate can be used to quickly delete all rows in a table. In practice this alone won't be very useful though, because tables usually have other tables that depend on them via foreign keys. If you have tables A and B where B has a foreign key referencing A, then trying to truncate A will result in something like this:

> truncate A;
ERROR:  cannot truncate a table referenced in a foreign key constraint

Fortunately, truncate has some tricks up its sleeve.

If you know two tables are tied together via a foreign key constraint, you can just truncate both of them at once:

> truncate A, B;

If many tables are tied together in this way and you are looking to throw all of it out, then a simpler approach is to cascade the truncation:

> truncate A cascade;
NOTICE:  truncate cascades to table "B"

Use these with care and potentially within transactions because your data will go bye bye.

h/t Dillon Hafer and Jack Christensen

Truncate All Rows

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

> delete from pokemons;

Though DELETE can do the job, if you really are deleting all rows to clear out a table, you are better off using TRUNCATE. A TRUNCATE query will be faster than a DELETE query because it will just delete the rows without scanning them as it goes.

> truncate pokemons;


Cleanup Postgres Databases

Today I learned that I have thirty-nine Postgres databases on my computer, after running this command inside psql:

\l ;

Each one is small, but I don't like to carrying around old data. I ended up dropping nine of them, with:

drop database foo_development;

For a lighter storage and cognitive load.

Insert Just The Defaults

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situation, you can break away from the standard:

> insert into table_name (column1, column2) values (value1, value2);

Instead, simply tell Postgres that you want it to use the default values:

> insert into table_name default values;

Generate A UUID In PostgreSQL

Postgres has support for universally unique identifiers (UUIDs) as a column data type via uuid. If you have a UUID column, you may need to generate a UUID. This requires the uuid-ossp module. This module provides a number of functions for generating UUIDs including the uuid_generate_v4() function which bases the UUID entirely off random numbers.

> create extension "uuid-ossp";
> select uuid_generate_v4();

See the postgres docs for more details on UUID generation functions.

Getting A Slice Of An Array In PostgreSQL

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slice separated by a colon.

> select (array[4,5,6,7,8,9])[2:4];

Notice that the bounds are inclusive, the array index is 1-based, and the array declaration above needs to be wrapped in parentheses in order to not trip up the array slice syntax.

You can also select rectangular slices from two dimensional arrays like so:

> select (array[[1,2,3],[4,5,6],[7,8,9]])[2:3][1:2];

Defining Arrays In PostgreSQL

In postgres, an array can be defined using the array syntax like so:

> select array['a','b','c'];

If you are inserting into an existing array column, you can use the array literal syntax.

> create temp table favorite_numbers(numbers integer[]);
> insert into favorite_numbers values( '{7,3,9}' );
> select numbers[2] from favorite_numbers;

Postgres also supports two-dimensional arrays.

select array[[1,2,3],[4,5,6],[7,8,9]] telephone;

Postgres Unlogged

Using a Postgres table for caching? You might want to try making it unlogged.

unlogged tables are not written to the write-ahead log, which makes them much faster. This also means they are not crash-safe, and are truncated when a crash or unclean shutdown occurs. For caching purposes, that's likely to be an acceptable tradeoff.


List Postgres Database Users

Within psql, type \du to list all the users for a database and their respective permissions.

> \du
                              List of roles
 Role name  |                   Attributes                   | Member of
 jbranchaud | Superuser, Create role, Create DB, Replication | {}
 sampleuser | Create DB                                      | {}

Psql connect

Want to change database connections from inside psql? You can!

Psql, the REPL for Postgres, has a useful meta-command called \connect, or \c. This lets you establish a new connection to a Postgres server, while closing the current connection.

Here is the required format:

\c or \connect [ dbname [ username ] [ host ] [ port ] ] | conninfo

Only \c [my_database] is required; omitted parameters are taken from the previous connection.


SQL Truncate

Today during a smoke-test I watched a SQL-savvy colleague wipe a database table with the truncate syntax.

truncate users;

truncate is a feature of Postgres and is included in the SQL:2008 standard.

truncate includes a cascade option that further truncates all tables that have foreign-key references to any of the named tables, or any tables included by cascade itself. That sounds either useful or extremely destructive, depending on context.

This replaces my go-to for this type of cleanup, the Rails console:


The benefits of truncate over delete_all are that you have more control and it's probably a little faster due to requiring one less layer of abstraction.

delete_all & truncate

List All Versions Of A Function

Within psql you can use \df to list a postgres function with a given name

> \df now
                              List of functions
   Schema   | Name |     Result data type     | Argument data types |  Type
 pg_catalog | now  | timestamp with time zone |                     | normal
(1 row)

When a function has multiple definitions across a number of types, \df will list all versions of that function

> \df generate_series
List of functions
-[ RECORD 1 ]-------+-------------------------------------------------------------------
Schema              | pg_catalog
Name                | generate_series
Result data type    | SETOF bigint
Argument data types | bigint, bigint
Type                | normal
-[ RECORD 2 ]-------+-------------------------------------------------------------------
Schema              | pg_catalog
Name                | generate_series
Result data type    | SETOF bigint
Argument data types | bigint, bigint, bigint
Type                | normal
-[ RECORD 3 ]-------+-------------------------------------------------------------------
Schema              | pg_catalog
Name                | generate_series
Result data type    | SETOF integer
Argument data types | integer, integer
Type                | normal
-[ RECORD 4 ]-------+-------------------------------------------------------------------
Schema              | pg_catalog
Name                | generate_series
Result data type    | SETOF integer
Argument data types | integer, integer, integer
Type                | normal
-[ RECORD 5 ]-------+-------------------------------------------------------------------

Reset Postgres User Password

Resetting a Postgres user's password with alter role is fine, but there is a better way! Try:

\password [ username ]

This changes the password of the specified user, defaulting to the current user. I like this because it prompts you for a password, encrypts it (security!), and sends it to the server as alter role. Your password will not appear as cleartext in the command history, server log, or anywhere else.


h/t Jack Christensen

Postgres Contains/Contained By Array Operators

Today I learned about the Postgres 'contains' array operator (@>). This compares two arrays, returning true if the first array contains all of the elements of the second array.

myapp_development=# select array[1,2,3] @> array[1,3];
(1 row)

myapp_development=# select array[1,2,3] @> array[5,9];
(1 row)

It works in reverse via the 'is contained by' array operator (<@).

myapp_development=# select array[1,3] <@ array[1,2,3,4];
(1 row)

A practical example might be comparing two arrays, one of names and one of common nicknames associated with that name.

Special Math Operators

Postgres has all the mathematical operators you might expect in any programming language (e.g. +,-,*,/,%). It also has a few extras that you might not be expecting.

Factorial Operator:

> select 5!;
(1 row)

Square Root Operator:

> select |/81;
(1 row)

Absolute Value Operator:

> select @ -23.4;
(1 row)

Send A Command To psql

You can send a command to psql to be executed by using the -c flag

$ psql -c "select 'Hello, World!';"
 Hello, World!
(1 row)

Specify a particular database as needed

$ psql blog_prod -c 'select count(*) from posts;'
(1 row)

h/t Jack Christensen

Checking The Type Of A Value

The pg_typeof() function allows you to determine the data type of anything in Postgres.

> select pg_typeof(1);
(1 row)

> select pg_typeof(true);
(1 row)

If you try it on an arbitrary string, it is unable to disambiguate which string type (e.g. text vs varchar).

> select pg_typeof('hello');
(1 row)

You just have to be a bit more specific.

> select pg_typeof('hello'::varchar);
 character varying
(1 row)


Intervals Of Time By Week

It is pretty common to use hours or days when creating a Postgres interval. However, intervals can also be created in week-sized chunks

> select '2 weeks'::interval;
 14 days
(1 row)

> select make_interval(0,0,7,0,0,0,0);
 49 days
(1 row)

Use Argument Indexes

In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by parts of the statement.

Instead of writing

select id, updated_at from posts order by updated_at;

you can write

select id, updated_at from posts order by 2;

If you want to group by a table's type and then order by the counts from highest to lowest, you can do the following

select type, count(*) from transaction group by 1 order by 2 desc;

Treat a subquery like a value

Postgres is cool with returning a value from a query.

select 1;

And also returning a value from a subquery.

select (select 1);

Turtles all the way down.

select (select (select 1));

But that subquery can only return one row or you'll get an error.

select (select unnest('{1, 2}'::integer[]));
ERROR:  more than one row returned by a subquery used as an expression

This is used sometimes when you need to use a count in an expression.

select round((select count(*) from pg_class where reltype = 0) / (select count(*)::float from pg_class) * 100) as percent_with_zero_reltype;

Temporary Tables

Create a temporary table in Postgres like so

create temp table posts (

This table (and its data) will only last for the duration of the session. It is created on a schema specific to temporary tables. It is also worth noting that it won't be autovacuumed, so this must be done manually as necessary.

Fizzbuzz With Common Table Expressions

In learning about CTEs (common table expressions) in postgres, I discovered that you can do some interesting and powerful things using the with recursive construct. The following solves the fizzbuzz problem for integers up to 100

with recursive fizzbuzz (num,val) as (
    select 0, ''
    select (num + 1),
      when (num + 1) % 15 = 0 then 'fizzbuzz'
      when (num + 1) % 5  = 0 then 'buzz'
      when (num + 1) % 3  = 0 then 'fizz'
      else (num + 1)::text
    from fizzbuzz
    where num < 100
select val from fizzbuzz where num > 0;

Check out With Queries (Common Table Expressions) for more details on CTEs.

Default Schema

Schemas can be used to organize tables within a database. In postgres, we can see all the schemas our database has like so

> select schema_name from information_schema.schemata;
(6 rows)

When you create a new table, it will need to be placed under one of these schemas. So if we have a create table posts (...), how does postgres know what schema to put it under?

Postgres checks your search_path for a default.

> show search_path;
 "$user", public
(1 row)

From our first select statement, we see that there is no schema with my user name, so postgres uses public as the default schema.

If we set the search path to something that won't resolve to a schema name, postgres will complain

> set search_path = '$user';
> create table posts (...);
ERROR:  no schema has been selected to create in

List All Columns Of A Specific Type

In postgres, we can access information about all the columns in our database through the information_schema tables; in particular, the columns table. After connecting to a particular database, we can list all columns (across all our tables) of a specific type. We just need to know the schema of the tables we are interested in and the type that we want to track down.

My application's tables are under the public schema and I want to track down all timestamp columns. My query can look something like this

> select table_name, column_name, data_type from information_schema.columns where table_schema = 'public' and data_type = 'timestamp without time zone';
   table_name    | column_name |          data_type
 articles        | created_at  | timestamp without time zone
 articles        | updated_at  | timestamp without time zone
 users           | created_at  | timestamp without time zone
 users           | updated_at  | timestamp without time zone
(4 rows)

Alternatively, I could look for both timestamp and timestamptz with a query like this

> select table_name, column_name, data_type from information_schema.columns where table_schema = 'public' and data_type like '%timestamp%';

Temporarily Disable Triggers

In general, you are always going to want your triggers to fire. That's why they are there. Though special circumstances may arise where you need to temporarily disable them. In postgres, you can use

> set session_replication_role = 'replica';

By changing the replication role from origin to replica you are essentially disabling all non-replica triggers across the database (for that session). When you are done, you can simply set the replication role back so that normal trigger behavior can resume

> set session_replication_role = 'origin';

A more direct and fine-grained approach to disabling triggers is to use an alter table command that targets a specific trigger.

h/t Jack Christensen

Limit Execution Time Of Statements

You can limit the amount of time that postgres will execute a statement by setting a hard timeout. By default the timeout is 0 (see show statement_timeout;) which means statements will be given as much time as they need.

If you do want to limit your statements, to say, 1 second, you can set the execution time like so

> set statement_timeout = '1s';
> show statement_timeout;
(1 row)

Any queries taking longer than 1 second will be aborted with the following message output

ERROR:  canceling statement due to statement timeout

Auto Expanded Display

By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be so wide that line wrapping occurs. This can make the results and their corresponding column names rather difficult to read. In these situations, it is preferable to turn on expanded display so that results are displayed vertically. The \x command can be used to toggle expanded display on and off.

Having to toggle expanded display on and off depending on the way a particular set of results is going to display can be a bit tedious. Fortunately, running \x auto will turn on auto expanded display. This means postgres will display the results normally when they fit and only switch to expanded display when it is necessary.

h/t Jack Christensen

Generate Series Of Numbers

Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop arguments

> select generate_series(1,5);

The default step is 1, so if you want to count backwards, you need to specify a negative step

> select generate_series(5,1,-1);

You can use a larger step value to, for instance, get only multiples of 3

> select generate_series(3,17,3);

Trying this out with timestamps is left as an exercise for the reader.

String Contains Another String

In postgres, you can check if a string contains another string using the position function.

> select position('One' in 'One Two Three');

It returns the 1-based index of the first character of the first match of that substring.

> select position('Four' in 'One Two Three');

If the substring doesn't appear within the string, then the result is 0.

Thus, you can determine if a string contains another string by checking if the value resulting from position is greater than 0.

Configure The Timezone

In postgres, running show timezone; will reveal the timezone for your connection. If you want to change the timezone for the duration of the connection, you can run something like

> set timezone='America/New_York';
> show timezone;
(1 row)

Now, if you run a command such as select now();, the time will be in Eastern time.

h/t Jack Christensen

Extracting values from Postgres json blob

Lets get value 'a':

select '{"a": "b"}'::json->>'a';

Howbout a nested value 'b'?

select ('{"a": {"b": "c"}}'::json->>'a')::json->>'b';

Howbout the first value of an array?

select ('{"a": [{"b": "c"}]}'::json->>'a')::json->>0;
 {"b": "c"}

There's a better way to get a nested path

select json_extract_path('{"a": {"b": "c"}}', 'a', 'b');
(1 row)

But you can't grab a value from an array with that method. But at least json_extract_path returns a value of type json so you don't have to cast it before using the feather (->>) operator.

select json_extract_path('{"a": {"b": ["c"]}}', 'a', 'b')->>0;
(1 row)

Time Zone offsets in the DB

When you need a timezone offset in the database use

select utc_offset from pg_timezone_names where name = 'Asia/Hong_Kong';

But in this case utc_offset is an interval and maybe you want an integer

select '1 hour'::interval::integer;   -- << THROWS ERROR

Throws an error cannot cast type interval to integer

But you can extract the hour from an interval

select extract(hour from utc_offset) from pg_timezone_names where name = 'Asia/Hong_Kong';

-[ RECORD 1 ]
date_part | 8

This operation can be unexpectedly expensive to run (100ms) so be careful where you run it or maybe put this rarely changing information into a materialized view.

Count Records By Type

If you have a table with some sort of type column on it, you can come up with a count of the records in that table by type. You just need to take advantage of group by:

> select type, count(*) from pokemon group by type;

  type   | count 
 fire    |    10
 water   |     4
 plant   |     7
 psychic |     3
 rock    |    12

Extracting Nested JSON Data

If you are storing nested JSON data in a postgres JSON column, you are likely going to find yourself in a situation where you need to access some of those nested values in your database code. For instance, you may need to get at the license number in this JSON column

'{ "name": "Jason Borne", "license": { "number": "T1234F5G6", "state": "MA" } }'

Unfortunately, the -> operator isn't going to do the trick. You need the json_extract_path function

> select json_extract_path(owner, 'license', 'number') from some_table;


Read more about JSON Functions and Operators.

Toggling The Pager In PSQL

When the pager is enabled in psql, commands that produce larger output will be opened in a pager. The pager can be enabled within psql by running \pset pager on.

If you'd like to retain the output of commands, perhaps as reference for subsequent commands, you can turn the pager off. As you might expect, the pager can be disabled with \pset pager off.


Check to see if a subquery returns any records

A subquery may or may not return results. And you may want to make decisions for your data set based on that.

select 'Here exists a record' as word where not exists (select 1 where false);
 Here exists a record
(1 row)

select 'no record :(' as word where not exists (select 1 where true);
(0 rows)

Some connections are just hangin' KILL THEM

Jake has covered in the past how to see connections and their queries in postgres with

select * from pg_stat_activity;

It returns a pid column and now you can use that pid to do these processes harm.

select pg_terminate_backend(pid) from pg_stat_activity where THERES THAT QUERY GET IT

Prepare / Execute

You can store a SQL query with prepare.

db=# prepare posts_search as select title from posts limit 5;

Call the method with execute.

db=# execute posts_search;
 Hello World!
 My First Pull Request: HAML
 My First Pull Request: Sinatra
 My First Pull Request: Capistrano
 My First Pull Request: SASS
(5 rows)

Deallocate the query with deallocate, and you can set it again.

db=# deallocate posts_search;
db=# prepare posts_search as select title from posts limit 10;

How does postgres get its meta data?

When you get to a psql command line prompt and enter \dt psql will output all the tables in the current schema.

But if you open psql with

psql -E database_name

postgres will show you the query it runs to satisfy \dt.

SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
  AND pg_catalog.pg_table_is_visible(c.oid)

Gist indexes understand ranges

Imagine you're selling time in a room, but that room has a limited capacity. Anybody can book an arbitrary amount of time in the future, but there can only be 5 people in the room at any given time. You'll want to query the db to ask it how many people are in the room for a certain amount of time before you sell that time.

For many many rooms and many many slices of time, you'll get a query like

select count(*) from rooms r join slices s on s.room_id = where = ? and tstzrange('2016-01-02', '2016-01-05') && s.time_range;

This is going to take in the many hundreds of ms to run, but never fear, an index is near.

alter table slices create index speed_things_up gist(room_id, time_range);

And now the above query takes 10s of ms to run!

Word Count for a Column

Assuming I have a database with a posts table:

> select * from posts where id = 1;
 id |  title   |              content               
  1 | My Title | This is the content of my article. 

I can compute the word count of the content of a given post like so:

> select sum(array_length(regexp_split_to_array(content, '\s+'), 1)) from posts where id = 1;


The most generic type of constraint

The Check constraint is the most generic type of constraint.

alter table things add CONSTRAINT corners_must_be_greater_than_0 CHECK (corners > 0);

command line SQL help


\h [NAME]              help on syntax of SQL commands, * for all commands




Description: roll back to a savepoint

/ht Brian