Today I Learned

A Hashrocket project

185 posts about #sql

Psql Help

Psql includes a built-in help command that is amazing.

With no arguments, it returns all the help:

app_development=# \h
Available help:

With a Postgres command as an argument, it returns a specific help:

app_development=# \h abort;
Command:     ABORT
Description: abort the current transaction

And with an asterisk, it prints the full description for all the help commands.

Plus we get this nicety:

Note: To simplify typing, commands that consists of several words do not have to be quoted. Thus it is fine to type \help alter table.

Write better SQL in your REPL today!

h/t Jack Christensen

Salt And Hash A Password With Postgres' pgcrypto

The pgcrypto extension that ships with PostgreSQL can be used to do a number of interesting things. This includes functions for doing salted password hashing. Using the crypt and gen_salt functions, we can securely store a user password and later compare it to plain-text passwords for authentication purposes.

create extension pgcrypto;

select crypt('pa$$w0rd', gen_salt('bf'));

> select (
    '$2a$06$Z7wmrkYMOyLboLcULUYzNe6nHUcWywSZTt6nSrT5Xdv/VLdJ4g99K' =
  ) as matched;

> select (
    '$2a$06$Z7wmrkYMOyLboLcULUYzNe6nHUcWywSZTt6nSrT5Xdv/VLdJ4g99K' =
  ) as matched;

The salt value is generated using the blowfish encryption algorithm (hence, the 'bf'). There is support for other algorithms such as md5.

See the pgcrypt documentation for more details.

Types and type casting in #PostgreSQL

To see the type of column or any entity in PostgreSQL use pg_typeof. Here’s an example:

select pg_typeof(array['thing']);


-- pg_typeof
-- ---------
-- text[]

To cast to another type use the :: operator:

select pg_typeof(array['thing']::varchar[]);


--      pg_typeof
-- -------------------
-- character varying[]

h/t Josh Branchaud

Terminating A Connection

Consider the scenario where you are trying to drop a database, but there are existing connections.

$ dropdb sample_db
dropdb: database removal failed: ERROR:  database "sample_db" is being accessed by other users
DETAIL:  There is 1 other session using the database.

If you don’t know where these connections are, you can terminate them within a psql session. You just have to figure out the pid of those connections. In List Connections To A Database, I explained how to get at the pid values of connections. Using the pid value and pg_terminate_backend(), you can terminate a connection.

> select pg_terminate_backend(12345);

To terminate all connections to a particular database, use a query like the following:

select pg_terminate_backend(
from pg_stat_activity
where pg_stat_activity.datname = 'sample_db'
  and pid <> pg_backend_pid();

This excludes the current session, so you’ll need to exit psql as well before dropping the database.


List Connections To A PostgreSQL Database

The pg_stat_activity table can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the process ids and usernames of all connection to your PostgreSQL server, run the following query:

> select pid, usename from pg_stat_activity;
  pid  |  usename
 57174 | jbranchaud
 83420 | jbranchaud

Include datname in the requested columns to figure out the database of each connection.

> select pid, usename, datname from pg_stat_activity;
  pid  |  usename   |  datname
 57174 | jbranchaud | hr_hotels
 83420 | jbranchaud | pgbyex

The results can be restricted to a particular database as necessary.

> select pid, usename from pg_stat_activity where datname = 'hr_hotels';
  pid  |  usename
 57174 | jbranchaud

Query Size of Postgres Array

Today I learned how to query the size of a Postgres array. There are (at least) two methods that work.

The first is array_length. This requires you to know the array dimension you seek; a useful filter, although most of the time this will probably be 1:

hr-til_development=# select title from posts
hr-til_development=# where array_length(slack_notified_at_likes_threshold, 1) = 1;
 Because Javascript
 Percent Notation
 DIY Grids for Designing UI in Illustrator
(3 rows)

When the dimension is 1, a more terse solution is cardinality:

hr-til_development=# select title from posts
hr-til_development=# where cardinality(slack_notified_at_likes_threshold) = 1;
 Because Javascript
 Percent Notation
 DIY Grids for Designing UI in Illustrator
(3 rows)


Default to Empty Array in Postgres

Today I added an array of integers to a Postgres table, and like all such migrations, it will default to null for each new record.

This was a problem, because I wanted to use Rails 4’s built-in Postgres array support to make decisions based on that data. Ruby array methods like include? will raise NoMethodError: undefined method 'include?' for nil:NilClass if that array is ever nil, which it is by default.

This led me to learn how to set the default value to an empty array using Postgres array literal syntax. I was then able to include a not null constraint as an added benefit:

# db/migrate/20160211043316_add_slack_notified.rb

def up
  execute <<-SQL
    alter table posts
      add column slack_notified integer[] not null default '{}';

Sets With The Values Command

You can concisely create sets of values in PostgreSQL using the values command.

> values (1), (2), (3);

You can even create multiple columns of values.

> values (1, 'a', true), (2, 'b', false);
 column1 | column2 | column3
       1 | a       | t
       2 | b       | f

This is most often used with an insert command, but can be used on its own, as a subquery, within a CTE, etc.


Listing All Rows In A Table

In PostgreSQL, perhaps the more common way to list all rows in a table is with the following select command:

select * from bedding_types;

There is an alternative approach that also selects all rows from a table. It’s essentially a shorthand — the table command.

> table bedding_types;
 No Bed
 1 Full
 1 Double
 2 Double
 1 Twin
 2 Twins
 1 Queen
 2 Queen
 1 King
 2 Kings
 3 Kings
 Sofa Bed

h/t Jack Christensen

Be aware! Postgres rounds.

Yesterday, my pair and I created a test that calculated a value and compared that to the value of a calculation in the code we were testing. This worked out great except for one hitch, we were asserting about the derived value after it had been inserted into the database. What we didn’t count on is that Postgres rounds. Check this out:

create table money (amount numeric(4, 2));
insert into money (amount) values (10.342) returning amount;

insert into money (amount) values (10.347) returning amount;

Postgres rounds!

Postgres permissions to insert, but not return.

Postgres permissions can be harsh. If you don’t have the select permission on a table you might not be able to do some very rails like things that you think you ought to be able to do, take my friend Jimmy.

set role 'admin';
create role 'jimmy';
create table fruits (id serial primary key, name varchar);
grant insert on fruits to jimmy;
grant usage on fruits_id_seq to jimmy;

Given these permissions it’s possible for Jimmy to make inserts into the fruit table like so:

set role 'jimmy';
insert into fruits (name) values ('Apple');

But Rails wants a little more, it wants to know the id of the thing that was just created which is a problem because Jimmy doesn’t have select permissions.

set role 'jimmy';
insert into fruits (name) values ('Orange') returning id;
ERROR:  permission denied for relation fruits

Argh this complicates matters, but I’ll relent and give Jimmy the appropriate permissions so that he can add records through ActiveRecord.

set role 'admin';
grant select on fruits to jimmy;
set role 'jimmy';
insert into fruits (name) values ('orange') returning id;

Add value to Postgres enum type, dropping tho...

An enum type in Postgres is a type that can restrict a value to only certain values. It can be defined like this:

create type fruit as Enum ('orange', 'apple');

We can now cast a string to a fruit.

chriserin=# select 'orange'::fruit;

Well… some strings…

chriserin=# select 'rasberry'::fruit;
ERROR:  invalid input value for enum fruit: "rasberry"

It’s all good! We can add rasberry to the enum type.

chriserin=# alter type fruit add value 'rasberry';
chriserin=# select 'rasberry'::fruit;

Postgres allows you to add values to an enum type, but you can’t drop values which is a bit of an inconvenience when creating up/down db migrations like you do in Rails or with a tool like tern.

Set A Seed For The Random Number Generator

In PostgreSQL, the internal seed for the random number generator is a run-time configuration parameter. This seed parameter can be set to a particular seed in order to get some determinism from functions that utilize the random number generator. The seed needs to be something between 0 and 1.

We can see this in action by setting the seed and then invoking random() a couple times. Doing this twice, we will see the reproducibility we can achieve with a seed.

> set seed to 0.1234;

> select random();

> select random();
(1 row)

> set seed to 0.1234;

> select random();

> select random();

The seed can also be configured with the setseed() function.

See the PostgreSQL docs for more details.

Shuffle an array in postgres

Somtimes an array is just too sequential

select ARRAY[1,2,3,4,5];
chriserin=# select ARRAY[1,2,3,4,5];
(1 row)

I want to shuffle this array into an unpredictable mess of numbers, and one way to do this in PostgreSQL is to turn these array elements into rows, order them randomly and then squash the rows back into an array like so:

chriserin=# select array_agg(foo.x) from (select unnest(ARRAY[1,2,3,4,5]) x order by random()) foo;
(1 row)

The key here is the order by random() clause, which is a special postgres case. order by 0.284325 which is a number that random() could return, is not valid sql. There are dragons in order by, read here for more info

Pretty Print Data Sizes

Use the pg_size_pretty() function to pretty print the sizes of data in PostgreSQL. Given a bigint, it will determine the most human-readable format with which to print the value:

> select pg_size_pretty(1234::bigint);
 1234 bytes

> select pg_size_pretty(123456::bigint);
 121 kB

> select pg_size_pretty(1234567899::bigint);
 1177 MB

> select pg_size_pretty(12345678999::bigint);
 11 GB

This function is particularly useful when used with the pg_database_size() and pg_relation_size() functions.

> select pg_size_pretty(pg_database_size('hr_hotels'));
 12 MB

Get The Size Of A Table

In Get The Size Of A Database, I showed a PostgreSQL administrative function, pg_database_size(), that gets the size of a given database. With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can executing the following query:

> select pg_relation_size('reservations');

This gives us the size of the reservations table in bytes. As you might expect, the referenced table needs to be part of the connected database and on the search path.

See the Postgres docs for more details.

array indexes in postgres

Lets say I have an array of fruit types (Apple, Pear, Banana, Tangerine) and I want to turn each element in the array into a row I can do this.

select unnest(fruits) 
from (select Array['Apple', 'Pear', 'Tangerine', 'Banana'] fruits) as list;

which returns:

(4 rows)

But it would be nice to have an array index as column in case the order of the elements in this array had some meaning. Postgres has a function generate_subscripts to help you with array index elements.

select generate_subscripts(fruits, 1), unnest(fruits) from (select Array['Apple', 'Pear', 'Tangerine', 'Banana'] fruits) as list;

which returns:

 generate_subscripts |  unnest
                   1 | Apple
                   2 | Pear
                   3 | Tangerine
                   4 | Banana

Throw a minus 1 in there if you want to be base zero :).

Day Of Week By Name For A Date

In Day Of Week For A Date, I explained how to determine what day of the week a date is as an integer with PostgreSQL. This used the date_part() function. By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day today is, try a statement like the following:

> select to_char(now(), 'Day');

The Day part of the second argument is just one of many template patterns that can be used for formatting dates and times.

See Data Type Formatting Functions in the Postgres docs for more details.

pg Prefix Is Reserved For System Schemas

Have you ever tried to create a schema with pg_ as the first part of the name of the schema? If so, you probably didn’t get very far. Postgres won’t let you do that. It reserves the pg_ prefix for system schemas. If you try to create a schema in this way, you’ll get an unacceptable schema name error.

> create schema pg_cannot_do_this;
ERROR:  unacceptable schema name "pg_cannot_do_this"
DETAIL:  The prefix "pg_" is reserved for system schemas.

Compute Hashes With pgcrypto

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. Any of these hashing functions can be applied to an arbitrary string using the digest function. Here are example of the md5 and sha1 algorithms:

> create extension pgcrypto;

> select digest('Hello, World!', 'md5');

> select digest('Hello, World!', 'sha1');

See the pgcrypto docs for more details.

Custom order #Postgres #SQL union queries

Consider the following data:

insert into users (email, first, last) values ('', 'New', 'User');
insert into users (email, first, last) values ('', 'Lemmy', 'Kilmister');
insert into users (email, first, last) values ('', 'Lemmy', 'Kilmister');

Assuming you want to match a user to the following data:

{ email: '', first: 'New', last: 'User' }

And you consider email a better indication of a match, and first and last name are a fallback. The following query will not yield the desired results despite the `where` conditions being ordered per the requirements:

select * from users where email = ‘’ or (first = ‘New’ and last = ‘User’) limit 1;

This will result in:

1 | ‘’ | ‘New’ | ‘User’

To find by email first we will need to union two queries, rank the match then order by rank:

(select 1 as rank, id, email, first, last from users where email = ‘’) union (select 2 as rank, id, email, first, last from users where first = ‘New’ and last = ‘User’) order by rank limit 1;

This will yield the correct result:

3 | ‘’ | ‘Lemmy’ | ‘Kilmister’

Creating Conditional Constraints in PostgreSQL

There are times when it doesn’t make sense for a constraint to apply to all records in a table. For instance, if we have a table of pokemon, we may only want to apply a unique index constraint to the names of non-wild pokemon. This can be achieved in PostgreSQL with the following conditional constraint:

create unique index pokemons_names on pokemons (names)
where wild = false;

If we try to insert a non-wild pokemon with a duplicate name, we will get an error. Likewise, if we try to update a pokemon with a duplicate name from wild to non-wild, we will get an error.


Sleeping In PostgreSQL

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just want your queries to look very computationally expensive, PostgreSQL offers the pg_sleep function.

To sleep for 5 seconds, try the following:

> select now(); select pg_sleep(5); select now();
 2016-01-08 16:30:21.251081-06
(1 row)

Time: 0.274 ms

(1 row)

Time: 5001.459 ms
 2016-01-08 16:30:26.252953-06
(1 row)

Time: 0.260 ms

As you’ll notice, the pg_sleep statement took about 5 seconds.


Is It Null Or Not Null?

In PostgreSQL, the standard way to check if something is NULL is like so:

select * as wild_pokemons from pokemons where trainer_id is null;

To check if something is not null, you just add not:

select * as captured_pokemons from pokemons where trainer_id is not null;

PostgreSQL also comes with ISNULL and NOTNULL which are non-standard ways of doing the same as above:

select * as wild_pokemons from pokemons where trainer_id isnull;
select * as captured_pokemons from pokemons where trainer_id notnull;

Restarting Sequences When Truncating Tables

PostgreSQL’s truncate feature is a handy way to clear out all the data from a table. If you use truncate on a table that has a serial primary key, you may notice that subsequent insertions keep counting up from where you left off. This is because the sequence the table is using hasn’t been restarted. Sure, you can restart it manually or you can tell truncate to do it for you. By appending restart identity to the end of a truncate statement, Postgres will make sure to restart any associated sequences at 1.

truncate pokemons, trainers, pokemons_trainers restart identity;

Postgres, varchar, and silent white space trimming

Given this table:

# create table initials (monogram varchar(2) primary key);

Insert a three character monogram.

# insert into initials(monogram) values ('BPD');
ERROR:  value too long for type character varying(2)

Ok, cool. Thanks Postgres. But what if my string ends in a space?

# insert into initials(monogram) values ('BP ');

What? This caused me great pain when trying to do an insert. Here in my example I’ve used a CTE for brevity, but imagine you can’t see the input table, and it contains thousands of records. My insert went something like this:

with input as (
        select 'BD '::text as data
insert into initials(monogram)
select data from input where not exists (select 1 from initials where monogram =
ERROR:  duplicate key value violates unique constraint "initials_pkey"
DETAIL:  Key (monogram)=(BD) already exists.

A spot check based on the error message turns up no results:

select count(*) from input where data = 'BD';
(1 row)

But of course 'BD ' = 'BD', if you keep in mind that Postgres will silently truncate trailing whitespace when casting to a varchar.

Postgres Arrays are One-Based

Today I was reminded that Postgres arrays are one-based. This realization came while using Ruby’s times method (zero-based index) to generate SQL targeting a Postgres array (one-based index).

Take this table:

 name  |      pay_by_quarter
 Bill  | {10000,10000,10000,10000}
 Carol | {20000,25000,25000,25000}
(2 rows)

To select the first-quarter pay for all employees, use index 1.

my_database=# select pay_by_quarter[1] from sal_emp;
(2 rows)

Using your first Ruby index value 0 to produce pay_by_quarter[0] returns a sad, empty column.

PostgreSQL's Max Identifier Length Is 63 Bytes

In PostgreSQL, identifiers — table names, column names, constraint names, etc. — are limited to a maximum length of 63 bytes. Identifiers longer than 63 characters can be used, but they will be truncated to the allowed length of 63.

> alter table articles
    add constraint this_constraint_is_going_to_be_longer_than_sixty_three_characters_id_idx
    check (char_length(title) > 0);
NOTICE:  identifier "this_constraint_is_going_to_be_longer_than_sixty_three_characters_id_idx" will be truncated to "this_constraint_is_going_to_be_longer_than_sixty_three_characte"

Postgres warns us of identifiers longer than 63 characters, informing us of what they will be truncated to. It then proceeds to create the identifier.

If postgres is trying to generate an identifier for us - say, for a foreign key constraint - and that identifier is longer than 63 characters, postgres will truncate the identifier somewhere in the middle so as to maintain the convention of terminating with, for example, _fkey.

The 63 byte limit is not arbitrary. It comes from NAMEDATALEN - 1. By default NAMEDATALEN is 64. If need be, this value can be modified in the Postgres source. Yay, open-source database implementations.

See the postgres docs for more details.

Adding Composite Uniqueness Constraints (Postgres)

There are two ways in Postgres to create a composite uniqueness constraint; that is, a constraint that ensures that the combination of two or more values on a table only appear once. For the following two code snippets, assume that we have a table relating Pokemon and Trainers and that our domain restricts each Trainer to only having at most one of each Pokemon.

The first approach is to create a constraint directly on the table:

alter table pokemons_trainers
  add constraint pokemons_trainers_pokemon_id_trainer_id_key
  unique (pokemon_id, trainer_id);

The second approach is to create a unique index:

create unique index pokemons_trainers_pokemon_id_trainer_id_idx
  on pokemons_trainers (pokemon_id, trainer_id);

Export Query Results To A CSV

Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to check out that data. Exporting the results of a query to CSV is a friendly way to share said results because most people will have a program on their computer that can read a CSV file.

For example, exporting all your pokemon to /tmp/pokemon_dump.csv can be accomplished with:

copy (select * from pokemons) to '/tmp/pokemon_dump.csv' csv;

Because we are grabbing the entire table, we can just specify the table name instead of using a subquery:

copy pokemons to '/tmp/pokemon_dump.csv' csv;

Include the column names as headers to the CSV file with the header keyword:

copy (select * from pokemons) to '/tmp/pokemon_dump.csv' csv header;


Know your HISTFILE

Today I learned that psql has a cool feature called the HISTFILE. It’s a record of your commands in the REPL, named ~/.psql_history by default.

Building off Josh’s post yesterday about ~/.psqlrc settings, add this there to get a unique log for each database:

\set HISTFILE ~/.psql_history-:DBNAME

Here’s the tail of my TIL log:

$ tail ~./psql_history-hr-til_development
select * from developers limit 1;
select * from developers limit 2;
select * from developers limit 3;

This is just a text file; it can be edited. You could add a complex command, or meaningful reminder to yourself, and it will be available in psql when you traverse your history (↑ or CTRL-P).

Use A psqlrc File For Common Settings

There are a handful of settings that I inevitably turn on or configure each time I open up a psql session. I can save myself a little time and sanity by configuring these things in a .psqlrc dotfile that is located in my home directory. This will ensure my psql session is configured just how I like it each time I launch it. Here is what my ~/.psqlrc file currently looks like:

\x auto
\pset null 'Ø'

Create A Composite Primary Key

The unique identifier for a given row in a table is the primary key. Generally, a row can be uniquely identified by a single data point (such as an id), so the primary key is simply that single data point. In some cases, your data can be more appropriately uniquely identified by multiple values. This is where composite primary keys can lend a hand. Consider an example plane_tickets table where each ticket can be uniquely identified by the passenger and flight it is associated with:

create table plane_tickets (
  passenger_id integer references passengers not null,
  flight_id integer references flights not null,
  confirmation_number varchar(6) not null,
  seat_assignment varchar not null,
  primary key (passenger_id, flight_id)

Truncate Tables With Dependents

In Truncate All Rows, I talked about how postgres’s truncate can be used to quickly delete all rows in a table. In practice this alone won’t be very useful though, because tables usually have other tables that depend on them via foreign keys. If you have tables A and B where B has a foreign key referencing A, then trying to truncate A will result in something like this:

> truncate A;
ERROR:  cannot truncate a table referenced in a foreign key constraint

Fortunately, truncate has some tricks up its sleeve.

If you know two tables are tied together via a foreign key constraint, you can just truncate both of them at once:

> truncate A, B;

If many tables are tied together in this way and you are looking to throw all of it out, then a simpler approach is to cascade the truncation:

> truncate A cascade;
NOTICE:  truncate cascades to table "B"

Use these with care and potentially within transactions because your data will go bye bye.

h/t Dillon Hafer and Jack Christensen

Truncate All Rows

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

> delete from pokemons;

Though DELETE can do the job, if you really are deleting all rows to clear out a table, you are better off using TRUNCATE. A TRUNCATE query will be faster than a DELETE query because it will just delete the rows without scanning them as it goes.

> truncate pokemons;


Insert Just The Defaults

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situation, you can break away from the standard:

> insert into table_name (column1, column2) values (value1, value2);

Instead, simply tell Postgres that you want it to use the default values:

> insert into table_name default values;

Generate A UUID In PostgreSQL

Postgres has support for universally unique identifiers (UUIDs) as a column data type via uuid. If you have a UUID column, you may need to generate a UUID. This requires the uuid-ossp module. This module provides a number of functions for generating UUIDs including the uuid_generate_v4() function which bases the UUID entirely off random numbers.

> create extension "uuid-ossp";
> select uuid_generate_v4();

See the Postgres docs for more details on UUID generation functions.

Getting A Slice Of An Array In PostgreSQL

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slice separated by a colon.

> select (array[4,5,6,7,8,9])[2:4];

Notice that the bounds are inclusive, the array index is 1-based, and the array declaration above needs to be wrapped in parentheses in order to not trip up the array slice syntax.

You can also select rectangular slices from two dimensional arrays like so:

> select (array[[1,2,3],[4,5,6],[7,8,9]])[2:3][1:2];

Defining Arrays In PostgreSQL

In postgres, an array can be defined using the array syntax like so:

> select array['a','b','c'];

If you are inserting into an existing array column, you can use the array literal syntax.

> create temp table favorite_numbers(numbers integer[]);
> insert into favorite_numbers values( '{7,3,9}' );
> select numbers[2] from favorite_numbers;

Postgres also supports two-dimensional arrays.

select array[[1,2,3],[4,5,6],[7,8,9]] telephone;

Postgres Unlogged

Using a Postgres table for caching? You might want to try making it unlogged.

unlogged tables are not written to the write-ahead log, which makes them much faster. This also means they are not crash-safe, and are truncated when a crash or unclean shutdown occurs. For caching purposes, that’s likely to be an acceptable tradeoff.