Today I Learned

A Hashrocket project

148 posts about #sql

Show Create Statement For A Table In MySQL

In MySQL, you can get a quick rundown of a table using describe users. An alternative to this approach is to have MySQL show the create statement for a table.

> show create table users\G
*************************** 1. row ***************************
       Table: users
Create Table: CREATE TABLE `users` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(80) NOT NULL,
  `last_name` varchar(80) NOT NULL,
  `email` varchar(80) NOT NULL,
  `middle_initial` varchar(80) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This includes some additional information like primary key and index information. It is also a great way to study the SQL that it takes to create all the facets of a table.

See the show create table docs for more details.

h/t Jake Worth

Escaping String Literals With Dollar Quoting

String literals in PostgreSQL are defined by surrounding the content with the ' character. For string literals that contain the ' character, you may have seen it escaped with a preceding '.

> select 'Isn''t this nice?';
     ?column?
------------------
 Isn't this nice?

This is easy enough to do, but can be error-prone and doesn’t work well if SQL is being programmatically generated. A great workaround is to escape string literals using what is called dollar quoting.

> select $$Isn't this even nicer?$$;
        ?column?
------------------------
 Isn't this even nicer?

Just wrap both ends in $$ instead of '.
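Dollar quoting also accepts a tag between the dollar signs, which comes in handy when the content itself contains $$ (a standard PostgreSQL feature, though not shown above):

```sql
-- The tag (here, "msg") can be anything; the literal ends at the
-- matching $msg$, so a bare $$ inside the string is fine.
select $msg$Isn't $$this$$ nice?$msg$;
```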

source

Capitalize All The Words In PostgreSQL

PostgreSQL provides the string function initcap() as a way of capitalizing all words. In the process, it cleans up the casing of the remaining parts of the words.

Here are some examples of how it works.

> select initcap('hello, world');
   initcap
--------------
 Hello, World

> select initcap('HELLO, WORLD');
   initcap
--------------
 Hello, World

See the String Functions and Operators docs for more details.

Difference between `union` and `union all` in PostgreSQL

union is an operator that operates on two sets of results, combining them:

chris=# select 1 union select 2;
 ?column?
----------
        1
        2
(2 rows)

It filters duplicate results though:

chris=# select 1 union select 1;
 ?column?
----------
        1
(1 row)

If you want to not worry about duplicate results and just stack the result sets on top of each other, use union all:

chris=# select 1 union all select 1;
 ?column?
----------
        1
        1
(2 rows)

Postgres Natural joins squash unneeded columns

Dealing mostly with Rails-style databases, we generally write inner joins on ids and foreign keys, but there are other ways to create joins, the most fun being the natural join.

The below query works like a natural join but uses inner join syntax: when two tables have columns with the same name, join on that column. You can see that both columns have been returned even though they have the same value.

chris=# select * from (select 1 x) foo inner join (select 1 x) bar on bar.x = foo.x ;
 x | x
---+---
 1 | 1
(1 row)

When using the natural join syntax, the query is less verbose and only returns one column. Postgres knows that they have the same value, that’s the whole point! So it presents only one column.

chris=# select * from (select 1 x) foo natural join (select 1 x) bar;
 x
---
 1
(1 row)

Postgres `coalesce` errors with multiple types

The Postgres function coalesce takes a variable number of arguments and returns the first non-null argument.

psql> select coalesce(null, null, null, null, 'hi!');
hi!

But if you include a number and a string, then Postgres throws an error.

psql> select coalesce(1, 'hi!');
ERROR:  invalid input syntax for integer: "hi!"

Coalesce will try to coerce each value to the type of the first argument, if possible.

psql> select coalesce(1, '2');
1
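If you really do need to mix types, an explicit cast to a common type sidesteps the error; a small sketch:

```sql
-- Cast the integer so every argument is text:
select coalesce(1::text, 'hi!');      -- returns '1'
select coalesce(null::text, 'hi!');   -- returns 'hi!'
```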

MySQL behaves a bit differently; it allows values of different types.

mysql> select coalesce(1, 'hi!');
1

The Postgres way makes a bit more sense to me. What is the type of a column if there are multiple possible types? Is it always a string? But I wanted a number? I feel like the additional type safety will help catch errors a bit earlier.

Operators to get a string or more json in Postgres

There are two operators in Postgres that I have trouble telling apart. The -> operator and the ->> operator. They both take a json object on the left and a key on the right, but result in different types.

Given json in postgres that looks like this:

> select '{"a": {"b": 2}}'::jsonb;
{"a": {"b": 2 }}

I can traverse the data structure with the -> operator because it returns a data structure.

> select '{"a": {"b": 2}}'::jsonb  -> 'a' -> 'b';
2

I can get a value as a string from the json with the ->> operator.

> select '{"a": {"b": 2}}'::jsonb ->> 'a';
{"b": 2} -- IT'S A STRING

But if you use the ->> operator to get a string, then you can’t traverse the data structure with the -> operator anymore.

> select '{"a": {"b": 2}}'::jsonb ->> 'a' -> 'b';
ERROR:  operator does not exist: text -> unknown
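The fix is to traverse with -> and save ->> for the final step, so the text extraction happens last:

```sql
-- -> keeps returning jsonb; ->> extracts the leaf as text
select '{"a": {"b": 2}}'::jsonb -> 'a' ->> 'b';  -- returns the text '2'
```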

Show Tables That Match A Pattern In MySQL

An unfamiliar MySQL database with tons of tables can be a difficult thing to navigate. You may have an idea of the kind of table you are looking for based on a domain concept you’ve seen elsewhere.

You can pare down the results returned by show tables by including a like clause with a pattern. For example, this statement will show me only tables that have the word user in them:

> show tables like '%user%';
+-------------------------------+
| Tables_in_jbranchaud (%user%) |
+-------------------------------+
| admin_users                   |
| users                         |
+-------------------------------+

Dump A MySQL Database To A File

The mysqldump client is a handy tool for creating a backup or snapshot of a MySQL database. The standard use of this command produces an alphabetical series of statements that comprise the structure and data of the specified database. It directs all of this to stdout. You’ll likely want to redirect it to a file.

$ mysqldump my_database > my_database_backup.sql

The output will include special comments with MySQL directives that disable things like constraint checking. This is what allows the output to be in alphabetical order without necessarily violating any foreign key constraints.

If you need to dump multiple databases, include the --databases flag with a space-separated list of database names. Or dump all of them with --all-databases.

See man mysqldump for more details.

Display Output In A Vertical Format In MySQL

Output for tables with lots of columns can be hard to read and sometimes overflow the terminal window. Consider the output from Show Indexes For A Table:

> show indexes in users;
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users |          0 | PRIMARY      |            1 | id          | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| users |          0 | unique_email |            1 | email       | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

We can vertically orient the output of a statement by terminating it with \G instead of ; (or \g).

> show indexes in users\G
*************************** 1. row ***************************
        Table: users
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 2. row ***************************
        Table: users
   Non_unique: 0
     Key_name: unique_email
 Seq_in_index: 1
  Column_name: email
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:

Show Indexes For A Table In MySQL

When describing a table, such as users:

> describe users;
+------------+-----------------------+------+-----+---------+----------------+
| Field      | Type                  | Null | Key | Default | Extra          |
+------------+-----------------------+------+-----+---------+----------------+
| id         | mediumint(8) unsigned | NO   | PRI | NULL    | auto_increment |
| first_name | varchar(80)           | NO   |     | NULL    |                |
| last_name  | varchar(80)           | NO   |     | NULL    |                |
| email      | varchar(80)           | NO   | UNI | NULL    |                |
+------------+-----------------------+------+-----+---------+----------------+

We can see in the Key column that there’s a primary key and a unique key for this table on id and email, respectively.

These keys are indexes. To get more details about each of the indexes on this table, we can use the show indexes command.

> show indexes in users;
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| users |          0 | PRIMARY      |            1 | id          | A         |           0 |     NULL | NULL   |      | BTREE      |
| users |          0 | unique_email |            1 | email       | A         |           0 |     NULL | NULL   |      | BTREE      |
+-------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+

List Databases And Tables In MySQL

If you’ve started a mysql session, but haven’t connected to a particular database yet, you can list the available databases like so:

> show databases;
+-----------------------------+
| Database                    |
+-----------------------------+
| information_schema          |
| my_app_dev                  |
+-----------------------------+

If you are curious about the tables in a particular database, you can list them by specifying the database’s name:

> show tables in my_app_dev;
+------------------------------+
| Tables_in_my_app_dev         |
+------------------------------+
| pokemons                     |
| trainers                     |
+------------------------------+

Alternatively, you can connect to the database of interest and then there is no need to specify the name of the database going forward.

> use my_app_dev;
> show tables;
+------------------------------+
| Tables_in_my_app_dev         |
+------------------------------+
| pokemons                     |
| trainers                     |
+------------------------------+

`any` and `all` special Postgres constructs

To determine if a value exists in a subquery or scalar value list, you can use the in operator like so:

> select 1 in (1, 2)
true

any and all go a step further, allowing an operator in the special construct that is applied against each value in the subquery or array (there is no scalar value list form for these constructs).

any examples:

> select 1 = any(Array[1, 2, 3])
true
> select 1 > any(Array[1, 2, 3])
false

all examples:

> select 17 <> all(Array[2, 4, 8])
true
> select 17 < all(Array[2, 4, 8])
false

some is a synonym for any.
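So this query behaves identically to the first any example above:

```sql
-- some is interchangeable with any
select 1 = some(Array[1, 2, 3]);  -- true, same as using any
```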

Set foreign key to null on delete in #Postgres

Say you have a table with a foreign key:

posts
------
id serial primary key
...
primary_image_id references images (id)

images
------
id serial primary key
post_id references posts (id)

If you attempt to delete an image from the images table while that image’s ID is referenced in one of the posts, you will receive an error preventing you from deleting the image. Postgres is trying to protect your data integrity by preventing foreign keys that point to records which don’t exist.

To prevent errors and allow deleting records from the images table freely you must define the on delete strategy on the posts table.

One of the options available to you is set null so if you are creating a new table it will look like this:

create table posts (
  id serial primary key,
  -- ...
  primary_image_id int references images (id) on delete set null
);

Now if the primary image is deleted it will set the primary_image_id to null.

This is an alternative to on delete cascade, which in this case would delete the post from the posts table and is not what we want.

Read the full documentation under ‘5.3.5. Foreign Keys’

Compute group proportions of a whole in SQL

It’s easy with SQL to use aggregates to count the members of a group. But what if you also want the proportion that group is of the whole? The answer is to use a window function combined with the aggregate. The window function is evaluated after the normal aggregation, i.e. it can reference the values of other post-aggregation rows.

select type, count(*), count(*)::numeric / sum(count(*)) over ()
from pets
group by type;
  type  | count |        ?column?        
--------+-------+------------------------
 dog    |    13 | 0.52000000000000000000
 rabbit |     3 | 0.12000000000000000000
 cat    |     9 | 0.36000000000000000000

Pretty Printing JSONB Rows in PostgreSQL

Who needs a document store when you can just use PostgreSQL’s JSONB data type? Viewing rows of JSONB output can be challenging though because it defaults to printing them as a single line of text.

> select '{"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}'::jsonb;
                                      jsonb
----------------------------------------------------------------------------------
 {"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}
(1 row)

Fortunately, Postgres comes with a function for prettying up the format of the output of these rows — jsonb_pretty

> select jsonb_pretty('{"what": "is this", "nested": {"items 1": "are the best", "items 2": [1, 2, 3]}}'::jsonb);
            jsonb_pretty
------------------------------------
 {                                 +
     "what": "is this",            +
     "nested": {                   +
         "items 1": "are the best",+
         "items 2": [              +
             1,                    +
             2,                    +
             3                     +
         ]                         +
     }                             +
 }
(1 row)

h/t Jack Christensen

Write A Query Result To File With Postgres

Generally when writing a query in psql a statement will be terminated with a semicolon. An alternative approach is to end it with a \g instead. This will also send the query to the Postgres server for execution.

select 1 \g

If a filename is included after the \g, then the result of the query will be written to that file instead of output to the psql session.

> select 1, 2, 3 \g query_result.txt

If we cat that file, we can see the query result.

Time: 4.293 ms
> \! cat query_result.txt
 ?column? | ?column? | ?column?
----------+----------+----------
        1 |        2 |        3
(1 row)

See man psql for more details.

Age in Postgres

On this day every year I get curious about how old I actually am. Postgres has some utilities to help me out with that, like subtraction of dates.

> select now() - '1978-3-24';
14245 days 17:39:55.623866

Ah yes, 14245 days on this earth, it really makes you think. What got returned, anyway? That looks weird:

> select pg_typeof(now() - '1978-3-24');
interval

Got it, that’s an interval. It doesn’t really tell me much. Is this interval infinite?

> select isfinite(now() - '1978-3-24');
t

I can squint and read isfinite as infinite and suddenly my outlook changes, but only momentarily.

Postgres, just tell me how many years I’ve been alive ok? None of this days stuff.

> select age('1978-3-24'::date);
39 years

Create jsonb data structure from columns

You can create a jsonb object by casting a string version of the object to jsonb:

> select '{"a": 1, "b": 2}'::jsonb;
      jsonb
------------------
 {"a": 1, "b": 2}

When using values from columns to create a jsonb object, it’s not practical to construct a string. In this case you want to use jsonb_build_object:

> select jsonb_build_object('a', a.x) from (select x from generate_series(1, 10) as x) as a;
 jsonb_build_object
--------------------
 {"a": 1}
 {"a": 2}
 {"a": 3}
 {"a": 4}
 {"a": 5}
 {"a": 6}
 {"a": 7}
 {"a": 8}
 {"a": 9}
 {"a": 10}

jsonb_build_object is a variadic function and takes a variable number of arguments. So you can add additional values like this:

jsonb_build_object('a', 1, 'b', 2)

Prepare, Execute, and Deallocate Statements

In PostgreSQL, you can prepare a named statement to be executed later using prepare.

> prepare column_names (text) as
    select column_name from information_schema.columns where table_name = $1;
PREPARE

These statements are kept around for the duration of the session. To see the available statements, check out the pg_prepared_statements view.

> select * from pg_prepared_statements;
     name     |                                  statement                                  |         prepare_time          | parameter_types | from_sql
--------------+-----------------------------------------------------------------------------+-------------------------------+-----------------+----------
 column_names | prepare column_names (text) as                                             +| 2017-03-10 15:01:09.154528-06 | {text}          | t
              |   select column_name from information_schema.columns where table_name = $1; |                               |                 |

To run a prepared statement, use execute with the name of the statement and any arguments.

> execute column_names('users');
   column_name
-----------------
 id
 email
 password_digest
 created_at
 updated_at
 first_name
 last_name

You can also delete a statement with deallocate if you’d like.

> deallocate column_names;
DEALLOCATE

New PostgreSQL 9.6 slice syntax

PostgreSQL arrays support slice operations.

jack=# select ('{a,b,c,d,e,f}'::text[])[2:4];
  text
---------
 {b,c,d}

New in PostgreSQL 9.6 is slice syntax with implicit lower and upper bounds.

jack=# select ('{a,b,c,d,e,f}'::text[])[:4];
   text
-----------
 {a,b,c,d}
jack=# select ('{a,b,c,d,e,f}'::text[])[3:];
   text
-----------
 {c,d,e,f}

Previously, array_lower and array_upper were needed to slice from a particular index to the bounds of an array.
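For comparison, here is a sketch of the pre-9.6 equivalent of [3:], spelling out the upper bound with array_upper:

```sql
-- array_upper(arr, 1) returns the upper bound of the first dimension
select ('{a,b,c,d,e,f}'::text[])[3:array_upper('{a,b,c,d,e,f}'::text[], 1)];
-- {c,d,e,f}
```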

Between Symmetric in PostgreSQL

PostgreSQL’s between construct allows you to make a comparison between two values (numbers, timestamps, etc.).

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between 3 and 6;
 a
---
 3
 4
 5
 6

If you supply an empty range by using the larger of the two values first, an empty set will result.

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between 6 and 3;
 a
---

Tacking symmetric onto the between construct is one way to avoid this issue.

> select * from generate_series(1,10) as numbers(a)
    where numbers.a between symmetric 6 and 3;
 a
---
 3
 4
 5
 6

BETWEEN SYMMETRIC is the same as BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.

group by 1, 2, 3 🔢

Postgres allows group by and order by to reference columns by their position in the select statement. This is particularly useful when a complex expression would otherwise have to be repeated in the group by or order by clause.

-- group by aggregate
select
  (created_at at time zone 'UTC' at time zone 'America/Chicago')::date,
  count(*)
from posts
group by (created_at at time zone 'UTC' at time zone 'America/Chicago')::date
order by (created_at at time zone 'UTC' at time zone 'America/Chicago')::date
;

becomes

-- group by 1, 2, 3
select
  (created_at at time zone 'UTC' at time zone 'America/Chicago')::date,
  count(*)
from posts
group by 1
order by 1
;

Generating UUIDs With pgcrypto

If you check out the docs for the uuid-ossp extension, you’ll come across the following message.

The OSSP UUID library… is not well maintained, and is becoming increasingly difficult to port to newer platforms.

A little bit later, it says:

If you only need randomly-generated (version 4) UUIDs, consider using the gen_random_uuid() function from the pgcrypto module instead.

So, if we are using the UUID data type and only need to generate random UUIDs, we can rely on the pgcrypto extension. It comes with the gen_random_uuid() function which generates random v4 UUIDs.

> create extension "pgcrypto";
CREATE EXTENSION

> select gen_random_uuid();
           gen_random_uuid
--------------------------------------
 0a557c31-0632-4d3e-a349-e0adefb66a69

> select gen_random_uuid();
           gen_random_uuid
--------------------------------------
 83cdd678-8198-4d56-935d-d052f2e9db37

Postgres Intersect

Today I got to see some great SQL debugging that included the use of intersect to compare two datasets.

intersect computes the set intersection of the rows returned by the involved select statements.

Here’s a simple implementation:

dev=# select generate_series(1,1000) intersect 
  select generate_series(10,15);
 generate_series
-----------------
              11
              12
              10
              15
              13
              14

Like a lot of things in Postgres, the order seems deterministic, but not predictable.

Docs

PostgreSQL "RETURNING"

Today I learned that PostgreSQL INSERT/UPDATE/DELETE has a RETURNING clause that returns the computed values.

We can use it to get values that are generated by database such as sequences or defaults.

Check this out:

DROP TABLE IF EXISTS users;

CREATE TABLE users (
  id    SERIAL PRIMARY KEY,
  email VARCHAR NOT NULL UNIQUE
);

INSERT INTO users (email) VALUES ('user1@example.com');
-- INSERT 0 1

INSERT INTO users (email) VALUES ('user2@example.com') RETURNING *;
--  id |       email
-- ----+-------------------
--   2 | user2@example.com
-- (1 row)

h/t @joshuadavey

Aggregate Expressions with FILTER in Postgresql

Today I learned to use FILTER on aggregate expressions such as COUNT in postgresql.

See this example:

SELECT u.login,
       COUNT(1) FILTER (WHERE r.language ilike 'ruby') as ruby,
       COUNT(1) FILTER (WHERE r.language ilike 'javascript') as js,
       COUNT(1) FILTER (WHERE r.language ilike 'elixir') as elixir,
       COUNT(1) as all_langs
FROM users u
LEFT JOIN repositories r ON (u.id = r.user_id)
GROUP BY u.id;

-- login | ruby | js | elixir | all_langs
---------+------+----+--------+------------
-- bill  |    5 |  2 |      3 |         15
-- karen |    2 |  7 |      4 |         19
-- bob   |    9 |  1 |      2 |         23

h/t @joshuadavey

Migrate from MySQL to Postgres

Today I wanted to see if a large client app would see any performance improvement using Postgres vs. MySQL.

Luckily there is a great tool called pgloader that can quickly scan and import a MySQL DB right into Postgres.

Homebrew FTW!

$ brew install pgloader

Create an import script to define the connection strings to each of your databases and configure the import options you want.

-- import_mysql.load
LOAD DATABASE
FROM mysql://root@localhost/db_name
INTO postgresql://localhost/db_name

 WITH include drop, create tables, no truncate,
      create indexes, reset sequences, foreign keys;

Run the import:

$ pgloader import_mysql.load

If you get this error:

An unhandled error condition has been signalled:
   MySQL Error [1055]: "Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'information_schema.rc.UPDATE_RULE' which is not functionally dependent on columns in GROUP
 BY clause; this is incompatible with sql_mode=only_full_group_by"

Then create a MySQL configuration file (if you don’t have one already) and open it:

$ sudo cp $(brew --prefix mysql)/support-files/my-default.cnf /etc/my.cnf
$ sudo vim /etc/my.cnf

Make sure ONLY_FULL_GROUP_BY is not in the mode list:

# /etc/my.cnf
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

Now restart MySQL and try to import with pgloader again.

Two swings at Gaussian distribution in Postgres

I’ve come across 2 ways to replicate a bell curve.

The first method adds three random numbers together.

select 
floor(random()*3)+floor(random()*3)+floor(random()*3) as y, 
count(*) 
from generate_series(1,10000000) group by y order by y;
 y |  count
---+---------
 0 |  369905
 1 | 1111997
 2 | 2222763
 3 | 2593306
 4 | 2220452
 5 | 1111506
 6 |  370071
(7 rows)

Time: 18806.885 ms

Consistent distribution, but 18 seconds for 10 million rows!

The second method has a tail end of the curve that keeps going even as it tapers to 0. It uses the normal_rand function, provided by the tablefunc extension, with arguments of quantity, mean, and standard deviation.

select  x, count(*) 
from 
(select abs(ceil(normal_rand(10000000,2,1))) as x) as y 
group by x order by count;
 x |  count
---+---------
 0 |  214152
 1 | 1373200
 2 | 3414028
 3 | 3411424
 4 | 1359447
 5 |  214194
 6 |   13234
 7 |     319
 8 |       2
(9 rows)

Time: 3221.894 ms

I want the peak of the curve around 2 and 3, but for this method of Gaussian distribution I get results below 0 unless I use abs. And it’s fast, 3 seconds!

Ever so slightly faster intervals in postgres

When working with dates in postgres I often construct an interval like so:

> select (2 || ' days')::interval;
2 days

Which is fine! But there’s an alternative that’s ever so slightly faster.

> select 2 * '1 day'::interval;
2 days

The time difference can be seen over large quantities of data, like 10 million.

> select (x || ' days')::interval from generate_series(1, 10000000) as x;
...
Time: 10437.314 ms
 > select x * '1 days'::interval from generate_series(1, 10000000) as x;
...
Time: 7831.568 ms

Microbenchmarking in action! Remember, take all microbenchmarks with a grain of salt.

Postgres age function #postgresql

If you want to select records according to a specific interval, like Rails’ ActiveSupport 1.year.ago, PostgreSQL has you covered.

The age function returns an interval type and can be used in queries like so:

select * from sometbl where age(created_at) > '1 year';

By default the age function will use the current system time to calculate the age. If you want to calculate the age relative to a different time you can simply pass in another argument:

psql> select age(timestamp '2016-08-28', timestamp '1957-06-13');
           age
-------------------------
 59 years 2 mons 15 days
(1 row)

You can also use the make_interval function to create intervals using numeric parameters:

psql> select make_interval(1,2,3,4,5);
         make_interval
--------------------------------
 1 year 2 mons 25 days 05:00:00
(1 row)

In a query it can be used like so:

select * from sometbl where age(created_at) > make_interval(1);

to select rows with created_at older than one year from now.

To read more about the age function and other cool date/time manipulations, see the official documentation.

h/t Jack Christensen

Psql Command with Tuples Only

With psql, we can run any string command once and exit using the -c (command) flag. Here’s an example:

$ psql hr-til_development -c 'select title, slug from posts limit 5';
                  title                   |    slug
------------------------------------------+------------
 Start rails server in production mode    | 97e26f5f68
 Enable ScreenSharing from a terminal     | 55dfe1a1a3
 Variable Hoisting in JavaScript          | eeedb8dda0
 Three ways to compile Elixir             | f2a423d1e1
 Indent and de-indent line in INSERT MODE | 8de0fb5d93
(5 rows)

If you are parsing the result, the headers can get in the way. This is where the -t (tuples only) flag is useful. Same query:

$ psql hr-til_development -tc 'select title, slug from posts limit 5';
 Start rails server in production mode    | 97e26f5f68
 Enable ScreenSharing from a terminal     | 55dfe1a1a3
 Variable Hoisting in JavaScript          | eeedb8dda0
 Three ways to compile Elixir             | f2a423d1e1
 Indent and de-indent line in INSERT MODE | 8de0fb5d93

Notice the headers and row count are gone. Toggle this in a psql session with the \t command.

Grep your Database 🔎

Wading through a database full of text (as might support a CMS) and feeling lost? Sometimes you gotta ditch the queries and just start grepping.

This Stack Overflow post sums up the idea. Here’s a version of the command we used today to pick through a text-rich database:

$ curl `heroku pg:backups public-url -rproduction` | pg_restore | egrep -o '[a-z]+\.chicagocatlounge.biz'

This found, via a regex match, all subdomains of our target URL, in any text field of the production database. Pretty awesome.

Treat null as if it is a known value #postgresql

When you query a table sometimes you want to check if a nullable field is not equal to a value. For example:

select * from sometable where afield != 'avalue';

However, the query above will exclude rows where afield is null, so you would typically add that as an additional condition:

select * from sometable where afield is null or afield != 'avalue';

When you are doing it once it may be OK, but as queries get bigger this makes the query messy and harder to read. Fortunately, Postgres offers a more idiomatic way to check if a value does not equal something, including null values: is distinct from and is not distinct from.

select * from sometable where afield is distinct from 'avalue';

This query will return all the rows where afield is null or anything but avalue. Conversely:

select * from sometable where afield is NOT distinct from (select x from y limit 1);

will return all the values that are equal to the result of the subquery above and is useful when the result of the subquery could be null.

h/t Jack Christensen

Original docs: https://wiki.postgresql.org/wiki/Is_distinct_from

`NULLIF` in Postgres

If there are multiple values in a database column that both mean the column for that row has no data, then the NULLIF function can be used in queries to normalize the values in that column.

Null char is 😵

chriserin=# \pset null 😵
Null display is "😵".
chriserin=# select nullif('NOTHING', 'NOTHING');
 nullif
--------
 😵
(1 row)

chriserin=# select nullif('something', 'NOTHING');
  nullif
-----------
 something
(1 row)
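As a sketch of that normalization in a query (the posts table, author_name column, and 'N/A' placeholder here are hypothetical):

```sql
-- Nesting nullif collapses both '' and 'N/A' into null
select nullif(nullif(author_name, ''), 'N/A') as author_name
from posts;
```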

Nulls Last, Nulls First

Postgres query sorting has a great feature, nulls first/nulls last, that helps us specify where in a query result null values appear.

Nulls in Postgres are sorted as larger than any other value, so this is a descending sort containing null values (💀 == null):

example=# select * from users order by full_name desc;
  full_name
--------------
 💀
 💀
 💀
 Tracy Jordan
 Liz Lemon
 Jack Donaghy

Reverse the null position with nulls last:

 example=# select * from users order by full_name desc nulls last;
  full_name
--------------
 Tracy Jordan
 Liz Lemon
 Jack Donaghy
 💀
 💀
 💀
(6 rows)

Here’s the opposite query, ascending:

example=# select * from users order by full_name asc;
  full_name
--------------
 Jack Donaghy
 Liz Lemon
 Tracy Jordan
 💀
 💀
 💀
(6 rows)

Now, reverse the null position with nulls first:

example=# select * from users order by full_name asc nulls first;
  full_name
--------------
 💀
 💀
 💀
 Jack Donaghy
 Liz Lemon
 Tracy Jordan
(6 rows)

Lots of ways to explore our data.

Create Database from a Template

Today I got to see Postgres’ createdb invoked with a specified template, as such:

$ createdb -T prod_dump myapp_development

The -T feature flag corresponds to the TEMPLATE modifier in a SQL CREATE DATABASE statement. In the example above, this will create myapp_development that is a deep clone (structure and data) of prod_dump.

For more info, see:

$ man createdb

and:

$ psql
user=# \h create database;

h/t Brian Dunn

Determining The Age Of Things

In PostgreSQL, we can determine the age of something (or someone) by passing a timestamp to the age function.

For instance, if we want to know how long it has been since y2k, we can run the following query:

> select age(timestamp '2000-01-01');
           age
-------------------------
 16 years 4 mons 12 days

Additionally, if we want to know the amount of time between two dates, we can pass two timestamps to the age function.

For example, we can find out how old Prince lived to be by passing in the date of death and then date of birth:

> select age(timestamp 'April 21, 2016', timestamp 'June 7, 1958');
           age
--------------------------
 57 years 10 mons 14 days

h/t Josh Davey

Dump And Restore A PostgreSQL Database

PostgreSQL comes with two command-line utilities for dumping and then restoring a database — pg_dump and pg_restore, respectively.

Using pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (the .dump extension is a standard convention):

$ pg_dump -Fc my_database > my_database.dump

Using the custom format option provides a couple of benefits. The output is significantly compressed in comparison to a generic SQL dump. The dump and restoration process is more flexible. Lastly, the dump can be performed in parallel if your machine has multiple cores to work with. Likewise, the restoration can be done in parallel with multiple jobs.

To restore the dump, create a fresh database and then use pg_restore:

$ createdb my_new_database
$ pg_restore -d my_new_database my_database.dump

Note: the dumped tables will depend on some user role. You will need to ensure that this role exists on the database cluster where the restore is happening. You can use the createuser command if necessary.

See the pg_dump docs and pg_restore docs for more details.

List Database Objects With Disk Usage

I’ll often use \d or \dt to check out the tables in my database. This shows the schema, object name, object type (e.g. table), and owner for each.

By adding the + to that meta-command, I can also see the disk usage for each database object.

Here is an example of looking at all the tables in a database with the additional Size (disk usage) information:

> \dt+
                              List of relations
 Schema |        Name        | Type  |   Owner    |    Size    | Description
--------+--------------------+-------+------------+------------+-------------
 public | amount_types       | table | jbranchaud | 16 kB      |
 public | ingredient_amounts | table | jbranchaud | 8192 bytes |
 public | ingredient_types   | table | jbranchaud | 16 kB      |
 public | ingredients        | table | jbranchaud | 48 kB      |
 public | recipes            | table | jbranchaud | 16 kB      |
 public | schema_migrations  | table | jbranchaud | 16 kB      |
 public | users              | table | jbranchaud | 16 kB      |