Today I Learned

A Hashrocket project

Two swings at guassian distribution in Postgres

I’ve come across 2 ways to replicate a bell curve.

This is adding 3 random numbers together.

select 
floor(random()*3)+floor(random()*3)+floor(random()*3) as y, 
count(*) 
from generate_series(1,10000000) group by y order by y;
 y |  count
---+---------
 0 |  369905
 1 | 1111997
 2 | 2222763
 3 | 2593306
 4 | 2220452
 5 | 1111506
 6 |  370071
(7 rows)

Time: 18806.885 ms

Consistent distribution, but 18 seconds for 10 million rows!

The second method has a tail end of the curve that keeps going even as it tapers to 0. This uses the normal_rand function provided by postgres with arguments of quantity, median, and standard deviation.

select  x, count(*) 
from 
(select abs(ceil(normal_rand(10000000,2,1))) as x) as y 
group by x order by count;
 x |  count
---+---------
 0 |  214152
 1 | 1373200
 2 | 3414028
 3 | 3411424
 4 | 1359447
 5 |  214194
 6 |   13234
 7 |     319
 8 |       2
(9 rows)

Time: 3221.894 ms

I want the peak of the curve around 2 and 3 but for this method of Gaussian distribution I get results below 0 unless I use abs. And its fast, 3 seconds!

Looking for help? Hashrocket developers believe that data quality is as important as code quality. We enjoy all the challenges of relational databases, from finding the fastest index, to structuring data to fit the needs of an application. We're eager to share our experiences; check out PG Casts, a series of free weekly PostgreSQL screencasts.