Two swings at guassian distribution in Postgres
I've come across 2 ways to replicate a bell curve.
This is adding 3 random numbers together.
``` sql
select
floor(random()*3)+floor(random()*3)+floor(random()*3) as y,
count(*)
from generate_series(1,10000000) group by y order by y;
y | count
---+---------
0 | 369905
1 | 1111997
2 | 2222763
3 | 2593306
4 | 2220452
5 | 1111506
6 | 370071
(7 rows)
Time: 18806.885 ms
```
Consistent distribution, but 18 seconds for 10 million rows!
The second method has a tail end of the curve that keeps going even as it tapers to 0. This uses the `normal_rand` function provided by postgres with arguments of quantity, median, and standard deviation.
``` sql
select x, count(*)
from
(select abs(ceil(normal_rand(10000000,2,1))) as x) as y
group by x order by count;
x | count
---+---------
0 | 214152
1 | 1373200
2 | 3414028
3 | 3411424
4 | 1359447
5 | 214194
6 | 13234
7 | 319
8 | 2
(9 rows)
Time: 3221.894 ms
```
I want the peak of the curve around 2 and 3 but for this method of Gaussian distribution I get results below 0 unless I use `abs`. And its fast, 3 seconds!
chriserin
September 6, 2016