1

I have a column of numbers in my database. How can I computer the standard deviation? I do not want use the stddev function.

6
  • Then use other functions that are needed to calculate std deviation. If you need the mean in the formula then just use the AVG and count(), whatever is needed, basically use other aggregate functions to make the std dev. Commented Oct 9, 2016 at 21:02
  • SELECT SQRT(SUM( (number-AVG(number))*(number-AVG(number)) /COUNT(value)) FROM mytable Commented Oct 9, 2016 at 21:03
  • but does not work Commented Oct 9, 2016 at 21:03
  • 1
    Why don't you want to use the stddev function? And which DBMS are you using? Postgres? Oracle? Commented Oct 9, 2016 at 21:05
  • I don't think your query will work without GROUP BY unless your group is the entire SELECTed set. You've got aggregate functions, e.g., SUM and COUNT Commented Oct 9, 2016 at 21:06

2 Answers 2

3

Just because I was curious, I decided to test the actual STDEV(). Now, I could not nail the built in function.

I was close... 0.000141009220002264 or 0.00748% off

Also, The Total Average and Count has to be converted to float (variance was greater with decimal)

The example below is going after my Treasury Rates Table for the 10 Year Yield (not that it matters)

Select SQLFunction = Stdev([TR_Y10]) 
      ,ManualCalc  = Sqrt(Sum(Power(((cast([TR_Y10] as float)-B.TotalAvg)),2) / B.TotalCnt))
      ,Variance    = Stdev([TR_Y10]) - Sqrt(Sum(Power(((cast([TR_Y10] as float)-B.TotalAvg)),2) / B.TotalCnt))
 From [Chinrus-Shared].[dbo].[DS_Treasury_Rates]
 Join (Select TotalAvg=Avg(cast([TR_Y10] as float)),TotalCnt=count(*) From [Chinrus-Shared].[dbo].[DS_Treasury_Rates]) B on 1=1

Returns

SQLFunction         ManualCalc          Variance
1.88409468982299    1.88395368060299    0.000141009220002264
Sign up to request clarification or add additional context in comments.

Comments

0

The standard deviation is the square root of the variance divided by n.

The variance is the sum of the squares of the differences between the average and the observed value.

So, in most databases, you can use window functions:

select sqrt(avg(var))
from (select square(t.x - avg(t.x) over ()) as var
      from t
     ) t;

Notes:

  • The square() function might have some other name (such as power()).
  • The sqrt() function might have some other name.
  • This is not a good way to calculate the standard deviation in general. In particular, this is a numerically unstable algorithm (it will work just fine for finite numbers of normal numbers).
  • The subquery is needed because window functions cannot be the arguments to aggregation functions.

2 Comments

Fascinating. Three calculations, three different results. Makes one question gravity. Results: MS> 1.88409468982299 JC> 1.88395368060299 GL> 1.88395368262047.
@JohnCappelletti . . . John, it is not surprising that we cannot reproduce the results of the built-in function. Hopefully, that implementation uses a method that is more numerically stable, which can produce the small variations in unimportant decimal places.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.