and implementation-dependent. If your input data consists of mixed types,
you may be able to use :func:`map` to ensure a consistent result, for
example: ``map(float, input_data)``.

Some datasets use ``NaN`` (not a number) values to represent missing data.
Because a NaN compares unequal to every value, including itself, and every
ordering comparison involving a NaN returns ``False``, NaNs cause surprising
or undefined behavior in the statistics functions that sort data or count
occurrences. The affected functions are ``median()``, ``median_low()``,
``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
``quantiles()``. Strip the ``NaN`` values before calling these functions::

   >>> from statistics import median
   >>> from math import isnan
   >>> from itertools import filterfalse

   >>> data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
   >>> sorted(data)  # This has surprising behavior
   [20.7, nan, 14.4, 18.3, 19.2, nan]
   >>> median(data)  # This result is unexpected
   16.35

   >>> sum(map(isnan, data))  # Number of missing values
   2
   >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
   >>> clean
   [20.7, 19.2, 18.3, 14.4]
   >>> sorted(clean)  # Sorting now works as expected
   [14.4, 18.3, 19.2, 20.7]
   >>> median(clean)  # This result is now well defined
   18.75
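
To see why sorting breaks down, compare a NaN with an ordinary float. The
following is a minimal sketch using only the standard library; the variable
names are illustrative. Every ordering comparison involving a NaN is false,
so a sort has no consistent position for it, and a NaN is the only float that
is unequal to itself::

   import math

   nan = float('nan')

   # Each ordering comparison with NaN is False, in both directions.
   comparisons = [nan < 1.0, nan > 1.0, nan <= 1.0, nan >= 1.0]
   print(comparisons)      # [False, False, False, False]

   # Equality fails too: NaN does not even equal itself.
   print(nan == nan)       # False
   print(nan != nan)       # True

   # math.isnan() is therefore a reliable way to detect it.
   print(math.isnan(nan))  # True
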


Averages and measures of central location
-----------------------------------------
