Blame - doc/distributions/binomial.qbk - RealtimeRoboticsGroup/test

blob: eae3e83104926ff5ac660b7eea18eeae158df69f [file] [log] [blame]

Brian Silverman	32ed54e	2018-08-04 23:37:28 -0700	[diff] [blame^]	1	[section:binomial_dist Binomial Distribution]
				2
				3	``#include <boost/math/distributions/binomial.hpp>``
				4
				5	namespace boost{ namespace math{
				6
				7	template <class RealType = double,
				8	class ``__Policy`` = ``__policy_class`` >
				9	class binomial_distribution;
				10
				11	typedef binomial_distribution<> binomial;
				12
				13	template <class RealType, class ``__Policy``>
				14	class binomial_distribution
				15	{
				16	public:
				17	typedef RealType value_type;
				18	typedef Policy policy_type;
				19
				20	static const ``['unspecified-type]`` clopper_pearson_exact_interval;
				21	static const ``['unspecified-type]`` jeffreys_prior_interval;
				22
				23	// construct:
				24	binomial_distribution(RealType n, RealType p);
				25
				26	// parameter access::
				27	RealType success_fraction() const;
				28	RealType trials() const;
				29
				30	// Bounds on success fraction:
				31	static RealType find_lower_bound_on_p(
				32	RealType trials,
				33	RealType successes,
				34	RealType probability,
				35	``['unspecified-type]`` method = clopper_pearson_exact_interval);
				36	static RealType find_upper_bound_on_p(
				37	RealType trials,
				38	RealType successes,
				39	RealType probability,
				40	``['unspecified-type]`` method = clopper_pearson_exact_interval);
				41
				42	// estimate min/max number of trials:
				43	static RealType find_minimum_number_of_trials(
				44	RealType k, // number of events
				45	RealType p, // success fraction
				46	RealType alpha); // risk level
				47
				48	static RealType find_maximum_number_of_trials(
				49	RealType k, // number of events
				50	RealType p, // success fraction
				51	RealType alpha); // risk level
				52	};
				53
				54	}} // namespaces
				55
				56	The class type `binomial_distribution` represents a
				57	[@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
				58	it is used when there are exactly two mutually
				59	exclusive outcomes of a trial. These outcomes are labelled
				60	"success" and "failure". The
				61	__binomial_distrib is used to obtain
				62	the probability of observing k successes in N trials, with the
				63	probability of success on a single trial denoted by p. The
				64	binomial distribution assumes that p is fixed for all trials.
				65
				66	[note The random variable for the binomial distribution is the number of successes,
				67	(the number of trials is a fixed property of the distribution)
				68	whereas for the negative binomial,
				69	the random variable is the number of trials, for a fixed number of successes.]
				70
				71	The PDF for the binomial distribution is given by:
				72
				73	[equation binomial_ref2]
				74
				75	The following two graphs illustrate how the PDF changes depending
				76	upon the distributions parameters, first we'll keep the success
				77	fraction /p/ fixed at 0.5, and vary the sample size:
				78
				79	[graph binomial_pdf_1]
				80
				81	Alternatively, we can keep the sample size fixed at N=20 and
				82	vary the success fraction /p/:
				83
				84	[graph binomial_pdf_2]
				85
				86	[discrete_quantile_warning Binomial]
				87
				88	[h4 Member Functions]
				89
				90	[h5 Construct]
				91
				92	binomial_distribution(RealType n, RealType p);
				93
				94	Constructor: /n/ is the total number of trials, /p/ is the
				95	probability of success of a single trial.
				96
				97	Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.
				98
				99	[h5 Accessors]
				100
				101	RealType success_fraction() const;
				102
				103	Returns the parameter /p/ from which this distribution was constructed.
				104
				105	RealType trials() const;
				106
				107	Returns the parameter /n/ from which this distribution was constructed.
				108
				109	[h5 Lower Bound on the Success Fraction]
				110
				111	static RealType find_lower_bound_on_p(
				112	RealType trials,
				113	RealType successes,
				114	RealType alpha,
				115	``['unspecified-type]`` method = clopper_pearson_exact_interval);
				116
				117	Returns a lower bound on the success fraction:
				118
				119	[variablelist
				120	[[trials][The total number of trials conducted.]]
				121	[[successes][The number of successes that occurred.]]
				122	[[alpha][The largest acceptable probability that the true value of
				123	the success fraction is [*less than] the value returned.]]
				124	[[method][An optional parameter that specifies the method to be used
				125	to compute the interval (See below).]]
				126	]
				127
				128	For example, if you observe /k/ successes from /n/ trials the
				129	best estimate for the success fraction is simply ['k/n], but if you
				130	want to be 95% sure that the true value is [*greater than] some value,
				131	['p[sub min]], then:
				132
				133	p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(
				134	n, k, 0.05);
				135
				136	[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
				137
				138	There are currently two possible values available for the /method/
				139	optional parameter: /clopper_pearson_exact_interval/
				140	or /jeffreys_prior_interval/. These constants are both members of
				141	class template `binomial_distribution`, so usage is for example:
				142
				143	p = binomial_distribution<RealType>::find_lower_bound_on_p(
				144	n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);
				145
				146	The default method if this parameter is not specified is the Clopper Pearson
				147	"exact" interval. This produces an interval that guarantees at least
				148	`100(1-alpha)%` coverage, but which is known to be overly conservative,
				149	sometimes producing intervals with much greater than the requested coverage.
				150
				151	The alternative calculation method produces a non-informative
				152	Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only
				153	['in the average case], though is typically very close to the requested
				154	coverage level. It is one of the main methods of calculation recommended
				155	in the review by Brown, Cai and DasGupta.
				156
				157	Please note that the "textbook" calculation method using
				158	a normal approximation (the Wald interval) is deliberately
				159	not provided: it is known to produce consistently poor results,
				160	even when the sample size is surprisingly large.
				161	Refer to Brown, Cai and DasGupta for a full explanation. Many other methods
				162	of calculation are available, and may be more appropriate for specific
				163	situations. Unfortunately there appears to be no consensus amongst
				164	statisticians as to which is "best": refer to the discussion at the end of
				165	Brown, Cai and DasGupta for examples.
				166
				167	The two methods provided here were chosen principally because they
				168	can be used for both one and two sided intervals.
				169	See also:
				170
				171	Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
				172	Interval Estimation for a Binomial Proportion,
				173	Statistical Science, Vol. 16, No. 2, 101-133.
				174
				175	T. Tony Cai (2005),
				176	One-sided confidence intervals in discrete distributions,
				177	Journal of Statistical Planning and Inference 131, 63-88.
				178
				179	Agresti, A. and Coull, B. A. (1998). Approximate is better than
				180	"exact" for interval estimation of binomial proportions. Amer.
				181	Statist. 52 119-126.
				182
				183	Clopper, C. J. and Pearson, E. S. (1934). The use of confidence
				184	or fiducial limits illustrated in the case of the binomial.
				185	Biometrika 26 404-413.
				186
				187	[h5 Upper Bound on the Success Fraction]
				188
				189	static RealType find_upper_bound_on_p(
				190	RealType trials,
				191	RealType successes,
				192	RealType alpha,
				193	``['unspecified-type]`` method = clopper_pearson_exact_interval);
				194
				195	Returns an upper bound on the success fraction:
				196
				197	[variablelist
				198	[[trials][The total number of trials conducted.]]
				199	[[successes][The number of successes that occurred.]]
				200	[[alpha][The largest acceptable probability that the true value of
				201	the success fraction is [*greater than] the value returned.]]
				202	[[method][An optional parameter that specifies the method to be used
				203	to compute the interval. Refer to the documentation for
				204	`find_upper_bound_on_p` above for the meaning of the
				205	method options.]]
				206	]
				207
				208	For example, if you observe /k/ successes from /n/ trials the
				209	best estimate for the success fraction is simply ['k/n], but if you
				210	want to be 95% sure that the true value is [*less than] some value,
				211	['p[sub max]], then:
				212
				213	p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(
				214	n, k, 0.05);
				215
				216	[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
				217
				218	[note
				219	In order to obtain a two sided bound on the success fraction, you
				220	call both `find_lower_bound_on_p` and `find_upper_bound_on_p`
				221	each with the same arguments.
				222
				223	If the desired risk level
				224	that the true success fraction lies outside the bounds is [alpha],
				225	then you pass [alpha]/2 to these functions.
				226
				227	So for example a two sided 95% confidence interval would be obtained
				228	by passing [alpha] = 0.025 to each of the functions.
				229
				230	[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
				231	]
				232
				233
				234	[h5 Estimating the Number of Trials Required for a Certain Number of Successes]
				235
				236	static RealType find_minimum_number_of_trials(
				237	RealType k, // number of events
				238	RealType p, // success fraction
				239	RealType alpha); // probability threshold
				240
				241	This function estimates the minimum number of trials required to ensure that
				242	more than k events is observed with a level of risk /alpha/ that k or
				243	fewer events occur.
				244
				245	[variablelist
				246	[[k][The number of success observed.]]
				247	[[p][The probability of success for each trial.]]
				248	[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
				249	]
				250
				251	For example:
				252
				253	binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05);
				254
				255	Returns the smallest number of trials we must conduct to be 95% sure
				256	of seeing 10 events that occur with frequency one half.
				257
				258	[h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]
				259
				260	static RealType find_maximum_number_of_trials(
				261	RealType k, // number of events
				262	RealType p, // success fraction
				263	RealType alpha); // probability threshold
				264
				265	This function estimates the maximum number of trials we can conduct
				266	to ensure that k successes or fewer are observed, with a risk /alpha/
				267	that more than k occur.
				268
				269	[variablelist
				270	[[k][The number of success observed.]]
				271	[[p][The probability of success for each trial.]]
				272	[[alpha][The maximum acceptable probability that more than k events will be observed.]]
				273	]
				274
				275	For example:
				276
				277	binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);
				278
				279	Returns the largest number of trials we can conduct and still be 95% certain
				280	of not observing any events that occur with one in a million frequency.
				281	This is typically used in failure analysis.
				282
				283	[link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
				284
				285	[h4 Non-member Accessors]
				286
				287	All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
				288	that are generic to all distributions are supported: __usual_accessors.
				289
				290	The domain for the random variable /k/ is `0 <= k <= N`, otherwise a
				291	__domain_error is returned.
				292
				293	It's worth taking a moment to define what these accessors actually mean in
				294	the context of this distribution:
				295
				296	[table Meaning of the non-member accessors
				297	[[Function][Meaning]]
				298	[[__pdf]
				299	[The probability of obtaining [*exactly k successes] from n trials
				300	with success fraction p. For example:
				301
				302	`pdf(binomial(n, p), k)`]]
				303	[[__cdf]
				304	[The probability of obtaining [*k successes or fewer] from n trials
				305	with success fraction p. For example:
				306
				307	`cdf(binomial(n, p), k)`]]
				308	[[__ccdf]
				309	[The probability of obtaining [*more than k successes] from n trials
				310	with success fraction p. For example:
				311
				312	`cdf(complement(binomial(n, p), k))`]]
				313	[[__quantile]
				314	[The [*greatest] number of successes that may be observed from n trials
				315	with success fraction p, at probability P. Note that the value returned
				316	is a real-number, and not an integer. Depending on the use case you may
				317	want to take either the floor or ceiling of the result. For example:
				318
				319	`quantile(binomial(n, p), P)`]]
				320	[[__quantile_c]
				321	[The [*smallest] number of successes that may be observed from n trials
				322	with success fraction p, at probability P. Note that the value returned
				323	is a real-number, and not an integer. Depending on the use case you may
				324	want to take either the floor or ceiling of the result. For example:
				325
				326	`quantile(complement(binomial(n, p), P))`]]
				327	]
				328
				329	[h4 Examples]
				330
				331	Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
				332	are available illustrating the use of the binomial distribution.
				333
				334	[h4 Accuracy]
				335
				336	This distribution is implemented using the
				337	incomplete beta functions __ibeta and __ibetac,
				338	please refer to these functions for information on accuracy.
				339
				340	[h4 Implementation]
				341
				342	In the following table /p/ is the probability that one trial will
				343	be successful (the success fraction), /n/ is the number of trials,
				344	/k/ is the number of successes, /p/ is the probability and /q = 1-p/.
				345
				346	[table
				347	[[Function][Implementation Notes]]
				348	[[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial
				349	coefficient of a and b, then we have:
				350
				351	[equation binomial_ref1]
				352
				353	Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`
				354
				355	The function __ibeta_derivative is used here, since it has already
				356	been optimised for the lowest possible error - indeed this is really
				357	just a thin wrapper around part of the internals of the incomplete
				358	beta function.
				359
				360	There are also various special cases: refer to the code for details.
				361	]]
				362	[[cdf][Using the relation:
				363
				364	``
				365	p = I[sub 1-p](n - k, k + 1)
				366	= 1 - I[sub p](k + 1, n - k)
				367	= __ibetac(k + 1, n - k, p)``
				368
				369	There are also various special cases: refer to the code for details.
				370	]]
				371	[[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p)
				372
				373	There are also various special cases: refer to the code for details. ]]
				374	[[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
				375	incomplete beta functions can be used here. Instead the quantile
				376	is found numerically using a derivative free method
				377	(__root_finding_TOMS748).]]
				378	[[quantile from the complement][Found numerically as above.]]
				379	[[mean][ `p * n` ]]
				380	[[variance][ `p * n * (1-p)` ]]
				381	[[mode][`floor(p * (n + 1))`]]
				382	[[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
				383	[[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
				384	[[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
				385	[[parameter estimation][The member functions `find_upper_bound_on_p`
				386	`find_lower_bound_on_p` and `find_number_of_trials` are
				387	implemented in terms of the inverse incomplete beta functions
				388	__ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]]
				389	]
				390
				391	[h4 References]
				392
				393	* [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
				394	* [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution].
				395	* [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Explorary Data Analysis].
				396
				397	[endsect] [/section:binomial_dist Binomial]
				398
				399	[/ binomial.qbk
				400	Copyright 2006 John Maddock and Paul A. Bristow.
				401	Distributed under the Boost Software License, Version 1.0.
				402	(See accompanying file LICENSE_1_0.txt or copy at
				403	http://www.boost.org/LICENSE_1_0.txt).
				404	]