Math_Stats

Class: Base

Source Location: /PHPExcel/Shared/JAMA/examples/Stats.php

Class Overview


A class to calculate descriptive statistics from a data set.


Author(s):

Version:

  • 0.8




Class Details

[line 119]
A class to calculate descriptive statistics from a data set. Data sets can be simple arrays of data, or a cumulative hash. The second form is useful when passing large data sets; for example, the data set:

 $data1 = array (1,2,1,1,1,1,3,3,4.1,3,2,2,4.1,1,1,2,3,3,2,2,1,1,2,2);
can be expressed more compactly as:
 $data2 = array('1'=>9, '2'=>8, '3'=>5, '4.1'=>2);
Example of use:
 include_once 'Math/Stats.php';
 $s = new Math_Stats();
 $s->setData($data1);
 // or
 // $s->setData($data2, STATS_DATA_CUMMULATIVE);
 $stats = $s->calcBasic();
 echo 'Mean: '.$stats['mean'].' StDev: '.$stats['stdev']."\n";

 // using data with nulls
 // first ignoring them:
 $data3 = array(1.2, 'foo', 2.4, 3.1, 4.2, 3.2, null, 5.1, 6.2);
 $s->setNullOption(STATS_IGNORE_NULL);
 $s->setData($data3);
 $stats3 = $s->calcFull();

 // and then assuming nulls == 0
 $s->setNullOption(STATS_USE_NULL_AS_ZERO);
 $s->setData($data3);
 $stats3 = $s->calcFull();
Originally this class was part of NumPHP (Numeric PHP package)
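As a rough illustration of how the two data formats described above relate, the sketch below expands a cumulative hash into the equivalent simple array. The helper name expand_cumulative_sketch() is hypothetical and is not part of this class.

```php
<?php
// Hypothetical helper: expands a value => count hash into a flat list.
function expand_cumulative_sketch(array $cumulative): array
{
    $expanded = [];
    foreach ($cumulative as $value => $count) {
        // Array keys arrive as strings or ints, so cast back to a float.
        for ($i = 0; $i < $count; $i++) {
            $expanded[] = (float) $value;
        }
    }
    return $expanded;
}

$data2 = array('1' => 9, '2' => 8, '3' => 5, '4.1' => 2);
$flat = expand_cumulative_sketch($data2);
echo count($flat), "\n"; // 24 data points, the same size as $data1
```

The compact form stores one entry per distinct value, so its size depends on the number of distinct values rather than on the number of observations.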




Tags:

author:  Jesus M. Castagnetto <jmcastagnetto@php.net>
version:  0.8
access:  public




Class Variables

$_calculatedValues = array()

[line 165]

Array for caching result values; should be reset when using setData().




Tags:

access:  private

Type:   array



$_data =  null

[line 129]

The simple or cumulative data set. Null by default.




Tags:

access:  private

Type:   array



$_dataExpanded =  null

[line 138]

Expanded data set. Only set when cumulative data is being used. Null by default.




Tags:

access:  private

Type:   array



$_dataOption =  null

[line 147]


Flag for data type, one of STATS_DATA_SIMPLE or STATS_DATA_CUMMULATIVE. Null by default.




Tags:

access:  private

Type:   int



$_nullOption =

[line 156]

Flag for null handling options. One of STATS_REJECT_NULL, STATS_IGNORE_NULL or STATS_USE_NULL_AS_ZERO.




Tags:

access:  private

Type:   int





Class Methods


method absDev [line 750]

mixed absDev( )

Calculates the absolute deviation of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the absolute deviation on success, a PEAR_Error object otherwise
see:  Base::absDevWithMean()
see:  Base::count()
see:  Base::__sumabsdev()
see:  Base::calc()
access:  public



method absDevWithMean [line 773]

mixed absDevWithMean( numeric $mean)

Calculates the absolute deviation of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc(). Handles cumulative data sets correctly.




Tags:

return:  the absolute deviation on success, a PEAR_Error object otherwise
see:  Base::absDev()
see:  Base::__sumabsdev()
access:  public


Parameters:

numeric   $mean   the fixed mean value
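A minimal sketch of the absolute deviation, assuming the usual definition SUM { | xi - mean | } / N. The function abs_dev_sketch() is hypothetical; passing a second argument mimics the fixed-mean variant, and omitting it uses the mean of the data.

```php
<?php
// Hypothetical helper, not part of the class: mean absolute deviation.
function abs_dev_sketch(array $data, ?float $mean = null): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    if ($mean === null) {
        $mean = array_sum($data) / $n; // fall back to the data set mean
    }
    $sum = 0.0;
    foreach ($data as $x) {
        $sum += abs($x - $mean); // SUM { | xi - mean | }
    }
    return $sum / $n;
}

echo abs_dev_sketch([1, 2, 3, 4]), "\n";      // deviations from the mean 2.5
echo abs_dev_sketch([1, 2, 3, 4], 2.0), "\n"; // deviations from a fixed mean
```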


method calc [line 326]

mixed calc( int $mode, [boolean $returnErrorObject = true])

Calculates the basic or full statistics for the data set





Tags:

return:  an associative array of statistics on success, a PEAR_Error object otherwise
see:  Base::calcFull()
see:  Base::calcBasic()
access:  public


Parameters:

int   $mode   one of STATS_BASIC or STATS_FULL
boolean   $returnErrorObject   if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)


method calcBasic [line 349]

mixed calcBasic( [boolean $returnErrorObject = true])

Calculates a basic set of statistics





Tags:

return:  an associative array of statistics on success, a PEAR_Error object otherwise
see:  Base::calcFull()
see:  Base::calc()
access:  public


Parameters:

boolean   $returnErrorObject   if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)


method calcFull [line 373]

mixed calcFull( [boolean $returnErrorObject = true])

Calculates a full set of statistics





Tags:

return:  an associative array of statistics on success, a PEAR_Error object otherwise
see:  Base::calcBasic()
see:  Base::calc()
access:  public


Parameters:

boolean   $returnErrorObject   if an error occurs, whether to return the raw PEAR_Error object (true, the default) or only the error message (false)


method center [line 295]

mixed center( )

Transforms the data by subtracting the mean from each entry. This will reset all pre-calculated values to their original (unset) defaults.




Tags:

return:  true on success, a PEAR_Error object otherwise
see:  Base::setData()
see:  Base::mean()
access:  public



method coeffOfVariation [line 1109]

mixed coeffOfVariation( )

Calculates the coefficient of variation of a data set. The coefficient of variation measures the spread of a set of data as a proportion of its mean. It is often expressed as a percentage. Handles cumulative data sets correctly.




Tags:

return:  the coefficient of variation on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::mean()
see:  Base::stDev()
access:  public
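A minimal sketch of the coefficient of variation as the ratio of the standard deviation to the mean. The function name is hypothetical, and the use of the unbiased (N - 1) standard deviation is an assumption based on how stDev() is described in this document.

```php
<?php
// Hypothetical helper, not part of the class: stdev / mean.
function coeff_of_variation_sketch(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points');
    }
    $mean = array_sum($data) / $n;
    if ($mean == 0) {
        throw new InvalidArgumentException('Mean is zero, ratio is undefined');
    }
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    $stdev = sqrt($sumSq / ($n - 1)); // unbiased standard deviation (assumed)
    return $stdev / $mean;
}

// Multiply by 100 to express the result as a percentage.
echo 100 * coeff_of_variation_sketch([1, 2, 3]), "%\n";
```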



method count [line 599]

mixed count( )

Calculates the number of data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the count on success, a PEAR_Error object otherwise
see:  Base::calc()
access:  public



method frequency [line 1171]

mixed frequency( )

Calculates the value frequency table of a data set. Handles cumulative data sets correctly.




Tags:

return:  an associative array of value=>frequency items on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::max()
see:  Base::min()
access:  public



method geometricMean [line 965]

mixed geometricMean( )

Calculates the geometric mean of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the geometrical mean value on success, a PEAR_Error object otherwise
see:  Base::count()
see:  Base::product()
see:  Base::calc()
access:  public
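A minimal sketch of the geometric mean as the Nth root of the product of all observations, which matches the references to product() and count() above. The function name is hypothetical, and rejecting negative products is a choice made for this sketch only.

```php
<?php
// Hypothetical helper, not part of the class: (PROD { xi })^(1/N).
function geometric_mean_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $product = array_product($data);
    if ($product < 0) {
        throw new InvalidArgumentException('Product is negative');
    }
    return pow($product, 1 / $n);
}

echo geometric_mean_sketch([1, 4]), "\n"; // square root of 4
```

For long data sets the raw product can overflow; summing logarithms is a common alternative, though nothing in this document says which approach the class takes.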



method getData [line 217]

mixed getData( [boolean $expanded = false])

Returns the data which might have been modified according to the current null handling options.





Tags:

return:  array of data on success, a PEAR_Error object otherwise
see:  Base::_validate()
access:  public


Parameters:

boolean   $expanded   whether to return an expanded list, default is false


method harmonicMean [line 995]

mixed harmonicMean( )

Calculates the harmonic mean of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the harmonic mean value on success, a PEAR_Error object otherwise
see:  Base::count()
see:  Base::calc()
access:  public
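A minimal sketch of the harmonic mean, assuming the standard definition N / SUM { 1 / xi }. The function name is hypothetical, and throwing on a zero value is a choice made for this sketch.

```php
<?php
// Hypothetical helper, not part of the class: N / SUM { 1 / xi }.
function harmonic_mean_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $sumReciprocals = 0.0;
    foreach ($data as $x) {
        if ($x == 0) {
            throw new InvalidArgumentException('Zero value has no reciprocal');
        }
        $sumReciprocals += 1 / $x;
    }
    return $n / $sumReciprocals;
}

echo harmonic_mean_sketch([1, 2, 4]), "\n"; // 3 / 1.75
```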



method interquartileMean [line 1235]

mixed interquartileMean( )

The interquartile mean is defined as the mean of the values left after discarding the lower 25% and top 25% ranked values, i.e.:
 interquart mean = mean(<P(25),P(75)>)
where P = percentile.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::quartiles()
todo:  need to double check the equation
access:  public



method interquartileRange [line 1273]

mixed interquartileRange( )

The interquartile range is the distance between the 75th and 25th percentiles. It is the range of the middle 50% of the data set, and thus is not affected by outliers or extreme values:
 interquart range = P(75) - P(25)
where P = percentile.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::quartiles()
access:  public



method kurtosis [line 832]

mixed kurtosis( )

Calculates the kurtosis of the data distribution in the set. The kurtosis measures the degree of peakedness of a distribution. It is also called the "excess" or "excess coefficient", and is a normalized form of the fourth central moment of a distribution.

  • A normal distribution has a kurtosis = 0
  • A narrow and peaked (leptokurtic) distribution has a kurtosis > 0
  • A flat and wide (platykurtic) distribution has a kurtosis < 0

Handles cumulative data sets correctly.




Tags:

return:  the kurtosis value on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::stDev()
see:  Base::count()
see:  Base::__sumdiff()
access:  public
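A minimal sketch of one common definition of excess kurtosis, m4 / m2^2 - 3, built from population central moments. The function name is hypothetical, and the class may normalize differently (for example by using the unbiased standard deviation), so values need not match its output exactly.

```php
<?php
// Hypothetical helper, not part of the class: excess kurtosis m4 / m2^2 - 3.
function excess_kurtosis_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $mean = array_sum($data) / $n;
    $m2 = 0.0;
    $m4 = 0.0;
    foreach ($data as $x) {
        $d = $x - $mean;
        $m2 += $d ** 2; // accumulates the second central moment
        $m4 += $d ** 4; // accumulates the fourth central moment
    }
    $m2 /= $n;
    $m4 /= $n;
    if ($m2 == 0.0) {
        throw new InvalidArgumentException('Variance is zero');
    }
    return $m4 / ($m2 ** 2) - 3;
}

// Evenly spread data is flatter than a normal curve, so the result is negative.
echo excess_kurtosis_sketch([1, 2, 3, 4, 5]), "\n";
```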



method Math_Stats [line 176]

object Math_Stats Math_Stats( [optional $nullOption = STATS_REJECT_NULL])

Constructor for the class





Tags:

access:  public


Parameters:

int   $nullOption   how to handle null values (optional)


method max [line 451]

mixed max( )

Calculates the maximum of a data set. Handles cumulative data sets correctly.




Tags:

return:  the maximum value on success, a PEAR_Error object otherwise
see:  Base::min()
see:  Base::calc()
access:  public



method mean [line 624]

mixed mean( )

Calculates the mean (average) of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the mean value on success, a PEAR_Error object otherwise
see:  Base::count()
see:  Base::sum()
see:  Base::calc()
access:  public



method median [line 864]

mixed median( )

Calculates the median of a data set. The median is the value such that half of the points are below it in a sorted data set. If the number of values is odd, it is the middle item. If the number of values is even, it is the average of the two middle items. Handles cumulative data sets correctly.




Tags:

return:  the median value on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::count()
access:  public
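The odd and even cases described above can be sketched as follows. The function median_sketch() is hypothetical and works on a simple (non-cumulative) array only.

```php
<?php
// Hypothetical helper, not part of the class: median of a simple array.
function median_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    sort($data); // sort() also reindexes the array from 0
    $mid = intdiv($n, 2);
    if ($n % 2 === 1) {
        return (float) $data[$mid]; // odd count: the middle item
    }
    return ($data[$mid - 1] + $data[$mid]) / 2; // even count: average of the two middle items
}

echo median_sketch([3, 1, 2]), "\n";    // 2
echo median_sketch([4, 1, 3, 2]), "\n"; // 2.5
```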



method midrange [line 940]

mixed midrange( )

Calculates the midrange of a data set. The midrange is the average of the minimum and maximum of the data set. Handles cumulative data sets correctly.




Tags:

return:  the midrange value on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::max()
see:  Base::min()
access:  public



method min [line 427]

mixed min( )

Calculates the minimum of a data set. Handles cumulative data sets correctly.




Tags:

return:  the minimum value on success, a PEAR_Error object otherwise
see:  Base::max()
see:  Base::calc()
access:  public



method mode [line 900]

mixed mode( )

Calculates the mode of a data set. The mode is the value with the highest frequency in the data set. There can be more than one mode. Handles cumulative data sets correctly.




Tags:

return:  an array of mode values on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::frequency()
access:  public
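A minimal sketch of how a frequency table leads to one or more modes. The function name is hypothetical; values are cast to strings so that floats can serve as array keys, which is a detail of this sketch rather than something the document states about the class.

```php
<?php
// Hypothetical helper, not part of the class: all values with the top frequency.
function mode_sketch(array $data): array
{
    if (count($data) === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $freq = [];
    foreach ($data as $x) {
        $key = (string) $x; // string keys keep 4.1 and similar floats distinct
        $freq[$key] = ($freq[$key] ?? 0) + 1;
    }
    $top = max($freq);
    $modes = [];
    foreach ($freq as $key => $count) {
        if ($count === $top) {
            $modes[] = (float) $key;
        }
    }
    return $modes;
}

print_r(mode_sketch([1, 2, 2, 3, 3])); // two modes: 2 and 3
```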



method percentile [line 1389]

mixed percentile( numeric $p)

The pth percentile is the value such that p% of a sorted data set is smaller than it, and (100 - p)% of the data is larger. A quick algorithm to pick the appropriate value from a sorted data set is as follows:

  • Count the number of values: n
  • Calculate the position of the value in the data list: i = (p / 100) * (n + 1), with p given as a percentage
  • if i is an integer, return the data at that position
  • if i < 1, return the minimum of the data set
  • if i > n, return the maximum of the data set
  • otherwise, average the entries at adjacent positions to i
The median is the 50th percentile value.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::median()
see:  Base::quartiles()
todo:  need to double check generality of the algorithm
access:  public


Parameters:

numeric   $p   the percentile to estimate, e.g. 25 for 25th percentile
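The selection steps listed above can be sketched as follows. The function percentile_sketch() is hypothetical; it assumes p is given as a percentage (so the position is scaled by p / 100) and treats "adjacent positions" as the entries just below and just above i. The range checks run first so that a position of 0 never indexes the array.

```php
<?php
// Hypothetical helper, not part of the class: percentile by position i.
function percentile_sketch(array $data, float $p): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    sort($data);
    $i = ($p / 100) * ($n + 1); // 1-based position in the sorted list
    if ($i < 1) {
        return (float) $data[0];      // below the first entry: minimum
    }
    if ($i > $n) {
        return (float) $data[$n - 1]; // past the last entry: maximum
    }
    if (floor($i) == $i) {
        return (float) $data[(int) $i - 1]; // exact position
    }
    $left = (int) floor($i) - 1;
    $right = (int) ceil($i) - 1;
    return ($data[$left] + $data[$right]) / 2; // average the neighbours
}

echo percentile_sketch([1, 2, 3, 4, 5, 6, 7], 50), "\n"; // 4, the median
echo percentile_sketch([1, 2, 3, 4], 50), "\n";          // 2.5
```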


method product [line 546]

mixed product( )

Calculates PROD { (xi) } (the product of all observations). Handles cumulative data sets correctly.




Tags:

return:  the product on success, a PEAR_Error object otherwise
see:  Base::productN()
access:  public



method productN [line 567]

mixed productN( numeric $n)

Calculates PROD { (xi)^n }, which is the product of all observations, each raised to the power n. Handles cumulative data sets correctly.




Tags:

return:  the product on success, a PEAR_Error object otherwise
see:  Base::product()
access:  public


Parameters:

numeric   $n   the exponent


method quartileDeviation [line 1298]

mixed quartileDeviation( )

The quartile deviation is half of the interquartile range value:
 quart dev = (P(75) - P(25)) / 2
where P = percentile.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::interquartileRange()
see:  Base::quartiles()
access:  public



method quartiles [line 1199]

mixed quartiles( )

The quartiles are defined as the values that divide a sorted data set into four equal-sized subsets, and correspond to the 25th, 50th, and 75th percentiles.





Tags:

return:  an associative array of quartiles on success, a PEAR_Error otherwise
see:  Base::percentile()
access:  public



method quartileSkewnessCoefficient [line 1349]

mixed quartileSkewnessCoefficient( )

The quartile skewness coefficient (also known as Bowley skewness) is defined as follows:
 quart skewness coeff = (P(25) - 2*P(50) + P(75)) / (P(75) - P(25))
where P = percentile.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::quartiles()
todo:  need to double check the equation
access:  public



method quartileVariationCoefficient [line 1321]

mixed quartileVariationCoefficient( )

The quartile variation coefficient is defined as follows:
 quart var coeff = 100 * (P(75) - P(25)) / (P(75) + P(25))
where P = percentile.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::quartiles()
todo:  need to double check the equation
access:  public
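The quartile-based formulas in this reference (interquartile range, quartile deviation, quartile skewness coefficient and quartile variation coefficient) can be sketched together once the three quartiles are known. The function name and the array keys are hypothetical; the quartiles are passed in directly rather than computed.

```php
<?php
// Hypothetical helper, not part of the class. $q1, $q2, $q3 stand for
// P(25), P(50) and P(75).
function quartile_measures_sketch(float $q1, float $q2, float $q3): array
{
    $iqr = $q3 - $q1;
    if ($iqr == 0.0 || ($q3 + $q1) == 0.0) {
        throw new InvalidArgumentException('A denominator would be zero');
    }
    return [
        'interquartile_range' => $iqr,
        'quartile_deviation'  => $iqr / 2,
        'quartile_skewness'   => ($q1 - 2 * $q2 + $q3) / $iqr,
        'quartile_variation'  => 100 * $iqr / ($q3 + $q1),
    ];
}

print_r(quartile_measures_sketch(2, 4, 6));
```

With quartiles 2, 4 and 6 the median sits midway between the outer quartiles, so the skewness coefficient is 0.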



method range [line 645]

mixed range( )

Calculates the range of the data set, i.e. max - min.




Tags:

return:  the value of the range on success, a PEAR_Error object otherwise.
access:  public



method sampleCentralMoment [line 1040]

mixed sampleCentralMoment( integer $n)

Calculates the nth central moment (m{n}) of a data set. The definition of a sample central moment is:
 m{n} = 1/N * SUM { (xi - avg)^n }
where N = sample size and avg = sample mean.




Tags:

return:  the numeric value of the moment on success, PEAR_Error otherwise
access:  public


Parameters:

integer   $n   moment to calculate


method sampleRawMoment [line 1076]

mixed sampleRawMoment( integer $n)

Calculates the nth raw moment (m{n}) of a data set. The definition of a sample raw moment is:
 m{n} = 1/N * SUM { xi^n }
where N = sample size.




Tags:

return:  the numeric value of the moment on success, PEAR_Error otherwise
access:  public


Parameters:

integer   $n   moment to calculate
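The raw and central moment definitions in this reference differ only in whether the mean is subtracted before raising to the power n. A minimal sketch with hypothetical function names:

```php
<?php
// Hypothetical helpers, not part of the class.

// Raw moment: 1/N * SUM { xi^n }
function raw_moment_sketch(array $data, int $n): float
{
    $count = count($data);
    if ($count === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $sum = 0.0;
    foreach ($data as $x) {
        $sum += $x ** $n;
    }
    return $sum / $count;
}

// Central moment: 1/N * SUM { (xi - avg)^n }
function central_moment_sketch(array $data, int $n): float
{
    $count = count($data);
    if ($count === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $avg = array_sum($data) / $count;
    $sum = 0.0;
    foreach ($data as $x) {
        $sum += ($x - $avg) ** $n;
    }
    return $sum / $count;
}

echo raw_moment_sketch([1, 2, 3], 2), "\n";     // (1 + 4 + 9) / 3
echo central_moment_sketch([1, 2, 3], 2), "\n"; // (1 + 0 + 1) / 3
```

The first raw moment is the mean itself, and the first central moment is always 0.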


method setData [line 189]

mixed setData( array $arr, [optional $opt = STATS_DATA_SIMPLE])

Sets and verifies the data, checking for nulls and using the current null handling option.




Tags:

return:  true on success, a PEAR_Error object otherwise
access:  public


Parameters:

array   $arr   the data set
int   $opt   data format: STATS_DATA_CUMMULATIVE or STATS_DATA_SIMPLE (default); optional


method setNullOption [line 236]

mixed setNullOption( $nullOption)

Sets the null handling option. Must be called before assigning a new data set containing null values.




Tags:

return:  true on success, a PEAR_Error object otherwise
see:  Base::_validate()
access:  public


Parameters:

   $nullOption  
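A rough sketch of what the three null handling options could mean for a data set. This is an assumption pieced together from the option names and the class example, not the class's actual validation code; the function and the mode strings 'reject', 'ignore' and 'zero' are hypothetical stand-ins for STATS_REJECT_NULL, STATS_IGNORE_NULL and STATS_USE_NULL_AS_ZERO.

```php
<?php
// Hypothetical helper, not part of the class: filters non-numeric entries.
function apply_null_option_sketch(array $data, string $mode): array
{
    $clean = [];
    foreach ($data as $value) {
        if (is_numeric($value)) {
            $clean[] = $value + 0; // normalise numeric strings to numbers
            continue;
        }
        if ($mode === 'reject') {
            // Assumed behaviour: the class would report an error here.
            throw new InvalidArgumentException('Non-numeric value in data set');
        }
        if ($mode === 'zero') {
            $clean[] = 0; // treat the entry as zero
        }
        // 'ignore': drop the entry entirely
    }
    return $clean;
}

$data3 = array(1.2, 'foo', 2.4, 3.1, 4.2, 3.2, null, 5.1, 6.2);
echo count(apply_null_option_sketch($data3, 'ignore')), "\n"; // 7 numeric entries remain
echo count(apply_null_option_sketch($data3, 'zero')), "\n";   // all 9 entries kept
```

The two options give different statistics for the same input because the sample size changes, which is why the option has to be chosen before the data is assigned.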


method skewness [line 795]

mixed skewness( )

Calculates the skewness of the data distribution in the set. The skewness measures the degree of asymmetry of a distribution, and is related to the third central moment of a distribution.

  • A normal distribution has a skewness = 0
  • A distribution with a tail off towards the high end of the scale (positive skew) has a skewness > 0
  • A distribution with a tail off towards the low end of the scale (negative skew) has a skewness < 0

Handles cumulative data sets correctly.




Tags:

return:  the skewness value on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::stDev()
see:  Base::count()
see:  Base::__sumdiff()
access:  public
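A minimal sketch of one common definition of skewness, m3 / m2^(3/2), built from population central moments. The function name is hypothetical, and the class may normalize differently (for example by using the unbiased standard deviation), so only the sign behaviour described above should be relied on.

```php
<?php
// Hypothetical helper, not part of the class: skewness m3 / m2^(3/2).
function skewness_sketch(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('Data set is empty');
    }
    $mean = array_sum($data) / $n;
    $m2 = 0.0;
    $m3 = 0.0;
    foreach ($data as $x) {
        $d = $x - $mean;
        $m2 += $d ** 2; // accumulates the second central moment
        $m3 += $d ** 3; // accumulates the third central moment
    }
    $m2 /= $n;
    $m3 /= $n;
    if ($m2 == 0.0) {
        throw new InvalidArgumentException('Variance is zero');
    }
    return $m3 / pow($m2, 1.5);
}

echo skewness_sketch([1, 2, 3, 4, 5]), "\n"; // symmetric data: 0
echo skewness_sketch([1, 1, 1, 5]), "\n";    // tail towards high values: positive
```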



method stdErrorOfMean [line 1146]

mixed stdErrorOfMean( )

Calculates the standard error of the mean. It is the standard deviation of the sampling distribution of the mean. The formula is:
 S.E. Mean = SD / (N)^(1/2)
This formula does not assume a normal distribution, and shows that the size of the standard error of the mean is inversely proportional to the square root of the sample size.




Tags:

return:  the standard error of the mean on success, a PEAR_Error object otherwise
see:  Base::calc()
see:  Base::count()
see:  Base::stDev()
access:  public
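A minimal sketch of the formula S.E. Mean = SD / (N)^(1/2). The function name is hypothetical, and SD is taken to be the unbiased (N - 1) standard deviation, which is an assumption based on how stDev() is described in this document.

```php
<?php
// Hypothetical helper, not part of the class: SD / sqrt(N).
function std_error_of_mean_sketch(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points');
    }
    $mean = array_sum($data) / $n;
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    $sd = sqrt($sumSq / ($n - 1)); // unbiased standard deviation (assumed)
    return $sd / sqrt($n);
}

echo std_error_of_mean_sketch([1, 2, 3]), "\n"; // SD is 1, so this is 1 / sqrt(3)
```

Quadrupling the sample size halves the standard error when the spread of the data stays the same.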



method stDev [line 691]

mixed stDev( )

Calculates the standard deviation (unbiased) of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the standard deviation on success, a PEAR_Error object otherwise
see:  Base::variance()
see:  Base::calc()
access:  public



method stDevWithMean [line 731]

mixed stDevWithMean( numeric $mean)

Calculates the standard deviation (unbiased) of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc(). Handles cumulative data sets correctly.




Tags:

return:  the standard deviation on success, a PEAR_Error object otherwise
see:  Base::stDev()
see:  Base::varianceWithMean()
access:  public


Parameters:

numeric   $mean   the fixed mean value


method studentize [line 259]

mixed studentize( )

Transforms the data by subtracting the mean from each entry and dividing by the standard deviation. This will reset all pre-calculated values to their original (unset) defaults.




Tags:

return:  true on success, a PEAR_Error object otherwise
see:  Base::setData()
see:  Base::stDev()
see:  Base::mean()
access:  public
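A minimal sketch of the studentize transformation, (xi - mean) / stdev, returning a new array instead of modifying stored data. The function name is hypothetical, and the unbiased (N - 1) standard deviation is assumed.

```php
<?php
// Hypothetical helper, not part of the class: (xi - mean) / stdev per entry.
function studentize_sketch(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points');
    }
    $mean = array_sum($data) / $n;
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2;
    }
    $stdev = sqrt($sumSq / ($n - 1)); // unbiased standard deviation (assumed)
    if ($stdev == 0.0) {
        throw new InvalidArgumentException('Standard deviation is zero');
    }
    return array_map(
        function ($x) use ($mean, $stdev) {
            return ($x - $mean) / $stdev;
        },
        $data
    );
}

print_r(studentize_sketch([1, 2, 3])); // -1, 0, 1
```

Dropping the division gives the plain centering transformation, which shifts the data to a mean of 0 without rescaling it.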



method sum [line 476]

mixed sum( )

Calculates SUM { xi }. Handles cumulative data sets correctly.




Tags:

return:  the sum on success, a PEAR_Error object otherwise
see:  Base::sumN()
see:  Base::sum2()
see:  Base::calc()
access:  public



method sum2 [line 498]

mixed sum2( )

Calculates SUM { (xi)^2 }. Handles cumulative data sets correctly.




Tags:

return:  the sum on success, a PEAR_Error object otherwise
see:  Base::sumN()
see:  Base::sum()
see:  Base::calc()
access:  public



method sumN [line 521]

mixed sumN( numeric $n)

Calculates SUM { (xi)^n }. Handles cumulative data sets correctly.




Tags:

return:  the sum on success, a PEAR_Error object otherwise
see:  Base::sum2()
see:  Base::sum()
see:  Base::calc()
access:  public


Parameters:

numeric   $n   the exponent


method variance [line 671]

mixed variance( )

Calculates the variance (unbiased) of the data points in the set. Handles cumulative data sets correctly.




Tags:

return:  the variance value on success, a PEAR_Error object otherwise
see:  Base::count()
see:  Base::__sumdiff()
see:  Base::calc()
access:  public



method varianceWithMean [line 715]

mixed varianceWithMean( numeric $mean)

Calculates the variance (unbiased) of the data points in the set given a fixed mean (average) value. Not used in calcBasic(), calcFull() or calc(). Handles cumulative data sets correctly.




Tags:

return:  the variance on success, a PEAR_Error object otherwise
see:  Base::variance()
see:  Base::count()
see:  Base::__sumdiff()
access:  public


Parameters:

numeric   $mean   the fixed mean value
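A minimal sketch of the unbiased variance, SUM { (xi - mean)^2 } / (N - 1), with an optional fixed mean. The function names are hypothetical, and keeping the N - 1 denominator in the fixed-mean case is an assumption drawn from the "(unbiased)" wording above.

```php
<?php
// Hypothetical helpers, not part of the class.
function variance_sketch(array $data, ?float $mean = null): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points');
    }
    if ($mean === null) {
        $mean = array_sum($data) / $n; // fall back to the data set mean
    }
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) ** 2; // SUM { (xi - mean)^2 }
    }
    return $sumSq / ($n - 1);
}

// The standard deviation is the square root of the variance.
function st_dev_sketch(array $data, ?float $mean = null): float
{
    return sqrt(variance_sketch($data, $mean));
}

echo variance_sketch([1, 2, 3]), "\n";      // uses the data mean 2
echo variance_sketch([1, 2, 3], 0.0), "\n"; // uses the fixed mean 0
```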


method _validate [line 1562]

mixed _validate( )

Utility function to validate the data and modify it according to the current null handling option.




Tags:

return:  true on success, a PEAR_Error object otherwise
see:  Base::setData()
access:  private



method __calcAbsoluteDeviation [line 1488]

mixed __calcAbsoluteDeviation( [$mean $mean = null])

Utility function to calculate the absolute deviation with or without a fixed mean.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::absDevWithMean()
see:  Base::absDev()
access:  private


Parameters:

$mean   $mean   the fixed mean to use, null as default


method __calcVariance [line 1460]

mixed __calcVariance( [$mean $mean = null])

Utility function to calculate the variance with or without a fixed mean.




Tags:

return:  a numeric value on success, a PEAR_Error otherwise
see:  Base::varianceWithMean()
see:  Base::variance()
access:  private


Parameters:

$mean   $mean   the fixed mean to use, null as default


method __format [line 1545]

mixed __format( mixed $v, [ $useErrorObject = true], boolean $returnErrorObject)

Utility function to format a PEAR_Error to be used by calc(), calcBasic() and calcFull().




Tags:

return:  if the value is a PEAR_Error object, and $useErrorObject is false, then a string with the error message will be returned, otherwise the value will not be modified and returned as passed.
access:  private


Parameters:

mixed   $v   value to be formatted
boolean   $returnErrorObject   whether to return the raw PEAR_Error object (true, the default) or only the error message (false)
   $useErrorObject  


method __sumabsdev [line 1513]

mixed __sumabsdev( [optional $mean = null])

Utility function to calculate: SUM { | xi - mean | }





Tags:

return:  the sum on success, a PEAR_Error object otherwise
see:  Base::absDevWithMean()
see:  Base::absDev()
access:  private


Parameters:

double   $mean   the mean value for the set or population (optional)


method __sumdiff [line 1428]

mixed __sumdiff( numeric $power, [optional $mean = null])

Utility function to calculate: SUM { (xi - mean)^n }





Tags:

return:  the sum on success, a PEAR_Error object otherwise
see:  Base::kurtosis()
see:  Base::skewness()
see:  Base::varianceWithMean()
see:  Base::stDev()
access:  private


Parameters:

numeric   $power   the exponent
double   $mean   the data set mean value (optional)



Documentation generated on Mon, 05 Jan 2009 20:38:33 +0100 by phpDocumentor 1.4.1