(maxima.info)Introduction to descriptive


Next: Functions and Variables for data manipulation Prev: descriptive-pkg Up: descriptive-pkg
Enter node , (file) or (file)node

50.1 Introduction to descriptive
================================

Package 'descriptive' contains a set of functions for making descriptive
statistical computations and graphing.  Together with the source code
there are three data sets in your Maxima tree: 'pidigits.data',
'wind.data' and 'biomed.data'.

   Any statistics manual can be used as a reference to the functions in
package 'descriptive'.

   For comments, bugs or suggestions, please contact me at <'riotorto AT
yahoo DOT com'>.

   Here is a simple example on how the descriptive functions in
'descriptive' do they work, depending on the nature of their arguments,
lists or matrices,

     (%i1) load ("descriptive")$
     (%i2) /* univariate sample */   mean ([a, b, c]);
                                 c + b + a
     (%o2)                       ---------
                                     3
     (%i3) matrix ([a, b], [c, d], [e, f]);
                                 [ a  b ]
                                 [      ]
     (%o3)                       [ c  d ]
                                 [      ]
                                 [ e  f ]
     (%i4) /* multivariate sample */ mean (%);
                           e + c + a  f + d + b
     (%o4)                [---------, ---------]
                               3          3

   Note that in multivariate samples the mean is calculated for each
column.

   In case of several samples with possible different sizes, the Maxima
function 'map' can be used to get the desired results for each sample,

     (%i1) load ("descriptive")$
     (%i2) map (mean, [[a, b, c], [d, e]]);
                             c + b + a  e + d
     (%o2)                  [---------, -----]
                                 3        2

   In this case, two samples of sizes 3 and 2 were stored into a list.

   Univariate samples must be stored in lists like

     (%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
     (%o1)           [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

   and multivariate samples in matrices as in

     (%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
                  [10.58, 6.63], [13.33, 13.25], [13.21,  8.12]);
                             [ 13.17  9.29  ]
                             [              ]
                             [ 14.71  16.88 ]
                             [              ]
                             [ 18.5   16.88 ]
     (%o1)                   [              ]
                             [ 10.58  6.63  ]
                             [              ]
                             [ 13.33  13.25 ]
                             [              ]
                             [ 13.21  8.12  ]

   In this case, the number of columns equals the random variable
dimension and the number of rows is the sample size.

   Data can be introduced by hand, but big samples are usually stored in
plain text files.  For example, file 'pidigits.data' contains the first
100 digits of number '%pi':
           3
           1
           4
           1
           5
           9
           2
           6
           5
           3 ...

   In order to load these digits in Maxima,

     (%i1) s1 : read_list (file_search ("pidigits.data"))$
     (%i2) length (s1);
     (%o2)                          100

   On the other hand, file 'wind.data' contains daily average wind
speeds at 5 meteorological stations in the Republic of Ireland (This is
part of a data set taken at 12 meteorological stations.  The original
file is freely downloadable from the StatLib Data Repository and its
analysis is discused in Haslett, J., Raftery, A. E. (1989) <Space-time
Modelling with Long-memory Dependence: Assessing Ireland's Wind Power
Resource, with Discussion>.  Applied Statistics 38, 1-50).  This loads
the data:

     (%i1) s2 : read_matrix (file_search ("wind.data"))$
     (%i2) length (s2);
     (%o2)                          100
     (%i3) s2 [%]; /* last record */
     (%o3)            [3.58, 6.0, 4.58, 7.62, 11.25]

   Some samples contain non numeric data.  As an example, file
'biomed.data' (which is part of another bigger one downloaded from the
StatLib Data Repository) contains four blood measures taken from two
groups of patients, 'A' and 'B', of different ages,

     (%i1) s3 : read_matrix (file_search ("biomed.data"))$
     (%i2) length (s3);
     (%o2)                          100
     (%i3) s3 [1]; /* first record */
     (%o3)            [A, 30, 167.0, 89.0, 25.6, 364]

   The first individual belongs to group 'A', is 30 years old and
his/her blood measures were 167.0, 89.0, 25.6 and 364.

   One must take care when working with categorical data.  In the next
example, symbol 'a' is assigned a value in some previous moment and then
a sample with categorical value 'a' is taken,

     (%i1) a : 1$
     (%i2) matrix ([a, 3], [b, 5]);
                                 [ 1  3 ]
     (%o2)                       [      ]
                                 [ b  5 ]


automatically generated by info2www version 1.2.2.9