In this article, seven statistical tests are explained which are essential for doing statistical analysis in a CMMI High Maturity (HM) compliant project or organization.
1 Stability test
Data stability for the selected parameter is tested using Minitab before deriving performance baselines.
- Go to Stat->Control Charts->I-MR (in Variables, enter the column containing the parameter)
- In the Estimate section, choose 'Average Moving Range' as the method of estimating sigma and '2' as the moving range length
- In the Tests section, choose the specific tests to perform
- In the Options section, enter '1 2 3' as the sigma limit positions.
After eliminating all the out-of-control points that have assignable causes, the process attains stability and the data is ready for baselining.
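The calculation Minitab performs behind these steps can be sketched in plain Python. This is a minimal sketch, not the full I-MR chart: the data values are illustrative, and 1.128 is the standard d2 control-chart constant for moving ranges of length 2.

```python
# Illustrative parameter data, in chronological order.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.6, 12.2, 11.7, 12.3, 12.0]

# Moving ranges of length 2: absolute difference of consecutive points.
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)
mean = sum(data) / len(data)

# Sigma estimated from the average moving range: sigma = MR-bar / d2,
# with d2 = 1.128 for moving ranges of length 2.
sigma = mr_bar / 1.128

ucl = mean + 3 * sigma  # upper control limit (3-sigma position)
lcl = mean - 3 * sigma  # lower control limit (3-sigma position)

# The most basic stability rule: any point beyond 3 sigma is out of control.
out_of_control = [x for x in data if x > ucl or x < lcl]
```

With this illustrative data set no point falls outside the 3-sigma limits, so the process would be judged stable by this first rule; the additional run rules (the Tests section in Minitab) would still need to be checked.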
2 Capability test
Once the selected parameter is baselined, its capability to meet the specification limits is tested.
- Go to Stat->Quality Tools->Capability Sixpack (Normal); choose 'Single Column' and, in Variables, enter the column containing the parameter; enter '1' as the subgroup size; enter the lower and upper spec limits
- In the Estimate section, choose 'Average Moving Range' as the method of estimating sigma and '2' as the moving range length
- In the Options section, enter '6' as the sigma tolerance, choose 'Within subgroup analysis' and 'Percents', and opt to display the graph
If the control limits lie within the specification limits, or the Cp and Cpk values are greater than or equal to one, the process is found to be capable.
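The Cp/Cpk criterion can be sketched as follows. This assumes the same within-subgroup sigma estimate as above (average moving range divided by d2 = 1.128); the data and spec limits are illustrative.

```python
import statistics

# Illustrative parameter data and specification limits.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.6, 12.2, 11.7, 12.3, 12.0]
lsl, usl = 10.0, 14.0  # lower and upper specification limits

# Within-subgroup sigma from the average moving range (length 2, d2 = 1.128).
mr_bar = sum(abs(b - a) for a, b in zip(data, data[1:])) / (len(data) - 1)
sigma = mr_bar / 1.128
mean = statistics.fmean(data)

# Cp compares the spec width to the 6-sigma process spread;
# Cpk additionally penalizes off-center processes.
cp = (usl - lsl) / (6 * sigma)
cpk = min(usl - mean, mean - lsl) / (3 * sigma)

capable = cp >= 1 and cpk >= 1
```

Note that Cpk is always less than or equal to Cp; the two are equal only when the process mean sits exactly midway between the spec limits.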
3 Correlation test
Correlation test will be conducted between each independent parameter and the dependent parameter (if both are of continuous data type) in the Process Performance Model.
- Go to Stat->Basic Statistics->Correlation (opt to display p-values)
For each correlation test, the p-value has to be less than 0.05 (or the p-value threshold decided within the organization based on risk analysis).
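The same test can be run outside Minitab with SciPy's Pearson correlation. This is a sketch with illustrative data; the parameter names (review effort, defect density) are hypothetical examples of a continuous independent and dependent parameter.

```python
from scipy.stats import pearsonr

# Hypothetical continuous parameters from a Process Performance Model.
review_effort = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 1.5, 4.5]   # independent
defect_density = [5.1, 3.2, 6.8, 2.9, 4.4, 3.8, 6.1, 2.2]  # dependent

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = pearsonr(review_effort, defect_density)

# Retain the parameter in the model only if the test is significant.
significant = p_value < 0.05
```

Here the strong negative correlation (more review effort, lower defect density) yields a p-value well below 0.05, so this parameter would pass the correlation test.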
4 Regression test
Regression test will be conducted including all the independent parameters and the dependent parameter in the Process Performance Model.
- Go to Stat->Regression->Regression (in Response and Predictors, enter the columns containing the dependent and independent parameters respectively)
- In the Storage section, include Residuals as well
- The p-value has to be less than 0.05 for each factor as well as for the regression equation obtained (or the p-value threshold decided within the organization based on risk analysis).
- R-Sq (adj) has to be greater than 70% (or the value decided within the organization based on risk analysis) to ensure the correlation between the independent parameters and the dependent parameter. Otherwise, the parameter cannot be taken.
- The Variance Inflation Factor (VIF) has to be less than 10. If VIF is greater than 10, a correlation test (Stat->Basic Statistics->Correlation) is conducted among the different parameters influencing the Process Performance Model. Where correlation is high (i.e. greater than 0.5 or less than -0.5), the factors have a dependency; if the degree of correlation is very high, one of the factors should be dropped or re-examined for new terms.
5 Normality test
Normality of the data is tested using the Anderson-Darling test.
- Go to Stat > Basic Statistics > Normality Test and choose the Anderson-Darling test
- In Variables, enter the columns containing the measurement data.
For the data to be normally distributed, the null hypothesis cannot be rejected. For this, the p-value has to be greater than 0.05 (or the threshold decided within the organization based on risk analysis) and the A² (Anderson-Darling) statistic has to be less than 0.757.
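The Anderson-Darling test is also available in SciPy. One caveat: unlike Minitab, `scipy.stats.anderson` does not report a p-value; instead it returns the A² statistic alongside critical values at fixed significance levels, and normality is not rejected if the statistic stays below the chosen critical value. The data here is illustrative, drawn from a normal distribution.

```python
import numpy as np
from scipy.stats import anderson

# Illustrative data: 30 points drawn from a normal distribution.
rng = np.random.default_rng(seed=7)
data = rng.normal(loc=20.0, scale=2.0, size=30)

result = anderson(data, dist='norm')
ad_stat = result.statistic

# critical_values pairs with significance_level (15%, 10%, 5%, 2.5%, 1%);
# pick out the 5% critical value for the usual threshold.
crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]

# Normality is not rejected if the A^2 statistic is below the critical value.
normal_at_5pct = ad_stat < crit_5pct
```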
6 Test for Two Variances
The Test for Two Variances is conducted to analyse whether the variances of two sets of data are significantly different.
The null hypothesis (the two samples have equal variance) is tested against the alternate hypothesis (the two samples have unequal variance).
- Go to Stat > Basic Statistics > 2 Variances.
- Opt ‘Samples in Different Columns’. In Variables, enter the columns containing the measurement data.
Results: if the test’s p-value is less than the chosen significance level (normally 0.05), the null hypothesis is rejected.
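For normally distributed data, the classical form of this test is the F test on the ratio of sample variances, which can be sketched as follows; the data values are illustrative.

```python
from scipy.stats import f as f_dist

# Illustrative samples: the second deliberately has a larger spread.
baseline = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7]
current = [4.0, 5.2, 3.1, 4.8, 2.9, 5.5, 3.3, 4.9]

def sample_var(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# F statistic: ratio of the two sample variances.
f_stat = sample_var(baseline) / sample_var(current)
df1, df2 = len(baseline) - 1, len(current) - 1

# Two-sided p-value: double the smaller tail probability.
p_value = 2 * min(f_dist.cdf(f_stat, df1, df2), f_dist.sf(f_stat, df1, df2))

equal_variances = p_value >= 0.05  # fail to reject the null hypothesis
```

With this data the variance of the current sample is much larger, the p-value falls below 0.05, and the null hypothesis of equal variances is rejected.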
7 Two sample T test
The Two sample T test is used to check whether the means of two groups of data, taken from two different periods, are significantly different.
The null hypothesis is tested against one of the alternative hypotheses, depending on the situation.
- Go to Stat > Basic Statistics > 2 Sample T
- Opt ‘Samples in Different Columns’. In Variables, enter the columns containing the measurement data (the first should be the initial data and the second should be the current data).
- Check or uncheck the box for “Assume equal variances” depending upon the F test results (Two variance Test results)
- In the Options, use the required alternative, whether ‘not equal’, ‘less than’ or ‘greater than’.
- Enter ‘0’ as the test difference and ‘95’ as the confidence level.
If the test’s p-value is less than the chosen significance level (normally 0.05), the null hypothesis is rejected.
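The steps above map directly onto SciPy's two-sample t test: `equal_var` mirrors the 'Assume equal variances' checkbox (set from the two-variances test result), and `alternative` mirrors the 'not equal' / 'less than' / 'greater than' options. The data values are illustrative.

```python
from scipy.stats import ttest_ind

# Initial (baseline) data first, current data second, as described above.
initial = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7]
current = [5.0, 5.2, 4.6, 5.4, 4.8, 5.1, 4.9, 5.3]

# equal_var=True assumes the two-variances test did not reject equality;
# 'two-sided' corresponds to the 'not equal' alternative.
t_stat, p_value = ttest_ind(initial, current, equal_var=True,
                            alternative='two-sided')

means_differ = p_value < 0.05  # reject the null hypothesis of equal means
```

Here the current data's mean is clearly higher than the initial data's, so the null hypothesis of equal means is rejected.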
Baselines are derived statistically using performance data collected over a period of time. They are indicators of the current performance of an organization. Hence proper attention must be paid while deriving baselines, as an error can even cause the loss of business. There are some critical but common mistakes observed in the baselining process, as explained below, and careful steps must be taken to avoid them.
An organization must plan and define measures that are tangible indicators of process performance. Baselining does not simply mean gathering and baselining the entire set of data available in the organization. Based on the business objectives, the critical processes whose performance needs to be analyzed are selected; then process parameters for monitoring them are defined, data is collected, and finally baselining is done. There is no harm in collecting and baselining every parameter defined in the organization, but why waste time collecting data that won't be used?
For baselining with control charts, it is essential that the data be chronological. Hence, during data collection itself, the timestamp of each data point must be noted.
In the software industry, we often hear complaints from the baselining team regarding a deficiency of data points. And when the question is put to the project team, they say things like “we just don’t have time” or “it is too difficult”. To derive baselines there needs to be a minimum number of data points, say 10 or so; only then can all four stability rules be applied to the data. But people sometimes try to build baselines with 8 or fewer data points, which will not indicate the correct performance level of the process under investigation. In such cases, where the number of data points is insufficient, baselining needs to be postponed, or the organization can plan to collect more samples by increasing the frequency of data collection.
While collecting as well as baselining data, one must use consistent methods and processes. What is measured in the post-baseline data needs to be the same as what was measured in the baseline data collection process.
Data taken for baselining needs to be homogeneous; otherwise the baselining output won't give a correct indication of process performance. The data can be categorized based on qualitative parameters like type of project, complexity of the work, nature of development, programming language, etc., instead of clubbing it all together and thereby losing homogeneity.
Usually it is a common mistake to take data blindly from organizational database and start the baselining process. Essentially, data must be verified to ensure its completeness, correctness and consistency before any statistical processing.
Processes that permit self-selection by respondents do not produce random samples and are often not representative of the target population. To obtain a random, representative sample, the sampling process itself must be designed to ensure randomness and representativeness.
People have a tendency to believe that the collected data follows a normal distribution; sometimes they don't even check the normality statistically. In other cases, even after the data is found to be non-normal statistically, people try to make it normal by removing some data points. It is reasonable to remove one or two points out of 15 to 20 if there are assignable reasons; beyond that, it is not good practice to simply remove data points in order to make the distribution normal. It is essential to check the actual distribution of the data before going ahead with baselining, since control charts work on a normal data set only. One can check the distribution visually using histograms and confirm it statistically using other tools (there are plenty of Excel add-ins to check the distribution).
Suppose an organization does yearly baselining. At the start of 2013, baselines were derived using data points from the previous year, 2012. Objectives were set to 'maintain the current process performance', with no higher targets, and hence no improvement initiatives were triggered to raise the performance level. The next year, data points from 2013 were collected for baselining, and it was confirmed statistically, say by the results of a 2 sample T test, that both sets of data (the 2012 and 2013 points) were equal. Now which data set should the organization take for the 2014 baselining? It is a common mistake to ignore the 2012 data and baseline with the 2013 data points alone. Since both sets of data points are similar and statistically equal, they must be combined in chronological order while baselining.
The null hypothesis is rejected if the p-value is less than the significance level. In the industry, the significance level is usually taken as 0.05, but it is in fact an arbitrary choice: the higher the significance level, the greater the risk of rejecting a null hypothesis that is actually true. (Refer to the blog post on hypothesis testing for more details on p-values.) It is up to the organization to decide the significance level.
Out-of-control points cannot be removed if there are no assignable reasons behind them. If there is no reason for an out-of-control point, it implies that the data is not stable and one cannot go ahead with baselining.
Sometimes the control limits derived statistically during the baselining process may be unworkable. For example, a baseline of review effectiveness data (in %) cannot have an upper control limit (UCL) of 120%, even though it is statistically correct. Similarly, a coding speed baseline cannot have a lower control limit (LCL) of -15 lines of code/hr. All such values are unusable, so an organization needs a policy to handle these situations. For example, it can use the 25th and 75th percentiles of the stable data as control limits in such scenarios, or it can decide to change the LCL/UCL to the minimum/maximum permissible value of that parameter, i.e. change the LCL of coding speed to zero instead of a negative value and the UCL of review effectiveness to 100% in the above examples.
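The second policy, replacing out-of-range limits with the permissible extremes, can be sketched as a small helper; the numeric values below correspond to the two examples given and are illustrative.

```python
# Policy sketch: clamp statistically derived control limits to the
# physically permissible range of the parameter.
def clamp_limits(lcl, ucl, min_allowed, max_allowed):
    """Replace out-of-range control limits with the permissible extremes."""
    return max(lcl, min_allowed), min(ucl, max_allowed)

# Review effectiveness (%): a statistical UCL of 120% is capped at 100%.
re_lcl, re_ucl = clamp_limits(64.0, 120.0, 0.0, 100.0)

# Coding speed (LOC/hr): a statistical LCL of -15 is raised to 0.
cs_lcl, cs_ucl = clamp_limits(-15.0, 85.0, 0.0, float('inf'))
```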
Recording contextual information ensures a consistent understanding of the results of the measurement process. Contextual information refers to additional data about the environment in which a process is executed; as part of it, the timestamp, context, measurement units, etc. are collected.
Nowadays, computer software supports a wide range of graphs, and people try to use them all at once, finally hiding the real message or making it complex. One must select the right graph to communicate the processed data. Run charts, pie charts, control charts and bar charts are all good means of communication, but the best fit must be chosen.
One must determine in advance how the processed data is going to be used. This helps in making good choices about what data to collect (never waste time collecting data that won't be used) and what tool to use. One must also plan to measure everything needed to calculate the effect of a change; it is usually too late to go back and correct things if something is left out.