Baselines are derived statistically from performance data collected over a period of time. They indicate the current performance of an organization's processes, so deriving them deserves careful attention: an error can even lead to a loss of business. Some critical but common mistakes observed in the baselining process are explained below, and deliberate steps must be taken to avoid them.
The organization must plan and define measures that are tangible indicators of process performance. Baselining does not mean gathering and baselining every piece of data available in the organization. Based on the business objectives, the critical processes whose performance needs to be analyzed are selected first. Then the process parameters for monitoring them are defined, the data is collected, and finally the baselines are derived. There is no harm in collecting and baselining every parameter defined in the organization, but there is little point in spending time collecting data that will never be used.
For baselining with control charts, the data must be in chronological order. Hence the time stamp of each data point must be recorded during data collection itself.
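A minimal sketch of putting data into time order before charting is shown below. It assumes records stored as dictionaries with hypothetical `timestamp` and `value` fields. Records without a time stamp are set aside rather than guessed, because they cannot be placed on a control chart.

```python
from datetime import datetime

# Hypothetical measurement records; field names are illustrative only.
records = [
    {"timestamp": "2013-03-05", "value": 42.0},
    {"timestamp": "2013-01-12", "value": 38.5},
    {"timestamp": None, "value": 40.1},  # time stamp was never recorded
    {"timestamp": "2013-02-20", "value": 45.2},
]

def chronological_values(records):
    """Return the values in time order, plus the records that lack a time stamp."""
    dated = [r for r in records if r["timestamp"]]
    undated = [r for r in records if not r["timestamp"]]
    dated.sort(key=lambda r: datetime.strptime(r["timestamp"], "%Y-%m-%d"))
    return [r["value"] for r in dated], undated

values, undated = chronological_values(records)
print(values)        # [38.5, 45.2, 42.0]
print(len(undated))  # 1
```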
In the software industry, baselining teams often complain about a shortage of data points. When the question is put to the project teams, they reply “we just don’t have time” or “it is too difficult”. Deriving baselines requires a minimum number of data points, say around 10. Only then can all four rules of stability be applied to the data. Yet in the software industry people try to build baselines with 8 or fewer data points, and such baselines won’t indicate the true performance level of the process under investigation. Where the number of data points is insufficient, baselining needs to be postponed. Alternatively, the organization can plan to collect more samples by increasing the frequency of data collection.
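The sketch below shows a minimum-points gate followed by a stability check. The text does not name its four rules. This sketch assumes the four Western Electric zone rules, which are a common choice. It also assumes sigma is estimated from the average moving range, as on an individuals (XmR) chart. The threshold `MIN_POINTS = 10` comes from the text; everything else is illustrative. Rule 4 needs 8 consecutive points on one side of the centre line, which shows why 8 or fewer points cannot support all four rules.

```python
from statistics import mean

MIN_POINTS = 10  # minimum suggested in the text; set by organizational policy

def stability_violations(data):
    """Check individual values against four zone rules (Western Electric style).

    Sigma is estimated from the average moving range (MR-bar / 1.128), as on an
    individuals (XmR) chart. Returns (rule, index) pairs, where index is the last
    point of the offending pattern. An empty list means no rule fired.
    """
    if len(data) < MIN_POINTS:
        raise ValueError(f"only {len(data)} points: postpone baselining")
    centre = mean(data)
    mr_bar = mean(abs(b - a) for a, b in zip(data, data[1:]))
    sigma = mr_bar / 1.128
    z = [(x - centre) / sigma for x in data]  # distance from centre in sigma units
    hits = []
    for i in range(len(z)):
        if abs(z[i]) > 3:  # Rule 1: one point beyond 3 sigma
            hits.append((1, i))
        # Rules 2-4 as (rule, window size, points needed, sigma threshold), same side:
        # 2 of 3 beyond 2 sigma, 4 of 5 beyond 1 sigma, 8 in a row beyond the centre.
        for rule, window, need, limit in ((2, 3, 2, 2), (3, 5, 4, 1), (4, 8, 8, 0)):
            if i + 1 < window:
                continue
            recent = z[i + 1 - window:i + 1]
            for side in (1, -1):
                if sum(1 for v in recent if side * v > limit) >= need:
                    hits.append((rule, i))
    return hits

stable = [10, 11, 9, 10, 12, 10, 9, 11, 10, 11]
shifted = [10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12]
print(stability_violations(stable))  # []
print(stability_violations(shifted))
```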
Consistent methods and processes must be used both while collecting and while baselining data. What is measured after the baseline is set must be the same as what was measured during the baseline data collection.
Data taken for baselining needs to be homogeneous; otherwise the baselining output won’t correctly indicate process performance. Do not club all the data together, which destroys homogeneity. Instead, categorize it by qualitative parameters such as type of project, complexity of the work, nature of development and programming language.
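The sketch below stratifies data before baselining. It assumes hypothetical records of the form (project type, programming language, coding speed), and groups them so that each stratum can be baselined separately. A design note: each stratum still needs enough points on its own (around 10, per the earlier mistake) before it can be baselined.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (project type, programming language, coding speed in LOC/hour).
records = [
    ("development", "Java", 12.0),
    ("maintenance", "COBOL", 4.5),
    ("development", "Java", 14.0),
    ("maintenance", "COBOL", 5.0),
    ("development", "Python", 20.0),
]

def stratify(records):
    """Group values by (project type, language) so each baseline uses like-for-like data."""
    groups = defaultdict(list)
    for project_type, language, value in records:
        groups[(project_type, language)].append(value)
    return dict(groups)

groups = stratify(records)
for key, values in groups.items():
    print(key, f"n={len(values)}", f"mean={mean(values):.2f}")
```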
A common mistake is to take data blindly from the organizational database and start the baselining process. Data must first be verified for completeness, correctness and consistency before any statistical processing.
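Below is a minimal sketch of the three checks applied to each record before any statistics are computed. The schema is entirely hypothetical: the field names in `REQUIRED_FIELDS`, the expected unit `"hours"` and the range rule for effort. An organization would substitute its own data dictionary.

```python
import math

REQUIRED_FIELDS = ("project", "timestamp", "effort_hours", "unit")  # hypothetical schema
EXPECTED_UNIT = "hours"

def validate(record):
    """Return the problems found in one record; an empty list means it passed."""
    problems = []
    # Completeness: every required field is present and non-empty.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")
    # Correctness: effort must be a finite, non-negative number.
    effort = record.get("effort_hours")
    if effort not in (None, ""):
        if not isinstance(effort, (int, float)):
            problems.append("effort_hours is not numeric")
        elif not math.isfinite(effort) or effort < 0:
            problems.append("effort_hours out of range")
    # Consistency: every record must use the same unit.
    unit = record.get("unit")
    if unit not in (None, "", EXPECTED_UNIT):
        problems.append(f"unexpected unit {unit!r}")
    return problems

records = [
    {"project": "P1", "timestamp": "2013-01-12", "effort_hours": 7.5, "unit": "hours"},
    {"project": "P2", "timestamp": "", "effort_hours": -3, "unit": "days"},
]
clean = [r for r in records if not validate(r)]  # only verified records go forward
for r in records:
    print(r["project"], validate(r))
```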
Processes that permit self-selection by respondents don’t produce random samples, and the results often aren’t representative of the target population. To obtain a random, representative sample, the selection itself must be random. For example, draw from the full list of eligible projects rather than relying on whoever volunteers data.
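A minimal sketch of analyst-driven simple random sampling is shown below. It assumes the full list of eligible projects is available; the project IDs are hypothetical. Every project has the same chance of selection, which avoids the bias of letting respondents choose themselves. The fixed `seed` only makes the draw reproducible for audit purposes.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

projects = [f"PRJ-{n:03d}" for n in range(1, 41)]  # hypothetical list of eligible projects
chosen = simple_random_sample(projects, 10, seed=2013)
print(chosen)
```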
People tend to assume that the collected data follows a normal distribution, and sometimes they don’t even check normality statistically. In other cases, even after the data is found to be non-normal, people try to make it normal by removing some data points. Removing one or two points out of 15 to 20 is reasonable if there are assignable causes for them. Otherwise, simply removing data points to make the distribution normal is not a good practice. It is essential to check the actual distribution of the data before going ahead with baselining, because the control-chart approach described here assumes normally distributed data. The distribution can be checked visually with a histogram or a similar plot, and then confirmed statistically with a normality test (there are plenty of Excel add-ins for this).
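The sketch below confirms normality statistically without Excel. It substitutes the Jarque-Bera test, chosen because it needs only the Python standard library. It is a large-sample test and has little power with 15 to 20 points. For small samples, a Shapiro-Wilk test such as `scipy.stats.shapiro` is commonly preferred. The two data sets are illustrative, not real measurements.

```python
import math
from statistics import NormalDist, mean

def jarque_bera(data):
    """Jarque-Bera normality test using only the standard library.

    Returns (statistic, p_value). Under normality the statistic is asymptotically
    chi-square with 2 degrees of freedom, whose upper-tail probability is exp(-x / 2).
    """
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    statistic = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    return statistic, math.exp(-statistic / 2)

# Illustrative data: evenly spaced normal quantiles versus one extreme value.
symmetric = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [1.0] * 19 + [100.0]
for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    stat, p = jarque_bera(sample)
    verdict = "no evidence against normality" if p >= 0.05 else "not normal"
    print(f"{name}: JB={stat:.2f}, p={p:.3f} -> {verdict}")
```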
Suppose an organization baselines yearly. At the start of 2013, baselines were derived from the data points of the previous year, 2012. The objective was set to ‘maintain the current process performance’ with no higher targets, so no improvement initiatives were triggered to raise the performance level. The next year, the 2013 data points were collected for baselining. A statistical test, for instance a 2-sample t-test, showed no significant difference between the 2012 and 2013 data. Which data set should the organization use for the 2014 baselines? A common mistake is to ignore the 2012 data and baseline with the 2013 data points alone. Since the two data sets are statistically indistinguishable, they should be combined in chronological order for baselining.
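The sketch below illustrates the combine-if-equal decision. For the equality check it substitutes a two-sided permutation test on the means, so it runs on the standard library alone. With SciPy available, `scipy.stats.ttest_ind` performs the 2-sample t-test the text mentions. Two things are assumptions not stated in the text: each year's list is already in chronological order, and the fallback to the newer year alone when a difference is detected. The data values are hypothetical.

```python
import random
from statistics import fmean

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)

def data_for_next_baseline(previous_year, current_year, alpha=0.05):
    """Both inputs are assumed to be in chronological order already."""
    if permutation_p_value(previous_year, current_year) >= alpha:
        return list(previous_year) + list(current_year)  # no significant difference: combine
    return list(current_year)  # assumption: after a real shift only the newer data applies

data_2012 = [10, 12, 11, 13, 12, 11]  # hypothetical values
data_2013 = [11, 12, 12, 11, 13, 10]
print(data_for_next_baseline(data_2012, data_2013))
```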
The null hypothesis is rejected if the p value is less than the chosen significance level. In industry, the significance level is usually taken as 0.05, but this value is a convention and somewhat arbitrary. The higher the significance level, the greater the risk of a Type I error, that is, rejecting the null hypothesis when it is actually true. (Refer to the blog post on hypothesis testing for more details on the p value.) It is up to the organization to decide the significance level.
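The simulation below shows how the significance level sets the Type I error risk. Every sample is drawn with the null hypothesis true. The sketch counts how often a two-sided z-test with known sigma still rejects it. The sample size, number of trials and seed are arbitrary choices for illustration. The rejection rate comes out close to the chosen significance level, and it rises as that level rises.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha, trials=10_000, n=10, seed=1):
    """Estimate how often a true null hypothesis is rejected at significance level alpha.

    Each trial draws n values from N(0, 1), so H0 "mean = 0" is true, and runs a
    two-sided z-test with known sigma = 1.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
        z = sample_mean / (1 / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    print(alpha, type_one_error_rate(alpha))
```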
Out-of-control points cannot be removed if there are no assignable causes behind them. An out-of-control point with no known cause implies that the process is not stable, and one cannot go ahead with baselining.
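Below is a minimal sketch of this policy on an individuals (XmR) chart, with limits set at the centre line plus or minus 2.66 times the average moving range. Points outside the limits are removed only if a reason is recorded for them. The data and the index-to-reason mapping for assignable causes are hypothetical. This sketch makes a single pass; in practice the limits are usually recomputed after removal and the data rechecked.

```python
from statistics import mean

def xmr_limits(data):
    """Individuals (XmR) chart limits: centre line +/- 2.66 x average moving range."""
    centre = mean(data)
    mr_bar = mean(abs(b - a) for a, b in zip(data, data[1:]))
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar

def screen_out_of_control(data, assignable_causes):
    """Remove out-of-control points only where an assignable cause is documented.

    assignable_causes maps a point's index to its documented reason (hypothetical format).
    Returns (kept_values, unexplained_indices); any unexplained index means the
    process is not stable and baselining should not proceed.
    """
    lcl, ucl = xmr_limits(data)
    out = [i for i, x in enumerate(data) if not lcl <= x <= ucl]
    explained = {i for i in out if i in assignable_causes}
    unexplained = [i for i in out if i not in assignable_causes]
    kept = [x for i, x in enumerate(data) if i not in explained]
    return kept, unexplained

# Hypothetical weekly defect-fix times (hours); point 7 was caused by a server outage.
fix_times = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 12, 11]
kept, unexplained = screen_out_of_control(fix_times, {7: "server outage"})
if unexplained:
    print("Process not stable; postpone baselining. Unexplained points:", unexplained)
else:
    print("Ready to baseline with", len(kept), "points")
```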
Sometimes the control limits derived statistically during baselining are unworkable. For example, a baseline of review effectiveness (in %) cannot have an upper control limit (UCL) of 120%, even though that value is statistically correct. Similarly, a coding speed baseline cannot have a lower control limit (LCL) of -15 lines of code/hr. Such values are unusable, so the organization needs a policy to handle these situations. For example, it can use the 25th and 75th percentiles of the stable data as control limits in such scenarios. Alternatively, it can set the LCL/UCL to the minimum/maximum permissible value of the parameter. In the examples above, that means an LCL of zero for coding speed instead of a negative value, and a UCL of 100% for review effectiveness.
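The sketch below implements both policies from the text. It computes XmR-style limits and checks them against a permissible range. If they fall outside it, it either clamps them to the range or switches to the quartiles of the stable data. The function names, the `policy` switch and the review-effectiveness values are hypothetical.

```python
from statistics import mean, quantiles

def xmr_limits(data):
    """Statistical limits of an individuals chart: centre +/- 2.66 x average moving range."""
    centre = mean(data)
    mr_bar = mean(abs(b - a) for a, b in zip(data, data[1:]))
    return centre - 2.66 * mr_bar, centre + 2.66 * mr_bar

def workable_limits(data, lower_bound, upper_bound, policy="clamp"):
    """Apply an organizational policy when the statistical limits are impossible values.

    policy="clamp":     cap the limits at the permissible range (e.g. 0 to 100 for a %).
    policy="quartiles": use the 25th and 75th percentiles of the stable data instead.
    """
    lcl, ucl = xmr_limits(data)
    if lcl >= lower_bound and ucl <= upper_bound:
        return lcl, ucl  # limits are already workable
    if policy == "quartiles":
        q1, _, q3 = quantiles(data, n=4)
        return q1, q3
    return max(lcl, lower_bound), min(ucl, upper_bound)

# Hypothetical review effectiveness values in %, whose statistical UCL exceeds 100%.
review_effectiveness = [85, 95, 70, 98, 80, 99, 75, 97]
print(xmr_limits(review_effectiveness))
print(workable_limits(review_effectiveness, 0, 100))
print(workable_limits(review_effectiveness, 0, 100, policy="quartiles"))  # (76.25, 97.75)
```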
Recording the context of each measurement ensures that the measurement results are understood consistently. Contextual information is the additional data about the environment in which a process is executed. It includes the time stamp, the project context, the measurement units and similar details.
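One way to keep context attached to every value is a small record type. The sketch below assumes a hypothetical `Measurement` dataclass whose fields mirror the items listed above, plus a helper that allows two values into the same baseline only when their unit and context match.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Measurement:
    """A measured value stored with its context (field names are illustrative)."""
    value: float
    unit: str           # measurement unit, e.g. "LOC/hour" or "%"
    measured_on: date   # time stamp, so control charts can be drawn in time order
    project_type: str   # context used to keep baselines homogeneous
    language: str = ""  # optional extra context

def comparable(a: Measurement, b: Measurement) -> bool:
    """Two measurements may share a baseline only if their unit and context match."""
    return (a.unit, a.project_type, a.language) == (b.unit, b.project_type, b.language)

m1 = Measurement(12.0, "LOC/hour", date(2013, 1, 12), "development", "Java")
m2 = Measurement(14.0, "LOC/hour", date(2013, 2, 20), "development", "Java")
m3 = Measurement(0.9, "KLOC/day", date(2013, 3, 5), "development", "Java")
print(comparable(m1, m2), comparable(m1, m3))  # True False
```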
Today’s software supports a wide range of graphs, and people tend to use many of them at once. The result hides the real information or makes it needlessly complex. One must select the right graph to communicate the processed data. Run charts, pie charts, control charts and bar charts are all good means of communication, but the best fit for the message must be chosen.
One must determine in advance how the processed data is going to be used. This helps in choosing what data to collect (never waste time collecting data that won’t be used) and which tools to use. One must also plan to measure everything needed to calculate the effect of a change. It is usually too late to go back and correct things if something was left out.