
Continuous Naive Bayes

Naive Bayes is a supervised machine learning algorithm. As the name implies, it is based on Bayes' theorem. In this post, you will discover what happens behind the Naive Bayes classifier when you are dealing with continuous predictor variables.

Here I have used the R language for the code. Let us see what goes on behind the scenes in the naiveBayes function when the features, or predictor variables, are continuous in nature.

Understanding Bayes’ theorem

A strong foundation in Bayes' theorem as well as probability functions (the density function and the distribution function) is essential if you really want to understand the intuition behind the Naive Bayes algorithm.

(You are free to skip this section if you are comfortable with Bayes' theorem; you may jump to the next section, "How is probability calculated in Naive Bayes?")

Bayes' theorem is all about finding a probability (we call it the posterior probability) based on certain other probabilities which we know in advance.

As per the theorem,

P(A|B) = P(A) P(B|A)/P(B)

  • P(A|B) and P(B|A) are called conditional probabilities: P(A|B) means how often A happens given that B happens.
  • P(A) and P(B) are called marginal probabilities: how likely A or B is on its own (the probability of an event, irrespective of the outcomes of other random variables).

P(A|B) is what we are going to predict, and hence it is also called the posterior probability.
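To make the formula concrete, here is a tiny worked example in R with made-up numbers (a hypothetical spam filter; none of these figures come from real data):

```r
# Hypothetical figures: P(spam) = 0.20, P("offer" | spam) = 0.60,
# P("offer" | not spam) = 0.05
p_spam      <- 0.20
p_word_spam <- 0.60
p_word_ham  <- 0.05

# Marginal probability of the word (law of total probability)
p_word <- p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# Posterior: P(spam | "offer") = P("offer" | spam) * P(spam) / P("offer")
p_spam_word <- p_word_spam * p_spam / p_word
p_spam_word  # 0.75
```

The posterior combines what we knew in advance (the prior) with the evidence (the conditional probability of the observation).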

Now, in the real world we would have many predictor variables and many class variables. For easy mapping, let us call the classes C1, C2, …, Ck and the predictor variables (feature vector) x1, x2, …, xn.

Then, using Bayes' theorem, we measure the conditional probability of an event with feature vector x1, x2, …, xn belonging to a particular class Ci.

We can formulate the posterior probability P(Ci|x) from P(Ci), P(x) and P(x|Ci) as given below:

P(Ci | x1, x2, …, xn) = P(Ci) P(x1, x2, …, xn | Ci) / P(x1, x2, …, xn)

How is probability calculated in Naive Bayes?

Usually we use the e1071 package to build a Naive Bayes classifier in R, and then use that classifier to make predictions.

The probability for these predictions can be calculated directly from frequencies of occurrence if the features are categorical.

But what if there are features with continuous values? What is the Naive Bayes classifier actually doing behind the scenes to predict probabilities for continuous data?

It is nothing but the use of probability density functions. Naive Bayes fits a Gaussian (normal) distribution to each predictor variable, per class. The distribution is characterized by two parameters, its mean and standard deviation. Then, based on the mean and standard deviation of each predictor variable, the density at a value 'x' is calculated using the probability density function. (The probability density function gives the relative likelihood of observing a measurement with a specific value.)

The normal distribution (bell curve) has density

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

where μ is the mean of the distribution and σ the standard deviation.

f(x), the probability density for a value 'x', can be calculated using standard z-table calculations, or in R with the dnorm function.

So, in short, once we know the distribution's parameters (the mean and standard deviation, in the case of normally distributed data), we can calculate the density for any value.

dnorm function in R

You can mirror what the naiveBayes function is doing by using the dnorm(x, mean=, sd=) function for each class of outcomes. (Remember, the class variable is categorical, and the features can be a mix of continuous and categorical.) dnorm in R gives us the probability density function.

The dnorm function in R is the backbone of continuous Naive Bayes.
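As a quick sanity check, you can evaluate dnorm at a value and verify that it matches the normal density formula above (the mean and standard deviation below are illustrative, not taken from any fitted model):

```r
# Density of observing a sepal length of 6.9 under a normal distribution
# with an illustrative class mean of 6.6 and standard deviation of 0.6
d1 <- dnorm(6.9, mean = 6.6, sd = 0.6)

# dnorm is just the normal density formula written out:
x <- 6.9; mu <- 6.6; sigma <- 0.6
d2 <- (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2))

c(d1, d2)  # both values are identical
```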

Understanding the intuitions behind continuous Naive Bayes – with iris data in R

Let us consider the Iris data in R language.

The iris dataset contains three plant species (setosa, virginica, versicolor) and four features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) measured for each sample.

First we will build the model using the naiveBayes function in the e1071 package. Then, given a set of features, say Sepal.Length=6.9, Sepal.Width=3.1, Petal.Length=5.0, Petal.Width=2.3, we will predict the species.

So here is the complete code using the naiveBayes function for predicting the species.

#Installing and loading the required packages
install.packages("e1071")
library(e1071)
library(caTools)

# Read the dataset
data(iris)

#studying structure of data
str(iris)

# Partitioning the dataset into training set and test set
split=sample.split(iris$Species,SplitRatio =0.7)
Trainingset=subset(iris,split==TRUE)
Testset=subset(iris,split==FALSE)

# Fitting naive Bayes model to training set
classifier=naiveBayes(x = Trainingset[,-5],
                      y = Trainingset$Species)

# Predicting on test data
Y_Pred=predict(classifier,newdata = Testset)

# Confusion Matrix
table(Testset$Species,Y_Pred)

#Problem: given a set of features, find to which species it belongs
#defining a new set of data (features) to check the classification
newfeatures=data.frame(Sepal.Length=6.9,Sepal.Width=3.1,
                       Petal.Length=5.0,Petal.Width=2.3)

Y_Pred=predict(classifier,newdata = newfeatures)
Y_Pred


On executing the code, you can see that the predicted species is virginica, as per the naiveBayes function.

And now comes the most interesting part: what's going on behind the scenes.

We know that Naive Bayes predicts the results using probability density functions in the back end.

We are going to find the probabilities directly using the dnorm function for each class. The result has to be the same as that predicted by the naiveBayes function.

For a given set of features,

  1. Based on the mean and standard deviation, the conditional densities are derived.
  2. Then, applying Bayes' theorem, the probability of each species given the set of predictor variables is derived and compared against the others.
  3. The one with the highest probability is the predicted result.

Here is the complete code for making the prediction by hand (with the dnorm function).



#Loading the required packages
library(e1071)
library(caTools)
library(dplyr)

# Read the dataset
data(iris)

#studying structure of data
str(iris)

# Partitioning the dataset into training set and test set
split=sample.split(iris$Species,SplitRatio =0.7)
Trainingset=subset(iris,split==TRUE)
Testset=subset(iris,split==FALSE)

# Fitting naive Bayes model to training set
classifier=naiveBayes(x = Trainingset[,-5],
                      y = Trainingset$Species)

#Problem: given a set of features, find to which species it belongs
#defining a new set of data (features) to check the classification
sl=6.9
sw=3.1
pl=5.0
pw=2.3

#Finding class prior probabilities of each species
PriorProb_Setosa= mean(Trainingset$Species=='setosa')
PriorProb_Virginica= mean(Trainingset$Species=='virginica')
PriorProb_Versicolor= mean(Trainingset$Species=='versicolor')

#Species wise mean and standard deviation of each feature
#Finding the parameters needed for the conditional densities (likelihoods)
Setosa= subset(Trainingset, Trainingset$Species=='setosa')
Virginica= subset(Trainingset, Trainingset$Species=='virginica')
Versicolor= subset(Trainingset, Trainingset$Species=='versicolor')

Set=Setosa %>% summarise(mean(Sepal.Length),mean(Sepal.Width),mean(Petal.Length),mean(Petal.Width),
                         sd(Sepal.Length),sd(Sepal.Width),sd(Petal.Length),sd(Petal.Width))

Vir=Virginica %>% summarise(mean(Sepal.Length),mean(Sepal.Width),mean(Petal.Length),mean(Petal.Width),
                            sd(Sepal.Length),sd(Sepal.Width),sd(Petal.Length),sd(Petal.Width))

Ver=Versicolor %>% summarise(mean(Sepal.Length),mean(Sepal.Width),mean(Petal.Length),mean(Petal.Width),
                             sd(Sepal.Length),sd(Sepal.Width),sd(Petal.Length),sd(Petal.Width))

Set_sl=dnorm(sl,mean=Set$`mean(Sepal.Length)`, sd=Set$`sd(Sepal.Length)`)
Set_sw=dnorm(sw,mean=Set$`mean(Sepal.Width)` , sd=Set$`sd(Sepal.Width)`)
Set_pl=dnorm(pl,mean=Set$`mean(Petal.Length)`, sd=Set$`sd(Petal.Length)`)
Set_pw=dnorm(pw,mean=Set$`mean(Petal.Width)` , sd=Set$`sd(Petal.Width)`)

#The denominator would be the same for all three probabilities, so we can ignore it
ProbabilitytobeSetosa =Set_sl*Set_sw*Set_pl*Set_pw*PriorProb_Setosa

Vir_sl=dnorm(sl,mean=Vir$`mean(Sepal.Length)`, sd=Vir$`sd(Sepal.Length)`)
Vir_sw=dnorm(sw,mean=Vir$`mean(Sepal.Width)` , sd=Vir$`sd(Sepal.Width)`)
Vir_pl=dnorm(pl,mean=Vir$`mean(Petal.Length)`, sd=Vir$`sd(Petal.Length)`)
Vir_pw=dnorm(pw,mean=Vir$`mean(Petal.Width)` , sd=Vir$`sd(Petal.Width)`)

ProbabilitytobeVirginica =Vir_sl*Vir_sw*Vir_pl*Vir_pw*PriorProb_Virginica

Ver_sl=dnorm(sl,mean=Ver$`mean(Sepal.Length)`, sd=Ver$`sd(Sepal.Length)`)
Ver_sw=dnorm(sw,mean=Ver$`mean(Sepal.Width)` , sd=Ver$`sd(Sepal.Width)`)
Ver_pl=dnorm(pl,mean=Ver$`mean(Petal.Length)`, sd=Ver$`sd(Petal.Length)`)
Ver_pw=dnorm(pw,mean=Ver$`mean(Petal.Width)` , sd=Ver$`sd(Petal.Width)`)

ProbabilitytobeVersicolor =Ver_sl*Ver_sw*Ver_pl*Ver_pw*PriorProb_Versicolor

#Comparing the three (unnormalized) probabilities
ProbabilitytobeSetosa
ProbabilitytobeVirginica
ProbabilitytobeVersicolor

On executing this code, you can see that the probability of virginica is higher than that of the other two. This implies that the given set of features belongs to the class virginica, the same result that the naiveBayes function predicted.

A-priori probabilities and conditional probabilities

When you run the scripts in R with continuous/numeric variables, you might have seen tables titled "A-priori probabilities" and "Conditional probabilities". A screenshot from the console is given below.

The table titled "A-priori probabilities" gives the prior probability of each class (P(c)) in your training set. This gives the class distribution in the data ('a priori' is Latin for 'from before'), which can be calculated straight away from the number of occurrences, as below:

P(c) = n(c) / n(S)

where P(c) is the probability of an event 'c', n(c) is the number of favorable outcomes, and n(S) is the total number of events in the sample space.
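For instance, on the full iris data (used here only for illustration, rather than a train/test split), the priors fall straight out of the class counts:

```r
# Class priors directly from counts on the full iris data
data(iris)
priors <- table(iris$Species) / nrow(iris)
priors
# iris has 50 rows for each of the three species,
# so each prior comes out as 50/150 = 1/3
```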

The table titled "Conditional probabilities" does not actually show probabilities, but the distribution parameters (the mean and standard deviation of the continuous data). Remember, if the features were categorical, this table would show the probability values themselves.

Some additional points to keep in mind

1. Rather than calculating the tables by hand, you may just use the naiveBayes results themselves

In the script above using dnorm, I calculated the mean and standard deviation by hand. Instead, you can read them from the fitted model.

For example, if you want to see the mean and standard deviation of sepal length for each species, just run this:

classifier$tables$Sepal.Length
2. Dropping the denominator (P(x)) in the calculations

Have you noticed that I dropped the denominator in the probability calculations?

That is because the denominator (P(x)) is the same for every class when we compare the probabilities under the specified features, so we can just get rid of it and compare only the numerators. Keep in mind that we are only comparing the probabilities, which is why the denominator can be omitted. If we need the actual probability values, the denominator should not be omitted.
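As a sketch of that last point: if you do want proper posterior probabilities, divide each numerator by their sum, which plays the role of the dropped P(x). The scores below are made-up placeholders, not values from the iris model:

```r
# Illustrative unnormalized scores (likelihood x prior) for three classes
scores <- c(setosa = 1.2e-80, versicolor = 3.5e-05, virginica = 8.4e-03)

# Comparing the raw scores is enough to pick the winner...
names(which.max(scores))   # "virginica"

# ...but dividing by their sum (the dropped denominator P(x)) gives
# posterior probabilities that sum to 1
posterior <- scores / sum(scores)
round(posterior, 4)
```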

3. What if the continuous data is not normally distributed?

There are of course other distributions, such as Bernoulli, multinomial, etc., and not just the Gaussian distribution alone. But the logic behind all of them is the same: assume the feature follows a certain distribution, estimate the parameters of the distribution, and then compute the probability density function.

4. Kernel-based densities

Kernel-based densities may perform better when the continuous variables are not normally distributed, and may improve the test accuracy. For example, the naive_bayes function in the naivebayes package accepts usekernel = TRUE when building the model.
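In case it helps, here is a base-R sketch of what a kernel-based density does, using stats::density to estimate the class-conditional density from the data itself rather than assuming a normal shape (this illustrates the idea only, not the exact internals of any package):

```r
# Estimate the class-conditional density of Sepal.Length for virginica
data(iris)
virginica_sl <- iris$Sepal.Length[iris$Species == "virginica"]

# Kernel density estimate, evaluated at x = 6.9 by interpolation
kde <- density(virginica_sl)
kernel_density_at_x <- approx(kde$x, kde$y, xout = 6.9)$y

# Compare with the Gaussian assumption used in the main example
gaussian_density_at_x <- dnorm(6.9, mean = mean(virginica_sl),
                               sd = sd(virginica_sl))

c(kernel = kernel_density_at_x, gaussian = gaussian_density_at_x)
```

When the data really is close to normal the two values agree well; when it is skewed or multi-modal, the kernel estimate follows the data more faithfully.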

5. Discretization strategy for continuous Naive Bayes

To estimate the class-conditional probabilities for continuous attributes in Naive Bayes classifiers, we can also use a discretization strategy. Discretization works by breaking the data into categorical bins. This approach, which transforms continuous attributes into ordinal attributes, is not covered further in this article.
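As a minimal sketch of the idea (the bin boundaries below are arbitrary choices for illustration):

```r
# Discretize a continuous feature into ordinal categories, after which
# it can be treated like any categorical variable in Naive Bayes
data(iris)
iris$SL.bin <- cut(iris$Sepal.Length,
                   breaks = c(-Inf, 5.5, 6.5, Inf),
                   labels = c("short", "medium", "long"))

# Class-conditional probabilities now come from simple frequency counts
prop.table(table(iris$Species, iris$SL.bin), margin = 1)
```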

6. Why is Naive Bayes called naive?

The Naive Bayes algorithm is called "naive" because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features, while in reality they may be dependent in some way!

Covid analysis

More than six months have elapsed since the World Health Organization declared Covid-19 a pandemic. The daily confirmed cases are still rising, but interestingly, Google Trends shows a loss of interest in searches related to Covid-19 recently. Maybe the initial panic has subsided to a great extent. But how long can the pandemic last? How many more months do we have to live with this?

And India's case load has become the world's second highest. Now the question is, "how many more months?"

When can India get back on its feet?

With the help of Worldometer data, an analysis was done using the growth/decay factor of daily new cases. By growth/decay factor, I mean the multiplicative increase/decrease factor, not the % change.

As per this simple mathematical analysis, the progression of Covid-19 has been slowing down over time since the very beginning. That is, it is not actually a growth factor but a decay factor. (Anyway, the term "growth factor" will be used until the end of this article.)

The growth factor was calculated across months for the daily new confirmed cases since April 2020 (considerable numbers of cases have been reported in India since April 2020). Further, the growth of the growth factor was also computed.

One straightforward observation was the approximately constant "growth of the growth factor" over the months. That is, "the increase of the increase" did not fluctuate much; instead it stayed roughly constant.

Then, using this "growth of the growth factor", data points were extrapolated for future months. As per the data, new confirmed cases might peak somewhere in September-October and then start slowly declining.
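For what it's worth, the growth-factor arithmetic described above can be sketched in a few lines of R (the case counts below are invented for illustration, not the actual Worldometer figures):

```r
# factor = new cases this period / new cases previous period
new_cases <- c(1000, 1500, 2100, 2700, 3200)   # made-up monthly figures

growth_factor <- new_cases[-1] / new_cases[-length(new_cases)]
round(growth_factor, 2)
# the factor itself shrinks month over month, i.e. growth is decaying

# "Growth of the growth factor": ratio between successive growth factors
round(growth_factor[-1] / growth_factor[-length(growth_factor)], 2)
# in this toy series the second-order ratio stays roughly constant,
# which is the pattern the analysis above describes
```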

Figure 1 reflects a sample trend of daily new confirmed cases across months.

Figure 1: Trend of daily new cases

Correlation – daily active cases Vs new cases

Using the ggplot package in R, a scatter plot was generated of daily total active cases against new confirmed cases. (This plot uses data from the Covid-19 package in R.)

The total active cases on a day appear to be approximately ten times the new confirmed cases on that day (especially since July 2020), which implies that recoveries are progressing at a constant rate as of now. If any dip or delay occurs in medical services, the total active cases would increase drastically, which could lead to a severe catastrophe.

Figure 2: Scatter plot


Hopefully India can get back on its feet by, say, the third quarter of 2021, with strict adherence to social distancing measures and better medical services. Social distancing is a must, as a single infected person can become a bigger vulnerability later. Even though social distancing won't end the disease, it can save more lives.

And last, but not least,

Recovery is not actually the end of this crisis. We are yet to face the lingering impacts of Covid-19, so let us strengthen our immunity in the best way possible.


If the curve had been flattened, maybe we would have a better understanding and better predictions about the end of the pandemic. But the graphs are still rising or fluctuating.

Moreover, we cannot expect a symmetric rise and fall of an epidemic. It could be a sharp rise and a somewhat random decline after the peak. Then, probably before touching the x-axis, it may surge back up and present another peak.

Hence, I know it is not wise to make such a forecast, especially when there are too many other factors at play, like the possibility of mutations in the viral genome, changes in testing, etc.

The data presented here are purely based on my intuitions from the mathematical analysis done and the publicly available data at the time of publication.

And the information provided here is merely for analysis purposes. I would not be responsible for any negative occurrences pertaining to the use of this information. These reports are not peer-reviewed and therefore should not be treated as established information.

The R language plays a major role in big data analysis. It is an open-source programming language which mainly deals with the statistical investigation of data.

R can be easily self-taught with the help of some online courses or books.

After installing R and RStudio, I would suggest going with some books to kick-start. Then read online, take some courses (such as on Udemy), and read more and more books. And keep practicing in RStudio. Even if you have no prior programming experience, this language is easily understandable and well structured.

Online courses are really worthwhile, as they give you a one-to-one connection with the instructor while practicing. If you are serious about learning R, don't hesitate to take even paid courses.

Recommended books

R for Everyone: Advanced Analytics and Graphics 

Big Data in Practice : How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results

R’s creators: Ross Ihaka and Robert Gentleman (Stable beta version in 2000)

Again problems with ipa installations via iTunes?
Apple removed the apps section in iTunes 12.7 [released on September 19th, 2017]. Thereafter, the simple drag-and-drop solution described in the post "Where in the world are the apps in iTunes library?" helped to rectify the trouble.
But installing from iTunes sometimes does not work even with this drag-and-drop solution.
I found a solution for this on the Apple support site itself. Just go to this link and download the iTunes version Apple released.
This iTunes version can be used to download your app, and with it the previous Apps section is back in our iTunes window.

While filing an issue, many a time the tester misses adding additional details specific to the bug. They may have followed all the normal procedures (steps, expected and actual results, environment details, and so on) and filed them along with the issue.

But still, sometimes the customer ends up asking for further information, which is in fact a bit frustrating. In those cases, both parties (customer and tester) need to invest additional effort to make the bug report adequate.

So just check a few of the points below if your bug is of any of these types.

  1. Partial visibility of some buttons or similar: attach details of the screen resolution.
  2. Sentence corrections: attach your suggestions on how to rewrite the sentence.
  3. Overlapping issues: attach details in both landscape and portrait mode.
  4. Page load issues: attach a speed test result.

A bug title is read more often than any other part of the bug report. So it has to be meaningful and easily understandable.

For example,

A bug: in the application, when the user goes to the 'xxx' tab and searches for an item with numeric characters, the app crashes.

Poorly written bug title: App crashes

Here the bug title doesn't summarize the scenario behind the crash.

An average bug title: Searching in the 'xxx' tab leads to a crash

Here the bug title creates a false impression of the bug, as if any search in that tab crashed the app.

It could be better reworded as: "App crashes on searching in the xxx tab with numeric characters".

In fact, my suggestion would be to revisit and update the bug title at the end, once the report is fully written.

After describing the issue, steps, and expected and actual results, look at the bug title again. You can definitely improve it at this final stage.

So initially, while starting to file the bug, just add a sentence or even some random keywords as the title. Sometimes you can even put whatever comes to your mind, say "my family" [but of course, don't forget to change it properly at the end].

And finally, once the entire report is ready, revisit the bug title and update it properly. Spend a little more time on the issue title and draft two or three revised titles so that it is fully meaningful in the end.

It’s always nice when you learn something new, try it yourself, test it, rectify the glitches, and finally feel the moment of fulfillment.

This is what I felt when I hosted my personal blog.

There were so many issues while I was shifting from the hosted site to this self-hosted site. In fact, it's easy if you are starting from a self-hosted site itself, as I did for…

via Moving to Self-hosted WordPress Site — Words and Notion

Oops… that's a show-stopper.

How did it happen?

Most of the time, when I try to reproduce a crash issue, I start thinking from the very first step and finally end up nowhere. Often the crash will have happened with the last step only; mostly it won't have any relation with the initial steps. So I am training myself to just recollect and redo the final steps.

Okay, so how does a show-stopper, or so-called crash issue, normally happen in a mobile application?

I am adding a few such checkpoints here:

  1. While an activity is playing in the app, leave it by pressing the home button of the device, and then try to return.
  2. Send the app to the background by launching another app, and then return to the app under test from the recents window.
  3. Interrupt the app via mechanisms like phone calls or messaging.
  4. Try testing offline.
  5. Try a test under weak battery.
  6. Switch between offline and online frequently.
  7. Switch between landscape and portrait.
  8. Access the app via multiple accounts at the same time.
  9. Delete an already deleted item from the app if it is still shown for a while.

How to install an ipa on the device?

Are apps missing from the left section of your iTunes window?

Wondering where in the world the apps in the iTunes library are?

A bit of Google searching helped me understand that Apple removed the apps section in iTunes 12.7 [released on September 19th, 2017].

As a tester/developer it would be real trouble for you, especially when you want to install a new ipa on your iOS device.

But you are never left alone; there is always a workaround if something fails.

I was so happy when I found this simple drag-and-drop solution to the trouble.

So here are the simple steps:

  1. Save the ipa on your desktop.
  2. Connect your iOS device and your computer/laptop using the USB cable.
  3. Open iTunes [it might appear automatically].
  4. Drag and drop the ipa onto the Devices section, on your device name only.
  5. You are done.

It is time for a change, a renovation in process consultancy. For the last 40 or 50 years, the same steps have been followed and the same thought process has been applied, with little or no change.

In between, industry-specific new standards and models evolved and were practiced. Other than that, no major change has happened in the process consultancy arena. Where should the change start?

If the right process is followed in the right way by the project team itself, then there is no need for a second person to ensure process compliance. But mistakes are a common phenomenon, especially when there is human intervention (and even with machines too). Hence it is always recommended to have an external person verify process compliance.

The following questions may seem to question the validity of existing well-known standards and models. With due respect to all of those standards, I am trying to see the process framework from different angles: "the second generation of process consultancy".

• If all the projects are being executed in the same way, following the same written processes and procedures, how can they be innovative?

• If lessons learned are documented, passed to the followers, and practiced, won't they be copycats?

• Is the written process really fitting your projects, or are you pretending it is okay, just to avoid the tailoring procedures?

• Is there really an established process for at least half of your project, or are you just executing it as it comes up along the way?

The questions will continue until there is a second generation. And then, only those consultants who have subject expertise as well as an analytical nature may be the most suitable for next-century organizations.