### Setting the Cutoff Criterion for Probabilistic Models

When making decisions, people tend to sort probabilities into above 50% and below, which is plausible. For most probabilistic models, in contrast, this split does not hold. Frequently, the resulting probabilities are neither normally distributed between zero and one with a mean of 0.5 nor correct in terms of absolute values. This issue often accompanies the presence of a minority class in the underlying dataset.
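A minimal sketch of why the 0.5 cutoff can fail when a minority class is present. The labels and probabilities are made-up illustration data, and the F1 score is only one possible criterion for choosing a cutoff (an assumption here, not necessarily the one used in the post); `f1_at` and `best_cutoff` are hypothetical helper names.

```python
# Hypothetical model output: 1 marks the minority class, probs are predicted p(class = 1).
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30, 0.20, 0.45]

def f1_at(cutoff, probs, labels):
    """F1 score when the positive class is predicted for prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(probs, labels):
    """Try every observed probability as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda c: f1_at(c, probs, labels))

f1_default = f1_at(0.5, probs, labels)   # no probability reaches 0.5 in this data
cutoff = best_cutoff(probs, labels)
f1_tuned = f1_at(cutoff, probs, labels)
print(f"F1 at 0.5: {f1_default:.3f}, best cutoff: {cutoff}, F1 there: {f1_tuned:.3f}")
```

In this toy data no predicted probability exceeds 0.5, so the default cutoff never flags the minority class, while a lower cutoff separates the classes reasonably well. In practice the cutoff would be chosen on held-out data.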

For example, if the result of a…

Added by Frank Raulf on January 4, 2020 at 3:00am — No Comments

### Naive Bayes Classifier using Kernel Density Estimation (with example)

Bayesian inference is the re-allocation of credibilities over possibilities [Kruschke 2015]. This means that a Bayesian statistician has an “a priori” opinion regarding the probability of an event d:

p(d)   (1)

By observing new data x, the statistician adjusts this opinion to obtain the “a posteriori” probabilities:

p(d|x)   (2)
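The step from (1) to (2) can be sketched for a discrete set of events using Bayes' rule. The coin example and the function name `posterior` are assumptions made for illustration; they do not come from the post.

```python
def posterior(prior, likelihood):
    """Bayes' rule for discrete events d: p(d|x) = p(x|d) * p(d) / sum over d' of p(x|d') * p(d')."""
    joint = {d: likelihood[d] * prior[d] for d in prior}   # p(x, d)
    evidence = sum(joint.values())                         # p(x)
    return {d: joint[d] / evidence for d in joint}

# Hypothetical setup: is a coin fair or biased, after observing one head (x = "heads")?
prior = {"fair": 0.5, "biased": 0.5}         # p(d), the a priori opinion
likelihood = {"fair": 0.5, "biased": 0.8}    # p(x | d), assumed head probabilities
post = posterior(prior, likelihood)          # p(d | x), the a posteriori opinion
print(post)
```

Observing a head shifts credibility toward the biased coin, because that event makes the observation more likely.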

The conditional probability of an event d given x is the share of the joint…

Added by Frank Raulf on January 3, 2020 at 4:30am — No Comments

### Which one is faster in multiprocessing, R or Python?

This post is the third in a series about loops in R and Python.

The first one was Different kinds of loops in R. The recommendation…

Added by Frank Raulf on December 19, 2019 at 9:00am — 1 Comment

### Omitted Variables in Linear Regressions

The completeness of linear regressions is an often-discussed issue. Leaving out relevant variables can make the coefficient estimates inconsistent.

But why on earth?!

Assume a complete linear model of the form:

z = a + bx + cy + ε,

where z is the dependent variable, x and y are the independent variables and ε is the error term.
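What leaving out y can do to the estimate of b can be sketched with a small simulation. The coefficient values, the sample size and the assumed link between the regressors (y = d·x + noise) are all hypothetical choices for illustration; the estimate only shifts when x and y are correlated.

```python
import random

random.seed(0)              # fixed seed so the sketch is reproducible
n = 20_000
a, b, c = 1.0, 2.0, 3.0     # hypothetical true coefficients of z = a + b*x + c*y + e
d = 0.5                     # assumed dependence of y on x: y = d*x + noise

x = [random.gauss(0, 1) for _ in range(n)]
y = [d * xi + random.gauss(0, 1) for xi in x]
z = [a + b * xi + c * yi + random.gauss(0, 1) for xi, yi in zip(x, y)]

def ols_slope(u, v):
    """Slope of the simple regression of v on u: cov(u, v) / var(u)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    var = sum((ui - mean_u) ** 2 for ui in u)
    return cov / var

slope_without_y = ols_slope(x, z)   # regression of z on x alone, y omitted
expected_shift = b + c * d          # value the short regression drifts toward
print(f"true b: {b}, estimate without y: {slope_without_y:.3f}, b + c*d: {expected_shift}")
```

With y omitted, the slope on x absorbs the part of y that moves with x, so the estimate lands near b + c·d rather than b, and more data does not remove the gap.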

Now we drop y to check…

Added by Frank Raulf on November 13, 2019 at 2:00am — No Comments

### Loop-Runtime Comparison R, RCPP, Python

The positive reactions to my last post, “Different kinds of loops in R”, led me to compare several versions of loops in R and Rcpp (the C++ integration for R). To see the bigger picture, I also include the Python for-loop. The comparison focuses on the runtime of non-costly tasks with different numbers of iterations. For comparison purposes I create vectors of the form (R syntax):

Vector <- 1:k

where k takes the values 1,000, 100,000 and 1,000,000.
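A minimal sketch of how the Python side of such a comparison could be timed. The summation inside the loop is an assumed stand-in for a non-costly task, and `time_loop` is a hypothetical helper; the post's actual benchmark code is not shown in this excerpt.

```python
import time

def time_loop(k):
    """Run a plain for-loop over 1..k and return (result, elapsed seconds)."""
    vector = range(1, k + 1)          # Python counterpart of R's 1:k
    start = time.perf_counter()
    total = 0
    for value in vector:
        total += value                # assumed non-costly task: a running sum
    elapsed = time.perf_counter() - start
    return total, elapsed

# Same vector lengths as in the text.
results = {k: time_loop(k) for k in (1_000, 100_000, 1_000_000)}
for k, (total, elapsed) in results.items():
    print(f"k = {k:>9,}: {elapsed:.6f} s")
```

Timings depend on the machine and interpreter version, so a single run like this is only a rough indication; repeating each measurement and taking the median gives more stable numbers.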

Added by Frank Raulf on September 1, 2019 at 4:30am — No Comments

### Different kinds of loops in R.

Normally, it is better to avoid loops in R. But for highly individual tasks, vectorization is not always possible. In that case a loop is needed, provided the problem is decomposable.

Which different kinds of loops exist in R and which one to use in which situation?

Nearly every programming language has for- and while-loops (sometimes until-loops). These loops are sequential and, in R, not that fast.

for(i in…

Added by Frank Raulf on August 12, 2019 at 12:30am — No Comments

### Data Quality Maintenance

Added by Frank Raulf on August 2, 2019 at 12:00am — No Comments
