There are many ways to deal with time data. Sometimes it can be treated as a time series to take possible trends into account. Sometimes this is not possible because the time values cannot be arranged in a sequence, for example if a dataset spanning several months contains only weekdays (1 to 7). In this case one could use one-hot encoding. However, for minutes or seconds of a day, one-hot encoding might lead to high complexity. Another approach is to make time cyclical. This approach leads to a…

Added by Frank Raulf on January 26, 2020 at 4:00am - No Comments
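
The excerpt on cyclical time data is cut off before it names a concrete transformation. One common way to make a periodic value cyclical is to map it onto the unit circle with a sine/cosine pair; the sketch below assumes that approach, and the helper names `encode_cyclical` and `circular_distance` are hypothetical, not taken from the post.

```python
import math

def encode_cyclical(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def circular_distance(a, b):
    """Euclidean distance between two encoded (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Minutes of the day: 23:59 and 00:01 are only two minutes apart on the clock,
# although their raw values (1439 and 1) are far apart.
MINUTES_PER_DAY = 24 * 60
late = encode_cyclical(23 * 60 + 59, MINUTES_PER_DAY)
early = encode_cyclical(1, MINUTES_PER_DAY)
noon = encode_cyclical(12 * 60, MINUTES_PER_DAY)

print(circular_distance(late, early))  # small: neighbours on the circle
print(circular_distance(late, noon))   # close to 2: opposite sides of the circle
```

The same helper works for weekdays with `period=7`. Two columns replace the many columns a one-hot encoding of minutes would need, at the cost of assuming that neighbouring times behave similarly.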

For decision making, human perception tends to sort probabilities into above 50% and below 50%, which is plausible. For most probabilistic models, in contrast, this is not the case at all. Frequently, the resulting probabilities are neither normally distributed between zero and one with a mean of 0.5 nor correct in terms of absolute values. This issue often accompanies a minority class in the underlying dataset.

*For example, if the result of a…*

Added by Frank Raulf on January 4, 2020 at 3:00am — No Comments
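
To illustrate why 0.5 can be a poor cutoff when a minority class is present, here is a minimal sketch that picks the cutoff maximizing the true-positive rate minus the false-positive rate (Youden's J statistic). This criterion is an assumption chosen for illustration and may differ from the one used in the post; the scores and labels are made up.

```python
def best_cutoff(probs, labels):
    """Return (cutoff, J) where the cutoff maximizes TPR - FPR (Youden's J)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        # Count hits and false alarms when predicting positive for p >= t.
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Made-up scores from a hypothetical model; only 2 of 10 cases are positive.
probs = [0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.12, 0.30, 0.35, 0.40]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

cutoff, j = best_cutoff(probs, labels)
print(cutoff, j)  # 0.3 0.875
```

With a cutoff of 0.5 none of these cases would be flagged, because no score reaches 0.5, while a cutoff of 0.3 catches both positives at the cost of one false alarm.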

Bayesian inference is the re-allocation of credibilities over possibilities [Kruschke 2015]. This means that a Bayesian statistician has an “a priori” opinion regarding the probabilities of an event:

p(d) (1)

By observing new data x, the statistician will adjust his opinions to get the "a posteriori" probabilities.

p(d|x) (2)

The conditional probability of an event d given x is the share of the joint…

Added by Frank Raulf on January 3, 2020 at 4:30am - No Comments
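
The step from the prior p(d) to the posterior p(d|x) in the Bayes excerpt can be sketched for a discrete case: the posterior is the joint probability p(x|d)·p(d) divided by the total probability of the observation p(x). The function name `posterior` and the numbers below are illustrative assumptions, not taken from the post.

```python
def posterior(prior, likelihood):
    """Discrete Bayes update.

    prior:      {d: p(d)}
    likelihood: {d: p(x | d)} for one observed x
    returns:    {d: p(d | x)}
    """
    joint = {d: prior[d] * likelihood[d] for d in prior}  # p(x, d)
    evidence = sum(joint.values())                         # p(x)
    return {d: joint[d] / evidence for d in joint}

# Hypothetical example: prior belief about rain, then clouds are observed.
prior = {"rain": 0.2, "dry": 0.8}
likelihood = {"rain": 0.9, "dry": 0.1}  # p(clouds | d)

post = posterior(prior, likelihood)
print(post)
```

Here the joint probabilities are 0.18 and 0.08, so observing clouds moves the credibility of rain from 0.2 to 0.18 / 0.26, roughly 0.69.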

This post is the third one of a series regarding loops in R and Python.

The first one was Different kinds of loops in R. The recommendation…

Added by Frank Raulf on December 19, 2019 at 9:00am - 2 Comments

The importance of completeness of linear regressions is an often-discussed issue. By leaving out relevant variables the coefficients __might__ be inconsistent.

But why on earth?!

Assuming a linear complete model of the form:

*z = a + bx + cy + ε.*

Here *z* is the dependent variable, *x* and *y* are the independent variables, and *ε* is the error term.

Now we drop *y* to check…

Added by Frank Raulf on November 13, 2019 at 2:00am — No Comments
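
The effect of dropping *y* from *z = a + bx + cy + ε* can be checked with a small simulation. The sketch below assumes that *y* is correlated with *x* (controlled by a hypothetical parameter `rho`) and uses made-up coefficient values; it regresses *z* on *x* alone and compares the slope with the true *b*.

```python
import random

def ols_slope(x, z):
    """Slope of a simple least-squares regression of z on x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    cov = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

def simulate(n=20000, a=1.0, b=2.0, c=3.0, rho=0.5, seed=0):
    """Draw data from z = a + b*x + c*y + eps, where y depends on x via rho."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + rng.gauss(0, 1) for xi in x]
    z = [a + b * xi + c * yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return x, y, z

x, y, z = simulate()
short_slope = ols_slope(x, z)  # regression that omits y
delta = ols_slope(x, y)        # how strongly y moves with x

print("true b:               ", 2.0)
print("slope with y omitted: ", short_slope)
print("b + c * delta:        ", 2.0 + 3.0 * delta)
```

The slope of the short regression lands near b + c·delta rather than near b, so it absorbs part of the effect of the omitted variable. With `rho=0` the two variables are uncorrelated and the slope stays close to b, which is why the coefficients only *might* be inconsistent.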

The positive reactions to my last post, “Different kinds of loops in R”, led me to compare different versions of loops in R and RCPP (the C++ integration for R). To see a bigger picture, I additionally apply the Python for-loop. The comparison focuses on the runtime for non-costly tasks with different numbers of iterations. For comparison purposes I create vectors of the form (R syntax):

Vector <- 1:k

k = (1000, 100000, 1000000)

The task is to…

Added by Frank Raulf on September 1, 2019 at 4:30am - 1 Comment
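
For the Python side of such a runtime comparison, a timing harness could look like the sketch below. The actual task is cut off in the excerpt, so summing the integers 1..k is used here purely as a placeholder for a non-costly task; the function name `time_for_loop` is hypothetical.

```python
import time

def time_for_loop(k):
    """Time a plain for-loop over 1..k; summing is only a placeholder task."""
    start = time.perf_counter()
    total = 0
    for i in range(1, k + 1):
        total += i
    elapsed = time.perf_counter() - start
    return total, elapsed

# Same iteration counts as in the excerpt above.
for k in (1_000, 100_000, 1_000_000):
    total, elapsed = time_for_loop(k)
    print(f"k={k:>9,}  sum={total}  time={elapsed:.4f}s")
```

Measured times depend on the machine and the Python version, so single runs like these are only a rough indication; repeating each measurement several times gives more stable numbers.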

Normally, it is better to avoid loops in R. But for highly individual tasks, vectorization is not always possible. Hence, a loop is needed, provided the problem is decomposable.

Which different kinds of loops exist in R and which one to use in which situation?

Most programming languages provide for- and while-loops (and sometimes until-loops). These loops are sequential and, in R, not that fast.

*for(i in…*

Added by Frank Raulf on August 12, 2019 at 12:30am — No Comments

- How to make time-data cyclical for prediction?
- Setting the Cutoff Criterion for Probabilistic Models
- Naive Bayes Classifier using Kernel Density Estimation (with example)
- Which one is faster in multiprocessing, R or Python?
- Omitted Variables in Linear Regressions
- Loop-Runtime Comparison R, RCPP, Python
- Different kinds of loops in R.
