Maiia Bakhova's Blog (7)

San Diego Water Pollution Map by Stations

I have been working on the San Diego water quality data project:

https://www.sandiegodata.org/2018/04/summer-water-quality-data-project/

Here are the data sets:

https://data.sandiegodata.org/dataset?tags=water-project

Regretfully, my complete work does not fit into a blog post (or even a few posts) because of a post…

Added by Maiia Bakhova on November 6, 2018 at 11:46am — No Comments

Neural Networks as a Corporation Chain of Command

Neural networks are considered complicated, and they are always explained in terms of neurons and brain function. But we do not need to learn how the brain works to understand the structure of neural networks and how they operate. We can look at something people encounter in everyday life more…

Added by Maiia Bakhova on June 26, 2017 at 10:00am — 2 Comments

Detection of Practical Dependency of Variables with Confidence Intervals

This article attempts to detect dependent variables with a non-linear method.

I'm going to apply a method for checking variable dependency which was introduced in my previous post. Because the "dependency" I get with this rule is not true dependency as defined in probability theory, I will call variables practically dependent at a confidence level…

Added by Maiia Bakhova on November 2, 2016 at 11:30am — No Comments

Measuring Dependence of Variables with Confidence Intervals

In this post I will sometimes use the term “variable” for “feature” (“predictor”) or “outcome” (“predicted value”).

The question of variable dependencies in a particular data set is quite important, because it can help to reduce the number of predictors used for a model. Or it can tell us which feature is not helpful for model construction, although it can still be used to engineer another predictor. For example, sometimes it is better to compute speed than to use distance values. In…

Added by Maiia Bakhova on September 6, 2016 at 1:07pm — No Comments

Visualizing Bagged Trees as Approximating Borders

The bagged trees algorithm is a commonly used classification method. By resampling our data and creating trees for the resampled data, we can get an aggregated vote for the predicted class. In this blog post I will demonstrate how bagged trees work by visualizing each step.…

Added by Maiia Bakhova on May 18, 2016 at 2:12pm — No Comments
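The resample-then-vote idea in the teaser above can be sketched in a few lines. This is only an illustration, not the code from the post: it assumes one-dimensional inputs, uses one-split trees (stumps) in place of full decision trees, and the names `fit_stump`, `bagged_stumps`, and `predict` are made up for this sketch.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split tree on (x, label) pairs with labels 0 or 1.

    Tries every midpoint between consecutive distinct x values and both
    label orientations, and keeps the split with the fewest training errors.
    """
    xs = sorted({x for x, _ in points})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagged_stumps(points, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return trees


def predict(trees, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy data (an assumption for illustration): class 0 below 1.0, class 1 from 1.0 up.
data = [(x / 10, 0) for x in range(10)] + [(x / 10, 1) for x in range(10, 20)]
trees = bagged_stumps(data)
```

Each stump places its border slightly differently depending on which points landed in its resample; the vote averages those borders out. Using an odd `n_trees` avoids ties between two classes.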

Improving performance of random forests for a particular value of outcome by adding chosen features

Choosing features to improve the performance of a particular algorithm is a difficult question. Currently there is PCA, which is difficult to understand (although it can be used out of the box), requires centering and scaling of features, and is not easy to interpret. In addition, it does not allow improving prediction performance for a particular outcome (if its accuracy is lower than for others or it has particular importance). My method makes it possible to use features without preprocessing.…

Added by Maiia Bakhova on May 5, 2016 at 11:30am — No Comments
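For readers unfamiliar with the preprocessing that the teaser above says PCA needs, here is a minimal sketch of centering and scaling feature columns. It is an illustration under stated assumptions, not part of the post's method: the function name `standardize` and the toy rows are hypothetical, and the population standard deviation is used for scaling.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two features on very different scales (made-up numbers).
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize(rows)
```

After this step both columns are on the same scale, so neither dominates the principal components merely because of its units.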

Choosing features for random forests algorithm

There are many ways to choose features from given data, and it is always a challenge to pick the ones with which a particular algorithm will work better. Here I will consider data from monitoring the performance of physical exercises with wearable accelerometers, for example, wrist bands.

The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har.

In this project, researchers used data from…

Added by Maiia Bakhova on February 18, 2016 at 11:00am — 5 Comments

© 2018 Data Science Central ®