In this blog post, I will introduce an R package for fully automated Heterogeneous Ensemble Learning (Classification, Regression). It significantly lowers the barrier for practitioners to apply heterogeneous ensemble learning techniques to their everyday predictive problems, even without specialist expertise.…
Added by Ajay Arunachalam on November 7, 2020 at 1:50am — No Comments
Mobile health (mHealth) is considered one of the most transformative drivers for health informatics delivery of ubiquitous medical applications. Machine learning has proven to be a powerful tool in classifying medical images for detecting various diseases. However, supervised machine learning requires a large amount of data to train the model, and storing and processing that data poses considerable system-requirement challenges for mobile applications. Therefore, many studies focus on…
Added by AI on August 16, 2019 at 6:00am — No Comments
Ecommerce sites generate tons of web server log data, which can provide valuable insights through analysis. For example, if we know which users are more likely to buy a product, we can perform targeted marketing, improve relevant product placement on our site, and lift conversion rates. However, raw web logs are often enormous and messy, so preparing the data to train a predictive model is time-consuming for data scientists.…
Added by Ayumi Owada on July 18, 2019 at 2:00pm — No Comments
Stochastic Signal Analysis is a field of science concerned with the processing, modification and analysis of (stochastic) signals.
Anyone with a background in Physics or Engineering knows to some degree about signal analysis techniques: what these techniques are and how they can be used to analyze, model and classify signals.
Data Scientists coming from different fields, like Computer Science or Statistics, might not be aware of the analytical power these techniques bring with…
Added by Ahmet Taspinar on April 12, 2018 at 6:00am — No Comments
Each year, the Risk Quant Europe Conference, which is well attended by practitioners from banking, asset management and insurance as well as academics from Europe, selects two papers to be presented at its annual conference.
For 2018, we are fortunate that our paper is one of the two winning papers selected by the Advisory Board for the conference to be held in London. Please feel free to check out our paper titled CDS Rate Construction Methods by Machine Learning…
Added by Zhongmin Luo on February 24, 2018 at 2:00am — No Comments
Binary classification is one of the most frequently studied problems in applied machine learning across various domains, from medicine to biology to meteorology to malware analysis. Many researchers use performance metrics in their classification studies to report their success. However, the literature shows widespread confusion about the terminology and ignorance of the fundamental aspects behind the metrics.
In our paper titled "Binary Classification Performance…
In this article, the implementation of a semi-supervised classification algorithm using Markov Chains and Random Walks will be described. We have the following 2D circles dataset (with 1000 points) with only 2 points labeled (as shown in the figure, colored red and blue respectively; for all others the labels are unknown, indicated by the…
Does it sound familiar to you? In order to get an idea of how to choose a parameter for a given classifier, you have to cross-reference a number of papers or books, which often turn out to present competing arguments for or against a certain parameterization choice but with few applications to real-world problems.
For example, you may find a few papers discussing optimal selection of K in…
Cross Validation is often used as a tool for model selection across classifiers. As discussed in detail in the paper at https://ssrn.com/abstract=2967184, Cross Validation is typically performed in the following steps:
In practice, we often have to make parameterization choices for a given classifier in order to achieve optimal classification performance. To name just a few examples:
Added by Zhongmin Luo on May 29, 2017 at 12:49am — No Comments
Past literature shows that comparisons of classifier performance are specific to the types of datasets used (e.g., pharmaceutical industry data); i.e., some classifiers may perform better in some contexts than others. A paper titled CDS Rate Construction Methods by Machine Learning Techniques conducts the performance comparison exclusively in the context of financial markets by applying a wide range of classifiers to provide a solution to the so-called Shortage of…
Added by Zhongmin Luo on May 23, 2017 at 1:30am — No Comments
Machine Learning is a vast area of Computer Science that is concerned with designing algorithms which form good models of the world around us (the data coming from the world around us).
Within Machine Learning, many tasks are, or can be reformulated as, classification tasks.
In classification tasks we are trying to produce a model which can give the correlation…
Added by Ahmet Taspinar on December 15, 2016 at 2:00pm — No Comments
A PDF version of this document, created using LaTeX, can be downloaded by clicking here.
Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…
Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm — No Comments
Reading the academic literature, Text Analytics seems difficult. However, applying it in practice has shown us that Text Classification is much easier than it looks. Most of the Classifiers consist of only a few lines of code. In this three-part blog series we will examine three well-known Classifiers: Naive Bayes, Maximum Entropy and Support Vector Machines. From the…
Added by Ahmet Taspinar on February 15, 2016 at 10:00pm — No Comments
Many Machine Learning articles and papers describe the wonders of the Support Vector Machine (SVM) algorithm. Nevertheless, when using it on real data to obtain a high-accuracy classification, I stumbled upon several issues.
I will try to describe the steps I took to make the algorithm work in practice.
This model was implemented…
Added by Renata Ghisloti Duarte Souza Gra on December 18, 2015 at 5:00pm — No Comments