Why do we take logs of variables in regression analysis?
We should remember that a regression equation has two parts:
i) The dependent variable (predictand)
ii) The independent variables (predictors), of which there can be one or more, and which can be of different types (categorical or continuous).
The type of regression we should run depends on the type of dependent variable in our model. For example, if the dependent variable is continuous…
Added by Sibashis Chakraborty on October 20, 2019 at 8:57am
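The excerpt stops before answering its title question, but one common reason for logging a variable is that it turns a multiplicative (exponential) relationship into a linear one that ordinary least squares can fit. The sketch below is a minimal illustration under that assumption, using a hypothetical noise-free series y = 2·e^(0.3x): a straight line fitted to log(y) recovers the slope 0.3 and an intercept of log 2.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data following y = 2 * exp(0.3 * x): curved in x, linear in log(y)
xs = [float(x) for x in range(10)]
ys = [2.0 * math.exp(0.3 * x) for x in xs]

# Regress log(y) on x, i.e. fit log(y) = log(2) + 0.3 * x
a, b = fit_line(xs, [math.log(y) for y in ys])
print(f"intercept={a:.4f} (log 2 = {math.log(2):.4f}), slope={b:.4f}")
print(f"back-transformed scale factor: {math.exp(a):.4f}")
```

In this log-linear form the slope has a proportional reading: each one-unit increase in x multiplies y by exp(b), about 1.35 here.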
Summary: Python’s open-source and high-level nature, as well as its comprehensive libraries, make it well suited to solving numerous real-life ML challenges.
The increasing popularity and accessibility of artificial intelligence solutions are rapidly reshaping many industries, from healthcare through finance to aviation. Although the application of the latest technologies has always been an essential consideration for companies striving to get…
Added by Łukasz Grzybowski on July 23, 2019 at 1:30am
Credit risk, or credit default risk, is the probability that customers will not repay the bank financial services that have been extended to them. Credit risk has long been an extensively studied area in bank lending decisions. It plays a crucial role for banks and financial institutions, especially commercial banks, and it is always difficult to interpret and manage. Thanks to advances in technology, banks have managed to reduce costs in order to…
Each year, the Risk Quant Europe Conference, which is well attended by practitioners from banking, asset management, and insurance as well as by academics from across Europe, selects two papers to be presented at its annual conference.
For 2018, our paper was fortunate to be selected by the Advisory Board as one of the two winning papers for the conference, to be held in London. Please feel free to check out our paper titled CDS Rate Construction Methods by Machine Learning…
Added by Zhongmin Luo on February 24, 2018 at 2:00am
In the last post, we talked about how to estimate the coefficients, or weights, of a linear regression: we estimated the weights that give the minimum error. Essentially, this is an optimization problem in which we have to find the minimum error (cost) and the corresponding coefficients. In a way, all supervised learning algorithms have optimization at their crux, where…
Added by Jobil Louis on January 2, 2018 at 3:30pm
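The excerpt frames coefficient estimation as minimizing a cost, but it is cut off before naming a method. One common way to solve that optimization numerically is batch gradient descent on the mean squared error; the sketch below illustrates that technique, not necessarily the post's own approach. The learning rate, iteration count and data are assumptions chosen for the example.

```python
def gradient_descent(xs, ys, lr=0.05, n_iter=5000):
    """Fit y ~ w0 + w1*x by minimizing mean squared error with batch gradient descent."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals of the current model on every data point
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2)
        grad_w0 = (2.0 / n) * sum(errors)
        grad_w1 = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    cost = sum(((w0 + w1 * x) - y) ** 2 for x, y in zip(xs, ys)) / n
    return w0, w1, cost

# Hypothetical data generated from y = 1 + 2x with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
w0, w1, cost = gradient_descent(xs, ys)
print(f"w0={w0:.4f}, w1={w1:.4f}, cost={cost:.2e}")
```

Too large a learning rate makes the updates overshoot and diverge; too small a rate needs many more iterations, so `lr` usually has to be tuned to the scale of the data.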
Does this sound familiar? To get an idea of how to choose a parameter for a given classifier, you have to cross-reference a number of papers or books, which often present competing arguments for or against a certain parameterization choice but offer few applications to real-world problems.
For example, you may find a few papers discussing the optimal selection of K in…
Cross-validation is often used as a tool for model selection across classifiers. As discussed in detail in the paper at https://ssrn.com/abstract=2967184, cross-validation is typically performed in the following steps:
Added by Zhongmin Luo on June 2, 2017 at 7:00pm
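The paper's own list of steps is not reproduced in this excerpt. As a generic illustration of k-fold cross-validation used to select between classifiers, the sketch below compares a 1-nearest-neighbour rule with a nearest-centroid rule on a small hypothetical 1-D dataset; the fold count, the two classifiers and the data are all assumptions, not the paper's setup.

```python
import random

def nearest_neighbour(train, x):
    """1-NN: predict the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def k_fold_accuracy(data, classifier, k=5, seed=0):
    """Shuffle, split into k folds, and average the held-out accuracy."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        # Train on every fold except the held-out one
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(classifier(train, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k

# Hypothetical data: class 0 centred at 0.0, class 1 centred at 3.0
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)] + \
       [(rng.gauss(3.0, 1.0), 1) for _ in range(50)]

candidates = [("1-NN", nearest_neighbour), ("nearest centroid", nearest_centroid)]
results = {name: k_fold_accuracy(data, clf) for name, clf in candidates}
best = max(results, key=results.get)
print(results, "-> selected:", best)
```

Every candidate is scored on the same folds so the comparison is like-for-like; the selected classifier would then typically be refitted on all of the data.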
In practice, we often have to make parameterization choices for a given classifier in order to achieve optimal classification performance. To name just a few examples:
Added by Zhongmin Luo on May 29, 2017 at 12:49am
Past literature shows that comparisons of classifier performance are specific to the type of dataset used (e.g., pharmaceutical industry data); that is, some classifiers may perform better than others in some contexts. A paper titled CDS Rate Construction Methods by Machine Learning Techniques conducts the performance comparison exclusively in the context of financial markets, applying a wide range of classifiers to provide a solution to the so-called Shortage of…
Added by Zhongmin Luo on May 23, 2017 at 1:30am
Multicollinearity (collinearity) is not a new term, especially when dealing with multiple regression models. The phenomenon, a strong linear relationship among the predictor variables themselves rather than between the response and the predictors, also affects models such as classification and regression trees and neural networks. Collinearity is notorious for inflating the variance of at least one estimated regression coefficient, which makes the coefficient estimates unstable and can cause the model to predict erroneously; in a business setup it can have an…
Added by Sunil Kappal on March 6, 2017 at 10:00am
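A common diagnostic for the variance inflation described above is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. With exactly two predictors, R_j² equals their squared Pearson correlation, which keeps this sketch dependency-free. The data are hypothetical, and the often-quoted cut-offs of 5 or 10 are rules of thumb rather than hard limits.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors.

    Regressing x1 on x2 (with an intercept) gives R^2 = r^2, so both
    predictors share the same VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated to x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]

print(f"VIF with x1 and x2 in the model: {vif_two_predictors(x1, x2):.1f}")
print(f"VIF with x1 and x3 in the model: {vif_two_predictors(x1, x3):.2f}")
```

With more than two predictors, each R_j² has to come from a full multiple regression of predictor j on all the others.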
The linear model, better known as linear regression, is one of the most common and flexible analytical frameworks for identifying relationships between two or more variables. In its simplest form, the linear model is represented by the best-fit line drawn through a series of data points on a scatter plot.
For any budding business analyst, this should be the starting point for understanding how a model works at the very core of its design.
Selecting the Variables in Deducer…
Added by Sunil Kappal on February 28, 2017 at 7:00am
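The best-fit line on a scatter plot is usually the ordinary least squares line, whose slope and intercept have a closed form: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The post itself continues in Deducer; this Python sketch only illustrates the same calculation on a small hypothetical dataset, together with the R² of the fit.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical scatter-plot data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r_squared(xs, ys, a, b):.3f}")
```

For these five points the line is y = 2.2 + 0.6x with R² = 0.6.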
As we all know, CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is a process model that outlines the most common approach to tackling data-driven problems. According to a poll conducted by KDnuggets in 2014, it was, and still is, one of the most popular and widely used methodologies. This method of gleaning insights from data is very dear to industry experts and data miners.
As the title suggests, I will align some of the most useful R packages with this most popular and…
According to the market research firm MarketsandMarkets, the speech analytics industry is projected to grow to USD 1.60 billion by 2020, at a compound annual growth rate (CAGR) of 22% from 2015 to 2020. Today's omnichannel world consists of voice, email, chat, social channels, and surveys, each with its own importance.
Therefore, it becomes impossible for any customer-centric organization to ignore the information that can be glean…
As the world becomes more tech-savvy, advances in information technology, especially in the healthcare industry, have opened up new areas in data mining and machine learning. Within data mining, one technique that has gained a lot of popularity, as well as skepticism, among auditors and fraud investigators is Benford’s Law, or “the Law of the First Digit.”
In the past, some researchers in Canada used the Benford’s Law distribution to detect anomalies within the claims…
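Benford’s Law predicts that the leading digit d (1 to 9) appears with probability P(d) = log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. The sketch below shows the usual first-digit check: tally leading digits and compare them with the expected shares, summarized by the mean absolute deviation (one common conformity measure). The amounts are hypothetical, generated as powers of 2 (a sequence known to follow Benford’s Law), not real claims data.

```python
import math
from collections import Counter

def leading_digit(value):
    """First significant digit of a non-zero number, e.g. 0.0456 -> 4."""
    return int(str(abs(value)).lstrip("0.")[0])

def first_digit_shares(values):
    """Observed share of each leading digit 1..9, ignoring zeros."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_expected():
    """Benford first-digit probabilities P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def mean_absolute_deviation(observed, expected):
    """Average absolute gap between observed and expected shares."""
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Hypothetical "amounts": 2^1 .. 2^1000
amounts = [2 ** n for n in range(1, 1001)]
observed = first_digit_shares(amounts)
expected = benford_expected()
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}  expected {expected[d]:.3f}")
print(f"MAD = {mean_absolute_deviation(observed, expected):.4f}")
```

In an audit setting, digits whose observed share departs noticeably from the expected one point to records worth a closer look; a deviation is a prompt for investigation, not proof of fraud.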
The best subset regression method can be used to build a best-fitting regression model. This model-building technique helps identify which predictor (independent) variables should be included in a multiple linear regression (MLR) model.
This method consists of scrutinizing all of the models that can be built from all possible combinations of the predictor variables. It uses the R-squared value to identify the best model. Considering the level of complexity involved in creating…
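A minimal sketch of best subset selection, assuming a small hypothetical dataset in which y depends on x1 and x2 while x3 is pure noise. Because R² never decreases when a predictor is added, the sketch keeps the highest-R² subset for each size k and also reports adjusted R², one common way to compare across sizes (Mallows' Cp or cross-validation are alternatives). The least squares fit uses the normal equations with a small Gaussian-elimination solver so the example needs only the standard library.

```python
import itertools
import random

def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(rows, y, cols):
    """Fit y on an intercept plus the chosen columns; return R^2."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)] for i in range(p)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * v for bj, v in zip(beta, xr))) ** 2 for xr, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def best_subsets(rows, y, names):
    """For each size k: (best predictor names, R^2, adjusted R^2)."""
    n = len(y)
    best = {}
    for k in range(1, len(names) + 1):
        for cols in itertools.combinations(range(len(names)), k):
            r2 = r_squared(rows, y, cols)
            if k not in best or r2 > best[k][1]:
                best[k] = (cols, r2)
    # Adjusted R^2 penalizes extra predictors, so it can compare across sizes
    return {k: ([names[c] for c in cols], r2, 1 - (1 - r2) * (n - 1) / (n - k - 1))
            for k, (cols, r2) in best.items()}

# Hypothetical data: y = 3 + 2*x1 - x2 + noise; x3 is irrelevant
rng = random.Random(0)
rows = [[rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(40)]
y = [3 + 2 * r[0] - 1 * r[1] + rng.gauss(0, 1) for r in rows]

results = best_subsets(rows, y, ["x1", "x2", "x3"])
for k, (subset, r2, adj) in results.items():
    print(f"k={k}: {subset}  R^2={r2:.3f}  adj R^2={adj:.3f}")
```

The number of candidate models grows as 2^p for p predictors, which is why exhaustive search becomes expensive quickly.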
This tutorial covers the basics of linear regression, discussing in depth the concept of linearity and which type of linearity is desirable.
In linear regression, the term “linear” is understood in two ways:
Added by Shantanu Deo on March 16, 2016 at 4:30am
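The excerpt is cut off before listing the two senses. In standard textbook usage they are linearity in the variables and linearity in the parameters, and it is linearity in the parameters that linear regression requires. The equations below are illustrative examples, not taken from the tutorial:

```latex
\begin{aligned}
&\text{Linear in } x \text{ and in the parameters:} && y = \beta_0 + \beta_1 x + \varepsilon \\
&\text{Linear in the parameters only:} && y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \\
&\text{Not linear in the parameters:} && y = \beta_0 e^{\beta_1 x} + \varepsilon
\end{aligned}
```

The first two can both be fitted by ordinary least squares, because each is a weighted sum of known terms; the third needs nonlinear least squares.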
Since the emergence of Big Data, we have witnessed the rise of key-value pairs. We can certainly explore the relationship between two such variables as X and Y using data science. At its most basic, regression also works with two variables, X and Y. These variables are:
Independent Variables & Dependent Variables
Let us take the behavior of users of a…
Added by Atif Farid Mohammad on May 25, 2015 at 6:00am