**Summary**: Python’s open-source, high-level nature and its comprehensive libraries make it a strong fit for many real-life ML challenges.

The increasing popularity and accessibility of Artificial Intelligence solutions are rapidly reshaping many industries, from healthcare through finance to aviation. Although the application of the latest technologies has always been an essential consideration for companies striving to get…

Added by Łukasz Grzybowski on July 23, 2019 at 1:30am — No Comments

Credit risk, or credit default risk, is the probability that customers will not repay the financial services a bank has extended to them. Credit risk has long been an extensively studied area in bank lending decisions. It plays a crucial role for banks and financial institutions, especially commercial banks, and it is always difficult to interpret and manage. Thanks to advances in technology, banks have managed to reduce costs, in order to…

Added by Shruti Goyal on March 14, 2018 at 11:30am — 5 Comments

Each year, the Risk Quant Europe Conference, which is well attended by practitioners from banking, asset management, and insurance as well as academics from Europe, selects two papers to be presented at its annual conference.

For 2018, our paper is lucky to be one of the two winning papers selected by the Advisory Board for the conference to be held in London. Please feel free to check out our paper titled *CDS Rate Construction Methods by Machine Learning…*

Added by Zhongmin Luo on February 24, 2018 at 2:00am — No Comments

In the last post, we talked about how to estimate the coefficients, or weights, of linear regression. We estimated the weights that give the minimum error. Essentially, it is an optimization problem in which we have to find the minimum error (cost) and the corresponding coefficients. In a way, all supervised learning algorithms have optimization at their crux, where…

Added by Jobil Louis on January 2, 2018 at 3:30pm — No Comments
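
To make the optimization view in the excerpt above concrete, here is a minimal sketch of finding regression weights by minimizing a cost. It assumes mean squared error as the cost and plain gradient descent as the optimizer. The post itself is truncated, so its actual method may differ. The function names, learning rate, step count and toy data are all illustrative assumptions.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimise mean squared error for y ~ w * x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Cost being minimised: average squared residual."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (illustrative, not from the post).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gradient_descent(xs, ys)
cost = mean_squared_error(xs, ys, w, b)
```

On this noise-free toy data, the fitted `w` and `b` should approach the generating slope and intercept, and `cost` should approach zero. With noisy data, the minimum cost stays above zero.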

Does this sound familiar? To get an idea of how to choose a parameter for a given classifier, you have to cross-reference a number of papers or books, which often present competing arguments for or against a certain parameterization choice but offer few applications to real-world problems.

For example, you may find a few papers discussing optimal selection of K in…

Added by Zhongmin Luo on June 5, 2017 at 7:30pm — 6 Comments

**Cross Validation** is often used as a tool for model selection across classifiers. As discussed in detail in the following paper https://ssrn.com/abstract=2967184, Cross Validation is typically performed in the following steps:

- Step 1: Divide the original sample into K subsamples; each subsample typically has equal sample size and is referred to as one fold, altogether,…

Added by Zhongmin Luo on June 2, 2017 at 7:00pm — No Comments
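
As a rough illustration of the fold-splitting step in the excerpt above, the sketch below divides sample indices into K near-equal folds and then holds out each fold once for validation. The hold-out loop follows the standard K-fold procedure, and the remaining steps in the linked paper are truncated here, so they may differ in detail. The function names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_samples, k):
    """Yield (train, validation) index lists, holding out each fold once."""
    folds = k_fold_indices(n_samples, k)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(train_validation_splits(n_samples=10, k=5))
```

In practice the indices are usually shuffled before splitting so that folds do not inherit any ordering in the data. This sketch leaves that out to keep the fold logic visible.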

In practice, we often have to make parameterization choices for a given classifier in order to achieve optimal classification performance. To name a few examples:

- Neural Network: e.g., the optimal choice of Activation Functions, number of hidden units
- Support Vector Machine: e.g., the optimal choice of Kernel Functions
- Ensemble: e.g., the number of Learning Cycles for Bagging
- Discriminant Analysis: e.g., Linear/Quadratic; regularization…

Added by Zhongmin Luo on May 29, 2017 at 12:49am — No Comments

Past literature shows that comparisons of classifier performance are specific to the type of dataset used (e.g., pharmaceutical industry data); i.e., some classifiers may perform better than others in a given context. A paper titled **CDS Rate Construction Methods by Machine Learning Techniques** conducts the performance comparison exclusively in the context of financial markets, applying a wide range of classifiers to provide a solution to the so-called **Shortage of…**

Added by Zhongmin Luo on May 23, 2017 at 1:30am — No Comments

Multicollinearity (collinearity) is not a new term, especially when dealing with multiple regression models. This phenomenon, a strong relationship among the predictor variables used to explain a response variable, also arises in models such as classification and regression trees and neural networks. Collinearity is notorious for inflating the variance of at least one estimated regression coefficient, which can cause the model to predict erroneously, and in a business setup it can have an…

Added by Sunil Kappal on March 6, 2017 at 10:00am — No Comments
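
The variance inflation mentioned in the excerpt above is commonly quantified with the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r²), with r the Pearson correlation between the two predictors. The post is truncated and may use a different diagnostic. The function names and toy data are assumptions for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Toy predictors (illustrative): x2_related is nearly a multiple of x1,
# while x2_unrelated is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_related = [2.1, 3.9, 6.2, 7.8, 10.1]
x2_unrelated = [3.0, 1.0, 4.0, 1.0, 3.0]

vif_high = vif_two_predictors(x1, x2_related)
vif_low = vif_two_predictors(x1, x2_unrelated)
```

A VIF close to 1 means a predictor shares almost no variance with the other. Large values mean the coefficient estimates become unstable. Commonly quoted warning thresholds such as 5 or 10 are rules of thumb, not hard limits.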

The linear model, better known as linear regression, is one of the most common and flexible analysis frameworks for identifying relationships between two or more variables. The widely used linear model is represented by drawing the best-fit line through a series of data points on a scatter plot.

For any budding business analyst, this should be the starting point for understanding how the model works at its core.

Selecting the Variables in Deducer…

Added by Sunil Kappal on February 28, 2017 at 7:00am — No Comments
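
For readers who want to see how a best-fit line is obtained, here is a minimal sketch of the closed-form ordinary least squares solution for a single predictor. The post itself works in Deducer. This Python version is an independent illustration, and the function name and data points are assumptions.

```python
def fit_best_fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative scatter-plot points, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
slope, intercept = fit_best_fit_line(xs, ys)
```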

As we all know, CRISP DM, which stands for Cross Industry Standard Process for Data Mining, is a process model that outlines the most common approach to tackling data-driven problems. Per the poll conducted by KDNuggets in 2014, this was and “is” one of the most popular and widely used methodologies. This method of gleaning insights from data is very dear to industry experts and data miners.

As the title suggests, I will align some of the most useful R packages with this most popular and…

Added by Sunil Kappal on February 6, 2017 at 8:00am — 1 Comment

As per the largest market research firm, MarketsandMarkets, the speech analytics industry will grow to USD 1.60 billion by 2020, at a Compound Annual Growth Rate (CAGR) of 22% from 2015 to 2020. Today the omnichannel world consists of voice, email, chat, social channels, and surveys, and each channel has its own importance.

Therefore, it becomes impossible for any customer-centric organization to ignore the information that can be glean…

Added by Sunil Kappal on January 25, 2017 at 8:00am — 3 Comments

As the world gets more tech savvy, advancements in information technology, especially in the healthcare industry, have opened up new areas in data mining and machine learning. Within data mining, one technique that has gained a lot of popularity, as well as skepticism, among auditors and fraud detectives is Benford’s Law, or “The Law of First Digit.”

In the past, some researchers in Canada used the Benford’s Law distribution to detect anomalies within the claims…

Added by Sunil Kappal on January 16, 2017 at 5:12am — 5 Comments
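
As a small illustration of the first-digit distribution behind Benford’s Law, the sketch below computes the expected share of each leading digit, P(d) = log10(1 + 1/d), and compares it with the observed shares in a sample. The sample used here (powers of 2) is a stand-in chosen for illustration, not the claims data from the post, and the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading non-zero digit of a non-zero number."""
    x = abs(float(value))
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scale the number into the range [1, 10).
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def observed_first_digit_shares(values):
    """Observed share of each leading digit 1..9, ignoring zeros."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


expected = benford_expected()
# Illustrative sample: the first 100 powers of 2.
observed = observed_first_digit_shares([2 ** n for n in range(1, 101)])
```

A fraud screen would then flag datasets whose observed shares deviate strongly from `expected`. How large a deviation counts as suspicious is a judgment call, and a formal goodness-of-fit test is the usual next step.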

The Best Subset Regression method can be used to create a best-fitting regression model. This model-building technique helps identify which predictor (independent) variables should be included in a multiple regression model (MLR).

This method consists of scrutinizing all of the models created from every possible combination of predictor variables. This technique uses the R Squared value to check for the best model. Considering the level of complexity involved in creating…

Added by Sunil Kappal on December 29, 2016 at 8:00am — 7 Comments
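
To give a feel for the complexity the excerpt above refers to, this sketch shows only the enumeration step of best subset regression: listing every non-empty combination of predictors. Fitting each candidate model and comparing R Squared values is not shown. The predictor names are hypothetical.

```python
from itertools import combinations


def candidate_subsets(predictors):
    """Return every non-empty subset of the predictor names."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


predictors = ["x1", "x2", "x3"]  # hypothetical predictor names
subsets = candidate_subsets(predictors)
n_models = len(subsets)  # 2**p - 1 candidate models for p predictors
```

Because the count grows as 2^p - 1, each added predictor roughly doubles the number of models to fit.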

This tutorial covers the basics of linear regression by discussing in depth the concept of linearity and which type of linearity is desirable.

In linear regression, the term linear is understood in two ways:

- Linearity in variables
- Linearity in parameters…

Added by Shantanu Deo on March 16, 2016 at 4:30am — No Comments
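
As a short illustration of the two senses of linearity listed above, consider the two model forms below. They are assumed textbook-style examples, not taken from the truncated tutorial. In the usual treatment, linearity in parameters is the one that linear regression requires.

```latex
% Linear in parameters, but not in variables (x enters squared):
y = \beta_0 + \beta_1 x^2 + \varepsilon

% Linear in variables, but not in parameters (\beta_1 enters squared):
y = \beta_0 + \beta_1^2 x + \varepsilon
```

The first form can still be fitted by ordinary linear regression after defining a new variable z = x², while the second cannot be written as a linear function of its parameters.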

We have witnessed the rise of the **Key & Value pair** since the emergence of Big Data. We can certainly explore the relationship between two such variables in terms of X & Y when working in Data Science. At a basic level, regression also gives a depiction of two variables, X & Y, to work with. These variables are:

**Independent** Variables & **Dependent** Variables

Let us take behavior of users of a…

Added by Atif Farid Mohammad on May 25, 2015 at 6:00am — No Comments



© 2019 Data Science Central ® Powered by
