Written by Jun Wu.

Every year since I worked on Wall Street, traditional trading-is-an-art-form traders have been leaving and retiring. Their jobs are being taken over by a new breed of experts who are savvy with numbers, systems, and the market. These experts don't sleep, eat, or drink. They are AI systems that can run on a thousand machines, analyzing information from markets, social media, corporate filings, and economic conditions to quickly decide which trades to make at any given moment.

Problems Arising from the Use of AI for Trading Purposes

The "black box" nature of algorithms, particularly deep learning algorithms, makes grasping how decisions are made virtually impossible. These include trading decisions, investment decisions, and risk management decisions. The communication mechanisms inside an AI system are not transparent. When money is lost, it is difficult for the hedge fund or the regulatory body to trace that loss to any foul play. If the AI system is at the center, then it is the AI system that is responsible. But who is responsible for the AI system? If a third-party AI system is used, is the financial management firm responsible, or the company that created the AI system?

The problem is further complicated when multiple intermediaries experience rapid losses in a short span of time. This leads to a new kind of systemic risk: volatility in the market begets more volatility in the AI world. As volatility inputs are fed into AI systems, they account for that volatility by making new trading decisions that can potentially increase volatility further.

Read the full article here.

The explanation of logistic regression as a Generalized Linear Model, and its use as a classifier, is often confusing. In this article, I try to explain this idea from first principles. This blog is part of my forthcoming book on the mathematical foundations of Data Science. If you are interested in knowing more, please follow me on LinkedIn: Ajit Jaokar.

We take the following approach:

- We first see briefly how linear regression works.
- We then explore the assumptions and limitations of linear regression.
- Following this, we show how these limitations can be overcome by the Generalized Linear Model (GLM).
- Finally, we explore logistic regression as a GLM.

Explanation of Linear Regression

Machine learning involves creating a model of a process. To create a model of a process, we need to identify patterns in data. Broadly, patterns in data can be of two types: the signal (the data-generating process) and the variation (the error-generating process). The simplest model to start with is the linear regression model. Linear models have some advantages: for example, they are relatively simple to implement, and many phenomena can be modelled using linear regression.

Assumptions of Linear Regression

Linear regression has the following requirements (assumptions for use):

- As the name implies, linear regression needs the relationship between the independent and dependent variables to be linear.
- Linear regression analysis requires all variables to follow a multivariate normal distribution.
- No multicollinearity in the data. Multicollinearity occurs when the independent variables are highly correlated with each other.
- Linear regression analysis requires little or no autocorrelation in the data. Autocorrelation is the correlation of a signal with a delayed copy of itself, as a function of the delay.
- Homoscedasticity (the residuals are equal across the regression line).
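As a concrete illustration of how linear regression works (my own sketch, not from the original article, with made-up data), simple linear regression has a closed-form ordinary least squares solution:

```python
# Minimal sketch of simple linear regression via the closed-form
# ordinary least squares solution (illustrative example; the data
# and function names are made up for this sketch).

def fit_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Noise-free data generated by y = 2x + 1, so OLS recovers it exactly.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_linear(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

On real data the residuals would be nonzero, and the assumptions listed above concern exactly those residuals.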
Overcoming the requirement that the dependent (response) variable is normally distributed

The requirement that the response variable be normally distributed excludes many cases, such as:

- cases where the response variable is expected to be always positive and to vary over a wide range, or
- cases where constant input changes lead to geometrically varying, rather than constantly varying, output changes.

We can illustrate these with examples. Suppose we have a model which predicts that a 10-degree temperature decrease would lead to 1,000 fewer people visiting the beach. This model does not work over both small and large beaches. (Here, we could consider a small beach as one where expected attendance is 50 people, and a large beach as one where expected attendance is 10,000.) For the small beach (50 people), the model implies that -950 people would attend the beach. This prediction is obviously not correct.

The model would also fail if we had an output that was bounded on both sides, for example in the case of a yes/no choice. This is represented by a Bernoulli variable, where the probabilities are bounded at both ends (they must be between 0 and 1). If our model predicted that a change of 10 degrees makes a person twice as likely to go to the beach, the model breaks down as temperatures keep increasing, because probabilities cannot be doubled indefinitely.

Generalized linear models (GLMs) cater to these situations by allowing response variables with arbitrary distributions (rather than only normal distributions), and by using an arbitrary function of the response variable (called the link function) to vary linearly with the predictors (rather than assuming that the response itself must vary linearly with the predictors).
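To make the beach example concrete, here is a small sketch (my own illustration, with made-up coefficients) contrasting an additive linear model, which can predict impossible negative attendance, with a multiplicative log-link model, which keeps predictions positive:

```python
import math

# Illustrative coefficients (made up for this sketch): the linear model
# removes a fixed 100 visitors per degree of temperature drop, while the
# log-link model scales attendance by a constant factor per degree.

def linear_model(baseline, temp_drop):
    """Additive model: attendance falls by 100 visitors per degree."""
    return baseline - 100 * temp_drop

def log_link_model(baseline, temp_drop):
    """Multiplicative (log-link) model: attendance scales by exp(-0.1 * drop)."""
    return baseline * math.exp(-0.1 * temp_drop)

small_beach, large_beach, drop = 50, 10_000, 10

print(linear_model(small_beach, drop))    # -950: impossible attendance
print(log_link_model(small_beach, drop))  # ~18.4: stays positive
print(linear_model(large_beach, drop))    # 9000
print(log_link_model(large_beach, drop))  # ~3679
```

The log link turns a constant additive effect on the linear predictor into a constant multiplicative effect on the response, which is exactly the "geometrically varying" behaviour described above.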
Thus, in a generalized linear model (GLM), each outcome Y of the dependent variables is assumed to be generated from the exponential family of distributions (which includes the normal, binomial, Poisson, and gamma distributions, among others). A GLM thus expands the scenarios in which linear regression can apply by expanding the possibilities for the outcome variable. GLMs use maximum likelihood estimation of the model parameters for the exponential family, which reduces to least squares for normal linear models. (Note: this section is adapted from Wikipedia.)

Logistic Regression as GLM

To understand how logistic regression can be seen as a GLM, we can elaborate as follows. Logistic regression measures the relationship between the dependent variable and one or more independent variables (features) by estimating probabilities using the underlying logit function. In statistics, the logit function, or log-odds, is the logarithm of the odds. Given a probability p, the corresponding odds are calculated as p / (1 - p), and the logit function is the logarithm of the odds: logit(p) = log(p / (1 - p)). The odds describe the ratio of successes to failures; the odds ratio is the ratio of the odds for two groups.

The inverse of the logit function is the sigmoid function, given by σ(x) = 1 / (1 + exp(-x)). The sigmoid function maps any real number to the range (0, 1), i.e. to a valid probability, and this is what makes logistic regression usable as a classifier. Thus, many models have data-generating processes that can be linearized by considering the inverse of the link function.

The logit and sigmoid functions are also useful in analysis because their gradients are simple to calculate, and many optimization and machine learning techniques make use of gradients (for example, in neural networks). The biggest drawback of the sigmoid function for many analytics practitioners is the so-called "vanishing gradient" problem: for inputs of large magnitude, the gradient approaches zero.
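The relationships above can be sketched in a few lines (my own illustration): the sigmoid inverts the logit, and its gradient has the simple closed form σ(x)(1 − σ(x)), which also shows the vanishing-gradient behaviour.

```python
import math

# A minimal sketch of the logit and sigmoid functions described above,
# including the sigmoid's simple gradient s * (1 - s).

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Inverse of the logit: maps any real x into (0, 1)."""
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    """The sigmoid's gradient has the simple form s * (1 - s)."""
    s = sigmoid(x)
    return s * (1 - s)

p = 0.8
x = logit(p)            # log(0.8 / 0.2) = log(4) ≈ 1.386
print(sigmoid(x))       # recovers 0.8: sigmoid inverts logit
print(sigmoid_grad(0))  # 0.25, the gradient's maximum
print(sigmoid_grad(10)) # ≈ 4.5e-05: the "vanishing gradient" for large |x|
```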
References

- byrneslab.net
- Logit of logistic regression: understanding the fundamentals
- nathanbrixius: logit and sigmoid
- quora.com: Why is logistic regression called regression if it doesn't model continuous outcomes?

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.

Announcements

- Deep Learning and AI For All - eBook
- Democratizing Data Analytics and AI - Sep 25
- Migrating R Applications to the Cloud using Databricks - Sep 26
- Data Engineering, Prep & Labeling for AI (Cognilytica 2019 report)
- Forecasting Using TensorFlow and FB's Prophet - Oct 17
- Building Accessible Dashboards in Tableau - Oct 15

Featured Resources and Technical Contributions

- The Math of Machine Learning - Berkeley University Textbook
- Explaining Logistic Regression as Generalized Linear Model
- Correlation Coefficients in One Picture
- AWK - a Blast from Wrangling Past
- Authorship Analysis as a Text Classification/Clustering Problem
- Boosting your Machine Learning productivity with SAS Viya
- Google Released Angular 7: What's New in Angular 7?
- Water Dataset Provides Ground-Level Insight into Business Risk
- Question: Blending 4 data sources
- Question: Cleaning responses to meet quotas after sampling

Featured Articles

- How AI/ML Could Return Manufacturing Prowess Back to US
- Applications of Data Analytics
- MS Data Science vs MS Machine Learning / AI vs MS Analytics +
- AI trading the market
- The simplest explanation of machine learning you'll ever read
- Designing an Analytics Roadmap
- How to set up an intelligent automation CoE
- Best Paths to Becoming a Great Data Scientist
- A data-based view of customer analysis
- How AI Is Changing Cyber Security

Picture of the Week

Source: article flagged with a +

To make sure you keep getting these emails, please add mail@newsletter.datasciencecentral.com to your address book or whitelist us. To subscribe, click here. Follow us: Twitter | Facebook.

This document is an attempt to provide a summary of the mathematical background needed for an introductory class in machine learning, which at UC Berkeley is known as CS 189/289A. Our assumption is that the reader is already familiar with the basic concepts of multivariable calculus and linear algebra (at the level of UCB Math 53/54). We emphasize that this document is not a replacement for the prerequisite classes. Most subjects presented here are covered rather minimally; we intend to give an overview and point the interested reader to more comprehensive treatments for further details.

Note that this document concerns math background for machine learning, not machine learning itself. We will not discuss specific machine learning models or algorithms except possibly in passing to highlight the relevance of a mathematical concept.

Contents

1 About
2 Notation
3 Linear Algebra
  3.1 Vector spaces
    3.1.1 Euclidean space
    3.1.2 Subspaces
  3.2 Linear maps
    3.2.1 The matrix of a linear map
    3.2.2 Nullspace, range
  3.3 Metric spaces
  3.4 Normed spaces
  3.5 Inner product spaces
    3.5.1 Pythagorean Theorem
    3.5.2 Cauchy-Schwarz inequality
    3.5.3 Orthogonal complements and projections
  3.6 Eigenthings
  3.7 Trace
  3.8 Determinant
  3.9 Orthogonal matrices
  3.10 Symmetric matrices
    3.10.1 Rayleigh quotients
  3.11 Positive (semi-)definite matrices
    3.11.1 The geometry of positive definite quadratic forms
  3.12 Singular value decomposition
  3.13 Fundamental Theorem of Linear Algebra
  3.14 Operator and matrix norms
  3.15 Low-rank approximation
  3.16 Pseudoinverses
  3.17 Some useful matrix identities
    3.17.1 Matrix-vector product as linear combination of matrix columns
    3.17.2 Sum of outer products as matrix-matrix product
    3.17.3 Quadratic forms
4 Calculus and Optimization
  4.1 Extrema
  4.2 Gradients
  4.3 The Jacobian
  4.4 The Hessian
  4.5 Matrix calculus
    4.5.1 The chain rule
  4.6 Taylor's theorem
  4.7 Conditions for local minima
  4.8 Convexity
    4.8.1 Convex sets
    4.8.2 Basics of convex functions
    4.8.3 Consequences of convexity
    4.8.4 Showing that a function is convex
    4.8.5 Examples
5 Probability
  5.1 Basics
    5.1.1 Conditional probability
    5.1.2 Chain rule
    5.1.3 Bayes' rule
  5.2 Random variables
    5.2.1 The cumulative distribution function
    5.2.2 Discrete random variables
    5.2.3 Continuous random variables
    5.2.4 Other kinds of random variables
  5.3 Joint distributions
    5.3.1 Independence of random variables
    5.3.2 Marginal distributions
  5.4 Great Expectations
    5.4.1 Properties of expected value
  5.5 Variance
    5.5.1 Properties of variance
    5.5.2 Standard deviation
  5.6 Covariance
    5.6.1 Correlation
  5.7 Random vectors
  5.8 Estimation of Parameters
    5.8.1 Maximum likelihood estimation
    5.8.2 Maximum a posteriori estimation
  5.9 The Gaussian distribution
    5.9.1 The geometry of multivariate Gaussians
References

"The Math of ML" by UC Berkeley (book, 47 pages): https://gwthomas.github.io/docs/math4ml.pdf. This material (in PDF format) is accessible here. Other similar free textbooks featuring more original content (including one that is 300 pages long) can be found here.


This article was written by Cassie Kozyrkov.

You've probably heard of machine learning and artificial intelligence, but are you sure you know what they are? If you're struggling to make sense of them, you're not alone. There's a lot of buzz that makes it hard to tell what's science and what's science fiction. Starting with the names themselves...

Machine learning is a thing-labeler, essentially

I'm a statistician and neuroscientist by training, and we statisticians have a reputation for picking the driest, most boring names for things. We like it to do exactly what it says on the tin. You know what we would have named machine learning? The Labelling of Stuff using Examples!

Contrary to popular belief, machine learning is not a magical box of magic, nor is it the reason for $30bn in VC funding. At its core, machine learning is just a thing-labeler, taking your description of something and telling you what label it should get. Which sounds much less interesting than what you read on Hacker News. But would you have gotten excited enough to read about this topic if we'd called it thing-labeling in the first place? Probably not, which goes to show that a bit of marketing and dazzle can be useful for getting this technology the attention it deserves (though not for the reasons you might think).

It's phenomenally useful, but not as sci-fi as it sounds

What about artificial intelligence (AI)? While the academics argue about the nuances of what AI is and isn't, industry is using the term to refer to a particular type of machine learning. In fact, most of the time people just use the terms interchangeably, and I can live with that. So AI is also about thing-labeling. Were you expecting robots? Something sci-fi with a mind of its own, something humanoid? Well, today's AI is not that. But we're a species that sees human traits in everything. We see faces in toast, bodies in clouds, and if I sew two buttons onto a sock, I might end up talking to it.
That sock puppet's not a person, and neither is AI. It's important to keep that in mind. Is that a letdown? Chin up! The real thing is far more useful.

Machine learning is a new programming paradigm, a new way of communicating your wishes to a computer

In the traditional programming approach, a programmer would think hard about the pixels and the labels, communicate with the universe, channel inspiration, and finally handcraft a model. A model's just a fancy word for recipe, or a set of instructions your computer has to follow to turn pixels into labels.

But think about what those instructions would be. What are you actually doing with these pixels? Can you express it? Your brain had the benefit of eons of evolution and now it just works; you don't even know how it does it. That recipe is pretty hard to come up with.

To read the whole article, with illustrations, click here.
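To make the "labelling of stuff using examples" idea concrete, here is a minimal sketch (my own illustration, not from Kozyrkov's article) of perhaps the simplest possible thing-labeler: given example descriptions with labels, it labels a new thing by copying the label of the most similar example. The features and labels are made up.

```python
import math

# A toy "thing-labeler" (illustrative only): each thing is described by
# two made-up features, (weight_kg, length_cm), and a new thing gets the
# label of the nearest labelled example (1-nearest-neighbour).

examples = [
    ((4.0, 46.0), "cat"),
    ((3.5, 40.0), "cat"),
    ((30.0, 90.0), "dog"),
    ((25.0, 80.0), "dog"),
]

def label(thing):
    """Return the label of the training example closest to `thing`."""
    return min(examples, key=lambda ex: math.dist(ex[0], thing))[1]

print(label((4.2, 44.0)))   # cat
print(label((28.0, 85.0)))  # dog
```

No handcrafted recipe of instructions appears anywhere: the "model" is just the labelled examples plus a notion of similarity, which is the contrast with traditional programming that the article draws.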
