I myself do epidemiologic research, which rarely calls for developing machine learning models. Instead, I spend my time developing logistic regression models that I have to be able to interpret for the broader scientific (and sometimes non-scientific) community. I have to be able to explain in some easy, risk-communication way not only the…
Added by Monika Wahi on December 17, 2020 at 7:47am
As you start incorporating machine learning models into your end-user applications, the question comes up: “When is the model good enough to deploy?”
There simply is no single right answer.
There is no clear-cut measure of when a machine learning model is ready to…
Added by Henrik Skogström on November 18, 2020 at 5:30am
If you were to invest in real estate, what would be the most important factor you’d take into account? Would it be the age of the building, its location, or maybe how many owners it previously had?
While all of the above will certainly be important to you (though to varying extents), there’s one universal factor that will either be a deal-breaker or make you want to go all in.
You’ve guessed it — the price.
We all have a rough overview of…
Added by Mario Inter on July 29, 2020 at 1:42am
Summary: Bias in modeling has long been a public concern, and it is now amplified and focused on the disparate treatment that models may cause for African Americans. Defining and correcting the bias presents difficult issues that data scientists need to think through carefully before reaching conclusions.
Added by William Vorhies on June 29, 2020 at 11:31am
After two months of confinement, hearing the word “pandemic” almost daily and seeing graphs, opinions, timelines, and values that rise and fall, I became curious. First, what is a pandemic? Is it the same as an epidemic? A plague? Was the plague an epidemic? What do epidemiologists study? Yes, they study a discipline called epidemiology.
What is epidemiology?
Epidemiology studies the distribution, frequency, relationships, predictions, and…
Added by Luis Hidalgo Encinas on May 19, 2020 at 9:00pm
This is a longer-form follow-up to my post describing the open-source pandemic package on PyPI (with Python code also available on GitHub). You can use the code to simulate millions of people moving about in two dimensions, crossing paths and, unfortunately, getting sick.
This video illustrates the dynamic with a toy-sized town of fifty people. Watch how transmission takes place. You might even discern commuting and households.…
Added by Peter Cotton on April 13, 2020 at 12:00pm
Machine learning (ML) development is an iterative process in which the accuracy of a model's predictions is continuously improved by repeating the training and evaluation phases. In each iteration, developers tweak certain parameters. Any parameter selected manually, based on what was learned from previous experiments, qualifies as a model hyperparameter. These parameters represent intuitive decisions whose values cannot be estimated from data or from…
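The train, evaluate, and adjust loop described above can be illustrated with a minimal sketch. Everything in it is a hypothetical choice made for illustration, not taken from the original post: the model is 1-D k-nearest-neighbors regression, the hyperparameter being tuned is the number of neighbors `k`, and the data are synthetic. It shows why such a value cannot simply be fit on the training data (with `k = 1`, every training point predicts itself perfectly), so each candidate is instead scored on held-out validation data.

```python
import random

# Hypothetical example: 1-D k-nearest-neighbors regression, where the
# number of neighbors k is the hyperparameter being tuned.


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(target for _, target in nearest) / k


def mean_squared_error(train, data, k):
    """Average squared error of k-NN predictions (using train) over data."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def tune_k(train, validation, candidates):
    """Score each candidate k on validation data; return the best k and all scores."""
    scores = {k: mean_squared_error(train, validation, k) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores


def make_data(n, rng):
    """Hypothetical toy data: y = x**2 plus Gaussian noise (sd = 1)."""
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, 1.0)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    train, validation = make_data(200, rng), make_data(100, rng)
    # Scoring on the training data would always favor k = 1 (zero error),
    # which is why k is chosen on separate validation data instead.
    print("training MSE with k=1:", mean_squared_error(train, train, 1))
    best_k, scores = tune_k(train, validation, [1, 3, 5, 10, 25, 50])
    for k, score in scores.items():
        print(f"k={k:>2}: validation MSE = {score:.3f}")
    print("selected k:", best_k)
```

In a real project, each pass through this loop corresponds to one "experiment"; a grid of candidates is only the simplest search strategy, and the same idea applies to any hyperparameter.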
Summary: Data scientists from Booking.com share many lessons learned in the process of constantly improving their sophisticated ML models. Not least among them: improving your models doesn’t always improve business outcomes.
Summary: Finally, there are tools that let us transcend ‘correlation is not causation’ and identify true causal factors and their relative strengths in our models. This is what prescriptive analytics was meant to be.
Just when I thought we’d figured it all out, something comes along to make…
Summary: Whether you’re a data scientist building an implementation case to present to executives or a non-data-scientist leader trying to figure this out, there’s a need for a much broader framework of strategic thinking about how to capture the value of AI/ML.
Added by William Vorhies on March 25, 2019 at 8:30am
Many research fields routinely categorize continuous predictor variables, and they tend to be the same fields that rely mostly on ANOVA. This is often done with a median split: values above the median are labeled high and values below it low. However, this does not seem to be a good idea, for several reasons:
Added by Chirag Shivalker on October 24, 2017 at 10:00pm
I have been writing about the Crosswave Differential Algorithm for a number of years. In previous blog posts, I described how the algorithm emerged almost by accident while I was attempting to write an application intended to support quality control. In this post, I discuss the event model that powers the algorithm. Events are the details and circumstances…
Added by Don Philip Faithful on January 14, 2017 at 5:27am
There’s a lot of buzz around the term “sentiment analysis” and the various ways of doing it. Great! So you report, with reasonable accuracy, what the sentiment about a particular brand or product is.
After publishing this report, your client comes back to you and…
Added by Vivek Kalyanarangan on November 4, 2016 at 5:00am
Relation, Relationship and Association
While most players in the IT sector adopted graph or document databases and Hadoop-based solutions (Hadoop being an enabler of the HBase column store), it went almost unnoticed that several new DBMSs, AtomicDB, the previous database engine of …
Added by Athanassios Hatzis on September 7, 2016 at 4:00am
Currently, many of us are overwhelmed by the mighty power of deep learning, and we are starting to forget about humble graphical models. CRFs are not as trendy as LSTMs, but they are robust, reliable, and worth noting.
In this post, you will find a short summary of CRFs (conditional random fields): what they are, what they are for, and some interesting facts. Enjoy!…
Added by Nikitinsky Nikita on August 22, 2016 at 5:00am
Regression is the first technique you’ll learn in most analytics books. It is a very useful and simple form of supervised learning used to predict a quantitative response.
Originally published on Ideatory…
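To make the idea concrete, here is a minimal sketch of simple linear regression (a single predictor) fit by ordinary least squares in plain Python. The function names are hypothetical and the toy data are made up for illustration. The closed-form estimates are slope = Sxy / Sxx and intercept = ȳ − slope · x̄, where Sxy and Sxx are the sums of cross-products and squares of deviations from the means.

```python
def fit_simple_linear_regression(x, y):
    """Fit y ≈ intercept + slope * x by ordinary least squares (closed form)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: co-variation of x and y; Sxx: variation of x around its mean
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the quantitative response for a new predictor value."""
    return intercept + slope * x_new


if __name__ == "__main__":
    # Made-up toy data lying exactly on y = 1 + 2x
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.0, 5.0, 7.0, 9.0]
    intercept, slope = fit_simple_linear_regression(x, y)
    print(f"intercept = {intercept}, slope = {slope}")  # intercept = 1.0, slope = 2.0
    print("prediction at x = 5:", predict(intercept, slope, 5.0))  # 11.0
```

In Python 3.10+, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept; writing the formulas out simply shows where they come from.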
Added by Sudhanshu Ahuja on March 28, 2016 at 8:00pm
"Look, they are one people, and they have all one language; and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them. Come, let us go down, and confuse their language there, so that they will not understand one another's speech." (Genesis 11:6–7) On the distance between expression, meaning, and action resulting from the growth of populations and…
Added by Don Philip Faithful on October 31, 2015 at 6:20am
Added by Athanassios Hatzis on March 21, 2015 at 5:30am
This article was first posted in 2014, but the message bears repeating. There is a lot being written about tools simple enough for the citizen data scientist to operate. The unstated constraint is that if you don't have significant experience in data science, these will always be "good enough" models. The problem is that "good enough" models underachieve on both revenue and profit. Very small increases in model fitness can translate into much larger increases in campaign ROI. Business…
I was often the lone wolf among my peers in university because I supported a prominent place in society for corporations and an important social role for capital. I questioned whether the directors and executives of companies really entered boardrooms intending to “oppress” people such as minorities and people with disabilities. Did they deliberately make bathrooms inaccessible to people in wheelchairs, perhaps to advance their preconceptions of who gets to go to the bathroom, I pondered…
Added by Don Philip Faithful on May 10, 2014 at 9:44am