If you don't like using black-box R functions, or you don't have access to these functions, here are simple options to simulate deviates from various distributions. They can even be implemented in…Continue
We all know that exponential functions grow faster than polynomials. Let us consider the following function: f(n) = n^a ⋅ (log n)^b ⋅ (log log n)^c ⋅ (log log log n)^d⋯ where the leading coefficient…Continue
Data science pioneer, founder, author, CEO, investor, with broad spectrum of domain expertise, technical knowledge, and proven success in bringing measurable added value to companies ranging from startups to fortune 100, across multiple industries (finance, Internet, media, IT, security) and domains (data science, operations research, machine learning, computer science, business intelligence, statistics, applied mathematics, growth hacking, IoT).
Vincent developed and deployed new techniques such as hidden decision trees (for scoring and fraud detection), automated tagging, indexing and clustering of large document repositories, black-box, scalable, simple, noise-resistant regression known as the Jackknife Regression (fit for black-box, real-time or automated data processing), model-free confidence intervals, bucketisation, combinatorial feature selection algorithms, detecting causation not correlations, and generally speaking, the invention of a set of consistent robust statistical / machine learning techniques that can be understood, implemented, interpreted, leveraged and fine-tuned by the non-expert. Vincent also invented many synthetic metrics (for instance, predictive power and L1 goodness-of-fit) that work better than old-fashioned stats, especially on badly-behaved sparse big data. Some of these techniques have been implemented in a Map-Reduce Hadoop-like environment. Some are concerned with identifying true signal in an ocean of noisy data.
Vincent is a former post-doctorate of Cambridge University and the National Institute of Statistical Sciences. He was among the finalists at the Wharton School Business Plan Competition and at the Belgian Mathematical Olympiads. Vincent has published 40 papers in statistical journals and is an invited speaker at international conferences. Vincent also created the first IoT platform to automate growth and content generation for digital publishers, using a system of API's for machine-to-machine communications, involving Hootsuite, Twitter, and Google Analytics.
Vincent's profile is accessible at http://bit.ly/1jWEfMP and includes top publications, presentations, and work experience with Visa, Microsoft, eBay, NBC, Wells Fargo, and other organisations.
Originally posted by Michael Grogan.
One of the big issues when it comes to working with data in any context is the issue of data cleaning and merging of datasets, since it is often the case that you will find yourself having to collate data across multiple files, and will need to rely on R to carry out functions that you would normally carry out using commands like VLOOKUP in Excel.
Originally posted by Michael Grogan.
The below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm.
The purpose of k-means clustering is to be able to partition observations in a dataset into a specific number of clusters in order to aid in analysis of the data. From this perspective, it has particular value from a data visualisation perspective.
This post explains how…Continue
Visualization has become a key application of data science in the telecommunications industry.
Specifically, telecommunication analysis is highly dependent on the use of geospatial data. This is because telecommunication networks in themselves are geographically dispersed, and analysis of such dispersions can yield valuable insights regarding network…Continue