Summary: The largest companies utilizing the most data science resources are moving rapidly toward more integrated advanced analytic platforms. The features they are demanding are evolving to promote speed, simplicity, quality, and manageability. This has some interesting implications for open source R and Python, which are widely taught in schools but significantly less necessary with these more sophisticated platforms.
This post covers the following tasks using R programming:
This is the 2-part blog version of a talk I've given at DOAG Conference this week. I've also uploaded the slides (no ppt; just pretty R presentation ;-) ) to the articles section, but if you'd like a little text I'm encouraging you to read on. That is, if you're in the target group for this…
Added by Sigrid Keydana on November 17, 2016 at 11:30pm — No Comments
Python, R and SAS are the three most popular languages in data science. If you are new to the world of data science and aren’t experienced in any of these languages, it makes sense to be unsure of whether to learn R, SAS or Python.
Don’t fret: by the time you’re done reading this article, you will know without a doubt which language is the right one for you.
Whether you are a veteran programmer with experience dating back to Fortran, or a new college grad with all the latest technologies, if you use R eventually you will have to worry about scoping!
Sure, we all start out ignoring scoping when we first begin using a new language. So what if all your variables and functions are global - you are the only one using them, right?!?! Unless you give up on R, you will eventually grow beyond your own system - either having to share your code with…
Added by Connie Brett, Ph.D. on September 8, 2016 at 12:30pm — No Comments
[Introduction of Association Rules]
Sometimes an anecdote helps you understand a new concept, and this one is a real story. About 15 years ago, at Walmart, a salesman was trying to boost sales in his store. His idea was simple: he bundled products together and applied a discount to the bundle. (This has since become common practice in marketing.) For example, he bundled bread with jam, so that customers could easily find them together. Moreover,…
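The bread-and-jam bundle can be made concrete with the standard association-rule measures of support, confidence and lift. What follows is only a minimal Python sketch: the baskets are hypothetical toy data, the function names are chosen here for illustration, and none of it is taken from the original post.

```python
# Hypothetical shopping baskets, made up for illustration only.
baskets = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how often the consequent appears overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


rule_support = support({"bread", "jam"}, baskets)
rule_confidence = confidence({"bread"}, {"jam"}, baskets)
rule_lift = lift({"bread"}, {"jam"}, baskets)

print(f"support(bread, jam)   = {rule_support:.2f}")
print(f"confidence(bread->jam) = {rule_confidence:.2f}")
print(f"lift(bread->jam)       = {rule_lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently, which is the kind of signal that would justify a bundle.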
Original post is published at DataScience+
Recently, I became interested in grabbing data from web pages, such as Wikipedia, and visualizing it with R. As I did in my previous post, I use the rvest package to get the data from the web page and…
Added by Klodian on August 5, 2016 at 10:30pm — No Comments
Visual Analytics and Data Discovery allow analysis of big data sets to find insights and valuable information. This is much more than just classical Business Intelligence (BI). See this article for more details and motivation: "Using Visual Analytics to Make Better Decisions: the Death Pill Example". Let's take a look at important characteristics to choose the right tool for…
Added by Kai Waehner on July 27, 2016 at 10:00pm — No Comments
Apache Spark, a data analytics favorite, is progressing as a reference standard for Big Data and as a “fast and general engine for large-scale data processing”. In our previous post, we detailed how to expand ML tools using a PySpark kernel and leverage the …
Added by Marc Borowczak on June 9, 2016 at 10:30am — No Comments
Summary: Picking an analytic platform when first starting out in data science almost always means working with what we’re most comfortable with. But as organizations grow larger, there is a need for standardization and for selecting one or a few analytic tools.
The City and County of San Francisco launched an official open data portal called SF OpenData in 2009 as a product of its official open data program, DataSF. The portal contains hundreds of city datasets for use by developers, analysts, residents and more. Under the category of Public Safety, the portal contains the list of SFPD Incidents since Jan 1, 2003.
In this post I have done an exploratory time-series analysis on the crime incidents dataset to see…
Added by Vimal Natarajan on May 30, 2016 at 7:42am — No Comments
Single regression on Exxon's stock
[Introduction of Multi-regression]
Let's recall our last exercise. We ran a single regression of Exxon Mobil's stock on the WTI crude oil spot price. The result was fantastic: the model accounts for 25% of the variation in the stock's movement. Put another way, that figure is the R-squared. The problem is "are you happy with the…
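For readers who want to see what "accounts for X% of the variation" means in practice, here is a minimal Python sketch of a single regression and its R-squared. The numbers are hypothetical toy values, not real WTI or Exxon Mobil data, and the function names are chosen here for illustration; the original post works in R and its code is not shown in this excerpt.

```python
# Hypothetical toy data, made up for illustration only.
# These are NOT real WTI crude oil or Exxon Mobil figures.
oil_changes = [0.5, -1.2, 0.8, 1.5, -0.7, 0.3, -0.4, 1.1]
stock_changes = [0.9, -0.2, -0.5, 1.0, 0.4, -0.8, 0.1, 0.6]


def simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


intercept, slope = simple_regression(oil_changes, stock_changes)
r2 = r_squared(oil_changes, stock_changes, intercept, slope)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R-squared = {r2:.3f}")
```

An R-squared of 0.25 would mean the fitted line explains a quarter of the variation in the response and leaves the rest unexplained, which is the usual motivation for adding more predictors.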
As part of the Data Science tutorial series, my previous post covered basic data types in R. I have kept the tutorial very simple so that beginners in R programming can take off immediately.
Please find the online R editor at the end of the post so that you can execute the code on the page itself.
In this section we learn about control structures loops used…
Added by dataperspective on May 18, 2016 at 8:30pm — No Comments
Contributed by the n…
Added by NYC Data Science Academy on April 12, 2016 at 3:00pm — No Comments
Contributed by Bin Lin. He took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Jan 11th to Apr 1st, 2016. The post was based on his…
Added by NYC Data Science Academy on April 12, 2016 at 1:30pm — No Comments
Machine Learning? Data Mining?
Well, there is said to be a small difference between machine learning and data mining, although I personally don't see any difference between them.
See the Stack Exchange debate on the difference between machine learning and data mining.
In the end, it is about training the machine to…
Added by Gregory Choi on April 7, 2016 at 4:30pm — No Comments
[The goal of this page]
Every R introduction I have read was filled with nothing but instructions. The goal of R is to solve our real-life problems, which is why I want to keep this page minimal. In reality, though, we need to understand some key concepts that will help you tackle real-life problems. Here are the basic data structures and data manipulation methods.
Still, I believe the best way to learn the R programming language is to tackle the real life…
Have you ever wondered how to segment your customers? Customer segmentation is a really useful technique to group similar customers together and understand what works for each group. You can then tailor your offering and marketing messages to the specific segments. If you do it right, you should be able to see a healthy increase in sales. After all, companies like Amazon target their customers on an individual level so you should at least be targeting them on a segment level.…
Regression is the first technique you’ll learn in most analytics books. It is a very useful and simple form of supervised learning used to predict a quantitative response.
Originally published on Ideatory…
Added by Sudhanshu Ahuja on March 28, 2016 at 8:00pm — No Comments
Recently, I came across an interesting book on statistics that retells the Ugly Duckling story and relates it to today's data, or rather big data analytics, world. The story is originally by the famous storyteller Hans Christian Andersen.
The story goes like this...
The duckling was a big ugly grey bird, so ugly that even a dog would not bite him. The poor duckling…
Added by Manish Bhoge on January 31, 2016 at 12:00pm — No Comments