All Blog Posts (7,233)

Two big datasets to challenge your data science expertise

Stack Exchange data dump

This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression. Each site archive includes Posts, Users, Votes, Comments, PostHistory and PostLinks. For complete schema information, see the included readme.txt.…
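
As a rough illustration of working with one of these files, here is a minimal Python sketch that streams an already-extracted Posts.xml and tallies posts by type. It assumes each record is a `row` element carrying a `PostTypeId` attribute; check that against the included readme.txt before relying on it. The function name and the inline sample are made up for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_post_types(source):
    """Stream <row> elements from a file path or file object and tally PostTypeId.

    Assumption: each record is a <row .../> element with a PostTypeId attribute.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get("PostTypeId", "unknown")] += 1
            elem.clear()  # drop the row's data so large files stay cheap to scan
    return counts


# Tiny hypothetical sample standing in for a real Posts.xml file.
SAMPLE = b'<posts><row Id="1" PostTypeId="1" /><row Id="2" PostTypeId="2" /></posts>'

if __name__ == "__main__":
    print(count_post_types(io.BytesIO(SAMPLE)))
```

Streaming with `iterparse` rather than loading the whole tree matters here because the larger site archives expand to files far bigger than typical memory.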

Added by Vincent Granville on March 12, 2014 at 2:30pm — No Comments

17 areas to benefit from big data analytics in the next 10 years

Here's my list:

  1. Automated piloting (cars talking to cars) will reduce accidents and optimize your commute. 
  2. Better fraud detection will catch IRS fraudsters and terrorists before they strike. 
  3. Better encryption and monitoring systems will allow the creation of new, private currencies, avoiding the speculation that surrounds Bitcoin. 
  4. Early detection of epidemics thanks to crowdsourcing.
  5. Detection of earthquakes, solar flares - including…

Added by Vincent Granville on March 12, 2014 at 9:00am — No Comments

With the highest smartphone penetration in the world, the UAE's big data market is set to grow

According to a report from the Ministry of Higher Education and Scientific Research of the UAE, polls published for 2013 by the American website “Mashable” found that three out of four people in the United Arab Emirates own a smartphone, making the country first in the world in smartphone use. Saudi Arabia ranked third, while Britain ranked ninth. Surprisingly, with only 56.4% of penetration of the smart…

Added by IPSITA on March 12, 2014 at 4:21am — No Comments

Oil and Gas Solution built on Data-Tactics’ Big Data Engine

The Big Data Yawn

Over the past couple of months we have met with a number of oil and gas executives to demonstrate our Oil and Gas Solution built on Data-Tactics’ Big Data Engine (BDE). During these conversations it has become obvious that the very mention of "Big Data" produces an involuntary physiological response among business leaders - eye rolls and yawns. It appears that big data has reached the Gartner "trough of disillusionment". These executives have heard from a bewildering…

Added by Sullexis LLC on March 10, 2014 at 6:00am — No Comments

Three Myths About Today’s In-Memory Databases

In-memory database technology has become fashionable in recent years as the price of RAM has dropped substantially and gigabyte chips have become affordable. By taking advantage of the cost-performance value of RAM, leading-edge database developers are boosting the performance of next-generation databases with in-memory technology. However, many developers who intend to adopt in-memory technology think of speed only in terms of RAM, and do not exploit the true power of in-memory technology.

The…

Added by Yuanjen Chen on March 9, 2014 at 10:00pm — 3 Comments

Great example of root cause analysis

This is an area of data science that the public is less familiar with. This example involves small data, simulations, and 18-year-old crowdsourcing.

It's an attempt to explain the cause of the explosion of TWA Flight 800 near New York on July 17, 1996. I raised the possibility that one potential cause for the disappearance of the Malaysia Airlines flight that went missing last week was being hit by a missile (accidental or not). Likewise, many people still believe that TWA 800 was destroyed by a…

Added by Vincent Granville on March 9, 2014 at 10:30am — No Comments

Data Interaction in Organizational Systems

I have always had a great interest in how businesses organize in order to get things done. Here I raise some discussion points intended to stimulate debate.

Principle of Systemic Domains

Not that long ago, I was completing a graduate degree in “critical” disability studies. The critical part deserves to be in quotation marks since it is probably subject to interpretation and all sorts of misinterpretation. I am going to suggest that in critical…

Added by Don Philip Faithful on March 8, 2014 at 8:15am — No Comments

Big Data Logistics: data transfer using Apache Sqoop from RDBMS

Apache Sqoop is a connectivity tool for transferring data between Hadoop and traditional relational databases (RDBMS) that contain structured data. Using Sqoop, one can import data into the Hadoop Distributed File System from an RDBMS like Oracle, Teradata, MySQL, etc., and also export data from Hadoop to any RDBMS, either as a CSV file or by direct export to the database.
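
To make the import direction concrete, here is a small Python sketch that assembles a `sqoop import` command line. It only builds and prints the command; it does not run Sqoop. The host, database, table, user, and target directory are hypothetical placeholders, and the helper name is invented for this example. The flags shown (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import options, but check them against the documentation for the Sqoop version in use.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, num_mappers=1):
    """Return the argument list for a basic `sqoop import` of one table.

    Illustrative helper only: every value passed in is a placeholder.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC URL of the source RDBMS
        "--username", username,
        "-P",                         # prompt for the password at run time
        "--table", table,             # source table to import
        "--target-dir", target_dir,   # destination directory in HDFS
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical connection details for illustration.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/user/hadoop/orders",
    username="etl_user",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` on a machine where Sqoop is installed, without any shell quoting issues.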

It is possible to write MapReduce programs that use JDBC connectors…

Added by Pavan Kumar N on March 8, 2014 at 1:12am — No Comments

The multiple facets of data science

Operations research (including Monte Carlo…

Added by Mirko Krivanek on March 7, 2014 at 10:00am — No Comments

Sometimes outliers are real data

How do you know whether an outlier is the result of a data glitch or a real data point (and indeed maybe not an outlier at all)? It is a difficult question to answer, but the chart from the source below shows that in some cases, the outlier is not an error.

Source: http://www.businessinsider.com/life-expectancy-vs-healthcare-spending-2014-3

In this example, you could argue that we are not…

Added by Mirko Krivanek on March 6, 2014 at 6:00pm — 18 Comments

The Present and the Future of Big Data and Mobile Technology

As we move to a more technologically advanced workflow, the need for bigger data storage…

Added by Kyle Albert on March 5, 2014 at 10:06pm — 1 Comment

Weekly Digest - March 10

Featured Articles

Added by Vincent Granville on March 5, 2014 at 3:30pm — No Comments

Is data science dead?

I just came across this blog and thought it made an interesting point. I disagree, but it's worth a discussion :)

http://beta.slashdot.org/story/199001

cheers

Added by ed pok on March 5, 2014 at 8:30am — 3 Comments

How to compete against data scientists charging $30/hour

While companies complain about a lack of analytic talent, professionals complain about a lack of jobs. Everyone wants to work for Facebook, LinkedIn, Google, Intel, Apple, Twitter or some hot start-up. This creates fierce competition just to get a job interview, let alone a job. But companies that do not belong to this circle see very few candidates applying for their open data scientist positions; in addition, they are only hiring what I call technical developers (defined by a narrow set of technical…

Added by Vincent Granville on March 3, 2014 at 1:00pm — 15 Comments

Introduction to my data science book

Here's the introduction. Click here to view more details about the book.

Introduction

This book is a type of “handbook” on data science and data scientists, and contains information not found in traditional…

Added by Vincent Granville on March 1, 2014 at 10:00am — 12 Comments

Data Scientist, please meet the Data Artist

Jim Sterne | Anametrix Blog

I am delighted to bring you this guest post from Jim Sterne, an international consultant who focuses on measuring the value of the Web as a medium for creating and strengthening customer relationships. He has written eight books on using the Internet for marketing, is the founding president and current chairman of the Digital Analytics Association, produces the eMetrics Summit and sits on Anametrix’s Board of Advisors.…

Added by Ryan Montano on February 28, 2014 at 8:30am — 4 Comments

The reason Facebook paid US$19 bn for WhatsApp…decoded

On 19th…

Added by Aatash Shah on February 27, 2014 at 3:30am — No Comments

Is Your Customer Data For Your Customers' Benefit?

We are all increasingly active in the digital space. More than 70% of people in the EU, and growing, use the Internet*, all contributing towards more data generation. But public misperceptions and perspectives on data and how it’s used for marketing threaten to limit data’s potential, curtailing marketers’ abilities to provide personalised services. Careful data usage greatly enhances our lives. Unfortunately, fear or irresponsible use (a thankfully rare occurrence), along with some sensationalist…

Added by Jed Mole on February 27, 2014 at 3:30am — No Comments

Read this tutorial before you use Proc Corr

All of us at some point in the process of examining…

Added by Aatash Shah on February 27, 2014 at 3:23am — No Comments

The Data Science Toolkit - The Future Web Toolkit

There's a lot of confusing jargon and buzzwords in this new field. It helps to know who some of the major players are and what services they offer. This list is a mild introduction and far from exhaustive.



Amazon Web Services: Infrastructure as a service (IaaS). EC2 virtual servers, S3 storage, Mechanical Turk, analytics, and more.

Yandex: Russian competitor to Google. Recently launched Cocaine server based on Docker.

Salesforce: Customer Relationship Management…

Added by Peter Higdon on February 25, 2014 at 7:51am — 1 Comment
