Featured Blog Posts (5,007)

How to Solve Complex Computations in the Report

Reporting tools are good at chart and form design, layout styling, query interfaces, data entry and reporting, and export and print. They are among the most widely used tools. However, reports quite often involve complex computations, which place high demands on the technical skills of report designers and are one of the biggest barriers in report design. This article introduces ways to solve complex computations in reports.…


Added by Daisy Ding on July 5, 2012 at 5:23pm — 2 Comments

Interactive Analysis and Related Tools


Interactive analysis is a cyclic procedure of assumption, validation, and adjustment, used to reach a fuzzy computation goal.

Interactive analysis is true online analysis for solving complex real-world computation problems, and it is a key part of business computation.

Example of Case

Let us explain the interactive analysis with a common example in the…


Added by Daisy Ding on July 5, 2012 at 1:06am — 2 Comments

86 Helpful Tools for the Data Professional PLUS 45 Bonus Tools | Syracuse University

Source: Joshua Kitlas.

I have been working on this (mostly) annotated collection of tools and articles that I believe would be of help to both the data dabbler and professional. If you are a data scientist, data analyst or data dummy, chances are there is something in here for you. Included is a list of tools, such as programming languages and web-based utilities,…


Added by Vincent Granville on December 2, 2011 at 8:26am — No Comments

How do you quantify data as large, big, or huge?

How big your data is depends on the quantity of information that it contains (measured using entropy metrics), rather than the number of terabytes. Huge data that is sparse or shallow is indeed not huge - and can be compressed very efficiently. What do you think?
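As a rough illustration of the entropy argument, the sketch below compares a sparse and a dense byte string of the same size. It is a minimal sketch, not the author's method: the function names `shannon_entropy` and `compression_ratio`, the byte-level (order-0) entropy estimate, and the synthetic data are all assumptions made for this example.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy in bits per byte (0 = no information, 8 = maximum)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib with default settings."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical data sets of identical size: one sparse (all zeros), one dense (pseudo-random).
size = 100_000
sparse = bytes(size)
rng = random.Random(0)
dense = bytes(rng.getrandbits(8) for _ in range(size))

for name, data in (("sparse", sparse), ("dense", dense)):
    print(f"{name}: {shannon_entropy(data):.3f} bits/byte, "
          f"compression ratio {compression_ratio(data):.3f}")
```

Both inputs occupy the same number of bytes, yet the sparse one carries almost no information and shrinks to a tiny fraction of its size, while the dense one barely compresses at all.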

Here's Stefan Carandy's viewpoint (founder of Bayesia Networks):

If I may cross-post the following from our blog at …


Added by Vincent Granville on December 2, 2011 at 8:28am — No Comments

What is data science?

Below (in italics) is Revolution Analytics' point of view. Mine is that it is not just science, but a mix of craftsmanship, intuition, art, business acumen, magic and science. I would rather call it data witchcraft, and I call myself a data wizard.

Ever since the term "Data Scientist" was coined by DJ Patil and Jeff Hammerbacher in 2009,…


Added by Vincent Granville on December 2, 2011 at 8:30am — No Comments

Ventana Research Unveils Largest Research Ever Conducted on Business Analytics

PLEASANTON, Calif.- February 25th, 2011 - The largest benchmark research ever done on business analytics establishes for the first time that analytics has become the new engine for business competitiveness and profitability. This landmark research from Ventana Research, which involved input from more than 2,850 organizations, makes clear for the first time the little-understood role of business analytics in the success and failure of businesses.

The research found that more than…


Added by Vincent Granville on December 2, 2011 at 8:30am — No Comments

IBM's vice president of Big Data Products explains the role of data scientists | ComputerWorld

What is a data scientist? A data scientist is someone who analyzes an organization's big data to discover actionable trends that lead to business results. Data scientists look at what questions business people need to ask to remain competitive. They work directly with C-level executives, advising them on how to drive maximum value from big data and integrate new information. In many ways, a …


Added by Vincent Granville on December 2, 2011 at 8:33am — No Comments

Lifetime value of an e-mail blast: much longer than you think

See below an example of an AnalyticBridge email campaign that was monitored over a period of about 600 days. It clearly shows that 20% of all clicks originate after day #5. Yet most advertisers and publishers ignore clicks occurring after day #3. Not only did 20% of all clicks occur after day #3, but the best clicks (in terms of conversions) occurred several weeks after the email blast. Also, note an organic spike occurring on day #23 in the chart below -…


Added by Vincent Granville on December 2, 2011 at 8:35am — No Comments

Classification accuracy using different bags of words

In this post I show how the accuracy of the classifier is influenced by the bag of words. 

The test has been done on a naive classifier but it returns good information about the data set.…
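To make the idea concrete, here is a minimal sketch of how the choice of vocabulary can change a classifier's accuracy. Everything in it is an assumption for illustration: the post does not say which classifier or data it uses, so this example swaps in a small multinomial naive Bayes with add-one smoothing, and the training texts, test texts, and the two vocabularies are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for this example.
TRAIN = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for the team", "ham"),
    ("the team lunch is now", "ham"),
]
TEST = [
    ("cheap watches", "spam"),
    ("team meeting now", "ham"),
    ("buy pills", "spam"),
    ("agenda for lunch", "ham"),
]


def train(docs, vocab):
    """Count documents per class and in-vocabulary word occurrences per class."""
    class_docs = Counter()
    word_counts = {}
    for text, label in docs:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        counts.update(w for w in text.split() if w in vocab)
    return class_docs, word_counts


def predict(text, model, vocab):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts = model
    n_docs = sum(class_docs.values())
    tokens = [w for w in text.split() if w in vocab]

    def score(label):
        counts = word_counts[label]
        total = sum(counts.values())
        s = math.log(class_docs[label] / n_docs)
        for w in tokens:
            s += math.log((counts[w] + 1) / (total + len(vocab)))
        return s

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(class_docs), key=score)


def accuracy(vocab):
    """Train on TRAIN and report the share of TEST documents classified correctly."""
    model = train(TRAIN, vocab)
    hits = sum(predict(text, model, vocab) == label for text, label in TEST)
    return hits / len(TEST)


content_words = {"cheap", "buy", "pills", "watches", "meeting", "agenda", "team", "lunch"}
function_words = {"the", "for", "is", "now"}

print("content-word vocabulary:", accuracy(content_words))
print("function-word vocabulary:", accuracy(function_words))
```

The classifier and the data stay the same between the two runs; only the bag of words changes, which isolates its effect on accuracy.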


Added by Vincent Granville on December 2, 2011 at 8:36am — No Comments

Statisticians Have Large Role to Play in Web Analytics | American Statistical Association

Read my full interview for AMSTAT. You will also find my list of recommended books. Here is a copy of the interview, in case the original article (posted on AMSTAT News) disappears.

(Dr. Granville's Interview for AMSTAT)

Vincent Granville is chief scientist at a publicly traded company and the founder of AnalyticBridge. He has consulted on…


Added by Vincent Granville on December 2, 2011 at 8:37am — No Comments

How to detect a pattern? Problem and solution.

Check the three charts below: only one shows no pattern and is truly random. Which one?

Chart #1




Added by Vincent Granville on December 2, 2011 at 8:38am — No Comments

Connecting with the Social Analytics Experts

Social Media Tips for Analytics Professionals 

From Text and Data Mining to Market Research and Social Media Consulting, few are more influential than today’s guests. In advance of the West Coast Text Analytics Summit (Nov. 10-11, San Jose), Text Analytics News caught up with four analytics leaders who are helping…


Added by Vincent Granville on December 2, 2011 at 8:39am — No Comments

Why and how you should build a data dictionary for big data sets

One of the most valuable tools that I've used, when performing exploratory analysis, is building a data dictionary. It offers the following advantages:

  • Identify areas of sparsity and areas of concentration in high-dimensional data sets
  • Identify outliers and data glitches
  • Get a good sense of what the data contains, and where to spend time (or not) in further data mining
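The advantages listed above can be illustrated with a small sketch. It assumes, for illustration only, that a data dictionary can be approximated by a frequency table of (field, value) pairs; the records and the helper name `build_data_dictionary` are hypothetical and are not taken from the post.

```python
from collections import Counter

# Hypothetical records; None marks a missing value and 999 a likely data glitch.
records = [
    {"country": "US", "device": "mobile", "age": 34},
    {"country": "US", "device": "desktop", "age": 34},
    {"country": "FR", "device": "mobile", "age": None},
    {"country": "US", "device": "mobile", "age": 999},
]


def build_data_dictionary(rows):
    """Count how often each (field, value) pair occurs across all rows."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            counts[(field, value)] += 1
    return counts


data_dictionary = build_data_dictionary(records)

# Frequent pairs show areas of concentration; rare ones hint at sparsity or glitches.
for (field, value), count in data_dictionary.most_common():
    print(f"{field:<8} {str(value):<8} {count}")
```

Sorting by count puts concentrated values at the top, while one-off entries such as a missing or implausible age fall to the bottom where they are easy to inspect.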

What is a data dictionary

A data dictionary is a table…


Added by Vincent Granville on December 2, 2011 at 8:41am — No Comments

Online advertising: a solution to optimize ad relevancy

When you see Google ads on Google search result pages or elsewhere, the ads that are displayed in front of your eyes (should) have been highly selected in order to maximize the chance that you convert and generate ad revenue for Google. The same applies on Facebook, Yahoo, Bing, LinkedIn and all ad networks.

If you think that you see irrelevant ads, either they are priced very cheaply, or Google's ad relevancy algorithm is not working well.

Ad scoring algorithms used to be very simple,…


Added by Vincent Granville on December 3, 2011 at 8:43am — No Comments

Interview with Kaggle CEO Anthony Goldbloom

For those who haven't heard of Kaggle before, Kaggle is a team of people who provide the functionality and support to host data mining contests. Here is how it works: suppose that you are working for a telco and wish to implement a new churn prediction model. Rather than running this project in-house, you submit your data to Kaggle. What happens next is that -hopefully- many…


Added by Vincent Granville on December 5, 2011 at 10:41am — No Comments

Big Data....Small Wars

Of all the ills that impede development around the world, persistent conflict may be the most pernicious and the most widespread. As the World Bank noted in its April 2011 report, insecurity “has become a primary development challenge of our time. One-and-a-half billion people live in areas affected by fragility, conflict, or large-scale, organized criminal violence, and no low-income fragile or conflict-affected country has yet achieved a single United Nations Millennium Development…


Added by Patricia Tenanty on December 5, 2011 at 11:00am — No Comments

Hadoop and Productivity Boosting

Zettaset today announced the release of Version 4 of its big data management solution, which offers several new service management features, including the industry's first NameNode Failover, as well as JobTracker Failover, Oozie Failover and a unique visual user interface (UI). Built on Hadoop and other high-volume, open-source technologies, Version 4 offers greater stability within Hadoop while providing a solution to manage big data that is more accessible to IT pros, yielding…


Added by Stan Mason on December 6, 2011 at 10:41am — No Comments

Cloud Integration Issues

What's your cloud integration strategy? If you're like most IBM i shops, much of your data interchange is handled via good old EDI or flat file transfers. But the rapid spread of cloud services is hastening the move to more sophisticated forms of data and application integration and interchange. According to EXTOL, which develops integration broker software for IBM i and other platforms, the day is fast approaching when companies will need new techniques for integrating cloud…


Added by Pearse William on December 7, 2011 at 11:01am — No Comments

Healthcare fraud detection still uses cave-man data mining techniques

The Washington Education Association (WEA, in Washington State) is partnering with Aon Hewitt (Illinois), a verification company, to eliminate a specific type of health insurance fraud: teachers reporting non-qualifying people as dependents, such as an unemployed friend with no health insurance. The fraud is used by "nice" people (teachers) to provide health insurance to people who would otherwise have none, by reporting them as a spouse or children.

Interestingly, I saw the letter sent to…


Added by Vincent Granville on December 7, 2011 at 5:16pm — No Comments

Big Data Analytics - Visualization Tools

Jaspersoft announced a second-generation native connector to MongoDB, an open source database. 10gen, the company behind MongoDB, and Jaspersoft have teamed up to deliver an enhanced tool that gives companies easier reporting, analytics, and visualization of Big Data. Jaspersoft is a sponsor of the upcoming MongoSV, to be held in Santa Clara, CA today, December 9, 2011, and will be showcasing the combined solution there.

Building on the popularity of the first generation…


Added by Stan Mason on December 9, 2011 at 7:00am — No Comments
