Subscribe to DSC Newsletter

How to Analyze Data Without SQL Experience

“What is the sales value in this month? What is the increased percentage compared with the previous month? Which sales persons are among the top 10?” – The similar data analysis is mostly required in the sectors like commerce, finance, government, and scientific research. SQL usually is the only choice for such kind of analysis.

As a query language initially developed in early 70s, SQL has a relatively higher technical requirement on analysts. Lacking of advanced IT professionals is the root cause that hinders the data analysis from going any further.

Here is a data analysis script of the new generation. It not only provides all features of SQL, but also requires lower technical requirements. Even those developers without SQL expertise can use esProc to analyze the data by themselves.

Case Description:

A certain insurance company requires the following data analysis: How many newly-added polices this year? Of which how many policies are of the ”life insurance” ? How many policies valued above 500 thousand dollars?

The data are mainly stored in the insurance table of database, and the main fields are policyID, amount, time, type, and other fields. Some data are given below:

Of which ”policyID” is the serial number of policy, ”amount” is the policy amount, ”time” is the date of signing the policy, ”type” is the type of policy, and ”L” represents the “life insurance”.

Let’s check the SQL solution below.

SQL Solution

If using SQL to solve this type of problem, three SQL statements are required as given below:

  • SELECT COUNT(*) FROM insurance WHERE to_char (time,’yyyy’) = to_char (sysdate,’yyyy’);
  • SELECT COUNT(*) FROM insurance WHERE to_char (time,’yyyy’) = to_char (sysdate,’yyyy’) and type=’L’;
  • SELECT COUNT(*) FROM insurance WHERE to_char (time,’yyyy’) = to_char (sysdate,’yyyy’) and type=’L’ and amount>500000;

The script Solution

For the script solution, the computation is carried out in the grid. Its user can just put a simple expression in each Grid. Grids can make reference to each other with the Grid name, and the analysis procedure is that the observing along with the thinking. The operation habit is almost the same to that of Excel®.

 

A

B

1

=insurance.select(year(time)==year(now()))

=A1.count()

2

=A1.select(type==”L”)

=A2.count()

3

=A2.select(amount>500000)

=A3.count()

Comparison

This problem is of the progressive style, and the later problem is the further explore on the previous one. The three problems are related mutually. As for esProc, the esProc user only need to process a bit based on the results of previous computation according to the business description.

SQL users are hard to perform the step by step computation and the previous computation cannot be referenced. Therefore, each problem is a new problem to SQL user. With the further explore on the problem, the statement will become more and more complex, and finally the SQL user has to resort to the technical specialist to solve this problem. For example, based on the problem 3, seek the monthly increased percentage of the policies valued high.

SQL style is like this:

select amount, month,
       (case when
              prev_amount !=0 then (amount)/prev_amount
              else 0
        end) compValue
from ( select month, amount,
             lag(amount,1,0) over(order by month) prev_amount
        from (
             select month,sum(amount)amount from(
             SELECT amount,to_char(time,’MM’) month
             FROM insurance
             WHERE to_char (time,'yyyy') = to_char (sysdate,'yyyy') and type=’L’ and amount>500000
             ) group by month order by month
          )
)

Not understand? Never mind. We can use esProc to implement the same computation:

A

B

4

=A3.group(month(time))

5

=A4.(~.sum(amount))

6

=A5.(~/~[-1])

A4: Group the results from the previous step by month.
       A5: Sum the data of every month.
       A6: Seek the increased percentage of this month, compared with the previous month. The “~” represents every month in A5, and “~[-1]” represents the previous month.

It is obvious that the technical requirements of esProc remain the same when exploring the problem. Even the business man or nontechnical person can always grasp the usage of such evident, easy-to-understand, and Excel®-like expression. On the contrary, in this case, the technical requirement of SQL statement will experience the exponential increase and soon reach the bottleneck of technical capability. Therefore, regarding the age-old and complex analysis tool of SQL, the analyst is in a complex that they both love and hate the SQL, and feels hard to grasp it.

Analysts can analyze data without SQL expertise and experience, and easily solve the tough problems that the SQL programmer encountered, thanks to the great features of script solution.

 

Views: 4133

Comment

You need to be a member of Data Science Central to add comments!

Join Data Science Central

Comment by Arindam Samanta on January 20, 2016 at 9:01am

I can use My SQL more effectively , even exel also ... suing it . need a good search engine .. Database of Quora social networking site using my SQL for database...

Comment by Daisy Ding on July 15, 2012 at 10:52pm

Thank you for your comment, Vincent, Yes, esProc still need to access or extract the data
through SQL / txt / excel, etc. But the data retrieving is not part of the analysis, or to say, just the easiest and least important part. What's more difficult is how to finish the analysis work after retrieving the data. Of course C, perl, VB can do it. But these have to be operated by programmers who master a very high technology, while such requirements are too high or inappropriate for data analysis experts. The data analysis experts need an easy-to-use tool as well.

Comment by Vincent Granville on July 13, 2012 at 8:11am

Once you have the data on your machine, any language, e.g. C, Perl, Java, Python maybe even PHP will do a better job than SQL. You still meed SQL to physically access or extract the data if it is stored in a traditional database such as Oracle, Teradata, SQL Server (although I'm not even sure about that).

Videos

  • Add Videos
  • View All

© 2019   Data Science Central ®   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service