As an analytic or data science professional, what are the software bottlenecks / nightmares in your daily job? In my case, my challenges are:
Thanks for your help!
Reply by Vincent Granville on April 24, 2012 at 7:18am: With SQL, merging two data sets that come from two different databases is challenging when the database key (for the join) is encoded in two different, incompatible formats (e.g. German characters are coded one way in one database and a different way in the other, or German words are removed in one of them).
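To make the key-mismatch problem concrete, here is a minimal Python sketch of one possible workaround: normalize both keys to a common form before joining. The transliteration table, the function names (`normalize_key`, `join_on_normalized_key`) and the sample keys are all assumptions made up for illustration; they are not from the databases discussed above, and this does not cover the case where German words were removed entirely.

```python
import unicodedata

# Hypothetical transliteration table for German characters (an assumption,
# not the actual coding used by either database).
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def normalize_key(raw):
    """Map differently coded spellings of a key to one canonical form."""
    # NFC composes "u" + combining diaeresis into the single character "ü".
    text = unicodedata.normalize("NFC", raw)
    # Replace umlauts and sharp s with their ASCII transliterations.
    text = text.translate(GERMAN_MAP)
    # casefold() removes case differences; strip() drops stray whitespace.
    return text.casefold().strip()

def join_on_normalized_key(left, right):
    """Inner join of two dicts whose keys may be coded differently."""
    index = {normalize_key(key): value for key, value in right.items()}
    joined = {}
    for key, value in left.items():
        canonical = normalize_key(key)
        if canonical in index:
            joined[key] = (value, index[canonical])
    return joined

# Illustrative data only: the same names coded two different ways.
left = {"Müller": 1, "Weiß": 2, "Schmidt": 3}
right = {"Mueller": "a", "WEISS": "b"}
result = join_on_normalized_key(left, right)
```

In SQL the same idea would mean joining on a normalized key column computed on both sides, rather than on the raw keys.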
I would not recommend Excel for serious analytics due to mistakes found in many Excel sheets.
I think highly of Hadoop, but you are right: not every algorithm can be implemented efficiently in a distributed architecture. Nevertheless, Hadoop, as a representative of NoSQL technology, is better than any SQL-based tool when analyzing big data. The main problem with SQL on big data is that table joins become very slow. Hadoop, however, doesn't remedy this drawback of SQL; instead, your data need to be almost ready for analysis, so that Hadoop can process them without joins. You can join tables in Hadoop as well, but I don't advise doing so, as Hadoop would instantly lose its advantages (there may be a few exceptions, but they don't change the overall picture).
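As a rough illustration of what "almost ready for analysis" could mean, the sketch below uses plain Python (not the Hadoop API) to mimic a map step and a reduce step over rows that are already denormalized. The `ORDERS` records, the field names and the helper functions are hypothetical, chosen only to show that each row carries everything the aggregation needs, so no join is required.

```python
from collections import defaultdict

# Hypothetical, already denormalized input: each order row carries the
# customer's region, so no lookup against a customer table is needed.
ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 30},
]

def map_order(order):
    """Map step: emit a (key, value) pair from a single self-contained row."""
    return order["region"], order["amount"]

def reduce_totals(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals_by_region = reduce_totals(map_order(order) for order in ORDERS)
```

If the region lived in a separate customer table instead, every row would need a lookup before the map step, which is the kind of join the reply above advises against.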
Reply by Daisy Ding on June 27, 2012 at 4:57pm: I think Excel has its advantages, since analysts without an IT background can operate it easily. But you are right, it has its shortcomings. At present there are many small tools, such as esProc and esCalc, that can help solve such problems conveniently: they handle complicated data processing easily without demanding much IT expertise. So a combination of Excel and some of these tools may be a good choice for some analysts.
Reply by Daisy Ding on June 27, 2012 at 5:00pm: Java and SQL have powerful computational abilities, but they are too complex for general users, who would need a specialized IT background. There are also many other solutions that can help make up for their shortcomings.
© 2013 Data Science Central
