Has anyone performed tests to compare computation times for different data science algorithms on different platforms? Or for sorting, merging, joins, hash table management, and other database operations? Or for I/O operations such as file parsing?
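As a starting point for such comparisons, here is a minimal Python timing sketch for two of the operations mentioned, sorting and hash-table construction. The workload sizes are illustrative assumptions, not a rigorous benchmark; a fair cross-platform comparison would also need to control for warm-up, memory pressure, and data layout:

```python
import timeit
import random

def bench(label, fn, repeat=3, number=1):
    # Report the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(fn, repeat=repeat, number=number))
    print(f"{label}: {best:.4f} s")
    return best

n = 200_000
data = [random.random() for _ in range(n)]
keys = [random.randrange(n) for _ in range(n)]

t_sort = bench("sort", lambda: sorted(data))
# Hash-table build, analogous to the build side of a hash join.
t_hash = bench("hash build", lambda: {k: i for i, k in enumerate(keys)})
```

The same two operations could be timed in R (`sort()`, environments or `data.table` keys) to get a rough language-to-language comparison on identical data.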
For very large data sets, or during the initial training stage of a machine learning system, most of the time is probably spent on data transfers rather than in-memory processing, so it may not matter much whether one uses R or Python. Sometimes generating or summarizing the entire data set with Python or Perl and pre-loading it into an R table (as when generating video frames) speeds things up considerably: it is much faster than generating one video frame at a time, on the fly, in R. So optimizing for speed is clearly not just about using a faster procedure or a faster language; it is also about breaking tasks down in a way that optimizes in-memory usage.
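To illustrate why batching the transfer helps, here is a small Python sketch. An in-memory `StringIO` buffer stands in for the boundary to the consuming system (e.g. R); that substitution, and the data sizes, are assumptions for illustration. Handing over one value at a time pays per-call overhead on every item, while materializing everything first allows a single bulk transfer:

```python
import io
import time

n = 100_000
values = [str(i) for i in range(n)]

# Per-item transfer: one write call per value (like generating
# frames one at a time, on the fly).
buf1 = io.StringIO()
start = time.perf_counter()
for v in values:
    buf1.write(v)
    buf1.write("\n")
t_per_item = time.perf_counter() - start

# Bulk transfer: materialize the whole payload first, then hand it
# over in a single call (like pre-loading a complete table).
buf2 = io.StringIO()
start = time.perf_counter()
buf2.write("\n".join(values) + "\n")
t_bulk = time.perf_counter() - start

# Both strategies produce identical output; only the cost differs.
assert buf1.getvalue() == buf2.getvalue()
print(f"per-item: {t_per_item:.4f}s, bulk: {t_bulk:.4f}s")
```

The same effect shows up at the R boundary: one call that loads a pre-built table is far cheaper than many small calls that each cross the language barrier.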
Any thoughts on this?
Rather than focusing on which procedure, language, tool, or technique best suits the problem, it is more important to break the big problem down into smaller, manageable pieces so that in-memory processing can be optimized effectively.
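That decomposition idea can be sketched in Python: processing a data set chunk by chunk keeps only one chunk resident in memory at a time, regardless of the total size. The chunk size and the mean statistic here are illustrative choices, not part of the original discussion:

```python
from itertools import islice

def iter_chunks(iterable, size):
    # Yield successive fixed-size lists from any iterable.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def streaming_mean(numbers, chunk_size=1000):
    # Aggregate per chunk; peak memory is bounded by chunk_size,
    # not by the total number of values.
    total, count = 0.0, 0
    for chunk in iter_chunks(numbers, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")

# A generator stands in for a data set too large to materialize at once.
result = streaming_mean(i / 2 for i in range(1_000_000))
```

The same pattern applies to file parsing and database-style operations: read, transform, and aggregate in manageable pieces rather than pulling everything into memory first.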