Has anyone performed tests comparing computation times for different data science algorithms across platforms? Or for sorting, merging, joins, hash-table management, and other database operations? Or for I/O operations, such as parsing a file?
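As a starting point for that kind of comparison, here is a minimal timing harness in Python. The operations and data sizes are illustrative assumptions, not an established benchmark suite:

```python
import random
import timeit

def bench(label, fn, repeat=3):
    # Report the best of several runs to reduce timer noise.
    best = min(timeit.repeat(fn, number=1, repeat=repeat))
    print(f"{label}: {best:.3f} s")

n = 1_000_000
data = [random.random() for _ in range(n)]
keys = [random.randrange(n) for _ in range(n)]

# Sorting: Python's built-in Timsort on a list of floats.
bench("sort", lambda: sorted(data))

# Hash-table management: build a dict, then probe every key.
bench("dict build", lambda: {k: None for k in keys})
table = {k: None for k in keys}
bench("dict probe", lambda: [k in table for k in keys])
```

The same operations would have to be re-expressed idiomatically in each language being compared (e.g. `sort()` and environments in R) for the numbers to be meaningful.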
For very large data sets, or during the initial learning stage of a machine learning system, most of the time is probably spent on data transfers rather than in-memory processing, so it may not matter much whether one uses R or Python. Sometimes generating or summarizing the entire data set with Python or Perl and pre-loading it into an R table (as when generating video frames) accelerates the process: it is much faster than generating one video frame at a time, on the fly, in R. So clearly, optimizing speed is not just about using a faster procedure or a faster language, but about breaking down the tasks in a way that optimizes in-memory usage.
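The point about data transfers dominating can be sketched in Python, using repeated small file writes versus a single batched write as a stand-in for any transfer-heavy step (the CSV rows here are made-up data, and absolute timings depend on the machine):

```python
import os
import tempfile
import time

rows = [f"{i},{i * i}\n" for i in range(20_000)]

with tempfile.TemporaryDirectory() as tmp:
    path_row = os.path.join(tmp, "row_at_a_time.csv")
    path_batch = os.path.join(tmp, "batched.csv")

    # Naive: reopen the file and transfer one row at a time.
    t0 = time.perf_counter()
    for row in rows:
        with open(path_row, "a") as f:
            f.write(row)
    t_row = time.perf_counter() - t0

    # Batched: assemble everything in memory, transfer once --
    # analogous to pre-loading a full table before handing it to R.
    t0 = time.perf_counter()
    with open(path_batch, "w") as f:
        f.write("".join(rows))
    t_batch = time.perf_counter() - t0

    with open(path_row) as a, open(path_batch) as b:
        assert a.read() == b.read()  # same output, different cost

    print(f"row-at-a-time: {t_row:.3f} s, batched: {t_batch:.3f} s")
```

Both versions produce the identical file; only the number of transfer operations changes, which is usually where the gap comes from.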
Any thoughts on this?
Exactly!
Rather than focusing on which procedure, language, tool, or technique best suits the problem, it is more important to break the big problem down into smaller, manageable pieces so that in-memory processing can be optimized.