
Composable Dataflows vs Python Scripts

I started to think more about data prep and cleansing use cases, and why someone would choose to develop them as Composable Dataflows rather than as something like Python scripts. Yes, either one will get the job done, but development time, debuggability, execution speed, and re-usability can all be deciding factors. Buying into the dataflow methodology brings some big value adds.

For one, the execution of the code can be systematically parallelized. This frees the developer from thinking about locks, threads, or how to use async APIs and callbacks. Yes, in Python you can write a bunch of code to make functions run concurrently, but in standard CPython the global interpreter lock keeps CPU-bound threads from using multiple CPUs, so you have to start sharding things across processes. That leaves you with quite a bit of orchestration and marshaling code to maintain (psst … Composable does similar things under the hood). Making your code performant and scalable doesn’t become an afterthought; it’s baked in from the beginning with Composable.
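To make the Python side of that comparison concrete, here is a minimal sketch of the sharding and orchestration code you end up owning. The cleansing rule and the names `clean_row`, `clean_shard`, `make_shards`, and `clean_rows` are made up for illustration; only `ProcessPoolExecutor` comes from the standard library.

```python
from concurrent.futures import ProcessPoolExecutor


def clean_row(row):
    """Hypothetical cleansing step: trim whitespace and lower-case every field."""
    return [field.strip().lower() for field in row]


def clean_shard(shard):
    """Runs inside a worker process; it is top-level so it can be pickled."""
    return [clean_row(row) for row in shard]


def make_shards(rows, count):
    """Split rows into at most `count` contiguous shards of near-equal size."""
    size = max(1, -(-len(rows) // count))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def clean_rows(rows, workers=4):
    """Clean rows on `workers` processes; workers <= 1 runs in-process."""
    shards = make_shards(rows, max(1, workers))
    if workers <= 1:
        cleaned = [clean_shard(shard) for shard in shards]
    else:
        # Each shard is pickled, sent to a worker, and its result pickled back.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            cleaned = list(pool.map(clean_shard, shards))  # keeps shard order
    return [row for shard in cleaned for row in shard]
```

In a real script the call to `clean_rows` belongs under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. Even in this toy version, the splitting, pickling, and re-assembly logic is about as long as the actual cleansing step.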

In Composable, you can very easily see intermediate values and large tables. Dev tools (like Eclipse or Visual Studio) aren’t really made for viewing data like this. You could start to dump data out of the Python runtime into files and databases, but then you’re debugging in multiple environments. Composable maintains a history of the runs. So when you have tens of thousands of executions happening, and errors occurring infrequently, you can debug and fix those issues very easily. Just bring up a previous run, crack it open in the designer, and see all the intermediate values upstream, which gives you the complete context of why the error occurred. In Python, you’d probably just be looking at a log file. Or you’d be trying to replicate the failure by running ad-hoc Python code on the command line. And just imagine if it’s data other than tables (e.g. images).
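To get anything like a run history in a plain Python script, you would have to build it yourself. The sketch below is one rough way to do that, assuming a pipeline expressed as a list of named steps. `RunHistory`, `parse`, `average`, and `STEPS` are hypothetical names, and this is not how Composable stores its runs.

```python
import itertools


class RunHistory:
    """Keeps every step's output for every run, keyed by run id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.runs = {}

    def execute(self, steps, value):
        """Run `steps` (a list of (name, function) pairs) in order on `value`."""
        run_id = next(self._ids)
        record = {"input": value, "steps": [], "error": None}
        self.runs[run_id] = record
        for name, func in steps:
            try:
                value = func(value)
            except Exception as exc:
                # Remember which step failed; upstream outputs stay in the record.
                record["error"] = (name, repr(exc))
                break
            record["steps"].append((name, value))
        return run_id


def parse(text):
    """Hypothetical step: turn "1,2,3" into [1, 2, 3]."""
    return [int(part) for part in text.split(",")]


def average(numbers):
    """Hypothetical step: arithmetic mean of a list of numbers."""
    return sum(numbers) / len(numbers)


STEPS = [("strip", str.strip), ("parse", parse), ("average", average)]
```

After `run_id = history.execute(STEPS, " 4,x,6 ")`, looking at `history.runs[run_id]` shows the output of `strip` and the fact that `parse` failed, which is the kind of upstream context a log line usually lacks. It also only lives in memory, so persistence, retention, and a viewer for large tables or images would all be extra work.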

Composable isn’t just about flashy end-user development tools. We have a whole runtime and clean APIs that developers can use. Dataflows don’t need to be created through the designer. You can simply use a fluent API and code them up in your choice of .NET language, and host portions of the runtime inside your own process. We typically demo the designer and our dashboarding capabilities because people want to see flash. If I just said “We have an amazing backend and we do cool things with data … and look, you can visualize it in Tableau”, the Tableau sales team would love us.
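For readers who haven’t used a fluent API, the general style looks something like the sketch below. To be clear, this is not Composable’s API (which is .NET); `Flow` and its methods are hypothetical, written in Python only to match the scripts discussed above.

```python
class Flow:
    """Hypothetical fluent dataflow builder; every call returns a new Flow."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def then(self, func):
        """Append a step that receives the whole dataset."""
        return Flow(self._steps + (func,))

    def map(self, func):
        """Append a step that transforms each item."""
        return self.then(lambda items: [func(item) for item in items])

    def filter(self, predicate):
        """Append a step that keeps only items matching the predicate."""
        return self.then(lambda items: [item for item in items if predicate(item)])

    def run(self, data):
        """Push data through the steps in the order they were added."""
        for step in self._steps:
            data = step(data)
        return data
```

A flow is then built by chaining, for example `cleanse = Flow().map(str.strip).filter(bool).map(str.upper)`, and executed with `cleanse.run(rows)`. Because each call returns a new `Flow`, an existing flow can be extended (`cleanse.then(len)`) without changing the original.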

Finally, we fully believe in the composability principle, hence the name of our product and company, and we feel this leads to re-usability.  Yes, you can get composability with object-oriented or procedural languages, but we feel we take it to the next level.
