I wanted to get your thoughts on data science workstations. We’re seeing an uptick in product releases and messaging around DS workstations from several hardware vendors, and I’d like to get your take on them.
Do you have a sense of whether many data science teams are using these today, or are they more inclined to leverage AI-as-a-service? What would be the benefit of using a workstation vs. leveraging, say, a DS platform or AI cloud services?
Here is an answer I received from Theodore Omtzigt:
The benefit of a custom data science workstation is increased productivity. Video production and computer-aided engineering, for example, both rely on custom workstations, because the productivity gained from low-latency, interactive work is undeniable. Given that a custom workstation is a $5–10k affair and the person behind it is a $100k+ resource, a productivity improvement of roughly 5% is enough to break even within one to two years. Typical productivity improvements provided by custom workstations are measured more in the 2–5x range.
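The break-even arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions, not the author’s model: the $100k figure is treated as the person’s full annual cost, the productivity gain is valued as the same fraction of that cost, and the amortization period (`years`) is a hypothetical parameter. The function name `break_even_gain` is illustrative.

```python
def break_even_gain(workstation_cost: float, annual_cost: float, years: float = 1.0) -> float:
    """Productivity gain (as a fraction) at which the workstation pays for itself.

    Assumption: a gain of g is worth g * annual_cost per year, accrued over
    `years` (a hypothetical amortization period).
    """
    return workstation_cost / (annual_cost * years)


# Evaluate the $5k and $10k ends of the price range against a $100k resource.
for cost in (5_000, 10_000):
    for years in (1, 2):
        gain = break_even_gain(cost, 100_000, years)
        print(f"${cost:,} workstation over {years} year(s): {gain:.1%} break-even")
```

Under these assumptions, a $5k machine breaks even at a 5% gain within one year, and a $10k machine reaches the same 5% threshold over two years.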
A data science workstation needs to handle large data sets efficiently, so custom configurations would include large memory (64–128 GB) and large NVMe storage arrays (50–100 TB).
Renting that type of server from a CSP (cloud service provider) will be expensive. I just checked AWS: depending on the I/O backend selected, the cost is in the $8–10/hr range, or roughly $70k–88k/year if the instance runs around the clock. So a custom workstation also has a significantly lower Total Cost of Ownership than firing up a similar piece of hardware in the cloud. Cloud pricing essentially assumes that your resource is underutilized, and for workstations that tends not to be the case. But if you do a lot of contract project work, where you might be busy one quarter and idle the next, CSP resources can be attractive.
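The cost comparison and the utilization argument can be sketched in Python. This assumes on-demand billing for every hour the instance runs (8,760 hours in a year), uses the $8–10/hr rates quoted above, and takes a hypothetical $10k workstation price that counts hardware only (no power, support, or depreciation). The helper names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming round-the-clock billing


def annual_cloud_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Yearly rental cost when the instance runs for `utilization` of all hours."""
    return hourly_rate * HOURS_PER_YEAR * utilization


def break_even_utilization(workstation_cost: float, hourly_rate: float, years: float = 1.0) -> float:
    """Fraction of hours at which cloud rental over `years` equals the workstation price.

    Assumption: the workstation cost is hardware only, paid once.
    """
    return workstation_cost / (hourly_rate * HOURS_PER_YEAR * years)


for rate in (8.0, 10.0):
    full = annual_cloud_cost(rate)
    util = break_even_utilization(10_000, rate)
    print(f"${rate:.0f}/hr: ${full:,.0f}/year at 100% use; "
          f"a $10,000 workstation breaks even at {util:.1%} utilization")
```

At these rates, a $10k workstation costs about the same as renting for only 11–14% of the year. This is why the cloud comes out ahead mainly for intermittent, project-based use.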
Whether you go CSP or on-premise depends on whether your on-premise team is running a cloud-type environment. The public cloud offerings also provide private networks, so from a security point of view you can create similar profiles. The on-premise trade-off tends to be a capacity planning exercise, particularly for these types of large configurations, whereas the public cloud will have plenty of capacity.
Given the complexity of today’s cloud, a CSP solution will require a good IT team that can properly support remote connectivity. Similarly, depending on the skills of your staff, you will need IT support staff to configure and manage the workstations.