My company is looking to set up an analytics platform to support its production transaction processing systems (the data there lives in relational tables). The intent is to ingest structured and unstructured data into a landing zone, from which it will be transformed and loaded into a central data store (CDS). Data from the CDS will then be sent to subject-oriented marts. An analytics sandbox will be able to extract data from the landing zone, the CDS and the production systems. Our customer data, which has many different dimensions across several databases, is subject to a lot of governance and is considered sensitive as we are in the field. My thinking is to establish the landing zone and analytics sandbox in the cloud and keep the CDS and marts on-prem, at least initially. Any thoughts on this approach?
Great Alteryx use case. Check out Alteryx.com for some examples.
Personally, I would not move the data to the cloud: it takes time and money, and there is absolutely no need.
I would instead set up a sandbox DWH on-prem using MongoDB or similar, select the data sources and siloed tables you need, do the ETL, aggregation and even machine learning on the fly (in-memory, using Alteryx or similar tools), and store the resulting model tables on-prem again (which will be fast).
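To make the workflow above a bit more concrete, here is a rough sketch of the extract, in-memory aggregate, and store-back steps. It is only an illustration under several assumptions: Python's built-in `sqlite3` stands in for both the siloed source databases and the on-prem sandbox store (it is not MongoDB or Alteryx), and every table, column and function name (`customers`, `orders`, `spend_by_region`, `run_demo`, and so on) is made up for the example.

```python
import sqlite3
from collections import defaultdict


def extract(conn, query):
    """Pull the selected rows from one source as a list of dicts."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


def aggregate_spend(customers, orders):
    """Join the two siloed tables in memory and total the spend per region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        totals[region_of[order["customer_id"]]] += order["amount"]
    return dict(totals)


def load_model_table(conn, totals):
    """Write the aggregated model table back to the (stand-in) on-prem store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spend_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO spend_by_region VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_demo():
    # Hypothetical source 1: a CRM database holding customer attributes.
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
    crm.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "east"), (2, "west"), (3, "east")],
    )

    # Hypothetical source 2: a separate sales database holding transactions.
    sales = sqlite3.connect(":memory:")
    sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    sales.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 5.5), (3, 4.5), (1, 2.0)],
    )

    # Stand-in for the on-prem sandbox store that receives the model table.
    store = sqlite3.connect(":memory:")

    totals = aggregate_spend(
        extract(crm, "SELECT customer_id, region FROM customers"),
        extract(sales, "SELECT customer_id, amount FROM orders"),
    )
    load_model_table(store, totals)
    return store.execute(
        "SELECT region, total FROM spend_by_region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(run_demo())
```

Only the columns needed for the model are pulled from each source, so the sensitive customer attributes never leave their original databases; in a real setup the connections would point at the actual on-prem systems rather than in-memory stand-ins.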