
Charting a Mandate for a Chief Data Scientist

The profusion of big data, alongside helpful nudges from Wall Street, has inspired many companies to create Chief Data Officer (CDO) and Chief Data Scientist (CDS) roles. The mandate for these roles remains inchoate, in keeping with the still-incipient application of machine learning and predictive analytics within large corporate structures.

In a previous post, I introduced a new paradigm, the ALDI (Aim-Lever-Data-Implementation) framework, for integrating data science with corporate and marketing strategy. A rough mandate for a CDS could simply be to ensure an efficient and robust application of the ALDI steps across the various silos in an organization. Below, I expand on that theme and chart out a more detailed mandate for a CDS within a modern corporate hierarchy.

Let us start by quickly recalling the main steps in the ALDI framework.

First, the Aims and objectives have to be defined. Second, we should know how to intervene in the game by understanding the Levers that will be pulled as a result of the analysis. The logical third step is then to gather the appropriate Data and perform the analysis. Fourth, the Implement step drives action from the results of the analysis phase.

To define the mandate for a CDS position, we draw inspiration from the classic 1980 paper on the 7S model by Waterman et al. and adapt it, in the context of the ALDI paradigm, into a 5S model. The five S’s are:

  1. Superordinate goals or Shared values
  2. Structure
  3. Strategy and Style
  4. Staff and Skills
  5. Systems

1. Superordinate goals — Aim (A)

The CDS must believe in the organization’s superordinate goals, that is, its guiding concepts and values. These superordinate goals could take the form of either written mission statements or unwritten shared values. This alignment must precede any discussions the CDS has with division or branch heads on the exact objectives of machine learning or data analysis.

2. Structure — Data (D)

Data collection and analysis depend on the structure of the firm. Examples of structures include:

  1. Customer structure – one division to serve all customers of type A, and another to serve all of type B (some form of market segmentation).
  2. Geographic structure – one division for each geographic unit.
  3. Functional structure – differentiate between sales, marketing, IT, operations or broadly between front, middle and back office.

Both the data collection phase and the feature selection phase (feature selection being the data science term for choosing predictors) are contingent on the structure of the firm.

3. Strategy and Style — Lever (L) and Implement (I)

The style of the company refers to the mindset of its leaders and executives. For example, would they rather “move fast and break early” or “move cautiously and deliver a sturdy product”? The CDS can ascertain this in collaboration with other C-suite executives. This again reinforces the point that the CDS role is not a purely technical one to be filled by a machine learning expert; like other senior management roles, it calls for an amalgamation of soft skills and hard skills in one’s domain of expertise.

4. Staff and Skills — Lever (L), Data (D) and Implement (I)

The skill set of the company’s personnel influences the ALDI framework in multiple ways. If the teams are quantitative, we can expect robust statistical sampling and better data collection, not to mention a smaller dedicated team under the CDS. A quantitatively skilled senior management (say, at a quant hedge fund) would be more receptive to Bayesian analysis and to results from black-box techniques (e.g. bagging or boosting over multiple models), whereas a non-quantitative senior management team might remain skeptical about the efficacy of these models solely because they cannot decipher them. At the same time, the qualitative, domain-oriented skill set of the firm’s front office personnel will determine what can and cannot be implemented, and the CDS will have to factor that into his/her final recommendations.

5. Systems — Data (D)

By systems, I refer to the aggregate of information collection and analysis systems in use within the firm. The CDS must have overarching responsibility for business and market intelligence and for data collection and processing (including database systems and Extract-Transform-Load functionality), in addition to the data science teams themselves. In some organizations, this may be accomplished with a matrix system of control.



R. H. Waterman Jr., T. J. Peters, and J. R. Phillips, “Structure is not organization,” Business Horizons, June 1980.


I wish to thank Dr. Rajesh T Krishnamachari for our discussions on this topic.