Guest blog post by James Kobielus.
Data science is a creative problem-solving exercise. People become data scientists for many reasons, and the creative challenge is probably high up on the list.
Data scientists, like most creative people, are adept at what’s often called “pattern thinking.” This is the ability to discover beautiful regularities in the world around us that others may have overlooked. In their search for statistical regularities, data scientists use big-data platforms, high-powered analytical tools, and interactive visualizations to surface correlations that might otherwise elude them.
How do you identify pattern-thinking aptitudes in candidates for your company's next data-scientist position? This article highlights an interesting approach that at least one organization has taken to identifying strong pattern thinkers for its data science practice. An executive at Booz Allen Hamilton says the consulting firm, in addition to hiring statisticians, computer scientists, and domain experts, has had success with both physicists and music majors. "Both groups tend to bring curiosity and experimentation into play...with physicists 'exuding the scientific method' -- moving from conjecture to hypothesis to testing -- and music majors offering 'amazing creativity and quantitative skills.'"
Where data-scientist teams are concerned, you are likely to find one dominant personality type: people who have ample curiosity, intellectual agility, statistical fluency, analytical acuity, research stamina, and scientific discipline. Of course, these aptitudes are not evenly distributed throughout the population. If you’re assembling a data-science practice, you need some social pattern thinking of your own to determine which types of individuals would best complement each other. Some data scientists are awesome polymaths who have mastered a wide range of skills, while others are strict specialists. Some are closer to the statistical-analyst end of the skills spectrum, whereas others take pride in being the subject-matter expert that all the data scientists run to when the question turns to marketing, finance, and what have you.
The productivity of the entire data-scientist team depends on being able to balance this mix of people, aptitudes, skills, and roles. But more than that: it depends on being able to incorporate new roles into the team as the nature of big data and data science initiatives evolves. For example, the notion of a "customer experience modeler" is of fairly recent vintage, and that role is usually not filled by the same person you would hire for, say, log-linear regression modeling. It may be someone with a degree in the humanities rather than in mathematics or statistics.
This new reality is the focus of an InformationWeek article, “How To Build An Analytics A-Team.” The piece discusses a Blue Hill Research study that outlines several important roles within data-science organizations. I've arranged the bulleted list of roles from well-established (in business intelligence and data management generally) to newer and less frequently found in traditional data-science organizations:
If you've already included all or many of these as distinct jobs in your initiative, you're at the forefront of businesses that have committed to building data-science centers of excellence (CoEs). And you may have even created a position for CoE administrator, whose core job is to build up the environment where cross-role "chemistry" takes hold. Here are some tips for finding the best blend of data science professionals and for orchestrating their efforts in a collegial environment:
Want to engage with a creative community of top-notch data science professionals? Get your ticket here for the first Datapalooza, which will take place next week, November 10-12, at Galvanize in San Francisco. Sponsored by the Spark Technology Center, Datapalooza will enable you to take your data-science skills to the next level. You’ll gain hands-on experience, enjoy one-on-one coaching, and learn how to build a practical data-science product in just three days. In doing so, you’ll be addressing real-world data-science challenges that require creative pattern thinking, machine learning, cognitive computing, natural language processing, and stream computing.
You should also explore this informational IBM Analytics resource page on Spark.
Kobielus is an industry veteran and serves as IBM Big Data Evangelist; Senior Program Director for Product Marketing in Big Data Analytics; and Team Lead, Technical Marketing, IBM Big Data & Analytics Hub. He spearheads IBM's thought leadership activities in Big Data, Hadoop, enterprise data warehousing, advanced analytics, business intelligence, and data management. He works with IBM's product management and marketing teams in Big Data. Kobielus has spoken at such leading industry events as IBM Insight, Hadoop Summit, and Strata. He has published several business technology books and is a popular provider of original commentary on blogs and social media.