Data Science: What Marketing Professionals Should Know About It

By Kostas Pardalis

Marketers have access to a wealth of data, containing information that is waiting to be harnessed. It’s this information that data science promises to turn into actionable knowledge. But how can a marketing professional better understand this brand-new world of data science?

In this post, we’ll go through a simple use case, email marketing optimization, to see the overall workflow of applying data science to such a problem and what a marketer should know about it.

Choose a goal

First, start with goals. You have to be very clear about what you want to achieve. For example, you might want to optimize your email campaigns; to do so, you first need to define how to measure the performance of a campaign. It could be the open rate, the click rate or something much deeper that might require incorporating data from other departments within the company. After you’ve chosen one or a few different ways of measuring your campaign performance, it’s time to move on to the next question(s).
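
To make the definition concrete, here is a minimal Python sketch of one common way to compute open and click rates: unique opens or clicks divided by delivered emails, where "delivered" means sent minus bounced. The function name and the choice of denominator are assumptions for illustration; whichever definition you pick, write it down and keep it fixed so campaigns stay comparable.

```python
def campaign_metrics(sent: int, bounced: int, unique_opens: int, unique_clicks: int) -> dict:
    """Open and click rates over delivered emails (an assumed, common definition)."""
    delivered = sent - bounced
    if delivered <= 0:
        raise ValueError("campaign has no delivered emails")
    return {
        "open_rate": unique_opens / delivered,
        "click_rate": unique_clicks / delivered,
    }

# Hypothetical campaign: 1,000 sent, 50 bounced, 190 unique opens, 38 unique clicks.
print(campaign_metrics(1000, 50, 190, 38))  # → {'open_rate': 0.2, 'click_rate': 0.04}
```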

Which factors affect this performance, and in what ways? The most typical scenario is to look at how demographics relate to performance but, considering the data that companies have access to nowadays, there are plenty of other factors waiting for you to discover and fine-tune. What is very important here is how clearly these questions are defined. There’s a good reason for putting the word “science” next to “data”: data science works best with well-formulated, clear questions. This is where the data scientist comes into play. What she is really good at is taking these questions, formulating them in scientific terms and using a mathematical toolset, together with her intuition, to figure out whether and how they can be answered. But always keep in mind that her work is heavily affected by how clear and well-defined your goals and questions are.
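
As an illustration of what formulating a question in scientific terms can look like: “do recipients with business addresses open our emails more often?” can be restated as a null hypothesis (both groups have the same open rate) and checked with a two-proportion z-test. The sketch below uses only the Python standard library and the normal approximation; the function name is hypothetical and the numbers in the example are invented.

```python
import math

def two_proportion_z_test(opens_a: int, sent_a: int, opens_b: int, sent_b: int):
    """Two-sided z-test of H0: group A and group B have the same open rate."""
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Under H0 both groups share one open rate, estimated from the pooled data.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        raise ValueError("test is undefined when nobody or everybody opened")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented numbers: 240 of 1,000 business addresses opened vs 200 of 1,000 others.
z, p = two_proportion_z_test(240, 1000, 200, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be due to chance alone; it does not by itself explain why the groups differ.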

Data, data & data

You don’t have to become an engineer, but it is good to have at least a basic understanding of the data you have access to. Everything you do as part of your job generates data. The content you create is data, and so are the interactions that people have with your content. Marketers “own” this data, and how meticulous they are in the processes that generate it will heavily influence the results of any analytic process applied to it.

To illustrate this, let’s consider Mailchimp as the platform for managing email campaigns. Mailchimp has a rich data model that can be used to extract actionable insights, but this data is the result of your actions on the platform. For example, do you add demographic information about your recipients? And when you create campaigns, do you add extra fields, or information in the titles, that characterize the type of content you are going to share? You might, for instance, want to distinguish between a newsletter campaign announcing a new product feature and one announcing a new blog post.

In general, the principle of “garbage in, garbage out” holds whenever you want to work with data in a more scientific way. A good data scientist will be able to tell you when there is too much noise in the data, but that noise can still greatly affect the quality of the results. This data is the raw material on which you will build everything, from a simple dashboard to complex models that explain customer behavior, so take good care of it and keep the following in mind:

Consistency: Try to be as consistent as possible with your actions. If you decide on a specific format for your titles, make sure you stick to it.
Completeness: It helps to keep your data as complete as possible. If you can connect your tools with CRMs or other data sources inside your company, do it, and consider the connectivity a tool supports as an important factor when choosing which one to use in your job.
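
Consistency pays off because it makes titles machine-readable. As a sketch, assume a hypothetical convention where every campaign title starts with a tag such as “[Blog]” or “[Product]”. A few lines of Python can then both extract the campaign type and flag titles that break the convention; the pattern and function names below are illustrative, not part of any tool.

```python
import re

# Hypothetical convention: titles start with "[Blog]" or "[Product]" followed by text.
TITLE_PATTERN = re.compile(r"^\[(Blog|Product)\]\s+\S")
TYPE_BY_TAG = {"Blog": "blog_post", "Product": "product_announcement"}

def campaign_type(title: str):
    """Return the campaign type encoded in the title, or None if the title breaks the convention."""
    match = TITLE_PATTERN.match(title)
    return TYPE_BY_TAG[match.group(1)] if match else None

def nonconforming_titles(titles):
    """List the titles that a later analysis would not be able to classify."""
    return [title for title in titles if campaign_type(title) is None]

titles = [
    "[Blog] Five ways to segment your list",
    "[Product] Introducing scheduled reports",
    "March newsletter",
]
print(nonconforming_titles(titles))  # → ['March newsletter']
```

Running a check like this before each send is cheaper than cleaning up inconsistent titles months later.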

The tip of the iceberg

Some information in your data is explicitly there for you to read, but that is usually just the tip of the iceberg. A large amount of additional information can be derived from the data, and a data scientist can extract it. Consider, for example, the delivery time of your campaign.

Mailchimp reports a timestamp of when the campaign was delivered to the recipients. By combining this with the location of every recipient, which Mailchimp also stores, you can derive whether the email was delivered during working hours or not. So, apart from the explicit information that your dataset offers, there is a large amount of hidden information waiting to be discovered. This is where you will really appreciate working with a data scientist. After that, it’s your turn to get value out of these “new” insights and to find out, together with her, whether they can be used to achieve your goals.
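
Deriving this hidden feature is mostly timestamp arithmetic. Here is a minimal Python sketch, assuming each recipient’s location has already been turned into a fixed UTC offset (a simplification that ignores daylight saving time) and assuming “working time” means 09:00 to 17:00, Monday to Friday, local time. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed definition of "working time": Monday to Friday, 09:00-17:00 local time.
WORK_START_HOUR = 9
WORK_END_HOUR = 17

def delivered_during_work(sent_at_utc: datetime, utc_offset_hours: float) -> bool:
    """Return True if a timezone-aware send time falls within the recipient's working hours.

    utc_offset_hours is a hypothetical per-recipient field derived from location data;
    a fixed offset ignores daylight saving time.
    """
    local_time = sent_at_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local_time.weekday() < 5  # Monday == 0, ..., Friday == 4
    return is_weekday and WORK_START_HOUR <= local_time.hour < WORK_END_HOUR

sent_at = datetime(2017, 3, 14, 15, 30, tzinfo=timezone.utc)  # a Tuesday
print(delivered_during_work(sent_at, utc_offset_hours=1))   # 16:30 local → True
print(delivered_during_work(sent_at, utc_offset_hours=-8))  # 07:30 local → False
```

A production pipeline would more likely store an IANA time zone name per recipient and convert with Python’s standard zoneinfo module, which also handles daylight saving time.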

Do not underestimate the complexity of the technical infrastructure needed for this task. Although it is not your responsibility, you affect it and are affected by it. Consider the case where you decide to work with a service that does not expose an API for pulling data out of it; that would be awful, right? At the same time, the data infrastructure might impose restrictions on what you want to achieve. Suppose you want real-time notifications based on the outcome of an online analysis that your data scientists have built, but your infrastructure processes data in batches every 24 hours. The best thing you can do is to keep good communication channels open with both your data science and engineering teams.

The scientific method

Finally, keep in mind that what we want to do here is apply the scientific method to marketing, which means that we approach everything as an experiment that requires iteration. We formulate our questions, consult our data, apply what we learned, measure the new results and repeat. Try to adapt to this process and get as involved in it as possible. Some of the things data scientists do can also be done by you, so learn from them and try to use this knowledge in your everyday routine. For example, you can use some elementary exploratory data analysis techniques to come up with better-defined hypotheses and questions.
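
One of the simplest exploratory techniques is to compare open rates across groups before building any model. Here is a sketch in plain Python over a few invented rows, where each row is one email sent to one recipient; the field names and the helper function are hypothetical.

```python
from collections import defaultdict

def open_rate_by(rows, feature):
    """Open rate for each distinct value of `feature` (rows carry an 'opened' 0/1 flag)."""
    opens = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        key = row[feature]
        totals[key] += 1
        opens[key] += row["opened"]
    return {key: opens[key] / totals[key] for key in totals}

# Invented example rows, not real campaign data.
rows = [
    {"campaign_type": "blog_post", "business_email": True, "opened": 1},
    {"campaign_type": "blog_post", "business_email": False, "opened": 1},
    {"campaign_type": "blog_post", "business_email": False, "opened": 0},
    {"campaign_type": "product_announcement", "business_email": True, "opened": 0},
    {"campaign_type": "product_announcement", "business_email": False, "opened": 1},
    {"campaign_type": "product_announcement", "business_email": True, "opened": 0},
]
for feature in ("campaign_type", "business_email"):
    print(feature, open_rate_by(rows, feature))
```

In these invented rows blog posts show a higher open rate than product announcements. With real data, a gap like that is not a conclusion yet, but it is a well-defined hypothesis to hand over for proper testing.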

The example

Now let’s go through a simple but practical example. As we said at the beginning, we’d like to optimize our email campaigns. We are using Mailchimp to manage our campaigns, so we have access to the rich data that the platform exposes. To measure performance we will use the open rate, and ideally we’d like to find the conditions that lead to a higher probability of someone opening the emails we send. From a subset of the data that Mailchimp exposes, we have the following:

We have a stream of events for each email campaign, indicating which recipients opened the email.
We have a timestamp of when the campaign was sent.
We have access to the email addresses of our recipients.
Finally, we encode the type of each campaign in its title: it can be a blog post or a product announcement. This is not the only way to encode this information; you could, for example, create a custom field for it, which might be preferable.

With the above we can do the following:

Using the timestamp of when the email campaign was sent, together with location data for each of our recipients, we can derive whether the email was received during working hours or not.
By using the email address and some simple rules, we can categorize the addresses into business and non-business email addresses.
Finally, we can characterize every email that was sent as a blog post or a product announcement.
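
The “simple rules” for business addresses can be as basic as checking the domain against a list of consumer webmail providers. The short domain list and the function name below are hypothetical starting points, not an exhaustive rule set; a real list would need to be longer and maintained over time.

```python
# Hypothetical, deliberately short list of consumer webmail domains.
CONSUMER_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com", "icloud.com"}

def is_business_address(email: str) -> bool:
    """Treat any address whose domain is not a known consumer provider as a business address."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    domain = email.rsplit("@", 1)[1].strip().lower()
    return domain not in CONSUMER_DOMAINS

print(is_business_address("jane@acme-corp.com"))  # → True
print(is_business_address("Jane.Doe@Gmail.com"))  # → False
```

This rule labels every unknown domain as business, so its accuracy depends directly on how complete the consumer list is; that trade-off is worth discussing with your data scientist.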

Goals & Questions

So, we’d like to know how the above parameters affect the open rate of our email campaigns, and if we manage to establish that with some certainty, we can start adjusting them to optimize it. For example, we might find that a product announcement is better sent during working hours to people who receive the emails at their business address.

The process

Now that we have some goals and some questions to ask, the following steps have to be performed:

1. Pull the data out of Mailchimp. Ideally, the data should be pulled automatically at specific intervals and stored in a database where the data scientist always has easy access to up-to-date data.
2. Retrieve a subset of the data from the database.
3. Clean and reshape the data to codify the implicit information that we mentioned earlier.
4. Perform the analysis and report back the results.
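
Steps 1 and 2 can be sketched with Python’s built-in sqlite3 module standing in for the company database. The table layout and function names are hypothetical, and the rows are assumed to have already been pulled from Mailchimp (the API call itself is left out). A primary key combined with INSERT OR REPLACE lets a scheduled job re-run without duplicating data.

```python
import sqlite3

def store_sends(conn, sends):
    """Step 1 (storage half): persist rows already pulled from Mailchimp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sends ("
        " campaign_id TEXT, title TEXT, sent_at TEXT, email TEXT, opened INTEGER,"
        " PRIMARY KEY (campaign_id, email))"
    )
    # Re-running the job replaces existing rows instead of duplicating them.
    conn.executemany("INSERT OR REPLACE INTO sends VALUES (?, ?, ?, ?, ?)", sends)
    conn.commit()

def sends_since(conn, iso_date):
    """Step 2: retrieve only the subset of data the analysis needs."""
    # ISO 8601 text sorts chronologically as long as every value uses the same format and zone.
    cur = conn.execute(
        "SELECT campaign_id, title, sent_at, email, opened FROM sends"
        " WHERE sent_at >= ? ORDER BY sent_at",
        (iso_date,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
store_sends(conn, [
    ("c1", "[Blog] Segmenting lists", "2017-02-01T10:00:00Z", "jane@acme-corp.com", 1),
    ("c2", "[Product] Scheduled reports", "2017-03-14T15:30:00Z", "joe@gmail.com", 0),
])
print(sends_since(conn, "2017-03-01"))
```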

Depending on the size of the company, steps 1 and 2 might be done by:

data engineers,
the data scientist, using your own infrastructure,
or tools provided by vendors.

The remaining steps are performed by the data scientist. Once the analysis reaches a conclusion, a pipeline might be implemented to automate all the steps from 1 to 4, which will again require the involvement of engineers.

What is reported back of course depends heavily on the questions asked. If, for example, the data scientist applied logistic regression to figure out how the above parameters affect the probability of an email being opened, she would most probably end up with an estimated coefficient for each parameter, together with a measure of its statistical significance.

Based on that, she will compile a report for you containing insights like:

the type of email address is not significant in determining whether the email will be opened,
or that receiving the email during working hours has a negative effect,
or that an email about a blog post has much higher odds of being opened.
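
To show where statements like these come from, here is a self-contained sketch of logistic regression fitted by plain batch gradient ascent in Python. It runs on synthetic data, invented so that blog posts are opened more often, emails received during working hours less often, and the address type has no effect; the names and numbers are illustrative only. The exponent of each coefficient is an odds ratio: a value above 1 means higher odds of an open. Unlike this sketch, a statistics library such as statsmodels also reports standard errors and p-values, which is what a statement like “not significant” is based on.

```python
import math
from itertools import product

FEATURES = ["intercept", "is_blog_post", "during_work_hours", "business_email"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(X, y, learning_rate=1.0, epochs=2000):
    """Fit weights by batch gradient ascent on the average log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += error * xj
        w = [wj + learning_rate * g / len(X) for wj, g in zip(w, grad)]
    return w

# Synthetic data (NOT real results): 20 sends per combination of features.
# Opens out of 20, keyed by (is_blog_post, during_work_hours); address type has no effect.
OPENS_PER_20 = {(0, 0): 6, (0, 1): 3, (1, 0): 12, (1, 1): 8}
X, y = [], []
for is_blog, at_work, business in product((0, 1), repeat=3):
    opens = OPENS_PER_20[(is_blog, at_work)]
    for i in range(20):
        X.append([1.0, is_blog, at_work, business])
        y.append(1 if i < opens else 0)

weights = fit_logistic_regression(X, y)
for name, coef in zip(FEATURES, weights):
    # exp(coefficient) is the multiplicative change in the odds of an open.
    print(f"{name:18s} coef={coef:+.2f} odds_ratio={math.exp(coef):.2f}")
```

On this invented data the blog-post coefficient comes out positive, the working-hours coefficient negative and the address-type coefficient close to zero, mirroring the kind of insights listed above.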

Pure magic, right? 🙂

Closing

After discussing the results with your data science team, you can take some informed actions and then follow the same process again to see what you have accomplished and how to improve even further.

I hope you liked the post, and I would love to read your comments below.

This post was first published here.