In this article, we discuss various strategies used to generate exponential traffic growth while preserving traffic quality and user loyalty. Our growth hacking engine is a combination of the components described in the three sections below.
1. Growth Hacking: Part I
Here we describe a strategy that consists of tweeting your top articles over a long period of time, to generate incremental traffic. After testing it for one week, we experienced a 10% growth in traffic. This strategy works well for acquiring new users, and we believe it can triple your traffic when fully optimized, though it might reduce user engagement. To get new, loyal subscribers, another strategy is needed: read section 2. This works in fast-growth environments, though you can fine-tune the parameters to apply it to no-growth websites.
Our DSC network has more than 50,000 live articles at any given time, growing by more than 2,000 new articles per year. Our intern Livan analyzed our Google Analytics statistics and found more than 2,000 articles each with more than 150 page views - and some with more than 100,000 page views. As we have a Twitter account with 60,000 followers (growing by 5,000 new followers per month at the current growth rate), and a LinkedIn group with 160,000 members (growing by 6,000 new members per month), we asked ourselves the following question: what would happen to our traffic if we tweeted a selection of these top articles every day?
The answer, from our first tests, is an immediate 10% traffic boost. We could tweet 100 articles per day from that same list, not just 25. We could tweet from multiple accounts, not just @AnalyticBridge, and we could also post on LinkedIn or Google+. With Hootsuite, this process can be fully automated. What would be the impact? Of course there is an optimum: too much tweeting will create dilution. But given the large number of new followers each day, and the fact that the top 2,000 articles could be replaced by entirely new articles after one year (because we produce new articles every day, and we are in the process of automating some postings, such as new books or new salary surveys), it is clear that 25 tweets a day is well below the optimum. And indeed, we have 50,000+ live articles, so we could tap into the whole list, not just the top 2,000.
Optimizing this tweeting process is discussed later in this article. Note that, given the way tweets work, it is OK if a user sees the same tweet two or three times over a one-year period, as long as, on average, he sees most of our tweets only once or twice. And given that tweets are short-lived, even at 100 tweets per day (out of a list of 2,000 tweets updated monthly), randomly selected (via a selection mechanism slightly favoring new, very old, popular, or time-insensitive tweets), we should be fine, provided we proceed carefully and incrementally, constantly adapting to new web traffic conditions whenever they occur.
The reason very old, time-insensitive articles with few (say 150) page views are worth tweeting again today is that our traffic grew by 500 percent over the last several years, thanks to the techniques described here, so old articles were never seen by most of our new visitors. This concept is best explained in our article about the lifecycle of blog posts, which discusses traffic decay and how to increase the lifetime and yield of old blog posts. For instance, top articles can be listed in a footer in each new article, as at the bottom of this very article - a footer that can be updated at once across thousands of articles, when needed, using an SHTML include or an iframe that loads the adaptive footer from a single web location: more on this soon.
The process consists of five steps:
The score can be used to slightly favor (over-tweet) articles that are more recent or more popular. But it is random enough that any article has some chance of eventually being tweeted one day. The score reflects the fact that not all articles are created equal.
The final implementation will consist of a fully automated machine-to-machine communication service (between Google Analytics, Hootsuite, and Twitter), powered by robust black-box analytics, automated machine learning (hashtag creation, detection of time-sensitive articles), and automated, adaptive statistical scoring.
The number of tweets can be slightly adjusted each day (increased, decreased, or with a change in scoring parameters) in response to performance. Performance is measured in terms of daily clicks arising from this activity (the stats are readily available from Hootsuite analytics), and the resulting average session duration for traffic coming from Twitter (available from Google Analytics).
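To make this feedback loop concrete, here is a minimal sketch of a daily adjustment rule. The thresholds, step size, and function name are hypothetical assumptions, not values from our production system; in practice, daily clicks would come from Hootsuite analytics and average session duration from Google Analytics.

```python
# Sketch of a daily feedback rule for the tweet count.
# All numeric thresholds below are illustrative assumptions.

def adjust_tweet_count(current_count, daily_clicks, session_duration,
                       min_count=25, max_count=100):
    """Increase volume while quality holds; back off when it degrades.

    daily_clicks: clicks generated by this activity (from Hootsuite)
    session_duration: avg session duration in seconds, for Twitter
    traffic (from Google Analytics)
    """
    clicks_per_tweet = daily_clicks / max(current_count, 1)
    if session_duration < 60 or clicks_per_tweet < 5:
        # Engagement or yield is dropping: reduce volume (dilution).
        new_count = current_count - 5
    else:
        # Quality is holding up: probe a slightly higher volume.
        new_count = current_count + 5
    return max(min_count, min(max_count, new_count))

print(adjust_tweet_count(25, 1000, 120))  # good yield: 30
print(adjust_tweet_count(50, 150, 45))    # poor engagement: 45
```

The key design choice is to move in small steps and clamp the volume, matching the careful, incremental approach described above.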
Details about the scoring algorithm
This algorithm is used to score articles based on page views (denoted as P), creation date (denoted as T for time), and a random number denoted as R (uniform deviate on [0, 1]). Note that older articles tend to have more page views, so P and T are not independent. The score S is computed as follows:
S = (b + R) * P^a / (T-Offset)^c
The parameters a, b, and c are chosen so that the top 25 articles selected each day (for tweeting) have, on average, a median P (historical page view count) about twice as high as the median P computed across all 2,000 articles. This way, we slightly favor popular articles, but not too much. Details are in the spreadsheet described below. Offset is chosen so that T - Offset is small (but positive) for our oldest article. You must use the median for P, not the average, because P has a Zipf distribution. Note that page view decay occurs, especially for unpopular articles, though in our case decay is masked by growth for popular articles.
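The scoring formula can be sketched in Python as follows. The parameter values are illustrative assumptions, not the tuned values from our spreadsheet; a negative c favors more recent articles.

```python
import random

# Sketch of the scoring formula S = (b + R) * P^a / (T - Offset)^c.
# Parameter values are illustrative; in practice a, b, c are tuned so
# that the median P of the daily top 25 is about twice the median P
# over all 2,000 articles.

def score(P, T, a=0.4, b=0.2, c=-0.1, offset=0.0, rng=random):
    """P: historical page views; T: creation time, e.g. days since the
    oldest article (T - offset must stay positive); R: uniform deviate
    injecting randomness so any article can eventually be tweeted."""
    R = rng.random()
    return (b + R) * P ** a / (T - offset) ** c

# Toy run: score 2,000 articles and keep the 25 to tweet today.
random.seed(42)
articles = [(random.randint(150, 100000), float(t)) for t in range(1, 2001)]
top25 = sorted(articles, key=lambda pt: score(*pt), reverse=True)[:25]
```

Because the random factor b + R spans a wide range relative to P^a, unpopular articles still get tweeted occasionally, which is the intended behavior.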
Data Sets, Excel spreadsheet
You can download our Excel spreadsheet with 2,000 articles, featuring the following fields, for each article:
The parameters a, b, and c are in cells J2, J3, and K2, respectively. A low value in J3 will produce more random scores. Cross-correlations are displayed in cells L1:O4, and the median scores for the top 25 articles and for all 2,000 articles are displayed in cells M8 and M7, respectively.
Note that the cross-correlations are not very useful: even when correlation(P, S) is as low as 0.04, the median P for the top 25 articles (those with the highest S) is twice as high as the overall median P computed on all articles. This is because traditional correlation is a poor indicator in this context: it is sensitive to the numerous outliers in the P values, caused by the fact that P has a Zipf rather than Gaussian distribution.
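This effect is easy to reproduce in a small simulation. The sketch below uses illustrative parameters (not our spreadsheet values): it draws Zipf-like page view counts, builds a mostly random score with a mild pull toward popular articles, and compares the Pearson correlation against the median-based indicator.

```python
import random
import statistics

# With Zipf-distributed page views P, correlation(P, S) can be small
# even though the top-scored articles clearly skew toward popular ones.
random.seed(7)
N = 2000
# Heavy-tailed (Zipf-like) page view counts via inverse-uniform sampling.
P = [max(150, int(150 / random.random())) for _ in range(N)]
# Mostly random score with a mild pull toward popular articles (a = 0.2).
S = [(0.2 + random.random()) * p ** 0.2 for p in P]

def pearson(x, y):
    """Plain Pearson correlation, sensitive to outliers by design."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

top25_P = [p for p, s in sorted(zip(P, S), key=lambda t: t[1], reverse=True)[:25]]
print("correlation(P, S):", round(pearson(P, S), 3))
print("median P, top 25 vs all:", statistics.median(top25_P), statistics.median(P))
```

The median of P among the top-scored articles comes out well above the overall median, even when the correlation itself looks negligible: the robust (median-based) indicator captures what the outlier-sensitive correlation misses.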
You can also download a full data set (for members only) that contains the full text (not just the title), for each article. It is used for clustering articles (see section 3).
Python Source Code
Our intern Livan wrote some Python code to process Google Analytics reports, and to scrape DSC articles to extract relevant fields (creation date, channel, and title). Download the Python code (rename this text file with a .py extension after downloading).
We can make this system more powerful by
2. Growth Hacking: Part II
This section briefly describes the other fundamental component required to make our system (described in section 1) work: the creation and growth of at least one massive Twitter account, with highly relevant, high-value followers, and the use of automated tweeting systems. There is a feedback loop, in the sense that having a lot of valuable content to tweet helps generate a large volume of good traffic to your website and helps boost your Twitter growth, which in turn further fuels the traffic growth for your website.
Here, a significant part of our growth (150 new Twitter followers per day) is generated via Twitter advertising: we spend a little more on Twitter than on Google AdWords. With Twitter, it is possible to target US-based profiles (and their followers) that are similar to pre-selected profiles, and you can upload a list of pre-selected profiles when starting your advertising campaigns. Our list has hundreds if not thousands of pre-selected data science profiles. Such lists are easy to find, and regularly published on various websites. But ours also includes top profiles - indeed the very largest, most relevant ones - that are missing in the traditional published lists, as well as people who re-tweet or like our tweets.
The growth and volume of our two main Twitter profiles, @analyticbridge and @datasciencectrl, are displayed in the figure below. It is a few months old: our number of followers has since more than doubled, and we are now well above @hmason in terms of number of followers.
The strategy described in section 1 delivers more than 1,000 extra clicks per day to our network, at the current low levels (25 tweets per day).
We also use LinkedIn and Google AdWords, but for a different goal: generating new members, US-based in the case of AdWords. But we have encountered a number of issues with AdWords (low conversion), so we have reduced our budget, optimized our AdWords strategies (adding negative keywords and conversion tracking; more on this coming soon), and shifted money to Twitter and to acquiring high-quality content. Read our article on 360-degree data science to understand how we blend domain expertise, business hacks, machine learning, engineering, and modern statistical science to efficiently solve business problems in general, and in particular to discover how we optimize our bidding strategies for Google keywords (how much to pay for a keyword).
3. Growth Hacking: Part III
Another part of our growth hacking strategy consists of creating new channels, for instance:
One of the challenges is to populate these channels with new content. While we use syndicated feeds for this purpose, we also want to add our own content. One way to do so is to perform a clustering of all our articles, and assign them a category: visualization, data plumbing, big data, Hadoop and so on. Once the articles are categorized, we can publish (re-post) some popular articles from DSC on the appropriate sub-channels. Our intern Livan is actually working on this, adding a category field to the list of 2,000 top DSC articles.
Here we describe a very simple and highly scalable NLP (natural language processing) technique, called indexation, to perform this clustering task. It works as follows.
Algorithm: categorizing / clustering articles
I call this technique indexation because it is very similar to building a search engine; another name for it is a tagging algorithm. We have also used and described this technique in the context of clustering thousands of data science websites (source code provided).
Instead of using this algorithm, you can install a customized Google search on your website and, once installed, search for data plumbing to find the articles on your website that are a good fit for the data plumbing category or channel. We've actually implemented this on DSC.
Also add 3-token keywords to your dictionary. For 3-token keywords, there are 3! (factorial 3) = 6 possible token orderings (permutations). Usually, only one or two of these six orderings will show up in the articles for any given keyword (data science central will show up, but central science data won't).
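To make the indexation idea concrete, here is a minimal tagging sketch; the category dictionary and example text are hypothetical. A production version would also enumerate the 2- and 3-token orderings discussed above and weight matches in the title more heavily.

```python
# Minimal sketch of the "indexation" tagging approach: match each
# article against a keyword dictionary and assign the category with
# the most hits. The dictionary entries below are hypothetical.

CATEGORY_KEYWORDS = {
    "data plumbing": ["data plumbing", "etl", "data pipeline"],
    "visualization": ["visualization", "chart", "dashboard"],
    "big data": ["big data", "hadoop", "spark"],
}

def categorize(text):
    """Return the best-matching category for an article, or None."""
    text = text.lower()
    best, best_hits = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(text.count(kw) for kw in keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best

print(categorize("Interactive dashboard and chart design for data visualization"))
# prints "visualization"
```

Running categorize over all 2,000 top articles would populate the category field that our intern is adding, and hence route each re-post to the appropriate sub-channel.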
This DSC growth engine illustrates that data science is not just about programming. Indeed, here, programming is a small part of the project, compared with designing algorithms that efficiently make APIs communicate with each other, based on data gathered automatically, with insights automatically extracted and leveraged. It also shows the limitations of traditional statistical science, with correlations (see the sub-section about the scoring engine) that are useless here and must be replaced by something else.
It certainly shows that there are different types of data scientists, and that indeed, data science is greater than the sum of its parts. It also shows how business and domain expertise are critical. For instance, if you don't know about Twitter's advertising capabilities or the Hootsuite product, you will never even think of doing this kind of thing, no matter how much you know about coding and algorithms, thus missing out on a big opportunity. If you work in a bigger organisation, finding and convincing the right person to start a project like this one is of course a challenge, no matter how business-savvy you are. But my experience is that big organisations tend to hire specialists rather than people like me.
Finally, we invite you to test our list of 2,000 articles and see which tweets (that is, which articles) resonate best with your followers. It would be interesting to see whether articles with high page view counts perform better on your Twitter account (as they do on ours). And it might be a way for you to attract further followers, by posting content that many people like to read.