You can call it business hacking or data hacking, but the idea is to use analytic intelligence to reverse-engineer algorithms and to transform, manipulate and modify data in external databases, without ever accessing the databases in question, for your business advantage.
A few examples:
- Query tag hijacking. You find an interesting link on Facebook, pointing to a Forbes article. You copy it to your website, but you change the query tag from (say) &source=facebook to &source=mywebsite (whatever your website is), to change traffic attribution. You then shorten the URL using bit.ly, so that nobody notices the transformation. Your connections re-post your link, Forbes checks mywebsite.com to see how it generated all that traffic, and you might get them as an advertising client. In short, the impact of this hack is to alter data (traffic statistics) in Forbes' web traffic reports (possibly generated by their vendor, maybe Accenture or Google Analytics), in an indirect way that does not involve hacking into any accounts. In other words, this is a soft hack as opposed to a (conventional) hard hack, but the impact is the same. The hard hack is illegal, the soft one is not. Note that the counter-attack to this hack consists of using encrypted query tags that match the public (non-encrypted) tags.
- Search relevancy attacks. Google ranks web pages based on CTR (click-through rate), among other criteria. If, for a specific keyword (say big data), you can manufacture fake clicks on most links on Google search result pages except your competitors' links, using a botnet (e.g. an open botnet where all participants agree to have their computers hijacked by a virus, for the benefit of all botnet participants - in short, consenting botnet "victims" or colluders), you might eventually get your competitors delisted from Google because of their comparatively low CTR for the keywords in question. This applies to both organic and paid traffic. In short, the impact of this hack is to (again, indirectly) modify page-keyword relevancy scores in Google's keyword index, for a small number of keywords that are important to you (small, to avoid detection).
- Reverse-spam attacks. Using black-hat SEO tactics can get you blacklisted. Don't use them for yourself; instead, promote your competitors via black-hat SEO. Eventually, their Google score will plummet, and they will be delisted or banned from Google. The impact of this hack is to change the spam score (email spam or web spam) of your competitors, for the worse.
- Insider information discovery. Apply for jobs and go to job interviews just to learn as many secrets as you can, to use them in your trading strategies. Posting fake tweets (from fake Twitter accounts) or fake press releases (about a publicly traded company) could also have an impact that you could leverage, if you built a solid base of connections using your fake accounts (click here to see a large-scale example of fake accounts used in a different context). We might test it to see if it works as expected, to improve stock trading strategies.
- Fake reviews. Use 2 ISPs and 3 browsers on your laptop to simulate the activity of 6 = 2 x 3 different users. This way, you can have 6 accounts (and you can easily switch from one account to another on your laptop) without being detected. Post good reviews for your business (say on Yelp) using these accounts, and bad reviews for your competitors - not too many, to avoid detection. Eventually, you might appear as a top business in your sub-category (e.g. restaurants in a specific city), thanks to your manipulations. These manipulations impact scores stored in Yelp's databases.
Should data scientists learn these data attack techniques? I believe so, in order to outsmart data hackers.
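On the defensive side, the counter-attack mentioned for query tag hijacking can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not how Forbes or any analytics vendor actually does it: instead of encrypting the tag, it attaches a keyed signature (HMAC-SHA256) that must match the public source tag. The names `SECRET_KEY`, `sign_url`, `attribution_is_trusted` and the `sig` parameter are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical secret known only to the publisher (an assumption for this sketch).
SECRET_KEY = b"replace-with-a-private-key"


def _signature(path: str, source: str) -> str:
    """HMAC-SHA256 over the page path and the public source tag."""
    message = f"{path}|{source}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(url: str, source: str) -> str:
    """Return the URL with a public source tag and a matching signature tag."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["source"] = [source]
    query["sig"] = [_signature(parts.path, source)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def attribution_is_trusted(url: str) -> bool:
    """True only if the signature tag matches the public source tag."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    source = query.get("source", [""])[0]
    sig = query.get("sig", [""])[0]
    expected = _signature(parts.path, source)
    # Constant-time comparison on bytes, so odd characters in sig cannot raise.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    shared = sign_url("https://example.com/article", "facebook")
    tampered = shared.replace("source=facebook", "source=mywebsite")
    print(attribution_is_trusted(shared))
    print(attribution_is_trusted(tampered))
```

With such a scheme, a link whose source tag was rewritten no longer carries a valid signature, so the publisher can discard or flag its attribution rather than credit the traffic to the new source. Note that a URL shortener hides the tampering from readers but not from the publisher, who sees the expanded URL.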