Throw them all out I say. Big Data is really just defined by one letter, D, Dimensionality.
(Or, at most, 2 letters: BD. Big Dimensionality.)
To use 3 (V)s, when 1 (D) is sufficient, is truly unforgivable.
Barty Crouch Jr. (as Alastor Moody): "But first, which of you can tell me how many Unforgivable Curses there are?"
Hermione: "Three, sir."
Barty Crouch Jr. (as Alastor Moody): "And they are so named?"
Hermione: "Because they are unforgivable. The use of any one of them will [...]"
Barty Crouch Jr. (as Alastor Moody): "Earn you a one-way ticket to Azkaban. Correct. The Ministry says you are too young to see what these curses do. I say different! You need to know what you're up against. You need to be prepared [...]"
In the world of Predictive Modeling and Big Data, there are two curses that really stand out.
1) The Curse of Dimensionality
Coined by that original professor of the Dark Arts, Richard E. Bellman, in his work on dynamic programming. It refers to the fact that as dimensionality increases, data becomes increasingly sparse. I frequently run into this problem when trying to build models from traditional data sources. The challenge is that most business processes and practices involve human discretion and judgment applied to a limited set of actions, so decision makers repeatedly "do what they've always done." In a designed experiment, you would cast a wider net to learn how things behave under different circumstances. Because of the homogeneity intrinsic to traditional, conservative decision making, large parts of these problem domains remain under-sampled and poorly understood, which can derail efforts to build robust statistical models.
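Bellman's point is easy to see numerically: the share of data that lands in any fixed region of feature space collapses as dimensions are added. A minimal Python sketch (the "region decision makers actually explore," covering half of each dimension's range, is my own illustrative assumption, not anything from Bellman):

```python
import random

random.seed(42)

def fraction_in_region(n_points, n_dims, side=0.5):
    """Fraction of uniform points in [0, 1]^d that land in the
    sub-cube [0, side]^d -- a stand-in for the slice of feature
    space that habitual, conservative decisions actually cover."""
    hits = sum(
        1 for _ in range(n_points)
        if all(random.random() < side for _ in range(n_dims))
    )
    return hits / n_points

for d in (1, 3, 10):
    # Coverage falls roughly as side**d: about half the data in one
    # dimension, about one point in a thousand by ten dimensions.
    print(d, fraction_in_region(100_000, d))
```

Even though the region spans half of every individual variable's range, by ten dimensions almost no observations fall inside it; everything outside that region is the under-sampled territory the models never get to see.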
2) The Curse of Big Data
As Vincent points out, big data leads to spurious correlations. In other words, the problem of seeing things that are not real, and missing things that are real. Ghosts. Hallucinations. Other magical things.
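This ghost-hunting effect is simple to reproduce: generate mutually independent noise series, and the strongest pairwise correlation among them grows with the number of series you compare. A minimal sketch, not Vincent's actual analysis, with series count and length chosen arbitrarily for illustration:

```python
import itertools
import math
import random

random.seed(7)

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def max_abs_correlation(n_series, length):
    """Strongest |correlation| found among pairs of series that are,
    by construction, pure independent noise."""
    series = [[random.random() for _ in range(length)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b))
               for a, b in itertools.combinations(series, 2))

# With only 20 observations per series, piling on more variables
# all but guarantees an impressive-looking "relationship" by chance.
for k in (5, 100):
    print(k, round(max_abs_correlation(k, 20), 2))
```

With 100 noise series there are nearly 5,000 pairs to search, so the best-looking correlation is strong purely by chance: a ghost that would evaporate on fresh data.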