Of course, one way to win is to play by the rules and submit the best answer. But since most of these challenges are about predicting something, what about a candidate who creates 5 accounts with 5 different IP addresses and submits 5 different predictions to the same contest? Wouldn't he increase his odds of winning from 1 out of 10 to 1 out of 2? This could create professional cheaters who participate in many contests and regularly win. Since Kaggle claims to have 100,000 data scientists (does that include you?), it is possible that many accounts are duplicates.
What do you think? Are there any barriers in place to prevent this fraud from happening?
Disclaimer: I have never participated in a Kaggle competition. I am not one of the 100,000 Kaggle data scientists.
One issue to consider is that the score differences between the best submissions are often very small. This was the case in many competitions I looked at. Thus, there are often a lot of good solutions.
Evaluating algorithms on a holdout sample is a sound approach, but the holdout sample is just another sample from the underlying data distribution. If you found a good solution to the problem, you still need a bit of luck to actually win the competition; i.e., your algorithm has to perform very well on the chosen holdout sample. I guess this is where duplicate accounts can increase your chances, given that you have a good solution.
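To make the "bit of luck" argument concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration, not a description of how Kaggle scores anything: all entries are given the same true quality, each entry's holdout score is that quality plus independent Gaussian noise, and the function name `win_probability` and its parameters are made up for this example. In reality, several submissions built by one person would likely be correlated on the same holdout sample, so the benefit of extra accounts is probably smaller than this suggests.

```python
import random

def win_probability(n_rivals, n_accounts, noise_sd=0.01, trials=20000, seed=0):
    """Estimate how often one participant holds the top holdout score.

    Hypothetical model: every entry has the same true quality (0.0) and
    differs only by independent holdout noise with std dev `noise_sd`.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Best score among the participant's own accounts.
        mine = max(rng.gauss(0.0, noise_sd) for _ in range(n_accounts))
        # Best score among all honest rivals (one entry each).
        rivals = max(rng.gauss(0.0, noise_sd) for _ in range(n_rivals))
        if mine > rivals:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    print("1 account vs 9 rivals: ", win_probability(9, 1))
    print("5 accounts vs 9 rivals:", win_probability(9, 5))
```

Under these assumptions every entry is equally likely to come out on top, so one account among 10 entries wins about 1 time in 10, and 5 accounts among 14 entries win about 5 times in 14. Extra accounts help, but only if the underlying solution is already competitive.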
It would be quite difficult to prevent this kind of behavior without compromising the participants' privacy. It's the same as with MOOCs, where, if you want, you can sign up using multiple accounts and get a perfect score in at least one of them. Coursera deals with this issue by offering the option of using biometric data to confirm your identity (naturally, you need to pay for this service). Perhaps Kaggle could do something similar for all the competitions that have a monetary reward.
BTW, Kaggle attracts lots of people, data scientists and otherwise. It would be naive to assume that DS constitute the majority of their players.
I was surfing the web for my research on a data problem and happened to find this ... Very few companies apply data science by starting with a true definition of the business problem and then solving it using data science techniques. This looks promising ... do check.
It will only increase the chance of winning the competition. Kaggle isn't just a race where one person's loss is another's win; it's a community for taking up challenges and solving them. All the measurement factors, such as accuracy level and prediction level, are useful!
Plus, the competition is not usually a prediction of the result of rolling a die, which has 6 possible outcomes; the test set has thousands of test cases and each one has multiple possible outcomes :)