Password and hijacked email dataset for you to test your data science skills

Here's a password data set (20 MB) with 2 million entries, from dazzlepond.com. I discovered this Malaysian website while investigating new subscriber email addresses on Analyticbridge (to decide whether they were associated with spam or other malicious activity). The same website also claims to have the full list of 450,000 Yahoo email accounts that were recently hijacked. You can indeed download all these email addresses from their website, and possibly check whether the hijacked addresses share patterns that made them vulnerable.

Anyway, I am sharing the password data set so that you can test your data science skills. Try to answer the following questions:

  1. What are the most common patterns found in passwords?
  2. Based on these patterns, how can we build passwords that are robust yet easy to remember?
  3. Does this password data set look accurate, or do you think it is somewhat inaccurate or not representative of the password universe? If it is not representative, can we still draw valid conclusions from it, and how?
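As a possible starting point for question 1, each password can be mapped to a character-class mask (l = lowercase, u = uppercase, d = digit, s = anything else), and runs of the same class can be collapsed into a "shape" such as `ld` for `password123`; counting shapes then shows which structures dominate. The Python sketch below is a minimal illustration under stated assumptions, not a reference method: it assumes the file has one password per line (the actual layout of the download is not described here), the helper names are hypothetical, and `naive_bits` is only a crude brute-force estimate relevant to question 2, not a real strength meter.

```python
import math
import string
import sys
from collections import Counter
from itertools import groupby

# Assumed alphabet sizes for the naive estimate: 26 lowercase, 26 uppercase,
# 10 digits, and 33 for the 32 ASCII punctuation characters plus space.
# Non-ASCII characters are also lumped into the "s" class here.
POOL_SIZE = {"l": 26, "u": 26, "d": 10, "s": 33}


def mask(password: str) -> str:
    """Replace each character with its class: l, u, d, or s (anything else)."""
    classes = []
    for ch in password:
        if ch in string.ascii_lowercase:
            classes.append("l")
        elif ch in string.ascii_uppercase:
            classes.append("u")
        elif ch in string.digits:
            classes.append("d")
        else:
            classes.append("s")
    return "".join(classes)


def shape(password: str) -> str:
    """Collapse runs of the same class, e.g. 'password123' -> 'ld'."""
    return "".join(cls for cls, _ in groupby(mask(password)))


def top_shapes(passwords, n=10):
    """Return the n most frequent shapes together with their counts."""
    return Counter(shape(p) for p in passwords).most_common(n)


def naive_bits(password: str) -> float:
    """Brute-force search space in bits: length * log2(size of classes used).

    This is an upper bound: it ignores dictionary words and common shapes,
    which is exactly what a pattern-aware attacker would exploit.
    """
    if not password:
        return 0.0
    pool = sum(POOL_SIZE[c] for c in set(mask(password)))
    return len(password) * math.log2(pool)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Assumes one password per line; blank lines are skipped.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            passwords = [line.rstrip("\r\n") for line in f if line.strip()]
    else:
        # Tiny made-up sample, NOT taken from the data set.
        passwords = ["password123", "letmein", "Summer2013!", "qwerty", "iloveyou2"]
    for s, count in top_shapes(passwords, 5):
        print(f"{s:10} {count}")
```

You would run it as `python password_shapes.py passwords.txt` (both names hypothetical). Collapsing runs groups `password123` with `iloveyou2`; if length matters for your analysis, count the full masks instead. Comparing `naive_bits` with how often a shape actually occurs is one way to see why a long password can still be weak when it follows a very common shape.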

Another data set of interest: Official salary of 30,000 University of Washington employees

© 2013   Data Science Central