Password and hijacked email dataset for you to test your data science skills

Here's a password data set (20 MB) with 2 million entries, from dazzlepond.com. I discovered this Malaysian website while investigating new subscriber email addresses on Analyticbridge (to decide whether they were associated with spam or other malicious activity). The website also claims to have the full list of 450,000 Yahoo email accounts that were recently hijacked. You can indeed download all these email addresses from the site, and possibly check whether hijacked email addresses share patterns that make them vulnerable.

Anyway, I am sharing the password data set so that you can test your data science skills. Try to answer the following questions:

  1. What are the most common patterns found in passwords?
  2. Based on these patterns, how can we build robust yet easy-to-remember passwords?
  3. Does this password data set look OK, or do you think it is somewhat inaccurate or not representative of the password universe? If not, can we still draw valid conclusions from this data set, and how?
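
One possible way to start on question 1 is to reduce each password to a "shape" (runs of lowercase letters, uppercase letters, digits, and other characters) and count how often each shape occurs. The sketch below is only an illustration, not the method used to build or analyze this data set; the function names `shape` and `top_patterns` and the tiny sample list are made up for the example, and it assumes the passwords are available as a list of plain strings.

```python
from collections import Counter
from itertools import groupby


def char_class(ch: str) -> str:
    """Return a one-letter class: l=lowercase, u=uppercase, d=digit, s=anything else."""
    if ch.islower():
        return "l"
    if ch.isupper():
        return "u"
    if ch.isdigit():
        return "d"
    return "s"


def shape(password: str) -> str:
    """Summarize a password as runs of character classes, e.g. 'abc12' -> 'l3d2'."""
    classes = (char_class(ch) for ch in password)
    return "".join(f"{cls}{len(list(run))}" for cls, run in groupby(classes))


def top_patterns(passwords, n=10):
    """Count shapes over all passwords and return the n most frequent (shape, count) pairs."""
    return Counter(shape(p) for p in passwords).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample; replace with the lines read from the real data file.
    sample = ["password1", "monkey12", "abc123", "qwerty", "letmein1", "Dragon99"]
    for pattern, count in top_patterns(sample):
        print(pattern, count)
```

On the full data set, a handful of shapes (for instance, a lowercase word followed by one or two digits) may well dominate. Comparing the shape distribution against other password lists is also one way to approach question 3.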

Replies to This Discussion

The link to the "Official salary..." data did not work.

Here is my analysis. Any input is welcome. Thanks.

Hi,

I couldn't find the "450,000 Yahoo email accounts that were recently hijacked" database, as the link wouldn't open in my browser. If anyone has this database, please share it.
