
Password and hijacked email dataset for you to test your data science skills

Here's a password data set (20 MB) with 2 million entries, from dazzlepond.com. I discovered this Malaysian website while investigating new subscriber email addresses on Analyticbridge, to decide whether they were associated with spam or other malicious activity. The site also claims to have the full list of 450,000 Yahoo email accounts that were recently hijacked. You can indeed download all these email addresses from the site, and possibly check whether hijacked email addresses share patterns that make them vulnerable.

Anyway, I am sharing the password data set so that you can test your data science skills. Try to answer the following questions:

  1. What are the most common patterns found in passwords?
  2. Based on these patterns, how can we build robust yet easy-to-remember passwords?
  3. Does this password data set look OK, or do you think it is somewhat inaccurate or not representative of the password universe? If not, can we still draw valid conclusions from this data set, and how?
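
One possible starting point for question 1 is sketched below. It is only an illustration under stated assumptions, not the method intended by the post: each password is reduced to a "mask" of character classes (U = uppercase, L = lowercase, D = digit, S = anything else) with run lengths, and the masks are counted. The function names `char_class`, `mask` and `top_masks` and the sample passwords are made up for this example; the actual data set would need to be loaded separately, one password per entry.

```python
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map one character to a class code: D, L, U, or S (everything else)."""
    if ch.isdigit():
        return "D"
    if ch.islower():
        return "L"
    if ch.isupper():
        return "U"
    return "S"


def mask(password):
    """Collapse a password into runs of character classes, e.g. 'abc12' -> 'L3D2'."""
    classes = [char_class(c) for c in password]
    return "".join(f"{k}{len(list(g))}" for k, g in groupby(classes))


def top_masks(passwords, n=10):
    """Return the n most frequent masks, skipping empty entries."""
    return Counter(mask(p) for p in passwords if p).most_common(n)


# Hypothetical sample; replace with the entries of the real data set.
sample = ["password1", "letmein22", "Summer2012!", "123456", "qwerty"]
for pattern, count in top_masks(sample):
    print(pattern, count)
```

Comparing the mask frequencies against what you would expect from other password sources is also one way to approach question 3.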


Replies to This Discussion

Folks,

I was surfing the web while researching a data problem and happened to find this. Very few companies apply data science by starting with a true definition of the business problem and then solving it with data science techniques. This looks promising, do check:

http://bit.ly/datasciencetraining

Cheers

The link to the "Offical salary..." data did not work.

Here is my analysis. Any input is welcome. Thanks.

© 2017 Data Science Central
