“If you treat an individual as he is, he will stay as he is, but if you treat him as if he were what he ought to be and could be, he will become what he ought to be and could be.” - JOHANN WOLFGANG VON GOETHE

For the last few years I have been trying to get a handle on the field that encompasses analytics, big data, modeling, prediction, machine learning, algorithms, data mining techniques, rules, computational complexity, latency, data products, data engineering, statistical inference, R programming, data wrangling, data hacking, statistical modeling, supervised and unsupervised learning, data visualization, unstructured data and the many other subjects that make up the world of data science. Someone said that the term itself is an umbrella term. There are many books that deal in great detail with some of the topics mentioned above, written by their respective masters.

The moment I picked up the book, "The Data Science Handbook", I sensed that it would be different and would profoundly impact me. I must confess that my understanding of this beautiful art and science of drawing insights from data has gone up to a whole new level.

Data science was practiced long before the term was coined by DJ Patil and Jeff Hammerbacher, and some serious practitioners have been flag bearers of this art for a long time. But if you want to know how the discipline has evolved into what it is today, where it is headed, the impact it has had on almost every field, and who the people behind it are (their backgrounds, where they come from, and how they have taken the field to almost cult status, to the extent that the Obama administration appointed the first ever US chief data scientist), then this book is for you.

It features as many as 25 stalwarts, including some real rock stars who have rocked the boat, contributed significantly to the art of data science and single-handedly turned their organizations into great success stories. They are interviewed by some very intelligent data scientists who, ardently seeking to know the nuances of this art themselves, have done a good job of getting the best from these elite artistes.

Definition

The book offers many definitions of data science, such as the one from Josh Wills, and the conversations carry on from there. They trace the transitions the interviewees made from their rich academic backgrounds to the real world, where they hone their skills by coaxing information, insights and signal out of complex, unyielding and noisy datasets, and by creating data products that capture data from a captive audience and, in the process, build rich data sources for further analysis.

These conversations have deepened my understanding of this science and heightened my longing to continue pursuing the craft. Every artiste in this book has his or her own definition of data science, but broadly speaking they seem to agree that it lies at the convergence of math and statistics, computer science and domain expertise.

Target Audience

The ideal target audience for this book includes the following:

  1. An aspiring data scientist
  2. A practicing data scientist
  3. A leader of a team of data scientists
  4. An entrepreneur or business owner
  5. A data-curious citizen

As I have already said, the 25 artistes come from varying disciplines, and there could not be a better representation of different backgrounds than this list. You are treated to some deep-diving sessions, and one can only marvel at the brilliance of these artistes performing seamlessly. There is something for everyone, from the aspiring data scientist to the elite, the merely curious and the connoisseur.

Transitioning and upgrading

The book also covers the important topic of learning and upgrading the skills required to practice data science, including the transition from a purely academic background to the real world of business. Organizations such as Insight Data Science, founded by Jake Klamka, are specifically designed to help PhDs transition into industry. At the other end of the spectrum, aspiring data scientists who have enough domain expertise and are keen to pursue this art can take heart from the example of Clare Corthell, who embarked on a self-crafted journey to learn data science purely through online courses (MOOCs). In fact, she has put together her own data science curriculum, the Open Source Data Science Masters (OSDSM) program. These courses can help you bridge the gaps in your learning and in practicing the craft.

The OSDSM is a collection of open source resources that will help you acquire the skills necessary to be a competent entry-level data scientist. You can access the curriculum here.

You have to be adept at learning and upgrading on the job and on the fly. Kunal Punera, co-founder and CTO at Bento Labs, talks about this aspect when he says: "I spent two years at RelateIQ. I worked on building the data mining system from scratch, and by the time I left I had built most of the data products deployed in RelateIQ. And in the process I learnt a hell of a lot."

Jace Kohlmeier, the data scientist at Khan Academy who joined the company after listening to Salman Khan's TED talk, came from a finance background in high frequency trading. He adds to the discussion on learning new skills and climbing the data science learning curve on the job: "There is not a steady rate at which you learn new techniques and employ them; it definitely comes in waves. When I made the transition into this new domain of education and internet-generated data, I went through a period of needing to learn new modeling techniques. I wasn't familiar with probabilistic graphical models; that wasn't something that I had used in high frequency trading. Once I got past that initial learning curve, learning came very much in waves. There will be a very concrete and motivating need or goal."

Finally, Joe Blitzstein, the Harvard professor who teaches statistics, hits the nail on the head when he observes: "You have to be energetic and work really hard, but not get discouraged just because you don't know everything."

Metrics

The metrics for data science depend on the problem that your customer wants you to solve. As Diane Wu, data scientist at Palantir, says, some may want a streaming solution while others may want a static model based off the information from their databases. These can range from one to several dozen. She says that in her role success is very measurable: it is the accuracy or the precision and recall of your model. It also depends on finding the right questions to ask and making an impact with the answers.
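To make these metrics concrete, here is a minimal sketch of how accuracy, precision and recall could be computed for a binary classifier. It is not taken from the book; the function name and the sample labels are my own hypothetical choices, and 1 is assumed to mark the positive class.

```python
def accuracy_precision_recall(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a denominator is empty.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical ground truth and model predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy, precision, recall = accuracy_precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision asks how many of the predicted positives were right, while recall asks how many of the actual positives were found, so which one matters more depends on the customer's problem.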

Industrious

The book also touches upon the value of industry, hard work and discipline. You have to be prepared to put in long hours of hard work, not only to bridge the gaps in your understanding of data science and fill the missing links in your armory, but also on the problem in hand that you are trying to decipher. DJ Patil, who led a team of data scientists at LinkedIn and has since gone on to become the first chief data scientist of the US, talks about this very eloquently when he says: "One of the first things I tell new data scientists when they get into the organization is that they better be the first ones in the building and the last ones out... You're not putting in your time because of some mythical ten thousand hours thing (I don’t buy that argument at all, I think it’s false because it assumes linear serial learning rather than parallelized learning that accelerates). You put in your time because you can learn a lot more about disparate things that fit into the puzzle together. It's like a stew, it only becomes good if it’s been simmering for a long time."

Finding the relevant question and the art of Story Telling

The book also emphasizes narrating a problem and communicating the solution in the form of a story, without losing the feeling of passion and curiosity. This is as important as the usual skills. Hilary Mason, the New York based data scientist and founder of Fast Forward Labs, says: "For each data project you’re working on, you need to ask yourself these questions: what are you working on? How will I know when it’s done? What does it impact?" She has very valuable advice for aspiring data scientists: "Try to do a project that plays to your strengths. In general, I divide the work of a data scientist into three buckets: Stats, Code, and Storytelling/Visualization. Whichever one of those you’re best at, do a project that highlights that strength. Then, do a project using whichever one of those you’re worst at. This helps you grow, learn something new, and figure out what you need to learn next. Keep going from there."

John Foreman, the data scientist at MailChimp, makes an important point: "For me, a core skill that any data scientist should possess is the ability to communicate with the business. It’s dangerous to rely on others at a business to actively identify and throw problems at the data scientist while he or she passively waits to receive work."

Rock stars

As I have already mentioned in this post, the list of rock stars is a heady mix of who's who in the field:

DJ Patil, Hilary Mason, Pete Skomoroch, Riley Newman, Jonathan Goldman, Michael Hochster,
George Roumeliotis, Kevin Novak, Jace Kohlmeier, Chris Moody, Erich Owens, Luis Sanchez,
Eithon Cadag, Sean Gourley, Clare Corthell, Diane Wu, Joe Blitzstein, Josh Wills, Bradley Voytek,
Michelangelo D’Agostino, Mike Dewar, Kunal Punera, William Chen, John Foreman, Drew Conway

Authors

The authors themselves have interesting and varied backgrounds, which makes this book that much more special. They have certainly brought the best out of the data scientists for the benefit of everyone in this field.

  • Carl Shan - He is a Data Science for Social Good Fellow in Chicago, where he works with President Obama’s former Chief Scientist on applying machine learning and data science to pressing policy issues. He holds an honors degree in Statistics from UC Berkeley and has written extensively on his experiences in applying machine learning to social issues.
  • Henry Wang - He is an investment analyst with New Zealand’s sovereign wealth fund, where he focuses on private investments in alternative energy technologies. He holds a Bachelor's in Statistics from UC Berkeley.
  • William Chen - He is a data scientist at Quora, where he helps grow and share the world’s knowledge. He is also an avid writer on Quora, where he answers questions on data science, statistics, machine learning, probability, and more. Check out his recent projects (like The Only Probability Cheatsheet You’ll Ever Need) on his website. He holds a Bachelor's in Statistics and a Master's in Applied Mathematics from Harvard.
  • Max Song - He is a data scientist currently working on secret projects in Paris. Previously, he was the youngest data scientist at DARPA-backed startup Ayasdi, where he used topological data analysis and machine learning to build predictive models. He wrote a popular post on Medium about his journey to become a data scientist, and enjoys the craft of writing.

Conclusion

The diversity of these artistes' backgrounds, whether academic, career or domain, is what makes the book so interesting, yet what ties them all together is curiosity and the hunger to satisfy it. These artistes make you think and contemplate questions such as:

  • Why is data science so important in today’s world and economy?
  • How does one master the triple disciplines of programming, statistics and domain expertise to become an effective data scientist?
  • How do you transition from academia, or other fields, to a position in data science?
  • What separates the work of a data scientist from that of a statistician or a software engineer? How can they work together?
  • What should you look for when evaluating data science roles at companies?
  • What does it take to build an effective data science team?
  • What mindsets, techniques and skills distinguish a great data scientist from a merely good one?
  • What lies in the future for data science?

Apart from the rock stars above, you can also follow these grand masters, who are not featured in the book but work equally hard and untiringly for the growth of this industry. You can just Google them and follow them through Twitter or their websites.

Vincent Granville, Gregory Piatetsky, Kirk Borne, Eric Colson, Marck Vaisman, Milind Bhandarkar, Monica Rogati, Simon Zhang, Dean Abbott, Nate Silver

You will never miss out on their rays of insight and the sprinkle of stardust on you.

If your hunger is still not satiated, you can follow the list of top 50/100 influencers and brands in the industry, which will surely get you going.

Finally, I leave you with another gem from the book, this time from Sean Gourley, co-founder and CTO at Quid: "I think data science is really going to become more of a product design process; actually an algorithm design process. Algorithms take information and direct us; whether it’s the information we read, the music we listen to, the places we drink coffee, the friends we meet, or the updates in our lives."

Comment by Patti Tillotson on May 27, 2015 at 4:27am

I like that this is written by real-world practitioners.

 

Comment by William Vorhies on May 21, 2015 at 7:56am

Sounds like a great book.
