I have been following the news on data science and wanted to share some of the titles here: The Sexiest Job of the 21st Century, Ten Trends on Data Science, How to Become a Data Scientist, What Is Data Science, and so on.

I wonder how and when data and science became sexy and everybody started to have a stake in data science. Is it because the young, energetic technology companies of Silicon Valley have named it that way? Do we need more data, or do we simply have more data nowadays? Does the data make science sexy, or is it the other way around?

With all of these questions in my mind, I was sitting in my library looking at my books, which had never looked so sexy before. There was the second edition of Schaum’s Statistics book, published in 1988 and purchased in 1993. I remember using it over and over to develop correlation algorithms to identify types of bread molds. I liked what I did in the late 90s as a researcher (back then, people developing algorithms were called researchers), but I don’t recall anyone envying my job or calling it sexy. My attention moved to another forgotten book, Oppenheim's Signals and Systems. That book always meant long days and nights spent understanding the nature of the Fourier transform, developing algorithms, and taking it a step further to the cepstrum domain (a logarithmic approach to separating the frequency and phase components of a signal or dataset).

I remember the days of Bezdek and fuzzy c-means clustering. My humble team developed algorithms to classify landmines in Angola. We spent a lot of time looking at the data, matrices and vectors before selecting a random sample group. Principal component analysis was another popular method, used to compress the data and reduce the computational cost of the algorithms. It was not too long ago, in 2010, that I wrote my dissertation on it.

Of all those algorithms and applications, my favorite is a very simple method called clipping. When I realized that outliers might carry information useful for developing forecasting algorithms, I was so impressed with the power of clipping. It is basically a fuzzy thresholding. You identify a threshold (there are a lot of ways of identifying thresholds: averaging the data, averaging a chunk of the data, etc.) and change a value to zero if it is smaller than the threshold; otherwise, you keep it as it is. It was so sexy to me that I had a higher resolution in my data and could recover more features. It made the algorithms slower and costlier, but who cares in this age of cloud and powerful computers?
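As a rough illustration of clipping as described here, the sketch below zeroes every value that falls under a threshold and leaves the rest untouched. It assumes the threshold is a plain arithmetic mean, taken either over the whole series or over fixed-size chunks. The function names and the chunk size are hypothetical choices made for this example, not taken from any library or from the original algorithms.

```python
from statistics import fmean


def clip_below(values, threshold=None):
    """Set values smaller than the threshold to zero; keep the others as-is.

    If no threshold is given, the mean of the data is used (an assumption
    made for this sketch; other threshold choices are possible).
    """
    if not values:
        return []
    if threshold is None:
        threshold = fmean(values)
    return [v if v >= threshold else 0.0 for v in values]


def clip_below_chunked(values, chunk_size):
    """Apply clip_below to consecutive chunks, each with its own mean threshold."""
    clipped = []
    for start in range(0, len(values), chunk_size):
        clipped.extend(clip_below(values[start:start + chunk_size]))
    return clipped


signal = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(clip_below(signal))             # one global threshold (mean of all values)
print(clip_below_chunked(signal, 3))  # one threshold per chunk of three values
```

With a single global mean, the small values at the start of the series are wiped out entirely, while a per-chunk mean keeps some of them. That is one way to see why the choice of threshold matters for how many features survive.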

Those were the days when MATLAB crashed over and over and had problems with averaging and filtering. We all needed to validate what we were doing. I wonder whether we still need to validate what we are doing with data and try to learn from the nature of the data. Or have we moved a step further, to assuming that all datasets are the same? Can we trust commercial products and press a button to puke out graphs and histograms? Is that why data science became so sexy?

All in all, the message I am trying to give is that data science is becoming a cluster of a lot of things and nothing. We forget about the data itself and focus on how many times we can click in a second using powerful packaged algorithms.

Comment by Sione Palu on March 3, 2015 at 8:41am

Data science is a multi-disciplinary field.

Comment by Meltem Ballan on February 25, 2015 at 7:51am

Thank you very much, Vincent. Data science has a lot of layers to it. In your analogy, we have the young generation representing upscale burger bars and also the ones like you representing the pubs, or myself representing the wine bar. There should be enough room in data science to be open to different disciplines. The way I see it, it is becoming a single kind of restaurant.

Comment by Vincent Granville on February 25, 2015 at 7:23am

There are some definitions that are very precise, like what is planet Earth. And some that are fuzzy, like data scientist or restaurant. Is a pub or a wine bar a restaurant? In my opinion, yes. Is the local Burger King a restaurant? In my opinion, no. For other people, the exact opposite is true. The same applies to many concepts, and job appellations in particular. A number of people now want to call me a statistician (though in the past they claimed I did not know anything about statistics). So be it, but sorry if I call myself a data scientist and can't think of anything remotely close to statistical science that I perform daily. In the end, it is what you contribute that matters, regardless of what you call it.
