Comments - Data science versus statistics, to solve problems: case study - Data Science Central (2020-01-18T01:30:50Z)
Alec Zhixiao Lin, 2017-09-17T16:36:24.046Z, https://www.datasciencecentral.com/profile/AlecZhixiaoLin
<p>I have been a statistician for over 15 years. If asked to approach the problem, I would choose the sampling method suggested in the article. This is not to claim that data scientists know nothing about sampling. I'm not sure how survival analysis would apply here, but I agree with the author that statistical modeling is meant for knowledge discovery. A statistical model needs to be explainable in a white-box manner.</p>
Herbert L Roitblat, 2014-10-30T20:49:03.430Z, https://www.datasciencecentral.com/profile/HerbertLRoitblat
<p>Am I missing something? Why would you sort those lists? Why not use a hash table? Whatever the solution, you need a unique identifier for each unique visitor. After that, you can count the number of days each unique visitor visits (unique visitor days), count the number of unique visitors over a time period, or anything else.</p>
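<p>A minimal sketch of the hash-table approach described above. It assumes the log is a list of (visitor_id, day) pairs, where visitor_id is the unique identifier the comment says you need. The record layout and the function name count_visitors are hypothetical and do not come from the article.</p>

```python
from collections import defaultdict

def count_visitors(log):
    """Count unique visitors and unique visitor days from (visitor_id, day) pairs.

    Uses hash-based sets, so no sorting is needed; each record is processed once.
    """
    days_per_visitor = defaultdict(set)  # visitor_id -> set of distinct days visited
    for visitor_id, day in log:
        days_per_visitor[visitor_id].add(day)

    unique_visitors = len(days_per_visitor)
    # A "unique visitor day" is one distinct (visitor, day) combination.
    unique_visitor_days = sum(len(days) for days in days_per_visitor.values())
    return unique_visitors, unique_visitor_days, dict(days_per_visitor)

# Hypothetical log: u1 visits twice on the first day and once on the second.
log = [("u1", "2014-10-01"), ("u2", "2014-10-01"), ("u1", "2014-10-01"),
       ("u1", "2014-10-02"), ("u3", "2014-10-02")]
visitors, visitor_days, per_visitor = count_visitors(log)
print(visitors, visitor_days)  # 3 4
```

<p>Each record costs O(1) on average, versus O(n log n) for sorting. The trade-off is memory proportional to the number of distinct (visitor, day) pairs.</p>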
Vincent Granville, 2014-10-30T17:02:08.675Z, https://www.datasciencecentral.com/profile/VincentGranville
<p>Here's one of the comments I posted on the <a href="https://www.linkedin.com/pulse/article/20141027214106-2426729-data-science-versus-statistics-to-solve-problems-case-study?trk=prof-post" target="_blank">LinkedIn version of this article</a>:</p>
<p><em>I am an ex-statistician. It's not that my PhD in stats (computational statistics, 1993, image analysis) was revoked; I left of my own accord when I found that most people calling themselves statisticians worked in narrowly defined sectors (government, surveys, clinical trials and related fields, banks/insurance), using very specific methods that I no longer use (ANOVA, p-values, GLM, logistic regression, etc.).</em></p>
<p><em>Many of the statistical methods that I still use (confidence intervals, random number generation) are so different from what I learned in grad school that it's like comparing apples with oranges. For instance, I use model-free confidence intervals that are not underpinned by any statistical model.</em></p>
<p><em>When I look for a job, calling myself a statistician creates confusion among recruiters and hiring managers more often than not. Data scientist, machine learning guy or analyst resonates better, as it better fits the kind of work I'd be doing if hired. Ironically, I could now call myself a statistician again, as I will almost never look for a job again (being a successful, happy, stress-free entrepreneur). But statistics has evolved in one direction (at least as found in academia or at AMSTAT, in both research and training), and I have evolved in another, so it no longer makes sense for me to call myself a statistician.</em></p>
<p><em>Quality control, actuarial science, and operations research are more closely aligned with statistical science. I'm not sure why these professionals don't call themselves statisticians anymore, either.</em></p>
Richard Meyer, 2014-10-30T16:32:40.311Z, https://www.datasciencecentral.com/profile/RichardMeyer
<p>Question on Note 1 under Data Science Approach: Wouldn't it be preferable to use true simple random sampling rather than the proposed systematic sampling method? Thanks.</p>
<p></p>
<p>Rich M.</p>
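<p>To make the distinction in the question concrete: simple random sampling gives every subset of size n the same chance of selection, whereas systematic sampling takes every k-th record after a random start. The sketch below implements both with the standard library. The function names and the population of record IDs are hypothetical, and this is not a reproduction of the article's Note 1 procedure.</p>

```python
import random

def simple_random_sample(population, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Systematic sampling: every k-th record after a random start in [0, k)."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and len(population)")
    k = len(population) // n              # sampling interval
    start = rng.randrange(k)              # single random draw: the starting offset
    return [population[i] for i in range(start, len(population), k)][:n]

rng = random.Random(42)                   # seeded for reproducibility
records = list(range(1000))               # hypothetical record IDs, in log order
srs = simple_random_sample(records, 10, rng)
sys_sample = systematic_sample(records, 10, rng)
print(len(srs), len(sys_sample))          # 10 10
```

<p>Systematic sampling needs only one random draw and a single pass, which suits large ordered logs. The catch is that if the records have a periodic pattern whose period lines up with k (for example, a day-of-week effect in a time-ordered log), the sample can be biased. Simple random sampling avoids that risk.</p>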
Vincent Granville, 2014-10-29T17:28:50.305Z, https://www.datasciencecentral.com/profile/VincentGranville
<p>I did what was called "computational statistics" during my PhD years, back in 1990. But the term is no longer used (data science has replaced it, at least in the US).</p>
abbas Shojaee, 2014-10-29T17:21:50.170Z, https://www.datasciencecentral.com/profile/abbasShojaee
<p>I would like to suggest that "data science" can be considered a wide umbrella under which statistical analysis falls as one of many disciplines. I would prefer to distinguish them as statistical approaches versus computational approaches. This naming denotes that the first group is based on, and limited to, probability theory, while the others are not.<br/><br/>In my opinion, the disputes around statistics arise because, until the recent emergence of computational resources, there was no good substitute for statistics. As a result, it has been overused and applied in many scenarios it was not meant or built for. It is a kind of over-tooling of statistics, in the same way that some people try to use spreadsheets (e.g. Excel) for every data storage and processing scenario, where they are neither sufficient nor efficient.<br/><br/>I'd also like to add that the sampling in the article above could be done similarly under either computational or statistical approaches. But I agree that, for modeling purposes, computational approaches can be less complex, less limited, faster and more information-rich than statistical ones.</p>
Vincent Granville, 2014-10-28T17:25:25.907Z, https://www.datasciencecentral.com/profile/VincentGranville
<p><span>Some statisticians claim that data scientists know nothing about sampling or experimental design. I wanted to show here that indeed, real data scientists do know these techniques. They belong to statistics as well, although the statistical and data science versions can be quite different (model-free confidence intervals, no p-value in data science, vs. distribution-based confidence intervals and statistical testing with p-values, in statistical science).</span></p>
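<p>The percentile bootstrap is one widely used example of a model-free confidence interval. You resample the data with replacement many times, recompute the statistic on each resample, and read the interval off the empirical quantiles, with no distributional assumption. The sketch below illustrates only that general idea and is not necessarily the method the author uses. The function name, the 2,000 resamples, and the page-view data are assumptions for illustration.</p>

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    No distributional model is assumed: the sampling distribution of the
    statistic is approximated by recomputing it on resamples drawn with
    replacement from the observed data.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    k = round((1.0 - level) / 2.0 * n_resamples)   # resamples cut from each tail
    return estimates[k], estimates[n_resamples - 1 - k]

# Hypothetical data: daily page views for one site over 12 days.
views = [120, 135, 98, 143, 110, 127, 131, 104, 150, 118, 122, 139]
low, high = bootstrap_ci(views)
print(f"95% CI for the mean: [{low:.1f}, {high:.1f}]")
```

<p>By contrast, a distribution-based interval would assume, for example, that the sample mean is approximately normal and use its standard error. The bootstrap replaces that assumption with resampling, at the cost of extra computation.</p>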
<p><span>Also, someone mentioned using <a href="http://en.wikipedia.org/wiki/HyperLogLog" target="_blank">HyperLogLog</a><span> rather than sorting to solve this problem.</span></span></p>
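<p>HyperLogLog estimates the number of distinct items in one pass and in fixed memory, instead of sorting or storing every visitor ID. The method works in three steps:</p>
<ul>
<li>Hash each item and use the first p bits of the hash to pick one of m = 2^p registers.</li>
<li>In that register, keep the largest position of the leftmost 1-bit seen in the remaining bits.</li>
<li>Combine the registers with a harmonic mean to get the estimate, which has a typical relative error of about 1.04/sqrt(m).</li>
</ul>
<p>Below is a minimal, unoptimized sketch of the original algorithm with the small-range (linear counting) correction. The class name, the default p = 12, and the use of the first 8 bytes of SHA-1 as a 64-bit hash are illustrative choices, not details taken from the article or the comment.</p>

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative, not optimized)."""

    def __init__(self, p=12):
        if p < 7:
            raise ValueError("this sketch's alpha constant assumes m = 2**p >= 128")
        self.p = p                          # number of index bits
        self.m = 1 << p                     # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias correction for m >= 128

    def _hash64(self, item):
        # 64-bit hash from the first 8 bytes of SHA-1 (illustrative choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        idx = x >> (64 - self.p)                  # first p bits select a register
        w = x & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in w; 64 - p + 1 if w == 0.
        rank = (64 - self.p) - w.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate

# Usage: 100,000 visits from 50,000 hypothetical distinct visitor IDs.
hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"visitor-{i % 50_000}")
print(round(hll.count()))   # close to 50,000; exact value depends on the hash
```

<p>Repeated visits by the same ID hash to the same register and rank, so duplicates never inflate the count. Memory stays at m small registers regardless of traffic volume.</p>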