Featured Discussions - Data Science Central2019-04-22T14:12:27Zhttps://www.datasciencecentral.com/forum/topic/list?feed=yes&xn_auth=no&featured=1What Negative Kurtosis impliestag:www.datasciencecentral.com,2019-04-15:6448529:Topic:8174962019-04-15T12:08:46.664ZNiana Thomashttps://www.datasciencecentral.com/profile/NianaThomas
<p><span>Hi, I am new to data science. I have a dataset, and when I compute the kurtosis of the target variable I get -0.94. My understanding is that high kurtosis implies heavy tails and low kurtosis implies light tails. What can I conclude from this negative kurtosis?</span></p>
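<p>One way to read a value like -0.94: raw kurtosis can never fall below 1, so a negative number is almost certainly <em>excess</em> kurtosis (kurtosis minus 3, which is 0 for a normal distribution). The sketch below, using only NumPy and simulated samples (the sample sizes and seed are arbitrary assumptions), compares a few reference distributions. A uniform distribution has a theoretical excess kurtosis of -1.2, so -0.94 would suggest tails lighter than a normal distribution, possibly a flat or bimodal shape.</p>

```python
import numpy as np

def excess_kurtosis(x):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment (variance)
    m4 = np.mean(d ** 4)  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Simulated reference samples (illustrative only, not the poster's data).
rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=100_000),    # theoretical excess kurtosis 0
    "uniform": rng.uniform(size=100_000),  # theoretical excess kurtosis -1.2
    "laplace": rng.laplace(size=100_000),  # theoretical excess kurtosis 3
}

for name, x in samples.items():
    print(f"{name:8s} excess kurtosis = {excess_kurtosis(x):+.2f}")
```

<p>Plotting a histogram of the target variable next to these numbers is usually the quickest way to see whether the negative value comes from a flat shape or from two separate modes.</p>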
 How did your company choose a BI tool?tag:www.datasciencecentral.com,2019-04-12:6448529:Topic:8171502019-04-12T17:37:41.938ZLee Schlesingerhttps://www.datasciencecentral.com/profile/LeeSchlesinger
<p><span>I’m curious to learn how companies settle on a BI tool. What criteria did you consider? How did you evaluate different tools? Did you choose the BI tool before or after other pieces of your data analytics stack (data warehouse, ETL pipeline)?</span></p>
 Restrictions for split-half testing ?tag:www.datasciencecentral.com,2019-04-06:6448529:Topic:8158552019-04-06T18:03:17.613ZHeiko Hahnhttps://www.datasciencecentral.com/profile/HeikoHahn
<p>Hello everyone,</p>
<p>I am a psychology student about to write my bachelor thesis, for which I need to test some questionnaires for split-half reliability (plus Cronbach's Alpha and retest reliability). In an article on the <em>Statistics How To</em> website I found an interesting passage describing some limitations of split-half testing.</p>
<p><a href="https://www.statisticshowto.datasciencecentral.com/split-half-reliability/" target="_blank" rel="noopener">https://www.statisticshowto.datasciencecentral.com/split-half-reliability/</a></p>
<p>One criterion is that there should be a "large set of questions". The other criterion is that there should be only one construct and no subscales.</p>
<p>I hadn't heard about these restrictions before. Researching these aspects, I couldn't find any literature from which I could cite this information. Is this statistical "common sense" or is it taken from experience? Has anyone heard of these limitations before? Does anyone know some literature to use as a citation source?</p>
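<p>This is not a citation, but a small simulation can at least show why the "large set of questions" criterion is plausible. The sketch below assumes a single latent construct, items that are that construct plus independent noise, an odd/even split, and the Spearman-Brown correction; the respondent count, noise level, and seed are arbitrary assumptions.</p>

```python
import numpy as np

def split_half_reliability(items):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    items = np.asarray(items, dtype=float)
    half_a = items[:, 0::2].sum(axis=1)  # score on items 1, 3, 5, ...
    half_b = items[:, 1::2].sum(axis=1)  # score on items 2, 4, 6, ...
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)  # step up from half-length to full-length test

def cronbach_alpha(items):
    """Cronbach's alpha from a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def simulate_items(n_respondents, n_items, rng):
    """One construct per respondent plus independent unit-variance item noise."""
    trait = rng.normal(size=(n_respondents, 1))
    noise = rng.normal(size=(n_respondents, n_items))
    return trait + noise

rng = np.random.default_rng(42)
short_test = simulate_items(500, 4, rng)
long_test = simulate_items(500, 20, rng)

for label, data in [("4 items", short_test), ("20 items", long_test)]:
    print(label,
          "split-half:", round(split_half_reliability(data), 3),
          "alpha:", round(cronbach_alpha(data), 3))
```

<p>Under these assumptions the longer questionnaire comes out clearly more reliable. The same setup could be extended with a second latent trait to see how subscales distort the split-half estimate.</p>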
<p></p>
<p>Thanks in advance</p>
<p>Heiko</p> FAMD - Factor Analysis of Mixed Datatag:www.datasciencecentral.com,2019-04-04:6448529:Topic:8151772019-04-04T07:13:57.748ZUli Wallerhttps://www.datasciencecentral.com/profile/UliWaller
<p>Hi there,</p>
<p>I'm new to data science and have done some research to find a way to analyze combined quantitative (cost, transactions, notional, #applications, ...) and qualitative (region, country, month, organizational unit, ...) variables in Python.</p>
<p>The only library I could find is called Prince but with very limited documentation.</p>
<p></p>
<p>Problem Statement:</p>
<p>How do I run a Factor Analysis of Mixed Data (FAMD) in Python to identify patterns and correlations between the quantitative and qualitative variables?</p>
<p></p>
<p>Please let me know if there is an easier way than using Prince, e.g. preprocessing the qualitative variables into numerical data with the scikit-learn LabelEncoder, or whether I'm going in the wrong direction.</p>
<p></p>
<p>Thanks in advance</p>
<p></p>
<p>Uli</p>
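<p>As a rough illustration of what FAMD does under the hood, here is a hand-rolled sketch using only pandas and NumPy. It is not Prince's API: the function name <code>famd_coordinates</code> and the toy columns are hypothetical. Numeric columns are standardized, categorical columns are one-hot encoded and each indicator is divided by the square root of its category proportion and then centered, and a PCA (via SVD) is run on the combined matrix. Note that a label encoder would instead assign arbitrary integers to categories, which imposes an ordering that the data does not have.</p>

```python
import numpy as np
import pandas as pd

def famd_coordinates(df, n_components=2):
    """Sketch of FAMD: scaled numeric + weighted one-hot columns, then PCA via SVD."""
    num = df.select_dtypes(include="number")
    cat = df.select_dtypes(exclude="number")

    # Quantitative variables: z-scores.
    z = (num - num.mean()) / num.std(ddof=0)

    # Qualitative variables: indicators scaled by 1/sqrt(category proportion),
    # then centered, so rare categories are not swamped by common ones.
    dummies = pd.get_dummies(cat).astype(float)
    scaled = dummies / np.sqrt(dummies.mean())
    scaled = scaled - scaled.mean()

    X = np.hstack([z.to_numpy(), scaled.to_numpy()])
    U, s, _ = np.linalg.svd(X, full_matrices=False)

    coords = U[:, :n_components] * s[:n_components]  # row (observation) coordinates
    explained = s ** 2 / np.sum(s ** 2)              # share of inertia per component
    return coords, explained[:n_components]

# Toy data with made-up values, mirroring the variable types in the question.
df = pd.DataFrame({
    "cost": [10.0, 12.5, 9.0, 30.0, 28.0, 31.5],
    "transactions": [100, 120, 90, 400, 380, 410],
    "region": ["EU", "EU", "EU", "US", "US", "US"],
    "unit": ["A", "B", "A", "B", "A", "B"],
})

coords, explained = famd_coordinates(df)
print(pd.DataFrame(coords, columns=["dim1", "dim2"]))
print("explained inertia:", explained.round(3))
```

<p>Observations that land close together in the first two dimensions are similar across both the numeric and the categorical variables, which is the kind of pattern the question asks about.</p>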
<p></p>
<p></p> Minimum number of observations per time slice?tag:www.datasciencecentral.com,2019-04-03:6448529:Topic:8146992019-04-03T00:03:03.560Zdavidgibsonhttps://www.datasciencecentral.com/profile/davidgibson
<p>Is there a minimum number of observations per time slice needed? What happens to the method as the number approaches 1?</p>
 What was your most difficult job interview question?tag:www.datasciencecentral.com,2019-04-02:6448529:Topic:8144482019-04-02T01:28:36.513ZVincent Granvillehttps://www.datasciencecentral.com/profile/VincentGranville
<p>Whether as a job applicant for a data science role, or as a hiring manager. Was it a technical question (mathematics, statistics, or a coding problem)? Was it a riddle? Or a general question? Were you able to answer it successfully?</p>
<p>We have published various lists of job interview questions. You can find them <a href="https://www.datasciencecentral.com/page/search?q=interview+question" target="_blank" rel="noopener">here</a>. Specific lists include:</p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/46-questions-on-sql-to-test-a-data-science-professional-skilltest" target="_blank" rel="noopener">SQL</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/70-mongodb-interview-questions-and-answers" target="_blank" rel="noopener">MongoDB</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/r-programming-job-interview-questions-and-answers" target="_blank" rel="noopener">R</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/top-25-hadoop-interview-questions-prepared-by-experts" target="_blank" rel="noopener">Hadoop</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/answers-to-dozens-of-data-science-job-interview-questions" target="_blank" rel="noopener">Data Science</a> (with answers)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/25-questions-to-detect-fake-data-scientists" target="_blank" rel="noopener">Data Science</a> (50 questions to test true data science knowledge)</li>
</ul> How do I import a SAS layout file correctly in DSS?tag:www.datasciencecentral.com,2019-03-14:6448529:Topic:8094902019-03-14T06:10:15.849ZNiana Thomashttps://www.datasciencecentral.com/profile/NianaThomas
<p>I'm trying to import a SAS layout file into DSS in such a way that I can use it as a lookup table for column names/descriptions. Is it possible to do this within the Data Science Studio?</p>
<p>Here is a truncated example of the layout contents:</p>
<p></p>
<p>DATE: 07/02/2014 POS RECORD LAYOUT PAGE: 1<br/> Hospital, CATEGORY = "01" (SEE POSITIONS 3-4)</p>
<p>SHORT DESCRIPTION LEN START END TYPE</p>
<p>Provider Category Subtype Code 2 1 2 VARCHAR2<br/> Description: Identifies the subtype of the provider, within the<br/> primary category. Used in reporting to show the<br/> breakdown of provider categories, mainly for hospitals<br/> and SNFs.<br/> SAS Name: PRVDR_CTGRY_SBTYP_CD<br/> COBOL Name: PRVDR-CTGRY-SBTYP-CD<br/> VALUES: 01=Short Term<br/> 02=Long Term<br/> 03=Religious Non-Medical Health Care Institutions<br/> 04=Psychiatric</p> IBM Watson and Jupyter Notebooks - Worthwhile?tag:www.datasciencecentral.com,2019-03-13:6448529:Topic:8095462019-03-13T21:31:17.437ZLarry Fieldhttps://www.datasciencecentral.com/profile/LarryField
<p>I finished the Coursera Data Science Professional cert, which is an IBM course that uses the Jupyter Notebooks and Python in IBM Watson. I used Watson enough to run out of free time and had to sign up for the single user account to finish up. Well... is this such a good idea?</p>
<p>My account billing estimate is now at about $132 after about 2-3 weeks. That doesn't panic me at the moment, as they have this $200 credit for new upgrades. I have had a problem with the relationship between my macOS and the Jupyter Notebook, which the IBM people don't seem very interested in answering. Someone on Stack Overflow directed me to a Jupyter site in SO, so I logged the problem there, then noticed that a couple hundred other problem reports had been languishing there for up to 2 years. Hmm. This is a bad sign, I can tell.</p>
<p>IBM Watson/Jupyter notebooks seem to me to only provide value when groups are collaborating and the IBM Cloud has value when an Enterprise needs a host for their application. Otherwise, they don't seem to add value for an individual who is just trying to learn the trade and push out a few studies.</p>
<p>Am I missing something or is that about it?</p>
<p></p>
<p>Also, can anyone provide a human scale interpretation of the IBM Watson billing structure? Billowing clouds of vCPU hours and instantiations charges. I think that, given the charges that I've accumulated so far, and the $200 credit for the upgrade, it all means $200/month.</p> Career Prospects for Data Science with a Graduate Engineering Degree and Apprenticeship?tag:www.datasciencecentral.com,2019-03-12:6448529:Topic:8092602019-03-12T17:02:08.467ZBryan Atkinsonhttps://www.datasciencecentral.com/profile/BryanAtkinson
<p>First, let me say that I understand so much can change in just a few years that I cannot hold any plan down to certainty. I am currently accepted to a 6-month mentorship program with a Data Scientist as my mentor, and I have never held a job before. I am also trying to get accepted into a local Junior Software Engineering apprenticeship through LaunchCode alongside my mentorship. After this, my goal is to get accepted to a 2-year Corporate Analytics Professional Development Program or get hired as an Associate Data Scientist with a Bachelor's Degree. That is my current plan set in stone.</p>
<p>However, I have been told repeatedly that the type of degree you get can serve as a barrier to entry for employment.</p>
<p>If you do not have a degree in the specific field and related work experience, people are likely to question your usefulness. Thus, I have a specific scenario that might become my future:</p>
<p>I have a Bachelor's Degree in Mathematics and a Minor in Computer Science.</p>
<p>Suppose over the next three years I obtain one to two years of full-time experience as an entry-level Associate Data Scientist. Next, I decide to enroll in an Aerospace Engineering Master's Degree, and I simultaneously work three years part time in Data Analysis and Software Development as I complete the degree.</p>
<p>The Master's Degree I enter has a research thesis option, and I decide to specialize in creating, testing and simulating Embedded AI Control Systems for autonomously flying unmanned aircraft. I complete my research thesis, do an internship with an Aerospace Company, and try to find time for a Data Science internship.</p>
<p>I will always take at least six courses in Random Stochastic Processes, Machine Learning Theory, Deep / Reinforcement Learning, Bayesian Statistics, High Dimensional Statistics, Generalized Linear Models, Data Mining, Computer Science, and Data Wrangling no matter the degree.</p>
<p>After graduating, I would try to find employment designing embedded AI systems in aircraft. If I really enjoy the work, there's a good chance this becomes a PhD in something like embedded AI flight controls for evasive maneuvers and I do another two internships while trying to put a small Data Science spin on my research.</p>
<p>How would you evaluate someone as above with just a Master's in Aerospace Engineering once they reached out to make an employment pitch after applying for a Data Science position at your company with a reference?</p>
<p>What if this person made a pitch and applied with a reference to the same position at your company while having an Aerospace PhD and three years experience in Aerospace Industry?</p>
<p>What could excite you about a person studying Aerospace, and what might they lack vs. other candidates?</p>
<p>I really like the idea of Aerospace and want to seriously try it out. However, I'm afraid of the fact that Aerospace Engineering only has an expected 5% to 9% job growth with 3000 new jobs over the next 8 years. I want to be versatile.</p>
<p></p> Who is using regression models with interactions?tag:www.datasciencecentral.com,2019-03-12:6448529:Topic:8089872019-03-12T14:15:20.967ZCapri Granvillehttps://www.datasciencecentral.com/profile/CapriGranville733
<p>I believe biostatisticians use it, especially with small data sets, in the context of clinical trials, especially when using dummy variables (e.g. 0/1 for gender.) I am wondering how decision trees and other models compare with interaction regression. I would think the interaction regression model suffers from the same issues as <a href="https://www.datasciencecentral.com/profiles/blogs/deep-dive-into-polynomial-regression-and-overfitting" target="_blank" rel="noopener">polynomial regression</a>. </p>
<p>For those not familiar with the concept, below is an introduction to interaction regression, from <a href="https://stattrek.com/multiple-regression/interaction.aspx" target="_blank" rel="noopener">Stattrek</a>. </p>
<p>In regression, an interaction effect exists when the effect of an independent variable on a dependent variable changes, depending on the value(s) of one or more other independent variables.</p>
<h2>Interaction Effects in Equations</h2>
<p>In a regression equation, an interaction effect is represented as the product of two or more independent variables. For example, here is a typical regression equation<span> </span><i>without</i><span> </span>an interaction:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub></p>
<p>where ŷ is the predicted value of a dependent variable, X<sub>1</sub> and X<sub>2</sub> are independent variables, and b<sub>0</sub>, b<sub>1</sub>, and b<sub>2</sub> are regression coefficients.</p>
<p>And here is the same regression equation <i>with</i> an interaction:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> + b<sub>3</sub>X<sub>1</sub>X<sub>2</sub></p>
<p>Here, b<sub>3</sub> is a regression coefficient, and X<sub>1</sub>X<sub>2</sub> is the interaction. The interaction between X<sub>1</sub> and X<sub>2</sub> is called a two-way interaction, because it is the interaction between two independent variables. Higher-order interactions are possible, as illustrated by the three-way interaction in the following equation:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> + b<sub>3</sub>X<sub>3</sub> + b<sub>4</sub>X<sub>1</sub>X<sub>2</sub> + b<sub>5</sub>X<sub>1</sub>X<sub>3</sub> + b<sub>6</sub>X<sub>2</sub>X<sub>3</sub> + b<sub>7</sub>X<sub>1</sub>X<sub>2</sub>X<sub>3</sub></p>
<p>Analysts usually steer clear of higher-order interactions, like X<sub>1</sub>X<sub>2</sub>X<sub>3</sub>, since they can be hard to interpret.</p>
<p><strong>Illustration</strong></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1387762547?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1387762547?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span>For males, drug dosage has a minimal effect on anxiety; but for females, the effect is dramatic. The effect of drug dose cannot be understood without accounting for the gender of the person receiving the medication.</span></p>
<p>Typically, when a regression equation includes an interaction term, the first question you ask is: Does the interaction term contribute in a meaningful way to the explanatory power of the equation? You can answer that question by:</p>
<ul>
<li>Assessing the statistical significance of the interaction term.</li>
<li>Comparing the coefficient of determination with and without the interaction term.</li>
</ul>
<p>If the interaction term is statistically significant, the interaction term is probably important. And if the coefficient of determination is also much bigger with the interaction term, it is definitely important. If neither of these outcomes is observed, the interaction term can be removed from the regression equation.</p>
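<p>The comparison of the coefficient of determination with and without the interaction term can be sketched with plain NumPy least squares. The data below are simulated with made-up coefficients loosely inspired by the dose and gender illustration (they are not taken from the figure), and the helper name <code>fit_ols</code> is hypothetical. With a 0/1 dummy, the fitted slope of dose is b<sub>1</sub> for one group and b<sub>1</sub> + b<sub>3</sub> for the other.</p>

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Simulated data (assumed coefficients): dose barely matters when female = 0,
# but lowers anxiety sharply when female = 1.
rng = np.random.default_rng(1)
n = 400
dose = rng.uniform(0, 10, n)
female = rng.integers(0, 2, n).astype(float)
anxiety = 50 - 0.2 * dose - 2.0 * dose * female + rng.normal(0, 2, n)

# Model without interaction: y = b0 + b1*dose + b2*female
coef_main, r2_main = fit_ols(np.column_stack([dose, female]), anxiety)

# Model with interaction: y = b0 + b1*dose + b2*female + b3*dose*female
coef_int, r2_int = fit_ols(np.column_stack([dose, female, dose * female]), anxiety)

slope_male = coef_int[1]                  # b1
slope_female = coef_int[1] + coef_int[3]  # b1 + b3

print(f"R^2 without interaction: {r2_main:.3f}")
print(f"R^2 with interaction:    {r2_int:.3f}")
print(f"dose slope (female = 0): {slope_male:.2f}")
print(f"dose slope (female = 1): {slope_female:.2f}")
```

<p>This covers only the R<sup>2</sup> comparison. Testing the significance of b<sub>3</sub> would additionally need its standard error, which a statistics package reports directly.</p>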