<p><span>Hi, I am new to data science. I have a dataset, and when I compute the kurtosis of the target variable I get -0.94. My understanding is that high kurtosis implies heavy tails and low kurtosis implies light tails. What can I conclude from this negative kurtosis?</span></p>
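<p>A minimal sketch that may help with the interpretation, assuming the -0.94 is <em>excess</em> kurtosis (the Fisher definition, where a normal distribution scores 0; this is the default in tools such as pandas and SciPy). Under that assumption a negative value indicates lighter tails than a normal distribution. The function name and the simulated samples below are illustrative only.</p>

```python
import random


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_sample = [rng.uniform(0, 1) for _ in range(50_000)]
normal_sample = [rng.gauss(0, 1) for _ in range(50_000)]

# A uniform distribution has no tails beyond its bounds; its theoretical
# excess kurtosis is -1.2. A normal distribution has excess kurtosis 0.
uniform_kurt = excess_kurtosis(uniform_sample)
normal_kurt = excess_kurtosis(normal_sample)
print(f"uniform: {uniform_kurt:.2f}, normal: {normal_kurt:.2f}")
```

<p>A value of -0.94 would sit between these two cases, closer to the uniform one: flatter, with fewer extreme values than a normal distribution would produce. A histogram of the target variable is a useful companion check, since bimodal data can also give negative excess kurtosis.</p>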
<p><span>I’m curious to learn how companies settle on a BI tool. What criteria did you consider? How did you evaluate different tools? Did you choose the BI tool before or after other pieces of your data analytics stack (data warehouse, ETL pipeline)?</span></p>
<p>Hello everyone,</p>
<p>I am a psychology student about to write my bachelor thesis, for which I need to test some questionnaires for split-half reliability (plus Cronbach's alpha and retest reliability). In an article on the <em>Statistics How To</em> website I found an interesting passage describing some limitations of split-half testing.…</p>
<p><a href="https://www.statisticshowto.datasciencecentral.com/split-half-reliability/" target="_blank" rel="noopener">https://www.statisticshowto.datasciencecentral.com/split-half-reliability/</a></p>
<p>One criterion is that there should be a "large set of questions". The other criterion is that there should be only one construct and no subscales.</p>
<p>I hadn't heard about these restrictions before. While researching them, I couldn't find any literature to cite for this information. Is this statistical "common sense", or does it come from experience? Has anyone heard of these limitations before? Does anyone know of literature I could cite as a source?</p>
<p>Thanks in advance</p>
<p>Heiko</p>
<p><strong>FAMD - Factor Analysis of Mixed Data</strong> (<a href="https://www.datasciencecentral.com/profile/UliWaller" target="_blank" rel="noopener">Uli Waller</a>, 2019-04-04)</p>
<p>Hi there,</p>
<p>I'm new to data science and have done some research to find a way to analyze combined quantitative (cost, transactions, notional, #applications, ...) and qualitative (region, country, month, organizational unit, ...) variables in Python.</p>
<p>The only library I could find is called Prince but with very limited documentation.</p>
<p>Problem Statement:</p>
<p>How do I run a Factor Analysis of Mixed Data (FAMD) in Python to…</p>
<p></p>
<p>Please let me know if there is an easier way than using Prince, e.g. preprocessing the qualitative variables into numerical data with the scikit-learn LabelEncoder, or whether I'm going in the wrong direction.</p>
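<p>For context on the label-encoding idea, here is a rough sketch of the preprocessing that FAMD is usually described as performing before a PCA step: numeric columns are standardized, and each category level becomes an indicator column scaled by 1/sqrt(proportion of that level). Integer label encoding, by contrast, imposes an arbitrary order on the categories. The data and function names below are hypothetical, and this is not the Prince API.</p>

```python
import math

# Hypothetical toy data: one quantitative and one qualitative variable.
cost = [100.0, 200.0, 300.0, 400.0]
region = ["EU", "EU", "US", "APAC"]


def standardize(column):
    """Center a numeric column and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


def scaled_indicators(column):
    """One indicator column per level, each divided by sqrt(level proportion)."""
    n = len(column)
    result = {}
    for level in sorted(set(column)):
        proportion = column.count(level) / n
        result[level] = [
            (1.0 if value == level else 0.0) / math.sqrt(proportion)
            for value in column
        ]
    return result


cost_std = standardize(cost)
region_cols = scaled_indicators(region)

# The combined table (cost_std plus the region_* columns) is what a PCA
# would then be run on; centering of the indicators is left to that step.
for level, values in region_cols.items():
    print(level, [round(v, 3) for v in values])
```

<p>The scaling gives rare levels more weight per observation, so that a categorical variable contributes to the analysis on a footing comparable to a standardized numeric one.</p>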
<p></p>
<p>Thanks in advance</p>
<p></p>
<p>Uli</p>
<p>Is there a minimum number of observations per time slice needed? What happens to the method as the number approaches 1?</p>
<p>Whether as a job applicant for a data science role, or as a hiring manager. Was it a technical question (a mathematics, statistics, or coding problem)? Was it a riddle? Or a general question? Were you able to answer it successfully?</p>
<ul>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/46-questions-on-sql-to-test-a-data-science-professional-skilltest" target="_blank" rel="noopener">SQL</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/70-mongodb-interview-questions-and-answers" target="_blank" rel="noopener">MongoDB</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/r-programming-job-interview-questions-and-answers" target="_blank" rel="noopener">R</a> | <a href="https://www.datasciencecentral.com/profiles/blogs/top-25-hadoop-interview-questions-prepared-by-experts" target="_blank" rel="noopener">Hadoop</a></li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/answers-to-dozens-of-data-science-job-interview-questions" target="_blank" rel="noopener">Data Science</a> (with answers)</li>
<li><a href="https://www.datasciencecentral.com/profiles/blogs/25-questions-to-detect-fake-data-scientists" target="_blank" rel="noopener">Data Science</a> (50 questions to test true data science knowledge)</li>
</ul>
<p><strong>How do I import a SAS layout file correctly in DSS?</strong> (<a href="https://www.datasciencecentral.com/profile/NianaThomas" target="_blank" rel="noopener">Niana Thomas</a>, 2019-03-14)</p>
<p>I finished the Coursera Data Science Professional cert, which is an IBM course that uses the Jupyter Notebooks and Python in IBM Watson. I used Watson enough to run out of free time and had to sign up for the single user account to finish up. Well... is this such a good idea?</p>
<p>IBM Watson/Jupyter notebooks seem to me to only provide value when groups are collaborating and the IBM Cloud has value when an Enterprise needs a host for their application. Otherwise, they don't seem to add value for an individual who is just trying to learn the trade and push out a few studies.</p>
<p>Am I missing something or is that about it?</p>
<p></p>
<p>Also, can anyone provide a human-scale interpretation of the IBM Watson billing structure? Billowing clouds of vCPU hours and instantiation charges. I think that, given the charges that I've accumulated so far, and the $200 credit for the upgrade, it all means $200/month.</p>
<p><strong>Career Prospects for Data Science with a Graduate Engineering Degree and Apprenticeship?</strong> (<a href="https://www.datasciencecentral.com/profile/BryanAtkinson" target="_blank" rel="noopener">Bryan Atkinson</a>, 2019-03-12)</p>
<p>I have a Bachelor's Degree in Mathematics and a Minor in Computer Science.</p>
<p>Suppose over the next three years I obtain one to two years full time experience as an entry level Associate Data Scientist. Next, I decide to enroll into an Aerospace Engineering Master's Degree, and I simultaneously work three years part time in Data Analysis and Software Development as I complete the degree.</p>
<p>The Master's Degree I entered had a research thesis option, and I decide to specialize in creating, testing and simulating Embedded AI Control Systems for autonomously flying unmanned aircraft. I complete my research thesis, do an internship with an Aerospace Company, and try to get / find time for a Data Science internship.</p>
<p>I will always take at least six courses in Random Stochastic Processes, Machine Learning Theory, Deep / Reinforcement Learning, Bayesian Statistics, High Dimensional Statistics, Generalized Linear Models, Data Mining, Computer Science, and Data Wrangling no matter the degree.</p>
<p>After graduating, I would try to find employment designing embedded AI systems in aircraft. If I really enjoy the work, there's a good chance this becomes a PhD in something like embedded AI flight controls for evasive maneuvers and I do another two internships while trying to put a small Data Science spin on my research.</p>
<p>How would you evaluate someone as above with just a Master's in Aerospace Engineering once they reached out to make an employment pitch after applying for a Data Science position at your company with a reference?</p>
<p>What if this person made a pitch and applied with a reference to the same position at your company while having an Aerospace PhD and three years experience in Aerospace Industry?</p>
<p>What could excite you about a person studying Aerospace, and what might they lack vs. other candidates?</p>
<p>I really like the idea of Aerospace and want to seriously try it out. However, I'm afraid of the fact that Aerospace Engineering only has an expected 5% to 9% job growth with 3000 new jobs over the next 8 years. I want to be versatile.</p>
<p></p>
<p><strong>Who is using regression models with interactions?</strong> (<a href="https://www.datasciencecentral.com/profile/CapriGranville733" target="_blank" rel="noopener">Capri Granville</a>, 2019-03-12)</p>
<p>I believe biostatisticians use it, especially with small data sets, in the context of clinical trials, especially when using dummy variables (e.g. 0/1 for gender.) I am wondering how decision trees and other models compare with interaction regression. I would think the interaction regression model suffers from the same issues as <a href="https://www.datasciencecentral.com/profiles/blogs/deep-dive-into-polynomial-regression-and-overfitting" target="_blank" rel="noopener">polynomial regression</a>. </p>
<p>For those not familiar with the concept, below is an introduction to interaction regression, from <a href="https://stattrek.com/multiple-regression/interaction.aspx" target="_blank" rel="noopener">Stattrek</a>. </p>
<p>In regression, an interaction effect exists when the effect of an independent variable on a dependent variable changes, depending on the value(s) of one or more other independent variables.</p>
<h2>Interaction Effects in Equations</h2>
<p>In a regression equation, an interaction effect is represented as the product of two or more independent variables. For example, here is a typical regression equation<span> </span><i>without</i><span> </span>an interaction:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub></p>
<p>where ŷ is the predicted value of a dependent variable, X<sub>1</sub><span> </span>and X<sub>2</sub><span> </span>are independent variables, and b<sub>0</sub>, b<sub>1</sub>, and b<sub>2</sub><span> </span>are regression coefficients.</p>
<p>And here is the same regression equation<span> </span><i>with</i><span> </span>an interaction:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> + b<sub>3</sub>X<sub>1</sub>X<sub>2</sub></p>
<p>Here, b<sub>3</sub><span> </span>is a regression coefficient, and X<sub>1</sub>X<sub>2</sub><span> </span>is the interaction. The interaction between X<sub>1</sub><span> </span>and X<sub>2</sub><span> </span>is called a two-way interaction, because it is the interaction between two independent variables. Higher-order interactions are possible, as illustrated by the three-way interaction in the following equation:</p>
<p>ŷ = b<sub>0</sub> + b<sub>1</sub>X<sub>1</sub> + b<sub>2</sub>X<sub>2</sub> + b<sub>3</sub>X<sub>3</sub> + b<sub>4</sub>X<sub>1</sub>X<sub>2</sub> + b<sub>5</sub>X<sub>1</sub>X<sub>3</sub> + b<sub>6</sub>X<sub>2</sub>X<sub>3</sub> + b<sub>7</sub>X<sub>1</sub>X<sub>2</sub>X<sub>3</sub></p>
<p>Analysts usually steer clear of higher-order interactions, like X<sub>1</sub>X<sub>2</sub>X<sub>3</sub>, since they can be hard to interpret.</p>
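<p>To make the two-way equation concrete, here is a small sketch with made-up coefficients (the values of b0 to b3 are assumptions chosen for illustration, not estimates from any data). It shows that with an interaction term, the effect of a one-unit change in X1 is b1 + b3*X2, so it depends on the value of X2.</p>

```python
def predict(x1, x2, b0, b1, b2, b3):
    """Two-way interaction model: y_hat = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(x2, b1, b3):
    """Change in y_hat per unit of x1, holding x2 fixed: b1 + b3*x2."""
    return b1 + b3 * x2


# Hypothetical coefficients, for illustration only.
coefs = dict(b0=1.0, b1=0.5, b2=2.0, b3=1.5)

for x2 in (0.0, 1.0):
    # Difference in prediction when x1 goes from 0 to 1 at this x2.
    effect = predict(1.0, x2, **coefs) - predict(0.0, x2, **coefs)
    print(f"x2={x2}: effect of x1 = {effect}")
```

<p>With b3 set to 0 the two printed effects would be identical, which is the model without an interaction.</p>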
<p><strong>Illustration</strong></p>
<p><a href="https://storage.ning.com/topology/rest/1.0/file/get/1387762547?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1387762547?profile=RESIZE_710x" class="align-center"/></a></p>
<p><span>For males, drug dosage has a minimal effect on anxiety; but for females, the effect is dramatic. The effect of drug dose cannot be understood without accounting for the gender of the person receiving the medication.</span></p>
<p>Typically, when a regression equation includes an interaction term, the first question you ask is: Does the interaction term contribute in a meaningful way to the explanatory power of the equation? You can answer that question by:</p>
<ul>
<li>Assessing the statistical significance of the interaction term.</li>
<li>Comparing the coefficient of determination with and without the interaction term.</li>
</ul>
<p>If the interaction term is statistically significant, the interaction term is probably important. And if the coefficient of determination is also much bigger with the interaction term, it is definitely important. If neither of these outcomes is observed, the interaction term can be removed from the regression equation.</p>
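<p>The second check, comparing the coefficient of determination with and without the interaction term, can be sketched as follows. The data are simulated to loosely mimic the drug-dose illustration (the coefficients, noise level, and sample size are all assumptions), and the least-squares fit uses a small hand-written solver so the example has no dependencies. Testing the significance of the term would need a t-test or F-test, which statistical packages provide and which is not shown here.</p>

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(features[0])
    xtx = [[sum(r[i] * r[j] for r in features) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(features, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(features, y, coefs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    preds = [sum(c * v for c, v in zip(coefs, row)) for row in features]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Simulated data: dose barely matters for males, matters a lot for females.
rng = random.Random(42)
dose = [rng.uniform(0, 10) for _ in range(200)]
female = [float(rng.random() < 0.5) for _ in range(200)]
anxiety = [50 - 0.2 * d - 3.0 * d * f + rng.gauss(0, 2)
           for d, f in zip(dose, female)]

main_only = [[1.0, d, f] for d, f in zip(dose, female)]
with_interaction = [[1.0, d, f, d * f] for d, f in zip(dose, female)]

coefs_main = fit_ols(main_only, anxiety)
coefs_inter = fit_ols(with_interaction, anxiety)
r2_main = r_squared(main_only, anxiety, coefs_main)
r2_inter = r_squared(with_interaction, anxiety, coefs_inter)
print(f"R^2 without interaction: {r2_main:.3f}")
print(f"R^2 with interaction:    {r2_inter:.3f}")
```

<p>Note that R² can never decrease when a term is added to a least-squares model, so a small increase on its own is weak evidence; that is why the size of the gain, or a significance test, is what matters.</p>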
