Data analysis challenge + how can I detect the advanced correlation between variables? - Data Science Central
Vincent Hayek, 2019-08-07:
<p>Hello Vincent,<br/>Thanks for your answer.</p>
<p>Not in batch mode; I am planning to manually load a CSV file into a Python/R/SAS program.<br/><br/>Excel can run correlation analysis between numerical variables, but it does not handle:</p>
<p>- correlation analysis on sub-scopes (e.g. two variables X and Y have a low correlation, except when X > 50)<br/>- correlation involving more than two variables (e.g. two variables X and Y have a low correlation, except when Z > 0)<br/>- correlation with non-numerical variables like days, cities, etc.<br/><br/>I will check NumPy in Python, but I have not seen that it handles these cases.<br/><br/></p>
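<p>For illustration, the three cases listed above can be approached with plain NumPy by filtering rows before computing the correlation, and by using a correlation ratio (eta) for a categorical variable. This is only a minimal sketch on synthetic data: the helper names <code>subset_corr</code> and <code>correlation_ratio</code>, the thresholds, and the city names are assumptions made up for the example, not part of any library, and the conditions to test (X > 50, Z > 0) still have to be supplied by hand rather than detected automatically.</p>

```python
import numpy as np


def subset_corr(x, y, mask):
    """Pearson correlation of x and y restricted to rows where mask is True."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if mask.sum() < 3:  # too few rows for a meaningful estimate
        return float("nan")
    return float(np.corrcoef(x[mask], y[mask])[0, 1])


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical variable.

    0 means the category explains none of the variance, 1 means all of it.
    """
    categories = np.asarray(categories)
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = 0.0
    for c in np.unique(categories):
        group = values[categories == c]
        ss_between += group.size * (group.mean() - grand_mean) ** 2
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0


# Synthetic data, invented for this example only.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 100, n)
z = rng.normal(size=n)

# Case 1: y follows x only on the sub-scope x > 50.
y = np.where(x > 50, 0.2 * x + rng.normal(size=n), 5 * rng.normal(size=n))
r_high = subset_corr(x, y, x > 50)
r_low = subset_corr(x, y, x <= 50)

# Case 2: y2 follows x only when a third variable z is positive.
y2 = np.where(z > 0, x, rng.uniform(0, 100, n)) + rng.normal(size=n)
r_zpos = subset_corr(x, y2, z > 0)
r_zneg = subset_corr(x, y2, z <= 0)

# Case 3: a non-numerical variable (city) driving a numerical one (sales).
city = rng.choice(["Paris", "Lyon", "Nice"], size=n)
offsets = {"Paris": 0.0, "Lyon": 5.0, "Nice": 10.0}
sales = np.array([offsets[c] for c in city]) + rng.normal(size=n)
eta_city = correlation_ratio(city, sales)

print("corr(x, y)  | x > 50 :", round(r_high, 3))
print("corr(x, y)  | x <= 50:", round(r_low, 3))
print("corr(x, y2) | z > 0  :", round(r_zpos, 3))
print("corr(x, y2) | z <= 0 :", round(r_zneg, 3))
print("eta(city, sales)     :", round(eta_city, 3))
```

<p>To search for such sub-scopes instead of testing a known one, the same helper could be looped over a grid of candidate thresholds, keeping in mind that every extra comparison raises the risk of a correlation that is significant only by chance.</p>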
<p>Thanks for the reference to your book, very interesting!</p>
Vincent Granville, 2019-08-03:
<p>Are you running your computations in batch mode? Some libraries in Python and R will solve your problem, see for instance <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html" target="_blank" rel="noopener">here</a>. In Excel, it is easy to compute the correlation matrix (all the correlations at once) if you install the Analysis ToolPak, see <a href="https://study.com/academy/lesson/creating-a-correlation-matrix-in-excel.html" target="_blank" rel="noopener">here</a>.</p>
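<p>As a small illustration of the NumPy route, <code>numpy.corrcoef</code> returns all pairwise correlations at once. The sketch below uses synthetic columns invented for the example; with a real CSV file the array would be loaded first (for instance with <code>numpy.genfromtxt</code>), which is not shown here.</p>

```python
import numpy as np

# Synthetic columns, invented for this example only.
rng = np.random.default_rng(1)
n = 500
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)  # built to correlate with a
c = rng.normal(size=n)                  # independent of a and b

# One row per observation, one column per variable.
data = np.column_stack([a, b, c])

# rowvar=False tells NumPy that the variables are in columns, not rows.
corr_matrix = np.corrcoef(data, rowvar=False)

print(np.round(corr_matrix, 2))
```

<p>The result is a symmetric matrix with ones on the diagonal; entry (i, j) is the Pearson correlation between columns i and j.</p>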
<p>The main concern, if you have many variables, is that some correlations will be significant just by chance, even with totally random data. See the section on <em>p</em>-values on page 236 of my book (<a href="https://www.datasciencecentral.com/profiles/blogs/free-book-statistics-new-foundations-toolbox-and-machine-learning" target="_blank" rel="noopener">here</a>).</p>
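<p>The effect is easy to reproduce. The sketch below generates purely random, independent columns and counts how many pairwise correlations pass a 5% significance test, first naively and then with a Bonferroni correction. The test uses the Fisher z-transform as a normal approximation, and the sample sizes are arbitrary choices for the example; this is one standard way to handle the problem, not necessarily the approach described in the book.</p>

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
n_rows, n_vars = 100, 50

# Independent random columns: every true correlation is zero.
data = rng.normal(size=(n_rows, n_vars))

corr = np.corrcoef(data, rowvar=False)
upper = np.triu_indices(n_vars, k=1)  # each pair of variables counted once
r = corr[upper]
n_pairs = r.size

# Fisher z-transform: approximately standard normal when the true
# correlation is zero.
z_scores = np.arctanh(r) * np.sqrt(n_rows - 3)

alpha = 0.05
z_naive = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * n_pairs))

naive_hits = int((np.abs(z_scores) > z_naive).sum())
bonferroni_hits = int((np.abs(z_scores) > z_bonferroni).sum())

print("pairs tested:", n_pairs)
print("significant at 5% without correction:", naive_hits)
print("significant at 5% with Bonferroni correction:", bonferroni_hits)
```

<p>With 1,225 pairs, roughly 5% of them are expected to look significant without correction even though the data contain no real relationship, which is why the threshold has to be tightened when many correlations are scanned at once.</p>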