.

Q1) 85,000 Views

I wonder if there is a direct way to import the contents of a csv file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data to R's data frame? Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?

Answer:

You can use Numpy's genfromtxt() method to do so, by setting the delimiter kwarg to a comma.

from numpy import genfromtxt my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found at its respective documentation.

See full answer on Stack Overflow

Q2: 67,000 Views

Calculating Pearson correlation and significance in Python

I am looking for a function that takes as input two lists, and returns the Pearson correlation, and the significance of the correlation.

Answer:

You can have a look at scipy: http://docs.scipy.org/doc/scipy/reference/stats.html

from pydoc import help

from scipy.stats.stats import pearsonr

help(pearsonr)

>>>

Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)

Calculates a Pearson correlation coefficient and the p-value for testing

non-correlation.

===

Parameters

----------

x : 1D array

y : 1D array the same length as x

Returns

-------

(Pearson's correlation coefficient,

2-tailed p-value)

Detailed Answer on Stack Overflow

Q3. 50,000 Views

How can I sort an array in numpy by the nth column? e.g.

a = array([[1,2,3],[4,5,6],[0,0,1]])

I'd like to sort by the second column, such that I get back:

array([[0,0,1],[1,2,3],[4,5,6]])

**Answer:**

a[a[:,1].argsort()]

OR:

In [1]: import numpy as np In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]]) In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int) Out[3]: array([[0, 0, 1], [1, 2, 3], [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None In [7]: a Out[7]: array([[0, 0, 1], [1, 2, 3],

See full answer on Stack Overflow

Q4) 18,000 Views :

What is the best way to calculate inverse distance weighted (IDW) interpolation in Python, for point locations?

Answer:

This class Invdisttree combines inverse-distance weighting andscipy.spatial.KDTree.

See a more detailed answer on Stack Overflow

Q5) 10,000 Views

I have a list of more than 30 000 values ranging from 0 to 47 e.g.[0,0,0,0,..,1,1,1,1,...,2,2,2,2,..., 47 etc.] which is the continuous distribution.

PROBLEM: Based on my distribution I would like to calculate p-value (the probability of seeing greater values) for any given value. For example, as you can see p-value for 0 would be approaching 1 and p-value for higher numbers would be tending to 0.

I don't know if I am right, but to determine probabilities I think I need to fit my data to a theoretical distribution that is the most suitable to describe my data. I assume that some kind of goodness of fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (Scipy or Numpy)? Could you present any examples?

Answer:

There are 82 implemented distribution functions in SciPy 0.12.0. You can test how some of them fit to your data using their fit() method. Check the code below for more details:

- Toucan Toco unveils native integration for Snowflake
- Top trends in big data for 2021 and the future
- Common application layer protocols in IoT explained
- HYCU Protégé integrates Kubernetes data protection
- 10 Jenkins alternatives for developers
- Flow efficiency is one of the trickiest DevOps metrics
- Continuous delivery vs. continuous deployment: Which to choose?
- Camel case vs. snake case: What's the difference?
- Advice on intent-based networking and Python automation
- Risk & Repeat: Will the Ransomware Task Force make an impact?

Posted 3 May 2021

© 2021 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central