5 of the Most Viewed SciPy and NumPy Questions with Problems on Stack Overflow

Q1) 85,000 Views

I wonder if there is a direct way to import the contents of a csv file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data to R's data frame? Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?


You can use NumPy's genfromtxt() function to do so, by setting the delimiter keyword argument to a comma.

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found in its documentation.
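Since the question asks for a record array specifically, here is a minimal sketch of one way to get named fields out of genfromtxt(). The CSV content and column names are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import numpy as np

# Hypothetical CSV content; io.StringIO stands in for a path such as 'my_file.csv'.
csv_text = "name,height,weight\nalice,1.62,55.0\nbob,1.80,72.5\n"

# names=True takes the field names from the header row,
# dtype=None asks NumPy to infer a type for each column,
# encoding='utf-8' returns text columns as str instead of bytes.
records = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                        dtype=None, encoding="utf-8")

# Fields of the structured array are accessed by name.
heights = records["height"]

# Viewing it as a recarray adds attribute-style access (rec.weight).
rec = records.view(np.recarray)
```

The result is a one-dimensional structured array with one element per row, which is probably the closest NumPy analogue to an R data frame without bringing in pandas.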

See full answer on Stack Overflow

Q2) 67,000 Views

Calculating Pearson correlation and significance in Python

I am looking for a function that takes as input two lists, and returns the Pearson correlation, and the significance of the correlation.


You can have a look at scipy: http://docs.scipy.org/doc/scipy/reference/stats.html

from pydoc import help
from scipy.stats import pearsonr  # the older scipy.stats.stats path is deprecated
help(pearsonr)

Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
    Calculates a Pearson correlation coefficient and the p-value for testing
    non-correlation.

    Parameters
    ----------
    x : 1D array
    y : 1D array the same length as x

    Returns
    -------
    (Pearson's correlation coefficient,
     2-tailed p-value)
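As a quick illustration of the call itself, here is a small sketch with two made-up lists. The values of x and y are invented for the example; only pearsonr() comes from SciPy.

```python
from scipy.stats import pearsonr

# Two hypothetical lists of equal length.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# pearsonr returns the correlation coefficient and the two-tailed p-value.
r, p_value = pearsonr(x, y)
print(f"r = {r:.4f}, p = {p_value:.4f}")
```

Plain Python lists are accepted because SciPy converts them to arrays internally.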

Detailed Answer on Stack Overflow

Q3) 50,000 Views

How can I sort an array in numpy by the nth column? e.g.

a = array([[1,2,3],[4,5,6],[0,0,1]])

I'd like to sort by the second column, such that I get back:

array([[0,0,1],[1,2,3],[4,5,6]])


In [1]: import numpy as np
In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(int)
Out[3]:
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0)  # <-- returns None
In [7]: a
Out[7]:
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])
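A different technique from the structured-view approach above, and one that does not depend on the array's integer width, is to index the rows with the argsort of the chosen column. This is a sketch of that alternative rather than the quoted answer's method.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1]])

# argsort on column 1 gives the row order that sorts that column;
# fancy indexing with it returns a new, reordered copy of the rows.
sorted_a = a[a[:, 1].argsort()]
```

This leaves a untouched and returns a copy, so it is not an in-place sort.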

See full answer on Stack Overflow

Q4) 18,000 Views

What is the best way to calculate inverse distance weighted (IDW) interpolation in Python, for point locations?


The Invdisttree class in the linked answer combines inverse-distance weighting and scipy.spatial.KDTree.
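To give a feel for the idea, here is a minimal sketch of inverse-distance weighting on top of a k-d tree. It is not the Invdisttree class itself: the function name idw_interpolate, its parameters, and the sample points are all hypothetical, and it uses scipy.spatial.cKDTree for the neighbor search.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_interpolate(known_xy, known_values, query_xy, k=4, power=2.0, eps=1e-12):
    """Estimate values at query_xy from the k nearest known points.

    Hypothetical helper: weights are 1 / distance**power, and eps keeps
    the weight finite when a query point coincides with a known point.
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))

    # Never ask for more neighbors than there are known points.
    k = min(k, len(known_values))
    tree = cKDTree(known_xy)
    dist, idx = tree.query(query_xy, k=k)

    # query() returns 1-D arrays when k == 1; force shape (n_query, k).
    dist = dist.reshape(len(query_xy), -1)
    idx = idx.reshape(len(query_xy), -1)

    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * known_values[idx]).sum(axis=1) / weights.sum(axis=1)


# Made-up sample: four corners of a unit square.
known_xy = [[0, 0], [1, 0], [0, 1], [1, 1]]
known_values = [0.0, 10.0, 10.0, 20.0]
estimates = idw_interpolate(known_xy, known_values, [[0.5, 0.5], [0.0, 0.0]])
```

The first query point is equidistant from all four corners, so its estimate is their plain average; the second sits on a known point and essentially reproduces that point's value.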

See a more detailed answer on Stack Overflow

Q5) 10,000 Views

I have a list of more than 30,000 values ranging from 0 to 47, e.g. [0,0,0,0,...,1,1,1,1,...,2,2,2,2,...,47, etc.], which is a continuous distribution.

PROBLEM: Based on my distribution, I would like to calculate the p-value (the probability of seeing greater values) for any given value. For example, the p-value for 0 would approach 1, and the p-value for higher numbers would tend to 0.

I don't know if I am right, but to determine probabilities I think I need to fit my data to a theoretical distribution that is the most suitable to describe my data. I assume that some kind of goodness of fit test is needed to determine the best model.

Is there a way to implement such an analysis in Python (Scipy or Numpy)? Could you present any examples?


There are 82 implemented distribution functions in SciPy 0.12.0. You can test how well some of them fit your data using their fit() method. The full code is in the linked answer.
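As a rough sketch of that workflow, the snippet below fits a few candidate distributions, ranks them by their Kolmogorov-Smirnov statistic, and uses the survival function of the best one as the "probability of seeing greater values". The synthetic data, the choice of three candidates, and the helper name tail_probability are assumptions made for illustration, not the linked answer's code.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the asker's data (assumption: gamma-like, non-negative).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=5000)

# A small, arbitrary set of candidates; SciPy offers many more.
candidates = {"gamma": stats.gamma, "expon": stats.expon, "norm": stats.norm}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)  # maximum-likelihood estimates
    ks_stat, _ = stats.kstest(data, name, args=params)
    results[name] = (ks_stat, params)

# A smaller KS statistic means the fitted CDF is closer to the empirical one.
best_name = min(results, key=lambda n: results[n][0])
best_dist = candidates[best_name]
best_params = results[best_name][1]


def tail_probability(value):
    """P(X > value) under the best-fitting candidate (hypothetical helper)."""
    return best_dist.sf(value, *best_params)
```

Because the parameters are fitted to the same data the test is run on, the KS p-values are optimistic, so the statistic is used here only to rank candidates. For integer-valued data like the asker's, a continuous fit is an approximation at best.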

See a more detailed answer on Stack Overflow
