Q1) 85,000 Views
I wonder if there is a direct way to import the contents of a csv file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data to R's data frame? Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?
You can use NumPy's genfromtxt() function to do so, by setting the delimiter kwarg to a comma.
from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
More information on the function can be found in its documentation.
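Since the question asks for something closer to R's read.csv() returning a record array, here is a minimal sketch (the in-memory sample data and variable names are made up for illustration): passing names=True and dtype=None to genfromtxt yields a structured array whose columns can be accessed by name, much like data frame columns.

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for 'my_file.csv'.
csv_data = io.StringIO("x,y,label\n1.0,2.0,3.0\n4.0,5.0,6.0")

# names=True reads field names from the header row; dtype=None lets
# genfromtxt infer a per-column dtype, producing a structured array.
arr = np.genfromtxt(csv_data, delimiter=',', names=True, dtype=None)

print(arr['x'])  # columns are accessible by name, like a data frame
```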
Q2) 67,000 Views
Calculating Pearson correlation and significance in Python
I am looking for a function that takes as input two lists, and returns the Pearson correlation, and the significance of the correlation.
You can have a look at scipy: http://docs.scipy.org/doc/scipy/reference/stats.html
from pydoc import help
from scipy.stats.stats import pearsonr
help(pearsonr)

Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
    Calculates a Pearson correlation coefficient and the p-value for testing
    non-correlation.

    Parameters
    ----------
    x : 1D array
    y : 1D array the same length as x

    Returns
    -------
    (Pearson's correlation coefficient, 2-tailed p-value)
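A short usage sketch with made-up numbers (y is roughly linear in x, so the correlation coefficient should be strongly positive and the p-value small):

```python
from scipy.stats import pearsonr

# Hypothetical data: y is an approximately linear function of x.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 7, 8, 9, 12]

r, p = pearsonr(x, y)  # correlation coefficient and 2-tailed p-value
print(r, p)
```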
Q3) 50,000 Views
How can I sort an array in numpy by the nth column? e.g.
a = array([[1,2,3],[4,5,6],[0,0,1]])
I'd like to sort by the second column, such that I get back:

array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])
In : import numpy as np
In : a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In : np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out: array([[0, 0, 1],
            [1, 2, 3],
            [4, 5, 6]])
To sort it in-place:
In : a.view('i8,i8,i8').sort(order=['f1'], axis=0)  # <-- returns None
In : a
Out: array([[0, 0, 1],
            [1, 2, 3],
            [4, 5, 6]])
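An equivalent approach not shown in the answer above uses argsort on the column together with fancy indexing, which avoids the structured-array view entirely:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [0, 0, 1]])

# argsort over column 1 gives the row order that sorts that column;
# indexing with it reorders the rows accordingly.
sorted_a = a[a[:, 1].argsort()]
print(sorted_a)
```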
Q4) 18,000 Views
What is the best way to calculate inverse distance weighted (IDW) interpolation in Python, for point locations?
This class Invdisttree combines inverse-distance weighting and scipy.spatial.KDTree.
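The Invdisttree class itself is not reproduced here; as a minimal sketch of the underlying idea (the sample points, values, and the idw helper are all hypothetical), a KD-tree finds the k nearest known points, and their values are averaged with weights proportional to 1/distance:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical known sample points and their values.
known_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
known_z = np.array([1.0, 2.0, 3.0, 4.0])

tree = cKDTree(known_xy)

def idw(query_xy, k=3, eps=1e-12):
    # k nearest neighbours of each query point
    dist, idx = tree.query(query_xy, k=k)
    w = 1.0 / (dist + eps)               # inverse-distance weights
    w /= w.sum(axis=-1, keepdims=True)   # normalize per query point
    return (w * known_z[idx]).sum(axis=-1)

print(idw(np.array([[0.5, 0.5]]), k=4))
```

A query point that coincides with a known point gets an enormous weight on that point (eps only guards against division by zero), so the interpolation passes through the data.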
Q5) 10,000 Views
I have a list of more than 30,000 values ranging from 0 to 47, e.g. [0,0,0,0,...,1,1,1,1,...,2,2,2,2,...,47], drawn from a continuous distribution.
PROBLEM: Based on my distribution, I would like to calculate the p-value (the probability of seeing greater values) for any given value. For example, the p-value for 0 would approach 1 and the p-value for higher numbers would tend to 0.
I don't know if I am right, but I think that to determine probabilities I need to fit my data to the theoretical distribution that best describes it. I assume some kind of goodness-of-fit test is needed to determine the best model.
Is there a way to implement such an analysis in Python (Scipy or Numpy)? Could you present any examples?
There are 82 implemented distribution functions in SciPy 0.12.0. You can test how well some of them fit your data using their fit() method. Check the code below for more details:
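A minimal sketch of the idea (not the answer's original code, and using a made-up gamma-distributed sample): fit() estimates a candidate distribution's parameters by maximum likelihood, and the survival function sf() then gives the probability of seeing a greater value, i.e. the "p-value" in the question's sense.

```python
import numpy as np
import scipy.stats as st

# Hypothetical data standing in for the 30,000 observed values.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=5.0, size=10000)

# Fit a candidate distribution by maximum likelihood.
shape, loc, scale = st.gamma.fit(data)

# P(X > x): the survival function of the fitted distribution.
p_of_10 = st.gamma.sf(10.0, shape, loc=loc, scale=scale)
print(p_of_10)
```

The same pattern works for any of SciPy's continuous distributions; a goodness-of-fit test (e.g. scipy.stats.kstest) can then be used to compare candidates.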