
I have four columns in my data set like below:

A     B     C     D
0     NA    NA    NA
0     NA    NA    NA
NA    1     NA    NA
NA    0     NA    NA
NA    NA    0     NA
NA    NA    1     NA
NA    NA    NA    0
NA    NA    NA    1

I want to merge all columns into one column, with the output as below:

A
0
0
1
0
0
1
0
1

How can we do this in R and Python?


Replies to This Discussion

I think the easiest and quickest way would be to use the NumPy package for Python.  It performs operations on arrays very quickly.

You can enter the data into a NumPy array directly, or use numpy.asarray() to convert your Python list into one:

x = numpy.asarray(a)   # where a is your Python list (of lists)

Next, use the array's flatten() method to turn your 2-D array into a 1-D array:

x = x.flatten()

The problem is that you still have a lot of NaN values (NumPy's equivalent of R's NA), which you want to get rid of.  The most readable way to remove them is to iterate over the array:

y = numpy.array([])   # an empty array that we'll put our cleaned data into

for value in x:
    if numpy.isnan(value):
        pass
    else:
        y = numpy.append(y, value)

NumPy's numpy.isnan() checks whether a value is "not a number".  If the value is a number, it gets appended to the array.  If you have a lot of data, you can sacrifice readability for speed and use some of NumPy's vectorized features.  We can call isnan on the entire flattened array to get a boolean mask of the values that aren't numbers.

x = x[~numpy.isnan(x)]

~ is the element-wise "not" operator.  So what this does is create an array of only the x values that are numbers.

That should do it!
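Putting those steps together, here's a minimal end-to-end sketch; the array literal below just mirrors the sample data from the question, with np.nan standing in for R's NA:

```python
import numpy as np

# Sample data mirroring the question: four columns, one value per row.
data = np.array([
    [0,      np.nan, np.nan, np.nan],
    [0,      np.nan, np.nan, np.nan],
    [np.nan, 1,      np.nan, np.nan],
    [np.nan, 0,      np.nan, np.nan],
    [np.nan, np.nan, 0,      np.nan],
    [np.nan, np.nan, 1,      np.nan],
    [np.nan, np.nan, np.nan, 0],
    [np.nan, np.nan, np.nan, 1],
])

x = data.flatten()       # 2-D -> 1-D, in row order
x = x[~np.isnan(x)]      # keep only the values that are numbers

print(x)  # [0. 0. 1. 0. 0. 1. 0. 1.]
```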

I'm a big proponent of using R packages data.table and dplyr (tidyverse) in collaboration:

library(data.table)
library(dplyr)

dt <- data.table(
  a = c(rep(0, 2), rep(NA, 6)),
  b = c(rep(NA, 2), 1, 0, rep(NA, 4)),
  c = c(rep(NA, 4), 0, 1, rep(NA, 2)),
  d = c(rep(NA, 6), 0, 1)
)

setnames(melt(dt)[!is.na(value)][,variable:=NULL],"value",names(dt)[1])[]

Actually, for this exercise, dplyr isn't necessary.
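Since the question asks for Python too, the same melt-then-drop-NA idea can be sketched with pandas (the column names and use of stack() are my assumption, not code from the thread; stack() reshapes wide to long in row order, and dropna() discards the missing cells):

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's data.
df = pd.DataFrame({
    "A": [0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, 1, 0, np.nan, np.nan, np.nan, np.nan],
    "C": [np.nan, np.nan, np.nan, np.nan, 0, 1, np.nan, np.nan],
    "D": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 0, 1],
})

# stack() goes wide -> long, row by row; dropna() removes the missing cells.
merged = df.stack().dropna().reset_index(drop=True).rename("A")
print(merged.tolist())  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```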


Hi,

I believe a simple solution would be to use the foreach looping construct from the foreach package. First, let's declare the dataset (data_frame() comes from dplyr):

library(dplyr)
library(foreach)

t1 <- data_frame(
  A = c(rep(0, 2), rep(NA, 6)),
  B = c(rep(NA, 2), 1, 0, rep(NA, 4)),
  C = c(rep(NA, 4), 0, 1, rep(NA, 2)),
  D = c(rep(NA, 6), 0, 1)
)

Now what I want to do is iterate over all rows of the table, select the non-NA value in each row, and combine the results from the rows into a single column (or vector). Here's the code:

t1$column <- foreach(i = seq(1, t1 %>% nrow()), .combine = 'rbind') %do% {
  temp <- t1[i, -(t1[i, ] %>% is.na() %>% which())]
  names(temp) <- c("value")
  return(temp)
}

The drawback is the need to rename the selected column, because rbind will not accept different column names when combining rows into a table. So the idea is to go row after row and each time pick out the non-NA value with the which() function.
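The same row-by-row idea can be sketched in Python with pandas (a sketch, not code from the thread; iterrows() plus dropna() plays the role of the foreach loop and which()):

```python
import numpy as np
import pandas as pd

# Recreate the sample data from the question.
df = pd.DataFrame({
    "A": [0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, 1, 0, np.nan, np.nan, np.nan, np.nan],
    "C": [np.nan, np.nan, np.nan, np.nan, 0, 1, np.nan, np.nan],
    "D": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 0, 1],
})

values = []
for _, row in df.iterrows():               # go row after row
    values.extend(row.dropna().tolist())   # keep the non-missing entries

print(values)  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```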

Nice answer. You want to use array operations, and NumPy is an elegant and efficient library for that.


Here is how you can do it in R with just one line of code.

apply(yourData, 1, function(data) { data[which(!is.na(data))] })

You can also pass 2 as the second (MARGIN) argument if you want to do the operation column-wise.

I would be happy to hear if it worked for you.
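A close Python analogue of this apply() approach can be sketched with pandas (my sketch, assuming the column names from the question; df.apply with axis=1 applies a function row-wise, like MARGIN = 1 in R):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "B": [np.nan, np.nan, 1, 0, np.nan, np.nan, np.nan, np.nan],
    "C": [np.nan, np.nan, np.nan, np.nan, 0, 1, np.nan, np.nan],
    "D": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 0, 1],
})

# axis=1 applies the function to each row; each row holds exactly one
# non-NA value, so dropna() leaves a single entry to pick out.
merged = df.apply(lambda row: row.dropna().iloc[0], axis=1)
print(merged.tolist())  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```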

Thank you, everyone, for your help. I was able to achieve this merge in R as well as in Python.


© 2018   Data Science Central™
