
Combining data from different sources into a data set

I am currently working on a project where, for some analysis, I need to combine data from different advertising sources such as Google Analytics, Facebook Ads, Google AdWords, Bing Ads, and a CRM into a single data set.

I have integrated all these sources into a single warehouse, but I am struggling to extract the important information from each of them and combine it all into a single data set.

Is there any standard procedure or tool that can help me do this?

Thanks.

Replies to This Discussion

I guess that you need the functions cbind() or merge(). Use cbind() if the data sets you want to bind have the same number of rows, in the same order, since it joins them side by side by position. Use merge() if the data sets share at least one column; these need not have the same number of rows.
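To make the difference concrete, here is a minimal sketch in R. The data frames, column names, and numbers are all made up for illustration; real platform exports will look different.

```r
# Hypothetical per-platform summaries; all names and numbers are made up.
ga <- data.frame(campaign = c("spring", "summer", "fall"),
                 sessions = c(120, 340, 90))
fb <- data.frame(campaign = c("summer", "spring", "winter"),
                 fb_clicks = c(55, 20, 12))

# cbind() pastes columns side by side purely by row position, so it is
# only safe when both tables describe the same rows in the same order.
ga_extra <- data.frame(bounce_rate = c(0.41, 0.37, 0.52))
ga_wide <- cbind(ga, ga_extra)

# merge() matches rows on the shared key column instead of on position.
inner <- merge(ga, fb, by = "campaign")              # keys present in both
outer <- merge(ga, fb, by = "campaign", all = TRUE)  # every key, NA where missing

print(inner)
print(outer)
```

Note that ga and fb list their campaigns in a different order, so a cbind() of those two would silently line up the wrong rows, while merge() pairs them correctly by campaign.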

As you are collecting the same kind of data from different sources, they should have some common attribute, which you can identify by looking at the data.

It will be good if you can share some sample data.

I believe there is no standard way to merge data from different sources. Do some pre-processing before merging the data into a single data set. For example, the same data stored under different names in different sources can be mapped to a single attribute in your new data set.

And definitely, as Pablo suggested, the cbind(), rbind(), or merge() functions will help with merging the data into a single data set.
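As a rough sketch of that pre-processing step, assuming two exports that hold the same measures under different column names. The column names, the lookup vectors, and the standardize() helper are all hypothetical, not taken from any real platform schema.

```r
# Hypothetical exports: the same measures under different column names.
google <- data.frame(Campaign = c("spring", "summer"),
                     Clicks = c(40, 75),
                     Cost = c(12.5, 30.0))
facebook <- data.frame(campaign_name = c("spring", "summer"),
                       link_clicks = c(20, 55),
                       amount_spent = c(8.0, 22.0))

# One lookup per source: names are the source columns, values the common names.
google_map   <- c(Campaign = "campaign", Clicks = "clicks", Cost = "spend")
facebook_map <- c(campaign_name = "campaign", link_clicks = "clicks",
                  amount_spent = "spend")

# Keep only the mapped columns, rename them, and tag each row with its source.
standardize <- function(df, map, source) {
  out <- df[, names(map), drop = FALSE]
  names(out) <- unname(map)
  out$source <- source
  out
}

google_std   <- standardize(google, google_map, "google")
facebook_std <- standardize(facebook, facebook_map, "facebook")
print(names(google_std))
print(names(facebook_std))
```

Keeping the mapping in a small lookup per source means a renamed column in one platform only requires changing one entry, and both tables end up with identical column names ready for rbind() or merge().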

Totally agree with Neeraj. There's no standard in the industry. It always depends on which KPIs you value most and what the project is trying to achieve.

I would guess there are unique user IDs shared across all the platforms. If that is the case, merge() on that ID is the way to join them.

Otherwise, impressions, clicks, spend, and any engagement events are always good to include, as well as CPM, CPC, CVR, etc. Before you rbind(), also make sure the columns really are the ones you are looking for: each platform has lots of columns with similar names.
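A minimal sketch of that stacking step, assuming each platform's table has already been reduced to the same columns. The numbers are made up, and the metric definitions are assumptions too (here CPM is spend per 1,000 impressions, CPC is spend per click, and CVR is conversions per click); use whatever definitions your team has agreed on.

```r
# Hypothetical, already-standardized per-platform tables (made-up numbers).
google <- data.frame(source = "google", campaign = c("spring", "summer"),
                     impressions = c(10000, 25000), clicks = c(200, 500),
                     spend = c(100, 300), conversions = c(10, 20))
bing <- data.frame(source = "bing", campaign = c("spring", "summer"),
                   impressions = c(4000, 8000), clicks = c(40, 100),
                   spend = c(20, 60), conversions = c(2, 4))

# rbind() on data frames matches columns by name and stops with an error
# if the names differ, so check the columns explicitly first.
stopifnot(setequal(names(google), names(bing)))
ads <- rbind(google, bing)

# Derived metrics under the definitions assumed above.
ads$cpm <- ads$spend / ads$impressions * 1000
ads$cpc <- ads$spend / ads$clicks
ads$cvr <- ads$conversions / ads$clicks
print(ads)
```

Computing the ratios after stacking, from the raw counts, keeps them consistent across platforms instead of relying on each platform's own pre-computed CPM or CPC columns.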

Why not use Python? It's generally the better choice for manipulating/merging data from multiple sources. Then use R to do your analysis.

R, SQL, or Python. Or you can use a visualization tool like Tableau.



Dorothy Hewitt-Sanchez said:

The trick is that you need to know, or have some idea of, what valuable information you are looking for. If you do not know that, then you need to know how to spot trends. You need to know what the data is telling you. You need the art of discovery. Ask yourself, “What am I looking for?” or “Let’s shape the data and see what it shows me.”

© 2018 Data Science Central ®
