Combining data from different sources into a data set

I am currently working on a project where, for some analysis, I need to combine data from different advertising sources such as Google Analytics, Facebook Ads, Google AdWords, Bing Ads and a CRM into a single data set.

I have integrated all these sources into a single warehouse, but I am struggling to extract the important information from each of them and combine it all into a single data set.

Is there any standard procedure or tool that can help me do this?

Thanks.



Replies to This Discussion

I guess that you need the functions cbind() or merge(). Use cbind() if the data sets you want to bind have the same number of rows, in the same order, since it pairs rows by position. Use merge() if the data sets share at least one column; they do not need to have the same number of rows.
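As a rough illustration of that difference, here is a minimal sketch in plain Python (standard library only) of a position-based column bind, analogous to cbind(), and a key-based inner join, analogous to merge() with its default settings. The helper names bind_columns and merge_on and the sample rows are made up for this example; in R you would simply call cbind() or merge().

```python
def bind_columns(left, right):
    """cbind()-style: pair rows by position; both tables need the same row count."""
    if len(left) != len(right):
        raise ValueError("column bind needs the same number of rows")
    return [{**a, **b} for a, b in zip(left, right)]


def merge_on(left, right, key):
    """merge()-style inner join: pair rows that share the same value in `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in left for b in index.get(a[key], [])]


# Hypothetical sample rows, not real campaign data.
clicks = [
    {"date": "2018-01-01", "clicks": 120},
    {"date": "2018-01-02", "clicks": 95},
]
spend = [
    {"date": "2018-01-02", "spend": 40.0},
    {"date": "2018-01-01", "spend": 55.5},
    {"date": "2018-01-03", "spend": 12.0},
]

# Rows are matched on "date", so row order and table length do not matter.
merged = merge_on(clicks, spend, "date")
```

Here bind_columns(clicks, spend) would raise an error because the tables have different lengths, while merge_on keeps only the two dates present in both tables.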

As you are collecting the same kind of data from different sources, they should have some common attribute, which you can identify by looking at the data.

It will be good if you can share some sample data.

I believe there is no standard way to merge data from different sources. Do some preprocessing before merging the data into a single data set. For example, the same data stored under different names in different sources can be mapped to a single attribute in your new data set.
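The mapping step described above could be sketched like this in plain Python. The source column names used here ("Cost", "amount_spent" and so on) are assumptions for illustration only and should be checked against what each platform actually exports; normalize is a hypothetical helper.

```python
# Hypothetical per-source column names mapped to one shared schema.
COLUMN_MAP = {
    "google_ads": {"Day": "date", "Cost": "spend", "Clicks": "clicks"},
    "facebook_ads": {
        "date_start": "date",
        "amount_spent": "spend",
        "link_clicks": "clicks",
    },
}


def normalize(row, source):
    """Rename a source row's columns to the shared names and tag its origin."""
    out = {"source": source}
    for old_name, new_name in COLUMN_MAP[source].items():
        out[new_name] = row[old_name]
    return out


# Made-up rows standing in for two platform exports.
google_row = {"Day": "2018-01-01", "Cost": 55.5, "Clicks": 120}
facebook_row = {"date_start": "2018-01-01", "amount_spent": 30.0, "link_clicks": 80}

combined = [
    normalize(google_row, "google_ads"),
    normalize(facebook_row, "facebook_ads"),
]
```

After this step every row has the same columns (source, date, spend, clicks), so rows from different platforms can be stacked or joined safely.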

Definitely, as Pablo suggested, the cbind(), rbind() or merge() functions will help with merging the data into a single data set.

Totally agree with Neeraj. There's no standard in the industry. It always depends on which KPIs you value the most and what the project is trying to achieve.

I guess there are unique user IDs shared across all platforms. If that is the case, merge() is absolutely necessary.

Otherwise, impressions, clicks, spend and any engagement events are always good to include, as well as CPM (cost per thousand impressions), CPC (cost per click), CVR (conversion rate), etc. You also want to make sure the columns are the ones you are looking for before calling rbind(), because many columns have similar names on each platform.
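A minimal sketch of that check-then-stack step, again in plain Python: stack_rows refuses to append rows whose columns differ (the kind of check worth doing before rbind()), and add_kpis derives the metrics using the common definitions CPM = spend * 1000 / impressions, CPC = spend / clicks and CVR = conversions / clicks. Your platforms may define conversions differently, so treat the formulas, the helper names and the numbers as assumptions.

```python
def stack_rows(*tables):
    """rbind()-style: append tables, refusing rows whose columns differ."""
    expected = None
    stacked = []
    for table in tables:
        for row in table:
            if expected is None:
                expected = set(row)
            elif set(row) != expected:
                raise ValueError(f"column mismatch: {sorted(set(row) ^ expected)}")
            stacked.append(row)
    return stacked


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def add_kpis(row):
    """Return a copy of the row with CPM, CPC and CVR added."""
    return {
        **row,
        "cpm": ratio(row["spend"] * 1000, row["impressions"]),
        "cpc": ratio(row["spend"], row["clicks"]),
        "cvr": ratio(row["conversions"], row["clicks"]),
    }


# Made-up numbers for illustration only.
google = [{"source": "google_ads", "impressions": 10000, "clicks": 200,
           "spend": 50.0, "conversions": 10}]
bing = [{"source": "bing_ads", "impressions": 4000, "clicks": 0,
         "spend": 20.0, "conversions": 0}]

report = [add_kpis(row) for row in stack_rows(google, bing)]
```

Returning None for a zero denominator keeps rows with no clicks in the report instead of failing on a division by zero.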

Why not use Python? It's generally the better choice for manipulating/merging data from multiple sources. Then use R to do your analysis.

The approach used in 

   www.executable-english.com/demo_agents/EnergyIndependence1.agent

may be useful. The basic idea is to document the meaning of the data from each source as sentences in executable English, then write further rules that simultaneously define the merged meanings and act as a self-explaining app. The platform for this is live online at executable-english.com, and shared use is free. Here is a summary slide

www.executable-english.com/internet_business_logic_in_a_nutshell.pdf

and a short paper

www.executable-english.com/A_Wiki_for_Business_Rules_in_Open_Vocabu...

Use R, SQL, or Python, or you can use a visualization tool like Tableau.



Dorothy Hewitt-Sanchez said:

The trick is that you need to know, or have some idea of, what valuable information you are looking for. If you do not know that, then you need to know how to spot trends. You need to know what the data is telling you. You need the art of discovery. Ask yourself, “What am I looking for?” or “Let’s shape the data and see what it shows me.”


© 2018 Data Science Central ®
