10 questions about big data and data science

Open-ended questions. What are your thoughts, and how would you answer the following questions? Feel free to select a few of these questions (or create your own) and provide an answer in the "Comments" section. I'll add my own answers in the next few days.

  1. Should companies embrace big data? Which ones (start-ups, big companies, tech companies, retail, health care)? And how? Using vendors, outsourcing or by hiring employees? And how do you measure ROI on big data? Should they use redundant data to consolidate KPIs?
  2. What do you consider to be big data? I tend to think of big data as anything 10 times larger (in terms of megabytes per day) than the maximum you are used to. Also, sparse data might not be as big as it looks, but can still be costly to process. Is there a price per megabyte for big data storage, big data transfers, and big data analytics?
  3. How did you become interested in data science?
  4. What is the difference between data science, statistics, machine learning, and data engineering? Do you think a hybrid (cross-discipline) role would be helpful (helpful to small companies, or helpful to the analytic practitioner as it opens up more job opportunities)?
  5. What kind of training do you recommend for future data scientists? Any specific program in mind?
  6. How to get university professors more involved in teaching students how to process real, live big data sets? Should curricula be adapted, outdated material removed, new material introduced?
  7. During the first year of my PhD program, I worked part-time for a small high-tech company, in partnership with my stats lab. This was a great experience - being exposed to the real world, and decently paid to do my PhD (in Belgium in 1988). How to encourage such initiatives in the US?
  8. Besides Hadoop-like and graph database environments, do you see other technology that would make data plumbing easier for big data?
  9. Does it make sense to try to structure unstructured data (using tags, NLP, taxonomies, etc.)?
  10. Can you tell me 5 business activities that would benefit most from big data, and 5 that would benefit least?
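
The rule of thumb in question 2 (big data as anything 10 times larger, in megabytes per day, than the maximum you are used to) can be written down as a quick check. This is only a sketch: the function name `is_big_data`, the `factor` parameter, and the sample volumes are hypothetical and chosen for illustration.

```python
def is_big_data(daily_mb, max_usual_daily_mb, factor=10.0):
    """Return True if the daily volume is at least `factor` times the usual maximum.

    The factor of 10 comes from question 2; the function itself and its
    parameter names are illustrative assumptions, not an established definition.
    """
    if max_usual_daily_mb <= 0:
        raise ValueError("max_usual_daily_mb must be positive")
    return daily_mb >= factor * max_usual_daily_mb


if __name__ == "__main__":
    # Hypothetical team that is used to handling at most 500 MB per day.
    usual_max = 500.0
    for volume in (2_000.0, 6_000.0):
        print(volume, is_big_data(volume, usual_max))
```

Because the threshold is relative to what a team is used to, the same daily volume can count as "big" for one organization and routine for another.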

Thought leaders are invited to answer these questions and submit their interview with Dr. Granville to [email protected]. Interviews will be published on DSC, and featured in our weekly digest.

Comment by Bosco Albuquerque on February 20, 2014 at 3:40pm
  1. (a) Should companies embrace big data? (b) Which ones (start-ups, big companies, tech companies, retail, health care)? (c) And how? (d) Using vendors, outsourcing or by hiring employees? (e) And how do you measure ROI on big data? (f) Should they use redundant data to consolidate KPIs?

I think these are very interesting questions. I'll answer them one at a time.

(a) Companies should embrace all data. Where skills or technology are lacking, ROI should be examined to determine whether it is worth their while to explore new data frontiers.

(b) All companies should embrace all data, with the same rationale mentioned in (a).

(c) The approach should start from business strategy. Do we have the data to support business decisions? What data do we need to help make better decisions? What are the opportunities for optimizing business strategy in the areas of vendor negotiations, cost savings, profits, employee retention, hiring the best talent, customer acquisition, etc.?

(d) A mix of internal domain experts, data scientists, and technology infrastructure experts. Where a role is missing internally, look to hire permanent staff, but hire contractors when needed. Depending on the situation, it may be necessary to hire a specialist consulting company.

(e) I've seen ROI measured quantitatively in terms of how data has driven decisions, with a specific dollar figure shown for cost savings due to vendor negotiations.

(f) Additional parameters/indicators may need to be defined. I'd lean towards keeping things separate, i.e. the more detail the better.

Comment by Khurram on February 14, 2014 at 3:17am

Big data cannot be measured or weighed in absolute units. Here is an excerpt from the book "Doing Data Science" by Rachel Schutt & Cathy O'Neil:

"Constructing a threshold for Big Data such as 1 petabyte is meaningless because it makes it sound absolute. Only when the size becomes a challenge is it worth referring to it as “Big.” So it’s a relative term referring to when the size of the data outstrips the state-of-the-art current computational solutions (in terms of memory, storage, complexity, and processing speed) available to handle it."

Whether sparse data is costly depends on which problem or trend you are after: for example, marketing for a single product (A, B, or C) or for a combination of all of them; finding an attractive telco bundle for all customers or only for specific customer brackets; or deciding whether a customer should come from retail or from wholesale. It depends on what you want to solve. Yes, there should be a price for big data storage (cloud computing). Moving data from the cloud into your own model and back requires cleaning the data and selecting the right data for your problem, and all of that carries a cost.

For the second part I tried my best to contribute; please correct me if I am wrong. There is very good material in your questions, and contributing answers to them will sharpen one's skills on the way to becoming a data scientist.

Comment by Big Data Queen on February 10, 2014 at 1:35pm

Vincent, great questions when considering Big Data. When considering a big data strategy, I think it's worth mentioning HPCC Systems from LexisNexis. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems and can help companies derive actionable insights from their data.

HPCC Systems provides proven solutions to handle what are now called Big Data problems, and has been doing so for more than a decade. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. More info at http://hpccsystems.com

Comment by Alex J. Caffarini on February 10, 2014 at 7:48am

To your first question about whether companies should embrace big data, it really depends. What is the company trying to accomplish? All the data in the world is useless if companies don't know what they want to do with it. A company's adoption of Big Data should evolve. It should start with a small project and then expand as learnings accumulate. For example, a restaurant chain might start with a summary of top-selling menu items in its stores to get an idea of which ingredients are popular with patrons. Then it can look at its locations to see if those tastes are regional. Learnings from that can then dictate the use of Big Data for deeper things like selection of vendors, quality control, etc.
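
As a rough illustration of the small first project described above, the following sketch summarizes top-selling menu items overall and then per region. The sales records, the region names, and the helper names `top_items` and `top_item_by_region` are all hypothetical, invented only to show the shape of such a summary.

```python
from collections import Counter, defaultdict

# Hypothetical sales records: (region, menu_item, units_sold).
sales = [
    ("Midwest", "cheeseburger", 120),
    ("Midwest", "veggie wrap", 40),
    ("South", "cheeseburger", 60),
    ("South", "fried chicken", 150),
    ("West", "veggie wrap", 130),
    ("West", "cheeseburger", 70),
]


def top_items(records, n=3):
    """Return the n best-selling items across all stores as (item, units) pairs."""
    totals = Counter()
    for _region, item, units in records:
        totals[item] += units
    return totals.most_common(n)


def top_item_by_region(records):
    """Return the single best-selling item in each region."""
    per_region = defaultdict(Counter)
    for region, item, units in records:
        per_region[region][item] += units
    return {
        region: counts.most_common(1)[0][0]
        for region, counts in per_region.items()
    }


if __name__ == "__main__":
    print(top_items(sales))
    print(top_item_by_region(sales))
```

Comparing the overall ranking with the per-region winners is one simple way to see whether tastes are regional before committing to a larger project.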

How to use Big Data is another major concern. If a company has no Big Data expertise in-house, it may be difficult to know what it wants to accomplish with it, how to use it, how to hire for it, or to whom it could outsource. In such a case, a company may want to talk to executive recruiters with experience placing Big Data and Data Science professionals. They can alert the company to the types of skill sets that are available. Then the company may want to talk to a consultant for the sole purpose of problem definition. Another consideration when choosing between outsourcing and in-house work is the security of the data. If the data is sensitive, keep it in-house. Also, if the local talent pool is replete with experienced data science professionals (as it is in most major metropolitan areas), keeping it in-house might be best. If the local talent pool is lacking in these professionals, consider outsourcing or remote workers. Vendors would be best for short-term Big Data projects or for supplementary data (e.g., demographic data).

ROI on big data should be measured based on what the company wanted to accomplish. If a restaurant chain used Big Data to determine which ingredients were most in demand among customers, then did the actions it took from the learnings result in fewer ingredient out-of-stocks? Increased sales? Fewer sales of less popular items? KPIs should be specific to the Big Data project at hand.
