Hi, I am a B.E. fresher. I have joined a data science team and am doing LDA-related work, and I am very interested in data science. Am I getting into data science too soon? Someone on my team said that in the long run it will hurt my career because I won't gain any business knowledge. Is that true? Thanks in advance for your reply.
No such thing as "into data science too soon". Everyone should have a basic fluency in statistical thinking. That said, the learning curve for data science is very steep and long and hard. The business or functional end can always come later.
I will give you an example. I came in with zero exposure to reliability, maintenance, or wind engineering, and in under a month I was able to hold a technical debate about operations and maintenance with very senior wind engineers (stakeholders) who had been doing their thing for 20+ years, and I succeeded in changing their operational plan and their perceptions of the economics of their business. My work effectively revised their entire annual operating plan and budget, based on a spare parts forecast that was tested in real time and turned out to be quite accurate several months in advance. I was able to tease out (blind) insights, such as the fact that they used two different vendors for a particular (major) electronic component in a wind turbine, based purely on the shape of a bimodal life-cycle distribution, with no knowledge of the vendors they actually sourced from. I was able to take their data and work out that the majority of their part failures were due to a specific brand of electronic components. My statistical model of the gearbox (the most expensive component) was blind tested against a physical/engineering model of the gearbox, and the forecasted curves matched almost perfectly, showing that I could see the wear-out occurring months in advance with only the data and no intimate engineering knowledge of the gearbox itself. Needless to say, the actual engineers in the room were stunned that I knew what was going on with their operations without ever having seen a physical wind turbine.
I should point out that it wasn't easy to forecast those failures either: you aren't just taking a bunch of points and drawing a line through them. It required that I reverse-engineer the life cycle of the fleet itself, using back-of-the-envelope calculations of the theoretical performance characteristics of these turbines, derived from OEM marketing materials.
That's how we showed not only that I could forecast their failures, but also that I could work out their existing operating budget, using OEM marketing materials as a benchmark for how under-budgeted they were going to be. That required a lot of hard thinking, because they don't teach you how to do that in any class. When you go into a room full of skeptics who want you to fail, don't expect them to give you their most prized data up front. Sometimes you have to reverse-engineer things because of the politics of a situation, and not because "data is not available".
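To make the idea of a spare parts forecast a bit more concrete, here is a minimal sketch. It is not the model from the project described above. It assumes part lifetimes follow a Weibull distribution, which is a common textbook choice in reliability work, and the fleet ages, shape, and scale values are made-up numbers used only for illustration.

```python
import math


def weibull_survival(age, shape, scale):
    """Probability that a part is still working at the given age."""
    return math.exp(-((age / scale) ** shape))


def conditional_failure_prob(age, horizon, shape, scale):
    """Probability that a part which has survived to `age`
    fails within the next `horizon` time units."""
    s_now = weibull_survival(age, shape, scale)
    s_later = weibull_survival(age + horizon, shape, scale)
    return 1.0 - s_later / s_now


def expected_failures(ages, horizon, shape, scale):
    """Expected number of failures across the fleet over the horizon,
    i.e. a rough count of spare parts to budget for."""
    return sum(conditional_failure_prob(a, horizon, shape, scale) for a in ages)


# Hypothetical fleet: current age of each installed part, in months.
fleet_ages = [6, 12, 18, 24, 30, 36, 48, 60]

# Assumed Weibull parameters (illustrative only, not fitted to real data).
# A shape above 1 means the failure rate rises with age, i.e. wear-out.
SHAPE = 2.5
SCALE = 72.0  # characteristic life, in months

forecast = expected_failures(fleet_ages, horizon=12, shape=SHAPE, scale=SCALE)
print(f"Expected failures in the next 12 months: {forecast:.2f}")
```

In practice the shape and scale would have to be estimated from failure records (or, as in the story above, reverse-engineered from whatever information is available), and a bimodal lifetime distribution would call for a mixture of two such curves rather than one.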
I disagree with all the posts claiming that generalists can't be great data scientists. I would argue that a data scientist, by virtue of their statistical thinking and applied computational skills, must be a generalist, able to parachute into a random domain, learn it, and apply their knowledge to make a difference that is measurable and tangible to the bottom line of a business.
This has nothing to do with coding skills or whatever the popular hype around data science would have you believe. It is the ability to think about and structure problems that are inherently unstructured when they are presented to you. That ability to work out principles, laws, or rules from an otherwise unstructured situation is the essential skill of a data scientist. The second most important thing is being able to quantify the business value of any insights you generate. No one acts on your recommendations unless changing the status quo is worth something to them.
This is what it takes to compete in the top 1% of data scientists. Notice how I never referred to stochastic gradient descent or deep learning or Python even once. Oops, I just did. Oh well.
*Footnote: The senior stakeholder in the group asked me point-blank at one point, "How did you get our operating curve? I didn't provide that data... [and proceeded to confer with the other engineers in the room, who all nodded in approval as to its accuracy]", to which I replied, "I reverse engineered it from [...]". That is how you change the hearts and minds of skeptical customers and engineers who secretly want you to fail, because they believe that you, a n00b to their business, don't know as much as they do about their assets, and they don't want to spend any money changing anything.
A shorter answer to your question: A great data scientist will often develop more intimate knowledge of a business than the stakeholders in the business.