I listened once to an old professor talking about working out factorial ANOVA and multiple regression on paper, back in the day. He described a whole room full of papers, all in a particular order, every sum contributing to the next sum and if you got one number wrong the whole thing would be wrong and you'd have to start over. Adding machines made it go quicker, and when punch-card mainframes came they were, well, frustrating at times -- lots of graphic and emotional stories there! -- but you could get a big job done overnight. By the end of the nineties everybody had a computer on their desktop, mostly with a connection to the brand-new Internet. SAS, SPSS, and other statistical and data-management packages became crucial office workhorses, and don't forget Excel. "Data-crunching" became a word that everyone recognized. And now, Lord have mercy, this stuff is exploding. The race at the moment is between R and Python, and who in your office knows how to use either one of those? (And let's not argue about who the race is between!)
Through all of this technological change, people were doing their work. Whether it was the military or medicine, insurance or administration, people were taking in lots of data and trying to wring the meaning out of it, so they could make decisions. When some new technology came along you didn't re-invent your work assignment; you adapted the new methods to facilitate what you were doing already. Economists, statisticians, physical scientists, businesses, the military: each picked up a set of tools that worked for them, and focused on the aspects of data analysis that were most useful for their field.
The different disciplines separated into camps that could hardly communicate, and that is a great topic for another time. Today I want to focus on the immediate fact that, within a discipline, within an office, the same people kept their jobs and kept doing what they were doing, with new tools. You don't fire people because their skills are suddenly obsolete. Everybody has to change.
As a swarm-intelligence guy, I think of this in terms of a fitness landscape. We have a multimodal landscape here where you can accomplish a task in Excel, writing macros and pivot tables and hoping it doesn't interpret your numbers as dates; the product is robust, it's easy to hire people with Excel skills, and you can get professional support. Or you can whip out the same job using a readily-available and well-tested -- and free -- Python package, where when you hit a brick wall you turn to the forums. Each of these problem solutions exploits a local minimum in a search space -- the right Excel macros and office skills can accomplish the task, and the right Python skills and code can accomplish it, as well.
These are two valid ways to analyze data, in other words, and of course not the only ones -- I love SAS and have done everything in SAS for many years. SAS is a whole other approach, maybe the most solid of all but still, if you don't have SAS skills in the office already you're SOL there.
Between these two local minima -- Excel and Python -- I'd have to say that if you stepped back and evaluated the situation you would conclude that the Python package has the advantage over Excel. It is more versatile, more powerful, and it has a universe of open-source, shared tools around it, including powerful statistical and data-analytic methods. Also it has a bright future. Let me know when there is a "natural language tool-kit" for Excel! (BTW, if you like, you can substitute "R" for "Python" here, the case is the same.)
I think of change as optimization or algorithmic problem-solving. Offices typically use a sort of real-world gradient descent, an incremental approach to fine-tuning the way they do what they do: they improve and refine their methods over years. They pick up a trick here and there, the office tool adds a feature, and your office may be very near the best point in that local optimum.
But you, as a manager, realize that there is a deeper optimum in the problem space. And now your problem is to reorient your employees so that they can hop over the obstacles to the new optimal region. You are proud of how your group has improved their processes, but now you need to leap to a new way of doing things. It is not just a little different, it is very different. You need a "big change."
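To make the landscape picture concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not anything taken from a real office: the function f is an invented one-dimensional "cost" with two valleys, and the names grad_f and gradient_descent, the learning rate, and the starting points are all assumptions. It shows plain gradient descent settling into whichever valley is nearest to where it starts, even when a deeper valley sits on the other side of a hump.

```python
# Toy landscape with two valleys (an invented function, for illustration only).
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f, worked out by hand.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, learning_rate=0.01, steps=2000):
    # Take many small downhill steps from the starting point x.
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


# An "office" that starts on the right refines its way into the shallow valley.
near = gradient_descent(2.0)
# One that starts on the left ends up in the deeper valley.
far = gradient_descent(-2.0)

print(f"start at  2.0: settles at x = {near:.3f}, cost = {f(near):.3f}")
print(f"start at -2.0: settles at x = {far:.3f}, cost = {f(far):.3f}")
```

Small steps never carry the first run over the hump between the valleys, no matter how many of them it takes. That is the sense in which incremental refinement alone will not get you to the deeper optimum.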
As usual, your two basic approaches are hiring and training. But let me note that your somewhat out-of-it, productive-but-borderline-obsolete office might not hold a lot of appeal for a person just out of college, jazzed up with fully up-to-date data-science skills. You might be able to bring in a contractor, but I think the best approach is to hire -- or turn to -- a motivated person with the background and potential to pick up new techniques, and give them the opportunity to prove themselves, to advance and learn in an exciting new field. And that means you have to invest in training; there is no way around that. Scan the headlines this month, compared to six months ago, and you will see that change itself is part of the essence of data science. Your people don't have to know every newest thing, but they need to be empowered to identify when there are new tools out there that can help in your workplace, and they need to have the perspective to bring new approaches and techniques on-board.
Now the big problem, the organizational problem, is changing the habits of people who have been bringing home a paycheck all these years and really don't care about your data-science voodoo.
Years ago I led a multi-year project to migrate a huge monthly survey from paper-and-pencil to tablet computers (handheld Windows machines). It was a big job, but we got 'er done, and when the new system was deployed to the field, the data collectors refused to use it. They printed out their paper forms and collected data the old way, then typed the whole thing into their tablet at the kitchen table.
This is going to happen to you if you try to get people to change. Exactly that. People are going to try to keep you off their backs, while they do what they already know how to do.
And I think that is great, it is perfectly normal. Let them rebel. Let them analyze their data in the spreadsheets they know, while the person in the next cube uses Python (or R) -- teach them all how to use Python for the task, but don't worry if most of them keep doing what they were already doing. Maybe you can insist that they give you some chart or table that requires Python, so they have to fire it up and run it occasionally. But you don't have to push.
Resistance is an inevitable part of change. There is nothing wrong with it, though your job is to keep a long-term view of your office and not let resistance win in the end. I say, let your people move at their own pace, with gentle encouragement from you; you guide them toward the goal you desire and accept that it might take some time. Trust me, they will notice that the person in the next cube gets their work done and has spare time on their hands. And at some point they will realize that keeping their job will depend on getting good with this new stuff. Change will happen. You can force people to adopt your shiny new method and it might work, but a little respect and tolerance will work, too. After some time you will discover that your office has painlessly made a transition. They will feel proud, will feel that it was something they did themselves, and your office will ratchet up to a new, higher level.
In the particle swarm algorithm, once one particle has discovered a better optimum it attracts other particles to that region. The algorithm is based on a model of human social psychology, and it can work in human populations as well as in a computer. When people see that the "new way" is working for so-and-so over there, they will want to use it, too. It just won't make sense to keep doing things the hard way.
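For readers who have not seen the algorithm, here is a bare-bones sketch of a particle swarm in Python. It is a simplified textbook-style version under my own assumptions, not a reference implementation: the two-valley function f is an invented stand-in for the office landscape, and the swarm size, the coefficients w, c1 and c2, the search range, and the fixed random seed are all illustrative choices. Each particle is pulled partly toward the best spot it has found itself and partly toward the best spot anyone in the swarm has found.

```python
import random


# Invented two-valley landscape; the deeper valley is the one on the left.
def f(x):
    return x**4 - 3 * x**2 + x


def particle_swarm(n_particles=20, steps=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)  # fixed seed so repeated runs match
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n_particles)]  # positions
    vs = [0.0] * n_particles                                   # velocities
    personal_best = xs[:]            # best position each particle has seen
    global_best = min(xs, key=f)     # best position the whole swarm has seen

    for _ in range(steps):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Keep some momentum, drift toward your own best,
            # and drift toward the best that a neighbor has found.
            vs[i] = (w * vs[i]
                     + c1 * r1 * (personal_best[i] - xs[i])
                     + c2 * r2 * (global_best - xs[i]))
            xs[i] += vs[i]
            if f(xs[i]) < f(personal_best[i]):
                personal_best[i] = xs[i]
                if f(xs[i]) < f(global_best):
                    global_best = xs[i]
    return global_best, xs


best, final_positions = particle_swarm()
nearby = sum(1 for x in final_positions if abs(x - best) < 0.1)
print(f"best position found: x = {best:.3f}, cost = {f(best):.3f}")
print(f"{nearby} of {len(final_positions)} particles ended within 0.1 of it")
```

Nobody forces a particle to move. The pull toward the swarm's best-known position is just one term in its update, which is roughly the "person in the next cube" effect described above.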
Somebody in your office is eager to learn something new. Identify that person, train them, give them the tools, work with them to meet your office needs in a new environment. Let them work in that environment, alongside your old-timers. You'll see, they'll change.