Subscribe to DSC Newsletter

Latest update: November 16, 2018

Microsoft Excel has been around for over 30 years now, and chances are it's not going to change in the foreseeable future. In fact, Excel is facing immense competition from challengers such as Google Spreadsheets and well-funded start-ups like Airtable, which are both going after Excel's massive user base of approximately 500 million worldwide. Tech-savvy small and mid-sized businesses embrace innovative alternatives to Excel. However, making a dent in the large enterprise space is a whole different ballgame. It's nearly impossible to rip and replace Excel along with Office 365 and the massive underlying infrastructure, which is serving ever-growing workloads. Not that it can't happen, but it might take decades.

The world runs on Excel, whether we like it or not

I recently spoke to a large consultancy which serves an enterprise with revenues north of 100 billion USD annually: "They run entirely on Excel." as one of the consultancy's top executive jovially confessed. This is not an exception. Let's take as an example a bank holding trillions of USD assets: How do they manage their credit risks? The chances are that their thousands of risk managers are holding one of the most valuable assets buried in hundreds of thousands, if not even millions of spreadsheets, scattered across the entire organization.

This is indeed no exception. Insurance companies? Pharmaceuticals? Retail? I would make an educated guess here, that very likely trillions of USD in annual revenues globally literally run through billions of spreadsheets.

How could things even get this far? Maybe Excel is being misused as a temporary quick-fix which has never matured into a permanent, enterprise-grade solution.

Enterprise-grade, Mezzogiorno style: Living with temporary solutions for ever

The Temple of Hera at Selinunte (Temple E) | Source: Wikipedia (image released into the public domain by its creator)

Southern Italy, commonly referred to Mezzogiorno, is amongst the most beautiful, yet poorest and worst-governed parts of Europe. A good friend of mine, who lives in Sicily, enjoys an enviably vast number of sunny days, beautiful sceneries and mouth-watering cuisine. To wine aficionados' delight, some of the world's best wineries are located in this region. On the flip side though, daily life can be harsh. "This whole region is depending on temporary, quick-fix type of infrastructure, which is never going to be replaced with permanent solutions. It's a miracle if things don't break." so said my dear friend over a bottle of red wine from east-Sicily. But what if they do?

It might sound like a gross exaggeration at first glance. But are IT departments acting that differently from officials in Sicily operating at the verge of possible a collapse? What could go wrong? A lot. Oracle, historically a life-or-death Microsoft competitor, put together a list of biggest business failures due to Excel misuse. The so-called "London Whale" incident, for example, caused JPMorgan a whopping 6.5 billion USD in losses and fines. The bank's risk analysts miscalculated the downside of its synthetic credit portfolio as a result of excessive formula-shuffling in Excel.

Excel was introduced in 1987. People were using fax machines back then

I bought my first laptop in 1992. It came with a back then great feature: It allowed me to send word documents via my telephone line to the recipient's fax machine. That was a great invention, which allowed me as a former copywriter to submit my work to my boss, who would manually review the printouts and send them back via fax. Excel was five years old back then, and I guess most people have not even heard of the internet yet.

I started using Excel approximately five years ago, and I quickly turned into a power user. To Excel's rescue, I have to admit that I am still joyfully using it on a daily basis. It has a wealth of features, many of which even seasoned Excel users are not aware of. However, we should take a hard look at Excel: Is this still the all-purpose solution for corporate data? It's certainly not.

Excel's role in the enterprise needs to be redefined

First and foremost: Is Excel the right place to hold mission-critical data? Large enterprises tend to have highly formalized, very tight IT security policies. However, on the other hand, the misuse of spreadsheets as the primary engine for multi-billion USD lines of businesses is entirely out of control. This is very much like owning a house within a gated community, protected by various alarm systems, and then handing over the keys to your teenage kids over the weekend – leaving a richly equipped bar unlocked and your jewelry and life's savings in cash stuffed into a cardboard box on the kitchen table.

However, what other options do business users have?

It's hard to imagine a corporate world without spreadsheets. Very much like it's hard to imagine a teenage house party without their parents' wine & whiskey collection up for grabs to everyone who shows up at the doorstep.

Your bank though, in case they manage their trillion USD crown jewels via spreadsheets, does so quietly. Consultancies helping large enterprises overcome their spreadsheet-addiction act just as secretively as the Betty Ford Center trying to protect their rich and famous clientele from reputational damage. The stakes are high.

Python is for data scientists. Is it?

Most companies I came across still bucket their knowledge-working staff into two categories: those with coding skills and those without them. Due to Evans Data Corporation EDC, there are roughly 23 million software developers worldwide. How many data scientists are out there? One can only estimate, for example, based on the total Jupyter project user base of presently 3 million people. Even if we round up the numbers generously to 30 million people with programming skills, this still represents less than 1% of the world's total adult population. This should ring everyone's alarm bells. Literally, everyone's.

Lack of programming skills is nothing else but illiteracy

Around the time when book production started to grow, roughly 12% of the adult world population was able to read and write. Today, around 86% of all adults worldwide are literates. At the same time, less than 1% of the total adult population is capable of reading and writing code.

"While the earliest forms of written communication date back to about 3,500-3,000 BCE, literacy remained for centuries a very restricted technology closely associated with the exercise of power." Ourworldindata.org

Literacy is associated with the exercise of power

The accumulation of wealth in the hands of few individuals and corporations can be well observed in rankings provided by Forbes and Fortune. An ever-growing amount of fortunes is being made of technology. However, is there a connection between technology literacy and the accumulation of wealth and power? There is not much hard evidence to answer that question, as this subject is not well researched yet.

In spite of a lack of research: What assumptions can we make regarding coding skills in the business world?

Want to empower business users? Teach them to code

What had happened, if Amazon Alexa was invented in the 16th century, when only less than 12% of the population was able to read and write? There would probably be little incentive for people to learn how to communicate in written form. Why invest time and effort into mastering the skill of writing tiny, little signs on a piece of paper when you can yell: "Alexa! What's the weather gonna be in LA tomorrow?" Of course, it's convenient. However, convenience comes at a high price. Also, as users get accustomed to the comfort of yelling and mumbling their commands into a smart device, they grow increasingly scared of the foundational skill of reading and writing: "It must be very tough to master …"

Is coding in Python hard? Not harder than writing in natural language

Python is a high-level language. It's meant to be readable by humans and not surprisingly most Python tutorials start with writing print("hello world"). Guess what print("hello world") does? It prints the two worlds hello world. The process of displaying those two words on your computer screen is abstracted away. This is why Python is also called a declarative language. You, the author of your code, declare what you want the program to do: print hello world, perform a calculation or visualize data. The rest is being taken care of. An imperative programming language lets you describe the control flow, meaning: You have to write how exactly the program is going to execute printing hello world, perform a calculation or visualize data. By comparison, this is like driving a car with an automatic gear shift (declarative) vs. one with a manual gearbox (imperative).

High level, declarative languages are making programming incredibly easy. Yet, less than 1% of the world's adult population is capable of writing and reading code. Not surprisingly,  businesses start discovering Python as an alternative to Excel for their mission-critical workloads. JPMorgan just recently made headlines with their effort to teach their analysts to code, very likely thousands of employees across the entire organization. Yes, the same JPMorgan which lost 6.5 billion USD in the course of the "London Whale" disaster. Maybe they learned their lessons that not everything that can be done in Excel should also be done in Excel?

If you have ever written an Excel function, you can do the same in Python and much more

Getting started with Python is just as easy as picking up Excel. Every Excel user knows how easy it is to sum up numbers in a column. Just write =SUM and then select the cells you want to sum up. In case you want to sum up cells which meet specific criteria, you can write an =IF or =IFS statement. All this can be done in Python as well. As an advanced Excel user, you might ask yourself: "Why should I learn Python if I already know how to do the same in Excel?" That's an excellent question, and I kept asking it myself back a few years ago. Here's my answer:

#1 Mastering Excel is harder than wrapping your head around Python

Excel is not just one product, but three products in one:

  • Excel as you know it with its worksheets and function language
  • Power Query for mashing up and cleansing data
  • Power Pivot as the calculation engine for Pivot Tables

If you want to work with spreadsheets, clean up your data and perform advanced calculations on multiple tables connected into one data model - sure you can do that in Excel (surprisingly, most advanced Excel users are not aware of that). However, you have to juggle around with three different UIs and three separate languages: Excel's regular function language, M-Language in Power Query and DAX (Data Analysis Expressions) in Power Pivot. Oh, did I mention VBA?

I find it more convenient to do all of the above in one language with one consistent syntax and semantics. By the way, why do I prefer Python over R? Because Python is easier to learn. Ease of use matters a lot to me.

#2 People around you can understand what you write in Python

Data scientists are often left clueless when business users throw their Excel spreadsheets over the fence. Even as a business user, it's hard to read and interpret someone else's spreadsheet. Now imagine a data scientist, who has never worked with Excel spreadsheets.

Many data analysts and data scientists use Jupyter Notebooks. I call it "Word for Geeks" because Jupyter Notebooks allows you to tell your story in natural language using headlines and body text like in a Word document, and in addition to that you can write and execute code in the same place.

Once I discovered Jupyter Notebooks, I instantly fell in love with it, because it allows me to communicate my ideas in natural language based on data I prepare, analyze and visualize in Python – all in one document. If I hit the wall, I can share my Jupyter Notebook with a data scientist, and he can just read through my document, understand my data analysis and pick up where I left. Once he's done with his work, I can read through it and see what he has done.

Business users and data scientists finally united in love!

#3 Python is open source, and the community is very active and creative

I can't think of a data related problem that I would not be able to solve in Python in conjunction with one of its libraries. Whether it's extracting data from Excel, cleansing data, performing calculations, visualizing data or utilizing various APIs - there is certainly a package out there.

In a future not so distant from now, I envision analysts within banks, for example, utilizing Python packages built internally specifically for their proprietary risk assessment. No more murky Excel functions nobody other than its creator comprehends. Just one repository with one centrally maintained set of Python packages for the bank's internal risk analysis and assessment.

"But didn't you say open source?" some concerned voices might ask. Open source should not be misinterpreted as "open door". You can apply any security level you want and thus ensure that your packages with your proprietary risk models are only available to a selected group of people within your organization and nobody else.

#4 Utility grade is where Python shines and not Excel

There is a growing number of data analytics and data science PaaS solutions which are built with petabyte-scale, highly collaborative use cases in mind such as Cloudera Altus, Microsoft Kusto and T-Systems Data Science Workstation.

If your business requires you to crunch vast amounts of data in near real time, and if you are increasingly faced with semi-structured and unstructured data, you are better served with Python, Jupyter Notebooks and a big data cluster within one of the aforementioned PaaS solutions.

#5 You want to keep your job

Due to recent McKinsey research, around 300 million workers need to get retrained to meet the requirements of job markets by 2030. I wrote two blog posts on transitioning from a line of business career into data analytics and data science in my previous posts.

Programming is all about automation. The more companies automate their processes, the less they need to rely on people performing mundane tasks. "Do you want to keep your job?" it all comes down to this question. If your answer is "Yes," you should consider learning to code. Python is a good starting point.

I work as a data literacy expert, and I cater to large companies in Europe and the US. Maybe you have questions I didn't answer in my write-up? Please leave a comment or reach out to me via email [email protected] or LinkedIn.

Views: 6040

Comment

You need to be a member of Data Science Central to add comments!

Join Data Science Central

Comment by Ramesh Gopal on October 27, 2018 at 7:52pm

Hi Rafael,

I agree with what you're saying, and also what the other commenter has said.

Many consultants, IT experts and even risk managers have pointed out for decades about how fragile spreadsheets are, and the risks posed to businesses who rely on Excel (or spreadsheets in general), without adopting proper safeguards.

However, Excel is the ultimate business software tool. Its very weaknesses, which you clearly point out in your article, are also its enduring strengths. It will be near impossible to dislodge Excel from its current position as it fills an important niche or gap in enterprises as THE information tool which runs business workgroups, departments and divisions, especially when corporate IT takes ages to deliver functionality end-users want.

And trying to teach programming to existing workers isn't likely to succeed in a big way for several reasons.

One, not everyone has a brain wired to "get" code. Studies have shown that you need a certain type of abstract thinking capabilities to understand programming.

Two, those in operations who are taught to code will still find coding time-consuming and frustrating if the skill is not exercised day-in, day-out. Developers do this as part of their jobs, but operations people have real work to do, in addition to using Excel as a data and model repository.

Three, daily work pressures and deadlines also mean that enterprises aren't likely to send their personnel for programming training en masse.

However, the future is bright for the coming generation. They're likely learning programming in school and such fundamental skills and concepts, learnt at a young age, will typically persist when they enter the workforce.

So what is called for today is pragmatism, and figuring out how to co-exist with spreadsheets, while trying to minimize the damage done by spreadsheets gone bad.

Comment by Ulf Morys on October 14, 2018 at 12:15am

Hi Raffael, great post - and I absolutely second the observations.

One thing, though, merits to be mentioned explixitly: the core of the Excel problem is simply the fact that, at no point, there is a clear seperation of data from logic in Excel sheets. You see results - which is usually good. But if you don't see what you expected, it's very tedious to identify if (a) there really is an unexpected result  or (b) there is a problem with the data or (c) a problem with the logic.

The reason is obvious and simple: any Excel cell may hold either data...or a function and thus logic.

I have stopped counting the sheets and models I have seen where just a single, accidental copy&paste or data entry has produced huge confusion and loss of time.

The beauty of Pyhton (...and Pandas, I suppose)  in combination with Jupyter Notebook (...we seem to have fallen for the same solution): You start out with a clear, pure data set, you apply a clear logic (aka run the data thru the code) and end up with data again.

The approach seperates clearly data from logic - and the logic becomes better understandable and more clearly documented and reproducable with the Notebook.

Really, if I had to express the advantage of Python over Excel in a single statement it would be just this: Seperation of logic from data.

Videos

  • Add Videos
  • View All

Follow Us

© 2018   Data Science Central ®   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service