How to Make Black-box Systems more Transparent

This article is intended for users who rely on machine learning solutions offered by third-party vendors. It applies to platforms, dashboards, traditional software, or even external pieces of code that are too time-consuming to modify. One of the goals is to turn such systems into explainable AI.

The Problem I Faced

I frequently write code to solve complicated problems, such as in my recent shape-fitting article. This involves integrating libraries or external code written by various authors. Often, the third-party functions deal with optimization problems. I call this my “back end”. Typically, it consists of well-known algorithms that work more or less well depending on the input data. I don’t have the time to write such algorithms from scratch, nor would it make much sense to. First, existing solutions are good enough and have stood the test of time. Second, I would likely reinvent the wheel and come up with a solution inferior to what already exists. Finally, the code in question is usually not simple. Sometimes I don’t even have access to it, for instance when it is buried in a library.

In short, my “back end” code is essentially an external black box. Here, I describe how to get the most out of such systems. In particular, I discuss how to better understand how they work and how to test them on synthetic data to identify their limits and their strengths. In some ways, this is about reverse-engineering your black box. Of course, the first step is to choose the black box that best fits your needs, including those of your future projects. This requires testing different products offered by different vendors. Work with selected vendors (your short list of finalists) to run blind tests and a proof of concept before purchasing a solution. Pay attention to the quality of customer support and the length of the contract.

Case Studies

In one case, I used a black box despite its known limitations. It is good at what it does, but I wish it were not limited to one special kind of data. I stuck with it for lack of time. In another situation, I was able to put a wrapper around the black box while leaving the core of the algorithm unchanged. Again, it is good at what it does. In a third case, because of the peculiarities of my data, the classic method implemented in the black box produced poor results or failed outright, with only occasional excellent results. I had to explore an alternative. I discuss this last example first.

When the Black Box Does Not Work on Your Data

I selected classic least squares to estimate the parameters of a non-periodic time series consisting of a sum of periodic terms. I discuss the problem in section 3.3 of my article “Machine Learning Cloud Regression: The Swiss Army Knife of Optimization”, available here. My problem is known to be ill-conditioned: solutions are numerically unstable, and the objective has many local minima. So I can’t blame the black box for failing. Yet a solution exists, and I need to find it.

As a temporary fix, I decided to abandon the black box and perform Monte Carlo simulations. This approach is very slow, but it always leads to the solution. Within the black box, the method recommended in the documentation performed worse than the outdated, non-optimized version that you are told to avoid. The best fix is to use another black box available from the same vendor: one that does swarm optimization.
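To make the Monte Carlo fallback concrete, here is a minimal sketch of a pure random search. It is not the setup from the article cited above: the two-frequency model, the parameter bounds, and the function names (model, loss, monte_carlo_search) are assumptions chosen only for illustration.

```python
import math
import random

def model(t, w1, w2):
    # Hypothetical signal: a sum of two periodic terms. With an irrational
    # frequency ratio, the sum itself is not periodic.
    return math.cos(w1 * t) + math.cos(w2 * t)

def loss(params, ts, ys):
    # Sum of squared errors; as a function of the frequencies it has many
    # local minima, which is what defeats local least-squares solvers.
    w1, w2 = params
    return sum((model(t, w1, w2) - y) ** 2 for t, y in zip(ts, ys))

def monte_carlo_search(ts, ys, bounds, n_samples, seed=0):
    # Draw candidates uniformly inside the bounds and keep the best one.
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_samples):
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        value = loss(candidate, ts, ys)
        if value < best_loss:
            best_params, best_loss = candidate, value
    return best_params, best_loss

# Synthetic data with known parameters, so the search can be checked.
ts = [0.5 * k for k in range(40)]
true_params = (1.0, math.sqrt(2.0))
ys = [model(t, *true_params) for t in ts]
bounds = [(0.5, 2.0), (0.5, 2.0)]

coarse = monte_carlo_search(ts, ys, bounds, n_samples=200)
fine = monte_carlo_search(ts, ys, bounds, n_samples=5000)
print("200 samples :", coarse)
print("5000 samples:", fine)
```

Random search needs no gradient and no starting point, so it cannot get trapped in a local minimum. The price is the slow convergence mentioned above: each extra digit of accuracy requires many more samples.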

Takeaways

The previous case leads to a number of key takeaways:

Check different methods or hyperparameters in your black box.
Use challenging data to see how it reacts.
Compare with home-made or external alternatives.
Test on rich synthetic data to find where the black box shines and where it fails.
Check if/when you are stuck in undesirable local minima, and look for convergence issues.
Monte Carlo simulations can help you pinpoint the problems or “debug” the black box.

Finally, without thorough testing, you may not realize that your black box does not work on your specific, unusual data. In my case, it worked very well on some examples. In many others, it returned a result, but a poor one, with terrible performance compared to basic techniques such as Monte Carlo. Also, read the documentation to find out which algorithm the black box uses. Then research that algorithm to learn its limitations. Ask questions or read answers on message boards relevant to your vendor and the algorithm. As a last resort, contact your vendor!

When the Black Box Won’t Handle Some Cases

Again, it may not be obvious when this happens unless you do some testing. The black box may still return a solution, though not a great one, without issuing any warning. In this case study, I used a good, well-designed black-box solution for curve fitting. The issue is that it handles only ellipses, not parabolas or hyperbolas. This is unfortunate, because the code has everything it needs to handle curves other than ellipses. It tests whether the data fits an ellipse and will not run if the test fails. I could have modified the code to handle other shapes, but chances are it was optimized for ellipses. I describe this example in section 3.2 of the same article, here.

Surprisingly, it can handle circles. However, the circle is a degenerate case and can make the algorithm unstable. For instance, estimating the rotation angle of an ellipse makes sense, but for a circle it does not: every angle yields an optimal solution (if you rotate a circle, it stays unchanged). The takeaway here is to test degenerate cases and, if necessary, use another method (possibly available in the black box) to handle such cases.
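The degeneracy is easy to check numerically. The sketch below does not use the curve-fitting black box at all; the helper names ellipse_points and implicit_residual are hypothetical. It generates points on a centered ellipse and measures how well a candidate rotation angle explains them.

```python
import math

def ellipse_points(a, b, theta, n=12):
    # Points on a centered ellipse with semi-axes a, b, rotated by theta.
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x0, y0 = a * math.cos(t), b * math.sin(t)
        pts.append((x0 * math.cos(theta) - y0 * math.sin(theta),
                    x0 * math.sin(theta) + y0 * math.cos(theta)))
    return pts

def implicit_residual(pts, a, b, theta):
    # Largest violation of (u/a)^2 + (v/b)^2 = 1, where (u, v) are the
    # coordinates of each point in the frame rotated by the candidate theta.
    worst = 0.0
    for x, y in pts:
        u = x * math.cos(theta) + y * math.sin(theta)
        v = -x * math.sin(theta) + y * math.cos(theta)
        worst = max(worst, abs((u / a) ** 2 + (v / b) ** 2 - 1.0))
    return worst

ellipse = ellipse_points(3.0, 1.0, theta=0.4)
circle = ellipse_points(2.0, 2.0, theta=0.0)

for candidate in (0.0, 0.4, 1.3):
    print(candidate,
          implicit_residual(ellipse, 3.0, 1.0, candidate),
          implicit_residual(circle, 2.0, 2.0, candidate))
```

For the ellipse, only the true angle gives a residual near zero. For the circle, every candidate angle does, so an estimator has nothing to lock onto. A wrapper can guard against this by skipping the angle estimate when the two fitted semi-axes are nearly equal.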

Finally, estimates for highly eccentric ellipses were biased. This is not mentioned anywhere in the documentation, and again, you need to do a lot of testing, or perhaps read technical articles on the topic, to figure it out. In some cases, bias may be a concern. In my case, it was noticeable but small enough not to matter, especially since the goal was curve fitting rather than prediction.

Some Features of Your Black Box are Great; Others aren’t

In one case, I had a black box to sample points on an ellipse. It worked well, but it would sample only on the full ellipse, not on a partial arc. The front-end function was well written, so it was not difficult to adapt it to handle partial arcs. I did not have to modify the back end (the core and most complicated part of the black box, which computes elliptic integrals). In short, I integrated the black box by surrounding it with a wrapper.

In another example, I realized that the K-means procedure I was using in Python did not handle one-dimensional data. I had to come up with a workaround: essentially, making the one-dimensional data bivariate, with two identical components. Then I had to compute mean squared errors compatible with the ones used in my homemade clustering algorithm. Again, I created a wrapper, leaving the K-means computations to the Python library and taking care of everything else (including mean squared errors) in the wrapper.
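The wrapper idea can be sketched as follows. The function toy_kmeans_2d is only a stand-in for the library call, assumed here to take 2-D points and return labels and centers; it is not the actual library API. The wrapper kmeans_1d is likewise a hypothetical name.

```python
import random

def toy_kmeans_2d(points, k, iters=50, seed=0):
    # Stand-in for a library K-means that expects 2-D points (assumption).
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2
                              + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers

def kmeans_1d(values, k, black_box=toy_kmeans_2d):
    # Wrapper: duplicate each value into an (x, x) pair, delegate the
    # clustering to the black box, then report results in 1-D terms.
    points = [(v, v) for v in values]
    labels, centers = black_box(points, k)
    centers_1d = [c[0] for c in centers]
    # Recompute the error in 1-D: a squared distance measured on the
    # duplicated (x, x) points is exactly twice the 1-D squared distance.
    mse = sum((v - centers_1d[lab]) ** 2
              for v, lab in zip(values, labels)) / len(values)
    return labels, centers_1d, mse

values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
labels, centers, mse = kmeans_1d(values, k=2)
print(labels, centers, mse)
```

Recomputing the error inside the wrapper, rather than reusing a figure reported by the black box, is what keeps it comparable with the error of a homemade one-dimensional algorithm.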

As the goal was to compare my method to K-means, it was also important to test various data sets: two clusters with asymmetric sizes, asymmetric variances, and asymmetric probability distributions (with mean different from the median, thus, non-Gaussian). I generated synthetic data — a mixture — for that purpose. In some of my tests, the probability distribution had finite support. In other tests, it had a long tail and outliers.
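Such test sets take only a few lines to generate. The sketch below uses Python's standard random module; the mixture weights and the component distributions are illustrative assumptions, not the ones used in the comparison described above.

```python
import random

def make_mixture(n, weight_a, sampler_a, sampler_b, seed=0):
    # Two-component mixture: each point comes from cluster 0 with
    # probability weight_a, otherwise from cluster 1.
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        if rng.random() < weight_a:
            data.append(sampler_a(rng))
            labels.append(0)
        else:
            data.append(sampler_b(rng))
            labels.append(1)
    return data, labels

# Finite support: a large uniform cluster and a small, skewed triangular one.
finite_support = make_mixture(
    1000, 0.8,
    lambda r: r.uniform(0.0, 1.0),
    lambda r: r.triangular(3.0, 5.0, 3.2),
)

# Long tail: a tight Gaussian cluster and a shifted lognormal cluster
# whose right tail produces outliers.
long_tail = make_mixture(
    1000, 0.8,
    lambda r: r.gauss(0.0, 0.5),
    lambda r: 4.0 + r.lognormvariate(0.0, 1.0),
)

for name, (data, labels) in (("finite", finite_support), ("tail", long_tail)):
    print(name, len(data), sum(labels), min(data), max(data))
```

Because the true labels are returned along with the data, the same sets can score both the black box and a homemade method. A fixed seed makes every run reproducible.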

Takeaway: When using a black box, you must test it against the full range of typical scenarios, not just one or two, nor even 10,000 almost identical data sets. In my case, about ten well-selected synthetic data sets covered the most frequent situations.

When a Wrong Solution is OK, and When it is Not

In the context of dynamical systems, people routinely use recursions that lead to totally wrong computations in fewer than 50 steps. The reason is that errors propagate exponentially fast. The simplest example is as follows:

X_{n+1}=b X_n - \lfloor b X_n\rfloor,

where X₀ is the initial value called “seed”, b > 1 is an integer, and the brackets represent the integer part (floor) function. If you use such sequences to average some statistics, this is not an issue: it is like starting with a new seed every 50 iterations because the process is ergodic. You may not even notice the problem, and it is not a big deal. But it is a major issue if you analyze long-range auto-correlations within a specific sequence. In that setting, the case b = 2 leads to complete failure on most platforms due to how computers perform binary arithmetic. Try it in Excel or Python with any irrational number for X₀. After about 50 iterations, all the X_n are zero! There are workarounds, like using b = 2 − 1/2³¹ instead of b = 2.
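The collapse is easy to reproduce. Here is a minimal sketch in standard double-precision Python; the function names iterate_map and first_zero_step are mine. With b = 2, each step discards the leading binary digit of X_n, and no new digits come in. A double carries only 53 significant bits, so the supply runs out quickly.

```python
import math

def iterate_map(x0, b, n_steps):
    # Orbit [X_0, ..., X_n] of X_{n+1} = b*X_n - floor(b*X_n), in floats.
    xs = [x0]
    for _ in range(n_steps):
        y = b * xs[-1]
        xs.append(y - math.floor(y))
    return xs

def first_zero_step(xs):
    # Index of the first exact zero in the orbit, or None if there is none.
    for n, x in enumerate(xs):
        if x == 0.0:
            return n
    return None

# Irrational in theory, but stored as a finite binary fraction.
seed = math.sqrt(2.0) - 1.0

for b in (2, 3, 4):
    orbit = iterate_map(seed, b, 80)
    print("b =", b, "first zero at step", first_zero_step(orbit),
          "last value", orbit[-1])
```

Printing the step of the first exact zero, rather than only the final value, shows how fast the orbit dies for each b. That makes it a cheap check to run on a seed before trusting a long simulation.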

The above problem may go undetected if you automatically run the algorithm thousands of times in production mode (with no human interaction), sometimes with an integer b that works and sometimes with one that does not. The result is a mix of good and poor outcomes.

Takeaway: The problem is not just with b = 2, but whenever b is even. You should watch out for cases like that, especially when using a black box. Sometimes, problematic cases are described in the documentation or discussed in community forums.

About the Author

Vincent Granville is a machine learning scientist, author, and publisher. He was the co-founder of Data Science Central (now part of TechTarget) and, most recently, the founder of MLtechniques.com.