An Intriguing Job Interview Question for AI/ML Professionals

In my last project, I had to come up with some code and algorithm to solve an interesting problem. I realized that it could lead to some off-the-beaten-path job interview question. The problem is a fundamental one. The level ranges from elementary school to one of the most difficult unsolved problems of all times, depending on how deep you dig into it. It is a question that ChatGPT could not invent, at least until today. And definitely not a question that it could answer, if you were allowed to use this app in a job interview.

Now I discuss my interview question in details, with four possible difficulty levels.

Level 1: an elementary school problem

The question could be: what is the value of 327 multiplied by 1149? You need to explain why your computation works, and offer an alternative method, hopefully faster. With the extra requirements, it quickly becomes a question for high school students, or a warm-up question in a real job interview. In 50 years, it could become a very challenging question, if nobody learn multiplication tables anymore.

abacus — This tool can be used to solve the level 1 problem

Level 2: a job interview question

At this level, it is a tech question of average difficulty in a typical job interview. The question is about writing code to solve this problem: given a number X and two positive integers p, q, compute the binary digits of the products pX and qX. It must work even if X has billions of digits. In addition, compute the correlation between the first n digits of pX and qX, for large n, assuming the digits of X are random. Do you notice a pattern? I provide the solution later in this article, as well as the reason why I am interested in this problem.

Level 3: the topic of a PhD thesis

I break down the problem into a number of steps. By digits, I mean the digits starting after the decimal point.

Prove that the answer to the level 2 question, regarding the correlation, is 1/(pq). This is true assuming that p, q are co-prime odd integers.
Assuming that the binary digits of SQRT(2) and SQRT(3) behave exactly like random bits, using the above result, prove that the corresponding two digit sequences are not correlated. For SQRT(12) by SQRT(75), the correlation is 1/10. Generalize. Illustrate with numerical computations involving one trillion digits, and convergence of the correlation to zero in the first example, and to 1/10 in the second.
Find a very fast algorithm to compute the binary digits for square roots of non-square integers. First, look at the integer square root concept, and Python libraries such as gmpy2. Then, see if you can do better. I actually have a solution to this problem.

Level 4: to win the Nobel prize in mathematics

According to the legend, there is no Nobel prize for math because a mathematician was carrying on an affair with Alfred Nobel’s wife. You may double-check on Wikipedia or ChatGPT. That said, there is the equally prestigious Fields Medal. You will win it if you prove that indeed, the digits of SQRT(2) and SQRT(3) are uncorrelated, or more specifically, that these digits can not be distinguished from pure random noise. This is level 4 for my question. Countless people including top geniuses have worked on this problem for decades, to non-avail. It is safe to say that there will be human beings walking on planet Mars before this problem gets solved – if ever.

I have been working on it for well over a decade, on occasion sharing progress with top experts in the field such as David Bailey. I am still nowhere close to a solution.

Solution to the level 2 problem

The computation of the digits in question are a main component of my new money game. You can check it out in my article “Synthetic Stock Exchange Played with Real money”, available here. The code below computes the binary digits of the products pX, qX, and the cross-correlation, for a random number X. In short, the answer to the level 2 question. The algorithm in question is known as grade-school multiplication.

# Compute binary digits of X, p*X, q*X backwards (assuming X is random)
# Only digits after the decimal point (on the right) are computed
# Compute correlations between digits of p*X and q*X
# Include carry-over when performing grammar school multiplication

import numpy as np

# main parameters
seed = 105
np.random.seed(seed)
kmax = 1000000
p  = 5
q  = 3

# local variables
X, pX, qX = 0, 0, 0
d1, d2, e1, e2 = 0, 0, 0, 0
prod, count = 0, 0 

# loop over digits in reverse order
for k in range(kmax): 

    b = np.random.randint(0, 2)  # digit of X
    X = b + X/2  

    c1 = p*b
    old_d1 = d1
    old_e1 = e1 
    d1 = (c1 + old_e1//2) %2  # digit of pX
    e1 = (old_e1//2) + c1 - d1
    pX = d1 + pX/2

    c2 = q*b
    old_d2 = d2
    old_e2 = e2 
    d2 = (c2 + old_e2//2) %2  #digit of qX
    e2 = (old_e2//2) + c2 - d2
    qX = d2 + qX/2

    prod  += d1*d2
    count += 1 
    correl = 4*prod/count - 1

    if k% 10000 == 0:  
        print("k = %7d, correl = %7.4f" % (k, correl))  

print("\np = %3d, q = %3d" %(p, q))
print("X = %12.9f, pX  = %12.9f, qX  = %12.9f" % (X, pX, qX))
print("X = %12.9f, p*X = %12.9f, q*X = %12.9f" % (X, p*X, q*X))    
print("Correl = %7.4f, 1/(p*q) = %7.4f" % (correl, 1/(p*q)))

About the Author

Vincent Granville is a pioneering data scientist and machine learning expert, founder of MLTechniques.com and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author and patent owner. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, InfoSpace. Vincent is also a former post-doc at Cambridge University, and the National Institute of Statistical Sciences (NISS).

Vincent published in Journal of Number Theory, Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math and probabilistic number theory.