I describe here the ultimate number guessing game, played with real money. It is a new trading and gaming system, based on state-of-the-art mathematical engineering, robust architecture, and patent-pending technology. It offers an alternative to the stock market and traditional gaming. This system is also far more transparent than the stock market, and can not be manipulated, as formulas to win the biggest returns (with real money) are made public. Also, it simulates a neutral, efficient stock market. In short, there is nothing random, everything is deterministic and fixed in advance, and known to all users. Yet it behaves in a way that looks perfectly random, and public algorithms offered to win the biggest gains require so much computing power, that for all purposes, they are useless -- except to comply with gaming laws and to establish trustworthiness.
We use private algorithms to determine the winning numbers, and while they produce the exact same results as the public algorithms (we tested this extensively), they are incredibly more efficient, by many orders of magnitude. Also, it can be mathematically proved that the public and private algorithms are equivalent, and we actually proved it. We go through this verification process for any new algorithm introduced in our system.
In section 4.1, we offer a competition: can you use the public algorithm to identify the winning numbers computed with the private (secret) algorithm? If yes, the system is breakable, and a more sophisticated approach is needed, to make it work. I don't think anyone can find the winning numbers (you are welcome to prove me wrong), so the award will be offered to the contestant providing the best insights on how to improve the robustness of this system. And if by chance you manage to identify those winning numbers, great, you'll get a bonus! But it is not a requirement to win the award.
1. Description, Main Features and Advantages
2. How it Works: the Secret Sauce
3. Business Model and Applications
4. Challenge and Statistical Results
5. Designing 16-bit and 32-bit Systems
1. Description, Main Features and Advantages
Rather than trading stocks or other financial instruments, participants (the users) purchase numbers. Sequences of winning numbers are generated all the time, and if you can predict the next winning number in a given sequence, your return is maximum. If your prediction is not too far from a winning number, you still make money, but not as much. Our system has the following features:
Comprehensive tables of previous winning numbers are published, even well before a new sequence (based on these past numbers) is offered to players. It helps participants to design or improve their strategies to find winning numbers. Actually, past winning numbers are part of the public data that is needed to compute the next winning numbers, both for participants and the platform operators.
Various ROI tables are available to participants, and you can even design your own ones. If you are conservative, you can choose one offering a maximum return of 10% (for finding the exact value of a winning number), a 54% chance of winning on any transaction, and a maximum potential loss of 4%. This table is safe enough that we will allow you to "trade" on margin. Another interesting ROI table offers a maximum return of 330%, and the same 54% chance of winning on any transaction, with a maximum potential loss of 4%. Keep in mind that this return is what you can make (or lose) in one day, on one sequence. New winning numbers are issued every day for each life sequence, so your return (negative or positive) gets compounded if you play frequently.
If you are a risk taker, you may like a table offering a maximum return of 500%, a 68% chance of winning on any transaction, and a maximum potential loss of 60%. Or another table with a maximum return of 600%, a 80% chance of winning, but a maximum potential loss of 100%. To download all the sample ROI tables discussed in this presentation, click here.
All the sequences currently offered on the market consist of 8-bit numbers: each winning number (a new one per day per sequence) is an integer between 0 and 255. We will soon offer 16-bit numbers. By design, all ROI tables (even if you use a customized one) offer an average return of 0%. This is true regardless of the sequence you are playing with: sequences and ROI tables are independent.
The participant can test various strategies: for instance:
Below, we explain how this works, using a real-life example.
Source for picture: here
2. How it Works: the Secret Sauce
Here is an example of a sequence being tested in our lab. It shows how the winning numbers are computed, for the sequence in question. The purpose is to illustrate the mechanics, applied to one of our 8-bit systems. The 32-bit version offers more flexibility, as well as potential returns that can beat those of a state lottery jackpot. Our sample 8-bit sequence is defined by the public algorithm below.
2.1. Public Algorithm
Start with initial values x(0) and y(0) that are positive integers, called seeds.
Then for t = 0, 1, 2, and so on, compute x(t+1) and y(t+1) iteratively as follows:
If 4x(t) + 1 < 2y(t) Then
y(t+1) = 4y(t) - 8x(t) - 2
x(t+1) = 2x(t) + 1
Else
x(t+1) = 2x(t)
y(t+1) = 4y(t).
2.2. The Winning Numbers
The future winning numbers for a particular sequence, start at a specific machine-generated iteration T that no one knows, not even the platform operator nor its software engineers. Typically, T > 30,000,000 and can be chosen randomly. The iterations represent the time. The future winning numbers are always integers between 0 and 255, and they occur only at iterations t = T, T + 8, T + 16, T + 24, and so on. Their value at iteration t is x(t) - 256 x(t - 8). The reason for skipping 7 out of 8 numbers is to make sure that winning numbers are not auto-correlated.
Past winning numbers are those occurring at iterations t = T - 8, T - 16, T - 24, and so on. The last 2,000 of them are published before the sequence is available (life) on the platform, allowing participants to predict future winning numbers, using the public algorithm or by other means, and make (or lose) money. For our above test sequence, the 2,000 past winning numbers in question, ordered chronologically, are available in this text file.
For each sequence, one new winning number is published each day. So, the time unit used here is 3 hours since one day is 8 x 3 hours. To win the maximum amount, one must correctly predict the winning number attached to a future day. Good and fair approximations also result in a gain, albeit lower. These gains and losses are explicitly specified beforehand, in very precise ROI tables, see below. Finally, by design, the winning numbers are not auto-correlated; they appear independently and uniformly distributed (more so than many software-generated pseudo-random numbers), and do not exhibit any known or visible pattern. In short, they look totally arbitrary, yet generated using a rudimentary formula.
2.3. Using Seeds to Find the Winning Numbers
Most participants are likely to do random trials to find or approximate winning numbers. The few who want to use the public algorithm need extra information to compute winning numbers, and even then, their chance of finding such numbers is virtually zero, due to the tremendous amount of computations required. In short, you need to know the seeds, and when to stop your computations. The stopping rule is simple: you stop when you have found numbers that match the past winning numbers publicly available. Then you known for sure that your next number will be a winning one.
We offer information about the seeds in two different ways:
We guarantee the following:
So we offer you a way to find the next winning numbers, and you know in advance how much you will win when finding them, using the ROI table. The question is: how many years would the most powerful computers in the world need, to make all these computations? By contrast, as of January 2019, only 31.4 trillion digits of Pi are known, and computing them require several months using a lot of computing power, together with very clever mathematical engineering bearing some resemblance to our private algorithms. And checking that all these digits (not just the first few trillion) are correct, is another big problem. Here, if you make any tiny mistake in your computations, you will miss the past sequence of winning numbers.
Of course, you could be a mathematical genius, and somehow figure out what the private algorithm is, to make your computations far more efficiently. This is highly unlikely to happen. There is a considerable amount of very advanced, unpublished mathematical research that has been done to make our systems robust. Also, we regularly change the type of sequences that we use in our system, every few months or so. And we work with white hat hackers (paid to hack our system) in order to identify potential vulnerabilities.
Finally, seeds that lead to unpredictable winning numbers (simulating an efficient market) are known as good seeds. Of course, all the sequences that we offer are based on seeds highly believed to be good ones, and that have been run through a battery of statistical tests. Using sequences based on bad seeds would not hurt the players, quite the contrary, but it would make our system easier to crack and cause problems with the ROI tables, thus hurting us.
Proving that specific seeds are good or bad, is one of the most challenging, unsolved mathematical problems of all times. If solved, we would know for sure whether the digits of a number such as Pi, are evenly distributed or not. These mathematical concepts have been studied for some time, see recent material on this topic, here and here.
2.4. ROI Tables
The ROI tables tell you how much money you will make or lose when submitting a number. Your ROI is a function of the distance between your submitted number z and the actual winning number x. The distance, also called error, is computed as follows: d(x, z) = min(|x - z|, 256 - |x - z|). It is always an integer value between 0 and 128. A pre-determined ROI is attached to each of the 129 potential error values. These ROI's characterize the type of risk that you are willing to take, and can be customized by each user, as long as the theoretical expected return (automatically computed in the ROI spreadsheet) is zero.
You will find these values in the ROI tables, available in spreadsheet format, here. Look at the second row in the spreadsheet, between column K and EI. The spreadsheet also contains 1,000 user-submitted numbers (simulations) with the ROI computed for each submitted number. Other summary statistics of interest are available in the spreadsheet: highest and lowest potential payout, chances of winning, and more.
3. Business Model and Applications
Accredited investors, hedge funds, stock trading brokers, stock exchange companies, cryptocurrency operators, government organizations (for instance,state lotteries and agencies interested in creating a lottery at the federal level) as well as game developers and companies in the gaming industry, are welcome to contact us. Investors potentially interested in participating in a first round of funding to create and scale this platform, and who can bring clients and/or a CEO of their choosing, are also invited. We traditionally work smart and fast, with very small efficient teams in a lean environment, with people located all over the world.
This short presentation only features the tip of the iceberg. The possibilities are endless, including the implementation of:
Some of these features allow players to sometimes slightly beat the official and neutral odds of winning, offering a true positive return on average for some short periods of time, at the expense of the operators. For the organization implementing these features, this can be seen as marketing costs to attract new customers. Other potential applications includes Blockchain and cryptocurrency technology, strong encryption, patent and security laws, and state-of-the-art, innovative research in statistical science, computer science, and number theory. Finally, the system can also be used for simulated trading, to test various strategies with various ROI tables.
Let's now look at how the money flows.
3.1. Managing the Money Flow
Managing the money involves subtracting or adding dollars to user accounts after each completed transaction. On a given day, how do we know whether on average, gains and losses will balance out, since we don't control the numbers entered by the participants?
Actually, we don't know. Sometimes the balance is slightly negative, sometimes slightly positive. However, by using fair ROI tables and good seeds, we are guaranteed to be flat on average. You can even compute the daily volatility resulting from the daily winning and losing transactions. Example: with 1,000 transactions in a single day, each one consisting of a $20 bet, the most conservative ROI table introduced in this presentation produces a theoretical standard deviation of $24, over a volume of $20,000. The most aggressive one produces a standard deviation of $314, still entirely manageable. These theoretical numbers have been confirmed by simulations, and are included in each ROI table, for internal use. When offering customized ROI tables, you might want to put a cap on the standard deviation being allowed. See section 4.2. for more details.
3.2. Virtual Currency
Rather than actual dollars, the operator could use a virtual currency. The currency is issued by the operator (it could even be tokens), while the real money is held by an escrow company. Since on average no real (nor virtual) money is made by the operator on the gains and losses of the participants (assuming the system is fair,) no tax should be paid by the operator on the deposits made by the participants, and no tax deduction allowed for the money distributed to participants. This is made easier using of a virtual currency. Of course, if users pay a fee to participate, the operator will have to pay taxes on this source of revenue.
Participants are expected to pay taxes on their gains, and in an ideal word, deduct losses. The "tax event", for the participant, occurs when the escrow company disburses the money, if it comes with a gain or a loss. This could take place after any bet or at any time that is convenient for the participant. The operator also deposits extra money on the escrow company, to cover the maximum cash float and keep a positive balance at all times. The cash float is the difference between aggregated gains and losses across all participants. The cash float typically represents less than 3% of the value of the portfolio of bets managed by the operator.
4. Challenge and Statistical Results
We discuss here two important statistical results that make this system works. But first, let's talk about the competition announced at the very beginning.
4.1. Data Science / Math Competition
We plan to organize a competition focusing on the public algorithm. The goal is to compute the next 200 winning numbers, in the right chronological order, using
You can use the methodology described in this article, or any other means. The award will be offered to participants providing the best insights on how to improve the robustness of our system. So it is not required to find the 200 next winning numbers to earn the award. But if you do find them, we offer a bonus. We will announce the competition by April 30, 2019. To not miss the announcement, you can sign-up to receive our newsletter.
4.2. Controlling the Variance of the Portfolio Value
Any guess regarding a winning number results in a gain or a loss depending on how close your guess z is to the winning number x. The metric used to measure the proximity between x and z is
d(x, z) = min(|x - z|, 256 - |x - z|)
All winning numbers are integers between 0 and 255. If the participants made random guesses, then the distance d(x, z) would be a random variable, say D, with the following distribution:
The money that you put on a number (your guess) is called principal, similarly to the money invested in a stock, in the stock market. Once the winning number is announced, your principal increases or decreases depending on how good your guess is. Your principal is actually multiplied by a factor G(d(x, z)) which is a function of the distance between the number you picked up, and the winning number.
The multipliers G(0), G(1), G(2) and so on, up to G(128), are known in advance and specified in the ROI table that you use. The ROI tables are fair, in the sense that the average gain for the player, is zero. In order to achieve this goal, ROI tables are designed so that
If the top multipliers offered are very high -- the highest being G(0) for a correct guess -- then, even though the system is fair (unbiased), the variance for the gain for a single guess, is also high. This variance, assuming E(gain) = 0 and the participant puts $1 per guess, is equal to
The total value of the portfolio that we manage, defined as the aggregated principal across all guesses, is flat over time but experiences daily fluctuations. To compute its variance, use the previous formula, and multiply it by both the number of guesses and the dollar amount attached to them. The standard deviation values mentioned in section 3.1. (about money management) is the square root of this variance, assuming we have 1,000 guesses, each with a $20 price tag.
With 1,000 daily guesses each with the same price tag, the most extreme standard deviations for the portfolio, expressed as a percentage of the portfolio value, are:
These percentages decrease as the number of guesses increases. In practice, we stick to ROI tables with a theoretical standard deviation that is less than 3% of the portfolio value. This guarantees our survival in case of extreme events, such as a very big client winning big time on a single guess and claiming her gain right away, or a "bank run".
4.3. Probability of Cracking the System
The sequences used in our system generate numbers that look random. The successive past winning numbers published to help you find the next one -- even though it is a small list of K = 2,000 integers between 0 and 255 -- look just as random. Without using working seeds x(0) and y(0) that are known to lead to the solution (albeit in a very large number of iterations, manipulating numbers with many million of digits most of the time), the chance of finding in any given sequence, the K successive numbers matching the K past winning numbers, in less than M iterations (say M = 30 million), is about
See here for details. Even if you try a million set of seeds, knowing that one and only one of them leads to the solution in less than M iterations, it will take you a staggering amount of time to find it.
If a participant uses the wrong sequence, starting with one of the allowed sets of seeds other than the one that is guaranteed to work, and by some incredible chance the sequence also contains the K past winning numbers in the first M iterations, even if the participant submit a number that is not a winning number, we would still have to pay her as if she had found the winning number. The chance of this happening is virtually zero.
5. Designing 16-bit and 32-bit Systems
So far we discussed 8-bit systems only. As the name indicates, a b-bit system is where the winning numbers are b-bit integers. In a b-bit system, the public iterative algorithm in section 2.1. is still the same, with the following adaptations (here b is the number of bits):
A b-bit ROI table has 1 + 2^(b -1) multipliers G(0), G(1), G(2), and so on, up to G(2^(b - 1)). Also, G(k + 1) is chosen to be either equal to or less than G(k), so that participants know that the more accurate their guess, the higher the return.
5.1. Layered ROI Tables
Below is an example of a fair, layered 32-bit ROI table. If your guess is within 19 points of the winning number (it will happen to about 39 people out of 4.3 billion) then your principal is multiplied by a factor one million. About 48.7% of the guesses are not winners, and they erode your principal by 30%. The lowest ROI you get if you are a winner is 15%, and 50.7% of all guesses fall in that category. About one in two hundred (0.54%) results in a 900% ROI. One in 100,000 would boost your principal by a factor 1,000.
An even more skewed table could be designed, guaranteeing an average return strictly above zero to the player. If the positive return is driven entirely by the multiplier offered for correctly guessing the winning number (and otherwise, excluding a perfect guess, the average return is just a very tiny bit below zero) you might be able to entice more players to participate, especially sophisticated, big ones. If the odds of winning the maximum is less than one in 4 billion in a single bet, it will take so much time and money to win the big prize, that the operator has time to accumulate gains and grow them slightly faster than inflation, so that when the big winning event takes place, enough funds are available to pay the big winner.
5.2. Smooth ROI Tables
It is possible to create smooth ROI tables, with a continuous, slow decline in the multiplier rather than sharp drops as in the above table. One of the most natural functions that comes to mind is the geometric function G(k) = A / B^k, with the parameters A and B chosen so that the table is fair both to the player and to the operator. It is illustrated below, using the 8-bit system. We are working on producing similar tables for the 32-bit system.
The smooth tables offer one advantage: no participant will be disappointed for missing a massive payout by only a few points. In the above table (8-bit), G(k) = 1.7685 / 1.0100^k. The table is fair. Note that only 44% of the guesses are winners. The highest multiplier is only 1.77. Also, you can lose as much as 50%. Yet you could argue that this table is far more equitable than those previously discussed.
5.3. Systems with Winning Numbers in [0, 1]
The theory can be extended to winning numbers that are real numbers in [0, 1]. For instance, one can use the seed x(0) = log 3 - log 2 with the sequence x(t + 1) = { 2x(t) } where the brackets represent the fractional part function.
Then, x(t) = A(t) log 3 + B(t) log 2 + C(t) where A(t), B(t), and C(t) are integers. The geometric ROI function (above picture) becomes G(k) = p / q^k, with p, q > 1, and k a real number in [0, 1]. It has the following features:
All the numbers x(t) are winning numbers, either past or future. The public information could consist of
We can replace the third item in the above list, by the value x(t) computed with one million correct digits, for some secret t between 0 and T. In that case, there is no need to mention log 3 and log 2.
You can replace log 3 and log 2 with two (or more) irrational numbers that are conjectured to be linearly independent over the set of rational numbers. For instance, Pi and SQRT(5), or the values of
Very few people know how to efficiently obtain millions of digits for these values -- which is the first step required to find the winning numbers. Finally, as long as x(0) is a good seed, the numbers generated by the sequence x(t) will look random, after proper decorrelating [see how to decorrelate in section 3.2.(a) in this article]. The concept of good and bad seed is illustrated in this article and in my book on stochastic processes, available here.
A shorter, less technical version of this article, was originally posted on DataShaping.com.
To not miss this type of content in the future, subscribe to our newsletter. For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on on LinkedIn, or visit my old web page here.
Comment
Yes Victor, the answer is positive in both cases. I actually started the process with a pair of seeds (a, b) each being a positive integer smaller than 1000, run a fairly large number of iterations and arrived at the published (x(0), y(0)) with 250,000 digits, and then if you continue for many more millions of iterations (but less than 30,000,000 of them) you end up with the 2,000 past winning numbers that I published. Any subsequent number after these 2,000 numbers is a future winning number. I'll be presenting this system at the INFORMS annual meeting in Seattle, in October 2019.
I also formally proved (mathematical proof) today that the auto-correlations among the x(t) [and also among the winning numbers] are all equal to zero (lag-1, lag-2, lag-3 and so on, all of them.) The proof applies to a much wider class of sequences.
May I clarify a couple of points?
- is it true that the sequence of 2,000 winning numbers provided above is generated with a seed (a,b) consisting of a pair of positive integers less than 1,000 (after running the dynamics for over 30 million steps)?
- is it true that the "valid seeds" or "public seeds" x(0), y(0) provided in section 2.3 can be reached starting from (a,b)?
Victor, you need to have somewhere in your sequence x(t), starting at some iteration (smaller than 3 trillion), all the 2,000 successive values, in the right order, listed at www.datashaping.com/winningNumbers2000.txt (these are the past winning numbers).
Only one set of seeds (among the one million or so suggested) will meet this requirement. Yes, plenty of seeds produce a sequence x(t) being a constant number equal to 255, and they are easy to eliminate. In section 2.3, I published seeds x(0) and y(0) that work (they will get you to the winning number) but these seeds are integers with 250,000 digits.
I'm glad to see the interest in this problem!
I have a question about the statement "Then you known for sure that your next number will be a winning one." If y(0) > 4 x(0) + 2 ( which is larger than 2 x(0) + 1/2), then y(t) > 4 x(t) + 2 for all t, and so x(t) - 256 x(t-8) = 255 for all t >=8, i.e.: any seed that satisfies y(0) > 4 x(0) + 2 produces exactly the same sequence of winning numbers. Clearly, seed with y(0) > 4 x(0) + 2 are bad seeds (for the purpose of generating pseudo random numbers), but they illustrate the fact that the winning numbers are not uniquely determined by the seed (assuming all starting pair of natural numbers are seeds).
From the public algorithm, the transition from x(t) to x(t+1) has 2 cases (one when 4x + 1 < 2y, and the other when not), so either x(t+1) = 2x(t) + 0, or x(t+1) = 2x(1) + 1.
Hi Victor, yes you are correct. I came up to that conclusion using the exact same line of reasoning. The part that is a bit more tricky (in my opinion) is to prove that r(t) is either 0 or 1. I think you might be on the right track to win our next competition when we announce it. At least, if I was a contestant with the goal of finding the future winning numbers, that's how I would start. Did you hear anything about the paper written regarding the past competition?
Hi Vincent. As often, your abstraction from numerical experiments is correct. We have x(t+1) = 2 x(t) + r(t), with r(t)=0 or 1 depending on y(t), therefore: x(t+8) = 2^8 x(t) + [2^7 r(t) + 2^6 r(t+1) + ...+ r(t+7)]. From here, x(t+8) - 256 x(t) is an integer between 0 and 2^7 + 2^6 + ..+ 1 = 2^8 - 1 = 255
Hi Victor,
While there is no explicit "mod 256" in the formula implemented, it does some kind of implicit "mod 256". I'd be curious to see someone proving this fact. I don't think it is hard to prove, but I stumbled upon it by chance (as an accidental, by-product result of some theoretical investigations, and later confirmed by actual computations.) Indeed, this is why I eventually stuck to x(t) - 256 x(t-8) rather than a more complex formula that achieves the same business goal. To put it differently, I was going to use a more intricate but more "obvious" formula that guarantees numbers between 0 and 255 (there is also some reasons for choosing 256 rather than 128 or 512), but it eventually (unexpectedly) resulted in cascading cancellations, finally simplifying to x(t) - 256 x(t-8).
Best,
Vincent
If the value of the winning numbers are of the form x(t) - 256 x(t-8) for some suitable iteration t, how is it that the result is a number between 0 and 255? Are these integers or integers mod 256?
© 2019 Data Science Central ® Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
Most Popular Content on DSC
To not miss this type of content in the future, subscribe to our newsletter.
Technical
Non Technical
Articles from top bloggers
Other popular resources
Archives: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More
Most popular articles
You need to be a member of Data Science Central to add comments!
Join Data Science Central