And creates the first legit, private lottery company. The winning numbers, to be published each week, are just as random and unpredictable as winning numbers from state lotteries, if not more so. The lottery can be designed so that the expected gains for participants are far more favorable and more closely aligned with what casinos offer: in short, the odds of winning and multiplying your (say) $20 bet by (say) 10 or 100 can easily be made much higher than in traditional lotteries. It even beats the stock market, if you look at the return over the last 15 years. So how is this possible, when large-scale lotteries are illegal everywhere and punished by many years in prison and gigantic fines?

The explanation is very simple. It also shows why math wizards are great candidates to run such a business. While the winning numbers generated look extremely random, just as random as traditional lottery winning numbers, they are actually produced by extremely rudimentary, short mathematical formulas. Think of the decimals of the number Pi: they could be used as winning lottery numbers since they appear incredibly random, yet they can be computed very efficiently. Billions of them have been produced so far, so you could run a lottery for millions of years, just based on Pi's decimals. Nobody has ever managed to prove that Pi's decimals are random, and it is possibly the biggest mathematical challenge of all time. We even offered a $500,000 award to solve an equally difficult question regarding another famous mathematical number.

So in reality, this lottery business is a disguised math competition, charging you a (say) $20 administrative fee to participate. A math wizard is far more likely to win than other people. You would have to change the word *lottery* to (say) *lotteri* in your business name, to avoid litigation, just like some banks called themselves *banc* rather than *bank*. Still, you are better off calling it a *lotteri* rather than a *competition*, to attract a large number of participants. You should also disclose the fact that the numbers are indeed easily predictable, despite the illusion to the contrary. It actually makes this *lotteri* trustworthy, as the generating formula for the winning numbers (typically a mathematical recurrence relation) can be shared with authorities -- more on this complex issue below, especially on how to do it right. Even better, while state lotteries are a tax on innumeracy, this *lotteri* promotes mathematical education and interest, especially if run on a large scale, as people know that a trivial mathematical algorithm generates the winning numbers. You disclose the formula and change it every year, to further boost credibility.

Various levels of randomness can be offered to participants, using a trivial algorithm in all cases to generate winning numbers, so as to not be accused of running an actual lottery. You can produce numbers that are easier to guess, thus encouraging interest in mathematics in the general population, but offering lower pay-outs. Or at the other extreme, incredibly random-looking numbers, hard to guess, that will attract math and data science geniuses armed with massive data sets consisting of billions of decimals for tons of carefully selected numbers (and some Hadoop / Map-Reduce technology).

This latter model might appeal to investors willing to fund this *lotteri* business in exchange for equity in the company. Their funding would make it possible to truly offer bigger pay-outs, like $1 million, in case a math genius guesses the right numbers early in the company's lifecycle, when the company hasn't gathered enough profits yet. Without VC or angel investor funding, the pay-out would be a re-distribution of the profit, and might be smaller initially. It would still be a viable business in my opinion, especially if you have access to a large mailing list to promote it, and the product and pricing are done right, based on probability theory, simulations/testing, and price/reward elasticity models (which any data scientist worth her salt should be able to determine).

**Possible implementation**

Let's dive a bit deeper into this *lotteri*, to show how it works. Here is an example implementation, based on the number SQRT(2)/2 represented in binary form: a sequence of 0s and 1s that looks extremely random and unpredictable, even though it is generated by this rudimentary, one-line recurrence formula, starting with p(0) = 0, p(1) = 1, e(1) = 2:

**If** 4p(n) + 1 < 2e(n) **Then** p(n+1) = 2p(n) + 1; e(n+1) = 4e(n) - 8p(n) - 2; d(n+1) = 1

**Else** p(n+1) = 2p(n); e(n+1) = 4e(n); d(n+1) = 0

The surprising result is that the d(n)'s represent the bits of SQRT(2)/2 when represented in base 2. Click here to check out the proof and for more general formulas about more complex numbers.
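To make the recurrence concrete, here is a minimal Python sketch. It assumes d(1) = 1 (the text only gives p(1) and e(1), and the first bit of SQRT(2)/2 = 0.1011... in base 2 is 1). The function names are illustrative and not part of the original formula. The second function is only a cross-check: it relies on the identity floor(2^n × SQRT(2)/2) = isqrt(2^(2n-1)), and it does not replace the proof mentioned above.

```python
from math import isqrt


def sqrt2_half_bits(n_bits):
    """First n_bits binary digits of SQRT(2)/2, computed with the recurrence."""
    if n_bits < 1:
        return []
    p, e = 1, 2   # p(1) = 1, e(1) = 2, as in the text
    bits = [1]    # assumption: d(1) = 1, the leading bit of 0.1011...
    for _ in range(n_bits - 1):
        if 4 * p + 1 < 2 * e:
            # The right-hand side is evaluated with the old p and e.
            p, e, d = 2 * p + 1, 4 * e - 8 * p - 2, 1
        else:
            p, e, d = 2 * p, 4 * e, 0
        bits.append(d)
    return bits


def sqrt2_half_bits_direct(n_bits):
    """Same digits via an integer square root, used only as a cross-check."""
    if n_bits < 1:
        return []
    q = isqrt(1 << (2 * n_bits - 1))  # floor(2**n_bits * SQRT(2)/2)
    return [(q >> i) & 1 for i in range(n_bits - 1, -1, -1)]


print("".join(map(str, sqrt2_half_bits(20))))  # 10110101000001001111
print(sqrt2_half_bits(500) == sqrt2_half_bits_direct(500))
```

Note that p(n) and e(n) roughly double at each step, so fixed-width 64-bit integers would overflow after about 60 steps; this sketch leans on Python's arbitrary-precision integers for the demonstration.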

The *lotteri* would work as follows:

- Each week, people who successfully guess 20 bits of this sequence (one chance in a million; higher for math wizards) get paid the maximum pay-out. But even if you correctly guess only the first 10 bits (one chance in a thousand), you get a good pay-out. And so on.
- The subsequent week, the next 20 bits of the sequence are used as winning numbers, and the previous winning numbers are published, so math wizards have a chance to identify winning patterns. Participants have access to all winning numbers from the past (maybe a $20 fee is required to access past winning numbers, to prevent some math wizards from entering this *lotteri* only after they have figured out the algorithm used to produce winning numbers).
- A code offering a 50% discount on all transactions is offered to participants who subscribe to our newsletter.
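The weekly mechanics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, guesses are scored by the number of leading bits they get right, and no pay-out amounts are modeled since the text does not fix any. The 40-bit string is the start of the binary expansion of SQRT(2)/2, hard-coded so the example runs on its own.

```python
def leading_matches(guess, winning):
    """Number of leading bits of `guess` that agree with `winning`."""
    count = 0
    for g, w in zip(guess, winning):
        if g != w:
            break
        count += 1
    return count


def blind_guess_odds(k):
    """A uniformly random guess gets the first k bits right once in 2**k tries."""
    return 2 ** k


def weekly_draw(sequence, week, draw_size=20):
    """Bits used as winning numbers in a given week (week 0 is the first draw)."""
    start = week * draw_size
    return sequence[start:start + draw_size]


# First 40 bits of SQRT(2)/2 in base 2 (assumed sequence for this example).
SEQUENCE = "1011010100000100111100110011001111111001"

winning = weekly_draw(SEQUENCE, 0)
guess = "10110101001111111111"       # hypothetical participant entry
k = leading_matches(guess, winning)
print(k, blind_guess_odds(k))        # 10 1024
print(blind_guess_odds(20))          # 1048576
print(weekly_draw(SEQUENCE, 1))      # 00110011001111111001
```

The two odds printed match the figures quoted in the list: about one chance in a thousand for 10 bits and about one in a million for 20 bits, for a participant guessing blindly.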

The nice features of the above formula are:

- You can even publish the recurrence formula producing these numbers. Its terms grow exponentially fast, and it very quickly becomes unusable due to limitations of modern computing systems, even for those relying on distributed architecture. Yet, because you know (but the *lotteri* participant does not know) that it produces all the digits (in base 2) of SQRT(2)/2, you don't use this recurrence formula to produce the digits; instead you use tables of billions of pre-computed digits of SQRT(2)/2. Now the math wizard will do the same, testing tons of numbers to see which ones match the winning numbers week after week. That's why a number like SQRT(2)/2 is not a good candidate, as these hackers will quickly discover the trick. But there are trillions of numbers that are much less popular, and more difficult to identify.
- If some government agency investigates your *lotteri* business and asks you to deliver the magic formula, you can provide the above recurrence formula. While it can theoretically compute all the digits of SQRT(2)/2 and thus find all winning numbers -- past and future -- in practice it is unusable due to its exponential growth. Never tell government agencies investigating your business that these digits correspond to SQRT(2)/2, otherwise you can expect leaks, and a lot of people mysteriously becoming able to guess the winning numbers each week. Of course, it is good practice to use a different number each year. This is the same issue that Apple faces when the FBI requests the magic key to decrypt cell phone data from some people.
- Finally, unlike traditional random number generators, the sequences used for the winning numbers are non-periodic: they never repeat themselves, as long as they represent irrational numbers such as Pi or SQRT(2)/2. So you can run your *lotteri* until the end of time, with just one single sequence.
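As a rough illustration of the pre-computed tables mentioned above, and of switching to a different number each year, here is a hedged Python sketch that produces the binary digits of the fractional part of SQRT(m) with an integer square root. Restricting the candidates to square roots of non-square integers is an assumption made for this example; the article does not say which numbers to use, and `sqrt_fraction_bits` is a hypothetical helper name.

```python
from math import isqrt


def sqrt_fraction_bits(m, n_bits):
    """First n_bits binary digits after the point of SQRT(m), m not a perfect square."""
    root = isqrt(m)
    if root * root == m:
        raise ValueError("m must not be a perfect square (SQRT(m) must be irrational)")
    scaled = isqrt(m << (2 * n_bits))   # floor(SQRT(m) * 2**n_bits)
    frac = scaled - (root << n_bits)    # drop the integer part of SQRT(m)
    return format(frac, "b").zfill(n_bits)


# SQRT(2)/2 in base 2 is a leading 1 followed by the fractional bits of SQRT(2).
print("1" + sqrt_fraction_bits(2, 19))  # 10110101000001001111
# A different number, for instance SQRT(3), gives an unrelated-looking sequence.
print(sqrt_fraction_bits(3, 8))         # 10111011
```

A participant who suspects this family of numbers could run the same function over many values of m and compare the output with published winning numbers, which is the search strategy the first bullet warns about.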

If you are interested in this project, please contact me at [email protected]

**About the author:**

*Vincent Granville worked for Visa, eBay, Microsoft, Wells Fargo, NBC, a few startups and various organizations, to optimize business problems, boost ROI or to develop ROI attribution models, developing new techniques and systems to leverage modern big data and deliver added value. Vincent owns several patents, published in top scientific journals, raised VC funding, and founded a few startups. Vincent also manages his own self-funded research lab, focusing on simplifying, unifying, modernizing, automating, scaling, and dramatically optimizing statistical techniques. Vincent's focus is on producing robust, automatable tools, API's and algorithms that can be used and understood by the layman, and at the same time adapted to modern big, fast-flowing, unstructured data. Vincent is a post-graduate from Cambridge University.*

© 2019 Data Science Central ®
