Free Book: Applied Stochastic Processes

Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)

This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.

New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.

This book is available exclusively to Data Science Central members. The text in blue consists of clickable links that provide the reader with additional references. Source code and Excel spreadsheets summarizing computations are also accessible as hyperlinks for easy copy-and-paste or replication purposes. The most recent version of this book is available from this link, accessible to DSC members only.

About the author

Vincent Granville is a start-up entrepreneur, patent owner, author, investor, and pioneering data scientist with 30 years of corporate experience in companies small and large (eBay, Microsoft, NBC, Wells Fargo, Visa, CNET). He is a former VC-funded executive with a strong academic and research background, including Cambridge University.

Download the book (members only) 

Click here to get the book. For Data Science Central members only. If you have any issues accessing the book, please contact us at [email protected]. To become a member, click here.

Content

The book covers the following topics: 

1. Introduction to Stochastic Processes

We introduce these processes, used routinely by Wall Street quants, with a simple approach consisting of re-scaling random walks to make them time-continuous, with a finite variance, based on the central limit theorem.

  • Construction of Time-Continuous Stochastic Processes
  • From Random Walks to Brownian Motion
  • Stationarity, Ergodicity, Fractal Behavior
  • Memory-less or Markov Property
  • Non-Brownian Process
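
The re-scaling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function name `rescaled_walk` and the choice of +1/-1 steps are assumptions. A walk of n steps is divided by sqrt(n), so that by the central limit theorem its value at the final time has a mean close to 0 and a variance close to 1, whatever the value of n.

```python
import random
import statistics

def rescaled_walk(n, rng):
    """Path of an n-step +/-1 random walk, divided by sqrt(n).

    The k-th entry approximates a Brownian motion observed at time k/n.
    (Hypothetical helper, written for illustration only.)
    """
    scale = n ** 0.5
    position = 0
    path = [0.0]
    for _ in range(n):
        position += rng.choice((-1, 1))
        path.append(position / scale)
    return path

rng = random.Random(42)  # fixed seed so the run is reproducible
endpoints = [rescaled_walk(400, rng)[-1] for _ in range(2000)]
endpoint_mean = statistics.fmean(endpoints)
endpoint_var = statistics.pvariance(endpoints)
print(endpoint_mean, endpoint_var)
```

Without the division by sqrt(n), the variance of the endpoint would grow like n instead of staying finite.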

2. Integration, Differentiation, Moving Averages

We introduce more advanced concepts about stochastic processes. Yet we make these concepts easy to understand even to the non-expert. This is a follow-up to Chapter 1.

  • Integrated, Moving Average and Differential Process
  • Proper Re-scaling and Variance Computation
  • Application to Number Theory Problem

3. Self-Correcting Random Walks

We investigate here a breed of stochastic processes that differ from Brownian motion, yet are better models in many contexts, including Fintech.

  • Controlled or Constrained Random Walks
  • Link to Mixture Distributions and Clustering
  • First Glimpse of Stochastic Integral Equations
  • Link to Wiener Processes, Application to Fintech
  • Potential Areas for Research
  • Non-stochastic Case

4. Stochastic Processes and Tests of Randomness

In this transition chapter, we introduce a different type of stochastic process, with number theory and cryptography applications, analyzing statistical properties of numeration systems along the way -- a recurrent theme in the next chapters, offering many research opportunities and applications. While we are dealing with deterministic sequences here, they behave very much like stochastic processes, and are treated as such. Statistical testing is central to this chapter, which introduces tests that will also be used in the last chapters.

  • Gap Distribution in Pseudo-Random Digits
  • Statistical Testing and Geometric Distribution
  • Algorithm to Compute Gaps
  • Another Application to Number Theory Problem
  • Counter-Example: Failing the Gap Test
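
As a rough illustration of the gap test mentioned above, the sketch below measures the distances between successive occurrences of one digit in a stream of simulated base-10 digits. Under the assumption that digits are independent and uniform, these distances follow a geometric distribution with parameter 1/10, so the mean gap should be close to 10 and about 10% of the gaps should equal 1. The helper name `gaps` and this particular definition of a gap are assumptions; the book may define and compute gaps differently.

```python
import random
from collections import Counter

def gaps(digits, target):
    """Distances between successive occurrences of `target` in `digits`."""
    positions = [i for i, d in enumerate(digits) if d == target]
    return [b - a for a, b in zip(positions, positions[1:])]

rng = random.Random(0)  # fixed seed so the run is reproducible
digits = [rng.randrange(10) for _ in range(200_000)]

gap_list = gaps(digits, 7)
mean_gap = sum(gap_list) / len(gap_list)         # geometric mean is 1/p = 10
freq_gap_1 = Counter(gap_list)[1] / len(gap_list)  # geometric P(gap = 1) = p = 0.1
print(mean_gap, freq_gap_1)
```

A sequence that fails this test, for instance one where a digit never repeats immediately, would show a frequency of gap 1 far from 0.1.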

5. Hierarchical Processes

We start discussing random number generation, and numerical and computational issues in simulations, applied to an original type of stochastic process. This will become a recurring theme in the next chapters, as it applies to many other processes.

  • Graph Theory and Network Processes
  • The Six Degrees of Separation Problem
  • Programming Languages Failing to Produce Randomness in Simulations
  • How to Identify and Fix the Previous Issue
  • Application to Web Crawling

6. Introduction to Chaotic Systems

While typically studied in the context of dynamical systems, the logistic map can be viewed as a stochastic process, with an equilibrium distribution and probabilistic properties, just like numeration systems (next chapters) and processes introduced in the first four chapters.

  • Logistic Map and Fractals
  • Simulation: Flaws in Popular Random Number Generators
  • Quantum Algorithms
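
To make the notion of an equilibrium distribution concrete, here is a small sketch for the classic logistic map x -> 4x(1 - x). The parameter value 4 is an assumption (this page does not say which parameter the book uses); for that value, the equilibrium distribution is known to be the arcsine law, with cumulative distribution function (2/pi) * arcsin(sqrt(x)). The code compares the fraction of iterates below 0.25 with the theoretical value, which is 1/3. The function names are illustrative.

```python
import math

def logistic_orbit(x0, n):
    """Iterate x -> 4x(1 - x) n times, returning all n + 1 values."""
    orbit = [x0]
    for _ in range(n):
        x0 = 4.0 * x0 * (1.0 - x0)
        orbit.append(x0)
    return orbit

def arcsine_cdf(x):
    """CDF of the arcsine law on [0, 1]."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

orbit = logistic_orbit(0.3, 20_000)
empirical = sum(1 for x in orbit if x < 0.25) / len(orbit)
theoretical = arcsine_cdf(0.25)
print(empirical, theoretical)
```

Individual iterates computed in double precision are not accurate for long, but the empirical distribution of the orbit still tends to match the equilibrium distribution.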

7. Chaos, Logistic Map and Related Processes

We study processes related to the logistic map, including a special logistic map discussed here for the first time, with a simple equilibrium distribution. This chapter offers a transition between chapter 6 and the next chapters on numeration systems (the logistic map being one of them).

  • General Framework
  • Equilibrium Distribution and Stochastic Integral Equation
  • Examples of Chaotic Sequences
  • Discrete, Continuous Sequences and Generalizations
  • Special Logistic Map
  • Auto-regressive Time Series
  • Literature
  • Source Code with Big Number Library
  • Solving the Stochastic Integral Equation: Example

8. Numerical and Computational Issues

These issues have been mentioned in chapter 7, and also appear in chapters 9, 10 and 11. Here we take a deeper dive and offer solutions, using high precision computing with BigNumber libraries. 

  • Precision Issues when Simulating, Modeling, and Analyzing Chaotic Processes
  • When Precision Matters, and when it does not
  • High Precision Computing (HPC)
  • Benchmarking HPC Solutions
  • How to Assess the Accuracy of your Simulation Tool
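
The precision problem can be seen with a short experiment. The sketch below iterates the map x -> 4x(1 - x) (an assumed example of a chaotic process) twice from the same starting value: once in ordinary double precision, and once with 200 significant digits using Python's standard `decimal` module as a stand-in for a BigNumber library. The function names and the choice of 200 digits are assumptions made for illustration.

```python
from decimal import Decimal, localcontext

def orbit_float(x0, n):
    """Orbit of x -> 4x(1 - x) in double precision."""
    orbit = [x0]
    for _ in range(n):
        x0 = 4.0 * x0 * (1.0 - x0)
        orbit.append(x0)
    return orbit

def orbit_decimal(x0, n, digits):
    """Same orbit, computed with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(x0)  # exact conversion of the float starting value
        orbit = [x]
        for _ in range(n):
            x = 4 * x * (1 - x)
            orbit.append(x)
    return orbit

start = 0.3
low = orbit_float(start, 100)
high = orbit_decimal(start, 100, 200)
errors = [abs(f - float(d)) for f, d in zip(low, high)]
print(errors[10], max(errors))
```

The two orbits agree closely during the first iterations, then drift apart completely, because rounding errors roughly double at each step. With 53 bits of precision, little is left after about 50 to 60 iterations.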

9. Digits of Pi, Randomness, and Stochastic Processes

Deep mathematical and data science research (including a result about the randomness of Pi, which is just a particular case) is presented here, without using arcane terminology or complicated equations. The numeration systems discussed here are deterministic sequences that behave just like the stochastic processes investigated earlier; the logistic map is one particular case.

  • Application: Random Number Generation
  • Chaotic Sequences Representing Numbers
  • Data Science and Mathematical Engineering
  • Numbers in Base 2, 10, 3/2 or Pi
  • Nested Square Roots and Logistic Map
  • About the Randomness of the Digits of Pi
  • The Digits of Pi are Randomly Distributed in the Logistic Map System
  • Paths to Proving Randomness in the Decimal System
  • Connection with Brownian Motions
  • Randomness and the Bad Seeds Paradox
  • Application to Cryptography, Financial Markets, Blockchain, and HPC
  • Digits of Pi in Base Pi
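
One standard way to view a numeration system as a deterministic sequence is the greedy expansion: multiply the number by the base, take the integer part as the next digit, and keep the fractional part. The sketch below implements it with exact rational arithmetic. It is offered as an assumption about what such a sequence can look like, not as the book's construction, and the function name `digits_in_base` is illustrative.

```python
from fractions import Fraction
from math import floor

def digits_in_base(x, base, n):
    """First n digits of x in [0, 1) for a base > 1, via x -> base * x mod 1."""
    digits = []
    for _ in range(n):
        x = x * base
        d = floor(x)      # next digit is the integer part
        digits.append(d)
        x -= d            # keep the fractional part for the next step
    return digits

print(digits_in_base(Fraction(1, 7), 10, 6))             # decimal digits of 1/7
print(digits_in_base(Fraction(1, 3), 2, 4))              # binary digits of 1/3
print(digits_in_base(Fraction(1, 2), Fraction(3, 2), 5))  # base 3/2
```

The same function accepts a floating-point base such as `math.pi`, but then only the first few dozen digits can be trusted, for the reasons discussed in chapter 8.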

10. Numeration Systems in One Picture

Here you will find a summary of much of the material previously covered on chaotic systems, in the context of numeration systems (in particular, chapters 7 and 9).

  • Summary Table: Equilibrium Distribution, Properties
  • Reverse-engineering Number Representation Systems
  • Application to Cryptography

11. Numeration Systems: More Statistical Tests and Applications

In addition to featuring new research results and building on the previous chapters, the topics discussed here offer a great sandbox for data scientists and mathematicians. 

  • Components of Number Representation Systems
  • General Properties of these Systems
  • Examples of Number Representation Systems
  • Examples of Patterns in Digits Distribution
  • Defects found in the Logistic Map System
  • Test of Uniformity
  • New Numeration System with no Bad Seed
  • Holes, Autocorrelations, and Entropy (Information Theory)
  • Towards a more General, Better, Hybrid System
  • Faulty Digits, Ergodicity, and High Precision Computing
  • Finding the Equilibrium Distribution with the Percentile Test
  • Central Limit Theorem, Random Walks, Brownian Motions, Stock Market Modeling
  • Data Set and Excel Computations

12. The Central Limit Theorem Revisited

The central limit theorem explains the convergence of discrete stochastic processes to Brownian motions, and has been cited a few times in this book. Here we also explore a version that applies to deterministic sequences. Such sequences are treated as stochastic processes in this book.

  • A Special Case of the Central Limit Theorem
  • Simulations, Testing, and Conclusions
  • Generalizations
  • Source Code

13. How to Detect if Numbers are Random or Not

We explore here some deterministic sequences of numbers, behaving like stochastic processes or chaotic systems, together with another interesting application of the central limit theorem.

  • Central Limit Theorem for Non-Random Variables
  • Testing Randomness: Max Gap, Auto-Correlations and More
  • Potential Research Areas
  • Generalization to Higher Dimensions

14. Arrival Time of Extreme Events in Time Series

Time series, as discussed in the first chapters, are also stochastic processes. Here we discuss a topic rarely investigated in the literature: the arrival times, as opposed to the extreme values (a classic topic), associated with extreme events in time series.

  • Simulations
  • Theoretical Distribution of Records over Time
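
A simple simulation shows what studying arrival times of records can look like. The sketch assumes the simplest possible setting, independent and identically distributed observations, which may be narrower than what the book covers. In that setting, the k-th observation is a new record with probability 1/k, so the expected number of records among n observations is the harmonic number 1 + 1/2 + ... + 1/n. The function names are illustrative.

```python
import random

def count_records(values):
    """Number of observations that exceed all previous ones."""
    best = float("-inf")
    records = 0
    for v in values:
        if v > best:
            best = v
            records += 1
    return records

def harmonic(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

rng = random.Random(1)  # fixed seed so the run is reproducible
n, trials = 200, 5000
average_records = sum(
    count_records([rng.random() for _ in range(n)]) for _ in range(trials)
) / trials
expected_records = harmonic(n)
print(average_records, expected_records)
```

Because 1/k shrinks as k grows, records arrive more and more rarely over time: on average fewer than six of them in a series of 200 observations.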

15. Miscellaneous Topics

We investigate topics related to time series as well as other popular stochastic processes such as spatial processes.

  • How and Why: Decorrelate Time Series
  • A Weird Stochastic-Like, Chaotic Sequence
  • Stochastic Geometry, Spatial Processes, Random Circles: Coverage Problem
  • Additional Reading (Including Twin Points in Point Processes)

16. Exercises

To not miss this type of content in the future, subscribe to our newsletter. For related articles from the same author, click here or visit www.VincentGranville.com. Follow me on LinkedIn, or visit my old web page here.

Comment by shishir goel on June 13, 2018 at 11:05pm

Thank you Vincent for the informative book...

Comment by Jacob on June 11, 2018 at 3:56am

Perfect timing. Thanks!

Comment by Alessandro Trinca Arnould on June 7, 2018 at 9:34pm

Many thanks Vincent, very appreciated.

I'm going to investigate in depth the matters your book talks about.

Comment by philippe therond on June 7, 2018 at 7:25am

The book sounds appealing as it touches on some very deep mathematical concepts and relates them to common data science problems, which I guess will bring a critical perspective on them. I look forward to reading it, but I have tried clearing my cache and cookies, using private mode, and three different browsers, and I just can't download it...

Can anyone please help?

regards

Philippe

Comment by Nitin Thakur on June 7, 2018 at 3:54am

Thanks very much Prof. Vincent. This sounds interesting, I have already started going through it.

Regards

Comment by Ilya Selitser on June 7, 2018 at 3:37am

Thank you for the very interesting book, Vincent!

Comment by Vincent Granville on June 6, 2018 at 4:19am

Hi Nitin,

I tried my best to make the subject, typically considered difficult, accessible to a large audience, almost free of theorems and proofs, requiring no more than the central limit theorem (and even this stuff is explained rather simply in the book). Try reading the first chapter; it will give you an idea of the overall level. It does not get more difficult later in the book: the level is pretty even throughout. Even the more technical aspects have been simplified or summarized; indeed, that is why it is only 100 pages long.

Most other books on the subject are very heavy on measure theory, even on the very first pages; mine barely mentions this topic. It turned me off when I tried reading about martingales, and it is one of the reasons that I wrote this book, as all this complexity is unnecessary even to get a good grasp of what stochastic processes are and what you can do with them. Finally, thinking of these processes as time series is a way to study them at a beginner level.

Finally, the approach is bottom-up rather than top-down: I explain how results are discovered, rather than using an academic style by starting each concept with tons of definitions and theorems. This is helpful for beginners, and probably, makes for a more pleasant read, even for experts and academic researchers.

Vincent

Comment by Nitin Thakur on June 6, 2018 at 12:22am

Hello Vincent,

I am developing my skills in Data Science. However, I do not have a strong statistical background. I just wanted to check: would this book help me get the concepts and make me ready for the role, or do I have to refer to some more material?

Thank you

Comment by Prabhakar Krishnamurthy on June 5, 2018 at 3:01pm

Dear Professor Vincent Granville, Thanks for your mail and for publishing the book on Applied Stochastic Processes. I am unable to download the book. Is it possible to guide me? Thank you very much for publishing a book that is most needed by teachers.

With warm regards

Dr.K.Prabhakar

Comment by Vincent Granville on June 5, 2018 at 12:10pm

Hi Mohammed,

I believe so. If you look at chapter 5 (six degrees of separation) it applies to Youtube videos as well, in the sense that there is a path involving no more than six links from any Youtube video to any other one. Using a recursive algorithm for (automated) crawling is not a good idea though, as explained in chapter 5. Also, some videos are somewhat disconnected from the vast majority of Youtube videos. For instance, can you start with a video of the Beatles, and end up after any amount of browsing, discovering a machine learning video? Maybe not, and it means that the Youtube graph is not fully connected, and you need a number of seed videos from each connected component when doing your browsing, in order to retrieve all of them.

Vincent
