Statistical conclusions that do not make sense are, regrettably, the majority of outcomes on large unconstrained data problems. Yet these large divergent problems sometimes settle over time into something that makes sense. This can leave the data toolbox of scrubbing tools, regressions, culminations, and learning algorithms obsolete, replaced with ordinary growth estimates and a demand for clear hindsight into why the data exists and what really happened. This high-level view has value beyond the math and can lead to real social value and learning potential.

Data imprints from the Holocaust of WWII are a difficult yet appropriate subject. We will look at two high-level concepts in this article and leave the details as an ongoing exercise. This period of 10 to 12 years contains some of the most significant and controversial social and psychological impacts of all time. A staggering stream of wild conclusions and opinions has not stopped for over 70 years. A key point is that the data and conclusions are still changing, wobbling like a damped oscillation toward a frightful, still silence of truth.

Over time, the opinions drawn from the data on this brutality and horror have settled into some well-rounded, rational conclusions and solid numbers, which makes this an approachable subject now. Because the subject is so vast and complex, we will start at the top with the total number of estimated deaths and drill down in a systematic, logical categorization similar to the popular algorithms of the day. This is for demonstration only, as completeness would demand a scope and purpose outside this short article. All data comes from recognized sources.

To start, the total death estimate for WWII used here is 66 million, though estimates vary by source. The estimate used for soldiers lost fighting the war is 16 million. This leaves close to 50 million non-military deaths that were directly or indirectly caused by the war. Sadly, this number includes the atomic bombings toward the end of the war. All figures are rounded to whole millions, so there is an implicit margin of error that is easy to recognize and appropriate for settled data like this.
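The implicit margin of error in whole-number figures can be made explicit with a small interval calculation. The sketch below is only an illustration: it assumes each figure is rounded to the nearest million (so it could be off by up to half a million), and the helper names are hypothetical. Rounding is only one source of error; the real uncertainty in these estimates is larger.

```python
# Figures from the text, in millions, rounded to whole millions.
TOTAL_DEATHS = 66
MILITARY_DEATHS = 16

# Assumption: a figure rounded to the nearest million may be off by up to 0.5.
HALF_UNIT = 0.5


def rounded_interval(value, half_width=HALF_UNIT):
    """Return the (low, high) range that a rounded figure could stand for."""
    return (value - half_width, value + half_width)


def subtract_intervals(a, b):
    """Interval subtraction: the smallest and largest possible difference."""
    return (a[0] - b[1], a[1] - b[0])


total = rounded_interval(TOTAL_DEATHS)
military = rounded_interval(MILITARY_DEATHS)
non_military = subtract_intervals(total, military)

print(f"Point estimate: {TOTAL_DEATHS - MILITARY_DEATHS} million")
print(f"Range from rounding alone: {non_military[0]} to {non_military[1]} million")
```

Under this assumption the 50 million residual could sit anywhere between 49 and 51 million from rounding alone, which shows how the margin widens each time one rounded figure is subtracted from another.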

So what do we know about the Holocaust? It was a fanatical, state-organized genocide based on a racial ideology. Current estimates put the number of Jewish victims at about 6 million, killed in camps, ghettos, and mass shootings, with millions of other victims besides. There was also significant starvation, along with the effects of weather and general population displacement.

A closer look at the documents shows that Jewish people were the primary target but not the only one. Roma and Sinti, people with mental and physical disabilities, and people classified as being of mixed ancestry were also declared targets. Broadly, victims in rural areas were often shot on the spot, while those in populated occupied areas were more often deported to concentration camps, and their property was confiscated.

As the war went on, the killing became more systematic. The mobile SS killing squads (Einsatzgruppen) that followed the front-line soldiers carried out much of the killing in the field, away from the camps. A conservative figure for these squads is 700,000 non-military victims, and many estimates run well above one million.

Consider the 50 million non-military deaths again. Subtracting an estimated 10 million non-military deaths for the Pacific Rim leaves a figure of 40 million for Europe. This puts non-military deaths per (Nazi) soldier at 11.4, a ratio that implies a count of roughly 3.5 million soldiers. The number seems small given the large amount of munitions produced and the total bullets fired. Not all soldiers killed people, and not all of these deaths were caused by soldiers, so the average is only a rough indicator. Still, it is a unique perspective given the vast amount of material at hand. The loss of soldiers on the front lines, along with the number of soldiers killed by local resistance, makes these calculations more wobbly, yet they remain reasonable given the training, social indoctrination, and group mentality.
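The ratio above depends entirely on a soldier count that the text does not state. As a rough sketch, the implied count can be back-calculated from the two figures given, and the ratio can then be recomputed for other denominators to show how sensitive it is. The alternative soldier counts below are illustrative placeholders only, not historical figures, and the function name is hypothetical.

```python
# Figures from the text, in millions.
NON_MILITARY_TOTAL = 50
PACIFIC_NON_MILITARY = 10
STATED_RATIO = 11.4  # non-military deaths per soldier, as given in the text

europe_non_military = NON_MILITARY_TOTAL - PACIFIC_NON_MILITARY

# Back-calculate the soldier count (in millions) that the stated ratio implies.
implied_soldiers = europe_non_military / STATED_RATIO


def deaths_per_soldier(soldiers_millions, deaths_millions=europe_non_military):
    """Average non-military deaths per soldier for a given soldier count."""
    return deaths_millions / soldiers_millions


print(f"Europe non-military deaths: {europe_non_military} million")
print(f"Implied soldier count: {implied_soldiers:.2f} million")

# Illustrative denominators only (assumptions, not sourced figures).
for soldiers in (3.5, 7.0, 14.0):
    print(f"{soldiers:>5} million soldiers -> {deaths_per_soldier(soldiers):.1f} per soldier")
```

Doubling the assumed soldier count halves the ratio, so a single average like 11.4 says as much about the chosen denominator as it does about the events themselves.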

The general nature of non-military deaths is complex, with many different social and environmental factors. Some areas were occupied by three different regimes in a short time. The mass exodus certainly was a factor. The indiscriminate Allied mass bombings and the backlash by Allied troops were also factors in non-military deaths. It is a complex subject.

Coming in from another direction, the finite rail capacity in wartime comes into question with respect to the concentration camps. Moving soldiers forward and deportees back to the camps appears to have been a selective process based on rail capacity, scheduling, and social status. The commonly cited figure of about 6 million deaths is used below as a working total, with some important variables left out.

A classic transportation problem of physical size versus value existed in the transport of people. The two types of “people carrier” train cars could each carry up to 20 tonnes of cargo and shared the same basic size. Single-level cars could carry 14 cattle with an attendant, 49 soldiers, or close to 70 packed adults. Double-level cars could carry as many as 50 pigs, 100 or more turkeys or geese, or about 100 or more small children. Some accounts mention children being separated from adults and trained as low-level servants, as well as being shipped off to camps.

Records show that roughly 2,500 single-level “cattle” cars existed, along with over a thousand double-level or “goose carrier” cars, according to railroad reports. As an example, take each train to contain between 40 and 70 cars. At 2,800 people per train (40 cars x 70 people per car), moving 6 million people would take about 2,143 trips. Spread over 5 years, this is about 429 trips per year. Adding children increases the number. It is well known that the children were separated. Children had a lower chance of survival over long distances, so they may have been handled differently, or they may have escaped before the occupation. Let us hope.
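The capacity arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the figures from the text (6 million people, 70 people per car, 40 to 70 cars per train, 5 years); the function name is hypothetical, and the count is rounded up because a partly filled train is still a trip.

```python
import math

# Figures from the text.
PEOPLE = 6_000_000
PEOPLE_PER_CAR = 70
YEARS = 5


def trips_needed(people, cars_per_train, people_per_car=PEOPLE_PER_CAR):
    """Number of train trips needed, rounded up to whole trips."""
    people_per_train = cars_per_train * people_per_car
    return math.ceil(people / people_per_train)


# The text gives 40 to 70 cars per train; check both ends of that range.
for cars in (40, 70):
    trips = trips_needed(PEOPLE, cars)
    print(f"{cars} cars per train: {trips} trips, about {trips / YEARS:.1f} per year")
```

Rounding up gives 2,143 trips for 40-car trains and 1,225 trips for 70-car trains, so the train length alone moves the answer by almost a factor of two.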

Other factors that could influence the death counts are:

Social displacement and weather (Winter)

Allied Bombings

Civilian Death Squads

Children's deaths and survival, with a lack of records after separation

Death and disease during the economic recovery

Revenge or ravaging in places that went through three occupations

To conclude this short article: the mass of written opinion and data on this subject is huge. The somewhat fanatical nature of the era makes this topic a good one for describing and demonstrating the many types of data imprints and data bias. The data used herein is freely available on the internet and is intended to demonstrate the concepts of data bias and data imprinting. Any other interpretation is left as an exercise for the reader.

Data imprints, as the term is used here, are just that: data with a time and a context, left by an event.

Several key points on data imprints and data bias are:

Problems may be simplified with time.

Appropriate error margins become settled and recognized.

The intense emotion of a topic can color the data representation and interpretation.

The motivation for a data imprint can lead to insights that may have been overlooked.

Time is a key factor and may change the meaning and the bias of the data.

An insistence that the data says one thing may be proved wrong later on.

Context is important to a broader understanding.

In contrast, a well-done, complete, unbiased data model with complete data (imprint) can bring a lasting quality to data that can be valuable over time. Sometimes leaving the tools out of an exercise can lead to enjoyable insights and dynamic perspectives.

© 2019 Data Science Central ®
