# The most expensive data science textbook

This rudimentary statistics textbook, entitled Statistics: The Art and Science of Learning from Data (3rd Edition), sells on Amazon for \$157.79. I'm not sure whether everyone sees the same price as I do (maybe prices are user-customized) or whether the price changes over time, but it seems stable. Below is a screenshot.

Surprisingly, this book is meant for first-year college students, so there's nothing original in the book. Just basic standard stuff that anyone can find for free on the Internet. Books covering far deeper and broader material, at the research level, filled with new intellectual property that you can directly apply to real-world data, sell at a fraction of the cost.

What justifies such a high price? And why would someone - necessarily a student just starting college - spend so much money on this classic, three-century-old material? I mean, even if you are forced by your stats professor to buy this book, you will never purchase it at full price: you'll get a used copy, borrow it from the library, or get a copy from a friend. The fact that there are so few reviews (over a two-year time period) might indicate that there are very few buyers, but who knows?

To make things worse, it's 800 pages thick, but contains so little material that it could be summarized in just 10 pages. For instance, the author spends 350 pages introducing material such as sampling, random variables, moments, the binomial distribution, and probability distributions (though there is no mention of generating functions or any proof of the central limit theorem) before introducing the concept of confidence intervals. It takes another 50 pages to review the concept of confidence intervals, yet another 50 pages to discuss the basics of hypothesis testing, and then yet another 50 pages to discuss group comparisons.

By contrast, if you look at my tutorial on confidence intervals and hypothesis testing, it has the following features:

• No introduction needed - you can skip the first 350 pages of Agresti's book (my tutorial relies only on ranks and sorting, no need to know even the most basic probability concepts; indeed middle school students can read my tutorial, understand it, and apply it)
• It takes just 2-3 pages to cover what the three chapters in Agresti's book spread over 150 pages
• It's entirely free
• It provides a very robust, model-free, automatable approach, yet it is 100% compatible with standard confidence intervals, when applied to traditional data (which is what Agresti's book is all about)
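To give a flavor of what "relies only on ranks and sorting" can mean, here is a minimal sketch of a percentile-style interval. It is an illustration under my own assumptions, not a reproduction of the tutorial: the function name `percentile_interval`, the `level` parameter, and the rank-rounding rule are all hypothetical choices. The idea is simply to sort a collection of values (for instance, the same estimate recomputed on many subsamples) and read the interval bounds off by rank.

```python
import math


def percentile_interval(values, level=0.90):
    """Return (lower, upper) bounds read off the sorted values by rank.

    Hypothetical helper for illustration only: no distributional
    assumption is made, the bounds are just order statistics.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")

    ordered = sorted(values)          # the only real work: sorting
    n = len(ordered)
    tail = (1 - level) / 2            # share of values left out on each side

    # Round ranks outward so the interval is never narrower than requested.
    lo_rank = math.floor(tail * (n - 1))
    hi_rank = math.ceil((1 - tail) * (n - 1))
    return ordered[lo_rank], ordered[hi_rank]
```

Called on the integers 1 through 100 with `level=0.90`, this returns the 5th and 96th smallest values. Rounding the ranks outward is one possible convention among several; other rules (interpolating between neighbors, for example) would give slightly different bounds on small samples.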

So what makes people

• Continue teaching these topics that way,
• Write the exact same stuff that has been written by thousands of authors over hundreds of years (why the need for a new book in the first place?),
• Sell it at an ever-higher price when it has become free knowledge - essentially unsellable?

I am absolutely stunned by this level of absurdity. Maybe you have some explanations. I can't find any... Is this book an exception, or is this the general trend with traditional publishers? Or is a publisher bubble on its way, regarding college textbooks?


Comment by Swati Jain, Ph.D. on August 26, 2014 at 7:15am

The textbook prices are ridiculous. Professors and students receive them for free as a marketing stunt and students end up buying these. The worst part is that some professors give assignments such as attempt "Q4 on page 59". In that case, hand-me-down editions don't work. A student has to buy a new one.

Comment by George Vander Meulen on August 25, 2014 at 8:00am

Vincent, I'm surprised it's only \$157. There is no logic to textbook pricing. Professors require them so students buy them so publishers charge whatever they can get away with. They could probably sell just as many for \$257. This is a scandal!

I suspect that education is suffering as a result. A core curriculum should be decided on (for everything) and put up on the web for free. Then every accredited institution should have to use it.

Comment by Vincent Granville on August 23, 2014 at 8:41pm

Data science books are much cheaper than statistics books because the target audience is much larger.

Comment by Vincent Granville on August 22, 2014 at 3:08pm

Rebecca, an 800-page book is depressing when the true content (after filtering out noise) is about 10 pages or less and can be assimilated in one hour rather than 10 months. Not sure what kind of students you have in your classrooms, but I can tell that my former college students (in Belgium, where college education is free) or my 12-year-old daughter for that matter, understand the concept of confidence interval (as explained in my 3-page tutorial) very quickly. Maybe you are trying to make sure that even the dumbest students get it - which penalizes average and smart students - and maybe this is a side effect of asking \$50,000 in tuition fees for something most can learn for free by themselves in a much shorter time period.

Here's an interesting job interview question, for a future statistician: how much did you spend on basic statistical textbooks during your college years? I could use it to select candidates, hiring the one answering "less than \$500", especially if she tells me that she understood confidence intervals in 30 minutes after reading my (very popular) tutorial.

Comment by Rebecca Barber, PhD on August 22, 2014 at 12:43pm

One other comment - as you note, the book you call out is an introductory stats text, not a data science textbook.  I don't consider those the same thing at all.  Most data science books are under \$50.

Comment by Harvey Summers on August 22, 2014 at 12:03pm

Why? Cash. Textbooks are big, fat cash cows. My Intro to PM course has a \$280 textbook. I could teach it with \$85 of books (\$50 for the PMBOK and \$35 for Fast Forward MBA in PM). But as adjunct faculty, I don't have the time to develop all the tests and supporting materials. The \$280 gets me all the quizzes, tests, and presentations that have been calibrated so I can tell if my class isn't "getting it." The sad part is that if I weren't teaching for a state university, I'd chuck the quizzes and tests for practical real-world experience. But projects aren't lying around like data, and I can't have 50 people do different things and then grade them without some TA support. So that's why people create crappy books that cost so much and why adjuncts have expensive textbooks.

Comment by Rebecca Barber, PhD on August 22, 2014 at 12:01pm

It seems you've never taught statistics to undergraduates.  I have, and here are a few reasons for what you see (although the book I used was somewhat less expensive).

1) Students really don't automatically get things like confidence intervals and hypothesis testing.  There are a LOT of things they need to understand first in order to see both the use and the point of those items.  They can't contextualize those concepts when they first walk in the room.

2) Examples need to be contemporary or the students can't relate to them and get hung up on the disconnect rather than working the problem. If an instructor used, say, the probability of a typewriter ribbon breaking as an example, the entire class would be off googling typewriter ribbons because they've never even seen one. Better examples actually pull from relevant CURRENT research literature, and that changes all the time.

2a) In addition, undergraduates are notorious for putting the answers to all the questions up on the internet.  If I actually want my students to do the work, I either have to come up with new problems for every single class or use a current book that is recent enough for the answers not to be out there already. While you can algorithmically come up with alternative numbers for problems (and that is what most online systems do), that doesn't change the scenario.  And often part of what they need to learn is how to apply the calculations to the scenarios.  That information will be already out there if I try to just change the numbers.

3) The more you teach this stuff, the more new ways you come up with to explain it.  Some of those new ways are better than the old.  (Admit it: no statistics book is a jolly good read)

4) Undergrads in a stats class are from a wide variety of different backgrounds and going into a wide variety of different skills.  While you don't NEED to know probability distributions to understand confidence intervals, there are many students who will never use confidence intervals but for whom the understanding of probability distributions is quite relevant.

My point is that until you actually stand in front of a class of community college students (mine were mostly psych and nursing) who really don't want to be there and are dreading the course more than a root canal, you don't have the qualifications necessary to judge what is and is not necessary in a college textbook.

Having said that, there are a great many less expensive and even a couple of open source stats textbooks that are also better for teaching. My experience is that the open source ones are not helpful to a lot of students and leave things out/jump around more, but you can definitely teach that class with a textbook that costs under \$50.