In the NBA, a top player takes around a thousand shots over the course of a regular season. A question worth asking is: what can we learn by looking at these shots? As a basketball fan of more than 10 years, I am particularly interested in discovering facts that cannot be seen directly on live TV. While browsing the web last week, I found a data set on Kaggle called NBA shot-log. It records every shot taken by each player during the 2014-15 regular season, along with a variety of features. I decided to perform an exploratory visualization of this data. Now let’s dive into the shot-log and see what we can discover about playing style and shooting performance among NBA players. I focused this analysis on Stephen Curry, James Harden, LeBron James and Russell Westbrook, who finished first through fourth in the 2014-15 MVP voting and are undoubtedly superstars of the league.
Data Acquisition and Processing
Data cleaning, feature creation and plotting were performed in R. The graphs were generated with the ggplot2 package. The R code for data cleaning and feature creation can be found here.
Figure 1. Shot density plot with respect to shot distance.
The graph above shows the distribution of each player’s shot attempts over shot distance. All four players have local maxima at around 5 feet and 25 feet, corresponding to the lay-up region and the three-point region. Curry’s shot density leans towards the three-point zone, while James took more of his shots in the paint, indicating different playing styles between the two players. It can also be seen that Westbrook uses the two-point jumper frequently, as suggested by the peak at around 17 feet.
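To make the idea of a "local maximum" in a shot density concrete, here is a minimal sketch of a kernel density estimate over shot distances. The original plots were made in R with ggplot2; this is an independent Python illustration that assumes a Gaussian kernel and a hand-picked bandwidth. The distances are toy values, not taken from the shot-log, and the names `gaussian_kde`, `distances` and `peaks` are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observed shot distance.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Toy shot distances in feet (hypothetical, not real shot-log data).
distances = [3, 4, 5, 5, 6, 7, 16, 23, 24, 25, 25, 26, 27]
density = gaussian_kde(distances, bandwidth=2.0)

# Evaluate on a 0.5-foot grid from 0 to 30 feet and pick out local maxima.
grid = [i * 0.5 for i in range(61)]
values = [density(x) for x in grid]
peaks = [
    grid[i]
    for i in range(1, len(grid) - 1)
    if values[i - 1] < values[i] > values[i + 1]
]
print(peaks)
```

The bandwidth controls how smooth the curve is: a larger value can merge nearby peaks, so the number of peaks seen in a density plot depends partly on this choice.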
Figure 2. Violin plot that summarizes shot accuracy for each player.
The violin plot above summarizes each player’s shot accuracy throughout the season. Judging by the shapes of the violins, Curry and James have more stable shot accuracy than Harden and Westbrook.
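One way to put a number on "stable" is to compute each player’s accuracy game by game and then look at the spread of those values. The sketch below is a Python illustration, not the R code used for the plot. It assumes the violin is drawn over per-game accuracy, and the records, field layout and player labels are hypothetical toy values.

```python
from collections import defaultdict
from statistics import pstdev

# Toy shot log as (player, game_id, made) tuples. Hypothetical data.
shots = [
    ("A", 1, True), ("A", 1, False), ("A", 2, True), ("A", 2, False),
    ("B", 1, True), ("B", 1, True), ("B", 2, False), ("B", 2, False),
]

def per_game_accuracy(shots):
    """Return {player: [accuracy in each game]}, games in sorted order."""
    tally = defaultdict(lambda: [0, 0])  # (player, game) -> [made, attempts]
    for player, game, made in shots:
        tally[(player, game)][0] += int(made)
        tally[(player, game)][1] += 1
    result = defaultdict(list)
    for player, game in sorted(tally):
        made, attempts = tally[(player, game)]
        result[player].append(made / attempts)
    return dict(result)

acc = per_game_accuracy(shots)
# A smaller standard deviation means more stable game-to-game accuracy.
spread = {player: pstdev(values) for player, values in acc.items()}
print(acc)
print(spread)
```

In this toy example both players shoot 50% overall, but player A does so in every game while player B alternates between perfect and scoreless games, which is the kind of difference a violin plot makes visible.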
Figure 3. Boxplot that describes the shot accuracy with respect to match result.
Having seen the summary of shot attempts and shot accuracy, let’s explore how these values behave when other factors are taken into account. First, let’s split shot accuracy by match result. In the plot, Curry, James and Westbrook show a large accuracy gap between games won and games lost. In contrast, Harden shows a relatively small gap.
Figure 4. The shot number and shot accuracy with respect to date.
Next, let’s look at how the number of shots and the accuracy change over the course of the season. Westbrook tended to take more shots at the end of the season, when the Oklahoma City Thunder were fighting for the last playoff spot. From the graph on the right, Curry and James have relatively stable shot accuracy throughout the season, while the accuracy of Harden and Westbrook seems to have greater variance.
Figure 5. Number of shots with respect to touch time.
Now let’s see the number of shots plotted against touch time. Curry took more of his shots after a very short touch time, reflecting his catch-and-release shooting style. In contrast, Westbrook tends to hold the ball for a few seconds before taking the shot.
Figure 6. Shot accuracy with respect to shot distance.
An interesting pattern appears when shot accuracy is plotted against shot distance. As shown above, shot accuracy decreases from the lay-up region out to around 10 feet. For Curry, James and Westbrook, although their accuracy values differ from each other, they all have a local maximum at around 14 feet. Let’s call this region the comfort zone. Harden’s accuracy peak, on the other hand, extends out beyond the three-point line, which sets him apart from the others. Past the comfort zone, accuracy decreases monotonically for all players.
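A curve like this can be approximated by grouping shots into distance bins and computing the share of made shots in each bin. The following is a minimal Python sketch of that binning, independent of the R code behind the figure. The 5-foot bin width, the function name `accuracy_by_distance` and the toy shots are assumptions made for illustration.

```python
from collections import defaultdict

def accuracy_by_distance(shots, bin_width=5):
    """Map the lower edge of each distance bin (in feet) to shot accuracy.

    `shots` is an iterable of (distance_in_feet, made) pairs.
    """
    tally = defaultdict(lambda: [0, 0])  # bin edge -> [made, attempts]
    for distance, made in shots:
        edge = int(distance // bin_width) * bin_width
        tally[edge][0] += int(made)
        tally[edge][1] += 1
    return {edge: made / n for edge, (made, n) in sorted(tally.items())}

# Toy shots as (distance, made). Hypothetical values, not real data.
toy_shots = [
    (2.0, True), (3.5, True), (4.0, False),
    (8.0, False), (9.0, True),
    (14.0, True),
    (24.0, True), (26.0, False),
]
by_bin = accuracy_by_distance(toy_shots)
print(by_bin)
```

Bins with only a handful of attempts give noisy accuracy values, so it helps to check the attempt count per bin before reading too much into a local peak.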
Figure 7. Density plot with respect to shot distance and closest defender distance.
Adding defender distance to Figure 1 gives a contour plot that offers a general feel for each player’s style of play. In the plot on the left, the contour for Westbrook lies below the one for Curry in the lay-up region, meaning that Westbrook tends to attempt tougher lay-ups than Curry. To my surprise, Westbrook is even more aggressive at the rim than LeBron James.
Figure 8. Shot number and shot accuracy with respect to opponent and players.
The heat map above shows the number of shots and the shot accuracy against each opponent. For example, Westbrook took more shots when playing against the New Orleans Pelicans and the Portland Trail Blazers, and Harden had poor accuracy when playing against the Boston Celtics.
Figure 9. The shot accuracy after made shots. The top graph combines all shots, while the bottom graph takes only three-point shots into account.
Some people believe that making one shot affects the accuracy of the next shot. With the shot-log, we can actually explore this effect. For each player, the leftmost red bar represents the accuracy of shots taken right after a miss. The green, blue, and purple bars represent the accuracy after making 1, 2 and 3 consecutive shots. It is interesting to note that, for almost all players under study, making one shot seems to have a negative effect on the following shot. The more consecutive shots are made, the lower the accuracy of the next shot. When only three-point shots are taken into account, this trend still holds for Curry and LeBron James.
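The streak analysis can be sketched as follows: walk through a player’s shots in order, and file each shot under the number of consecutive makes that came immediately before it. This is a Python illustration rather than the R code used for the figure. It assumes that the shots form one chronological sequence, that the first shot has no preceding streak and is skipped, and that streaks longer than 3 are grouped with 3. The function name and the toy sequence are hypothetical.

```python
from collections import defaultdict

def accuracy_after_streak(results, max_streak=3):
    """Tally shots by the number of consecutive makes just before them.

    `results` is a chronological list of booleans (True = made).
    Returns {streak: (made, attempts)}, where streak 0 means the previous
    shot was a miss and streaks above `max_streak` are capped.
    """
    tally = defaultdict(lambda: [0, 0])
    streak = None  # None until the first shot has been seen
    for made in results:
        if streak is not None:
            key = min(streak, max_streak)
            tally[key][0] += int(made)
            tally[key][1] += 1
        # Update the running streak for the next shot.
        streak = (streak or 0) + 1 if made else 0
    return {k: tuple(v) for k, v in sorted(tally.items())}

# Toy sequence of shot results (hypothetical, not real data).
sequence = [True, True, False, True, True, True, True, False]
after = accuracy_after_streak(sequence)
for streak, (made, attempts) in after.items():
    print(streak, made, attempts, made / attempts)
```

With real data, the sequence would probably need to be reset at the start of each game so that a streak does not carry over between games, and the longer streaks will have far fewer attempts behind them than the shorter ones.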
Takeaways and Future Direction
From these graphs, we can see that the four stars have dramatically different playing styles. For example, Stephen Curry tends to catch and release quickly, while Russell Westbrook prefers to attack the rim with the ball in his hands. In terms of shot accuracy, Stephen Curry and LeBron James are more stable than Harden and Westbrook. Interestingly, in most cases, hitting one shot tends to have a negative effect on the next shot. A deeper exploration is needed to understand this phenomenon in more detail. As a future direction, focusing on the defender side of the data is a potentially interesting extension. Furthermore, we could apply machine learning techniques to predict the probability of hitting a shot.