Student Loans: a Subprime Time-bomb for the US Government?

Contributed by Spencer James Stebbins. He is taking the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from July 5th to September 22nd, 2016. This post is based on his first class project, the Exploratory Data Analysis Visualization Project, due in the 2nd week of the program. You can find the original article here.

Are Student Loans a Subprime Time-bomb for the US Government?

There is widespread concern among politicians, professionals, and students that the current student loan market may be the next soaring hot air balloon primed to run out of gas and collapse. Students facing rising tuition costs, increased competition among peers, and a still uncertain job market worry about whether they will be able to pay off the growing debt they have taken on to pursue an education believed necessary to stay competitive. The US government, on the other hand, needed to get people back to work, or at least back to school, in the wake of the 2008 recession, and increasing student loan offerings afforded a way to accomplish this goal. Although the current student loan market is only 1/10 the size of the mortgage loan market, many articles have highlighted similar trends between the two and claimed that student loans may be the next mortgage crisis. Are these headlines valid? To attempt to answer this question, let's first investigate some existing student loan trends...

Increasing total outstanding student loan debt

Many readers may be aware that there is an outstanding student loan debt problem in the United States. As of 2015, there was 1.3 trillion dollars in outstanding student loan debt and this number has increased steadily over the past 10 years.

Source: QUANDL:FRED-MDOAH - St.Louis Federal Reserve Bank

#load packages (scales provides date_format and comma)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(reshape2)
library(scales)
#load data
student_loans_outstanding_df <- read.csv("")
#change column names
student_loans_outstanding_df <- transmute(student_loans_outstanding_df, Date=DATE, student_loans_outstanding=VALUE)
#convert to dollars
student_loans_outstanding_df$student_loans_outstanding <- student_loans_outstanding_df$student_loans_outstanding * 100000
#convert to Date
student_loans_outstanding_df$Date <- as.Date(student_loans_outstanding_df$Date)
ggplot(student_loans_outstanding_df, aes(x=Date, y=student_loans_outstanding)) + geom_line(color='red') + ggtitle("Outstanding Student Loan Debt") +
  scale_x_date(labels = date_format("%Y")) + scale_y_continuous(labels = comma) +
  theme_fivethirtyeight() + theme(legend.title=element_blank()) +
  theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Dollars')

More students take on more debt

Over the same period, the average debt of graduates has increased to almost $27,000 per graduate and the percentage of graduates with student loan debt has increased to nearly 60%.

Source: TICAS - The Institute for College Access & Success

Source: QUANDL:FRED-MDOAH - New York Federal Reserve Bank

#load data
state_data <- read.csv("", stringsAsFactors = FALSE)
#parse state names
state_data$Name <- gsub(' - 4-year or above', '', state_data$Name)
#rename Year column to Date and Name to State
state_data <- rename(state_data, Date=Year, State=Name)
#convert date to year format by dropping the last three characters
state_data$Date <- substr(state_data$Date, 1, nchar(state_data$Date) - 3)

#convert all "N/A" strings to NA in order to summarize on year
NAs <- state_data == "N/A"
state_data[NAs] <- NA
#convert the measure columns to numerics, one column at a time
state_data[,3:11] <- lapply(state_data[,3:11], as.numeric)

#drop State so that only Date and the numeric columns remain
state_data <- select(state_data, -State)

#group by year and take the mean of every column, ignoring NAs
data <- state_data %>% group_by(Date) %>% summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

##PLOT 2 - Average Student Loan Debt
#select columns from main dataframe
average_debt <- select(data,Date, Average.debt.of.graduates)
#melt to single column (creates 'variable' and 'value' columns)
average_debt <- melt(average_debt, id.vars = 'Date')
ggplot(average_debt, aes(x=Date, y=value, group=variable, color=variable)) + geom_line() + ggtitle('Average Student Loan Debt') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Dollars') + theme(legend.position="none")

##PLOT 3 - Percent of Student with Debt
#select columns from main dataframe
percent_student_with_debt <- select(data,Date, Percent.of.graduates.with.debt)
#melt to single column, naming the value column Percent
percent_student_with_debt <- melt(percent_student_with_debt, id.vars = 'Date', value.name = 'Percent')
#convert percent to readable integer
percent_student_with_debt$Percent <- as.numeric(percent_student_with_debt$Percent) * 100
ggplot(percent_student_with_debt , aes(x=Date,y=Percent,group=variable,color=variable)) + geom_line() + ggtitle('Percent of Students with Debt')+ theme_fivethirtyeight()+ theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Percent') + theme(legend.position="none")
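
The cleaning and averaging steps used for the state-level data can be tried on a small example. The sketch below is only an illustration: the `toy` table and its values are invented, and it uses base R's `aggregate` rather than the dplyr pipeline from the project so that it runs without extra packages.

```r
# Toy stand-in for the state-level table; all values are invented.
toy <- data.frame(
  Date  = c("2004", "2004", "2005", "2005"),
  State = c("Alabama", "Alaska", "Alabama", "Alaska"),
  Average.debt.of.graduates = c("18000", "N/A", "19000", "21000"),
  stringsAsFactors = FALSE
)

# Turn the literal string "N/A" into a real missing value.
toy[toy == "N/A"] <- NA

# Convert the measure column from character to numeric.
toy$Average.debt.of.graduates <- as.numeric(toy$Average.debt.of.graduates)

# Average across states within each year. The formula interface of
# aggregate() drops rows with missing values by default (na.action = na.omit).
yearly <- aggregate(Average.debt.of.graduates ~ Date, data = toy, FUN = mean)
print(yearly)
```

A state with a missing value in a given year simply does not contribute to that year's mean, which is the same effect `na.rm = TRUE` has in the summarise step.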

Most student loans are federal loans

In 2015, 90% of student loan debt consisted of federal loan borrowing.

Source: TICAS - The Institute for College Access & Success

#select data from main dataframe
percent_federal_debt <- select(data,Date,
#melt to single column
percent_federal_debt <- melt(percent_federal_debt, id.vars = 'Date', value.name = 'Percent')
ggplot(percent_federal_debt , aes(x=Date,y=Percent,group=variable,color=variable)) + geom_line() + ggtitle('Federal Borrowing as Percent of Total Debt') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Percent')+ theme(legend.position="none")

Student loans make up a large, increasing portion of the US Govt.'s assets

Student loans made up about 26% of the US federal government's total assets as reported in the US Treasury's 2015 financial statements ($845.1B of $3,229.8B). This is up roughly 2.3 percentage points from about 24% in 2014 ($731.2B of $3,065.3B). Although this pales in comparison to the more than 21 trillion dollars of federal liabilities, student loans have been an ever-growing part of the US government's assets since 2009/10.

Source: TICAS - The Institute for College Access & Success

#manually input data, student loan asset, and total federal asset data
student_loan_asset <- c(84.5,92.1,97.7,101.0,108.0,124.4,157.8,231.3,356.1,495.5,613.9,731.2,845.1)
total_federal_asset <- c(1405.4,1397.3,1447.9,1496.5,1581.1,1974.7,2667.9,2883.8,2707.3,2748.3,2968.3,3065.3,3229.8)
date <- c('2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015')
#create dataframe
student_loan_federal_asset_data <- data.frame(Date=date,student_loan_asset=student_loan_asset, total_federal_asset=total_federal_asset)
#create percent column
student_loan_federal_asset_data$Percent= student_loan_asset/total_federal_asset
#remove other columns
student_loan_as_percentage_federal_assets <- select(student_loan_federal_asset_data, Percent, Date)
#convert percent to integers
student_loan_as_percentage_federal_assets$Percent <- as.numeric(student_loan_as_percentage_federal_assets$Percent) * 100
#convert year strings to Dates (January 1 of each year); as.Date() cannot parse a bare year
student_loan_as_percentage_federal_assets$Date <- as.Date(paste0(student_loan_as_percentage_federal_assets$Date, '-01-01'))
ggplot(student_loan_as_percentage_federal_assets, aes(x=Date, y=Percent, group=1)) + geom_line(color='red') + ggtitle('Student Loans as Percent of Federal Assets') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Percent') + theme(legend.position="none")
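
The 2014 and 2015 shares can be checked directly from the Treasury figures quoted in this section. This is a minimal sketch; the variable names are my own, and the only inputs are the four dollar amounts given in the text.

```r
# Figures in billions of dollars, as quoted from the Treasury statements.
student_loans <- c("2014" = 731.2, "2015" = 845.1)
total_assets  <- c("2014" = 3065.3, "2015" = 3229.8)

# Student loans as a percentage of total federal assets in each year.
share_pct <- 100 * student_loans / total_assets

# Year-over-year change, expressed in percentage points.
change_pts <- unname(share_pct["2015"] - share_pct["2014"])

print(round(share_pct, 1))
print(round(change_pts, 1))
```

Expressing the change in percentage points keeps it distinct from the relative growth of the loan balance itself, which rose much faster than its share of assets.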

What changed in 2009/10 to prompt increasing student loans?

The Health Care and Education Reconciliation Act of 2010 was signed into law by President Barack Obama on March 30, 2010. Many people know it as part of "Obamacare", the nickname for the health care reform package it amended, but what they may not know is that, in addition to the major changes in the healthcare sector, this act also implemented some major student loan reforms.

Here are some of the changes that were made by President Obama:

  1. The federal government will no longer give subsidies to private lending institutions for federally backed loans.
  2. Borrowers of new loans starting in 2014 will qualify to make payments based on 10% of their discretionary income.
  3. New borrowers would also be eligible for student loan forgiveness after 20 years instead of 25 on qualifying payments.
  4. Money will be used to fund poor and minority students and increase college funding.

It is highly probable that the increase in student loan borrowing was prompted by the 2008 recession and the ensuing high unemployment rate, which led many to return to school. The changes implemented by the Health Care and Education Reconciliation Act of 2010 gave many people the means to do so, and consequently the government now owns 90% of all student loans.

More unemployed & underemployed graduates

Although unemployment rates have been stabilizing since the 2008 recession, an increasing number of graduates are still unemployed or underemployed. Underemployment includes three groups of people: unemployed workers who are actively looking for work; involuntarily part-time workers who want full-time work but have had to settle for part-time hours; and so-called marginally attached workers who want and are available to work, but have given up actively looking. Additionally, the percentage of employed college graduates under the age of 27 who were working in a job that did not require a college degree increased from 38 to 46 percent from 2007 to 2014 (Abel and Deitz 2014).
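
To make the three-group definition concrete, here is a small sketch of how an underemployment rate could be computed. It assumes the rate is built like the BLS U-6 measure, with marginally attached workers added to the labor force in the denominator; the function name and all counts are hypothetical and are not taken from the EPI data.

```r
# Hypothetical counts in thousands of people; not taken from the EPI data.
unemployed            <- 70
involuntary_part_time <- 60
marginally_attached   <- 20
employed              <- 850  # includes the involuntary part-time workers

underemployment_rate <- function(unemployed, involuntary_part_time,
                                 marginally_attached, employed) {
  # Numerator: the three groups named in the definition.
  underemployed <- unemployed + involuntary_part_time + marginally_attached
  # Denominator: labor force (employed + unemployed) plus the marginally
  # attached, who are otherwise counted as outside the labor force.
  base <- employed + unemployed + marginally_attached
  100 * underemployed / base
}

rate <- underemployment_rate(unemployed, involuntary_part_time,
                             marginally_attached, employed)
print(round(rate, 1))
```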

Data is for college graduates age 21–24 who are not enrolled in further schooling.
Source: 2015 EPI analysis of basic monthly Current Population Survey microdata
#load data from csv
grad_unemployment_data <- read.delim("")
#convert date
grad_unemployment_data$Date = as.Date(grad_unemployment_data$Date)
#filter for after 2003
grad_unemployment_data <- filter(grad_unemployment_data, Date >= "2003-01-01")
#melt to single column
grad_unemployment_data <- melt(grad_unemployment_data, id = 'Date',measure.vars = names(select(grad_unemployment_data,-Date)))
ggplot(grad_unemployment_data, aes(x=Date,y=value, group=variable, color=variable)) + geom_line() + ggtitle('Graduate Unemployment & Underemployment') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Percent')

Graduate earnings are flat

In addition to the large number of graduates who are unemployed or underemployed, initial young graduate earnings are not increasing along with the increasing debt they are taking on. On average, young college graduates have an hourly wage of $17.94, which translates to an annual salary of roughly $37,300 for a full-time, full-year worker. This is a decline of 2.5 percent from what a typical college graduate would have made in 2000 ($38,300).
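
The hourly-to-annual conversion can be reproduced with a single line of arithmetic. The sketch below assumes a full-time, full-year worker means 40 hours a week for 52 weeks; that definition is my assumption, not something stated in the EPI source.

```r
hourly_wage <- 17.94

# Assumption: full-time, full-year work is 40 hours a week for 52 weeks.
hours_per_year <- 40 * 52

annual_salary <- hourly_wage * hours_per_year
print(annual_salary)
```

Under that assumption the result lands within a few dollars of the roughly $37,300 quoted above.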

* Data reflects 12-month moving averages; data for 2015 represent 12-month average from April 2014 to March 2015. Note: Data are for college graduates age 21–24 who do not have an advanced degree and are not enrolled in further schooling, and high school graduates age 17–20 who are not enrolled in further schooling. Shaded areas denote recessions. Source: EPI analysis of Current Population Survey Outgoing Rotation Group microdata

#load data
earnings_data <- read.delim("")
#convert date
earnings_data$Date = as.Date(earnings_data$Date)
#filter data for after 2003
earnings_data <- filter(earnings_data, Date >= "2003-01-01")
#rename columns
earnings_data <- rename(earnings_data, college_grads=All, high_school_grads=All.1)
#melt data to single column
earnings_data <- melt(earnings_data, id = 'Date',measure.vars = names(select(earnings_data, -Men,-Women,-Men.1,-Women.1,-Date)))
ggplot(earnings_data, aes(x=Date,y=value, group=variable, color=variable)) + geom_line() + ggtitle("Graduate Earnings") + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Hourly Wage')

Rising cost of attending college

Between 2003 and 2013, the average cost of attending college increased over 30%. Universities have taken advantage of the influx of students and their access to loans and have increased tuition at a rate that far outpaces the recovery in unemployment and graduate earnings. This rising cost, combined with the failure of wages to grow for young college graduates, signals that a college education is becoming a more uncertain investment.

Source: TICAS - The Institute for College Access & Success

#select data from main data frame
increasing_college_costs <- select(data,Date,Total.cost.of.attendance..on.campus.)
#melt to single column
increasing_college_costs <- melt(increasing_college_costs, id.vars = 'Date')
ggplot(increasing_college_costs, aes(x=Date, y=value, group=variable, color=variable)) + geom_line() + ggtitle('Average College Costs') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Dollars') + theme(legend.position="none")

High student loan default rates

Student loan default rates have decreased over the past few years, but have remained fairly high as the number of loans granted and the total outstanding debt have increased dramatically. Although decreasing student loan defaults should be a good indicator that graduates are able to pay their debts, is this really enough to offset all the other trends that support the claim that a student loan bubble may exist?

Source: CollegeBoard Trends - U.S. Department of Treasury calculations based on sample data from the National Student Loan Data System

#load data
default_rate_data <- read.csv("", stringsAsFactors = FALSE)
#melt to single column
default_rate_data <- melt(default_rate_data, id = 'Date',measure.vars = names(select(default_rate_data,-Date)))
ggplot(default_rate_data, aes(x=Date,y=value, group=variable, color=variable)) + geom_line() + ggtitle('Student Loan Default Rates') + theme_fivethirtyeight() + theme(legend.title=element_blank()) + theme(axis.title = element_text(), axis.title.x = element_blank()) + ylab('Percent')


Although student loan default rates seem to be decreasing in recent years, a large number of factors support the claim that student loans may be a subprime bubble. Let's restate these trends:
  1. Outstanding student loan debt has increased steadily over the past 10 years and is in excess of 1.3 trillion dollars.
  2. The average debt of graduates has increased steadily over the past 10 years and is almost $27,000 per graduate.
  3. Nearly 60% of graduates have student loan debt and 90% of that debt is in federal loans.
  4. Since 2009, student loans as a percentage of the US government's assets have increased from about 6% to about 26%.
  5. 7% of young graduates are unemployed, 15% of remaining graduates are underemployed, and 46% are employed in jobs that do not require a college degree.
  6. The average cost of college has increased over 30% between 2003 and 2013.
  7. Student loan default rates are still high at ~12.5% on average.

Furthermore, the U.S. Govt treats these federal loans as if they were AAA rated, increasing their portion of total US assets to about 26% in 2015, yet it requires no credit check on borrowers. Yes, the statistics show that college graduates earn more than those who do not graduate from college, but the only way to access this college premium is by completing a degree, and of the 66 percent of young adults who began college, 37.5 percent hadn't completed their degree by age 27 (BLS 2014). Additionally, the percentage who graduate has not increased alongside tuition costs, and consequently alongside the average amount of federal debt each student takes on, and this trend only seems to be continuing. Finally, the income-based payment plans of the Health Care and Education Reconciliation Act of 2010 have made it easy for students to pay back the bare minimum on these loans, which are eligible to be forgiven after 20 years.

So are student loans a subprime bubble? Besides a sudden, minimal decrease in default rates, the rest of these trends do not bode very well for the ever increasing asset line of the U.S. Govt, but I'll let you be the judge...

Next Steps

  • Correlate subprime mortgage crisis and current student loan trends
  • Find and further analyze loan repayment data
  • Compare default rates between states and check whether they match those of the mortgage crisis
  • Compare mortgage backed securities and student loan asset backed securities data
  • Show how either Hillary Clinton's or Donald Trump's student loan stance may affect these trends
  • Predict likelihood of default rates based on loan composition


View the full project on Github: HERE

Written in R, using RStudio.

Packages used:

  • dplyr
  • ggplot2
  • ggthemes
  • reshape2
