I am building a data set to use in R with an XGBoost program I have developed, and I want to ask about several data strategies and the issues involved. The data includes fields such as term_gpa, total_gpa, total-failures, program of study, dev_ed_course_required_yes_no, assessment test scores (ACT, SAT and PERT), race and many others, as well as demographic data (income, parents' income, county, ZIP code, etc.).
The strategy is to use fall-term data from previous years (2013 to 2016) to build and test a model, then apply it to fall 2017 (or the latest completed term) to predict whether each student will return the following year or not.
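For illustration, here is a minimal sketch of the time-based split described above, written in Python rather than R. The record fields (`student_id`, `fall_year`, `returned_next_fall`) and the helper `temporal_split` are hypothetical. The assumption is that the model is trained on the earlier fall cohorts and tested on a later labeled cohort, so the evaluation mimics the real forecast for fall 2017, whose outcomes are not yet observed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class StudentTerm:
    # Hypothetical record layout; the real data set has many more fields.
    student_id: str
    fall_year: int                      # fall term the record describes
    term_gpa: float
    returned_next_fall: Optional[bool]  # label; None when not yet observed


def temporal_split(
    records: List[StudentTerm],
    train_years: Iterable[int],
    test_years: Iterable[int],
    score_year: int,
) -> Tuple[List[StudentTerm], List[StudentTerm], List[StudentTerm]]:
    """Split records by fall term so the test cohort is later than training."""
    train_years, test_years = list(train_years), list(test_years)
    # Guard against evaluating on cohorts older than (or equal to) training data,
    # which would overstate how well the model forecasts a future term.
    if max(train_years) >= min(test_years) or max(test_years) >= score_year:
        raise ValueError("years must satisfy train < test < score")
    train = [r for r in records if r.fall_year in train_years]
    test = [r for r in records if r.fall_year in test_years]
    score = [r for r in records if r.fall_year == score_year]
    return train, test, score


# Usage with made-up records (not real student data):
records = [
    StudentTerm("a", 2013, 3.1, True),
    StudentTerm("b", 2014, 2.0, False),
    StudentTerm("c", 2015, 3.5, True),
    StudentTerm("d", 2016, 1.8, False),
    StudentTerm("e", 2017, 2.9, None),  # outcome unknown until the next fall
]
train, test, score = temporal_split(records, range(2013, 2016), [2016], 2017)
```

Holding out the most recent labeled cohort (2016 here), rather than a random sample of all years, is one way to check whether patterns from older cohorts still hold for newer students before scoring fall 2017.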
Any help with strategies or issues to consider would be enlightening.
william, Data and Policy Analyst, SF College