Boston Housing Dataset without the racial profiling field

Like many data scientists, I use the UCI datasets extensively.

The Boston Housing Dataset is especially useful for teaching.

For example, I use it in the Data Science for IoT course because it's a dataset that people can relate to easily.

The attributes are:

  1. CRIM per capita crime rate by town
  2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS proportion of non-retail business acres per town
  4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX nitric oxides concentration (parts per 10 million)
  6. RM average number of rooms per dwelling
  7. AGE proportion of owner-occupied units built prior to 1940
  8. DIS weighted distances to five Boston employment centres
  9. RAD index of accessibility to radial highways
  10. TAX full-value property-tax rate per $10,000
  11. PTRATIO pupil-teacher ratio by town
  12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  13. LSTAT % lower status of the population
  14. MEDV Median value of owner-occupied homes in $1000's

However, there is a problem with this dataset, specifically with this attribute:

  • B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

Hence, I use a modified version of the dataset, which you can find as a CSV HERE. It removes the above attribute, and this does not make any difference to the rest of the dataset.
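If you would rather start from the original data than from the CSV, the same removal can be sketched in R. This is only a minimal sketch, assuming the copy of the dataset that ships with the MASS package, where the attribute is stored in a column named `black`. Other copies may label it differently (for example `B`), and `boston_clean` is just an illustrative name.

```r
# Assumption: the MASS package's copy of the data, where attribute B is named "black"
library(MASS)

# Keep every column except the B attribute
boston_clean <- Boston[, setdiff(names(Boston), "black")]

# The remaining 13 attributes, including the median home value, are untouched
names(boston_clean)
```

Using `setdiff()` on the column names means the code does not depend on the position of the column, only on its name.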

You can then load it into a data frame using the following code, changing the path to your own directory:

# Read the data from the CSV file (change the path to your own directory)
Boston = read.csv("c:\\futuretext\\Boston.csv")
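After loading, it can be worth confirming that the file was read as expected and that the removed attribute is really absent. The following is a minimal sketch rather than a definitive check: `load_boston` is a hypothetical helper, and the labels it looks for (`B`, `black`) are assumptions about how the attribute might be named, since the header of the CSV is not shown here.

```r
# Hypothetical helper: read the CSV and stop if the removed attribute is still present
load_boston <- function(path) {
  df <- read.csv(path)
  # Assumed labels for the removed attribute; adjust to match your file's header
  leftover <- intersect(tolower(names(df)), c("b", "black"))
  if (length(leftover) > 0) {
    stop("Unexpected column(s) found: ", paste(leftover, collapse = ", "))
  }
  df
}

# Usage: change the path to your own directory
# Boston <- load_boston("c:\\futuretext\\Boston.csv")
# str(Boston)  # shows the column names and types that were read
```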

