Crime Datasets for the Heights, Houston, Texas

This is where you may access the basic data that has gone into my analysis.

Datasets

Much effort was spent cleaning up the data, which suffered from multiple issues. After cleanup, the data was geocoded where possible so that GIS and mapping operations can be performed.

Beat 2A30 of the Houston Police Department

The Beat 2A30 data is found in tables by month, starting in January 2010. The format of the tables and the form of the content have undergone several changes. I have tried to take these into account.


Population Growth

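library(ggplot2)

# Census population counts for ZIP code 77008
Year <- c(2011, 2012, 2013, 2014, 2015)
Pop  <- c(29776, 29424, 30807, 31418, 31863)
census <- data.frame(Year, Pop)

# Scatter plot of population by year with a fitted linear trend line
ggplot(census, aes(x = Year, y = Pop)) +
	geom_point() +
	geom_smooth(method = "lm") +
	labs(title = "Population for 77008")

# Fit the same linear model directly to see its coefficients
lm(formula = Pop ~ Year, data = census)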

Coefficients:
(Intercept)         Year  
 -1210960.8        616.8  

As can be seen, the population in the area grew by about 2% per year (the fitted slope is roughly 617 people per year), according to data from the US Census.
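
For anyone who wants to check the 2% figure, here is a minimal sketch of the arithmetic. It refits the same model and divides the slope by the mean population over the five years. The choice of the mean as the baseline and the variable names slope and pct_per_year are my assumptions for illustration; a different baseline year would give a slightly different percentage.

```r
# Census counts for ZIP code 77008, copied from the listing above
census <- data.frame(
  Year = c(2011, 2012, 2013, 2014, 2015),
  Pop  = c(29776, 29424, 30807, 31418, 31863)
)

# Linear fit of population against year
fit <- lm(Pop ~ Year, data = census)

# Slope of the fit, in people per year
slope <- unname(coef(fit)["Year"])

# Express the slope as a percentage of the mean population
# (assumption: the 2011-2015 mean is used as the baseline)
pct_per_year <- 100 * slope / mean(census$Pop)

print(round(slope, 1))
print(round(pct_per_year, 2))
```

Because the growth is modest over such a short span, the linear slope and a compound annual rate come out nearly the same, so the simple ratio is good enough here.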

Unemployment data for Harris County

Unemployment data from the Federal Reserve is another factor that may be used to adjust the crime data.


Cleaned datasets and code

  • Zipped CSV file of cleaned up Beat data
  • R dataset of cleaned up Beat data
  • Premise translation (R dataset) table
  • Geocoding table of street addresses and lat-longs
  • Code for reading Premise table
  • Code for reading the monthly crime tables


About the author

Alan Jackson is a retired geophysicist, with time to learn some new skills and do stuff that may benefit the neighborhood.