Untangling 311 Request Data

Andrew Nicklin
May 23, 2016

One of the datasets governments most commonly share is non-emergency service request data. However, despite some clear common threads between jurisdictions, this data is largely a tangled mess. With some simple analysis and a partnership with the Open311 community, we hope to change that.

Photo courtesy of Brandon Griggs via Unsplash

One of our newest What Works Cities asked us to provide recommendations on publishing their non-emergency service requests (311 requests) as open data. There has been plenty of work in this space, including strong support from Code for America, a research paper from the University of Toronto, and Open311. Thanks to the support of technology platforms like SeeClickFix and FixMyStreet, Open311 has been adopted internationally. However, Open311 doesn't recommend a format suitable for bulk download, such as a DataPackage or Comma-Separated Values (CSV), which open data platforms commonly support.
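For readers unfamiliar with the bulk formats mentioned above, here is a minimal sketch of a Data Package descriptor (the `datapackage.json` file from the Frictionless Data specifications) describing one CSV of 311 requests. The package name, file name, and choice of columns are hypothetical. The field names are borrowed from the Open311 GeoReport v2 service request fields for illustration, and the types use the Table Schema vocabulary (`string`, `number`, `datetime`).

```python
import json

# Minimal Data Package descriptor for a single CSV of 311 requests.
# Package name, file path, and field list are hypothetical examples.
descriptor = {
    "name": "example-city-311-requests",
    "resources": [
        {
            "name": "service-requests",
            "path": "service_requests.csv",
            "format": "csv",
            "schema": {
                # Field names follow Open311 GeoReport v2; types use Table Schema.
                "fields": [
                    {"name": "service_request_id", "type": "string"},
                    {"name": "requested_datetime", "type": "datetime"},
                    {"name": "service_name", "type": "string"},
                    {"name": "lat", "type": "number"},
                    {"name": "long", "type": "number"},
                ]
            },
        }
    ],
}

# Write this text to datapackage.json alongside the CSV file.
print(json.dumps(descriptor, indent=2))
```

Because the descriptor sits next to the CSV rather than inside it, the data file stays a plain CSV that any spreadsheet or open data portal can read, while tools that understand Data Packages can also pick up the column types.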

Our first step in developing a recommendation was to identify US-based jurisdictions that publish 311 requests. Using tools such as the US City Open Data Census, Socrata's Open Data Network, and web searches, we identified candidate data sources for our analysis, aiming for diversity in both the type of jurisdiction (cities and counties) and population. For each of the twenty jurisdictions we selected, we reviewed the structure of its 311 request data: the column names, data types, and sample values. We used this information to construct a matrix in which each column represents a city and each row represents an element of the data structure.
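The review of column names, data types, and sample values could be partly automated. The sketch below is one hypothetical way to profile a CSV export; the sample data is invented, and the type-guessing heuristic (and the single timestamp layout it accepts) is an assumption rather than the method used in this analysis.

```python
import csv
import io
import itertools
from datetime import datetime

# Hypothetical excerpt of one jurisdiction's 311 export. Column names and
# values are invented for illustration; every real export differs.
SAMPLE_CSV = """case_id,opened,category,latitude
101,2016-05-01 08:15:00,Pothole,42.35
102,2016-05-02 13:40:00,Graffiti,42.36
"""

# Assumed timestamp layout for this sample only; real files use many layouts.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def guess_type(values):
    """Classify sample values as 'number', 'datetime', or 'text' (rough heuristic)."""
    def all_parse(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    if all_parse(lambda v: datetime.strptime(v, DATETIME_FORMAT)):
        return "datetime"
    return "text"


def profile(csv_text, sample_rows=5):
    """Return (column name, guessed type, sample values) for every column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(itertools.islice(reader, sample_rows))
    return [
        (name, guess_type([row[name] for row in rows]), [row[name] for row in rows])
        for name in reader.fieldnames
    ]


for name, kind, samples in profile(SAMPLE_CSV):
    print(f"{name:<10} {kind:<9} {samples}")
```

A heuristic like this only gives a first pass. Columns such as IDs will look numeric, and timestamps in any other layout fall through to "text", so a person still has to check each guess.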

We populated the matrix with the column names from each city's data, aligning the cells so that columns from different cities with approximately the same meaning shared a row. For example, Boston's "OPEN_DT" column and Little Rock's "ticket_created_date_time" column both contain date/time values representing when the 311 request was created, so they were placed in the same row. In some cities, several columns were relevant. For example, Kansas City's "CREATION DATE," "CREATION MONTH," and "CREATION YEAR" together represent a single data point, so we grouped them in one cell and placed that cell in the same row.
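The matrix can be sketched as a nested mapping from a concept (a row) to each city's source columns (the cells). Each cell holds a list because, as with Kansas City, one concept may be spread across several columns. Only the column names come from this post; the row label "request created" and the `render` and `coverage` helpers are hypothetical. Counting how many cities fill each row is one simple way to spot the most common columns.

```python
# Rows are concepts, columns are cities. A cell is a list because one city
# may spread a single concept across several columns.
CITIES = ["Boston", "Little Rock", "Kansas City"]

MATRIX = {
    "request created": {  # hypothetical row label
        "Boston": ["OPEN_DT"],
        "Little Rock": ["ticket_created_date_time"],
        "Kansas City": ["CREATION DATE", "CREATION MONTH", "CREATION YEAR"],
    },
}


def render(matrix, cities):
    """Lay the matrix out as tab-separated lines, one row per concept."""
    lines = ["concept\t" + "\t".join(cities)]
    for concept, cells in matrix.items():
        # An empty cell is shown as "-" when a city has no matching column.
        row = [", ".join(cells.get(city, [])) or "-" for city in cities]
        lines.append(concept + "\t" + "\t".join(row))
    return lines


def coverage(matrix, cities):
    """Count how many cities publish at least one column for each concept."""
    return {
        concept: sum(1 for city in cities if cells.get(city))
        for concept, cells in matrix.items()
    }


print("\n".join(render(MATRIX, CITIES)))
print(coverage(MATRIX, CITIES))  # {'request created': 3}
```

Additional rows would be added in the same way, one concept at a time, with a city simply omitted from a row when it publishes nothing equivalent.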

311 request data is wildly inconsistent between jurisdictions

Our final step was to take the most common columns and propose standardized names for them. For this, we turned to the Open311 GeoReport v2 specification, specifically the GetServiceRequest method, which retrieves a single record. Where possible, we used the field name defined in the Open311 specification; where none existed, we followed Open311's naming conventions. These conventions match those common in Python and Ruby: all lowercase, with underscores separating words (e.g., “agency_responsible” and “closed_date”). For each recommended column, we also proposed a data type (“datetime,” “text,” or “number”) and, where appropriate, described how the value should be formatted when published in a plain-text format such as CSV.
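The naming rule above (use the Open311 field name where one exists, otherwise apply the lowercase-with-underscores convention) can be sketched as a small crosswalk with a fallback. This is an illustrative assumption, not the published recommendation: the `OPEN_DT` to `requested_datetime` mapping, the source columns "Closed Date" and "AgencyResponsible", and the choice of ISO 8601 for datetimes are all hypothetical, since this post does not say which date/time format was proposed.

```python
import re
from datetime import datetime

# Hypothetical crosswalk for one city: source column -> Open311 field name.
# Only columns with a matching Open311 field need an entry.
CROSSWALK = {"OPEN_DT": "requested_datetime"}


def snake_case(name):
    """Lowercase a column name and join its words with underscores."""
    # Insert a separator at lowercase/digit -> uppercase boundaries (camelCase).
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse any run of spaces or punctuation into a single underscore.
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


def standard_name(column, crosswalk=CROSSWALK):
    """Use the Open311 name when one is mapped; otherwise apply the convention."""
    return crosswalk.get(column, snake_case(column))


def to_iso8601(value, source_format):
    """Rewrite a city's date/time string as ISO 8601 (no time zone attached)."""
    return datetime.strptime(value, source_format).isoformat()


print(standard_name("OPEN_DT"))            # requested_datetime
print(standard_name("Closed Date"))        # closed_date
print(standard_name("AgencyResponsible"))  # agency_responsible
print(to_iso8601("05/23/2016 09:30:00 AM", "%m/%d/%Y %I:%M:%S %p"))  # 2016-05-23T09:30:00
```

Keeping the crosswalk per city and the convention in code means the mechanical part of renaming is repeatable. The judgment calls, such as deciding which source column really means "requested_datetime", stay in the crosswalk where they can be reviewed.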

So what did we learn? First, as the image above shows, plenty of cities publish their 311 request data. However, every one of them publishes it in its own way: which columns they include, what they name those columns, and even how they present basic information such as dates and times. Second, despite these differences, there are common threads, and these form a foundation for standardization. Third, any standardization we achieve in the short term will most likely be schematic (shared column names and data types) rather than semantic (shared meanings for the values in those columns). In other words, a common vocabulary for types of requests is still a long way off, if it is possible at all.

You can view the full analysis and synthesis; feedback is welcome. We plan to integrate this work into the Open311 ecosystem, helping to provide an even richer framework for accessing and sharing service request data.

This post was originally published on the @gov_ex blog.

Andrew Nicklin

Technology, Policy, and Data in Government. @johnshopkins