Enabling Civic Data Standards

Governments share lots of data with the public, and unleash more every day. Open data is improving the lives of hundreds of millions of people, many incrementally and some dramatically. If we want open data to have even greater impact, we need to think strategically about how to organize it for more effective production and consumption at scale.

Recently, we started gathering a list of civic data standards. This is a launching point for using interoperability to broaden the reach of data for public good. The aim is to build a library for suppliers and consumers of government data to learn about existing standards, as well as the skills and resources to create new ones.

Before discussing this library, let’s define Civic Data Standard. Getting this definition right is important; aside from serving as an explicit guide for which standards to include in the hub, it will help frame the discussion needed to guide a significant amount of work.

A Civic Data Standard is an open, collaboratively developed set of schematics or semantics that facilitates interoperability between multiple providers and consumers for the public good.

Open might seem obvious for people working closely with public data, but it’s important to call out for three reasons. First, just like the data and interfaces they describe, these standards should be readily available to both publishers and consumers equally, without barriers such as fees or intellectual property restrictions. Second, providers can simplify their own metadata by pointing to a publicly available copy of a standard. Third, open means making ongoing evolution an inclusive process, extremely important to ensuring these standards really serve the public good.

“In the most general sense, [open] conveys independence from the threats of arbitrary power and centralized control.” — Andrew L. Russell

Collaboratively developed reinforces this idea. Eventually, it should mean that at least four organizations (two providers and two consumers) support the ongoing development of any given standard. It may also be wise to have a fifth, neutral party which helps facilitate discussions, resolve disagreements, answer community questions, and support governance activities.

Schematics establish the technical underpinnings of a civic data standard. They identify the means for communication, for example simple bulk record transfers or complex application programming interfaces (APIs). They identify the formats through which data is delivered, such as comma-separated values (CSV), extensible markup language (XML), or JavaScript object notation (JSON). Schematics provide structures for the elements of data to be shared, along with clear descriptions of their intended contents.
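To make the idea of schematics concrete, here is a minimal Python sketch. It assumes a hypothetical service-request record with three made-up fields (request_id, status, latitude); none of these names come from a real standard. It shows one way a schema could pair each element with a type and a description, check a record against that structure, and deliver the same record as either JSON or CSV.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One element of the schema: its name, expected type, and intended contents."""
    name: str
    kind: type
    description: str


# Hypothetical schema for illustration only; not taken from any published standard.
SCHEMA = [
    Field("request_id", str, "Identifier assigned by the provider"),
    Field("status", str, "Current state of the request"),
    Field("latitude", float, "Latitude in decimal degrees"),
]


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in SCHEMA:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
        elif not isinstance(record[field.name], field.kind):
            problems.append(f"{field.name} should be {field.kind.__name__}")
    extra = set(record) - {field.name for field in SCHEMA}
    problems.extend(f"unexpected field: {name}" for name in sorted(extra))
    return problems


def to_json(records: list) -> str:
    """Serialize records as a JSON array."""
    return json.dumps(records)


def to_csv(records: list) -> str:
    """Serialize records as CSV, with columns in schema order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[field.name for field in SCHEMA])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The point of the sketch is that the structure and descriptions live in one place, while the delivery format is a separate choice layered on top.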

Although schematics are critical to enable data sharing, semantics define the values that appear within the data and ensure common meanings. They describe words, phrases, or codes within the data, so that providers know how to translate them to or from their own internal vocabulary, and consumers know how to interpret them. Semantics describe what numbers should appear in the data, and how they should be calculated or derived. An example of semantics is the Federal Bureau of Investigation’s Uniform Crime Reporting (UCR) standard, which clearly defines types of crimes. UCR enables state and local law enforcement agencies to provide statistics, regardless of local jargon or variation in legal definitions.
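The translation step described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the shared category labels and the local terms are invented, and they are not the official UCR definitions. The sketch shows a provider mapping its own jargon onto a shared vocabulary before publishing counts, and refusing to guess when a term has no agreed meaning.

```python
from collections import Counter

# Hypothetical shared vocabulary; illustrative labels, not official UCR definitions.
SHARED_CATEGORIES = {"burglary", "motor_vehicle_theft"}

# Hypothetical mapping kept by one provider, from local jargon to the shared vocabulary.
LOCAL_TO_SHARED = {
    "b&e": "burglary",
    "breaking and entering": "burglary",
    "auto theft": "motor_vehicle_theft",
    "stolen vehicle": "motor_vehicle_theft",
}


def translate(local_term: str) -> str:
    """Translate a local term into a shared category, or fail loudly if unmapped."""
    key = local_term.strip().lower()
    if key not in LOCAL_TO_SHARED:
        raise ValueError(f"no shared category for local term: {local_term!r}")
    return LOCAL_TO_SHARED[key]


def count_by_category(local_terms) -> Counter:
    """Count incidents per shared category, regardless of local wording."""
    return Counter(translate(term) for term in local_terms)
```

A consumer reading the resulting counts only needs to know the shared categories, not each provider's local wording.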

Most civic data standards available today have solid schematic foundations but address semantics only minimally. For example, Open311's GeoReport requires a standard way to indicate whether a service request is open or closed, and the Building and Land Development Specification (BLDS) recommends common ‘mapped’ values. Agreeing on common semantics is a big challenge, but offering both schematics and semantics as part of a civic data standard will produce inestimably greater value.

Interoperability is a nod to a recent commentary by Tim Davies, saying that data sharing standards aren’t just about the formats, but about the “tools, identifiers, platforms, policies and collaborations” for solutions. That vision needs to be incorporated into the very hearts of the standards. A library of civic data standards is one step toward that goal.

Finally, it’s important to acknowledge the public good that should be represented in civic data standards. In the short term, governments will be the main providers, but open data isn’t their exclusive domain. In the future, particularly with the growing Internet of Things, civic data will be shared by a variety of organizations, public and private. Waze, a popular navigation app, already has a Connected Citizens Program which provides real-time traffic flow and incident data to cities in exchange for information about road closures, events, and other traffic-related diversions. Both kinds of data, what Waze provides and what cities return, are worthy of new civic data standards and broad public release.

With so few civic data standards around right now, there is significant opportunity to do extremely valuable work. The White House Police Data Initiative, for example, invites law enforcement agencies around the country to share their data on police/public interactions, but is intentionally waiting on creating standards for that data. As the program evolves, similar data models will emerge organically. Eventually, however, providing guidance on what data to release and how to release it will go a long way to increasing adoption and achieving the program’s goals.

Beyond defining a Civic Data Standard, we are exploring other work aligned to a sustainable reference library: a Civic Data Standards Hub. This work includes identifying and measuring important attributes of each standard, such as how it was developed, how broad its community is, the impact it has on data quality, what kinds of validation tools exist, what governance model(s) are in place, and so on. Knowing these details empowers providers and consumers alike to find standards, evaluate them for implementation, and collaborate on adapting and improving them.
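As a rough illustration of what a hub entry might record, here is a minimal Python sketch. The class, its field names, and the helper function are all hypothetical; they simply mirror the attributes listed above and the suggestion that at least two providers and two consumers support a standard. Nothing here describes an existing catalog.

```python
from dataclasses import dataclass, field


@dataclass
class StandardEntry:
    """Hypothetical record describing one civic data standard in a hub."""
    name: str
    development_process: str          # how the standard was developed
    provider_count: int               # organizations publishing data with it
    consumer_count: int               # organizations consuming data with it
    validation_tools: list = field(default_factory=list)
    governance_models: list = field(default_factory=list)

    def is_collaboratively_supported(self) -> bool:
        """Check the suggested minimum of two providers and two consumers."""
        return self.provider_count >= 2 and self.consumer_count >= 2


def with_validation_tools(entries: list) -> list:
    """Return only the entries that list at least one validation tool."""
    return [entry for entry in entries if entry.validation_tools]
```

With entries like these, a provider could filter the hub for standards that already have validators, or flag standards that still rely on a single organization.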

Finally, we need to enrich the open data movement with tools and best practices so new civic data standards can be quickly developed and grown to scale. It often takes years for the use of a data standard to reach widespread adoption, and very few, such as the General Transit Feed Specification, are nearly ubiquitous in the United States. There are hundreds of years of history in open standards development, and civic data standards should benefit from that collective wisdom.

We have a lot of work to do. We are not the first to focus on this area, and hopefully won’t be the last. This work has interesting implications beyond the open data community, but more on that later. It’s important to build a sustainable platform to advance this work, and we hope to take on that effort. Help us get it started: refine the definition of a Civic Data Standard, let us know about any missing standards, suggest ways to measure the value and effectiveness of standards, or offer funding to accelerate this work.

Technology, Policy, and Data in Government. @johnshopkins
