Open data is already a part of the everyday lives of most people in the United States.
- Public health: The COVID pandemic dominates world news, and because of open public health data, researchers and reporters alike can analyze and disseminate the latest statistics and trends.
- Weather: In the 1970s, the National Oceanic and Atmospheric Administration (NOAA) began releasing weather data, which informed local weather reports on the nightly news. NOAA’s open data now powers numerous weather apps.
- Global Positioning System (GPS): Long before GPS data provided directions to countless drivers, it was a military project, closely guarded by the United States government. Opening the system to civilian use is what made those everyday directions possible.
- US Census: Combing through handwritten census records when you’re researching an ancestor for a family history is not terribly efficient, but it’s often manageable. If you’re trying to understand recent demographic trends, however, working through individual responses to the 10-year census one by one would be impractical. Because US Census data is open, analyses of all kinds can be undertaken.
For end users, it’s fairly straightforward to work with open data when all of it comes from a single source, as with NOAA and the US Census Bureau. Good data management practices within each agency ensure that the same type of information is available, in the same format, on a consistent basis.
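But what happens when users want to aggregate and analyze data that different agencies have collected and stored? The possibilities for variation abound: one agency might record dates as MM/DD/YYYY while another uses YYYY-MM-DD, or one might report distances in miles and another in kilometers. If variations across data sets are too numerous, combined analysis may not be possible.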
One solution to this variation is the use of data standards. A data standard is a set of shared expectations for communication between systems, much like the rules of a common language between people. Most of the time, computers exchange data in an established structure. A data standard defines the overall structure a data producer must adhere to, along with the types and formats of the data elements that fill that structure, so that a data consumer can reliably interpret the data.
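To make the idea concrete, here is a minimal sketch of a data standard expressed as code, so a consumer can check any producer’s file against it before combining data. Everything here is hypothetical and written for illustration: `STOP_STANDARD` and `validate` are made-up names, and while the field names are borrowed from the `stops.txt` file in GTFS, the rules are simplified and are not the actual GTFS specification.

```python
import csv
import io
from typing import Callable


# Hypothetical mini-standard: each required field maps to a parser that
# converts the raw CSV text into a typed value, raising ValueError if the
# value does not match the expected type or format.
def _non_empty(value: str) -> str:
    if not value.strip():
        raise ValueError("must not be empty")
    return value.strip()


def _coordinate(low: float, high: float) -> Callable[[str], float]:
    def parse(value: str) -> float:
        number = float(value)  # raises ValueError on non-numeric text
        if not low <= number <= high:
            raise ValueError(f"must be between {low} and {high}")
        return number
    return parse


STOP_STANDARD: dict[str, Callable[[str], object]] = {
    "stop_id": _non_empty,
    "stop_name": _non_empty,
    "stop_lat": _coordinate(-90.0, 90.0),
    "stop_lon": _coordinate(-180.0, 180.0),
}


def validate(feed_text: str, standard=STOP_STANDARD):
    """Check CSV text against the standard.

    Returns (records, errors): conforming rows converted to typed values,
    plus human-readable messages for rows that do not conform.
    """
    reader = csv.DictReader(io.StringIO(feed_text))
    missing = set(standard) - set(reader.fieldnames or [])
    if missing:
        return [], [f"missing required columns: {sorted(missing)}"]
    records, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        record, problems = {}, []
        for field, parse in standard.items():
            try:
                # Short rows yield None for absent fields; treat as empty.
                record[field] = parse(row[field] or "")
            except ValueError as exc:
                problems.append(f"{field}: {exc}")
        if problems:
            errors.append(f"line {line_no}: " + "; ".join(problems))
        else:
            records.append(record)
    return records, errors


if __name__ == "__main__":
    # Two hypothetical agencies publishing feeds in the same structure.
    agency_a = "stop_id,stop_name,stop_lat,stop_lon\n101,Main St,45.52,-122.68\n"
    agency_b = "stop_id,stop_name,stop_lat,stop_lon\n7,Oak Ave,north,-122.60\n"
    for name, feed in [("A", agency_a), ("B", agency_b)]:
        records, errors = validate(feed)
        print(name, records, errors)
```

Because the standard is just data (field names mapped to parsers), the same check can be applied to every producer’s feed. Agency A’s row passes and comes back with numeric coordinates, while Agency B’s row is flagged because `north` is not a latitude, which is exactly the kind of variation that would otherwise break a combined analysis.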
When a data standard is established and multiple data producers use it to structure their open data, new possibilities open up for anyone who wants to use that data. Public transportation in the US made its first significant foray into standardized open data in 2005 with the creation of what is now called the General Transit Feed Specification. The result has been a significant transformation in how riders engage with transit, as well as how transit services are planned and operated. More about that in the next installment of our series, so stay tuned.