The Luddite's Guide to Open Data and Socrata

This is the first in a series of posts explaining the technology, core concepts, and techniques employed by Code for Greensboro and civic hackers the world over. Written by a slightly-technical person for a very non-technical audience.

Header image provided by Descrier.

A Brief Explanation

The world of civic hacking, open data, and the kinds of projects done by Code for Greensboro is exciting and relatively uncharted territory for most people. To someone who's new not only to civic hacking but to programming and working with data in general, a lot of the terminology that gets bandied about at a brigade meeting can seem daunting and even a little arcane.

I, being new to this stuff myself, am writing this series to provide a resource for people like me. People who want to contribute but don't necessarily know where to start. These posts aren't going to be comprehensive or in-depth tutorials, but hopefully they can give you a starting point for learning. With each entry I will write on a piece of technology or a core concept essential to the work done at Code for Greensboro.

What the heck is open data?

Open Data is the lifeblood of almost any kind of project the brigade does. It's also a philosophy about the distribution of information and, more recently, a diffuse movement across science and government at all levels. It is the idea, simply put, that certain kinds of data should be publicly available and "free to use, reuse, and be redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike" (to quote the Open Data Handbook).

For our purposes, this means a dataset of community or civic interest online and available to the public, free of cost and copyright, readable to both machines and humans, and typically stored in a common file format such as .xls or .csv.

The kinds of data Code for Greensboro and Code for America brigades work with typically are provided by local city governments and on occasion federal agencies. Examples of open datasets you might find include park locations, address information for fire hydrants, police records of various calls for service in a community, traffic accidents, boundaries of local school or voting districts, city or state budgets over the years, and just about anything else local governments or community groups might keep records of.

That's not to say all that stuff is going to be available the minute you type it into Google. Open data is still a new and perhaps even unheard-of idea to many people and government agencies. Data becomes available when those in power and local communities understand what can be done with open data, and begin to advocate for it. Part of what makes Code for America and similar organizations so important isn't just what their individual apps or projects accomplish, but also the demonstration of open data's possibilities and benefits.

Why is open data important?

Simply put, we live in a digital world. Every day you can walk down a street bathed in wi-fi signals transmitting entire libraries' worth of information. People have become accustomed to accessing knowledge and information via things like Wikipedia, Yelp, Reddit, and various other mobile apps or websites. As our lives and communities become more enmeshed with online data and communication, everyday citizens expect their governments (local and national) to be responsive, accessible, and transparent. Data that is available online and usable helps communities better understand their local quality of life and make accurate, fact-based assessments of local civic issues. Additionally, it gives community agencies and activists a chance to access, visualize, and spotlight their particular causes more effectively.

Open data doesn't just benefit citizens though; civil servants and government agencies need to communicate and share information with each other too. Posting data publicly streamlines the process of sharing information with other agencies or departments and eases the constraints of bureaucracy. It also saves time and energy for government employees. Every time a public information request is filed, multiple employees, agencies, and layers of bureaucracy are involved in fulfilling it. With data already available to the community, more time is freed up for civil servants to pursue other projects and provide services to the public.

Finally, there are civic hackers, who use the open datasets as the backbone of programs and apps for civic good. This is what I mean by open data being Code for Greensboro's lifeblood. Without data, we have no foundation to build our projects on. But how is this data used? And where would we go to find it? That brings us to the second part of our guide...

Socrata

Socrata is the company that provides Greensboro's Open Data Portal, the place where you can find all datasets currently available to the public. As of this writing, the portal is still in early beta release. That said, at our recent Civicon event, multiple teams competed to develop apps based on the ten datasets already in the portal, with great results.

Among the things created was Violations Near You, an app that lets you look up nearby code violations for a given location to get a "sense" of the neighborhood. False Alarms GSO is a site that provides information on false fire alarm occurrences in Greensboro and educates the user on the cost of these incidents and the steps that can be taken to prevent them. Finally, Do It With Greensboro is an app that uses inspection permit data to research local contractors.

Each of these apps was made possible by browsing the Socrata portal and using the datasets within it. Most datasets will look similar to spreadsheets you would find in Excel or a similar program:

Shown above is some of Greensboro's fire incident data. Each of the rows represents an incident reported to the city fire department, and each column represents an aspect of the incidents such as the date and address.
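For the curious, here is a rough sketch of what "rows and columns" look like to a program once a dataset has been downloaded as a .csv file. The sample data and the column names (incident_date, address, incident_type) are made up for illustration and are not taken from the actual Greensboro dataset.

```python
import csv
import io

# Made-up sample standing in for a downloaded .csv file. The column names
# and values are illustrative assumptions, not the real Greensboro data.
SAMPLE_CSV = """incident_date,address,incident_type
2015-03-14,100 N ELM ST,Building fire
2015-06-02,2200 SPRING GARDEN ST,False alarm
"""


def read_rows(csv_text):
    """Turn CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)

# Each row is one incident; each key is one column.
for row in rows:
    print(row["incident_date"], row["address"])
```

The first line of the file becomes the column names, and every line after it becomes one incident that can be looked up by those names.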

You may be thinking right now, "Big deal, I can scroll through a huge table and download it to my computer... How does that help me make sense of the information itself?"

There are several ways to answer this question, and depending on what you would like to do with the data you're accessing, Socrata has in-browser tools that can make your job much easier.

First, you may not want all the fire incidents in Greensboro for the past X years. Maybe you just want fires from the last six months that are located on a particular street. Provided the attributes you're looking for are contained within the set, the data can be filtered or sorted to meet your specific criteria.

"What's so special about that?" you may ask. "I can do that by just clicking on a column in Excel or iTunes!"

True, but this is an important component when building apps that use civic datasets. If a website or mobile app needs to continually request and update certain kinds of information, it can't just download the dataset every time. That could run very slowly, and potentially leave a lot of angry users with expensive phone bills! For an app to run efficiently, it needs to be able to request just the information the user needs. Socrata can process and respond to such requests easily and efficiently. These requests are usually handled through an API, a topic for a future post.
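As a small preview, here is a hedged sketch of what such a request might look like. Socrata portals accept query parameters such as $where, $order, and $limit on a dataset's address, so an app can ask for the same kind of filtering and sorting described above instead of downloading everything. The domain, the dataset ID, and the column names below are placeholders invented for this example, not Greensboro's real ones.

```python
from urllib.parse import urlencode


def build_query_url(domain, dataset_id, where, order, limit=50):
    """Build a request URL that asks the portal for only the matching rows."""
    params = {
        "$where": where,  # which rows to keep
        "$order": order,  # how to sort them
        "$limit": limit,  # the most rows to send back
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Placeholder values: "data.example.gov", "abcd-1234", and the column names
# "address" and "incident_date" are assumptions made for illustration.
url = build_query_url(
    domain="data.example.gov",
    dataset_id="abcd-1234",
    where="address like '%ELM ST%'",
    order="incident_date DESC",
    limit=10,
)
print(url)
```

An app would send this URL to the portal and get back, at most, the ten most recent matching rows rather than the whole table.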

One doesn't need to be a developer to make use of Socrata's features though. You may have noticed a button towards the upper right that says "visualize". This provides a set of tools to create nifty charts, calendars, or maps depending on the kind of data you have, so you can quickly generate human-friendly visuals for the information you want. Socrata also provides an embed option for putting visualizations on a website or social media.

Best of all, if you have a Socrata login, you can save your session or "view" with a particular dataset. That way you can come back and work with it again later.

As mentioned before, Greensboro's Open Data Portal is still in beta, but there is already plenty to go in and start working with.

Socrata isn't the only kind of Open Data portal being used. For example, our neighbors at Code for Cary have a portal provided by OpenDataSoft. Open Data and Open Government are emerging fields right now and many companies are providing solutions for hosting civic data.

Want to find out more?

If this has got you excited about the possibilities of open data, portals, and Socrata, here are some great resources for in-depth learning:

Getting Started With Your Socrata Site

Why Does My Organization Need Open Data?

Take a look around Greensboro's Open Data Portal!

Take a look at Open Data Portals of Code for America Brigades from around the country!

"Open Data Portals: 9 Solutions and How They Compare"

Questions? Concerns? Ready to dive in? Join us on Slack and come to a hacknight!

Until next time, happy learning!