Illuminating the jargon of criminal justice data with Elasticsearch
Last year, a team of researchers here at the Sunlight Foundation started putting together a wide-ranging but centralized inventory of criminal justice data. As we’ve been doing this important work, other efforts to address issues with criminal justice data have popped up nationwide. There’s been a call for better data by organizations as high up as the White House, where a Police Data Initiative has been organized. (We are honored to be involved with that effort.)
We’ve collected the locations of thousands of datasets, along with information about each one, by hand. This includes the category, the format, how frequently the data is updated and about 20 other data points to help people find and navigate the data they’re looking for. The next step is to build a user-facing product, which we are currently working on.
Sunlight has worked with government data at the federal and state levels for nine years now, so we regularly work to clean up data that is incomplete or fragmented. Our work gathering information about criminal justice datasets across the country has yielded some familiar challenges, as well as some interesting new ones.
As the Web developer working with the criminal justice team, I’ve been trying to learn about the domain of criminal justice information, and its terminology in particular. One interesting issue that came up as soon as I started looking at the data the team is collecting is the variation in the terminology used by researchers, practitioners and journalists. For example, the terms “close management,” “solitary housing unit” and “shu” all refer to the concept I knew as “solitary confinement,” but various jurisdictions were using different terms in their datasets. How do you help people uncover information across the country when different people use different language to refer to the same things?
Thanks to the thorough work of the criminal justice research team, I have access to a rich (and growing) list of terms and synonyms related to criminal justice data. To make use of this information, I’ve been working with Elasticsearch, an open source search engine with a high degree of customization. In the past, I’ve used Elasticsearch along with Haystack (a tool to connect Elasticsearch with a Django website) to quickly and easily add search functionality to websites. However, for this project I needed to dive into the text analysis features that allow you to transform human-readable text into search-optimized tokens.
With Elasticsearch, you can specify tasks that analyze and transform text when it is put into the search index (a database, essentially) and also when search queries are performed. There are a number of natural language processing tasks built into Elasticsearch, such as various “stemmer” filters for matching different forms of words to a root (such as “questioning” to “question”) or filters for removing extremely common “stop” words like “and,” “the” and “is.”
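As a rough illustration of the analysis chain described above, the Python sketch below builds index settings that lowercase text, drop English stop words and stem what remains. The `stop`, `stemmer` and `lowercase` filters and the `standard` tokenizer are built into Elasticsearch; the names `english_stop`, `english_stemmer` and `dataset_text` are placeholders chosen for this example and are not the settings used in our project.

```python
import json

# Hypothetical names for this sketch. Only the "type" values, the
# "standard" tokenizer and the "lowercase" filter come from Elasticsearch.
ANALYSIS_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                # Removes common English words such as "and", "the" and "is".
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                # Reduces word forms to a root, e.g. "questioning" -> "question".
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "dataset_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Filters run in the order listed.
                    "filter": ["lowercase", "english_stop", "english_stemmer"],
                }
            },
        }
    }
}


def settings_as_json(settings=ANALYSIS_SETTINGS):
    """Serialize the settings to a JSON body for an index-creation request."""
    return json.dumps(settings, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(settings_as_json())
```

The filter order is a deliberate choice in this sketch: lowercasing first lets “The” match the stop list, and removing stop words before stemming keeps the stemmer from altering them.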
I experimented with a synonym filter to create sets of synonyms so that one word expands to a list of words (a search for “arrests” becomes a search for “arrests,bookings”). I’ve also used the synonym filter to map multiple phrases to a single phrase, or vice versa. By creating a custom synonym filter, I made it possible to search for “close management” or “solitary confinement” and also get results for “shu,” simply by mapping these terms so that “shu” gets stored in the search index along with the terms I’ve determined are related.
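A custom synonym filter along these lines might be configured as in the Python sketch below. It is a minimal example, not our project’s configuration: the `synonym` filter type and its `synonyms` list are part of Elasticsearch, while `SYNONYM_GROUPS`, `justice_synonyms`, `justice_text` and the helper functions are hypothetical names, and the two groups are just the examples mentioned in this post.

```python
import json

# Illustrative groups only; the real list is maintained by the research team.
SYNONYM_GROUPS = [
    ["arrests", "bookings"],
    ["solitary confinement", "close management", "solitary housing unit", "shu"],
]


def to_synonym_rules(groups):
    """Turn each group into one comma-separated rule of equivalent terms."""
    return [", ".join(group) for group in groups]


def build_settings(groups):
    """Build index settings with a custom synonym filter and analyzer."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "justice_synonyms": {
                        "type": "synonym",
                        "synonyms": to_synonym_rules(groups),
                    }
                },
                "analyzer": {
                    "justice_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so "SHU" and "shu" are treated alike.
                        "filter": ["lowercase", "justice_synonyms"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_settings(SYNONYM_GROUPS), indent=2))
```

Each comma-separated rule treats its terms as equivalent, so any one of them expands to the whole group. Elasticsearch also accepts rules of the form “a, b => c” when terms should map in one direction only.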
So far, this approach has helped me demonstrate to my colleagues one way our criminal justice inventory could be made into a searchable website, and it has given me a path to better understanding the complex nature of criminal justice data. There are many other exciting challenges in working with a catalog of criminal justice datasets housed on various state, local, national and university websites; jargon is only the tip of the iceberg. Nevertheless, it’s exciting to be able to leverage technology to create better ways of understanding the landscape of criminal justice data.
The Sunlight Foundation is a non-profit, nonpartisan organization that uses the power of the Internet to catalyze greater government openness and transparency, and provides new tools and resources for media and citizens, alike.
Source: http://sunlightfoundation.com/blog/2015/06/02/illuminating-the-jargon-of-criminal-justice-data-with-elasticsearch/
