By Sunlight Foundation Blog (Reporter)

Illuminating the jargon of criminal justice data with Elasticsearch



By The Sunlight Foundation

(Photo credit: Joe in DC/Flickr)

Last year, a team of researchers here at the Sunlight Foundation started putting together a wide-ranging but centralized inventory of criminal justice data. As we’ve been doing this important work, other efforts to address issues with criminal justice data have popped up nationwide. There’s been a call for better data by organizations as high up as the White House, where a Police Data Initiative has been organized. (We are honored to be involved with that effort.)

We’ve collected the locations of thousands of datasets, along with information about those datasets, by hand. This includes the category, the format, the frequency with which the data is updated and about 20 other data points to help people find and navigate the data they’re looking for. The next step is to build a user-facing product, which we are currently working on.

Sunlight has worked with government data at the federal and state levels for nine years now, so we regularly work to clean up data that is incomplete or fragmented. Our work gathering information about criminal justice datasets across the country has yielded some familiar challenges, as well as some interesting new ones.

As the Web developer working with the criminal justice team, I’ve been trying to learn about the domain of criminal justice information, and its terminology in particular. One interesting issue that came up as soon as I started looking at the data the team is collecting is the variation in the terminology used by researchers, practitioners and journalists. For example, the terms “close management,” “solitary housing unit” and “shu” all refer to the concept I knew as “solitary confinement,” but various jurisdictions were using different terms in their datasets. How do you help people uncover information across the country when different people use different language to refer to the same things?

Thanks to the thorough work of the criminal justice research team, I have access to a rich (and growing) list of terms and synonyms related to criminal justice data. To make use of this information, I’ve been working with Elasticsearch, an open-source search engine with a high degree of customization. In the past, I’ve used Elasticsearch along with Haystack (a tool to connect Elasticsearch with a Django website) to quickly and easily add search functionality to websites. For this project, however, I needed to dive into the text analysis features that allow you to transform human-readable text into search-optimized tokens.

Code defining an Elasticsearch synonym map of criminal justice terms.

With Elasticsearch, you can specify tasks that analyze and transform text when it is put into the search index (a database, essentially) and also when search queries are performed. There are a number of natural language processing tasks built into Elasticsearch, such as various “stemmer” filters for matching different forms of a word to a root (such as “questioning” to “question”) or filters for removing extremely common “stop” words like “and,” “the” and “is.”
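
As a rough illustration of such an analysis chain, the sketch below builds index settings as a plain Python dictionary and prints them as JSON. The analyzer name `cj_text`, the filter names `english_stop` and `english_stemmer`, and the field name `description` are all hypothetical, and the mapping layout assumes a recent Elasticsearch version. None of this is taken from the project's actual configuration.

```python
import json

# Hypothetical index body: every name below is an illustrative assumption.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Drop very common English words such as "and", "the", "is".
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                # Reduce English words toward a common root form.
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "cj_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Token filters are applied in the order listed.
                    "filter": ["lowercase", "english_stop", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                # Same analyzer at index time and at search time.
                "analyzer": "cj_text",
                "search_analyzer": "cj_text",
            }
        }
    },
}

# The JSON printed here is what would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```

Using the same analyzer for indexing and for searching keeps the tokens produced from a query comparable to the tokens stored in the index.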

I experimented with a synonym filter to create sets of synonyms so that one word expands to a list of words (a search for “arrests” becomes a search for “arrests,bookings”). I’ve also used the synonym filter to map multiple phrases to a single phrase, or vice versa. By creating a custom synonym filter, I made it possible to search for “close management” or “solitary confinement” and also get results for “shu,” simply by mapping these terms so that “shu” gets stored in the search index along with the terms I’ve determined are related.
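
A minimal sketch of what such a synonym filter definition might look like follows, again as a Python dictionary. The synonym groups use only the terms mentioned above. The names `cj_synonyms` and `cj_synonym_text` are hypothetical, and the `expand` helper is only a toy stand-in that imitates the expansion in plain Python. It is not how Elasticsearch implements the filter.

```python
import json

# Hypothetical synonym groups, limited to terms mentioned in the article.
# Each comma-separated line lists terms that should be treated as equivalent.
SYNONYM_SETS = [
    "close management, solitary housing unit, shu, solitary confinement",
    "arrests, bookings",
]

# Assumed analysis settings: a custom "synonym" token filter plus an
# analyzer that lowercases tokens before applying it.
analysis_settings = {
    "analysis": {
        "filter": {
            "cj_synonyms": {"type": "synonym", "synonyms": SYNONYM_SETS}
        },
        "analyzer": {
            "cj_synonym_text": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "cj_synonyms"],
            }
        },
    }
}


def expand(term, synonym_sets=SYNONYM_SETS):
    """Toy illustration: return every term grouped with `term`."""
    needle = term.strip().lower()
    for line in synonym_sets:
        group = [t.strip() for t in line.split(",")]
        if needle in group:
            return group
    # Terms without a synonym group are returned unchanged.
    return [needle]


print(json.dumps(analysis_settings, indent=2))
print(expand("arrests"))
print(expand("Close Management"))
```

In this toy version, `expand("arrests")` returns both "arrests" and "bookings", mirroring the expansion described above.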

So far, this approach has helped me demonstrate to my colleagues one way our criminal justice inventory could be made into a searchable website, and it has given me a path to better understanding the complex nature of criminal justice data. There are many other challenges in working with a catalog of criminal justice datasets housed on various state, local, national and university websites; jargon is only the tip of the iceberg. Nevertheless, it’s exciting to be able to leverage technology to create better ways of understanding the landscape of criminal justice data.

The Sunlight Foundation is a nonprofit, nonpartisan organization that uses the power of the Internet to catalyze greater government openness and transparency, and provides new tools and resources for media and citizens alike.


Source: http://sunlightfoundation.com/blog/2015/06/02/illuminating-the-jargon-of-criminal-justice-data-with-elasticsearch/

