LE-CAT is a Lexicon-based Categorization and Analysis Tool developed by the Centre for Interdisciplinary Methodologies in collaboration with the Media of Cooperation Group at the University of Siegen.

The tool applies a lexicon (a set of word queries, each associated with a category) to a corpus (a data set of textual sources). LE-CAT counts how often each query and each category occurs in the corpus and, for each source, which categories occur together (co-occurrence).
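To make the counting described above concrete, here is a minimal sketch written in Python purely for illustration. It is not LE-CAT's code and does not use the lecat R API. The lexicon, the corpus and the `analyse` function are hypothetical, and the matching rules (whole-word, case-insensitive, co-occurrence counted once per source) are assumptions rather than a description of how LE-CAT matches queries.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lexicon: each category maps to its word queries.
lexicon = {
    "transport": ["car", "bus"],
    "emotion": ["angry", "happy"],
}

# Hypothetical corpus: each string is one textual source.
corpus = [
    "The bus was late and I am angry",
    "Happy to see a car park here, the car fits",
    "Nothing relevant in this source",
]


def analyse(lexicon, corpus):
    """Count query and category occurrences, plus category co-occurrence by source."""
    query_counts = Counter()
    category_counts = Counter()
    cooccurrence = Counter()

    for source in corpus:
        found = set()  # categories matched at least once in this source
        for category, queries in lexicon.items():
            for query in queries:
                # Assumption: whole-word, case-insensitive matching.
                pattern = r"\b" + re.escape(query) + r"\b"
                n = len(re.findall(pattern, source, flags=re.IGNORECASE))
                if n:
                    query_counts[query] += n
                    category_counts[category] += n
                    found.add(category)
        # Assumption: a pair of categories co-occurs once per source
        # in which both appear, however many times each is matched.
        for pair in combinations(sorted(found), 2):
            cooccurrence[pair] += 1

    return query_counts, category_counts, cooccurrence


query_counts, category_counts, cooccurrence = analyse(lexicon, corpus)
```

With this toy input, "car" is counted twice, the transport category three times, the emotion category twice, and the two categories co-occur in two of the three sources.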

This technique automates and scales up user-led data analysis by applying a custom-built lexicon to large data sets. Because each analysis runs quickly, the user can iterate, refining the corpus and analysing a given phenomenon in depth.

LE-CAT was coded by James Tripp. It has been used to support the workshops YouTube as Test Society (University of Siegen), Parking on Twitter (University of Warwick), and the Digital Test of the News (University of Warwick), and is part of the CIM module Digital Object, Digital Methods.

Academic correspondence should be sent to Noortje Marres.


You can install the released version of lecat from GitHub by running the following line of code in R:


Web-based interface

LE-CAT has a web interface which can be started by running


which starts a new Shiny app.

Bugs or feature requests

Please report any bugs or feature requests via GitHub.

Dr James Tripp, Senior Academic Technologist, CIM