Tag Archives: Data

Gephi for the 1916 Letters

Gephi is a suite for the interactive visualisation of network data. In the Digital Humanities it is very often used to visualise the results of topic modelling. As an introduction, I suggest simply playing around with it; a good how-to read is Gephi for the historically inclined. The best approach, however, is to get a few data sets and just try Gephi out. For examples, see the following blogs:

The main challenge is to transform the output you get from Mallet or Gensim into useful input for Gephi (edges and nodes files). On his blog, Elijah explains in detail how he visualized the Mallet output.

I wrote a function in my export/outputter module that converts Mallet output to Gephi edges data and saves it to a file. To view the module feel free to have a look at my project on GitHub.
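To give an idea of what such a conversion involves, here is a minimal sketch. It is not the function from my module. It assumes the dense doc-topics layout that newer Mallet versions write (document index, document name, then one proportion per topic, tab-separated); older versions write topic/proportion pairs instead and would need different parsing. The name doc_topics_to_edges and the threshold parameter are made up for illustration. The Source, Target and Weight column headers are the ones Gephi's spreadsheet import recognises for an edges table.

```python
import csv
import io


def doc_topics_to_edges(lines, out, threshold=0.1):
    """Write a Gephi edges table linking each document to its prominent topics.

    `lines` is an iterable of rows in the assumed dense Mallet layout:
    doc_index <TAB> doc_name <TAB> weight_topic_0 <TAB> weight_topic_1 ...
    `out` is any writable text file object. Returns the number of edges written.
    """
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    n_edges = 0
    for line in lines:
        # Skip blank lines and the "#..." header some Mallet versions emit.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        doc_name = fields[1]
        for topic_id, value in enumerate(fields[2:]):
            if not value:
                continue
            weight = float(value)
            # Keep only links strong enough to be worth drawing.
            if weight >= threshold:
                writer.writerow([doc_name, f"topic_{topic_id}", weight])
                n_edges += 1
    return n_edges


if __name__ == "__main__":
    sample = [
        "0\tletter_a\t0.7\t0.05\t0.25\n",
        "1\tletter_b\t0.02\t0.9\t0.08\n",
    ]
    buffer = io.StringIO()
    doc_topics_to_edges(sample, buffer)
    print(buffer.getvalue())
```

In real use you would pass an open doc-topics file as lines and a file opened with newline="" as out. The threshold keeps the graph readable: without it, every letter would be connected to every topic.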

Storing the Letter Objects: Python’s Shelve

Recently I looked into ways to store my Python Letter objects to a file after they are created. This has two advantages:

  1. Increased performance, because the 850 objects do not have to be kept in memory
  2. The importer function takes quite a while to run, about 20 seconds. If I want to run the whole program several times for testing, waiting 20 seconds for the importer on every run is very annoying (why it takes so long is discussed in another post). My data won’t change, at least not during testing, and it is therefore handy to load it directly from a file instead of running the importer module

I found that Python’s pickle and shelve libraries were useful tools to work with. A good tutorial on shelve can be found in O’Reilly’s book Programming Python, or on the Module of the Week blog. The shelve module is great because it lets you store objects in a dictionary-like way, where the objects can be fetched by keys.
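As a rough illustration of the dictionary-like access, the sketch below stores a couple of objects in a shelf and fetches one back by key. The Letter class here is a simplified, hypothetical stand-in for my real one, and save_letters and load_letter are names invented for this example. Note that shelf keys have to be strings, so numeric ids are converted first.

```python
import os
import shelve
import tempfile


class Letter:
    """Hypothetical, simplified stand-in for the project's Letter class."""

    def __init__(self, letter_id, text):
        self.letter_id = letter_id
        self.text = text


def save_letters(path, letters):
    """Pickle each letter into the shelf at `path`, keyed by its id."""
    with shelve.open(path) as shelf:
        for letter in letters:
            shelf[str(letter.letter_id)] = letter  # shelf keys must be str


def load_letter(path, letter_id):
    """Fetch a single letter by key; the other letters stay on disk."""
    with shelve.open(path, flag="r") as shelf:
        return shelf[str(letter_id)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "letters")
        save_letters(path, [Letter(1, "Dear Mary"), Letter(2, "Dear John")])
        print(load_letter(path, 2).text)
```

Because shelve uses pickle underneath, the Letter class must be importable when the shelf is read back, and a shelf should only be opened if it comes from a source you trust.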