When the Letters of 1916 corpus is clustered to the 16 topics generated with Gensim and Mallet it seems that 16 topics might be too much. In one of my last posts I have shown visualisations created with Gephi, and I colored the letter nodes based on the categories that was assigned by the person that uploaded the letter. Only letters assigned to four or five of these categories actually clustered together. So after I talked with my internship supervisor Dermot it was decided that I try to reduce the number of topics to see what happens, and I would create visualisations for 4, 8, 12 generated topics. I could observer that that with 4, 8, and 12 topics the clustering was still the same as with 16 topics. However, lesser topics shows that many letters from generic categories such as 1916 Rising, or The Irish Question cluster with one of the four distinct topics.
4 topics Mallet:
4 topics Gensim: