Mining Dutch History: Researching Public Debate in the Nineteenth Century

by José de Kruif

What did a nineteenth-century media hype actually look like? Since the invention of the printing press in the fifteenth century, the pamphlet had served as the primary mass medium for political debate and local news in Western Europe. In the Netherlands pamphlets functioned this way until well into the nineteenth century. Up to 1869, newspapers were heavily taxed, which made them unaffordable to most citizens and not very voluminous. Near the end of the nineteenth century, when the tax on newspapers was abolished, newspapers took over the role of main news service, but until then pamphlets and newspapers both functioned as media for the distribution of news and of opinions on the news. They were therefore the main vehicle for public debate. When something unusual occurred in the public sphere, the production of pamphlets rose sharply. This production pattern is typical of the early modern period.1

Figure 1: Pamphlet production.

The year 1853 will be used here as the case at hand. During the Dutch Revolt that started in the sixteenth century, Dutch Roman Catholics had been victims of discrimination. The peace treaty of 1648 that ended the Dutch Revolt did not end the plight of Dutch Roman Catholics, who continued to be treated as second-class citizens. Convents were closed and their possessions were nationalised. Priests were expelled and many Catholics emigrated. In the centuries that followed they were, except for a short period, a repressed minority. Consequently, since 1648 Rome had considered the Netherlands to be a Catholic missionary area, and thus the priests working in the Netherlands worked under direct papal supervision. For centuries, there had not been a regular Catholic Church administration. However, in 1848, the government of the liberal Thorbecke introduced a new constitution, which definitively laid down the freedom of the churches to arrange their own administration.

In 1853, Pope Pius IX decided to take the opportunity to re-introduce the episcopal hierarchy in the Netherlands. He announced his intention in an official letter, which was translated and published in both English and Dutch newspapers.2 When the pope’s intentions thus became known, a storm of indignation broke loose. It started with a petition of the Utrecht Protestant congregation to the king, asking him to prevent this reinstatement of dioceses. Soon, Protestant churches in other towns followed suit with similar petitions, often signed by thousands of the faithful. It was the start of a nineteenth-century equivalent of a media hype, which would last only some seven weeks but which resulted in hundreds of pamphlets and newspaper and magazine articles about the controversy. This press production constitutes the research material of the case we will discuss. Most historians emphasise that the 1853 pamphlets were aimed at instilling unjustified fears, for instance through repeated references to past Catholic persecution of heretics, and that all the commotion was unfounded. A lot of attention is paid to the danger to public order that the pamphlet texts and radical magazines supposedly constituted.

Figure 2: Example pamphlet.

Pamphlets and newspapers would contain fragments such as the following:

“Tolerance will be our Waterloo. We will lose our freedom whilst devoting ourselves to the freedom of others. We will only recognise the fruits of our ignorance when the inquisition judges on our free soil and the scaffolds will be the fate of ourselves and our children.”

“Every means, however nasty, malicious or blasphemous can be used: inciting civil war, revolution, inquisition, burning at the stake, poison, murdering the king …are all weapons in the hands of the Jesuits.”

Because of emotional phrases like these, both contemporaries and later historians have approached the complete stream of publications on this matter with value judgments (dangerous, extravagant, senseless expressions of anti-papism), or they have chiefly looked at its effects, such as the fall of the liberal Thorbecke cabinet, which, by the way, was followed by the eventual installation of the bishops without any further consequences.3

The question of what the precise contents of the complete stream of publications were has never been posed. Only a very limited part of the available corpus is used or quoted. It is therefore unknown which medium made which contribution to the total number of publications about the matter. It also remains unclear how radical the tone is of the pamphlets and newspaper articles that were not consulted. It is hard to determine to what extent the historians' choices are well founded, because we do not know enough about the contents of the rest of the corpus.

This lack of overview comes as no surprise, since we have here a huge heap of documents. Reading all that was published would be extremely time-consuming. Systematic work and a coding of the texts for classical content analysis, as employed by present-day researchers on press content, would mean even more work. Of course, the developments in the field of digitising and opening up material help to tackle this problem. The subsequent text mining possibilities offer the opportunity to partly automate the systematic investigation of the contents. Can text mining software be employed for an initial stock-taking of the contents of a flood of publications such as that of 1853? And can this tool be helpful in determining the precise nature of sources that have not been quoted since 1853?

To answer these questions, all pamphlets and a large number of newspaper articles about the matter were scanned and converted into Word documents via OCR. Meta data, such as the name of the author, the publisher, the medium of publication (newspaper or pamphlet) and the type of text (poem, song, sermon, request, letter to the editor), were stored in a simple Access database. Standard historical studies about the hype were screened for mentions of pamphlets or newspaper articles in their annotation, indicating that these were used as sources. This resulted in a score of 24%, which means that nearly a quarter of the documents researched here were referred to by one or more historical studies. A flag field “cited or not” was created to enable further exploration of the question of to what extent this 24% actually represents the corpus as a whole. We will look at this variable again later on.
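The role of the “cited or not” flag can be made concrete with a minimal sketch. The record layout, the field names and the four sample records below are assumptions made for illustration; they do not reproduce the Access database or the actual corpus.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """One record of meta data; all field names are hypothetical."""
    doc_id: str
    medium: str      # "pamphlet" or "newspaper"
    text_type: str   # e.g. "poem", "sermon", "request"
    cited: bool      # the "cited or not" flag field


def cited_share(docs):
    """Fraction of documents cited by at least one historical study."""
    if not docs:
        return 0.0
    return sum(d.cited for d in docs) / len(docs)


def cited_share_by(docs, field):
    """Cited fraction per value of one meta data field (e.g. per medium)."""
    groups = {}
    for d in docs:
        groups.setdefault(getattr(d, field), []).append(d)
    return {value: cited_share(group) for value, group in groups.items()}


# Invented sample records, for illustration only.
docs = [
    Document("p001", "pamphlet", "sermon", True),
    Document("p002", "pamphlet", "poem", False),
    Document("n001", "newspaper", "letter to the editor", False),
    Document("n002", "newspaper", "request", False),
]

overall = cited_share(docs)
per_medium = cited_share_by(docs, "medium")
```

Breaking the share down by a meta data field is the kind of question the flag field is meant to support: is the cited quarter spread evenly over media and genres, or concentrated in a few of them?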

Subsequently, the contents of the texts were analysed by means of SPSS’s PASW modeller. There is not enough time right now to explain in detail which techniques were applied, but the tool works roughly as follows: the vocabulary of all texts is screened, after which the machine draws up a list of the words occurring in them. The software uses compiled resources such as a general dictionary containing a list of base forms. The resources also contain part-of-speech codes and built-in types. The extractor tries, for instance, to recognise whether a word represents a location, a person, a product, a currency, etc.4 Extraction initially results in a straightforward list as shown in Figure 3.
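The first step, drawing up a word list for the whole corpus, can be approximated in a few lines. This is only a rough sketch under stated assumptions: it counts in how many documents each word form occurs, and it has none of the dictionary of base forms, part-of-speech codes or built-in types that the commercial tool brings along. The function names and the three sample sentences are invented.

```python
import re
from collections import Counter

# Runs of letters only (Unicode-aware); digits and punctuation are skipped.
TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lower-cased word forms of one text."""
    return [t.lower() for t in TOKEN.findall(text)]


def document_frequencies(texts):
    """For every word form, the number of texts in which it occurs."""
    df = Counter()
    for text in texts:
        df.update(set(tokenize(text)))  # set(): count each text only once
    return df


# Invented sample texts, for illustration only.
texts = [
    "De paus en de bisschoppen",
    "De grondwet en de koning",
    "De koning en de paus",
]

df = document_frequencies(texts)
for term, n in df.most_common():
    print(f"{term}\t{n}")
```

Counting documents rather than raw occurrences keeps one long pamphlet from dominating the list, which matters when the question is how widespread a term is across the stream.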

Figure 3: Extracted results.

The list contains concepts, and you can see that the software already identifies dates and locations. Categories like “Historical incident”, “1853 actor”, “constitutional arguments” and the like were defined by me in interactive workbench sessions. Extraction is an interactive process, which means these results can be fine-tuned. I can create a synonym definition for concepts I consider to be synonymous but which appear as individual concepts in the extracted results. I can thus enhance the dictionary with domain knowledge about the subject of the hype. For instance, many historians think anti-papism had been on the rise since the eighteenth century and that, as a consequence, unfavourable publications about the Jesuits had appeared ever since. This pattern, they suppose, continued in 1853. I can check whether the Jesuits appear frequently in my sources by telling the machine which terms are used to indicate the Jesuit order, as demonstrated in Figure 4.

Figure 4: Synonyms for Jesuits
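A synonym definition of this kind boils down to mapping several surface forms onto one concept before counting. The sketch below assumes a hand-made dictionary; the variants listed are hypothetical examples and are not the terms shown in Figure 4, and the sample texts are invented.

```python
import re

# Hypothetical synonym definition: concept -> surface forms that indicate it.
SYNONYMS = {
    "jesuits": ["jezuïet", "jezuïeten", "jezuit", "jezuiten", "loyola"],
}


def build_lookup(synonyms):
    """Invert the definition: surface form -> concept."""
    return {
        variant.lower(): concept
        for concept, variants in synonyms.items()
        for variant in variants
    }


def concepts_in(text, lookup):
    """Set of concepts mentioned in one text."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {lookup[t] for t in tokens if t in lookup}


def share_mentioning(texts, concept, lookup):
    """Fraction of texts in which the concept occurs at least once."""
    if not texts:
        return 0.0
    hits = sum(concept in concepts_in(t, lookup) for t in texts)
    return hits / len(texts)


# Invented sample texts, for illustration only.
texts = [
    "De Jezuiten komen terug",
    "De koning en de grondwet",
    "Loyola en zijn volgelingen",
    "Een lied voor de koning",
]

lookup = build_lookup(SYNONYMS)
share = share_mentioning(texts, "jesuits", lookup)
```

Without the synonym step, “Jezuiten” and “Loyola” would be counted as two unrelated concepts and the order as a whole would look less prominent than it is.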

Another useful concept is ‘Vaderland’, meaning Fatherland. Usage of the term indicates thoughts about the nature of the nation as a whole and about how relations between church and state should be organised.

Figure 5: Refining extraction results.

A necessary feature is the option to provide a stop list of insignificant terms to be excluded from extraction, for instance function words such as articles and prepositions: words that occur frequently but add no further knowledge, also called “noise”.
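Applying a stop list is a simple filter over the extracted terms. In the sketch below the list itself is a small illustrative selection of Dutch function words, chosen for this example and not taken from the software's resources.

```python
# Illustrative stop list of Dutch function words (an assumption, not the
# list used in the original analysis).
STOP_LIST = frozenset({"de", "het", "een", "en", "van", "in", "op", "te"})


def remove_stop_words(tokens, stop_list=STOP_LIST):
    """Keep only the tokens that are not on the stop list."""
    return [t for t in tokens if t.lower() not in stop_list]


filtered = remove_stop_words(["De", "paus", "en", "de", "bisschoppen"])
```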

When the libraries to be used for extraction are thus sufficiently updated, a model for further text extraction can be built. In this case categories that might be important for judging the supposedly radical nature of the public debate on the Bishop Controversy are scored. A search can be undertaken for the mention of specific matters, persons and/or events.

I have investigated which of the documents refer to events in the past, such as the earlier struggle of the Dutch against the Catholic Spanish king Philip II, and which refer to the constitution, to the fallen government, etcetera. The scoring of categories like this enables a further preliminary general survey of the stream. This tells us, for instance, that many publications name actors who were supposed to be involved in the conflict.

The hype appears to be quite personalised. Not counting King William III, who figures in 36% of the texts, the person mentioned most often is Thorbecke, the liberal prime minister and designer of the new constitution of 1848, whose cabinet would fall over this crisis. Historians, on the other hand, have made much of the position of the archbishop-to-be Zwijsen. It was considered an extra insult that he would become archbishop in Utrecht, the very city where the Republic of the Netherlands originated. However, Zwijsen appears to be mentioned in only 4% of the documents. The context in which his name is mentioned can be viewed by displaying a list of documents in which he figures.
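Percentages such as these come from scoring every document against a set of categories and counting the hits. A minimal sketch, assuming that each category is simply a set of indicator terms: the category names follow the text, but the terms and the sample documents are invented, so the resulting shares say nothing about the real corpus.

```python
import re

# Hypothetical indicator terms per category.
CATEGORIES = {
    "historical incident": {"philips", "alva", "inquisitie"},
    "constitution": {"grondwet"},
    "1853 actor": {"thorbecke", "zwijsen"},
}


def score_document(text, categories):
    """True/False per category: does the text contain any indicator term?"""
    tokens = set(re.findall(r"[^\W\d_]+", text.lower()))
    return {name: bool(tokens & terms) for name, terms in categories.items()}


def category_shares(texts, categories):
    """Fraction of texts that score on each category."""
    counts = dict.fromkeys(categories, 0)
    for text in texts:
        for name, hit in score_document(text, categories).items():
            counts[name] += hit
    n = len(texts)
    return {name: (c / n if n else 0.0) for name, c in counts.items()}


# Invented sample texts, for illustration only.
texts = [
    "Thorbecke en de grondwet",
    "Alva en de inquisitie",
    "Een lied voor de koning",
    "Zwijsen in Utrecht",
]

shares = category_shares(texts, CATEGORIES)
```

The per-document scores are worth keeping alongside the totals, since they are what makes it possible to list the documents in which a given name figures.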

Besides extracting just the names, text link analysis enables the survey of opinions on these persons. The technique is based on pattern rules which describe how favourable or unfavourable opinions are probably expressed. It will come as no surprise that judgements about the pope are not altogether favourable, and the same goes for the liberal ministers (Figure 6).

Figure 6: Opinions on the Pope
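The idea of a pattern rule can be illustrated with a deliberately crude stand-in: look for opinion words within a few tokens of a target name. This is not the rule language of the software; the word lists, the window size and the sample sentence are assumptions made for this sketch.

```python
import re

# Hypothetical opinion vocabularies.
NEGATIVE = {"gevaarlijk", "vijand"}
POSITIVE = {"goed", "wijs"}


def opinions_on(text, target, window=3):
    """Opinion words found within `window` tokens of each mention of target."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        for word in nearby:
            if word in NEGATIVE:
                found.append((word, "unfavourable"))
            elif word in POSITIVE:
                found.append((word, "favourable"))
    return found


result = opinions_on("De paus is een gevaarlijk man", "paus")
```

A rule this simple ignores negation and irony, which is one reason real pattern rules are more elaborate and their output still needs checking against the texts.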

Text link analysis can reveal opinions on people or concepts. It is also possible of course to visualise how certain concepts are associated in general. One application which is extremely useful in cases like this is the possibility to research the documents further by means of cluster analysis. Most researchers will be aware of the basics of this technique. For those who are not: Cluster analysis is a tool with which the classification of individual research units within a certain population can be prepared. The technique is widely applied (in a broad array of different fields of study, on collections such as foodstuffs, biotopes, social communities, disease symptoms and even DNA patterns to name just a few) in order to determine groups on the basis of corresponding characteristics. The advantage is that no classification needs to be made beforehand. The analysis is performed solely on the basis of the characteristics of the research object itself. The researcher does not have to have an a priori classification in mind.5

In the case of the pamphlets and newspapers, the texts can be classified solely on the basis of the combinations of terms occurring in them. Subsequently, the cluster analysis of pamphlets and newspapers supplies an arrangement of the documents on the basis of their contents. I clustered and allowed for anomalies. The analysis makes a heap of documents a lot more manageable. An additional advantage is that the researcher’s possible preconceptions can be eliminated, while the themes in the collection can still be found and delineated. An initial exploration was performed in this way using references to the following as categories: historical conflicts with Roman Catholics, the danger of civil disorders, and the new constitution.

Figure 7: Cluster analysis using anomaly detection

The analysis produces four peer groups and three outliers, based on the type of argument used in the documents.
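How peer groups and outliers can emerge from category scores alone is shown in the sketch below. It is a simple single-pass "leader" clustering on Jaccard similarity, swapped in for illustration; it is not the anomaly detection algorithm of the software. The threshold, the minimum group size and the five document profiles are assumptions.

```python
def jaccard(a, b):
    """Overlap of two category sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clusters(profiles, threshold=0.5):
    """Assign each document to the first cluster whose leader is similar
    enough; otherwise the document starts a new cluster."""
    clusters = []  # list of (leader_profile, [doc_ids])
    for doc_id, profile in profiles.items():
        for leader, members in clusters:
            if jaccard(profile, leader) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((profile, [doc_id]))
    return clusters


def split_outliers(clusters, min_size=2):
    """Clusters below min_size are reported as outliers, not peer groups."""
    groups = [m for _, m in clusters if len(m) >= min_size]
    outliers = [d for _, m in clusters if len(m) < min_size for d in m]
    return groups, outliers


# Invented category profiles, for illustration only.
profiles = {
    "d1": {"historical", "disorder"},
    "d2": {"historical", "disorder"},
    "d3": {"constitution"},
    "d4": {"constitution"},
    "d5": set(),  # scores on none of the categories
}

groups, outliers = split_outliers(leader_clusters(profiles))
```

No classification is supplied beforehand: the groups follow entirely from which categories co-occur in the documents, which is the property the text relies on.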

In the first group, every text refers to historical incidents and many of them to possible civil disorder. This group contains many poems and songs. In the second group many authors refer to possible civil disorder and unrest as well, but also to the new constitution. As in the first group, many references to historical incidents occur here. In the third group historical incidents do not figure in any text but most of the texts do refer to civil disorders. The most striking feature of the fourth group is the absence of any radical argumentation. Most peer groups contain a mix of genres, but the fourth group contains most of the addresses to the king which are of course polite and formal in style.

The three outliers appear to be rightly chosen by the model. One of them is a translation of a newspaper article from the English Morning Herald describing the Dutch Bishop Controversy from the outside, and is as such rare indeed. Another is a letter from a newly appointed bishop to the faithful, rare as well, and the third contains a very radical street song.

A priori classification can be systematically compared to the classification on the basis of the texts alone. To me, it was surprising that, apart from the petitions, with their set opening lines and largely identical contents, poems could be divided into three groups: a series of poems which turn out to consist only of jubilant triumphal songs, another group which sings the praises of a king incorporating the hopes of the Protestants, and a third group of indignant Beggar songs. Only the last category is mentioned in the historiography on this subject.

One of the advantages of text mining is the possibility to couple the results of the cluster analysis to the meta data of the documents, such as the identity of the publisher, the format of the document, what is known about the author, whether the text was published anonymously, etcetera. For instance, at first it seemed a little odd that texts with a more radical tone were not published anonymously any more often than more moderate texts. A further exploration of this matter revealed that many ministers apparently could not resist the temptation to use a lot of “us against them” rhetoric in their sermons (which, by the way, is the reason why the concept of “Catholic” makes frequent appearances in Protestant sermons). Many publishers apparently thought it might be profitable to publish these sermons subsequently. Anonymous publication or publication under pseudonym was common practice and quite respectable in the nineteenth century, but the ministers naturally had their name attached to their sermon. In this respect the technique has the ability to draw attention to many facts that are obvious in hindsight but still much overlooked. Apart from street songs, sermons were among the most radical texts, but all bear the name of the author, which reveals of course that a lot of radical stirring-up was done by ministers with a respectable reputation.

A certain coincidence of vocabulary and text genre seems to indicate genre conventions about which hardly anything has been published up to now. On the other hand, a form of plagiarism existed. As mentioned, the Bishop Controversy caused a flood of publications in a short period. Publishers accused each other of exploiting the commotion in objectionable ways. Text analysis reveals that these accusations are exaggerated but not entirely unfounded. The corpus seemed larger than it actually was, due to the transfer of different texts to other media. At the same time, the series of publications must have had a diverse and therefore nearly documentary character in the perception of contemporaries. But the number of publications with a truly original message is smaller than presumed. Many of the pamphlets appear to be reprints of newspaper articles, remarketed in pamphlet format. Also, complete pamphlet texts are included in magazines, and we already saw that sermons were first delivered orally and printed afterwards.

This might have been profitable for the publishers involved, but must have contributed to the feeling of urgency among contemporaries – yet another publication about the bishops, apparently an important issue. For instance, the petition of the Leiden Protestants did not only go to the king, but travelled the whole country via the newspaper in which it was reprinted and in the form of a subsequently printed pamphlet containing the text. The same goes for the song sung in the street and the sermon by a radical preacher in Utrecht. The public could follow all these utterances from day to day and in some cases in different media.

Like any hype, this one partly fuelled itself. Many authors refer to other publications about the issue. One radical text could thus trigger various answers, many of which stressed the need for composure and respect for fellow Roman Catholic citizens. Roman Catholics themselves published conciliatory answers to radical sermons. Paradoxically, it might have been especially this more-of-the-same character of such a stream that acted as a sales argument, as it would do in a modern-day media hype. The story, always about the same question, is to be continued, and this engenders suspense. Who is next to say what to whom? And in a publishing world characterised by hundreds of small companies, it must have been tempting to make some fast cash by adding a song or a sermon. To the public, this meant a supply of ever fresh-looking information. These publications should not of course be ignored; they all are part of the same public debate. It is not the majority of fairly moderate publications, though, but the most flagrant insults which attracted attention from most historians.

I can illustrate this tendency by means of one of the nicer features of PASW modeller – the possibility to compare text-mining results with the meta-data of the document and combine text mining with the use of statistical techniques to further explore the set of documents. Modelling via a classification and regression tree using the flag field “used by historian or not” as a target variable revealed that historians have cited mainly texts in affirmation of their ideas of the nature of the conflict.

Figure 8: Classification and regression tree.
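The core step of a classification tree is choosing, at each node, the feature whose split separates the target classes best. The sketch below shows only that first step for binary features, using Gini impurity with the cited flag as target. The feature names and the four rows are invented and do not reproduce the tree in Figure 8.

```python
def gini(labels):
    """Gini impurity of a list of True/False labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def weighted_gini(rows, feature, target):
    """Impurity left after splitting the rows on one binary feature."""
    yes = [r[target] for r in rows if r[feature]]
    no = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    if n == 0:
        return 0.0
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_split(rows, features, target="cited"):
    """Feature with the lowest weighted impurity, i.e. the root of the tree."""
    return min(features, key=lambda f: weighted_gini(rows, f, target))


# Invented rows, for illustration only.
rows = [
    {"mentions_jesuits": True,  "is_poem": False, "cited": True},
    {"mentions_jesuits": True,  "is_poem": True,  "cited": True},
    {"mentions_jesuits": False, "is_poem": True,  "cited": False},
    {"mentions_jesuits": False, "is_poem": False, "cited": False},
]

root = best_split(rows, ["mentions_jesuits", "is_poem"])
```

A full tree repeats this choice on each resulting subset. Reading off which text features end up near the root is what shows the kind of text historians have preferred to cite.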

In many publications, much has been made of the importance of the relation between state and church in this conflict. Historians have stressed how many more or less radical Protestants were convinced the Netherlands should be a nation with a Protestant character and thus claimed priority for their church in laws and state financing.

In citing sources they use a lot of addresses to the king, apparently since these signalled the onset of the conflict. But apart from that they tend to favour either texts stressing the need to maintain state support for the Protestant denomination or radical anti-papist texts that talk of conspiring Jesuits and the re-erection of scaffolds and stakes. More moderate and civil comments by contemporaries, disapproving of the hype itself and stressing the need to live peacefully together under the wings of a neutral state which treats all religions equally, are by and large ignored. The analysis suggests that the sources used until now are not representative of the corpus of publications that contemporaries actually produced on the subject.

As stated, text mining offers the opportunity to tackle this stream systematically, even more so because all utterances can be compared to each other. The lists of words, and the possibility to determine with which other words these are associated in the texts, finally offer a solid basis for a decoding procedure, since the machine is much more consistent than the reader.

Obviously, this technique also has disadvantages when compared to reading and interpreting the texts yourself. To mention only a few:

  • One will have to obtain some prior knowledge on the corpus of documents.
  • One will need to know about the historical context, for instance, the circumstances that accompanied the production of the documents.
  • The approach is especially apt for broad research of large quantities of text. It does not offer a replacement for thorough qualitative analysis of less bulky collections.
  • Although the current (expensive) systems incorporate quite a bit of linguistic knowledge, it is often necessary to supplement the lexical universe of the software supplier with specific domain knowledge, in my case types addressing the special characteristics of the dispute and nineteenth-century language peculiarities. This is very time-consuming.
  • The researcher will have to be familiar, or will need to familiarise him or herself, with a number of statistical techniques. For many historians this poses a problem.

But the advantage of the focus of attention supplied by the generated word lists and clusters seems evident to me. Moreover, the instrument offers the opportunity to screen the whole corpus within a reasonable amount of time and to make a reasoned selection of those documents which qualify for further, more thorough research. One can make a more substantiated selection of texts that are fit to illustrate the nature of the hype. Links are made visible between concepts and between form and contents which otherwise might have gone undetected. In this way, it will become possible to systematically investigate in what period what form belonged to what contents. For this experiment, I have had to supply my documents for digitisation, but many historical sources are available in electronic form or even online nowadays. This means that the technique will become ever more relevant and, moreover, that it will be possible to answer ever more complicated questions.

