A minister of the French government summoned a few of the most eminent merchants and asked them for suggestions on how to stimulate trade, as if he would know how to choose the best of these. After one had suggested this and another that, an old merchant who had kept quiet so far said: “Build good roads, mint sound money, give us laws for exchanging money readily, but as for the rest, leave us alone! [Lasst uns machen]” If the government were to consult the Philosophy Faculty about what teachings to prescribe for scholars in general, it would get a similar reply: just don’t interfere with the progress of understanding and science. (I. Kant, The Conflict of the Faculties, AK VII, 19-20 n2)
When Francesca Di Donato wrote the article we are proposing for open peer review, the COARA principles and the coalition’s internal governance could perhaps still be developed in a Kantian way. Now, however, with the benefit of hindsight, we are in a better position to see whether this alleged potential has been realized or not.
In the concluding remarks of her article, Francesca Di Donato argues, following Kant, that the evaluation of research belongs only to the scientific community: “philosophical activity is fundamental research, the exercise of a method which consists in subjecting any doctrine to criticism, and as such it is the fundamental precondition of all knowledge. It consists of free communities of peers who learn from their mistakes and constantly self-correct.” Therefore, she concludes, “changing the way we evaluate is not enough if we do not also discuss the evaluators themselves. The last point is at the core of a responsible research assessment reform. In fact, the ARRA requires the direct involvement of individual academics and of scientific communities in the definition of new criteria and processes (ARRA, 2022, pp. 3, 5, 6, 9), but academic communities should assume collective ownership and control over the infrastructures necessary for successful reform. This last point is not as prominent in the ARRA as it should have been – and should be a central governing principle in the future CoARA.”
The following presentation will address two questions:
- Did COARA take Francesca Di Donato’s suggestions seriously?
- If not, why not? Out of simple reluctance, or for deeper structural reasons?
1. Promises unkept?
The first sections of Francesca Di Donato’s article report on the origins and principles of the Agreement on Reforming Research Assessment (ARRA), on the basis of which the COARA coalition was formed.
It is worth noting that the whole process was initiated by the EU Commission and supported by the EU Council, and gained momentum when the Covid-19 pandemic showed that the current system of research evaluation ensures neither the accessibility nor the quality of science – precisely because it is mainly based on the quantity of publications and citations.
Certainly, the EU Commission and the EU Council relied on a mass of scholarly studies, both independent and commissioned, on the basis of which they promoted the reform of research assessment. It is difficult to deny, however, that their political intervention was not merely infrastructural, as Kant recommended in the passage quoted above, but was intended to affect the very core of scientific activity, namely the way in which scientists evaluate their work.
When research evaluation is in the hands of bureaucratic and more or less centralized agencies, the greatest flaw of bibliometrics – the idea that scientific literature can be evaluated without reading and understanding it, using quantitative criteria that are easy to game – becomes a virtue. Peer review, based on the living craftsmanship of scientists debating among themselves, cannot be used as a weapon of mass evaluation, because it does not scale. Therefore, the abolition of bibliometrics as an evaluation tool would also imply the abolition or strong reduction of huge and powerful centralized evaluation agencies such as the Italian ANVUR or the Spanish ANECA.
Since even centralized evaluation agencies such as ANVUR and ANECA were able to join COARA and sit on its steering committee, the ARRA principles must contain some degree of ambiguity and compromise.
As reported by Francesca Di Donato, the second ARRA commitment, qualitative assessment, requires that research be evaluated by reading and discussing scientists’ work rather than by counting it. This commitment, in other words, emphasizes the centrality of peer review as part of a public scientific debate that should itself be an object of research rather than a ritual.1 In addition, the third commitment advocates “responsible metrics” by abandoning “the inappropriate use of indicators such as JIF and h-index”.
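For readers less familiar with these indicators, the following minimal sketch (in Python, using invented citation counts) shows how the h-index is computed. It illustrates the point made above: such an indicator reduces a researcher’s entire output to a counting exercise that requires no reading at all, and it can be moved by tactics such as self-citation without any change in quality.

```python
# Minimal sketch: computing an h-index from a list of per-paper
# citation counts (illustrative numbers, not real data).

def h_index(citations: list[int]) -> int:
    """A researcher has index h if h of their papers have
    at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # still h papers with at least h citations each
        else:
            break
    return h

papers = [10, 8, 5, 4, 3]          # hypothetical citation counts
print(h_index(papers))             # -> 4

# The number moves without any change in research quality:
# a handful of self-citations on the weakest papers raises it.
print(h_index([10, 8, 5, 5, 5]))   # -> 5
```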
And yet, when COARA was accused of the hideous sin of “bibliometric denialism”, some prominent COARA members felt it necessary to defend the coalition against the accusation. They answered by trying to balance qualitative and quantitative evaluation.
“Using scientometrics alone for assessments at lower levels of granularity, i.e., for the assessment of individuals, including consequential purposes such as allocating rewards (funding, jobs), is highly problematic. In such cases, peer review should be preferred.”

However,

“the use of scientometrics at higher levels of aggregation, such as country or university level, and for less consequential forms of assessment such as for scholarly understanding, is far less problematic (if still imperfect).”
They also showed awareness of the trickle-down effect of bibliometrics in general, which is well known to Italian researchers. If the institutions in which they work are evaluated and funded on the basis of quantitative criteria, researchers will be pressured to follow bibliometrics, despite any commitment to its responsible use.
“The fact remains that an over-reliance on even responsible scientometrics can still have a negative impact on the research evaluation ecosystem due to trickle-down effects. The legitimate use of bibliometrics to understand country-level activity can soon end up illegitimately in promotion criteria if too much reward is associated with bibliometric assessments at higher levels of aggregation.”
Although seemingly reassuring, this balanced response reveals that COARA does not want to eliminate bibliometrics as a weapon of mass evaluation, nor the centralized agencies that depend on it.
Regarding the trickle-down effect, the response cites Principle 9 of the Leiden Manifesto for the responsible use of bibliometrics, which states that such an effect can be avoided by adopting “a set of indicators” rather than “a single one” that invites “gaming and goal displacement (in which the measurement becomes the goal)”. In other words, the Leiden solution against the gaming of quantitative indicators is the technocratic idea of multiplying them in order to make gaming more difficult.
But why can quantitative indicators be gamed? Is it just because the researchers subjected to them are inherently evil and need to be reined in with solutionist fixes? Or is it because bibliometric indicators, at any level of “granularity”, are only orthogonally related to research quality, even though they are indispensable to centralized bureaucracies incapable of reading and understanding science as it is written, let alone as it is done? Indeed, if administrators subjugate2 researchers to evaluation criteria that cannot grasp the substance of science, it is easier to explain how researchers can be tempted to game the system for the sake of their careers or their sheer academic survival, even without assuming that they are particularly evil.
COARA did not include commercial publishers in its coalition because of their inherent conflict of interest in favor of journal-based evaluation systems. However, it does not seem to have perceived the analogous conflict of interest in favor of bibliometrics as a weapon of mass evaluation inherent in centralized evaluation agencies such as ANVUR or ANECA, which were accepted not only as members but even as possible candidates for its steering board. COARA’s response to the accusation of “bibliometric denialism” suggests that this may have been done on purpose: there would indeed be no conflict of interest if the reform of research evaluation were not intended to jeopardize state evaluation and its bibliometric weapons.
On the other hand, COARA’s emphasis on qualitative evaluation might suggest that its goal is (also) to downgrade mass (and quantitative) evaluation in favor of peer review. If, however, centralized evaluation agencies that want to maintain their power are represented in COARA’s steering board, this could create an unnoticed conflict of interest and make their downscaling very difficult.3
2. Quality and freedom
In the language of COARA, quality is linked to peer review and is an alternative to bibliometrics. While bibliometrics is the weapon of mass evaluation of choice for bureaucracies unable to understand science, qualitative evaluation is associated with free and open discussion among (expert) peers and thus with open science.
Many international and national organizations have taken the trouble to define and recommend open science: in a research system deeply shaped and distorted by weapons of mass evaluation, administrators still seem to feel entitled to tell scientists how to do their work.4 Even without the looming presence of COARA’s “responsible” bibliometrics, the trap of bureaucracy with its normalizing power seems difficult to avoid, so that the only quality we can hope for is the one standardized in the concept of quality control.
The modern scientific revolution was not a decision made by monarchs or high-ranking administrators. According to Paul David, the idea of science as a common good, based on collaboration and funded by aristocratic patrons, is rooted in a pre-capitalist and less bureaucratized world. If we want to loosen the grip of a bureaucracy that leads to research without quality, we cannot conceive of openness as an administrative task. In fact, the goal is not to make a lot of resources open for business,5 but to maintain or recreate the conditions that allow scientific communities to improve the quality of their work through free collaboration and criticism.
3. Quality: an elusive definition
According to Wilhelm von Humboldt, whose university reform the European Union has almost dismantled through the Bologna process, “it is a peculiarity of the higher scientific institutions that they always treat science as a problem that has still not been fully resolved and therefore remain constantly engaged in research”. This is why the definition of quality in science is so elusive for finite rational beings, who, as such, have no general formula of truth.
The question of the definition of quality is at the heart of Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig.
The first attempted solution is to refuse to treat the definition as a theoretical problem and to try to attain quality practically. This is the experiment undertaken at Bozeman by Phaedrus, Pirsig’s alter ego: abolish grades and ask students to judge papers day by day. At the end of the experiment, he discovered that students tended to imitate each other and the teacher. This is not surprising: if you rely on practice without any effort of theoretical reflection, what do you get? Only fashion, whose whims cannot be reduced to a concept, but can only be imitated.
Paradoxical as it may seem, this is also the hubris of bibliometric research evaluation: why bother to look for the elusive and non-scalable quality of research when we can easily calculate its impact, namely its fashionability? Pirsig, however, avoided this hubris by conceiving of the scientific method as the way in which rational but finite beings can approach quality. While it cannot attain Truth, it can help to select a single (and perhaps provisional) truth from among many hypothetical truths. But to understand this process, one must be part of it: one cannot be a bureaucrat who, however “responsibly”, annotates the “impact” of something he cannot and will not understand.
Therefore, even when we need to use static patterns of quality for one-off evaluations – for example, when recruiting researchers or selecting projects for funding – we should be aware that, however transparent and verifiable these patterns may be, they do not do justice to the whole process, which is not static but dynamic.
For this reason, as Kant shows in The Conflict of the Faculties, universities and research institutions cannot become bureaucratic bodies in the service of administratively established “truths” without jeopardizing the credibility of science. Therefore, within the university, freedom of public criticism is not just a privilege: it is the very condition of possibility of institutional scientific research. This freedom does not mean the power to give orders, as in hierarchical organizations, but rather the possibility of challenging the government and the scholars working in its service.
Rational but finite beings cannot allow truth to be established by political powers, or by scholars acting as ministers in their service, without delegitimizing government and suppressing the pursuit of truth itself.
4. EU, or the Elusive Union
According to Kant, politicians should be concerned with the infrastructure of research, not with the way in which researchers carry out their research. Caesar non est supra grammaticos: Caesar is not above the grammarians.
Many EU administrators like to present themselves as Kantian, at least in their statements of values. But, in planning ARRA and COARA, they hardly followed his advice.
- They discovered, albeit belatedly, that quantitative evaluation undermines the quality of research.
- To solve this problem, they assembled a loose coalition of universities, research institutions, learned societies and evaluation agencies with the task of promoting a reform of research evaluation, as if the dominance of bibliometrics and the consequent damage to research quality were the result of decisions made exclusively by researchers.
Kantian politicians would have done the opposite. First, they would have avoided interfering with the evaluation of research, because “Caesar non est supra grammaticos”. Second, they would have examined whether there were infrastructural conditions that political action could have improved. They would have discovered that the “irresponsible” use of bibliometrics as a weapon of mass evaluation is linked to centralized evaluation agencies like ANVUR and ANECA. Finally, they would have exercised their legitimate authority to enact a single law: one that would eliminate or minimize any form of centralized administrative evaluation of research under the jurisdiction of state or corporate bureaucracies.
In the tangle of short-sightedness and conflicts of interest affecting COARA, the original sin may be the overlap of administrative power and research – a sin that the elusive Union seems to have neither the strength nor the awareness, nor even the will, to redeem. And COARA alone certainly cannot do what the EU legislator had neither the spirit nor the courage to do.