Evaluation and impact assessment of STI policies

Rationale and objectives

In the field of STI as in other policy areas, the role of evaluations and impact assessments is to provide an informed assessment of public policy interventions in terms of their efficiency, their effectiveness and, often to a lesser extent, their relevance. The results of these exercises can contribute to the policy-making process in different ways, from supporting the accountability of public spending to enhancing policy learning.

Although less visible, evaluations and impact assessments can also have important “process uses”, generating ex-ante effects on the efforts and behaviours of managers and decision makers who anticipate that their programme or policy will be subject to an evaluation or impact assessment. Another potential “by-product” is to stimulate debates and interactions among these actors during the process of designing the exercise, contributing to its implementation or following up on its recommendations. Finally, these exercises can be used strategically to establish or strengthen the legitimacy and credibility of specific STI interventions, for instance, in the context of negotiations between line ministries and central administrations such as treasuries or heads of government (OECD, 2010).

Evaluations and impact assessments overlap to a great extent in terms of objectives and process. As the name suggests, impact assessments focus specifically on the effects of a policy, including its longer-term impact. This is increasingly understood in the narrow sense of a robust quantification of the amount and types of outputs, outcomes and impacts, using counterfactual analysis (Stern, 2012). However, while many of these exercises still focus on effects, they increasingly go beyond this and come closer to full evaluation exercises. An evaluation is a more comprehensive exercise that also includes a judgement on the intervention’s objectives and the process through which effects are produced. At the heart of evaluation exercises is tracing back the causal relationships that link impacts on output and outcome measures (e.g. economic growth, improvements in health or the environment, or broader societal changes) to inputs (e.g. investments in R&D). Before engaging in fieldwork, evaluations use, more or less explicitly, a theory of change to map the various possible pathways for the effects.

Evaluations and impact assessments may result in significant improvements in policy, including greater transparency about achievements and limitations or strengthened networks around the interventions. Based on their recommendations, they may also prompt a re-positioning of policies and programmes, shape the allocation or re-allocation of public funding (e.g. more generous block grants to top-performing universities) and inform the development of national STI strategies. However, despite the growing institutionalisation and reach of evaluations, the limited use of their results remains one of the main weaknesses in the policy cycle (Stern, 2015).

Major aspects and instruments

Evaluations and impact assessments can take many forms according to their purposes, scope, timing and the broader institutional setting in which they are embedded. They can, for instance, take place at different stages of the policy cycle (ex ante, mid-term, ex post), be implemented as part of a contract (e.g. R&D programme funding) or be imposed by law (e.g. the US Government Performance and Results Act). Individuals, projects, organisations (e.g. universities, funding agencies), programmes, policies and even the overall STI policy mix or system can be evaluated. They use a wide range of qualitative and quantitative methods.

Many of the challenges that affect evaluations and impact assessments in general are particularly salient when it comes to evaluating STI policies, due to some specific features of knowledge and, more generally, of research and innovation processes (see Table 1). One example is the so-called “project fallacy”, whereby outcomes that are in reality cumulative and dependent on the interaction of several factors are wholly or mostly attributed to the intervention being assessed. Another is the tendency to underestimate the effects of an intervention, either because the evaluation’s focus is too narrow or because the assessment takes place before the full effects are felt.

Although the methods and practices of evaluation and impact assessment evolve slowly, some positive tendencies can be observed. With regard to measurement, continuous progress in STI indicators and the promising use of micro data (Galindo-Rueda and Millot, 2015) and “Big Data” (Jensen and Lane, 2013) are leading to improvements. The growing availability of data stemming from the digitalisation of virtually all human activities, and the enhanced capacity to automate its processing, make evaluation easier to perform in principle, although the use of such data in evaluation is still in its infancy.

Quantitative approaches (in particular, experimental and quasi-experimental methods, such as the randomised control trials used for impact assessment), although still rare, are starting to be used in the area of STI (Warwick and Nolan, 2014). This is the case, for instance, in the Netherlands, where control groups and experimental design methods are being tried in evaluations of business-oriented instruments. There has also been renewed interest in understanding the long-term impacts of STI policies (Arnold, 2013), as well as a growing number of attempts to broaden the scope of these exercises to include a larger portfolio of policy instruments (policy mix evaluation, system evaluation, evaluation of national strategies, etc.) (OECD, 2015).
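The logic behind such counterfactual designs can be illustrated with a minimal sketch. The example below is purely illustrative and uses invented figures for a hypothetical R&D subsidy scheme; it shows how a difference-in-differences comparison nets out the change that a control group suggests would have occurred anyway, in contrast to a naive before/after comparison. Real evaluations of this kind require careful attention to selection into treatment, comparability of groups and trend assumptions.

```python
# Illustrative sketch: difference-in-differences for a hypothetical R&D
# subsidy's effect on firm R&D spending. All figures are synthetic.

# Mean R&D spending (EUR millions), before and after the scheme starts.
treated_before, treated_after = 2.0, 3.1   # firms receiving the subsidy
control_before, control_after = 2.1, 2.5   # comparable non-recipient firms

# A naive before/after comparison attributes the whole change to the policy.
naive_effect = treated_after - treated_before

# The control group provides the counterfactual trend: the growth that
# would plausibly have occurred even without the subsidy.
counterfactual_trend = control_after - control_before

# Difference-in-differences nets out that common trend.
did_effect = naive_effect - counterfactual_trend

print(f"Naive estimate: {naive_effect:.1f}")  # overstates the effect
print(f"DiD estimate:   {did_effect:.1f}")    # effect net of common trend
```

The gap between the two estimates is precisely the “project fallacy” discussed above: part of the observed change is driven by factors other than the intervention.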

Recent policy trends

Changes in STI policy evaluations and impact assessments are generally at the crossroads of two dynamics that operate within different timeframes.

In the long run, these changes follow the overall evolution of evaluation practice in each country. These movements are very slow, as any progress on this front calls for structural and cultural changes in the way public policy is conducted. Although such progress is barely visible on a biennial basis, evaluation and impact assessment agendas tend to continue to move forward in most countries. The slow pace of these developments explains the persistent strong heterogeneity in the level of development of evaluation and impact assessment among countries (see Table 2). The ability to carry out evaluation and impact assessment is poorly developed in some countries, and evaluation practices are not widely embedded (e.g. Colombia, Malaysia [OECD, 2016b], Russian Federation, South Africa). In other countries, evaluation and impact assessment are part of the culture and are institutionalised to a greater or lesser extent and in different ways (through a dedicated committee, as in Korea and Mexico, or by law, as in Spain and Peru, etc.).

Since the 1980s, one key driver of this long-term trend has been the diffusion of New Public Management concepts (Figure 1). Research policy has been among the last areas to be affected by this overall trend. Along with the increase in evaluations and impact assessments to feed into evidence-based policy making, this trend has also resulted in a multiplication of competitive schemes to allocate project funding, as well as performance-based mechanisms to distribute institutional “block” funding (for instance in Croatia, France, Lithuania, Sweden [OECD, 2016a and 2016c], etc.).

In the short to medium term, the practice and use of evaluation and impact assessment are heavily influenced by changes in STI policies themselves. Since governments devoted significant resources to R&D and innovation during the economic and financial crisis as a form of countercyclical policy, STI policy evaluation and impact assessment have logically gained more policy attention in the years since. This growing demand has been all the more pronounced as tightening fiscal constraints have heightened the need to demonstrate value for public money.

Although in some countries evaluations remain geared toward policy learning (formative evaluations), a shift toward more summative evaluations – where the focus is on measuring the outcomes of an intervention against its objectives – has been seen in recent years. In Sweden, for instance, it has become more important to justify ongoing measures and increase their effectiveness. Financial constraints have also limited the resources available for evaluation and impact assessment exercises and, reciprocally, increasing evaluation costs have weighed on the budgets allocated to public support for innovation. In New Zealand, evaluations have not only become more focused on the outcomes and impacts of STI policy in order to justify spending in the STI area, but have also shifted toward smaller and quicker exercises, making more intensive use of public administrative data and online technology, including for collecting qualitative data. Even in countries like Brazil and Chile, where evaluation is not yet well institutionalised and is by tradition more formative than summative, expectations of its use for public accountability have been growing. For the same reasons, there has been a shift toward a more strategic use of evaluation.

One challenge facing STI policy evaluation and impact assessment is the increasing complexity and scope of the policies being assessed. STI policies deal with multiple objectives, arrangements, targets and instruments; they involve a growing number of actors, interlinked through various forward and feedback loops; and they are aimed at covering a broadening range of needs, including critical social challenges. Evaluation is therefore also affected by the growing interest in innovation systems (Figure 1) and the resulting call for a better understanding of the effectiveness of a larger portfolio of policy interventions (OECD, 2015). The “policy mix” concept has become central to policy discourse and has pervaded the STI policy evaluation sphere (Kergroach et al., forthcoming).

This trend toward more systemic evaluations has developed markedly around the world, albeit differently across countries (Figure 1 and Table 3). In some countries, the shift has been limited to developing a common framework for evaluations. The United States and Japan have been particularly active in taking initiatives in the field of science of science and innovation policy (SciSIP), aimed at developing, improving and expanding the models, analytical tools, data and metrics that can be applied in STI policy decision-making processes. Norway has also had a SciSIP research programme since 2010, currently called “FORINNPOL”. The United Kingdom is home to a movement to improve the comparability of impact assessments on economic growth across a range of measures. In other countries, grouped evaluations have been carried out on related schemes, sometimes in the context of spending reviews, as in Greece and Colombia. In Ireland, a grouped evaluation attempted to capture the interactions between different combinations of enterprise supports and reach conclusions about their effectiveness. Less commonly, evaluations have covered the whole STI system or a component of it (for instance, all technology transfer policies). Given their broad scope, these exercises have mostly been performed by international organisations and in all cases have remained one-off initiatives.