The Methodology of the Index

How is ‘European Integration’ measured?

The Eastern Partnership Index combines indicators from existing sources with first-hand empirical information gathered by local country experts within the networks underpinning the EaP Civil Society Forum (CSF). This general design makes it possible to draw on the best existing knowledge and to improve it through focused, systematic data collection that benefits from the CSF’s unique access to local expertise in the EaP countries. 

However, expert surveys are prone to subjectivity. Many existing expert surveys are characterised by a mismatch between “soft”, potentially biased, expert opinions and “hard” coding and aggregation practices that suggest a degree of precision not matched by the more complex underlying reality. The expert survey underlying the Eastern Partnership Index therefore avoids broad opinion questions, and instead tries to verify precise and detailed facts, following a methodological strategy pioneered by the World Bank’s Doing Business Survey. 

Most survey questions ask for a “Yes” or “No” response to induce experts to take a clear position and to minimise misclassification errors. All questions invite experts to explain and thus to contextualise their response. In addition, experts are requested to substantiate their assessment by listing sources. 

The survey is implemented by six country and six sectoral co-ordinators who supervise and assist the data collection and evaluation in the following sectors: deep and sustainable democracy (democracy and human rights); EU integration and convergence; sustainable development; international security, political dialogue and co-operation; sectoral co-operation and trade flows; citizens in Europe. 

Firstly, the country co-ordinators ask local sectoral experts to evaluate the situation in their country on the basis of the questionnaire. These experts and the sectoral co-ordinators co-operate to ensure that assessments are consistent across countries.

Secondly, the sectoral and country co-ordinators review the ratings and underlying rationales provided by the local experts. These reviews serve to clarify assessments where necessary, to compare the ratings across countries, and to revise ratings in consultation with local experts. This process facilitates a mutual understanding between experts and co-ordinators in order to improve the reliability and validity of the assessments.

Thirdly, sectoral and country co-ordinators draft narrative reports comparing the assessments for each country and, across all countries, for each sector. These drafts and the data scores are reviewed by a set of peer reviewers for each country and sector. Finally, the data scores and narrative reports are reviewed and edited by the Index core team. 

As a rule, all questions to be answered with yes or no by the country experts are coded as 1 for a positive and 0 for a negative response with regard to the aggregate concepts of the Index: deep and sustainable democracy, European integration, and sustainable development (labelled “1-0”). Where the expert comments and the review process suggest an intermediate assessment, a score of 0.5 is assigned. For items requiring numerical data (quantitative indicators), the figures are converted through a linear transformation that preserves the information they contain about distances between country scores. The transformation uses the following formula:

y = (x – x_min) / (x_max – x_min)

where x refers to the value of the raw data; y is the corresponding score on the 0-1 scale; and x_max and x_min are the endpoints of the original scale, also called “benchmarks”. We preferred this linear transformation over other possible standardisation techniques (e.g. z-transformation) since it is the simplest procedure. 
For items scored with 0-1 or the intermediate 0.5, benchmarks are derived from the questions themselves, assigning 1 to the best and 0 to the worst possible performance. Since benchmarks for quantitative indicators are often not self-evident, they have been defined by assigning the upper benchmark to a new EU member state. Lithuania was chosen as the benchmark country because it shares a post-Soviet legacy with the EaP countries and, as the largest Baltic state, resembles the EaP countries most closely in terms of population size. In addition, the selection of Lithuania reflects the idea that the target level for EaP countries should be neither a top performer nor a laggard, but rather an average new EU member state with both strengths and weaknesses. Ranking sixth among the 13 new EU member states in terms of economic wealth (per capita GDP in purchasing power standards in 2015, according to Eurostat), Lithuania epitomises this idea relatively well. Moreover, considerations of data availability favoured the choice of a single country over calculating median values for all new EU member states. 
The lower benchmark is defined by the value of the worst-performing EaP country in 2014. We chose 2014 as the base year for defining benchmark values in order to enable the tracking of developments over time. This year represents a critical juncture for the EaP countries: three countries signed Association Agreements with the EU, and Ukraine was fundamentally transformed by the Revolution of Dignity, the annexation of Crimea and the war in its eastern regions. In the rare cases when the value for an EaP country exceeded the upper benchmark or fell below the lower benchmark, the score was set to 1 or 0 respectively. All benchmark values and standardisation procedures are documented in an Excel file that is available on the EaP Index website.
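The standardisation logic can be illustrated with a minimal sketch in Python. The function name and the benchmark figures below are hypothetical and chosen purely for illustration; they are not taken from the Index’s actual data files.

```python
def standardise(x, x_min, x_max):
    """Rescale a raw value x onto the 0-1 scale defined by the lower
    benchmark x_min and the upper benchmark x_max, capping values that
    fall outside the benchmark interval at 0 or 1."""
    y = (x - x_min) / (x_max - x_min)
    return max(0.0, min(1.0, y))

# Hypothetical figures: lower benchmark 10 (worst-performing EaP country
# in 2014), upper benchmark 50 (the value for Lithuania).
print(standardise(30, 10, 50))  # 0.5
print(standardise(60, 10, 50))  # 1.0 -- a value above the upper benchmark is set to 1
```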
The Eastern Partnership Index 2015-2016 measures the situation of the EaP countries as of December 2016, or according to the latest data available up until that point. The measurement is thus status-oriented, making it possible to identify the position of each country relative to the others for the different sectors and questions. 

How is the Index calculated?

Aggregating scores is necessary to arrive at an Index or composite indicator. However, aggregation implies decisions about the relative weight of components, and these decisions need to be explained. The Eastern Partnership Index consists of two dimensions, which are further disaggregated into sections, subsections, categories, subcategories and items. The different levels of disaggregation are designated by numbers such as 1.1, 1.1.1 etc. This hierarchical structure reflects theoretical assumptions about the components and boundaries of concepts. One could, for example, argue that free and fair elections constitute the core of democracy and should therefore be given a higher weight than the category of Freedom of Speech and Assembly. Conversely, one could also argue that democracy in most EaP countries is mainly impaired by unaccountable governments and the lack of independent media, while elections are more or less well organised. 
Since it would be difficult to establish a clear priority of one or several components over others, we decided to assign equal weights to all components. Equal weighting is also intuitively plausible because it corresponds to the conceptual decision to conceive, for example, of democracy as composed of a variety of attributes placed on the same level. Equal weighting assumes that all components of a concept possess equal conceptual status and that components are partially substitutable by other components. 
An arithmetical aggregation of components is, strictly speaking, possible only if components are measured at interval level, that is, if we know that the scores of items, subcategories, categories, sections and dimensions contain information on distances. Most numerical data are measured at interval level: in these cases, we know, for example, that a share of EU exports amounting to 40% of GDP is twice as large as a share of 20%, and that this ratio is equal to the ratio between 60% and 30%. For the yes-no questions and items measured with other ordinal scales, we have information only about the ordering of scores, not about the distances between them. 
For example, we do not know the distance between a yes and a no for the question regarding parties’ equitable access to state-owned media. Neither do we know whether the difference between yes and no for this question is equivalent to the difference between yes and no for the subsequent question asking whether political parties are provided with public funds to finance campaigns.
In principle, this uncertainty would limit us to determining aggregate scores by selecting the median rank out of the ranks a country has achieved for all components (assuming equal weighting). This would, however, mean discarding the more detailed information contained in the numerical items. To use this information and to put more emphasis on large differences between countries, we have opted to construct quasi-interval level scores by adding the scores of items measured at ordinal level. This has been standard practice in many indices and can also be justified by the rationale behind equal weighting. 
Given the frequent uncertainty about the importance of components for aggregate concepts, the safest strategy seems to be assigning equal status to all components. Equal status suggests assuming that a score of 1 used to code a positive response for one question equals a score of 1 for another positive response. Moreover, equal status means that all components constituting a concept are partially substitutable. The most appropriate aggregation technique for partially substitutable components is addition.
Since the number of items differs from subcategory to subcategory and since we want to apply equal weighting, we have standardised the subcategory scores by dividing them by the number of items. Thus, the subcategory score ranges between 0 and 1 and expresses the share of yes-no questions answered positively in terms of the aggregate concept (and/or the extent to which numerical or other ordinal-level items are evaluated positively).
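Under this rule, a subcategory score is simply the arithmetic mean of its item scores. The sketch below, in the same illustrative spirit as above, uses invented item scores and assumes each item has already been coded as 1, 0, 0.5 or a standardised quantitative value.

```python
def subcategory_score(item_scores):
    """Combine item scores (each already on the 0-1 scale) into a
    subcategory score by summing them and dividing by the number of
    items, i.e. by taking the arithmetic mean."""
    return sum(item_scores) / len(item_scores)

# Hypothetical subcategory with three yes/no items and one quantitative item:
print(subcategory_score([1, 0, 0.5, 0.75]))  # 0.5625
```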
Quasi-interval level scores allow a range of aggregation techniques at higher levels of aggregation (categories, subsections, sections and dimensions). The most important methods are multiplication and addition. Multiplication assigns more weight to individual components, emphasising that components are necessary for a concept; addition, in contrast, allows weaker scores on some components to be compensated by stronger scores on others, emphasising the substitutability of components.
We apply an additive aggregation of subcategories, categories and sections because this approach is consistent with the method used at the item level, reflects the substitutability of components, and is less sensitive to deviating values on individual components. To standardise the aggregate sums and ensure equal weighting, arithmetical means are calculated. An aggregate score is thereby calculated for each of the two dimensions, Linkage and Approximation. This method reflects the conceptual idea that the two dimensions are interdependent and jointly necessary for progress in European integration.
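A compact way to picture this additive aggregation is a recursive mean over the hierarchy. The following sketch uses an invented fragment of the hierarchy with hypothetical labels and scores; it is not the Index’s actual calculation code, but it reproduces the equal-weighting logic described above.

```python
def aggregate(node):
    """Recursively average scores up the hierarchy (items -> subcategories ->
    categories -> sections -> dimensions); the unweighted arithmetic mean at
    every level implements equal weighting with additive aggregation."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return node  # leaf: an item score already on the 0-1 scale
    scores = [aggregate(child) for child in children]
    return sum(scores) / len(scores)

# Hypothetical fragment of one dimension, with invented labels and scores:
dimension = {
    "1.1": {"1.1.1": [1, 0.5, 1], "1.1.2": [0, 1]},
    "1.2": {"1.2.1": [1, 1, 0.5]},
}
print(round(aggregate(dimension), 3))  # 0.75
```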
Aggregation levels, aggregate scores, individual scores and the underlying raw data are documented in an Excel file that can be downloaded from the Index website.