Methodology

Inside the Index: What we look at and how we measure it

The 2014 Index is based on the same elaborate questionnaire as the previous year's in order to take better account of the actual implementation of reforms, not just the existence of legislation. To ensure comparability across years, the 2012 Index was also updated last year to match the new questionnaire. As a result, the current Index shows not only the state of things in 2013 and early 2014, but also how the situation changed over the year. This allows us to trace progress, or the lack thereof, and to draw conclusions about reform efforts and political will in each of the EaP countries.

The research relies on two types of data: expert assessments commissioned by the core project team and numerical data from publicly available sources. This general design is intended to use the best existing knowledge and to improve it through focused, systematic data collection that benefits from the Open Society Foundations’ unique embeddedness and access to local knowledge in EaP countries. However, expert surveys are prone to subjectivity. Many existing expert surveys are characterised by a mismatch between ‘soft’, potentially biased expert opinions and ‘hard’ coding and aggregation practices that suggest a degree of precision not matched by the more complex underlying reality and its verbal representation in country reports.

The expert survey underlying the Index therefore avoids broad opinion questions and instead tries to verify precise and detailed facts. Complex issues are disaggregated into detailed questions that enable experts to provide more specific responses. Guided by a detailed questionnaire, experts are less often forced to assign subjective weights to different aspects of reality in their evaluation. Most of our survey questions asked for a ‘Yes’ or ‘No’ response to induce experts to take a clear position and to minimise misclassification errors. Experts were requested to explain and document their responses.

As a rule, all questions to be answered with ‘Yes’ or ‘No’ by the country experts were coded 1 = yes or positive with regard to EU integration and 0 = no or negative with regard to EU integration (labelled ‘1-0’). If the expert comments and the correspondence with experts suggested intermediate scores, such assessments were coded as 0.5, or as 0.25 or 0.75 when a more nuanced valuation was needed (labelled ‘calibration’).

For items requiring numerical data (quantitative indicators) the figures were coded through a linear transformation using information about distances between country scores. The transformation used the following formula:

y = (x - x_min) / (x_max - x_min)

where x refers to the value of the raw data; y is the corresponding score on the 0-1 scale; and x_max and x_min are the endpoints of the original scale, also called ‘benchmarks’. We preferred this linear transformation over other possible standardisation techniques (e.g., z-transformation) since it is the simplest procedure.
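As a rough illustration of this transformation, the following Python sketch rescales a raw value onto the 0-1 scale. The function name `rescale` and the sample figures are hypothetical and are not taken from the Index data.

```python
def rescale(x, x_min, x_max):
    """Map a raw value x onto the 0-1 scale defined by two benchmarks."""
    if x_max == x_min:
        raise ValueError("benchmarks must differ")
    return (x - x_min) / (x_max - x_min)


# Hypothetical indicator with benchmarks 10 (lower) and 50 (upper).
print(rescale(30, 10, 50))  # 0.5
```

A raw value equal to the lower benchmark maps to 0 and one equal to the upper benchmark maps to 1; values in between are placed proportionally.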

The benchmarks may be based on the empirical distribution, on theoretical considerations, on the country cases examined or on external standards. In the case of the Eastern Partnership Index, the choice of benchmarks is intertwined with the question of the finalité of the Eastern Partnership. Whereas the EU refuses to consider accession an option, it tends to expect standards similar to those of the accession process, and some EaP countries aspire to EU membership. In addition to this uncertain finalité, many items entail the problem of determining unambiguous best or worst practice benchmarks, both in terms of theory and empirical identification. Given these difficulties, we have opted for a mix of empirical and theoretical benchmarks. For items scored 1-0 or with intermediate values such as 0.5, benchmarks are defined theoretically by assigning 1 and 0 to the best and worst possible performance. In contrast, benchmarks for quantitative indicators were defined empirically: in the Linkage dimension we assigned 1 and 0 to the best and worst performing EaP country to emphasise the relative positioning of a country vis-à-vis its peers. This holds with a few exceptions, mostly in the questions on people-to-people linkage and assistance, where 0 was used as a baseline to make it possible to track progress from one year to the next. In the Approximation and Management dimensions we defined benchmarks either on the basis of theoretical considerations or on the performance of other East European countries (including new EU member states) in order to focus on gaps or catching-up relative to this group.
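The empirical benchmarking described for the Linkage dimension can be sketched as follows, assuming an indicator for which higher raw values are better. The country labels, the raw values and the `zero_baseline` switch are hypothetical illustrations; in particular, keeping the best performer as the upper benchmark in the zero-baseline case is an assumption, since the text does not specify it.

```python
def empirical_scores(raw, zero_baseline=False):
    """Rescale raw values to 0-1 using peer performance as benchmarks.

    By default the worst and best performers define the benchmarks.
    With zero_baseline=True the lower benchmark is fixed at 0 instead
    (an assumed reading of the exception described in the text).
    """
    x_max = max(raw.values())
    x_min = 0.0 if zero_baseline else min(raw.values())
    if x_max == x_min:
        raise ValueError("benchmarks must differ")
    return {name: (x - x_min) / (x_max - x_min) for name, x in raw.items()}


# Hypothetical raw values for three countries (not Index data).
raw = {"Country A": 20.0, "Country B": 30.0, "Country C": 40.0}

print(empirical_scores(raw))
# {'Country A': 0.0, 'Country B': 0.5, 'Country C': 1.0}
print(empirical_scores(raw, zero_baseline=True))
# {'Country A': 0.5, 'Country B': 0.75, 'Country C': 1.0}
```

With peer-based benchmarks the weakest performer always scores 0, whatever its absolute level; with a fixed baseline of 0 its score reflects the raw value itself.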

To construct an Index that is a composite indicator it is necessary to aggregate the individual scores resulting from numerical data and expert assessments. However, aggregation implies decisions about the relative weighting of components that need to be explained. The hierarchical structure of the Eastern Partnership Index reflects theoretical assumptions about the components and boundaries between concepts. For example, we define the section ‘deep and sustainable democracy’ as consisting of seven categories: elections; media freedom, association and assembly rights; human rights; independent judiciary; quality of public administration; fighting corruption; and accountability. The individual weighting of each category should depend on the importance each category has for deep and sustainable democracy. One could, for example, argue that free and fair elections constitute the core of democracy and should therefore be given a higher weighting than the category of association and assembly rights. Conversely, one could also argue that democracy in most EaP countries is mainly impaired by unaccountable governments and a lack of media pluralism, while elections are more or less well organised.

Since it is difficult to establish a clear priority of one or several categories over others, we have decided to assign equal weighting to all categories. The equal weighting of all components is also intuitively plausible since this method corresponds to the conceptual decision of conceiving democracy as composed of seven categories placed on the same level. Equal weighting assumes that all components of a concept possess equal conceptual status and that components are partially substitutable by other components.

An arithmetical aggregation of components is, strictly speaking, only possible if the components in the data set are interval variables, that is, if the distances between the scores of items, subcategories, categories, sections and dimensions have meaning. Most numerical data is measured at interval level: in these cases we know, for example, that a share of EU exports amounting to 40% of GDP is twice the share of 20% and that this ratio is equal to the ratio between 60% and 30%. For the yes-no questions and items measured with other ordinal scales we only have information about the ordering of scores, not about the distances between scores.

For example, we do not know the distance between a yes and a no answer for the question regarding parties’ equitable access to state-owned media. Neither do we know whether the difference between yes and no for this question is equivalent to the difference between yes and no for the subsequent question on whether political parties are provided with public funds to finance campaigns.

In principle, this uncertainty would limit us to determining aggregate scores by selecting the median of the scores a country has achieved for all components (assuming equal weighting). This would, however, mean omitting the more detailed information contained in the numerical items. To use this information and to put more emphasis on big differences between countries, we have opted to construct quasi-interval level scores by adding the scores of items measured at ordinal level. This has been standard practice in many indices and can also be justified by the rationale behind equal weighting. Given the frequent uncertainty about the importance of components for aggregate concepts, the safest strategy seems to be assigning equal status to all components. Equal status suggests assuming that a score of 1 used to code a positive response for one question equals a score of 1 for another positive response. Moreover, equal status means that all components constituting a concept are partially substitutable. The most appropriate aggregation technique for partially substitutable components is addition.

Since the number of items differs from subcategory to subcategory and since we wish to apply equal weighting, we have standardised the subcategory scores by dividing them by the number of items. Thus, the subcategory score ranges from 0 to 1 and expresses the share of yes-no questions answered positively in terms of the aggregate concept (and/or the extent to which numerical items or ordinal-level items are evaluated positively).
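A minimal sketch of this standardisation: the item scores of a subcategory are added and the sum is divided by the number of items. The function name and the item scores below are hypothetical.

```python
def subcategory_score(item_scores):
    """Add the item scores and divide the sum by the number of items."""
    if not item_scores:
        raise ValueError("a subcategory needs at least one item")
    return sum(item_scores) / len(item_scores)


# Hypothetical subcategory: two yes-no items (1 and 0), one calibrated
# item (0.5) and one rescaled numerical item (0.5).
items = [1, 0, 0.5, 0.5]
print(subcategory_score(items))  # 0.5
```

Because every subcategory is divided by its own item count, a subcategory with ten items carries the same weight at the next level as one with three.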

Quasi-interval level scores allow a range of aggregation techniques at higher levels of aggregation (subcategories, categories, sections and dimensions). The most important methods are multiplication and addition. Multiplication assigns more weight to individual components, emphasising the necessity of components for a concept. In contrast, addition facilitates the compensation of weaker scores on some components by stronger scores on other components, emphasising the substitutability of components for a concept.
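The difference between the two methods can be illustrated with a toy comparison. The plain product is used here as one possible multiplicative rule (the text does not specify one), and the component scores are hypothetical.

```python
import math


def additive(scores):
    """Arithmetic mean: weak components can be offset by strong ones."""
    return sum(scores) / len(scores)


def multiplicative(scores):
    """Plain product: a single very weak component drags the result down."""
    return math.prod(scores)


uneven = [1.0, 0.0]   # one perfect and one failed component
even = [0.5, 0.5]     # two mediocre components

print(additive(uneven), multiplicative(uneven))  # 0.5 0.0
print(additive(even), multiplicative(even))      # 0.5 0.25
```

Under addition both hypothetical profiles receive the same score, whereas under multiplication the profile with a failed component scores zero, which is what treating every component as necessary amounts to.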

We apply an additive aggregation of subcategories, categories and sections because this method fits the method used on the item level, reflects the substitutability of components and is less sensitive with regard to deviating values in individual components. To standardise the aggregate sums and ensure equal weighting, arithmetical means are calculated.
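The additive aggregation across levels can be sketched as a nested arithmetic mean. The nested-dictionary layout, the component names and the scores are hypothetical and serve only to illustrate the principle.

```python
def aggregate(node):
    """Return the equally weighted mean of a node's components.

    A node is either a number (an item score) or a dict mapping
    component names to child nodes.
    """
    if isinstance(node, (int, float)):
        return float(node)
    if not node:
        raise ValueError("a component needs at least one child")
    children = [aggregate(child) for child in node.values()]
    return sum(children) / len(children)


# Hypothetical section with two categories of different size.
section = {
    "category 1": {"subcategory a": {"item 1": 1, "item 2": 0}},
    "category 2": {
        "subcategory b": {"item 3": 1, "item 4": 1},
        "subcategory c": {"item 5": 0.5, "item 6": 0.5},
    },
}
print(aggregate(section))  # 0.625
```

Averaging level by level gives each category the same weight regardless of how many items it contains; pooling all six hypothetical items into a single mean would instead give about 0.67.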

The survey was implemented in five steps. First, the country team leaders selected and commissioned local experts, asking them to evaluate the situation in their country on the basis of the questionnaire. Different parts of the questionnaire were assigned to sectoral experts.

Next, the country team leaders returned the responses to the core project team, which reviewed and coded the responses to ensure cross-national comparability. The experts’ comments allowed us to make a preliminary coding (scoring) that was sensitive to the specific context that guided individual experts in their assessments.

In a third step, the core project team returned the coded assessments for all six EaP countries to the local country team leaders and experts, requesting them (1) to clarify their own assessments where necessary and (2) to review the codings by comparing them with codings and assessments made for the other countries. Experts who disagreed with the evaluation of their country were requested to explain their disagreement to the core team.

In a fourth step, the answers and the scores were peer-reviewed. This stage is crucial to ensure the accuracy of data and therefore involves several parallel processes. (1) An external review was commissioned for some parts of the Index. An expert on a particular topic from a particular country, who was not involved in filling in the questionnaire, was asked to review the answers submitted by the Index expert from the same country on the same topic. (2) Guided by one of the experts, experts from the six countries working on the same topic had to review the scores in the respective parts of the Index once again and provide feedback to the core team. (3) The Open Society Foundations’ experts also offered their expertise and made observations.

Finally, the core team reviewed and adapted the scores in light of this multi-level expert feedback. This interactive evaluation was intended to facilitate mutual understanding among the experts, as well as between the experts and the coders, in order to improve the reliability and validity of the assessments.

 

*The Index has been developed by a group of over 50 civil society experts from EaP countries and the EU. Many more have contributed comments at various stages of the project. This Index is produced by the Eastern Partnership Civil Society Forum (CSF), Pasos, the International Renaissance Foundation (IRF) and the Open Society European Policy Institute (OSEPI). The project is funded by the Swedish International Development Cooperation Agency (SIDA), and Ukraine's part is covered by IRF’s European Programme and the EastEast: Partnership Beyond Borders Programme of the Open Society Foundations (OSF).