Linguistic Discovery
Dartmouth College

Volume 3 Issue 1 (2005)        DOI:10.1349/PS1.1537-0852.A.281

A Synchronic Lexical Study of Gbe Language Varieties: The Effects of Different Similarity Judgment Criteria[1]

Angela Kluge

SIL International

In the context of a synchronic lexical study of the Gbe varieties of West Africa, this paper explores whether the use of different criteria sets to judge the similarity of lexical features in different language varieties yields the same or different conclusions regarding the relative relationships and clustering of the investigated varieties and the prioritization of further sociolinguistic research. Word lists elicited in 49 Gbe varieties were analyzed by means of the inspection method. To assess the effects of different similarity judgment criteria, two criteria sets were applied to the elicited data to identify similar lexical items. The quantification of these similarity decisions resulted in two similarity matrices, which were subsequently analyzed by means of correlation analysis and multidimensional scaling. The findings of the correlation analysis indicate a significant linear and positive relationship between the two word-list computations, thus supporting the conclusion that application of either set of similarity judgment criteria would lead to similar clustering results for the Gbe data set. These findings are corroborated by the findings of multidimensional scaling, which suggest that different sets of similarity judgment criteria lead to similar clustering results and similar conclusions as to the scope and priorities for further research.

1. Introduction

In the context of a synchronic lexical study of the Gbe varieties of West Africa, this paper explores the question of whether the application of different sets of similarity judgment criteria in analyzing lexical features with complex polymorphemic word-structures, giving special emphasis to additional morphemes and reduplication, leads to the same or different conclusions as to the relative relationships and clustering of the investigated varieties and the prioritization of further sociolinguistic research.

The Gbe language continuum (Kwa language family) is situated in the southeastern part of West Africa, occupying large areas in southern Benin, Togo, and southeastern Ghana. Among the Gbe varieties, five have thus far undergone language-based development on a larger scale. To assess whether the remaining Gbe communities could benefit from these literacy efforts or whether additional development programs in some of the remaining communities would be beneficial, a sociolinguistic study of the language continuum was launched at the end of the 1980s by the Togo-Benin branch of SIL International.

During the first phase of this study, word and phrase lists were elicited in 49 Gbe varieties, to obtain a rough estimate of the computed degrees of linguistic similarity between these varieties, to identify how these varieties might be treated as clusters, and to establish priorities for further sociolinguistic research.

Given the study’s overall objective of assessing the extensibility of the existing literacy efforts, a synchronic approach was chosen for this analysis of the elicited lexical Gbe data (Kluge 2000, forthcoming). The primary tool for this approach is the inspection method, first described by Gudschinsky (1955). Focusing on phonetic similarity, this method measures the ‘relative degrees of lexical relationship’ of closely related languages as an indicator of potential intelligibility, with Gudschinsky (1956:206) concluding that knowledge of the degree of lexical similarity between language varieties “is invaluable in practical decisions regarding homogeneity of speech areas for vernacular schools, production of literature, etc.” (see also Sanders 1977, Saussure 1959, and Simons 1977).

However, as Grimes (1988) points out in his study on ‘Correlations between vocabulary similarity and intelligibility,’ a high degree of lexical similarity does not necessarily correlate with a high degree of intelligibility, whereas the opposite relationship does hold: a low degree of lexical similarity always correlates with a low degree of intelligibility. Thus, Grimes (1988) concludes that intelligibility is unlikely when lexical similarity is below 60%. He further concludes that, although high degrees of lexical similarity are not reliable, valid indicators of high degrees of intelligibility, similarity degrees above 60% are helpful in identifying areas where more in-depth studies are necessary.

Following these conclusions, SIL’s ‘Language assessment criteria’ (International Language Assessment Conference 1990) gives the following recommendation for further interpretation of word-list results:

[When the word-list analysis results] indicate a lexical similarity between two speech forms of less than about 70% (at the upper confidence limit of the calculation), this generally indicates that these are different languages. ... If the similarity is more than 70%, dialect intelligibility testing is needed to determine how well people can understand the other speech form.

Concerning the procedures involved in making lexical similarity decisions, Gudschinsky (1956) provides a set of guidelines to identify lexical items that are both phonetically and semantically similar and to group these into sets of probable cognates. These guidelines have since been adapted for use in linguistic and sociolinguistic language surveys, for example by Blair (1990:30ff), in his manual for small-scale language surveys which provides a set of criteria for the comparison of pairs of phones in two words. These guidelines take into account word length, but do not deal with the morphemic structure of the elicited lexical items, i.e. how to handle complex polymorphemic word-structures such as additional morphemes and reduplication.

Ideally though, lexical similarity decisions should be based on a thorough morphemic analysis of the elicited lexical items, isolating the equivalent basic stem morphemes in each pair of words. In the context of sociolinguistic language surveys that are limited in scope and time, however, and focus on the extensibility of potential or already existing literacy efforts, such a thorough analysis proves rather unfeasible, especially for languages with complex polymorphemic word-structures (Probst 1992, and Sanders 1977).

In the context of sociolinguistic studies focusing on literacy extensibility, therefore, SIL language survey teams have been employing the inspection method in their lexical analyses to a large extent, applying the principles outlined by Blair (1990) to make similarity decisions without thorough morphemic analysis. Subsequently, SIL’s language assessment criteria are applied to the results of this analysis to establish priorities for further sociolinguistic research: if lexical similarity is higher than 70%, intelligibility testing is recommended to assess whether speakers can understand the other speech forms well enough to be able to benefit from the same literacy efforts, whereas it is assumed that speech communities sharing less than about 70% lexical similarity would not understand the other speech forms well enough to be able to benefit from the same efforts (see for example Brye and Brye 2004, Dettweiler and Dettweiler 2003, Harrison et al. 1999, and Hochstetler et al. 2004).

For the analysis of the elicited lexical Gbe data, the fact that the established procedures for lexical similarity decisions take into account word length but not the morphemic structure of the elicited items raised the question of how to deal with the polymorphemic word-structure, characterized by additional morphemes and reduplication, found in a fair number of the elicited items. Moreover, the threshold of 70% lexical similarity raised the question of whether, and to what extent, the application of different similarity judgment criteria in analyzing these lexical items would result in differing conclusions as to the prioritization of further sociolinguistic research.

Thus, in analyzing the Gbe data set, two different sets of similarity judgment criteria were applied to explore their effects on the computed degrees of lexical similarity, initial clustering of these varieties, and recommended priorities for further sociolinguistic research. (Regarding the question of whether – for the elicited Gbe word lists – the application of a synchronic approach by means of the inspection method yields comparable results to the findings of a diachronic study, conducted by Capo (1986), see Kluge (2000)).

Section 2 presents pertinent background information on the Gbe language continuum. Section 3 describes the methodology employed in analyzing the elicited lexical features, which resulted in the computation of two lexical similarity matrices based on the two different sets of similarity judgment criteria. Section 4 presents the findings of this analysis, focusing on the statistical relationships between the two computations and the clustering of the Gbe varieties according to both criteria sets. In Section 5, the findings are discussed, with special emphasis on the question of whether the application of different similarity judgment criteria sets leads to the same or different conclusions.

 

2. Background information on Gbe[2]

The Gbe speech varieties are located in the southeastern part of West Africa and spoken by a total of approximately eight million speakers (estimate by Gordon 2005). Stewart (1989), revising Bennett and Sterk’s subclassification (Bennett and Sterk 1977), proposes the following classification for the Gbe language varieties:

Niger-Congo, Atlantic-Congo, Volta-Congo, Kwa, Left Bank, Gbe.

Based on the phonological and morphophonological characteristics of the Gbe varieties, Capo (1986; see also Capo 1991) suggests an internal classification of Gbe into five clusters: Aja, Ewe, Fon, Gen, and Phla-Phera (see also Figure 1). Listed according to their geographical location from west to east, the following varieties are assigned to each of these clusters.

Ewe cluster           Gen cluster   Aja cluster   Phla-Phera cluster   Fon cluster
Adan      Kpelen      Agoi          Dogbo         Alada                Agbome
Agu       Peci        Anexo         Hwe           Ayizo                Arohun
Aveno     Towun       Gen           Sikpi         Kotafon              Kpase
Awlan     Ve          Gliji         Tado          Saxwe                Gun
Dayin     Vhlin                                   Tofin                Maxi
Fodome    Vo                                      Toli                 Weme
Gbin      Waci                                    Xwela
Ho        Wance                                   Xwla
Kpando

Table 1: Classification of Gbe varieties (Capo 1986:101, map 1b)

Thus far, four of the five Gbe clusters (Aja, Ewe, Fon, and Gen) have been targeted for language-based development on a relatively large scale, with written materials being available in two varieties of the Fon cluster, i.e. in the Fon[3] and in the Gun varieties (Capo 1986; Direction de l’Alphabétisation 1992; Direction de l’Alphabétisation n.d., a,b,c; Direction Nationale de l’Alphabétisation n.d.; Duthie 1988).

Figure 1: Map of the Gbe language area (based on Capo 1986: map 1a)

3. Methodology

On the basis of Capo’s (1986) comparative study of the Gbe language continuum, 100-word lists were elicited in 49 Gbe varieties. The word list used was based on Swadesh’s 100-word list (Swadesh 1955) and the word list published in the Atlas linguistique du Cameroun (Dieu and Renaud 1983). The analysis of the elicited 49 Gbe word lists was conducted in four steps. Step one consisted of a qualitative analysis and focused on the identification of similar lexical items. Steps two to four consisted of a quantitative analysis focusing on the computation of two similarity matrices, a paired t-test and correlation analysis, and multidimensional scaling of both matrices. In addition, a descriptive analysis was conducted to compare the findings of multidimensional scaling to the computed lexical similarity percentage matrices.

 

3.1 Identification of similar lexical items

During this first, qualitative, part of the analysis, the elicited word lists were analyzed from a synchronic perspective applying the inspection method to determine the relative degrees of lexical similarity among the investigated Gbe varieties.

The lexical similarity decisions were based on the principles outlined by Blair (1990:31ff), allowing for a few modifications. According to these guidelines, two lexical items are judged to be phonetically similar if at least half of the segments compared are the same or very similar[4] and, of the remaining segments, at least half are rather similar.[5] Since these guidelines do not take into account the morphemic structure of lexical items, the question arose of how to deal with the polymorphemic word-structure, i.e. additional morphemes and reduplication, found in a fair number of the elicited items.
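The decision rule just stated can be sketched in a few lines of code. This is a minimal illustration only: it assumes that the analyst has already assigned each compared segment pair to one of three categories, and the category labels and the function name are hypothetical rather than taken from Blair (1990), whose guidelines additionally take word length into account.

```python
from collections import Counter

# Hypothetical category labels for one compared segment pair.
SAME = "same_or_very_similar"
RATHER = "rather_similar"
DIFFERENT = "different"

def phonetically_similar(segment_categories):
    """Apply the 'at least half / at least half of the rest' rule to one word pair.

    segment_categories: one label per compared segment pair.
    """
    counts = Counter(segment_categories)
    total = len(segment_categories)
    if total == 0:
        return False
    remaining = total - counts[SAME]
    # At least half of all compared segments are the same or very similar ...
    enough_same = 2 * counts[SAME] >= total
    # ... and at least half of the remaining segments are rather similar.
    enough_rather = 2 * counts[RATHER] >= remaining
    return enough_same and enough_rather

print(phonetically_similar([SAME, SAME, RATHER, DIFFERENT]))  # True
print(phonetically_similar([SAME, DIFFERENT, DIFFERENT]))     # False
```

The sketch covers only the counting step; deciding which category a given pair of phones belongs to remains a matter of phonetic judgment.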

In a paper evaluating different methods in dealing with multimorphemic words in lexicostatistics, Probst (1992) draws attention to two major approaches. The first one requires a thorough morphophonemic analysis, and thus proves rather unfeasible in the context of sociolinguistic language surveys limited in scope and time. The second approach does not require a morphological analysis and is suggested by Schooling (1981). In his survey of French Polynesia, Schooling (1981) compared lexical items as a whole, ignoring reduplication and disregarding additional morphemes that occur in the same position.

Given the occurrence of reduplication and additional morphemes in a fair number of the elicited Gbe data and further given the limited scope of the lexicostatistical part of the larger Gbe study, Schooling’s (1981) approach was chosen as the basis for the similarity judgments. Thus, following the guidelines outlined by Schooling (1981), a set of similarity judgment criteria (Criteria Set 1) was established that disregarded differences in the morphological structure of the elicited items:

  1. Additional morphemes were disregarded if they occur in the same position.
  2. Reduplication, regardless of the nasalization of the vowel, was disregarded.
  3. Class prefixes on nouns were disregarded.

To explore the effects of different similarity judgment criteria on the lexicostatistical comparison of polymorphemic words, Probst (1992) suggests that a second, more rigorous set of criteria be applied to the same data set. Following these suggestions, a second set of criteria (Criteria Set 2) was established that does not ignore differences in the morphological structure:

  1. Pairs of complete words were compared.
  2. Additional morphemes were included in the analysis.
  3. Reduplication was included in the analysis.
  4. Class prefixes on nouns were disregarded.

Table 2 gives three examples for both similarity judgment criteria sets. The first example considers affixed morphemes, displaying four lexical items for Gloss #20 ‘cow’. For Criteria Set 1, apparently affixed morphemes occurring in the same position were disregarded. Thus, focusing on the morpheme [ɲĩ], the elicited words were considered lexically similar for the Arohun, Ayizo, and Be varieties as well as for the Arohun and Dogbo varieties. However, the items for Dogbo and Be as well as for Dogbo and Ayizo were not considered similar since the additional morphemes do not occur in the same position. Applying the second, more rigorous, criteria set to the same data, only the items for Ayizo and Be were considered lexically similar, whereas the items for the Arohun and Ayizo varieties, the Arohun and Be varieties, the Arohun and Dogbo varieties, as well as for the Ayizo and Dogbo varieties were considered lexically non-similar.

Table 2: Similarity judgment criteria sets

The second example considers reduplication. For Criteria Set 1, reduplication was disregarded, and thus for gloss #21 ‘goat’ the lexical items were considered similar for the Alada, Awlan, and Kpelen varieties. In contrast, for Criteria Set 2, only the items for Awlan and Kpelen were considered lexically similar, whereas for the Alada and Awlan varieties as well as for the Alada and Kpelen varieties they were considered lexically non-similar. Finally, in the third example class prefixes were disregarded for both criteria sets, and thus the elicited items for gloss #6 ‘head’ were considered similar for the Kpelen, Maxi, and Saxwe varieties.

Following these examples, the complete Gbe data set was analyzed with both similarity judgment criteria sets being applied to the elicited data. For 84 glosses (84%) of the 100-item word list, the different treatment of reduplication and additional morphemes resulted in different similarity judgment decisions in at least one dialect pair.

Employing WORDSURV (Wimbish 1989), a computer program designed for analyzing language survey word lists, the elicited data were organized in two different databases referring to the two different criteria sets.

 

3.2 Computation of WORDSURV similarity matrices

During this second step of the analysis, the similarity judgments arrived at during the first step of the analysis were quantified, with WORDSURV computing two lexical similarity percentage matrices based on a count of shared similar lexical items between each pair of Gbe varieties: word list computation 1 (‘WLC1’) is based on Criteria Set 1 which disregards differences in the morphological structure, and word list computation 2 (‘WLC2’) is based on Criteria Set 2 which does not disregard such differences.
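The counting step behind such a matrix can be sketched as follows. The sketch assumes, hypothetically, that the similarity decisions are stored as one group label per variety and gloss, with identical labels marking items judged similar; this is not a description of WORDSURV's internal data format, and the varieties "A", "B", "C" and their labels are invented for illustration.

```python
from itertools import combinations

def similarity_matrix(judgments):
    """Percentage of shared similar items for every pair of varieties.

    judgments: {variety: {gloss: group_label or None}}. Two varieties share
    a gloss when both carry the same group label; glosses missing in either
    word list are left out of the comparison.
    Returns {(variety1, variety2): percentage or None}.
    """
    result = {}
    for v1, v2 in combinations(sorted(judgments), 2):
        shared = compared = 0
        for gloss, label1 in judgments[v1].items():
            label2 = judgments[v2].get(gloss)
            if label1 is None or label2 is None:
                continue
            compared += 1
            shared += label1 == label2
        result[(v1, v2)] = 100.0 * shared / compared if compared else None
    return result

# Invented toy judgments, not the Gbe data.
toy = {
    "A": {"cow": "a", "goat": "a", "head": "a", "eye": "a"},
    "B": {"cow": "a", "goat": "b", "head": "a", "eye": "a"},
    "C": {"cow": "b", "goat": "b", "head": "a", "eye": None},
}
for pair, pct in similarity_matrix(toy).items():
    print(pair, round(pct, 1))
```

Running the same function once on the Criteria Set 1 judgments and once on the Criteria Set 2 judgments would give the two matrices; for 49 varieties each matrix contains 49 × 48 / 2 = 1176 pairwise percentages.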

In addition to a lexical similarity percentage matrix, WORDSURV computes a range of error for each count, based on the reliability of the word list data which takes into account the researcher’s familiarity with the speech varieties under study, availability of good bilingual informants, and opportunities to double-check elicited items (Wimbish 1989:31).

In light of SIL’s language assessment criteria for further interpretation of word-list results, as explained above, the lexical similarity percentages reported in this study refer not to the actual measured degree of similarity; rather, they are the sum of the actual measured degree of similarity plus the upper range of error.[6]
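How the reported percentage feeds into the 70% recommendation quoted in Section 1 can be illustrated with a short sketch. The function names, the example values, the cap at 100%, and the treatment of a value of exactly 70% are assumptions made for illustration; the assessment criteria themselves speak only of "less than about 70%" and "more than 70%".

```python
THRESHOLD = 70.0  # percent, from the language assessment criteria quoted in Section 1

def reported_similarity(measured_pct, upper_error_pct):
    """Measured similarity plus the upper range of error.

    Capping the sum at 100 is an assumption of this sketch.
    """
    return min(100.0, measured_pct + upper_error_pct)

def recommend(measured_pct, upper_error_pct, threshold=THRESHOLD):
    """Return the follow-up suggested by the 70% rule of thumb."""
    reported = reported_similarity(measured_pct, upper_error_pct)
    if reported < threshold:
        return "treat as different languages"
    # Values at or above the threshold call for further testing (assumption
    # for the boundary case of exactly 70%).
    return "intelligibility testing needed"

# Hypothetical values: the measured degree alone is below 70%,
# but the upper confidence limit is not.
print(reported_similarity(66.0, 5.0), recommend(66.0, 5.0))
print(reported_similarity(60.0, 5.0), recommend(60.0, 5.0))
```

The first call shows why the choice of reporting convention matters: a pair measured at 66% would fall below the threshold on the measured value alone, but not once the upper range of error is added.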

 

3.3 Paired t-test and correlation analysis

To explore whether – based on the different treatment of reduplication and additional morphemes – the different similarity judgment decisions in the 84 glosses of the 100-item word list resulted in significantly different lexical similarity percentages, a paired t-test was conducted. One requirement of a t-test is that the observations be independent. Any two lexical similarity percentages that have a variety in common, however, are not independent; even percentages that have no variety in common but are based on very closely related varieties are not independent. Therefore, to obtain a set of approximately independent observations, 24 location pairs were chosen such that no location is in more than one pair, with each pair consisting of varieties that are about four “steps” away from each other in the matrix of similarity percentages.
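The selection constraint and the test statistic can be sketched as follows. The greedy selection scheme is an assumption of this sketch: the text above states only the constraints (no location in more than one pair, pairs about four steps apart), not the procedure by which the 24 pairs were chosen. The percentages in the example are invented.

```python
import math
import statistics

def pick_pairs(n_varieties, step=4):
    """Greedily pair matrix positions i and i + step without reusing a position."""
    used, pairs = set(), []
    for i in range(n_varieties - step):
        j = i + step
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs

def paired_t(xs, ys):
    """Mean difference, t statistic, and degrees of freedom for paired samples."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1

pairs = pick_pairs(49)
print(len(pairs))  # 24 pairs drawn from 49 matrix positions

# Invented WLC1/WLC2 percentages for four selected pairs, not the Gbe data.
wlc1 = [80.0, 75.0, 90.0, 70.0]
wlc2 = [70.0, 68.0, 80.0, 62.0]
print(paired_t(wlc1, wlc2))
```

A p-value would then be read from the t distribution with n − 1 degrees of freedom; in Python this is available, for example, through `scipy.stats.ttest_rel`.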

In addition, a correlation analysis was conducted to explore whether or not the relationship between the two word-list computations is statistically significant, or in other words whether the two word-list computations indicate the same or different relative relationships between the Gbe speech varieties compared. These analyses were conducted employing the statistical computer package SPSS for Windows, release 9.0 (SPSS Inc. 1998; see also Norušis 1993, Savage 1999, and Simons 1979) and the package R, release 1.9.1 (R-Project, n.d.).

To test whether the computed correlation is statistically significantly greater than 0.70, an approximate 95% confidence interval was constructed using the bootstrap method (Efron and Tibshirani 1993). If the lower bound of this interval is greater than 0.70, then the correlation between the two methods is statistically significantly greater than 0.70. Given that, as explained above, the full set of observations is not statistically independent, the location pairs used for this significance test were the same 24 pairs selected for the paired t-test.
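A bootstrap interval of this kind can be sketched as follows. The sketch uses a simple percentile bootstrap; which bootstrap variant and how many resamples were used in the study is not stated here, so both are assumptions, and the data are invented.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    while len(stats) < n_boot:
        # Resample observation pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # correlation is undefined for a constant resample
        stats.append(pearson(bx, by))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented data for 24 observations, not the Gbe percentages.
xs = [60.0 + i for i in range(24)]
ys = [x - 9.0 + (2.0 if i % 2 == 0 else -2.0) for i, x in enumerate(xs)]
low, high = bootstrap_ci(xs, ys)
print(round(pearson(xs, ys), 4), (round(low, 4), round(high, 4)), low > 0.70)
```

The final comparison mirrors the decision rule in the text: the correlation is taken to be significantly greater than 0.70 only if the lower bound of the interval exceeds 0.70.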

 

3.4 Multidimensional scaling

In the final step of the analysis, the computed word-list similarity matrices were analyzed with multidimensional scaling (MDS), a statistical procedure designed to analyze the structure of dissimilarity (or similarity) data. The results of this analysis are presented in a perceptual map, as shown in Figure 2, that displays the data as a configuration of points, as on a map, along two, three or more dimensions with the proximity of these points to each other indicating how similar they are.

The computed perceptual maps do not directly indicate what the dimensions and configurations refer to; rather, it is left to the researcher to interpret what they represent. Most commonly, ‘dimensional interpretation’ and ‘neighborhood interpretation’ are applied to explain as much of the displayed configuration as possible. ‘Dimensional interpretation’ focuses on large distances along the dimensions of the computed plots and requires the researcher to interpret what the dimensions represent. In contrast, ‘neighborhood interpretation’ focuses on data clustering due to large similarities and requires the researcher to identify groups or neighborhoods of stimuli in the multidimensional space (in Figure 2 the identified neighborhoods are indicated by the ovals).

Figure 2: Perceptual MDS map (sample)

In addition, MDS provides for each configuration two measures to test the results for reliability and validity: (1) the ‘squared simple correlation’ (RSQ) to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure, and (2) a measure of stress (‘Kruskal’s stress’) to measure how well the derived configuration matches the input data. RSQ values of ≥0.60 are generally considered acceptable, whereas with Kruskal’s measure of stress, small values approaching 0.0 indicate an acceptable goodness of fit.

According to Woods et al. (1986) and Scholfield (1991), to name but two, MDS can also be applied to the analysis of linguistic data such as speech sounds or words, respectively, in that it provides a measure of distance or dissimilarity for these data by considering all pairs of observations (see Garrett et al. 1999; Hair et al. 1998; Kruskal and Wish 1978; Manly 1986). Thus, for the current study, MDS was employed as an exploratory data analysis tool to investigate the clustering of the Gbe varieties based on the computed degrees of lexical similarity as indicated by the computed word-list similarity matrices.

This analysis was conducted with the statistical computer package SPSS for Windows, release 9.0 (SPSS Inc. 1998; see also Norušis 1993). A two-dimensional analysis was employed with the scaling model based on the default ‘Euclidean distance measure’. With regard to the interpretation of the computed MDS configurations, a combination of neighborhood interpretation and dimensional interpretation was employed, as suggested by Kruskal and Wish (1978). Thus, the computed perceptual maps were investigated as to what the displayed dimensions represent while at the same time taking into account the clustering of the data.
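For readers without access to SPSS, the general idea of a two-dimensional scaling with the two fit measures can be sketched with NumPy. Note that this sketch substitutes classical (Torgerson) scaling for the SPSS procedure, converts similarity percentages to dissimilarities as 100 minus the percentage, and computes a Kruskal-type stress directly against the dissimilarities; all three choices are assumptions, so values obtained this way are not comparable to those reported in this paper. The input data are invented.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) scaling of a symmetric dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]      # indices of the largest ones
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def fit_measures(dissim, coords):
    """Kruskal-type stress and squared correlation (RSQ) of a configuration."""
    d = np.asarray(dissim, dtype=float)
    upper = np.triu_indices_from(d, k=1)
    fitted = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)[upper]
    target = d[upper]
    stress = np.sqrt(np.sum((target - fitted) ** 2) / np.sum(fitted ** 2))
    rsq = np.corrcoef(target, fitted)[0, 1] ** 2
    return stress, rsq

# Invented similarity percentages for four varieties, built from points in a
# plane so that a two-dimensional solution fits them almost exactly.
points = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [30.0, 30.0]])
toy_similarity = 100.0 - np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(100.0 - toy_similarity)
stress, rsq = fit_measures(100.0 - toy_similarity, coords)
print(coords.shape, stress < 1e-6, rsq > 0.999)
```

Plotting the two columns of `coords` against each other gives a perceptual map of the kind discussed above, on which neighborhoods can then be identified by inspection.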

Due to the large number of Gbe varieties, the analysis of the complete set of 49 Gbe varieties resulted in a very dense configuration rendering a more detailed interpretation of the computed plots difficult. Thus, for each word-list matrix a total of three MDS plots was computed, the first one including all 49 Gbe varieties, the second one referring to the western and the third one to the eastern Gbe varieties.

Given that MDS was employed as an exploratory data analysis tool, no rigorous validation techniques have been applied. However, an initial descriptive analysis of the computed similarity matrices was conducted to compare the MDS findings to the computed lexical similarity percentage matrices.

4. Results

An initial descriptive analysis of the two computed similarity matrices indicates a chaining pattern for the entire Gbe cluster. The Gbe speech groups are situated geographically in a contiguous arrangement from the southwestern corner of Nigeria across southern Benin and Togo into Ghana’s Volta region. Thus, the individual Gbe speech groups have contact relationships with the other Gbe groups surrounding them which results in the linguistic convergence of adjoining groups: overall, the investigated Gbe speech forms are marked by relatively small lexical differences between adjoining dialects whereas differences are greater between Gbe varieties at opposing ends of the chain. This chaining pattern is evidenced for both computed word-list similarity matrices in which neighboring groups in the chain are more or less placed in adjoining columns and rows with the highest degrees of lexical similarity occurring on the diagonal and the lowest in the corner with degrees of lexical similarity getting sequentially larger from the corner to the diagonal.

 

4.1 Statistical Relationships between WLC1 and WLC2

Assessing lexical similarity of polymorphemic items, characterized by reduplication and/or additional morphemes, according to the two different similarity judgment criteria sets led to different similarity judgment decisions for 84 glosses (84%) of the 100-item word list. For the more rigorous similarity judgment criteria set, WLC2, these different decisions resulted in an overall average degree of lexical similarity of 64% with similarity percentage ranges of 45-100%, whereas for WLC1 the overall average degree of lexical similarity was higher with 73% and the range smaller with 54-100% (n=1176), thus yielding an observed difference of 9 percentage points between the mean WLC1 percentage and the mean WLC2 percentage.

Results of the paired t-test indicate that the observed mean difference between the WLC1 and WLC2 values for the 24 observations (selected as described in Section 3.3) is 9.375 percentage points, which is statistically significantly different from 0 (p-value < 0.0001). Thus, it can be concluded that the two different similarity judgment criteria sets yield statistically significantly different lexical similarity percentages.

Correlation analysis of WLC1 and WLC2 results in a narrow scatter, indicating a linear and positive relationship (see Figure 3). This narrow scatter leads to a high correlation coefficient, r = 0.9395.

Figure 3: Scatterplot for word-list computations WLC1 and WLC2 – Complete data set (r = 0.9395)

 

The correlation between WLC1 and WLC2 based on only the 24 observations selected, as described in Section 3.3, is r = 0.9202 (see Figure 4). An approximate 95% confidence interval for the correlation between WLC1 and WLC2 is (0.8751, 0.9583). Since the lower bound of this confidence interval is larger than 0.70, the correlation between the two similarity measures is statistically significantly greater than 0.70.

Figure 4: Scatterplot for word-list computations WLC1 and WLC2 – 24 observations (r = 0.9202)

 

These findings indicate a significant linear relationship between WLC1 and WLC2 with either revealing the same relative distances between the pairs of observed values. Thus, it can be concluded that for the analysis of the Gbe data the two different sets of similarity judgment criteria imply the same relative relationships between the investigated speech varieties. These findings support the conclusion that use of either set of similarity judgment criteria would lead to similar clustering results. To verify this conclusion both word-list computations were further analyzed with multidimensional scaling.

 

4.2 Clustering of the Gbe language varieties

Multidimensional scaling of the computed word-list matrices results in the MDS plots presented in Figure 5 and Figure 6, with the MDS stress values of 0.187 and 0.246, and the RSQ values of 0.905 and 0.857 for WLC1 and WLC2, respectively, indicating that the derived configurations have an acceptable correlation with the input data. The ovals denote my interpretation of the clusters as indicated by dimensions 1 and 2 of the MDS plots. According to this interpretation, dimension 1 refers to the degree of inter-group similarity of the Gbe varieties, whereas dimension 2 refers to the degree of intra-group similarity for each cluster.

Figure 5: Major Gbe clusters – MDS plot of computation WLC1

 

For both computations, displayed in Figure 5 and Figure 6, the MDS findings indicate three distinct clusters of Gbe varieties in terms of their degrees of inter-group similarity. Given their distinct geographical locations, these clusters are – for the purposes of this paper – tentatively referred to as ‘Eastern, Central, and Western Gbe’ (see Figure 1, Section 2).

Figure 6: Major Gbe clusters – MDS plot of computation WLC2

 

In this context, mention needs to be made of the following disparities between the MDS findings and the results indicated by a sociolinguistic survey conducted among two of the 49 investigated Gbe varieties, Sikpi and Tohoun-Hwe. The MDS plots for both word-list computations identify Tohoun-Hwe as a component of the western Gbe cluster and Sikpi as a component of the eastern Gbe cluster (see Figure 5 and Figure 6). By contrast, the findings of a more recent and in-depth sociolinguistic survey (including the elicitation of word lists) of the Central Gbe, i.e., Aja communities, conducted in late 1996, clearly identified Sikpi and Tohoun-Hwe as varieties of Aja and, therefore, as components of a central Gbe cluster that consists entirely of Aja varieties (Tompkins and Kluge 2002). Therefore, the results concerning Sikpi and Tohoun-Hwe are discussed together with the results for the remaining Aja varieties under the Central Gbe varieties (Section 4.2.2).

The placement of the Gbe varieties into three distinct clusters appears to concur with the respective average degrees of intra- and inter-group lexical similarity. For both computations, average degrees of intra-group similarity are higher than average degrees of inter-group similarity, as displayed in Table 3.

Gbe cluster     Intra-group similarity ranges (%)     Average intra-group similarity (%)
                WLC1          WLC2                    WLC1          WLC2
Western Gbe     71-100        66-100                  82            76
Central Gbe     86-100        82-100                  91            87
Eastern Gbe     72-100        55-100                  85            74

Gbe cluster pair        Average inter-group similarity (%)
                        WLC1          WLC2
Western – Central       70            62
Western – Eastern       64            56
Eastern – Central       68*           56

* Excluding Sikpi yields an average similarity degree of 67%.

Table 3: Average degrees of intra-group and inter-group similarity for the major Gbe clusters
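Summary figures of this kind can be derived from a similarity matrix and a cluster assignment as sketched below. The function name, the data layout, and the example values are hypothetical; the sketch only illustrates how percentages are grouped into intra-group and inter-group sets before ranges and averages are taken.

```python
from itertools import combinations
from statistics import mean

def group_summaries(similarity, groups):
    """Minimum, maximum, and mean similarity per cluster or cluster pair.

    similarity: {frozenset({variety1, variety2}): percentage}
    groups:     {variety: cluster name}
    Keys of the result are sorted 2-tuples of cluster names; a tuple with the
    same name twice holds the intra-group figures for that cluster.
    """
    buckets = {}
    for v1, v2 in combinations(sorted(groups), 2):
        key = tuple(sorted((groups[v1], groups[v2])))
        buckets.setdefault(key, []).append(similarity[frozenset((v1, v2))])
    return {key: (min(vals), max(vals), mean(vals)) for key, vals in buckets.items()}

# Invented varieties and percentages, not the Gbe data.
groups = {"w1": "Western", "w2": "Western", "c1": "Central", "c2": "Central"}
similarity = {
    frozenset(("w1", "w2")): 80.0,
    frozenset(("c1", "c2")): 90.0,
    frozenset(("w1", "c1")): 70.0,
    frozenset(("w1", "c2")): 68.0,
    frozenset(("w2", "c1")): 72.0,
    frozenset(("w2", "c2")): 66.0,
}
for key, summary in sorted(group_summaries(similarity, groups).items()):
    print(key, summary)
```

Reassigning a variety to a different cluster in `groups` and rerunning the function is one way to check how sensitive the averages are to the placement of individual varieties.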

 

In terms of intra-group similarity, the MDS findings show a rather compact placement of the central Gbe varieties, suggesting little intra-group variation, whereas the placement of the western and eastern Gbe varieties is fairly scattered, thus pointing to higher degrees of intra-group variation. These findings are evidenced by the respective ranges of intra-group lexical similarity, as displayed in Table 3, with the range of similarity degrees for the Central Gbe varieties being smaller than for the Eastern and Western Gbe varieties. Excluding Sikpi and Tohoun-Hwe from the analysis results in the same similarity ranges and average similarity degrees, except for an average inter-group similarity degree of 67% rather than 68% between Eastern and Central Gbe (WLC1).

Table 4 gives an overview of the three identified major Gbe clusters and their components; Sikpi and Tohoun-Hwe are grouped together with the central Gbe varieties.

Western Gbe varieties
Located in Benin, Togo and Ghana between the Mono river to the east and the Volta and Dayi rivers to the west. Capo’s (1986) ‘Ewe’ and ‘Gen’ clusters.
Adan, Agoi / Gliji, Agu, Anexo, Aveno, Awlan, Be, Gbin, Gen, Ho, Kpelen, Kpesi, Togo, Vhlin, Vo, Waci, Wance, Wundi

Central Gbe varieties
Located in Togo and Benin along both sides of the Mono river to the west and the Kouffo river to the east. Capo’s (1986) ‘Aja’ cluster.
Aja-Dogbo, Aja-Hwe (Aplahoué), Aja-Hwe (Azovè), Aja-Hwe (Gboto), Aja-Hwe (Tohoun), Aja-Sikpi

Eastern Gbe varieties
Located in Benin and Nigeria between the Mono river in southwestern Benin and the Yewa river in southwestern Nigeria. Capo’s (1986) ‘Fon’ and ‘Phla-Phera’ clusters.
Agbome, Ajra, Alada, Arohun, Ayizo, Ci, Daxe, Fon, Gbekon, Gbesi, Gbokpa, Gun, Kotafon, Kpase, Maxi, Movolo, Saxwe, Se, Seto, Tofin, Toli, Weme, Xwela, Xwla (eastern), Xwla (western)

Table 4: Major Gbe clusters with elicited Gbe varieties according to their larger geographical setting

 

Closer examination of both similarity matrices indicates the same three major Gbe clusters with the same components as the MDS findings. Although lexical similarity percentages differ significantly due to the different similarity judgment criteria sets, a similar pattern emerges for both word-list similarity matrices. For WLC1 and WLC2, average degrees of lexical similarity for the entire Gbe cluster are 73% and 64%, with ranges of 54-100% and 45-100%, respectively (excluding Aja-Sikpi and Aja-Tohoun from the analysis), whereas average degrees of inter-group lexical similarity are lower (WLC1: ≤70%, WLC2: ≤62%), and average degrees of intra-group lexical similarity for the identified three Gbe clusters are higher (WLC1: ≥82%, WLC2: ≥74%). Due to the chaining pattern of the Gbe varieties, though, there is some overlap between the largest degrees of inter-group similarity and the smallest degrees of intra-group similarity.

In the following sections, detailed MDS findings for both word-list computations are presented according to their larger geographical clustering, i.e., the western, central, and eastern Gbe varieties.

 

4.2.1 Western Gbe varieties

The western Gbe varieties are located in Ghana, Togo and Benin between the Volta and Dayi rivers to the west and the Mono river to the east (Capo 1986: map 1a). For the current study, the following western Gbe varieties were included in the analysis:

Adan, Agoi/Gliji, Agu, Anexo, Aveno, Awlan, Be, Gbin, Gen, Ho, Kpelen, Kpesi, Togo, Vhlin, Vo, Waci, Wance, and Wundi

The MDS results for the Western Gbe varieties are displayed in Figure 9 (WLC1) and Figure 7 (WLC2), with MDS stress values of 0.208 and 0.256, and RSQ values of 0.787 and 0.665 for WLC1 and WLC2, respectively, indicating that the derived configurations have an acceptable correlation with the input data. The ovals denote my interpretation of the clusters as indicated by dimensions 1 and 2 of the MDS plots. According to this interpretation, dimension 1 refers to the degree of inter-group similarity of the western Gbe varieties, whereas dimension 2 refers to the degree of intra-group similarity for each cluster.
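For readers unfamiliar with these two fit measures, the sketch below shows how a stress value and an RSQ value can be computed for a given two-dimensional configuration. It rests on assumptions not confirmed by the text: it uses Kruskal's stress formula 1 and the squared Pearson correlation, it treats 100 minus the similarity percentage as the dissimilarity, and it compares distances with the raw dissimilarities rather than with transformed disparities, as a nonmetric procedure would. The three varieties and their coordinates are hypothetical.

```python
import math
from itertools import combinations


def distances(points):
    """Euclidean distances between all pairs of points, in a fixed pair order."""
    return [math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def stress1(dissimilarities, dists):
    """Kruskal's stress formula 1: sqrt(sum (d - delta)^2 / sum d^2)."""
    numerator = sum((d - delta) ** 2 for d, delta in zip(dists, dissimilarities))
    denominator = sum(d ** 2 for d in dists)
    return math.sqrt(numerator / denominator)


def rsq(dissimilarities, dists):
    """Squared Pearson correlation between dissimilarities and plot distances."""
    n = len(dists)
    mean_x = sum(dissimilarities) / n
    mean_y = sum(dists) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dissimilarities, dists))
    sxx = sum((x - mean_x) ** 2 for x in dissimilarities)
    syy = sum((y - mean_y) ** 2 for y in dists)
    return sxy ** 2 / (sxx * syy)


# Hypothetical similarities for the pairs (0,1), (0,2), (1,2), as percentages.
similarities = [90, 60, 70]
dissim = [100 - s for s in similarities]

# A configuration that reproduces the dissimilarities exactly, and a distorted one.
perfect = [(0.0, 0.0), (10.0, 0.0), (40.0, 0.0)]
distorted = [(0.0, 0.0), (10.0, 0.0), (30.0, 0.0)]

for name, config in (("perfect", perfect), ("distorted", distorted)):
    d = distances(config)
    print(name, round(stress1(dissim, d), 3), round(rsq(dissim, d), 3))
```

Lower stress and higher RSQ indicate that the plotted distances track the input data more closely, which is the sense in which the values above are read as an acceptable fit.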

For WLC2 (Figure 7), the MDS findings indicate three distinct clusters of the western Gbe varieties in terms of their degrees of inter-group similarity: the Gen and Ewe clusters and a distinct third cluster (preliminarily referred to as ‘Cluster-3’) placed between the Gen and Ewe clusters. The Gen cluster comprises the following varieties: Agoi/Gliji, Anexo, Gen, Kpesi, Vo, Waci, Wance, and Wundi, with the Gen variety already standardized and being used as the language of nonformal education in the southwestern part of Benin. For the Ewe cluster the following varieties are identified by the MDS plot: Adan, Agu, Aveno, Awlan, Be, and Togo, with Awlan the most prestigious variety of Ewe and the basis for Standard Ewe. Finally, Cluster-3 comprises Gbin, Ho, Kpelen, and Vhlin.

Figure 7: Western Gbe varieties – MDS plot for computation WLC2

 

The placement of the western Gbe varieties into these three distinct clusters appears to correspond to the more or less distinct geographical locations of these varieties, as indicated in Figure 8. It also appears to concur with the respective average degrees of intra- and inter-group lexical similarity: for the identified clusters average degrees of intra-group similarity are >80% (Ewe: 82%, Gen: 81%, Cluster-3: 90%), whereas average degrees of inter-group lexical similarity are lower, i.e., <75% (Ewe – Gen: 72%, Ewe – Cluster-3: 74%, Gen – Cluster-3: 73%).

With regard to their degrees of intra-group similarity, the MDS findings show a rather compact placement of the Cluster-3 varieties, suggesting little intra-group variation, whereas the placement of the Ewe varieties is somewhat less dense, thus pointing to a higher degree of intra-group variation. These findings are evidenced by the respective ranges of intra-group lexical similarity of 81-97% for Cluster-3, and 73-100% for Ewe.

Finally, the MDS plot indicates a fairly scattered placement of the Gen varieties with the Wance and Wundi varieties and the Kpesi variety situated the furthest apart, thus suggesting relatively high degrees of intra-group variation, especially between these three varieties. Again, these findings are evidenced by the respective ranges of intra-group lexical similarity of 71-100%. The fairly scattered placement of the Gen varieties with the rather high degree of intra-group variation is due to the somewhat low degrees of lexical similarity between the Wance and Wundi varieties and Kpesi (72% and 71%, respectively), whereas otherwise average degrees of intra-group similarity are higher: 100% between Wance and Wundi, and 84% between the remaining Gen varieties (Agoi/Gliji, Anexo, Gen, Kpesi, Vo, and Waci).

Figure 8: Geographical location of the Western Gbe varieties

 

In terms of their degrees of inter-group similarity, the MDS findings for WLC1 indicate the same three western Gbe clusters as indicated for WLC2, i.e., the Gen and Ewe clusters and a distinct third cluster (see Figure 9). Again, this grouping appears to be supported by the computed degrees of lexical similarity: average degrees of inter-group similarity are ≤80% (Ewe – Gen: 80%, Ewe – Cluster-3: 76%, Gen – Cluster-3: 80%) whereas average degrees of intra-group similarity are higher, i.e., >85% (Ewe: 86%, Gen: 87%, Cluster-3: 92%).

Figure 9: Western Gbe varieties – MDS plot for computation WLC1

 

Cluster-3 comprises the same Gbin, Ho, Kpelen, and Vhlin varieties, with the MDS findings again showing a rather compact placement of these varieties, thus pointing to a low degree of intra-group variation. These findings are evidenced by the respective ranges of intra-group lexical similarity of 85-98%.

The Gen cluster, situated between the Cluster-3 and Ewe cluster, comprises the same western Gbe varieties as indicated for computation WLC2. Again, the MDS plot indicates a fairly scattered placement of the Gen varieties, the spread being even wider than for computation WLC2: the Agoi/Gliji, Anexo, Gen, and Kpesi varieties and the Vo, Waci, Wance, and Wundi varieties are situated at opposite ends of the cluster, thus indicating rather high degrees of intra-group variation. Again, these placements appear to concur with the respective computed degrees of lexical similarity. For the subgroupings of the Agoi/Gliji, Anexo, Gen, and Kpesi varieties and the Vo, Waci, Wance, and Wundi varieties average degrees of intra-group similarity are 94% and 91% with ranges of 88-100% and 87-99%, respectively. In contrast, the degrees of lexical similarity between both subgroupings are lower with 83% average inter-group similarity and a range of 80-86%, resulting in the fairly extensive spread of the Gen cluster varieties along dimension 2 as displayed in Figure 9.

The Ewe cluster comprises the same western Gbe varieties as attested to by the MDS findings for WLC2, i.e., Adan, Agu, Aveno, Awlan, Be, and Togo. However, whereas for WLC2 the MDS findings show a somewhat dense placement of the Ewe varieties, suggesting only moderate intra-group variation, the MDS plot for WLC1 shows a fairly large scatter of these varieties with the Adan and Awlan varieties and the Agu, Aveno, Be, and Togo varieties placed at opposite poles within the cluster, thus suggesting rather high degrees of intra-group variation. Again, these placements appear to concur with the respective computed degrees of lexical similarity. For the subgroupings of the Adan and Awlan varieties and the Agu, Aveno, Be, and Togo varieties average degrees of intra-group similarity are 100% and 92% (with a range of 86-99%), respectively. In contrast, the degrees of lexical similarity between both subgroupings are lower with 80% average inter-group similarity and a range of 79-81%, which results in the fairly extensive spread of the Ewe cluster varieties as displayed in Figure 9.

These rather high degrees of intra-group variation within the Gen and Ewe cluster varieties suggest possible alternative groupings for some of the western Gbe varieties, namely for the Agu, Aveno, Be, and Togo subgrouping (Ewe cluster) and the Vo, Wance, Waci, and Wundi subgrouping (Gen cluster), both of which the MDS plot situates in rather close proximity to each other, as displayed in Figure 10.

Figure 10: Western Gbe varieties – MDS plot for WLC1 with alternative groupings

 

For the Agu, Aveno, Be, and Togo varieties the findings indicate possible groupings with (1) the Adan and Awlan varieties; (2) the Vo, Wance, Waci, and Wundi varieties in a distinct fourth cluster; or (3) the entire Gen cluster comprising the Agoi/Gliji, Anexo, Gen, Kpesi, Vo, Wance, Waci, and Wundi varieties. These three alternative groupings result in comparable average degrees of intra-group similarity, i.e., 86%, 87%, and 85%, respectively. Likewise, the findings indicate possible groupings of the Vo, Wance, Waci, and Wundi varieties with (1) the Agu, Aveno, Be, and Togo varieties; or (2) the Agoi/Gliji, Anexo, Gen, and Kpesi varieties, resulting in the same average degree of 87% intra-group similarity for both indicated alternative groupings.

Thus, these findings indicate the following two alternative clusterings of the western Gbe varieties:

Cluster-3: Gbin, Ho, Kpelen, and Vhlin
Ewe: Adan and Awlan
Gen: Agoi/Gliji, Agu, Anexo, Aveno, Be, Gen, Kpesi, Togo, Vo, Wance, Waci, and Wundi

 

Cluster-3: Gbin, Ho, Kpelen, and Vhlin
Ewe: Adan and Awlan
Gen: Agoi/Gliji, Anexo, Gen, Kpesi
Distinct fourth cluster: Agu, Aveno, Be, Togo, Vo, Wance, Waci, and Wundi

 

4.2.2 Central Gbe varieties

The MDS findings for both word-list computations indicate a distinct grouping of central Gbe varieties (see Figure 5 and Figure 6). Closer examination of this cluster shows that all components of the central Gbe cluster are in fact varieties of the Aja language. These are located in both Benin and Togo over a large area on both sides of the Mono river. The Aja varieties of Benin are primarily situated in an area bordered by the Kouffo river to the east and the Mono river to the west in Benin’s Mono region, whereas the Togolese Aja varieties are situated west of the Mono river in the southeastern corner of the Plateaux region and the eastern part of the Maritime region (see Figure 11; Tompkins and Kluge 2002).

Figure 11: Geographical location of the Central (Aja) Gbe varieties

 

Within the context of the SIL study, word lists were elicited in 1988 and 1991 in the Dogbo variety, in the Hwe varieties spoken at Aplahoué, Azovè, Gboto, and Tohoun, and in the Sikpi variety. The MDS plots for both word-list computations indicate Dogbo, Aplahoué-Hwe, Azovè-Hwe, and Gboto-Hwe as components of the central Gbe, i.e., Aja cluster. These findings appear to concur with the respective degrees of lexical similarity with Dogbo, Aplahoué-Hwe, Azovè-Hwe, and Gboto-Hwe sharing fairly high degrees of lexical similarity (WLC1: 86-100%; WLC2: 82-100%).

Neither Tohoun-Hwe nor Sikpi, though, are identified as components of the Aja language. Instead, both MDS plots indicate Sikpi as an eastern Gbe variety and Tohoun-Hwe as a western Gbe variety (see Figure 5 and Figure 6).

However, in 1996 a more recent and in-depth sociolinguistic study of the Aja communities was conducted that included the administration of sociolinguistic questionnaires as well as the elicitation of word lists in Aja-Dogbo, Aja-Hwe, and Aja-Sikpi (only one Hwe word list was elicited). In addition, word lists were elicited in Aja-Tala and Aja-Tado. The results of the word-list analysis show rather high degrees of lexical similarity of ≥93% among the six Aja varieties, including Sikpi and Tohoun-Hwe (see Table 5).


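| | Aplahoué-Hwe | Sikpi | Tohoun-Hwe | Dogbo | Tado |
| --- | --- | --- | --- | --- | --- |
| Sikpi | 97 | | | | |
| Tohoun-Hwe | 98 | 97 | | | |
| Dogbo | 94 | 94 | 96 | | |
| Tado | 94 | 96 | 93 | 93 | |
| Tala | 94 | 93 | 95 | 95 | 94 |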

Table 5: Lexical similarity percentages for the Aja varieties (Tompkins and Kluge 2002)

 

These results indicate that all six varieties are fairly homogeneous with respect to their lexical inventory. In addition, Aja informants stated that Tohoun-Hwe is identical to the Hwe spoken in Aplahoué and Azovè in Benin. Thus, although the results do not include the western Gbe varieties, the findings clearly identify Sikpi and Tohoun-Hwe as varieties of Aja and therefore as components of the central Gbe cluster (see Tompkins and Kluge 2002).
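The homogeneity claim can be checked directly against the percentages in Table 5. The sketch below transcribes the lower-triangular values and reports the lowest, highest and mean pairwise similarity. The pairing of each value with a variety pair follows the usual reading of a staircase-style matrix and is an assumption, although the summary figures do not depend on it.

```python
# Lower-triangular rows as printed in Table 5 (Tompkins and Kluge 2002).
# Row i holds the similarities of labels[i] with labels[0] .. labels[i - 1];
# this reading of the staircase layout is an assumption.
labels = ["Aplahoué-Hwe", "Sikpi", "Tohoun-Hwe", "Dogbo", "Tado", "Tala"]
rows = [
    [],
    [97],
    [98, 97],
    [94, 94, 96],
    [94, 96, 93, 93],
    [94, 93, 95, 95, 94],
]

# Expand the triangle into a lookup keyed by unordered variety pairs.
pairs = {
    frozenset((labels[i], labels[j])): value
    for i, row in enumerate(rows)
    for j, value in enumerate(row)
}

lowest = min(pairs.values())
highest = max(pairs.values())
mean = sum(pairs.values()) / len(pairs)

print(len(pairs), "pairs; range", lowest, "to", highest, "; mean", round(mean, 2))
```

Six varieties give 15 pairs, and the lowest value among them is 93, in line with the ≥93% reported above.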

The disparity between the findings of the Aja survey and the findings of the current MDS analysis could be due to the fact that for the study reported here, word lists were elicited from individual L1 speakers of the variety in question, some of whom were not residing in the language area. Thus, the procedure did not allow for the discussion of variants, with group decisions made by L1 speakers resident in the area, as to which form to include in the lists, thus potentially decreasing reliability.

 

4.2.3 Eastern Gbe varieties

The eastern Gbe varieties are located in Benin and Nigeria east of the Mono river, with the sole exception of a few western Xwla communities located to the west of the Mono river in southeastern Togo, in and around the town of Adamé (see Henson and Kluge 1999). For the current study, the following eastern Gbe varieties were included in the analysis:

Ajra, Agbome, Alada, Arohun, Ayizo, Ci, Daxe, Fon, Gbekon, Gbesi, Gbokpa, Gun, Kotafon, Kpase, Maxi, Movolo, Saxwe, Se, Seto, Tofin, Toli, Weme, Xwela, and eastern and western Xwla

For the Eastern Gbe varieties multidimensional scaling of the two computed word-list matrices results in the MDS plots displayed in Figure 12 and Figure 13, with the MDS stress values of 0.235 and 0.284, and the RSQ values of 0.746 and 0.677 for WLC1 and WLC2, respectively, indicating that the derived configurations have an acceptable correlation with the input data. The ovals indicate my interpretation of the clustering of the eastern Gbe varieties. For the MDS plot of WLC2, though, dimension 1 and dimension 2 do not lend themselves to a clear interpretation. However, the MDS plot suggests a dimensional interpretation along the superimposed dashed lines. Thus, according to my interpretation, dimension 1 and dimension 1’ roughly indicate a perceptual east-west axis with west located to the left and east to the right, whereas both dimension 2 and dimension 2’ represent the extent to which the eastern Gbe varieties are grouped within the Fon or the Phla-Phera clusters.

Figure 12: Eastern Gbe varieties – MDS plot of WLC1 (F.Gb. = Fon, Gbekon)

 

The MDS plots for both word-list computations indicate three distinct clusters along the perceptual Fon – Phla-Phera axis: a Fon cluster situated on the Fon side of this axis, and a western and an eastern Phla-Phera cluster situated at its Phla-Phera side. Along the perceptual east-west axis, the three clusters are situated according to their actual geographical distribution (as displayed in Figure 14) with the western Phla-Phera varieties placed on the western side, the eastern Phla-Phera varieties on the eastern side, and the Fon cluster varieties in the center.

Figure 13: Eastern Gbe varieties – MDS plot of WLC2 (S.X.M. = Seto, eastern Xwla, Movolo)

 

The following varieties are indicated by both MDS plots as components of the Fon cluster: Agbome, Alada, Arohun, Ayizo, Ci, Fon, Gbekon, Kpase, Maxi, and Weme. The MDS plots further suggest Daxe, Gbesi, Gbokpa, Kotafon, Saxwe, Se, Xwela, and western Xwla as components of the western Phla-Phera cluster, and Ajra, Gun, Movolo, Seto, Tofin, and eastern Xwla as components of the eastern Phla-Phera cluster.

In addition, Toli is indicated as a component of the eastern Phla-Phera cluster. According to Capo’s map of the Gbe language area (1986: map 1a), the Toli area is located in Benin’s Atlantique province in and around Tori-Bossito. However, during a more recent and more in-depth sociolinguistic study conducted among the neighboring Ayizo speech communities, it was determined that the speech form of Tori-Bossito is considered to be a variety of Ayizo, i.e., Ayizo-Tori. When asked regarding their comprehension of Toli, the interviewed Ayizo-Tori speakers reported that the speech of Toli speakers who live north of Porto Novo has some phonological differences “when compared with the speech of Tori speakers in the Tori-Bossito region, but these are said to be minor and do not impede comprehension” (Hatfield and McHenry 1998). These Toli communities north of Porto Novo refer to Toli speech communities located in southeastern Benin, in the Ouémé province, more specifically, according to the Atlas Sociolinguistique du Bénin, in the Ajara, Avlanku and Mixlete (Akpro-Misserete) districts (CNL du Bénin 1983), and thus in close geographical proximity to the Tofin and Ajra speech communities. It was here that the Toli word list was elicited. Thus, the map in Figure 14 includes an additional entry for Toli located in the eastern Phla-Phera language area.

Figure 14: Geographical location of the Eastern Gbe varieties

 

The grouping of the eastern Gbe varieties into three distinct clusters appears to be supported by the respective average degrees of inter- and intra-group lexical similarity for both word-list computations.

For WLC1 the lexical similarity matrix indicates for the identified clusters average degrees of inter-group similarity of ≤85% (eastern Phla-Phera – Fon: 84%, eastern Phla-Phera – western Phla-Phera: 81%, Fon – western Phla-Phera: 84%), whereas average degrees of intra-group similarity are higher, i.e., ≥87% (eastern Phla-Phera: 89%, Fon: 90%, western Phla-Phera: 87%). For WLC2 the same pattern emerges, although lexical similarity percentages are overall somewhat lower, with average degrees of inter-group similarity of ≤75% (eastern Phla-Phera – Fon: 75%, eastern Phla-Phera – western Phla-Phera: 71%, Fon – western Phla-Phera: 71%) and average degrees of intra-group similarity of ≥77% (eastern Phla-Phera: 83%, Fon: 78%, western Phla-Phera: 77%).

However, the MDS plot for WLC1 suggests a possible alternative grouping of Gun within the Fon cluster, with Gun being the eastern Phla-Phera variety situated the closest to the Fon cluster varieties. This placement appears to concur with the comparably high degrees of lexical similarity of Gun to the identified eastern Phla-Phera cluster varieties and to the above-mentioned Fon cluster varieties (88-94% versus 81-96%, respectively), thus suggesting a possible alternative grouping of Gun within the latter cluster.

Likewise, the MDS findings indicate a possible alternative grouping of the Gbokpa and Kotafon varieties within the Fon cluster rather than within the western Phla-Phera cluster. Although both MDS plots situate Gbokpa, and the WLC2-MDS plot also situates Kotafon, more towards the western Phla-Phera side of the MDS plot, Gbokpa (according to the MDS plots for WLC1 and WLC2) and Kotafon (according to the MDS plot for WLC2) are also the varieties situated the closest to the Fon cluster varieties. These placements appear to concur with the respective lexical similarity percentages: for Gbokpa, degrees of lexical similarity with regard to the identified western Phla-Phera varieties and the above-mentioned Fon cluster varieties are comparable (WLC1: 84-96% versus 84-94%, WLC2: 75-87% versus 73-92%, respectively); likewise, for Kotafon degrees of lexical similarity are comparable with regard to the western Phla-Phera varieties and the Fon cluster varieties (70-80% versus 70-85%, respectively).

Thus, these findings indicate the following alternative grouping of the eastern Gbe varieties: the Agbome, Alada, Arohun, Ayizo, Ci, Fon, Gbekon, Gbokpa, Gun, Kotafon, Kpase, Maxi, and Weme varieties group within the Fon cluster, the Daxe, Gbesi, Saxwe, Se, Xwela, and western Xwla communities within the western Phla-Phera cluster, and the Ajra, Movolo, Seto, Tofin, and eastern Xwla varieties within the eastern Phla-Phera cluster.

 

5. Discussion and conclusions

5.1 Clustering of the Gbe varieties

The different treatment of the elicited lexical Gbe items according to two different sets of similarity judgment criteria resulted in different similarity judgment decisions for 84 glosses (84%) of the 100-item word list. These different decisions yielded significantly different lexical similarity percentages, with consistently lower lexical similarity degrees for WLC2, which is based on a more rigorous set of similarity judgment criteria. At the same time, though, the results of the correlation analysis indicated a significant linear relationship between the two computed word-list similarity matrices, thus supporting the conclusion that use of either set of similarity judgment criteria would lead to similar clustering results.

The findings of the MDS analysis of both word-list computations corroborated this conclusion, indicating three major Gbe clusters, comprising the same subgroupings: western Gbe comprising the Ewe, the Gen, and a distinct third cluster, central Gbe referring to the Aja varieties, and eastern Gbe comprising the Fon cluster and a western and eastern Phla-Phera cluster (see Figure 15).

Figure 15: Map of the Gbe language area according to the findings of multidimensional scaling

 

In Table 6, the components of each of the identified Gbe clusters are listed as indicated by the MDS findings for both word-list computations.

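| Cluster | Western Gbe, WLC1 | Western Gbe, WLC2 | Central Gbe (Aja), WLC1 | Central Gbe (Aja), WLC2 | Eastern Gbe, WLC1 | Eastern Gbe, WLC2 | Cluster |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ewe | Adan | Adan | Dogbo | Dogbo | Ajra | Ajra | E. P-Phera |
| | Awlan | Awlan | Hwe-Aplahoué | Hwe-Aplahoué | Movolo | Toli | |
| | Agu | Agu | Hwe-Azovè | Hwe-Azovè | Seto | Tofin | |
| | Aveno | Aveno | Hwe-Gboto | Hwe-Gboto | Tofin | Movolo | |
| | Be | Be | Hwe-Tohoun | Hwe-Tohoun | Toli | Seto | |
| | Togo | Togo | Sikpi | Sikpi | Xwla-e. | Xwla-e. | |
| Gen | Vo | Vo | | | Gun | Gun | |
| | Waci | Waci | | | Alada | Alada | Fon |
| | Wance | Wance | | | Gbekon | Gbekon | |
| | Wundi | Wundi | | | Arohun | Arohun | |
| | Agoi/Gliji | Agoi/Gliji | | | Fon | Fon | |
| | Anexo | Anexo | | | Agbome | Agbome | |
| | Gen | Gen | | | Ci | Ci | |
| | Kpesi | Kpesi | | | Maxi | Maxi | |
| Cluster-3 | Kpelen | Kpelen | | | Weme | Weme | |
| | Gbin | Gbin | | | Kpase | Kpase | |
| | Ho | Ho | | | Ayizo | Ayizo | |
| | Vhlin | Vhlin | | | Gbokpa | Gbokpa | W. P-Phera |
| | | | | | Kotafon | Kotafon | |
| | | | | | Gbesi | Gbesi | |
| | | | | | Xwla-w. | Xwla-w. | |
| | | | | | Xwela | Xwela | |
| | | | | | Saxwe | Saxwe | |
| | | | | | Se | Se | |
| | | | | | Daxe | Daxe | |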

 

Table 6: Grouping of the Gbe varieties according to the findings of MDS

 

For both computations, the suggested major groupings and their components concur for all 49 investigated Gbe varieties (100%): 18 western, 6 central, and 25 eastern Gbe varieties (see Table 6).

However, for a number of western and eastern Gbe varieties the MDS findings and the results of the descriptive analysis of the computed word list similarity matrices indicate possible alternative groupings.

With regard to the 18 western Gbe varieties, the findings for WLC1 indicate possible alternative groupings for the Agu, Aveno, Be, Togo, Vo, Waci, Wance, and Wundi varieties. Instead of grouping the Agu, Aveno, Be, and Togo varieties within the Ewe cluster, these varieties could also be grouped (1) in a distinct cluster together with the Vo, Waci, Wance, and Wundi varieties, or (2) within the Gen cluster together with Agoi/Gliji, Anexo, Gen, Kpesi, Vo, Waci, Wance, and Wundi. These alternative groupings would lower the levels of agreement between word-list computations WLC1 and WLC2: for (1) from 18/18 (100%) to 10/18 (56%) identical components, and for (2) to 14/18 (78%) identical components.

For the 25 eastern Gbe varieties, possible alternative groupings are indicated for three varieties, i.e., Gun (computation WLC1), Gbokpa (computations WLC1 and WLC2) and Kotafon (computation WLC2). If these three varieties were to be grouped within the Fon cluster, the groupings for both word-list computations would still concur for 23/25 (92%) of the eastern Gbe varieties.

Overall, the indicated inconsistencies between both word-list computations appear to be due to the fact that WLC1 is based on a less rigorous set of similarity judgment criteria resulting in a similarity matrix and MDS plot that is less clear in identifying similarities and differences, and thus tends to indicate possible alternative groupings. In contrast, WLC2 is based on a more rigorous set of similarity judgment criteria resulting in a similarity matrix and MDS plot that more sharply identifies similarities and differences.

However, in spite of the indicated possible alternative groupings for up to eight western Gbe varieties and up to three eastern Gbe varieties, the levels of agreement between the two word-list computations are still very high. If for the western Gbe cluster the Agu, Aveno, Be, Togo, Vo, Waci, Wance, and Wundi varieties were to be grouped within a distinct cluster, and if for the eastern Gbe cluster the Gbokpa, Gun, or Kotafon varieties were to be grouped within the Fon cluster, the groupings would still concur for 39/49 (80%) of the investigated Gbe varieties. The level of agreement would be even higher if for the western Gbe cluster the Agu, Aveno, Be, and Togo varieties were to be grouped within the Gen cluster (43/49 – 88%).
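The agreement figures quoted in this section can be retraced with a short script. The sketch below starts from the cluster membership listed in Table 6 and applies the alternative groupings as overrides. It reflects one reading of the text and is not the author's procedure: it assumes that the western alternatives apply to WLC1 only, that Gun and Gbokpa move to the Fon cluster for WLC1, and that Gbokpa and Kotafon do so for WLC2. The label 'Cluster-4' for the distinct fourth cluster is a placeholder introduced here.

```python
# Cluster membership as listed in Table 6 (identical for WLC1 and WLC2).
BASE = {
    "Ewe": ["Adan", "Awlan", "Agu", "Aveno", "Be", "Togo"],
    "Gen": ["Vo", "Waci", "Wance", "Wundi", "Agoi/Gliji", "Anexo", "Gen", "Kpesi"],
    "Cluster-3": ["Kpelen", "Gbin", "Ho", "Vhlin"],
    "Aja": ["Dogbo", "Hwe-Aplahoué", "Hwe-Azovè", "Hwe-Gboto", "Hwe-Tohoun", "Sikpi"],
    "E. Phla-Phera": ["Ajra", "Movolo", "Seto", "Tofin", "Toli", "Xwla-e.", "Gun"],
    "Fon": ["Alada", "Gbekon", "Arohun", "Fon", "Agbome", "Ci", "Maxi", "Weme",
            "Kpase", "Ayizo"],
    "W. Phla-Phera": ["Gbokpa", "Kotafon", "Gbesi", "Xwla-w.", "Xwela", "Saxwe",
                      "Se", "Daxe"],
}
WEST = BASE["Ewe"] + BASE["Gen"] + BASE["Cluster-3"]
EAST = BASE["E. Phla-Phera"] + BASE["Fon"] + BASE["W. Phla-Phera"]


def assignment(overrides=None):
    """Map each variety to its cluster, applying optional regroupings."""
    labels = {v: c for c, members in BASE.items() for v in members}
    labels.update(overrides or {})
    return labels


def agreement(a, b, varieties=None):
    """Return (matching, total, rounded percent) for two cluster assignments."""
    varieties = list(a) if varieties is None else varieties
    same = sum(a[v] == b[v] for v in varieties)
    return same, len(varieties), round(100 * same / len(varieties))


# Alternative groupings, as read from the text (assumptions, see lead-in).
fourth = {v: "Cluster-4" for v in
          ["Agu", "Aveno", "Be", "Togo", "Vo", "Waci", "Wance", "Wundi"]}
to_gen = {v: "Gen" for v in ["Agu", "Aveno", "Be", "Togo"]}
east_wlc1 = {"Gun": "Fon", "Gbokpa": "Fon"}
east_wlc2 = {"Gbokpa": "Fon", "Kotafon": "Fon"}

print(agreement(assignment(fourth), assignment(), WEST))
print(agreement(assignment(to_gen), assignment(), WEST))
print(agreement(assignment(east_wlc1), assignment(east_wlc2), EAST))
print(agreement(assignment({**fourth, **east_wlc1}), assignment(east_wlc2)))
print(agreement(assignment({**to_gen, **east_wlc1}), assignment(east_wlc2)))
```

Under these assumptions the counts come out as 10/18, 14/18, 23/25, 39/49 and 43/49, matching the percentages given in the text.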

 

5.2 Scope and priorities of further sociolinguistic research

In terms of the threshold level of 70% suggested by SIL’s ‘Language assessment criteria’ (International Language Assessment Conference 1990), intra- and inter-group similarity degrees indicated that application of different similarity judgment criteria sets resulted in the same Western, Central and Eastern Gbe clusters and thus would not lead to different recommendations concerning more in-depth research, as far as the second level of the Gbe clustering hierarchy is concerned. As regards more in-depth research within each of the identified three major Gbe clusters, the MDS findings for both word-list computations lend themselves to the following recommendations.

 

5.2.1 Western Gbe

Given the existence of literature for the Ewe and Gen clusters, further research should give the highest priority to those western Gbe speech communities that are not components of these clusters, i.e., the Cluster-3 varieties. Such research would need to focus on determining whether the Kpelen, Gbin, Ho, and Vhlin varieties could benefit from the existing Ewe or Gen literature efforts, or whether an additional development program for Cluster-3 would be beneficial and if so, which variety could serve as the reference variety for this subgrouping. In addition, such research would need to investigate whether existing literature could appropriately be used by all components of the respective Ewe and Gen clusters. Given their geographical location in close proximity to the Gen cluster varieties, research among the Be and Togo communities also needs to investigate the extensibility of Gen materials to these communities.

The possible alternative groupings of Agu, Aveno, Be, and Togo within the Gen cluster, as indicated by the MDS findings for WLC1, would suggest further more in-depth research as to whether or not these communities could benefit from the existing Gen cluster literacy efforts. In addition, given the geographical proximity of the Be and Togo varieties to the Ewe cluster varieties, further research would preferably also determine the extensibility of literacy materials among these communities. Thus, the alternative grouping of the Be and Togo varieties would not result in different conclusions as to the scope of further more in-depth research. However, with regard to the Agu and Aveno varieties their alternative grouping within the Gen cluster would lead to a different scope of further research in that extensibility of the Gen cluster literacy efforts would become a major issue whereas otherwise it would not.

Grouping the Agu, Aveno, Be, Togo, Vo, Waci, Wance, and Wundi varieties within a distinct cluster would not result in a different research scope but in assigning higher priority to more in-depth research among these varieties since none of these varieties have undergone language-based development on a larger scale. Thus, further research would need to prioritize these varieties and focus on the question whether these communities could benefit from existing literature efforts in Gen and/or Ewe, or whether an additional development program for this grouping would be beneficial and if so, which variety could serve as the reference variety for this sub-cluster.

 

5.2.2 Central Gbe

The MDS findings for both word-list computations suggest that more in-depth sociolinguistic research among the Aja speech communities should focus on whether all Aja varieties could benefit from the same Aja literature materials. Such research was carried out in 1996 with the results indicating that in fact all Aja speech communities can benefit from the same literature materials (Tompkins and Kluge 2002).

 

5.2.3 Eastern Gbe

Given that literature already exists in the Fon and Gun varieties, the highest priority should be given to further research among the western Phla-Phera varieties which are not components of a cluster with existing literature. Such research would need to focus on determining whether these communities could benefit from existing literacy efforts in the Fon cluster and/or – given their geographical proximity to the Gen speaking communities – in the Gen cluster or whether additional language-based development for the western Phla-Phera cluster would be beneficial and if so, which variety could serve as the reference variety for this sub-cluster. More in-depth research among the eastern Gbe varieties also needs to investigate whether existing literature in Fon and Gun could appropriately be used by the remaining Gbe speech communities, i.e., whether the Fon cluster varieties could benefit from existing literature in Fon and whether the eastern Phla-Phera varieties could benefit from existing literature in Gun and/or – given their geographical neighborhood to the Fon speaking communities – Fon.

The indicated possible alternative groupings of Gun within the Fon cluster would not result in a different research scope but in assigning higher priority to more in-depth research among the eastern Phla-Phera varieties since with the exclusion of Gun none of these varieties will have undergone language-based development on a larger scale. Thus, further research needs to prioritize the eastern Phla-Phera varieties and focus on whether these communities could benefit from existing literature efforts in Fon and/or Gun. The alternative grouping of the Gbokpa and Kotafon varieties within the Fon cluster would result in similar conclusions as to determining the extensibility of existing literature in Fon to these communities, except that the priority for conducting such research is lower than for conducting further research among the remaining western Phla-Phera varieties.

 

5.3 Conclusions

The fact that for both word-list computations average degrees of intra-group lexical similarity were above and inter-group similarity below the indicated threshold of 70% for the identified western, central, and eastern Gbe clusters lends itself to the conclusion that in terms of SIL’s ‘Language assessment criteria’ (International Language Assessment Conference 1990), different sets of similarity judgment criteria do not lead to different recommendations concerning the prioritization of more in-depth research as far as the second level of the Gbe clustering hierarchy is concerned.

However, it cannot be concluded that this would be the case for all languages, or even for all West African languages. On the contrary, the fact that the calculated similarity percentages differ significantly across both computations suggests that, for a different group of speech varieties, the application of different criteria sets might result in conflicting average degrees of inter- and intra-group lexical similarity that would subsequently lead to differing recommendations regarding the prioritization of further sociolinguistic research. Such differing recommendations, however, would raise the question as to which of the underlying similarity judgment criteria sets is more appropriate for the investigated language situation.

In light of these ambiguities, the approach chosen for the Gbe data was to take the analysis a step further and apply a combination of techniques to investigate the relative relationships and clustering of the investigated varieties rather than fixating on the indicated threshold level of 70% per se. The findings of correlation analysis and multidimensional scaling suggested that the consistent application of different similarity judgment criteria sets yields compatible conclusions as to the relative relationships and clustering of the investigated varieties as well as to the scope of further research. Further, the findings suggest that the less rigorous similarity judgment criteria bring out similarities and differences less clearly, and thus yield possible alternative groupings which would, overall, affect the degree of priority assigned to the research of the respective varieties, but not the scope of such research.
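As a rough illustration of why two computations can differ in their percentages and still agree on relative relationships, the following sketch correlates the pairwise entries of two similarity matrices. It is an assumption-laden toy example, not the analysis carried out for the Gbe data: both matrices are invented, the names `set_one`, `set_two`, `lower_triangle`, and `pearson_r` are hypothetical, and multidimensional scaling is not shown.

```python
from math import sqrt


def lower_triangle(matrix):
    """Flatten the entries below the diagonal of a square matrix."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity percentages for four invented varieties,
# as judged under two different (invented) criteria sets.
set_one = [
    [100, 80, 55, 50],
    [80, 100, 60, 52],
    [55, 60, 100, 75],
    [50, 52, 75, 100],
]
set_two = [
    [100, 88, 66, 62],
    [88, 100, 70, 63],
    [66, 70, 100, 84],
    [62, 63, 84, 100],
]

pairs_one = lower_triangle(set_one)
pairs_two = lower_triangle(set_two)

r = pearson_r(pairs_one, pairs_two)
mean_shift = sum(pairs_two) / len(pairs_two) - sum(pairs_one) / len(pairs_one)
print(f"r = {r:.3f}, mean difference = {mean_shift:.1f} percentage points")
```

In this toy case the second matrix is consistently higher, yet the ordering of the pairs is nearly identical, so the correlation is close to 1. Note that the entries of a similarity matrix are not independent observations, so a plain correlation coefficient of this kind is descriptive only.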

Given the current lack of an informed, commonly agreed-upon approach to the synchronic analysis of lexical items with polymorphemic word-structure, and further given the limited scope of the larger Gbe study of assessing literacy extensibility, this combination of different techniques was found to offer an expedient approach to the analysis of the rather large Gbe data set without requiring a thorough morphemic analysis, which, although preferable, would hardly have been feasible. More specifically, the comparison of the computed MDS plots, derived from two similarity judgment criteria sets which differ in rigorousness, made it possible to verify the suggested groupings and their components, and to identify those Gbe varieties for which the indicated clustering is less clear and which thus require special attention in more in-depth sociolinguistic research.

Dealing with but a single set of speech varieties, the approach offered here is suggested as a case study for examining the effects of different criteria sets and the usefulness of focusing on the relative relationships and clustering of the investigated varieties by means of multidimensional scaling in establishing priorities for more in-depth sociolinguistic research. Comparable studies in different language situations, applying the techniques suggested here, are needed to verify the validity of this approach chosen for the analysis of the Gbe data.

 

References

Bennett, Patrick R. and Jan P. Sterk. 1977. South Central Niger-Congo: A reclassification. Studies in African Linguistics 8.241–73.

Blair, Frank. 1990. Survey on a shoestring: A manual for small-scale language surveys. Dallas: Summer Institute of Linguistics and University of Texas.

Brye, Edward and Elizabeth Brye. 2004. Intelligibility testing survey of Bebe and Kemezung and synthesis of sociolinguistic research of the Eastern Beboid cluster. SIL Electronic Survey Reports 2004–011. Dallas: SIL International. Online URL: http://www.sil.org/silesr/2004/silesr2004-011.pdf.

Capo, Hounkpati B.C. 1986. Renaissance du gbe. Une langue de l’Afrique occidentale. Etude critique sur les langues ajatado: l’ewe, le fon, le gen, l’aja, le gun, etc. Université du Bénin. Institut National des Sciences de l’Education. Etudes et Documents de Sciences Humaines. Série A: Etudes, Numéro 13. Lomé.

-----. 1991. A comparative phonology of Gbe. Berlin: Walter de Gruyter.

Commission Nationale de Linguistique du Bénin (CNL du Bénin) 1983. Atlas sociolinguistique du Bénin. Abidjan: Agence de Coopération Culturelle et Technique, and Institut de Linguistique Appliquée.

Dettweiler, Stephen and Sonia Dettweiler. 2003. Sociolinguistic survey of the Duka (Hun-Saare) people. SIL Electronic Survey Reports 2003–014. Dallas: SIL. Online URL: http://www.sil.org/silesr/2003/silesr2003-014.pdf.

Dieu, Michel and Patrick Renaud (eds.) 1983. Atlas linguistique du Cameroun. ALCAM. Paris: Agence de coopération culturelle et technique; Yaoundé : Centre régional de recherche et de documentation sur les traditions orales et pour le développement des langues africaines, Délégation générale à la recherche scientifique et technique, Institut des sciences humaines.

Direction de l’Alphabétisation, République du Bénin, Ministère de la Culture et des Communications (DA) 1992. Séminaire nationale sur la redéfinition des objectifs et stratégies d’alphabétisation et d’éducation des adultes. Rapport final. Cotonou: Presse de Onepi.

Direction de l’Alphabétisation, République du Bénin, Ministère de la Culture et des Communications (DA) no date, a. Nukpinkplon do gungbe me: Wema tintan, 1er livret. Cotonou: DA.

Direction de l’Alphabétisation, République du Bénin, Ministère de la Culture et des Communications (DA) no date, b. Nukpinkplon do gungbe me: Wema aweto, 2ème livret. Cotonou: DA.

Direction de l’Alphabétisation, République du Bénin, Ministère de la Culture et des Communications (DA) no date, c. Nukpinkplon do gungbe me: Wema atontlo, 3ème livret. Cotonou: DA.

Direction Nationale de l’Alphabétisation, Ministère de la Culture et des Communications de la République du Bénin (DNA). no date. Mi va mía kpla woma: Livre de l’élève. Lecture et écriture en waci et gen. Vols. 1, 2 & 3. Benin: DAPR & Comité Provincial de l’Alphabétisation et la Presse Rurale du Mono.

Duthie, Alan S. 1988. Ewe. The languages of Ghana, ed. by Mary E. Kropp Dakubu, 91–101. London: Kegan Paul International for the International African Institute.

Efron, Bradley and Robert J. Tibshirani. 1993. An introduction to the bootstrap. New York: Chapman and Hall.

Garrett, Peter, Nikolas Coupland, and Angie Williams. 1999. Evaluating dialect in discourse: Teachers’ and teenagers’ responses to young English speakers in Wales. Language in Society 28.321–54. doi:10.1017/s0047404599003012

Gordon, Raymond G., ed. 2005. Ethnologue: Languages of the world, Fifteenth Edition. Dallas: SIL International. Online URL: http://www.ethnologue.com.

Grimes, Barbara F., ed. 2000.

Grimes, Joseph E. 1988. Correlations between vocabulary similarity and intelligibility. Notes on Linguistics 41.19–33.

Gudschinsky, Sarah C. 1955. Lexico-statistical skewing from dialect borrowing. International Journal of American Linguistics 21.138–49. doi:10.1086/464322

-----. 1956. The ABC’s of lexicostatistics (glottochronology). Word 12.175–210.

Hair, Joseph F., Rolph E. Anderson, Ronald L. Tatham, and William C. Black. 1998. Multivariate data analysis. London: Prentice-Hall International, Inc.

Harrison, Byron, Annette Harrison, and Michael J. Rueck. 1999. Southern Songhay speech varieties in Niger: A sociolinguistic survey of the Zarma, Songhay, Kurtey, Wogo, and Dendi peoples of Niger. SIL Electronic Survey Reports 1999-004. Dallas: SIL International. Online URL: http://www.sil.org/silesr/1999/004/zarmarpt4.pdf.

Hatfield, Deborah H. and Michael M. McHenry. 1998. A sociolinguistic survey of the Ayizo language area. Cotonou, Benin: SIL International. ms.

Henson, Bonnie J. and Angela Kluge. 1999. A sociolinguistic survey of the Xwla language area. Cotonou, Benin: SIL International. ms.

Hochstetler, J. Lee, Jude A. Durieux, and Evelin I. K. Durieux-Boon. 2004. Sociolinguistic survey of the Dogon language area. SIL Electronic Survey Reports 2004–004. Dallas: SIL International. Online URL: http://www.sil.org/silesr/2004/silesr2004-004.pdf.

International Language Assessment Conference. 1990. Language assessment criteria: Conference recommendations. Proceedings of the Summer Institute of Linguistics International Language Assessment Conference, Horsleys Green, 23–31 May 1989, ed. by Gloria E. Kindell, 27–29. Dallas: SIL.

Kluge, Angela. 2000. The Gbe language varieties of West Africa: A quantitative analysis of lexical and grammatical features. Unpublished MA thesis. Cardiff: University of Wales, College of Cardiff.

-----. forthcoming. Qualitative and quantitative analysis of grammatical features elicited among the Gbe language varieties of West Africa. Journal of African Languages and Linguistics. doi:10.1515/jall.2006.004

Kruskal, Joseph B. and Myron Wish. 1978. Multidimensional scaling. Beverly Hills: Sage Publications.

Manly, Bryan F.J. 1986. Multivariate statistical methods: A primer. London, New York: Chapman and Hall.

Norušis, Marija J. 1993. SPSS for Windows: Professional statistics. Release 6.0. Chicago: SPSS Inc.

Probst, Ulrich. 1992. Multimorphemic words in lexicostatistics: An evaluation on use and limitations of different approaches. Paper presented at the SIL Survey Course, Horsleys Green, UK. ms.

R-Project. no date. The R project for statistical computing. Online URL: http://www.r-project.org.

Sanders, Arden G. 1977. Guidelines for conducting a lexicostatistics survey in Papua New Guinea. Language planning and survey techniques, ed. by Richard Loving and Gary F. Simons, 21–41. Ukarumpa, PNG: Summer Institute of Linguistics.

Saussure, Ferdinand de. 1959. Course in general linguistics. London: Peter Owen Limited.

Savage, Dale. 1999. Understanding correlation. Windows on bilingualism, ed. by Eugene H. Casad, 117–146. Dallas: The Summer Institute of Linguistics and The University of Texas at Arlington.

Scholfield, Phil. 1991. Statistics in linguistics. Annual Review of Anthropology 20.377–93. doi:10.1146/annurev.anthro.20.1.377

Schooling, Stephen J. 1981. A linguistic and sociolinguistic survey of French Polynesia. Hamilton, New Zealand: Summer Institute of Linguistics.

Simons, Gary F. 1977. The role of purpose and perspective in planning a language survey. Language planning and survey techniques, ed. by Richard Loving and Gary F. Simons, 9–20. Ukarumpa, PNG: Summer Institute of Linguistics.

-----. 1979. Language variation and limits to communication. (Technical Report 3.) Ithaca, NY: Department of Modern Languages and Linguistics, Cornell University. (Reprinted 1983 by Summer Institute of Linguistics, Dallas, TX.)

SPSS Inc. 1998. SPSS for Windows: Release 9.0. SPSS, Inc., 1989–1999.

Stewart, John M. 1989. Kwa. The Niger-Congo languages, ed. by John Bendor-Samuel, 217–245. Lanham, MD: The University Press of America.

Swadesh, Morris. 1955. Toward greater accuracy in lexicostatistical dating. International Journal of American Linguistics 21.121–37. doi:10.1086/464321

Tompkins, Barbara and Angela Kluge. 2002. Sociolinguistic survey of the Aja language area. SIL Electronic Survey Reports 2002–020. Dallas: SIL. Online URL: http://www.sil.org/silesr/2002/020/SILESR2002-020.PDF.

Wimbish, John. 1989. WORDSURV: A program for analyzing language survey word lists. Dallas: Summer Institute of Linguistics.

Woods, Anthony, Paul Fletcher, and Arthur Hughes. 1986. Statistics in language studies. Cambridge: Cambridge University Press.

Author's e-mail:

Angela Kluge

angela_kluge@sil.org


[1]I would like to thank M. Paul Lewis, Richard J. Nivens and Ramzi Nahhas of SIL International for their helpful comments on this paper. Nahhas, in particular, discussed various aspects of paired t-test and correlation analysis with me.

[2]In Kluge (forthcoming) I described the Gbe situation in some detail. The following section summarizes and refers to that description.

[3]Capo (1986) does not list a distinct Fon variety (see Table 1). For the current study, however, a word list was elicited in the Fon variety (see Section 4.2).

[4]Nonvocalic segments are either exact matches or else they differ by only one phonological feature and this difference is attested in three pairs, and vowels differ by only one phonological feature.

[5]Nonvocalic segments differ by only one phonological feature but are not attested in three pairs; vowels differ by two or more phonological features.

[6]Thus, if A and B share 50% lexical similarity and the range of error is +/-25% then the actual lexical similarity could be anywhere from 25–75%. And if A and C share 68% lexical similarity and the range of error is +/-7% then the actual lexical similarity could be anywhere from 61–75%. In both cases ‘75%’ refers to the ‘upper range of error’ and, based on SIL’s criteria, intelligibility testing between A and B and between A and C would need to be considered.

Published by the Dartmouth College Library.
Copyright © 2002 Trustees of Dartmouth College.
ISSN 1537-0852
