Linguistic Discovery
Dartmouth College

Volume 14 Issue 1 (2016)        DOI:10.1349/PS1.1537-0852.A.468

Exploring grammatical complexity crosslinguistically: The case of gender

 

Francesca Di Garbo

University of Helsinki

 

This paper proposes a set of principles and methodologies for the crosslinguistic investigation of grammatical complexity and applies them to the in-depth study of one grammatical domain, gender. The complexity of gender is modeled on the basis of crosslinguistically documented properties of gender systems and by taking into consideration interactions between gender and two other grammatical domains: nominal number and evaluative morphology. The study proposes a complexity metric for gender that consists of six features: “Gender values”, “Assignment rules”, “Number of indexation (agreement) domains”, “Cumulative exponence of gender and number”, “Manipulation of gender assignment triggered by number/countability”, and “Manipulation of gender assignment triggered by size”. The metric is tested on a sample of 84 African languages, organized in subsamples of genealogically related languages. The results of the investigation show that: (1) the gender systems of the sampled languages lean towards high complexity scores; (2) languages with purely semantic gender assignment tend to lack pervasive gender indexation; (3) languages with a high number of gender distinctions tend to exhibit pervasive gender indexation; (4) some of the uses of manipulable gender assignment are only attested in languages with a high number of gender distinctions and/or pervasive indexation.  With respect to the distribution of the gender complexity scores, the results show that genealogically related languages tend to have the same or similar gender complexity scores. Languages that display exceedingly low or high gender complexity scores when compared with closely related languages exhibit distinctive sociolinguistic profiles (contact, bi- or multilingualism). The implications of these findings for the typology of gender systems and the crosslinguistic study of grammatical complexity and its distribution are discussed.

 

1. Introduction

 

Investigating the complexity of individual grammatical domains from a crosslinguistic perspective is still a novel research area within language typology. This paper focuses on the empirical study of grammatical complexity and proposes a set of principles and methodologies that can be operationalized to explore linguistic complexity crosslinguistically.[1] The paper takes inspiration from the suggestions made by Miestamo (2006b, 2008) on the typological study of grammatical complexity. According to Miestamo, complexity metrics suitable for typological purposes should not aim to assess the grammatical complexity of languages in their entirety (global complexity), but rather focus on specific domains of grammar (e.g. functional domains) as encoded across languages, and attempt to characterize “the cross-linguistic variety in the complexity of each functional domain and the interactions between domains” (2006b) (local complexity).

 

The grammatical domain that I investigate in this paper is grammatical gender. Gender is a type of nominal classification device (in the sense of Aikhenvald 2003) that is commonly associated with high degrees of complexity, inasmuch as it presupposes inflectional morphology (agreement) and rather opaque grammaticalization paths (Corbett 1991; Dahl 2004; Nichols 1992). In this study, I attempt to model the complexity of gender by identifying a set of dimensions that characterize gender systems crosslinguistically and by taking into consideration interactions and possible asymmetries between gender and two other nominal grammatical domains, number and evaluative morphology. The paper proposes a complexity metric for gender. This metric is then tested on a sample of 84 African languages. The aim of the paper is to investigate whether crosslinguistic variation in the types of gender systems attested in the sample languages is tied to certain levels of complexity, and why this might be the case. In addition, by exploring gender complexity within and across genealogical groupings, the study aims to investigate to what extent the complexity of gender (a morphosyntactic feature that is usually conceived of as very stable in the history of language families) is conservative across related languages and under which conditions it is subject to decrease or increase. The paper is structured as follows. In section 2, I define the notion of grammatical complexity that I work with. In section 3, I introduce gender as a grammatical domain and consider possible dimensions for the assessment of gender complexity. The methodology followed in the study is illustrated in section 4: section 4.1 provides an outline of the sampling procedure; section 4.2 presents the complexity metric and section 4.3 illustrates the method used to compute complexity scores for the gender systems of the sampled languages. 
The results are presented in section 5 and discussed in section 6, before I provide some concluding remarks in section 7.

 

2. Defining grammatical complexity

 

The idea that all languages are equally complex is known in the literature as the equi-complexity hypothesis and is based on the assumption that, even though individual languages may exhibit different levels of complexity in different domains of their grammars, complexity in one domain is compensated by simplicity in another domain (complexity trade-offs). The equi-complexity hypothesis has long been maintained as a truism within linguistic research (for an overview, see McWhorter 2001; Kusters 2003). During the past fifteen years, however, starting from the comparative study of grammatical complexity in creole and non-creole languages by McWhorter (2001), a whole body of research (see, among others, Dahl 2004; Kusters 2003; Miestamo 2006b; Miestamo et al. 2008; Sinnemäki 2011) has suggested that the equi-complexity hypothesis is difficult to test empirically and that, when tested (e.g., by McWhorter 2001), it is actually problematic to maintain. In a nutshell, this research has shown that “there is no principled reason why all languages should be equal in their overall complexity or why complexity in one grammatical area should be compensated by simplicity in another” (Miestamo 2006b). Once we acknowledge that human languages may differ in complexity,[2] and that these differences are worth exploring for a multifaceted array of purposes (typological, sociolinguistic, historical, etc.), three major challenges follow: (1) how to define complexity; (2) how big a scope a complexity metric should have for it to be meaningful; and (3) which principles might help to assess complexity differences in one or several domains of grammar. The three issues are discussed in section 2.1, section 2.2, and section 2.3, respectively.

 

2.1 Absolute and relative complexity

 

There exist two main approaches to the study of linguistic complexity, the relative and the absolute approach (Miestamo 2008). The relative approach (also known as user-oriented approach) focuses on the costs and difficulties in language learning and processing. The absolute approach (also known as theory-oriented approach) rather views complexity as an objective property of languages. Within the absolute approach, complexity can be assessed by measuring the number of distinctions within a system/grammatical domain, and the length of its description.

 

Both approaches have been used, and argued for, in typologically oriented literature on grammatical complexity. Kusters (2003), for instance, defines complexity in terms of difficulty. In his work on the typology of verbal inflection, Kusters examines four genealogically unrelated sets[3] of closely related languages and investigates how, within each set, languages differ in the complexity of verbal inflection and what type of sociolinguistic and sociohistorical factors may account for these differences. His definition of complexity is based on the difficulties that adults incur when learning a new language, as documented in the psycholinguistic literature on second language acquisition. According to this definition, languages that are more “adapted” to the presence of L2 learners (exoteric languages, following the terminology proposed by Lupyan & Dale 2010) are less complex than languages that, throughout their history, have not been exposed, or not to the same extent, to the presence of adult learners (esoteric languages, based on Lupyan & Dale 2010). This definition of complexity/difficulty fits the scope of Kusters’ (2003) study well, which is to investigate the effects of multilingualism, asymmetrical bilingualism and adult language contact on language structures. However, as Miestamo (2006b) rightly points out, L2 learners represent only one type of language user. In addition, adult, post-critical threshold language contact is only one type of contact scenario in the history of a speech community.[4] It follows that a definition of complexity/difficulty that is targeted to one category of language users only might not be inclusive enough if our aim is to build a more general model of linguistic complexity. 
Finally, given our still limited knowledge of the cognitive processes behind language learning and usage, we do not have enough evidence to model the whole range of difficulties and costs that both L1 and L2 speakers and listeners experience when using language. Thus, based on our current state of knowledge, the absolute approach allows for a more general, objective definition of the notion of complexity. This is in turn essential for the sake of crosslinguistic comparison. In addition, the absolute approach to grammatical complexity is the one that is more easily connectable with how complexity is approached by other disciplines (e.g., philosophy, information theory) and thus “opens possibilities for interdisciplinary research” (Miestamo 2008: 27). Advocates of the absolute approach to the typological study of grammatical complexity are, among others, McWhorter (2001); Dahl (2004); Miestamo (2006b, 2008); Nichols (2009); Sinnemäki (2011). The absolute approach is followed in this paper. Accordingly, I use the term complexity to refer to absolute complexity and the term difficulty to refer to relative complexity.

 

2.2 Global vs. local complexity

 

One issue that has been at the center of the recent debate on grammatical complexity is how big a scope a complexity metric should have for it to be meaningful. McWhorter (2001) elaborates a complexity metric that aims to measure overall differences in the grammatical complexity of creole and non-creole languages. The metric captures phonological, morphological, syntactic and semantic patterns that involve various types of redundancy (in terms of number of overt distinctions and number of rules) and thus qualify a language as more complex than another. Two languages are investigated in the first part of the study, the highly inflectional language Tsez (Nakh-Daghestanian) and the creole language Saramaccan. The metric identifies clear-cut complexity differences between the two languages: Saramaccan systematically qualifies as simpler than Tsez with respect to all the parameters under investigation. In the second part of the study, the same complexity metric is used to compare Saramaccan with a non-creole analytic language, Lahu (Sino-Tibetan), based on the hypothesis that “the complexity difference between creoles and analytic languages would be less than that between them and inflected languages” (McWhorter 2001: 143). Nevertheless, the comparison reveals complexity differences between Saramaccan and Lahu that are similar to those found for Tsez and Saramaccan. These results would seem to confirm McWhorter’s hypothesis whereby the grammar of creole languages is systematically simpler than that of non-creole languages. The question however remains whether a metric of this type could be effectively used to capture complexity differences (1) between a higher number of languages than those considered in McWhorter’s study, and (2) based on a sampling procedure that is independent of the creole/non-creole dichotomy. 
Developing a metric that would satisfy these conditions and would allow us to compute the total complexity of a language in typologically meaningful ways is ultimately a massive, daunting task (see also discussion in Miestamo 2006b, 2008; Nichols 2009). In addition, even if, as suggested by Nichols (2009: 111), one were able “to draw a representative sample of complexity in enough different grammatical domains, relatively easy to survey, to give a reliable indication of whether overall complexity does or does not vary”, it would still be very hard (and probably even impossible) to establish the mutual comparability between the criteria used in the metric. In other words, it would be extremely difficult to decide whether, for instance, the number of tense distinctions, phonemes, or gender distinctions that are grammaticalized in a given language contribute in the same way to the total complexity of that language. Miestamo (2006b, 2008) refers to this as the problem of comparability and suggests that in view of this difficulty, the crosslinguistic study of grammatical complexity should be based on individual areas of grammar, such as functional domains, rather than on grammars in their entirety, and thus have a local rather than global scope. In this paper, I follow this suggestion and investigate the complexity of one grammatical domain, gender. In addition, based on Dahl (2011), I argue that in order to be maximally local, complexity metrics should be based on ceteris paribus comparisons, that is, on statements of the type: “Everything else being equal, X is more complex than Y.”

 

2.3 Complexity principles

 

In this study, I suggest that, within an absolute and local approach to grammatical complexity (see section 2.1 and 2.2), three principles can be used as general guidelines to define the variables of a complexity metric: the Principle of Fewer Distinctions, the Principle of One-Meaning–One-Form and the Principle of Independence. The first two principles are well established in the literature on grammatical complexity (for an overview, see Miestamo 2008). The third principle, the Principle of Independence, was introduced by Di Garbo (2014) to account for interactions between functional domains and complexity. In the following, I outline my definitions of the three principles:

 

•The Principle of Fewer Distinctions (proposed by Miestamo 2006a, 2008 and also known as Principle of Economy, see e.g., Kusters 2003): Everything else being equal, a grammatical domain with n distinctions is less complex than one with n+1 distinctions.

 

•The Principle of One-Meaning–One-Form (well established in the literature on theoretical morphology and linguistic complexity, also known as the Principle of Transparency, see, for instance, Kusters 2003): (a) Everything else being equal, a grammatical meaning with n forms is less complex than one with n+1 forms; (b) Everything else being equal, a grammatical form with n meanings is less complex than one with n+1 meanings.

 

•The Principle of Independence (introduced by Di Garbo 2014):[5] Everything else being equal, a grammatical domain that is independent of semantic and functional properties of other domains is less complex than a grammatical domain that is dependent on n or n+1 semantic and functional properties of other grammatical domains.

 

The Principle of Fewer Distinctions is concerned with the type and number of grammatical meanings that a language expresses within a given domain of grammar. For instance, other things being equal, a language with more than five genders (e.g., Swahili) is more complex in this respect than a language with three genders only (e.g., German). The Principle of One-Meaning–One-Form has to do with the type of encoding of a grammatical meaning within a given domain of grammar. The Principle of One-Meaning–One-Form can be operationalized in two ways, depending on whether we consider the mapping between form and meaning or, vice versa, the mapping between meaning and form. In addition, as suggested by Miestamo (2008: 33), the relationship between form and meaning can be investigated both at the paradigmatic and syntagmatic level. For instance, with respect to the encoding of standard negation, Italian, whose standard negator is non, is, other things being equal, less complex than French, which typically uses a discontinuous marker, ne...pas, to signal standard negation. Or, similarly, other things being equal, Turkish is simpler than German with respect to the type of exponence of case and number. In Turkish, the two grammatical meanings are encoded separately (one form for each meaning), whereas in German, number and case are encoded cumulatively (one marker for several meanings).[6] Both these violations of the Principle of One-Meaning–One-Form operate on the syntagmatic level. On the other hand, phenomena such as allomorphy and syncretism represent a violation of the Principle of One-Meaning–One-Form at the paradigmatic level. Finally, the Principle of Independence models interactions between domains and their effect on complexity. 
For instance, a language in which gender assignment is dependent on evaluative meanings (if, e.g., masculine nouns can be shifted to the feminine gender when a diminutive meaning is encoded, as in the Berber language Kabyle) is more complex in this respect than a language in which gender assignment cannot be manipulated for such purposes (as in the Romance language Italian).

In the remainder of this paper, the Principle of Fewer Distinctions, the Principle of One-Meaning–One-Form and the Principle of Independence will be operationalized in designing a complexity metric for grammatical gender.
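
The ceteris paribus logic shared by the three principles can be made concrete with a small sketch. The Python code below is an illustration only and is not part of the original study: the class `DomainProfile`, its field names, and the function `ceteris_paribus` are hypothetical, and the profiles are invented for the example. A verdict is returned only when two profiles differ on exactly one property, since the principles as stated say nothing about cases in which several properties differ at once.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainProfile:
    """Hypothetical summary of one grammatical domain in one language."""
    distinctions: int               # Principle of Fewer Distinctions
    forms_per_meaning: int          # Principle of One-Meaning-One-Form, part (a)
    meanings_per_form: int          # Principle of One-Meaning-One-Form, part (b)
    depends_on_other_domains: bool  # Principle of Independence


def ceteris_paribus(a: DomainProfile, b: DomainProfile) -> Optional[int]:
    """Compare two profiles "everything else being equal".

    Returns 1 if a is more complex than b, -1 if a is less complex,
    0 if the profiles are identical, and None if they differ on more
    than one property (no ceteris paribus statement applies).
    """
    differences = [(x, y) for x, y in zip(astuple(a), astuple(b)) if x != y]
    if not differences:
        return 0
    if len(differences) > 1:
        return None
    x, y = differences[0]
    return 1 if x > y else -1


# Invented profiles, used only to exercise the comparison.
three_values = DomainProfile(3, 1, 1, False)
six_values = DomainProfile(6, 1, 1, False)
six_values_dependent = DomainProfile(6, 1, 1, True)

print(ceteris_paribus(six_values, three_values))            # 1
print(ceteris_paribus(three_values, six_values_dependent))  # None
```

Returning `None` rather than forcing a ranking reflects the problem of comparability discussed in section 2.2: the sketch deliberately refuses to weigh one property against another.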

 

3. Grammatical gender and dimensions of gender complexity

 

3.1 Gender as a grammatical domain

 

In this paper, I follow the most widely accepted definition of gender within the typological literature (Corbett 1991; Hockett 1958). Thus I define gender as a type of nominal classification strategy that must be reflected beyond nouns, via agreement patterns (Di Garbo 2014: 3). Under this definition I include both systems of the Bantu type (large number of genders) and systems of the Romance type (small number of genders). Following Croft (2001, 2003, 2013), however, I refer to agreement patterns as indexation patterns. Accordingly, I define the entities whose inflectional morphology signals gender (e.g., pronouns, adjectives, verbs) as gender indexes (or gender indexing targets) and the entities that trigger a given gender indexation pattern (i.e., nouns, pronouns, noun phrase referents) as indexation triggers. In Corbett’s (1991) terminology, indexes and indexation triggers are referred to as agreement targets and controllers, respectively.[7] In the remainder of this section, I provide a short overview of the criteria used for the synchronic classification of gender systems, the debate over the origins of gender, and the function(s) of gender in discourse.

 

Synchronically, the gender systems of individual languages are usually classified based on: (1) the number of gender distinctions (Corbett 1991, 2013a); (2) whether gender distinctions are sex-based or non-sex-based (Corbett 1991, 2013b); (3) the criteria according to which nouns are assigned to a given gender (Corbett 1991, 2013c).

 

Diachronically, gender has been observed to be one of the most stable features of grammar. Gender systems are stable with respect to two of the three criteria for stability proposed by Nichols (1992): diachronic persistence and areal contingency. Gender is one of the most conservative features in the history of language families (stability as diachronic persistence). For instance, Armenian is the only independent branch of the Indo-European language family that has completely lost grammatical gender. In addition, gender systems exhibit a hotbed–outlier type of distribution (stability as areal contingency): some areas of the world, such as Africa or Australia, are densely populated by languages with gender (gender hotbeds), whereas in other areas of the world (e.g., North America), the feature is absent or attested only in isolated cases (gender outliers).

 

The debate over the origins of gender is very controversial and, in many respects, still unresolved. On the one hand, it has been shown that gender systems may originate from classifier systems and/or from demonstratives (Greenberg 1978; Corbett 1991). On the other hand, among the issues that are still open for debate is, for instance, the question of whether indexation or classification comes first in the diachrony of gender within a given language or language family (Nichols 1992). The main difficulty behind the reconstruction of the diachrony of gender in many language families is that, in view of their overall stability, gender systems tend to presuppose long grammaticalization paths and their origin often precedes those stages that can be reconstructed via the historical-comparative method.

 

Finally, from a functional point of view gender has been defined as a grammatical device for the management of reference in discourse, its functions being often related to reference tracking (Heath 1975; Foley & Van Valin 1984) and/or discourse redundancy (Dahl 2004). The debate over the discourse functions of gender is huge and cannot be extensively surveyed here (for an overview, see Kilarski 2013: chapter 6, as well as Contini-Morava & Kilarski 2013). For the sake of this paper, suffice it to say that scholars usually disagree on whether the complex redundancies that gender indexation introduces in discourse facilitate communication (Dahl 2004) or exist beyond communicative necessity (McWhorter 2001). Evidence from second language acquisition is often brought in support of the latter argument: contact varieties that emerge as a result of intensive post-threshold language contact and nonnative acquisition tend to systematically lack gender; similarly, adult learners usually struggle with grammatical gender when acquiring a new language.

 

3.2 The dimensions of gender complexity

 

Together with verbal inflection (Kusters 2003) and core argument marking (Sinnemäki 2011), gender figures as one of the few areas of grammar that have, so far, received some attention in the literature on linguistic complexity. Perhaps this is because grammatical gender is one of the domains of grammar that most readily lends itself to being associated with complexity, being both theoretically and empirically relevant for the study of such notions as inflectional morphology (Nichols 1992), maturity (Dahl 2004) and redundancy in information management (McWhorter 2001).

 

Grammatical gender, in the form of gender indexation and overt gender distinctions on nouns, is one of the features of the complexity metric proposed by Nichols (2009). In this study, properties of gender systems are surveyed together with properties of other nominal classification devices (numeral and possessive classifiers) under the label classification. Within the metric proposed by Nichols, presence of gender indexation and overt marking of gender on nouns feature higher degrees of complexity.[8]

 

A more detailed qualitative study of the dimensions of gender complexity viewed independently of other nominal classification devices is Audring (2014). Audring argues that the complexity of gender systems is tied to, and can be investigated by taking into consideration, three main dimensions: complexity of values; complexity of assignment rules; and complexity of formal marking.

 

Dimension 1, complexity of values, is concerned with the number of genders in a language: the higher the number of genders, the more complex the gender system. Dimension 2, complexity of assignment rules, is concerned with the type and scope of gender assignment rules. With respect to type of assignment rules, the literature on the typology of gender systems (Corbett 1991, 2013c) has shown that there exist two principles according to which nouns are assigned to a gender in a given language: semantic and formal. Under semantic assignment rules, gender assignment is predicted on the basis of the meaning of nouns. Under formal assignment rules, gender assignment is predicted based on morphological rules (e.g., inflectional classes, derivational morphology) and/or phonological rules. In principle, the least complex gender system is one in which only one type of assignment rule is attested, semantic or formal. In reality, typological studies of gender (Corbett 1991, 2013c) have shown that while solely semantic gender systems are relatively common among the world’s languages (e.g., among Dravidian languages), gender systems purely based on formal assignment rules are almost never encountered. Even in those systems that are heavily skewed towards formal mechanisms of gender assignment, there is always at least a minimal portion of the nominal lexicon (often nouns denoting humans and/or animate entities) for which gender is assigned based on clear-cut semantic criteria.[9] As for the scope of assignment rules, this has to do with the degree of generality of a rule, that is, the number of nouns whose gender assignment a given rule is able to predict. The higher the number of nouns assigned to a certain gender by a given assignment rule, the larger the scope of the assignment rule. In general, a system whose assignment rules have a large scope requires a lower number of rules, leading to lower complexity. 
These rules usually rest upon some basic semantic notions such as sex or animacy (Audring 2014: 11).

 

Dimension 3, complexity of formal marking, is concerned with the pervasiveness of gender marking in discourse, that is, via indexation. The most straightforward implementation of this dimension of the complexity of gender is to count how many gender indexes there are in a language based on how many word classes inflect for gender (e.g., pronouns, adjectives, verbs), and independently of how these inflections are realized in discourse. The higher the number of gender indexes, the greater the complexity of a gender system. However, it is also possible to explore this dimension of gender complexity by looking at discourse frequencies, that is, by measuring how often gender inflections appear in a given chunk of discourse (the higher the frequency of gender marking in discourse, the more complex the system). This aspect of the complexity of gender (which will not be explored further in this paper) can also be operationalized in the investigation of the functionality of gender indexation in language learning and processing. In this sense, a particularly promising hypothesis that is put forward in Audring’s (2014) work is that pervasive gender indexation facilitates the learning and processing of gender values and assignment rules, given that users are exposed to multiple occurrences of gender marking in a given chunk of discourse.

 

The gender system of English would rank low with respect to all three dimensions of complexity: it has only three genders, a few semantic assignment rules, and gender indexation is restricted to the pronominal domain.

 

To sum up, Audring (2014) suggests that the absolute complexity of gender systems can be explored on the basis of three macro-dimensions: number of values, assignment and indexation. This suggestion is followed in the present paper. In section 4.2, I propose one way of implementing the three dimensions into a complexity metric.

 

4. Methodology

 

4.1 Sampling procedure

 

This study is based on a sample of 84 gendered languages selected from the African macro-area and organized in subsets of genealogically related languages (the sample languages are listed in alphabetical order in appendix A).[10] The macro-area sampled in the study, Africa, is one of the world’s gender hotbeds (Nichols 1992, 2003): all major genealogical groupings within the area display gender at least at some level of their internal taxonomies. The language classification followed in the paper is the one proposed by Glottolog (Nordhoff et al. 2013) as of September, 2015.

 

The sample designed for this study differs from classical sampling procedures in linguistic typology. Traditionally, these procedures aim to maximize the representation of linguistic diversity by contributing one datapoint (i.e., one language) per genealogical unit.[11] In recent years, statistically implemented sampling methodologies that attempt to investigate linguistic patterns as distributed within language families have been proposed, for instance, by Maslova (2000) and Bickel (2013). The main assumption behind these methodologies is that typological distributions concerning linguistic variables reflect different historical scenarios that may favor the presence/development/maintenance or, rather, the absence/decline/loss of the variables in question. Accordingly, these studies argue that it is possible to explore “statistical biases in diachronic developments on the basis of synchronic samples” (Bickel 2013: 415). The design of the present sample is built on similar assumptions. However the study does not focus on the elaboration of stochastic models of language change based on the observation of synchronic distributions. The aim of the study is, in fact, mostly descriptive. What I am looking for is the degree of grammatical complexity that is associated with gender crosslinguistically and the extent to which this complexity is genealogically and areally uniform.

 

The sample consists of seventeen different genealogical units (or lineages, following the terminology of Nichols 1992), including two isolates (Hadza and Sandawe). Some of these units represent different subgroups of the same superordinate taxonomic level (stock)[12]. In general, language selection has been guided by the following rule of thumb: the higher the diversity (in terms of number of languages/subgroups) of a superordinate genealogical unit, the higher the number of languages/subgroups selected for that unit. Consequently, the biggest and most diverse language families are represented by a number of subsamples that tends to reflect this diversity. For instance, all major subdivisions of the Afro-Asiatic stock (except Egyptian) are represented in the sample. The subsamples created for each stock should be understood as convenience samples since (1) the number of languages per genealogical unit is not established mathematically and (2) for the biggest stocks, not all subdivisions are included. The latter especially applies to the largest stock within the African macro-area, Atlantic-Congo. Some relevant genealogical units, such as Kru and, from the Volta-Congo sub-branch, Gur and Ubangi are, for instance, not included in the sample, mainly due to lack of accessible resources. This impacts data analysis in that the data set created for this study cannot be used for statistical analysis of the inferential type, that is, to make predictions about preferred typological patterns in the languages of Africa and beyond. Thus, as mentioned above, the statistical analysis that will be applied to the data presented in the study is purely descriptive. Table 1 illustrates the number of genealogical units/languages per stock.

 

 

Superordinate/Stock level | Genealogical units     | No. of lgs
Afro-Asiatic              | Berber                 | 6
                          | Chadic                 | 6
                          | Cushitic               | 13
                          | Semitic                | 7
                          | Dizoid                 | 1
Omotic[13]                | South Omotic           | 1
                          | Ta-Ne-Omotic           | 4
Atlantic-Congo            | Bantoid, Bantu         | 23
                          | Kwa                    | 1
                          | Mel                    | 3
                          | North-Central Atlantic | 7
Hadza                     |                        | 1
Khoe-Kwadi                |                        | 5
Kka                       |                        | 1
Nilotic                   | Eastern Nilotic        | 3
Sandawe                   |                        | 1
Tuu                       |                        | 1
Total                     |                        | 84

Table 1. Genealogical units in the sample

 

4.2 The features of the complexity metric

 

The complexity metric that I designed for the purpose of this study consists of six features. These can be further grouped into three main domains, which are based on the three dimensions of gender complexity proposed by Audring (2014) and discussed in section 3.2: complexity of values, complexity of rules and complexity of formal marking. The features of the complexity metric are presented in table 2.

 

| Dimension | Feature | ID | Description |
|---|---|---|---|
| Values | Number of gender values | gv | Everything else being equal, a gender system with two values (gender distinctions) is less complex than a gender system with more than two values. |
| Assignment rules | Number and nature of assignment rules | ar | Everything else being equal, a gender system with one type of assignment rule (e.g., only semantic or only formal) is less complex than a gender system with two types of assignment rules (both semantic and formal).[14] |
| | Manipulable assignment: triggered by number/countability | m1 | Everything else being equal, a gender system where gender assignment is only lexically given is less complex than a gender system where gender assignment is given in the lexicon + can be manipulated depending on the countability properties of the noun or the noun phrase. |
| | Manipulable assignment: triggered by size | m2 | Everything else being equal, a gender system where gender assignment is only lexically given is less complex than a gender system where gender assignment is given in the lexicon + can be manipulated depending on the size of the noun phrase referent. |
| Form marking | Number of indexation domains | ind | Everything else being equal, a gender system that has gender indexation in one domain only (e.g. only on articles or only on pronouns) is less complex than a gender system with two or more indexation domains. |
| | Cumulative exponence of gender and number | cum | Everything else being equal, a marker that only signals gender is less complex than a marker that signals gender + number. |

Table 2. Features of the complexity metric and their description

 

Features GV, AR and IND can be seen as direct implementations of Audring’s (2014) three dimensions of gender complexity. Complexity with respect to GV counts as a violation of the Principle of Fewer Distinctions (the higher the number of gender distinctions, the more complex the system). Less straightforward is, on the other hand, the interpretation of AR and IND with respect to the three complexity principles outlined in 2.3. Here, I propose to view complexity with respect to AR as a violation of the Principle of Independence, and complexity with respect to IND as a violation of the Principle of One-Meaning–One-Form (both on the syntagmatic and paradigmatic level) and the Principle of Independence. On the one hand, systems of gender assignment that are dependent only on semantics or only on form are less complex than systems of gender assignment that are dependent both on semantics and form (violation of Principle of Independence). On the other hand, in a language in which many word classes inflect for gender, and gender inflections are attested in several indexation domains (e.g., articles, other adnominal modifiers, predicative expressions, pronouns): (a) information about the gender of a noun is likely to be repeated redundantly in discourse (syntagmatic violation of the Principle of One-Meaning–One-Form); (b) the same word class can take several inflections depending on the gender of the noun that is indexed in a given discourse domain (paradigmatic violation of the Principle of One-Meaning–One-Form and Principle of Independence).

 

Features M1, M2 and CUM are based on an aspect of the typology of gender that falls outside the scope of Audring’s work: how grammatical gender interacts with other nominal domains. Two domains are specifically targeted by my metric: number and evaluative morphology (i.e., the morphological encoding of diminutives and augmentatives).[15] M1 and M2 are concerned with interactions at the level of gender assignment, whereas CUM has to do with interactions pertaining to the morphosyntactic encoding of gender distinctions on the indexing targets. I suggest that M1 and M2 can be interpreted as violations of the Principle of Independence, and CUM as a violation of the Principle of One-Meaning–One-Form. Let us discuss these two types of interaction in more detail.

 

Di Garbo (2014) shows that an important criterion for the classification of gender systems in the African macro-area is to distinguish between rigid and manipulable gender assignment (for a similar suggestion, see also the study by Heine 1982). In languages with manipulable gender assignment, the gender of a noun can be changed depending on the construal of the noun phrase referent, that is, based on pragmatic/discourse constraints. In these languages, there usually are default assignment rules, i.e., rules by which nouns have lexically specified gender values, and add-on assignment rules that allow speakers to modify the default meaning of the noun by changing its gender, thus changing the construal of the noun phrase referent. In Di Garbo’s sample, manipulable gender assignment is attested in connection with two main uses: (1) to encode variation in the countability properties of nouns (e.g., from uncountable to countable and vice versa), (2) to encode variation in size (diminutive vs. augmentative). In my metric, I refer to the first use of manipulable gender assignment as M1[16] and to the second as M2. M1 is illustrated in example (1) and M2 in example (2). The examples are taken from two Berber languages, Nefusi and Tachawit.[17]

 

(1) Nefusi (Berber) (Adapted from Beguinot 1942: 32)

    (a) ettefâh̩
        ‘apples’ (masculine, uncountable)

    (b) t-attefâh̩-t
        F-apples-F[SG]
        ‘one apple’

    (c) t-attefâh̩-în
        F-apples-F.PL
        ‘apples’ (plural)

(2) Tachawit (Berber) (Adapted from Penchoen 1973: 12)

    (a) aq-nmuš
        [M]SG-pot
        ‘pot’

    (b) t-aq.nmuš-t
        F-SG-pot-F
        ‘small pot’

    (c) t-aɣ-nžak-t
        F-SG-spoon-F
        ‘spoon’

    (d) aɣ-nž
        [M]SG-spoon
        ‘big spoon, ladle’

 

In example (1) (taken from Nefusi), when the inherently masculine uncountable noun ettefâh̩ ‘apples’ is shifted to the feminine gender (as in (1b)), it becomes countable and can thus be regularly pluralized (as in (1c)) (in Berber, feminine gender marking on nouns is circumfixal both in the singular and in the plural). This is an instance of M1. In Tachawit (example (2)), inherently masculine nouns can be shifted to the feminine gender when a diminutive interpretation is intended for the noun phrase referent (as in (2a) and (2b)). Similarly, an inherently feminine noun can be assigned to the masculine gender when an augmentative interpretation is intended for the noun phrase referent (as in (2c) and (2d)). This is an instance of M2.[18] In general, M1 and M2 are well attested in the languages of Africa, both in languages with large, non-sex-based gender systems and in languages with smaller sex-based systems. Within my sample, M2 is, however, more frequent and more widely distributed than M1 (for an overview, see Di Garbo 2014: chapters 5 and 6). The possibility of manipulating gender assignment can be seen as piling on top of the default gender assignment rules that are used in a language. In languages with manipulable gender assignment, gender markers have default and add-on meanings. These add-on meanings are dependent on semantic and pragmatic associations between gender and other grammatical domains, notably countability and size/value. Thus, based on the Principle of Independence introduced above, their presence represents an increase in the absolute complexity of gender. Gender assignment is not only given in the lexicon for each and every noun, but it is also subject to change depending on semantic and pragmatic associations with other functional domains.

 

Feature CUM (cumulative encoding of gender and number on the indexing targets) evaluates the impact that the type of exponence of gender and number has on the complexity of gender. I interpret cumulative encoding of gender and number as a violation of the Principle of One-Meaning–One-Form (one morpheme expresses several grammatical meanings). One aspect of the morphosyntactic encoding of gender and number which, at least in the languages of my sample, appears to be strictly related to CUM is the tendency for gender distinctions to be reduced (syncretism) or lost (neutralization) in the context of non-singular number values. In my sample, syncretism and/or neutralization of gender in the context of non-singular number occurs in 66 out of 84 languages; in nearly all these cases, the languages in which syncretism is attested are also languages in which gender and number are encoded cumulatively (see also results in Di Garbo 2014: chapter 5).[19] In principle, gender syncretism and neutralization could be viewed as violations of the Principle of Independence inasmuch as, when they occur, the expression of gender within an inflectional paradigm depends on the number value of a noun. In addition, syncretism and neutralization could also be seen as violations of the Principle of One-Meaning–One-Form, given that two (or more) gender values are conflated into one in the context of non-singular number values. However, as Audring (forthcoming) points out, “[s]yncretism is a multifaceted phenomenon, and whether or not it should be considered a case of simplification or complexification depends on the perspective”. In this paper, I treat syncretism in a somewhat agnostic way and exclude it from my complexity metric. More research, I believe, is needed on the relationship between syncretism/neutralization, exponence, and paradigm size before we can assess the effects of syncretism/neutralization on the complexity of gender and related features (e.g., number and case) more confidently.

 

4.3 Method for computing Gender Complexity Scores

 

Having defined the features for measuring the absolute complexity of grammatical gender (see table 2), the next step is to establish the values associated with each feature and to convert them into numbers. To this end, I follow Parkvall (2008), who designed a method for computing the grammatical complexity of creole and non-creole languages on the basis of a set of features taken from the WALS database (Dryer & Haspelmath 2013). In Parkvall’s method, the values of each feature are assigned a number between 0 and 1. Features with three values are converted into the numerical format 0, 1/2, 1. Similarly, features with five values are converted by Parkvall into the format 0, 1/4, 1/2, 3/4, 1. For all the features taken into account in Parkvall’s paper, 0 stands for minimally complex and 1 for maximally complex. The total complexity score for each language is divided by the number of features included for that language. This is done in order to allow languages for which less information is available on a given feature to get average scores comparable to those of the best documented languages. The same procedure is followed in this paper (naturally, features with four values are converted into the numerical format 0, 1/3, 2/3, 1). The feature values and their numerical interpretation are illustrated in table 3.

 

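| Feature | Feature value | Score |
|---|---|---|
| GV[20] | Two genders | 0 |
| | Three | 1/3 |
| | Four | 2/3 |
| | Five or more | 1 |
| AR | Purely semantic or purely formal assignment | 0 |
| | Semantic and formal assignment | 1 |
| IND[21] | One | 0 |
| | Two | 1/3 |
| | Three | 2/3 |
| | Four or more | 1 |
| CUM | Noncumulative | 0 |
| | Partially cumulative | 1/2 |
| | Cumulative | 1 |
| M1 | Absent | 0 |
| | Present | 1 |
| M2 | Absent | 0 |
| | Present | 1 |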

Table 3: Gender complexity metric
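
The averaging procedure described in this section can be written compactly as follows. This is only a notational sketch: the symbols $F_L$ (the set of features documented for a language $L$) and $s_f(L)$ (the score that $L$ receives on feature $f$ according to table 3) are labels introduced here for illustration and are not part of the original metric.

```latex
\mathrm{GCS}(L) \;=\; \frac{1}{|F_L|} \sum_{f \in F_L} s_f(L),
\qquad
F_L \subseteq \{\mathrm{GV}, \mathrm{AR}, \mathrm{IND}, \mathrm{CUM}, \mathrm{M1}, \mathrm{M2}\},
\qquad
s_f(L) \in [0, 1]
```

When all six features are documented, $|F_L| = 6$; when some are missing, the sum is divided by the number of documented features only.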

 

 

The composition of the metric is such that the least complex possible gender system is the one that scores zero with respect to all the features of the metric and exhibits the following properties: two gender values, semantic gender assignment, one indexing target, no cumulation with number, no manipulation of gender assignment triggered by number/countability and no manipulation of gender assignment triggered by size. On the other hand, the most complex possible gender system is the one that scores 1 with respect to all the parameters considered in the metric and exhibits the following properties: five or more genders, semantic and formal assignment, four or more indexing targets, cumulation with number, and manipulation of gender assignment triggered by both number/countability and size. In addition, the composition of the metric is such that, with the exception of languages with the highest score (= 1), languages may display the same index value but arrive at it by different paths. In other words, identical gender complexity scores (henceforth GCSs) do not stand for the same type of gender system.
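
The point that one and the same score can be reached by different paths can be checked by brute force. The following is a minimal sketch, assuming that all six features are documented and using the scores of table 3; the names `FEATURE_SCORES` and `profiles_by_score` are illustrative and not part of the original study.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Possible scores per feature, as listed in table 3.
FEATURE_SCORES = {
    "GV": [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)],
    "AR": [Fraction(0), Fraction(1)],
    "IND": [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)],
    "CUM": [Fraction(0), Fraction(1, 2), Fraction(1)],
    "M1": [Fraction(0), Fraction(1)],
    "M2": [Fraction(0), Fraction(1)],
}


def profiles_by_score():
    """Group every complete feature profile by its average score."""
    groups = defaultdict(list)
    for profile in product(*FEATURE_SCORES.values()):
        groups[sum(profile) / len(profile)].append(profile)
    return groups


groups = profiles_by_score()
total_profiles = sum(len(profiles) for profiles in groups.values())

print("complete profiles:", total_profiles)
print("profiles scoring 5/6:", len(groups[Fraction(5, 6)]))
print("profiles scoring 1:", len(groups[Fraction(1)]))
```

Under these assumptions, a score of 5/6 (about 0.83) is reached by eight different complete profiles, whereas a score of 1 is reached by exactly one.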

 

Before presenting the results of my calculations, it is worth mentioning that, in case of missing features, the index values resulting from the calculations should be taken with caution. In fact, even though average scores (rather than total scores) are used as index values, the index values of languages with missing features cannot be regarded as entirely comparable to the index values of languages for which all features are equally represented. The mutual comparability between the different domains of gender complexity covered by my metric is discussed in section 6.3.

 

5. Results

 

Table 4 illustrates the GCSs of the languages of the sample, which have been calculated based on the method presented in section 4. The table is divided into two macro-columns, and the GCSs of the individual languages are arranged from highest to lowest. The leftmost column of each macro-column provides the rank: languages with the same average complexity score share the same rank. Next to the rank come the language names and their ISO codes; the GCS assigned to each language is given in the rightmost column of each macro-column. In appendix C, the GCSs are visualized on the basis of genealogical units. The complexity scores for each of the feature values in the metric, as well as the GCSs, are given in appendix B.

 

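| Rank | Language | Isocode | GCS | Rank | Language | Isocode | GCS |
|---|---|---|---|---|---|---|---|
| 1 | Bandial | bqj | 1 | 8 | Gola | gol | 0.67 |
| 1 | Bemba | bem | 1 | 8 | Hausa | hau | 0.67 |
| 1 | Bidyogo | bjg | 1 | 9 | Awngi | awn | 0.61 |
| 1 | Chiga | cgg | 1 | 9 | Hadza | hts | 0.61 |
| 1 | Kagulu | kki | 1 | 9 | Moroccan Arabic | ary | 0.61 |
| 1 | Kikuyu | kik | 1 | 9 | Nama | naq | 0.61 |
| 1 | Lega | lea | 1 | 9 | Naro | nhr | 0.61 |
| 1 | Maasina Fulfulde | ffm | 1 | 9 | Sandawe | sad | 0.61 |
| 1 | Mongo-Nkundu | lol | 1 | 9 | Standard Arabic | arb | 0.61 |
| 1 | Makaa | mcp | 1 | 9 | Tigre | tig | 0.61 |
| 1 | Ndengereko | ndg | 1 | 10 | Miya | mkf | 0.6 |
| 1 | Shona | sna | 1 | 11 | Male | mdy | 0.56 |
| 1 | Serer | srr | 1 | 11 | Wolaytta | wal | 0.56 |
| 1 | Swahili | swh | 1 | 12 | Borana-Arsi-Guji Oromo | gax | 0.53 |
| 1 | Timne | tem | 1 | 12 | Lishán Didán | trg | 0.53 |
| 1 | Tonga | toi | 1 | 12 | Qimant | ahg | 0.53 |
| 1 | Venda | ven | 1 | 12 | Rendille | rel | 0.53 |
| 1 | Xoon | nmn | 1 | 12 | ǁAni | hnh | 0.53 |
| 2 | Nyanja | nya | 0.95 | 13 | Beja | bej | 0.5 |
| 2 | Tunen | baz | 0.95 | 13 | Masai | mas | 0.5 |
| 3 | Bafia | ksf | 0.83 | 13 | Somali | som | 0.5 |
| 3 | Dibole | bvx | 0.83 | 14 | Daasanach | dsh | 0.47 |
| 3 | Eton | eto | 0.83 | 14 | Dirasha | gdl | 0.47 |
| 3 | Northern Sotho | nso | 0.83 | 14 | Kxoe | xuu | 0.47 |
| 3 | Swati | ssw | 0.83 | 14 | Lele | lln | 0.47 |
| 3 | Turkana | tuv | 0.83 | 15 | Dizin | mdx | 0.45 |
| 3 | Wamey | cou | 0.83 | 15 | Hebrew | heb | 0.45 |
| 3 | Zulu | zul | 0.83 | 15 | Gidar | gid | 0.45 |
| 4 | Maltese | mlt | 0.78 | 15 | Tsamai | tsb | 0.45 |
| 4 | Noon | snf | 0.78 | 16 | Iraqw | irk | 0.43 |
| 4 | Nuclear Wolof | wol | 0.78 | 17 | Baiso | bsw | 0.42 |
| 4 | Sɛlɛɛ | snw | 0.78 | 18 | Dime | dim | 0.39 |
| 4 | Tswana | tsn | 0.78 | 19 | Juǀ’hoan | ktz | 0.36 |
| 5 | Bench | bcq | 0.75 | 19 | Kambaata | ktb | 0.36 |
| 5 | Kissi | kss | 0.75 | 20 | Dahalo | dal | 0.28 |
| 6 | Karamojong | kdj | 0.72 | 21 | Koorete | kqy | 0.25 |
| 7 | Kabyle | kab | 0.69 | 21 | Kwadi | kwz | 0.25 |
| 7 | Nafusi | jbn | 0.69 | 22 | Lingala, Kinshasa | lin | 0.22 |
| 7 | Tachawit | shy | 0.69 | 23 | Bila | bip | 0.16 |
| 7 | Tamasheq, Kidal | taq | 0.69 | 24 | Pero | pip | 0.12 |
| 7 | Tamazight, Central | tzm | 0.69 | 25 | Mwaghavul | sur | 0.08 |
| 7 | Zenaga | zen | 0.69 | | | | |
| 8 | Amharic | amh | 0.67 | | | | |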

Table 4: GCSs of the languages of the sample
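
The ranking convention used in table 4, in which languages with the same score share a rank and the next distinct score receives the next consecutive rank, corresponds to what is often called dense ranking. A minimal sketch, using a handful of scores from table 4; the function name `dense_rank` is illustrative only:

```python
def dense_rank(scores):
    """Map each language to a rank; tied scores share a rank and ranks have no gaps."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {score: rank for rank, score in enumerate(distinct, start=1)}
    return {language: rank_of[score] for language, score in scores.items()}


# A small excerpt from table 4 (language -> GCS).
sample = {
    "Bandial": 1.0,
    "Bemba": 1.0,
    "Nyanja": 0.95,
    "Tunen": 0.95,
    "Bafia": 0.83,
    "Maltese": 0.78,
}

ranks = dense_rank(sample)
for language, rank in ranks.items():
    print(rank, language, sample[language])
```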

 

Table 4 shows that the highest GCS is 1 and the lowest 0.08. None of the languages of my sample thus gets the lowest possible score, 0 (see section 4.3). The results given in table 4 are also displayed in the graph in figure 1. The X-axis of the histogram displays the range of attested GCSs, whereas the Y-axis shows the distribution of the number of languages per GCS score. The box plot below the histogram provides the distribution of the GCSs per quartiles, with the boldface line in the middle representing the median. The figure shows that half of the languages of my sample have a GCS that ranges roughly from 0.5 to 0.8. In my data sample, high GCSs are substantially more frequent than low GCSs.

 

Figure 1: Distribution of the GCSs
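
The quartile summary described for figure 1 can be approximately recomputed from the scores in table 4. The sketch below is only illustrative: the counts per score are read off table 4, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not state how the box plot was produced.

```python
from statistics import quantiles

# Number of languages per GCS, read off table 4.
LANGUAGES_PER_GCS = {
    1.0: 18, 0.95: 2, 0.83: 8, 0.78: 5, 0.75: 2, 0.72: 1, 0.69: 6,
    0.67: 3, 0.61: 8, 0.6: 1, 0.56: 2, 0.53: 5, 0.5: 3, 0.47: 4,
    0.45: 4, 0.43: 1, 0.42: 1, 0.39: 1, 0.36: 2, 0.28: 1, 0.25: 2,
    0.22: 1, 0.16: 1, 0.12: 1, 0.08: 1,
}

# Expand the counts into one score per language, sorted from low to high.
scores = sorted(
    score for score, count in LANGUAGES_PER_GCS.items() for _ in range(count)
)

# First quartile, median, and third quartile.
q1, q2, q3 = quantiles(scores, n=4)
print(len(scores), round(q1, 2), round(q2, 2), round(q3, 2))
```

With these inputs, the quartiles come out at about 0.5, 0.68 and 0.83, which is in line with the statement that half of the languages have a GCS between roughly 0.5 and 0.8.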

 

The geographical distribution of the GCSs is represented in the map provided in figure 2.

 

Figure 2: Geographical distribution of the GCSs

 

The results presented in table 4 and figures 1 and 2, as well as in appendix C, are discussed in section 6 based on three main foci:

 

1. Genealogical distribution of the GCSs
   Languages from the same genealogical units, or spoken within the same areas, tend to have similar or even identical GCSs. In many cases, areal pressure seems to be a relevant factor in explaining the distribution of the outliers.

2. Interdependencies between sets of features: AR, GV, IND
   Purely semantic gender assignment is only found in languages with few genders and poor gender indexation (no directional dependencies between the three features are assumed here).

3. Possible predictors of gender complexity
   Some features in the metric correlate more with each other and seem to have a stronger impact on the GCS than others.

 

Before moving on to the discussion, I illustrate the procedure followed to calculate the GCSs of two of the sampled languages. For the sake of clarity, I discuss one language for which all features are documented, Turkana (Eastern Nilotic, rank 3 in table 4), and one for which two features are missing, Timne (Mel, rank 1 in table 4).

My classification of the gender system of Turkana is based on Dimmendaal (1983). Turkana has three gender values: Masculine, Feminine and Neuter. It thus gets 1/3 with respect to the feature GV. Gender assignment is both semantic and formal, and, as such, the value of AR is 1. According to Dimmendaal, gender indexation appears in three domains: articles (definite articles), adnominal modifiers, and pronouns (not the Personal Pronouns). Thus the language gets 2/3 with respect to the feature IND. In Turkana, gender distinctions are encoded cumulatively with number (CUM = 1). Finally, in Turkana gender shifts can be used to encode variation both in the countability properties of nouns (M1 = 1) and in the size of the noun phrase referent (M2 = 1). In Turkana, when an uncountable masculine or feminine noun is shifted to the Neuter Gender,[22] the resulting meaning is singulative. On the other hand, when countable masculine or feminine nouns are shifted to the Neuter Gender, the resulting meaning is diminutive. To summarize, for Turkana, the values assigned to each feature of the metric are:

 

GV = 1/3; AR = 1; IND = 2/3; CUM = 1; M1 = 1; M2 = 1

 

Applying the formula illustrated in section 4, [(⅓+1+⅔+1+1+1)÷6], a GCS of 0.83 is obtained.

 

I classify the gender system of Timne based on the description provided by Wilson (1961). Timne has more than five genders and thus gets 1 with respect to the feature GV. Gender assignment is both semantic and formal. Therefore, Timne gets 1 with respect to the feature AR. According to Wilson’s description, Timne shows gender indexation on adnominal modifiers, pronouns, and predicative expressions. In addition, in Timne, the Indefinite Stabilizer, which is used with indefinite nouns in order to encode non-verbal predication (Wilson 1961: 11), also inflects for gender (this is labeled as “other” in my coding). The language thus gets 1 with respect to IND. Gender and number are encoded cumulatively on the indexing targets (CUM = 1). The source does not provide any information about gender shifts, which are, however, rather common phenomena in languages with similar gender systems. The features M1 and M2 thus cannot be documented for Timne. To summarize, for Timne, the values assigned to each of the metric features are:

 

GV = 1; AR = 1; IND = 1; CUM = 1; M1 = –; M2 = –

 

Since two features are missing, the sum of the feature values is in this case divided by 4 [(1+1+1+1) ÷ 4]. The GCS of Timne is thus 1.
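
The two calculations above can be reproduced with a short script. This is a minimal sketch rather than the procedure actually used for the study: the lookup table follows table 3, but the value labels (e.g. `"three"`, `"semantic_and_formal"`), the use of `None` for undocumented features, and the function name `gender_complexity_score` are assumptions made for illustration.

```python
from fractions import Fraction

# Scores per feature value, following table 3 (labels are illustrative).
SCORES = {
    "GV": {"two": Fraction(0), "three": Fraction(1, 3),
           "four": Fraction(2, 3), "five_or_more": Fraction(1)},
    "AR": {"purely_semantic_or_formal": Fraction(0),
           "semantic_and_formal": Fraction(1)},
    "IND": {"one": Fraction(0), "two": Fraction(1, 3),
            "three": Fraction(2, 3), "four_or_more": Fraction(1)},
    "CUM": {"noncumulative": Fraction(0), "partially_cumulative": Fraction(1, 2),
            "cumulative": Fraction(1)},
    "M1": {"absent": Fraction(0), "present": Fraction(1)},
    "M2": {"absent": Fraction(0), "present": Fraction(1)},
}


def gender_complexity_score(profile):
    """Average the table 3 scores over the documented features only."""
    documented = {f: v for f, v in profile.items() if v is not None}
    if not documented:
        raise ValueError("no documented features")
    total = sum(SCORES[f][v] for f, v in documented.items())
    return total / len(documented)


turkana = {"GV": "three", "AR": "semantic_and_formal", "IND": "three",
           "CUM": "cumulative", "M1": "present", "M2": "present"}
# M1 and M2 are undocumented for Timne, so they are left out of the average.
timne = {"GV": "five_or_more", "AR": "semantic_and_formal",
         "IND": "four_or_more", "CUM": "cumulative", "M1": None, "M2": None}

turkana_gcs = gender_complexity_score(turkana)
timne_gcs = gender_complexity_score(timne)
print(round(float(turkana_gcs), 2), round(float(timne_gcs), 2))
```

Exact fractions are used so that thirds do not accumulate rounding error; rounding to two decimals happens only when the score is reported.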

 

6. Discussion

 

6.1 Genealogical and areal biases in the distribution of GCSs

 

In appendix C, the GCSs presented in table 4 are visualized on the basis of genealogical units. The tables in appendix C show that, in general, closely related languages tend to have the same or very similar GCSs. For instance, all the Berber languages in the sample have a gender complexity score of 0.69. This tendency towards intragenealogical homogeneity in the complexity of gender systems further supports the idea that grammatical gender is a chiefly stable feature in the history of language families (see section 3.1). Nevertheless, outliers (i.e., languages that exhibit a GCS that is exceedingly higher or lower than what is found among closely related languages) are attested in the following genealogical units: Bantu, Chadic, Cushitic, Khoe-Kwadi, Eastern Nilotic, Semitic. I suggest that, at least in some of these cases, the distribution of the outliers can be accounted for by taking into consideration aspects of the social history of the speech communities in question (e.g., geography, number of speakers, number of contact languages, type of language contact, bilingualism, multilingualism). This is, however, only a preliminary suggestion, whose further investigation goes beyond the scope of the present study.

 

Out of 84 languages, 18 scored 1, with all these being either Bantu, North-Central Atlantic or Mel. Typically, the gender systems of the Bantu and Atlantic type (i.e., North-Central Atlantic and Mel) exhibit features of high complexity: high number of gender distinctions, pervasive gender indexation, manipulability of gender assignment, which is used to express variation in the countability properties of nouns and/or in the size of the noun phrase referents. Those Atlantic and Bantu languages which rank lower than 1 in table 4 have gender systems in which one or more of the above-mentioned features has/have been either weakened or lost. For instance, in 8 of the 23 Bantu languages in the sample (Bafia, Eton, Northern Sotho, Shona, Swati, Tswana, Venda, Zulu), diminutive and augmentative suffixes have grammaticalized from nouns. Of these eight languages, only Venda and, to a lesser extent, Shona combine the use of the diminutive and augmentative suffixes with the uses of the dedicated diminutive and augmentative genders that are characteristic of many Atlantic-Congo languages.[23] In the remaining six languages, the evaluative genders have been lost. As a result, the complexity of the gender systems of these languages is lower than what is found in other closely related languages.

 

Two outliers with respect to the Bantu and Atlantic type of gender system are the Bantu languages Kinshasa Lingala (GCS = 0.22) and Bila (GCS = 0.16). My coding for Kinshasa Lingala, the variety of Lingala spoken in the area of the capital city of the Democratic Republic of Congo, is based on Bokamba (1977) and Meeuwis (2013). Kinshasa Lingala preserves the system of noun class marking which is typical of Bantu languages only on nouns. Meeuwis (2013) rightly refers to this set of singular/plural pairs of nominal prefixes as inflectional classes: diachronically, they are a relic of the former Bantu-like gender system, but, synchronically, they merely function as markers of nominal number. The Third Person Pronouns and the Subject Prefixes index the animacy of the noun phrase referent. Based on this account, I classify Kinshasa Lingala as a language with two genders (Animate and Inanimate), semantic gender assignment and two domains of gender indexation (pronominal and predicative). Compared to Makanza Lingala, the northwestern variety of Lingala whose origins go back to the language standardization policies operated by the Scheutist missionaries between 1901 and 1902, and which exhibits a more conservative gender system, the gender system of Kinshasa Lingala is massively reduced. According to Meeuwis (2013: 26), Kinshasa Lingala is the oldest variety and the direct descendant of the Bangala pidgin, which was originally spoken in the Bangala state post (on the northwestern banks of the Congo River) and later on spread northeastward.[24] This variety resisted the grammatical reforms introduced by the Scheutists, and soon gained both native and second language speakers.
The pidginization process from which Lingala originated, as well as the highly multilingual ecology in which the Kinshasa variety developed and expanded, can reasonably explain the patterns of simplification and reduction in the domain of grammatical gender that differentiate this variety from other Bantu languages, on the one hand, and from the standardized variety introduced by the missionaries in the northwestern areas of the Democratic Republic of Congo (Makanza Lingala), on the other (on this account, see also Bokamba 1977, 2009). Like Kinshasa Lingala, Bila has only two genders (the Animate and the Inanimate Gender), semantic assignment rules and poor gender indexation. Unlike Kinshasa Lingala, however, gender indexation in Bila is exclusively internal to the noun phrase and limited to the domain of adnominal modifiers (Kutsch Lojenga 2003: 462). Bila is spoken in the northeastern part of the Democratic Republic of Congo, which is also the northernmost corner of the Bantu-speaking area. The northern part of the Bantu-speaking area is often described as a true borderland between linguistically very diverse communities that have extensive contact with each other. In this area, Bantu speakers are surrounded by speakers of Nilo-Saharan and Ubangi languages (Kutsch Lojenga 2003: 451-452). Due to intense mutual contact, both the Bantu and non-Bantu languages spoken in this area are characterized by massive lexical borrowing as well as by grammatical innovations that are not shared with the respective cognate languages outside the area. The reduced gender system of Bila and other neighboring Bantu languages is one such area-specific feature.

 

The Semitic languages provide another interesting illustration of a set of genealogically related languages with non-homogeneous GCSs. The highest ranking GCSs within the Semitic sample go to Maltese (0.78) and Amharic (0.67). Moroccan Arabic, Standard Arabic and Tigre have the same complexity score, 0.61. The lowest ranking gender system is found in Hebrew (0.45), whereas Lishán Didán scored 0.53. Interestingly, the highest GCS, 0.78, is scored by Maltese, the Semitic language that stands out for its peculiar history of long-term contact and bilingualism with English, on the one hand, and Romance languages (Italian and Sicilian), on the other. A similarly high GCS goes to Moroccan Arabic, a dialect of Arabic whose history is also characterized by long-term intense contact with Berber languages, French and Spanish (for a case study of complexity of verbal inflection in Moroccan Arabic and other varieties of Arabic, see Kusters 2003). Finally, the history of Modern (Israeli) Hebrew is also intertwined with intricate sociolinguistic dynamics involving processes of creolization, language shift and massive borrowing (see, among others, Doron 2015; Zuckermann 2009).

 

Two additional examples of outliers are Dahalo, with respect to the other Cushitic languages, and Kwadi, with respect to the Khoe-Kwadi group. Dahalo has a GCS of 0.28, and its gender system has been described by Tosco (1991: 20) as dying out as a result of contact with neighboring Bantu languages. Too little is known about Kwadi, a now extinct language of Angola. Güldemann (2004) describes its gender system as sex-based and pronominal, but not much information is given about mechanisms of gender assignment nor about the use of gender shifts to encode diminutive and augmentative meanings (which is well documented in all the other Khoe-Kwadi languages of the sample).

 

Finally, the two lowest ranking languages in the complexity rank given in table 4 are the Chadic languages Mwaghavul (GCS = 0.08) and Pero (GCS = 0.12), both of which are spoken in Nigeria. The two languages also qualify as outliers with respect to the other Chadic languages in the sample. Mwaghavul scores 0 with respect to all the features of the complexity metric except for CUM, for which the score is 0.5. There are two genders in Mwaghavul (Masculine and Feminine), gender assignment is semantic and gender indexation is only pronominal. Finally, there seems to be no possibility of manipulating gender assignment in the language. With respect to the cumulation parameter, Mwaghavul shows at least some patterns of interaction with number on the indexing targets. The Third Person Human Anaphoric