Volume 8 Issue 1 (2010) DOI:10.1349/PS1.1537-0852.A.377

Drawing Networks from Recurrent Polysemies

Comment on ‘Polysemous Qualities and Universal Networks’ by Loïc-Michel Perrin (2010)

Michael Cysouw

Max Planck Institute for Evolutionary Anthropology, Leipzig

Recurrent polysemy across languages is a very powerful source of information for studying semantic similarity. The data as collected by Perrin (2010) provide a fascinating basis for inquiries into the relations between meanings of adjective-like predicates, or ‘qualities’, as Perrin calls them. Unfortunately, I think that the real potential of his data does not become apparent from the way the data are analyzed and presented in his paper. In this comment, I will offer a different possibility for graphical display of the data from Perrin’s paper. This comment is in no way intended to be the definitive answer to the problem of visualizing cross-linguistic data. However, I hope that it will inspire linguists to look more closely into the extensive literature on data visualization to find suitable methods for any concrete problem at hand.

Before turning to the question of visualization, I would like to briefly address two other aspects of Perrin’s paper that I consider debatable. First, there is the problem of cross-linguistic invariants. The concepts investigated by Perrin are defined by their English, French and German translations. Although taking three languages is of course more precise than taking just one as the defining meta-language, such a definition of cross-linguistic concepts still allows for a large space of variation. For example, as Perrin himself notes in section 2.2, one could find more than 30 translations of the English word dry in French. Taking the English/French/German triad of words dry/sec/trocken as a definition of a cross-linguistic concept still underdetermines the precise meaning intended. For future projects of this kind, I would strongly urge the use of contextually embedded definitions of the cross-linguistic concepts, e.g. exemplified by a sentence or a short paragraph in which the intended meaning occurs. So, instead of looking for the somewhat ominous translation of dry/sec/trocken, one would look for the translation of a word in context, as in French “fruit sec” or in English “hot and dry weather is expected to persist today through Friday” (cf. Cysouw 2010; Wälchli 2010).

Second, there is the problem of statistical assessment of the attested frequencies. It is always difficult to deal with the large variation in cross-linguistic studies, but I do not consider arbitrary divisions of a scale a very suitable basis for interpretation. Why, for example, are federative notions defined as “qualities which are involved in a minimum of five polysemous patterns and across a minimum of six languages” (Perrin 2010: section 4.2)? Why five patterns and six languages? Why not two, or three, or ten? Or why not across at least three different genera or families? There is no easy answer to these questions, but I would urge all analysts of cross-linguistic data to try to avoid such all-or-nothing categorizations and to think more along the lines of continuous clines of gradual variation.

Finally then, let me turn to the visualization of recurrent polysemies. In the received approach to semantic maps, as presented concisely in Haspelmath (2003), all attested polysemy is taken into account for drawing a network. As criticized in Cysouw (2007), that approach gives too much importance to incidentally occurring polysemies relative to the much more interesting frequently occurring ones. Perrin uses a minimum of three languages for a polysemy to be drawn on the map (section 4.1). Again, the question arises: why three languages? Why not two, or ten? Further, the methodology for semantic maps as described by Haspelmath (2003) proposes no mechanism for organizing the network of a semantic map, other than that the graph should preferably be planar (i.e. there should be no crossing lines). As a solution to the problem of visualizing such data, various recent papers on semantic maps have proposed using methods like multi-dimensional scaling (MDS) instead of a graph-based approach (Cysouw 2001; Levinson & Meira 2003; Cysouw 2007; Croft & Poole 2008; Wälchli 2010). However, it turns out that the semantic structure of lexical domains is often highly multidimensional, which renders MDS suboptimal for graphical display (Wälchli and Cysouw 2010). For that reason, I propose here to visualize Perrin’s data by using one of the many other methods for graphical display developed in the past decades. The networks shown below were produced by the Fruchterman-Reingold graph-layout algorithm (Fruchterman and Reingold 1991). In this algorithm, the lines between two concepts are considered to be springs, which pull concepts closer together the more often a particular polysemy is attested in the data.[1] A suitable layout is then computed which attempts to minimize the strain on the connections between concepts.
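To make the spring metaphor concrete, the following is a minimal Python sketch of a force-directed layout in the spirit of Fruchterman and Reingold (1991). It is not the igraph implementation used for the figures. The function names, the linear cooling schedule, the starting temperature, and the choice to multiply the attractive force by the edge weight are all assumptions made for illustration.

```python
import math
import random


def fruchterman_reingold(nodes, weighted_edges, iterations=300, seed=0):
    """Return {node: (x, y)} from a basic force-directed layout.

    nodes is a list of labels; weighted_edges maps (u, v) pairs to a
    positive spring strength (assumed: stronger means more pull).
    """
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal spacing in a unit-area frame
    start_temp = 0.1                 # assumed cap on movement per step

    for step in range(iterations):
        temp = start_temp * (1.0 - step / iterations)  # linear cooling
        disp = {v: [0.0, 0.0] for v in nodes}

        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force

        # Linked nodes attract with force weight * d^2 / k.
        for (u, v), weight in weighted_edges.items():
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            force = weight * d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force

        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step_size = min(length, temp)
            pos[v][0] += disp[v][0] / length * step_size
            pos[v][1] += disp[v][1] / length * step_size

    return {v: (x, y) for v, (x, y) in pos.items()}


def edge_length(pos, u, v):
    """Euclidean length of the line drawn between two placed nodes."""
    return math.dist(pos[u], pos[v])


# Hypothetical three-concept chain: a strong link a-b and a weak link b-c.
nodes = ["a", "b", "c"]
pull = {("a", "b"): 9.0, ("b", "c"): 1.0}
layout = fruchterman_reingold(nodes, pull)
```

In this toy chain the strongly weighted link a-b should settle at a shorter length than the weakly weighted link b-c, which is the behaviour the spring description relies on.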

Figure 1 shows the network when including all polysemies as listed in Appendix 2 of Perrin’s paper. The length of the lines is approximately inversely proportional to the frequency of attestation (i.e. the longer a line, the less common the polysemy). However, as can be seen in the figure, the data contain a bewildering variety of polysemies, which makes for a cluttered display with many crossing lines. Note, though, that the display still conveys some information in the form of clusters of closely positioned concepts, like small/little/young or strong/solid/hard/gesund. The concepts in these clusters correlate strongly with Perrin’s federative notions.

Figure 1. Complete graph of all polysemies

To reduce the amount of information in the graph (so that a human can judge the content more easily), I removed all polysemies that occur only once in Perrin’s data. As a result, many concepts were not linked to any other concepts any more, and I have removed all these from the display. The graph resulting from the remaining concepts comprises eight separate subgraphs, as shown in Figure 2. There are five small graphs, connecting just two or three concepts (displayed at the bottom right of the figure). More interesting are the three larger graphs in the figure. First, the graph to the upper right shows the recurrent polysemies in the realm of tastes and smells with negative connotations, connected through the concepts bad and nasty at the center. Second, the large graph on the left is a more complex combination of small-dimension qualities at the bottom (centered around small/narrow/weak) and a collection of mostly positively connoted qualities at the top (centered around good and clean). The link between these two parts is established through the concept mou (English soft, German weich). The central chain of polysemies seems to form a semantic continuum: small – narrow – thin – weak – soft – sweet – good – clean.
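The pruning step just described (drop polysemies attested only once, drop concepts left without links, then split what remains into separate subgraphs) can be sketched in a few lines of Python. The function name and the attestation counts below are invented for illustration; they are not Perrin’s data.

```python
from collections import defaultdict


def recurrent_subgraphs(polysemy_counts, min_count=2):
    """Drop rare polysemies, then group the remaining concepts."""
    neighbours = defaultdict(set)
    for (a, b), count in polysemy_counts.items():
        if count >= min_count:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Concepts with no surviving link never enter `neighbours`,
    # so isolated concepts are removed implicitly.
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one connected subgraph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    # Largest subgraphs first.
    return sorted(components, key=len, reverse=True)


# Invented counts: number of languages attesting each polysemy.
toy_counts = {
    ("small", "narrow"): 4,
    ("narrow", "thin"): 2,
    ("bad", "nasty"): 3,
    ("dry", "hard"): 1,   # attested once, so it is dropped
    ("wet", "cold"): 2,
}
subgraphs = recurrent_subgraphs(toy_counts)
```

With these invented counts, the single attestation of dry/hard is discarded, both concepts disappear from the display, and three separate subgraphs remain.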

Figure 2. Graph of polysemies that occur in more than one language

Finally, the largest graph is shown in the middle of Figure 2. This graph is still too complex to be displayed with reasonably true distances and without crossing lines, so I removed the link between difficult and heavy to obtain at least an approximately planar graph. The resulting graph shows two large clusters of concepts, one around strong/hard/solid and another around fat/big/thick. These two clusters are directly linked by the polysemies solid–dick, solid–dickflüssig and heavy–difficult (this last one is not displayed in Figure 2). To the left side of this graph, there is a somewhat less prominent chain of polysemies connecting these two main clusters by the semantic continuum hard – rude – raw – wet – cold – slow – heavy.

References

Croft, William and Keith T. Poole. 2008. Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics 34/1.1-37. doi:10.1515/thli.2008.001

Cysouw, Michael. 2001. Review of Martin Haspelmath (1997) ‘Indefinite Pronouns’. Journal of Linguistics 37.99-114. doi:10.1017/s0022226701231351

-----. 2007. Building semantic maps: The case of person marking. New challenges in typology, ed. by Bernhard Wälchli and Matti Miestamo, 225-248. Berlin: Mouton de Gruyter. (Trends in Linguistics: Studies and Monographs 189).

-----. 2010. Semantic maps as metrics on meaning. Linguistic Discovery, this issue. doi:10.1349/ps1.1537-0852.a.346

Fruchterman, Thomas M. J. and Edward M. Reingold. 1991. Graph drawing by force-directed placement. Software: Practice and Experience 21/11.1129-1164.

Gabor, Csardi and Nepusz Tamas. 2006. The igraph software package for complex network research. InterJournal: Complex Systems 1695.

Haspelmath, Martin. 2003. The geometry of grammatical meaning: Semantic maps and cross-linguistic comparison. The new psychology of language: Cognitive and functional approaches to language structure, ed. by Michael Tomasello, vol. 2, 211-242. Mahwah, NJ: Erlbaum.

Levinson, Stephen C. and Sérgio Meira. 2003. ‘Natural concepts’ in the spatial topological domain - Adpositional meanings in crosslinguistic perspective: An exercise in semantic typology. Language 79/3.485-516. doi:10.1353/lan.2003.0174

Perrin, Loïc-Michel. 2010. Polysemous qualities and universal networks. Linguistic Discovery, this issue. doi:10.1349/ps1.1537-0852.a.353

R Development Core Team. 2008. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Wälchli, Bernhard. 2010. Similarity semantics and building probabilistic semantic maps from parallel texts. Linguistic Discovery, this issue. doi:10.1349/ps1.1537-0852.a.356

Wälchli, Bernhard and Michael Cysouw. 2010. Lexical typology through similarity semantics: Toward a semantic map of motion verbs. Linguistics (forthcoming). doi:10.1515/ling-2012-0021

Author's contact information:

Michael Cysouw

Department of Linguistics

Max Planck Institute for Evolutionary Anthropology

Deutscher Platz 6

04103 Leipzig

Germany

cysouw@eva.mpg.de

[1] The graphs were drawn by using the layout function layout.fruchterman.reingold in the package igraph (Gabor and Tamas 2006) for the statistical program R (R Development Core Team 2008). It is not obvious how the frequency of attestation should be translated into the ‘strength’ of the pull between the concepts. For the illustrations shown here, I used the squared frequencies as strength of the pull. Although this results in easily interpretable pictures, I have no particular reason for this decision other than the fact that it gives good results.
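As a small illustration of how the choice of transformation matters, the Python sketch below turns attestation counts into pull strengths with a configurable exponent. The function name and the counts are hypothetical; only the squaring itself is taken from the description above.

```python
def pull_strengths(frequencies, exponent=2):
    """Map each polysemy's attestation count to a spring strength."""
    return {pair: count ** exponent for pair, count in frequencies.items()}


# Hypothetical counts for two polysemies sharing the concept "y".
freqs = {("x", "y"): 2, ("y", "z"): 6}

linear = pull_strengths(freqs, exponent=1)
squared = pull_strengths(freqs)  # the squared frequencies described above

# How much harder the frequent polysemy pulls than the rare one.
linear_ratio = linear[("y", "z")] / linear[("x", "y")]
squared_ratio = squared[("y", "z")] / squared[("x", "y")]
```

Squaring widens the gap between frequent and rare polysemies: a polysemy attested three times as often pulls nine times as hard rather than three times, so frequent polysemies dominate the layout more strongly.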