ChiHwesa Verbal Constructions
Description
This dataset is a structured repository for examining the semantic compatibility of verbal extensions in ChiHwesa, focusing on the reciprocal (-an-), passive (-iw-/-w-/-h-), and stative (-ik-/-ek-) extensions. The underlying hypothesis is that the distribution of these extensions is semantically constrained: only verbs with compatible argument structure and lexical semantics can productively undergo a given derivation. Specifically, reciprocal constructions are expected to occur primarily with transitive and interactional verbs, passive constructions with verbs that license a patient or theme argument, and stative constructions with verbs that can yield a potential or resultative state reading.
The dataset is a systematically organised list of ChiHwesa verb stems presented with their derived forms and English glosses. Each entry gives a base verb, its meaning, and its reciprocal, passive, and stative forms where attested. Crucially, the dataset also records gaps where a derivation is not possible, which provides direct evidence for semantic restrictions. The verbs cover a wide range of semantic classes, including transitive, intransitive, cognitive, physiological, and social interaction verbs.
Data were collected through a combination of native-speaker intuition, structured elicitation, and cross-speaker validation. Informants were asked to generate and evaluate derived verb forms, and all entries were verified for consistency and acceptability. Additional support was drawn from an existing ChiHwesa lexical database. This multi-method approach is intended to ensure both reliability and linguistic authenticity.
The data show that reciprocal formation is restricted to verbs denoting mutual or bidirectional actions, while verbs lacking inherent interaction (e.g., ‘walk’ or ‘urinate’) typically do not permit reciprocal derivation.
In contrast, passive constructions are highly productive across verb classes, although some passives receive only marginal or context-dependent interpretations. Stative constructions are also widely attested and primarily encode potentiality (‘able to be X-ed’) or resultant states, particularly with change-of-state verbs. Taken together, these findings support the view that verbal extensions in ChiHwesa are governed by semantic and argument-structure constraints. The dataset is therefore useful for theoretical analysis within frameworks such as Lexical Mapping Theory, for comparative Bantu studies, for language documentation, and for computational applications. It can serve both as a descriptive resource and as a basis for further hypothesis testing on the interaction of morphology, syntax, and semantics in under-documented languages.
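The entry structure and the productivity comparison described above can be sketched together in Python. This is a minimal illustration under stated assumptions, not the dataset's actual file format: the class `VerbEntry`, its field names (including `semantic_class`), the use of `None` to mark a gap, and the `attestation_rates` helper are all hypothetical, and the placeholder strings stand in for real ChiHwesa forms. Marginal forms are simply counted as attested here, which is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

EXTENSIONS = ("reciprocal", "passive", "stative")


@dataclass
class VerbEntry:
    """One row of the verb list (hypothetical field names).

    A derived form of None records a gap, i.e. a derivation judged not possible.
    """
    base: str
    gloss: str
    semantic_class: str               # e.g. "transitive", "physiological"
    reciprocal: Optional[str] = None  # -an-
    passive: Optional[str] = None     # -iw-/-w-/-h-
    stative: Optional[str] = None     # -ik-/-ek-

    def gaps(self) -> list[str]:
        """Names of the extensions for which no derived form is attested."""
        return [ext for ext in EXTENSIONS if getattr(self, ext) is None]


def attestation_rates(entries: list[VerbEntry]) -> dict[str, dict[str, float]]:
    """Share of verbs in each semantic class that attest each extension."""
    totals: dict[str, int] = defaultdict(int)
    attested: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for entry in entries:
        totals[entry.semantic_class] += 1
        for ext in EXTENSIONS:
            if getattr(entry, ext) is not None:
                attested[entry.semantic_class][ext] += 1
    return {
        cls: {ext: attested[cls][ext] / n for ext in EXTENSIONS}
        for cls, n in totals.items()
    }


# Placeholder entries for illustration only; not real ChiHwesa data.
sample = [
    VerbEntry("STEM-A", "(gloss A)", "social", reciprocal="A-REC", passive="A-PASS"),
    VerbEntry("STEM-B", "(gloss B)", "social", passive="B-PASS", stative="B-STAT"),
    VerbEntry("STEM-C", "(gloss C)", "physiological", passive="C-PASS"),
]
print(sample[2].gaps())  # ['reciprocal', 'stative']
print(attestation_rates(sample)["social"])
# {'reciprocal': 0.5, 'passive': 1.0, 'stative': 0.5}
```

Keeping gaps as explicit `None` values, rather than dropping them, is what allows the rates to be computed at all: a verb with no reciprocal still counts in the denominator for its class.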
Steps to reproduce
The dataset was developed through a systematic process of linguistic fieldwork and data validation designed to ensure reliability, replicability, and descriptive accuracy. Data collection focused on eliciting and verifying ChiHwesa verb stems and testing their compatibility with the reciprocal, passive, and stative extensions.
The process began with an initial verb list compiled from an existing ChiHwesa lexical database and supplemented by researcher intuition and prior knowledge of related Bantu languages. Primary data were then collected in structured elicitation sessions with native speakers of ChiHwesa. Informants were selected with attention to fluency, age distribution, and familiarity with traditional language use. During elicitation, participants were presented with base verbs and prompted to generate the corresponding reciprocal, passive, and stative forms. They were also asked to interpret each derived form and judge its acceptability. Where necessary, follow-up questions were used to clarify semantic nuances and contextual usage.
To enhance reliability, all elicited forms underwent cross-speaker validation. Multiple informants independently evaluated the same verb forms; only forms that received consistent acceptability judgments were retained as fully grammatical, while forms that received mixed judgments were recorded as marginal or context-dependent. This step helps ensure that the dataset reflects shared linguistic competence rather than idiosyncratic usage.
The dataset was organised in a tabular workflow, with each lexical item coded for its base form, derived forms, and glosses. Microsoft Word was used for initial data structuring, and spreadsheet conventions were applied to keep entries consistent. No specialised software or laboratory instruments were required, as the study is qualitative and linguistically oriented.
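The cross-speaker validation rule described above can be expressed as a small decision function. This is a sketch under stated assumptions: each informant's judgment is reduced to a boolean (accept or reject), at least two informants are required, and a form rejected by every informant is treated as a gap. The source explicitly describes only the consistent-acceptance and mixed-judgment cases, so the third branch, the `Status` labels, and the `classify` name are hypothetical.

```python
from enum import Enum


class Status(Enum):
    GRAMMATICAL = "grammatical"  # consistently accepted by all informants
    MARGINAL = "marginal"        # mixed judgments: marginal or context-dependent
    GAP = "gap"                  # consistently rejected (assumed handling)


def classify(judgments: list[bool]) -> Status:
    """Classify one derived form from independent informant judgments.

    True means the informant accepted the form, False that they rejected it.
    """
    # Assumption: "multiple informants" is taken to mean at least two.
    if len(judgments) < 2:
        raise ValueError("cross-speaker validation needs at least two informants")
    if all(judgments):
        return Status.GRAMMATICAL
    if not any(judgments):
        return Status.GAP
    return Status.MARGINAL


print(classify([True, True, True]))   # Status.GRAMMATICAL
print(classify([True, False, True]))  # Status.MARGINAL
```

Requiring unanimity for the grammatical label mirrors the description's emphasis on shared competence; a looser threshold (for example, a majority) would be a different design choice and would change which forms count as marginal.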
To reproduce this dataset, researchers should follow similar procedures: compile a representative verb list, conduct structured elicitation with multiple native speakers, systematically test each verb for compatibility with the relevant extensions, and validate the results through cross-speaker comparison. Both attested forms and systematic gaps should be documented, since the gaps are crucial for analysing semantic constraints. This approach is intended to yield a dataset that is both empirically grounded and analytically robust.
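When documenting gaps in a tabular file, a blank cell is ambiguous: it could mean "not possible" or "not yet tested". The sketch below writes an explicit marker instead, borrowing the linguistic convention of starring unacceptable forms. The CSV format, the column names, the `to_csv` helper, and the `*` marker are assumptions for illustration (the original workflow used Microsoft Word and spreadsheet conventions), and the sample row is a placeholder, not real data.

```python
import csv
import io

DERIVED = ("reciprocal", "passive", "stative")
FIELDS = ("base", "gloss") + DERIVED
GAP_MARKER = "*"  # hypothetical convention: derivation tested and judged impossible


def to_csv(entries: list[dict]) -> str:
    """Serialise entries as CSV, writing GAP_MARKER where a derived form is missing."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in entries:
        out = {"base": row["base"], "gloss": row["gloss"]}
        for ext in DERIVED:
            form = row.get(ext)  # a missing key is treated the same as None
            out[ext] = GAP_MARKER if form is None else form
        writer.writerow(out)
    return buf.getvalue()


# Placeholder row for illustration only.
print(to_csv([{"base": "STEM-A", "gloss": "(gloss)", "reciprocal": None,
               "passive": "P", "stative": "S"}]), end="")
# base,gloss,reciprocal,passive,stative
# STEM-A,(gloss),*,P,S
```

If untested cells also need to be tracked, a second marker (or a separate status column) would keep them distinct from genuine gaps.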