The Spoken Corpus of the Dialects of Khakas contains transcribed and annotated texts synchronized with the audio. The texts were recorded in the 21st century from speakers born between 1916 and 1985, during various expeditions from Moscow to the Republic of Khakassia. All texts are translated into Russian. The texts were analyzed by an automatic parser, then edited and synchronized with the audio using the ELAN software.
This corpus is related to the project «Electronic Corpus of Khakas language». More information about the aims and methods of the project is available, but only in Russian for now.
We use the Cyrillic Khakas alphabet to transcribe our texts because the parser was built for the analysis of Literary Khakas. Regular phonetic dialect features are mostly ignored, but we tried to reflect morphological and morphonological features.
In the layer that divides word forms into morphemes, stems are written in phonological transcription and affixes in morphonological transcription. For example, туралар ‘houses’ appears as тура-ЛАр when divided into morphemes. The plural marker has the allomorphs лар, лер, нар, нер, тар, тер, and the glossing layer uses the single marker ЛАр for all of them. Morphonemes without alternations are written with the same symbols as phonemes.
For details, see: Anna V. Dybo, Philip S. Krylov, Vera S. Maltseva, Aleksandra V. Sheimovich. Segmental rules in the automatic parser for the Khakass corpus. In: Ural-Altaic Studies. N 1 (32), 2019. P. 48-69 (in Russian) https://iling-ran.ru/library/ural-altaic/ua2019_32.pdf
|Consonant morphonemes||Vowel morphonemes|
|П: б/п/м||А: е/а|
|К: ғ/г/х/к||Ы: i/ы|
|Г: ғ/г/х/к/Ø||О: о/ö|
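The relation between a morphonemic marker and its surface allomorphs can be illustrated with a minimal Python sketch. It is not the project's parser: the `ALTERNANTS` mapping and the `expand` function are hypothetical names, the alternation sets are copied from the table above, and the set for Л is an assumption inferred from the plural allomorphs listed earlier (лар, лер, нар, нер, тар, тер). The sketch simply enumerates every combination and ignores vowel harmony and consonant assimilation, so for most markers it overgenerates.

```python
from itertools import product

# Alternation sets taken from the table above; "" stands for Ø.
# The entry for "Л" is an assumption inferred from the plural allomorphs.
ALTERNANTS = {
    "П": ["б", "п", "м"],
    "К": ["ғ", "г", "х", "к"],
    "Г": ["ғ", "г", "х", "к", ""],
    "А": ["е", "а"],
    "Ы": ["і", "ы"],
    "О": ["о", "ӧ"],
    "Л": ["л", "н", "т"],
}


def expand(marker):
    """Return every surface string a morphonemic marker could stand for.

    Symbols without an entry in ALTERNANTS (plain phonemes) are kept as-is.
    """
    choices = [ALTERNANTS.get(symbol, [symbol]) for symbol in marker]
    return ["".join(combo) for combo in product(*choices)]


plural_forms = expand("ЛАр")
print(plural_forms)
```

For the plural marker ЛАр the enumeration yields exactly the six allomorphs named in the text. A real parser would additionally choose among them according to the preceding stem.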
This corpus belongs to a group of corpora built on the tsakorpus search platform. General instructions covering the technical properties common to these corpora can be found in the “Help” section (look for the button marked with a question mark in the top right corner of the search page). The current text describes the rules and conventions that are specific to this corpus.
In this field, you can enter specific word forms that you want to find.
For example, ибде ‘at home’, килген ‘he came’.
This field should be used if you need to find all forms of a given word (a.k.a. lemma, or lexeme).
For example, if you enter “иб” ‘home’, the search results will show all sentences where any form of this noun is used, e.g., иб ‘home’, ибні [home-ACC] ‘home (direct object)’, ибінде [home-3pos-Loc] ‘at his home’, etc.
Lemmas should be entered in this field in their base form, that is, the form used in dictionaries. For nouns, adjectives, adverbs, pronouns and numerals, the base form is identical to the stem (e.g., иб ‘house’, кічіг ‘small’, ам ‘now’, син ‘you’, ікі ‘two’). For verbs, in accordance with lexicographic practice, the infinitive form with the suffixes -АрГА is used, e.g., тоғынарға ‘to work’, килерге ‘to come’, etc.
The case forms of the pronouns ол ‘that’ and пу ‘this’, including the locatives, are assigned to the lemmas ол and пу, although dictionaries list these case forms as separate lexemes. We treat the substantivized forms with the 3rd person possessive marker, ан(ы)зы, мын(ы)зы and пунызы, as separate lexemes. All forms of a personal pronoun (with the same person and number) also belong to one lemma: мин ‘I’, син ‘you’, олар ‘they’, etc.
This field can be used to build search queries based on part-of-speech tags and grammatical categories. To use it, press the button immediately to the right of the “Grammar” search field; a pop-up window will open where you can choose from the available grammatical tags. To select a marker, left-click it, and it will be highlighted. To cancel the selection, click the marker again, and the highlighting will disappear.
The part-of-speech tags used in our corpus are explained below.
v – verb (including participles and converbs); takes all inflectional markers.
n – nominal (noun, adjective, pronoun, numeral, postposition); does not take negation, tense, aspect or mood markers.
Some nominals do not take case markers but do take person markers (e.g., осхас ‘resembling’). We plan to group such lexemes into a separate part of speech after further corpus research.
i1 – invariable word that can combine with endoclitics (including the particle -ох/-ӧх/-ӧк, which absorbs the last vowel of the stem). For example, піди ‘so’. This category covers most adverbs, including grammaticalized converb forms.
i – invariable word that cannot combine with endoclitics (particle, conjunction, interjection).
In both the “Gloss” and the “Grammar” fields, you can only search for forms with non-zero markers. The only exception is the imperative singular, which is the bare form of the verb. It can be found by selecting the values “imp” and “2sg” in the “Grammar” field.
Markers with the number 1 or 2 (except “dur1”) are located closer to the stem than the same markers without numbers and are used mainly as word-formation markers. (Markers with the number 1 are closer to the stem than markers with the number 2.)
|Pl||ЛАр||non-predicative plural||иблер ‘homes’, парғаннарына ‘for those who went’|
|PredPl||ЛАр||predicative plural||парғаннар ‘they went’|
|Gen1||НЫң, ДЫң||genitive||пістіңнер ‘ours’|
|Loc1||ТА||locative||аалдағылар ‘those who are (living) in village’|
“All1” and “Abl1” are very rare; they occur only in some grammaticalized forms combined with other cases.
The combination of “Gen1” and “3pos” is synchronically a cumulative marker ни (dialectal variant Ди), so we separate the two glosses with a dot rather than a hyphen. Example: сілерни / сілерди ‘yours’.
All cases have allomorphs that are used after the singular possessive markers.
Most cases have dialectal variants. In some dialects, the ablative and the instrumental share one morpheme.
|Acc||НЫ, ДЫ, н||accusative||суғны / суғды ‘(drink) water’, суғын ‘(drink) his water’|
|Gen||НЫң, ДЫң, нЫң||genitive||азахтың ‘of leg’, азағының ‘of his leg’|
|Dat||ГА, (н)А||dative||ирге ‘to a man’, иріме ‘to my husband’|
|Loc||ТА, (н)ТА||locative||ибде ‘in the house’, ибінде ‘in his house’|
|All||САр, СА, САрЫ, нСАр, (н)СА, (н)САрЫ||allative||ибзер / ибзері / ибзе ‘towards a house’, ибінзер / ибінзері / ибінзе ‘towards his house’|
|Abl||ДАң, нАң||ablative||аалнаң / аалдаң ‘from a village’, аалынаң ‘from his village’|
|Instr||ДАң, НАң, нАң, БАң, (н)БАң, мАң, (н)мАң||instrumental||малтынаң / малтыдаң / малтыбаң ‘by an axe’, абамнаң / абаммаң ‘with my dad’|
|Prol||ЧА, (н)ЧА||prolative (equative)||чолӌа ‘on a road’, соонӌа ‘following him’|
|Delib||нАңАр, ДАңАр(Ы)||deliberative||аннаңар ‘because’, кибірлердеңері ‘about the traditions’|
|1pos.sg||(Ы)м||1st person singular possession (‘I’)||хызым ‘my daughter’|
|1pos.pl||(Ы)ПЫс||1st person plural possession (‘we’)||хызыбыс ‘our daughter’|
|2pos.sg||(Ы)ң||2nd person singular possession (‘you’)||2nd person singular possession (‘you’)|
|2pos.pl||(Ы)ңар||2nd person plural possession (‘you’)||іӌеңер ‘your mother’|
|3pos||(з)Ы||3rd person possession (‘he’, ‘she’, ‘it’, ‘they’)||аал пазы ‘village’s beginning’|
|3pos1||(з)Ы||3rd person possession (inner position)||аал пазындағылар ‘those who are (living) in the beginning of the village’|
|Perf||(Ы)бЫс||perfective||парыбысхан ‘he’s gone’|
|Perf0||(Ы)с||perfective near the particle||чоохтаныпласчам ‘I speak almost every time’|
|Prosp.dial||АК, иК||prospective||парахча ‘is going to go’|
|Dur||чАт||durative||полчатсын ‘let it be’|
|Dur1||А(р), и(р), ит||durative / present for the verbs парарға ‘go’, килерге ‘come’||кили ‘comes now’|
|Iter||АдЫр, идЫр||iterative / present||тідирлер ‘they say’|
|RPast||ТЫ||recent past||килді ‘came (not long ago)’|
|Pres||чА||present||узупча ‘he sleeps’|
|Indir||ТЫр||evidential (indirective)||партыр ‘he went (they say)’|
|Evid||осхас||evidential (analytical form)||тіпчен осхас ‘he says (the speaker didn’t hear it himself)’|
|Affirm||ЧЫК||affirmative, subjunctive and other meanings||парарӌых ‘would go (if smth happened)’|
|Imp||imperative; takes a special set of person markers||ат ‘shoot’, парим ‘should I go’|
|Cond||СА||conditional||чатса ‘if it lies’|
|Opt||ГАй||optative||халғай ‘let it be left’|
|Simul||(А)АчЫК||simulative, converts the verb to a nominal||талаачых ‘simulating fainting’|
We do not distinguish participle and finite forms with the same morphemes.
|Past||ГАн||past tense||одырған ‘sat’|
|PresPt||чАн||present participle||хомай чуртапчан кізілер ‘badly living people’|
|PresPt1||ин||present participle with the verbs пар ‘go’ and кил ‘come’||сӱр парин остар ‘drive (as now)’|
|Fut||А(р), и(р)||future||килер ‘will come’|
|Neg.Fut||ПАс||negative future||килбес ‘will not come’|
|Hab||ЧА(ң)||habitual (past as finite form and present as non-finite form)||тоғынӌаң ‘worked (usually)’|
|Assum||ГАдАГ||assumptive («it seems that…»)||хайтпаадағ ‘won’t happen (normally)’|
|Cunc||ГАлАК||cunctative («not yet…»)||пысхалах ‘is not yet ripe’|
|ConvP||(Ы)п||consecutive converb||алып алып, парыбысхан ‘having bought, went away’|
|ConvA||А, и||simultaneous converb||чара парарға ‘to go separating’|
|Neg.Conv||Пи(н), ПААн||negative form of converb||хурғатпин тартырарға ‘to grind without drying’|
|1sg||(Ы)м, СЫм, ПЫн, им||1st person singular marker||парам ‘I will go’|
|1pl||ПЫс, иБЫс||1st person plural marker||парарбыс ‘we’ll go’|
|2sg||(Ы)ң, СЫң||2nd person singular marker||парғаң ‘you went’|
|2pl||ңар, САр, (Ы)ңАр||2nd person plural marker||парғазар ‘you (pl) went’|
|3||Ø, СЫн||3rd person marker (overt only in the imperative; a zero marker cannot be distinguished automatically from the absence of a marker)||ползын ‘let it be’|
|1.incl||Аң||inclusive imperative singular («I and you (sg)»)||параң ‘let’s (two of us) go!’|
|1pl.incl||АңАр, АлАр||inclusive imperative plural («I and you (pl)»)||параңар / паралар ‘let’s (all) go!’|
|Neg||ПА||negation||парба ‘don’t go’|
|Distr||(К)лА||distributive||тастағлаабыс ‘we threw (many things)’|
|NF||Ø / (Ы)п||word-formation marker derived from ConvP, used in some synthetic and analytical forms||пар-Ø-ча ‘goes’, сана-п-ча ‘counts’|
|Compl||тіп||complementizer (separate word)||парғам чаблах одалирға тіп ‘I went to dig potatoes’|
Word-formation markers are not separated from the stem by a hyphen, so they can be found only through the “Grammar” field.
|Attr||КЫ||attributivizer (of locative and temporal forms)||аалдағы ‘situated in village’, пурунғы ‘prior’|
|Adv||Ли||adverbializer||полосали ‘in stripes’|
|Comit||ЛЫГ||comitative («with…, «having…»)||тадылығ ‘tasty’, аттығ ‘on a horse, with a horse’|
|Dimin||(Ы)ӌАК||diminutive||хызыӌах ‘(small) girl’|
|Coll||ОлАң, АлАң||collective numeral||ікӧлең / ікелең ‘twosome, two together’|
|Distr||Ар||distributive numeral||пизер ‘by five’|
|Caus||т, тЫр||causative (also used as passive)||итірбе ‘don’t do (with the help of others)’|
|Pass||(Ы)л||passive||салылған ‘(been) put’|
|Rec||(Ы)с||reciprocal||ылғазып ‘crying together’|
Many endoclitic particles are written as separate words in Khakas orthography, though many of them show regular phonetic alternations, and some of them are used as enclitics.
|Q||па, пе, ма, ме, ба, бе||general question particle||парған ма? ‘did (he) go?’|
|qpart||чи||question particle||а тігілер чи? ‘and they?’|
|Foc||ТЫр||focus particles||адың кемдір? ‘what’s your name?’|
|Magn||reduplication of the 1st syllable + п||high degree, superlative||тап-тадылығ ‘very tasty’|
|Emph||за, зе, нооза, нізе, and others||emphatic particle||ылғапча нізе ‘cries indeed’|
|Confpart||ізе||confirmative particle||“ізе” тіпче ‘says «yes»’|
|Indef||ТА, тА||indefinite pronoun particle||хайдағ-да / хайдағ-та ‘some’|
|Ass||ОК||associative||парохтар ‘they are (there) too’|
|Cont||ЛА||continuative||хырарлача ‘reddens all the time’|
|Add||ТАА||additive particle||мин дее ‘even me’|
|Prec||ТАК||precative particle (polite request in some dialects)||пирдек ‘give, please’|
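The Magn row above describes the intensive form as reduplication of the first syllable plus п. The following minimal Python sketch is a hedged illustration of that description only: the function name `intensive` and the vowel list are assumptions, "first syllable" is simplified to "everything up to and including the first vowel", and forms in the corpus may deviate from this pattern.

```python
# Vowel letters assumed for this illustration (not an official inventory).
VOWELS = set("аеиіоӧуӱыэ")


def intensive(word):
    """Prefix a word with its initial consonant(s) and first vowel plus 'п'."""
    for index, letter in enumerate(word):
        if letter.lower() in VOWELS:
            # Copy the word up to the first vowel, add п and a hyphen.
            return word[: index + 1] + "п-" + word
    raise ValueError("no vowel found in " + repr(word))


print(intensive("тадылығ"))
```

Applied to тадылығ ‘tasty’, the sketch reproduces the example from the table, тап-тадылығ ‘very tasty’.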
The “Gloss” field lets you submit search queries that concern the morphemic structure of word forms. In general, this type of search is functionally similar to the search in the “Grammar” field. In particular, the list of markers that can be viewed by clicking the button next to the “Gloss” field largely overlaps with the list given for the “Grammar” field.
The general principle of search by a gloss and the major differences of this type of search from the grammatical search are described in the “Help” section (the button with a question mark in the top right corner of the search window). The key features of the gloss-based search that are specific for this corpus are given below.
All dialectal markers carry the label .dial, both dialectal variants of a morpheme (Acc.dial) and markers that are not used in Literary Khakas (Prosp.dial).
Gloss-based search does not include word forms where there is no morpheme boundary between the marker in question and the stem. For instance, the dative case form of the pronoun син ‘you’ is сегее / сағаа / сее, which cannot be segmented into morphemes and is glossed “you.DAT”. This form will be among the hits of the grammatical query “dat”, but will not be included in the results of the gloss-based search “STEM-DAT”.
The gloss-based query can be constructed using either specific glosses or the options CASE, CASE1, POSS, PRTCP, CONV, PERSON. These labels specify a group of morphemes rather than a specific gloss. CASE stands for any case marker, CASE1 stands for any case marker in inner position, POSS stands for any possessive marker, PRTCP stands for any participle marker, CONV stands for any converb marker, and PERSON stands for any person marker.
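How group labels differ from specific glosses can be sketched in a few lines of Python. This is an illustration of the idea only, not the tsakorpus implementation: the `GROUPS` mapping and the `gloss_matches` function are hypothetical, group membership is read off the gloss tables on this page, only three of the six labels are modeled, and the query is assumed to describe the whole word form slot by slot.

```python
# Hypothetical group labels; members are taken from the tables on this page.
GROUPS = {
    "CASE": {"acc", "gen", "dat", "loc", "all", "abl", "instr", "prol", "delib"},
    "CASE1": {"gen1", "loc1", "all1", "abl1"},
    "POSS": {"1pos.sg", "1pos.pl", "2pos.sg", "2pos.pl", "3pos"},
}


def gloss_matches(query, gloss):
    """Check a hyphen-separated gloss line against a query with group labels."""
    query_parts = query.split("-")
    gloss_parts = gloss.split("-")
    if len(query_parts) != len(gloss_parts):
        return False  # different number of morpheme slots
    for wanted, actual in zip(query_parts, gloss_parts):
        if wanted == "STEM":
            continue  # any stem gloss is accepted in this slot
        if wanted in GROUPS:
            if actual.lower() not in GROUPS[wanted]:
                return False
        elif wanted.lower() != actual.lower():
            return False
    return True


print(gloss_matches("STEM-POSS-CASE", "home-3pos-Loc"))  # segmentable form
print(gloss_matches("STEM-DAT", "you.DAT"))              # unsegmentable form
```

The second call shows why an unsegmentable form such as “you.DAT” is not found by the gloss query “STEM-DAT”: the gloss line has a single slot, so there is no separate marker for the query to match.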
The corpus contains:
- 23 texts in the Askiz dialect, collected in the village of Kazanovka in 2001-2002 during an expedition of the Linguistics Department of the Russian State University for the Humanities, headed by Nina Sumbatova. This subcorpus contains about 13 000 tokens; its duration is 2 h 18 min.
- 27 texts in the Belty dialect, collected in 2011 in the villages of Butrachty, Chylany and Karagay by Anna Dybo and Elvira Kyrzhinakova. This subcorpus contains about 45 000 tokens; its duration is 9 h 22 min.
Texts in other dialects (Kacha, Kyzyl, Shor) will be added.
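The overall size of the corpus follows from the two subcorpus figures above. The short Python sketch below only adds up those figures; the dictionary layout and variable names are illustrative, and because the token counts are approximate, the totals are approximate as well.

```python
# Figures copied from the list above; token counts are approximate.
subcorpora = {
    "Askiz": {"texts": 23, "tokens": 13_000, "minutes": 2 * 60 + 18},
    "Belty": {"texts": 27, "tokens": 45_000, "minutes": 9 * 60 + 22},
}

total_texts = sum(part["texts"] for part in subcorpora.values())
total_tokens = sum(part["tokens"] for part in subcorpora.values())
total_minutes = sum(part["minutes"] for part in subcorpora.values())
hours, minutes = divmod(total_minutes, 60)

print(f"{total_texts} texts, about {total_tokens} tokens, {hours} h {minutes} min")
```

This gives 50 texts, about 58 000 tokens and 11 h 40 min of recordings in total.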