Feature Request/Idea: Sanitize languages controlled vocabulary values #8243

@jeromeroucou

Overview of the Feature Request

To improve the language list as a controlled vocabulary, and to make it possible to expose each language with an identifier later on, we propose adding the ISO 639-3 code to each value as an alternate value.

Before making a pull request, we would like to have feedback from you on our proposal.

Please note that "Bihari" has no ISO 639-3 code; it has only an ISO 639-2 / 639-5 code.

A modified data migration script will be required.

Below are the proposed values:

    language    Abkhaz        0    abk
    language    Afar        1    aar
    language    Afrikaans        2    afr
    language    Akan        3    aka
    language    Albanian        4    sqi
    language    Amharic        5    amh
    language    Arabic        6    ara
    language    Aragonese        7    arg    
    language    Armenian        8    hye    
    language    Assamese        9    asm    
    language    Avaric        10    ava    
    language    Avestan        11    ave    
    language    Aymara        12    aym    
    language    Azerbaijani        13    aze    
    language    Bambara        14    bam    
    language    Bashkir        15    bak    
    language    Basque        16    eus    
    language    Belarusian        17    bel    
    language    Bengali, Bangla        18    ben    
    language    Bihari        19    bih 
    language    Bislama        20    bis    
    language    Bosnian        21    bos    
    language    Breton        22    bre    
    language    Bulgarian        23    bul    
    language    Burmese        24    mya    
    language    Catalan, Valencian        25    cat    
    language    Chamorro        26    cha    
    language    Chechen        27    che    
    language    Chichewa, Chewa, Nyanja        28    nya    
    language    Chinese        29    zho    
    language    Church Slavic, Slavonic        30    chu
    language    Chuvash        31    chv
    language    Cornish        32    cor    
    language    Corsican        33    cos    
    language    Cree        34    cre    
    language    Croatian        35    hrv    
    language    Czech        36    ces    
    language    Danish        37    dan    
    language    Divehi, Dhivehi, Maldivian        38    div    
    language    Dutch        39    nld    
    language    Dzongkha        40    dzo    
    language    English        41    eng    
    language    Esperanto        42    epo    
    language    Estonian        43    est    
    language    Ewe        44    ewe    
    language    Faroese        45    fao    
    language    Fijian        46    fij    
    language    Finnish        47    fin    
    language    French        48    fra    
    language    Fula, Fulah        49    ful        
    language    Galician        50    glg    
    language    Ganda        51    lug    
    language    Georgian        52    kat    
    language    German        53    deu    
    language    Greek (modern)        54    ell        
    language    Guarani        55    grn        
    language    Gujarati        56    guj    
    language    Haitian, Haitian Creole        57    hat    
    language    Hausa        58    hau    
    language    Hebrew (modern)        59    heb        
    language    Herero        60    her    
    language    Hindi        61    hin    
    language    Hiri Motu        62    hmo    
    language    Hungarian        63    hun    
    language    Icelandic        64    isl    
    language    Ido        65    ido    
    language    Igbo        66    ibo    
    language    Indonesian        67    ind    
    language    Interlingua        68    ina    
    language    Interlingue        69    ile    
    language    Inuktitut        70    iku    
    language    Inupiaq        71    ipk    
    language    Irish        72    gle    
    language    Italian        73    ita    
    language    Japanese        74    jpn    
    language    Javanese        75    jav    
    language    Kalaallisut, Greenlandic        76    kal        
    language    Kannada        77    kan    
    language    Kanuri        78    kau    
    language    Kashmiri        79    kas    
    language    Kazakh        80    kaz    
    language    Khmer        81    khm    
    language    Kikuyu, Gikuyu        82    kik    
    language    Kinyarwanda        83    kin    
    language    Kirghiz, Kyrgyz        84    kir        
    language    Komi        85    kom    
    language    Kongo        86    kon    
    language    Korean        87    kor    
    language    Kurdish        88    kur    
    language    Kwanyama, Kuanyama        89    kua    
    language    Lao        90    lao    
    language    Latin        91    lat    
    language    Latvian        92    lav    
    language    Limburgish, Limburgan, Limburger        93    lim    
    language    Lingala        94    lin    
    language    Lithuanian        95    lit    
    language    Luba-Katanga        96    lub    
    language    Luxembourgish, Letzeburgesch        97    ltz    
    language    Macedonian        98    mkd    
    language    Malagasy        99    mlg    
    language    Malay (Standard)        100    zsm        
    language    Malay (Central)        101    pse
    language    Malayalam        102    mal    
    language    Maltese        103    mlt    
    language    Manx        104    glv    
    language    Maori        105    mri        
    language    Marathi        106    mar    
    language    Marshallese        107    mah    
    language    Mixtepec Mixtec        108    mix    
    language    Mongolian        109    mon    
    language    Nauru        110    nau    
    language    Navajo, Navaho        111    nav    
    language    Ndonga        112    ndo    
    language    Nepali (macrolanguage)        113    nep    
    language    North Ndebele        114    nde        
    language    Northern Sami        115    sme    
    language    Norwegian        116    nor    
    language    Norwegian Bokmål        117    nob    
    language    Norwegian Nynorsk        118    nno    
    language    Nuosu, Sichuan Yi        119    iii        
    language    Occitan        120    oci    
    language    Ojibwe, Ojibwa        121    oji    
    language    Oriya        122    ori        
    language    Oromo        123    orm    
    language    Ossetian, Ossetic        124    oss    
    language    Pali        125    pli        
    language    Panjabi, Punjabi        126    pan    
    language    Pashto, Pushto        127    pus        
    language    Persian (Farsi)        128    fas        
    language    Polish        129    pol    
    language    Portuguese        130    por    
    language    Pular        131    fuf
    language    Pulaar        132    fuc
    language    Quechua        133    que    
    language    Romanian        134    ron    
    language    Romansh        135    roh    
    language    Rundi, Kirundi        136    run
    language    Russian        137    rus    
    language    Samoan        138    smo    
    language    Sango        139    sag    
    language    Sanskrit        140    san    
    language    Sardinian        141    srd    
    language    Scottish Gaelic, Gaelic        142    gla    
    language    Serbian        143    srp    
    language    Shona        144    sna    
    language    Sindhi        145    snd    
    language    Sinhala, Sinhalese        146    sin    
    language    Slovak        147    slk    
    language    Slovenian        148    slv        
    language    Somali        149    som    
    language    South Ndebele        150    nbl        
    language    Southern Sotho        151    sot    
    language    Spanish, Castilian        152    spa    
    language    Sundanese        153    sun    
    language    Swahili (macrolanguage)        154    swa        
    language    Swati        155    ssw    
    language    Swedish        156    swe    
    language    Tagalog        157    tgl    
    language    Tahitian        158    tah    
    language    Tajik        159    tgk    
    language    Tamil        160    tam    
    language    Tatar        161    tat    
    language    Telugu        162    tel    
    language    Thai        163    tha    
    language    Tibetan Standard, Tibetan, Central        164    bod    
    language    Tigrinya        165    tir    
    language    Tonga (Tonga Islands)        166    ton    
    language    Tsonga        167    tso    
    language    Tswana        168    tsn    
    language    Turkish        169    tur    
    language    Turkmen        170    tuk    
    language    Twi        171    twi    
    language    Ukrainian        172    ukr    
    language    Urdu        173    urd    
    language    Uyghur        174    uig    
    language    Uzbek        175    uzb    
    language    Venda        176    ven    
    language    Vietnamese        177    vie    
    language    Volapük        178    vol    
    language    Walloon        179    wln    
    language    Welsh        180    cym    
    language    Western Frisian        181    fry    
    language    Wolof        182    wol    
    language    Xhosa        183    xho    
    language    Yiddish        184    yid    
    language    Yoruba        185    yor    
    language    Zhuang, Chuang        186    zha    
    language    Zulu        187    zul    
    language    Not applicable        188        
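
As a rough illustration of how the proposed list could be checked before a pull request, the sketch below parses a few of the rows above and builds a code-to-name lookup. It assumes the column layout shown here (field name, value, display order, optional alternate value), separated by tabs or runs of spaces; the function names `parse_rows` and `build_code_index` are hypothetical and are not part of Dataverse.

```python
import re

# A few rows copied from the proposal above; the full list would be read the same way.
SAMPLE_ROWS = """\
    language    Abkhaz        0    abk
    language    Bengali, Bangla        18    ben
    language    Bihari        19    bih
    language    Malay (Standard)        100    zsm
    language    Not applicable        188
"""

# Assumption: every alternate value is a three-letter lowercase code.
CODE_PATTERN = re.compile(r"[a-z]{3}")


def parse_rows(text):
    """Yield (name, display_order, code) tuples; code is None when absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Columns are separated by tabs or by runs of two or more spaces,
        # so single spaces inside a name ("Bengali, Bangla") are preserved.
        fields = re.split(r"\t+|\s{2,}", line)
        if fields[0] != "language":
            raise ValueError(f"unexpected field name in row: {line!r}")
        name, order = fields[1], int(fields[2])
        code = fields[3] if len(fields) > 3 else None
        yield name, order, code


def build_code_index(rows):
    """Map each code to its language name, rejecting malformed or duplicate codes."""
    index = {}
    for name, _order, code in rows:
        if code is None:
            continue  # e.g. "Not applicable" carries no code
        if not CODE_PATTERN.fullmatch(code):
            raise ValueError(f"malformed code {code!r} for {name!r}")
        if code in index:
            raise ValueError(f"duplicate code {code!r}: {index[code]!r} and {name!r}")
        index[code] = name
    return index


rows = list(parse_rows(SAMPLE_ROWS))
index = build_code_index(rows)
```

A check like this only confirms that the codes are well formed and unique within the list. It does not verify them against the ISO 639-3 registry, and "bih" passes even though it is an ISO 639-2 / 639-5 code.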

What kind of user is the feature intended for?
API User, Curator, Depositor, and Guest

What inspired the request?
Archive requirements for language metadata

What existing behavior do you want changed?
Make the language list more compliant with the ISO 639 standard

Any brand new behavior do you want to add to Dataverse?
None

Any related open or closed issues to this feature request?
Pull request #7690
