This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Output

Discrepancies detected:

Template:langname-lite

  • Code: EL.. Saw name: Latin. Expected name: ලතින්.
  • Code: LL.. Saw name: Latin. Expected name: ලතින්.
  • Code: ML.. Saw name: Latin. Expected name: ලතින්.
  • Code: VL.. Saw name: Latin. Expected name: ලතින්.
  • Code: abs. Saw name: Ambonese මැලේ. Expected name: Ambonese Malay.
  • Code: acw. Saw name: Hijazi අරාබි. Expected name: Hijazi Arabic.
  • Code: acy. Saw name: Cypriot අරාබි. Expected name: Cypriot Arabic.
  • Code: aeb. Saw name: Tunisian අරාබි. Expected name: Tunisian Arabic.
  • Code: afb. Saw name: Gulf අරාබි. Expected name: Gulf Arabic.
  • Code: ajp. Saw name: South Levantine අරාබි. Expected name: South Levantine Arabic.
  • Code: ang. Saw name: Old ඉංග්‍රීසි. Expected name: Old English.
  • Code: apc. Saw name: North Levantine අරාබි. Expected name: North Levantine Arabic.
  • Code: ary. Saw name: Moroccan අරාබි. Expected name: Moroccan Arabic.
  • Code: arz. Saw name: Egyptian අරාබි. Expected name: Egyptian Arabic.
  • Code: ayl. Saw name: Libyan අරාබි. Expected name: Libyan Arabic.
  • Code: cdo. Saw name: Eastern Min. Expected name: Min Dong.
  • Code: cmn-ear. Saw name: Mandarin. Expected name: මැන්ඩරීන්.
  • Code: cy. Saw name: Welsh. Expected name: වේල්ස.
  • Code: dra-okn. Saw name: Old කන්නඩ. Expected name: Old Kannada.
  • Code: dum. Saw name: Middle ඕලන්ද. Expected name: Middle Dutch.
  • Code: enm. Saw name: Middle ඉංග්‍රීසි. Expected name: Middle English.
  • Code: fr-CA. Saw name: French. Expected name: ප්‍රංශ.
  • Code: frm. Saw name: Middle ප්‍රංශ. Expected name: Middle French.
  • Code: fro. Saw name: Old ප්‍රංශ. Expected name: Old French.
  • Code: gkm. Saw name: Ancient Greek. Expected name: පුරාතන ග්‍රීක.
  • Code: gkm-med. Saw name: Ancient Greek. Expected name: පුරාතන ග්‍රීක.
  • Code: gmh. Saw name: Middle High ජර්මානු. Expected name: Middle High German.
  • Code: gml. Saw name: Middle Low ජර්මානු. Expected name: Middle Low German.
  • Code: gmq-mno. Saw name: Middle නෝර්වීජියානු. Expected name: Middle Norwegian.
  • Code: gmq-oda. Saw name: Old ඩෙන්මාර්ක. Expected name: Old Danish.
  • Code: gmq-osw. Saw name: Old ස්වීඩන්. Expected name: Old Swedish.
  • Code: gmw-ecg. Saw name: East Central ජර්මානු. Expected name: East Central German.
  • Code: gmw-jdt. Saw name: Jersey ඕලන්ද. Expected name: Jersey Dutch.
  • Code: gmy. Saw name: Mycenaean ග්‍රීක. Expected name: Mycenaean Greek.
  • Code: goh. Saw name: Old High ජර්මානු. Expected name: Old High German.
  • Code: grk-mar. Saw name: Mariupol ග්‍රීක. Expected name: Mariupol Greek.
  • Code: gsw. Saw name: Alemannic ජර්මානු. Expected name: Alemannic German.
  • Code: idb. Saw name: Indo-පෘතුගීසි. Expected name: Indo-Portuguese.
  • Code: inc-ash. Saw name: Ashokan ප්‍රාකෘත. Expected name: Ashokan Prakrit.
  • Code: itc-ola. Saw name: Latin. Expected name: ලතින්.
  • Code: kaw. Saw name: Old ජාවා. Expected name: Old Javanese.
  • Code: kxd. Saw name: Brunei මැලේ. Expected name: Brunei Malay.
  • Code: la-ecc. Saw name: Latin. Expected name: ලතින්.
  • Code: la-lat. Saw name: Latin. Expected name: ලතින්.
  • Code: la-med. Saw name: Latin. Expected name: ලතින්.
  • Code: la-vul. Saw name: Latin. Expected name: ලතින්.
  • Code: ltc. Saw name: Middle චීන. Expected name: Middle Chinese.
  • Code: meo. Saw name: Kedah මැලේ. Expected name: Kedah Malay.
  • Code: mga. Saw name: Middle අයිරිෂ්. Expected name: Middle Irish.
  • Code: ms-cla. Saw name: Malay. Expected name: මැලේ.
  • Code: ms-old. Saw name: Malay. Expected name: මැලේ.
  • Code: nds. Saw name: Low ජර්මානු. Expected name: Low German.
  • Code: nds-de. Saw name: German Low ජර්මානු. Expected name: German Low German.
  • Code: nod. Saw name: Northern තායි. Expected name: Northern Thai.
  • Code: obr. Saw name: Old බුරුම. Expected name: Old Burmese.
  • Code: och. Saw name: Old චීන. Expected name: Old Chinese.
  • Code: odt. Saw name: Old ඕලන්ද. Expected name: Old Dutch.
  • Code: oge. Saw name: Old ජෝර්ජියානු. Expected name: Old Georgian.
  • Code: ohu. Saw name: Old හංගේරියානු. Expected name: Old Hungarian.
  • Code: ojp. Saw name: Old ජපන්. Expected name: Old Japanese.
  • Code: okm. Saw name: Middle කොරියානු. Expected name: Middle Korean.
  • Code: oko. Saw name: Old කොරියානු. Expected name: Old Korean.
  • Code: osp. Saw name: Old ස්පාඤ්ඤ. Expected name: Old Spanish.
  • Code: ota. Saw name: Ottoman තුර්කි. Expected name: Ottoman Turkish.
  • Code: pal. Saw name: Middle පර්සියානු. Expected name: Middle Persian.
  • Code: pdc. Saw name: Pennsylvania ජර්මානු. Expected name: Pennsylvania German.
  • Code: peo. Saw name: Old පර්සියානු. Expected name: Old Persian.
  • Code: rmg. Saw name: Traveller නෝර්වීජියානු. Expected name: Traveller Norwegian.
  • Code: roa-opt. Saw name: Old Galician-පෘතුගීසි. Expected name: Old Galician-Portuguese.
  • Code: ruo. Saw name: Istro-රුමේනියානු. Expected name: Istro-Romanian.
  • Code: ruq. Saw name: Megleno-රුමේනියානු. Expected name: Megleno-Romanian.
  • Code: sa-ved. Saw name: Sanskrit. Expected name: සංස්කෘත.
  • Code: sga. Saw name: Old අයිරිෂ්. Expected name: Old Irish.
  • Code: sit-pro. Saw name: Proto-Sino-ටිබෙට්. Expected name: Proto-Sino-Tibetan.
  • Code: sou. Saw name: Southern තායි. Expected name: Southern Thai.
  • Code: tbq-lob-pro. Saw name: Proto-Lolo-බුරුම. Expected name: Proto-Lolo-Burmese.
  • Code: trk-oat. Saw name: Old Anatolian තුර්කි. Expected name: Old Anatolian Turkish.
  • Code: xaa. Saw name: Andalusian අරාබි. Expected name: Andalusian Arabic.
  • Code: xcl. Saw name: Old ආමේනියානු. Expected name: Old Armenian.
  • Code: zlw-ocs. Saw name: Old චෙක්. Expected name: Old Czech.
  • Code: zlw-opl. Saw name: Old පෝලන්ත. Expected name: Old Polish.

Module:etymology languages/data

  • Literary Chinese භාෂාව (lzh-lit) has a canonical name that is not unique; it is also used by the code lzh.
  • The data key preprocess_links for ??? (th-new) is invalid.

Module:families/canonical names

  • The code ira-mid and the canonical name Middle Iranian should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name Old Iranian should be removed; they are not found in Module:families/data.

Module:families/code to canonical name

  • The code ira-mid and the canonical name Middle Iranian should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name Old Iranian should be removed; they are not found in Module:families/data.

Module:families/data

Module:languages/canonical names

  • The canonical name Min Dong (cdo) is missing.
  • Eastern Min, the canonical name for the code cdo, is wrong; it should be Min Dong.
  • The canonical name Puxian (cpx) is missing.
  • Puxian Min, the canonical name for the code cpx, is wrong; it should be Puxian.
  • Central Min, the canonical name for the code czo, is wrong; it should be Min Zhong.
  • The canonical name Min Zhong (czo) is missing.
  • The canonical name Khanty (kca) is missing.
  • The canonical name Tasmanian (xtz) is missing.

Module:languages/code to canonical name

  • Eastern Min, the canonical name for the code cdo, is wrong; it should be Min Dong.
  • Puxian Min, the canonical name for the code cpx, is wrong; it should be Puxian.
  • Central Min, the canonical name for the code czo, is wrong; it should be Min Zhong.
  • The code kca (Khanty) is missing.
  • The code xtz (Tasmanian) is missing.

Module:languages/data/2

Module:languages/data/3/b

  • Panyi Bai, the canonical name for bfc, is repeated in the table of otherNames.

Module:languages/data/3/m/extra

Module:languages/data/3/s/extra

Module:scripts/data

  • Blissymbols script (Blis) is not used by any language and has no characters listed for auto-detection.
  • Cypro-Minoan script (Cpmn) is not used by any language.
  • හිරගනා script (Hira) is not used by any language.
  • Kana script (Hrkt) is not used by any language.
  • Image-rendered script (Imag) is not used by any language and has no characters listed for auto-detection.
  • International Phonetic Alphabet script (Ipach) is not used by any language and has no characters listed for auto-detection.
  • Moon script (Moon) is not used by any language and has no characters listed for auto-detection.
  • Morse code (Morse) is not used by any language and has no characters listed for auto-detection.
  • Musical notation script (Music) is not used by any language.
  • Unspecified script (None) is not used by any language and has no characters listed for auto-detection.
  • Ol Onal script (Onao) is not used by any language and has no characters listed for auto-detection.
  • Rongorongo script (Roro) is not used by any language and has no characters listed for auto-detection.
  • Rumi numerals script (Rumin) is not used by any language.
  • flag semaphore (Semap) is not used by any language and has no characters listed for auto-detection.
  • Visible Speech script (Visp) is not used by any language and has no characters listed for auto-detection.
  • mathematical notation script (Zmth) is not used by any language.
  • symbol script (Zsym) is not used by any language.
  • undetermined script (Zyyy) is not used by any language and has no characters listed for auto-detection.
  • uncoded script (Zzzz) is not used by any language and has no characters listed for auto-detection.
  • The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, tt-Arab, ota-Arab, mzn-Arab, sd-Arab and ku-Arab are currently alias codes. Only one code should be used in the data.
  • The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.
  • The data key sort_by_scraping for ජපන් script (Jpan) is invalid.

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Each Wikidata item ID must be a positive integer or a string starting with Q and ending with decimal digits.
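
The checks on Wikidata item IDs and on the list of other names can be sketched as follows. This is a minimal Python illustration, not the module's own Lua code: the function names are hypothetical, the data is modelled as plain Python values, and the ID rule is read strictly as "Q followed only by decimal digits", which is an assumption.

```python
import re


def is_valid_wikidata_id(value):
    """Accept a positive integer or a string such as "Q123" (assumed strict form)."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject it
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False


def check_other_names(canonical_name, other_names):
    """Return a list of problems found in an otherNames value."""
    if not isinstance(other_names, list):
        return ["otherNames is not an array"]
    problems = []
    seen = set()
    for name in other_names:
        if name == canonical_name:
            problems.append(f"{name}, the canonical name, is repeated in otherNames")
        if name in seen:
            problems.append(f"{name} appears more than once in otherNames")
        seen.add(name)
    return problems
```

Returning a list of problems rather than raising on the first one mirrors the way the output above reports every discrepancy for a module at once.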

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
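
A few of the per-language checks above (permitted data keys, the shape of entry_name and sort_key, and the override_translit and link_tr rules) can be sketched like this. It is a hedged Python illustration with hypothetical function names and message wording; a language's Lua data table is modelled as a dict, and the real module may structure these checks differently.

```python
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}


def check_replacements(field, value):
    """Check a table that holds from/to arrays, remove_diacritics, or both."""
    if not isinstance(value, dict):
        return [f"{field} must be a table"]
    problems = []
    has_arrays = isinstance(value.get("from"), list) and isinstance(value.get("to"), list)
    has_remove = isinstance(value.get("remove_diacritics"), str)
    if not (has_arrays or has_remove):
        problems.append(f"{field} needs from and to arrays or remove_diacritics")
    if has_arrays and len(value["from"]) < len(value["to"]):
        problems.append(f"{field}.from is shorter than {field}.to")
    return problems


def check_language(data):
    """Return the problems found in one language's data table."""
    problems = [f"invalid data key: {key}" for key in data if key not in ALLOWED_KEYS]
    if "entry_name" in data:
        problems += check_replacements("entry_name", data["entry_name"])
    sort_key = data.get("sort_key")
    if sort_key is not None and not isinstance(sort_key, str):
        # A string sort_key names a module and is not inspected here.
        problems += check_replacements("sort_key", sort_key)
    if data.get("override_translit") and not data.get("translit"):
        problems.append("override_translit is set but translit is not")
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append("link_tr must be true")
    return problems
```

For example, a table with an unknown key such as preprocess_links and an entry_name whose from array has one item but whose to array has two yields one message for each problem.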

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here because module errors will quickly appear in entries when the conditions are not met, provided that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
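
The requirement that the lookup table contain exactly the codes and canonical names from the data submodules amounts to a two-way comparison. The sketch below is an assumption-laden Python illustration: the function name is hypothetical, both tables are modelled as dicts, and the message wording is only loosely modelled on the output shown earlier.

```python
def check_code_to_name(languages, code_to_name):
    """Compare a code-to-canonical-name lookup table against the language data."""
    problems = []
    for code, data in languages.items():
        expected = data[1]  # field 1 holds the canonical name
        seen = code_to_name.get(code)
        if seen is None:
            problems.append(f"The code {code} ({expected}) is missing.")
        elif seen != expected:
            problems.append(
                f"{seen}, the canonical name for the code {code}, is wrong; "
                f"it should be {expected}."
            )
    for code, name in code_to_name.items():
        if code not in languages:
            problems.append(
                f"The code {code} and the canonical name {name} should be removed."
            )
    return problems
```

The first loop catches missing or stale entries in the lookup table; the second catches entries that no longer correspond to any language, which is the "and no more" part of the requirement.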

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • Have a valid family code in family, if family is given.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
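
The rule that every family must have at least one language or subfamily belonging to it can be sketched by collecting every family code that something points to. This is a minimal Python illustration with a hypothetical function name and made-up codes; it assumes a language names its family in field 3 or in family, and a subfamily names its parent in family.

```python
def find_empty_families(families, languages):
    """Return the family codes that no language or subfamily belongs to."""
    used = set()
    for data in languages.values():
        family = data.get(3) or data.get("family")
        if family:
            used.add(family)
    for data in families.values():
        parent = data.get("family")
        if parent:
            used.add(parent)
    return [code for code in families if code not in used]


# Illustrative data only; these codes are invented for the example.
families = {
    "fam-a": {"canonicalName": "A"},
    "fam-b": {"canonicalName": "B", "family": "fam-a"},
    "fam-c": {"canonicalName": "C", "family": "fam-a"},
}
languages = {"xx": {1: "Example", 3: "fam-b"}}
print(find_empty_families(families, languages))  # ['fam-c']
```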

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
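
The two script checks that account for most of the Module:scripts/data output above (use by at least one language, and presence of a characters pattern) can be sketched as follows. This is a hedged Python illustration: the function name is hypothetical, the sketch assumes languages list their scripts under a scripts key, and it does not test whether characters is a valid Lua pattern.

```python
def check_scripts(scripts, languages):
    """Report scripts that are unused or lack a characters pattern."""
    used = set()
    for data in languages.values():
        used.update(data.get("scripts") or [])
    problems = []
    for code, data in scripts.items():
        complaints = []
        if code not in used:
            complaints.append("is not used by any language")
        if not data.get("characters"):
            complaints.append("has no characters listed for auto-detection")
        if complaints:
            name = data.get("canonicalName", "???")
            problems.append(f"{name} ({code}) " + " and ".join(complaints) + ".")
    return problems
```

Joining the complaints for one script into a single sentence keeps the report to one line per script, as in the output listed earlier.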
