Module talk:Lang/data
Module:Lang/data is permanently protected from editing because it is a heavily used or highly visible module. Substantial changes should first be proposed and discussed here on this page. If the proposal is uncontroversial or has been discussed and is supported by consensus, editors may use {{edit template-protected}} to notify an administrator or template editor to make the requested edit.
This is the talk page for discussing improvements to the Lang/data module.
Archives: 1. Auto-archiving period: 3 months.
This module does not require a rating on Wikipedia's content assessment scale.
Edit request 24 March 2025
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Description of suggested change: Add support for additional proto-languages, under their family's ISO 639-5 codes:
- Proto-Kartvelian: ccs
- Proto-Uralic: urj
I ran into the need to tag these languages while performing language cleanup in Laryngeal theory. I'm certain their articles would benefit from proper tagging, as well.
Diff:
  ["ca-x-old"] = "Old Catalan",
+ ["ccs-x-proto"] = "Proto-Kartvelian", -- ccs in IANA is Kartvelian languages
  ["cel-x-combrit"] = "Common Brittonic", -- cel in IANA is Celtic languages

  ["sla-x-proto"] = "Proto-Slavic", -- sla in IANA is Slavic languages
+ ["urj-x-proto"] = "Proto-Uralic", -- urj in IANA is Uralic languages
  ["yuf-x-hav"] = "Havasupai", -- IANA name for these three is Havasupai-Walapai-Yavapai
EnronEvolved My Talk Page 22:32, 24 March 2025 (UTC)
- {{lang|fn=name_from_tag|link=yes|ccs-x-proto}} → Proto-Kartvelian
- {{lang|fn=name_from_tag|link=yes|urj-x-proto}} → Proto-Uralic
- —Trappist the monk (talk) 22:57, 24 March 2025 (UTC)
Edit request 24 March 2025
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Description of suggested change: Add a language code for a couple more proto-languages, also using their groups' ISO codes:
- Proto-Finno-Ugric: fiu
- Proto-Samic: smi
I hear Proto-Finno-Ugric is a debatable proto-language these days, but I'm running into the need to tag it in Laryngeal theory.
Diff:
  ["egy-x-old"] = "Old Egyptian",
+ ["fiu-x-proto"] = "Proto-Finno-Ugric", -- fiu in IANA is Finno-Ugric languages
  ["gem-x-proto"] = "Proto-Germanic", -- gem in IANA is Germanic languages

  ["sem-x-taymanit"] = "Taymanitic",
+ ["smi-x-proto"] = "Proto-Samic", -- smi in IANA is Samic languages
  ["sla-x-proto"] = "Proto-Slavic", -- sla in IANA is Slavic languages
EnronEvolved My Talk Page 23:32, 24 March 2025 (UTC)
- {{lang|fn=name_from_tag|link=yes|fiu-x-proto}} → Proto-Finno-Ugric
- {{lang|fn=name_from_tag|link=yes|smi-x-proto}} → Proto-Samic
- —Trappist the monk (talk) 00:15, 25 March 2025 (UTC)
@Trappist the monk: I am curious what you think of the Belarusian Latin alphabet AKA "łacinka". The IANA language-subtag-registry for BCP47 does not seem to say much in this regard. For "be", I could only find variants "be-1959acad" and "be-tarask" and that "Cyrl" script should be suppressed with "be" (but not "Latn"). Since some Belarusian seems to actually be/have been originally written in "łacinka" (vs. transliterated for readers of Latn scripted languages), is this better as a variant via something like "be-łacinka" (I am not sure that technically qualifies due to the "ł") or a romanization via something like "be-Latn-łacinka"? And should "łacinka" be added here as a transliteration addition to translit_title_table? What is the best way to mark up such text: with {{lang|be-Latn-łacinka|...}} or {{translit|be|łacinka|...}} or something else? Thank you, —Uzume (talk) 18:23, 31 March 2025 (UTC)
- From the point of view of Module:Lang, latn script is latn script regardless of alphabet so the general case is {{lang|be-latn|łacinka text}} or {{langx|be-latn|łacinka text}}. When the text is a łacinka-alphabetic romanization of Cyrillic Belarusian, you can use {{transl|be|łacinka}}. So far as I know, łacinka is not a 'romanization standard' so is not supported by {{transl}}.
- We do not create variants like 1959acad and tarask because they must first be registered with IANA (there is no external standard from which variant subtags are derived).
- If it is important to do so, you might consider creating a separate template like {{lang-sr-Latn}} which hard-codes the language label to link as [[Gaj's Latin alphabet|Serbian]]. I don't think that easter-egging the language label is a good idea so the practice should be discouraged.
- Łacinka is a latn script so should be simply marked up as a latn script.
- Did I answer your question?
- —Trappist the monk (talk) 22:33, 31 March 2025 (UTC)
- @Trappist the monk: Yes, pretty much. You seem to be advocating for {{lang|be-Latn|łacinka text}} and {{langx|be-Latn|łacinka text}} and perhaps something like be-Latn-latsinka (where latsinka is BGN/PCGN for лацінка or łacinka) if and when such a beast gets registered with IANA in much the same way as zh-Latn-pinyin is, although pinyin seems to also be a romanization here as well. The only downside I see is that there is no real way to differentiate between {{langx|be|лацінка}} (Belarusian: лацінка) and {{langx|be-Latn|łacinka}} (Belarusian: łacinka) except for the fact that the latter is Latin script and thus gets automatically italicized. —Uzume (talk) 03:12, 1 April 2025 (UTC)
  - More-or-less, though "advocating" is a bit strong. The purpose of Module:Lang is to provide correct html markup for non-English text in compliance with MOS:FOREIGN. Writing {{langx|be|лацінка}} and {{langx|be-Latn|łacinka}} does that. If ever IANA adopts a latsinka variant subtag, Module:lang will support it.
  - —Trappist the monk (talk) 13:17, 1 April 2025 (UTC)
Edit request 13 April 2025
It is requested that an edit be made to the template-protected module at Module:Lang/data.
Description of suggested change:
Diff:
− ["fr-ca"] =
+ ["fr-ca"] = "Canadian French",
Introduced in this diff. Northern Moonlight 05:56, 13 April 2025 (UTC)
- See also Module_talk:Lang/data/Archive_1#Edit_request_8_January_2025. To address that request for consensus, let me propose that it is pretty self-evident that Quebec French is distinct from Canadian French (whether you call it a subset or a variant), as those articles amply describe. And Canadian French is expressible only as fr-CA in the schema used here. Is there an argument against this change based on a principle that eludes me? I have no objection to a separate question of whether fr-quebec (or something like that) ought to also exist, possibly along with other regional variants. But right now we have the problem that, for instance, Canadian French terms are being indicated as being specifically Quebec French, in error. TheFeds 08:13, 13 April 2025 (UTC)
- Pinging Trappist the monk. Firefangledfeathers (talk / contribs) 16:26, 17 April 2025 (UTC)
- According to this search, there are about 70 articles that use {{lang}} (~60) / {{langx}} (~10) with fr-CA (also, ~6 templates). If we make this change, someone with sufficient language skills (that person is not me) must go through those articles and make sure that all instances of {{lang(x)|fr-CA|...}} correctly identify the labeled dialect. Because Module:Lang does not have a mechanism to distinguish Québécois from generic Canadian French, we must invent one; perhaps fr-x-quebec → Quebec French.
- Volunteers to make sure that the existing {{lang(x)|fr-CA|...}} templates are correctly applied or replaced with {{lang(x)|fr-x-quebec|...}}?
- —Trappist the monk (talk) 17:05, 17 April 2025 (UTC)
- To probe a little further before selecting a tag, the infobox at Quebec French suggests fr-u-sd-caqc as an IETF tag (added in this edit), though it seems it is not one that happens to correlate directly with ISO 639 & ISO 3166-1 alpha-2. Instead it seems to be using the RFC 6067 extension defined fully in Unicode Technical Standard #35, such that u means use the Unicode extensions, sd means use a geographic subdivision, ca is a semi-redundant way to encode the region information (meaning the same as ISO 3166-1 alpha-2 CA), and qc means the subdivision of Quebec.
- Conversely, in fr-x-quebec, x is for private use, with quebec being the private use information (i.e. the string that English Wikipedia chooses to use to represent the place where Quebec French is spoken).
- For the purposes of this module, how do we feel about either implementing a Unicode extension (u), a private use extension (x), or neither? It looks like Module:Lang/data currently implements a few private use codes and no Unicode codes. TheFeds 19:34, 19 April 2025 (UTC)
  - I sometimes think of supporting the unicode locale extension for subdivisions. The necessary reference data are available at github. But, do we really need such precision? There are 5400+ defined subdivisions. I would venture to guess that almost none of them are actually required for en.wiki to provide correct html markup for non-English text and to provide appropriate labeling and tooltips for readers. For those languages that do have specific regional needs, like Québécois, private-use tags (with the x singleton) should be sufficient.
  - I suppose that we could support a very limited subset of the u-sd-xxxx subdivisions on an as-needed basis if it is deemed sufficiently important to do so.
  - —Trappist the monk (talk) 22:07, 19 April 2025 (UTC)
- I'm not really too concerned one way or another about which ought to be preferred (fr-x-quebec vs. fr-u-sd-caqc), but wanted to consider the workflow of an editor attempting to use the {{lang}} and {{langx}} templates, whereby they might consult the mainspace article for guidance as to which tag to use, and find it doesn't work. We could amend the documentation for those templates to indicate that the Unicode extension is not presently supported, and that a private use tag corresponding to the ones at this module page ought to be used instead. Or, we could support some but not all, case-by-case as described. Or we could support them all, but that leads to the question whether a consensus exists to recommend one format or the other when there are now multiple ways of expressing the same concept (e.g. fr-CA = fr-u-sd-ca). Does any one alternative stand out as most elegant and workable? TheFeds 23:05, 20 April 2025 (UTC)
  - Presently there are 69 private-use tags known to Module:Lang. Most of those appear to refer to archaic (if that's the right word) languages. Some of them don't (lmo-x-berg → Bergamasque, lmo-x-cremish → Cremish, lmo-x-milanese → Milanese; there may be others in that list). Of those three, two have unicode IETF tags in their article infoboxen: Bergamasque: lmo-u-sd-itbg and Milanese: lmo-u-sd-itmi. For Cremish, its unicode tag is likely: lmo-u-sd-itcr.
  - This search suggests that there are about 140 articles that mention a unicode IETF tag. At a quick glance, most of those are for geographically specific living languages though I did find one (gem-u-sd-ua43 → Crimean Gothic) which is probably not a living language. There may be others; I didn't look closely.
  - On the other hand, this search finds about 1130 articles that use lang templates with private-use tags which suggests that editors are not too confused. But these are mostly used for dead languages so a unicode IETF tag is less likely to appear in a language article infobox (except for gem-u-sd-ua43 and perhaps others).
  - I guess all of this suggests to me that if we are to adopt unicode IETF tags (as needed), they should be used for living languages only and only for those that are tied to a specific geographical area within the bounds of the larger area specified by the first two characters of the subdivision subtag (it in itbg). For non-living languages, private-use tags should be used.
  - —Trappist the monk (talk) 13:19, 21 April 2025 (UTC)
- Once we make the switch, I can go through the articles manually. Northern Moonlight 01:25, 22 April 2025 (UTC)
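The difference between the two candidate tags discussed in this thread (fr-x-quebec and fr-u-sd-caqc) can be made concrete with a small sketch. The Lua below is not Module:Lang's code: split_tag and describe_tag are hypothetical helper names, and the parsing is a deliberately simplified reading of BCP 47 that recognizes only the language, script, and region subtags, the u-sd- subdivision key, and the x private-use singleton. It also assumes that the subdivision value begins with a two-letter region code.

```lua
-- Hypothetical helpers for illustration only; not part of Module:Lang.

-- Split a tag on hyphens and lowercase it (tags are case-insensitive).
local function split_tag(tag)
    local subtags = {}
    for subtag in tag:lower():gmatch("[^-]+") do
        subtags[#subtags + 1] = subtag
    end
    return subtags
end

-- Pick out the parts of a tag that matter in the discussion above.
-- Simplified: variants and other extensions are skipped, not validated.
local function describe_tag(tag)
    local subtags = split_tag(tag)
    local result = { language = subtags[1] }
    local i = 2
    while i <= #subtags do
        local s = subtags[i]
        if s == "x" then
            -- Everything after the 'x' singleton is private-use.
            result.private = table.concat(subtags, "-", i + 1)
            break
        elseif s == "u" and subtags[i + 1] == "sd" and subtags[i + 2] then
            -- Unicode locale extension, subdivision key.
            -- Assumption: value is a 2-letter region code plus a suffix.
            local sd = subtags[i + 2]
            result.sd_region = sd:sub(1, 2)
            result.sd_subdivision = sd:sub(3)
            i = i + 3
        elseif #s == 4 and s:match("^%a+$") then
            result.script = s      -- e.g. latn
            i = i + 1
        elseif #s == 2 and s:match("^%a+$") then
            result.region = s      -- e.g. ca
            i = i + 1
        else
            i = i + 1              -- variant or unrecognized subtag
        end
    end
    return result
end
```

Under these assumptions, describe_tag("fr-u-sd-caqc") reports sd_region "ca" and sd_subdivision "qc", so the region is recoverable from the tag itself, whereas describe_tag("fr-x-quebec") reports only the opaque private string "quebec", whose meaning has to come from a table such as the one in Module:Lang/data.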