Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

March 2024

A way to more easily connect with readers

I have seen this idea thrown around some and I have had it myself - what if we had some official social media accounts where we can respond to readers, give polls, etc., that admins have access to? In theory readers can interact with us here i.e. at the Information Desk etc., but I think that process can be a little obtuse for the average person, and for some even intimidating. I also know that various Wikimedia projects have their own accounts on various platforms. Also, since it's an open project, the more input the better, theoretically. Vininn126 (talk) 08:45, 1 March 2024 (UTC)[reply]

I see you beat me to it. I have personally had a need for this on numerous occasions, finding Reddit and Twitter posts that (often in the form of a joke/curiosity) showed serious errors in our dictionary. A recent example is diff. A way to interact with the people that bring such mistakes to our attention is in my opinion very important.
I think the easiest way to maintain this is to simply create a couple of accounts and share their login information in the Admin channel on Discord, which automatically gives all admins that have joined the Discord server the means to manage these accounts. Then afterward we can post various polls and/or announcements after a consensus with the community, while also having the ability to quickly respond to feedback. Thadh (talk) 09:26, 1 March 2024 (UTC)[reply]
@Thadh I agree, but we should probably message other admins and send it to their emails potentially, so as not to have a barrier to get access. Vininn126 (talk) 09:29, 1 March 2024 (UTC)[reply]
I would prefer we do that based on requests. Many admins are not very active and I don't want the mail to go to some years-long unchecked inbox. Thadh (talk) 09:42, 1 March 2024 (UTC)[reply]
Some platforms we should consider: Reddit, Twitter (X), Facebook. Any other suggestions? Vininn126 (talk) 09:58, 1 March 2024 (UTC)[reply]
Only fans? Allahverdi Verdizade (talk) 10:19, 1 March 2024 (UTC)[reply]
You wish. I ain't doing a body reveal that easily. Vininn126 (talk) 10:23, 1 March 2024 (UTC)[reply]
I hate the idea that we are, in effect, endorsing/legitimizing and making more attractive these intrusive systems, but we are effectively forced into it by user preference for them. DCDuring (talk) 13:54, 1 March 2024 (UTC)[reply]
@DCDuring One thing we could do is also promote how to engage with the site more directly. By creating accounts on these sites with wider reach, we can bridge the gap for readers who are scared to edit and also show them how to start discussions, etc., potentially increasing editorship. Vininn126 (talk) 13:57, 1 March 2024 (UTC)[reply]
@DCDuring Alternatively, we could see it as engaging with the reality that users will not always come to the site directly in order to raise issues. Ignoring that isn't going to help anyone. Theknightwho (talk) 14:13, 1 March 2024 (UTC)[reply]
It's certainly not a bad idea. But, in the interest of protecting myself from "personalization", I waste my time at MW projects, not the commercial sites. DCDuring (talk) 14:50, 1 March 2024 (UTC)[reply]
@DCDuring What do you mean by personalization? Kiril kovachev (talk · contribs) 18:24, 2 March 2024 (UTC)[reply]
Generating content tailored to me, certainly including advertising, possibly already including or soon to include price discrimination. DCDuring (talk) 20:36, 2 March 2024 (UTC)[reply]
Mastodon. CitationsFreak (talk) 16:33, 1 March 2024 (UTC)[reply]
I second this. Allahverdi Verdizade (talk) 20:53, 2 March 2024 (UTC)[reply]
VK may be a good idea to attract any potential editors and/or readers from Russia. Thadh (talk) 20:45, 10 March 2024 (UTC)[reply]
I support this idea, but would it only be admins? I might like to have access as well. Ioaxxere (talk) 20:43, 1 March 2024 (UTC)[reply]
@Ioaxxere I think without a rigorous way to add users, it might devolve to a free-for-all. Also at the beginning, I think only truly trusted users should be given access. Perhaps there'd be a process for adding trusted users in the future. Vininn126 (talk) 20:46, 1 March 2024 (UTC)[reply]
We had an unofficial Twitter account created by WF. Only admins can see it, but there's some information at the deleted page for User:Wikt Twitterer (I believe they created it when they were using that account, but kept the Twitter account going after the Wiktionary account was blocked). Chuck Entz (talk) 23:56, 1 March 2024 (UTC)[reply]
Well considering no one has objected I think we can probably move forward. The question is what email to use when signing up for these accounts and what username to use. Vininn126 (talk) 10:49, 8 March 2024 (UTC)[reply]
We should probably create one to go with these accounts. Best not to advertise its address anywhere on-wiki though. Thadh (talk) 17:18, 8 March 2024 (UTC)[reply]
Use a Wikimedia address. Don't go third-party like Gmail etc. Equinox 17:25, 8 March 2024 (UTC)[reply]
That's a good point. Vininn126 (talk) 17:26, 8 March 2024 (UTC)[reply]
@Equinox How might we do that? Vininn126 (talk) 17:30, 8 March 2024 (UTC)[reply]
I assume we tell Wikimedia that we want to make a social media account. CitationsFreak (talk) 20:22, 8 March 2024 (UTC)[reply]
@Vininn126: Search me. When I said "we should stick with open tech like IRC" people laughed at me, and Discord won. But some day it will die, or become "Utility cloud computing" with a bill attached. It's smarter to keep free and open. You do it if you can. I'm not there anyway, being a transphobic nazi etc. Equinox 07:38, 29 March 2024 (UTC)[reply]

Bengali language

I intended to create a phonelist for Bengali. Is there anyone who can guide me through bot stuff? Arundhatisgupta (talk) 18:24, 1 March 2024 (UTC)[reply]

@Arundhatisgupta What do you mean by "phonelist"? What sort of bot work are you trying to do? (Keep in mind if you plan to do page edits using a bot, you need to get permission to do so.) Benwing2 (talk) 23:45, 5 March 2024 (UTC)[reply]

Restricting {{m}} in etymology sections

Wiktionary's etymology sections are not very machine-readable, and the main issue is the {{m}} template, which can be used in a wide variety of ways:

  • Origin within a language: A {{glossary|respelling}} of {{m|en|puisne}} (in puny)
  • Listing alternative forms of an etymon: From {{inh|en|enm|hed}}, {{m|enm|heed}}, {{m|enm|heved}}, {{m|enm|heaved}} (in head)
  • Listing related terms: More at {{m|en|Tyr}}, {{m|en|day}}. (in Tuesday)
  • Listing unrelated terms: Not related to {{m|en|Romanian}} or {{m|en|Roman}}. (in Rom)

I propose that {{m}} be used only for unrelated terms and that we create new templates for the other three cases. Ioaxxere (talk) 20:41, 1 March 2024 (UTC)[reply]
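
To make the machine-readability concern concrete, here is a minimal Python sketch of what a naive parser sees in the examples above. The regular expression and the helper names (`list_templates`, `mentions`) are illustrative assumptions, not part of any existing Wiktionary tooling, and the pattern only handles simple, non-nested templates.

```python
import re

# Matches simple, non-nested templates such as {{m|en|puisne}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def list_templates(wikitext):
    """Return (name, positional_args) for each simple template in wikitext."""
    found = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip()
        # Drop named parameters such as nocat=1; keep positional ones.
        args = [a for a in match.group(2).split("|") if "=" not in a]
        found.append((name, args))
    return found


def mentions(wikitext):
    """Return the terms wrapped in {{m|lang|term}} in wikitext."""
    return [
        args[1]
        for name, args in list_templates(wikitext)
        if name == "m" and len(args) >= 2
    ]


# Etymology snippets quoted in the proposal above.
examples = {
    "puny": "A {{glossary|respelling}} of {{m|en|puisne}}",
    "Tuesday": "More at {{m|en|Tyr}}, {{m|en|day}}.",
    "Rom": "Not related to {{m|en|Romanian}} or {{m|en|Roman}}.",
}

for title, text in examples.items():
    print(title, mentions(text))
```

In all three cases the parser gets back a bare list of mentioned terms. Whether a term is a source, a related term, or an explicitly unrelated one is only stated in the surrounding prose ("respelling of", "More at", "Not related to"), which is the information the proposal wants the template itself to carry.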

In the case of "More at...", that should be {{l}} anyway, since it refers to the entry and not the term. Theknightwho (talk) 21:29, 1 March 2024 (UTC)[reply]
Just in terms of the formatting produced, I dislike the use of {{l}} when used inline in other running text: {{m}} produces italicized text, which is visually distinct from the rest of the text.
For that matter, I understood the "l" in {{l}} to stand for "list", as this template was originally intended to only be used in lists of terms, where formatting to distinguish from other running text isn't needed. ‑‑ Eiríkr Útlendi │Tala við mig 05:09, 10 March 2024 (UTC)[reply]
That is an unrealistically lofty ideal. {{m}} has many other uses and you simply cannot sequester them all into separate templates. — SURJECTION / T / C / L / 21:36, 1 March 2024 (UTC)[reply]
I wouldn't mind that it no longer be used to italicize (often mis-italicize) taxonomic names. But there are no restrictions on its use at present, it has mostly been used for formatting, and there is no incentive for users to limit use. It seems particularly hard to imagine that we could get users to comply with different rules based on the L3/4/5/6 header they were editing in. Our filters are already getting intrusive and unhelpful. DCDuring (talk) 22:50, 1 March 2024 (UTC)[reply]
This doesn't seem like a good idea... Thadh (talk) 22:53, 1 March 2024 (UTC)[reply]
Being able to mention other terms is very, very useful. Vininn126 (talk) 22:56, 1 March 2024 (UTC)[reply]
Yeah, I don't think we need a proliferation of different templates. It will just make coding much harder. As it is, it's already difficult to learn how to use templates like {{en-verb}}, {{inflection of}}, and {{Module:quote|call_quote_template}}. — Sgconlaw (talk) 22:59, 1 March 2024 (UTC)[reply]
Oppose. I spend a lot of time fixing etymologies where someone copied the entire etymology from an entry in another language without changing the language codes. If people routinely get that wrong, they're not going to have a clue about the subtleties and intricacies proposed. You'll end up with people copying from one entry where they make sense to another where they're all wrong- or worse, partly wrong. Unlike with language codes, there's no reliable way to tell if they're being misused without knowing something about the etymology (if there were, you wouldn't need them in the first place). Basically, this proposal would give editors more ways to be wrong. Chuck Entz (talk) 23:35, 1 March 2024 (UTC)[reply]
Also, I don't know if we would want to do this in the name of machine-readability. Wiktionary is not really a database, so if we wanted it to be machine-readable, it would have had to have been one to begin with. Maybe Wikidata could handle this kind of thing instead. Kiril kovachev (talk · contribs) 00:50, 2 March 2024 (UTC)[reply]
@Kiril kovachev I think it's a balance - we need to be machine-readable to some extent, since some users rely on that to collate info from a wide array of entries (and it also helps our bots), but I'd agree that the templates suggested here would be a step too far, as I don't really see what advantage they'd provide. Theknightwho (talk) 18:16, 2 March 2024 (UTC)[reply]
That's true, I agree with your point here. It's nice to have a clearly-defined {{inh|en|grc|...}} kind of thing, but there're also ways in which our etymology section can be virtually free-form and forcing it to be more machine-readable would kill that flexibility. Such as the ways we may be using {{m}} right now. Kiril kovachev (talk · contribs) 18:23, 2 March 2024 (UTC)[reply]
It doesn't seem very useful to me. Are there plans to have machines doing something with our etymology sections anytime soon? At some point far enough in the future, improvements in machine comprehension of natural language might make it easier for machines to understand what humans write, rather than forcing humans to adjust how they write so machines can understand it. I think there are a lot of aspects of writing etymologies that are difficult to boil down to a fixed set of templates, so I'm not enthusiastic about us engaging in that project unless there's some real benefit we can point to. Simple etymologies already use templates, so this proposal seems to deal with a tail of complicated etymologies (do you know what percentage do contain {{m}})?--Urszag (talk) 01:06, 2 March 2024 (UTC)[reply]
This seems to be a good attitude/strategy for such matters in general. DCDuring (talk) 17:52, 2 March 2024 (UTC)[reply]
Oppose. Makes things harder, would make me inclined to not do etymologies if it's such a pain, even if I know the word origin. If we must restrict codes in this way, introduce a nice GUI that creates the code from our menu choices or something. Equinox 18:27, 2 March 2024 (UTC)[reply]
Honestly, an extension of the New Entry Creator that can easily add etymologies and quotes would be nice... CitationsFreak (talk) 06:02, 3 March 2024 (UTC)[reply]
The lack of a template for this purpose has irked me in the past. We have nice templates for when a word comes from another language, but not for when it comes from another word in the same language (unless by some well-known process such as affixation).
Recently I wanted to generate a list of all English terms which are said to derive from another English term, but where that term's entry doesn't include the term derived from it in a "Derived terms" section. Such a list would help fill gaps in our "Derived terms" sections, but it's all but impossible to generate a comprehensive list like this with the current setup.
I would definitely support an effort to designate a template specifically for same-language derivations. It seems it would be possible to use {{af}} for this purpose with minor modifications to its code, and probably a new name too:
A {{glossary|respelling}} of {{af|en|puisne}}. Not related to {{m|en|some other term}}.
The other uses of {{m}} can be dealt with in other ways. This, that and the other (talk) 03:21, 4 March 2024 (UTC)[reply]
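
As a rough illustration of the list described above, the following Python sketch assumes the suggested convention were already in place, i.e. that a single-term {{af|en|X}} in an etymology marks derivation from the English term X. The `pages` dictionary is invented sample data standing in for a dump of entry wikitext, and the helper names are hypothetical; a real run would also need to handle the many other ways derived terms are listed.

```python
import re

# Assumed convention: {{af|en|X}} with a single term marks derivation from X.
SOURCE_RE = re.compile(r"\{\{af\|en\|([^{}|]+)\}\}")
# Derived terms are assumed to be listed with {{l|en|...}} in this sketch.
LINK_RE = re.compile(r"\{\{l\|en\|([^{}|]+)\}\}")


def derived_terms_listed(wikitext):
    """Collect {{l|en|...}} targets found under a 'Derived terms' header."""
    listed = set()
    in_section = False
    for line in wikitext.splitlines():
        if line.startswith("="):
            # Any header opens or closes the section we care about.
            in_section = line.strip("= ").lower() == "derived terms"
        elif in_section:
            listed.update(LINK_RE.findall(line))
    return listed


def missing_derived_terms(pages):
    """Return sorted (source, derived) pairs where source's entry omits derived."""
    missing = []
    for title, text in pages.items():
        for source in SOURCE_RE.findall(text):
            if source not in pages:
                continue  # source entry is not in the sample; skipped here
            if title not in derived_terms_listed(pages[source]):
                missing.append((source, title))
    return sorted(missing)


# Invented sample data, not real entry text.
pages = {
    "puny": "===Etymology===\nA respelling of {{af|en|puisne}}.",
    "puisne": "===Etymology===\n(etymology text)\n====Derived terms====\n* {{l|en|puisne judge}}",
}

print(missing_derived_terms(pages))
```

With today's markup the first step is the hard part: a bare {{m|en|puisne}} cannot be told apart from a mention of an unrelated term, so the set of source terms cannot be extracted reliably.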
@This, that and the other what do you think it should be called? For now, we could create a template which redirects to {{affix}}. In the future, I think {{af}} should be adapted into a generic "internal derivation" template. I think @Benwing2 is on the same page on this. Ioaxxere (talk) 17:30, 4 March 2024 (UTC)[reply]
I realized that {{from}} didn't exist—I think that's a good name. Another idea is to be able to adapt {{der}} to allow a faster way of writing {{der|en|en|term|nocat=1}}. Ioaxxere (talk) 18:58, 4 March 2024 (UTC)[reply]
Yes, I recall seeing a discussion about broadening the use of, and renaming, {{af}} in the past (not sure where or with whom).
{{from}} is an excellent name - good find. This, that and the other (talk) 00:08, 5 March 2024 (UTC)[reply]
Sounds like a skill issue on the part of the machines. Nicodene (talk) 11:13, 7 March 2024 (UTC)[reply]

deprecate Template:1

I have renamed this to {{cap}} and deprecated it per the discussion in WT:RFM, but User:Equinox reverted the deprecation claiming it will save them keystrokes. I would like to see what people think about keeping this deprecated. I don't see how two keystrokes makes much of a difference, and {{1}} is just about the worst alias imaginable. If keystroke savings is really a big deal, we could use something like {{ca}} or {{cp}}, both of which are currently undefined. Benwing2 (talk) 02:02, 3 March 2024 (UTC)[reply]

Looking at the RFD, I see that {{M}} was suggested as a new name for this template by User:This, that and the other. We should just switch the template's name to that. Saves the same amount of keystrokes as {{1}}, better alias. (Plus, there was no real consensus to deprecate it in the first place.) CitationsFreak (talk) 06:24, 3 March 2024 (UTC)[reply]
Equinox's argument is weak - "cap" falls under the fingers nicely (on a QWERTY keyboard at least) and is just as typeable as "1". As for alternative names, {{M}} (for "majuscule") is just okay, because of the existence of lowercase {{m}}. The other obvious single-letter shortcuts ({{C}} for "capital" and {{U}} for "uppercase") are already taken. Another alternative would be {{^}}, implying "raising" the first letter to uppercase. This, that and the other (talk) 07:04, 3 March 2024 (UTC)[reply]
@This, that and the other @CitationsFreak {{U}} is hardly used so we could easily repurpose it. Also how about {{uc}}? Benwing2 (talk) 08:07, 3 March 2024 (UTC)[reply]
Please don't deprecate, rename, etc. An uppercase template name, like {{M}}, is a bit worse than a lowercase one, like {{1}}. I, for one, appreciate any keystroke savings for my arthritic joints. DCDuring (talk) 13:29, 4 March 2024 (UTC)[reply]
I have known this template for only a few weeks, since editors always preferred bare links. The issue here is that it looks homographic to the non-italic linking template {{l}}, whereas we don’t want confusables. For this purpose anything cV seems to be bad already, looking like {{cat}}, {{c}} and {{C}}, the parameter |nocap=, and {{caps}} and {{cx}} and what not. I suppose Benwing2 wants to clean up {{caps}} too, though, since this is only used in about 200 entries having {{he-root}}.
Intuitively I propose {{up}} and {{high}} since the letters are close together, and high, on the keyboard. And {{}} which is Shift + AltGr + U on my standard xkeyboard-config layout, I’d actually use that, it looking exactly as much better than {{1}} as needed, Abloh’s 3% rule or something. Not seen DCDuring using the template, but the same concern can be valid for other editors and a rename can make it better. Fay Freak (talk) 14:29, 4 March 2024 (UTC)[reply]
I don't see the point of deprecating this template if multiple editors use it to save keystrokes. However I think we should be automatically subst-ing every instance of {{1}} and {{cap}} for the sake of readability. Ioaxxere (talk) 17:36, 4 March 2024 (UTC)[reply]
In the software industry, "deprecation" usually gives you a long time to deal with something. For example, Microsoft deprecated WebClient (a class used to perform Internet downloads), but it continues to work for many years. Also, there is usually a genuine stated rationale by which the replacement is better, not just a programmer's whim. You can joke "it's not a big deal", but it is longer to type cap than 1 (especially if you create thousands of entries, like I do) and there's also muscle memory, which is really important for older people: please understand this, even if you are young: it's ableism. In this case, it costs us literally nothing to retain the 1 page as a redirect, which makes the template work fine. Removing and breaking the redirect can be nothing but either (i) punishing "old dogs" who can't learn "new tricks", or (ii) a fascist march ahead that supports developers but not users who create the project. Equinox 03:43, 5 March 2024 (UTC)[reply]
@Benwing2: Some years ago, we had a very aggressive template editor who upset many people by placing his/her software design decisions over user needs. Please don't be that person again. There are democratic discussion tools to allow you to work it out without turning off things that really matter to me, as a person who creates hundreds of entries per month and never fucks with a template. Equinox 03:47, 5 March 2024 (UTC)[reply]
@Equinox Would {{L}} work as a compromise? It looks sufficiently different that I don't find it confusable, and it's only one extra keystroke. I don't like {{1}} because it looks almost the same as {{l}} in the code. Theknightwho (talk) 19:13, 5 March 2024 (UTC)[reply]
L is better than nothing. But I really don't see why it's killing anyone to retain the working redirect. Equinox 19:18, 5 March 2024 (UTC)[reply]
@Theknightwho @Equinox I was thinking of repurposing {{u}}. Not even a shift key extra and it's barely used; {{U}} can be used for user mentions if anyone cares. Please note that this change is not coming out of the blue; the discussions over getting rid of {{1}} have been going on for years, most recently in WT:RFM. I'm also not sure how useful or helpful it is to accuse me of being selfish, fascist, ableist and ageist, and IMO it's definitely not helpful to demand that no template be removed once it's created (or maintained over a several-year deprecation process, which is tantamount to the same thing). Benwing2 (talk) 22:53, 5 March 2024 (UTC)[reply]
@User:Benwing2 You should have said that at the start, to be honest. I feel that mentioning that would be more productive for you, since there is the same amount of joint movement in typing both {{1}} and {{u}}, so there could be no argument based on that. CitationsFreak (talk) 23:31, 5 March 2024 (UTC)[reply]
I'm getting to the point where I just use wikitext for everything I input. If the wikitext "required" for "proper formatting" is too hard, then I get the words right and leave something that doesn't necessarily conform to WT:ELE or whatever other norms we have for cleanup by others, who seem to like that kind of thing. If the next step is to filter such input, I'm out of here. DCDuring (talk) 01:07, 6 March 2024 (UTC)[reply]
@DCDuring I don’t have any strong views on the issue raised by this thread, but this attitude isn’t fair on other users, because you’re just creating clean-up work for others. The idea of using link templates outside of definition lines isn’t new, and it’s not complicated. Theknightwho (talk) 17:40, 7 March 2024 (UTC)[reply]
@User:Theknightwho Just more keystrokes and more learning overhead. I find it hard enough to try to make and keep taxonomic and related entries useful and to correct other users' mistaken and omitted uses of {{taxlink}}, {{vern}}, and now {{taxfmt}}. I don't undertake any non-morphological etymologies, instead inserting {{rfe}} (and getting complaints about that), because that's just more learning overhead, easily forgotten. I'm sure I get lots of descendants items wrong too. DCDuring (talk) 18:43, 7 March 2024 (UTC)[reply]
@Theknightwho, Benwing2: I’m with User:Ioaxxere above: Why not just automatically subst every instance of {{1}} (perhaps by bot), making this problem vanish? This template, whatever its name, is convenient for editors adding content but bad for readability; subst-ing would keep the convenience while resolving the problems of having such a template hang around in the code. — Vorziblix (talk · contribs) 02:57, 6 March 2024 (UTC)[reply]
@Vorziblix It's not possible to automatically do this except by periodically running a bot script. We only have a few things that currently run by periodic bot scripts, and AFAIK they are all triggered manually (by me, or in the case of {{t+}}, by User:Ruakh, although I don't know whether this still runs); in general I am reluctant to add more esp. to mainspace pages because they cause surprise for editors and are a maintenance burden. Also, for long words at least, it might be worse to have it duplicated in capitalized and lowercase forms than to have a (properly-named) template that wraps a single instance of the word. Benwing2 (talk) 03:04, 6 March 2024 (UTC)[reply]
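
For reference, the rewrite such a periodic bot pass would perform can be sketched in a few lines of Python. This assumes that {{1|word}} (or {{cap|word}}) expands to a link to the lowercase entry displayed with an initial capital, i.e. [[word|Word]]; the function names are hypothetical, and only single-argument calls are touched.

```python
import re

# Single-argument calls only; the lookarounds skip {{{1|...}}} parameters.
CAP_RE = re.compile(r"(?<!\{)\{\{(?:1|cap)\|([^{}|]+)\}\}(?!\})")


def capitalize_first(text):
    """Uppercase the first character and leave the rest unchanged."""
    return text[:1].upper() + text[1:]


def subst_cap(wikitext):
    """Replace {{1|word}} / {{cap|word}} with the assumed output [[word|Word]]."""
    def expand(match):
        term = match.group(1)
        return "[[{}|{}]]".format(term, capitalize_first(term))
    return CAP_RE.sub(expand, wikitext)


print(subst_cap("# {{1|absurd}} behaviour; see {{l|en|absurdity}}."))
```

The output also shows the duplication mentioned above: after substitution the word appears twice, once in lowercase as the link target and once capitalized as the display text.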
Why aren't you 'reluctant' to do things that add more keystrokes? Is it because you aren't the one doing those keystrokes? Or do you think that our content is so good that all we have to do is pretty the dictionary up and let AI fill in the gaps? DCDuring (talk) 13:19, 6 March 2024 (UTC)[reply]
Needlessly snarky. Vininn126 (talk) 13:28, 6 March 2024 (UTC)[reply]
@DCDuring: Look, we can’t imagine well how it is to have arthritis and have to balance the concerns of joints and eyes of everyone, stop being so combative. Depending on the position of the keys, one or two keystrokes more may go easier for you than even one: if they are in a close area and if they are in the upper mid; 1 is at a corner and {{1}} strains the eyes of people with impeded and good eyesight in view of {{l}}. That’s why I have these three suggestions here, we might take two of: {{up}}, {{high}}, {{}}. I actually think a lot about keyboard layouts, the curly brackets are at the keys for 8 and 9 for me and for US standard <AD11> and <AD12> (the two right of O and P) and so these will be typed on one hand easily. Fay Freak (talk) 13:36, 6 March 2024 (UTC)[reply]
That's not snarky. I'm really concerned about attitude.
My eyesight isn't very good either. I've weighed the difference to me.
So, I'm just supposed to roll over? I haven't objected to the {{subst}} idea.
Why don't we have a thoroughgoing consideration of keystroke minimization. Why not use {{i}} for initial capitalization, instead of wasting it as a redirect to {{qualifier}}, when {{q}} also redirects thereto? DCDuring (talk) 15:45, 6 March 2024 (UTC)[reply]
In general the concerns of easy input and easy readability for future editors both have to be considered when naming templates. There are ways for editors to configure their own machines to make entry easier, e.g. I think an AutoHotkey script could be used on Windows to convert {{1| to {{cap|, or do anything similar like this, on the user's end.--Urszag (talk) 02:52, 7 March 2024 (UTC)[reply]
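
The user-side conversion mentioned here is a simple text rewrite. The comment suggests AutoHotkey; the same idea is sketched below in Python as a filter an editor might run over wikitext before saving. The function name and defaults are illustrative assumptions, not an existing tool.

```python
import re


def normalize_alias(wikitext, old="1", new="cap"):
    """Rewrite {{old|...}} template calls to {{new|...}}.

    The lookbehind leaves triple-brace parameters such as {{{1|...}}} alone.
    """
    pattern = r"(?<!\{)\{\{\s*" + re.escape(old) + r"\s*\|"
    return re.sub(pattern, "{{" + new + "|", wikitext)


print(normalize_alias("{{1|word}} and {{l|en|word}}"))
```

Because the pattern requires the alias to be followed directly by a vertical bar, other templates whose names merely start with the same character are not affected.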
On the one hand I support the principle of things actually making sense, and nothing about the abbreviation "1" does. On the other hand it seems fairly harmless, and if it really is saving Equinox so much trouble, why not? Nicodene (talk) 11:10, 7 March 2024 (UTC)[reply]
Mehhh. I agree it's an unintelligible name (and therefore proposed at RFM that we make the 'main' name something more intelligible), but redirects are cheap and I don't see harm in leaving {{1}} as a redirect. Some prolific editors are clearly used to using it. (In any given couple of months, we have one or two entries which use {{altcaps}} and thus just display a redlink, because I or someone else has been unable to recall what the new name for that is.) I admit {{1}} is a particularly unintelligible name, though (unlike e.g. {{altcaps}}). - -sche (discuss) 06:04, 9 March 2024 (UTC)[reply]
Badly named redirects add cognitive burden to people trying to understand the wikicode, and redirects in general (esp. badly named ones) increase the tech debt; enough of them and the site becomes unmaintainable. This is why people like me and User:Theknightwho who put time into maintaining the site (rather than just using it) push back against having random redirects littering the site. I also still don't know why User:Equinox as well as User:DCDuring (who doesn't even use the alias) are so attached to this particular alias when I have proposed a more sensible redirect {{u}} that is the same number of keystrokes. (Not to mention that using any template requires 5-6 keystrokes due to the left brace and vertical bar, so I have a hard time buying the argument that a single extra shift key makes a huge difference. I should also add, Equinox accused me of ageism and ableism knowing almost nothing about me -- I am in fact older than him and have suffered my own spate of hand-related disability.) Benwing2 (talk) 06:18, 9 March 2024 (UTC)[reply]
Personally, I feel that most of the arguments that apply to {{1}} apply to {{u}} as well. Also, after enough uses of it, they will be using it in no time (like with the mandated use of "en" in the etym and quote fields). CitationsFreak (talk) 20:50, 9 March 2024 (UTC)[reply]
{{up}} is still clearer than {{u}} and easy enough to type. My tendency is always that single-letter templates are badly named if they might look like something else (e.g. usage templates, {{user}}) and as after all there are little more than twenty letters available. This is not strictly comparable to terminal commands either, where we use to have a -V synonym of a longer --word. The one-ASCII-character ones really need broad consensus, even unconscious one. I doubt that {{u}} for {{cap}} will have this habitation like {{m}} and {{l}} have. The difference is also that these, and {{q}}, have semantics, even if it only consists in wrapping a language other than the working one, that capitalization at the beginning of English glosses hasn’t. All rationalizations that I am uneasy about {{u}} and {{i}} for any purpose. So far I have only three one-letter template-codes I use and watch out for. Fay Freak (talk) 21:49, 9 March 2024 (UTC)[reply]
@Benwing2 "I don't know why users keep doing their user things, rather than the better thing that I, the programmer god, imposed upon them". I hope some day you will realise why users hate your guts. I have been here since 2008, LOOK AT MY TRACK RECORD, I am doing nothing to you, I am not hurting you, but YOU, BENWING, you are changing things, you are hurting me and making it hard for me to continue the free open source project. Don't you dare make me sound like the criminal here. Equinox 07:52, 29 March 2024 (UTC)[reply]
I do not "agree" with some changes made to quote-book etc by Benwing, but it doesn't ruin what I'm doing. I am glad there is someone out there boldly editing at that level, because that seems to me like a high-quality editor that can help out in complex code-related situations. I remember that I didn't agree with something that was going on related to Categories, too. But I'm kind of ambivalent most of the time on these things, and even if I "lose" an argument or whatever, ultimately it's not a big deal even if 100% of what I've ever done was just deleted. (I will mention that I recently (maybe Jan or Feb 2024?) started using the "{{u}}" in some citations that had underlined text in the original. I believe I found it in the code at a Wikipedia article and applied it here or something? I think Wiktionary should maintain code-level compatibility with Wikipedia unless Wiktionary would thereby lose some area of functionality.) Geographyinitiative (talk) 08:08, 29 March 2024 (UTC)[reply]
@Geographyinitiative I'm glad you don't have a problem. I do. Do you also hang around abortion clinics saying "I'm sorry you lost a baby but I didn't"? Come back when you are relevant. This really hurts my usability and makes it hard for me to keep up the famously huge productivity that I have here. Equinox 08:13, 29 March 2024 (UTC)[reply]

Report of the U4C Charter ratification and U4C Call for Candidates now available

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello all,

I am writing to you today with two important pieces of information. First, the report of the comments from the Universal Code of Conduct Coordinating Committee (U4C) Charter ratification is now available. Secondly, the call for candidates for the U4C is open now through April 1, 2024.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members are invited to submit their applications for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.

Per the charter, there are 16 seats on the U4C: eight community-at-large seats and eight regional seats to ensure the U4C represents the diversity of the movement.

Read more and submit your application on Meta-wiki.

On behalf of the UCoC project team,

RamzyM (WMF) 16:25, 5 March 2024 (UTC)[reply]

Module Breaker

User:Module Breaker should be blocked. Also, why can't I edit Wiktionary:Vandalism in progress? Avessa (talk) 15:06, 6 March 2024 (UTC)[reply]

@Avessa: Thanks. Because it is a vandalism target obviously. You can do some useful edits and then edit that page if needed. The bar to become autoconfirmed is low. Fay Freak (talk) 17:34, 7 March 2024 (UTC)[reply]

Unlink more and most in English headwords

For example:

common (comparative commoner or more common, superlative commonest or most common)

I don't see the point of having links to more and most in this kind of entry. In my view, having excessive links makes a page less visually appealing and could invite misclicks. Would anyone oppose removing these links? Ioaxxere (talk) 22:02, 6 March 2024 (UTC)[reply]

Not really seeing why unlinking the words is necessary. Maybe learners of English would find the links helpful. — Sgconlaw (talk) 11:49, 7 March 2024 (UTC)[reply]
Because it is easy to click on touchscreens. One would find it helpful only one time theoretically: and then would not understand the English definitions anyway; anyone who would find necessity to click them is at the wrong place with a monolingual dictionary. Nobody said it is necessary, it is about optimization. Fay Freak (talk) 13:35, 7 March 2024 (UTC)[reply]
For that matter why have links to commoner and commonest? (Alright, maybe commoner is a special case because of commoner#Noun.) DCDuring (talk) 16:58, 7 March 2024 (UTC)[reply]
Because the link is what’s left from WT:ACCEL, or any red link crosslinguistically when there is a comparable situation with periphrastic adjective gradation without the gadget, leaving a red link to invite creation. It would be too distressing to create a page and have no link then. The links are made not with random logics but impulses in mind. Fay Freak (talk) 17:31, 7 March 2024 (UTC)[reply]
I don't understand your last sentence. What "random logics" and what "impulses"? DCDuring (talk) 19:58, 7 March 2024 (UTC)[reply]
You find something that makes sense, harmonizes with some aesthetic equation. But we have to ponder what will be clicked, by the typical, impulse-driven behaviours of readers and editors. The contradiction against analogy logic (synthetic comparatives vs. periphrastic ones) would barely be felt. Fay Freak (talk) 20:06, 7 March 2024 (UTC)[reply]
I was about to close this but I've just realized that further and furthest are also linked. I would like to expand this proposal last-minute to cover those as well. @Sgconlaw, ExcarnateSojourner, does this change your opinions at all? Ioaxxere (talk) 21:48, 6 April 2024 (UTC)[reply]
@Ioaxxere: mmmm, not really. I think there's no harm in leaving them linked. — Sgconlaw (talk) 21:52, 6 April 2024 (UTC)[reply]
In that case this passes 2-1. Ioaxxere (talk) 06:38, 7 April 2024 (UTC)[reply]

Wikimedia Canada survey

Hi! Wikimedia Canada invites contributors living in Canada to take part in our 2024 Community Survey. The survey takes approximately five minutes to complete and closes on March 31, 2024. It is available in both French and English. To learn more, please visit the survey project page on Meta. Chelsea Chiovelli (WMCA) (talk) 00:23, 7 March 2024 (UTC)[reply]

Revoking autopatrolled status from Kwamikagami

Some background: @Kwamikagami has been autopatrolled since April 2009, and currently has just over 34,500 edits. They're sporadically active, but when they do edit they tend to make changes to large numbers of entries very quickly, and they tend to focus on single-character entries or anything relating to IPA.

Me, @Benwing2, Vininn126, AG202 and others have been pretty concerned about their sloppy editing for a few months now, and their autopatrolled status makes it much harder to spot. Some examples off the top of my head (but there are literally hundreds like this):

  1. [1]: mass-adding languages with tons of mistakes: the Khoekhoe entry uses the wrong language code throughout, Bodo (India) has the wrong L2 header, and Dogri (even today) still doesn't have a headword template.
  2. [2] Deciding to merge ң and ӈ with no consensus or discussion, despite the fact this caused a bunch of issues for several languages. They also did this for a bunch of analogous letters.
  3. [3] Adding a bunch of stenoscript entries like w—O or adm with merged part of speech headers and/or no headword templates. I can see why they've done this - to avoid repetition - but the obvious and sensible thing to do would have been to start a discussion, not create ~100 more entries with the same issue ([4]). We should not be giving '''{{PAGENAME}}''' on the headword line, and anyone with autopatrolled status should know that.
  4. [5] Adding definitions like "?". No request or attention template - just "?".
  5. [6] Even looking at their most recent contributions, they've wrongly given the pronunciation IPA(key): /ɪkˈsaɪ.ən, -ɒn/ at Ixion. This should be IPA(key): /ɪkˈsaɪ.ən/, /-ɒn/ or IPA(key): /ɪkˈsaɪ.ən/, /ɪkˈsaɪ.ɒn/.

All of this creates a massive clean-up job for everyone else, and Kwamikagami has repeatedly proclaimed that they don't understand the problem, which quite frankly means I don't think they should be autopatrolled anymore. Theknightwho (talk) 22:26, 8 March 2024 (UTC)[reply]

Support. I actually believe we should block Kwami for at least a month since they refuse to acknowledge the problematic nature of many of their edits and continue doing the same thing after warnings. But revoking autopatroller status is a good start. Benwing2 (talk) 22:30, 8 March 2024 (UTC)[reply]
Yeah, I decided not to bring up the repeated refusal to understand consensus, since it didn't seem relevant to this particular issue, but that's definitely a much worse issue.
The clean-up job of their contributions is going to be huge. Theknightwho (talk) 22:33, 8 March 2024 (UTC)[reply]
Support. Vininn126 (talk) 22:31, 8 March 2024 (UTC)[reply]
Support. There's also Category:Translingual entries with incorrect language header (not all of them are Kwami's, but too many are). There are good arguments for treating language-specific characters as either the language itself or translingual, but not both. Most of the entries in this category use translingual templates and language codes under other language headers.
The problem they have in general seems to be making snap decisions without thinking things through, then sticking with those bad decisions until forced to abandon them. They know more than I do on a lot of things, but they don't make very good use of that knowledge. As for the whole stenoscript issue: they did actually ask for advice at the time, so that may not be the best example. Chuck Entz (talk) 01:44, 9 March 2024 (UTC)[reply]
@Chuck Entz I'm not sure I agree with you re the stenoscript: regardless of when they asked for advice or what the response was, they've still created ~100 entries which are in a completely unacceptable state and will need to be cleaned up by someone. Even if they got no response at all, what they did was definitely not the right thing to do, and is the kind of thing that has got some new users banned. Theknightwho (talk) 02:07, 9 March 2024 (UTC)[reply]
Support + they need a block. AG202 (talk) 04:58, 9 March 2024 (UTC)[reply]
@Theknightwho I confess that I've also used this forbidden headword line on sum#Multiple parts of speech and tht#Multiple parts of speech. I agree that it would be better as a template, but I think "multiple parts of speech" should be allowed as a POS header. Ioaxxere (talk) 06:49, 9 March 2024 (UTC)[reply]
No, it shouldn't. — SURJECTION / T / C / L / 09:53, 9 March 2024 (UTC)[reply]
It looks super objectionable. With little necessity, since at least with {{head}} you can just use |catN=. You might stretch WT:POS a bit by letting a part of speech header be followed by another part of speech header and then only the headword line, which probably contradicts basic publication logics of not having empty headers but would at least look better. For such alternative forms, to save vertical space, we could introduce headers like Pronoun · Adjective (i.e. in my example separated by middle dots). Years ago it was considered whether it would be better to have templates instead of headings, as on other Wiktionaries, only dismissed for Lua memory restrictions, to make appearance centrally manipulatable. Fay Freak (talk) 18:09, 9 March 2024 (UTC)[reply]
I don't think this would be widely accepted. It messes with categorization and not having a "head" template violates our practices. I would change it to how entries like obvi & unfort are. AG202 (talk) 20:04, 10 March 2024 (UTC)[reply]
@AG202 those entries are sensible, since each abbreviated word corresponds with a single part of speech. Compare tht, which would need to have four or five identical POS sections. Clearly a dedicated template would be preferable eventually, although for now I don't think a few missing categories are the end of the world. Ioaxxere (talk) 23:37, 10 March 2024 (UTC)[reply]
The repeated POS sections are what are required at this point per our policy. You should've brought it up with English editors at the very least if not everyone in general before creating those entries like that. It clearly violates our Entry Layout guidelines. AG202 (talk) 23:48, 10 March 2024 (UTC)[reply]
@Ioaxxere I agree with User:AG202. There are various imaginable ways of compressing repeated POS sections but (a) it needs discussion, (b) I doubt using a POS "Multiple parts of speech" is ideal in any case; certainly the actual parts of speech should be listed one way or another. Benwing2 (talk) 23:53, 10 March 2024 (UTC)[reply]
Support unfortunately, I think we exhausted other options. Kwami was given multiple written warnings and blocks for EACH of these mass edits and continued regardless. They also haven't really helped clean up or shown remorse for their problematic mass edits... - سَمِیر | Sameer (مشارکت‌ها · بحث) 08:47, 9 March 2024 (UTC)[reply]
Support + a block to fix up the entries. CitationsFreak (talk) 23:08, 9 March 2024 (UTC)[reply]
Support Ioaxxere (talk) 23:35, 10 March 2024 (UTC)[reply]

Revoked, given:

  1. The unanimous and overwhelming support.
  2. It only takes a nomination from one admin and approval from another for a user to gain autopatrolled status.
  3. This has been open for just over 2 days, which is about 6 times longer than it took for the original nomination to get approved and actioned ([7] [8] [9]).

Theknightwho (talk) 00:18, 11 March 2024 (UTC)[reply]

@Theknightwho Thank you. Benwing2 (talk) 00:26, 11 March 2024 (UTC)[reply]

Eastern Geshiza language

User:Geshiza has been asking about adding this language, but in the meanwhile has created a walled garden of over 30 entries with their own improvised categories, but no templates and no links to or from the rest of Wiktionary.

Adding this language won't be easy, because it's hard to tell what it really is. It's apparently a sub-sublect of Horpa (language code ero), but the Wiktionary article for that language doesn't have much detail about what it describes as "a cluster of closely related yet unintelligible dialect groups/languages". In one analysis of the groupings that it cites, there are 5 "varieties", of which "Central Horpa" has 3 "dialects", one of them being "Dgebshesrtsa (Geshezha 革什扎) (non-tonal)". Whether "Gesheza" and "Geshiza" are the same thing isn't explicitly stated, but another quote in the article makes that seem likely. At any rate, there's no mention at all of "Eastern Geshiza". Does anyone have access to any sources that will make sense out of all this? Chuck Entz (talk) 02:28, 10 March 2024 (UTC)[reply]

@Chuck Entz This sort of "do it then get permission" approach was done for Belter Creole as well. I am strongly opposed to allowing this to proceed as it sets a terrible precedent. I would suggest moving the contents into that user's space until it becomes clearer whether there's any hope of supporting this variety or these varieties. Benwing2 (talk) 03:11, 10 March 2024 (UTC)[reply]
@Benwing2 I don't think they're comparable at all. Belter Creole is a constructed language, whereas Eastern Geshiza seems to be a variety of Horpa, and I can see that a published grammar exists. I would much rather that we simply put a moratorium on any new entries until we've hashed out how it should be handled, but regardless of the language code they still belong in mainspace. Theknightwho (talk) 03:16, 10 March 2024 (UTC)[reply]
@Theknightwho Ultimately maybe so, but not remotely in the current state they're in, and I doubt simply asking or telling this user to stop will make them stop. Who's gonna restructure and clean up the entries once we sort out how many varieties are involved and whether they are L2's or etymology variants? You? If you're not willing to personally commit to doing this then IMO we should move these ill-structured entries to userspace and put them back, gradually, in a properly structured form, once we add the lect codes. Benwing2 (talk) 03:36, 10 March 2024 (UTC)[reply]
@Benwing2 @Theknightwho moving the entries to their userspace is probably fine. They seem to not understand templates (but they are making an effort, as they seem to be trying to make their entries match others here). We could have them practice using templates in their userspace and, once we feel like they understand how templates work, they can move the entries back themselves. — Sameer (مشارکت‌ها · بحث)
As someone who regularly patrols Abuse Filter 68, I can tell you that creating entries with no templates is more common than you might think. Usually it's not bad faith- just cluelessness. Chuck Entz (talk) 04:23, 10 March 2024 (UTC)[reply]
@Benwing2: Well, they've already been asked, but it's too soon to tell how they'll respond. Chuck Entz (talk) 03:55, 10 March 2024 (UTC)[reply]
@Chuck Entz @Benwing2 they responded and they indicated they will wait until everything is resolved before continuing to edit. — Sameer (مشارکت‌ها · بحث) 05:19, 10 March 2024 (UTC)[reply]
@Sameerhameedy Sounds good, thanks for making the request. Benwing2 (talk) 05:21, 10 March 2024 (UTC)[reply]

Language titles with category

Could the language.titles have a clickable link to their Category? (main, or lemmas, whatever?) Ideally, also with tooltip with their code? (would be very helpful!). At pages with many language sectors, it is very difficult to go down to the bottom and find the language.
e.g. [:Cat:Afar language|<span title="Afar (aa)">Afar</span>] Thank you! ‑‑Sarri.greek  I 12:12, 10 March 2024 (UTC)[reply]
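
A minimal sketch of the link data requested above, in Python for illustration only (a real gadget would be JavaScript running in the browser). The helper name language_header_link is hypothetical, and the "Category:<name> language" naming pattern is assumed from the Afar example; some language categories may not follow it.

```python
from urllib.parse import quote


def language_header_link(language_name: str, language_code: str) -> dict:
    """Build the link target and tooltip for one language heading.

    Hypothetical helper: assumes the category is named
    "Category:<language name> language", as in the Afar example.
    """
    category = f"Category:{language_name} language"
    # MediaWiki page URLs use underscores where titles have spaces;
    # keep ":" and "/" unescaped so the namespace prefix stays readable.
    href = "/wiki/" + quote(category.replace(" ", "_"), safe=":/")
    return {
        "text": language_name,
        "href": href,
        "title": f"{language_name} ({language_code})",  # tooltip with the code
    }


link = language_header_link("Afar", "aa")
print(link["href"], "|", link["title"])
```

The tooltip carries the language code, so a reader hovering over an unfamiliar language name sees both the name and the code without scrolling to the categories at the bottom of the page.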

I'd rather not add any templates to the headings. One could implement a JavaScript gadget that automatically does this, though. — SURJECTION / T / C / L / 12:31, 10 March 2024 (UTC)[reply]
M @Surjection, Thank you. I have no idea how it could be done. I would be delighted at the output. ‑‑Sarri.greek  I 12:57, 10 March 2024 (UTC)[reply]
I have a working prototype in User:Surjection/linkLanguageHeaders.js. You can add it to your common.js to test it. Perhaps it can be turned into a gadget if there is interest. — SURJECTION / T / C / L / 13:08, 10 March 2024 (UTC)[reply]
Yes, I think it was agreed awhile ago not to use templates in headings and IMO this is just as well. Benwing2 (talk) 00:25, 11 March 2024 (UTC)[reply]

I was not proposing a way to do it, I was just showing the desired result. I don't know what js is. I do not change default looks at platforms. As a reader, I would like to click language.titles, because I do not know what they are and Categories are too far away to click. Could, please, en.wiktionary rethink it? Thank you. ‑‑Sarri.greek  I 01:36, 14 March 2024 (UTC)[reply]

Hi, it should be available now through Special:Preferences under "Gadgets" as "Add links to language headings that point to the category of the corresponding language." — SURJECTION / T / C / L / 19:09, 14 March 2024 (UTC)[reply]
Ω! Μ @Surjection! you did this for me? Hooray! Thank you, thank you! I will find it immediately. You are too kind. I hope, lots of people will like it and that it become standard! ‑‑Sarri.greek  I 19:57, 14 March 2024 (UTC)[reply]
It works! it is wonderful; why not for everyone? why hidden in 'gadgets'... You are a magician M @Surjection. The default should be the 'best' and the most useful. ‑‑Sarri.greek  I 20:06, 14 March 2024 (UTC)[reply]

Make default language titles with category

Great news! M @Surjection, has made a Gadget and we can click the Language.Titles to go to the category! I propose it become default, for all to use. Kiitos! kiitos Surjection! ‑‑Sarri.greek  I 20:27, 14 March 2024 (UTC)[reply]

I don't personally think it should be the default, since it can be a bit distracting and confusing to those who aren't used to it. — SURJECTION / T / C / L / 21:44, 14 March 2024 (UTC)[reply]
But, M @Surjection, you have made it so discreet and elegant! There are no colours, or anything 'loud' about it. I find it very helpful, because there are many names of languages unknown to us. I am delighted, and I wish all people could use it too. (you may not guess it, but lots of us do not go to Preferences. This was my first time, except for Global Pref. for Vector Classic for wikipedias, and fr.wikt). ‑‑Sarri.greek  I 22:27, 14 March 2024 (UTC)[reply]
Thank you @Surjection for doing this! I tried it out and it looks great. @Sarri.greek I think enabling it by default could very well be done a bit down the line but for the moment we should wait to make sure it doesn't have any unexpected interactions with anything else. Benwing2 (talk) 22:36, 14 March 2024 (UTC)[reply]
Mainio! hieno! -in honour of M Surjection, from now on, Finnish will be the language of interjections. .js will be renamed .surjs @Benwing, many wiktionaries have clickable Lang.titles. I was, so longing for it. At el.wikt, the visible labels {{lb}} before definitions, link to their Cat. Where, we see on top, the word of the label in host language, and sorted on top, its translation in the target language :) Anything to facilitate readers! ‑‑Sarri.greek  I 04:38, 15 March 2024 (UTC)[reply]
I'm also worried about how it will behave in the mobile view, specifically about if it makes the headings harder to click to expand. It does help get around the lack of categories in the mobile view, something which has always greatly irked me. — SURJECTION / T / C / L / 07:35, 15 March 2024 (UTC)[reply]

Two transliterations

A question (after endless discussions of how to transliterate Modern Greek at Module_talk:el-translit). I do not know about other languages, but at least for Modern Greek ISO offers two types of conversions.

  • TypeA = unique.conversion letter-to-letter transliteration, reversible (two-directional), used for international usage. Customs, machines etc when one-to-one translit is needed.
  • TypeB = slightly simplified, and pseudo-phonemic, calls it transcription (but not with IPA symbols), for national usage. For Greek, the only difference to TypeA are two macron diacritics.
  • ISO also introduces an idea of a 'level 3' mixed Type, more phonemic, for national usage, 'especially' when the above transliterations are very different from the pronunciation.
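
The difference between a reversible (TypeA-style) conversion and a simplified one can be sketched in Python as follows. The table below is a toy example chosen only to show the macron distinction mentioned above; it has not been checked against the ISO text and all names are hypothetical.

```python
import unicodedata

# Toy one-to-one table (an assumption for illustration, NOT the ISO table).
# "\u012b" is i with macron, "\u014d" is o with macron.
TOY_TABLE = {
    "η": "\u012b",
    "ι": "i",
    "ω": "\u014d",
    "ο": "o",
    "μ": "m",
    "ν": "n",
}
REVERSE = {latin: greek for greek, latin in TOY_TABLE.items()}
# Reversibility requires that no two source letters share a target.
assert len(REVERSE) == len(TOY_TABLE)


def to_latin(word: str) -> str:
    """Letter-to-letter conversion using the toy table."""
    return "".join(TOY_TABLE[ch] for ch in word)


def to_greek(word: str) -> str:
    """Inverse conversion; only valid because the table is one-to-one."""
    return "".join(REVERSE[ch] for ch in word)


def strip_macrons(text: str) -> str:
    """Simplified variant: drop combining macrons (U+0304)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch != "\u0304")
    return unicodedata.normalize("NFC", kept)


source = "μονη"  # unaccented letter sequence used only as test input
strict = to_latin(source)
simplified = strip_macrons(strict)
print(strict, to_greek(strict) == source)          # round trip succeeds
print(simplified, to_greek(simplified) == source)  # round trip fails
```

Once the macrons are dropped, two source letters share one Latin letter, so the simplified form can no longer be converted back unambiguously. That is the practical reason a one-to-one system is wanted wherever the original spelling must be recoverable.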

The question is: Does en.wiktionary have a rule that says: a) en.wikt is obliged to provide the official unique.conversion ISO transliterations. b) en.wiktionary also provides a more phonemic transliteration based on ISO and House Rules, through consensus.
If a) is yes, then we should have two transliterations (for some languages). Discussions would be needed only for b), saving a lot of our energy. Two translits, How? I propose

word (xxxxx© / xyyxxxyy) ...or I for ISO --please check the tooltips

Thank you. ‑‑Sarri.greek  I 12:42, 10 March 2024 (UTC)[reply]

@Sarri.greek Agreed, Persian is also running into this issue. After a discussion months ago it was agreed that Persian templates should have two transliterations (Classical + Iranian) but modules don't support that so we can't do anything rn. I believe Hebrew editors have wanted something similar as well. — Sameer (مشارکت‌ها · بحث) 18:21, 10 March 2024 (UTC)[reply]
@Sameerhameedy There is some language-specific support for this in place at the moment: the major example being Chinese (and I'm not referring to the separate languages grouped together), where several lects show two or three transliterations each in the dropdown; Cantonese has four, and Mandarin seven(!). Korean, Thai and Khmer also do this in various ways, too.
It's clear that there needs to be a language-neutral way of showing things like this, and (taking Mandarin as a benchmark) it shouldn't be limited to transliterations into the Latin script, either, given one of the systems is Zhuyin and another Cyrillic. Theknightwho (talk) 20:16, 10 March 2024 (UTC)[reply]
Thank you M @Sameerhameedy. Asking M @Theknightwho for languages mentioned with 3 or 4 transliterations. What is the legal status of these? By 'obligatory' for wiktionary to show, we mean: ISO-assigned for international transactions like exports. Is there one and only one topping the others? The problem we have here is: Because wiktionarians try to adapt ISO to something more useful to our readers, the discussions a. never end. and b. every 5 or so years, someone comes up with an alteration or a restoration of some letter conversion. This will never end. ‑‑Sarri.greek  I 22:43, 10 March 2024 (UTC)[reply]
@Sarri.greek As far as I can tell, they all have one system which is used for things like links (i.e. as the "transliteration" in the normal sense), and the others are only shown on the entry.
I don't think we're under any obligation to choose the ISO standard as the main transliteration, but if we don't, then it's a good idea to show it on the entry itself. Theknightwho (talk) 22:50, 10 March 2024 (UTC)[reply]
@Theknightwho, I see that such languages have boxes for transliterations = they can manage multiple solutions. I was thinking of languages that have one translit. next to PAGENAME, and disabled the option to add a second one. May I add a point:
ISOs have been criticised for poor results and unsuccessful conversions. Still, I am not proposing to reform ISO here. If ISO makes changes, we record them and update the official translit. I am proposing to free ourselves from the rigid 1st translit, which is not-to-be-debated. Also: how do wikipedians face this problem? Does en.wikt. have a liaison to en.wikipedia for questions or coordination? Thank you. ‑‑Sarri.greek  I 23:04, 10 March 2024 (UTC)[reply]
@Sarri.greek The community of Wikipedia editors who work on language entries seems much smaller than the community of Wiktionary editors, so whoever has the most stamina tends to win out. E.g. User:Mahmudmasri insisted on particular standards for transliteration and phonemic rendering of Egyptian Arabic that I disagree with, but I don't have the energy to fight him on this and he does have the energy to patrol all the relevant pages and edit-war as necessary to get his preferred system in place, so that is what Wikipedia has. Similarly for things like language names and family trees; User:Kwamikagami out-staminas everyone else. I definitely agree with User:Theknightwho that we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration. We need to do what's right for Wiktionary and hopefully maintain some consistency of approach across languages where feasible. Benwing2 (talk) 00:24, 11 March 2024 (UTC)[reply]
Thank you @Benwing2 About your comment (for general 'rules') >>we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration.<< (Also by @Theknightwho) The problem with not having some 'locked' directives, is, that talks be endless. Official things: (ISO, spelling directives of Academies or similar). Are not official things the first obligation of wikt? = credibility, stability, well-referenced, not subject to 'talks' and alterations. I dislike it too, but as a reader, I expect the info available. Otherwise, I would have to go elsewhere to get it. For some ISOs: Wiktionary's standards aspire to give better results than the official ISO :) That would be nice! But one has to see the comparison. ‑‑Sarri.greek  I 01:46, 11 March 2024 (UTC)[reply]
@Sarri.greek Yes, sometimes consensus is hard to achieve but we all know that some ISO standards are garbage and/or have no adoption, and many ISO standards simply have different aims than we do at Wiktionary. I think we should aim to not be gratuitously different from ISO standards where possible (e.g. we use ISO language codes whenever possible rather than incompatible ones), but at the same time not be bound by them (e.g. sometimes we merge lects that ISO considers different, and sometimes we split lects that ISO considers the same). Benwing2 (talk) 01:53, 11 March 2024 (UTC)[reply]
Ok, then @Benwing2. This is the end of this talk, so, my proposal for 2 transliterations is withdrawn. ‑‑Sarri.greek  I 02:07, 11 March 2024 (UTC)[reply]
@Sarri.greek I don't think you need to withdraw your proposal just to end the conversation :) ... I do think having multiple translits is an interesting idea to be potentially considered further. After all, this is not the first or second time this idea has come up. Benwing2 (talk) 02:11, 11 March 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Holding a discussion between two in this medium is difficult, I find one between so many impossible! I will only say that Dictionaries should be accessible (understandable to the "Man on the Clapham omnibus" — Oxford dictionaries appreciated this and have changed substantive to noun in their entries). I suspect that most people, not understanding IPA, use the transliteration as a guide to pronunciation. I hope that whoever makes a decision (the cynic in me says that it will probably be changed again next year) will bear the "man from Clapham" in mind.   — Saltmarsh 06:07, 11 March 2024 (UTC)[reply]

Ωωωω! my wise mentor and administrator for Greek, @Saltmarsh! Hear, hear! Thank you. ‑‑Sarri.greek  I 06:13, 11 March 2024 (UTC)[reply]

One system, multiple transliterations

For Vedic Sanskrit, transliteration is abused to show the placement of the accent. Our policy is not to show the placement in the spelling of the word. Now, for finite verbs incorporating prefixes, there are two possible placements for the same verb, depending on the grammatical usage of the verb. Is there an approved mechanism for showing the two transliterations, and if so, what is it and where, if anywhere, is it documented? Or do we only show the accent for finite verbs for the usage where a verb without a prefix would bear an accent? The placement in the other case appears to be reliably predictable if one can identify the prefix. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

There's an undocumented solution of using |tr2= in templates similar to {{head}}, which currently works for some (perhaps all) Sanskrit headword templates. @Theknightwho: I don't know whether it is likely to be declared a 'hack' and broken with disdain. It looks from the code of Module:headword that it is intended to work. --RichardW57m (talk) 15:48, 26 March 2024 (UTC)[reply]
@RichardW57m Stop tagging me if you’re just going to make rude comments. Theknightwho (talk) 15:52, 26 March 2024 (UTC)[reply]
@Theknightwho: Kindly advise whether this technique is safe to use. I'm not sure what to conclude from its lack of documentation. Perhaps the correct solution is to clone the headword module for Sanskrit, though I hope not. --RichardW57m (talk) 16:10, 26 March 2024 (UTC)[reply]

I refer to this marking of the accent as an abuse partly because transliteration-related categorisation assumes that explicit transliterations are exceptional and worthy of review, whereas it is the norm for words found in accented texts. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)[reply]

How should we transliterate (into Japanese script or other scripts), romanize, and lemmatize Ryukyuan?

Previous discussions

The following previous discussions can have useful possibilities.

Information

Lately, the Ryukyuan orthography has been a mess. Various works vary between the hiragana or katakana or mixed script. There are vowels and syllabic consonants that cannot be transcribed cleanly/properly using Japanese orthography, so the central vowel (ɨ for example, サ行) has been variously transcribed in Japanese script as シゥ, シィ, ス, す, スィ, ス𛅤 (CJK small katakana wi () if you cannot render this character), you name it. Aspirated and unaspirated consonants are also variously referred to as plain and glottalized consonants, and one of either is distinguished in hiragana or katakana. At Wiktionary we use an ad hoc transcription of inserting dakuten into the aspirated (Amami) and unaspirated (Okinawan/Kunigami), which is not used anywhere else. We also use an ad hoc method of including kanji in Ryukyuan languages, which some people do to transliterate Okinawa songs (but I can't find an example at the moment). Thus, 送り仮名 (okurigana) is basically another ad hoc transcription. In addition, we are basically duplicating kanji information from the Japanese entry, which requires more time and effort.

For the glottalized consonants such as [⸢ʔwáː] 'pig', should we do っわー, or ’わー?

Miyako has a special vowel, variously referred to as an apical vowel, laminal vowel, or fricative vowel (it is not a central vowel), which is variously transcribed as (S)ɨ, (S)ï, ʉ, ɿ, z, ü, you also name it. In fact, there are syllabic consonants in Ogami Miyako that cannot be transcribed cleanly in Japanese kana script, although there's a possibility that some Ogami words are actually reflections of a fricative vowel, as Kaneda Akihiro's vocabulary spreadsheet (from personal communication) does.

For romanizing, take Okinawan Shuri dialect [⸢ʔútɕínáː] for example. We could variously romanize it as ucinaa, 'ucinaa, ?ucinaa, uchinaa, uchinā, 'uchinā, you name it as well. And central vowels ([ɨ] in this instance) could either be transliterated as ï or ɨ (perhaps IPA only, so the former can be more plausible?), or we can transliterate [i] as yi and [ɨ] as i, and also have a glottalized initial as qV (as in qutyinaa). For aspiration, we could include <h> for aspiration, but nothing for unaspiration (or <'>), or include <'> for aspiration but nothing for unaspiration.
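
To make the competing options concrete, here is a rough Python sketch that generates several of the romanizations listed above from one segment list. The function and parameter names are hypothetical, the pitch marks of the IPA form are left out, and nothing here reflects an existing Wiktionary module.

```python
import unicodedata

GLOTTAL_STOP = "\u0294"   # ʔ
AFFRICATE = "t\u0255"     # tɕ
LENGTH_MARK = "\u02d0"    # ː

# Segments of the Shuri form cited above, without pitch marks.
SEGMENTS = [GLOTTAL_STOP, "u", AFFRICATE, "i", "n", "a" + LENGTH_MARK]


def romanize(segments, glottal: str, affricate: str, long_vowels: str) -> str:
    """Apply one set of romanization choices to a list of IPA segments.

    glottal:     spelling of the initial glottal stop ("", "'", "?", "q")
    affricate:   spelling of the affricate ("c", "ch", "ty")
    long_vowels: "double" for aa, "macron" for a with macron
    """
    out = []
    for seg in segments:
        if seg == GLOTTAL_STOP:
            out.append(glottal)
        elif seg == AFFRICATE:
            out.append(affricate)
        elif seg.endswith(LENGTH_MARK):
            vowel = seg[:-1]
            if long_vowels == "double":
                out.append(vowel * 2)
            else:
                out.append(vowel + "\u0304")  # combining macron
        else:
            out.append(seg)
    # NFC composes vowel + combining macron into a single character.
    return unicodedata.normalize("NFC", "".join(out))


for choices in [("", "c", "double"), ("'", "ch", "macron"), ("q", "ty", "double")]:
    print(choices, romanize(SEGMENTS, *choices))
```

Three independent choices already yield many spellings of the same word, which is why the lemma form cannot be settled until each choice is fixed by consensus.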

Finally, do we lemmatize at the kanji, the kana, or romanization? The current situation is just a total mess.

TL;DR: Transliteration and lemmatization of Ryukyuan needs a massive overhaul; it's a mess as of right now.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): This is an important discussion for the orthography and lemmatization of the Ryukyuan languages. Please come to a consensus. Chuterix (talk) 17:10, 11 March 2024 (UTC)[reply]

We should lemmatize at what native speakers have used the most, absent a standard orthography, regardless of if it seems inconsistent or "ad-hoc". Defective or variant orthographies are not specific to Ryukyuan, and in other cases, we list the variants as alternative forms with the "standard" or most-common form as the lemma. (Or in the case of two differently-pronounced words represented by the same orthography, we disambiguate in the etymology + pronunciation sections)
For Okinawan in particular, there are several works written in mixed script (Kanji & kana), and it looks to be the traditional orthography as well, so I wouldn't support a move to solely kana, and definitely not the Latin script. The same level of research should be done for the other languages as well; if they are more-written in the Latin script or katakana, then shifts can be made, but the research needs to be done first. AG202 (talk) 17:25, 11 March 2024 (UTC)[reply]
As someone who does not read Japonic/Ryukyuan literature and cannot otherwise comment much on this, I would just like to register my (ignorant) doubts towards/concerns regarding Wiktionary constructs such as {{ryn-readings}} (the concept of on'yomi vs. kun'yomi, at least) and Category:Northern Amami-Oshima Han characters (the concept of "Ryukyuan kanji" in general). Kana orthography seems to be under-developed, let alone usage of kanji (or should it be the other way around? placenames, etc.). Are we just reapplying 標準語 kanji to Ryukyuan? (can we examine 1. Japonic dialects [using kanji seems non-problematic] 2. Chinese "dialects" [本字 debates, "unwritten", etc.] 3. Jeju [the concept of Sino-Jeju is discouraged on Wiktionary]? as a comparison point for this topic?) —Fish bowl (talk) 09:29, 14 March 2024 (UTC)[reply]

Recent change to government standard for Japanese

I broke this off into a subtopic because I do not understand Japanese (and therefore cannot check original sources) and I'm generally ignorant of CJK languages, but per Wiktionary:Grease_pit/2024/March#FYI:_Major_romanization_change_coming_in_Japan, the government standard in Japan for Japanese is now Hepburn. As AG202 notes above about "absent a standard orthography", I'm just suggesting that the government there may have a standard for Ainu, Ryukyuan, etc. as well, and that standard may also be Hepburn. Sorry if my ignorance introduces noise. :/ —Justin (koavf)TCM 17:53, 11 March 2024 (UTC)

the government standard in Japan for Japanese is now Hepburn.

Notably, this is for romanization, which is included on various kinds of signage explicitly for foreigners, as part of the country's efforts to court tourism money. This shift to Hepburn has nothing to do with text written in Japanese or other Japonic languages, outside of this very limited context (signs for foreigners). ‑‑ Eiríkr Útlendi │Tala við mig 20:43, 12 March 2024 (UTC)[reply]

Wikimedia Foundation Board of Trustees 2024 Selection

You can find this message translated into additional languages on Meta-wiki.

Dear all,

This year, the term of 4 (four) Community- and Affiliate-selected Trustees on the Wikimedia Foundation Board of Trustees will come to an end [1]. The Board invites the whole movement to participate in this year’s selection process and vote to fill those seats.

The Elections Committee will oversee this process with support from Foundation staff [2]. The Board Governance Committee created a Board Selection Working Group from Trustees who cannot be candidates in the 2024 community- and affiliate-selected trustee selection process. The group is composed of Dariusz Jemielniak, Nataliia Tymkiv, Esra'a Al Shafei, Kathy Collins, and Shani Evenstein Sigalov [3]. It is tasked with providing Board oversight for the 2024 trustee selection process and with keeping the Board informed. More details on the roles of the Elections Committee, Board, and staff are here [4].

Here are the key planned dates:

  • May 2024: Call for candidates and call for questions
  • June 2024: Affiliates vote to shortlist 12 candidates (no shortlisting if 15 or fewer candidates apply) [5]
  • June–August 2024: Campaign period
  • End of August / beginning of September 2024: Two-week community voting period
  • October–November 2024: Background check of selected candidates
  • Board's Meeting in December 2024: New trustees seated

Learn more about the 2024 selection process - including the detailed timeline, the candidacy process, the campaign rules, and the voter eligibility criteria - on this Meta-wiki page, and make your plan.

Election Volunteers

Another way to be involved with the 2024 selection process is to be an Election Volunteer. Election Volunteers are a bridge between the Elections Committee and their respective community. They help ensure their community is represented and mobilize them to vote. Learn more about the program and how to join on this Meta-wiki page.

Best regards,

Dariusz Jemielniak (Governance Committee Chair, Board Selection Working Group)

[1] https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2021/Results#Elected

[2] https://foundation.wikimedia.org/wiki/Committee:Elections_Committee_Charter

[3] https://foundation.wikimedia.org/wiki/Minutes:2023-08-15#Governance_Committee

[4] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_committee/Roles

[5] Even though the ideal number is 12 candidates for 4 open seats, the shortlisting process will be triggered if there are more than 15 candidates because the 1-3 candidates that are removed might feel ostracized and it would be a lot of work for affiliates to carry out the shortlisting process to only eliminate 1-3 candidates from the candidate list.

MPossoupe_(WMF) 19:57, 12 March 2024 (UTC)

Last month, this user added well over a thousand problematic Greenlandic entries over a day or two by scraping a Greenlandic dictionary site and running an unauthorized bot on their account. I blocked them from mainspace and the Reconstruction namespace as an unauthorized bot and asked for help at the Grease pit (see Wiktionary:Grease pit#Hundreds of Incomplete Greenlandic entries need to be cleaned up) on getting them up to Wiktionary standards. The consensus seemed to be that it would be best to just nuke them all, which I have since done, for the most part. Aside from copyvio concerns (compilation copyright, if nothing else), the verbatim inclusion of typos and other irregularities in the headwords showed that the bot run had been prepared with only minimal attention to the content. They have admitted that they don't speak Greenlandic at all (they're editing from Brazil).

The user responded by apologizing on their talk page and by attempting to clean the entries up using an alternate account and as an ip, for which those were blocked by others on grounds of block evasion.

We need to discuss what to do next. While their methods were wrong, their motivation was to add content to the dictionary. They have admitted their mistakes and agreed not to repeat them. I made a point of only blocking them from two namespaces so they could discuss things here and on talk pages. This should not be about punishment for anything they did, but about whether they can be trusted to edit responsibly and add worthwhile content.

Pinging participants in the Grease pit discussion: (@Benwing2, DCDuring, Thadh, Vininn126), and users I've seen editing Greenlandic entries: (@Gamren, Jakeybean, Tesco250). Chuck Entz (talk) 14:57, 13 March 2024 (UTC)[reply]

The only input I can give is on admin decisions - I definitely think we should WT:Assume good faith and discuss with this user and teach them. Unfortunately when it comes to specifically Greenlandic I am very unfamiliar. I do think that they should stick to languages whose text they can at least read and understand (and not just rely on something else). Perhaps this user shouldn't be editing Greenlandic at all. Vininn126 (talk) 15:02, 13 March 2024 (UTC)[reply]
Unfortunately our dictionary does suffer from a severe lack of terms in many languages. However, if we don't have any editors who know the language, there is nothing we can do about that. The best course of action in my opinion would be to simply remove all these contributions, because currently a larger problem we are facing as a dictionary is untrustworthiness, which in turn decreases the number of willing editors in these languages. Better to not have any entries in a language than to have hundreds of questionable quality and validity at best. Thadh (talk) 16:35, 13 March 2024 (UTC)[reply]
Untrustworthiness is probably mostly based on English entries. Maybe we need to start over with a clean sheet of virtual paper. DCDuring (talk) 18:46, 13 March 2024 (UTC)[reply]
@DCDuring: I feel like you are using some kind of tone that isn't being taken over into your writing. What are you saying? That we should re-do our whole dictionary? Also applies to the message below, I'm confused what your opinion is. Thadh (talk) 19:46, 13 March 2024 (UTC)[reply]
I found the argument given spurious. If our problem is that we are thought untrustworthy, I find it hard to believe that the problem can be anywhere other than English entries. If untrustworthiness is a reason to delete content, then it is English entries that should be deleted. DCDuring (talk) 20:02, 13 March 2024 (UTC)
Most of the readers I interact with don't even use the English entries, so I think you're talking about a whole other reader base. If a language's sections don't have any references and half of the time feature an incorrect translation, then this is a disservice to the readers, and we should remove or improve those sections. If you think an English entry does not fulfill our CFI, you should RFV it, too, but mostly our English entries are pretty well-formed and represent the language adequately, as they are proofread by hundreds of native speakers. Not at all the case with our other language sections. Thadh (talk) 20:51, 13 March 2024 (UTC)[reply]
I've barely glanced at English entries except occasionally to make Romance etymologies that bleed into them more consistent. Nicodene (talk) 12:52, 15 March 2024 (UTC)[reply]
I guess that either we don't need no stinking first-draft-level Greenlandic entries from a volunteer or we should be trying to recruit someone (from where?) to add them from scratch. DCDuring (talk) 18:44, 13 March 2024 (UTC)[reply]
Perhaps we could see whether there is some other language's wiktionary that has some good Greenlandic entries. da.wikt? is.wikt? DCDuring (talk) 20:02, 13 March 2024 (UTC)[reply]
@DCDuring Alas, no. da.wikt has ~ 100 Greenlandic entries and they're all extremely basic stubs with a single-word definition and nothing more. is.wikt is even worse, with only 2 basic stub Greenlandic entries. In general, many non-English Wiktionaries are slim pickings; the entries are typically OK only for the native language of the Wiktionary in question and often not even then. For many languages, en.wikt does a far better job than the corresponding language's own Wiktionary. Benwing2 (talk) 07:28, 14 March 2024 (UTC)[reply]
It seems a shame that there are so many resources for Greenlandic available from the Greenland Language Secretariat to evaluate and improve the first-draft/stub entries, but we can't find the linguistic talent motivated to improve the entries. Oh well. DCDuring (talk) 14:30, 14 March 2024 (UTC)[reply]
@DCDuring You are welcome to do the entries yourself. Theknightwho (talk) 00:35, 15 March 2024 (UTC)[reply]
Needless to say, please do consult a grammar (or more) before doing so. Thadh (talk) 09:10, 15 March 2024 (UTC)[reply]
I'll just request Greenlandic translations for the organisms that sometimes live there. Maybe I'll venture to add a Greenlandic entry for them too. DCDuring (talk) 12:48, 15 March 2024 (UTC)[reply]
As I understand the question, it is not about what to do or not do with Greenlandic entries, but whether to lift the blocks. My inclination is to remove the blocks while admonishing the user to constrain their edits in future to their languages of (reasonable) competence. Given the apologies, I feel we currently do not have more reason to distrust this user than any random new user.  --Lambiam 20:58, 20 March 2024 (UTC)[reply]
I haven't had time to review the user's edits in depth, but reading through the user's talk page and this discussion, I am inclined to agree: as far as the user/block is concerned, we can unblock and see whether subsequent edits are good or not. (If no other admin wants to beat me to that, and if no-one has objections, then someone can ping me in a few days and I'll unblock them.) - -sche (discuss) 18:07, 23 March 2024 (UTC)[reply]
Yes, this is fine with me too. When I have blocked users for long periods or indefinitely, it's generally because the user (a) thinks they know what they're doing but doesn't, (b) continues adding trash despite multiple warnings, and (c) doesn't respond to those warnings (or responds defensively and denies there's an issue). It's a good sign if a user apologizes and promises to change their behavior (although some users do that and then continue the same pattern of adding trash, so they need to be monitored). Benwing2 (talk) 18:26, 23 March 2024 (UTC)[reply]

Hoping to convene on practice regarding natural overlap of hyponyms and derived terms

There are many nouns for which a population of hyponyms and a population of derived terms will quite naturally have a substantial overlap. For example, in English, the noun list has laundry list, punch list, and dozens more. The theme is quite generalizable. What I propose here is to codify the principle that it is not wrong to show such terms both in a hyponyms section and in a derived terms section. Earlier I had avoided doing so because I anticipated that otherwise someone else might complain that having the same term twice on the page was "clutter". But there are some good reasons, regarding w:structured data, why allowing for the natural degree of double-posting is a good idea. Does anyone strenuously object to doing so? Note that column wrappers can be used, so there is no excuse for a section with lots of content not to auto-collapse. Thus, the user will not be presented with a giant unfolded list. Thanks. Quercus solaris (talk) 17:34, 13 March 2024 (UTC)[reply]

Definitely not wrong, actually encouraged imo. Thadh (talk) 17:37, 13 March 2024 (UTC)
I support this. "Laundry list" is both a derived term from "list" as well as a hyponym of "list". CitationsFreak (talk) 18:38, 13 March 2024 (UTC)[reply]
It is pure clutter on taxonomic name entries. I have limited derived terms to items that are not hyponyms, ie, accepted species names are hyponyms, not derived terms. No-longer-accepted species names that have been placed in other genera may appear as derived terms. The rule-driven, mechanical nature of the derivation of almost all tribe and family names and many order names makes their inclusion in any derived terms lists often of similarly low value, insufficient to warrant the clutter. OTOH, if we really want this duplication, there is nothing to prevent a bot from doing the job thoroughly. DCDuring (talk) 20:14, 13 March 2024 (UTC)[reply]
User:Sae1962 had a frequent habit of adding extremely basic hypernyms (i.e. going the other way), so he would take something like Hypertext Markup Language and add a hypernym of language. I consider this fairly unhelpful to human readers, though technically correct: it reminds me of reading Java or .NET programming documentation where you have a huge list of derived or inherited classes going all the way back to object, because everything is an object in the end. Equinox 20:20, 13 March 2024 (UTC)[reply]
I thought the Derived terms section was only for terms that could not fit under Hyponyms or some other section (though I suppose it would nearly always be Hyponyms). I think there are at least some pages that have an HTML comment in the Derived terms section warning users not to add terms that could be put into a hyponyms section or some other section. But I didn't bookmark anything and I don't see it on the policy page. Soap 20:53, 13 March 2024 (UTC)
Great points everyone. Thanks. Perhaps not a firm rule to be codified now. Wiktionary could impose firm consistency retroactively, later, if it ever feels the need. So where I'll leave it is that I'll respect and obey any existing setups that don't double-post (such as taxonomy, or entries with comments discouraging it). And I'll follow the principle that whichever method is used, just make sure that auto-collapse is keeping everything nice and orderly. Quercus solaris (talk) 16:08, 14 March 2024 (UTC)[reply]

Renaming "etymology-only language"

@Theknightwho, -sche I think the time has come to rename the term "etymology-only language" to something else. This term is cumbersome, and while it was accurate originally when the codes in question could be used only in etymology templates, it's long outgrown that particular use case. I would propose one of "dialect", "subvariety" or "sublect". "Dialect" is the most straightforward and arguably is exactly what these varieties are in most cases, but it's a bit of a loaded term given the longstanding language-vs-dialect controversy that happens with many language varieties. Thoughts? Benwing2 (talk) 04:57, 14 March 2024 (UTC)[reply]

I'll just add we treat Middle Polish as such, and I'm not sure dialect would be the best term for it. Unless we accept "dialect" to mean "any variant of"... Vininn126 (talk) 07:21, 14 March 2024 (UTC)[reply]
@Vininn126 Good point. That is why I suggested "subvariety" and "sublect". "Variant" on its own could sort of work but it feels too vague without some other qualifier, since "variant" and "lect", at least in some contexts, are generic terms covering any type of language. Benwing2 (talk) 07:25, 14 March 2024 (UTC)
@Benwing2, Vininn126: May I suggest merolect? The word sees no established usage, so we can thereby avoid any undesirable connotations, and with the etymological sense of "part-language", it means exactly what we want it to mean. 0DF (talk) 14:33, 14 March 2024 (UTC)[reply]
Variant is succinct, and covers all the different kinds of etym-only language: dialects, chronolects, regional varieties, written standards etc. Theknightwho (talk) 14:37, 14 March 2024 (UTC)[reply]
@Benwing2, at el.wikt we mark them as 'sublang' = subordinate languages. The weird thing here is that they can be donors but not receivers. How is this possible? The 'subordinate' or 'hosted' languages/varieties/dialects/whatever have Cat:Terms derived from this.sublang (as donors to other languages) and Cat:Sublang terms derived from X.language (as receivers); e.g. MedLat alchemia at wikt:el:alchemia has a Cat:Med.Lat terms borrowed from Arabic. ‑‑Sarri.greek  I 15:03, 14 March 2024 (UTC)
@Sarri.greek I'm not keen on this; I'm not sure about Greek, but in English the term "subordinate" implies a lesser status, which is likely to put some contributors off. Theknightwho (talk) 16:23, 14 March 2024 (UTC)[reply]
I meant, @Theknightwho, that they are marked in the module as sublang=true. If the question is about the 'name' of all of them, it doesn't matter. But how could this title convey that these languages are not allowed what the others are? They are code-only languages that exist only in the template {{m}} and as donors, never receivers, in etymology templates. My big surprise, worry, and question is: why are they not receivers? Probably this is not the place to ask this. I just bring it up because it is relevant, and because I do not intend to open such a subject myself. ‑‑Sarri.greek  I 16:41, 14 March 2024 (UTC)
@Sarri.greek We do often include them in descendant sections. Also, the elephant in the room is Chinese, which we already subdivide for this purpose; we simply group them all under one header. Theknightwho (talk) 16:44, 14 March 2024 (UTC)
Please, please, Sir, think about it! @Theknightwho, Benwing2 why should etymologies be inaccurate? Medieval Latin alchemia and similar LaMed words give the Cat:Latin terms derived from Arabic, which should include only a subcategory: Medieval Latin terms derived from Arabic. cf @el.wikt.Cat.Lat.from.Ar has only this subcat. The etymologies of descendants should say 'from Med.lat', not 'from Lat', because it is a medieval word. ‑‑Sarri.greek  I 16:58, 14 March 2024 (UTC)
@Sarri.greek This is an orthogonal point. I think you're asking for allowing etym-only languages in the |1= param of etym templates and categorize under e.g. CAT:Medieval Latin terms derived from Arabic in addition to CAT:Latin terms derived from Arabic. We don't currently do this but the trend is towards allowing etym-only languages in more places (hence this renaming discussion), so potentially we could allow this. IMO though this should be a separate discussion from what we should rename "etym-only language" to. Benwing2 (talk) 20:45, 14 March 2024 (UTC)[reply]
Agreed. Vininn126 (talk) 20:47, 14 March 2024 (UTC)[reply]
May I suggest variety? It is the most neutral commonly-used term that comes to mind. ‘A variety of Spanish’ brings up some seven million hits on Google. Nicodene (talk) 15:41, 14 March 2024 (UTC)[reply]
Yeah, this is probably a better suggestion than "variant", and is a widely-used term. Theknightwho (talk) 16:20, 14 March 2024 (UTC)[reply]
So far variety is my top option. I understand the logic of merolect but I think we should avoid obtusisms if possible. Vininn126 (talk) 16:23, 14 March 2024 (UTC)[reply]
@Nicodene, Theknightwho, Vininn126: I'm also happy with variety. 0DF (talk) 17:47, 14 March 2024 (UTC)[reply]
"Dialect" isn't ideal, because we already have dialectal data modules. Likewise, "variety" isn't ideal, because language data has a field for "varieties" that is just a list of names (see e.g. Category:English language). — SURJECTION / T / C / L / 18:41, 14 March 2024 (UTC)[reply]
@Surjection I’d argue the opposite: the two listed for English are both lects which would benefit from having a code of this type, so naming them “variety codes” make total sense. Theknightwho (talk) 19:45, 14 March 2024 (UTC)[reply]
Can you say the same of every "variety" currently specified for every language? — SURJECTION / T / C / L / 20:26, 14 March 2024 (UTC)[reply]
Do you have an alternative suggestion? Vininn126 (talk) 20:27, 14 March 2024 (UTC)[reply]
My point is that we should avoid adopting terminology that is already used for something else. It's only going to make everything more confusing than it is. — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)[reply]
I don't think this is something else, though - the whole point of the varieties field is to list more specific types of the main language, which is precisely what these codes are for. I've not been able to find a counter-example to that yet, since alternative names for the language itself should go under the "aliases" field instead. Theknightwho (talk) 21:24, 14 March 2024 (UTC)[reply]
@Surjection I think we shouldn't worry about existing internal names. The "dialectal data modules" are probably going away in any case (see my post in the Grease pit) and we can rename the language data field. Benwing2 (talk) 20:40, 14 March 2024 (UTC)[reply]
BTW since most people seem to support the term "variety", maybe we can call the internal data field "variant" or "lect". Benwing2 (talk) 20:41, 14 March 2024 (UTC)[reply]
Sure, renaming that field is another option, and then we can call etymology-only language "varieties". — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)[reply]
@Surjection Can you provide an example of something which should go under the "varieties" field in the language data which shouldn't ever have an etymology-only code? Theknightwho (talk) 21:25, 14 March 2024 (UTC)[reply]
I'm not saying I know of any such cases. What I am saying is that nobody knows, until the work to check them is put in, that all of the currently registered "varieties" could reasonably have their own codes. — SURJECTION / T / C / L / 21:31, 14 March 2024 (UTC)[reply]
Alright - I'll do that. Theknightwho (talk) 21:56, 14 March 2024 (UTC)[reply]
Just FYI I suspect that everything that qualifies as a "variety" under the variety field can reasonably have an etym-only code. We have current etym-only codes for conventional dialects (regional lects/topolects), chronolects (e.g. Early Modern English), registers/sociolects (e.g. Katharevousa), cants (e.g. Polari), even writing systems (e.g. Wade-Giles). It might be useful to set up the ability to categorize etym-only varieties by the type of lect involved; currently this info is found only in the associated category and only at the level of regional lect vs. everything else. Also, if we get serious about adding etym-only codes for all varieties, we might want to split the data into submodules the way we currently do for full languages. Benwing2 (talk) 22:32, 14 March 2024 (UTC)[reply]
If we're adding etym-only codes for all varieties, do we still need a "varieties" field in Module:languages, or is it just redundant to Module:etymology languages/data and its "parent"/"3" field? (I think the/an original reason varieties were listed in Module:languages is so people searching the module for e.g. Twi would find what code covered it; we might want to retain "varieties" that have ISO codes but let Module:etymology languages/data handle all the other ones...?) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]
My (half-serious) suggestion last time this came up was "subsumed variety", since that seems to be the distinguishing characteristic (?), that these are codes that are subsumed under other codes. There are some edge cases like substrates which none of the proposed names fit, e.g. the pre-Roman substrate of the Balkans—or as it was recently (non-consensusly?) renamed, Paleo-Balkan—is not really a "dialect" or "subvariety" or "subsumed variety" of anything, it's "one or more unknown languages from place X". I agree with Surjection we shouldn't be using the same name for two nonidentical things, so if we call these "varieties", we should consider whether to rename or retire the "varieties" field in Module:languages as discussed above.
BTW, re "dialect", another issue with that term is: are things like "Classical Latin" and "Late Latin" "dialects", per se? (If they are, we need to update the entry dialect.) - -sche (discuss) 23:10, 14 March 2024 (UTC)[reply]
Anyway, I think "variety" is fine as long as we decide what to do with the "varieties" field in Module:languages. (In particular, if we're giving every one of Module:language's "varieties" a code, do we still need the "varieties" field? Maybe we just make sure the "search for a language" thing on Module languages also searches through the list of variety codes, and that solves the issue that "varieties" were IIRC initially added to Module:languages to deal with, which was if someone wondered what code to enter e.g. "Twi" under.) - -sche (discuss) 18:11, 23 March 2024 (UTC)[reply]
@-sche I think for the moment we will need to have the "varieties" field because a lot of the info in that field isn't well-vetted, meaning it will take work to convert the info in that field into proper etym-only languages. Since the current practice is to not put varieties in the "varieties" field that also exist as etym-only languages, I think we should rename the field "other lects" (or "other variants" or even just "other varieties"). As for the search box on Module:languages, it looks like it already does what you want it to do; e.g. if I search for "Twi", the first entry that comes up is in Module:etymology languages/data, which is the module holding etym-only languages (aka variety codes). Benwing2 (talk) 18:21, 23 March 2024 (UTC)[reply]

Proposal

@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Pinging the people who took part in the above discussion. I'd like to formally propose renaming "etym-only language" to "language variety" and rename the "varieties" field in Module:languages to "lects".

Note also: As per my recent discussion Wiktionary:Grease pit/2024/March#merging lect info, I'd like for the "varieties"/"lects" field to go away in favor of consolidated info somewhere, probably in the labels data modules (which is where I've put such information for Chinese, see Module:labels/data/lang/zh). Then the lects can be pulled out of the label data by looking for labels with the parent field (which indicates they are lects in a tree of such lects). But whether you like or dislike this approach and/or would prefer a different one is orthogonal to the above proposal, which is only about renaming terminology that has outlived its usefulness.
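As a rough illustration of pulling lects out of label data by their parent field, here is a minimal Python sketch. The real label data lives in Lua modules whose exact layout is not shown here, so the table shape, the label names, and the function name below are all assumptions.

```python
# Hypothetical label data: only labels that describe a lect carry a "parent" key.
LABELS = {
    "Mandarin": {"parent": "zh"},
    "Beijing": {"parent": "Mandarin"},
    "Cantonese": {"parent": "zh"},
    "informal": {},  # a register label, not a lect: no "parent"
}

def lect_tree(labels):
    """Map each parent to the sorted list of lects directly beneath it."""
    tree = {}
    for name, data in labels.items():
        parent = data.get("parent")
        if parent is not None:
            # A label with a parent is a lect; file it under that parent.
            tree.setdefault(parent, []).append(name)
    return {parent: sorted(children) for parent, children in tree.items()}

tree = lect_tree(LABELS)
```

In this sketch, labels without a parent (such as the register label) simply drop out, so the lect list never has to be maintained separately from the labels.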

Implementation: This should not be terribly hard as AFAIK we don't have any templates that specifically reference etymology-only languages using that name. We'd just need to rename Module:etymology languages and associated data modules to Module:language varieties, change any references to those modules in other code (which shouldn't be that many since most of them go through Module:languages anyway), and update documentation. We should also consider (at the same time or later) renaming the methods getNonEtymological, getNonEtymologicalName and getNonEtymologicalCode to getFullLanguage, getFullLanguageName and getFullLanguageCode. This can be done by renaming the methods in Module:languages while keeping the old ones as aliases until all callers are updated. Note that this is all purely internal and won't affect any mainspace Wikicode. Finally, we need to rename the "varieties" field in the languages extra data to "lects"; again this is all internal, and hardly anyone references or uses this data so it won't be very much effort.
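The "rename the methods while keeping the old ones as aliases" step can be sketched as follows. The actual code is Lua in Module:languages; this Python stand-in only illustrates the pattern, and the class, the snake_case method names, and the language codes used are hypothetical.

```python
import warnings

class Language:
    """Stand-in for a language object; the real ones live in Lua modules."""

    def __init__(self, code, full_code=None):
        self.code = code
        # For a variety, full_code names the full language it belongs to.
        self.full_code = full_code or code

    def get_full_language_code(self):
        """New name: code of the full language this object belongs to."""
        return self.full_code

    def get_non_etymological_code(self):
        """Old name, kept as a thin alias until all callers are updated."""
        warnings.warn(
            "get_non_etymological_code is deprecated; use get_full_language_code",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_full_language_code()
```

Because the old name only forwards to the new one, callers can be migrated one at a time, and the alias can be deleted once nothing triggers the warning any more.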

Please indicate support, abstain, oppose, etc. below so that we have clear consensus for doing this.

Benwing2 (talk) 05:36, 2 April 2024 (UTC)[reply]

Support what you want since it is internal anyway and you needed to do something about redundant data fields. Fay Freak (talk) 05:43, 2 April 2024 (UTC)[reply]
Support Vininn126 (talk) 05:52, 2 April 2024 (UTC)[reply]
Support SURJECTION / T / C / L / 06:02, 2 April 2024 (UTC)
Support 0DF (talk) 06:46, 2 April 2024 (UTC)[reply]
Support, support! ‑‑Sarri.greek  I 10:13, 2 April 2024 (UTC)[reply]
@Theknightwho, -sche, Vininn126, Surjection, Nicodene, 0DF, Sarri.greek Sorry for the second ping. I went to implement the first part of this (`varieties` field -> lects) and I realize there's a small issue, which is that there is currently a `varieties` field for all three of languages, families and scripts, and "lects" is only applicable to languages. We could keep `varieties` for families and scripts except there are also "family varieties" renamed from etymology-only families (just Old and Middle Iranian languages). There are two other possibilities I can think of, which are "variant" and "subvariety". I think subvariety might be confusing in that people would think there's something inherently "lesser" about the subvarieties listed in the extra data, so I propose "variant". Benwing2 (talk) 19:43, 5 April 2024 (UTC)[reply]
@Benwing2: It's not as good, but since it's all internal, it doesn't matter much, so I'm also OK with variant. 0DF (talk) 20:25, 5 April 2024 (UTC)[reply]
Not to get off-topic, but given the way we use "Middle Iranian"—that we actually reconstruct terms in it, "Middle Iranian *foobar"—it does not make sense to me that we're calling it a family code. I seem to recall it was just one user who insisted we mustn't internally put it in the lect module? But maybe we just take a !vote and see whether people agree with that, if treating it as a family is also complicating other things. If it's not a lect, then I'm not sure it makes sense to be reconstructing terms in it; if we're reconstructing terms in it, then we're ipso facto treating it as a lect, similar to a reconstructed language (like how some people think Proto-Indo-European may have been more of a dialect continuum than a unitary lect, but we still treat it as a lect for our purposes). - -sche (discuss) 22:59, 5 April 2024 (UTC)[reply]
@-sche Pinging User:Vahagn Petrosyan who I think is the one who mostly uses it, and User:Theknightwho who may have opinions. IMO neither "Middle Iranian languages" nor "Old Iranian languages" should exist at all but it seems that Armenian specialists like to reconstruct terms in "unspecified Middle Iranian" and "unspecified Old Iranian". Maybe these should be treated as etym-only languages (aka language varieties) whose parent is a family (which is allowed), and called "unspecified Middle Iranian" etc. since that's what they are. I should note though that currently we have the variety field filled out in various places for all three of languages, families and scripts. If we eliminate "family varieties" "Middle Iranian" and "Old Iranian", we could keep the variety as variety for families and scripts and use lect for languages, although that might be slightly confusing; or we could rename "etym-only languages" to "lects" instead of "varieties". I dunno. Benwing2 (talk) 23:28, 5 April 2024 (UTC)[reply]
As I said before, I can stop using Middle Iranian and Old Iranian. Simply "Iranian" is good enough for me. Vahag (talk) 11:01, 7 April 2024 (UTC)[reply]
I can also, since you have a technical background providing a reason for it, basically simplification, as the sole reason I use etym-only-languages “Middle Iranian” and “Old Iranian” is periodization, me informing the reader that I have an idea whether the Greek term (for example) was borrowed 0–700 CE or 700–0 BCE. One can see internal differences between Old and Middle Iranian but they are only rough distributions. Inb4 Victar is annoyed due to witnessing a change he has not consulted about, in spite of seeing this discussion header – techbro imperialists removing languages, d'oh! (But really changing the internal relations of preserved langcodes does not change the dictionary content, so one can be bold about recoding them, as you do nothing without affording most diligent advice.) Fay Freak (talk) 12:33, 7 April 2024 (UTC)[reply]
Support variant even better! thank you for your hard work! ‑‑Sarri.greek  I 20:14, 5 April 2024 (UTC) PS The actual situation of it, is 'hosted' or 'subordinate to' = We find this language hosted Under the title 'XX language' Regardless of the reason (etymological, regional, or other). ‑‑Sarri.greek  I 21:14, 5 April 2024 (UTC)[reply]

Wiktionary really needs structured etymology[edit]

I've become convinced that Wiktionary's current etymology system, in which each entry contains the complete ancestry of a term, is creating massive problems that prevent Wiktionary from being a good etymological dictionary. Here are the problems:

  • Massive duplication: consider English puny and its earlier form puisne. We repeat the exact same information in different entries, blatantly violating the DRY principle. That's just two entries: the etymology of a widely-borrowed term like the ancestor of English sugar has to be duplicated across hundreds of languages. Often, editors don't bother and just write something like "see term#English for more details".
  • Entries falling out of sync: English nexus claims to derive from Proto-Indo-European *gned- or *gnod- through Latin necto. But necto was recently revised and now says that its origin is "uncertain". Which entry is a reader meant to trust? This kind of inconsistency is actually encouraged by the current system, because after editing the etymology of a term, an editor has to hunt down and correct every single place where that etymology is referenced or copied. More often than not, they don't, and the result is that entries can drift out of sync and sometimes even contradict each other.
  • Redundant edits: editors spend large amounts of time expanding "derived terms" and "descendants" sections which is necessary only because of limitations in the current system. Because if we know that A is an ancestor of B, there's no point in also writing that B is a descendant of A—that's clearly implied. But we have to anyway, since there's no automated system that can make that logical step.

Structured etymologies would also let us do cool things, like create etymological trees and automatically find cognates and doublets across different languages.

Here is a simple model for creating structured etymologies:

  • Each etymology section of an entry needs to be associated with one or more etymons. An etymon is a term which is the ancestor of another with no intermediate steps. Thus, the etymon of puny is puisne, the etymon of puisne is Anglo-Norman puisné, and so on.
  • An entry can have more than one etymon. For example, English arrangement can be said to derive from English arrange, English -ment, and French arrangement.
  • There are different kinds of etymons: English fullwidth clearly derives from full +‎ width, but is also calqued from Japanese 全角 (zenkaku). The first two are morphological etymons, while the last is a semantic etymon. Another example is bullroar (sense 3) which is morphologically from bull +‎ roar but semantically from bullshit.
  • An etymon can also have a degree of certainty: the levels might be "certain", "likely", and "unlikely". This is an improvement from the current system, where something is either {{derived}} or it isn't. Sometimes, when editors aren't confident, they add |nocat=1, but this isn't a standardized practice.

Thus, to create a list of derived terms or descendants, all you would need to do is get a list of entries which have a particular term as an etymon.
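As a rough illustration of the model above, here is a minimal Python sketch in which each entry stores only its immediate etymons and the descendants list is obtained by inverting that mapping. The class names, field names, and the use of `xno` as the code for Anglo-Norman are assumptions made for the sketch; nothing here reflects an existing Wiktionary module.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    lang: str   # language code (hypothetical choice of codes)
    form: str   # the headword


@dataclass(frozen=True)
class EtymonLink:
    etymon: Term
    kind: str = "morphological"   # or "semantic"
    certainty: str = "certain"    # or "likely", "unlikely"


# Each entry records only its immediate etymons, never the full ancestry.
ETYMONS = {
    Term("en", "puny"): [EtymonLink(Term("en", "puisne"))],
    Term("en", "puisne"): [EtymonLink(Term("xno", "puisné"))],
    Term("en", "fullwidth"): [
        EtymonLink(Term("en", "full")),
        EtymonLink(Term("en", "width")),
        EtymonLink(Term("ja", "全角"), kind="semantic"),
    ],
}


def build_descendant_index(etymons):
    """Invert the entry -> etymons mapping into etymon -> entries."""
    index = defaultdict(list)
    for term, links in etymons.items():
        for link in links:
            index[link.etymon].append(term)
    return index


def direct_descendants(index, term):
    """Return the entries that list `term` as an immediate etymon."""
    return list(index.get(term, []))
```

With this layout, nobody has to write "B is a descendant of A" by hand: `direct_descendants(build_descendant_index(ETYMONS), Term("en", "puisne"))` yields the puny entry because puny names puisne as its etymon.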

The main problem to consider is how all this can be accomplished. Here are the possibilities:

  1. Use Lua data modules, which are well-established on this wiki but are fairly unintuitive for new users and might cause performance issues.
  2. Get an extension like Wikibase, which is already used on Wikidata. To be clear, I'm not saying we should turn Wiktionary into Wikidata, but rather use a Wikidata-like structure for this application. The drawback is that this would require WMF developers to get something done (not their strong suit).
  3. Use bots. A bot can essentially function as a parser which converts high-level information into wikitext. This kind of thing is already being done with our {{anagrams}} system. This is the technically simplest solution, but would require someone to continually run a bot.
  4. Do nothing and keep writing etymologies manually. This is easier for now, but probably not a great long-term solution.

Another problem to consider is how this structured information should be presented to the reader. But all in all, I'm curious as to what the community thinks should be done. Ioaxxere (talk) 23:18, 14 March 2024 (UTC)[reply]

I agree that the current way etymologies are handled is problematic. If there is some technical solution to that, it would be great; I don't understand that side of things so am not sure what can be done.--Urszag (talk) 00:17, 15 March 2024 (UTC)[reply]
@Ioaxxere IMO none of your proposed solutions is workable. I would rather suggest a scraping solution. This is what we do for Descendants, for example, and it seems to work fairly well. (Your proposed solution #1 was tried for Descendants prior to implementing scraping, and failed, which led to the scraping solution.) This should not be too hard, but it might require the introduction of a few more templates to more clearly spell out the relations between etyms. Not sure. Benwing2 (talk) 00:46, 15 March 2024 (UTC)[reply]
@Benwing2 By scraping, do you mean creating an etymology equivalent of {{desctree}}? I think the main limitation of this is that you can only go in one direction (i.e. you wouldn't be able to use the etymology data to get a list of descendants). But I would definitely consider that an improvement over the current situation.
Actually: I thought of a way to resolve this, by using the category system. We already have categories like Category:English terms suffixed with -en. If we created categories for every single "terms descended from LEMMA" (there would be millions) this could theoretically be used to encode a tree. What do you think? Ioaxxere (talk) 02:15, 15 March 2024 (UTC)[reply]
@Ioaxxere Yes, something like {{desctree}}. It's true this wouldn't easily let you go from lists of descendants to ancestors and vice-versa, but (a) it would solve the other issues, (b) it's not clear in any case you would want to automate things in both directions in all situations; there are lots of complex cases involving etymologies that can't be neatly categorized and need descriptive text, and the Descendants lists and Etymology sections are conceptually different in their current implementations. Since the etymology sections are less structured than Descendants sections, some thought would have to go both into the conventions needed in Etymology sections so that the scraping result looks reasonable, and into how to implement the scraping itself and handle the various edge cases. Not something I have time to work on now but I agree it would be a good idea in the longer run and avoid lots of duplication and the inevitable bit rot associated with this (maybe there's a better term than "bit rot" to describe the inevitability of things getting out of sync when you have duplication). Benwing2 (talk) 03:28, 15 March 2024 (UTC)[reply]
I wholeheartedly agree but think you may be underestimating the sheer scale of the undertaking. Even with bot help it will take a lot of hands-on work by knowledgeable editors. My instinct is to pare this down to the basic core: set etymologies to point only one lemma higher and add the kind of 'scraper' currently being discussed. (And then call in the clean-up crew for a massive spectrum of languages...) As someone who edits mainly ety and desc sections, this alone, if feasible, would clear up 80% of my headaches. Nicodene (talk) 05:39, 15 March 2024 (UTC)[reply]
I don't disagree with 'make etymologies only point to the next level up' as a goal, but the issues that have come up when that's been discussed before, which you are probably aware of but which I want to make sure are mentioned here now that the idea itself has been mentioned, include (1) what if the next level up doesn't have an entry? e.g. I just added an etymology to chabot, but neither the Occitan chabotz nor the Latin capoceus exists to house the information that they ultimately seem to come from caput. (maybe in that case we add the full ety to chabot but also add a template that categorizes the entry as needing Occitan and Latin editors to help by creating those entries and moving the information thither?) (2) someone who's only interested in e.g. the etymology of English or French words now has to watchlist Occitan and Latin (etc) pages to see if that etymology gets changed; in some cases where minor languages see vandalism, this would make it less likely to be spotted (e.g. people try to add all kinds of weirdness to Kamboja, and if they were adding it to some less-watched Indian language instead, it'd get noticed less). (Also 3: if I want to know how many Old English words survive in English, and our English entries only point to Middle English, I'm stymied... but that one we could solve if the scraper/bot mass-adds that template that people currently use to add categories to etymologies.) - -sche (discuss) 06:39, 15 March 2024 (UTC)[reply]
For 1) I meant the next entry up which exists, and if it's a dead end, the etymology is left as-is. The proposed change wouldn't affect the chabot situation one way or another. For 2) I don't have an answer. Is vandalism that bad of a problem? Maybe I've just not noticed since I'm not the one dealing with it. 3) I wouldn't do this without including some kind of automatic categorization that runs through the etymology chain, or maybe regular bot sweeps that add/fix dercat. Not that I know how feasible that is, now that you mention it. Nicodene (talk) 09:44, 15 March 2024 (UTC)[reply]
For point 2 with Wikibase, the model is not Wikidata but c:Commons:Structured data. It's a new tab with data that bots can populate from key templates. Lua can access it recursively to create new and more powerful templates. Other tools like SPARQL can query the data. It's all about how to model metadata. Vriullop (talk) 09:33, 15 March 2024 (UTC)[reply]
I generally support this idea but I have no strong opinions on what the exact solution should be. I like being able to point to a specific etymon to generate structure from there, however that structure looks. Vininn126 (talk) 10:11, 15 March 2024 (UTC)[reply]
I've been thinking about this, and I'm starting to feel that a category system is the only sensible way to implement this. Here's how it would work:
  1. Start at an entry (say biology)
  2. Add an etymon template to the top of the etymology section. It might be formatted like this: {{etymons|en|id=life science|Biologie#German: biology|bio-#English: life|-logy#English: study}}. The |id= parameter defines the {{etymid}}, while the subsequent parameters link to etymons by their etymids.
  3. The {{etymons}} template adds the category Category:ety:biology (English: life science), which represents a node in the etymology tree (although the naming scheme isn't final).
  4. A bot creates the category with {{auto cat}}.
  5. {{auto cat}} scrapes the page biology and discovers the {{etymons}} template. Using this information, it adds Category:ety:biology (English: life science) into the categories Category:ety:Biologie (German: biology), Category:ety:bio- (English: life), and Category:ety:-logy (English: study).
Now, getting the descendants or derived terms of biology is as simple as seeing what entries are in Category:ety:biology (English: life science). There might be subpages, like Category:ety:biology (English: life science)/uncertain or Category:ety:biology (English: life science)/semantic, to include cases I discussed in my original post. But overall, the concept is essentially {{prefixsee}} or {{suffixsee}}, just for every term. @Benwing2, would you support implementing this template? It could coexist in parallel with the current system for now until we figure out a way forward.
To answer a few others:
  • @Nicodene: Yes! Having etymologies point only one lemma higher is the entire purpose of this proposal. Because if we have a chain A -> B -> C, there's no reason why C needs to "know" that it comes from A. It's implicit. The problem is that editors spend lots of time writing out the entire chain on every entry, and this is done in an inconsistent way. But there's no rush to implement this on a massive scale right away. As stated above, we should have this coexist with the current system.
  • @-sche: Those are honestly good questions. In the case of A -> B -> C, if B doesn't exist, it might be reasonable to create it as a "dummy entry" with an etymology section and nothing else. Another possibility is to just link A -> C. For the second point, I don't think we should be designing our systems with the expectation of vandalism. But yes, an editor would have to watch a variety of pages to follow an entire etymological chain. However, someone who's really only interested in English etymology wouldn't care about, say, which PIE root an entry comes from, because that's not English etymology. In the case of French chabot, we might do {{etymons|fr|id=fish|caput#Latin: head|q1=uncertain}}, which would add it into Category:ety:caput (Latin: head)/uncertain.
  • @Benwing2: I think the term you're looking for might be entropy.
Ioaxxere (talk) 14:43, 15 March 2024 (UTC)[reply]
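The category scheme in the five steps above can be sketched in Python as follows. This only mimics the naming logic on plain strings; the function names, the tiny language-name table, and the handling of |q1= are assumptions for illustration, and the proposed {{etymons}} template itself is not an existing, documented template.

```python
import re

# Tiny stand-in for a real code -> language name lookup (hypothetical).
LANGUAGE_NAMES = {"en": "English", "fr": "French"}


def category_name(term, language_name, etymid, qualifier=None):
    """Build a node name in the (non-final) naming scheme from the proposal."""
    name = f"Category:ety:{term} ({language_name}: {etymid})"
    return f"{name}/{qualifier}" if qualifier else name


def categories_from_etymons(page_title, call):
    """Return (node category, parent categories) for one {{etymons}} call.

    Only flat calls are handled; nested templates are out of scope here.
    """
    call = call.strip()
    if not (call.startswith("{{etymons|") and call.endswith("}}")):
        raise ValueError("expected a single {{etymons|...}} call")
    lang_code, *params = call[len("{{etymons|"):-2].split("|")

    etymid = None
    etymons = []      # (term, language name, etymid) per positional parameter
    qualifiers = {}   # position -> qualifier, e.g. {1: "uncertain"}
    for param in params:
        qualifier = re.fullmatch(r"q(\d+)=(.*)", param)
        if param.startswith("id="):
            etymid = param[3:]
        elif qualifier:
            qualifiers[int(qualifier.group(1))] = qualifier.group(2)
        else:
            term, _, rest = param.partition("#")
            language_name, _, parent_id = rest.partition(": ")
            etymons.append((term, language_name, parent_id))

    if etymid is None:
        raise ValueError("the call needs an id= parameter")
    node = category_name(page_title, LANGUAGE_NAMES[lang_code], etymid)
    parents = [
        category_name(term, language_name, parent_id, qualifiers.get(position))
        for position, (term, language_name, parent_id) in enumerate(etymons, start=1)
    ]
    return node, parents
```

Feeding it the biology call from step 2 gives the node from step 3 and the three parent categories from step 5, while the chabot call with |q1=uncertain lands in the /uncertain subpage of the caput node.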
@Ioaxxere I think we need a solution that can work with existing etymologies. That probably means accessing a chain of data if possible from a single entry, and if that doesn't work, falling back to accessing from multiple entries in a chain. I don't think having an entirely new system in place in addition to the old system will work very well. Benwing2 (talk) 19:32, 15 March 2024 (UTC)[reply]
@Benwing2: What you're asking for seems impossible. The current system is ambiguous in that we link to an entry without specifying its etymid, meaning that "going up the chain" is rarely possible to do in an automated way. If your plan then involves specifying etymids for every etymology section, then we might as well just overhaul everything, because it's the same amount of work anyway. Ioaxxere (talk) 20:53, 15 March 2024 (UTC)[reply]
@Ioaxxere I think we'd have to examine some actual use cases before deciding it's impossible. In many cases there's only one Etymology section, for example. Benwing2 (talk) 21:00, 15 March 2024 (UTC)[reply]
@Benwing: the problem with this heuristic is that a) there's no guarantee that the single-etymology entry is actually the correct one (maybe the actual ancestor hasn’t been added yet) and b) it could break unpredictably if someone adds a new etymology section later. The basic use case, in my view, is to replace our current "from X, from Y, from Z" with just "from X" and have the rest be automatically filled in. Those on the Discord will have seen my struggles in trying to do just this. Ioaxxere (talk) 21:29, 15 March 2024 (UTC)[reply]
@Ioaxxere Sure but realistically I don't think trying to implement a completely new system will work. We need to find a solution that leverages what's already there. Benwing2 (talk) 21:35, 15 March 2024 (UTC)[reply]
@Benwing2: We can leverage our current data by using bots to convert etymology sections into a structured format, but this can only be done in situations where we are certain that errors won't be propagated. For example: if A is listed as an ancestor of B#Etymology_2, and we find B listed as a derived term or descendant at A#Etymology_2, we can fairly confidently connect A#Etymology_2 and B#Etymology_2. I have implemented this heuristic in my own script and it works very well. Ioaxxere (talk) 03:26, 16 March 2024 (UTC)[reply]
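One possible reading of the reciprocal-link heuristic described in this comment is sketched below in Python. It is not the script mentioned in the comment; the data layout, the names, and the choice to ignore languages and give up on any ambiguity are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class EtymologySection:
    number: int
    ancestors: set = field(default_factory=set)     # pages named in the etymology
    descendants: set = field(default_factory=set)   # pages under derived terms/descendants


def resolve_ancestor_section(pages, child_page, child_section, ancestor_page):
    """Return the section number on `ancestor_page` that links back to the child.

    Returns None unless exactly one section on the ancestor page lists the
    child, so a wrong guess is never propagated.
    """
    child = next(s for s in pages[child_page] if s.number == child_section)
    if ancestor_page not in child.ancestors:
        return None
    candidates = [
        section.number
        for section in pages.get(ancestor_page, [])
        if child_page in section.descendants
    ]
    return candidates[0] if len(candidates) == 1 else None


# Abstract pages A and B, as in the comment above.
PAGES = {
    "A": [
        EtymologySection(1, descendants={"C"}),
        EtymologySection(2, descendants={"B"}),
    ],
    "B": [
        EtymologySection(1),
        EtymologySection(2, ancestors={"A"}),
    ],
}
```

Here `resolve_ancestor_section(PAGES, "B", 2, "A")` connects B#Etymology_2 to A#Etymology_2. If both sections of A listed B, the function would return None and the link would be left for a human to decide.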
@Ioaxxere Just FYI, before you set off to radically restructure etymologies, you need to (a) get consensus, (b) keep in mind what will be workable for the typical editor; ideally the system should be as little different as possible from what we have already. It's also better to do this stuff dynamically through scraping if at all possible, vs. requiring a bot to run periodically. Benwing2 (talk) 04:16, 16 March 2024 (UTC)[reply]
@Benwing2 It seems as though there's consensus for a change of some kind, but no agreement as to how it should be implemented. And that's something I'm also still thinking about... Ioaxxere (talk) 07:32, 17 March 2024 (UTC)[reply]
Ben's idea on something like {{desctree}} would imply that {{bor}}/{{inh}} would be able to check the pointed-at etymon and print information from there, and potentially go several pages back. If such a system were implemented, I think {{af}} should obviously be excluded, imagine printing the information for all the morphemes! It also wouldn't work for redlink pages just the same as {{desctree}}. Vininn126 (talk) 07:42, 17 March 2024 (UTC)[reply]
But, as mentioned, every word would need etymid's and etymology sections, of course... Vininn126 (talk) 07:48, 17 March 2024 (UTC)[reply]
@Benwing2, Vininn126 I created a mockup of my concept at User:Ioaxxere/under. I created a module, Module:User:Ioaxxere/etymon, which can recursively go backwards through various entries to build a chain of etymons (no categories are involved). Currently, it can only handle "From X, from Y"-type etymologies, so we'll need more complex parameters to represent stuff like {{af}}. Please let me know what you think! Ioaxxere (talk) 05:15, 18 March 2024 (UTC)[reply]
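The idea of walking backwards through entries to build a chain of etymons can be illustrated with a short Python sketch. This is not the code of the module mentioned above; it assumes a plain dictionary mapping each term to a single immediate etymon, uses the puny and puisne chain from the original post as sample data, and adds a cycle check that the mockup may or may not have.

```python
# term -> its single immediate etymon; keys are (language name, form) pairs.
IMMEDIATE_ETYMON = {
    ("English", "puny"): ("English", "puisne"),
    ("English", "puisne"): ("Anglo-Norman", "puisné"),
}


def etymon_chain(immediate_etymon, term, _seen=None):
    """Recursively collect the ancestors of `term`, nearest first."""
    seen = _seen if _seen is not None else {term}
    parent = immediate_etymon.get(term)
    if parent is None:
        return []          # dead end: no further etymon is recorded
    if parent in seen:
        raise ValueError(f"cycle in etymology data at {parent!r}")
    seen.add(parent)
    return [parent] + etymon_chain(immediate_etymon, parent, seen)


def render_chain(chain):
    """Format a chain in the 'From X, from Y' style."""
    if not chain:
        return ""
    parts = [f"{language} {form}" for language, form in chain]
    return "From " + ", from ".join(parts) + "."
```

Only the first link is stored on the puny entry; the rest of the text is filled in by following the chain, so a later edit to puisne would show up on puny without anyone touching it.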
@Ioaxxere I took a brief look. I really don't want to be a party pooper but my sense is the scraping needs to be a lot more sophisticated and more able to work with existing entries (I feel I've said this before). Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon. The reason {{desctree}} works is that it works with existing entries without requiring everything to be converted to a new format (and to the extent things have been converted, like when I changed {{desc}} to accept multiple terms, it's been in a completely automated fashion). Benwing2 (talk) 05:22, 18 March 2024 (UTC)[reply]
@Benwing2 Take the example of father. Let's say I want the etymology to be synced up with its etymon, Middle English fader. But wait, do we want fader (Etymology 1) or fader (Etymology 2)? A human would obviously realize that the correct section is etymology 1. An automated scraping template could easily figure this out as well if we added heuristics like "Etymology 1 is a lot longer" and "Etymology 1 is on top" and "Etymology 1 links to father in its descendants section" and "Etymology 1 and father list the same ancestors" and "Etymology 1 is defined as father". The problem is that these heuristics can get arbitrarily complex and break in unpredictable ways. That's why we should be working towards using etymids.
Also, I have a question about {{desctree}}: how would I get it to scrape bar#Descendants_2? The entry doesn't have etymids, so this doesn't seem to be possible.

Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon.

Would you be opposed to trying out a new system on a few entries, such as father and its five ancestors? Like {{desctree}}, this would have no effect on any other entry. Ioaxxere (talk) 06:11, 18 March 2024 (UTC)[reply]
@Ioaxxere: I see a number of possible problems. Can you assure me that they aren't?
1. Intermediate steps may be unattested.
2. Uncertainty as to the borrowing route. The OED has this problem with terms that may come from French or some form of Latin, and Thai has many words for which the ultimate source is Pali or Sanskrit, and indeed some which are blends of the two. A further problem is that many of these words were probably (but not certainly) borrowed via Old Khmer, where the spelling is chaotic. The path of mainland SE Asian loans from Pali or Sanskrit may be very uncertain.
3. Clusters of 'obvious' cognates, but for which there is no authoritative proto-form. Tai languages often show this.
4. Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y. RichardW57m (talk) 16:51, 18 March 2024 (UTC)[reply]
@RichardW57m Here are my proposed resolutions.
1. User:-sche highlighted this issue with entries like French chabot, which a reference suggests derives from Latin caput through Vulgar Latin *capoceus (which is unlikely to ever be created). This could be written as: {{etymon|id=fish|der|unc|la>caput>head|text=perhaps from <1> (via Occitan, from unattested {{m+|VL.|*capoceus}})}}.
In natural language, this represents: French chabot (etymid: fish) may be derived from Latin caput (etymid: head), but this is uncertain. Also, the entry should display the text "perhaps from Latin caput (via Occitan, from unattested Vulgar Latin *capoceus)".
The template would also be able to automatically fetch the ancestors of Latin caput (etymid: head), although we probably don't want that in this case.
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
2. One example of this is in English crusado, which is partially borrowed from Spanish cruzado as well as Portuguese cruzado. This could be written as: {{etymon|id=crusader|bor|es>cruzado>cross|pt>cruzado>cross}}
In natural language, this represents: English crusado (etymid: crusader) is borrowed from either/both Spanish cruzado (etymid: cross) and Portuguese cruzado (etymid: cross). The template would automatically generate the text "Borrowed from Spanish cruzado and/or Portuguese cruzado." in the entry (the |text= parameter could be used to change the display text).
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
@Ioaxxere: What I particularly had in mind is cases where one possible source is 'inherited' from the other. (In the case of Pali and 'Sanskrit', this requires that we write in Wiktionarian.) This example doesn't address that issue. RichardW57m (talk) 12:46, 19 March 2024 (UTC)[reply]
@RichardW57m: it would be helpful if you told me the specific case you have in mind. However, in the example I gave it doesn't matter at all how the Spanish and Portuguese terms are connected. Ioaxxere (talk) 15:30, 19 March 2024 (UTC)[reply]
@Ioaxxere: I was having trouble finding a clean example. I think our etymologisers have wrongly assumed that a Sanskrit-based spelling implies borrowing from Sanskrit, rather than reshaping. A clean example is พันธุ์ (pan). --RichardW57m (talk) 16:48, 19 March 2024 (UTC)[reply]
@RichardW57m: Thank you for the example! Before I answer, I need to get some clarity on the meaning of "or" in that etymology. Does it mean that the term was borrowed either from Sanskrit or Pali (but definitely not both), or does it suggest that the term could have been borrowed from both languages at different points? This is the kind of ambiguity that I'm hoping to eliminate. Ioaxxere (talk) 17:38, 19 March 2024 (UTC)[reply]
@Ioaxxere: I find it hard to see how if it could have been borrowed from either then it could not have been borrowed by both. Indeed, it could have been borrowed from both simultaneously, and on multiple occasions. Now, in Thai, there are some cases where a word seems to have been borrowed from Pali and then the spelling upgraded to Sanskrit, e.g. ธมม (or had it become ธัมม?) from Pali being replaced by ธรรม (tam) with (rɔɔ) from Sanskrit (but with gemination of the letter seemingly being the borrowing of a Pali spelling pattern not applicable to (rɔɔ) in words of Pali origin). --RichardW57m (talk) 18:00, 19 March 2024 (UTC)[reply]
@RichardW57m: In that case, it seems like the etymology is equivalent to crusado. The code used in the Thai entry describes the immediate origin of the Thai term, so the fact that the Pali and Sanskrit terms are related doesn't actually change anything. So the code would be: {{etymon|th|id=breed|bor|sa>बन्धु>kinsman|pi>bandhu>kinsman}}, which might produce "Partially borrowed from Sanskrit बन्धु (bandhu) and Pali bandhu." Ioaxxere (talk) 18:47, 19 March 2024 (UTC)[reply]
@Ioaxxere: In which case {{etymon}} should not be used. --RichardW57m (talk) 09:53, 20 March 2024 (UTC)[reply]
@RichardW57m: I'm not sure what you mean by this. Is there something wrong with what I said? Ioaxxere (talk) 17:45, 20 March 2024 (UTC)[reply]
@Ioaxxere: Yes. It rather implies that there is some third source of the word.
Moreover, with the tight restriction on how far back tracing would go, our statements would not be compatible with the word actually being borrowed via Khmer. I did begin to have some doubts as to the origin of the apocope in this word; I am not totally sure that it is part of the mechanism of directly borrowing from Pali and Sanskrit, especially as some words have been borrowed without apocope. Possibly we can stick in 'ultimately' to cover ourselves. The dropping of final -a from the 'rough forms' (i.e. stems) of words has been incorporated in the way Thai borrows from Pali and Sanskrit, but not as a mandatory process. (I suspect some words may also have been borrowed as the first elements of compounds, thereby preserving the final -a.) --RichardW57m (talk) 18:12, 20 March 2024 (UTC)[reply]
@RichardW57m: I see what you mean. In that case, I suggest "Borrowed from Sanskrit बन्धु (bandhu) and/or Pali bandhu." If we want to allow the possibility of an intermediate step or steps, it could be "Derived from Sanskrit बन्धु (bandhu) and/or Pali bandhu." (in which case the template would be using |der rather than |bor). Also, by "borrowed via Khmer", are you referring to Khmer ព័ន្ធុ (pŏənthuʼ)? If so, that can easily be added in as well. Ioaxxere (talk) 19:07, 20 March 2024 (UTC)[reply]
@Ioaxxere: I'm not sure why you're excluding borrowing in the latter case, but these wordings are better. For borrowing via Khmer, I would have expected the final written vowel to have been silent, but the word you give is definitely a cognate. --RichardW57m (talk) 09:24, 21 March 2024 (UTC)[reply]
3. If there is no proto-form, the {{etymon}} template wouldn't have any ancestors listed and wouldn't be very useful. If for some reason the only thing we knew about English king was that it was cognate with German König, the entry might have: {{etymon|id=monarch}} Cognate with {{m+|de|König}}. (Note: for now, I'm not sure if it's possible to automatically get cognates as we would have to go up and then down the etymology tree, although it might be possible with category stuff).
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
4. This one's simple: English unlock (for example) is from Middle English unloken but also equivalent to un- +‎ lock. This could be written as: {{etymon|id=open lock|enm>unloken>unlock|afeq|un->inverse|lock>mechanism}}.
In natural language, this represents: English unlock (etymid: open lock) is inherited from Middle English unloken (etymid: unlock), and is also equivalent to English un- (etymid: inverse) + English lock (etymid: mechanism). The template would automatically generate the text "From Middle English unloken, equivalent to un- +‎ lock." in the entry.
-- Ioaxxere (talk) 18:32, 18 March 2024 (UTC)[reply]
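The parameter syntax used in the four examples above can be read mechanically, as the following Python sketch tries to show. The proposal does not pin the syntax down, so several points are assumptions: the parameters arrive already split into a list, a relation keyword applies to every etymon after it, an etymon with no preceding keyword counts as inherited (following the gloss of the unlock example), "unc" marks the whole etymology as uncertain, and a two-part etymon such as un->inverse belongs to the entry's own language.

```python
# Relation keywords seen in this thread; "unc" is handled separately.
RELATION_KEYWORDS = {"bor", "der", "af", "afeq"}


def parse_etymon_params(entry_lang, params):
    """Parse the parameters of a proposed {{etymon}} call into a dict."""
    parsed = {"id": None, "uncertain": False, "etymons": [], "named": {}}
    relation = "inh"   # assumed default when no keyword has been given yet
    for param in params:
        name, separator, value = param.partition("=")
        if separator and ">" not in name:
            # Named parameter such as id=... or text=...
            if name == "id":
                parsed["id"] = value
            else:
                parsed["named"][name] = value
        elif param == "unc":
            parsed["uncertain"] = True
        elif param in RELATION_KEYWORDS:
            relation = param
        else:
            parts = param.split(">")
            if len(parts) == 3:
                lang, term, etymid = parts
            elif len(parts) == 2:
                lang = entry_lang          # same language as the entry
                term, etymid = parts
            else:
                raise ValueError(f"cannot parse etymon {param!r}")
            parsed["etymons"].append((relation, lang, term, etymid))
    return parsed
```

For the unlock example, `parse_etymon_params("en", ["id=open lock", "enm>unloken>unlock", "afeq", "un->inverse", "lock>mechanism"])` yields one inherited etymon in Middle English and two English "afeq" etymons. Writing even this small parser shows how much the syntax leaves to convention, which is the concern raised elsewhere in the thread about the system being elaborate.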
@Ioaxxere: Actually, this type of assertion is one that I am trying to avoid. The situation I gave was, "Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y". You are asserting that word B derives from word A. There are also cases where one can confidently note that, despite formal appearance, word B does not descend from word A. --10:40, 19 March 2024 (UTC) RichardW57m (talk) 10:40, 19 March 2024 (UTC)[reply]
Another interesting case of this would be something like rocznik, i.e. possibly from Old Polish, but scholars aren't sure. Vininn126 (talk) 12:59, 19 March 2024 (UTC)[reply]
@RichardW57m, Vininn126 Based on the current information at Polish rocznik, I would write this as {{etymon|id=year|unc|zlw-opl>rocznik>flower|af|rok>year|-nik>performer}}. I used the keyword afeq (equivalent affix) in a previous example but I'm beginning to doubt whether there's any need to have both it and af (affix). Ioaxxere (talk) 15:38, 19 March 2024 (UTC)[reply]
{{{eqaf}}} just seems to be {{surf}}. Vininn126 (talk) 15:40, 19 March 2024 (UTC)[reply]
Correct. Ioaxxere (talk) 15:47, 19 March 2024 (UTC)[reply]
@Ioaxxere: I think the question is about the Polish word. The formal match is fine, but the semantics don't work except in so far as an old formation may guide a new formation. --16:53, 19 March 2024 (UTC) RichardW57m (talk) 16:53, 19 March 2024 (UTC)[reply]
@Ioaxxere: It seems like a rather elaborate system, and for that reason it won't be easy to get widespread approval. I reiterate the earlier comment about slimming this down to its ‘core’, which is already very ambitious. Nicodene (talk) 12:19, 20 March 2024 (UTC)[reply]
@Nicodene Yes, this is the point I've been trying to make as well. Ideally we could leverage what we have already, possibly with some minimal changes (e.g. adding an extra etymid param or something if absolutely necessary). The tradeoff should be in the direction of adding more logic to the scraping function so that less explicit specification on the part of the editor is needed. Benwing2 (talk) 17:14, 20 March 2024 (UTC)[reply]

Update: I've added the template to father (& ancestors). In the end I didn't implement any text generation, since I don't think template-generated text can ever be better than human-generated text. Instead, I'm having it create an etymology tree which is pretty cool as well. Ioaxxere (talk) 00:28, 27 March 2024 (UTC)[reply]

@Ioaxxere: Did you get approval to add this? It doesn't seem like there's an actual consensus for it, and it is a major change. Additionally in the Proto-Indo-European entries that you've added it to, like at *peh₂-, it's causing weird spacing issues that weren't there before. You should only add it if there's a clear consensus for it. For testing purposes, you can use sandbox pages instead. CC: @Benwing2 AG202 (talk) 01:16, 27 March 2024 (UTC)[reply]
@AG202 @Ioaxxere I agree, please use sandbox pages until there's consensus to add this to mainspace pages. Benwing2 (talk) 01:29, 27 March 2024 (UTC)[reply]

Adding "Língua Geral" as a new language[edit]

It's a fairly well documented language, and represents a 150-year-old path between Old Tupi (tpw) and Nheengatu (yrl). Língua Geral has some evolutions that appear in both late Portuguese borrowings and Nheengatu, and are hard to show without it, like some changes in pronunciation (kunumĩ > kurumĩ) and meaning (paranã (sea) > paraná (“river”)). Língua Geral is also present in a good number of Brazilian toponyms, like Botuverá, and having an Etymology section saying "From Old Tupi" would just be wrong.

The cut between Língua Geral and Nheengatu is set at 1853 by most scholars, when the word "Nheengatu" was first used with the current meaning. The cut between Old Tupi and Língua Geral is a bit more nebulous. Navarro used 1700 for his dictionary, so I'd go with that. For the code, I suggest <tpw-lg>, as it comes from Old Tupi.

There were two varieties, Língua Geral Amazônica and Língua Geral Paulista, but I think that they could just be pointed out using {{lb}} when needed, rather than having two separate L2 headings (if this ever becomes a new L2). What do you think? Trooper57 (talk) 20:31, 16 March 2024 (UTC)

@Trooper57 I think what you are proposing is a full (L2 header) language. I don't know much about the differences between Old Tupi, Nheengatu and Língua Geral, but 150 years seems rather narrow a window for a full language; unless things evolved really fast, this could also (maybe better) be handled as an etym-only language variant of either the preceding or following stages. At least to me, the changes in pronunciation you give (kunumĩ > kurumĩ and paranã -> paraná with a semantic shift) do not seem indicative of a radical transformation in the language. Also, for a code I'd suggest maybe tpw-lig as we try to make the second component of a two-component language code have three chars. Benwing2 (talk) 21:57, 16 March 2024 (UTC)[reply]
Maybe an etym-only language would suffice. The trouble I was having was how to state that a word came from a later stage of Tupi, and not from the 16th century.
Also, as I understand it, the fast pace of change in LG comes from its marginalization: the Marquis of Pombal prohibited anything besides Portuguese, so it ended up being an unwritten, nonstandardized language. Trooper57 (talk) 00:26, 17 March 2024 (UTC)
@Trooper57 If we add tpw-lig as an etym-only language variant of Old Tupi, then all you need to do is use the code in place of tpw and it will show as "From Língua Geral ..." with the appropriate link to the Língua Geral category (which doesn't seem to exist but can be created). Benwing2 (talk) 00:32, 17 March 2024 (UTC)[reply]
@Trooper57 BTW "Língua Geral" and "Geral" are listed as other names for Nheengatu in our data, which is consistent with how Wikipedia describes things (i.e. Língua Geral being an older stage of Nheengatu rather than a later stage of Old Tupi). Benwing2 (talk) 00:39, 17 March 2024 (UTC)[reply]
There are authors who call everything from 1500 onwards "Língua Geral". It gets really confusing sometimes. Trooper57 (talk) 01:23, 17 March 2024 (UTC)
@Trooper57 OK. Let's wait a couple of days for anyone else to weigh in who might be knowledgeable about this topic (please ping anyone you think might be able to contribute), and then we can create an etym-only lang for Língua Geral, either tpw-lig or yrl-lig, whatever you think most appropriate. We can also create subvarieties *-lga (Língua Geral Amazônica) and *-lgp (Língua Geral Paulista) if you think this would be helpful (e.g. if you think these varieties will ever see fit to be cited in an etymology). Benwing2 (talk) 01:53, 17 March 2024 (UTC)[reply]
@RodRabelo7 and @NoKiAthami are the other Old Tupi editors I can think of. There's also @Arthur botelho, but he has been inactive since 2019. Trooper57 (talk) 02:22, 17 March 2024 (UTC)
@Benwing2, the grammars of the General Languages are quite distinct from those of Old Tupi and Nheengatu, hence I agree with @Trooper57. An example: "Nitípe nde Caräíba?" in Old Tupi would be "Nda karaíba ruãpe endé?"; "Xe çüí nití oiabàb" would be "Xe suí i îababe'ymi", etc. In Nheengatu it would probably be similar, but there are still significant differences in grammar and vocabulary. I honestly don't know the implications of "adding a new language" on Wiktionary, but the General Languages are undoubtedly independent languages. Those who call the language spoken from 1500 onwards Língua Geral (or even Nheengatu) should be completely ignored; for instance, Cândida Barros and Monserrat categorically state that the term isn't pertinent to the 16th century. In my opinion, Navarro's division is the most didactic one. Better done than perfect… Pinging Erick Soares3 and Bageense in case they have anything to add. RodRabelo7 (talk) 17:38, 23 March 2024 (UTC)
@RodRabelo7 One gauge is to consider the differences between Old, Middle and Modern English. Middle English covered a c. 450-year time window with significant differences in the grammar and phonology between Early and Late Middle English (e.g. Early Middle English had 4 cases and 3 genders, while Late Middle English had no cases and no genders), but we don't split Early and Late Middle English for various reasons, e.g. (a) there's no obvious line to draw between Early and Late Middle English; (b) overly fine splits scatter information in different places that might be better grouped together; (c) splits risk creating duplicate information, where essentially the same lemmas appear in several places, with inevitable bit rot as changes in one place don't get properly propagated to the other(s). If the original motivation was simply to better show etymological progression, that can be done without any issue with etym-only languages and the appropriate labels; we do that for example with the different stages of Latin, where Classical Latin, Late Latin, Early Medieval Latin, Medieval Latin, New Latin, etc. are all grouped under the L2 Latin header and labels used to identify distinct stages. Benwing2 (talk) 18:13, 23 March 2024 (UTC)[reply]
I think Latin would be a good example to follow in this case. For example paranã, we could use {{lb|tpw|Língua Geral}} and say "3. (Língua Geral) river" to state this sense appeared later. Trooper57 (talk) 18:31, 23 March 2024 (UTC)[reply]
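
A rough wikitext sketch of the two approaches discussed in this thread, in case it helps to see them side by side. The code tpw-lig is only the proposal made above and has not been created; the entry layout, the glosses, and the choice of {{der}} with Portuguese as the borrowing language are assumptions made purely for illustration.

```wikitext
<!-- (1) Sense label on the existing Old Tupi entry for paranã (sketch; other senses omitted) -->
==Old Tupi==

===Noun===
{{head|tpw|noun}}

# [[sea]]
# {{lb|tpw|Língua Geral}} [[river]]

<!-- (2) Etymology section of some later borrowing, ASSUMING the proposed
     etym-only code tpw-lig exists; it is used in place of tpw -->
===Etymology===
From {{der|pt|tpw-lig|kurumĩ}}.
```

In the second case the entry for the source word would still sit under the Old Tupi header; only the displayed language name in the etymology would change to "Língua Geral", as described above.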

Reconstruction:Latin → Reconstruction:Proto-Romance?

Periodically people leave me messages like this asking why the provided pronunciations can be so different from what they would expect from ‘Vulgar Latin’, not understanding that these are reconstructions that work backwards from Romance and not forwards from Latin. To be fair, it's not as if we make this easy to understand. Every page in question prominently says ‘Latin’ at the top, while the ‘Proto-’ and ‘Romance’ parts are buried deeper in the entries, in template labels that are (apparently) easy to miss. Examples that have confused people include *cordarium and *damnaticum.

So, would we be better off splitting out reconstructions based on Romance under a new name? An incidental benefit would be that the relatively few Classical or pre-Classical reconstructions such as *futo, which are currently buried under a mass of Romance forms, would be easier to find and compare. Naturally the objection can be made that most if not all of the reconstructions probably pre-date any meaningful split between Latin and Romance, and so a proper entry of this kind amounts to ‘Late Latin, but unattested’ - and this is why I've not been inclined to change the way that we handle these so far. But I've come to realize that a change in name would have the benefit of clarifying for people how these reconstructions work, which must be rather opaque for anyone but a specialist.

Then the follow-up question is: if we do this, then should all such entries fall under the umbrella of ‘Reconstruction:Proto-Romance’, with labels distinguishing lower-order reconstructions like Proto-Gallo-Romance, or should they be split up accordingly? Nicodene (talk) 12:58, 20 March 2024 (UTC)[reply]

@Nicodene Personally I'd rather not introduce a new L2 for Proto-Romance because the reconstructed forms look so much like Latin; but I understand your concerns as well. If we are to do this, IMO there should be only one L2 for Proto-Romance, at least for now, with lower-order reconstructions distinguished using labels. In any case AFAIK the inner structure of the Romance languages isn't that worked out? Benwing2 (talk) 17:18, 20 March 2024 (UTC)[reply]
Just throwing out there that two "least-change" solutions to people being confused about the pronunciations, which we could do either or both (or, of course, neither) of, are to start repeating the "Proto-Ibero-Romance" etc labels as {{q}}s right before or after the pronunciation, and/or to start (additionally) providing the Classicizing / Classical-esque pronunciation, which is plainly what people are looking for in these and other Latin entries and tend to use in the situations (like Latin classes at school) where people use Latin anymore, which I think we should be providing across all our Latin entries. (I know you've opposed that because the people using those pronunciations are not Classical Romans, but I think we could find some label to make that clear.) - -sche (discuss) 17:39, 20 March 2024 (UTC)[reply]
@-sche @Nicodene I agree with this. Note that for example, we provide the "modern Italianate Ecclesiastical" pronunciation on Classical Latin entries even though this isn't how the Romans pronounced things, and we provide the Egyptological pronunciation on Ancient Egyptian terms, which is totally artificial. Benwing2 (talk) 17:48, 20 March 2024 (UTC)[reply]
We assign a modern ecclesiastical pronunciation to words that are actually in use by modern speakers of Latin - which is neither *damnaticum, nor *cordarium, nor *muccicalium, nor *ramuscellum - ad infinitum. That is to say, no, there is absolutely zero sociolinguistic justification for doing this - much less any historical or phonological. Egyptological pronunciation is based on a scholarly convention - which incidentally arose because scholars needed a way to refer to attested words without knowing the vowels in them - and so not at all comparable.
To put it more concretely - it'd be exactly as if the OED cited a Proto-Germanic reconstruction and assigned it a modern cockney pronunciation based on its spelling. Or vice-versa: cited cockney slang in a reconstructed Anglo-Saxon pronunciation. That is to say, it'd be utterly mad either way. Nicodene (talk) 18:42, 20 March 2024 (UTC)[reply]
No word that is known only from Romance reconstruction exists ‘in Latin classes’, or in any dictionary of Latin, by definition. To assign a pronunciation from the first century BC to a reconstructed word that shepherds in the Balkans would have come up with ca. AD 900 would be fantasy - misinformation - the precise opposite of what any serious dictionary should be providing. What you are proposing is quite literally to feed the misconceptions of (some) people coming here for clarification. Just because a reconstruction is spelt for etymological reasons in traditional Latin style does not mean it is equivalent to any old word Cicero used in the first century BC that people actually use now and have used throughout the entire history of Latin as a literary language, from end to end.
Actually your comment has convinced me that splitting is really the best option. There is simply no other way I can see to prevent this problem recurring over and over again. Well, there is of course always the option of giving up and letting fantasy run amok forever, where comparing a Friulian word to a Catalan word magically results in a reconstruction with the phonetics of Cicero's era based on quite literally nothing more than the spelling that the reconstructed entries are given.
Do editors in Proto-Germanic have to deal with people inserting modern English spelling-pronunciations like /ɪˈtʃɹəʊnə/ for *aitrōną? Do editors in Proto-Slavic have to deal with people insisting on modern Russian Church Slavonic IPA? I imagine not. Nicodene (talk) 19:14, 20 March 2024 (UTC)[reply]
That is not how reconstruction works, for any language. The pronunciation given on a reconstructed entry is what can be deduced from the descendants, not some kind of modern spelling-pronunciation based on the letters used to spell the reconstruction.
@-sche: could you name any academic source that assigns to a Gallo-Romance reconstruction like *damnaticum a supposed Classical Latin pronunciation like [d̪ämˈnäːt̪ɪkʊ̃ˑ]? Or for that matter assigns a pronunciation like [foːrˈmäːt̪ɪkʊ̃ˑ] to a Gallo-Romance form like ⟨formaticum⟩ ‘cheese’, found for a couple of decades in Latin records from medieval France and never before or since? Or claims that modern English scholars discussing such words switch to Classical Latin phonetics, complete with trilled /r/, phonemic vowel length, and nasal vowels? Or that either word has been artificially resurrected by modern Latin enthusiasts, or is even known by them?
So long as the answer to all four questions is ‘no’, I don't see what there is to discuss. The pronunciation in question would be a fiction divorced not only from scholarship but also history and modern reality. The only remaining point is what you mentioned: that some uninformed passers-by want to see this fiction, because they look at a word spelt in Latin orthography and can't imagine any other pronunciation. And if that is to prevail over the aforementioned points, if this site is to deliberately place fiction over fact, then tell me and I will leave it in peace instead of trying to observe some modicum of academic rigour. Nicodene (talk) 13:32, 21 March 2024 (UTC)[reply]
@Nicodene Hi again. I'm not sure what your latest response is responding to, but I thought of this some more and I'm not convinced by your arguments. The thing is that the "Classicizing" pronunciation is essentially a modern invention that bears a certain similarity to the pronunciation of c. 50 BC but isn't the same. For example, the typical Classicizing pronunciation pronounces final -m as /m/, which is not how it was actually pronounced, and does not bother making any distinction between different types of written l. For these reasons, it *does* remind me a bit of the Egyptological pronunciation, and even more of the Modern Standard Arabic pronunciation, which (like the Classicizing pronunciation) is based on a specific era's pronunciation (that of Koranic Arabic of c. 600 AD) but with significant changes. MSA pronunciation can be given to all Arabic words, including ones that originated far after Koranic times, and there's nothing particularly wrong with assigning and listing such a pronunciation because (a) it's a modern invention anyway, and (b) it is in common use today. In fact there are further similarities, e.g. both the MSA and Classicizing pronunciations differ somewhat depending on the native languages of the speakers, because in various circumstances they match up certain written letters to the closest native-language phoneme instead of attempting a "true" representation of the original era's pronunciation. Benwing2 (talk) 21:41, 21 March 2024 (UTC)[reply]
@Benwing2 Sorry, I think I missed what @-sche was actually getting at. And it seems to be what you're getting at too. Namely that if we were to use an accurate label like ‘Modern Classicizing’ rather than ‘Classical’ then everything else becomes very easy.
In that case there's no longer any concern about ahistoricity and we can indeed ‘provide [this pronunciation] across all our Latin entries’. We'd no longer be portraying this as how for example a word attested in 9th-century Italy was actually pronounced, we'd simply be saying this is how a modern speaker would read it. And that's completely reasonable.
Also this makes it clear what pronunciation we should be showing - the standard that modern speakers follow, i.e. Allen's Vox Latina. No need to host wacko home-brewed theories like [z̪d̪͡z̪] anymore. Your idea of setting it all to [phonetic only] would also be well-suited for a modern convention if you're still in favour.
As far as the original topic is concerned though I can't agree with the idea that such pronunciations would be reasonable in a reconstructed entry, because in my view the only pronunciation that is valid on a reconstructed entry is one that is actually reconstructed from the descendants. Nicodene (talk) 23:27, 21 March 2024 (UTC)[reply]

Minimal viable quotation that satisfies the WT:QUOTE policy requirements

The quote-book template provides amazing flexibility and a great level of detail. However, this also creates the impression of a steep learning curve, so it's natural that some editors are reluctant to add quotations as a result. So I wonder, what is the absolute minimum amount of information that has to be provided to make a valid quotation?

My understanding is that it's always necessary to mention the author, because not doing so would constitute an act of plagiarism. But then there's the publication year, the title of the work, the information about the publisher, ISBN, a link to Google Books or some other durably archived source, an English translation for non-English quotations and many other details. How much of this can be omitted during the initial edits, done by inexperienced contributors, without becoming a problem? --Ssvb (talk) 22:29, 20 March 2024 (UTC)[reply]

I would say that the minimum viable quote has |title=, one of |year= or |date=, and |text=. Obviously |author= should be included when available, but that's not always the case. Ioaxxere (talk) 23:10, 20 March 2024 (UTC)[reply]
Agreed. I think in fact that if you omit |title=, both |year= and |date=, or |text=, you get a maintenance message asking for them; if not, that should be what happens. Other parameters, e.g. publisher, link to a durably archived source, etc. should ideally also be present but aren't strictly required. Benwing2 (talk) 04:07, 21 March 2024 (UTC)
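
As a rough sketch of the minimum described above, a {{quote-book}} call with only the language code, |title=, |year= and |text= might look like the following. The title, year, author name and quoted text are placeholders, not a real citation; the commented-out |author= line only marks where the author would go when known.

```wikitext
{{quote-book|en
|title=Example Title
|year=1999
<!-- |author=Example Author      (placeholder; add when the author is known) -->
|text=The quoted passage containing the '''headword''' goes here.
}}
```

Everything else (publisher, ISBN, URL, page, translation for non-English quotations) can be added later by another editor without changing these lines.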
If mentioning the author is currently not required, then this probably needs to change. Because this conflicts with WT:FAQ ("the right to be properly credited for one’s works", "What is fair use? [...] Any such quotation must be properly credited.") and the copyright laws of many countries. The Belarusian paper dictionary in 5 volumes, directly copying from which instigated this discussion in the first place, only lists the text itself and the author's surname in its entry, but not the year or any other information. They had to make their entries compact due to the restrictions of the paper format, but out of all things, it was the author's name that was kept there. --Ssvb (talk) 10:02, 21 March 2024 (UTC)[reply]
> If mentioning the author is currently not required, then this probably needs to change.
In context at WT:FAQ, "the right to be properly credited for one’s works" doesn't imply that authors should be required in quote-book. If ascertaining the "author" to ensure the "right to be properly credited for one's works" becomes an actual requirement, then the texts in the Bible or other anonymous works cannot be quoted since the authors are uncertain. Proper credit for a work can come in many forms, some of which do not necessarily include identification of the author directly by some name. I guess "due diligence" might ask that you put "Anonymous" or similar on a text with uncertain authorship, but what about when authorship claims are contested? --Geographyinitiative (talk) 10:38, 21 March 2024 (UTC)
There's already a policy about WT:QUOTE#Debated_authorship. --Ssvb (talk) 11:23, 21 March 2024 (UTC)
It doesn't say what to do if we don't know the exact author's name, as with an unsigned editorial. CitationsFreak (talk) 15:17, 21 March 2024 (UTC)
My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house. So that it's clear that the Wiktionary editor doesn't try to claim authorship of the quoted text. Anyway, as a result of this discussion, I would just like to have some clear, simple and actionable instructions. Quotes from the Bible are useful, but they are more of a special case. My point is that people should be able to easily add quotes without unnecessary headaches in typical cases:
  • The authors are usually known for modern books, but the original year of publication is a pain, because Google Books often messes it up. See WT:Quotations/Resources#Google_Books.
  • Wikisource is a great place for finding quotes from older public domain books, so some simple guidelines specifically tailored for its usage with the quote-book template would be useful.
--Ssvb (talk) 11:50, 21 March 2024 (UTC)[reply]

Some thoughts:

  • I think the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so. For example, if the quotation can be found online, then please add the URL. I have seen really bad quotations where an editor provided a quote from, for example, The Guardian newspaper (presumably the one published in London? but no information was provided) some time in 2020—not even a complete date. It's not so bad if the quotation can be found through an online search, but if it can't then I'd have to reject this as a quotation. Ditto for quotations from works that I cannot find online and aren't even listed in Worldcat—I usually move these to the Citations page and mark them "unverifiable".
  • As to whether providing the author's name is mandatory, I'd say no but if we are unsure let's ask the Wikimedia Foundation for advice. It's definitely a good practice, though. Also, there's a difference between works still subject to copyright and works now in the public domain. A stricter standard may well apply to copyrighted works.

Sgconlaw (talk) 11:32, 21 March 2024 (UTC)[reply]

> mark them "unverifiable"
How do you do that? Does the quote-book template allow setting the "unverifiable" status and automatically put such quotations into their own category?
I think that it would be very useful for creating English translations. For example, I could translate Belarusian quotations to the best of my ability and set some kind of "needs to be proofread by a native English speaker" status for them. Then some native English speaker could fix the grammar and style issues, rephrase the translation if necessary, and set "needs to be verified whether the translation still conveys the same meaning" status. Then I could take a look again and remove this status if everything is fine. With an iterative process like this, the quality of translations could be potentially improved. --Ssvb (talk) 16:09, 21 March 2024 (UTC)[reply]
> the main point is to provide sufficient information so that another editor who wishes to verify the information can easily do so
That's a good point and I agree. For modern books indexed by Google Books, just listing the ISBN in the quote-book template is probably good enough to make the information verifiable, and the duty of providing all the extra details can be delegated to Google, as long as the text of the quotation is itself searchable. I mean, it's probably okay to use "|isbn=" instead of "|author=" from the compliance point of view. Explicitly specifying the author is indeed good practice, but things may get tricky for inexperienced Wiktionary editors when dealing with translated works. Another tricky case is when a book is a collection of multiple short stories by different authors. Again, just providing the title of the whole book and the ISBN is probably good enough, rather than trying to figure out who the actual author of that particular text snippet was. --Ssvb (talk) 16:55, 21 March 2024 (UTC)
@Ssvb I tend to disagree that just providing the ISBN would be enough; I think as a courtesy to the reader, you should supply the title, author, date and ideally page number. As you mention above, there are edge cases where it may be hard to determine the author, and in such a case I think it's OK to leave out the author, but the vast majority of the time, the author is right there on the front of the book (or in the table of contents if the book is a collection from multiple authors). I also completely agree with User:Sgconlaw that quotes need to be verifiable. I did a lot of rewriting of {{quote-book}} a few months ago and cleaned up all the bad parameter uses (formerly, parameters were not checked properly), and came across a lot of badly formatted quotes, most of which I was able to verify but sometimes it took a lot of work to do so, which is not what you want to force people to do. As for adding things like "unverifiable" and "needs to be proofread", there aren't things built into {{quote-book}} to let you do that currently. I think User:Sgconlaw is just writing [unverifiable] and putting it somewhere next to or within the quote (e.g. using {{attn}}); you'd have to ask them for sure. I could definitely see adding options to supply notes "unverifiable" and "needs to be proofread" automatically and add them to appropriate categories. Benwing2 (talk) 21:51, 21 March 2024 (UTC)[reply]
I've just been adding to such quotations on citations pages |footer={{small|Unable to verify this quotation.}}. — Sgconlaw (talk) 21:58, 21 March 2024 (UTC)[reply]
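
For reference, a minimal sketch of how that |footer= note sits inside a full {{quote-book}} call on a citations page. Only the |footer= value comes from the comment above; the title, year and text are placeholders rather than a real quotation.

```wikitext
{{quote-book|en
|title=Example Title
|year=1999
|text=Quoted passage that could not be located in any accessible copy of the source.
|footer={{small|Unable to verify this quotation.}}
}}
```

Note that this only displays a note under the quotation; as discussed in this thread, it does not by itself add the page to any tracking category.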
@Sgconlaw: Thanks! I made the following diff to give it a try. Still, as User:Benwing2 mentioned, having a dedicated option and a category for it might be useful. --Ssvb (talk) 14:22, 22 March 2024 (UTC)[reply]
@Ssvb: I'm not sure it should be used on the main entry page. I would suggest you move the entire quotation to the citations page. — Sgconlaw (talk) 15:59, 22 March 2024 (UTC)[reply]
@Sgconlaw: Removing the quotation from the main entry page would be less than ideal, and I believe that my translations into English aren't too horrible. I'm just not fully confident about things like "useful for me" vs. "useful to me", and I also wonder whether I'm possibly composing somewhat odd Yoda-style sentences in some of my translations.
Looks like I probably need something like |t-check= parameter support in the quote-book template. Similar to the WT:TRANS's approach to handling this. --Ssvb (talk) 15:28, 23 March 2024 (UTC)[reply]
@Benwing2: I mean that providing |title=|year=|text=|isbn= with a valid ISBN known to Google Books is better than providing just |title=|year=|text= and nothing else. But having |title=|year=|text=|author= with a valid author is even better. Page numbers are not always available in the newer ebook editions of older paper books.
When you are talking about badly formatted quotes, do you mean the uses of quote-book template, which incorrectly add advanced options with inaccurate/bogus information? Or was it something else? --Ssvb (talk) 00:12, 22 March 2024 (UTC)[reply]

I agree with the above: "My interpretation of "giving proper credit" is that some point of contact should be provided. It can be the author, an editor or a publisher house." How do you feel about Lufeng's quote here: diff? You may say "there are other cites with authors on them"- but consider that you exclude an important part of the literature of humanity that has an anonymous or semi-anonymous character, but can still be adequately identified with OCLC, etc. Formally identified authors/translators are important to identify, yes, but there's generalized "sources". I mean it's an interesting question, of course. I don't really know what's right legally speaking for the site. Many magazines and newspapers have articles with unidentified authorship, and "AP News" is the source for many articles in 20th century newspapers.
Or again, consider this cite for Citations:Manoi, which has no specific author except the city "CHUNGKING":

  • 1945 May 23, “Chinese Expand New Drive on East Coast”, in Manila Free Philippines, volume III, number 24, Manila, sourced from Chungking, →OCLC, page 1, column 5:
    On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.
    --Geographyinitiative (talk) 22:18, 21 March 2024 (UTC) (Modified)
@Geographyinitiative: I feel that these quotes are way too elaborate and detailed. So they are exactly the thing that scares away new contributors. The beginners wouldn't touch any of these excessively complex templates with a ten-foot pole.
The discussion is about how much can be left out while still making an acceptable and useful contribution. There was a suggestion that just having "|title=, |year= and |text=" is enough. Yet my opinion is that such a bare minimum likely fails to "give proper credit" and a little bit more information is necessary, such as the "|author=" option. Or, as in your example, it may be the issue number, OCLC, page number, etc. However, I think that beginners should focus on just quotes from books and the "|author=" option, initially staying away from more complicated cases. --Ssvb (talk) 23:31, 21 March 2024 (UTC)
Here is "title year text", bare bones, for the above quote:
1945, Manila Free Philippines:
On the east China coast, onrushing Chinese forces captured Manoi, Min river estuary coastal town nine miles east of Foochow. Other forces, driving northeast from Foochow, were reported at the outskirts of Lienkong on the coast.

User:Wyang once called the website 烏煙瘴氣 (roughly, "a foul, chaotic mess"). I don't think anyone will respect a website that is made up of only relatively simple quote cites. Those simplistic quote cites are part of the equation, but you don't want to hurt the high-quality quotes just because some quotes are simple.
I think the main problem is that it's not obvious to entry-level people that Template:quote-book is the place to go to find out about quote-book parameters. It's hidden from them.
WT:ATTEST says: "Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source." My goal in providing detail in quote cites is two-fold: (1) to allow someone looking for this in 50 years after Internet Archive is long gone to find it again, and (2) to give an atmosphere to this website of professionalism that encourages high-quality editing and citations. If I did "title year text" on my quote cites in my high school paper, the teacher would give me an F.
I also work on a lot of words that were ignored by Wiktionary and Wikipedia for about 20 years, and they will be ignored again when I'm gone. I expect no one will follow up after me and that the soft-power campaigns of authoritarian China will mostly reverse what I've done so far to illuminate some of this terminology that the CCP doesn't like people to know about. It's an area of English language vocabulary that is scorned by the field of Asian Studies.
All I have to cling to is that the cites I've done so far might be high-quality enough to convince some future admins in 2040 not to allow deletion of everything I did. If I just did "title year text" quote cites, my argument to those future admins is significantly weakened, because they can say "well, who the fuck knows where that Manila Free Philippines bullshit quote was really located? that's just some bullshit from the cretins in 2024 who didn't have MindLinkAI." I'm adding indicia of authenticity with these details to stave off deletion in 2040, 2050, etc. I want to make it so that if you delete some of these high-quality quote cites, you have to feel palpably ashamed of yourself, unless you're totally dead inside. No one would feel ashamed deleting "title year text" quote cite. --Geographyinitiative (talk) 00:08, 22 March 2024 (UTC)[reply]
@Geographyinitiative: I'm not asking you to reduce the level of detail in your quotes. Keep up the good work. I'm actually asking people to start using Template:quote-book instead of Template:uxi for adding quotes to Wiktionary. And this transition doesn't have to be difficult. --Ssvb (talk) 00:39, 22 March 2024 (UTC)
I'm telling you, we will always regret simplification to less than what exists now in quote-book. But we will also always regret throwing out things that are semi-anonymous. You give credit to the extent the work allows you to give credit, not less, not more. The work being cited is the guide to its citation. Cookie-cutter simplifications will not work. It is as beautiful and marvelous as it is without Stalinist dictates to simplify or to require specific authors. Geographyinitiative (talk) 02:22, 22 March 2024 (UTC)
@Geographyinitiative: This is getting ridiculous. I don't just "give credit to the extent the work allows you to give credit, not less, not more". I'm providing a lot of additional details when adding quotations myself. But I'm also occasionally fixing quotations like this. And please take a look at how people opt to just delete an ux template rather than convert it into a proper quotation. Why is this happening? What can we do to improve the current situation? I don't believe that it's productive to just shame people by claiming that allegedly "the teacher would give them an F". --Ssvb (talk) 21:40, 23 March 2024 (UTC)[reply]

Template:ko-etym-native without parameters is pointless and misleading

@AG202 @Solarkoid @Chom.kwoy @Tibidibi

Calling Template:ko-etym-native without parameters is currently explicitly allowed, producing the message "Of native Korean origin."

This to me seems completely pointless. How does this in any way help the reader? Not only that, it is misleading and gives the ety false confidence; if you have nothing to add to the ety, how can you be so sure that the word is of native origin to begin with?

I propose that this usage be deprecated so that it can eventually be phased out. Ideally this would be done by categorizing any entries with parameter-less invocations into a separate category (or just ) so that the respective entries can be dealt with appropriately. Lunabunn (talk) 00:46, 21 March 2024 (UTC)[reply]
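As a rough illustration of the tracking step proposed above, here is a minimal Python sketch that flags wikitext containing a parameter-less call to the template. The helper name `has_parameterless_call` and the sample parameter value are made up for this example, and the pattern only sees the literal name `ko-etym-native`, so redirects or module-level categorization would need separate handling.

```python
import re

# Matches {{ko-etym-native}} and variants with only empty pipes, e.g. {{ko-etym-native| }}.
# Assumption: the template is always invoked under this exact name.
PARAMLESS = re.compile(r"\{\{\s*ko-etym-native\s*(?:\|\s*)*\}\}")


def has_parameterless_call(wikitext: str) -> bool:
    """Return True if the wikitext invokes ko-etym-native with no parameter values."""
    return PARAMLESS.search(wikitext) is not None


if __name__ == "__main__":
    print(has_parameterless_call("===Etymology===\n{{ko-etym-native}}"))  # bare call
    print(has_parameterless_call("{{ko-etym-native|1527}}"))  # made-up parameter value
```

A check like this could be run over a dump to build a cleanup list before the template itself is changed to categorize such calls.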

I think that template is currently being used as a signal to indicate the average Korean speaker's intuition about that word, as being neither Sino-Korean nor "foreign", as this intuition sometimes has an effect on the word's usage and grammar (e.g. Sino-Korean words better go along with other Sino-Korean words when compounding, etc). So while this information has some uses, it should not be conflated with the actual etymology of the word, which is what the etymology section is for.
So I agree that it should be deprecated as well. Chom.kwoy (talk) 09:10, 21 March 2024 (UTC)[reply]
@Chom.kwoy: Are there any bright ideas as to what should replace it? There is a significant similar notion in Thai of a partition of vocabulary into 'native' and Pali/Sanskrit vocabulary, distinguished as 'blue' and 'red' in one English-language grammar, and associated with compounding rules (which have exceptions, most notoriously ผลไม้ (pǒn-lá-máai)). English also has a similar but sloppy categorisation, though the effects of 'marking' in the lexicon are shown by other information in lemmas' entries, so probably less useful to record. --RichardW57m (talk) 11:33, 21 March 2024 (UTC)[reply]
As an outsider (beginner in Korean studies), I find this useful information, for the reasons you (@Chom.kwoy) note.
As a long-time Wiktionary editor, I think "native Korean origin" is indeed etymological information, as etymology is about where a word comes from (its origin). I struggle to think where else in our WT:ELE entry structure that this kind of information would go. ‑‑ Eiríkr Útlendi │Tala við mig 18:56, 22 March 2024 (UTC)[reply]
If a word is truly of native Korean origin, that is of course etymological information. However, the issue is that "Of native Korean origin." is currently a copout that gets added to every entry under the sun that isn't an obvious loan (i.e. from English or Sino-Korean) and obscures actual etymological info based on "speaker intuition." As Chom.kwoy said, this speaker intuition is also important. It should not, however, be conflated with the true origin of a word.
See also examples from Surjection below on words that are clearly loans (and even described as such in the etymology section) but still end up using Template:ko-etym-native because of this template's double duty as "first attested in" and "of native origin." Lunabunn (talk) 01:18, 23 March 2024 (UTC)[reply]
Fully agree, as an outsider. The "first attested in" part should be moved into its own template (perhaps named {{ko-attest}}) and the rest removed, if not completely, then at least from the etymology sections. Category:Native Korean words must also go. — SURJECTION / T / C / L / 12:06, 21 March 2024 (UTC)[reply]
First attestation seems to me like part of etymology, in describing the start date (origin) of textual evidence; I've certainly been putting that info in the ===Etymology=== sections in Japanese entries, for years now. Where else should that go? ‑‑ Eiríkr Útlendi │Tala við mig 19:06, 22 March 2024 (UTC)[reply]
First attestation should stay in the etymology. I wasn't arguing it shouldn't, just that it be moved into its own template that could then be used in etymology sections. — SURJECTION / T / C / L / 21:35, 22 March 2024 (UTC)[reply]
Keep until a better alternative is found, don't simply remove. I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji). Apart from Sino-Korean, it also distinguishes from loanwords from European languages, more modern Japanese loanwords where occasional tensification of consonants occurs.
Finding etymologies for all words may not be possible, but distinguishing loanwords from native words specifically for Korean has its value and is the practice of certain dictionaries and contributors.
Less categorical on Category:Native Korean words but I don't see why it should be removed. Anatoli T. (обсудить/вклад) 13:36, 21 March 2024 (UTC)[reply]
If our English entries had ever adopted a similar system, there would be people defending it too. "Of native origin" is not an etymology nor are entries perceived to be of "native" origin a valid category. I've seen this even be misused for compounds derived entirely from recent borrowings that nobody would ever call "native". In short, this system is nothing more than an absurd cop-out. — SURJECTION / T / C / L / 14:15, 21 March 2024 (UTC)[reply]
And to illustrate just how terrible of an idea it is to combine an attestation template with a "this is native I promise" template (which does not make the latter any less absurd), look at 가라치 (garachi) and 고두리 (goduri). Truly, some native Korean terms that were borrowed from Mongolic. — SURJECTION / T / C / L / 14:19, 21 March 2024 (UTC)[reply]
Here's the misuse too for good measure. — SURJECTION / T / C / L / 14:21, 21 March 2024 (UTC)[reply]
Definitely agree to remove. Logged in just to reply to this. Also thanks, Surjection, great examples as to why it's a bad/misleading template. Assuming right from the get-go that a word is of native origin simply won't do. Additionally, we use Of native Korean origin. for everything attested; so adding those words to a category called 'Native Korean words' is a really bad practice (Compare a Chinese loan ). Also, I share Surjection's suggestion that the template should be changed (i.e. remove parameterless option, categories) and renamed (to something like a ko-attest). - Solarkoid (talk) 14:40, 21 March 2024 (UTC)[reply]
The issue is whether this corresponds to a vocabulary marking (for an 'unmarked' value) that is relevant to Korean grammar. In English, the abstract lexicon has words marked as 'foreign' or 'Latinate', for which the unmarked value approximates to 'native', for which 'effectively native' may be a better term. And in English, the 'native' category includes such words of French origin as beak and beef. For grammar, being 'Sino-Korean' is reported above to be a matter of synchronic fact, not of historical fact. --RichardW57m (talk) 15:01, 21 March 2024 (UTC)[reply]
If it has significant relevance, then it should be documented in some way, but by using a better term than "native" and without abusing the etymology section to document it. — SURJECTION / T / C / L / 15:07, 21 March 2024 (UTC)[reply]
@Surjection: Agree. But let's not just delete the records, but rather eliminate them by conversion to the new way of documenting. --RichardW57m (talk) 15:32, 21 March 2024 (UTC)[reply]
Sure, I'd be fine with (a) splitting first attestation into its own template, (b) moving {{ko-etym-native}} out of the etymology section to somewhere else (maybe usage notes), (c) rewording the text displayed by the template as appropriate, and (d) getting rid of Category:Native Korean words and stopping the template from categorizing. — SURJECTION / T / C / L / 15:38, 21 March 2024 (UTC)[reply]
@Surjection: Thank you for the valuable input. I definitely see value in keeping some kind of label (CC: @Chom.kwoy), but otherwise agree that all of this is cruft that needs to be gone.
Perhaps, then, we may just deprecate the entire ko-etym-native template and then gradually replace it with a near-identical template with parameterless invocation and categorization removed? (We can still use Module:ko-etym for this new template.) We should be able to use either one of Category:Pages using deprecated templates or Category:Native Korean words to keep track of all the entries that need to be updated (eventually). Lunabunn (talk) 19:11, 21 March 2024 (UTC)[reply]
@Atitarev I am not sure how applicable your points are to Korean.
  1. I liken this to Japanese on'yomi and kun'yomi - native Japanese and Sino-Japanese readings of Chinese characters (kanji).
    • This is already covered by the Sino-Korean template, as you have noted.
  2. it also distinguishes from loanwords from European languages
    • This is already covered by the fact that, ya know, modern loanwords are marked as such. If an etym section does not say it is a loan, that is enough to perceive that it is not a modern European loan such that this kind of thing would matter. Conversely, just because a word is not a modern European loan does not mean it is of native origin. See the many Mongolian/Japanese/Korean... loans, for example, that are now perceived as native. (e.g. 수라 (sura), 담배 (dambae), 김치 (gimchi))
  3. more modern Japanese loanwords where occasional tensification of consonants occur.
    • This is already explicitly covered by Template:ko-ipa. Note that not all words that receive this "loanword-like" tensing are loanwords, nor does every loanword receive this kind of tensing.
But most importantly, once you add a first attestation to Template:ko-etym-native, the "native origin" message disappears. So even if everything you said is true and indeed "native origin" is a useful label here, this template is still the worst of both worlds. Lunabunn (talk) 01:24, 23 March 2024 (UTC)[reply]
Support removal. We do need some kind of hidden category tracking though, like "Korean words with no etymological history" or something like that. AG202 (talk) 03:47, 25 March 2024 (UTC)[reply]

Splitting Etymology by Accentuation

(Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, RichardW57, Exarchus): When forms of a lemma differ in accentuation which is not normally marked in the orthography, my reading of WT:EL says that if we have entries for the forms, they should be recorded under different etymologies, as is done for the present and past of English read, which have different vowel sounds. I have accordingly followed that rule at Sanskrit अमृता (amṛtā, deathless), where the Vedic accent is on the first syllable of the word in the vocative and on the second syllable in the other cases. Should or could I instead have followed the example of Russian сковороды (skovorody, frying pan), where there is only one etymology section and the two pronunciations are given in the same pronunciation section, with the pronunciation tied to the relevant noun form entry by the accent shown on the Cyrillic? --RichardW57m (talk) 16:56, 21 March 2024 (UTC)[reply]

@RichardW57m I'm more inclined to say we should follow the example of сковороды (skovorody) for a few reasons:
1. "Sanskrit" represents a continuum of Indo-Aryan languages, including Vedic, but in general most entries are Classical Sanskrit (which does not have the Vedic stress), i.e. Classical Sanskrit is the default and we tend to use "(Vedic)" as it is required.
2. Hence, it seems like needless complication, and right now the page for अमृता (amṛtā) is crazily complicated and tries to over-explain. Why are there so many entries for "Noun" for "sandhi form of अमृत (amṛta)" under various definitions? We should just have one entry like "sandhi form of अमृता (amṛtā)" which covers all the definitions and maybe put a "Usage Note" there if there is something specific to address. These are all non-lemma forms, and hence the actual stuff like Usage Notes and meaning nuances based on stress should probably be on the lemma page, as I understand.
3. There are tons of cases where an inflecting nominal/adjective has a feminine form that has its own specific definitions. We should just have one entry for the non-lemma of the masculine form and one entry for the special feminine meanings, to be succinct. Dragonoid76 (talk) 18:22, 21 March 2024 (UTC)[reply]
@Dragonoid76: The multiplicity arises from the decision taken by someone (not me) that there are separate noun and adjective for अमृत (amṛta). Having separated adjective and noun, in this case we get at least one lemma for each of the three genders. अमृता (amṛtā) is a form of each of these four lemmas, and for each lemma they include both vocative and non-vocative forms, so two semantically distinct pronunciations for each of the four. We thus end up with two adjective forms, one noun and five noun forms. (Forms identical with the lemma do not get separate entries; the feminine noun is a lemma.) We use the same PoS heading for both noun and noun form, in accordance with WT:EL#Part of Speech. My understanding is that forms of different lemmas should have different entries.
Vedic Sanskrit is a valid language and therefore its words are eligible for inclusion. I've cross-referenced the forms with different accents to cover words with pitch accents not indicated, as in the writing of Classical Sanskrit. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]
@Dragonoid76: You seem to have overlooked specific meanings of the neuter form. --02:07, 22 March 2024 (UTC)[reply]
Even with the collapsing of what are currently nouns into the adjective, we would need different entries for semantically distinctive differently placed Vedic accents. I think we would still have to separate out at least three nouns, for 'ambrosia', 'kudzu' and 'root', and a bunch of feminine plant names. --RichardW57 (talk) 02:07, 22 March 2024 (UTC)[reply]
@RichardW57 What do you make of something like User:Dragonoid76/sandbox/अमृता. I think all the "See also" boxes are very excessive and will be quite hard to replicate on all pages since Sanskrit has quite a bit of syncretism and words with multiple meanings. Here, all different forms of words with the same pronunciation and with the same etymology are put under the same header. Dragonoid76 (talk) 05:49, 22 March 2024 (UTC)[reply]
@Dragonoid76: I don't object in principle to the pronunciation section, but I'm not sure it satisfies the requirements of Gangetic chauvinism, for it gives a clearly greater rôle to the Roman script than to Devanagari. I think we may need to add to the capabilities of {{sa-IPA}}.
The orthographically identical forms of the same part of speech should either be kept together or cross-linked, as a reader may think that a PoS section is exhaustive. Possibly, and this also applies to the current form of the page, the noun forms should have a link to the feminine noun, as it may not be obvious to the reader that the forms of the feminine noun will not have separate definitions. See also is quite appropriate for these cross-links.
As to the work involved, I would remind you that 'lexicographer' is famously defined as 'a harmless drudge'.
I'm not confident that the botanical senses have the same etymology as the more obviously derived senses. The use case for not grouping senses with the same etymology as a single word is the risk of etymologies being added for a single sense, which is more likely to happen with derivative lemmas, such as participles, than with case forms such as we have. Incidentally, these case forms don't all have the same lemma form, and I will correct that in the mock-up. --RichardW57m (talk) 11:32, 22 March 2024 (UTC)[reply]
@RichardW57m For the accentuation you can add the Devanagari accents like this: अ॒मृता॑ (amṛ́tā) ... अमृ॑ता (ámṛtā).
For me it makes simply no sense to have 'See also' for something on the same page. Exarchus (talk) 15:38, 23 March 2024 (UTC)[reply]
@Exarchus: Are you unaware of the senseif/etymid functionality? That enables one to link to specific elements within a language's section. --RichardW57 (talk) 16:39, 24 March 2024 (UTC)[reply]
@RichardW57 I see what you mean, one could end up on one section of a page and not notice the others, but is this really relevant here (as we are talking about non-lemma forms)? Why would someone link specifically to these forms instead of simply the main lemma? Exarchus (talk) 16:57, 24 March 2024 (UTC)[reply]
@Exarchus: Yes, it is relevant. When looking up अमृता with Vedic accent unknown, one will find a matching section, and quite likely forget to look for one with a different Vedic accent. Remember, Wiktionary's published aim is to cover every word, not every lemma.
The main utility of the etymids (I really wanted 'wordid' and was tempted to abuse senseid) is in these intra-page crosslinks, but there may be other occasions to link to them. --RichardW57m (talk) 10:17, 25 March 2024 (UTC)[reply]
@RichardW57m I'm still not sure how this is different from someone looking up the English noun swallow, then finding "(archaic) A deep chasm or abyss in the earth", reasoning that it's probably not archaic, so it's much more likely "The amount swallowed in one gulp; the act of swallowing", forgetting that there's also an 'Etymology 2'. Exarchus (talk) 11:04, 25 March 2024 (UTC)[reply]
@RichardW57m To put this discussion into perspective: of all the forms currently classified under 'Etymology 2' (ámṛtā), only the vocatives (dual and plural) of the adjective occur in the DCS. And none of these vocatives actually has a pitch accent, as they don't occur at the start of a pāda. (Even in Padapatha they are written अ॒मृ॒ता॒ (amṛtā).) Exarchus (talk) 22:20, 25 March 2024 (UTC)[reply]
@Exarchus: I don't think the DCS is exhaustive. We also have the policy of allowing regular inflected forms despite their lack of attestation. On the basis of this, I've been adding terms for Pali imperative singular actives when they are the same as another word, to avoid Pali inflection tables wrongly linking to words of other languages, regardless of whether I can find an attestation of the Pali form. The alternative approach would be to stub the vocative dual and plural forms out from the noun inflection tables for अमृत (amṛta) and अमृता (amṛtā).
This now gives us three different pronunciations with the same basic spelling!
However, I think I have a better solution, which is to use
{{sa-adj form|tr=amṛ́tā|tr2=amṛtā|tr3=ámṛtā}}, which currently yields
अमृता (amṛ́tā or amṛtā or ámṛtā)
This has implicit parameters |head=, |head2= and |head3=, which implicitly default to the page name.
We can then use the usage notes section to explain that dependent upon position, the vocative is either unaccented or is accented on the first syllable, while the other cases are accented on the second syllable. --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]
@RichardW57m Sounds like a good idea Exarchus (talk) 11:30, 26 March 2024 (UTC)[reply]
@Benwing2, Theknightwho This relies on the undocumented merger of the output of the terms - may this be relied upon? --RichardW57m (talk) 10:45, 26 March 2024 (UTC)[reply]
And I suppose that if one wants to link specifically to one of these forms (because it's in a citation), then one should be able to link directly to the correct form (either ámṛtā or amṛ́tā), so people shouldn't have to look at the other one. Exarchus (talk) 17:19, 24 March 2024 (UTC)[reply]
  • If I understand correctly, read is the exception, not the common practice in English entries. This search reveals many English terms like object and document with one etymology section and one pronunciation section that splits the pronunciations based on part of speech. — excarnateSojourner (ta·co) 15:59, 26 March 2024 (UTC)[reply]
    @excarnateSojourner: With regard to the method you searched with, looking for "{{a|noun}}", the documentation of {{accent}} says, "It should not be used for other qualifiers like noun, verb, adjective, and so on"! It does, however, say that {{qualifier}} is used for these, and indeed, it often is. WT:EL does not mandate following the example of read, but merely gives the example of lead as where one may use 'etymology' to label pronunciations. Your finding usefully supports the solution I currently favour; I can use {{q}} to grammatically label the pronunciations themselves in this Sanskrit case. --RichardW57m (talk) 16:51, 26 March 2024 (UTC)[reply]

Equivalent of Template:ellipsis of in compounds

I think we would need something like this. There are quite a few cases in e.g. Finnish where a word can stand for a compound using that word, if it is clear from context, e.g. kone. Currently many of these use {{short for}}, but that is not ideal. — SURJECTION / T / C / L / 23:25, 23 March 2024 (UTC)[reply]

FWIW, for me {{ellipsis of}} implies a multiword term, not a compound, which is why I don't feel like using it. — SURJECTION / T / C / L / 23:29, 23 March 2024 (UTC)[reply]
{{clipping of}}? Vininn126 (talk) 08:49, 24 March 2024 (UTC)[reply]
Doesn't really fit either - we are removing whole words, even if they are still part of a compound word. — SURJECTION / T / C / L / 08:57, 24 March 2024 (UTC)[reply]
I've switched these to {{ellipsis of}} for now - but I still feel it might be better to have a separate template for this. — SURJECTION / T / C / L / 14:42, 24 March 2024 (UTC)[reply]
Hmm... I think this kind of thing (where lexical) must be clipping, ellipsis, or "short for"; at least, I don't think we could realistically distinguish some fourth category ("Template:shortening of"?) from T:short for, ellipsis, and clipping; there's too little distinction and too much overlap.
I can find a few works discussing ellipses of compounds, and slightly more works discussing clipping compounds (many are discussing clipping words out of a spaced multi-word compound, but this seems to function identically to clipping unspaced compounds; as with kone, a manual for a washing machine might say to load things into the machine). And of course other works discuss some element of a compound being "short for" or "a shortening of" the compound. Quotes.
But we should perhaps consider at what point this is no longer lexical (no longer "one of the definitions of kone is pyykinpesukone") and is just hypernymy; I mean, you can also shorten sports car to car or Chinese food (Indian food, delicious food, more food, etc) to food if the meaning is clear from context... ("I ordered Chinese food. Later, when I was eating that [Chinese] food, I noticed it had onions in it.") - -sche (discuss) 20:09, 25 March 2024 (UTC)[reply]
@-sche @Surjection I am inclined to agree with -sche here. I do understand the idea that ellipses are multiword, but I think that is somewhat of an arbitrary distinction. Consider English vs. German, for example, where compounding is similar but English often writes the compounds open whereas German writes them closed. I even think {{short for}} should usually be rewritten as either ellipsis, clipping or abbreviation. Benwing2 (talk) 05:54, 26 March 2024 (UTC)[reply]

These are pretty much all letter entries where the entire entry uses the "mul" language code, but the header is that of a specific language (there's often a Pronunciation section using the language code that matches the header- but that's it).

My understanding is that there should be two types of letter entries:

  1. A translingual one giving the kind of information that's not specific to any one language, such as Unicode codepoint. For this, both the "mul" language code and the "Translingual" header are required. This always goes at the top of the page, though the language section may also include mathematical and other non-language-specific symbols that use the same character.
  2. A language-specific one giving the type of information that's typically different between languages, such as position in the language's alphabetic order, and its pronunciation. For this, the Wiktionary language code for a specific language and the header with the Wiktionary name of that language are required. This always goes in the same place on the page that any entry for that language would go. If there are multiple languages that use a given character, there should be multiple language sections.

IMO, there should be a language section (of the second of the two types above) for every language that uses that letter as part of its standard orthography. There probably should also be a single Translingual section.

There should never be a Translingual entry with the header for a specific language, and there should never be a section for a specific language with a Translingual header. There should also never be a pronunciation section in a Translingual entry, unless it's a phonetic symbol- "Translingual" isn't a language, so it has no speakers.

This is the prevailing practice at Wiktionary since we've had language codes, language headers, and translingual entries. Do we have this spelled out somewhere, so we can point to something when we tell people to stop doing otherwise?

Finally, I would like us to get rid of all the entries in this category, by fixing them. I see three ways to do this:

  1. keep the "mul" code and give them a Translingual header
  2. keep the language-specific header, but change all the language codes to match the language of the header.
  3. split them into two sections: a Translingual one at the top of the page with "mul" language codes, and a language-specific one in the body of the page with matching language-specific headers and language codes.

What does everyone think? Chuck Entz (talk) 01:38, 24 March 2024 (UTC)[reply]
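To make the mismatch described above concrete, here is a minimal Python sketch that scans one page's wikitext and lists the language sections whose L2 header is not Translingual but which still contain a "mul" headword call. The function name `mismatched_sections` and the sample page text are illustrative assumptions; the pattern only recognises `{{head|mul|...}}` and `{{mul-...}}` style calls, which may not cover every template that takes a language code.

```python
import re

# Level-2 headers look like ==Osage==; ===Letter=== and deeper are excluded.
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# Headword calls that carry the Translingual code "mul" (assumed forms).
MUL_CALL = re.compile(r"\{\{\s*(?:head\s*\|\s*mul\b|mul-[a-z]+)")


def mismatched_sections(wikitext: str) -> list[str]:
    """Return names of non-Translingual L2 sections containing a 'mul' headword call."""
    headers = list(L2_HEADER.finditer(wikitext))
    bad = []
    for i, match in enumerate(headers):
        # A section runs from the end of its header to the start of the next L2 header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        name = match.group(1).strip()
        if name != "Translingual" and MUL_CALL.search(wikitext[match.end():end]):
            bad.append(name)
    return bad


if __name__ == "__main__":
    page = "==Translingual==\n{{mul-symbol}}\n\n==Osage==\n===Letter===\n{{mul-letter}}\n"
    print(mismatched_sections(page))
```

Which of the three fixes listed above to apply to each flagged section remains a per-entry editorial decision; the sketch only finds candidates.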

Wouldn't it be easier to have a table under the Translingual L2 that contained columns for language name, pronunciation, serial position, alternative representations, and anything else that applied to more-or-less every language's use of the letter/symbol, with an extra column for language(-family)-specific info or, at least, links to sources for such info? The same idea could be applied to personal names. Maybe the same approach would be useful for taxonomic names, etc. DCDuring (talk) 14:16, 24 March 2024 (UTC)[reply]
That might work for letters, which are generally not treated as words in the various languages (as written), but personal names have all kinds of grammatical and etymological information that doesn't go well in a table format. Also, there's Special:PrefixIndex/Template:list:Latin script letters, etc. Chuck Entz (talk) 14:46, 24 March 2024 (UTC)[reply]
I worry I'm cutting the Gordian knot with too simple a solution, but the bulk of the entries currently in the category seem to be Osage entries created by a single user (whose defiance of other norms, like one POS header per POS, people have discussed a few times including recently), and to me, the simplest / easiest / "least-change" solution seems to be to just revise the language code to match the language header in line with how we also have e.g. 려 as ==Korean== and not ==Translingual==. This is what I would've done (quietly, as basic cleanup, without even thinking to have a BP discussion about it) if I had seen a new user adding these. Like with 려, there does not currently seem to be any codepoint information in 𐓶̋ to discuss whether or not to have a ==Translingual== section for. Once the Osage entries are changed, we can see whether any other entries in the category look like they would need different treatment, e.g. any letters which are actually used by more than one language, but I don't suppose it makes sense to discuss a table to compactly provide multiple languages' pronunciations of 𐓶̋ if only one language uses it. - -sche (discuss) 15:18, 24 March 2024 (UTC)[reply]
Note that Osage 𐓶̋ isn't a real letter, but a letter plus diacritic combination, which was encoded too late to be assigned a precomposed character in Unicode. I'm not sure if we have a policy on giving the code points (or even entries) for such things; we generally don't have entries for combinations of Indic letters and vowel 'diacritics'. I've a feeling the sentiment was against allowing such entries, insisting rather on status as letters, as seen in Scandinavia but generally not France or Germany, but with a counter-argument that Unicode characters in use were eligible. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)[reply]
@-sche: I don't understand your comparison with 려 (ryeo), which does have codepoint information, and is a precomposed character. --RichardW57 (talk) 16:32, 24 March 2024 (UTC)[reply]
Chuck seemed to suggest we might need/want Translingual entries to handle non-language specific information like what Unicode codepoint(s) the letter has, but these letters do not currently have such information, so I'm saying the simple fix is just to fix the headword-line template to use the language code that corresponds to the L2. - -sche (discuss) 20:15, 24 March 2024 (UTC)[reply]
I stumbled across today. The entry seems to have "alternative forms" shared between two languages. Isn't this common?
Is there other information, besides the Unicode codepoint that is shared by all languages that use the letter/character/symbol? Like typographic realizations, as in ? DCDuring (talk) 23:53, 24 March 2024 (UTC)[reply]
@DCDuring: I don't think it's common, and such information probably belongs on Wikipedia rather than here. For an entry like Osage 𐓶̋, I think we simply want a head line template such as {{head|osa|letter|upper case|{{uc:{{PAGENAME}}}}|tr=-}}. If we choose to keep such entries, we can then worry about displaying the codepoints for the non-letters with PoS 'letter'. (We probably already have a template to do it.) Can we just go ahead and fix the Osage letters and letter-accent combinations? --RichardW57m (talk) 18:11, 25 March 2024 (UTC)[reply]
@-sche: When you wrote, "Like with 려", did you mean "Unlike with 려"? --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]
려 contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. Like with 려, 𐓶̋ contains no information about what Unicode codepoint(s) a user would have to enter/use to obtain the character. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]
@-sche: The codepoint information for 려 is generated by the invocation {{character info}} at the top of the page. It's true that you have to work to get the NFD code sequence, as the vowel is given as a compatibility jamo rather than as a combining jamo. For me it displays at the top right of the page. If you click on a language-tagged link, you may have to scroll up to see that information box. --RichardW57 (talk) 05:38, 26 March 2024 (UTC)[reply]
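For readers who want the NFD code sequence mentioned above without working it out by hand, a minimal Python sketch using the standard `unicodedata` module follows. The helper name `nfd_codepoints` is made up for this illustration; it is not how {{character info}} computes anything.

```python
import unicodedata


def nfd_codepoints(text: str) -> list[str]:
    """Return the code points of the canonical decomposition (NFD) of text."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X}" for ch in decomposed]


if __name__ == "__main__":
    # U+B824 is the precomposed Hangul syllable 려.
    print(nfd_codepoints("\uB824"))  # ['U+1105', 'U+1167']
```

The two results are the combining (conjoining) jamo for the initial consonant and the vowel, as opposed to the compatibility jamo shown in the information box.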
The 'simple fix' hides the omission, so we don't remember to supply the lack. --RichardW57m (talk) 10:41, 25 March 2024 (UTC)[reply]
Belay this. The straightforward fix above hides nothing. --RichardW57m (talk) 18:16, 25 March 2024 (UTC)[reply]
I agree. And this shows once again why this user should've stayed blocked. AG202 (talk) 03:44, 25 March 2024 (UTC)[reply]
Anyway, unless someone has a cogent objection, I will just fix these entries with AWB soon. - -sche (discuss) 18:48, 25 March 2024 (UTC)[reply]
@-sche What are 'these' entries, and what changes will you make? I'm not sure that the fix for 🐀 is obvious. The English meaning of the emoji looks dependent on English - I wouldn't be surprised if it were associated with endearment in Thai usage - compare Thai หนู (nǔu). --RichardW57m (talk) 12:14, 26 March 2024 (UTC)[reply]
As I described above, the fix is to conform the few stray instances in each entry where the language code doesn't match the language the entry is declared to be in. For the rat, that means [11]. - -sche (discuss) 14:24, 26 March 2024 (UTC)[reply]
@-sche: OK, that {{mul-symbol}} does less than I thought it did. But just replacing 'mul-letter' by 'osa-letter' damages the entries - the undocumented template doesn't display the casing partner, which I'd rather calculate than enter manually. --RichardW57m (talk) 18:00, 26 March 2024 (UTC)[reply]
@Chuck Entz: A vote to prohibit creating an entry for every letter of every alphabetically written language gained significant support, but not enough to make it policy. As there are, according to Wiktionary:List_of_scripts, 5,114 languages on Wiktionary using the Roman script, I think it's a bad idea to do things that way before pages are split by language. --RichardW57m (talk) 11:55, 26 March 2024 (UTC)[reply]
The vote was Wiktionary:Votes/2020-07/Removing_letter_entries_except_Translingual - there was a majority in favour, but not a consensus. --RichardW57m (talk) 18:02, 26 March 2024 (UTC)[reply]

Chinese lect labels and categories

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): @Theknightwho Sorry to ping everyone again. I am going through and adding labels to Module:labels/data/lang/zh for missing Chinese lects and creating the associated categories, and it's revealing some issues in the way that categories are currently handled. To fill everyone in who isn't familiar with the unique way that Chinese is handled:

  1. All lects go under the Chinese L2 header.
  2. The 'Foo lemmas' categories are added for individual Chinese languages using the {{zh-pron}} template in the Pronunciation section, which lists all the possible pronunciations of a given term in different Chinese languages (including Old Chinese and Middle Chinese).
  3. There are also labels, added using {{lb|zh|...}} or {{tlb|zh|...}}, to identify that a given sense is defined only for specified Chinese lects.

The problem here is that different structures are used for the categories generated by the labels vs. the categories generated by {{zh-pron}}. In particular, a label like Mandarin, Hakka or Xiang places the term in (respectively) categories Category:Mandarin Chinese, Category:Hakka Chinese and Category:Xiang Chinese, which are subcategories of Category:Dialectal Chinese (which in turn is a subcategory of Category:Regional Chinese, which is ultimately under Category:Chinese language). It's true that Category:Mandarin Chinese is also a subcategory of Category:Mandarin lemmas, and similarly for Category:Hakka Chinese, Category:Xiang Chinese, etc., but the existence of two categories for each language seems redundant. To make matters worse, as I've been creating new categories for missing lects like Category:Dabu Hakka, I haven't been explicitly specifying the parent category as e.g. Category:Hakka Chinese, with the result that they're placed under Category:Regional Hakka (rather than a child of Category:Regional Chinese), which leads to a very different breadcrumb trail than older lect categories like Category:Hong Kong Hakka. I propose the following:

  1. Eliminate categories 'Foo Chinese' as much as possible. Labels like Mandarin and Hakka should directly categorize into Category:Mandarin lemmas and Category:Hakka lemmas. (It's true that such labels could potentially be used for non-lemma forms as well, but in practice this isn't an issue due to the fact that Chinese languages have almost no morphology to speak of.)
  2. Older-created lect categories like Category:Hong Kong Hakka should have their parents set to e.g. Category:Regional Hakka rather than Category:Hakka Chinese. This brings Chinese lects in line with non-Chinese lects, which always work this way.
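The difference in breadcrumb trails can be illustrated with a small Python sketch. The parent maps below are simplified assumptions based on the description above (in particular, "Hakka language" as the parent of "Regional Hakka" is assumed, and intermediate categories between "Regional Chinese" and "Chinese language" are omitted); the function `breadcrumb` is hypothetical and is not part of the category modules.

```python
# Simplified, assumed parent relations for the current setup.
CURRENT = {
    "Hong Kong Hakka": "Hakka Chinese",
    "Hakka Chinese": "Dialectal Chinese",
    "Dialectal Chinese": "Regional Chinese",
    "Regional Chinese": "Chinese language",
    "Dabu Hakka": "Regional Hakka",
    "Regional Hakka": "Hakka language",  # assumption
}

# Assumed parent relations after the proposal: both lects share one parent.
PROPOSED = {
    "Hong Kong Hakka": "Regional Hakka",
    "Dabu Hakka": "Regional Hakka",
    "Regional Hakka": "Hakka language",  # assumption
}

def breadcrumb(category, parents):
    """Walk up the parent map and return the trail from the root down."""
    trail = [category]
    seen = {category}
    while trail[-1] in parents:
        parent = parents[trail[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            break
        trail.append(parent)
        seen.add(parent)
    return list(reversed(trail))

for name in ("Hong Kong Hakka", "Dabu Hakka"):
    print(" > ".join(breadcrumb(name, CURRENT)))
    print(" > ".join(breadcrumb(name, PROPOSED)))
```

Under the proposed map, both lects end up with the same trail shape, which is the point of item 2.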

This leaves a few outstanding issues:

  1. What about the remaining lemmas in categories like Category:Malaysian Chinese and Category:Philippine Chinese? The problem here is that the lemmas are tagged just using the labels Malaysia and/or Philippines without properly identifying which language is involved. Maybe these can be recategorized by bot but I don't know enough about Chinese to do it without help.
  2. Related to this: Currently categories like Category:Malaysian Chinese and Category:Philippine Chinese are children (sometimes grandchildren, etc.) of Category:Overseas Chinese. Do we want to bother with per-language categories like Category:Overseas Hakka, Category:Overseas Hokkien, Category:Overseas Teochew, etc. or just put things like Category:Malaysian Hokkien directly under Category:Regional Hokkien?

Benwing2 (talk) 00:08, 26 March 2024 (UTC)[reply]

@Theknightwho Do you have any opinions here? As I'm filling out the labels and descriptions in Module:labels/data/lang/zh, the dual hierarchy is becoming more and more annoying. Benwing2 (talk) 00:05, 4 April 2024 (UTC)[reply]
I'm also thinking we should add a field to the language extra data and/or the {{auto cat}} call on the language page, which describes the language in more detail. For example, Category:Eastern Min Chinese currently has the definition "Terms or senses in a branch of the Sinitic languages spoken in eastern Fujian Province in southeast China, as well as in parts of extreme southern Zhejiang Province to the north of Fujian and in the Matsu Islands (belonging to Taiwan)", which gives a lot more information than the Category:Eastern Min language page, which just says it's a language spoken in China and some other countries. Benwing2 (talk) 00:09, 4 April 2024 (UTC)[reply]
@Benwing2 I've not been keeping up with the Chinese label discussions, so I'll need to catch up with everything first. I agree the dual system is silly, though. Theknightwho (talk) 02:43, 4 April 2024 (UTC)[reply]

Adding Transitional Proto-Norse as an etymology-only language

Transitional Proto-Norse is the name for the language found in Scandinavian runic inscriptions from around 550–650. It's basically the last stage of Elder Futhark writing before the transition to Younger Futhark (generally also considered the start of Old Norse), and has several orthographic traits in common with the YF. The relevant inscriptions also show innovations found in Old Norse, but not yet in classic Proto-Norse, e.g. syncopation and the merger of the 2nd and 3rd persons in the present tense indicative of verbs. I think having it as an etymological-only language would be useful. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 21:30, 26 March 2024 (UTC)[reply]

@Mårtensås No objections here; we can use a code like gmq-tra for this. Benwing2 (talk) 01:54, 28 March 2024 (UTC)[reply]
@Benwing2 That code would work. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 19:36, 28 March 2024 (UTC)[reply]
What sources do you have for this name? -- Sokkjō 06:55, 28 March 2024 (UTC)[reply]
It's well accepted in the literature. If you search "transitional period" "Proto-Norse" on Google you will find numerous digitalised scholarly articles that use it. Düwel & Nedoma (Runenkunde 5th edition, p. 124):
Some Scandinavian inscriptions from around 600 already show characteristics of the younger Fuþark, e.g. hAborum (ᛡ A for a) and hagestumʀ (ᚨ a for an) on Stentoften (p. 25 f.) or uþArAbsbA (-sbA for -spā) on Björketorp (p. 51). From the subsequent period up into the 8th century, relatively few runic-epigraphic texts have been preserved; the nature of the corpus of these Übergangsinschriften (English: transitional inscriptions) is assessed in different ways (see, among others, Birkmann 1995, 219 ff.; Barnes 1998; Grønvik 2001a, 64 ff.; Schulte 2010; Stoklund 2010).
The references are:
  • Thomas Birkmann, Von Ågedal bis Malt. Die skandinavischen Runeninschriften vom Ende des 5. bis Ende des 9. Jahrhunderts (= RGA-E 12; Berlin – New York 1995).
  • Michael P. Barnes, The Transitional Inscriptions. In: Düwel / Nowak 1998, 448–461.
  • Ottar Grønvik, Über die Bildung des älteren und des jüngeren Runenalphabets (= OBG 29; Frankfurt/Main etc. 2001).
  • Michael Schulte, Der Problemkreis der Übergangsinschriften im Lichte neuerer Forschungsbeiträge. In: Askedal et al. 2010, 163–189.
  • Marie Stoklund, The Danish Inscriptions of the Early Viking Age and the Transition to the Younger Futhark. In: Askedal et al. 2010, 237–252.
ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 18:02, 28 March 2024 (UTC)[reply]
I've never seen the term "Transitional Proto-Norse" though. For many languages, we have etymology-only codes for late and early forms, and Early Old Norse and Late Proto-Norse are terms that actually have traction in the literature, so I would recommend either [non-ear] or [gmq-lat] instead. -- Sokkjō 19:51, 28 March 2024 (UTC)[reply]
Some pings: @Mnemosientje, Mahagaja -- Sokkjō 19:54, 28 March 2024 (UTC)[reply]
I prefer "Late Proto-Norse" as well. The phrase "transitional period of Proto-Norse" is common enough, but not "Transitional Proto-Norse" as a lect name. —Mahāgaja · talk 20:09, 28 March 2024 (UTC)[reply]

Etymology trees

I would like to add etymology trees to some of our entries. Here's an example which could potentially be used in mainspace. I personally think there's value in representing etymology in such a visual way, and these trees are extremely easy to create (my template can create them automatically). I'd like to hear your thoughts. Ioaxxere (talk) 02:15, 27 March 2024 (UTC)[reply]

Support The visual styling is helpful for understanding. Maybe in instances where the etymology is more than x levels deep, a tree could be created. I'm open to others' thoughts as well. —Justin (koavf)TCM 02:20, 27 March 2024 (UTC)[reply]
Oppose The example looks like {{desctree}}, just on a different page. --RichardW57m (talk) 13:48, 27 March 2024 (UTC)[reply]
@RichardW57m: I think you're misunderstanding the concept. A descendants tree and an etymology tree are exact opposites in that a descendants tree has all the descendants for a particular ancestor while an etymology tree has all the ancestors for a particular descendant. They only look similar when the tree is a simple chain (i.e., A -> B -> C -> D). Some etymology trees are much more complicated, like puny or every. Ioaxxere (talk) 21:20, 27 March 2024 (UTC)[reply]
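The distinction drawn above can be sketched in a few lines of Python. The `ETYMA` mapping is illustrative data for English father (my reading of the chain discussed in this thread), and both function names are hypothetical; this is not the {{etymon}} or {{desctree}} implementation.

```python
# Illustrative data: each term maps to its direct etyma (its "parents").
ETYMA = {
    "English father": ["Middle English fader"],
    "Middle English fader": ["Old English fæder"],
    "Old English fæder": ["Proto-West Germanic *fader"],
    "Proto-West Germanic *fader": ["Proto-Germanic *fadēr"],
    "Proto-Germanic *fadēr": ["Proto-Indo-European *ph₂tḗr"],
    "Proto-Indo-European *ph₂tḗr": [
        "Proto-Indo-European *peh₂-",
        "Proto-Indo-European *-tḗr",
    ],
}

def etymology_tree(term, etyma=ETYMA):
    """All ancestors of one descendant, as a nested dict."""
    return {parent: etymology_tree(parent, etyma) for parent in etyma.get(term, [])}

def descendants_tree(term, etyma=ETYMA):
    """All descendants of one ancestor, as a nested dict."""
    return {
        child: descendants_tree(child, etyma)
        for child, parents in etyma.items()
        if term in parents
    }

print(etymology_tree("English father"))
print(descendants_tree("Old English fæder"))
```

The two trees only look alike while every term has a single etymon; as soon as a term has two etyma, the etymology tree branches upward, which a descendants tree cannot show.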
@Ioaxxere: So why not give examples of the trees for these? --RichardW57m (talk) 09:23, 28 March 2024 (UTC)[reply]
@RichardW57m: See #New design, #Feedback on proposed label designs, and Special:Permalink/78726876 for more examples. Ioaxxere (talk) 08:04, 1 April 2024 (UTC)[reply]
Support I concur. I think there should be a way the etymology section can be visualised since it can definitely make things clearer. It can also make things more codified. However, if this becomes permanent, a point to discuss would be lemmas whose etymology isn't clear, i.e. how would they be represented visually, or lemmas which have mixed etymology (for instance lemmas whose specific sense is influenced by a different language). نعم البدل (talk) 23:54, 27 March 2024 (UTC)[reply]
Mild Oppose. I am opposed in particular to the current implementation involving a separate {{etymon}} specification with information duplicated between the {{etymon}} call and the actual text of the Etymology sections. I have expressed my concerns above. Benwing2 (talk) 01:53, 28 March 2024 (UTC)[reply]
@Benwing2: I agree that duplication is not ideal. My original idea was to have the template automatically generate text and have that be the only thing in the etymology section, but I feel now that that's only really possible for the simplest cases. Another idea is to use special flags on our current templates, so something like From {{m|en|something}} might be replaced with From {{m|en|something<id:whatever><etymon>}} to mark that particular term as an etymon. Ioaxxere (talk) 05:58, 28 March 2024 (UTC)[reply]
@Ioaxxere As User:AG202 points out, this is a huge change, and needs a lot more thought and design before being rolled out. That's another reason I Oppose adding it to mainspace at this time. Benwing2 (talk) 06:02, 28 March 2024 (UTC)[reply]
Weak oppose: The issues with spacing need to be resolved, and there need to be more eyes on the matter. I like the idea, nonetheless, but it feels very rushed for such a big change. (I'm also not really sure I like the way it looks that much?) AG202 (talk) 02:03, 28 March 2024 (UTC)[reply]
Strong oppose: Looks just awful. -- Sokkjō 06:53, 28 March 2024 (UTC)[reply]
Strong support - looks pretty much the same as what we already have in descendant sections, and automating this has huge potential. Theknightwho (talk) 04:08, 29 March 2024 (UTC)[reply]
Support I think it has potential. I saw the redesigned tree and it looks very promising! That being said, I still would like to see the end product before fully implementing it. — Sameer مشارکت‌هابحث﴿ 07:07, 30 March 2024 (UTC)[reply]

New design

I've created a new etymology tree design. See below:

Let's do a new poll in regards to this design, as a few oppose votes were made on the basis of the template's appearance. Ioaxxere (talk) 07:46, 29 March 2024 (UTC)[reply]

Still Oppose, except this one straight up does not work well on mobile. Again, with a change this visible and massive, I'd avoid rushing it. It needs to be properly tested everywhere. I really do support the idea, but it needs to be done well. (Also one might argue that it'd need to have a formal vote, and if it does, you'd really need to have it entirely fleshed out) AG202 (talk) 07:57, 29 March 2024 (UTC)[reply]
I definitely agree that such a change would need a formal vote because it is a pretty large change to how the dictionary looks. Thadh (talk) 09:15, 29 March 2024 (UTC)[reply]
Shame it doesn't work on mobile, otherwise it's a great improvement over the first one. It makes things so much clearer. نعم البدل (talk) 15:34, 29 March 2024 (UTC)[reply]
@نعم البدل: What does it look like on your device? On my phone it seems to be working properly. Ioaxxere (talk) 15:55, 29 March 2024 (UTC)[reply]
@Ioaxxere: Sorry, I meant like it becomes too wide, not that it doesn't work (but that's generally with any visual template on this website). I tried it on my phone again, just now, and it turned out to be an anomaly before. It works better than other visual templates on this website. نعم البدل (talk) 16:03, 29 March 2024 (UTC)[reply]
@نعم البدل: Good to know! And thank you for the kind words—I've spent an absurd amount of time making those grey connectors pixel-perfect. Also note that the tree on English puny is one that is exceptionally wide. Here's a more representative example, for English father:

Proto-Indo-European *peh₂- + Proto-Indo-European *-tḗr
→ Proto-Indo-European *ph₂tḗr
→ Proto-Germanic *fadēr
→ Proto-West Germanic *fader
→ Old English fæder
→ Middle English fader
→ English father

No poll. Let's have a discussion about the design. Also, I think it should be working fine on mobile now. Ioaxxere (talk) 08:46, 29 March 2024 (UTC)[reply]

Oppose: If we ever started creating etymology trees for entries, it would have to be a very complex module, with ways to mark derivational types, certainty, alternatives, mergers, etc. Create such a module and then maybe we can explore its inclusion on entry pages with a proper vote. --{{victar|talk}} 19:58, 29 March 2024 (UTC)[reply]
Oppose. I'm impressed by the visual output and effort that went into this, but we are still a dictionary. Treeing around is amusing but unscientific. {{desctree}} is also in my opinion much abused. Common sense tells us when an etymology or a descendants branch should stop. Catonif (talk) 17:53, 30 March 2024 (UTC)[reply]
Abstain. Interesting idea but it is missing a distinction between inheritance and borrowing and seems rather elaborate to implement. I would rather we focus on your other idea of limiting ety sections to ‘one step up’ (when the ancestor entry exists) and filling in the rest with an automated module. Said module could then later be modified to output tree diagrams, perhaps, if that can be done well. Nicodene (talk) 22:42, 16 April 2024 (UTC)[reply]

Changing the letters sort order on Arabic dialects' category pages

I suggest changing the letter sort order on Arabic dialects' category pages so that the non-native Arabic letters come at the end.

For example, the current order for Hijazi Arabic is

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

but it should be

آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

with the non-native پ and ڤ coming at the end, separated by a dot, since they are not among the original Arabic letters.

The Arabic sorting should be the same, and it should run from right to left as in the example (unlike the current left-to-right order used for Standard Arabic), since that looks more appealing and correct. @Benwing2 عربي-٣١ (talk) 12:54, 27 March 2024 (UTC)[reply]

@عربي-٣١: A better argument would be examples of alphabetic orderings in use, as people have many ways of handling the alphabetical position of additional letters. --RichardW57m (talk) 13:36, 27 March 2024 (UTC)[reply]
The current order is more appealing and correct. The current order of the alphabet pays respect to internal etymological and graphical relations of the letters anyway. A division due to nonnativeness is outlandish. A German or English entry with é will also be sought under e, and so on for any Latin-script language. Interestingly German çöp is at the end of the order, and should arguably be because c is not much of a letter in German and since the 1901 spelling reform virtually only used in digraphs: I rather expect it with c anyway, as due to French and Czech borrowings, and the only reason it is otherwise now is the default Unicode order. But پ (p) has little reason not to be put to ب (b), as a specification of it, even less reason than ç to c. Fay Freak (talk) 14:58, 27 March 2024 (UTC)[reply]
I was only talking about the table of letters at the top of each page (like this one: Category:Hijazi Arabic terms with IPA pronunciation). If so, then they should be removed altogether, since they are variants of letters (as German ç is a variant of German c), not full letters, and most speakers do not use or pronounce them either.
The sorting table on every Arabic page should be one of the following:
1. With no پ ڤ گ ژ چ or any other non-native variants.
2. With those variants put at the end. Putting the variants between the letters gives the impression that they are part of the official language or the alphabetical order. عربي-٣١ (talk) 16:38, 27 March 2024 (UTC)[reply]
There is no such thing as an official letterset, nor does the category give an impression, it is to navigate for people who already have an idea about the writing system of the language and what could occur in an entry title. Fay Freak (talk) 17:03, 27 March 2024 (UTC)[reply]
Actually, there are all sorts of orders even for the Latin script. In Vietnamese, tone marks don't create separate letters, but vowel quality modifiers do. The modified vowel letters occur at the end of the alphabet in some Scandinavian alphabets. Looking at the listing of various Arabic script alphabets in Appendix:Arabic_script, I note that while پ (p) occurs in the same part of the alphabet as ب (b), its precise position varies. --RichardW57 (talk) 08:27, 28 March 2024 (UTC)[reply]
@Fay Freak. @Benwing2 Yes and I was talking only about the Arabic dialects and Standard Arabic, these additional letters are already sorted after the regular letters, but on the main table they are shown in the middle of it (پ after ب and ڤ after ف), so I suggest removing them or putting them at the end of it as they are sorted already
I did not mean to remove those variants completely عربي-٣١ (talk) 20:33, 28 March 2024 (UTC)[reply]
@عربي-٣١: A codepoint sort, which is what you are seeing, should not be confused with a considered order for sorting. --RichardW57 (talk) 03:04, 29 March 2024 (UTC)[reply]
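The difference between a codepoint sort and a considered order is easy to demonstrate. The following Python sketch uses only five letters, and the `TAILORED` order is a made-up example that places پ after ب and ڤ after ف; it is not the order any Wiktionary module actually uses.

```python
# Five Arabic-script letters, written as escapes to avoid bidi display issues:
# U+064A ي, U+06A4 ڤ, U+0641 ف, U+067E پ, U+0628 ب
letters = ["\u064A", "\u06A4", "\u0641", "\u067E", "\u0628"]

# Plain sorted() compares code points, so the additional letters
# (U+067E, U+06A4) land after U+064A, the last basic letter here.
codepoint_order = sorted(letters)

# A hypothetical tailored order: each additional letter follows its base letter.
TAILORED = "\u0628\u067E\u0641\u06A4\u064A"
rank = {ch: i for i, ch in enumerate(TAILORED)}
tailored_order = sorted(letters, key=lambda ch: rank[ch])

print([f"U+{ord(ch):04X}" for ch in codepoint_order])
print([f"U+{ord(ch):04X}" for ch in tailored_order])
```

Any other considered order, including one that deliberately puts the additional letters last, is just a different `TAILORED` string; the codepoint order is an accident of encoding rather than a choice.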

Aquitanian entries in reconstruction namespace

As of now, all pages in the Aquitanian lemmas category are in the reconstruction namespace, aside from a few personal names which are in the main namespace. Most of these words appear to be attested, so they should be moved to main. -saph 🍏 13:47, 28 March 2024 (UTC)[reply]

@Saph668 Overall this sounds fine for attested terms except that when they are moved, they should have the source more clearly indicated and use the form as actually attested rather than reconstructed. Currently they just say "Known from Aquitanian inscriptions" but that says nothing about (a) which inscription it is, (b) in what form the term appeared in the inscription, (c) what the context was. Benwing2 (talk) 20:29, 28 March 2024 (UTC)[reply]
User:UtherPendrogn haunting us till this day.
My understanding is that Aquitanian is unattested, with the reconstructions based solely on place and personal names found in Latin texts, much like Gaulish. If that is indeed the case, all the proper names should be moved to a Latin header or an Aquitanian reconstruction. --{{victar|talk}} 22:16, 28 March 2024 (UTC)[reply]
The Aquitanian corpus is limited to a few proper nouns in otherwise Latin inscriptions. While these proper nouns often contain elements from Proto-Basque, most entries in Category:Aquitanian lemmas are just duplicates of (sometimes dubious) Proto-Basque reconstructions written with an ad-hoc orthography. I've cleaned up a few of them (keeping them under the Aquitanian header), but I find the reconstruction entries pretty useless. If we were to move the actually attested forms to Latin, it might be a good idea to make Aquitanian an etymology-only language. Santi2222 (talk) 21:16, 18 April 2024 (UTC)[reply]

Japanese bot task proposal - on'yomi categorization

Hello, since I last posted (link broken, please Ctrl+F "Granularity of reading types"...) about the topic of whether we should specify the type of on'yomi (kan'on, goon, kanyouon, etc.) used by Chinese-derived terms (via {{ja-kanjitab}}'s |o= param) — which didn't get much activity, but I believe we agreed it would be good to specify — I've been making quite a few changes to that effect in which I simply go to the respective kanji pages, read off whether the on'yomi is kan'on or goon and just add it to the relevant entry (e.g. diff). This task is no trouble to automate, at least partially, since I could make the bot fetch all this data and see whether the reading types can be unambiguously labelled, or whether they'd overlap (I don't propose guessing, if a character has e.g. きょう as both goon and kan'on readings, which one it is). And if it's unambiguous, the reading types can simply be filled in. It also helps that some months ago I parsed out the complete readings of all the kanji we had covered at the time, so they could be accessed quite quickly (although they're a bit less current now).
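The "fill in only when unambiguous" check described above could look roughly like the following Python sketch. The `READINGS` table is toy data with placeholder keys (`"A"`, `"B"`), not real kanji readings, and `reading_type` is a hypothetical helper rather than code from any existing bot.

```python
# Toy data: placeholder characters mapped to reading type -> list of readings.
# These are NOT real kanji readings; they only illustrate the two cases.
READINGS = {
    "A": {"goon": ["きょう"], "kanon": ["けい"]},    # every reading has one type
    "B": {"goon": ["きょう"], "kanon": ["きょう"]},  # same reading under both types
}

def reading_type(kanji, reading, readings=READINGS):
    """Return the on'yomi type if exactly one type lists `reading`, else None."""
    types = [
        kind
        for kind, values in readings.get(kanji, {}).items()
        if reading in values
    ]
    # No guessing: an overlap (or no match at all) is left for a human editor.
    return types[0] if len(types) == 1 else None

print(reading_type("A", "けい"))    # unambiguous case
print(reading_type("B", "きょう"))  # overlapping case, left alone
```

Returning `None` for both "no match" and "overlap" keeps the bot conservative; a real task would probably want to log the two cases separately.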

Does this sound like a good/acceptable bot task? Thanks for any feedback, Kiril kovachev (talkcontribs) 14:03, 28 March 2024 (UTC)[reply]

@Kiril kovachev Hi. Please take a look at [12], which is a script I wrote awhile ago to do something very similar, which is pull out the type of reading from kanji pages and insert it into category pages of the form Category:Japanese terms spelled with FOO read as BAR, e.g. Category:Japanese terms spelled with 柄 read as ひ. It looks either for {{ja-readings}} or {{ja-kanjitab}}. As for your proposed bot task, yes it sounds fine to me. Benwing2 (talk) 20:26, 28 March 2024 (UTC)[reply]
@Benwing2 Oh, thanks for that. Do you think I should adapt your script to do the changes I need or did you just mean to have a look for ideas what needs to be checked, etc.? Kiril kovachev (talkcontribs) 20:43, 28 March 2024 (UTC)[reply]
@Kiril kovachev It's your choice. I'm not sure what state your scripts are in, but feel free to reuse/adapt the code or simply look at the logic to make sure you haven't missed anything. Benwing2 (talk) 20:50, 28 March 2024 (UTC)[reply]
Alright, thanks very much, I'll have a proper good look when I sit down and try to code it. Kiril kovachev (talkcontribs) 20:53, 28 March 2024 (UTC)[reply]

I noticed I was blocked permanently in March, while I did not edit anything these 2 months.

Can anyone give me an explanation? I don't know who is in charge now.

@Benwing2, Chuck Entz -- Huhu9001 (talk) 03:39, 29 March 2024 (UTC)[reply]

@Huhu9001 The block log shows that you were blocked in Nov 2023 by User:Theknightwho for one year from the Module space, which was changed earlier this March to a permanent block. You'll have to ask User:Theknightwho why he saw fit to block you like this. However, I do notice that several other people blocked you earlier, so this can't be chalked up to "just not getting along with a single administrator". Wyang in particular said "Repeat offender, discourteous, defiant", which means you don't recognize what you did wrong; IMO this doesn't bode well for an unblock. Benwing2 (talk) 03:48, 29 March 2024 (UTC)[reply]
@Benwing2 Huhu9001 has ignored everything I've said for months now, but I changed the block length because Huhu9001 had already had two one-month blocks from the module namespace, which made absolutely no difference to their behaviour. In March, I changed the block to a permanent one with the explanation Edit: this should require an appeal to expire, given the long-term nature of the abuse., because it seems highly unlikely that Huhu9001 is ever going to learn how to get along with other users, and I don't want to periodically have to deal with their (mis)behaviour every time a block expires. Theknightwho (talk) 03:54, 29 March 2024 (UTC)[reply]
Wyang himself left Wiktionary in a highly dishonorable manner which pretty much tells that he is himself "Repeat offender, discourteous, defiant" instead. And are you justifying a block now by another one several years ago? -- Huhu9001 (talk) 03:55, 29 March 2024 (UTC)[reply]

@Benwing2: Also I notice other well-managed Wikiprojects like English Wikipedia enforce a rule that:

Administrators must not block users with whom they are engaged in a content dispute; instead, they should report the problem to other administrators. Administrators should also be aware of potential conflicts involving pages or subject areas with which they are involved. It is acceptable for an administrator to block someone who has been engaging in clear-cut vandalism in that administrator's userspace.

(w:WP:BLOCKNO). I roughly remember someone told me this is a custom or somewhat "softer" rule here. Why does it never have any effect when I need it to protect me from abuse? Is Wiktionary giving its admins too much arbitrariness? -- Huhu9001 (talk) 04:01, 29 March 2024 (UTC)[reply]

The block breaks even Wiktionary's own blocking policy (Wiktionary:Blocking policy). For infinite block length:

  1. Blatant or confirmed sockpuppets created for the purpose of vandalism or block evasion.
    Sockpuppets, nope.
  2. Abuse, plagiarism, persona non grata type blocks, based on community consensus.
    Put aside "Abuse, plagiarism, persona non grata" part. TKW silently changed my block to infinite. What consensus did he get? What consensus did he ever get for any of my blocks?
  3. Bad username accounts, including: email addresses, exploitative names, copycats, offensive names, etc.
    Nope.
  4. CheckUser-identified bad sockpuppets.
    Same as #1, nope.

Is not even Wiktionary:Blocking policy treated seriously any more on Wiktionary? -- Huhu9001 (talk)

Is Wiktionary:Blocking policy a valid document? @Benwing2, Chuck Entz -- Huhu9001 (talk) 04:10, 29 March 2024 (UTC)[reply]

@Huhu9001 I would have more patience with you if you showed even a little bit of understanding of the pattern of bad behavior you've engaged in. Instead you resort to wikilawyering and casting aspersions at people who are no longer active and hence cannot defend themselves. Benwing2 (talk) 04:36, 29 March 2024 (UTC)[reply]
When have you ever had any patience for me? Did you try, let alone to stop TKW's abusive behaviour, to think carefully about my stance and reasoning even a single time? If so, I simply have never seen it. "Wikilawyering" is a convenient accusation, but shouldn't there be any respect for rules, including Wiktionary:Blocking policy? For unprivileged normal users, what else can we rely on except for rules? -- Huhu9001 (talk) 04:46, 29 March 2024 (UTC)[reply]

Now since someone mentioned my block history. I can say I have never been blocked on any other Wikiprojects with the only exception being here. It is quite arguable whether the responsibility lies on my own side or on English Wiktionary's side, as English Wiktionary has almost always been failing to put meaningful checks on how its admins use their rights, just as I can see in the Wiktionary:Blocking policy case here. -- Huhu9001 (talk) 05:30, 29 March 2024 (UTC)[reply]

Japanese いぃ, うぅ, イィ, ウゥ

Bringing up this topic again because I got permanently blocked for this. Original discussion: Wiktionary:Beer_parlour/2023/August#Japanese いぃ, うぅ, イィ, ウゥ. -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

To check whether they represent yi and wu or ī and ū, here are the top hits from Google. Hits of the same proper names, mojibake and cases where it is not possible to tell, like うぅ居酒屋 (name of an izakaya), are ignored.
* いぃ
*# 那珂宣伝部/いぃ那珂暮らし (a city promotion group of Naka 那珂, "good Naka"): ī
*# 【音量注意】"樋口いぃいいぃいぃぃいいい" ("Higuchi gooooood"): ī
*# 新喜劇アキ【いぃよぉ~講座】("good"): ī
*# 海を感じる、エモいぃ~スポット ("emotional", inflection ending): ī
*# いぃべあー楽天 (a company name, "e-Bear"): ī
*# 台本のないコメディーvol.5 ~全部アドリブでいぃよぉ~~("good"): ī
*# いぃ〜バンド(e-band)結束バンド、アソート: ī
* イィ
*# yee(イィ) (a fashion brand): yi
*# イィの英訳 - 英辞郎: yi
*# 10年後イィ女になるために!! ("good"): ī
*# Satomi (演歌)/イィ...女 ("good"): ī
*# 闇の洗礼をうけるがイィ! 暴君ハバネロ外伝 ("good"): ī
*# ヴィエイィニュアンセ (a brand name, "vieille nuance", French "vieille" /vjɛj/): yi?
*# 鳥イィ? | 上地雄輔 OFFICIAL SITE ("Like some birds?"): ī
* うぅ
*# うぅ・・・の人気イラストやマンガ/ううぅ、うぅ、あ・・(息が苦しい)/うぅぉおお/うぅぅ~暑いっ、コンビニ行こっ! (interjections "urgh", "hmm" or the likes): ū (many hits from different sites)
*# ひねくれ領主の幸福譚 性格が悪くても辺境開拓できますうぅ!(lengthening of the sentence's ending): ū
*# ぶぅふぅうぅ農園 (@boohoowoofarm): wu
*# さうすまうぅん sausumaUun (an artist from Sapporo?): ū
*# 【ホットペッパービューティー】デフィ(defi)のフォトギャラリー:カラフルぅうぅううぅー!("colorfuuuuul"): ū
* ウゥ
*# 腕時計 ヴィヴィアンウゥストウッド (a brand name, "Vivienne Westwood"): wu
*# ウゥ~~~ン、店を出てからは振り返りたくない店かもしんないな/ウゥ~ン・・・落ちた~??/ ウゥ〜(interjections "urgh", "hmm" or the likes): ū (many hits from different sites)
*# アズノウゥアズピンキー 美品 ロングシーズン ロゴ刻印ボタン (a brand name, "As Know As"): wu?
*: (Many ウゥ hits seem to be misspelling of ウィ, ウェ and ウォ, like "ウゥストウッド" above and "ミニウゥレット" for "ミニウォレット".)
My conclusion is these syllables are more often ī and ū. Especially when in hiragana, they are almost always ī and ū. Wiktionary transliterating all of them into yi and wu is a mistake. yi and wu should be taken as special cases, like we have ヲチ (芸能人ヲチ, etc) as unusual wochi instead of regular ochi. -- Huhu9001 (talk) 08:58, 27 August 2023 (UTC)[reply]

Request for comments. (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): -- Huhu9001 (talk) 04:35, 29 March 2024 (UTC)[reply]

Support stuff like かわいいぃいぃいぃいぃ would then become kawaiyiyiyiyi instead of kawaiiiiiiiii if いぃ transcribes just yi. をぅ•ヲゥ can be used to transcribe wu, although を/ヲ is only used for accusative. Transcription of yi is more controversial, because there are no modern single kana characters of [j] + [+front], as ye could theoretically be transcribed いぇ as in イェイ (iei, yay!), but otherwise looks like a blend of a [+high] + [+mid] which could be recognized as a glide. Chuterix (talk) 13:57, 29 March 2024 (UTC)[reply]
Looks legit, as long as it's somehow possible to choose which one to use in a given context. I agree that ī/ū seems more logical, based on your examples, but frankly I've never seen these used in a real scenario so I can't say much more with confidence. Kiril kovachev (talkcontribs) 02:35, 2 April 2024 (UTC)[reply]
@Chuterix @Kiril kovachev If you look at the examples given, almost all of the ī and ū examples are down to syllable lengthening in the same manner as English: "gooooood". However, this is an absurd standard to use: it's the equivalent of using a sentence like "it's hooooot today" to justify "oo" sometimes standing for short "o" in English. It doesn't - it's simply lengthening for emphasis, and isn't the kind of spelling we're ever going to lemmatise at. On the other hand, the yi and wu terms are actually terms in their own right. Theknightwho (talk) 00:42, 13 April 2024 (UTC)[reply]
@Theknightwho You're right... it would make sense that yours be the default, with that in mind. But I still do think we need both to be possible somehow, because how will we handle sentences (e.g. in {{ja-usex}}) where we need いぃ to equal ī? Kiril kovachev (talkcontribs) 14:38, 13 April 2024 (UTC)[reply]
@Kiril kovachev An override exists with the rom= parameter (though this should probably be changed to tr=). It might be worth having a flag like . and ^, but this is really rare: we have two terms which use wu (ウゥルカーヌス (Wurukānusu) and its alt form), and I don't think we have any yi terms, though it does show up in some French place names, like プイィ゠シュル゠ロワール (Puyi-shuru-Rowāru, Pouilly-sur-Loire). We have no terms that I can find which use these for ī or ū. Theknightwho (talk) 15:04, 13 April 2024 (UTC)[reply]
@Theknightwho Hm, okay, that's fine, I guess. I believe it's not optimal because the romanization would still need to be manually written out, whereas we usually automate it with the kana transcription. But no, that might not be worth fussing over given that is a very rare thing. I've not actually seen any uses of either of these till these discussions, so I won't worry about it any more. Thanks for the explanation. Kiril kovachev (talkcontribs) 15:13, 13 April 2024 (UTC)[reply]
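
The trade-off discussed above (yi/wu as the default reading, ī/ū for emphatic lengthening, and a manual override like rom=) can be sketched roughly as follows. This is only an illustration in Python, not the actual romanization module: the function name, the `lengthening` flag and the `override` argument are all hypothetical, and the tables cover only the four spellings raised in this thread.

```python
# Hypothetical sketch: names and the "lengthening" flag are assumptions,
# not the behaviour of any existing Wiktionary module.

# Default reading: small vowel kana marks a glide (yi, wu).
GLIDE_READING = {"いぃ": "yi", "イィ": "yi", "うぅ": "wu", "ウゥ": "wu"}

# Alternative reading: small vowel kana marks emphatic lengthening (ī, ū).
LONG_READING = {"いぃ": "ī", "イィ": "ī", "うぅ": "ū", "ウゥ": "ū"}


def romanize_digraph(kana, lengthening=False, override=None):
    """Romanize one of the four digraphs discussed in this thread.

    override    -- a manually supplied romanization (like rom=); wins if given
    lengthening -- hypothetical flag selecting the ī/ū reading
    """
    if override is not None:
        return override
    table = LONG_READING if lengthening else GLIDE_READING
    return table[kana]


if __name__ == "__main__":
    print(romanize_digraph("ウゥ"))                    # default glide reading
    print(romanize_digraph("うぅ", lengthening=True))  # flagged lengthening
    print(romanize_digraph("いぃ", override="ī"))      # manual override
```

With a flag of this kind, a usage example containing emphatic lengthening would not need its whole romanization written out by hand, which is the concern raised about relying on the override alone.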

Splitting WT:RFM

What do people think of splitting WT:RFM? It's currently at > 1MB in size, which is very large. We split the various RFV and RFD pages at half that size. Yes, we should be better about archiving conversations but there's only so far that gets you. There are two possibilities: Split along the same lines as RFV/RFD (which splits approximately into English, CJK, Italic, and everything else), or just split for now into English/Non-English. I am inclined to do the latter as a first split; we can always split later as needed. One issue is what to do with Templates, Categories and the like that occur in WT:RFM; maybe we should have a three-way split at first: (1) English lemmas (WT:RFME); (2) non-English lemmas (WT:RFMN); (3) non-mainspace pages, including categories, templates, languages and the like (WT:RFMO). Thoughts? Benwing2 (talk) 05:10, 29 March 2024 (UTC)[reply]

Makes sense. — Sgconlaw (talk) 05:16, 29 March 2024 (UTC)[reply]
It really doesn't matter how much you split it. If no one is closing discussions then the new subpages will get arbitrarily large eventually. Ioaxxere (talk) 08:03, 29 March 2024 (UTC)[reply]
In the past, some people suggested moving language mergers, splits, renames, etc to either the BP or a language-specific page (the latter of which is the better idea to avoid just inflating the BP while deflating RFM, heh). (The reason they're on RFM at all is that originally, merging or splitting a language entailed merging its specific template, merging an actual page in the manner RFM usually does.) I've come around to the idea: have a language-discussion page, maybe even Wiktionary:Language treatment/Discussions, and then archive the discussions to manageably-sized archive subpages of its talk page (which would entail moving the current contents of that page and its talk page to such subpages). Moving language discussions off RFM would knock it down from 1,012,358 bytes to 457,067 bytes (removing 555,291‎ bytes).
Ioaxxere is correct that if no-one is archiving, the split pages will just get large again, though. - -sche (discuss) 13:37, 29 March 2024 (UTC)[reply]
@-sche Support moving language change discussions somewhere else. Benwing2 (talk) 20:31, 29 March 2024 (UTC)[reply]
I've wanted to do this for a long time. The hard part is what to call it. Is there some type of place where people go to discuss or make decisions on language issues? I suppose something alliterative is better than nothing: maybe "Lect lounge" or "Lect library". Or the "Lect embassy"/"Lect office"/"Lect bureau"? Or maybe emulating some kind of international organization: "League of lects/languages"? "Language court"? "Language academy"? Or something random like the vaguely Lovecraftian "Glossonomicon"?
Another option would be to separate entries/terms from everything else, so that templates, categories, appendices, languages, etc. would go to "RFMO". Chuck Entz (talk) 22:44, 29 March 2024 (UTC)[reply]
Lect Lounge! Lect Lounge! (Not that I’m going to be spending much time there, unfortunately.) — Sgconlaw (talk) 22:52, 29 March 2024 (UTC)[reply]
I admire the creativity, but I think we should stick with something that clearly communicates the purpose of the page, which will not be a general "lounge" to discuss anything lect-related (compare WT:Etymology scriptorium), but will specifically host proposals to change the way Wiktionary divides up the world's languages. I'd prefer -sche's idea of using Wiktionary:Language treatment/Discussions, even if it is terribly anodyne and sits outside the existing "requests for..." structure. Another possibility would be to split RFM into WT:Requests for moves, mergers and splits/Entries and WT:Requests for moves, mergers and splits/Languages, although that leaves no scope for the relatively rare requests to merge templates, appendices etc. This, that and the other (talk) 09:56, 30 March 2024 (UTC)[reply]

Hard Lithuanian Dotting

Should dots be preserved for Lithuanian when the accent is marked on 'i'? I noticed that we have a head word pìrmas when I would have expected pi̇̀rmas. The preservation is done by inserting U+0307 COMBINING DOT ABOVE between the ASCII letter and the accent. (Some fonts barely show the difference.) WT:About Lithuanian is silent about the spelling of Lithuanian head words.

There seems to be a problem with stripping these combining dots as part of diacritic stripping for links. I would assume that that's minor rather than a show stopper. --RichardW57 (talk) 15:28, 29 March 2024 (UTC)[reply]

@RichardW57 Why would we need to insert U+COMBINING DOT ABOVE in the Lithuanian entry names? This isn't done anywhere else and the result looks bad in my browser. Something like pìrmas is completely unambiguous since Lithuanian doesn't have dotless i's in its repertoire. Benwing2 (talk) 19:25, 29 March 2024 (UTC)[reply]
It's always been the case that when you add a diacritic on top of i, it replaces the dot. I'd be surprised to see a font that didn't do this. Entry name replacements have no effect on displayed text. — Eru·tuon 21:58, 29 March 2024 (UTC)[reply]
@Benwing2, Erutuon: Historically, the suppression of the dot above in 'i' doesn't always happen, and in Vietnam and beside the Baltic there is some attachment to keeping the dot above when there is a diacritic above the letter. Unicode has decreed (https://www.unicode.org/versions/Unicode15.0.0/ch07.pdf pp293-4, section entitled 'Diacritics on i and j') that to keep that dot, it must be separately encoded as 'overdot', i.e. U+0307 COMBINING DOT ABOVE. Indeed, the Unicode Character Database (UCD) declares that in a Lithuanian locale, lowercasing LATIN CAPITAL LETTER I WITH GRAVE introduces U+0307; the presumption is that in Lithuanian contexts, the dot remains even when there is a diacritic above the 'i'. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
Now, as these diacritics aren't used in the normal writing of Lithuanian, examples of behaviour are hard to come by. However, I have found some quotations with these letters at https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=aa4570669a29839471f1a220ad2649a4bae0f5c5, an article by Vladas Tumasonis. The letters for showing accentuation are given in Figures1-3 therein, and perhaps more usefully, there are short quotations from a dictionary, a missal and a grammar on pp19-20. --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
Malfunctions in entry name replacement can change what should be blue text into red text, as I discovered to my surprise in the opening post in this section. Imagine what happens to links to Latin terms if macrons on Latin terms aren't stripped. (Macrons distinguish entry names for some languages.) --RichardW57 (talk) 00:30, 30 March 2024 (UTC)[reply]
@RichardW57 IMO we definitely do not want U+0307 in the actual titles of entries. If this needs to be done it can be done automatically but I would be opposed to that for Lithuanian. Benwing2 (talk) 00:41, 30 March 2024 (UTC)[reply]
@Benwing2: By 'title', do you mean page name or head word? --RichardW57 (talk) 00:53, 30 March 2024 (UTC)[reply]
@RichardW57 In the page name. It can be added automatically to headwords but as I said, I'd be opposed to that. As User:Erutuon said, most fonts automatically remove the dot when an accent is added to i, and IMO that is correct and perfectly fine for Lithuanian. Benwing2 (talk) 00:58, 30 March 2024 (UTC)[reply]
@Benwing2: But seemingly not to the editors of the dictionary, missal and grammar indirectly cited above! The intervening U+0307 shall not appear in page names, as it should be stripped out along with the acutes, graves and tildes above it. (Note that 'ė́ ' will be reduced to the single code point for 'ė'.) --RichardW57 (talk) 01:18, 30 March 2024 (UTC)[reply]
If Lithuanian editors actually want dots on all their is and js in headwords, it could be done. I know almost nothing about Lithuanian or these dictionary editors. Alternatively, Lithuanians could lobby font makers to program different glyphs when the language of the text is Lithuanian. — Eru·tuon 01:52, 30 March 2024 (UTC)[reply]
After reading the linked article on "Encoding of Lithuanian Accented Letters", my opinion is that retaining the dot in Lithuanian entries would be preferable as a matter of typographical style. I'm less sure however whether the best way to implement that on the technical level is to include U+0307 COMBINING DOT ABOVE (as a practical matter, pi̇̀rmas looks wrong in the font used by my browser: it shows up with an extra dot in addition to the tittle. pı̇̀rmas looks better but doesn't have correct centering, although it shares that problem with some other oblique letters with combining diacritics such as ī̆. When not oblique, pı̇̀rmas looks OK, although it's apparently not officially the right way to do this). Tumasonis seems to present this as a proposed encoding convention; is there evidence that it has actually been adopted by anyone in practice?--Urszag (talk) 01:59, 30 March 2024 (UTC)[reply]
@Urszag This seems to be one person's opinion, and the article is quite old (it's not dated but the last citation is from 1998). Do we have any evidence that standard Lithuanian practice actually retains dots over the i along with accents? If so, how is this implemented? BTW, by now, if the use of U+0307 were standard, certainly the fonts would have been fixed to display this correctly. The fact that it shows up wrong is a strong indication that this is not standard. Benwing2 (talk) 02:05, 30 March 2024 (UTC)[reply]
As RichardW57 said, this typographical form is also mentioned in The Unicode Standard Version 15.0 – Core Specification (published September 2022): "To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2)" (page 293). So I guess it is officially the correct way to encode this after all and not just a proposal. But I don't know whether any digital fonts have bothered with it.--Urszag (talk) 02:10, 30 March 2024 (UTC)[reply]
@Benwing2 it may be of interest that the main Lithuanian academic dictionary retains the dot when applying accents to lowercase i: http://www.lkz.lt/?zodis=pirmas&id=22167950000. The encoding appears to be the "i" character plus a combining grave accent. This, that and the other (talk) 10:01, 30 March 2024 (UTC)[reply]
@Benwing2: What about the readability of Lithuanian pelė̃ (mouse)? The accentuation mark (a tilde) overstrikes the dot above the 'ė' in the font I'm getting for this page. The preview font actually moves the second mark to the right, overhanging subsequent letters. On the other hand, Tahoma handles both words correctly. Liberation Sans makes an effort, but squashes the marks above, while Liberation Serif handles 'pi̇̀rmas' by stacking but handles 'pelė̃ ' by writing the accents side by side. Side by side is not unknown for dot above and acute together on 'e' - call it 'accent kerning'. We see it in kūrė́jau 'Lord' on the first line quoted from the missal - Tumasonis p20. All three fonts, Tahoma, Liberation Serif and Liberation Sans stack the dot and acute on 'e'.
In short, there are fonts that support dot above and mark on top. --RichardW57 (talk) 23:14, 30 March 2024 (UTC)[reply]
@RichardW57 On my Mac Book Pro under Chrome, pelė̃ written in upright (non-italic) font looks correct but pelė̃ in italic font has the tilde overhanging the space between the l and ė. Benwing2 (talk) 23:26, 30 March 2024 (UTC)[reply]
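
For anyone who wants to experiment with the encoding discussed in this section, here is a rough Python sketch of the two operations mentioned: inserting U+0307 between i/j and a following accent for display, and stripping both the accents and that inserted dot to get a page name while keeping the dot that belongs to ė. The function names are made up for illustration, the accent set (grave, acute, tilde) is taken from the comments above, and none of this is the existing entry-name code.

```python
import unicodedata

DOT_ABOVE = "\u0307"                      # COMBINING DOT ABOVE
ACCENTS = {"\u0300", "\u0301", "\u0303"}  # combining grave, acute, tilde


def add_hard_dot(text):
    """Insert U+0307 after i/j when a mark above (class 230) follows."""
    s = unicodedata.normalize("NFD", text)
    out, i, n = [], 0, len(s)
    while i < n:
        ch = s[i]
        out.append(ch)
        i += 1
        if ch in "ij":
            # Keep attached/below marks (e.g. ogonek) next to the base letter.
            while i < n and unicodedata.combining(s[i]) not in (0, 230):
                out.append(s[i])
                i += 1
            if i < n and unicodedata.combining(s[i]) == 230 and s[i] != DOT_ABOVE:
                out.append(DOT_ABOVE)
    return unicodedata.normalize("NFC", "".join(out))


def page_name(headword):
    """Strip accent marks, and U+0307 on i/j only, from a headword."""
    base = ""
    out = []
    for ch in unicodedata.normalize("NFD", headword):
        if unicodedata.combining(ch) == 0:
            base = ch                     # remember the current base letter
        if ch in ACCENTS:
            continue                      # accentuation marks never reach the title
        if ch == DOT_ABOVE and base in ("i", "j"):
            continue                      # hard dot is display-only
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    dotted = add_hard_dot("pìrmas")
    print([hex(ord(c)) for c in dotted[:3]])
    print(page_name(dotted), page_name("pelė̃"))
```

The point of checking the base letter in `page_name` is the case raised above: the dot on ė is part of the letter and has to survive, whereas a dot inserted on i or j is purely typographic.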

Lydian letters

Would it be possible to update Wiktionary's rendering of the Lydian script to reflect the new values of the letters 𐤮 (formerly ś, now s) and 𐤳 (formerly s, now š)? Antiquistik (talk) 08:41, 30 March 2024 (UTC)[reply]

@Antiquistik Yes although I'd like to hear from someone else who has some knowledge of Lydian (which isn't me). Benwing2 (talk) 20:18, 30 March 2024 (UTC)[reply]
Can you provide any more information, e.g. who changed the values? Links to information about why the new values are what they are? - -sche (discuss) 04:12, 6 April 2024 (UTC)[reply]
@Benwing2, -sche Per Diether Schurr (1999), Lydisches I: zur Doppelinschrift von Pergamon, the values of the letters 𐤮 (formerly ś) and 𐤳 (formerly s) were erroneously assigned. Lydian transcriptions of non-Lydian names show that 𐤮 is always used for the sound /s/ (that is, s) while 𐤳 is always used for the sound /ʃ/ (that is, š).
It also appears that the value of the letter 𐤡 (previously /b/) needs to be updated, since Schurr concludes that it instead represents /p/.
Schurr (1997), Lydisches IV: Zur Grammatik der Inschrift Nr. 22 (Sardes) also reassigned to the letter 𐤥 (previously "v") the new value of "w" to avoid confusion with the letter 𐤸 (transcribed using the Greek letter "ν").
These new values are now used as the standard Lydian transliteration. So it would be preferable to use them. Antiquistik (talk) 16:12, 7 April 2024 (UTC)[reply]
Thanks. OK, I can find both systems in use when I quickly search Google Books for some common words in one vs the other system (like laqrisa / laqriša), but the "new" system does seem clearer, if it also involves changing 𐤸 from ν (Greek nu, which seems like a horribly confusing choice) to "ñ" to match Lycian. Pinging @Vorziblix if you have any thoughts as the only active user to have edited Module:Lydi-translit. - -sche (discuss) 17:05, 7 April 2024 (UTC)[reply]
@-sche On a purely personal level, I would support the change from ν to ñ. However I cannot find the relevant literature advocating for any value change, all I can find for now is the use of different values without explanation. So I will leave whether or not to implement this specific change to be discussed here. Antiquistik (talk) 10:28, 8 April 2024 (UTC)[reply]
@-sche: It’s been too long since I looked at the relevant literature; at present I have no personal opinions one way or the other. — Vorziblix (talk · contribs) 19:34, 10 April 2024 (UTC)[reply]
@Vorziblix Would you object to updating the values of these Lydian letters on Wiktionary? Or can we go ahead with the changes? Antiquistik (talk) 06:49, 15 April 2024 (UTC)[reply]
@Antiquistik: No objections here! — Vorziblix (talk · contribs) 13:32, 16 April 2024 (UTC)[reply]
OK, let's think which letters need to be changed. Modern works that use s for 𐤮, š for 𐤳, w for 𐤥, p for 𐤡 (I can again find both systems in use even in recent works, searching for e.g. pira vs bira "house"), and ñ for 𐤸, do they also update any of the other letters? (We appear to already be using w for 𐤥.) Then we can change everything that needs to be changed in the module at once. (Does anyone update 𐤴 to anything different to avoid confusion between τ and t?) - -sche (discuss) 16:11, 15 April 2024 (UTC)[reply]
@-sche The letters 𐤮, 𐤳, 𐤥, and 𐤡 are the only ones requiring updates. There is, for now, no proposal to change the value of 𐤴 or of other letters by linguists covering the Lydian language. Antiquistik (talk) 05:53, 16 April 2024 (UTC)[reply]
@Antiquistik Can we please change the value of 𐤸 to ñ? Per User:-sche, there is support in recent works for this. Use of a mixture of Greek letters and Latin letters is bad enough, but it's IMO intolerable when the Greek letters look like unrelated Latin letters. Benwing2 (talk) 06:26, 16 April 2024 (UTC)[reply]
@Benwing2 Yes, that would be good too. Antiquistik (talk) 06:31, 16 April 2024 (UTC)[reply]
Full disclosure, 1) Despite 𐤸 reportedly occurring in several words that seem like they should be common, including certain case forms of demonstratives and pronominal clitics, I actually haven't managed to find very many works that contain words spelled with 𐤸 (Wiktionary doesn't have any, either, AFAICT), in order to determine how they transliterate it (I tried searching for such words with ν, with ñ, with n, with v ... couldn't find many works mentioning the words at all, in any spelling I could think to search for). 2) In the few works I was able to find, there were somewhat more using Greek nu, but yes, ñ is also found, and has the advantages of much greater clarity, plus agreement with Lycian where a similar letter is transliterated ñ.
OK, I guess I will change the module in a few days if no one brings any other issues forward... - -sche (discuss) 23:43, 16 April 2024 (UTC)[reply]
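
To summarise the changes agreed in this thread in one place, here is a small Python sketch of the old and new values for the five letters discussed. It is not the contents of Module:Lydi-translit (which is written in Lua and covers the whole script); the table and function names are made up for illustration, and only the letters mentioned above are included.

```python
# Older values as described in this thread (illustrative; other letters omitted).
OLD_VALUES = {
    "\U0001092E": "ś",  # 𐤮
    "\U00010933": "s",  # 𐤳
    "\U00010925": "v",  # 𐤥
    "\U00010921": "b",  # 𐤡
    "\U00010938": "ν",  # 𐤸 (Greek nu)
}

# Values proposed in this thread.
NEW_VALUES = {
    "\U0001092E": "s",  # 𐤮
    "\U00010933": "š",  # 𐤳
    "\U00010925": "w",  # 𐤥
    "\U00010921": "p",  # 𐤡
    "\U00010938": "ñ",  # 𐤸
}


def transliterate(text, table):
    """Replace each character found in the table; leave everything else as-is."""
    return "".join(table.get(ch, ch) for ch in text)


def changed_letters(old, new):
    """List (letter, old value, new value) for every letter whose value differs."""
    return [(ch, old[ch], new[ch]) for ch in old if old[ch] != new[ch]]


if __name__ == "__main__":
    for letter, before, after in changed_letters(OLD_VALUES, NEW_VALUES):
        print(f"U+{ord(letter):05X}: {before} -> {after}")
```

Keeping the old table around next to the new one makes it easy to list exactly which transliterations in existing entries would change when the module is updated.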

Orthographic borrowing

If I say I'm going to Москва next year, using Cyrillic in the middle of the sentence (as some people do), and especially if I approximate the Russian pronunciation rather than sub in a different pronunciation like "Moscow", is that an "orthographic borrowing" of the Russian word, or am I just (basically) quoting the Russian word (if not code-switching)?
My understanding has been that "orthographic borrowing" is a phenomenon that happens almost exclusively in Asian languages, where a language like Japanese borrows only the written shape of a Chinese character but not an approximation of the Chinese pronunciation: that's what makes the borrowing an only "orthographic" borrowing and not just a regular "borrowing".
But I see that e.g. い-adjective#English is currently given as an "orthographic borrowing", although it's kind of the opposite of how we define "orthographic borrowing" on T:obor and in Appendix:Glossary, and of how e.g. Korean 葉書 is an orthographic borrowing: like with Москва, い-adjective is borrowing/quoting both the pronunciation and the spelling, a straight-up borrowing. This isn't the first time I've seen someone use "orthographic borrowing" in a way that seems incorrect (compare bauxite). Am I right about the scope of {{obor}} here? (Is there anything we could do to make the scope of {{obor}} clearer / discourage misuse?) - -sche (discuss) 04:01, 31 March 2024 (UTC)[reply]

link to another relevant discussion: Wiktionary:Information desk/2022/July#Is_this_Orthographic_Borrowing? - -sche (discuss) 05:19, 1 April 2024 (UTC)[reply]
@-sche Thanks. If we systemically restrict orthographic borrowings to logographic languages, we can implement that in the code, but if it's just a suggestion and we allow things like English CCCP to be considered orthographic borrowings, I think the best we can do is display an "are you sure?" type of warning. Benwing2 (talk) 05:25, 1 April 2024 (UTC)[reply]
I suppose you could argue that the standard Finnish pronunciation of sioux, which sounds like /sjouks/, is an orthographic borrowing because the pronunciation is entirely based on the spelling and not the source language's pronunciation; but this could as well just be called a rather striking example of a spelling pronunciation. It depends on how we define "orthographic borrowing" and whether it's restricted to writing systems based on logograms (in which case orthographic borrowings can't occur in the Latin alphabet but could occur for example in cuneiform). Benwing2 (talk) 05:16, 31 March 2024 (UTC)[reply]
But yes, the case of い-adjective#English is definitely not an orthographic borrowing. Benwing2 (talk) 05:17, 31 March 2024 (UTC)[reply]
@-sche how about this scheme for your example? Ioaxxere (talk) 05:39, 31 March 2024 (UTC)[reply]
@Ioaxxere I don't understand your table. In the case of @-sche's example, I would say that an example like I'm going to Москва next year is just code-switching. It is similar to people who say "I just got back from Nicaragua" and pronounce the word "Nicaragua" exactly as it would be pronounced in Spanish; essentially they are inserting a Spanish word in the middle of an English sentence (never mind that it's spelled the same in both languages). Benwing2 (talk) 05:47, 31 March 2024 (UTC)[reply]
It seems like you're agreeing with me. Code-switching is when you briefly switch into another language. So if you say "I'm going to Москва next year", that's an English sentence with a Russian word. Since the word is intended to be as Russian as possible, we can't possibly call it a "borrowing" of any kind.
Also, to address the question directly, い-adjective is essentially equivalent to something like γ-ray, which I would call an unadapted borrowing compounded with an English term. Ioaxxere (talk) 05:54, 31 March 2024 (UTC)[reply]
@Ioaxxere: We currently have only two entries in Category:English orthographic borrowings from Russian: CCCP and cyka. I assume these are correctly categorized and hence *MOCKBA would also be an orthographic borrowing? J3133 (talk) 05:53, 31 March 2024 (UTC)[reply]
@J3133: Yes, I agree with that categorization since those terms are spelled according to Russian conventions, or their nearest ASCII equivalents, but have been adapted into English (from a quick search I see people pluralizing "cyka" as "cykas"). Ioaxxere (talk) 06:01, 31 March 2024 (UTC)[reply]
Benwing found the words for something I was struggling to (thank you), "spelling pronunciations"—and that bauxite-type entries are better viewed as "spelling pronunciations". (And "spelling pronunciation" isn't "orthographic borrowing", or else where's "CAT:English orthographic inheritances from Middle English" for when the pronunciation but not spelling changed on inherited words like one?) That's a good explanation for why intra-Latin-script borrowings are better viewed as "spelling pronunciations". Can we move kaolin out of Category:English orthographic borrowings from French on that basis?
And I appreciate Ioaxxere's comparison of い-adjective-type entries to γ-ray, though I think that even moreso than "γ-ray", "い-adjective" is just 'quoting' the other language/script, a la code-switching but in such a way that it's OK to have an English entry (whereas we don't have Москва#English).
I am inclined to agree "CCCP" is an orthographic borrowing, but I'm not 100% sure, because while on one hand it looks identical to the Cyrillic script form, it is still changing from Cyrillic to Latin script, and well, where is the line past which approximating one script via similar characters in another is no longer "orthographic borrowing"? E.g. if it were attested, would POCCNR for Россия (Rossija) be "orthographic borrowing", although it changes the direction of some letters? What if someone ASCIIizes "ㄸ-initial" [words in Korean] as "cc-initial"? Surely there is some point beyond which an ersatz representation is definitely not orthographic borrowing(?), so does it make more sense for the line to be "when the script changes" (so Latin-script CCCP is not an orthographic borrowing), or "when the written form is not identical" (so Latin-script CCCP counts but not POCCNR)? I'm unsure. (Maybe "orthographic borrowing" should even be restricted to logographic scripts, as Benwing mentions.) - -sche (discuss) 00:06, 1 April 2024 (UTC)[reply]
@-sche There is in fact an underused template {{spelling pronunciation}}, which I have used on kaolin. Maybe this template should be augmented to allow for "spelling-pronunciation borrowings" or some such; currently it only takes a single param (the current lang), and categorizes into e.g. Category:English spelling pronunciations. Benwing2 (talk) 05:00, 1 April 2024 (UTC)[reply]
Spelling | Pronounced /ˈmɑskaʊ/ | Pronounced /mɐskˈva/ (or its closest English equivalent)
<Moscow> | Borrowing | Why would you do this?
<Moskva> | Why would you do this? | Unadapted borrowing
<Москва> | Orthographic borrowing | You're just speaking Russian

Well let’s see what I thought out two months ago: orthographic borrowings are a transscriptural concept. A method of writing a language must be transferred upon the manner in which another language is written. This is not the case with い-adjective#English because the term embeds content in another language, referring to the content of another language, i.e. it does not loan anything at all from Japanese. You might say that code-switching even exists inside of lexical units, and that when one is a polyglot, that is, has at least started a lexicon of Japanese in one’s mind, then the lexicon of each individual language is permeable enough to transclude (fetch as an external resource) lexical units from another language also documented within the mind.
I'm going to Москва next year has no orthographic borrowing because the category of orthographic borrowing is a category of the dictionary sphere, not of sentence analysis, in other words because I'm going to Москва next year is not parsed by us as a lexeme which we could enter somewhere as such, claiming it to contain etymological relations.
As for Ioaxxere’s table, per my previously found dogmatics, that which bro calls “orthographic borrowing” is a heterogram, and “you’re just speaking Russian” is even more an “unadapted borrowing”, for I cannot admit that the language or code-switching nature changes depending on the spelling, also the constellation is empirically rare in so far as relevant for possible dictionary entries. Fay Freak (talk) 06:39, 31 March 2024 (UTC)[reply]

Feedback on proposed label designs

In this example, German Term was possibly borrowed from English term, which was derived from a combination of Latin anc2, Latin anc4, and Latin anc6, which each have a further uncertain inherited ancestor.

Proto-Italic *anc1 → (?) → Latin anc2 → (der.) → English term
Proto-Italic *anc3 → (?) → Latin anc4 → (der.) → English term
English term → (bor.?) → German Term

My main question is: is the text too hard to read on small screens? The font size is 12 pixels, equivalent to putting text in a <small> tag.

I'm very interested if anyone has feedback and suggestions for improving this design. Pinging @Victar, Vininn126, who highlighted the need for these kinds of labels, and @Lunabunn, who inspired the current design. Ioaxxere (talk) 04:51, 31 March 2024 (UTC)[reply]

Nice work, I like it! While I personally don't care much for the highlighted backgrounds, I can see their value in making the labels stand out more. As for the font size, I think we have found a good balance. Lunabunn (talk) 04:55, 31 March 2024 (UTC)[reply]
As a contrasting opinion, I really rate the backgrounds. They look really nice on dark theme. Kiril kovachev (talkcontribs) 02:24, 2 April 2024 (UTC)[reply]
Definitely an improvement. Would tooltips be possible? Vininn126 (talk) 07:25, 31 March 2024 (UTC)[reply]

@Lunabunn, Vininn126: I made some adjustments and added a link to the glossary for the benefit of mobile users. Ioaxxere (talk) 20:04, 31 March 2024 (UTC)[reply]

A proposal for how future big template changes should be done.

Big template changes happen too quickly for some people. While some people (mostly those that keep up with/make the latest Wikt programming news) know what the latest changes to templates are, others have no clue until the big red "deprecated" banner or the Lua error text shows up. In addition, sometimes the big changes leave errors in unforeseen ways that didn't exist before.

As such, I propose a new system for big template changes, to help users get used to the templates and programmers have fewer bugs. It goes something like this:

0. Make a big change to a template (which includes actions such as deprecating any template or changing the logic of a highly-used template).
1. Let your changed version and the unchanged one exist side-by-side, but with the changed one being encouraged and receiving updates.
2. See if there are any bugs, and fix them.
3. After some time (I'd say around a month for big fixes, and maybe a little more for people to get used to the changes), remove the unchanged template.

What do you think? CitationsFreak (talk) 07:26, 31 March 2024 (UTC)[reply]

@CitationsFreak What sorts of changes prompted this? Also keep in mind that logic changes to templates often cannot easily be done in the way you suggest. Benwing2 (talk) 08:03, 31 March 2024 (UTC)[reply]
My big concern with this proposal is that it'll result in templates changing names periodically, which will annoy basically everyone. How would this work with a template like {{l}}, for instance?Theknightwho (talk) 18:03, 31 March 2024 (UTC)[reply]
@Benwing2 The threads "deprecate Template:1" and "Accelerated English plurals generate the wrong template".
@Theknightwho I was thinking of renaming the unchanged template something like {{[template-name]-old}}, so that the new template can still be used as much as possible, but we have a backup in case something goes wrong. CitationsFreak (talk) 18:40, 31 March 2024 (UTC)[reply]
@CitationsFreak Would every page need to be bot converted to the old version and then changed over, or are you envisioning that people could choose which one to use? The former seems impractical, and the latter feels like a recipe for confusion, since we'll end up with a mish-mash of versions and we'll need to ensure every other relevant module is able to support both (which could get very messy). Theknightwho (talk) 18:49, 31 March 2024 (UTC)[reply]
@Theknightwho The latter, but only for some time, and with a later bot job to replace the old template with the new. The other templates should only recognize the old template if they did before, and clearly mark where the old template is being recognized in-code so that it can be deleted when its time comes. CitationsFreak (talk) 19:08, 31 March 2024 (UTC)[reply]
Should be uncontroversial. DCDuring (talk) 15:52, 31 March 2024 (UTC)[reply]
A month is not much time. Commercial APIs often give six months or a year. (Backwards compatibility is even better. You can still use Stripe's 2014 APIs if your HTTP header requests it.) Equinox 15:23, 1 April 2024 (UTC)[reply]
Does it really take a year for people to get used to new templates on Wikt? CitationsFreak (talk) 17:48, 1 April 2024 (UTC)[reply]
Also, we're not a commercial operation, development is difficult enough as it is, and the barrier to getting involved in module development is already high due to the learning curve. Theknightwho (talk) 17:56, 1 April 2024 (UTC)[reply]
That's exactly right. I don't think any of this is needed; I make a lot of changes and few of them have caused any issue. Best to handle any issues on a case by case basis. We only have a few "developers" working on their own time; we can't afford to put more barriers up. Benwing2 (talk) 19:43, 1 April 2024 (UTC)[reply]
I feel that the issue of fewer developers is easily solved, by adding more developers from this wiki and from outside it. (Also, is there a guidebook for developers? Wouldn't hurt to have one.) CitationsFreak (talk) 23:11, 1 April 2024 (UTC)[reply]
I doubt that we need a year, but some kind of statement of, 1., what the overall plans are and, 2., what particular code changes, monitoring categories, filters, etc. are involved should not be too much to ask. There have been many technical changes over the years that have not generated as much annoyance as some recent ones. It is definitely more fun to not ever answer to anyone. Having pesky users objecting to changes will definitely slow down the pace of change. But, will it slow down the pace of desirable change? And who gets to say what change is desirable, perps or victims? DCDuring (talk) 22:37, 1 April 2024 (UTC)[reply]
@DCDuring It would be much easier to work with you if you stopped acting like such a drama-queen. I have tried to have these conversations with you many times, but we never seem to get anywhere because things are rarely good enough for you, and the objections often boil down to your personal needs at the expense of everything else. Calling yourself a "victim" is seriously unhelpful. Theknightwho (talk) 23:22, 1 April 2024 (UTC)[reply]
  1. User:Theknightwho I apologize for working in a realm that is different in kind from a language and therefore has needs that do not fit into the normal framework. DCDuring (talk) 15:58, 2 April 2024 (UTC)[reply]
    @DCDuring And I'm working from a framework that includes more people than just you. Thanks. Theknightwho (talk) 18:01, 3 April 2024 (UTC)[reply]
    I'm sorry that my needs are different. DCDuring (talk) 18:56, 3 April 2024 (UTC)[reply]
@Theknightwho, DCDuring: I see it as an attempt to train developers about user perceptions. For example, @JeffDoozan's parameter checking module Module:checkparams is in principle a good idea, but dealing with its warnings for invocations of declension templates is a pain for whoever deals with them and seeing them is distinctly off-putting for anyone making other changes to the pages. One problem is a widespread lack of documentation of the templates, and the other is that some positional parameters have become obsolete as better ways of doing things became available or more widely known. And this is despite Jeff making efforts to reduce or eliminate its impact (albeit sometimes with the aid of other editors) when its warnings are misguided. --RichardW57m (talk) 09:37, 2 April 2024 (UTC)[reply]
They fix the bugs already, I see no issue. I don’t even keep up with programming news, page creation goes smoothly, and I don’t think there is manpower or mental capacity on either developer or editor side to run multiple template and module versions concurrently. BTW I used Arch. Fay Freak (talk) 23:01, 1 April 2024 (UTC)[reply]
I don't see an immediate problem, but by #3's "remove", I suggest redirecting, as someone will possibly want to use the new name or old name in the future without having seen the parallel testing period. No particular response to the rest of the proposal. —Justin (koavf)TCM 23:18, 1 April 2024 (UTC)[reply]
You mean the old and new templates, right? I can't see a situation where a person would want to use the old name of a template on purpose after everyone's gotten used to the change. An old version, sure, but not an old name, (as in {{1}}.) CitationsFreak (talk) 23:23, 1 April 2024 (UTC)[reply]
Correct: the names. And the reason someone would want to use the old name is "I haven't edited Wiktionary in three years, but I remember that this is how you do <var>x</var>". —Justin (koavf)TCM 23:36, 1 April 2024 (UTC)[reply]
This feels like us having to keep templates around forever, even if it is a redirect. I suppose there is nothing wrong with it. However, I do feel like the template name must be removed at some point. I'll mull over it. CitationsFreak (talk) 23:48, 1 April 2024 (UTC)[reply]

Derived terms

According to Appendix:Glossary:

derived terms
A post-POS heading listing terms in the same language that are morphological derivatives. [Italics mine]

According to Template:derived:

This template is used to format the etymology of terms derived from another language. [Italics mine]

This seems confusing. I think one of them should be renamed. Ioaxxere (talk) 19:40, 31 March 2024 (UTC)[reply]

@Ioaxxere
I see what you mean. It happens in etymologies that terms are derived from the same language, but we don't, maybe illogically, use the "derived" template in those cases, even though we do use a relationship of the same name when referring to the given word on the page of the word it's derived from.
Despite that I still don't like a rename. I didn't notice any problem till you mentioned it, so IMO it's not a big deal. Kiril kovachev (talkcontribs) 02:18, 2 April 2024 (UTC)[reply]
@Ioaxxere: I think the difficulty is that there aren't many (or any?) alternative words that can be substituted in place of derived. Thus, the same word has been used in different contexts. Do you have a suggestion as to an alternative word for one of the contexts? — Sgconlaw (talk) 17:48, 15 April 2024 (UTC)[reply]
@Sgconlaw: "Derived" in the second sense could be changed to "descended". Ioaxxere (talk) 17:54, 15 April 2024 (UTC)[reply]
@Ioaxxere: wouldn’t this clash with {{desc}}, although we would be using the word descended/descendant in the same sense? — Sgconlaw (talk) 18:46, 15 April 2024 (UTC)[reply]

Idea: Categorizing illustrated terms

I love illustrated pages on Wiktionary. They can give a visual grip to otherwise bland white pages. And that visual grip can, moreover, convey the culture of the speakers in a way transcending characters and phonetics.

Moreover, a fair bunch of language learners are also visual learners, and illustrations would only help. On top of that, Wiktionary can not only become the most comprehensive online dictionary, but also the most delightfully illustrated one. With time, this will draw more readers, and from there, more contributors, which leads to higher quality pages, which leads to more readers – you get the drill.

For this reason, I have a proposal: Get a bot to parse the lemmas of a given language, and drop them into categories such as "Fooian terms with illustrations". We already do this with "Fooian terms with quotations". Anyone on board?

P.S. This is unrelated to April Fools' Day :) Shoshin000 (talk) 20:38, 31 March 2024 (UTC)[reply]

@Shoshin000 You know what, this sounds good, and it would be even better if we systematically used a template for adding pictures onto entries that could do the categorization for us. Then there'd be no need for periodic bot tasks. The problem right now is that we just use the bare wiki syntax for images (or at least I do, I dunno), so I don't think it's able to be tracked ATM. We could consider changing over to a template for that, even if it's just a thin wrapper over the image syntax? Kiril kovachev (talk・contribs) 02:20, 2 April 2024 (UTC)[reply]
Yeah, and the bot would switch Wiki syntax to the template. Sounds more simple. Shoshin000 (talk) 11:09, 3 April 2024 (UTC)[reply]
How does having the category or the new template help achieve the goal of having more entries with good visuals?
We already have {{rfi|[language]}} which shows that someone thinks that the tagged L2 would benefit from an image. That template populates the language subcategories of Category:Requests for images by language. It is principally used for English (1961) and Translingual (taxonomic) (2249) entries.
As for pages with images there are 17,735 English noun lemma pages with "File:" and 5,410 with "Image:" and 906 English proper noun lemmas with "Image" and 3,447 with "File:". There are 9,058 Translingual taxonomic pages with "File:" and 386 with "Image:". There would be some duplication in the total count of pages, but also many instances of multiple images in L2s and between L2s. There are also some English verb pages with images. Also many of the pages for letters and symbols have images.
Among the problems we have with images is the lack of informational rather than esthetic value. A picture of a tree from a distance may convey information about shape (for a "specimen" tree not growing in a forest). Plants that bear colorful flowers for a week in Spring are often illustrated only by a picture of the flower. More value is contained in an image that illustrates the probable reason for the name, e.g., red-bellied piranha. DCDuring (talk) 17:35, 3 April 2024 (UTC)[reply]
@DCDuring I don't think it would do that, I think it would just make it easier to discover and search pages with images on them. Question: how are you coming up with all these figures? I can't figure it out. Which is why we would benefit from such a category.
I also disagree about the purpose of images. The entry already defines the word, and the etymology would also explain anything about the origin of the word. The most useful thing an image can do is show you the most distinctive form of whatever's being defined. For example, I'll look up 木漏れ日 and immediately see what it's meant to be referring to, perhaps even without needing to see the definition; for words with slightly abstract or culturally-specific definitions, a picture really speaks a thousand words, so I think we should have as many pictures as possible for words where they make sense — they just make it much easier to understand some definitions. Not just for etymological reasons. Kiril kovachev (talkcontribs) 18:17, 7 April 2024 (UTC)[reply]
It is amazing what can be accomplished by using CirrusSearch, especially "insource:" with regular expressions.
Many of our English verbal definitions are laughably incomplete, not to mention the often-ambiguous glosses that pass for definitions in non-English L2s. That justifies more, rather than fewer, illustrations. In the case of vernacular names of organisms, what passes for "red" in definitions of "red-breasted this" or "red that" can vary dramatically. DCDuring (talk) 20:37, 7 April 2024 (UTC)[reply]
I don't know if this should necessarily be subdivided by language — consider a page like Ukraine, where the images are in the English section, but equally relevant to the other language sections, but it'd look bad to repeat them in every language section — but in my view, at least, it'd be fine if someone wants to periodically bot-categorize pages with images into Category:Pages with images or something (compare Category:Pages with broken file links, for cases where an image or audio is linked but doesn't work). - -sche (discuss) 17:11, 7 April 2024 (UTC)[reply]
It's possible to do it automatically, though any images added by templates wouldn't be included. Plus, we'd want to exclude any images which are part of a link template, since they're sometimes used for scripts which haven't been encoded yet. It's probably one of those jobs that sounds simple, but ends up being quite complicated. Theknightwho (talk) 20:52, 7 April 2024 (UTC)[reply]
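As a rough illustration of the automatic approach and the caveats just mentioned, here is a minimal Python sketch that looks for bare image syntax in an entry's wikitext. It is only a sketch under stated assumptions: the function names `strip_templates`, `find_images` and `has_illustration` are hypothetical, only the `File:` and `Image:` prefixes are recognized, and anything inside a template call is ignored wholesale, which excludes images embedded in link templates but also misses images that a template adds.

```python
import re

# Bare image links such as [[File:Example.jpg|thumb|caption]] or [[Image:Example.png]].
IMAGE_LINK = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^\]|]+)", re.IGNORECASE)


def strip_templates(wikitext: str) -> str:
    """Drop everything inside {{...}} calls, tracking nesting depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def find_images(wikitext: str) -> list[str]:
    """Return file names of bare image links that sit outside any template."""
    return [name.strip() for name in IMAGE_LINK.findall(strip_templates(wikitext))]


def has_illustration(wikitext: str) -> bool:
    """True if the wikitext contains at least one bare image link."""
    return bool(find_images(wikitext))


sample = (
    "==English==\n"
    "[[File:Red-bellied piranha.jpg|thumb|A piranha]]\n"
    "{{l|mul|[[File:glyph.svg|20px]]}}\n"
    "# A definition. [[Image: tree.png ]]\n"
)
print(find_images(sample))
```

A bot could use a check like `has_illustration` to decide whether to add a tracking category, but a real job would also need to handle per-language sections and galleries, which this sketch does not attempt.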

French Wiktionary

Gosh, what is happening over there? I visited it only to see this banner at the top:


L’accès à la base de données est désactivé pour une durée indéterminée

Plainte

Je vous écris cette lettre au nom et pour le compte de Monsieur (omis), afin de vous informer que j’ai déposé le 25 mars dernier une plainte officielle auprès de la Préfecture de Police de Versailles contre votre site Internet, car il est coupable d’avoir publié des nouvelles et des faits à caractère injurieux, susceptibles de porter atteinte à l’honneur et à la réputation de mon client.

Comme l’a rappelé la jurisprudence (par exemple, Cour de cassation 03/3956) « … l’utilisation d’un site Internet pour la diffusion d'images ou d'écrits susceptibles d’offenser une personne est une action susceptible de porter atteinte au patrimoine juridique de l’honneur, et constitue donc le délit de diffamation aggravée… ».

Nous regrettons d’avoir dû recourir à l’autorité judiciaire, mais le ton et le contenu de l’article en question ne laissaient pas la moindre place à la négociation.

Je vous prie d’agréer, Madame, Monsieur, l’expression de mes salutations distinguées.

Avocat (omis)

Ordonnance

En vertu de la plainte n° 2154021/24 déposée à l’Hôtel de Police de Versailles le 25/03/2024, il est notifié ce qui suit :

ayant été informé d’une infraction contre fr.wiktionary.org, étant directeur général de l’association Wiktionnaire,

la fermeture préventive du site fr.wiktionary.org est ordonnée car il fait l’objet d'une enquête pour l’infraction visée à l’article 595 du code pénal, avec la circonstance aggravante visée au paragraphe 3 du même article.

La fermeture doit être achevée au plus tard le lundi 1er avril 2024 à 15 heures.

Vous êtes également informé que, conformément aux dispositions du code de procédure pénale, vous avez été désigné comme avocat commis d’office par M. (omis).

Ceci est sans préjudice de votre droit de désigner un conseil de votre choix, en notifiant cette procuration par l’envoi, également par fax à (omis), d’une désignation formelle d’un conseil.

Versailles 31/03/24

Procureur (omis)

Avis aux utilisateurs et utilisatrices du site Wiktionnaire

Nous avons considéré qu’il était préférable d’occulter les noms des personnes et des utilisateurs directement concernés.

Il est impossible de trouver les mots justes pour exprimer le sentiment de perte profonde que cette décision nous laisse. Une seule chose est sûre : le Wiktionnaire ne s’arrête pas là.

L’esprit qui animait le projet jusqu’à hier est profondément blessé, mais la communauté saura trouver de nouveaux espaces pour repartir.

Restez en contact, le Wiktionnaire francophone.


Sgconlaw (talk) 22:41, 31 March 2024 (UTC)[reply]

For those of us who parles un petite rancais (like myself), DuckDuckGo says this means:
Access to the database is disabled for an indefinite period of time
Complaint
I am writing this letter to you in the name and on behalf of Monsieur (omitted), to inform you that on March 25 I filed an official complaint with the Prefecture of Police of Versailles against your website, because it is guilty of having published news and facts of an offensive nature, likely to damage the honor and reputation of my client.
As recalled in case law (e.g. Court of Cassation 03/3956) "... the use of a website for the dissemination of images or writings likely to offend a person is an action likely to harm the legal patrimony of honour, and therefore constitutes the offence of aggravated defamation...".
We regret that we had to resort to the judicial authority, but the tone and content of the article in question did not leave the slightest room for negotiation.
Please accept, Madam, Sir, the expression of my distinguished greetings.
Counsel (omitted)
Order
Pursuant to complaint no. 2154021/24 filed at the Versailles Police Station on 25/03/2024, the following is notified:
having been informed of an offence against fr.wiktionary.org, being the general manager of the Wiktionary association,
The preventive closure of the fr.wiktionary.org site is ordered because it is under investigation for the offence referred to in Article 595 of the Criminal Code, with the aggravating circumstance referred to in paragraph 3 of the same Article.
The closure must be completed no later than Monday, April 1, 2024 at 3 p.m.
You are also informed that, in accordance with the provisions of the Code of Criminal Procedure, you have been appointed as a court-appointed lawyer by M. (omitted).
This is without prejudice to your right to appoint counsel of your choice, by notifying such power of attorney by sending, also by fax to (omitted), a formal appointment of counsel.
Versailles 31/03/24
Prosecutor (omitted)
Notice to users of the Wiktionary site
We felt it was best to redact the names of the people and users directly involved.
It is impossible to find the right words to express the sense of deep loss that this decision leaves us with. Only one thing is certain: Wiktionary doesn't stop there.
The spirit that animated the project until yesterday is deeply wounded, but the community will be able to find new spaces to start again.
Stay in touch, the French-speaking Wiktionary.
Courtesy link: fr:. —Justin (koavf)TCM 23:44, 31 March 2024 (UTC)[reply]
fr:MediaWiki:Sitenotice. —Justin (koavf)TCM 23:46, 31 March 2024 (UTC)[reply]
@Àncilu: —Justin (koavf)TCM 23:51, 31 March 2024 (UTC)[reply]
Wow. I knew that British libel laws are seriously f***ed (i.e. extremely biased in favor of rich plaintiffs) but I didn't realize things are even worse in France. Benwing2 (talk) 23:54, 31 March 2024 (UTC)[reply]
🐟 This is April 1st 🐟 Àncilu (talk) 23:55, 31 March 2024 (UTC)[reply]
@Benwing2 @Koavf @Sgconlaw Àncilu (talk) 23:56, 31 March 2024 (UTC)[reply]
Sacre bleu! Zoot alors! —Justin (koavf)TCM 00:02, 1 April 2024 (UTC)[reply]
fr:Wiktionnaire:P/24. —Justin (koavf)TCM 00:03, 1 April 2024 (UTC)[reply]
@Àncilu: OH! Ha ha ha! Good one! I was mystified as to how content at the Wiktionnaire (as opposed to Wikipedia) could end up defaming someone… — Sgconlaw (talk) 01:39, 1 April 2024 (UTC)[reply]
I don’t see that anything could have happened either, even if it was not an April Fools' joke. @Àncilu added this to fr:MediaWiki:Sitenotice, so suppose he got an e-mail about a supposed proceeding. Whatever that proceeding is in France, a court proceeding or administrative proceeding, it has to be delivered to an actual person representative of French Wiktionary (international delivery, given Àncilu is in Italy (?), is also quite a feat), which doesn’t even have legal capacity, so there can be no pending case, the lawyer would be incompetent. Fay Freak (talk) 00:02, 1 April 2024 (UTC)[reply]
@Fay Freak I live in France. Àncilu (talk) 00:07, 1 April 2024 (UTC)[reply]
@Àncilu: Well, have you heard how they invented separation of powers in France? I doubt that they would bring a defamation case before the Préfecture de Police de Versailles 😵‍💫, because that would be a civil matter. Can we admit this as creativity? Fay Freak (talk) 00:37, 1 April 2024 (UTC)[reply]
@Fay Freak : I wrote it this way to make the April Fool's Day joke less realistic and avoid external problems, because people outside the Wiktionary might misunderstand if it were mentioned even indirectly. Àncilu (talk) 12:07, 1 April 2024 (UTC)[reply]

April 2024

Request lemma

Requesting template rf-lemma or rf-entry = creation of this lemma is wanted (urgently). Why?

Is this OK? Should I stop moving them? I did not find a req-lemma at Category:Request templates (sooo many!) Is there another template for needed new pages? (Notifying programmers: Benwing2, Surjection.) Thank you ‑‑Sarri.greek  I 17:41, 1 April 2024 (UTC)[reply]

@Sarri.greek: I believe {{rfdef|<lang>}} is the template you're looking for. However, there is not much point to just moving the etymology if the new word (ἀκανθόχοιρος in this case) doesn't even have a definition. Yes, we should avoid "repetitions", but if the new word doesn't have a definition, it's not "repetition" to simply not create the word in the first place.
Also, please be careful when moving information, as you moved the "displaced ..." part, which applies to the Greek lemma and not the Ancient Greek lemma. I also could not find the word in Hesychius.
There are also some minor formatting issues in your new page. Please be more careful in the future. --kc_kennylau (talk) 10:13, 2 April 2024 (UTC)[reply]
Thank you M @kc_kennylau -sorry, I did not get a ping from my browser for your {reply}. O! And sorry for my mistake during moving material (I make more mistakes as, alas, I am getting older...) About {wanted} or {requested} lemma. It is especially needed when the 'other' language is in the same page; it happens a lot in Greek. There is no way (orange links and the like do not help) to make an urgent call to create a lemma. Pages like Wiktionary:Requested entries (Ancient Greek) are wishlists. But an urgent need means there is a gap, a need for a lemma. It asks editors who might be interested in creating lemmata in this language to begin first and mainly with the {wanted} calls. Thank you. ‑‑Sarri.greek  I 03:30, 4 April 2024 (UTC)[reply]

Etymology tree testing

As @AG202 and others pointed out, it is very important to test out major changes before implementing them in mainspace. The problem is that I don't know everyone's use cases. Therefore, I invite the community to suggest terms to create an etymology tree for. If the output is undesirable in some way, I can tweak the template to give a better result. Before commenting, please note the following: 1. No test trees will be created in mainspace. 2. The template requires each ancestor to have an entry. If your entry says "from English redlink1, French redlink2, from Latin redlink3", I can't really do much with that. 3. If there's a problem with the output, please give constructive criticism so I can fix it.

To start, here's an example of a hypothetical Swedish term which is borrowed from English father and calqued from Old English fæder. This would be generated by: {{etymon|sv|id=whatever|bor|en>father>male parent|calque|ang>fæder>father|tree=1}}

    Ioaxxere (talk) 19:14, 1 April 2024 (UTC)[reply]
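To make the parameter layout of the example invocation concrete, here is a minimal Python sketch that splits such a call into its parts. It is not how the actual template or its module works: `parse_etymon`, the `Etymon` record and the field name `label` are hypothetical, and reading the third `>`-separated field as an ID or gloss is an assumption based only on the example above. The naive split on `|` would also break on nested templates or piped links.

```python
from dataclasses import dataclass


@dataclass
class Etymon:
    relation: str  # keyword preceding the entry, e.g. "bor" or "calque"
    lang: str      # language code of the source term
    term: str      # the source term itself
    label: str     # third field; assumed here to be an ID or gloss


def parse_etymon(call: str):
    """Split an {{etymon|...}} call into (language, named params, etymons)."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a {{...}} template call")
    name, *args = text[2:-2].split("|")
    if name.strip() != "etymon":
        raise ValueError("not an etymon call")

    named = {}
    positional = []
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(arg.strip())
    if not positional:
        raise ValueError("missing language code")

    lang, rest = positional[0], positional[1:]
    relation = "unspecified"  # used if an entry appears before any keyword
    etymons = []
    for item in rest:
        if ">" in item:
            src_lang, term, *extra = item.split(">")
            etymons.append(Etymon(relation, src_lang, term, extra[0] if extra else ""))
        else:
            relation = item  # a keyword applies to the entries that follow it
    return lang, named, etymons


example = "{{etymon|sv|id=whatever|bor|en>father>male parent|calque|ang>fæder>father|tree=1}}"
lang, named, etymons = parse_etymon(example)
print(lang, named)
for e in etymons:
    print(e)
```

Reading the call this way shows the alternating pattern in the example: a relation keyword, then one or more `language>term>label` entries that the keyword governs.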

    Not sure why it's not working but
    test:

    {{etymon|hi|id=whatever|title=गुल|bor|fa-cls>گُل>flower|tree=1}} (escaped by kc_kennylau (talk) 09:53, 2 April 2024 (UTC))[reply]

    my guess is that it does not support etymology codes? — Sameer مشارکت‌هابحث﴿ 08:08, 2 April 2024 (UTC)[reply]
    I think this is best suited to a subpage in the template. I've moved these two examples to Template:etymon/testcases. I have also escaped the above code because it is generating an error and filling up Cat:E. --kc_kennylau (talk) 09:53, 2 April 2024 (UTC)[reply]
    @Sameer: Yes, it turned out that the template was incorrectly rejecting etymology language codes. Here is how it should look:
      It seems like the Persian character is tall enough to slightly overflow the box. I'm not sure what I can do about this, since the font size is actually being pumped up by the {{m+}} template which is used within the box. Ioaxxere (talk) 14:33, 2 April 2024 (UTC)[reply]
      @Ioaxxere: It seems that height:auto; could solve this problem; see example. --kc_kennylau (talk) 19:10, 2 April 2024 (UTC)[reply]
       Fixed Ioaxxere (talk) 00:00, 3 April 2024 (UTC)[reply]
      While I agree with the reservations expressed by others in the earlier March discussion on this same proposal, I have to say this looks pretty good already. If the infrastructure behind it were robust enough, it could become a pretty neat addition to the project. — Mnemosientje (t · c) 13:47, 2 April 2024 (UTC)[reply]
      100% agree that this is visually interesting, more intelligible, and handy. I totally support this and have only minor cosmetic tweaks to suggest. —Justin (koavf)TCM 16:49, 3 April 2024 (UTC)[reply]

      @Ioaxxere A bit belated, but I like the idea. My only (minor) criticism is to ask why the text inside the boxes is so large. Is there any particular reason for that? Could we reduce it to the same size as the other ordinary text found in the rest of a given entry? (Or is it just that way on my display, for whatever reason? Not sure if it shows up normal-sized for other people.) — Vorziblix (talk · contribs) 19:46, 10 April 2024 (UTC)[reply]

      @Vorziblix: Yes, the font size was (by default) slightly larger than regular text (16px versus 14px). I've gone ahead and made it smaller (although I'm not totally sure whether it's better this way...). Ioaxxere (talk) 20:13, 10 April 2024 (UTC)[reply]
      @Ioaxxere: Thanks! IMO it looks great. — Vorziblix (talk · contribs) 15:11, 11 April 2024 (UTC)[reply]

      This looks pretty neat! I want to suggest using a tabular format (left- or right-alignment, like two columns) instead of center-alignment for the boxes. It is much easier to read and compare similar data when they are aligned with each other, rather than arbitrarily aligned due to varying text width. (I'm not sure how it would best be implemented – maybe a consistent width for language name and etymon – but anyway should be tested on mobile screen sizes.) I also find the small "?" icon not immediately intuitive (despite the tooltip) and not emphatic enough for representing anything that is iffy (i.e. too easy to miss), and would prefer spelling out the word "uncertain" instead, perhaps with a change in box color contrast as well. It would be nice if the collapsible box header said "Etymology tree for <word>" instead of "Etymology tree". Hftf (talk) 22:10, 17 April 2024 (UTC)[reply]

      @Hftf: Thank you for the feedback! 1) Using a tabular format would mean packing everything into a rectangular grid, which I don't think would look good. 2) My ideal solution for uncertainty would be to have a dashed line, but unfortunately this doesn't seem to be technically possible. I'm not sure I like the appearance of uncertain spelled in full — but I'm fine with doing that if other people agree. Another idea would be to change the colour of the box itself, so every uncertain etymon might be pink rather than beige. I don't want to make any major design changes at this point though, since I'm preparing to start a vote on the template in the next few days (see #Etymology tree vote). 3) The idea of the template is to be used on the entry page itself, and of course there's no need to remind the reader what entry they're on. However, if we started adding trees to other pages you would be absolutely right. Ioaxxere (talk) 17:16, 18 April 2024 (UTC)[reply]

T:antsense, to finally clarify T:sense on antonyms

      As has been discussed a number of times, including Wiktionary:Beer_parlour/2016/August#Suggestion_for_sense_tags_on_antonyms (which I stumbled upon while trying to find a different discussion where I suggested the same thing), it perennially confuses lots of people that we write things like "(to start work): clock in, clock on, punch in, go on the clock" ... so I've gone ahead and made the T:antsense template Benwing proposed in the 2016 discussion; you can see it in use now, displaying "(antonym(s) of "to end work"): clock in". Please feel free and encouraged to make the template better, find a better name, whatever... once it's in a state everyone's happy with, maybe we can bot-deploy it to entries and finally end this enduring source of confusion... - -sche (discuss) 02:26, 2 April 2024 (UTC)[reply]

      @-sche Thanks! Benwing2 (talk) 05:14, 2 April 2024 (UTC)[reply]
      I hope that one day we can delete this template because we will have placed all the antonyms under the relevant sense lines! This, that and the other (talk) 22:50, 2 April 2024 (UTC)[reply]
      I am going to do a bot run in a couple of days to switch all occurrences of {{sense}} to {{antsense}} in Antonyms sections. Benwing2 (talk) 01:15, 4 April 2024 (UTC)[reply]
      @-sche: I just now found out about this template, and honestly, it confused me even more, and in the same way you pointed out above ("big" does not have a sense that is translated as "antonym of big"...; and furthermore in some language it might!). In my opinion, the only way to solve this so that it is clear what is meant, is by doing something like suur#Antonyms (bar the new template), where the meaning of the antonym is given within the link template. Thadh (talk) 21:53, 11 April 2024 (UTC)[reply]
      Hmm... so you interpret
      Antonyms
      (antonym(s) of "begin working time"): clock out
      as saying antonym(s) of "begin working time" is one of the definitions of clock in? (or that it is the definition of clock out?) If anyone else interprets it that way, I hope they'll chime in; to me, the parenthetical (s) and the fact that part of the text is set off in quotation marks and quotes the definition further up the page makes it very unlike anything that appears in our definitions of words — I don't think any language has a single word that we would translate as "antonyms, plural, of quote "big" unquote", we would definitely format it differently, without quotation marks, and "antonym" would be exclusively singular — so I would not have expected anyone to interpret it as a definition. But your comment shows we could benefit from being even more explicit! What if we change the wording to (the following word(s) is/are antonym(s) of "begin working time"): clock out? (The beauty of this being a template is that we can change the wording in one place and it will propagate out to all entries.) - -sche (discuss) 22:15, 11 April 2024 (UTC)[reply]
      We don't always use {{sense}} to give the gloss already used in the entry, sometimes we make it a little more specific. If an entry has a meaning "not X" (which happens quite often, English doesn't always have a good equivalent for these), then it would actually make sense for me to say "antonym of X" as a sense. The parenthesised "(s)" indeed does make it less likely, but it's also pretty easily missed.
      It's possible that I interpret it that way because I am used to antonym sections using the sense as in the entry, rather than describing the following term, but on the other hand, so are our readers, right? Thadh (talk) 23:55, 11 April 2024 (UTC)[reply]
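For readers wondering what the bot run mentioned earlier in this thread involves, here is a minimal Python sketch of the text transformation: it rewrites `{{sense|...}}` to `{{antsense|...}}` only on lines that fall under an Antonyms heading. This is an illustration under assumptions, not the bot's actual code: the function name `convert_antonym_senses` is hypothetical, headings are assumed to have the `===Title===` shape, and template aliases or inline antonym templates are not handled.

```python
import re

# A wikitext heading line such as ====Antonyms==== (equal number of = on both sides).
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# The opening of a {{sense|...}} call.
SENSE_OPEN = re.compile(r"\{\{\s*sense\s*\|")


def convert_antonym_senses(wikitext: str) -> str:
    """Replace {{sense| with {{antsense| inside Antonyms sections only."""
    in_antonyms = False
    out = []
    for line in wikitext.split("\n"):
        heading = HEADING.match(line)
        if heading:
            # Every heading starts a new section, so re-evaluate the flag.
            in_antonyms = heading.group(2).lower() == "antonyms"
        elif in_antonyms:
            line = SENSE_OPEN.sub("{{antsense|", line)
        out.append(line)
    return "\n".join(out)


page = "\n".join([
    "====Synonyms====",
    "* {{sense|to end work}} {{l|en|clock off}}",
    "",
    "====Antonyms====",
    "* {{sense|to end work}} {{l|en|clock in}}",
    "",
    "====Translations====",
    "* {{sense|to end work}} placeholder",
])
print(convert_antonym_senses(page))
```

Because the flag is reset at every heading, `{{sense}}` calls in Synonyms or Translations sections are left untouched.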

FYI: April Updates (Unicode)

      https://mailchi.mp/a9bd0287cce4/testing-rickys-template-6362454


      Note that it includes Wikimedia news. —Justin (koavf)TCM 16:47, 3 April 2024 (UTC)[reply]

Automatic cognate generation

      Many of our entries, like king#Etymology_1, include long lists of cognates. Would anyone be interested in a template that could be used to generate these kinds of lists automatically? The template would work by adding terms into a category which would be accessed to get a list of cognates. Ioaxxere (talk) 19:24, 3 April 2024 (UTC)[reply]

      @Ioaxxere: That's really a job that is done by Proto-Germanic *kuningaz. The only benefit of this sort of template would be that if I had a Proto-Tai descendant on my watchlist, I wouldn't get alerted every time someone added the form from yet another Zhuang dialect. --RichardW57 (talk) 20:54, 3 April 2024 (UTC)[reply]
      Like Richard said, I don't think this is really necessary, and I am personally of the opinion that these long lists of cognates do more harm than good, so we should rather remove them than make it easier to generate more of them. Thadh (talk) 20:58, 3 April 2024 (UTC)[reply]
      I second both of the above comments. Nicodene (talk) 06:22, 10 April 2024 (UTC)[reply]

      "terms spelled with"

      (moved from User talk:Benwing2#"terms spelled with") We currently have two categories Category:Hindi terms spelled with ॉ and Category:Translingual terms spelled with ◌ॉ whose names are in conflict.

      In terms of Unicode, the former is:

      • [...] U+0020 'SPACE'
      • U+0949 'DEVANAGARI VOWEL SIGN CANDRA O'

      And the latter is:

      • [...] U+0020 'SPACE'
      • U+25CC 'DOTTED CIRCLE'
      • U+0949 'DEVANAGARI VOWEL SIGN CANDRA O'

      Usually, U+25CC is included in the name of the category, such as in Category:Translingual terms spelled with ◌̺ which in Unicode is:

      • [...] U+0020 'SPACE'
      • U+25CC 'DOTTED CIRCLE'
      • U+033A 'COMBINING INVERTED BRIDGE BELOW'

      Thus the name appears with the combining character U+033A being displayed on the dotted circle for demonstration purposes.

      U+0949 is a combining character as well, and indeed if you try to highlight the character in the first category name, you are forced to select the space as well. However, visually the dotted circle is also rendered, at least on my browser.
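      The code point listings above can be reproduced with a short script. This is only a minimal sketch using Python's standard `unicodedata` module; the variable names are made up for illustration, and only the tail of each category name (the part after "spelled with") is inspected.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME (category)' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"
        for ch in text
    ]

# Tails of the two category names, written with escapes so that the
# code points are unambiguous regardless of how the page renders them.
hindi_tail = " \u0949"                # space + vowel sign
translingual_tail = " \u25CC\u0949"   # space + dotted circle + vowel sign

for label, tail in [("Hindi", hindi_tail), ("Translingual", translingual_tail)]:
    print(label)
    for line in describe(tail):
        print("  " + line)
```

      The general category shown in parentheses starts with "M" for marks (U+0949 is a spacing mark, Mc; U+033A is a nonspacing mark, Mn), which is one way a script could check whether a category name ends in a bare mark or in a dotted circle followed by a mark.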

      Should we unify them? If so, which one should we choose? --kc_kennylau (talk) 00:58, 4 April 2024 (UTC)[reply]

      It's not that simple. Whether dotted circle plus mark renders properly is, or has been, renderer and script dependent. Sometimes the result is two dotted circles! --RichardW57 (talk) 04:42, 4 April 2024 (UTC)[reply]

      Requiring attribution when moving from one Wiktionary page to another

      Er, what do we think about this? [13] Equinox 13:59, 4 April 2024 (UTC)[reply]

      Well I'm a little confused, since @Indigopari reverted @A westman's edit to haemoglobin that copied material from hemoglobin on the basis of giving no attribution, but then they immediately violated that rule less than 15 minutes later when copying material in the other direction. Theknightwho (talk) 14:22, 4 April 2024 (UTC)[reply]
      The edit summary looks sufficient to me. It would have been better, though, to just make a minor edit and say "the previous edit copied from hemoglobin"- although it was obvious enough what they were doing. Chuck Entz (talk) 14:46, 4 April 2024 (UTC)[reply]
      That's correct, though the attribution can be as simple as saying where you got it from in the edit summary. Of course, not every change is distinctive enough to require attribution- we're talking copyvio or plagiarism, not dotting i's and crossing t's. Chuck Entz (talk) 14:25, 4 April 2024 (UTC)[reply]
      Oh woe, what kind of attribution should I have given? A Westman talk stalk 14:38, 4 April 2024 (UTC)[reply]
      @A westman: There's a policy WT:FORMS about handling alt forms and soft redirects. Probably you shouldn't have duplicated the information in two entries, because they may easily get out of sync and it's a maintenance burden. Also was your deletion of the Welsh entry actually intended? --Ssvb (talk) 15:45, 4 April 2024 (UTC)[reply]
      The deletion was unintended. A Westman talk stalk 16:49, 4 April 2024 (UTC)[reply]
      One way to look at it is to treat the collective of Wiktionary editors as a single entity from the copyright standpoint. But if individual editors want to be always personally credited when moving every tiny bit of text from one entry to another, then this looks more like the case of malicious compliance. For example, in this diff I copied the texts of glosses from their English entries. Did I have to explicitly mention these entries in the edit's summary? Did I have to track the actual handles of the editors, who contributed these pieces of text in the first place?
      In principle, the wiki engine could do the identification of copied text fragments automatically, reducing the need for manual labour. Yes, this would be very resource-intensive and probably won't be implemented any time soon. But theoretically this can be done. --Ssvb (talk) 15:14, 4 April 2024 (UTC)[reply]
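      As a rough illustration of the idea of detecting copied fragments automatically, the sketch below compares two texts with Python's standard `difflib` module. It is not anything the wiki software does today; the function name, the length threshold and the sample strings are all assumptions made up for the example.

```python
from difflib import SequenceMatcher

def copied_fragments(source, target, min_len=20):
    """Return substrings of at least min_len characters shared by both texts."""
    # autojunk=False keeps difflib from discarding frequent characters.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        source[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_len
    ]

# Made-up sample texts standing in for two entry revisions.
source_text = "An example definition that was first written in one entry."
target_text = "Note: An example definition that was first written in one entry. More text."

for fragment in copied_fragments(source_text, target_text):
    print(repr(fragment))
```

      Running something like this across every pair of revisions on a wiki is exactly the resource cost mentioned above, so a real implementation would need indexing or hashing rather than pairwise comparison.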
      @Vahagn Petrosyan as someone who expressed interest in attribution in earlier discussions. Thadh (talk) 16:55, 4 April 2024 (UTC)[reply]
      Wikipedia:Copying within Wikipedia may be relevant. Whatever the formal rules may be, mentioning in the edit summary that content is copied from a certain page is the decent thing to do. We spend a lot of time, energy and ability to write a good Wiktionary entry. We deserve some credit. Vahag (talk) 18:17, 4 April 2024 (UTC)[reply]
      Some thought has been given to this issue on Wikipedia, and the result was w:Template:copied. The problem is that each individual's (surviving) contribution has to be acknowledged for as long as they retain copyright. Now, it can be done by implicitly referencing the change history of the source, but that only works while the change history remains accessible. That template attempts to protect this history, but I don't know how well that works. In practical terms, the history usually dies with the page. As {{copied}} on Wiktionary was deleted, I suspect the idea went down like a lead balloon on Wiktionary and the collective decision was to live dangerously.
      The selection of which cognates to display is sufficiently original to be protected by copyright - the question is then whether the 'fair use'-like exemptions from copyright apply. Wikimedia seems only to worry about US law, but many of us might fall foul of local laws. --RichardW57 (talk) 17:59, 7 April 2024 (UTC)[reply]
      Separate from the copyright question: A Westman, why were you moving the content from hemoglobin to haemoglobin in the first place? In the case of US/UK spelling differences, the thing we've been doing to be neutral (not inherently favouring one national variety or the other) is centralizing the content on the older entry, which in this case is hemoglobin (which is also, as an aside, the more common spelling). - -sche (discuss) 16:58, 4 April 2024 (UTC)[reply]
      Because haemoglobin is used everywhere outside the US and matches the Greek and Latin spellings, it's not just a UK spelling. A Westman talk stalk 18:51, 4 April 2024 (UTC)[reply]
      Also just using the older spellings is not "neutral" because most of them are in US English (inherently not neutral). Which makes sense because most Wikimedians are American afaik. A Westman talk stalk 18:53, 4 April 2024 (UTC)[reply]
      @A westman Favoring UK/Commonwealth spellings is no more "neutral" than favoring US spellings. But in fact your AFAIK about most Wikimedians being American isn't even true; they are scattered across the world, and I actually get the sense more current Wiktionarians are British than American. (From what I've seen, there is a fairly random mixture of pages where the main version is hosted using the British spelling vs. the American spelling.) Benwing2 (talk) 19:40, 4 April 2024 (UTC)[reply]
      For the last one from what I've seen there really isn't. And I never said UK spellings are neutral, what I did say is that there is no way to be neutral. US spellings are localized to the US and to an extent Canada and the Philippines but everywhere else CW English prevails. A Westman talk stalk 19:54, 4 April 2024 (UTC)[reply]
      And anyway: https://stats.wikimedia.org/#/en.wiktionary.org/reading/page-views-by-country/normal%7Cmap%7Clast-month%7C(access)~desktop*mobile-app*mobile-web%7Cmonthly A Westman talk stalk 19:58, 4 April 2024 (UTC)[reply]
      And that tells you precisely zero about who edits Wiktionary. Benwing2 (talk) 20:02, 4 April 2024 (UTC)[reply]
      Fair enough but those statistics aren't available. A Westman talk stalk 20:03, 4 April 2024 (UTC)[reply]
      Yeah, those who prefer UK spellings point to the number of countries that theoretically consider those standard (although how many people there actually use them, when English is not the main native language, is another matter), and those who prefer US spellings point to the number of greater uses of US vs UK spellings (greater number of works using them)... sometimes people try to work out how many native English users there are of one or the other... to avoid pages being moved back and forth because you come along in April and think your preference is the rational one, and then someone else comes along in May and thinks their own preference is the rational one, just use the older entry. However, when a spelling is not merely "alternative" but a national "standard", we could be using T:standard spelling of like this—that template postdates a lot of entries, so many don't use it yet. - -sche (discuss) 20:30, 4 April 2024 (UTC)[reply]
      No part of the page reaches the threshold of originality, in the jurisdiction I know. Luckily, the definition was written by the late SemperBlotto in 2005, vouchsafing verisimilitude of lacking creativity. The collective of Wiktionary editors is no natural person and thus cannot be the author of a work, which is the requirement for copyright to arise, nor can there be co-authors if there is neither will nor imaginative power to create a joint work under a joint conception; instead contributions are driven by intrinsic logics of the subject matters and rarely collaborative. Fay Freak (talk) 17:17, 4 April 2024 (UTC)[reply]
      @Fay Freak: The WMF:ToU#7._Licensing_of_Content mentions "the Wikimedia community" in its text, maybe not as the copyright holder, but it's still mentioned. The Terms of Use also explains that the outsiders only have to provide the article URL to comply with the attribution requirements of the license and that's sufficient. Not having this particular clause would render the content of Wikipedia unusable to be republished anywhere under the CC BY-SA license, because some individual Wikipedia editors would start pestering the re-publishers and demand to be given credit personally. Now back to Wiktionary. Reusing and copying parts of text from the corresponding English entries or from the cognates of the same word in other languages is rather common in Wiktionary. I have witnessed this myself many times. Can we possibly agree that mentioning this is generally not necessary in the edit summary? Because such copying either does not reach the threshold of originality or because the original text is still easily available just one wikilink away. I mean, via cognate links in the Etymology section or via links to the English words in the list of the foreign word senses. --Ssvb (talk) 01:09, 5 April 2024 (UTC)[reply]
      Of course, either the threshold of originality is not reached (with list-like content, which is almost everything in Wiktionary, sometimes formulating weighing and illustrating aspects, as said in homoeopathic doses for what Blotto wrote there, who described chemistry like one is five), or anyone interested should ask himself and guess the provenience or attribution, or because authors with the licence agreed to collective attribution, even by implication of participation if not expressis verbis: I mean such a site works with a low entry barrier, we can’t rebase like in git commits. Even if a source got deleted one can just ask about the edit history if explaining a legitimate interest, this can play a role for establishing a copyright violation or at least proceeding, otherwise main character syndrome. Fay Freak (talk) 03:28, 5 April 2024 (UTC)[reply]
      Some authors seem to have agreed to collective attribution, but I don't know an easy way of finding out who has. --RichardW57 (talk) 17:25, 7 April 2024 (UTC)[reply]

      Limburgish nominal inflection

      So this is apparently a meme in certain online communities. Looking at the table, even without any knowledge of Limburgish, it is clear that dative *berem cannot be correct and that the ‘locatives’ are all extremely suspect. Nor is it obvious what the ‘consonant mutation’ refers to (progressive assimilation?). The Limburgish page looks a lot more sane by contrast. 109.184.88.220 19:16, 5 April 2024 (UTC)[reply]

      @BartGerardsSodermans You seem to have been the one who added these entries. Can you fix this? Benwing2 (talk) 05:14, 6 April 2024 (UTC)[reply]
      @Benwing2 I've gone around and simply removed those, as I added most of them when I was transforming the original pages' bare wikitables into templates. I based this template on whoever originally added the inflection tables, but they seem either very specific to one specific dialect or just to have been wrong in describing nominal inflection in Limburgish. These questionable non-template inflection tables are still present in some cases, like geo, glee, hieër, hoes, kindj, krieëk, meule, wien, water, and hit. I haven't removed those yet as I didn't originally add them and you might want to ask the original contributor (if they are even active anymore) whether these are correct or can be removed. Though in my judgement they do seem problematic as well, so I'd be fine with removing those as well.
      The "locative" case, for example, seems to have come from someone who was too eager to generalise a rule from a few actually existing examples (specifically hieër (lord) → hieëves (to the lord) and heim (home) → heives (to (the) home)). Instead of being a locative case, it may just as well be a regular suffix -ves (which may be related to suffixes like English -ward and Dutch -waarts).
      As far as I can tell the dative is identical in most dialects to the nominative; though they were different at some point, I don't believe an -m has ever been present. The only dative marking that ever occurred seems to have been a simple -e. Similarly, the "consonant mutation" seems to just be progressive assimilation, which most spelling systems would not spell out anyway so I also don't see why that was added initially. BartGerardsSodermans (talk) 06:37, 6 April 2024 (UTC)[reply]
      @BartGerardsSodermans Thank you! The original contributor is User:Ooswesthoesbes, who is sporadically active; but given what you've said, I am inclined to simply remove all the problematic tables regardless. I have a hard time believing, for example, there is a separate locative case in Limburgish. Benwing2 (talk) 06:45, 6 April 2024 (UTC)[reply]
      The inflections given are based on the "High Limburgish" standardisation. After a community consultation on the Limburgish Wiktionary, we decided to drop it altogether and instead use no standardisation. As BartGerardsSodermans indicates, the assimilation is generally left out in spelling, and as such can be dropped as well.
      While some dialects do differentiate in genitive, it is mainly tonal, e.g. daa~g vs. daa\g.
      My advice would therefore be to remove the templates or replace them with the ones similar to those on the Limburgish Wiktionary. --Ooswesthoesbes (talk) 06:50, 6 April 2024 (UTC)[reply]
      @BartGerardsSodermans Can you do this? Benwing2 (talk) 06:59, 6 April 2024 (UTC)[reply]
      Yeah, I'll go ahead and remove the inflection tables for now. And maybe in the future I'll see about creating a correct template. BartGerardsSodermans (talk) 12:14, 8 April 2024 (UTC)[reply]

      {{tcl}}

      Pinging interested parties, feel free to ping more: @Benwing2, Surjection, Vininn126

      Last December, WingerBot went through hundreds of pages replacing definitions of countries with this template, which automatically transcludes the English definition on the various language-specific pages. The only reason I noticed this is because this broke some quotation templates' display, but even if we ignore this issue (which should be fixed as soon as possible), in my opinion using this template across languages is a terrible idea:

      Languages are very different with regards to how they perceive the world, and proper nouns and even place names are not different. When an English speaker sees "largest country in the world", a Russian speaker sees "home country", whereas a Ukrainian speaker... Well, you get it.

      This becomes ever so relevant when you go down in size. A speaker of Votic will see the concept of Canada as fundamentally different from a speaker of Dogrib. While this is difficult to put into words, slight changes in the definition do matter! This becomes evident when you go to smaller-sized place names - The fact that Den Hoorn is located in Midden-Delfland is only really part of the worldview for the Dutch speakers in the region, for a Limburgish speaker it will already only be useful to know that it is located in South Holland, while for a speaker in Singapore this will just be a village in the Netherlands (or perhaps even in Europe!).

      In any case I think the indiscriminate introduction of the template across languages by a bot should be undone, but I also suspect that it might be better to not use the {{tcl}} template at all: Definitions are ever changing, both in the language and in English, and while at one point the English definition might match the one in a given language, it is possible that this English definition will be optimised in the future, leaving the definition in the other language incorrect. Thadh (talk) 11:48, 6 April 2024 (UTC)[reply]

      Strongly agree that any mass adoption of {{tcl}} should have been discussed properly, and if I were to decide, I'd burn that template with fire. Not only for technical reasons, but with the valid points raised here. — SURJECTION / T / C / L / 11:50, 6 April 2024 (UTC)[reply]
      The technical issue aside, I am strongly of the opinion that there absolutely are terms that are semantically the same across languages, thus removing it wholesale is not necessary. The argument comes across as ideological to me. trójkąt, along with tons of other scientific terminology, comes to mind as being a semantic match. Vininn126 (talk) 11:55, 6 April 2024 (UTC)[reply]
      What on earth is gained by using {{tcl}} to include a definition of Russia or triangle on an entry instead of just using a link and writing a short gloss? The tcl approach 1. is subject to whims by editors of the English entry, 2. wastes technical resources, 3. introduces risks of breakage with existing templates and gadgets (like the quotation issue that prompted this discussion), 4. makes it harder for editors to see where the definition comes from, 5. makes it harder for people who want to parse the Wiktionary entry data to get senses, etc. It is worse in literally every single way I can think of. — SURJECTION / T / C / L / 11:58, 6 April 2024 (UTC)[reply]
      1) It provides more precise linking between terms. To your 1) this can be an upside or a downside for particular terms. To your 2) On 99% of potential matches, the difference is insignificant. 3) There's argument for every template/module for "potential breaking", let's not have any in that case. IPA modules can be changed, too, and break. 4) How? 5) For some people. It's also based on Wikidata, which I'm sure people can parse. Vininn126 (talk) 12:02, 6 April 2024 (UTC)[reply]
      1 is meaningless. What is "more precise linking" between terms? 2 is never insignificant; this line of thinking is why we had memory errors for years. 4 is obvious; if you try to edit the page to fix a typo in the sense and just see a {{tcl}}, how is an editor not familiar with this template supposed to do anything? Your response to 5 I think just shows you're not familiar with this topic; forcing people to use the Wikidata API too just to display Wiktionary definitions makes no sense. If all you care about is the Wikidata lexeme ID, you can add that with {{senseid}}. {{tcl}} adds no value whatsoever. — SURJECTION / T / C / L / 12:08, 6 April 2024 (UTC)[reply]
      As to memory issues - we have tons of templates that can cause these problems and it's always on shared pages. This does not stop us from including templates on pages that are not shared and are unlikely to be shared. We've had to come up with other ways to deal with memory issues for specific pages in the past. I don't see why we can't just limit its use instead of banning it entirely for this reason. Vininn126 (talk) 12:12, 6 April 2024 (UTC)[reply]
      @Surjection Unless you can point to something concrete, complaints about use of resources are not helpful here. I see nothing about this template which uses anything significant, and vague hand-wringing about it is not productive. Theknightwho (talk) 12:13, 6 April 2024 (UTC)[reply]
      Fetching and transcluding another page and parsing it in Lua is not free and is never going to be free. — SURJECTION / T / C / L / 12:15, 6 April 2024 (UTC)[reply]
      @Surjection Correct, but it’s also a negligible cost outside of pages which do it to hundreds/thousands of other pages, and I cannot think of any examples where that would happen with this template. Theknightwho (talk) 12:17, 6 April 2024 (UTC)[reply]
      We currently have 3,888 Latin-script languages. I bet you 99% of them would write the name of Samoa in the same way as English does. This is an issue in the long run (although to me the technical side is much less important than the lexicographical one). Thadh (talk) 12:22, 6 April 2024 (UTC)[reply]
      @Thadh Which is a hypothetical future problem for when we have thousands of L2s on a single page.
      Plus, by far the biggest cost involved is in grabbing the content of the page (which is something we can’t do anything about, since it calls back into PHP), whereas the actual parsing is relatively cheap. I know this, because I spent ages doing time profiles to see what the problem was. In your example, if they’re all calling back into PHP to get content from the same page, then that time cost won’t apply, since PHP uses a cache. Where it is a significant issue is in scraped transliterations, since every link grabs the content of new page(s). Theknightwho (talk) 12:27, 6 April 2024 (UTC)[reply]
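      The caching argument above can be pictured with a toy model. This is only a sketch of the general idea (an expensive fetch whose result is cached per page title), not how PHP or the Lua layer actually implement it; every function name and string below is hypothetical.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the expensive step actually runs

@lru_cache(maxsize=None)
def get_page_content(title):
    """Stand-in for the expensive page fetch; hypothetical, not a MediaWiki API."""
    global fetch_count
    fetch_count += 1
    return f"(wikitext of {title})"

def transclude_definition(title, sense_id):
    """Cheap step: pick a sense out of already-fetched content (placeholder logic)."""
    content = get_page_content(title)
    return f"{sense_id} from {content}"

# Many language sections on one page all pulling from the same English entry:
# only the first call pays for the fetch.
for _ in range(1000):
    transclude_definition("Samoa", "sense-1")
same_page_fetches = fetch_count

# Each link pulling from a different page, as with scraped transliterations:
# every call pays for a fetch.
for i in range(1000):
    get_page_content(f"Page {i}")
different_page_fetches = fetch_count - same_page_fetches

print(same_page_fetches, different_page_fetches)
```

      In this model the cost depends on the number of distinct pages fetched rather than on the number of transclusions, which is the distinction being drawn between {{tcl}} and scraped transliterations.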
      One consequence that hasn't been considered: this makes the master entries, in effect, templates that can't be given template-editor protection. Any bad edits to these entries will be propagated to all of the other entries, which will make them targets once the vandals catch on. Because they're entries with cross-linguistic significance they need lots of translations, and many of the translations are added by contributors who would be kept out if the page is protected. Page protection would also affect unrelated homographs. It might be possible to create an abuse filter that protects just the sense, with the level dictated by something added to the main entry (maybe a parameter in the senseid?)- but that definitely complicates things. At the very least, {{tcl}} raises the stakes for any decisions made regarding the senses being transcluded. Chuck Entz (talk) 21:34, 6 April 2024 (UTC)[reply]
      I find the point being made here rather confusing. This dictionary is written in English and maintains a neutral point of view, so it does not refer to any country as a "home country" for instance. It's true that Votic and Dogrib speakers will think differently of Canada, but if the denotations of the words both refer to the UN member state, then the definitions, as written in English, should surely be identical. Further historical or cultural nuances are presented as distinct senses or usage notes. This, that and the other (talk) 05:58, 7 April 2024 (UTC)[reply]
      The reason I did this was to avoid tons of duplication of definitions. It may seem "obvious" to define Canada as a country in North America, but there are lots of cases where it's far from obvious especially if you want the categorization to work out correctly, and it was very painful to try and keep manually synchronized the definitions of a hundred different terms for e.g. Vatican City. For any country in the Middle East, for example, there are issues such as what are the limits of the Middle East and Western Asia, etc. and how do we indicate that countries are part of both? Even for a relatively innocuous area like Europe, the boundaries of "Western Europe", "Eastern Europe", etc. aren't necessarily obvious and it can be problematic if you get it wrong. For many geographic terms (e.g. Palestine, Jerusalem, Crimea, Artsakh/Nagorno-Karabakh, Macedonia, ...) just coming up with an NPOV definition is hard, and furthermore the NPOV definitions may change over time, leading again to a massive synchronization effort if {{tcl}} isn't used. The technical objections made above seem largely theoretical and speculative to me since I haven't actually seen major issues arising, and I don't at all buy User:Thadh's claim that we need to phrase the definition of a given geographic term differently depending on the language in question. In fact I would say doing so can be quite problematic from an NPOV perspective. Benwing2 (talk) 06:37, 7 April 2024 (UTC)[reply]
      NPOV has absolutely nothing to do with lexicography. A certain word has a certain meaning in a certain culture, distinct from other cultures, be it a noun, adjective, or a name. That meaning is what we record here, and it is impossible to do lexicography well by only documenting the English definition of the referent, rather than the definition through the speaker's pov. This includes things like omitting specific information which is not primary to the speakers: For instance, Mäkkylä would best be defined as "A village in the Leningrad Oblast" in Russian, and as "A village in Russia" in English. That has nothing to do with NPOV, it has to do with speaker experience. Thadh (talk) 09:08, 7 April 2024 (UTC)[reply]
      I am perplexed that you want the definitions of Ingrian words to be contextualised for Ingrian speakers, even though these definitions are written in English. It's likely that in the Ingrian edition of Wiktionary, definitions would be written in the way you wish. However, English is a global language. I do not agree with the idea that different languages' entries on English Wiktionary should assume different levels of geographic foreknowledge on the part of the reader. Perhaps not what we usually mean by NPOV, but it's a similar principle. This, that and the other (talk) 11:39, 8 April 2024 (UTC)[reply]
      @This, that and the other: If you want our dictionary to only be used by English speakers, then I am afraid you'll see the majority of the editors leave pretty quickly. The target audience of the Ingrian entries is Russian, Finnish and Estonian speakers, which is unsurprising, as I doubt you'll ever encounter even one English speaker who has even heard of the word "Ingrian" in his life. So forgive me for not wanting to adapt my definitions to people who will absolutely never use the entries, and to have information there that absolutely no reader will ever want. Thadh (talk) 11:50, 8 April 2024 (UTC)[reply]
      @Thadh When you say If you want our dictionary to only be used by English speakers, then I am afraid you'll see the majority of the editors leave pretty quickly., do you not see how this implies the precise opposite of what you're saying? If it's not solely used by speakers of one language, then it's not safe to assume the reader's knowledge or interest on that basis. Theknightwho (talk) 17:03, 8 April 2024 (UTC)[reply]
      Which is why I don't do that, but rather state that we should adapt the definition on the basis of what the speakers would denote. Thadh (talk) 17:54, 8 April 2024 (UTC)[reply]
      @Thadh's argument doesn't make any sense to me. Why would English speakers not care that Den Hoorn is in a certain part of the Netherlands? Do you speak for all of us? As an English speaker, I strongly oppose removing this kind of geographical information on any entry. @Surjection's argument is also irrelevant, given that anyone trying to parse Wiktionary will use the HTML output, which isn't affected, rather than the wikitext. Therefore I support applying {{tcl}} whenever possible. Ioaxxere (talk) 20:59, 7 April 2024 (UTC)[reply]
      @Ioaxxere: If you want more detailed information on Den Hoorn, there is a very good website for you called Wikipedia. Luckily for you, pretty much all our English entries have a handy link (and often more than one) to related articles. We also don't include information on past inhabitants of the village and how many schools there are. There is a reason for that: We are not an encyclopedia. Thadh (talk) 21:05, 7 April 2024 (UTC)[reply]
      @Thadh Forcing users to go elsewhere because you’ve assumed people don’t care about information just adds inconvenience. What you’re essentially doing is applying the Sapir-Whorf hypothesis to place names, instead of the far more obvious explanation that it’s down to geographic proximity, which changes based on the speaker’s personal circumstances.
      I agree that a speaker’s conceptions will change based on the language they’re speaking, but I do not agree that that applies to place names in general (edit for clarity: I’m not talking about poetic terms like Albion, which obviously are affected). Theknightwho (talk) 21:37, 7 April 2024 (UTC)[reply]
      I'm sympathetic to the idea that e.g. "(an island and city-state in Southeast Asia, located off the southernmost tip of the Malay Peninsula; a former British crown colony)" does not necessarily need to be present on every single language's definition of Singapore (e.g. சிங்கப்பூர்) ... but only inasmuch as I think just defining it as "Singapore", pointing to the English entry, and letting the English entry do the heavy lifting (including noting any associations the place has in any particular cultures) could be enough. (And since {{tcl}} syncs/transcludes this content, rather than duplicating it in a way that would fall out of sync, I think it's fine.)
      I don't think we can assume that the only people looking up a word in a given language are speakers from the main culture associated with the language, who live wherever the language is most commonly spoken; (native) Chinese speakers living outside China might only have as little need to know what specific sub-area of China Haikou is in as the average English-speaking American (but conversely, members of either group might want to know what specific region it was in); at the same time, they might have correspondingly more interest in exactly what part of Malaysia or the US, where they live, a nearby city (whether named in Chinese, Malaysian or English) is in. And IMO, information like (say) Mount Paektu being important to Koreans and Manchus should be noted in the English entry, not just the Korean entry. Only if a single place truly has salient/definitional cultural significance in a large number of languages would I consider not having at least a copy of all the info in the English entry, if it would balloon the English definition up too much.
      Regarding the idea that these are "unprotectable templates": if every language's word for "Singapore" just linked to the English entry with no extra details, and someone vandalized it, then people coming from all different entries would still see the vandalism if they clicked through to the English entry... and if they didn't click through, then the vandalism would go unnoticed by them, whereas if the English definition is transcluded across many pages and gets vandalized, more people are in a position to see the issue and bring it to our attention... so IMO it seems like a wash on that front. - -sche (discuss) 22:22, 7 April 2024 (UTC)[reply]
      @-sche: "I don't think we can assume that the only people looking up a word in a given language are speakers from the main culture associated with the language, who live wherever the language is most commonly spoken" - I assume nothing of the sort, but I am absolutely certain that the term denoted is the one that is used by "speakers from the main culture associated with the language, who live wherever the language is most commonly spoken". There is simply a difference between describing a referent and describing a term - when I as a Russian speaker say "Дортмунд (Dortmund)" I denote something different from what a German speaker would denote. The referent is the same, that is true, but the communicated information is not. Thadh (talk) 16:24, 8 April 2024 (UTC)[reply]
      Unless you happen to be speaking to @Fay Freak or one of many other Russian-speakers who live in Germany, that is. Theknightwho (talk) 17:08, 8 April 2024 (UTC)[reply]
      @Theknightwho: Good luck finding three quotes in Russian proving that the speaker encoded the knowledge that Dortmund is located in North Rhine-Westphalia into their used word, and the best of luck proving that for smaller languages. Thadh (talk) 17:57, 8 April 2024 (UTC)[reply]
      @Thadh I'm pretty sure that Russian travel guides exist about Germany. Theknightwho (talk) 18:05, 8 April 2024 (UTC)[reply]
      Okay, great. {{tcl}} can stay for Russian. Please remove it from Ingrian, Votic, Veps, Karelian, [insert any other Uralic language other than Finnish, Estonian and Hungarian], and while you're at it [any language that does not have travel guides written in it, which by my estimation is 98%].
      I don't see why this would ever be a site-wide decision anyway. Regardless of what English editors may think, why should anyone choose whether or not to use this template for languages other than that language's editors? Thadh (talk) 18:10, 8 April 2024 (UTC)[reply]
      @Thadh I have no idea why (or on what basis) you think the totality of speakers of these languages are ignorant of information about major cities in Germany such as what region they're in. It's completely baffling. Theknightwho (talk) 10:58, 9 April 2024 (UTC)[reply]
      @Theknightwho: "Ignorant" and "Not encoding in their speech" is not the same thing. Thadh (talk) 12:17, 9 April 2024 (UTC)[reply]
      @Thadh When using value-neutral terms for places, I encode as much information as the listener is able to infer from their knowledge of that place. Nothing more or less. Theknightwho (talk) 12:21, 9 April 2024 (UTC)[reply]
      Also, maybe you haven't noticed, but there are currently two elderly speakers of Votic, and just some thirty of Ingrian. Would not be surprised if indeed the totality of these do not know where Dortmund is located. Thadh (talk) 12:21, 9 April 2024 (UTC)[reply]
      Thadh being illogical again (→ belief perseverance). The correct perspective was outlined by the accusation of perplexity by This, that and the other. If you are a Russian, Finnish and Estonian speaker and use en.wiktionary.org with success then you are an English speaker to some degree, and assume an English speaker’s perspective. Theory of mind. He portrays the issue without a sense of proportion. The Wikipedia stuff doesn’t work either in the way Thadh suggests. I already had the problem of Malaysian place-names cited in 1978 by the Encyclopedia of Islam being unidentifiable to me, the suggestions on كَلَة (kala) being like a third of the mentioned suggestions. It is also psychiatrically interesting that Thadh illustratively expands upon the application of the Sapir-Whorf hypothesis after being called out for it, which is at this point willingly fallacious, stubborn. I mean I don’t say it is a disorder, giving lack of significance pervasiveness, allism must have even maladaptive interaction typically, we are all learning, but I must warn against such subjectivity, this is an unconstructive and dangerous personality trait. Fay Freak (talk) 18:15, 8 April 2024 (UTC)[reply]
      I don't even know where to start, but I speak English and I still don't know what state any given American city is located in, even if I know English, let alone any city in any other of the hundreds of English-speaking countries and territories. Speaking a language to the degree of understanding glosses does not entail knowing anything about the culture at all. Furthermore, in this day of internet, you don't even have to know English to use our dictionary. Thadh (talk) 18:23, 8 April 2024 (UTC)[reply]
      That’s why we take extra care about the place-name glosses being comprehensible to the naivest denominator. They are made exact to various granularity within the same gloss and at the same time robot-readable. I still freestyle within these limits and give entries a human touch: Ahlat (a town in Bitlis Province, Turkey; at Lake Van, 40 km northeast from Tatvan along the coastline). You don’t regularly succeed to be more intelligent than that either way. Fay Freak (talk) 18:35, 8 April 2024 (UTC)[reply]
      @Fay Freak: I don't know if you've noticed, but this entire discussion is about removing this type of glossing. Thadh (talk) 18:40, 8 April 2024 (UTC)[reply]
      @Thadh: The types of glossing aren’t mutually exclusive. English entries like Aksaray have to be improved, they are inexact because we didn’t know formatting, even using images as a replacement for coordinates. And {{transclude}} can have parameters for extra text like {{place}} has. Or if the link in {{place}} has an |id= then we don’t need to use {{tcl}} at all. It’s hardly the hundreds of pages WingerBot caught semi-automatically; I too was doomscrolling my watchlist in December and did not notice the theoretical offence. Fay Freak (talk) 19:52, 8 April 2024 (UTC)[reply]
      I have a very specific objection to a use of {{tcl}} which I had forgotten. When I looked at Ancient Greek Ἰνδῐ́ᾱ (Indíā), I found that the definition was, "(chiefly historical, proscribed in modern use) India (a region of South Asia, traditionally delimited by the Himalayas and the Indus river; the Indian subcontinent)". What? A Classical Greek word proscribed in modern use? I then looked at the wikicode - {{tcl|grc|India|id=region}}. The problem is that the gloss is picking up {{lb|en|chiefly|historical|proscribed|_|in modern use}} from the English entry. We need some way to stop such labels being picked up.
      A problem with correcting that entry is that I don't know what the Greek conception of 'India' was. But.. - while English glosses are supposed to be definitions, glosses for other languages are supposed to be translations. Perhaps I should just change the gloss of the Greek word to 'India'. --RichardW57 (talk) 19:56, 8 April 2024 (UTC)[reply]
      I think this is actually a good example of what was being talked about above. Vininn126 (talk) 19:59, 8 April 2024 (UTC)[reply]
      Or Ἀλαζώνιος (Alazṓnios), anachronistically using a country name in the definition which was invented two thousand years after the attestation. On the other hand, compare Κῦρος (Kûros) using a historically appropriate gloss. Vahag (talk) 20:12, 8 April 2024 (UTC)[reply]
      That still works, you can’t circumscribe everything comprehensibly in historical terms, which become more ambiguous the more you go back, less the earliest Armenian historic times, which you do not well map year by year though unlike Europe’s maps of the 1900s, but another two millennia before, you see in what I wrote in the etymology section تبریز (tabrīz). And the historical sensibility or diachronic coherence gets lost by people’s manual actions, sigh, compare my original definition of Cyprus, thus circumspectly formulated because I added the Phoenician translation for the island of Cyprus (no country nor political unity existed). There is an inherent bias towards the present natural to language itself, its preservation and transmission up to the imagined readers of our working language, without anyone being Whorfian here. All techniques have intelligibility advantages. Fay Freak (talk) 20:48, 8 April 2024 (UTC)[reply]
      There is a principal difference between countries (top-level states) and geographical or cultural regions corresponding to them sometimes (through the concept of a nation, such as tied to a demonym). I again applaud Wiktionary and whoever wrote the English dictionary entry Germany which has been afforded meticulousness in this respect. If we think about it like a programmer then they are different objects, which we could fetch. Wikipedians can only complain about it not being made out in the references though, which need to be interpreted as referring to one or the other or both. The concept would simplify historic accuracy, though even ethnicities only form in distinct times (e.g. Bosnians split off Serbians in the 13th century), but then again only most intelligent people like Vahagn, Richard, and Ben would even get the idea of what we try to achieve there, which is not usually expressed well in language – something for a later generation.
      Psychology fact: Tasks become easier by sequentialization and big questions like “What is Russia?” can be projected: At another occasion I point out that even in the legal realm there is the diplomatic/international law answer, the civil law one, which is the de-facto state one (a term invented by Wikipedia), the internal administrative one, then here we have the cultural one (surely went somewhere through Belarus and Ukraine until everyone became sick of being the Russian world as they bombed culture away), which is equal to the regions typically settled by Russians at the Eastern frontiers as opposed to Turks (ethnopluralism, which we define incorrectly btw, is rare, most people distinguish other people(s) by acculturation). Fay Freak (talk) 21:13, 8 April 2024 (UTC)[reply]
      @RichardW57 There is in fact support for this in {{tcl}}; |nolb=+ or |nolb=1 makes it not pick up any labels, or you can give a semicolon-separated list of labels not to pick up. This needs to be documented. Benwing2 (talk) 21:17, 8 April 2024 (UTC)[reply]
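For readers unfamiliar with the parameter, here is a rough Python sketch of the |nolb= behaviour as described in the comment above. It is not the template's actual Lua code; the function name and the list-of-strings representation of labels are assumptions made purely for illustration.

```python
def transcluded_labels(source_labels, nolb=None):
    """Return the labels to carry over from the English definition.

    nolb=None        -> carry over every label (the behaviour at the time)
    nolb="+" or "1"  -> carry over no labels
    anything else    -> semicolon-separated list of labels to leave out
    """
    if nolb is None:
        return list(source_labels)
    if nolb in ("+", "1"):
        return []
    # Hypothetical parsing: split on ";" and ignore surrounding spaces.
    dropped = {part.strip() for part in nolb.split(";")}
    return [label for label in source_labels if label not in dropped]
```

With the labels from the Ancient Greek example, passing "+" drops everything, while passing a list such as "proscribed; chiefly" keeps only "historical".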
      @Benwing2: Ἰνδῐ́ᾱ (Indíā) corrected accordingly. But when may one use undocumented features without being accused by perps of hacking? --RichardW57 (talk) 06:24, 9 April 2024 (UTC)[reply]
      I wonder how often labels should vs. should not be transcluded; I wonder if |nolb=+ should be the default. If it's not too much work, maybe someone could make a list of what labels are used on the various definitions around Wiktionary that {{tcl}} transcludes, and how often, so we could get a sense of whether most labels can or can't be expected to apply across languages (e.g., if there are any foreign places for which the US and UK or NZ, etc, use different names, any "US" label would never carry over to Czech — maybe the template even already realizes this, to not carry over English-specific labels).
      I'm not saying this is a good idea, if it would increase how much memory or other resources Module:languages uses for relatively small gain, but there is also the possibility of adding some kind of "long dead?" / "ceased to be spoken by native speakers in ancient times?" field (precise scope and name to be workshopped) to Module:languages or to some other module, which might need to be present only when the value is "true", from which not only could {{tcl}} know to suppress labels like "historical" or "obsolete" when a language is long-dead (since such labels are likely to be less accurate, and might be better added manually if accurate), but also "long-dead language term borrowed from modern language term" in {{der}}/{{bor}} like "Coptic terms derived from Greek" (permalink) could categorize into a "this is probably wrong, check me" category (in that case, Coptic borrowed from Ancient Greek, not modern Greek). (Again, not sure this is worth the cost, but mentioning it.) - -sche (discuss) 13:33, 9 April 2024 (UTC)[reply]
      @Benwing: This discussion is delving into the theoretical of whether speakers of different languages denote different things, but meanwhile the technical issue with the quotations is not resolved yet. Thadh (talk) 12:50, 9 April 2024 (UTC)[reply]
      After a discussion with @Theknightwho, it seems the main problem we share with the template is the fact that cultural baggage denoted by the term is now both transcluded (it shouldn't be, as English is a separate language and has separate connotations), and afaict excluded on the individual entries. By "cultural baggage" I mean for instance things like the difference between Burma and Myanmar.
      We still disagree on the amount of geographical information that should be added to the entries (I personally am of the opinion that this should be minimal), but in any case things like "Largest country in the world" do not seem to be something that should be transcluded. Similarly, labels should not be transcluded by default, I don't see how that is desirable. @Benwing. Thadh (talk) 13:25, 9 April 2024 (UTC)[reply]
      Coincidentally, my earlier comment here was prompted by finding a number of instances of the invalid category [language code]:Burma that were only there because someone put |cat=Burma in a {{sensid}} template instead of |cat=Myanmar. While it's nice that all these entries are in synch, the fact that a Tagalog entry has all the mistakes of its English counterpart isn't. IMO we shouldn't use this on entries that are contentious or vandalism-prone in any way. Chuck Entz (talk) 14:04, 9 April 2024 (UTC)[reply]
      @Thadh I think you are right about not transcluding labels by default. This is behavior inherited from the original implementation by User:Fytcha, but it violates the principle of least surprise because often the labels won't be relevant or accurate and it's not obvious to the transcluder what the labels are. I'll change this so they are only transcluded if you use |lb=1 or |lb=+. BTW you should ping me as Benwing2; I don't log into my admin account much so I won't generally see pings to that account. Benwing2 (talk) 05:17, 10 April 2024 (UTC)[reply]
      Oh whoops, sorry, I normally do but seemingly forgot this time. Thadh (talk) 07:55, 10 April 2024 (UTC)[reply]

      Copying rhyme syllable counts from existing categories

      The {{rhymes}} template takes |s=, which specifies the number of syllables in the relevant pronunciation of the term (not in the rhyme itself). This allows the template to categorize the term in e.g. Category:Rhymes:English/iː/1 syllable. Many English entries lack these syllable counts but are in categories like Category:English 1-syllable words. I've written a Python script to find English terms that are in exactly one of these syllable categories and that already have a rhyme with no syllable count specified, and add the syllable count specified by the category. I'm seeking consensus to run it (under my bot account) over all "English N-syllable words" categories. As usual I will start off slow to allow me to catch bugs early. — excarnateSojourner (ta·co) 01:46, 7 April 2024 (UTC)[reply]
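The bot script itself is not shown in this thread, so the following is only a minimal Python sketch of the logic described above. It assumes the page's categories are available as a list of strings and that the rhyme is written as a simple {{rhymes|en|...}} call; the function names are hypothetical, and treating |s1=, |s2= etc. as existing counts is likewise an assumption.

```python
import re

SYLLABLE_CAT = re.compile(r"^Category:English (\d+)-syllable words$")
RHYMES_CALL = re.compile(r"\{\{rhymes\|en\|([^{}]*)\}\}")


def syllable_count(categories):
    """Return N if the page is in exactly one 'English N-syllable words' category."""
    counts = {int(m.group(1)) for c in categories if (m := SYLLABLE_CAT.match(c))}
    return counts.pop() if len(counts) == 1 else None


def add_syllable_count(wikitext, categories):
    """Add |s=N to English {{rhymes}} calls that have no syllable count yet."""
    n = syllable_count(categories)
    if n is None:
        return wikitext  # zero or several syllable categories: leave for manual review

    def fix(match):
        params = match.group(1).split("|")
        # Assumption: any |s= or |sN= parameter means a count is already given.
        if any(re.match(r"s\d*=", p) for p in params):
            return match.group(0)
        return "{{rhymes|en|" + match.group(1) + "|s=" + str(n) + "}}"

    return RHYMES_CALL.sub(fix, wikitext)
```

Pages in more than one syllable category are returned unchanged, which matches the "exactly one of these syllable categories" condition stated above.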

      Run it. CitationsFreak (talk) 07:08, 7 April 2024 (UTC)[reply]
      @CitationsFreak This seems fairly innocuous to me. I think User:Surjection has already done similar runs for certain languages. Ideally this wouldn't be necessary and we'd have a pronunciation module that would automatically generate the pronunciation from a respelling along with the rhymes, but we're a long way off from that for English. The only issues I can really think of are cases where there are multiple possible pronunciations with different numbers of syllables, where the different pronunciations are tied to a specific dialect (e.g. secretary, normally 4 syllables in the US but 3 syllables in the UK), and for which the rhymes are different per dialect and thus the syllable counts need to be synchronized to the rhymes. Whether this actually happens I don't know, but you might want to generate a list of all the pages that have both multiple syllable counts and multiple rhymes, and manually review them to see if there are any of this nature (or just tell your bot not to touch them, and do them by hand). Benwing2 (talk) 07:31, 7 April 2024 (UTC)[reply]
      @ExcarnateSojourner Sorry, too late at night, I pinged the wrong person. Benwing2 (talk) 07:31, 7 April 2024 (UTC)[reply]
      It's fine, Benwing. (Also, thanks for asking permission, ExcarnateSojourner!) CitationsFreak (talk) 08:10, 7 April 2024 (UTC)[reply]

      Labels 'UK' vs 'Britain'

      Pinging @-sche because you often have thoughts about things like this. Currently we have two labels in Module:labels/data/regional, UK (alias United Kingdom) and Britain (aliases British, Great Britain, Brit), which display differently but categorize identically into British Foo where Foo is currently one of the five languages Bengali, English, Urdu, Vietnamese and Chinese. Yes, I know the UK and Great Britain aren't the same, but is there really enough of a linguistic distinction to merit two labels? I am 90% sure these labels are used promiscuously, with editors more or less randomly choosing one or the other. Given this, should we merge them? Benwing2 (talk) 07:00, 7 April 2024 (UTC)[reply]

      Aren’t they merged in their categorization already? Because that’s only what I could propose. Otherwise I let editors write what they want to write. To avoid frustration that is.
      I have added Irish English often enough, and with some likelihood “UK” means a term relevant due to the Kingdom’s politics or federal (yikes) legal system, whereas “Britain” means that I think a term is said in the ends in Scotland too since I know it from Northern England whereas Northern Ireland I would reserve to check. Your mileage differs no doubt.
      Doesn’t really matter that editors are inconsistent in theology, for, as outlined, the editing process is affective rather than based on thorough linguistic field study, lucky googling + “I feel like it” (impeccable for experienced editors, who know what the dictionary profits from, don’t get it twisted). Fay Freak (talk) 08:40, 7 April 2024 (UTC)[reply]
      To me "UK" seems less ambiguous, and so preferable (except for the fact that it might sound less pleasant to the ear?). I don't live there, so maybe I'm missing something.--Urszag (talk) 08:45, 7 April 2024 (UTC)[reply]
      Technically, "Britain" excludes Northern Ireland while "UK" does not, but (a) I don't think that's a very meaningful distinction, as Scotland and Northern Ireland form a far more coherent linguistical unit than Great Britain to the exclusion of Northern Ireland, and (b) "British", the adjective, doesn't exclude Northern Ireland, so if we're categorising terms as "British X" then it makes more sense to use the "UK" label. Theknightwho (talk) 13:28, 7 April 2024 (UTC)[reply]
      IIRC, a main reason these exist separately is that sometimes people wanted a noun (that didn't require typing "the" in front of it every time) and sometimes people wanted a word that fit in an adjectival/attributive slot. This is partly so that in {{label}}s people can write "dated in Britain" vs "UK dialects" (instead of wrong-sounding "Britain dialects") — I dimly recall "British" may have displayed as such at some point too, instead of being aliased to "Britain" — and partly because {{label}} isn't the only place these are used, there's also e.g. UK form of foobar vs wrong-sounding British form of foobar. (This issue of needing {{standard spelling of}} et al. to display something different is also why we hackily have "British spelling" and "British form" as different labels, btw; see this June 2020 TR discussion and the other discussions linked there.)
      Offhand, it seems like uses of "Britain" could be folded into "UK" or, in a very few cases in labels, "the|_|UK", but I'd want us to first make sure these aren't also used in other places we haven't thought of where that wouldn't work. I am inclined to agree with the goal of merging them, because they technically refer to different terms as TKW says, but I doubt even 10% of uses intend to be conveying different things, so having the difference is basically creating inaccuracy (not to mention that they're not distinguished in categorization).
      I am reluctant to even mention this, because I fear some people will use it as a reason to keep separate labels, but: another difference they could theoretically convey if there was any chance — which there clearly isn't, looking at how they're used at the moment — of people ever maintaining this difference, is that "Britain" existed at times when the "UK" did not, so theoretically some entry for a word that went obsolete centuries ago might technically be more accurately labelled "Britain" than "UK", but such a case could (and probably better should) be tagged with the relevant more specific labels like "England, Scotland, Wales" instead. - -sche (discuss) 15:38, 7 April 2024 (UTC)[reply]
      @-sche So ... I actually introduced the capability in labels of having a language-specific postprocessing function, which is currently used in Chinese so that e.g. {{lb|zh|Jilu Mandarin|Jiaoliao Mandarin|and|Jianghuai Mandarin}} displays as (Jilu, Jiaoliao and Jianghuai Mandarin) and {{lb|zh|Zhangzhou}} displays as (Zhangzhou Hokkien) but {{lb|zh|Zhangzhou|_|Hokkien}} also displays as (Zhangzhou Hokkien) rather than as (Zhangzhou Hokkien Hokkien). Such an approach, or maybe some modification of it maybe with label-specific settings, could potentially be used to correct the display issues you've mentioned above without actually needing separate labels. I'd just need a full description of what should be displayed in which circumstances. Benwing2 (talk) 18:22, 7 April 2024 (UTC)[reply]
      For "Britain" vs "UK", I (at least) wouldn't bother trying to have the template/module guess which one to display, because the odds of us being able to set it up to always display "Britain" in only those miscellaneous situations where that is desirable (without it accidentally displaying in situations where it is undesirable), and vice versa / mutatis mutandis for "UK", seem slim to me, and the benefits even if we do it right seem small; it seems easier to convert everything to "UK" and maybe manually add "_|the|_|" in the hopefully few places where "the UK" would be more euphonious than "UK".
      For "British form" vs "British spelling", though... if the "spelling of" and "form of" templates like "alternative form of", "standard spelling of" etc could know to delete the word "spelling" from the display form of "British spelling" — so that {{altspell|en|foobar|from=British spelling}} displayed "British spelling of foobar" instead of "British spelling spelling of foobar" — then I think the labels "British form", "Canadian form", etc could be reduced to aliases of the "British spelling" labels, although we should check first whether those labels ("British form", "Canadian form" etc) have come to be used anywhere else in the years since I set them up... I do spy one use in from A to Zed which needs to be changed (probably to just say {{lb|en|UK}}) before we alias "British form" to "British spelling". - -sche (discuss) 19:15, 7 April 2024 (UTC)[reply]
      @-sche It occurs to me there's a very simple solution to this issue, which is to provide a way of indicating that the label should display as written, e.g. {{lb|dated|in|!Britain}} which means that Britain should display as written rather than canonicalized to "UK" or whatever. It could easily be argued that this should actually be the default, but I don't know the ramifications of that. Benwing2 (talk) 04:08, 8 April 2024 (UTC)[reply]
      Hmm. That might be useful in some situations which are currently handled by having different labels (with the same categorization or whatever), but the downside, especially if we allow that for all labels and their aliases, is that then (as people start to use it even in cases where it's the only label, not part of a "dated in..." or the like), lots of entries will start displaying different things, suggesting there is a difference. If some entries say "Canada" and others say "Canadian", I suspect anyone who actually notices the difference may wonder what it's trying to convey (is Canada the topic label, and Canadian the dialect label?), and if the answer is "we're not trying to convey a difference", then why do we have a difference? I should clarify that my initial comment in this discussion was not in support of these being separate, just answering what the reason people kept them separate was; I actually feel the same way about "Britain" vs "UK" as about "Canada" vs "Canadian": that anyone who actually notices the difference is liable to wonder if we're actually trying to convey that the words are used in different sub-areas of the British Isles. I don't think adding "|_|the|_" to labels that need to display "dated in..." [Britain → the UK] is onerous. But especially given recent cases of pushback to template changes, maybe we should ping some more British editors to make sure they're onboard. - -sche (discuss) 22:32, 11 April 2024 (UTC)[reply]
      @-sche I have thought instead of making this opt-in only for certain labels, e.g. the labels data can mark that Britain should stay as such even when it's an alias of UK. There are cases where non-equivalent labels were being aliased (e.g. South Midlands as an alias of Midlands), which is problematic when the display gets changed. I have solved this so far by separating the labels entirely but this is a bit annoying to implement. Benwing2 (talk) 00:13, 12 April 2024 (UTC)[reply]
      @-sche I implemented ! preceding a label to indicate that the label should be displayed as-is instead of converted to its canonical form. This is useful e.g. for yallah, which is labeled as Arab|_|!Australian so it displays as Arab Australian; otherwise it would show up as Arab Australia, which sounds wrong. Benwing2 (talk) 23:13, 16 April 2024 (UTC)[reply]
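To make the effect of the ! prefix concrete, here is a small Python sketch of the display logic as described in the comment above. It is not the code in Module:labels; the alias table and function names are hypothetical, and the treatment of _ (join with a space instead of a comma) is a simplification.

```python
# Hypothetical alias table: alias -> canonical display form.
ALIASES = {"Australian": "Australia"}


def display_label(raw):
    """Return the display form of a single label."""
    if raw.startswith("!"):
        return raw[1:]  # "!" means: show exactly what was written
    return ALIASES.get(raw, raw)  # otherwise convert aliases to canonical form


def render(raw_labels):
    """Join labels with ", ", or with a space where "_" appears between them."""
    out = ""
    glue = ""
    for raw in raw_labels:
        if raw == "_":
            glue = " "
            continue
        out += glue + display_label(raw)
        glue = ", "
    return out
```

Under these assumptions, the labels Arab|_|!Australian render as "Arab Australian", while Arab|_|Australian would render as "Arab Australia".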
      @-sche Specifically referring to your point about Britain existing before the UK, I think that any terms which fall within that period (1707-1800) are better labelled as "18th century". If the country is absolutely necessary for whatever reason, it would be best to use the term "Great Britain", which is the period-equivalent to "UK" (which is what it became at the start of 1801). Theknightwho (talk) 23:32, 7 April 2024 (UTC)[reply]

      Update the text on main page

      The license on the main page says CC-BY-SA 3.0, it was actually updated to 4.0. 2001:4455:25F:9A00:FD29:81AD:A478:9FA2 06:22, 8 April 2024 (UTC)[reply]

      Done. SURJECTION / T / C / L / 07:07, 8 April 2024 (UTC)[reply]

      Old Lombard

      Ok, so first of all, I have listed "Old Lombard" as a dialect of Lombard. However, Old Lombard was spoken in the 13th-14th centuries, a fact I got from Lombard Wikipedia, meaning it was spoken in the same time as Old Spanish and Old French, etc. So I guess we could just add a code like roa-lmoa, (the final a for antich). That Northern Irish Historian (talk) 14:40, 10 April 2024 (UTC)[reply]

      Definition of a neologism for loanwords

      After the Spanish occupation, Tagalists in the early 20th century (Tagalog enthusiasts and promoters, who eventually became or influenced the Filipino language committee members) were promoting the use of Tagalog for academic abstract terms such as for science and arts because the terms used back then were heavily depending on Spanish. They started coining new terms and intentionally borrowed (not by natural contact) other Philippine languages' terms (and also Malay) such as Cebuano batas for Spanish ley (law), Cebuano katarongan for Spanish justicia (justice), Malay bangsa for Spanish nación, and Malay guru for Spanish maestro (teacher). All of these made it to common use up to the current period as Tagalog batas, katarungan, and bansa, guro. Since they were initially borrowed but made it to the "mainstream" use, some people would never think of these as neologisms anymore.

      Now, Tagalog has native words araw (sun; day) and buwan (moon; month) and the words can mean both the celestial object meaning, and the period of time meaning they were assigned to and can be perfectly understandable with context. In addition, Spanish had separate words for "sun" and "day" which are Spanish sol (sun) and Spanish día (day) respectively. Likewise, Spanish also has separate words for "moon" and "month" which are Spanish luna (moon) and Spanish mes (month).

      With the sun/day, moon/month distinction of Spanish and with the goals of being "inclusive" to create a true "Philippine" language, there was a proposal back then as well to borrow Cebuano adlaw (sun; day) and Cebuano bulan (moon; month) but the Cebuano terms would only refer to the celestial objects sun and moon, and araw and buwan would be used for the time periods, day, and month. The separation of words to refer to the time period and the celestial object did not make it to mainstream use unlike the terms in the first paragraph and Tagalog still uses the native terms today.

      Currently, in Wiktionary, Tagalog adlaw (sun) and Tagalog bulan (moon) are listed as neologisms since they were not used in practice, nor added to the common dictionary. They are still listed in some books introducing neologisms, such as Maugnaying talasalitaang pang-agham Ingles-Pilipino (literally, "Relational Scientific Vocabulary English-Filipino"), and are still discussed in some papers, so they can actually satisfy the Criteria for Inclusion.

      A user thought that the neologism label for these words should be removed due to the definitions provided at Wiktionary:Neologisms which are the following:

      A more precise sense of neologism which has gained some support on Wiktionary is a word that

      a) is new and perceived as new, although there is no precise age cutoff for newness;
      b) has not yet been recognized as part of the standard language (often being written with scare quotes);
      c) is not slang, colloquial, very informal, or technical, and;
      d) is not merely derived from an already-existing term with no unexpected change in meaning, such as in the case of clippings, loanwords, and abbreviations.

      A neologism that becomes part of the standard language should have the "neologism" label removed. A neologism that fails to become part of the standard language after an extended period of time but is not a protologism should be labeled nonstandard.

      The user has said that they can just be interpreted as regular loanwords and not as neologisms.

      However, I think adlaw and bulan by this definition are

      1. new in the sense that an existing term araw/buwan already existed but the sun/day moon/month concept was being introduced, coinage in intention so could be arguably a protologism
      2. did not become part of the standard Tagalog language, and you may only understand bulan and adlaw more likely if you come from the regional speakers such as Cebuano and the native terms are still used without separation
      3. not slang, nor colloquially derived as it was introduced by intellectuals

      However it failed number 4 because it is a loanword, an existing loanword with the same definition, but I would argue that the word was not borrowed naturally (ex. Cebuano speakers influx, interaction nor Cebuano getting political power to have their language used by Tagalogs).

      So what do you think, is this a neologism still? Should the rules at Wiktionary:Neologisms be modified if it is still a neologism? Thanks. Ysrael214 (talk) 18:49, 10 April 2024 (UTC)[reply]

      I still think we should count these as neologisms - they are loanwords, but they're not normal loanwords. If they're not neologisms, they are surely something else, since these have not come about naturally, but through a "top-down" approach in language development. — SURJECTION / T / C / L / 19:03, 10 April 2024 (UTC)[reply]
      That's pretty much the definition of a learned borrowing. Trooper57 (talk) 19:05, 10 April 2024 (UTC)[reply]
      Yet some learned borrowings are actually commonly used in their target languages; this word isn't. If it weren't borrowed from another language, everyone would agree to call it a puristic neologism. — SURJECTION / T / C / L / 19:24, 10 April 2024 (UTC)[reply]

      Add a lang code to Template:a (Template:accent)?

      In my quest to reduce the number of modules enumerating language-specific varieties I discovered yet another one, which is Module:accent qualifier/data. This is a real mess as labels from multiple languages are all jumbled together. Since there is no language code currently associated with {{a}}, clashes are a real problem and are solved in all sorts of ad-hoc ways, e.g. inexplicably, Lahore displays as Lahori Urdu but Lahori displays as Lahori Punjabi. I think the only way to make this clean is to add a language code to {{a}}. This would make it possible to separate the per-language uses and ultimately eliminate Module:accent qualifier/data entirely in favor of the label data. It would also make it possible to have some labels categorize, if we wanted that. Thoughts/support/opposition/etc.? Benwing2 (talk) 03:01, 11 April 2024 (UTC)[reply]
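As a rough illustration of why a language code removes the clash, here is a Python sketch in which the lookup is keyed by language as well as by label. The table contents and the function name are made up for the example and do not reflect the real module data.

```python
# Hypothetical per-language accent data: (language code, label) -> display text.
ACCENT_LABELS = {
    ("ur", "Lahore"): "Lahori Urdu",
    ("ur", "Lahori"): "Lahori Urdu",
    ("pa", "Lahore"): "Lahori Punjabi",
    ("pa", "Lahori"): "Lahori Punjabi",
}


def accent_display(lang, label):
    """Look the label up under its language; fall back to the label as written."""
    return ACCENT_LABELS.get((lang, label), label)
```

With the language in the key, "Lahore" and "Lahori" no longer have to be split arbitrarily between Urdu and Punjabi; each language resolves both spellings on its own terms.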

      Support. Theknightwho (talk) 03:03, 11 April 2024 (UTC)[reply]
      Yes please. This template and {{lookfrom}} (see RFDO) are just about the last frontier when it comes to templates that should have language codes but don't. This, that and the other (talk) 02:06, 12 April 2024 (UTC)[reply]

      rename all states, provinces, etc. to include the associated country in them

      User:نعم البدل pointed me to an issue we currently have, which is that Category:Punjab refers specifically to the Indian state of Punjab when there are actually two Punjabs, one a state in India and the other a province in Pakistan. Currently we have a messy situation where some first-level administrative divisions contain the country in them (e.g. Category:Arizona, USA; Category:Ostrobothnia, Finland; Category:Herefordshire, England) but others don't (Category:Punjab and other states in India; Category:Trentino-Alto Adige and other regions in Italy; Category:Lampung and other provinces in Indonesia; etc.). Sometimes instead the name of the administrative division is in the category, e.g. Category:Gunma Prefecture and other prefectures of Japan, except for Category:Hokkaido and Category:Tokyo; Category:Chittagong Division and other divisions of Bangladesh; etc. I would like to propose we rename all such first-level administrative divisions to include the name of the country in them, although it might make sense to keep the existing UK situation where the category contains the constituent country (Category:Herefordshire, England instead of Category:Herefordshire, UK or Category:Herefordshire, United Kingdom). Benwing2 (talk) 03:29, 11 April 2024 (UTC)[reply]

      @Benwing2: But England is a country (within the UK), just as Denmark is a country (within the Danish Realm).
      Oppose I think this is stoking up problems for disputed territories like Crimea or Kherson. --RichardW57m (talk) 11:33, 11 April 2024 (UTC)[reply]
      @RichardW57m The issue with Crimea is definitely an edge case, and there isn't even a CAT:Crimea or CAT:Kherson. We can leave disputed cases like this without any country in them; this is not a problem. The vast majority of states and provinces are not disputed, however, and as the issue with Punjab shows, there's a real problem with ambiguity when the country is not mentioned. Do you still oppose if we leave any problematic cases without an attached country? Benwing2 (talk) 22:20, 11 April 2024 (UTC)[reply]
      Can't we do it the other way around? Make a list of duplicates and make it obligatory there? I'm personally fine with both approaches, but I do think in some cases explicitly stating the country may be overkill and/or potentially problematic. Thadh (talk) 22:26, 11 April 2024 (UTC)[reply]
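      The "other way around" approach can be sketched roughly as follows, in Python. The sample data and the function name are illustrative assumptions, not an existing Wiktionary tool: the country is appended only where two divisions share a name.

```python
from collections import defaultdict

# Illustrative sample of (division, country) pairs; not a complete list.
DIVISIONS = [
    ("Punjab", "India"),
    ("Punjab", "Pakistan"),
    ("Lampung", "Indonesia"),
    ("Arizona", "USA"),
]


def category_names(divisions):
    """Map each (division, country) pair to a category name.

    The country is added only when the bare division name is shared
    by more than one country in the input.
    """
    countries_by_name = defaultdict(set)
    for name, country in divisions:
        countries_by_name[name].add(country)
    result = {}
    for name, country in divisions:
        if len(countries_by_name[name]) > 1:
            result[(name, country)] = f"{name}, {country}"
        else:
            result[(name, country)] = name
    return result


if __name__ == "__main__":
    for key, value in category_names(DIVISIONS).items():
        print(key, "->", value)
```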
      @Thadh That is possible; personally I like stating the country because e.g. I've never heard of Gunma Prefecture or Lampung, and I think it helps users if we give more context. Can you give examples where it's problematic to state the country (other than the already-identified cases like Crimea, and a few others that come to mind, such as Abkhazia, South Ossetia and Gaza)? Note also that these problematic cases have to be handled with special-purpose code in any case because their category text identifies the country they're part of. Benwing2 (talk) 22:37, 11 April 2024 (UTC)[reply]
      I was thinking of regions with strong nationalistic movements, but no majority support for independence yet, or alternatively no international support for it. Things like calling Catalonia a part of Spain might strike a nerve with some people.
      Think also of regions in Myanmar that are currently not controlled by the government. These are not disputed between multiple countries, they're disputed between one recognised country and a rebel group, which makes it pretty complex, and also very susceptible to rapid changes. Thadh (talk) 00:12, 12 April 2024 (UTC)[reply]
      My general impression is that Wikimedia Commons and English Wikipedia encounter similar naming issues when there are two places with the same name, and those websites deal with these naming issues in more or less haphazard fashion, as Wiktionary does. A correct solution would include an extensive review of the policies on those websites so Wiktionary could make an intelligently conceived standard policy. What a solution would be, I cannot say. --Geographyinitiative (talk) 22:45, 11 April 2024 (UTC)[reply]

      Russian book quotations

      Which of the following quotation formatting variants do you like better:

      1. 1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
        Kogda oni vyšli, kareta Vronskix uže otʺjexala. Vyxodivšije ljudi vse ješče peregovarivalisʹ o tom, što slučilosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      2. 1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.
        Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      3. 1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.
        Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      4. 1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
        Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      5. 1877, Лев Толстой, Анна Каренина, Москва: Наука (1970), pages 57-61; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда они вышли, карета Вронских уже отъехала. Выходившие люди все еще переговаривались о том, что случилось.
        [Когда́ они́ вы́шли, каре́та Вро́нских уже́ отъе́хала. Выходи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
        Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vyxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      6. 1877, Лев Толстой, Анна Каренина, volume 1, Москва: Типо-литографія Т-ва И. Н. Кушнеровъ и К° (1903), page 87; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
        Kogdá oní výšli, karéta Vrónskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.
      7. 1877, Лев Толстой, Анна Каренина, volume 1, Москва: Типо-литографія Т-ва И. Н. Кушнеровъ и К° (1903), page 87; English translation from Constance Garnett, transl., Anna Karenina, Philadelphia: G. W. Jacobs, 1919, page 86:
        Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
        [Когда́ они́ вы́шли, каре́та Вронских уже́ отъе́хала. Входи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
        Kogdá oní výšli, karéta Vronskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.

      The differences are basically:

      1. |text= the original text from a modern paper book edition |t= English translation
      2. |text= the text from a modern paper book edition, modified to add stress accents and "ё" letters |t= English translation
      3. |text= the text from a modern paper book edition, modified to add stress accents, "ё" letters and wikilinks for all words |t= English translation
      4. |text= the original text from a modern paper book edition |tr= romanized transcription with the added stress accents and "jo" where appropriate |t= English translation
      5. |text= the original text from a modern paper book edition |norm= normalization of the Cyrillic text with the added accents and "ё" letters |t= English translation
      6. |text= the original text from a pre-reform paper book edition |tr= romanized transcription with the added accents and "jo" where appropriate |t= English translation
      7. |text= the original text from a pre-reform paper book edition |norm= normalization of the Cyrillic text to modern orthography with the added accents and "ё" letters|t= English translation

      Right now variant 3 is used in Wiktionary, with some assistance from a bot (presumably @Benwing2's) correcting the quotations (e.g. this diff or this diff). But the conversion to modern orthography and the addition of stress accents and ё letters can alternatively be done automatically by a Lua module, which would make it possible to implement variants 4, 5, 6 or 7, with the extra benefit of instant feedback to the human who is editing a Wiktionary article. See the technical discussion here and a working demo of such automatic conversion at Module:User:Ssvb/ru-autoaccent/testcases. The automatic conversion can also be amended via |subst= overrides for the parts of the text that the automatic converter can't handle on its own due to the ambiguity of уже́ (užé) vs. у́же (úže) or все (vse) vs. всё (vsjo). It's also possible to override the whole sentence via the |norm= parameter.
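      For readers who want a concrete picture, here is a minimal sketch of dictionary-assisted accenting with overrides, written in Python rather than Lua. It is not the code of the module mentioned above: the toy dictionary, the function names and the shape of the `subst` argument are all assumptions for illustration.

```python
import re

ACUTE = "\u0301"  # combining acute accent used as the stress mark

# Toy stand-in for data extracted from a Wiktionary dump (assumption):
# plain lowercase spelling -> every accented spelling known for it.
KNOWN_ACCENTS = {
    "когда": {"когда" + ACUTE},
    "они": {"они" + ACUTE},
    "вышли": {"вы" + ACUTE + "шли"},
    "уже": {"уже" + ACUTE, "у" + ACUTE + "же"},  # ambiguous
    "все": {"все", "всё"},  # ambiguous
}

WORD_RE = re.compile(r"[а-яё]+", re.IGNORECASE)


def _match_case(original, replacement):
    """Re-apply an initial capital letter to the replacement."""
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def auto_accent(text, subst=None):
    """Accent unambiguous words; leave ambiguous or unknown ones alone.

    `subst` maps a plain lowercase word to the spelling an editor chose,
    loosely mirroring the |subst= override described above.
    """
    subst = subst or {}

    def fix(match):
        word = match.group(0)
        key = word.lower()
        if key in subst:
            return _match_case(word, subst[key])
        candidates = KNOWN_ACCENTS.get(key, set())
        if len(candidates) == 1:
            return _match_case(word, next(iter(candidates)))
        return word  # unknown or ambiguous: no guess

    return WORD_RE.sub(fix, text)


if __name__ == "__main__":
    sentence = "Когда они вышли, карета уже отъехала."
    print(auto_accent(sentence))
    print(auto_accent(sentence, subst={"уже": "уже" + ACUTE}))
```

      Words missing from the toy dictionary (карета, отъехала) are simply left unaccented, and the ambiguous уже is only accented once the editor supplies an override.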

      PS. I also noticed that the 1903 edition of "Анна Каренина" used the word "входившіе" ("coming inside") and the 1970 edition changed it to "выходившие" ("coming outside"). This just shows that we can't always fully trust the modern editions of books, so preserving the original pre-reform orthography from the old book editions may be useful in quotations.

      --Ssvb (talk) 06:44, 11 April 2024 (UTC)[reply]

      @Ssvb Hi, I've meant to respond earlier. In terms of the above variants, I would definitely be opposed to variants 4 and 6 where the stress is included only in the transliteration. If we take the approach of including the original unaccented, un-ё'd, unlinked text in the |text= param, we should use the |norm= param to include accents, ё's and links. Also I've been thinking there are ways in |subst= of avoiding having to repeat the unaltered text in most circumstances; e.g. if there's only one уже in the text, writing уже́ by itself instead of уже/уже́ should be enough. I've already taken this approach elsewhere in the Czech, Portuguese and Catalan pronunciation modules. I should also add, there are several edge cases your auto-accenting code really needs to handle properly; I had hoped by linking to my offline script you would glean those edge cases from the script, but you mostly dismissed them as bells and whistles (some are, some aren't). For example, in cases like до́ смерти (dó smerti), there's a multisyllabic word that must remain unstressed because of the preceding stressed preposition, whereas a naive approach would stress it. Benwing2 (talk) 07:01, 11 April 2024 (UTC)[reply]
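      Both points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy accent tables and the rule that a listed phrase always wins over per-word lookup are hypothetical, and are not taken from the offline script or from the Lua module.

```python
import re

ACUTE = "\u0301"  # combining acute accent used as the stress mark


def expand_subst(spec, text):
    """Expand one |subst=-style item into a (source, replacement) pair.

    "уже/уже́" is the full form. A bare "уже́" is shorthand that is only
    accepted when the plain spelling occurs exactly once in the text.
    """
    if "/" in spec:
        source, replacement = spec.split("/", 1)
        return source, replacement
    # Recover the plain spelling: drop stress marks and the dots on "ё".
    source = spec.replace(ACUTE, "").replace("ё", "е")
    pattern = rf"(?<!\w){re.escape(source)}(?!\w)"
    hits = re.findall(pattern, text, flags=re.IGNORECASE)
    if len(hits) != 1:
        raise ValueError(
            f"{source!r} occurs {len(hits)} times; use the full form"
        )
    return source, spec


# Toy data (assumption). In the phrase entry the stress sits on the
# preposition and the following noun stays unstressed.
WORD_ACCENTS = {"смерти": "сме" + ACUTE + "рти", "устал": "уста" + ACUTE + "л"}
PHRASE_ACCENTS = {"до смерти": "до" + ACUTE + " смерти"}

_phrases = sorted(PHRASE_ACCENTS, key=len, reverse=True)
TOKEN_RE = re.compile(
    "|".join(rf"(?<!\w){re.escape(p)}(?!\w)" for p in _phrases) + r"|[а-яё]+",
    re.IGNORECASE,
)


def accent(text):
    """Accent text, trying whole phrases before single words.

    Capitalization handling is omitted to keep the sketch short.
    """

    def fix(match):
        token = match.group(0)
        key = token.lower()
        if key in PHRASE_ACCENTS:
            return PHRASE_ACCENTS[key]
        return WORD_ACCENTS.get(key, token)

    return TOKEN_RE.sub(fix, text)


if __name__ == "__main__":
    print(expand_subst("уже" + ACUTE, "Карета уже отъехала."))
    print(accent("он до смерти устал"))
```

      Matching the phrase first is what keeps смерти unstressed after the stressed preposition; a per-word pass alone would produce сме́рти. Whether the phrase reading should always win is itself a judgment call that a real implementation would need to settle.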
      @Benwing2: I haven't dismissed your offline script. I only mentioned that some of its functionality is clearly out of scope of my module. For example, the creation of wikilinks for the lemma forms of words. I don't think that doing lemmatization is practically feasible inside of a Lua module due to a much higher resources usage required for that. Additionally, I initially intended to implement auto-accenting for Belarusian quotations. And the code for processing the Russian "ё" from your offline script wasn't applicable to my use case (the dots above "ё" are mandatory in Belarusian texts and can't be omitted). I decided to put aside my Belarusian module plans and started implementing the auto-accenting code for the Russian language precisely because of your feedback. I believe that it would be easier to reach consensus and avoid friction when we are on the same page.
      I like your idea about making the usage of |subst= simpler. Also thanks for your до́ смерти example. I have added it to the list of testcases and will update the code to handle it properly. BTW, my auto-accent code doesn't edit wiki pages, so it has no potential to do long lasting difficult to reverse damage. Of course, problems in it preferably should be fixed now, but there's no harm in initially deploying it even with a few minor bugs. The growth of the Wiktionary backup dump is the only thing that worries me, because the module data size is ~6MB right now. And each tiny adjustment of the dictionary regenerates it all for now, but it's possible to come up with an incremental updates scheme.
      Do the curators of the Russian section of English Wiktionary have an opinion about making |text= faithful to the orthography of the original source and moving the accent markup to |norm=? --Ssvb (talk) 11:09, 11 April 2024 (UTC)[reply]
      @Benwing2, @Atitarev: Could you please comment on this? I believe that, feature wise, I have finished the conversion functionality and I'm not aware of any remaining bugs in the algorithm or in the approach in general. But it performs a dictionary assisted conversion, so any errors or omissions in the existing Wiktionary entries snapshotted at https://dumps.wikimedia.org/enwiktionary/20240401/ will show up in the generated output. For example, the two testcases at the top of Module:User:Ssvb/ru-autoaccent/testcases rely on the existence of "антидилювиа́льный" and "за́ руку" entries. If somebody creates these entries right now, then the dictionary can be regenerated after the 20th of April upon the arrival of the next Wikimedia's backup dump. I can still tidy up the code to make it better commented, cleaner and faster, but now it's necessary to figure out what are the requirements and roadmap for integrating it into mainspace. --Ssvb (talk) 07:15, 13 April 2024 (UTC)[reply]
      BTW, I wonder if the converter should keep the original spelling "антидилювіальныя" with "і" in its output or somehow visually highlight the word in other ways? My understanding is that it's the responsibility of a Wiktionary editor to provide the necessary |subst= crutches when adding a quotation. Automatic conversion can successfully and accurately do the bulk of the work, but still a human has to review the result and be ready to step in to add the necessary corrections. --Ssvb (talk) 07:41, 13 April 2024 (UTC)[reply]
      @Ssvb Apologies for the delay in responding and apologies also for my slightly snippy comment about your earlier response. I need to look over your code in more detail, which I can probably do tomorrow (it's bed time for me now), but it feels to me like we need to resolve the issue of how to format quotations, transliterations and the like. I don't like the idea of having *only* the transliteration contain the accents; this is not how it's normally done here. Either the original or normalized version should contain the accents, too. This might need to differ between usexes (where it's probably OK to auto-accent the original) and quotes (where it's still to be resolved how to proceed). Also, just a note, any auto accenting you implement should err *STRONGLY* on the side of not putting in an accent if there's any doubt; better to have no accent than a wrong one. This goes especially for something that happens on the fly, because any review that the author does could become invalid due to a subsequent updating of the underlying data. Benwing2 (talk) 08:00, 13 April 2024 (UTC)[reply]
      @Benwing2: I think that the Wiktionary quotation entries can have pretty detailed information in them and that the original unmodified spelling with all its archaic words and typos is a valuable piece of information, so it should be one of the template data fields. But the way the information is presented to the end user in the browser is another matter. Having the original text, its automatic or manual normalization, the romanized transliteration and the English translation already makes it four lines of text in the browser instead of the current three lines. And things may look even more awkward if it's a multi-line poetry quotation. However, at the end of the day, the format of the presentation can probably be made configurable on the frontend side and based on the end user's preferences. I mean, the end user may prefer to only see the modern normalized Cyrillic text and hide the original unaccented pre-1918 orthography to reduce the on-screen clutter, but this doesn't mean that we have to erase the original untampered quotations from the Template:quote-book instances themselves in the Russian entries.
      As for erring *STRONGLY* on the side of not putting in an accent if there's any doubt, I think that with this approach none of the words can be safely accented. Because we can't rule out the possibility that the same word with a different stress position may be eventually added to the dictionary in the future. It's only possible to safeguard against this by not adding any accents at all. And the same goes for the prepositions "из", "за", "до", "у", "от", "со", "без", "по", "на" and many others. If we end up deciding not to accent any words adjacent to these prepositions at all, because of the risk of them being potentially a part of предложно-именные сочетания, then we would be doing a disservice to the users. I think that we should instead prioritize adding the missing dictionary entries to provide a good coverage for all of these cases. But maybe @Atitarev has an opinion on this?
      As for generation on the fly and future updates of the underlying data, I think that the module can probably automatically categorize quotations if it has trouble accenting something. I mean, let's suppose that Wiktionary initially had no entry for "за́мок" and the auto-accent module happily annotated this word as "замо́к" in one of the quotations. Now suppose that "за́мок" got eventually added to Wiktionary and the auto-accent module detected this inconsistency in a quotation. What's the best course of action? I think that just dropping the accent mark for "замок" and adding the page to some sort of a "need review" category would be a reasonable thing to do. --Ssvb (talk) 09:32, 13 April 2024 (UTC)[reply]
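      A minimal Python sketch of that fallback, assuming a dictionary that maps each plain spelling to the set of accented spellings known so far. The function name and the boolean "needs review" flag are hypothetical; a real module would add a tracking category instead of returning a flag.

```python
ACUTE = "\u0301"  # combining acute accent used as the stress mark


def accent_or_flag(words, accents):
    """Accent each word only when exactly one accented form is known.

    Returns the processed words and a flag saying whether any word was
    left unaccented because several stress positions are on record.
    """
    out = []
    needs_review = False
    for word in words:
        candidates = accents.get(word.lower(), set())
        if len(candidates) == 1:
            out.append(next(iter(candidates)))
        else:
            out.append(word)  # no accent rather than a possibly wrong one
            if len(candidates) > 1:
                needs_review = True
    return out, needs_review


if __name__ == "__main__":
    zamok_final = "замо" + ACUTE + "к"
    zamok_initial = "за" + ACUTE + "мок"

    # Snapshot 1: only one stress position is on record.
    print(accent_or_flag(["замок"], {"замок": {zamok_final}}))

    # Snapshot 2: the second entry has been added, so the accent is
    # dropped and the quotation is flagged for review.
    both = {"замок": {zamok_final, zamok_initial}}
    print(accent_or_flag(["замок"], both))
```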
      @Benwing2, @Atitarev: BTW, is there a big comprehensive and freely available Russian dictionary in an electronic format with full stress marks information to validate the Wiktionary words data against it? I mean, something similar to the Belarusian https://github.com/Belarus/GrammarDB --Ssvb (talk) 09:59, 13 April 2024 (UTC)[reply]
      BTW, instead of my kinda lame and seemingly artificial "за́мок" / "замо́к" example, here's something more practical: the current module testcases include the word Ока́ (Oká), which is accented because it isn't ambiguous. But what if somebody adds О́ко Сауро́на (Óko Sauróna) and its genitive form О́ка Сауро́на (Óka Sauróna) to Wiktionary in the future? Using something like "То есть по крайней мере, помимо облика Ока, Саурон имел и физический облик человека" [14] as a quotation? This would invalidate the automatically generated on-the-fly accenting in the old quotations of the word "Ока". And we need to be prepared for the situations like this. --Ssvb (talk) 11:54, 13 April 2024 (UTC)[reply]
      @Ssvb: Of the modern text versions (1 to 5), I like 1 (text, transliteration in the strict sense, and translation) best, then 5 (accented Cyrillic as normalisation, which is then transliterated), but can stomach 4, where the transliteration includes the stress. 6 and 7 are a different text - the difference from the first five matters if one is demonstrating spelling, but as a quotation for the emboldened word they are equivalent. I think Wiktionary policy would actually call for 6 or 7, as being the earliest issue available to the editor. I don't think it makes any difference for establishing the latest date of use of the emboldened word. --RichardW57m (talk) 11:54, 11 April 2024 (UTC)[reply]
      Aside: I think the examples have abused |newversion=, and the presentation of the dates seems dodgy. I wonder if I may do that when millennia have elapsed between composition and the printed version quoted, with the 2nd version's fields referring to a translation not eligible for supporting the words used in it. I don't like the style of the wikitext - it's hard to extract pieces of data from it. --RichardW57m (talk) 12:08, 11 April 2024 (UTC)[reply]
      @RichardW57m: The |newversion abuse was suggested here by @Sgconlaw. The usefulness of it is at least twofold:
      • We get an English translation, written by a native English speaker, who was also a professional translator. And by contrast, if I translate the Russian text into English myself, then I risk accidentally constructing something ungrammatical. Currently Template:quote-book doesn't support something like a |t-check parameter.
      • It attests that Constance Garnett considered that particular English word to be an appropriate translation for that particular Russian word back in 1919.
      As for the dodgy dates, WT:QUOTE says "The year should be that of the earliest edition known to use the word. Where feasible, the page number should be taken from the first edition, but if a later edition is used (e.g. paperback version, or digitised by Google Books), then the publication date should be added in parentheses after the publisher’s name."
      My understanding is that "Anna Karenina" novel was initially published from 1875 to 1877 in the literary journal "The Russian Messenger". And then there were multiple book editions printed after that. Russian Wikisource uses the book from 1903 for its pre-reform orthography text and the book from 1970 for the modern orthography text. Now which years should be referenced in the quote-book template? I wish there were a simple non-ambiguous and easy to follow guideline related to adding Russian quotations in WT:ARU. People would have a lot less headache. --Ssvb (talk) 15:33, 11 April 2024 (UTC)[reply]
      @Ssvb: I don't dispute that the abuse is very useful. I've asked the same question myself, and had not got a satisfying answer. I've had the same problem using translations that are sometimes CC-BY-SA, so it's a legal obligation as well as courtesy to acknowledge the translator, which latter applies if the translation has been released into the public domain by the translator. (Sometimes I've felt obliged to use a more literal translation.) I'd been resorting to putting the acknowledgement in the |footnote= field. I've felt particularly cribbed when the spelling of an ancient text seems very much 20th century or later and I'm having to quote it from a scan published in yet another document. So I looked at the template invocations to see if I could learn a new trick. --RichardW57m (talk) 16:04, 11 April 2024 (UTC)[reply]
      @Ssvb: Hi. My preference is #3 with accents in the source language. Some people dislike linking each word. I think it's helpful for learners or someone who wants to analyse the text. The number of links could be reduced to some e.g. difficult, rare words and/or remove links for proper nouns, company/product names or words, which are NOT supposed to be linked (i.e. have entries).
      IMO, providing accents on unaccented words is incorrect, e.g. #4. It was the original old practice, which has been eradicated over time. Let's not reintroduce it. :)
      We already have an established practice to accentuate each Russian word and supply letter "ё" wherever a word appears. I belong to the group who wants to keep it that way, including quotes (and extend the good practice to other languages where appropriate). I go out of my way to provide accents for Cyrillic-based Slavic languages, vocalisations for Arabic, and more recently Persian and Urdu, nuqta spellings for Hindi terms.
      Errors like you mentioned re "входившіе" are uncommon. Editors may prefer to quote one or the other or both.
      These changes sometimes deal with not just the spelling but the grammar. онѣ́ (oně́) -> оне́ (oné) -> они́ (oní). The entry for оне́ (oné) shows where the original Pushkin pronunciation was preserved for the sake of rhyming. Anatoli T. (обсудить/вклад) 03:04, 15 April 2024 (UTC)[reply]

      I'm not a fan of any of these. It's better to cite the first edition if possible, and in this case the first edition is available online. The quotation should be faithful to the source—Tolstoy wrote in the 1800s, not in 2024, and the quotation should reflect that. See Middle English pyteuous for an extreme example of this. That said, the normalization should be included if the original quotation is hard to understand (which might be the case here). Also, Anna Karenina was written 1873–1877 and published 1875–1877 as well as 1878 (per Wikipedia), so I think that you having |year=1877 is inaccurate. I propose:

      • 1873–1877, Л[евъ] Н[иколаевичъ] Толстой [Leo Tolstoy], Анна Каренина [Anna Karenina], first volume, Москва: Типографія Т. Рисъ, [], published 1878, page 103; English translation from Constance Garnett, transl., Anna Karenina: A Novel, Philadelphia, P.A.: George W. Jacobs & Company, 1919, page 86:
        Когда они вышли, карета Вронскихъ уже отъѣхала. Входившіе люди все еще переговаривались о томъ, что случилось.
        [Когда́ они́ вы́шли, каре́та Вронских уже́ отъе́хала. Входи́вшие лю́ди всё ещё перегова́ривались о том, что случи́лось.]
        Kogdá oní výšli, karéta Vronskix užé otʺjéxala. Vxodívšije ljúdi vsjo ješčó peregovárivalisʹ o tom, što slučílosʹ.
        When they went out the Vronskys' carriage had already driven away. People coming in were still talking of what happened.

      Ioaxxere (talk) 14:51, 11 April 2024 (UTC)[reply]

      @Ioaxxere: I wrote my previous comment right before I noticed that there was your reply already added. Thanks for all this additional information. I'm primarily interested in integrating the automatic text normalization auto-accenting Lua module, so focusing on the boilerplate with dates unfortunately may sidetrack the discussion a bit. That said, you raised a good question about whether the original quotation in the pre-reform orthography is easy or hard to understand for the intended Wiktionary users. I hope that somebody can answer it. --Ssvb (talk) 15:56, 11 April 2024 (UTC)[reply]

      New label: ephemeral

      There are many cases where a term briefly becomes extremely popular, and then fades into obscurity. Labelling these (dated) or (obsolete) doesn't feel right. Therefore, I propose the label (ephemeral) for this purpose along with an associated category. Some ephemeral terms in English:

      Note that "ephemeral" only refers to time, not usage, so "ephemeral" terms can be slang, informal, or formal. Ioaxxere (talk) 01:06, 12 April 2024 (UTC)[reply]

      In support of this concept. (Side note, what's the earliest term that could have this label be applied to?) CitationsFreak (talk) 02:02, 12 April 2024 (UTC)[reply]
      Maybe terms like nervo-bilious and sulphite? Benwing2 (talk) 02:45, 12 April 2024 (UTC)[reply]
      At first glance, the concept seems questionable. How would one characterize the difference between 'dated' and 'ephemeral' in such a way that users would care? How brief a period of 'popularity' (itself problematic) would we require for a definition to be deemed 'ephemeral'? Less than a decade? Would we need to define a class of curves of usage frequency over time? Moreover, as a practical matter, even with satisfactory definitions and criteria, systematic application based on objective facts seems very unlikely. DCDuring (talk) 14:18, 12 April 2024 (UTC)[reply]
      How about "only used for X amount of years"? CitationsFreak (talk) 14:21, 12 April 2024 (UTC)[reply]
      If facts to support 'X' were required, that label would rarely be used. DCDuring (talk) 15:04, 13 April 2024 (UTC)[reply]
      Good point. I was thinking about this more, and I believe that the word should be heavily tied to a rapidly forgotten fad. CitationsFreak (talk) 01:56, 15 April 2024 (UTC)[reply]
      What happens to such words after twenty more years of successful curation here on Wiktionary? After a certain period of time, the use of any word no longer regularly used should just be considered "dated", or even "obsolete". Therefore, if you are already making the judgment that a word is "ephemeral", it has presumably already lost its regular usage and should probably also just be considered "dated" from the point at which the judgment has been made, for the foreseeable future of Wiktionary, or until such time that the word were to come back into vogue.
      If a word is inextricably linked to a particular event whence its ephemerality derives, then that event and its ephemeral real nature should probably just be mentioned in its definition, rather than creating a new label that really just means to say "no longer used". It would also be difficult to determine what the requisite maximum "lifespan" of a word should be to be given this label.
      It's worth noting that a word can be both "dated" and "ephemeral", however. I'm reminded of both inkhorn terms from 16th and 17th century English: Latinate neologisms brought into English and used for a short period of time—as well as several reactionary neologisms of more Anglo-Saxon stock, created or resuscitated from the depths of time in response, such as inwit and gleeman.
      Shakespeare as well coined many, many words in his works, and though many of them are now part and parcel of English, many others have faded into obscurity, such as appertainments, attasked, conspectuity, defunctive, dispunge, enacture, ensear, exsufflicate, immoment, imperceiverant, intrenchant, irregulous, oppugnancy, relume, reprobance, and rubious. Surely some of those words (and others I can't call to mind right now) found "ephemeral" usage for a short time after their coining?
      Hermes Thrice Great (talk) 16:42, 12 April 2024 (UTC)[reply]
      @Hermes Thrice Great: I don't like to use "dated" or "obsolete" to describe ephemeral terms, since those labels generally describe words which were in common use for a long time, maybe even centuries, before gradually falling out of use. Ioaxxere (talk) 17:20, 12 April 2024 (UTC)[reply]
      Oh? I wasn't aware. I've put "dated" on old computer terminology that only existed for a decade or two. Equinox 17:24, 12 April 2024 (UTC)[reply]
      @Ioaxxere: For me, "dated" mostly refers to terms associated with a previous generation- using them "dates" you. Such terms were usually only popular during that generation. We've also used it in the sense you're referring to, but that's not the primary meaning. Chuck Entz (talk) 17:48, 12 April 2024 (UTC)[reply]
      @Equinox, Chuck Entz: I think you two are agreeing with me. I consider "a decade or two", or a generation (~20 years), a relatively long time in comparison with the "ephemeral" terms listed above. Ioaxxere (talk) 18:09, 12 April 2024 (UTC)[reply]
      I'm trying to think if I could use this. There was a city in Hubei named Jingsha for a little over two years between 1994 and 1996, that might be ephemeral. Also, I'm thinking that some Cultural Revolution-connected geographical terms may be ephemeral in this sense, but reach three durably archived cites. --Geographyinitiative (talk) 15:32, 13 April 2024 (UTC)[reply]

      AAVE vs. African-American English

      We have 637 terms in Category:African-American Vernacular English and 13 in Category:African-American English. However, I can attest (being married to a Black woman who doesn't speak AAVE) that the vast majority of these terms are not restricted to AAVE. About the only ones I can think of that come to mind as really AAVE-only are terms like aks/axe/ax "ask" and fitna (= fixing to "going to") that are proscribed in Standard English, along with pronunciation spellings like smoove, mouf, foun' and skraight that represent AAVE-specific pronunciation features. (I should also note that we have uncharacterized entries for weird pronunciation spellings like debbil that sound to me like something out of Uncle Remus; these need to be marked as archaic or obsolete or something.) This suggests we either need to move the large majority of these terms under Category:African-American English or just merge the two (since I'm doubtful the average Wiktionarian will be able to keep them correctly categorized). Benwing2 (talk) 04:53, 12 April 2024 (UTC)[reply]

      Well yeah, to the category African-American English while still making the distinction in the entry if an editor intends to, so I had merged nouchi with Ivorian French from the beginning, because the antinomy only arises if you look upon all terms collectively from a bird's-eye view, via the category system.
      Many of these terms raise our attention as slang (as in the Ivorian example, where you would think the language is the official language of a nation and hence there would be an even greater share of non-vernacular terms), so I doubt non-proscribed ones are the vast majority, from what we even have documented. For either comprehensibility or markedness, most terms may be proscribed in use with the general American population; you probably proscribe less if it is your wife. It depends which social circle you take as a point of reference. If it’s a white conservative one, then it is every single one in “Standard English”? The other terms are often just intransparent as to their origin from African-American speech (dig) or particular meaning (Talk:tired).
      But that may not be what you are trying to say; you think about Standard English as from an Afro-American perspective (which you can very well assume), if that makes sense, but as you realize, the average Wiktionarian cannot consistently make sense of it, though they have enough charity to get your idea, never having set foot in America but consuming her media, just as you have not been to Abidjan to accurately judge registers (this is just a statistical probability). Fay Freak (talk) 08:45, 12 April 2024 (UTC)[reply]
      @Benwing2: What does "AAVE-only" mean? "Fitna" and "skraight" are the only ones that would be out of place in non-standard English dialogue out of the mouth of an Englishman of native origin. And as /skr/ is mostly of Danish origin in English, I wouldn't be surprised to find that the scream-stream merger came from the English West Country. --RichardW57m (talk) 09:55, 12 April 2024 (UTC)[reply]
      I think typing shortcuts could help with future labelling and categorization. At present:
      {{lb|en|AAVE}} generates (African-American Vernacular).
      {{lb|en|AAE}} now generates (AAE) [could/should(?) generate African-American English].
      The situation is, has been, and probably will be dynamic. Labels are likely to change and their application even more so. Empirical support for the application of the labels is likely to be scarce and soon (one or two decades) become dated or even deemed insulting. I don't think we can count on many contributors to make systematic revisions of fine categories under these circumstances. So labels that are not controversial and of broad application are probably best. "African-American English" would seem to fit the bill. Also, we do have register labels to add finer descriptions and for searches using Cirrus Search. DCDuring (talk) 13:04, 12 April 2024 (UTC)[reply]
      Absolutely agree this is an issue. It's come up before, e.g. 2020 discussion here, linking to earlier discussions. Not only on Wiktionary but in the world people sometimes refer to the vocabulary of Black Americans in general, or "Black rappers' slang", "Black Twitter" etc as "AAVE", maybe due to not knowing what else to call it, and even in linguistics literature many uses of "African American English" use it as a synonym of "AAVE", so even that label doesn't unambiguously distinguish anything. Many uses of "MLE" on Wiktionary seem to similarly just mean "words Black rappers in Britain use" rather than specifically MLE. Sometimes white people have labelled entries "MLE, AAVE", just meaning Black rappers on both sides of the Atlantic use it, not that it manages to belong and be limited to the specific lect of AAVE and also the specific lect of MLE. I do feel some reluctance to do away with (merge into broader labels) specific AAVE or MLE labels, because there are a few things in those categories which belong to those specific lects, but it'll continue to be a constant, effort-intensive (and if the current state of the categories is anything to go on, losing) task to keep checking the categories' contents for wrong entries, because it's clear people both on Wiktionary and in the world at large don't distinguish "AAVE" from "words Black Americans use".
      Yes, Remus-esque stuff like debbil or ebery needs its own category... as I said in 2020, I wouldn't even put them in whatever "African-American English" category we put the rest in (even with an "obsolete" label), because it's not clear they were mainly or ever used by Black people, as opposed to just by white people caricaturing Black people. Maybe create a label for use in {{pronunciation spelling of|en|foo|from=bar}}, to display something like "19th century white caricatures of Black speech" or something... - -sche (discuss) 14:49, 12 April 2024 (UTC)[reply]
      "African-American Vernacular English" and "African-American English" are synonyms, so the categories should be merged. Ioaxxere (talk) 16:22, 12 April 2024 (UTC)[reply]
      They're sometimes synonyms. But they're also sometimes not synonyms, sometimes African-American English includes AAVE but also other African-American sociolects of English. Most of what we currently have labelled "AAVE" is actually not AAVE, just AAE, as Benwing points out. But given the occasional synonymy and the widespread inability to consistently distinguish them, maybe we are better off moving all of this to a new label like "Black American English". - -sche (discuss) 17:51, 12 April 2024 (UTC)[reply]
      @-sche I would support merging the two into a label like "Black American English". Benwing2 (talk) 20:51, 12 April 2024 (UTC)[reply]
      Because any difference to Canadian Black English is also a black box? Has a parallel advantage if we introduce “Black British English”, which may or may not include Ireland, but we already have Category:Multicultural Toronto English, which would be another Black American English, and that like MLE only in the last three decades, so behind it is a substratum of Black English like in the US, not a recent “multiethnolect”, a substratum Britain did not have. Strange that the term Canadian Black English is a hapax. There is some truth to even linguistic literature being at a loss. Fay Freak (talk) 21:31, 12 April 2024 (UTC)[reply]
      @Fay Freak: Multicultural Toronto English is emphatically not a type of American/Canadian Black English. They have completely different origins. Ioaxxere (talk) 21:38, 12 April 2024 (UTC)[reply]
      @Ioaxxere: Isn’t that what I was saying? (Though I have doubts they will stay distinct later into the century.) It would be strange if we rename MLE to British Black English but keep “MTE” modelled after the MLE term, and also if we take advantage of the term “Black American English” but not “Black British English”, we are in a pickle.
      I add that there is also “Black Canadian English”, as one can’t use arbitrary orders of English adjectives, also rare, but here referring to MTE. Fay Freak (talk) 21:43, 12 April 2024 (UTC)[reply]
      My understanding is that the demographic surge of emancipated slaves and their descendants leaving the US South combined with barriers set up to keep them out of the mainstream created a distinctive type of Southern-based speech that almost swamped out the other varieties. Those other varieties aren't really part of AAVE, nor is that of those who made it into the mainstream (not to mention more recent immigrants from the Caribbean and from the rest of the world). The problem is that this complexity is almost invisible to those of us from other parts of US society, so it's too easy to conflate everything else with AAVE. Chuck Entz (talk) 23:08, 12 April 2024 (UTC)[reply]
      We don't have a good factual basis for maintaining relatively fine distinctions, so a broad label that is subject to criticism, but defensible, is probably better than narrow ones, which are also subject to criticism. Most criticism doesn't seem to be based on systematic facts, just anecdotes and idiolects. DCDuring (talk) 15:01, 13 April 2024 (UTC)[reply]

      Ukrainian IPA transcriptions, in particular concerning the vowel И

      I’ve noticed that many (the majority) of Ukrainian entries here on English Wiktionary transcribe the Cyrillic letter И as [e], when in fact it should, in almost every case (but not all), be transcribed as [ɪ] (see, for example, w:Help:IPA/Ukrainian). This is especially obvious when comparing the IPA transcriptions for the same word on Ukrainian Wiktionary and English Wiktionary. Here are a few examples:

      It seems that most of the cases where the erroneous IPA transcriptions show up were added by the bot User:WingerBot, and the IPA transcriptions tend to be more correct with regard to this letter when a real user (who presumably knows Ukrainian) had added the transcription before the bot did its work on the Ukrainian corpus here on English Wiktionary. For example, in the entry ринок, the transcription, provided by a real user, has been given correctly as [ˈrɪnɔk].

      I'm not sure if the bot could be reconfigured to go through these entries and fix them, but this would be preferable to having to go through all these entries manually, especially because not all Ukrainian entries are affected, and also there are indeed cases where [e] ought to be the IPA transcription for И.

      Hermes Thrice Great (talk) 16:01, 12 April 2024 (UTC)[reply]

      @Hermes Thrice Great I don't think this has anything to do with real users vs bots, because all of these entries seem to use {{uk-IPA}}, including the one you've given as an example of not having the problem. Evidently a distinction is made in the underlying module somewhere (which looks to be down to whether the syllable is stressed or not). It would be a really bad idea to go through all of these manually, because that would decouple the entries from the template, meaning that it's much harder to ensure entries are kept in sync. Theknightwho (talk) 16:18, 12 April 2024 (UTC)[reply]
      @Theknightwho Sorry, you’re right, I should have looked into the template code. Yikes, this is a real mess then.
      Hermes Thrice Great (talk) 16:50, 12 April 2024 (UTC)[reply]
      @Hermes Thrice Great I expect it’s easy to fix (if it needs to be fixed - I don’t know what scheme @Benwing2 used). Also pinging @Atitarev. Theknightwho (talk) 16:59, 12 April 2024 (UTC)[reply]
      @Theknightwho @Hermes Thrice Great This implementation predates me and yes it is simply merging и and е when they are unstressed. According to Ukrainian phonology:
      /ɛ/ and /ɪ/ approach [e], which may be a shared allophone for the two phonemes.
      This is equivocal as to whether these two sounds are actually the same; I imagine it depends on how formal the speaker is being. Maybe Anatoli can comment. Benwing2 (talk) 18:55, 12 April 2024 (UTC)[reply]
      Yes, that’s why I said in some cases it makes sense to transliterate it that way. And yes, it may approach a shared allophone in some cases, but especially at the end of a word in the infinitive form, and especially where there are the two separate vowels Е and then final И in the word, like for example перекладати, the difference should be transcribed. In the example I just linked, you can hear very clearly in the audio the difference in the two vowels, yet the IPA transcription given transcribes them the same, as [perekɫɐˈdate] instead of [perekɫɐˈdatɪ].
      Hermes Thrice Great (talk) 01:39, 13 April 2024 (UTC)[reply]
      @Benwing2, @Theknightwho, @Hermes Thrice Great: It’s not a mess. The module was based on one classical version of Ukrainian, which may not be common, known or accurate from the modern perspective. It was consistent. I don’t have any objection to change to [ɪ] for the unstressed «и». I’m sure there are cases where the pronunciation is not considered modern or common but there are too many opinions on this. I prefer the scheme to be sourced. Anatoli T. (обсудить/вклад) 01:46, 13 April 2024 (UTC)[reply]
      @Voltaigne might want to say something on the topic. Anatoli T. (обсудить/вклад) 02:03, 13 April 2024 (UTC)[reply]
      To my orthoepically untrained ear, "и" sounds closer to [ɪ] in stressed and unstressed syllables, but can tend towards [e] when it occurs unstressed at the end of a word. In any event I share Atitarev's preference for a sourced transcription scheme. Voltaigne (talk) 11:19, 13 April 2024 (UTC)[reply]
      @Voltaigne, @Benwing2, @Theknightwho, @Hermes Thrice Great: We can decide with a vote to change to [ɪ], since this is the most controversial and often mentioned discrepancy with the modern transcription. I am sure this can be sourced too. Anatoli T. (обсудить/вклад) 03:07, 15 April 2024 (UTC)[reply]
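
For illustration only, the behaviour described in this thread (unstressed и and е both rendered as [e], versus the proposed [ɪ] for unstressed и) can be sketched roughly as below. This is a toy in Python, not the actual {{uk-IPA}} module: the function name, the `unstressed_i` parameter, and the use of a combining acute accent to mark stress are all assumptions, and it looks only at lowercase и and е.

```python
STRESS = "\u0301"  # combining acute accent; assumed here to mark the stressed vowel


def transcribe_i_e(word: str, unstressed_i: str = "e") -> list[tuple[str, str]]:
    """Return (letter, symbol) pairs for each и/е in a stress-marked word.

    unstressed_i="e" mimics the merging described in the thread;
    unstressed_i="ɪ" mimics the proposed change. Hypothetical helper.
    """
    pairs = []
    for index, letter in enumerate(word):
        if letter not in ("и", "е"):
            continue
        # A vowel counts as stressed if the stress mark follows it directly.
        stressed = word[index + 1 : index + 2] == STRESS
        if letter == "и":
            symbol = "ɪ" if stressed else unstressed_i
        else:
            symbol = "ɛ" if stressed else "e"
        pairs.append((letter, symbol))
    return pairs


# The example word from the thread, with stress on the third а.
word = "переклада" + STRESS + "ти"
current = transcribe_i_e(word)
proposed = transcribe_i_e(word, unstressed_i="ɪ")
```

With the default, the final и of the example comes out as [e], as in the transcription complained about; with `unstressed_i="ɪ"` it comes out as [ɪ] while the unstressed е stays [e].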

      Mainspace Proto-West-Germanic?

      ᚲᚨᛒᚨ and Proto-Germanic *kambaz are now in CAT:E because the former has been created in mainspace as a Proto-West-Germanic entry and the latter links to it. It's true that ᚲᚨᛒᚨ is known from writing on a comb from the 3rd century that was found near Erfurt or Frienstedt in Germany. It apparently isn't Old Norse, which is allowed in mainspace, or East Germanic. That leaves either Proto-Germanic or Proto-West-Germanic. I don't know enough about either to say what should be done, but we can't leave things the way they are. Chuck Entz (talk) 23:50, 12 April 2024 (UTC)[reply]

      I see two alternative approaches:
      1: downgrade mainspace refusion errors in proto-languages rarely ever attested to editor-directed warnings on editing and on read with {{attn}} which now we see on every page with a gadget, and categorization of such pages into “mainspace entries of a language almost always only reconstructed”, since we don’t even have categorization of the error-affected section at all. I.e. raise maximum attention but still operate.
      2: Or add some exception list for particular pages that only people privileged due to knowing what they do (autopatrollers) can modify, or even know about, which would raise enough awareness. I.e. have a preventive approach and annoy during the editing process already.
      Personally I prefer the second as the more aggressive approach.
      The appropriate language name though is probably Proto-West Germanic, in view of the dating and ending, having been created before we introduced Proto-West Germanic. Fay Freak (talk) 11:54, 13 April 2024 (UTC)[reply]
      A list of approved mainspace pages could be a good idea for a case like this where the language will have very few, iff it isn't too memory-"expensive". My reaction would've been to just change PWG to function like Proto-Norse, if some terms in PWG are attested (if we take the attested terms to be PWG), but that has the downside that then people aren't warned if they forget to put the asterisk in front of any of the more numerous unattested words. - -sche (discuss) 14:09, 13 April 2024 (UTC)[reply]
      @-sche Small lists of exceptions like this aren’t the kind of thing that impact memory use in a problematic way, so I wouldn’t worry about that. Theknightwho (talk) 14:19, 13 April 2024 (UTC)[reply]
      Though a better solution might be some kind of manual override (an “anti-asterisk”), for use with reconstructed languages, which could be placed in headwords/links to suppress the error. It would have to be something relatively obscure, so it couldn’t be entered by mistake, and unlike an asterisk it wouldn’t actually display. Theknightwho (talk) 14:22, 13 April 2024 (UTC)[reply]
      This has lower maintenance and hence annoyance than an exception list; we would have a tracking category filled by the parametral override, for such rare cases then, in place of a module-data exception list. Though categories have memory impact, it would barely ever not be on low-stress page-titles. Fay Freak (talk) 17:52, 13 April 2024 (UTC)[reply]
      Maybe foobar or foobar? Inspired by "sic!", but thinking !foobar itself might not be the best choice because someone might actually type that. Just spitballing. (Obviously, past some threshold, a reconstructed proto-language has enough attested terms to merit just treating it as attested, but I don't know if we want "one attested term" to be the threshold.) - -sche (discuss) 20:49, 13 April 2024 (UTC)[reply]
      "Forget to put asteriks in reconstructions"? If a word is a reconstruction, you'll never forget to put an asteriks if you know what your doing. Did such mistakes happened before? And who is gonna do such mistake? Is it not a policy we have here on Wiktionary to put asterikses for reconstructed words? Why worry yourself then? Tollef Salemann (talk) 19:48, 13 April 2024 (UTC)[reply]
      Would that that were so! Sadly, not everyone knows, or at least manages, to do everything right all of the time... even in your own comment you forgot to put the apostrophe that goes in know what you're doing — or perhaps that was your point/joke, heh. The number of unattested or attested terms that have been RFVed, RFDed or RFMed to move them either to or out of the reconstruction namespace is non-trivial, and is just a fraction of the cases where someone incorrectly omitted or used an asterisk. - -sche (discuss) 20:49, 13 April 2024 (UTC)[reply]
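
As a rough sketch of the "anti-asterisk" override idea discussed above, written in Python rather than the Lua the real modules use: the marker `!!`, the function name, and the return shape are hypothetical and chosen only for this example; no particular marker was settled on in the thread.

```python
OVERRIDE_MARKER = "!!"  # hypothetical marker, used only in this sketch


def resolve_term(term: str, reconstructed_only: bool) -> tuple[str, str]:
    """Return (namespace, title) for a linked term.

    reconstructed_only is True for languages whose entries normally
    live in the Reconstruction namespace.
    """
    if term.startswith("*"):
        # Normal case for reconstructions: strip the asterisk.
        return ("Reconstruction", term[1:])
    if term.startswith(OVERRIDE_MARKER):
        # Explicitly marked as attested: allow mainspace and hide the marker.
        return ("Main", term[len(OVERRIDE_MARKER):])
    if reconstructed_only:
        # Neither asterisk nor override: keep raising the error,
        # so a forgotten asterisk is still caught.
        raise ValueError(f"{term!r} needs an asterisk or an explicit override")
    return ("Main", term)
```

Compared with a central exception list, the override lives in the link or headword itself, so nothing has to be maintained elsewhere; the cost is that every link to the attested term has to carry the marker.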

      Potentially irrelevant question: how certain are we that the inscription on the Erfurt-Frienstedt comb even means "comb"? It seems odd for a comb to be labelled "comb". Could it just be saying that the comb was owned by someone named "Kaba"? Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]

      Nothing is certain, but some variation on this is exactly what would be expected as a word for comb in that time and place based on comparing attested terms in related languages. It's not a coincidence that the ancestor of this term is reconstructed as *kambaz. Also, writing in early Germanic cultures had dimensions not found in modern culture- it had some kind of intrinsic magic. Many of the objects from that period had explicit references to the evil things that were supposed to happen to anyone who stole them. The circumstantial evidence points to this being some kind of offering that was intentionally placed where it was, so the writing may well have had some deeper purpose. Chuck Entz (talk) 02:47, 14 April 2024 (UTC)[reply]
      Actually, in Old Norse runic inscriptions, most short inscriptions on objects ending with -a are in fact names of the owners (where the "a"-ending is a shortening of "á mik"). But in PWG it is not going to work the same way with this ending, of course. Anyway, inscriptions on objects naming the objects may be made not only with a magical purpose, as Chuck says, but also just for fun (like Mr. Tenevil did with object inscriptions in Chukotka). Tollef Salemann (talk) 07:45, 14 April 2024 (UTC)[reply]
      Also, i don’t remember how many exactly, but there are some other objects with runic inscriptions on them calling the name of the object and not the owner. On the other hand, if the owner’s name is used, it is often used in context of an ownership phrase, also not the name alone. Not sure how it works for Germany tho. Tollef Salemann (talk) 07:55, 14 April 2024 (UTC)[reply]

      clarify boilerplate on categories for terms derived from vs topical to fiction

      Wiktionary:Tea room/2023/December#Category:English_terms_derived_from_Star_Wars shows we would benefit from clarifying the boilerplate on "Foo terms derived from [fiction]" vs "foo:[fiction]" categories, to better explain the difference. I think most people agree on the core difference (?) :

      • "CAT:English terms derived from Star Trek" is for terms like cloaking device, which was coined by Star Trek. (In this case, cloaking device happens to also not belong in CAT:en:Star Trek, because it's a general-English word and no longer has any closer connection to Star Trek than captain does.)
      • "CAT:en:Star Wars" is for terms like Glup Shitto, which is directly about Star Wars. (Glup Shitto does not belong in "...derived from Star Wars", because it's not used in Star Wars or derived from anything that is.)
      • Some terms, e.g. Dalek, both derive from and are directly about (a race from) a fictional franchise, and belong in both a topical category and a "derived from" category.

      But sometimes a franchise coins a term X, or the franchise itself is named X, and someone else modifies X to XY: is XY "derived from" the franchise?

      1. Does Vaderesque belong in "terms derived from Star Wars"? (Or Voldemortian in "terms derived from Harry Potter"?) The franchises only coined "Vader" and "Voldemort", someone else added the suffixes.
      2. Does Warsie belong in "terms derived from Star Wars"? (Or Trekkie in "...derived from Star Trek"?)
      3. Does Hand Solo belong in "terms derived from Star Wars"? It derives from modifying the name of a character.

      Asking here rather than in the December TR in hopes of reaching more people. How people feel about these will help with rewriting the boilerplate: if the answer to any is "yes", we should change the "derived from" categories' text from saying terms "originated in" the franchise, to saying ~"originated in or are derived from", whereas if the answer to all three is "no" and we exclusively want terms coined by the franchise, not just derived from it, then we should rename the categories from "derived from" to "coined by"... - -sche (discuss) 05:15, 13 April 2024 (UTC)[reply]

      PS how 'expensive' would it be for the modules that generate the categories' boilerplates to check, on either type of category, whether a corresponding category of the other type exists (or failing that, to let users input the other category into e.g. a "crossreference=" parameter), and link each category to the other so that users are aware of both and hopefully glean the distinction from comparing them? - -sche (discuss) 05:15, 13 April 2024 (UTC)[reply]
      @-sche This is not expensive, and in any case the category pages aren't generally memory hogs. Benwing2 (talk) 21:52, 13 April 2024 (UTC)[reply]
      I note the following:
      • The "core difference" mentioned above seems somewhat different to what was mentioned by some editors during the Tea Room discussion, since it was posited there that "Category:English terms derived from Star Wars" includes all terms etymologically derived from terms invented in the franchise, while "Category:en:Star Wars" includes terms semantically related to the franchise.
      • I don't think there is anything inherent in the way either of the above categories is named which points one way or another. Derived is quite a general word. Thus, I think it is open to us to define the categories in whatever way achieves consensus.
      • I think terms like Hand Solo ought to be in one of these categories, because—like it or not—I'll bet editors will be adding them to such categories anyway.
      • Ideally, the categories should be defined in such a way that a term belongs in one or the other category, not both. (I assume "Category:English terms derived from Star Wars" will remain a subcategory of "Category:en:Star Wars". Thus, editors should put a term in the subcategory if it applies, and only put it in the parent category if the subcategory is inapplicable.)
      I'm minded to suggest the following definitions for the categories:
      • "Category:English terms derived from Star Wars" is for terms which were coined in the franchise as well as terms etymologically derived from such terms, but excluding the name of the franchise itself unless it is used in-universe within the franchise.
        • Comments: I wholeheartedly agree that the category should be limited to coinages to exclude general terms like cloaking device and moon which may happen to be used in the franchise. Extending the category to terms etymologically derived from franchise terms means that terms like Hand Solo and Vaderesque will be included.
      • "Category:en:Star Wars" is for terms which are neither coined in the franchise nor etymologically derived from such terms, but relate in some way to the franchise. Terms etymologically derived from the name of the franchise should be placed here, if the name is not actually used in-universe within the franchise.
        • Comments: Thus, a term like Glup Shitto would be placed in this category. This category would also include terms like Star Wars Day, Trekkie, and Warsie since (I assume) the terms Star Wars and Star Trek aren't actually used in-universe within the franchise.
      Sgconlaw (talk) 21:51, 13 April 2024 (UTC)[reply]

      I think Vaderesque, Warsie, and Hand Solo do not belong in Category:English terms derived from Star Wars. Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]

      @Ioaxxere: why, and which category should they then be placed in? — Sgconlaw (talk) 03:19, 14 April 2024 (UTC)[reply]

      French Translingual

      There used to be five entries in the nonexistent category Category:French Translingual: ĵ, ŝ, , ꞓ̂ and . These are due to User:Kwamikagami; they used to e.g. define ĵ as "j with a circumflex" but Kwami changed them to try and capture some of their common uses, in this case by changing the definition to

      # {{lb|mul|phonetics|French}} {{ng|The affricate sound of English ''[[j]]''.}}
      

      This was leading to the weird categorization. I made a change a couple of weeks ago to add language restrictions to labels like French so they only are supposed to categorize for a subset of known good languages. This was broken but I just fixed it, so now these entries no longer categorize. This leaves the question, though, of what the definitions *SHOULD* be. I think Kwami's point was that the definition "j with a circumflex" doesn't carry much information, but OTOH I'm sure ĵ is used for purposes other than the now-stated one. Thoughts as to how to make the definitions better? Benwing2 (talk) 21:47, 13 April 2024 (UTC)[reply]

      Does "French" mean this is only used (in this way) in the French language? Then the definition should be in a ==French== section, no? Or is there a "Frenchist" school of phonological notation the way there is an "Americanist" one (that uses "y" for /j/, etc), and people writing in multiple languages (but using this school's notation) would use this letter this way? Then I would think that either needs a different label, or needs to be in the definition, a la # {{ng|The affricate sound of English ''[[j]]'', in Frenchist phonological notation.}} replacing "Frenchist" with whatever the appropriate term is. (Even if using labels in this way no longer categorizes, we probably still want to track it, because it seems like ... well, a sign of an entry which should be cleaned up ...) - -sche (discuss) 23:09, 13 April 2024 (UTC)[reply]
      @-sche That's a good idea, I will create some cleanup categories or tracking pages. I think the intent was to express the idea of a "Frenchist" school of notation but I don't know if any such thing actually exists. Benwing2 (talk) 23:31, 13 April 2024 (UTC)[reply]
      Maybe any label which—with the language being mul—would normally generate "Category:[language name] Translingual", could generate a cleanup/attention category instead? (Maybe any regional label + Translingual should also be monitored, e.g. {{lb|mul|Appalachia}} / "Category:Appalachian Translingual"?) If it would not be too difficult or expensive, it would also be useful to track/categorize cases of "Category:[language name] [different language name]", e.g. "Category:German Irish", another persistent type of erroneous use of labels (when people use "Germany" etc as a topic label); the few categories where that is actually correct, e.g. "Category:Vietnamese Chinese", would need to be exempted. - -sche (discuss) 23:52, 13 April 2024 (UTC)[reply]
      @-sche I added tracking for any label that fails a lang restriction, which should include both types you mention above but not e.g. Category:Vietnamese Chinese, which doesn't fail any lang restriction. Benwing2 (talk) 00:08, 14 April 2024 (UTC)[reply]
      @-sche BTW I am skeptical there is any such thing as a Frenchist phonetic school; the IPA was created in fact by French linguists and all French dictionaries I've ever seen use the IPA. Benwing2 (talk) 00:12, 14 April 2024 (UTC)[reply]
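
To make the language-restriction and tracking behaviour described above concrete, here is a minimal Python sketch. The data tables are illustrative assumptions (the real label data lives in Wiktionary's Lua modules and will look different); in particular, the languages allowed for the "French" label here are made up for the example.

```python
from typing import Optional

# Illustrative data only, not the real module data.
LANG_NAMES = {"mul": "Translingual", "zh": "Chinese", "oc": "Occitan"}
LABELS = {
    # label: (category prefix, allowed language codes, or None for "any")
    "French": ("French", {"oc"}),     # assumed restriction
    "Vietnam": ("Vietnamese", None),  # unrestricted in this sketch
}


def categorize(label: str, lang: str) -> tuple[Optional[str], bool]:
    """Return (category, needs_tracking) for a label used with a language code."""
    prefix, allowed = LABELS[label]
    if allowed is not None and lang not in allowed:
        # Fails the language restriction: no category, but flag for cleanup.
        return None, True
    return f"Category:{prefix} {LANG_NAMES[lang]}", False
```

Under these assumptions `categorize("French", "mul")` yields no category and a tracking flag, while `categorize("Vietnam", "zh")` still yields "Category:Vietnamese Chinese", matching the exemption mentioned in the thread.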

      bullet points, usage notes and etymologies

      Diff was reverted, but wasn't it correct to add the bullet point? Don't we normally bullet usage notes? (And on the topic of bullet points, don't we normally not bullet etymologies? Because I sometimes see people bullet them.) Do we have enough consensus about whether these sections should vs shouldn't be bullet-pointed to add something to WT:ELE about it? - -sche (discuss) 00:34, 14 April 2024 (UTC)[reply]

      @-sche Yes, I have gone through several times and added missing bullet points to Usage notes. I thought this was the standard, and also I agree there shouldn't be bullet points in etymologies. Benwing2 (talk) 00:39, 14 April 2024 (UTC)[reply]
      I agree with User:Benwing2. Ioaxxere (talk) 01:23, 14 April 2024 (UTC)[reply]
      I’m not sure a bullet point is needed if there is only one usage note, but don’t mind if one is added. On the other hand, I do think that it is occasionally desirable to use bullet points in etymologies for readability, for example, if a term is partly derived from two languages. — Sgconlaw (talk) 03:23, 14 April 2024 (UTC)[reply]
      @Sgconlaw: That's true, bullet points are useful if an etymology often takes the form of a list, such as at kibosh. But I think we all agree that it shouldn't be the default, so accomplished for example shouldn't have a bullet. Ioaxxere (talk) 04:54, 14 April 2024 (UTC)[reply]
      @Ioaxxere Yes, exactly. Benwing2 (talk) 05:03, 14 April 2024 (UTC)[reply]
      @Ioaxxere: yes, single-sentence or paragraph etymologies shouldn’t be bulleted. — Sgconlaw (talk) 05:20, 14 April 2024 (UTC)[reply]
      I agree with this, kibosh is fine. If we add something to WT:ELE my suggestion, subject to wordsmithing/improvement please, would be along the lines of: either "The first sentence of an etymology should not be bulleted. Other parts of the etymology may be bulleted if they are a list." if we think lists should always be introduced by something unbulleted ("Uncertain. Theories include:", etc), or at least "The paragraphs/sentences of the etymology section should not be bulleted unless they are a list." IMO a list must also contain more than 1 item, a "1-item list" is not a list and should just be unbulleted; for bullet points to be used, there should be multiple items IMO. - -sche (discuss) 15:20, 14 April 2024 (UTC)[reply]

      FWIW, as of a database dump from last August (what I had handy), 3,834 pages contained Etymology===\n\* like abdominothoracic (very many of them added by just one user), and 24,770 pages contained Usage notes====\n[A-Z] like Abrahamic. I did this just to quickly get a rough figure, it does not account for numbered etymology sections or people using Usage notes at L3 or L5, and some pages have probably changed since August, to stop having the problem or to newly have the problem. Iff we agree on fixing these, maybe a bot (operating from a more up-to-date list) could remove the bullet from any etymologies where there is only one "item"/sentence/paragraph (and any etymologies like this, where the first item is bulleted and the second item is a bulleted affix template?), and we could see whether what's left is a small enough number to go through by hand? (In case some instances of multiple bullet points are OK.) - -sche (discuss) 15:21, 14 April 2024 (UTC)[reply]
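The bot rule proposed above (unbullet an etymology only when it has a single bulleted item) could be sketched roughly as follows. This is a minimal illustration, not an existing bot: the method name `unbullet_single_item_etymologies` is hypothetical, and it assumes that etymology headers look like `===Etymology===` or `===Etymology 1===` and that an "item" is any line starting with `*`.

```ruby
# Hypothetical helper: strip the bullet from etymology sections that
# contain exactly one bulleted line. Sections with several bulleted
# lines are left untouched so they can be reviewed by hand.
ETYM_HEADER = /\A=+\s*Etymology(?: \d+)?\s*=+\s*\z/
ANY_HEADER  = /\A=+[^=].*=+\s*\z/

def unbullet_single_item_etymologies(wikitext)
  lines = wikitext.lines
  in_etym = false
  bullets = [] # indexes of bulleted lines in the current etymology section

  # Apply the rule to the section just finished, then reset the tally.
  flush = lambda do
    if bullets.size == 1
      i = bullets.first
      lines[i] = lines[i].sub(/\A\*\s*/, "")
    end
    bullets = []
  end

  lines.each_with_index do |line, i|
    if line =~ ANY_HEADER
      flush.call if in_etym
      in_etym = !!(line =~ ETYM_HEADER)
    elsif in_etym && line.start_with?("*")
      bullets << i
    end
  end
  flush.call if in_etym
  lines.join
end
```

Because nested bullets (`**`, `*:`) also start with `*`, an etymology with sub-items counts as having more than one item and is skipped, which errs on the side of leaving list-like etymologies such as kibosh alone.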

      I would support this. Benwing2 (talk) 21:16, 14 April 2024 (UTC)[reply]
      Noting the vast number of unbulleted usage notes, I do not support bulleting them universally. It would be interesting to compare the number of occurrences of Usage notes==+\n[^*] with Usage notes==+\n\*. I suspect the former will strongly prevail but I'm open to being proven wrong. This, that and the other (talk) 22:53, 14 April 2024 (UTC)[reply]
      As of the database dump above, 26,925 mainspace pages (and 27,092 pages overall) contained Usage notes====\n\*, e.g. var gücüyle. 24,460 mainspace pages (and 24,770 pages overall) contained Usage notes====\n[A-Z], e.g. în ruptul capului. (88 mainspace pages contained Usage notes====\n(1|2|3|4|5|6|7|8|9|0|\[), e.g. tetranary and himmel, and I hope the number of pages where the usage notes start with bare non-English text not inside a template, or start with a lowercase letter, is negligible since both of those seem like issues to be cleaned up. I haven't provided numbers of pages where usage notes start with { because those are often templatized usage notes, and some templates provide bullet points and others don't, which AWB can't determine from scanning a database dump.) I'm surprised it's even that close, I thought bullet points were the standard... well, I won't add anything to ELE about bulleting or not bulleting usage notes without a wider discussion (with more input than this one has gotten), then.
      BTW, regardless of whether they use bullet points or not, 721 mainspace pages were using L3 usage notes (no mainspace pages were using L2 usage notes); many should be L4. - -sche (discuss) 23:51, 14 April 2024 (UTC)[reply]
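For anyone wanting to reproduce this kind of tally on a newer dump, here is a rough Ruby sketch of the classification step. It is an assumption-laden illustration rather than the AWB search actually used: `classify_usage_notes` is a hypothetical name, it works on the wikitext of a single page, and iterating over the dump itself is not shown.

```ruby
# Match a "Usage notes" header at any level and capture the first
# character of the following line (empty if that line is blank).
USAGE_HEADER = /^(=+)Usage notes\1[ \t]*\n(.?)/

# Returns a hash of counts for one page's wikitext.
def classify_usage_notes(wikitext)
  counts = Hash.new(0)
  wikitext.scan(USAGE_HEADER) do |level, first|
    kind = case first
           when "*"     then :bulleted
           when /[A-Z]/ then :unbulleted
           when "{"     then :template # may or may not render a bullet
           else              :other
           end
    counts[kind] += 1
    counts[:level3] += 1 if level.length == 3 # L3 headers, often should be L4
  end
  counts
end
```

Summing these hashes over all mainspace pages would give figures comparable to the ones quoted above, with templatized usage notes kept in their own bucket because a text scan cannot tell whether the template supplies a bullet.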

      ...is clearly (and openly) a bot operated by User:This, that and the other. The bot seems to be running some maintenance tasks of a few Wiktionary: namespace pages that contain lists for various tasks. While the user is trusted and the bot doesn't seem to be doing anything that dangerous, it should still in principle have the bot flag, but it does not, nor does it seem a vote was ever started to even obtain one. — SURJECTION / T / C / L / 18:00, 14 April 2024 (UTC)[reply]

      Well, by the reasoning in WT:BOT, bots need vote approval and oversight so that they don't run amok and leave messes that nobody cleans up. That is not an issue here: the bot has only edited list-type pages, which are not expected to be created manually in the first place. Fay Freak (talk) 19:54, 14 April 2024 (UTC)
      I still think it would be better to start a bot vote; I'm sure it won't have any problem passing. Benwing2 (talk) 21:15, 14 April 2024 (UTC)[reply]
      @Surjection yes, in the end I suppose it should have the flag, and that requires a vote: Wiktionary:Votes/bt-2024-04/User:TTObot for bot status. Don't have a lot of time right now to put the finishing touches on it though. This, that and the other (talk) 22:50, 14 April 2024 (UTC)[reply]
      @This, that and the other Thanks for creating the vote. Note that bot votes are generally only a week in length, so I think it's fine for you to shorten the vote to one week. Benwing2 (talk) 00:44, 16 April 2024 (UTC)[reply]
      @Benwing2 the vote page template defaulted to 14 days, so I just rolled with it. I think we changed from 7 to 14 days when we doubled the length of votes a few years ago. This, that and the other (talk) 12:13, 16 April 2024 (UTC)[reply]
      @This, that and the other I see, no problem. Benwing2 (talk) 20:40, 16 April 2024 (UTC)[reply]

      Etymology tree vote

      The etymology tree testing thread, which has run for two weeks now, has achieved good results, with several minor bugs or problems being fixed[note 1] and every commenter being happy with the output. I therefore feel ready to put the template up for a vote to establish consensus on whether it can be used in mainspace. I think that each language community should establish its own policy on where etymology trees may be used, which may well be "nowhere".[note 2] However, before I start any vote I would like to address some objections that were made in last month's thread.

      @Benwing2 said that we should avoid duplicating information between the template and the etymology sections, and this is something I agree with. It's just that our current etymology sections are fairly inefficient at representing information and can be automated to a significant extent. One of the things I've been working on is automatic text generation, which you can see here.

      @Victar commented that we would need "a very complex module, with ways to mark derivational types, certainty, alternatives, mergers, etc." All of these have been integrated into Module:etymon.

      So I invite the community to discuss whether the module is ready for a vote, and if so, how it should be worded.


      1. ^ Specifically: a) unbolded transliterations, b) fixed overflowing text, c) decreased font size, d) fixed visual bug on Firefox
      2. ^ Note that {{etymon}} can run in a "silent mode" which produces no visible output, but passes along information to other entries. This could be added anywhere.

      Ioaxxere (talk) 20:17, 14 April 2024 (UTC)[reply]

      Edge cases may be difficult; some concepts in etymology, like Wanderwörter, are not as easy to convey (reference @benwing2). But it seems you should put it to a vote. It will also be difficult to display doublets or cognates. ADDSamuels (talk) 01:49, 15 April 2024 (UTC)
      P.S.: It should be voted on as an opt-in addition. ADDSamuels (talk) 13:48, 15 April 2024 (UTC)
      Will we have a policy of eliminating lies, such as the claim that most Indic lects are descended from Sanskrit as normally understood and as usually described by Wikipedia? (A solution is to direct the user to WT:About Sanskrit instead of to w:Sanskrit. Dubious edits on Wikipedia to promote the Wiktionary definition of 'Sanskrit' seem to have disappeared.) --RichardW57m (talk) 10:14, 15 April 2024 (UTC)[reply]
      our current etymology sections are fairly inefficient at representing information and can be automated to a significant extent.
      I think that's a problem that needs to be solved before deploying this. For this to be useful, it needs to be widespread, and for it to be widespread it needs an automated way to be integrated into most of the existing etymologies plus manual effort to integrate it where it can't be added automatically. How can we replace the existing information in the etymology sections with this template? Do you have a parser to convert existing etymology data to this template? How many etymologies can be automatically replaced and how many will need manual work? JeffDoozan (talk) 13:49, 15 April 2024 (UTC)[reply]
      I think the biggest hurdle would be the fact that each etymology would need an ID. Those with {{etymid}} already present could just give that to the template, but those without could not. You could probably even generate IDs for most entries that have only one etymology (i.e. affixed adverbs and the like) and only a few definitions, but you'd need a way to generate the actual ID name. I foresee a major hurdle with entries that have many etymologies/definitions. Vininn126 (talk) 14:09, 15 April 2024 (UTC)
      @JeffDoozan: User:Vininn126 is right in that assigning IDs is a significant challenge. Basically, if an entry says "borrowed from X", does it mean X (etymology 1) or X (etymology 2)? Sometimes the ID is specified, but not often enough to be very useful. However, I have been working on getting structured etymological data from Wiktionary. Here's a sample:
      camarade#French>0 camaraderie#French>0
      thio-#English>0 thiocresol#English>0
      Reconstruction:Proto-Germanic/watrōną>0 water#English>2
      flop#English>1 flopover#English>0
      Conkright#English>0 Conkrights#English>0
      Reconstruction:Proto-Indo-European/mey->3 Reconstruction:Proto-Italic/moinos>0

      (>0 means that the entry only has a single etymology section)

      With that said, I don't think mass replacement needs to be a priority.

      Ioaxxere (talk) 14:41, 15 April 2024 (UTC)[reply]
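As a reading aid for the sample above, here is a small Ruby sketch of how such lines might be parsed. It is only an illustration under stated assumptions: the names `EtymRef`, `parse_ref` and `parse_edge` are hypothetical, each line is assumed to hold a parent followed by a child separated by a single space, and page titles are assumed not to contain spaces (true of the sample, not of Wiktionary in general).

```ruby
# One node of the sample format: "title#Language>N". The "#Language"
# part is absent for Reconstruction pages in the sample.
EtymRef = Struct.new(:title, :language, :etym_index)

def parse_ref(token)
  m = token.match(/\A(?<title>[^#>]+)(?:#(?<lang>[^>]+))?>(?<idx>\d+)\z/)
  raise ArgumentError, "unrecognised reference: #{token}" unless m
  EtymRef.new(m[:title], m[:lang], m[:idx].to_i)
end

# Assumption: "parent child" on one line, separated by one space.
def parse_edge(line)
  parent, child = line.strip.split(" ", 2)
  [parse_ref(parent), parse_ref(child)]
end
```

An index of 0 would then mean the page has a single etymology section, as noted above, while any other value identifies a numbered etymology on that page.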

      For what it's worth, I Support the idea of this template. I think it will attract a lot of readers. Having asked various people online (hearsay evidence, I know; it would be nice to have an official account where we can poll people on other platforms...), they seem to generally love this. It is probably not possible to deploy it large-scale at the moment, but is there anything stopping one from adding it to pages manually? Vininn126 (talk) 08:53, 16 April 2024 (UTC)

      Ordering entries differing in lexicographic spelling

      Do we need a policy on ordering the entries of the forms of the same lemma when they appear on the same page? For example, Latin nigra and nigrā are two different entries on the same page, and are different case and gender forms of the same lemma, Latin niger, which is on a different page. Indeed, should such forms have different entries? I see that Latin mala (bad) and malā (bad) share the same entry, the writing with the macron only being distinguished as a label for the pronunciations! I may not even be consistent myself - I may have done the entry for Lithuanian Alfrede wrong, as the two locative singular forms correspond to different citation (nominative singular) forms. --RichardW57m (talk) 14:53, 15 April 2024 (UTC)[reply]

      Recent changes to the citation templates

      User:JeffDoozan has been making a lot of changes to the citation templates as of late with their bot, AutoDooz. Some have introduced errors, but the change I object to above all is changing parameter |1= to |lang=; despite my bringing this up on their talk page, they went ahead with their script today regardless. The rationale given for this change is that it unifies the citation templates with the quotation templates, but this argument is fallacious because these templates have very different purposes. For one, we only specify the language of a work if it isn't English, which, mind you, isn't done at all in other citation formats, e.g. Harvard, Oxford, etc. Secondly, many journals are in multiple languages, and other works, like lists, are in no language at all. Making language a mandatory field for citation templates is all around a bad idea and was done with no community consensus. @Benwing2 -- Sokkjō 18:04, 15 April 2024 (UTC)

      Including a language is not mandatory, as discussed here. Works in multiple languages can use a comma-separated list of language ids in either |1= or |worklang=. JeffDoozan (talk) 18:12, 15 April 2024 (UTC)
      But it is mandatory, because if you use the template with just the numbered fields, you have to at the very least leave the field blank. -- Sokkjō 18:18, 15 April 2024 (UTC)
      @Sokkjo That means it's not mandatory, then. I'm sure you'll be able to cope with leaving a parameter blank - it's only one extra button press. Theknightwho (talk) 23:42, 15 April 2024 (UTC)[reply]
      Definitely this should have been discussed more before implementation. However, it's not clear to me it's wrong. It is a bit strange to have an optional langcode parameter in |1= but I do understand on the one hand the desire to harmonize the interfaces of quote-* and cite-* and on the other hand the fact that we don't categorize citations, so it's not strictly necessary to have the language specified for all citations. I actually think having numbered params (other than something like |1= for a language code) for these templates, whether quote-* or cite-*, is a bad idea, because they're hard to make sense of when reading the wikitext and highly error-prone when creating the wikitext (esp. since there are a lot of them and every citation template is different from every other one). I proposed eliminating numbered params for the quote templates but some people objected, so I just eliminated them on the less-used ones and kept them e.g. for {{quote-book}} and {{quote-journal}}. Yes it requires a bit more typing to use named params but inserting a quotation or citation takes a lot of typing anyway, so the net effect isn't that much. Benwing2 (talk) 00:42, 16 April 2024 (UTC)
      @Benwing2: What I and other users do is use the numbered parameters for quick inline citations, i.e. {{cite-book|year|author|title}}. Having language as a first parameter would mean we would have to do {{cite-book||year|author|title}}, leaving a blank field, which is very prone to error, either from people forgetting to do so, or from people misinterpreting it as a typo. Certainly when creating a reference template, I use full parameter names. -- Sokkjō 01:48, 16 April 2024 (UTC)
      What is true about it is that the text in langcode is in less than optimal places, being placed after journal names, while we always encode the language of the journal piece. Fay Freak (talk) 18:24, 15 April 2024 (UTC)[reply]

      Multiple Quotations

      Reading WT:ELE in response to the call to bullet usage notes, I noticed the following text:

      Quotations are generally placed under the definition which they illustrate. If there is more than one being provided, or where this is not possible (e.g., a very early usage that does not clearly relate to a specific sense of the word), a separate section should be used.

      Are we meant to take this seriously? If so, is there any guidance on how (or whether) to relate quotations to senses? Or any good examples of so doing? --RichardW57m (talk) 14:59, 16 April 2024 (UTC)[reply]

      My preference (and I don't think I'm the only one) is to eradicate ====Quotations==== sections by assigning the quotations to senses or, if it's truly unclear whether they use any of the definitions we're providing, then moving them to the cites page — how weird it would be if quotations of senses that don't meet CFI were given extra prominence with their own section in the main entry! If ELE says that merely having more than one quotation means they should not go under the definition anymore, that definitely needs to be changed, yikes! We have so many entries where a definition is supported by more than one quotation under it... :o (prior vote btw) - -sche (discuss) 16:26, 16 April 2024 (UTC)[reply]
      100% agreed. ===Quotations=== sections are unhelpful, even more so than ===Synonyms=== and the like because at least the synonyms sections (sometimes) identify which sense the given terms are synonyms of. Benwing2 (talk) 18:47, 16 April 2024 (UTC)[reply]
      I agree. — Sgconlaw (talk) 18:57, 16 April 2024 (UTC)[reply]
      @-sche: Most lemmas don't have any quotations, though a high proportion (very roughly half) are backed up by dictionaries. (And most entries are non-lemmas without quotations.) I actually found it quite hard to find senses with two or more quotations. I suspect it's mostly those senses that have been challenged that have multiple quotations. For Pali I'd been working on the principle of keeping the best set of three with the sense, but I may not have had any extras that seemed worth putting on a citations page. I think the number three may simply have been the WDL requirement. --RichardW57m (talk) 16:56, 17 April 2024 (UTC)
      @RichardW57m I think you're right about this; senses that are obviously attestable often get only one quotation to illustrate them, but those that may be challenged are more likely to get three. Benwing2 (talk) 21:36, 17 April 2024 (UTC)[reply]

      Tibetan script for Japhug

      Currently, our Japhug entries are using IPA transcription as the main script. This follows almost all publications on (Kamnyu) Japhug, all of which are by Guillaume Jacques (and his co-authors). Guillaume Jacques' grammar on the language does mention: "The IPA orthography chosen to write Japhug in this grammar is probably not viable for use by native speakers, and an alternative writing system based on Tibetan script is preferable." The grammar does have a page and a little on how Tibetan script can be used for writing the language, though there aren't really many details given. After some discussion on Discord with @Thadh, we thought it best to bring it here to see if people have thoughts on how to proceed. To me, there are two possibilities. First is to keep the status quo and continue using IPA for lemmatization; Tibetan script could be added perhaps automatically to entries beside the headword. The second option would be to use Tibetan script for lemmatization, perhaps with IPA transcription as the romanization that appears beside headwords. — justin(r)leung (t...) | c=› } 15:34, 16 April 2024 (UTC)[reply]

      (I personally highly prefer the latter option). Thadh (talk) 15:36, 16 April 2024 (UTC)[reply]
      If the Tibetan script is not attested, then using it only reinforces Wiktionary as a den of fabrications. --RichardW57m (talk) 15:51, 16 April 2024 (UTC)[reply]
      @RichardW57m: It's not us who developed the script, it's described in the grammar. And the language itself is not written at all, it's only attested in grammars and scientific papers. If anything, people are going to thank us for using something that even resembles human language rather than a scientific transcription. Thadh (talk) 15:59, 16 April 2024 (UTC)[reply]
      People could also say that we're making things up with no proof, and we're chauvinists assuming that a language uses this script. We need to find people writing the language down, and see what script they use. CitationsFreak (talk) 16:26, 16 April 2024 (UTC)[reply]
      The language uses no script. In my opinion it's better to use some script (especially when it is proposed already!) than use no script. If at some point we hear from the speakers that they developed an alternative orthography, we can always switch to that. Thadh (talk) 16:31, 16 April 2024 (UTC)[reply]
      @Thadh: Is Japhug in the Tibetan script used to convey meaning? If not, the words in the Tibetan script don't meet CFI. And it doesn't seem to meet the principle of independence, so being an LDL should be irrelevant. --RichardW57m (talk) 16:27, 16 April 2024 (UTC)[reply]
      What are you even talking about? Thadh (talk) 16:29, 16 April 2024 (UTC)[reply]
      If the language is not normally written, and the only orthography used when it is written (or written about, by scholars) is the IPA-ish transcription, then it seems like creating unnecessary hurdles to require people to figure out how to translate the words as they are actually written into someone's speculative "maybe if people wrote this in Tibetan script, they'd write it like this" orthography. I say we just continue using the attested (IPA-ish) orthography, until such time as a Tibetan orthography actually becomes used. - -sche (discuss) 16:31, 16 April 2024 (UTC)[reply]
      Lemmatising at a practical orthography is standard practice for dictionaries. I think using IPA would be a grave mistake, and would also give out a "we don't really care" attitude. Thadh (talk) 16:35, 16 April 2024 (UTC)[reply]
      I agree with User:-sche here. Lemmatizing at what is at this point essentially a con-script seems worse in many ways than using the IPA that publications actually use. Benwing2 (talk) 18:44, 16 April 2024 (UTC)[reply]
      @-sche: "such time" - If ever. Population 3,000 and characterised as 'vulnerable'. --RichardW57m (talk) 16:04, 17 April 2024 (UTC)[reply]
      Thanks for all your input. I sense that the general sentiment is to stick with the status quo and use IPA transcription for lemmatization purposes. I am sympathetic towards this option as well. (Sorry, Thadh!) The question now is whether to show a possible Tibetan orthography in entries at all, based on the schema described in Guillaume Jacques' grammar. — justin(r)leung (t...) | c=› } 00:28, 18 April 2024 (UTC)

      {{surf}}’s up

      (Previous discussion here.)

      What do people think of introducing a long-term replacement like {{sync}} ‘synchronically derivable from X + Y’? This is, in my view, the most precisely-defined proposal thus far, and that exact phrase is used in the linguistic literature. Nicodene (talk) 02:12, 17 April 2024 (UTC)[reply]

      Because we don't want to make it easy for new normal users? I don't think synchronic or its morphs are part of the idiolect of more than 5% (1%?) of normal users. What we mean by surface analysis is not much better, but at least the words are understandable. Superficially would be much clearer, though it is too (negatively) evaluative. DCDuring (talk) 12:44, 17 April 2024 (UTC)[reply]
      Solvable by simply linking to a glossary entry that explains the meaning. I doubt most visitors would know what a determiner is either, yet it would hardly be a reason to remove the correct linguistic term in favour of a fictitious alternative. Moreover our using a fictitious term gives readers the misleading impression that it actually exists. Nicodene (talk) 18:57, 17 April 2024 (UTC)[reply]
      Not a real solution, just an excuse. It's like fine-print or shrink-wrap (non)disclosure IRL. DCDuring (talk) 13:20, 18 April 2024 (UTC)[reply]
      I generally support this but I'd like to hear how you propose to convert things over. Can we just replace all uses of {{surf}}? If not we will potentially get a messy situation with both {{surf}} and {{sync}} coexisting indefinitely. Benwing2 (talk) 23:34, 17 April 2024 (UTC)[reply]
      Step one: {{surf}} is deprecated, step two: a bot converts {{surf|X|Y}} to {{sync|X|Y}} in cases where X and Y combined have the same spelling as the entry? (Allowing for the loss of a vowel at the end of X.) That should avoid most of the synchronically invalid combinations. Whatever is left will require human attention, unfortunately. I can take care of various European languages at least. Nicodene (talk) 05:24, 18 April 2024 (UTC)[reply]
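The spelling check in step two could look something like the Ruby sketch below. Everything here is an assumption for illustration: the helper names are hypothetical, the vowel list covers Latin script only, the template is assumed to be called with a language code first (`{{surf|en|X|Y}}`) and exactly two parts, and `{{sync}}` is simply the name proposed above. The pair "pure" + "-ify" is an invented example of final-vowel loss.

```ruby
VOWELS = "aeiouy" # assumption: Latin-script vowels only

# "un-" -> "un", "-ify" -> "ify"
def strip_affix_hyphens(part)
  part.sub(/\A-/, "").sub(/-\z/, "")
end

# True if X + Y spells the entry, allowing X to lose one final vowel.
def safe_to_convert?(entry, x, y)
  x = strip_affix_hyphens(x)
  y = strip_affix_hyphens(y)
  return true if x + y == entry
  x.length > 1 && VOWELS.include?(x[-1]) && x[0...-1] + y == entry
end

SURF_CALL = /\{\{surf\|([^|{}]+)\|([^|{}]+)\|([^|{}]+)\}\}/

# Rewrite only the calls that pass the spelling check; leave the rest
# (including calls with extra parameters) for human attention.
def convert_surf(wikitext, entry)
  wikitext.gsub(SURF_CALL) do
    whole, lang, x, y = Regexp.last_match.to_a
    safe_to_convert?(entry, x, y) ? "{{sync|#{lang}|#{x}|#{y}}}" : whole
  end
end
```

For example, `convert_surf("By {{surf|en|un-|happy}}.", "unhappy")` rewrites the call, while a call such as `{{surf|en|on-|gain}}` on the page again fails the check and is left as it is.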
      Note that {{sync}} already exists as a redirect to Template:syncopic form. (Personally, I think "sync" seems too much like the name for a template for syncing entries, and I'd rather find another short name for both "syncopic" and "synchronic" if possible.)
      IMO, if we want to change the name or wording of {{surf}}, I don't know if it makes a difference (in the long run) whether we slowly change uses of one over to the other, or just move (redirect) the existing template to the new name and update the wording (and potentially switch all uses over to the new main name all at once). As far as I have been able to tell, even going by our glossary definitions of these things (which say a surface analysis is a synchronic one), the places {{surf}} should be used and the places "{{sync}}" should be used seem to be identical. It's true that {{surf}} is currently used in some places it should not be, e.g. again says "By surface analysis, on- + gain (“against”)" which is wrong (no speaker who is unaware that again comes from Middle English thinks that it looks on the surface like it was formed in modern English by combining "on-" and "gain"! this should be a "For more, see..." set of links or something)... but A) such improper uses need to be cleaned up whether we move/rename/switch the template or not (and basically the same process that Nicodene outlines for switching between templates would work even if just making a list of all uses of {{surf}} and progressively removing valid ones, or invalid ones after correcting them), and B) like people use {{surf}} in wrong places, people would undoubtedly also use {{sync}} in wrong places, particularly because DCDuring is right that "synchronic" would be an even more opaque name to many people (and the short name "sync" just makes it seem like a template to sync entries or sync links to entries or something), so if they see {{sync}} in entries, I expect the main takeaway for at least some people will just be "this is the way to link whatever words are closest to being the parts that make up the word in question", and that's how they'll use it, even in places we don't want it used, like again. 
That itself doesn't necessarily mean we shouldn't use "synchronic"; lots of other words we use are unfamiliar to laypeople, as Nicodene says; but it means I wouldn't expect renaming the template to be any help as far as making people only use it in the right places. I am actually sympathetic to DCDuring's point that the word clearest to laypeople would probably be "superficial", and indeed it's not hard to find linguistics literature discussing "superficial analysis" of how words are formed vs their actual etymologies, so maybe we should consider that. Heck, what if we made the wording something like "Superficially or synchronically derivable from x + y"? (We could probably even make it so that users could optionally suppress seeing "superficially or", in the same way people can opt out of or opt in to seeing {{,}}, so the 'clearer' word would be shown to laypeople, but logged-in linguists could choose to only see "synchronically".) - -sche (discuss) 16:41, 18 April 2024 (UTC)[reply]
      @Nicodene: I agree that we should try to reduce ambiguity but as others have said we should be avoiding jargon. How about this wording?
      The template names could be {{etyeq}} and {{anz}}.
      Ioaxxere (talk) 16:50, 18 April 2024 (UTC)[reply]

      Arabic-script affixes

      As we all know, Latin-script affixes are lemmatised with a hyphen (such as -ed); what about affixes in Arabic script? Currently, Arabic affixes are lemmatised in plain form without any hyphen, but get a connecting character in the headword (such as at ون) or do not get one (such as at وت). Ottoman Turkish affixes are lemmatised with the connecting character in the pagetitle (such as ـدن). Pashto affixes randomly do or do not get a hyphen (such as -تون versus تون). What is the correct way to do this? MuDavid 栘𩿠 (talk) 02:43, 17 April 2024 (UTC)[reply]

      @MuDavid I agree the current situation is a mess and should be harmonized. I would propose lemmatizing using the tatweel sign like in Ottoman Turkish, or if that is rejected, lemmatizing at the form without connector but placing the tatweel in the headword (like ون). Benwing2 (talk) 21:35, 17 April 2024 (UTC)[reply]

      Russian clitics in English Wiktionary

      @Atitarev @Benwing2: Currently the Russian clitic entries barely have any presence in English Wiktionary. But it's probably possible to borrow many of them from Russian Wiktionary.

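      # Cyrillic letters allowed in a word (interpolated into regex character classes)
      LETTERS = "Ѐ-џҊ-ԧꚀ-ꚗѣѢѳѲѵѴʼ"
      AC = "\u0301" # combining acute accent (stress mark), same bytes as "\xCC\x81"
      PREPOS = "[#{LETTERS}]+#{AC}[#{LETTERS}]*" # stressed first word (the preposition)
      WORD = "[#{LETTERS}]+" # following word without a stress mark
      result = []
      fh = ARGV[0] ? File.open(ARGV[0]) : STDIN
      while (line = fh.gets)
        next unless line =~ /\|клитика/
        # skip empty entries (braces and pipe escaped so the pattern is literal)
        next if line =~ /\|клитика=\{\{\{клитика\|\}\}\}/
        newentries = line.scan(/(#{PREPOS})\s+(#{WORD})/)
        result += newentries.map { |a| a[0] + " " + a[1] }
        nextline = fh.gets
        # report suspiciously formatted entries: the next line should start with "}}" or "|"
        STDERR.puts "#{line}#{nextline}" unless !newentries.empty? && nextline =~ /^(\}\}|\|)/
      end
      # print results as wikilinks
      puts result.sort.uniq.map { |w| "[[" + w.gsub(AC, "") + "|" + w + "]]" }.join(" ")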
      

      Here are the results (please note that only two-word pairs are listed, but some of them could have been a part of a longer phrase):

      I also have corrected one inconsistently formatted entry there. Do the missing headword entries just need to be created in English Wiktionary? Or, similar to how it's done in Russian Wiktionary, some kind of a special template can be introduced for listing clitics in the parent word entries?

      BTW, as a person from Belarus, I personally find a lot of these clitics weird and unnatural to varying degrees. I understand the reason and necessity of the accent pattern in до́ смерти (dó smerti) or на́ хуй (ná xuj), because these word pairs are not to be interpreted literally and have a different sense of their own. But many others just feel to me like somebody is trying to sound deliberately poetic or archaic when reciting a fairy tale or something. And, for example, I doubt that anyone from Belarus would ever say "до́ дому" in their Russian speech, maybe because of the influence from the Belarusian дадо́му (dadómu)? The whole concept of prepositions stealing stress from the next word doesn't exist in the Belarusian language. I understand that it's the standard Russian pronunciation norm and has to be acknowledged as such; I'm just trying to say that my Russian language competence is definitely lacking in this area. I'm primarily interested in the Russian clitics for the purpose of correctly handling them in the auto-accenting Lua module. --Ssvb (talk) 04:18, 17 April 2024 (UTC)

      @Ssvb I am not a native Russian speaker; I'm sure Anatoli can answer better. I'll just note that Zaliznyak's grammatical dictionary notes the occurrences of such stress-stealing in the headwords of each word where it occurs. I would not necessarily recommend creating entries for all such combinations unless they are idiomatic. I think it's enough to note them in a usage note in the noun. I'm not sure if we need a special template for this, esp. since I imagine the conditions under which this stress-stealing occurs are rather varied and some of the expressions may be archaic, poetic, etc. as you note. Benwing2 (talk) 04:31, 17 April 2024 (UTC)[reply]
      As an example, under нога it has a diamond symbol (indicating special usages) followed by this:
      зá ногу, зá ноги; на́ ногу, на́ ноги; сиде́ть ногá зá ногу (ногá нá ногу); заки́нуть нóгу зá ногу (нóгу нá ногу); переступа́ть (перемина́ться) с ноги́ нá ногу; бро́сить (смотрéть) по́д ноги
      This is a lot to unpack but I take it it is indicating the various expressions where stress-stealing occurs. Benwing2 (talk) 04:36, 17 April 2024 (UTC)[reply]
      @Benwing2: Thanks! So these are recorded in {{m|ru|stressed_preposition word}} format in English Wiktionary. This makes them relatively easy to parse from https://dumps.wikimedia.org/enwiktionary/20240401/enwiktionary-20240401-pages-articles-multistream.xml.bz2 directly. They are just not present in https://kaikki.org/dictionary/Russian/ JSON, but it's a minor issue. --Ssvb (talk) 04:58, 17 April 2024 (UTC)[reply]
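A rough Ruby sketch of that extraction step, for one page's wikitext. The method name `stressed_preposition_pairs` is hypothetical, and the character range is an assumption that is narrower than the letter set in the script above. Reading the XML dump is not shown. Note that the pattern cannot tell a preposition from any other stressed word followed by an unstressed one, so the output would still need checking.

```ruby
AC  = "\u0301"          # combining acute accent used as the stress mark
CYR = "\u0400-\u045F"   # assumption: basic Cyrillic range only

# {{m|ru|<stressed word> <word without stress mark>}}
STRESSED_PAIR = /\{\{m\|ru\|([#{CYR}]+#{AC}[#{CYR}]*\s+[#{CYR}]+)\}\}/

# Returns the distinct two-word phrases, stress marks included.
def stressed_preposition_pairs(wikitext)
  wikitext.scan(STRESSED_PAIR).flatten.uniq
end
```

Links with display text or extra parameters (for example a `t=` gloss) do not match this pattern and would need a looser expression.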
      @Ssvb BTW if you want I can send you a copy of the 1980 edition. I think there's a newer one available online somewhere, but I'm not sure where; maybe User:Cinemantique would know. Benwing2 (talk) 05:17, 17 April 2024 (UTC)[reply]
      Benwing, do you reckon ходить по воду (xoditʹ po vodu) is idiomatic, or should I never have created it? Tollef Salemann (talk) 16:54, 17 April 2024 (UTC)
      @Tollef Salemann Not being a native Russian speaker I can't say for sure but I imagine if the entry should be there at all, it should be under по́ воду (pó vodu) as I bet you can use other verbs with it besides ходи́ть (xodítʹ). Benwing2 (talk) 21:29, 17 April 2024 (UTC)[reply]
      Not really. Maybe you can replace ходить with идти, but apart from the infinitive the conjugated forms are the same. Ехать по воду (to drive to fetch water) is also possible, but rare. Tollef Salemann (talk) 10:11, 18 April 2024 (UTC)[reply]
      As someone who created a Russian clitic entry, I wonder if it is really worth doing, because it is not really a thing and it depends heavily on dialect. We can of course take the clitic material from a dictionary, but it is going to be useless in about 20-40 years (for modern Russian dialects it is useless anyway). I mean, it is important to record clitics, but it does not seem as easy as the dictionaries suggest. Tollef Salemann (talk) 06:49, 17 April 2024 (UTC)[reply]
      @Tollef Salemann: These things are not reflected in spelling, so pronunciation may have diverged in different regions. Still only Moscow determines what is considered to be the official standard pronunciation. And "за́ руку" [18] seems to be legit. But "на́ берег"[19] - not so much and seems to primarily exist because of "Выходила на́ берег Катюша". --Ssvb (talk) 17:57, 17 April 2024 (UTC)[reply]
      Moscow pronunciation is not the same as "standard" Russian pronunciation. The dictionaries often differ, and they are being updated. Different primary schools may use different dictionaries as well. So the clitics seem like a mess to me. Tollef Salemann (talk) 10:16, 18 April 2024 (UTC)[reply]
      @Ssvb: How would these be clitic entries? They look like clitic plus noun phrases, presumably suggested for inclusion because of idiomatic meanings or possibly (though I'm not sure of the validity of so doing) because they are not readily recognised as such. --RichardW57m (talk) 15:36, 17 April 2024 (UTC)[reply]
      @RichardW57m: They are relevant and deserve to be documented because they have different pronunciation at least by some speakers (the true authentic Russians). But unless they are really idiomatic, they don't need their own headword entries each (I agree with User:Benwing2). Please disregard the red links in my starter comment, they are a red herring. --Ssvb (talk) 18:10, 17 April 2024 (UTC)[reply]
      Having heard my Russian from Sovietized Qazaqstani Germans and Tatars, most, save distinct idiomatic ones, which are figurative but not literal uses of the vulgarities and apparently за́ бок (zá bok) which I only now hear, seem optional to me, the oftener I try to recall them, up to individual preference or even mood, some stilted, similar to Ssvb.
      и́зо дня (ízo dnja) seems like an archaism and бронь (bronʹ) I have never heard, and also not a coincidence that I never heard the set phrase за версту (za verstu) in either stress. Interestingly it shows that some such phrases, including при́ смерти (prí smerti), are idiomatic only in some regions and registers of the Russian language area (but до смерти (do smerti) has optionally either stress for me and is idiomatic anyhow).
      I also think that some of the phrases only have peculiar meaning and stress due to using a particular, stressed, sense of the preposition, namely за́ зиму (zá zimu) and за́ ночь (zá nočʹ) and на́ ночь (ná nočʹ) (all quite obligatory), which also constitutes the reason of the said figurative senses stress-stealing.
      There seems to be an overlooked, hardly satisfactorily described, part of Great Russian grammar that figurative prepositional phrases switch stress to attain emphasis for the figure. Fay Freak (talk) 22:21, 17 April 2024 (UTC)[reply]
      @Ssvb: Thanks for the ping. I think many of the clitics listed can be created but they have to be filtered case by case. Just a few things to consider:
      1. бе́з соли (béz soli) sounds weird. I never heard it. без со́ли (bez sóli) is not an expression, IMO.
      2. Both до до́му (do dómu) and до́ дому (dó domu) are valid. The latter sounds a bit rustic or folkloric.
      3. Both за́ голову (zá golovu) and за го́лову (za gólovu) are valid. The same is true for many cases.
      4. {{&lit}} can be used to clarify that a term can be both idiomatic and unidiomatic. Many clitics will fall into that.
      Off-topic: your comment "BTW, as a person from Belarus, I personally find a lot of these clitics weird and unnatural to various extent." surprises me. It's great if your Belarusian is better than your Russian, but unfortunately it seems not many Belarusians have mastered their own language. I heard 6% of Minsk citizens are fluent in Belarusian. I follow news from Belarus, and many Belarusians have given interviews or answered reporters' questions in perfect Russian. Also, you can even be arrested for showing a preference for speaking Belarusian over Russian. Anatoli T. (обсудить/вклад) 01:14, 18 April 2024 (UTC)[reply]
      Hey, I'm saying "béz soli", but "za nógi". Neither of them is really an expression, so why include such stuff? Only because of the clitics? But there are thousands of them. Tollef Salemann (talk) 10:05, 18 April 2024 (UTC)[reply]
      @Tollef Salemann: I have never heard "бе́з соли" myself, but it is mentioned in the Ushakov Dictionary. Why include such stuff? It's useful to inform users that the stress pattern may be unusual in some cases. Also, stress can be marked automatically in quotations; the Lua module just needs to identify the tricky cases and avoid marking stress in them. For now I can take the list from Russian Wiktionary, but it would be great to be able to rely on the information from English Wiktionary alone. Russian Wiktionary lists fewer than 200 of them. English Wiktionary currently mentions them in the notes of the declension tables, e.g. in the "нога́" entry. --Ssvb (talk) 13:22, 18 April 2024 (UTC)[reply]
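      The "identify tricky cases and avoid marking stress in them" step could look roughly like the sketch below. It is written in Python rather than Lua purely for illustration, and the phrase list is a made-up three-item sample, not the actual list from either Wiktionary.

```python
import re

# Hypothetical sample of unaccented (preposition, noun) pairs in which the
# preposition may take the stress; a real list would come from dictionary data.
STRESS_STEALING = {("за", "руку"), ("на", "ночь"), ("по", "воду")}

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Cyrillic included

def tricky_spans(text, pairs=STRESS_STEALING):
    """Return (start, end) spans of consecutive word pairs that are listed in `pairs`."""
    words = list(WORD_RE.finditer(text))
    spans = []
    for first, second in zip(words, words[1:]):
        if (first.group().lower(), second.group().lower()) in pairs:
            spans.append((first.start(), second.end()))
    return spans

quote = "Он взял меня за руку."
print([quote[a:b] for a, b in tricky_spans(quote)])
```

      An automatic stress marker could then skip (or flag for human review) any span returned here. The sketch ignores punctuation between the two words and inflected forms not in the list, both of which a real module would have to handle.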

      template for INN spellings of drugs

      I made these three edits this morning, each indicating in a slightly different way that a given entry is the INN spelling of the name of a drug. I wonder if we should have a standalone template for this. If not, could the {{altsp}} template be made such that we could put "INN" in the from field and have it link to Wikipedia or to the glossary? Best regards, Soap 08:58, 17 April 2024 (UTC)[reply]

      And we could also use {{altform}}, which is perhaps more accurate. At first I avoided altform because I thought its labels could only be parenthemes, but it seems that I can type
      {{altform|en|methamphetamine|from=INN}}
      and get
      INN form of methamphetamine
      So if we could only have INN link to Wikipedia (or to a glossary entry, if we prefer), I think this would be the best solution of all. Ideally, it would also categorize just as labels like US do. Then anyone could see all the INN spellings (at least the ones we get to) all laid out in a list. Soap 09:15, 17 April 2024 (UTC)[reply]
      Re {{altsp}} vs {{altform}}: use {{altsp}} if the pronunciation is the same, and {{altform}} if it's different. (There is an effort in the works to make {{altsp}} spell this out soon.) - -sche (discuss) 16:33, 17 April 2024 (UTC)[reply]
      Let me see what I can do in terms of making {{alt sp}} be explicit about this. Since I assume metamfetamine would be pronounced differently than methamphetamine (/t/ instead of /θ/), it should use {{alt form}}. Benwing2 (talk) 23:30, 17 April 2024 (UTC)[reply]
      @Soap What you are proposing can easily be done by adding an INN label. Benwing2 (talk) 23:32, 17 April 2024 (UTC)[reply]
      That looks like a nightmare:
      1) Not all dialects of English distinguish /t/ and /θ/.
      2) What of the possibility of such a word being written with 't' but pronounced with /θ/?
      Or is {{altform}} appropriate if it represents a distinct set of pronunciations? --RichardW57m (talk) 11:20, 18 April 2024 (UTC)[reply]

      Proposed Reorganization of Regional Hokkien

      I have some concerns about the current classification system for Regional Hokkien. The system appears to be based more on administrative divisions than on linguistic, particularly phonological, relationships. For example, the current classification treats Tong'an dialect as a branch of Xiamen dialect. While administratively Tong'an District is part of Xiamen City, the narrowly-defined Xiamen dialect (specifically the dialect of Xiamen City center, which is spoken in the southwest of Xiamen Island) actually belongs to the Zhangdong branch (漳東腔), differing from the true Tong'an dialect, which is used in Tong'an District, Xiang'an District, and Kinmen County.

      If we were to simplify and categorize Hokkien into just three dialects—Quanzhou, Zhangzhou, and Xiamen—based solely on major geographical areas, it might be feasible. However, treating these as three distinct major divisions fails to accurately reflect the relationships among various other dialect points.

      The current classification system
      ├── Hui'an (惠安)
      ├── Longyan (龍巖)
      ├── Non-Mainland
      │   └── Taiwanese (臺灣)
      │       ├── Hsinchu (新竹)
      │       ├── Kaohsiung (高雄)
      │       ├── Lukang (鹿港)
      │       ├── Penghu (澎湖)
      │       ├── Sanxia (三峽)
      │       ├── Taichung (臺中)
      │       ├── Tainan (臺南)
      │       ├── Taipei (臺北)
      │       └── Yilan (宜蘭)
      ├── Quanzhou (泉州)
      │   ├── Anxi (安溪)
      │   ├── Jinjiang (晉江)
      │   ├── Lukang (鹿港)
      │   ├── Philippine (菲律賓)
      │   └── Yongchun (永春)
      ├── Xiamen (廈門)
      │   └── Tong'an (同安)
      └── Zhangzhou (漳州)
          ├── Changtai (長泰)
          ├── Medan (棉蘭)
          ├── Penang (庇能)
          ├── Taichung (臺中)
          ├── Yilan (宜蘭)
          ├── Zhangping (漳平)
          └── Zhao'an (詔安)
      

      To address these issues, I propose adopting a new system based on the Dialectal Atlas of Southern Min (閩南地區方言地圖集) by Professor Ang Uijin. This system classifies the core Southern Min dialects into eight branches: Tong'an, Quanhai, Quanshan, Quanzhong, Zhangdong, Zhanghai, Zhangnan, and Zhangshan. The first four can be collectively referred to as "Quan dialects" (泉系方言), and the latter four as "Zhang dialects" (漳系方言).

      Additionally, considering geographical, historical, and political factors, Taiwanese Hokkien could be established as a separate branch, with its internal dialects also categorized under these eight branches or their sub-branches. For example, Lukang Hokkien could be categorized under the Quanzhong branch.

      This new classification system would not only represent the relationships between the dialects more precisely but also remain extensible for future adjustments. We do not need to add all sub-branches immediately, but I believe this approach could significantly improve the way Hokkien dialects are related and classified in Wiktionary.

      The proposed classification system (excluding Taiwanese)
      Hokkien
      ├── Tong'an (同安腔方言)
      │   └── Tong'an (同安話)
      │       ├── Tong'an (同安腔)
      │       └── Kinmen (金門腔)
      ├── Quanhai (泉海腔方言)
      │   ├── Northern Quanhai (泉海腔北片)
      │   │   ├── Chongwu (崇武話)
      │   │   ├── Honglai Majia (洪瀨馬甲話)
      │   │   ├── Luoyang (洛陽話)
      │   │   └── Shanyao (山腰話)
      │   ├── Old Quanhai (老泉海腔)
      │   │   ├── Huizhong (惠中話)
      │   │   └── Tuling (塗嶺話)
      │   ├── Shishi (石獅片)
      │   │   └── Shishi (石獅話)
      │   └── Southern Quanhai (泉海腔南片)
      │       ├── Jinjiang Fengze (晉江豐澤話)
      │       └── Xunpu (蟳埔話)
      ├── Quanshan (泉山腔方言)
      │   ├── Central Quanshan/Yongchun (泉山腔中片/永春話)
      │   │   ├── Inner Yongchun (內永春腔)
      │   │   └── Outer Yongchun (外永春腔)
      │   ├── Northern Quanshan (泉山腔北片)
      │   │   └── Dehua (德化話)
      │   └── Southern Quanshan (泉山腔南片)
      │       ├── Inner Anxi (內安溪腔)
      │       ├── Longjuan (龍涓腔)
      │       └── Outer Anxi (外安溪腔)
      ├── Quanzhong (泉中腔方言)
      │   ├── Central Quanzhong (泉中腔中片)
      │   │   ├── Nan'an (南安話)
      │   │   └── Shishan (詩山話)
      │   ├── Eastern Quanzhong (泉中腔東片)
      │   │   └── Licheng (鯉城話)
      │   ├── Southern Quanzhong (泉中腔南片)
      │   │   └── Shuitou Dongshi (水頭東石話)
      │   └── Western Quanzhong (泉中腔西片)
      │       └── Yingdu Cannei (英都參內話)
      ├── Zhangdong (漳東腔方言)
      │   ├── Mixed Zhangdong (混合漳東腔)
      │   │   └── Xiamen (廈門話)
      │   ├── Quanzhouized Zhangdong (泉化漳東腔)
      │   │   ├── Chenjing (陳井話)
      │   │   ├── Guankou (灌口話)
      │   │   └── Xiamen Shanchang (廈門山場話)
      │   └── Standard Zhangdong (正漳東腔)
      │       └── Jiaomei (角美話)
      ├── Zhanghai (漳海腔方言)
      │   ├── Haicheng (海澄話)
      │   │   ├── Guanxun (官潯腔)
      │   │   ├── Longjiao (隆教腔)
      │   │   └── Standard Haicheng (正海澄腔)
      │   ├── Nanjing (南靖話)
      │   │   ├── Chuanchang (船場腔)
      │   │   └── Nanjing (南靖腔)
      │   ├── Northern Pinghe (平和話北片)
      │   │   └── Pinghe (平和腔)
      │   ├── Southern Pinghe (平和話南片)
      │   │   ├── Mapu (馬鋪腔)
      │   │   ├── Nanpu (南浦腔)
      │   │   ├── Shiliu (石榴腔)
      │   │   └── Wuzhai (五寨腔)
      │   ├── Western Pinghe (平和話西片)
      │   │   ├── Jiufeng (九峰腔)
      │   │   ├── Luxi (蘆溪腔)
      │   │   └── Xiazhai (霞寨腔)
      │   ├── Yunxiao (雲霄話)
      │   │   ├── Yunling (雲陵腔)
      │   │   └── Yunxiao (雲霄腔)
      │   └── Zhangpu (漳浦話)
      │       ├── Fotan (佛壇腔)
      │       ├── Qianting (前亭腔)
      │       └── Standard Zhangpu (正漳浦腔)
      ├── Zhangnan (漳南腔方言)
      │   ├── Dongshan (東山話)
      │   │   ├── Dachan (大嵼腔)
      │   │   ├── Dongshan (東山腔)
      │   │   └── Tongling (銅陵腔)
      │   ├── Sidu (四都話)
      │   │   └── Sidu (四都腔)
      │   └── Zhao'an (詔安話)
      │       ├── Chencheng (陳城腔)
      │       ├── Qianlou (前樓腔)
      │       └── Zhao'an (詔安腔)
      └── Zhangshan (漳山腔方言)
          ├── Eastern Zhangshan (漳山腔東片)
          │   └── Changtai (長泰話)
          ├── Northern Zhangshan (漳山腔北片)
          │   ├── Hexi (和溪話)
          │   ├── Hua'an (華安話)
          │   ├── Mafang (馬坊話)
          │   └── Xiandu (仙都話)
          └── Southern Zhangshan (漳山腔南片)
              ├── Longhai (龍海話)
              └── Xiangcheng (薌城話)
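
      For anyone wanting to experiment with the hierarchy, a tree like the one above can be held as a nested mapping. The Python sketch below encodes only a small excerpt of it; the variable and function names are illustrative assumptions and do not reflect how Wiktionary's modules actually store lect data.

```python
# A small excerpt of the proposed tree; leaves are empty dicts.
HOKKIEN = {
    "Tong'an (同安腔方言)": {
        "Tong'an (同安話)": {"Tong'an (同安腔)": {}, "Kinmen (金門腔)": {}},
    },
    "Zhangdong (漳東腔方言)": {
        "Mixed Zhangdong (混合漳東腔)": {"Xiamen (廈門話)": {}},
        "Standard Zhangdong (正漳東腔)": {"Jiaomei (角美話)": {}},
    },
}

def find_paths(tree, name, prefix=()):
    """Yield every root-to-node path whose node's English label equals `name`."""
    for label, children in tree.items():
        path = prefix + (label,)
        if label.split(" (")[0] == name:
            yield path
        yield from find_paths(children, name, path)

for path in find_paths(HOKKIEN, "Xiamen"):
    print(" > ".join(path))
```

      Looking up "Tong'an" in the same excerpt yields three paths, one per level, which is a quick way to spot English names that are reused at several depths of the hierarchy.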
      

      I would appreciate feedback on the proposed classification system from other editors. If you have a moment, please take a look and share your thoughts.

      @Theknightwho, Singaporelang, Mar vin kaiser, Mlgc1998, 幻光尘, MistiaLorrelay, RcAlex36, Kangtw, your insights would be especially valuable. Thank you! --TongcyDai (talk) 16:32, 17 April 2024 (UTC)[reply]

      @TongcyDai I am probably at least partly responsible for the current system; I've been trying to clean up the various Chinese lects and what we have results from what was there before (even messier) along with some changes I've attempted to make based on Wikipedia. I have no attachment whatsoever to the current system and would welcome some cleanup from someone who better understands the dialect situation. Also pinging @Wpi, ND381 who might have thoughts. Benwing2 (talk) 21:02, 17 April 2024 (UTC)[reply]
      @Benwing2 I sincerely appreciate the substantial efforts you have dedicated to organizing and refining the classifications of Chinese lects on Wiktionary. It's clear that such a task is complex and challenging, and the progress achieved, even though not yet perfect, has greatly improved upon what was previously in place. Your dedication to enhancing these entries is invaluable to the community.
      Thank you for your openness to new approaches and improvements. While referencing Wikipedia has been a good starting point for improving the Hokkien classifications, I have noticed some discrepancies and occasional inaccuracies across various aspects. This observation has inspired me to propose some adjustments based on specialized academic works, aiming to further refine our classification system on Wiktionary. --TongcyDai (talk) 06:44, 18 April 2024 (UTC)[reply]
      @TongcyDai Of course. My only comment would be that I am generally in agreement with Justin that some of the intermediate nodes might not be needed, as they are generally the most controversial. Benwing2 (talk) 07:24, 18 April 2024 (UTC)[reply]
      @TongcyDai Thanks for pointing this out. I am generally in favour of what you propose, although we might not need to go as detailed as the 片 level from Ang Uijin. I am also curious to know if there are better English translations of these names; are they taken from Ang or translated by yourself? — justin(r)leung (t...) | c=› } 02:25, 18 April 2024 (UTC)[reply]
      Thank you for your supportive feedback. I agree that while we may not need to delve into the level of detail as outlined by Ang Uijin, maintaining a broad framework would indeed be beneficial.
      Regarding the translation of the dialect names, I translated them myself with Hanyu Pinyin, following Wiktionary's conventions, as no English equivalents were provided in the original Dialectal Atlas of Southern Min.
      The atlas, including its maps and appendices, does not provide translations but does assign codes to different branches and dialect points, such as the Zhanghai dialect ("Jc"), Zhangpu subdialect ("Jc1"), and Qianting sub-subdialect ("Jc1.3"), where "J" presumably stands for Zhangzhou and "c" might denote coastal, although the specific romanization scheme used is unclear.
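      To show how such codes nest, here is a minimal Python sketch that expands a code into its chain of ancestors. The pattern (two letters, an optional number, an optional dotted number) is inferred only from the three examples above, so it is an assumption and may not cover every code in the atlas.

```python
import re

# Inferred shape: branch letters, optional subdialect number, optional ".n" point.
CODE_RE = re.compile(r"^([A-Z][a-z])(?:(\d+)(?:\.(\d+))?)?$")

def code_ancestry(code):
    """Return the codes from branch level down to `code` itself."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognised code: {code!r}")
    branch, sub, subsub = match.groups()
    levels = [branch]
    if sub is not None:
        levels.append(branch + sub)
        if subsub is not None:
            levels.append(f"{branch}{sub}.{subsub}")
    return levels

print(code_ancestry("Jc1.3"))
```

      With the example from the atlas, "Jc1.3" expands to the Zhanghai branch code, the Zhangpu subdialect code, and the Qianting code itself. Codes of this kind could serve as unambiguous internal keys even where the English names coincide.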
      Additionally, the use of identical names at different hierarchical levels, such as Tong'an dialect (同安腔方言), Tong'an subdialect (同安話), and Tong'an sub-subdialect (同安腔)—with only their Chinese suffixes varying—presents a challenge for listing in Wiktionary. This overlap necessitates careful consideration of how to name these similarly titled branches to ensure clarity and adherence to Wiktionary's standards. If we need to classify these categories in such detail, it is crucial that we develop a systematic approach to differentiate and label them appropriately. --TongcyDai (talk) 07:28, 18 April 2024 (UTC)[reply]

      disambiguation of links to Wikipedia

      If a Wiktionary page has multiple definitions where further reading on Wikipedia would be useful, maybe each Wikipedia link should be adjacent to that Wiktionary definition, instead of all the Wikipedia links getting shuffled into a separate section? For example (copy-paste from Wiktionary:Tea_room/2024/April#API):

      i made a "sandbox" at Talk:API#proposed deviation from usual format. i think it might be more useful to put Wikipedia links with each definition instead of lumping all the Wikipedia links into a single =Further reading= section? --173.67.42.107 06:12, 10 April 2024 (UTC)[reply]

      I like this idea, because to my mind it is more useful, direct, and intuitive to users. My gut will not be surprised if other Wiktionarians dislike it. The Beer parlour ("for policy discussion and cross-entry discussion") is the correct place to propose it, rather than the Tea room ("for questions concerning particular words"). Quercus solaris (talk) 16:35, 10 April 2024 (UTC)[reply]

      END copy-paste --173.67.42.107 23:47, 17 April 2024 (UTC)[reply]

      Very attentive. Would use. Fay Freak (talk) 00:24, 18 April 2024 (UTC)[reply]
      A follow-up thought: In my opinion, if Wiktionary adopts this idea, then it would be best to use significant case for the WP link, instead of sentence case, irrespective of WP's page titles being sentence case. I have a solid reason for favoring this approach, which I can share if anyone requests it. TLDR: it's better for the Wiktionary environment. Quercus solaris (talk) 01:10, 18 April 2024 (UTC)[reply]
      Yeah, you’re right. I’m not in favour of this because Wikipedia links are not that important, and splitting them up means possibly putting too much information adjacent to definitions which some readers may not like. — Sgconlaw (talk) 04:45, 18 April 2024 (UTC)[reply]
      Yes, too much clutter and noise, distracting from the definitions. How about a small ref-style Wikipedia icon which jumps to the wp link in the "Further reading" section? Jberkel 07:46, 18 April 2024 (UTC)[reply]