-
Notifications
You must be signed in to change notification settings - Fork 9.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i18n: allow strings with duplicate message and descriptions #12723
Conversation
@@ -598,54 +598,43 @@ function writeStringsToCtcFiles(locale, strings) { | |||
} | |||
|
|||
/** | |||
* This function does three things: | |||
* This function does two things: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry @patrickhulce, this would have been nice to know before your changes. And I've bungled up your nice cascade of checks a bit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, no worries. It's still more straightforward to read IMO so still a win :)
throw new Error(`Each strings' \`message\` or \`description\` must be different for the translation pipeline. The following keys did not have unique \`meaning\` values:\n\n${debugCollisionsMessage}`); | ||
// We have duplicate messages with different descriptions. Disambiguate using `meaning` for TC. | ||
for (const ctc of messageGroup) { | ||
ctc.meaning = ctc.description; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few ways to do this. For instance, if this messageGroup
had three strings with one description
and two strings with a different description
, the three-string set could get no meaning
while the two-string set did get a meaning
.
However since this is currently a hypothetical situation and meaning
is hashed into the ID (not explicitly included in the messages), it doesn't matter a whole lot and assigning a meaning to any messageGroup
with multiple description
s is a lot simpler.
} | ||
|
||
// We survived fatal collisions, now check that the known collisions match our known list. | ||
const collidingMessages = allCollisions.map(collision => collision[1].message).sort(); | ||
// Check that the known collisions match our known list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to bikeshed on "collisions" since it feels not as descriptive now
Second extract/dump was successful, so I think we're good here. I do have a lingering fear I'm missing something (some placeholder business or something? Feels like that should be handled with |
thanks! |
I started looking into fatal
collect-strings
collisions after https://github.com/GoogleChrome/lighthouse/pull/12697/files#r658831618 when it became clear that thatlighthouse-core/audits/accessibility/aria-progressbar-name.js | description
has been accidentally colliding withlighthouse-core/audits/accessibility/aria-treeitem-name.js | description
for quite some time and we (and TC) haven't noticed.It seems like something has changed with collisions. At least from what I can understand from inspecting some of the intermediate files in the pipeline, internally those two strings got deduped into a single message for translation (they share an ID which is hashed from the
message
andmeaning
), then properly duplicated again when dumped into our LHL format. Basically exactly what you'd hope for. From docs I've looked at now, it's actually encouraged that we allow colliding strings and only add ameaning
if we really want the strings to be translated separately, so it's possible this was a bug in the handler that ingested our CTC format that was incidentally fixed in the last few years.To test this, I took four currently colliding accessibility messages and made their descriptions the same as well so they'd be full collisions. They appear to have gone through a full TC roundtrip once, but I'm running again just to be sure :)
If this all looks good, we can probably follow up with setting a lot of the current collision list be full duplicates (no reason each stack pack needs "WordPress"/"Drupal"/"Joomla" in their descriptions when the string is just about using "HTML5 video"). Sometimes an explicit approach like @adamraine's in #12714 (comment) will make more sense when a lot of strings are explicitly being shared between files, but it's also nice to know that strings that collide by happenstance are no big deal.