Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locale negotiation is incorrect for i18n routing #18676

Open
longlho opened this issue Nov 2, 2020 · 44 comments
Open

locale negotiation is incorrect for i18n routing #18676

longlho opened this issue Nov 2, 2020 · 44 comments
Assignees
Labels
Internationalization (i18n) Related to Internationalization with Next.js.

Comments

@longlho
Copy link
Contributor

longlho commented Nov 2, 2020

Bug report

Describe the bug

If I send Accept-Language: "fr-XX,en" and I have available locales fr, en, it should give me fr instead of en (current behavior)

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

Should be routed to fr

Screenshots

If applicable, add screenshots to help explain your problem.

System information

  • OS: [e.g. macOS, Windows]
  • Browser (if applies) [e.g. chrome, safari]
  • Version of Next.js: [e.g. 6.0.2]
  • Version of Node.js: [e.g. 10.10.0]

Additional context

Add any other context about the problem here.

@longlho
Copy link
Contributor Author

longlho commented Nov 3, 2020

this might be something formatjs can provide actually

@longlho
Copy link
Contributor Author

longlho commented Nov 12, 2020

@Timer I believe u can use the ResolveLocale abstract operation from ECMA402 (spec). We do have an implementation in @formatjs/ecma402-abstract: https://codesandbox.io/s/affectionate-chatelet-nvgl7?file=/src/App.tsx

ResolveLocale(
	["fr", "en"],
	["fr-XX", "en"],
	{ localeMatcher: "best fit" },
	[],
	{},
	() => "en"
)

should yield

{
	"locale":"fr",
	"dataLocale":"fr"
}

@arguiot
Copy link

arguiot commented Nov 17, 2020

I have the same issue here... very very annoying.

@WsCandy
Copy link

WsCandy commented Nov 19, 2020

I've just encountered this in a real world scenario, I have Italian set up as an it subpath. Desktop browsers set to Italian seem to match it. Safari on iOS only sends it-it in their headers, this results on the page falling back to English.

Here's a link: https://i18n.wscandy.vercel.app/

@arguiot
Copy link

arguiot commented Nov 20, 2020

@ijjk You have assigned yourself to this issue, but to save you time, I tried to understand how NextJS took into account the Accept-Language header, and realized that this part was managed by the hapi/accept module:
https://github.com/vercel/next.js/blob/master/packages/next/next-server/server/next-server.ts#L328

So, I've written a small script that could replace the accept.language method:

const regex = /((([a-zA-Z]+(-[a-zA-Z0-9]+){0,2})|\*)(;q=[0-1](\.[0-9]+)?)?)*/g;

function parse(al){
    const strings = (al || "").match(regex);
    return strings.map(m => {
        if(!m){
            return;
        }

        const bits = m.split(';');
        const ietf = bits[0].split('-');
        const hasScript = ietf.length === 3;

        return {
            code: ietf[0],
            script: hasScript ? ietf[1] : null,
            region: hasScript ? ietf[2] : ietf[1],
            quality: bits[1] ? parseFloat(bits[1].split('=')[1]) : 1.0
        };
    }).filter(r => r).sort((a, b) => b.quality - a.quality);
}

function pick(supportedLanguages, acceptLanguage) {
    const parsedAcceptedLanguage = parse(acceptLanguage);

    const supported = supportedLanguages.map(support => {
        const bits = support.split('-');
        const hasScript = bits.length === 3;

        return {
            code: bits[0],
            script: hasScript ? bits[1] : null,
            region: hasScript ? bits[2] : bits[1]
        };
    });

    for (let i = 0; i < parsedAcceptedLanguage.length; i++) {
        const lang = parsedAcceptedLanguage[i];
        const langCode = lang.code.toLowerCase();
        const langRegion = lang.region ? lang.region.toLowerCase() : lang.region;
        const langScript = lang.script ? lang.script.toLowerCase() : lang.script;
        let possible = []
        for (let j = 0; j < supported.length; j++) {
            const supportedCode = supported[j].code.toLowerCase();
            const supportedScript = supported[j].script ? supported[j].script.toLowerCase() : supported[j].script;
            const supportedRegion = supported[j].region ? supported[j].region.toLowerCase() : supported[j].region;
            if (langCode === supportedCode) {
                possible.push({ lang: supportedLanguages[j], supportedScript, supportedRegion });
            }
        }
        if (possible.length > 0) {
            const regionsFilter = possible.filter(({ supportedRegion }) => supportedRegion == langRegion)
            if (regionsFilter.length > 0) {
                const scriptFilter = regionsFilter.filter(({ supportedScript }) => supportedScript == langScript)
                if (scriptFilter.length > 0) {
                    return scriptFilter[0].lang
                }
                return regionsFilter[0].lang
            } else {
                return possible[0].lang
            }
        }
    }

    return null;
}

Which, once implemented/imported in the server (line 328) would resolve into:

pick(i18n.locales, req.headers['accept-language'])

I've tested my script on multiple cases, and I found it always took the right option (but I encourage to try it yourself). The only thing is that it works better by adding general language code before region/script specific languages: for example, to make sure en is selected it has to be placed first:

const languages = ["en", "en-US", "en-GB"]
pick(languages, "en-AU,en-US;q=0.9,en;q=0.8")

@longlho
Copy link
Contributor Author

longlho commented Nov 20, 2020

@arguiot unfortunately your locale negotiation algorithm is not correct. There are currently 2 algorithms:

RFC4647 is pretty rudimentary while UTS35 is more sophisticated and handle legacy aliases (e.g zh-TW -> zh-Hant-TW). libicu uses UTS35 and that's also what formatjs simulate as well.

@arguiot
Copy link

arguiot commented Nov 20, 2020

@longlho Which part is wrong exactly? The links you gave aren't algorithm at all, they are specifications (which is quite different). Currently, the problem is not about parsing Accept-Language header or selecting the most relevant locale, but a bit of both.

I'm not saying my algorithm is perfect, because it's not the case and not the goal. The problem with specifications is that not everyone follows them. The language picker has to be very flexible to make sure that it works for everyone.

There's room for improvement (so if you want to improve my work or someone else's work, please do). But I think it's better to quickly fix the issue and improve the language detection overtime. I believe it's always better to have something pretty reliable right now that will get better than having something absolutely perfect but in 3 months.

@longlho
Copy link
Contributor Author

longlho commented Nov 20, 2020

I'm not sure you understand locale negotiation. You need language matching data from CLDR to resolve them correctly. Just parsing locales and doing matching based on lang/region/script is not enough. The 2 algorithms are described in the 2 specs if you read them. "Not everyone following them" because they don't really understand its complexity and choose to just hack together solutions that work in the 20 tests they have.

I don't recommend just hacking a solution. Either use a library that follows the algorithm or implement the algorithm itself.

Formatjs already follows the spec so either use that or libicu

@arguiot
Copy link

arguiot commented Nov 20, 2020

I probably don't understand locale negotiation as well as you. I would like to understand why parsing locales and doing matching based on lang/region/script is not enough in our case.

The problem with what you're proposing is that it's not related to Accept-Language headers at all. Which basically means that in the ECMA402 implementation q weightings aren't respected.

Moreover, we would have to parse the Accept-Language ourself. I already solved the problem in my implementation.

I agree on the fact that it's better to use FormatJS, but we first need to solve the q weightings problem first.

Finally, the ResolveLocale functions makes bad decisions:

ResolveLocale(
            ["en-CA", "fr-CA"],
            ["fr-FR"],
            { localeMatcher: "best fit" },
            [],
            {},
            () => "en"
)

In that case, it should return fr-CA because it's the closest locale, but it returns en.

@longlho
Copy link
Contributor Author

longlho commented Nov 20, 2020

Q weightings determines the order of the requested locale array.

And in your example, yes, it should return "en". You're assuming if you understand a language in a certain region then you understand that in all regions, which is not true for traditional vs simplified Chinese, or latin american spanish vs spain spanish.

Using lang/region/script is not enough. Because zh-TW matches with zh-Hant, not zh-Hans or zh and that data lives in CLDR.

@arguiot
Copy link

arguiot commented Nov 21, 2020

@longlho I know what Q weightings are, but does the order matter in the ResolveLocale function?

This is a very tricky point, and I think we should let the developer decide. Maybe by letting the developer use its own resolver in next.config.js? And make the ECMA402 resolver the default one.

The problem is that not every browser will send the header with the region and the code: Chrome, will do it but Safari will only send en-US for example.

For global websites, having a perfect language resolver matters a lot. But for country specific website, like in Canada for example, only the language code matters.

I would like to have your opinion on this proposal. For me I think it would solve the problem (and close our little debate 😄).

@longlho
Copy link
Contributor Author

longlho commented Nov 21, 2020

The order absolutely matters in ResolveLocale. And tbh you should NOT let developers control the resolver bc it requires fairly extensive knowledge on how locale negotiation & CLDR data works, in conjunction with things like upgrading libicu/CLDR/unicode versions. Unless you have a dedicated team for it, it's not something that an average dev wants to deal with.

The problem is that not every browser will send the header with the region and the code: Chrome, will do it but Safari will only send en-US for example.

That's not entirely correct. Locale preferences are determined by both browser user settings and OSes. It contains more than just lang/script/region. If u read the UTS35 LDML spec it contains preferences like calendar, numbering systems and such (e.g ar-nu-latn is Arabic using Latin numbering system instead of Arabic ones). Calendar/numbering systems can be pulled from the OS by the browser. Browsers can choose to send those or not, due to potential fingerprinting issue.

For global websites, having a perfect language resolver matters a lot. But for country specific website, like in Canada for example, only the language code matters.

Canada has 2 official languages: English & French. CA also uses Celsius instead of Fahrenheit. There are a lot of differences in en vs en-CA (number grouping, grouping symbol, currencies...). I guess I'm not sure what you meant here.

Having worked with large engineering org I have not seen the need to have "custom" i18n functionalities that are not encompassed by Unicode or ICU due to high learning curve and complexity, so providing something standard out of the box is the norm.

arguiot added a commit to arguiot/next.js that referenced this issue Nov 23, 2020
@arguiot
Copy link

arguiot commented Nov 23, 2020

I must admit that you convinced me. I created a Pull Request (#19460) based on your code. It would be great if you could review the code and make sure everything is working properly.

@flayks
Copy link

flayks commented Nov 29, 2020

Experiencing the same issue on a Next 10 sites using built-in localization. The client using iOS Safari (I can reproduce it on my iPad) is not redirected to /fr but keeps seeing the content of the site with the default language (en). Works on any other browser though.

@timneutkens timneutkens modified the milestones: iteration 13, iteration 14 Dec 1, 2020
@hohl
Copy link

hohl commented Aug 25, 2021

Given that it seems that this issue isn't gonna resolved anytime soon: did somebody at least find a workaround? Some way to hijack the locale negotiation routine maybe?

@traviscrist
Copy link

I wish, I just ended up adding a language button to the bottom of my site which lets you flip from english to spanish.

Its amazing there is no work being done to fix this, if this was more than a personal site i’d consider is a major issue.

@longlho
Copy link
Contributor Author

longlho commented Aug 26, 2021

@hohl technically it is possible if you don't use built in next.js locale routing and just read directly from headers to negotiate

@hgezim
Copy link

hgezim commented Aug 26, 2021

I already solved the problem in my implementation.

@arguiot How did you solve it?

@hgezim
Copy link

hgezim commented Aug 26, 2021

Issue also presents itself in Firefox. I surely thought this was me not using Next.js properly. It's super odd that this bug has been allowed to persist since i18n was implemented in Next.js.

@chr4ss12
Copy link

facing same issue here,

@longlho do you know what happened with your PR? #19552 that looks good to me, I'll try to patch-package it and that should be good workaround for now

@longlho
Copy link
Contributor Author

longlho commented Aug 28, 2021

the PR is pretty old but that's the general idea. If you're using vercel the issue is upstream routing at the CDN level so that patch won't help.

@neb-b
Copy link

neb-b commented Aug 31, 2021

I can't tell if this has been reported yet, but it seems this is happening specifically with dynamic routes. If you create a new next.js project with a single pages/index.js file it works correctly. But as soon as I add a pages/[id].js it starts reporting the locale as whatever defaultLocale I have in my next.config.js

Since this seems to be related to the routing, it would be nice to have an option to use localeDetection: true and something like localeRedirection: false

@olivierpascal
Copy link

Any updates?

@topched
Copy link

topched commented Nov 1, 2021

Safari does NOT work for us pushing the locale into router.push it always goes back to english. Chrome works just fine. using next v 11.1.2

@olivierpascal
Copy link

Here is a simple expression of the problem:

$ curl 'https://mywebsite' -H 'Accept-Language: fr-FR'
Redirecting to /en (308)

$ curl 'https://mywebsite' -H 'Accept-Language: fr'
Redirecting to /fr (307)

@timuric
Copy link

timuric commented Nov 2, 2021

One possible workaround is to include all locales, like so:

const es = [
   "es",
   "es-AR",
   "es-BO",
   "es-CL",
   "es-CO",
   "es-CR",
   "es-DO",
   "es-EC",
   "es-ES",
   "es-GT",
   "es-HN",
   "es-MX",
   "es-NI",
   "es-PA",
   "es-PE",
   "es-PR",
   "es-PY",
   "es-SV",
   "es-UY",
   "es-VE"
];

const fr = ["fr", "fr-BE", "fr-CA", "fr-CH", "fr-FR", "fr-LU", "fr-MC"];

const de = ["de", "de-AT", "de-CH", "de-DE", "de-LI", "de-LU"];

const nl = ["nl", "nl-BE", "nl-NL"];

It creates a massive overhead for static page generation though.

I have tested multiple production next websites and unfortunately all of them have this flaw.

It is quite sad that this is a year old issue, I guess it is under reported because it is hard to spot and real users most likely would not report such UX issue.

@pozylon
Copy link

pozylon commented Nov 2, 2021

@timuric Our bloody workaround was to use _app, add a small hook so there is client-side redirection from / to /de when accpt-language resolves to de* and route is /. But of course it has major downsides

@timuric
Copy link

timuric commented Nov 2, 2021

@pozylon

@timuric Our bloody workaround was to use _app, add a small hook so there is client-side redirection from / to /de when accpt-language resolves to de* and route is /. But of course it has major downsides

Haha, we also tried that, but unfortunately it is not bullet proof :(

So far adding all possible locales does the job, but as I mentioned there is a massive overhead that can be a deal breaker for some static paths with many permutations.

@pozylon
Copy link

pozylon commented Nov 2, 2021

@timuric We use SSG for a webshop and I'd wait 2h for a build if I did that haha ^^

Well it's all crap we need a next-native solution, I'd prefer beeing able to define a custom locale resolver as a function:

{
  i18n: {
     resolveLocale(ctx) { return ctx.req.headers?["X-Startreck"] ? "klingon" : null }
  }
}

@longlho
Copy link
Contributor Author

longlho commented Nov 4, 2021

you guys can use https://formatjs.io/docs/polyfills/intl-localematcher to manually match locales

@styfle styfle modified the milestones: 11.1.x, 12.0.4 Nov 5, 2021
@timneutkens timneutkens removed this from the 12.0.5 milestone Nov 17, 2021
@timneutkens timneutkens added Internationalization (i18n) Related to Internationalization with Next.js. and removed point: 5 labels Nov 17, 2021
@ricamgar

This comment has been minimized.

@olivier-voutat
Copy link

olivier-voutat commented Jun 17, 2022

I had this same problem. My workaround is to do my _middleware file as this:

import { NextRequest, NextResponse } from 'next/server'

import { ACCEPTED_LOCALES, DEFAULT_LANGUAGE } from '../core/constants/languages'

const PUBLIC_FILE = /\.(.*)$/

export function middleware(req: NextRequest) {
	const shouldHandleLocale =
		!PUBLIC_FILE.test(req.nextUrl.pathname) &&
		!req.nextUrl.pathname.includes('/api/') &&
		req.nextUrl.locale === 'default'

	if (shouldHandleLocale) {
		const language = req.cookies['NEXT_LOCALE'] ?? getBrowserLanguage(req) ?? DEFAULT_LANGUAGE
		const url = req.nextUrl.clone()
		url.pathname = `/${language}${req.nextUrl.pathname}`
		return NextResponse.redirect(url)
	}

	return undefined
}

// Chrome automaticaly adds fallback to generic languages
// so this is not called when it is the used browser
const getBrowserLanguage = (req: NextRequest) => {
	return req.headers
		.get('accept-language')
		?.split(',')
		.map((i) => i.split(';'))
		?.reduce(
			(ac: { code: string; priority: string }[], lang) => [
				...ac,
				{ code: lang[0], priority: lang[1] }
			],
			[]
		)
		?.sort((a, b) => (a.priority > b.priority ? -1 : 1))
		?.find((i) => ACCEPTED_LOCALES.includes(i.code.substring(0, 2)))
		?.code?.substring(0, 2)
}

My ACCEPTED_LOCALES variable = ['en', 'fr', 'pt']

@olivier-voutat
Copy link

I had this same problem. My workaround is to do my _middleware file as this:

import { NextRequest, NextResponse } from 'next/server'

import { ACCEPTED_LOCALES, DEFAULT_LANGUAGE } from '../core/constants/languages'

const PUBLIC_FILE = /\.(.*)$/

export function middleware(req: NextRequest) {
	const shouldHandleLocale =
		!PUBLIC_FILE.test(req.nextUrl.pathname) &&
		!req.nextUrl.pathname.includes('/api/') &&
		req.nextUrl.locale === 'default'

	if (shouldHandleLocale) {
		const language = req.cookies['NEXT_LOCALE'] ?? getBrowserLanguage(req) ?? DEFAULT_LANGUAGE
		const url = req.nextUrl.clone()
		url.pathname = `/${language}${req.nextUrl.pathname}`
		return NextResponse.redirect(url)
	}

	return undefined
}

// Chrome automaticaly adds fallback to generic languages
// so this is not called when it is the used browser
const getBrowserLanguage = (req: NextRequest) => {
	return req.headers
		.get('accept-language')
		?.split(',')
		.map((i) => i.split(';'))
		?.reduce(
			(ac: { code: string; priority: string }[], lang) => [
				...ac,
				{ code: lang[0], priority: lang[1] }
			],
			[]
		)
		?.sort((a, b) => (a.priority > b.priority ? -1 : 1))
		?.find((i) => ACCEPTED_LOCALES.includes(i.code.substring(0, 2)))
		?.code?.substring(0, 2)
}

My ACCEPTED_LOCALES variable = ['en', 'fr', 'pt']

My test results with this workaround (en default language, browsers languages in this order: fr-CH, pt-BR, en-US)

Firefox

http://localhost:3000/ => http://localhost:3000/fr/ - language found in getBrowserLanguage function
http://localhost:3000/test-localize/ => http://localhost:3000/fr/test-localize/ - language found in getBrowserLanguage function

Chrome

http://localhost:3000/ => http://localhost:3000/fr/ - language fr sent by chrome, no call to getBrowserLanguage
http://localhost:3000/test-localize/ => http://localhost:3000/fr/test-localize/ - language found in getBrowserLanguage function

@xdamman
Copy link

xdamman commented Dec 19, 2023

$ curl 'https://mywebsite' -H 'Accept-Language: fr-FR'
Redirecting to /en (308) 🫣

$ curl 'https://mywebsite' -H 'Accept-Language: fr'
Redirecting to /fr (307) 🙂

$ curl 'https://mywebsite/path' -H 'Accept-Language: fr'
No redirect (uses default locale)  🫣
Expected: redirecting to /fr/path

@samcx samcx removed the kind: bug label Apr 30, 2024
@Lvdwardt

This comment has been minimized.

@chr4ss12
Copy link

chr4ss12 commented Oct 4, 2024

going to be almost 5 years and nextjs still can't show the language correctly on IOS Safari devices without some hacks through patch-package? :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internationalization (i18n) Related to Internationalization with Next.js.
Projects
None yet
Development

Successfully merging a pull request may close this issue.