Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(crawlable-anchors): ignore event listeners, validate url #13105

Merged

Conversation

svkrclg
Copy link
Contributor

@svkrclg svkrclg commented Sep 22, 2021

Summary

Related Issues/PRs

fixes #11818

@svkrclg svkrclg requested a review from a team as a code owner September 22, 2021 17:47
@svkrclg svkrclg requested review from connorjclark and removed request for a team September 22, 2021 17:47
@google-cla google-cla bot added the cla: yes label Sep 22, 2021
@svkrclg svkrclg changed the title don't consider onclick behaviour to check if link is crawable seo: don't consider onclick behaviour to check if link is crawable Sep 22, 2021
@svkrclg svkrclg changed the title seo: don't consider onclick behaviour to check if link is crawable misc(seo): don't consider onclick behaviour to check if link is crawable Sep 22, 2021
@svkrclg
Copy link
Contributor Author

svkrclg commented Sep 22, 2021

Not sure whether ignoring onclick behaviour completely is a good idea. Let me know.
TODO: fix smoke test

Copy link
Collaborator

@connorjclark connorjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether ignoring onclick behaviour completely is a good idea. Let me know.

Given we confirmed that search will only look at href, we're ok with ignoring click events entirely since this audit is only concerned about search engine crawlability.


Currently the audit does these things:

  1. Ignore links with role set
  2. Fail if href starts with file:
  3. Fail if href is an empty string
  4. Ignore links with a click event listener
  5. Fail if onclick has javascript:void(0)
  6. Fail if onclick has window.location/open

We want to drop items 4-6. Currently the PR drops 5-6 but not 4. Then you'll need to delete some tests which will newly fail.


Besides that, we can do a bit more to improve the audit (although if you don't want to tackle this part, we can do it in a new PR). As it is, this would pass:

<a href="http:://example.com">link</a>

Even though the URL is invalid (and thus definitely not crawlable). This is because we don't check href for anything but that it is non-empty and not file:. This should do the trick:

try {
	new URL(rawHref, URL.finalUrl); // URL can gotten from the first parameter of the `audit` function
} catch {
  return true;
}

@@ -57,12 +55,9 @@ class CrawlableAnchors extends Audit {
if (rawHref.startsWith('mailto:')) return;

const windowLocationRegExp = /window\.location=/;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can delete this too

@connorjclark connorjclark changed the title misc(seo): don't consider onclick behaviour to check if link is crawable core(crawlable-anchors): don't consider click event listeners Sep 22, 2021
@connorjclark connorjclark changed the title core(crawlable-anchors): don't consider click event listeners core(crawlable-anchors): ignore event listeners, validate url Oct 4, 2021
@connorjclark
Copy link
Collaborator

connorjclark commented Oct 4, 2021

Looking good!

For the smoke test, you can just flip these test cases from expecting "passing" to "failure":

  1. Change PASS->FAIL in html https://github.com/GoogleChrome/lighthouse/blob/master/lighthouse-cli/test/fixtures/seo/seo-tester.html#L75 (doesn't do anything, just documentation)
  2. Change expectation from pass to fail for this audit in seo-passing smoke
'crawlable-anchors': {
        score: 0,
        details: {
          items: {
            length: 2,
          },
        },
      }

yarn smoke seo-passing to confirm

Copy link
Collaborator

@connorjclark connorjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the assist!

@connorjclark connorjclark merged commit 19c5b71 into GoogleChrome:master Oct 12, 2021
@svkrclg svkrclg deleted the mark_href_as_valid_crawable_link branch October 13, 2021 05:26
@svkrclg
Copy link
Contributor Author

svkrclg commented Oct 13, 2021

Thanks for helping out @connorjclark .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Links are not crawlable with acceptable URL
3 participants