Improved the doc search by including Levenshtein distance #15385
…hich enables the docs search function to be more forgiving for spelling mistakes
@@ -58,7 +58,7 @@
 }
 $('#' + from)[0].scrollIntoView();
 $('.line-numbers span').removeClass('line-highlighted');
-for (i = from; i <= to; i += 1) {
+for (i = from; i <= to; ++i) {
Feels a bit funny to see this in a diff since Rust doesn't have this.
This idea seems awesome!!!
cc @brson about licensing
In C at least,
@adrientetar In both C and Javascript, you are correct that If you are using
@brson Any updates on the status of this pull request?
If licensing is a problem, there is a Rust implementation in the standard library, which could be ported.
Sorry for the delay. I wasn't aware of this PR. I assume this is CC-SA licensed by virtue of the fact that all content on Stack Overflow is CC-SA (a fact I did not realize until now). I think we can accept the CC-SA code as-is. Thanks!
This enables the docs search function to be more forgiving of spelling mistakes. It uses a dynamic programming algorithm to compute the minimum number of changes needed to turn the search string into each string in the search index. If the number of changes is less than a threshold (currently defined as 3), the index entry is included in the results, since the search string is a possible misspelling of it. Results returned by the algorithm are sorted by distance and are ranked lower than partial or exact matches (that is, the matches returned by the original search algorithm).

Additionally, the for loops in this file were using three different ways to increment (`i += 1`, `i++`, and `++i`), so I standardized them to `++i`.

As an example, consider searching for the word `String` and accidentally typing `Strnig`. The old system would return no results because of the misspelling, but the Levenshtein distance between these two inputs is only two, so the search will now return `String` as a result. It will also return a few other results, such as `strong` and `StdRng`, because these are also similar to `Strnig`. Because of the ranking system, this change should be unobtrusive to anyone who spells the words correctly, as those matches are still ranked ahead of any Levenshtein results.
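For readers who want to see the idea concretely, here is a minimal sketch of the approach described above. It is not the code from this PR: the names `levenshtein`, `search`, and `MAX_DISTANCE` are hypothetical, the index is assumed to be a plain array of names, and the case-insensitive comparison is an assumption made so that `strong` and `StdRng` can match `Strnig`.

```javascript
// Hypothetical threshold name; the description says matches need a
// distance below 3.
var MAX_DISTANCE = 3;

// Textbook dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions, and substitutions turning a into b.
function levenshtein(a, b) {
    var prev = [], curr = [], i, j, cost, tmp;
    // Distance from the empty prefix of a to each prefix of b.
    for (j = 0; j <= b.length; ++j) {
        prev[j] = j;
    }
    for (i = 1; i <= a.length; ++i) {
        curr[0] = i;
        for (j = 1; j <= b.length; ++j) {
            cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
            curr[j] = Math.min(
                prev[j] + 1,        // deletion
                curr[j - 1] + 1,    // insertion
                prev[j - 1] + cost  // substitution (or match)
            );
        }
        // Reuse the two row buffers instead of allocating a full matrix.
        tmp = prev;
        prev = curr;
        curr = tmp;
    }
    return prev[b.length];
}

// Assumed ranking: substring matches first (standing in for the original
// search), then fuzzy matches sorted by ascending distance.
function search(query, names) {
    var q = query.toLowerCase(), direct = [], fuzzy = [], k, name, d;
    for (k = 0; k < names.length; ++k) {
        name = names[k];
        if (name.toLowerCase().indexOf(q) !== -1) {
            direct.push(name);
        } else {
            d = levenshtein(q, name.toLowerCase());
            if (d < MAX_DISTANCE) {
                fuzzy.push({ name: name, distance: d });
            }
        }
    }
    fuzzy.sort(function (x, y) { return x.distance - y.distance; });
    return direct.concat(fuzzy.map(function (r) { return r.name; }));
}
```

With this sketch, `levenshtein("String", "Strnig")` evaluates to 2, so `search("Strnig", ["Vec", "String"])` includes `String` and leaves out `Vec`.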
…rget, r=Veykril proc-macro-test: Pass target to cargo invocation When cross compiling macos → dragonfly the dist build fails in the proc-macro-test-impl crate with the following error: `ld: unknown option: -z\nclang: error: linker command failed with exit code 1 (use -v to see invocation)` This appears to be a wart stemming from using an Apple host for cross compiling. Passing the target along to cargo allows it to pick up a linker that it understands and DTRT.