Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved the doc search by including Levenshtein distance #15385

Merged
merged 2 commits into from
Jul 31, 2014

Conversation

jroweboy
Copy link
Contributor

@jroweboy jroweboy commented Jul 3, 2014

This enables the docs search function to be more forgiving for spelling mistakes. The algorithm works as a dynamic programming algorithm to detect the minimum number of changes required to the search parameter string in order to match any string in the search index. If the number of changes is less then a threshold (currently defined as 3), then the search parameter will be included as it is a possible misspelling of the word. Any results returned by the algorithm are sorted by distance and are ranked lower than results that are partial or exact matches (aka the matches returned by the original search algorithm). Additionally, the increment in the for loops in this file were using one of three different ways to increment (i += 1 i++ and ++i) so I just standardized it to ++i.

As an example, consider searching for the word String and accidentally typing in Strnig. The old system would return no results because it is a misspelling, but the Levenshtein distance between these two inputs is only two, which means that this will return String as a result. Additionally, it will return a few other results such as strong, and StdRng because these are also similar to Strnig. Because of the ranking system though, this change should be unobtrusive to anyone that spells the words correctly, as those are still ranked first before any Levenshtein results.

…hich enables the docs search function to be more forgiving for spelling mistakes
@@ -58,7 +58,7 @@
}
$('#' + from)[0].scrollIntoView();
$('.line-numbers span').removeClass('line-highlighted');
for (i = from; i <= to; i += 1) {
for (i = from; i <= to; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit funny to see this in a diff since Rust doesn't have this.

@steveklabnik
Copy link
Member

This idea seems awesome!!!

@alexcrichton
Copy link
Member

cc @brson about licensing

@adrientetar
Copy link
Contributor

Additionally, the increment in the for loops in this file were using one of three different ways to increment (i += 1 i++ and ++i) so I just standardized it to ++i.

In C at least, i++ and ++i have different behaviors. ++i is pre-incrementation which inside a loop means increment then run the looping code.
Should I infer from your comment that both of these have the same behavior in JS?

@jroweboy
Copy link
Contributor Author

jroweboy commented Jul 9, 2014

@adrientetar In both C and Javascript, you are correct that i++ and ++i have different behaviors, but in both languages, when in a for loop step, they both mean the same thing. i++ means return the value and then increment, which implies a copy, and ++i means increment and then return the value which can be done in place. Modern compilers are quite aware of the for loop construct where for (i=0; i<10; i++); does not use the value returned by i++ and will optimize it into an ++i. What this means is in a for loop, you can use either i++ or ++i and they will have the same effect in both C and Javascript, but ++i will never be slower than i++ since in the worst case, the compiler will not optimize i++.

If you are using i++ in code though such as int j = intarray[i++]; then you are absolutely correct that there is a difference between i++ and ++i. This difference does not apply to the for loop afterthought or step portion though because that code is always run at then end of the block.

@jroweboy
Copy link
Contributor Author

@brson Any updates on the status of this pull request?

@huonw
Copy link
Member

huonw commented Jul 24, 2014

If licensing is a problem, there is a Rust implementation in the standard library, which could be ported.

@brson
Copy link
Contributor

brson commented Jul 31, 2014

Sorry for the delay. I wasn't aware of this PR.

I assume this is CC-SA licensed by virtue that all content on stackoverflow is CC-SA (a fact I did not realize until now).

I think we can accept the CC-SA code as-is. Thanks!

bors added a commit that referenced this pull request Jul 31, 2014
This enables the docs search function to be more forgiving for spelling mistakes. The algorithm works as a dynamic programming algorithm to detect the minimum number of changes required to the search parameter string in order to match any string in the search index. If the number of changes is less then a threshold (currently defined as 3), then the search parameter will be included as it is a possible misspelling of the word. Any results returned by the algorithm are sorted by distance and are ranked lower than results that are partial or exact matches (aka the matches returned by the original search algorithm). Additionally, the increment in the for loops in this file were using one of three different ways to increment (`i += 1` `i++` and `++i`) so I just standardized it to `++i`.

As an example, consider searching for the word `String` and accidentally typing in `Strnig`. The old system would return no results because it is a misspelling, but the Levenshtein distance between these two inputs is only two, which means that this will return `String` as a result. Additionally, it will return a few other results such as `strong`, and `StdRng` because these are also similar to `Strnig`. Because of the ranking system though, this change should be unobtrusive to anyone that spells the words correctly, as those are still ranked first before any Levenshtein results.
@bors bors closed this Jul 31, 2014
@bors bors merged commit c614510 into rust-lang:master Jul 31, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 18, 2023
…rget, r=Veykril

proc-macro-test: Pass target to cargo invocation

When cross compiling macos → dragonfly the dist build fails in the proc-maro-test-impl crate with the following error:

`ld: unknown option: -z\nclang: error: linker command failed with exit code 1 (use -v to see invocation)`

This appears to be a wart stemming from using an Apple host for cross compiling.  Passing the target along to cargo allows it to pick up a linker that it understands and DTRT.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants