Improved the doc search by including Levenshtein distance #15385

jroweboy · 2014-07-03T21:33:54Z

This enables the docs search function to be more forgiving for spelling mistakes. The algorithm works as a dynamic programming algorithm to detect the minimum number of changes required to the search parameter string in order to match any string in the search index. If the number of changes is less then a threshold (currently defined as 3), then the search parameter will be included as it is a possible misspelling of the word. Any results returned by the algorithm are sorted by distance and are ranked lower than results that are partial or exact matches (aka the matches returned by the original search algorithm). Additionally, the increment in the for loops in this file were using one of three different ways to increment (i += 1 i++ and ++i) so I just standardized it to ++i.

As an example, consider searching for the word String and accidentally typing in Strnig. The old system would return no results because it is a misspelling, but the Levenshtein distance between these two inputs is only two, which means that this will return String as a result. Additionally, it will return a few other results such as strong, and StdRng because these are also similar to Strnig. Because of the ranking system though, this change should be unobtrusive to anyone that spells the words correctly, as those are still ranked first before any Levenshtein results.

…hich enables the docs search function to be more forgiving for spelling mistakes

steveklabnik · 2014-07-03T21:55:42Z

src/librustdoc/html/static/main.js

@@ -58,7 +58,7 @@
            }
            $('#' + from)[0].scrollIntoView();
            $('.line-numbers span').removeClass('line-highlighted');
-            for (i = from; i <= to; i += 1) {
+            for (i = from; i <= to; ++i) {


Feels a bit funny to see this in a diff since Rust doesn't have this.

steveklabnik · 2014-07-03T21:56:26Z

This idea seems awesome!!!

alexcrichton · 2014-07-05T20:23:57Z

cc @brson about licensing

adrientetar · 2014-07-09T15:58:26Z

Additionally, the increment in the for loops in this file were using one of three different ways to increment (i += 1 i++ and ++i) so I just standardized it to ++i.

In C at least, i++ and ++i have different behaviors. ++i is pre-incrementation which inside a loop means increment then run the looping code.
Should I infer from your comment that both of these have the same behavior in JS?

jroweboy · 2014-07-09T18:32:05Z

@adrientetar In both C and Javascript, you are correct that i++ and ++i have different behaviors, but in both languages, when in a for loop step, they both mean the same thing. i++ means return the value and then increment, which implies a copy, and ++i means increment and then return the value which can be done in place. Modern compilers are quite aware of the for loop construct where for (i=0; i<10; i++); does not use the value returned by i++ and will optimize it into an ++i. What this means is in a for loop, you can use either i++ or ++i and they will have the same effect in both C and Javascript, but ++i will never be slower than i++ since in the worst case, the compiler will not optimize i++.

If you are using i++ in code though such as int j = intarray[i++]; then you are absolutely correct that there is a difference between i++ and ++i. This difference does not apply to the for loop afterthought or step portion though because that code is always run at then end of the block.

jroweboy · 2014-07-21T21:06:49Z

@brson Any updates on the status of this pull request?

huonw · 2014-07-24T11:56:24Z

If licensing is a problem, there is a Rust implementation in the standard library, which could be ported.

brson · 2014-07-31T00:16:26Z

Sorry for the delay. I wasn't aware of this PR.

I assume this is CC-SA licensed by virtue that all content on stackoverflow is CC-SA (a fact I did not realize until now).

I think we can accept the CC-SA code as-is. Thanks!

This enables the docs search function to be more forgiving for spelling mistakes. The algorithm works as a dynamic programming algorithm to detect the minimum number of changes required to the search parameter string in order to match any string in the search index. If the number of changes is less then a threshold (currently defined as 3), then the search parameter will be included as it is a possible misspelling of the word. Any results returned by the algorithm are sorted by distance and are ranked lower than results that are partial or exact matches (aka the matches returned by the original search algorithm). Additionally, the increment in the for loops in this file were using one of three different ways to increment (`i += 1` `i++` and `++i`) so I just standardized it to `++i`. As an example, consider searching for the word `String` and accidentally typing in `Strnig`. The old system would return no results because it is a misspelling, but the Levenshtein distance between these two inputs is only two, which means that this will return `String` as a result. Additionally, it will return a few other results such as `strong`, and `StdRng` because these are also similar to `Strnig`. Because of the ranking system though, this change should be unobtrusive to anyone that spells the words correctly, as those are still ranked first before any Levenshtein results.

…rget, r=Veykril proc-macro-test: Pass target to cargo invocation When cross compiling macos → dragonfly the dist build fails in the proc-maro-test-impl crate with the following error: `ld: unknown option: -z\nclang: error: linker command failed with exit code 1 (use -v to see invocation)` This appears to be a wart stemming from using an Apple host for cross compiling. Passing the target along to cargo allows it to pick up a linker that it understands and DTRT.

Improved the search by including a levenshtein distance calculation w…

6b21087

…hich enables the docs search function to be more forgiving for spelling mistakes

steveklabnik reviewed Jul 3, 2014
View reviewed changes

Updated the licensing information for the Levenshtein distance function

c614510

bors closed this Jul 31, 2014

bors merged commit c614510 into rust-lang:master Jul 31, 2014

Gankra mentioned this pull request Aug 24, 2014

Sort doc search results by their total module path length #16735

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved the doc search by including Levenshtein distance #15385

Improved the doc search by including Levenshtein distance #15385

jroweboy commented Jul 3, 2014

steveklabnik Jul 3, 2014

steveklabnik commented Jul 3, 2014

alexcrichton commented Jul 5, 2014

adrientetar commented Jul 9, 2014

jroweboy commented Jul 9, 2014

jroweboy commented Jul 21, 2014

huonw commented Jul 24, 2014

brson commented Jul 31, 2014

Improved the doc search by including Levenshtein distance #15385

Improved the doc search by including Levenshtein distance #15385

Conversation

jroweboy commented Jul 3, 2014

steveklabnik Jul 3, 2014

Choose a reason for hiding this comment

steveklabnik commented Jul 3, 2014

alexcrichton commented Jul 5, 2014

adrientetar commented Jul 9, 2014

jroweboy commented Jul 9, 2014

jroweboy commented Jul 21, 2014

huonw commented Jul 24, 2014

brson commented Jul 31, 2014