Suggestion: Search plugin algorithm improvement

topgeek · May 17, 2018, 8:32pm

Most searching these days is incremental (e.g. one letter is typed and added to the search term at a time).

If the Search.query() is enhanced to maintain state from the prior queryStr and queryResult (but really just row/col indexes), then each subsequence search would not need to search all cells, just the ones from the prior search.

Example: searching for “fo” followed by typing another ‘o’ to get “foo” would, by definition, mean that any “foo” results are a subset of “fo”.

As it stands now, the complexity is always rows*columns and as more letters are typed, the slower it gets instead of faster.

Clearing a search should be faster and maybe even a separate call. With caching incremental searches, the only cells that need to be cleared are the ones last matched. Clear() would be called when a incremental search hits a combo breaker.

Bonus speedup: because of the typing behavior, most users might mistakenly type one letter (e.g when looking for foobar, they type “food” vs “foob”). They would then delete one character after the typo. So, “food” becomes “foo”. If the search results of “foo” are cached in a history, then it’s an instance result set lookup.

topgeek · May 21, 2018, 10:54pm

OK. I actually coded this up and the speed improvement wasn’t that noticeable. Ends up that building the results into arrays is so much more expensive than read-only access searching all cells that building a cache kills most of performance gained from using it.

I think the poor performance I noticed was trying to hide rows as the user types a search term using “hiddenRows” plugin.

aleksandra_budnik · May 22, 2018, 10:05am

Hi @topgeek

I believe that this issue has been already raised here https://github.com/handsontable/handsontable/issues/3428 but maybe you can add there a comment from your investigation. It looks like you have really digged deeper than using a single demo

topgeek · May 23, 2018, 5:47pm

Posted tests + code + timing test results to that discussion thread.

aleksandra_budnik · May 24, 2018, 7:22am

Thank you @topgeek
you’ve done a great work!
I will ask our CTO to check it on next release planning meeting.

topgeek · May 24, 2018, 5:43pm

My github branch has two different approaches to solving the problem.

I’d like to get someone who works on the code more to give input on which is the better approach before I do a Pull Request.

First approach builds an array(array) for every cell, then uses a single loop for cached-cells and all-cells.
Pros:

It is easy to read/understand.
The search cost savings over multiple incr searches is really good
Cons:
It has a higher “first search” cost.
A little bit more memory.

Second approach uses two different loops for all-cells vs cached-cells and an inner function to do the common work.
Pros:

Lower “first search” cost
Little bit less memory
Cons:
Harder to read/maintain
Slower multiple search performance

Both approaches are in my github branch. You can compare the two differences github rev compare.

I think I prefer the my first attempt 9ba94e6

topgeek · May 24, 2018, 7:44pm

Just for completeness I uploaded the performance test that uses Rev 9ba94e6

aleksandra_budnik · May 30, 2018, 6:59am

Sorry for keeping you waiting. I’ve asked my colleague to check your approach and share his thoughts.

aleksandra_budnik · May 30, 2018, 9:41am

I just got a reply from our developer. He told me that

he’d also pick the first approach as the code is easier to read and it’s more clear what’s going on there.
He also points that for the methods and private properties shouldn’t be typed with an underscore _.
The Object.keys..forEachand in the for loop can be replaced by our predefined helpers helpers.
you could use the map for this._priorQueryResultsCached = {}; or even a WeakMap