Suggestion: Search plugin algorithm improvement

Most searching these days is incremental (e.g. one letter is typed and added to the search term at a time).

If the Search.query() is enhanced to maintain state from the prior queryStr and queryResult (but really just row/col indexes), then each subsequence search would not need to search all cells, just the ones from the prior search.

Example: searching for “fo” followed by typing another ‘o’ to get “foo” would, by definition, mean that any “foo” results are a subset of “fo”.

As it stands now, the complexity is always rows*columns and as more letters are typed, the slower it gets instead of faster.

Clearing a search should be faster and maybe even a separate call. With caching incremental searches, the only cells that need to be cleared are the ones last matched. Clear() would be called when a incremental search hits a combo breaker.

Bonus speedup: because of the typing behavior, most users might mistakenly type one letter (e.g when looking for foobar, they type “food” vs “foob”). They would then delete one character after the typo. So, “food” becomes “foo”. If the search results of “foo” are cached in a history, then it’s an instance result set lookup.

OK. I actually coded this up and the speed improvement wasn’t that noticeable. Ends up that building the results into arrays is so much more expensive than read-only access searching all cells that building a cache kills most of performance gained from using it.

I think the poor performance I noticed was trying to hide rows as the user types a search term using “hiddenRows” plugin.

Hi @topgeek

I believe that this issue has been already raised here https://github.com/handsontable/handsontable/issues/3428 but maybe you can add there a comment from your investigation. It looks like you have really digged deeper than using a single demo

Posted tests + code + timing test results to that discussion thread.

Thank you @topgeek
you’ve done a great work!
I will ask our CTO to check it on next release planning meeting.

My github branch has two different approaches to solving the problem.

I’d like to get someone who works on the code more to give input on which is the better approach before I do a Pull Request.

First approach builds an array(array) for every cell, then uses a single loop for cached-cells and all-cells.
Pros:

  • It is easy to read/understand.
  • The search cost savings over multiple incr searches is really good
    Cons:
  • It has a higher “first search” cost.
  • A little bit more memory.

Second approach uses two different loops for all-cells vs cached-cells and an inner function to do the common work.
Pros:

  • Lower “first search” cost
  • Little bit less memory
    Cons:
  • Harder to read/maintain
  • Slower multiple search performance

Both approaches are in my github branch. You can compare the two differences github rev compare.

I think I prefer the my first attempt 9ba94e6

Just for completeness I uploaded the performance test that uses Rev 9ba94e6

Sorry for keeping you waiting. I’ve asked my colleague to check your approach and share his thoughts.

I just got a reply from our developer. He told me that

  • he’d also pick the first approach as the code is easier to read and it’s more clear what’s going on there.
  • He also points that for the methods and private properties shouldn’t be typed with an underscore _.
  • The Object.keys..forEachand in the for loop can be replaced by our predefined helpers helpers.
  • you could use the map for this._priorQueryResultsCached = {}; or even a WeakMap