Validate data from thousands of rows quickly (web worker ?)

Tags: #<Tag:0x00007f18b273fa38> #<Tag:0x00007f18b273f218>

Hello,
I am transforming an excel file into json (https://github.com/SheetJS/sheetjs) via a web work that I load into Handsontable.

This excel has tens of thousands of rows and around 50 columns.
The loading is done quite quickly (although I think we can do much better) however I put a validation set in the columns, examples:

window.hot = new Handsontable(container, {
data: json.slice(1) || [],
stretchH: ‘all’,
rowHeaders: true,
colHeaders: json[0],
filters: true,
dropdownMenu: true,
autoRowSize: false,
autoColumnSize: false,
multiColumnSorting: true,
columns: [{
// col1
validator: enumValidator
},
{
// col2
validator: customValidator
},
{
// col3
type: ‘numeric’,
},
{
// col4
type: ‘date’,
dateFormat: ‘YYYY-MM-DD’,
validator: beforeDateValidator
},
//
// etc …
//
]
});

The problem is that if I run the validations with hot.validateCells(); just after loading it either crashes or takes too long.

So I would like to execute hot.validateCells(); in a different way, either basically via a webworker, I tried several things but nothing works (I must be doing it wrong) or via another way that I don’t know…

Do you have any ideas or even a solution?

Thank you

Hi @Echoes

I am not sure if we can speed it up. If you have 50 columns x 15000 rows that gives the validator 750 k cells to check. It makes it less if not all of the cells have a validator. Here, for example, https://jsfiddle.net/handsoncode/sh9Lf3mj/ some of the columns have the text cell type (default) and they do not have a delicate validator. So in the console, we can see that there are 165 k calls for the cell validator.

In this case the best way to keep the performance on the high level is to provide pagination.

Thank you for your quick feedback, you think It is possible to recover, for example 10,000 lines of json with hot.getData() and send it to the worker (after loading the data), let the worker do all the validations and just send a json with the htinvalid classes in the right places?
For example this would give a Json of handsontable to send:
[a, b, 1, d]
And a return
[true, true, false (or the .htinvalid class), true]

Thank you

Validation on the server seems to save a lot of performance. But then it seems that you should avoid doing the server-side and client-side validation. Especially cause the htInvalid className is connected to client-side verification.

I was having something like this in my mind (not sure if it aligns with your requirements)