You can view the list of presets on GitHub.
You can also write a custom tokenizer in JavaScript.
It should take a single argument, text, and return a list of tokens.
Note: if using a preset tokenizer, don't include the .js extension.
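A minimal sketch of what such a tokenizer could look like. The splitting logic here (runs of word characters, with punctuation as separate tokens) is only an illustrative assumption; check the examples on GitHub for the exact token format your setup expects.

```javascript
// Hypothetical custom tokenizer: takes the input text and returns a list
// of tokens. Words (including apostrophes) become one token each, and
// every punctuation mark becomes its own token; whitespace is dropped.
function tokenize(text) {
  const matches = text.match(/[\w']+|[^\s\w]/g);
  return matches || []; // match() returns null when nothing matches
}

// tokenize("Hello, world!") → ["Hello", ",", "world", "!"]
```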
These are manual grammar rules that can complement the language models. Write each of them in JavaScript. They should all return a list of invalid or questionable locations; examples can be found in the GitHub repository.
The rules should be implemented as a function that returns a list of functions, each of which is an individual rule.
Each individual rule function should return a list of objects of the form {indexStart: [token start index], indexEnd: [token end index], level: [level], suggestions: [list of suggestions as strings]}, where level 0 means OK, level 1 means questionable, and level 2 means probably wrong.
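A sketch of that shape, assuming the rule functions receive the token list as input. The specific rule shown (flagging a word repeated twice in a row) and the function names are made up for illustration; only the returned object format follows the convention described above.

```javascript
// Hypothetical rule module: returns a list of rule functions.
// Each rule takes the token list and returns a list of
// {indexStart, indexEnd, level, suggestions} objects, where
// level 0 = OK, 1 = questionable, 2 = probably wrong.
function getRules() {
  function doubledWordRule(tokens) {
    const results = [];
    for (let i = 1; i < tokens.length; i++) {
      if (tokens[i].toLowerCase() === tokens[i - 1].toLowerCase()) {
        results.push({
          indexStart: i - 1,
          indexEnd: i,
          level: 2, // probably wrong: same word twice in a row
          suggestions: [tokens[i - 1]], // suggest dropping the repeat
        });
      }
    }
    return results;
  }
  return [doubledWordRule];
}
```

For example, running the rule on the tokens ["the", "The", "cat"] would flag the span covering token indices 0 through 1 with level 2.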
This will be used to train the language model.