Create a language

Step 1: Enter details

Step 2: Add tokenizer

You can view the list of preset tokenizers on GitHub.

You can also write a custom tokenizer in JavaScript.

It should take a single argument, "text", and return the tokens.
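
As an illustration, here is a minimal sketch of a custom tokenizer. The function name tokenize and the choice to return an array of strings are assumptions. The only stated requirement is that it takes "text" and returns the tokens. This version skips whitespace, keeps words with inner apostrophes together, and emits each punctuation mark as its own token.

```javascript
// Hypothetical custom tokenizer: the name `tokenize` and the array-of-strings
// return value are assumptions; the tool only requires a "text" argument.
function tokenize(text) {
  // Either a run of letters/digits (allowing inner apostrophes, e.g. "don't")
  // or a single character that is neither whitespace nor a letter/digit.
  const pattern = /[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*|[^\s\p{L}\p{N}]/gu;
  // match() returns null when nothing matches, so fall back to an empty list.
  return text.match(pattern) || [];
}

// Example: returns ["Don't", "panic", ",", "it's", "fine", "."]
console.log(tokenize("Don't panic, it's fine."));
```

A regular expression keeps the sketch short; a real tokenizer for a given language may need different rules for hyphens, numbers, or scripts without spaces.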

Note: if you use a preset tokenizer, enter its name without the .js extension.

Step 3: Add grammar rules

These are hand-written grammar rules that complement the language model. Write each rule in JavaScript. Each rule should return a list of invalid or questionable locations. Examples can be found in the GitHub repository.

Implement the rules as a single function that returns a list of functions, each of which is one individual rule.
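
The structure described above might look like the following sketch. The container name rules, the idea that each rule is passed the token array from Step 2, and the reading of indexStart/indexEnd as inclusive token positions are all assumptions for illustration; check the examples in the GitHub repository for the exact conventions.

```javascript
// Hypothetical rules file. Assumptions (not specified by the tool): the
// container is named `rules`, each rule receives the token array produced
// by the tokenizer, and indexStart/indexEnd are inclusive token indices.
function rules() {
  // Rule: flag a word that immediately repeats the previous one ("the the").
  function repeatedWord(tokens) {
    const results = [];
    for (let i = 1; i < tokens.length; i++) {
      const isWord = /\p{L}/u.test(tokens[i]);
      if (isWord && tokens[i].toLowerCase() === tokens[i - 1].toLowerCase()) {
        results.push({
          indexStart: i - 1, // first copy of the word
          indexEnd: i, // repeated copy
          level: 2, // probably wrong
          suggestions: [tokens[i - 1]], // replace the pair with one copy
        });
      }
    }
    return results;
  }

  // Each function in the returned list is one individual rule.
  return [repeatedWord];
}

// Example: run every rule over a tokenized sentence.
const sampleTokens = ["I", "saw", "the", "the", "cat"];
for (const rule of rules()) {
  console.log(rule(sampleTokens));
}
```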

Each individual rule function should return a list of objects, each in the form {indexStart: [token start index], indexEnd: [token end index], level: [level], suggestions: [list of suggestions as strings]}. The level is 0 if the location is OK, 1 if it is questionable, and 2 if it is probably wrong.
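
A single rule can use the levels to separate questionable text from likely errors. The sketch below is hypothetical: the rule name articleBeforeVowel, the token-array input, and inclusive token indices are assumptions. It reports "a" before a vowel letter only as level 1, because a vowel letter does not always mean a vowel sound.

```javascript
// Hypothetical single rule illustrating the result format and levels
// (0 = OK, 1 = questionable, 2 = probably wrong). Assumes the rule receives
// the token array and that indexStart/indexEnd are inclusive token indices.
function articleBeforeVowel(tokens) {
  const results = [];
  for (let i = 0; i + 1 < tokens.length; i++) {
    const isA = tokens[i] === "a" || tokens[i] === "A";
    if (isA && /^[aeiou]/i.test(tokens[i + 1])) {
      results.push({
        indexStart: i,
        indexEnd: i,
        // Only questionable: spelling alone doesn't settle it
        // ("a unicorn" and "a one-off" are both correct).
        level: 1,
        suggestions: [tokens[i] === "A" ? "An" : "an"],
      });
    }
    // Locations that look fine are simply left out of the results here.
  }
  return results;
}

// Example: flags "A" before "apple" and "a" before "unicorn".
console.log(articleBeforeVowel(["A", "apple", "and", "a", "unicorn", "."]));
```

In a rules file, this would be one of the functions returned by the container function.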

Step 4: Add corpus

The corpus is used to train the language model.


Step 5: Train language model

Step 6: Save the language