corrlex
& corrlexalgos
modules).
Data Structures | |
struct | SolutionPart |
Lexical correction solution part. | |
struct | Solution |
Lexical correction solution. | |
struct | SolutionSet |
Lexical correction solution set. | |
Defines | |
#define | SOLUTION_SET_ALLOC_INCREMENT 30 |
Number of added elements during a SolutionSet (re)allocation. | |
#define | DEFAULT_CORRECTION WEIGHTED_CORRECTION |
Default correction mode. | |
#define | COST_RANGE 0.5 |
Cost range of the solutions to include during a spellCorrectFlat call when the max_solutions parameter is set to 0. | |
#define | COST_FORMAT "%4.2f" |
Format specification to provide to printf family functions when printing a Weight type value. | |
Typedefs | |
typedef double | Weight |
Cost of a lexical transformation. | |
typedef GSList | PositionList |
Singly-linked list of string indices. | |
Enumerations | |
enum | CorrectionMode { BASIC_CORRECTION = 0, WEIGHTED_CORRECTION, SPLITED_CORRECTION } |
Lexical correction mode. | |
Functions | |
void | solutionSetEnlarge (SolutionSet **solution_set, size_t *current_size, size_t *allocated_size) |
void | parsingChartGetMaxLexemes (ParsingChart *chart, const char *head, StringArray *words) |
void | getCorrection (const Lexicon *lexicon, const LexicalEntryIndex entry_index, const Weight cost, const gboolean get_word, GString *output) |
void | positionListAdd (PositionList **position_list, const short int position) |
ParsingChart * | spellCorrectChart (const char *input_string, const Lexicon *lexicon, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost) |
ParsingChart * | lexematize (const char *input_string, const Lexicon *lexicon, int(*is_space)(int), int(*is_never_delimiter)(int), int(*is_glueable)(int)) |
void | solutionSetFree (SolutionSet *solutions_set) |
void | spellCorrectFlat (const char *input_string, LexicalAccessTable *lexical_access_table, const int max_solutions, const CorrectionMode mode, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost, SolutionSet *solutions_set) |
void | solutionGetString (const LexicalAssocMem *lam, const SolutionSet *solution_set, size_t index, const char *delimiter, GString *output) |
SlpTK Library 0.6.0
<lexicalcorrection.h>
Antonin Merçay (revision on 14.12.2004)
#define DEFAULT_CORRECTION WEIGHTED_CORRECTION |
enum CorrectionMode |
Lexical correction mode.
Specifies the lexical correction mode to apply during a spellCorrectFlat call.
BASIC_CORRECTION | Lexical correction that uses only insertion, deletion and substitution operations. Every operation has a unit cost |
WEIGHTED_CORRECTION | Lexical correction similar to BASIC_CORRECTION, but that also takes into account accenting, capital/small letter conversion and blank character insertion/deletion operations. Each of these three operations can have its own cost, which is not necessarily a whole number |
SPLITED_CORRECTION | Lexical correction similar to WEIGHTED_CORRECTION, but where the insertion/deletion of blank characters can occur between words, i.e. the correction result may consist of a sequence of several words |
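As an illustration of the cost model that BASIC_CORRECTION describes (insertion, deletion and substitution, each with a unit cost), the following is a minimal sketch of the classic edit distance. The function name `basic_edit_cost` is hypothetical and is not part of SlpTK; the library's own algorithm works against a lexicon and may differ substantially.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of SlpTK: unit-cost edit distance
 * (insertion, deletion, substitution) computed with two rolling rows.
 * Returns -1.0 if memory cannot be allocated. */
double basic_edit_cost(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof *prev);
    size_t *curr = malloc((lb + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0;
    }
    /* Turning the empty string into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        curr[0] = i;  /* turning a[0..i) into "" takes i deletions */
        for (size_t j = 1; j <= lb; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1;
            size_t ins = curr[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            curr[j] = best < ins ? best : ins;
        }
        /* The row just filled becomes the previous row. */
        size_t *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    double cost = (double)prev[lb];
    free(prev);
    free(curr);
    return cost;
}
```

The result is returned as a `double` only to match the `Weight` typedef documented above; with unit costs it is always a whole number.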
void getCorrection | ( | const Lexicon * | lexicon, | |
const LexicalEntryIndex | entry_index, | |||
const Weight | cost, | |||
const gboolean | get_word, | |||
GString * | output | |||
) |
Append a lexical correction to an output string buffer.
[in] | lexicon | The reference vocabulary lexicon |
[in] | entry_index | The index of the corrected word in the lexicon |
[in] | cost | The cost of the required lexical correction (0 to avoid printing the cost) |
[in] | get_word | Set if the spelling of the correct word must be extracted from the vocabulary lexicon |
output | The string buffer where to append the correction |
affiche_correction
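The cost convention above (a cost of 0 suppresses cost printing) combined with the COST_FORMAT define can be sketched as follows. This is only an illustration: `format_correction` is a hypothetical helper, it writes to a plain `char` buffer rather than a GString, and the "word (cost)" layout is an assumption, not the format produced by getCorrection.

```c
#include <stdio.h>

#define COST_FORMAT "%4.2f"  /* value taken from the Defines table */

/* Hypothetical helper, not part of SlpTK: writes `word` into `buf`,
 * followed by its cost in parentheses unless the cost is 0.
 * The "word (cost)" layout is an assumption made for this sketch.
 * Returns the value of snprintf (length the full text would need). */
int format_correction(char *buf, size_t size, const char *word, double cost)
{
    if (cost == 0.0)
        return snprintf(buf, size, "%s", word);
    return snprintf(buf, size, "%s (" COST_FORMAT ")", word, cost);
}
```

Because COST_FORMAT is a string literal, it can be concatenated with the surrounding format text at compile time, as done in the second `snprintf` call.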
ParsingChart * lexematize | ( | const char * | input_string, | |
const Lexicon * | lexicon, | |||
int(*)(int) | is_space, | |||
int(*)(int) | is_never_delimiter, | |||
int(*)(int) | is_glueable | |||
) |
Lexematization algorithm (in other words, lexical correction with null cost) that cuts up an input string in lexical tokens.
[in] | input_string | The input string to lexematize |
[in] | lexicon | The reference lexicon containing the reference lexemes |
[in] | is_space | The blank character classification routine |
[in] | is_never_delimiter | The classification routine for characters that are never delimiter |
[in] | is_glueable | The glueable character classification routine |
Correction_Zero
& Lexematise
void parsingChartGetMaxLexemes | ( | ParsingChart * | chart, | |
const char * | head, | |||
StringArray * | words | |||
) |
Extract, from left to right, the sequence of lexemes that covers a sentence processed by lexematize. The lexemes are written to a StringArray in which each unknown word is prefixed by the provided head parameter.
[in] | chart | The considered parsing chart |
[in] | head | The prefix to insert before unknown words |
[out] | words | The array where to output the solution |
Solution_Max_Treillis (from Christophe de Benoit's project)
void positionListAdd | ( | PositionList ** | position_list, | |
const short int | position | |||
) |
Add a value to a position list sorted in ascending order
position_list | The position list where to add | |
position | The value to add to the list |
ajoute_liste_pos_chaine
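Insertion into a list kept in ascending order can be sketched as below. To keep the example free of the GLib dependency, a small `PosNode` struct stands in for GSList, and `pos_list_add` and `pos_list_free` are hypothetical names. Whether positionListAdd keeps duplicate values is not stated here; this sketch keeps them.

```c
#include <stdlib.h>

/* Stand-in for GSList so that the sketch compiles without GLib. */
typedef struct PosNode {
    short int position;
    struct PosNode *next;
} PosNode;

/* Hypothetical counterpart of positionListAdd: inserts `position`
 * so that the list stays sorted in ascending order.
 * Returns 0 on success, -1 if the node cannot be allocated. */
int pos_list_add(PosNode **list, short int position)
{
    PosNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->position = position;
    /* Advance to the first link whose node holds a larger value. */
    while (*list != NULL && (*list)->position <= position)
        list = &(*list)->next;
    node->next = *list;
    *list = node;
    return 0;
}

/* Releases every node of the list. */
void pos_list_free(PosNode *list)
{
    while (list != NULL) {
        PosNode *next = list->next;
        free(list);
        list = next;
    }
}
```

Passing the list head by address, as the documented `PositionList **` parameter suggests, lets the function replace the head when the new value is the smallest.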
void solutionGetString | ( | const LexicalAssocMem * | lam, | |
const SolutionSet * | solution_set, | |||
size_t | index, | |||
const char * | delimiter, | |||
GString * | output | |||
) |
Convert a lexical correction solution into its equivalent string representation
[in] | lam | The LexicalAssocMem that contains the strings to convert to. |
[in] | solution_set | The solution set that contains the solutions to convert from. |
[in] | index | The index of the solution inside solution_set |
[in] | delimiter | The string to insert between the words of the solution. If NULL is specified, a single space is used. |
[out] | output | The string where to output the result |
Solution_Vers_String
void solutionSetEnlarge | ( | SolutionSet ** | solution_set, | |
size_t * | current_size, | |||
size_t * | allocated_size | |||
) |
Enlarge a solution set by one element.
solution_set | The solution set to enlarge | |
current_size | The number of elements currently used (incremented after function completion) | |
allocated_size | The number of elements currently allocated (may be increased after function completion) |
augmente_ens_sol
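The parameter descriptions above suggest a chunked growth scheme: the used size grows by one on every call, while the allocation grows by SOLUTION_SET_ALLOC_INCREMENT only when it is full. A minimal sketch under that assumption follows. `DemoSolution` and `demo_set_enlarge` are hypothetical; the real SolutionSet layout and error handling are not documented here.

```c
#include <stdlib.h>

#define SOLUTION_SET_ALLOC_INCREMENT 30  /* value from the Defines table */

/* Hypothetical element type; the real SolutionSet layout is not shown. */
typedef struct {
    double cost;
} DemoSolution;

/* Makes room for one more element.
 * Returns 0 on success, -1 on allocation failure, in which case
 * *set, *current_size and *allocated_size are left unchanged. */
int demo_set_enlarge(DemoSolution **set, size_t *current_size,
                     size_t *allocated_size)
{
    if (*current_size == *allocated_size) {
        /* Storage is full: grow it by a fixed number of elements. */
        size_t new_alloc = *allocated_size + SOLUTION_SET_ALLOC_INCREMENT;
        DemoSolution *grown = realloc(*set, new_alloc * sizeof **set);
        if (grown == NULL)
            return -1;
        *set = grown;
        *allocated_size = new_alloc;
    }
    (*current_size)++;
    return 0;
}
```

Starting from an empty set (`NULL`, 0, 0), the first call allocates 30 elements and the 31st call raises the allocation to 60.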
void solutionSetFree | ( | SolutionSet * | solutions_set | ) |
Free the memory allocated to a solution set
solutions_set | The solution set to free |
Libere_Ensemble_Solutions
ParsingChart * spellCorrectChart | ( | const char * | input_string, | |
const Lexicon * | lexicon, | |||
const Weight | max_cost, | |||
const Weight | mark_cost, | |||
const Weight | capital_cost, | |||
const Weight | blank_cost | |||
) |
Correct a string using the words stored in a lexicon, up to a given lexical transformation cost. The operation returns a lattice (stored in a parsing chart) that contains all the word sequences found.
[in] | input_string | The input string to correct |
[in] | lexicon | The reference lexicon containing the reference lexemes |
[in] | max_cost | The maximal allowed correction cost between the input and a solution |
[in] | mark_cost | The cost of an accent addition/removal transformation |
[in] | capital_cost | The cost of a capital/small letter transformation |
[in] | blank_cost | The cost of a blank character insertion/deletion transformation |
Correction_Treillis
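How per-operation costs such as capital_cost and blank_cost can enter an edit distance is sketched below for two plain strings. This is an illustration only: `weighted_edit_cost` and its helpers are hypothetical, accent handling (mark_cost) is left out because it depends on the character encoding, and the library's lexicon-driven search is not reproduced. A caller could compare the result against max_cost to accept or reject a candidate.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Cost of inserting or deleting character c: blanks use blank_cost. */
static double indel_cost(char c, double blank_cost)
{
    return isspace((unsigned char)c) ? blank_cost : 1.0;
}

/* Cost of replacing x by y: a pure case change uses capital_cost. */
static double subst_cost(char x, char y, double capital_cost)
{
    if (x == y)
        return 0.0;
    if (tolower((unsigned char)x) == tolower((unsigned char)y))
        return capital_cost;
    return 1.0;
}

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical helper, not part of SlpTK: weighted edit distance between
 * a and b. Returns -1.0 if memory cannot be allocated. */
double weighted_edit_cost(const char *a, const char *b,
                          double capital_cost, double blank_cost)
{
    size_t la = strlen(a), lb = strlen(b), w = lb + 1;
    double *d = malloc((la + 1) * w * sizeof *d);
    if (d == NULL)
        return -1.0;
    d[0] = 0.0;
    for (size_t i = 1; i <= la; i++)   /* first column: deletions only */
        d[i * w] = d[(i - 1) * w] + indel_cost(a[i - 1], blank_cost);
    for (size_t j = 1; j <= lb; j++)   /* first row: insertions only */
        d[j] = d[j - 1] + indel_cost(b[j - 1], blank_cost);
    for (size_t i = 1; i <= la; i++) {
        for (size_t j = 1; j <= lb; j++) {
            double sub = d[(i - 1) * w + j - 1]
                       + subst_cost(a[i - 1], b[j - 1], capital_cost);
            double del = d[(i - 1) * w + j] + indel_cost(a[i - 1], blank_cost);
            double ins = d[i * w + j - 1] + indel_cost(b[j - 1], blank_cost);
            d[i * w + j] = min3(sub, del, ins);
        }
    }
    double cost = d[la * w + lb];
    free(d);
    return cost;
}
```

With capital_cost = 0.25 and blank_cost = 0.5, changing only the case of one letter costs 0.25 and removing one blank costs 0.5, both cheaper than an ordinary unit-cost operation.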
void spellCorrectFlat | ( | const char * | input_string, | |
LexicalAccessTable * | lexical_access_table, | |||
const int | max_solutions, | |||
const CorrectionMode | mode, | |||
const Weight | max_cost, | |||
const Weight | mark_cost, | |||
const Weight | capital_cost, | |||
const Weight | blank_cost, | |||
SolutionSet * | solutions_set | |||
) |
Correct a string using the words stored in a lexical access table, up to a given lexical transformation cost. The operation returns a solution set in which each (flat) solution is a sequence of recognized words.
[in] | input_string | The string to correct |
[in] | lexical_access_table | The lexical memory containing the recognized words |
[in] | max_solutions | The maximal number of solutions to output:
|
[in] | mode | The correction mode used |
[in] | max_cost | The maximal allowed correction cost between the input and a solution |
[in] | mark_cost | The cost of an accent addition/removal transformation |
[in] | capital_cost | The cost of a capital/small letter transformation |
[in] | blank_cost | The cost of a blank character insertion/deletion transformation |
[out] | solutions_set | The set of solutions found |
Correction_Lexico