LexicalCorrection - Lexical correction module
[Lexical tools level]

Implementation of various lexical correction algorithms and declaration of the related output data structures (formerly the corrlex & corrlexalgos modules).

Data Structures

struct  SolutionPart
 Lexical correction solution part.
struct  Solution
 Lexical correction solution.
struct  SolutionSet
 Lexical correction solution set.

Defines

#define SOLUTION_SET_ALLOC_INCREMENT   30
 Number of elements added at each SolutionSet (re)allocation.
#define DEFAULT_CORRECTION   WEIGHTED_CORRECTION
 Default correction mode.
#define COST_RANGE   0.5
 Cost range of the solutions to include during a spellCorrectFlat() call when the max_solutions parameter is set to 0.
#define COST_FORMAT   "%4.2f"
 Format specification to pass to printf-family functions when printing a Weight value.
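Because COST_FORMAT is a string literal, it can be spliced into a larger printf format string by literal concatenation. A minimal sketch, assuming only the two definitions above; format_cost is a hypothetical helper, not part of SlpTK:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the documented COST_FORMAT define and Weight typedef. */
#define COST_FORMAT "%4.2f"
typedef double Weight;

/* Write cost into buf using COST_FORMAT. Returns snprintf's result: the
   length the formatted cost needs, or a negative value on error. */
int format_cost(char *buf, size_t size, Weight cost)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, COST_FORMAT, cost);
}
```

For direct printing, printf("cost: " COST_FORMAT "\n", cost) relies on the same concatenation.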

Typedefs

typedef double Weight
 Cost of a lexical transformation.
typedef GSList PositionList
 Singly-linked list of string indices.

Enumerations

enum  CorrectionMode { BASIC_CORRECTION = 0, WEIGHTED_CORRECTION, SPLITED_CORRECTION }
 Lexical correction mode.

Functions

void solutionSetEnlarge (SolutionSet **solution_set, size_t *current_size, size_t *allocated_size)
void parsingChartGetMaxLexemes (ParsingChart *chart, const char *head, StringArray *words)
void getCorrection (const Lexicon *lexicon, const LexicalEntryIndex entry_index, const Weight cost, const gboolean get_word, GString *output)
void positionListAdd (PositionList **position_list, const short int position)
ParsingChart ** spellCorrectChart (const char *input_string, const Lexicon *lexicon, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost)
ParsingChart * lexematize (const char *input_string, const Lexicon *lexicon, int(*is_space)(int), int(*is_never_delimiter)(int), int(*is_glueable)(int))
void solutionSetFree (SolutionSet *solutions_set)
void spellCorrectFlat (const char *input_string, LexicalAccessTable *lexical_access_table, const int max_solutions, const CorrectionMode mode, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost, SolutionSet *solutions_set)
void solutionGetString (const LexicalAssocMem *lam, const SolutionSet *solution_set, size_t index, const char *delimiter, GString *output)

Detailed Description


SlpTK Library 0.6.0

Required header
<lexicalcorrection.h>
Author:
Jean-Cédric Chappelier (created on 13.03.1997)

Antonin Merçay (revised on 14.12.2004)

Date:
2 March 2005
Version:
0.6.0

Define Documentation

#define DEFAULT_CORRECTION   WEIGHTED_CORRECTION

Default correction mode.

See also:
CorrectionMode spellCorrectFlat()


Enumeration Type Documentation

enum CorrectionMode

Lexical correction mode.

Specifies the lexical correction mode to apply during a spellCorrectFlat() call.

See also:
DEFAULT_CORRECTION
Enumerator:
BASIC_CORRECTION  Lexical correction that uses only insertion, deletion and substitution operations. All operations have unit cost.
WEIGHTED_CORRECTION  Lexical correction similar to BASIC_CORRECTION, but that also takes into account accenting, capital/small letter conversion and blank character insertion/deletion operations. Each of these three operations can have its own cost, which need not be a whole number.
SPLITED_CORRECTION  Lexical correction similar to WEIGHTED_CORRECTION, but where blank character insertion/deletion can occur between words, i.e. the correction result may consist of a sequence of several words.
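The weighted mode can be pictured as a dynamic-programming edit distance with per-operation costs. The following is an illustrative sketch only, not SlpTK's algorithm: weighted_distance and its helpers are hypothetical, mark_cost is left out because accent equivalence needs an encoding-specific table, and a blank is assumed to be the ASCII space. With capital_cost and blank_cost both set to 1, it reduces to the unit-cost distance described for BASIC_CORRECTION.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef double Weight;  /* mirrors the documented typedef */

/* Cost of replacing character a by character b (hypothetical cost model). */
static Weight subst_cost(char a, char b, Weight capital_cost)
{
    if (a == b)
        return 0.0;
    if (tolower((unsigned char)a) == tolower((unsigned char)b))
        return capital_cost;   /* capital/small letter conversion */
    return 1.0;                /* ordinary substitution */
}

/* Cost of inserting or deleting c; blanks are assumed to be ' ' only. */
static Weight indel_cost(char c, Weight blank_cost)
{
    return c == ' ' ? blank_cost : 1.0;
}

static Weight min3(Weight a, Weight b, Weight c)
{
    Weight m = a < b ? a : b;
    return m < c ? m : c;
}

/* Weighted edit distance between s and t, or -1.0 if allocation fails. */
Weight weighted_distance(const char *s, const char *t,
                         Weight capital_cost, Weight blank_cost)
{
    assert(s != NULL && t != NULL);
    size_t n = strlen(s), m = strlen(t), w = m + 1;
    /* d[i * w + j] = cost of turning the first i chars of s into the first j of t */
    Weight *d = malloc((n + 1) * w * sizeof *d);
    if (d == NULL)
        return -1.0;

    d[0] = 0.0;
    for (size_t i = 1; i <= n; ++i)
        d[i * w] = d[(i - 1) * w] + indel_cost(s[i - 1], blank_cost);
    for (size_t j = 1; j <= m; ++j)
        d[j] = d[j - 1] + indel_cost(t[j - 1], blank_cost);

    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j)
            d[i * w + j] = min3(
                d[(i - 1) * w + j] + indel_cost(s[i - 1], blank_cost),  /* delete */
                d[i * w + j - 1] + indel_cost(t[j - 1], blank_cost),    /* insert */
                d[(i - 1) * w + j - 1] + subst_cost(s[i - 1], t[j - 1], capital_cost));

    Weight result = d[n * w + m];
    free(d);
    return result;
}
```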


Function Documentation

void getCorrection ( const Lexicon *  lexicon,
const LexicalEntryIndex  entry_index,
const Weight  cost,
const gboolean  get_word,
GString *  output 
)

Dump a lexical correction into an output string buffer.

Parameters:
[in] lexicon The reference vocabulary lexicon
[in] entry_index The index of the corrected word in the lexicon
[in] cost The cost of the required lexical correction (0 to suppress printing the cost)
[in] get_word Set if the correct spelling of the word must be extracted from the vocabulary lexicon
output The string buffer to which the correction is appended
See also:
?()
Former(s) function(s):
affiche_correction

ParsingChart * lexematize ( const char *  input_string,
const Lexicon *  lexicon,
int(*)(int)  is_space,
int(*)(int)  is_never_delimiter,
int(*)(int)  is_glueable 
)

Lexematization algorithm (in other words, lexical correction with null cost) that splits an input string into lexical tokens.

Remarks:
This operation is only allowed on lexicons encapsulating a lexical memory of CHARACTER strings.
Parameters:
[in] input_string The input string to lexematize
[in] lexicon The reference lexicon containing the reference lexemes
[in] is_space The blank character classification routine
[in] is_never_delimiter The classification routine for characters that are never delimiters
[in] is_glueable The glueable character classification routine
Returns:
A parsing chart containing all the possible segmentations of the input string
See also:
spellCorrectChart()
Former(s) function(s):
Correction_Zero & Lexematise
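To make the notion of a chart of possible cuttings concrete, here is a toy sketch assuming exact, zero-cost matches only. It ignores the is_space, is_never_delimiter and is_glueable callbacks, and ToyChart, toy_lexematize and count_segmentations are hypothetical names unrelated to SlpTK's ParsingChart:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_INPUT 64

/* span[i][j] is true when input[i..j) is exactly a lexicon entry. */
typedef struct {
    size_t length;
    bool span[TOY_MAX_INPUT + 1][TOY_MAX_INPUT + 1];
} ToyChart;

/* Record every lexicon entry occurring in input (zero-cost matches).
   Returns false if input is longer than TOY_MAX_INPUT. */
bool toy_lexematize(const char *input, const char *const *lexicon,
                    size_t lexicon_size, ToyChart *chart)
{
    assert(input != NULL && chart != NULL);
    size_t n = strlen(input);
    if (n > TOY_MAX_INPUT)
        return false;
    memset(chart, 0, sizeof *chart);
    chart->length = n;
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < lexicon_size; ++k) {
            size_t len = strlen(lexicon[k]);
            if (len > 0 && len <= n - i && strncmp(input + i, lexicon[k], len) == 0)
                chart->span[i][i + len] = true;
        }
    return true;
}

/* Number of ways to cut the whole input into consecutive chart entries. */
unsigned long count_segmentations(const ToyChart *chart)
{
    unsigned long ways[TOY_MAX_INPUT + 1] = {0};
    ways[0] = 1;  /* the empty prefix has exactly one (empty) segmentation */
    for (size_t j = 1; j <= chart->length; ++j)
        for (size_t i = 0; i < j; ++i)
            if (chart->span[i][j])
                ways[j] += ways[i];
    return ways[chart->length];
}
```

With the lexicon {"a", "ab", "b", "bc", "c"}, the input "abc" admits three segmentations: a|b|c, a|bc and ab|c.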

void parsingChartGetMaxLexemes ( ParsingChart *  chart,
const char *  head,
StringArray *  words 
)

Extract, from left to right, the sequence of lexemes that covers a sentence processed by lexematize(). The lexemes are output in a StringArray in which each unknown word is prefixed by the provided head parameter.

Parameters:
[in] chart The considered parsing chart
[in] head The prefix to insert before unknown words
[out] words The array where to output the solution
Former(s) function(s):
Solution_Max_Treillis (from Christophe de Benoit's project)
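The documentation does not state how the covering sequence is chosen. Assuming a greedy longest-match strategy (an interpretation of "Max", not confirmed by the source), a simplified version working directly on a string rather than a ParsingChart might look like this; greedy_max_lexemes writes into a flat buffer instead of a StringArray, and all names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the longest lexicon entry that is a prefix of s (0 if none). */
static size_t longest_match(const char *s, const char *const *lexicon,
                            size_t lexicon_size)
{
    size_t best = 0;
    for (size_t k = 0; k < lexicon_size; ++k) {
        size_t len = strlen(lexicon[k]);
        if (len > best && strncmp(s, lexicon[k], len) == 0)
            best = len;
    }
    return best;
}

/* Greedy left-to-right segmentation: at each position keep the longest
   lexicon entry; otherwise emit the run up to the next blank, prefixed by
   head. Tokens are separated by single spaces in out.
   Returns 0 on success, -1 if out is too small. */
int greedy_max_lexemes(const char *input, const char *const *lexicon,
                       size_t lexicon_size, const char *head,
                       char *out, size_t out_size)
{
    assert(input != NULL && head != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    size_t used = 0;
    for (const char *p = input; *p != '\0';) {
        if (*p == ' ') {            /* skip blanks between tokens */
            ++p;
            continue;
        }
        const char *prefix = "";
        size_t len = longest_match(p, lexicon, lexicon_size);
        if (len == 0) {             /* unknown word */
            len = strcspn(p, " ");
            prefix = head;
        }
        int written = snprintf(out + used, out_size - used, "%s%s%.*s",
                               used > 0 ? " " : "", prefix, (int)len, p);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
        p += len;
    }
    return 0;
}
```

For example, with the lexicon {"new", "new york", "york", "is", "big"} and head "?", the input "new york is very big" yields "new york is ?very big".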

void positionListAdd ( PositionList **  position_list,
const short int  position 
)

Insert a value into a position list, keeping the list sorted in ascending order

Parameters:
position_list The position list to add the value to
position The value to add to the list
Former(s) function(s):
ajoute_liste_pos_chaine
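PositionList is a GSList, and GLib's g_slist_insert_sorted() provides sorted insertion directly. To stay self-contained, the sketch below uses a hypothetical hand-rolled PositionNode list instead. Whether SlpTK keeps duplicate positions is not documented; this sketch keeps them, placing a new value after existing equal ones:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GSList-based PositionList. */
typedef struct PositionNode {
    short int position;
    struct PositionNode *next;
} PositionNode;

/* Insert position while keeping ascending order.
   Returns 0 on success, -1 on allocation failure. */
int position_list_add(PositionNode **list, short int position)
{
    assert(list != NULL);
    PositionNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->position = position;
    /* Advance along the links until the next node holds a larger value. */
    PositionNode **link = list;
    while (*link != NULL && (*link)->position <= position)
        link = &(*link)->next;
    node->next = *link;
    *link = node;
    return 0;
}

void position_list_free(PositionNode *list)
{
    while (list != NULL) {
        PositionNode *next = list->next;
        free(list);
        list = next;
    }
}
```

Walking a pointer to the link (rather than to the node) lets insertion at the head and in the middle share one code path.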

void solutionGetString ( const LexicalAssocMem *  lam,
const SolutionSet *  solution_set,
size_t  index,
const char *  delimiter,
GString *  output 
)

Convert a lexical correction solution into its equivalent string representation

Parameters:
[in] lam The LexicalAssocMem containing the word strings used for the conversion.
[in] solution_set The solution set that contains the solutions to convert from.
[in] index The index of the solution inside solution_set
[in] delimiter The string to insert between consecutive words of the solution. If NULL is specified, a single space is used.
[out] output The string to which the result is output
See also:
spellCorrectFlat() solutionSetFree()
Former(s) function(s):
Solution_Vers_String
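The delimiter handling described above amounts to a string join with a default separator. A minimal sketch, assuming the solution has already been resolved to an array of word strings (the real function reads them from lam and appends to a GString); join_words is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join count words with delimiter (a single space when delimiter is NULL).
   Returns 0 on success, -1 if out is too small. */
int join_words(const char *const *words, size_t count, const char *delimiter,
               char *out, size_t out_size)
{
    assert(words != NULL || count == 0);
    if (out_size == 0)
        return -1;
    if (delimiter == NULL)
        delimiter = " ";            /* documented default */
    out[0] = '\0';
    size_t used = 0;
    for (size_t i = 0; i < count; ++i) {
        int written = snprintf(out + used, out_size - used, "%s%s",
                               i > 0 ? delimiter : "", words[i]);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }
    return 0;
}
```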

void solutionSetEnlarge ( SolutionSet **  solution_set,
size_t *  current_size,
size_t *  allocated_size 
)

Enlarge a solution set by one element

Remarks:
If the size of the solution set before the operation is equal to the allocated size, the solution set is reallocated.
Parameters:
solution_set The solution set to enlarge
current_size The number of elements currently used (incremented after function completion)
allocated_size The number of elements currently allocated (may be increased after function completion)
See also:
solutionSetFree()
Former(s) function(s):
augmente_ens_sol
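A sketch of the grow-on-demand pattern described above. It assumes, based on the description of SOLUTION_SET_ALLOC_INCREMENT but not stated for this function, that each reallocation adds that many slots; ToySolution and toy_solution_set_enlarge are hypothetical stand-ins for the real types:

```c
#include <assert.h>
#include <stdlib.h>

#define SOLUTION_SET_ALLOC_INCREMENT 30   /* value from the Defines section */

typedef struct { double cost; } ToySolution;   /* hypothetical element type */

/* Make room for one more element, reallocating only when every allocated
   slot is in use. Returns 0 on success; on failure returns -1 and leaves
   *items, *current_size and *allocated_size unchanged. */
int toy_solution_set_enlarge(ToySolution **items, size_t *current_size,
                             size_t *allocated_size)
{
    assert(items != NULL && current_size != NULL && allocated_size != NULL);
    if (*current_size == *allocated_size) {
        size_t new_size = *allocated_size + SOLUTION_SET_ALLOC_INCREMENT;
        /* realloc(NULL, n) behaves like malloc(n) for the first call. */
        ToySolution *grown = realloc(*items, new_size * sizeof *grown);
        if (grown == NULL)
            return -1;
        *items = grown;
        *allocated_size = new_size;
    }
    ++*current_size;
    return 0;
}
```

Growing by a fixed increment keeps the code simple; the trade-off is more reallocations than geometric growth would need for very large sets.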

void solutionSetFree ( SolutionSet *  solutions_set  ) 

Free the memory allocated to a solution set

Parameters:
solutions_set The solution set to free
Former(s) function(s):
Libere_Ensemble_Solutions

ParsingChart ** spellCorrectChart ( const char *  input_string,
const Lexicon *  lexicon,
const Weight  max_cost,
const Weight  mark_cost,
const Weight  capital_cost,
const Weight  blank_cost 
)

Correct a string using the words stored in a lexicon up to a given lexical transformation cost. The operation returns a lattice (stored in a parsing chart) that contains all the word sequences found.

Remarks:
This operation is only allowed on lexicons encapsulating a lexical memory of CHARACTER strings.
Parameters:
[in] input_string The input string to correct
[in] lexicon The reference lexicon containing the reference lexemes
[in] max_cost The maximal allowed correction cost between the input and a solution
[in] mark_cost The cost of an accenting/de-accenting transformation
[in] capital_cost The cost of a capital/small letter transformation
[in] blank_cost The cost of a blank character insertion/deletion transformation
Returns:
A parsing chart containing all the possible segmentations of the input string
See also:
lexematize() spellCorrectFlat()
Former(s) function(s):
Correction_Treillis

void spellCorrectFlat ( const char *  input_string,
LexicalAccessTable *  lexical_access_table,
const int  max_solutions,
const CorrectionMode  mode,
const Weight  max_cost,
const Weight  mark_cost,
const Weight  capital_cost,
const Weight  blank_cost,
SolutionSet *  solutions_set 
)

Correct a string using the words stored in a lexical access table up to a given lexical transformation cost. The operation returns a solution set where each (flat) solution is a sequence of recognized words.

Remarks:
This operation is only allowed on lexical access tables of CHARACTER strings
Parameters:
[in] input_string The string to correct
[in] lexical_access_table The lexical memory containing the recognized words
[in] max_solutions The maximal number of solutions to output:
  • if max_solutions < 0 : all the solutions up to a max_cost cost are returned;
  • if max_solutions = 0 : all the solutions up to the minimal lexical cost found + COST_RANGE are returned, except for the possible solutions with a cost greater than max_cost;
  • if max_solutions > 0 : exactly max_solutions solutions are returned.
[in] mode The correction mode used
[in] max_cost The maximal allowed correction cost between the input and a solution
[in] mark_cost The cost of an accenting/de-accenting transformation
[in] capital_cost The cost of a capital/small letter transformation
[in] blank_cost The cost of a blank character insertion/deletion transformation
[out] solutions_set The set of solutions found
See also:
solutionGetString() solutionSetFree()
Former(s) function(s):
Correction_Lexico
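The three max_solutions rules listed above can be expressed as a cut-off over candidates sorted by cost. This sketch assumes the candidate costs are already sorted in ascending order, that "up to max_cost" is inclusive, and that a positive max_solutions is clamped to the number of available candidates; toy_solution_count is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define COST_RANGE 0.5   /* value from the Defines section */
typedef double Weight;

/* Given candidate costs sorted in ascending order, return how many of the
   leading candidates to keep under the max_solutions rules. */
size_t toy_solution_count(const Weight *costs, size_t n,
                          int max_solutions, Weight max_cost)
{
    assert(costs != NULL || n == 0);
    if (max_solutions > 0)   /* fixed number of solutions */
        return (size_t)max_solutions < n ? (size_t)max_solutions : n;

    Weight limit = max_cost;
    /* max_solutions == 0: best cost + COST_RANGE, but never above max_cost. */
    if (max_solutions == 0 && n > 0 && costs[0] + COST_RANGE < limit)
        limit = costs[0] + COST_RANGE;

    size_t k = 0;
    while (k < n && costs[k] <= limit)
        ++k;
    return k;
}
```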


Generated on Thu Mar 22 17:46:31 2007 for SlpTk by  doxygen 1.4.7