LexicalCorrection - Lexical correction module
[Lexical tools level]

Implementation of various lexical correction algorithms and declaration of the related output data structures (former corrlex & corrlexalgos modules).


Data Structures
struct	SolutionPart
	Lexical correction solution part.
struct	Solution
	Lexical correction solution.
struct	SolutionSet
	Lexical correction solutions set.
Defines
#define	SOLUTION_SET_ALLOC_INCREMENT 30
	Number of added elements during a SolutionSet (re)allocation.
#define	DEFAULT_CORRECTION WEIGHTED_CORRECTION
	Default correction mode.
#define	COST_RANGE 0.5
	Cost range of the solutions to include during a spellCorrectFlat call when the `max_solutions` parameter is set to `0`.
#define	COST_FORMAT "%4.2f"
	Format specification to provide to `printf` family functions when printing a Weight type value.
Typedefs
typedef double	Weight
	Cost of a lexical transformation.
typedef GSList	PositionList
	Singly-linked list of string indices.
Enumerations
enum	CorrectionMode { BASIC_CORRECTION = 0, WEIGHTED_CORRECTION, SPLITED_CORRECTION }
	Lexical correction mode.
Functions
void	solutionSetEnlarge (SolutionSet **solution_set, size_t *current_size, size_t *allocated_size)
void	parsingChartGetMaxLexemes (ParsingChart *chart, const char *head, StringArray *words)
void	getCorrection (const Lexicon *lexicon, const LexicalEntryIndex entry_index, const Weight cost, const gboolean get_word, GString *output)
void	positionListAdd (PositionList **position_list, const short int position)
ParsingChart *	spellCorrectChart (const char *input_string, const Lexicon *lexicon, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost)
ParsingChart *	lexematize (const char *input_string, const Lexicon *lexicon, int(*is_space)(int), int(*is_never_delimiter)(int), int(*is_glueable)(int))
void	solutionSetFree (SolutionSet *solutions_set)
void	spellCorrectFlat (const char *input_string, LexicalAccessTable *lexical_access_table, const int max_solutions, const CorrectionMode mode, const Weight max_cost, const Weight mark_cost, const Weight capital_cost, const Weight blank_cost, SolutionSet *solutions_set)
void	solutionGetString (const LexicalAssocMem *lam, const SolutionSet *solution_set, size_t index, const char *delimiter, GString *output)

Detailed Description

Implementation of various lexical correction algorithms and declaration of the related output data structures (former corrlex & corrlexalgos modules).

SlpTK Library 0.6.0

Required header: <lexicalcorrection.h>

Author:: Jean-Cédric Chappelier (creation on 13.03.1997)
Antonin Merçay (revision on 14.12.2004)

Date:: 2 March 2005

Version:: 0.6.0

Define Documentation

#define DEFAULT_CORRECTION WEIGHTED_CORRECTION

Default correction mode.

See also:: CorrectionMode spellCorrectFlat()

Enumeration Type Documentation

enum CorrectionMode

Lexical correction mode.

Specifies the lexical correction mode to apply during a spellCorrectFlat call.

See also:: DEFAULT_CORRECTION

Enumerator:

BASIC_CORRECTION	Lexical correction that uses only insertion, deletion and substitution operations. Every operation has unit cost.
WEIGHTED_CORRECTION	Lexical correction similar to BASIC_CORRECTION, but that also takes into account accenting, capital/small letter conversion and blank character insertion/deletion operations. Each of these three operations can have its own cost, which need not be a whole number.
SPLITED_CORRECTION	Lexical correction similar to WEIGHTED_CORRECTION, but where the insertion/deletion of blank characters can occur between words, i.e. the correction result may consist of a sequence of several words.
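
To illustrate the difference between unit-cost and weighted correction, here is a minimal sketch. It is not SlpTK code: `sketch_edit_cost` and its parameters are hypothetical names, the algorithm is a plain dynamic-programming edit distance, and of the three weighted operations it models only the capital/small letter conversion (accenting and blank handling are left out).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library's Weight typedef. */
typedef double Weight;

static Weight sketch_min3(Weight a, Weight b, Weight c)
{
    Weight m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Edit cost between `input` and `word`.
 * Insertion, deletion and substitution each cost 1.0; a substitution that
 * only changes the letter case costs `capital_cost` instead.
 * With capital_cost = 1.0 every operation has unit cost.
 * Returns -1.0 if memory allocation fails.
 */
Weight sketch_edit_cost(const char *input, const char *word, Weight capital_cost)
{
    size_t n = strlen(input), m = strlen(word);
    Weight *prev = malloc((m + 1) * sizeof *prev);
    Weight *curr = malloc((m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0;
    }

    /* Row 0: turning the empty prefix into word[0..j) takes j insertions. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = (Weight)j;

    for (size_t i = 1; i <= n; i++) {
        curr[0] = (Weight)i; /* i deletions */
        for (size_t j = 1; j <= m; j++) {
            unsigned char a = (unsigned char)input[i - 1];
            unsigned char b = (unsigned char)word[j - 1];
            Weight subst;
            if (a == b)
                subst = 0.0;
            else if (tolower(a) == tolower(b))
                subst = capital_cost; /* case-only difference */
            else
                subst = 1.0;
            curr[j] = sketch_min3(prev[j] + 1.0,      /* deletion */
                                  curr[j - 1] + 1.0,  /* insertion */
                                  prev[j - 1] + subst);
        }
        Weight *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    Weight result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

Under these assumptions, calling `sketch_edit_cost("paris", "Paris", 0.5)` charges the case change at half the price of an ordinary substitution, whereas passing `1.0` treats it like any other substitution.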

Function Documentation

void getCorrection	(	const Lexicon *	lexicon,
		const LexicalEntryIndex	entry_index,
		const Weight	cost,
		const gboolean	get_word,
		GString *	output
	)

Dump a lexical correction in an output string buffer

Parameters:

`[in]`	lexicon	The reference vocabulary lexicon
`[in]`	entry_index	The index of the corrected word in the lexicon
`[in]`	cost	The cost of the required lexical correction (`0` to avoid cost printing)
`[in]`	get_word	Set if the spelling of the corrected word must be extracted from the vocabulary lexicon
	output	The string buffer where to append the correction

See also:: ?()

Former(s) function(s):: affiche_correction
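
The cost-printing rule can be sketched as follows. This is an illustration only: `sketch_dump_correction` is a hypothetical helper, it writes into a fixed `char` buffer rather than appending to a `GString`, and the `word (cost)` layout is an assumption, since the actual output layout of getCorrection is not documented here. The format string has the same value as COST_FORMAT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins mirroring Weight and COST_FORMAT. */
typedef double Weight;
#define SKETCH_COST_FORMAT "%4.2f"

/*
 * Writes `word` into `out`, followed by the cost in parentheses unless the
 * cost is 0. Returns 0 on success, -1 if `out` is too small.
 */
int sketch_dump_correction(const char *word, Weight cost,
                           char *out, size_t out_size)
{
    assert(word != NULL && out != NULL && out_size > 0);

    int n;
    if (cost == 0.0)
        n = snprintf(out, out_size, "%s", word);
    else
        n = snprintf(out, out_size, "%s (" SKETCH_COST_FORMAT ")", word, cost);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```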

ParsingChart * lexematize	(	const char *	input_string,
		const Lexicon *	lexicon,
		int(*)(int)	is_space,
		int(*)(int)	is_never_delimiter,
		int(*)(int)	is_glueable
	)

Lexematization algorithm (in other words, lexical correction with null cost) that splits an input string into lexical tokens.

Remarks:: This operation is only allowed on lexicons encapsulating a lexical memory of CHARACTER strings

Parameters:

`[in]`	input_string	The input string to lexematize
`[in]`	lexicon	The reference lexicon containing the reference lexemes
`[in]`	is_space	The blank character classification routine
`[in]`	is_never_delimiter	The classification routine for characters that are never delimiters
`[in]`	is_glueable	The glueable character classification routine

Returns:: A parsing chart containing all the possible segmentations of the input string

See also:: spellCorrectChart()

Former(s) function(s):: Correction_Zero & Lexematise
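
The three classification routines share the `int(*)(int)` signature of the `<ctype.h>` functions, so a standard routine such as `isspace` or a user-defined one can be passed. The sketch below shows only that callback mechanism with a plain blank-delimited token counter; it is not the lexicon-driven lexematization algorithm, and `sketch_count_tokens` and `sketch_space_or_underscore` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Counts the maximal runs of non-blank characters in `input`, where
 * "blank" is whatever the `is_space` callback says it is.
 */
size_t sketch_count_tokens(const char *input, int (*is_space)(int))
{
    assert(input != NULL && is_space != NULL);

    size_t tokens = 0;
    int in_token = 0;
    for (const unsigned char *p = (const unsigned char *)input; *p != '\0'; p++) {
        if (is_space(*p)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            tokens++;
        }
    }
    return tokens;
}

/* Custom classifier: treats '_' as a blank in addition to isspace(). */
int sketch_space_or_underscore(int c)
{
    return isspace(c) || c == '_';
}
```

Swapping the classifier changes the segmentation without touching the scanning loop, which is presumably why lexematize takes these routines as parameters.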

void parsingChartGetMaxLexemes	(	ParsingChart *	chart,
		const char *	head,
		StringArray *	words
	)

Extract (from left to right) the sequence of lexemes that covers a sentence processed by lexematize. The lexemes are output in a StringArray where each unknown word is prefixed by the provided `head` parameter.

Parameters:

`[in]`	chart	The considered parsing chart
`[in]`	head	The prefix to insert before unknown words
`[out]`	words	The array where to output the solution

Former(s) function(s):: Solution_Max_Treillis (from Christophe de Benoit's project)

void positionListAdd	(	PositionList **	position_list,
		const short int	position
	)

Add a value to a position list sorted in ascending order

Parameters:

	position_list	The position list where to add
	position	The value to add to the list

Former(s) function(s):: ajoute_liste_pos_chaine
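
A sorted insertion into a singly-linked list can be sketched as below. The sketch uses a hand-written node type instead of `GSList` so that it compiles without GLib; `SketchNode` and both functions are hypothetical, and keeping duplicate values is an assumption, since the behaviour of positionListAdd on duplicates is not documented here. With GLib itself, `g_slist_insert_sorted()` offers comparable functionality.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PositionList node. */
typedef struct SketchNode {
    short int position;
    struct SketchNode *next;
} SketchNode;

/*
 * Inserts `position` so that the list stays sorted in ascending order.
 * Returns 0 on success, -1 if allocation fails (list left unchanged).
 */
int sketch_position_list_add(SketchNode **list, short int position)
{
    assert(list != NULL);

    SketchNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->position = position;

    /* Walk the chain of `next` links until the first element >= position. */
    SketchNode **link = list;
    while (*link != NULL && (*link)->position < position)
        link = &(*link)->next;

    node->next = *link;
    *link = node;
    return 0;
}

/* Releases every node of the list. */
void sketch_position_list_free(SketchNode *list)
{
    while (list != NULL) {
        SketchNode *next = list->next;
        free(list);
        list = next;
    }
}
```

Passing the list head by address, as the `PositionList **` parameter does, lets the function replace the head when the new value is the smallest.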

void solutionGetString	(	const LexicalAssocMem *	lam,
		const SolutionSet *	solution_set,
		size_t	index,
		const char *	delimiter,
		GString *	output
	)

Convert a lexical correction solution into its equivalent string representation

Parameters:

`[in]`	lam	The LexicalAssocMem that contains the strings to convert to.
`[in]`	solution_set	The solution set that contains the solutions to convert from.
`[in]`	index	The index of the solution inside `solution_set`
`[in]`	delimiter	The string to insert between the words of the solution. If `NULL` is specified, a single space is used.
`[out]`	output	The string where to output the result

See also:: spellCorrectFlat() solutionSetFree()

Former(s) function(s):: Solution_Vers_String
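
The delimiter rule (a single space when `NULL` is passed) can be sketched as follows. This is not the library routine: `sketch_join_words` is a hypothetical helper that takes the words directly as C strings and writes into a fixed buffer, whereas solutionGetString looks the words up in a `LexicalAssocMem` and outputs to a `GString`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Joins `count` words into `out`, separated by `delimiter`
 * (a single space if `delimiter` is NULL).
 * Returns 0 on success, -1 if `out` is too small; `out` is always
 * left NUL-terminated.
 */
int sketch_join_words(const char *const *words, size_t count,
                      const char *delimiter, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    assert(words != NULL || count == 0);

    if (delimiter == NULL)
        delimiter = " ";

    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* The delimiter goes before every word except the first. */
        const char *parts[2] = { i > 0 ? delimiter : "", words[i] };
        for (size_t k = 0; k < 2; k++) {
            size_t len = strlen(parts[k]);
            if (used + len >= out_size) {
                out[used] = '\0';
                return -1;
            }
            memcpy(out + used, parts[k], len);
            used += len;
        }
    }
    out[used] = '\0';
    return 0;
}
```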

void solutionSetEnlarge	(	SolutionSet **	solution_set,
		size_t *	current_size,
		size_t *	allocated_size
	)

Enlarge a solution set by one element

Remarks:: If the size of the solution set before the operation is equal to the allocated size, the solution set is reallocated

Parameters:

	solution_set	The solution set to enlarge
	current_size	The number of elements currently used (incremented after function completion)
	allocated_size	The number of elements currently allocated (may be increased after function completion)

See also:: solutionSetFree()

Former(s) function(s):: augmente_ens_sol
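
The grow-by-increment behaviour described in the remark can be sketched as below. The element type `SketchSolution`, the function name and the integer return value are hypothetical (the documented function returns `void`); only the increment value is taken from SOLUTION_SET_ALLOC_INCREMENT.

```c
#include <assert.h>
#include <stdlib.h>

/* Same value as SOLUTION_SET_ALLOC_INCREMENT. */
#define SKETCH_ALLOC_INCREMENT 30

/* Hypothetical stand-in for one element of a solution set. */
typedef struct {
    double cost;
} SketchSolution;

/*
 * Makes room for one more element. When the used size has reached the
 * allocated size, the array is reallocated with SKETCH_ALLOC_INCREMENT
 * extra slots. Returns 0 on success, -1 if reallocation fails (in which
 * case nothing is modified).
 */
int sketch_set_enlarge(SketchSolution **set, size_t *current_size,
                       size_t *allocated_size)
{
    assert(set != NULL && current_size != NULL && allocated_size != NULL);

    if (*current_size == *allocated_size) {
        size_t new_alloc = *allocated_size + SKETCH_ALLOC_INCREMENT;
        SketchSolution *grown = realloc(*set, new_alloc * sizeof **set);
        if (grown == NULL)
            return -1;
        *set = grown;
        *allocated_size = new_alloc;
    }
    (*current_size)++;
    return 0;
}
```

Growing in blocks of 30 rather than one element at a time keeps the number of reallocations low when many solutions are collected.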

void solutionSetFree	(	SolutionSet *	solutions_set	)

Free the memory allocated to a solution set

Parameters:

	solutions_set	The solution set to free

Former(s) function(s):: Libere_Ensemble_Solutions

ParsingChart ** spellCorrectChart	(	const char *	input_string,
		const Lexicon *	lexicon,
		const Weight	max_cost,
		const Weight	mark_cost,
		const Weight	capital_cost,
		const Weight	blank_cost
	)

Correct a string using the words stored in a lexicon up to a given lexical transformation cost. The operation returns a lattice (stored in a parsing chart) that contains all the word sequences found.

Remarks:: This operation is only allowed on lexicons encapsulating a lexical memory of CHARACTER strings

Parameters:

`[in]`	input_string	The input string to correct
`[in]`	lexicon	The reference lexicon containing the reference lexemes
`[in]`	max_cost	The maximal allowed correction cost between the input and a solution
`[in]`	mark_cost	The cost of an accent addition/removal transformation
`[in]`	capital_cost	The cost of a capital/small letter transformation
`[in]`	blank_cost	The cost of a blank character insertion/deletion transformation

Returns:: A parsing chart containing all the possible segmentations of the input string

See also:: lexematize() spellCorrectFlat()

Former(s) function(s):: Correction_Treillis

void spellCorrectFlat	(	const char *	input_string,
		LexicalAccessTable *	lexical_access_table,
		const int	max_solutions,
		const CorrectionMode	mode,
		const Weight	max_cost,
		const Weight	mark_cost,
		const Weight	capital_cost,
		const Weight	blank_cost,
		SolutionSet *	solutions_set
	)

Correct a string using the words stored in a lexical access table up to a given lexical transformation cost. The operation returns a solution set where each (flat) solution is a sequence of recognized words.

Remarks:: This operation is only allowed on lexical access tables of CHARACTER strings

Parameters:

`[in]`	input_string	The string to correct
`[in]`	lexical_access_table	The lexical memory containing the recognized words
`[in]`	max_solutions	The maximal number of solutions to output. If `max_solutions` < 0, all the solutions with a cost up to `max_cost` are returned; if `max_solutions` = 0, all the solutions with a cost up to the minimal lexical cost found + COST_RANGE are returned, excluding any solution with a cost greater than `max_cost`; if `max_solutions` > 0, exactly `max_solutions` solutions are returned.
`[in]`	mode	The correction mode used
`[in]`	max_cost	The maximal allowed correction cost between the input and a solution
`[in]`	mark_cost	The cost of an accent addition/removal transformation
`[in]`	capital_cost	The cost of a capital/small letter transformation
`[in]`	blank_cost	The cost of a blank character insertion/deletion transformation
`[out]`	solutions_set	The set of solutions found

See also:: solutionGetString() solutionSetFree()

Former(s) function(s):: Correction_Lexico
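
The three `max_solutions` cases can be read as a selection rule over candidate costs sorted in ascending order. The sketch below encodes that reading; `sketch_count_kept` is a hypothetical helper, not part of the library, and two points are assumptions: candidates are taken as already sorted, and when `max_solutions` > 0 the sketch simply caps the count at the number of available candidates without applying `max_cost`, because the documentation does not say what happens in that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring Weight and COST_RANGE. */
typedef double Weight;
#define SKETCH_COST_RANGE 0.5

/*
 * Given `count` candidate costs sorted in ascending order, returns how many
 * of the leading candidates are kept for the given `max_solutions` and
 * `max_cost`.
 */
size_t sketch_count_kept(const Weight *costs, size_t count,
                         int max_solutions, Weight max_cost)
{
    assert(costs != NULL || count == 0);
    if (count == 0)
        return 0;

    if (max_solutions > 0) {
        /* Fixed number of solutions, capped by what is available. */
        size_t wanted = (size_t)max_solutions;
        return wanted < count ? wanted : count;
    }

    /* max_solutions < 0: everything up to max_cost. */
    Weight limit = max_cost;
    if (max_solutions == 0) {
        /* Best cost found + range, but never beyond max_cost. */
        Weight ranged = costs[0] + SKETCH_COST_RANGE;
        if (ranged < limit)
            limit = ranged;
    }

    size_t kept = 0;
    while (kept < count && costs[kept] <= limit)
        kept++;
    return kept;
}
```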

Generated on Thu Mar 22 17:46:31 2007 for SlpTk by Doxygen 1.4.7
