SlpTk: Lexicon - Lexicon structure module

typedef unsigned int Frequency

Frequency of occurrence.

Total number of occurrences of a given lexicon entry in a given corpus

typedef LexicalEntryIndex Lemma

Lemma.

The lemma field is represented as the corresponding entry index in the Lexicon

typedef unsigned short Morpho

Part of speech.

Internal part of speech (PoS) field representation in the Lexicon

typedef double Probability

Probability of occurrence.

Probability of occurrence of a given lexicon entry

Remarks:: All probability-related processing is based on a logarithmic representation
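
As a reminder of why a logarithmic representation is convenient (this is general background, not something the SlpTk documentation states; the logarithm base used by the library is not specified here), a product of probabilities becomes a sum of logarithms, which avoids numerical underflow when many small values are combined:

```latex
\log\left(\prod_{i=1}^{n} p_i\right) = \sum_{i=1}^{n} \log p_i, \qquad 0 < p_i \le 1
```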

LexiconAccess lexiconAccess	(	const Lexicon *	lexicon,
		const LexicalEntryIndex	index
	)

Access a lexicon entry identified by its index

Parameters:

`[in]`	lexicon	The source lexicon
`[in]`	index	The entry index

Returns:: The lexicon access result

Remarks:: If the provided entry index is not valid (greater than or equal to the number of entries in the lexicon), the returned values are undefined

See also:: lexiconGetSize() lexiconAccessGetGraphy() lexiconAccessGetPartOfSpeech()

Former(s) function(s):: Accede_Lexique
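
Because an out-of-range index yields undefined values, callers will usually want to compare the index against the lexicon size before accessing an entry. The sketch below illustrates that check on a toy structure. `ToyLexicon` and `toy_checked_access` are hypothetical names invented for this example and are not part of SlpTk; with the real library the same test would presumably be written against lexiconGetSize().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy lexicon (NOT the SlpTk Lexicon type):
 * an array of graphies and the number of stored entries. */
typedef struct {
    const char *const *graphies;
    size_t size;
} ToyLexicon;

/* Return the graphy stored at `index`, or NULL when the index is not
 * valid (index >= number of entries), instead of reading undefined data. */
static const char *toy_checked_access(const ToyLexicon *lexicon,
                                      unsigned long index)
{
    assert(lexicon != NULL);
    if (index >= lexicon->size) {
        return NULL; /* out of range: refuse the access */
    }
    return lexicon->graphies[index];
}
```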

LexicalEntry lexiconAccessGetGraphy ( const LexiconAccess * lexicon_access )

Return the graphy associated with a given entry via a lexicon access result

Parameters:

[in] lexicon_access The lexicon access result

Returns:: The corresponding graphy

See also:: lexiconAccess() lexiconAccessGetPartOfSpeech()

char * lexiconAccessGetPartOfSpeech ( const LexiconAccess * lexicon_access )

Return the string representation of the part of speech associated with a given entry via a lexicon access result

Parameters:

[in] lexicon_access The lexicon access result

Returns:: The corresponding part of speech

Remarks:: If the part of speech field is not relevant (the lexicon is not of CHARACTER type, or it does not handle this property), this function returns NULL

See also:: lexiconAccess() lexiconAccessGetGraphy()

void lexiconApplyLogOnProba ( Lexicon * lexicon )

Convert each value of the lexicon probability field to its logarithm

Parameters:

[in] lexicon The lexicon to process

Former(s) function(s):: Log_Proba

gboolean lexiconContains_CHARACTER	(	const Lexicon *	lexicon,
		const char *	sequence,
		const Lemma *	lemma,
		const char *	part_of_speech,
		const Frequency *	frequency,
		const Probability *	probability
	)

Check if a specified entry is stored in a lexicon. Only provided (non-NULL) fields are checked. If a field provided to the function is not handled by the lexicon, it is ignored.

Parameters:

`[in]`	lexicon	The source lexicon
`[in]`	sequence	The entry graphy (required)
`[in]`	lemma	The entry lemma (optional)
`[in]`	part_of_speech	The entry part_of_speech (optional)
`[in]`	frequency	The entry frequency (optional)
`[in]`	probability	The entry probability (optional)

Returns:: The corresponding entry is found (TRUE) or not (FALSE)

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, CHARACTER)

See also:: lexiconContains_UNSIGNED_LONG() lexiconContains_CHARACTER()

Former(s) function(s):: Dans_Lexique & Dans_Lexique_Ulong
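
The "only provided fields are checked" rule can be pictured with a small stand-alone sketch. Everything in it (`ToyEntry`, `toy_entry_matches`) is hypothetical and only mimics the calling convention described above: the graphy is required, and each optional field is passed by pointer, with NULL meaning "do not compare this field". It is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy entry (NOT an SlpTk type). */
typedef struct {
    const char  *graphy;
    unsigned int frequency;
    double       probability;
} ToyEntry;

/* Return 1 if `entry` matches the query, 0 otherwise.
 * The graphy is always compared; `frequency` and `probability`
 * are compared only when the corresponding pointer is non-NULL.
 * Probabilities are compared exactly, which is a simplification. */
static int toy_entry_matches(const ToyEntry *entry, const char *graphy,
                             const unsigned int *frequency,
                             const double *probability)
{
    assert(entry != NULL && graphy != NULL);
    if (strcmp(entry->graphy, graphy) != 0) {
        return 0;
    }
    if (frequency != NULL && entry->frequency != *frequency) {
        return 0;
    }
    if (probability != NULL && entry->probability != *probability) {
        return 0;
    }
    return 1;
}
```

Passing NULL for every optional field reduces the check to a graphy-only lookup.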

gboolean lexiconContains_UNSIGNED_LONG	(	const Lexicon *	lexicon,
		const LongArray *	sequence,
		const Lemma *	lemma,
		const Morpho *	part_of_speech,
		const Frequency *	frequency,
		const Probability *	probability
	)

Check if a specified entry is stored in a lexicon. Only provided (non-NULL) fields are checked. If a field provided to the function is not handled by the lexicon, it is ignored.

Parameters:

`[in]`	lexicon	The source lexicon
`[in]`	sequence	The entry graphy (required)
`[in]`	lemma	The entry lemma (optional)
`[in]`	part_of_speech	The entry part_of_speech (optional)
`[in]`	frequency	The entry frequency (optional)
`[in]`	probability	The entry probability (optional)

Returns:: The corresponding entry is found (TRUE) or not (FALSE)

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, UNSIGNED_LONG)

See also:: lexiconContains_UNSIGNED_LONG() lexiconContains_CHARACTER()

Former(s) function(s):: Dans_Lexique & Dans_Lexique_Ulong

int lexiconCreate	(	Lexicon *	lexicon,
		const LexicalMemoryType	memory_type,
		const LexicalDataType	data_type
	)

Allocate and initialize a new lexicon

Parameters:

`[in]`	lexicon	The lexicon to create
`[in]`	memory_type	The type of memory that stores the graphies
`[in]`	data_type	The type of stored graphies

Returns:: A non-zero error code if the operation fails

Remarks:

If the data type is CHARACTER, the part of speech field is handled as character strings.
If the data type is UNSIGNED_LONG, the part of speech field is handled as ordinal values.

See also:: lexiconFree()

Former(s) function(s):: Init_Lexique

void lexiconDump	(	Lexicon *	lexicon,
		int(*)(const char *,...)	print,
		const char *	delimiter,
		gboolean	all,
		gboolean	numeration
	)

Dump the content of a lexicon

Parameters:

`[in]`	lexicon	The lexicon to dump
`[in]`	print	The print function to use
`[in]`	delimiter	The delimiter string to dump between fields
`[in]`	all	Dump delimiters for fields that are not handled
`[in]`	numeration	Print the numeration for each entry

Former(s) function(s):: Liste_Lexique
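
The `print` parameter takes a printf-compatible function, so the same dump routine can write to standard output (by passing `printf`) or to any other sink. The sketch below shows the callback pattern on toy data. `PrintFn`, `toy_print_to_buffer` and `toy_dump` are hypothetical names for this example only, and the output layout (0-based numeration, one entry per line) is an assumption, not the actual lexiconDump format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* printf-compatible callback type (hypothetical alias). */
typedef int (*PrintFn)(const char *format, ...);

static char   toy_buffer[256];
static size_t toy_length = 0;

/* A printf-like function that appends to an in-memory buffer. */
static int toy_print_to_buffer(const char *format, ...)
{
    va_list args;
    int written;
    size_t room = sizeof toy_buffer - toy_length - 1;

    va_start(args, format);
    written = vsnprintf(toy_buffer + toy_length, room + 1, format, args);
    va_end(args);
    if (written > 0) {
        /* Advance by what was really stored (output may be truncated). */
        toy_length += ((size_t)written < room) ? (size_t)written : room;
    }
    return written;
}

/* Dump toy entries, one per line, fields separated by `delimiter`.
 * When `numeration` is non-zero, each line starts with the entry index. */
static void toy_dump(const char *const *graphies,
                     const unsigned int *frequencies, size_t count,
                     PrintFn print, const char *delimiter, int numeration)
{
    size_t i;
    assert(print != NULL && delimiter != NULL);
    for (i = 0; i < count; ++i) {
        if (numeration) {
            print("%lu%s", (unsigned long)i, delimiter);
        }
        print("%s%s%u\n", graphies[i], delimiter, frequencies[i]);
    }
}
```

Calling `toy_dump(graphies, frequencies, count, printf, "\t", 1)` would send the same lines to standard output instead of the buffer.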

int lexiconExport	(	const Lexicon *	lexicon,
		const char *	input_filename
	)

Save the content of a lexicon in a set of human-readable ASCII files. The operation generates:

a header file named input_filename that defines the lexicon properties, such as the memory type, the handled fields and the open parts of speech information;
the lexical associative memory that stores the graphy field, saved via its LAMExport function;
a file that contains the list of values for each optional field (frequency, probability, part of speech, lemma).

Parameters:

`[in]`	lexicon	The lexicon to export
`[in]`	input_filename	The input filename

Returns:: A non-zero error code if the operation fails

See also:: lexiconImport()

Former(s) function(s):: Exporte_Lexique

void lexiconFree ( Lexicon * lexicon )

Free the memory allocated to a lexicon

Parameters:

[in] lexicon The lexicon to free

See also:: lexiconCreate()

Former(s) function(s):: Libere_Lexique

size_t lexiconGetSize ( const Lexicon * lexicon )

Returns the number of entries stored in a lexicon

Parameters:

[in] lexicon The source lexicon

Returns:: The number of stored entries

int lexiconImport	(	Lexicon *	lexicon,
		const char *	basefilename
	)

Load a lexicon from a set of human-readable ASCII files. See lexiconExport for more information about the required files.

Parameters:

`[in]`	lexicon	The lexicon to import into
`[in]`	basefilename	The lexicon header filename

Returns:: A non-zero error code if the operation fails

Remarks:: The data type (CHARACTER or UNSIGNED_LONG) of the provided lexicon and of the input files must be the same, otherwise an error is returned

See also:: lexiconExport() lexiconCreate()

Former(s) function(s):: Importe_Lexique

gboolean lexiconInsert_CHARACTER	(	Lexicon *	lexicon,
		const char *	sequence,
		const Lemma *	lemma,
		const char *	part_of_speech,
		const Frequency *	frequency,
		const Probability *	probability,
		LexicalEntryIndex *	size,
		const gboolean	by_force
	)

Insert an entry into a lexicon. The first insertion sets which fields are handled by the lexicon. Afterwards, a warning message is displayed each time a handled field is not specified in the call. If a similar entry is already stored in the lexicon and the insertion is not forced (by_force parameter set to FALSE), the entry is not inserted, an error message is printed, and the function returns TRUE.

Parameters:

`[in]`	lexicon	The lexicon to insert into
`[in]`	sequence	The entry graphy (required)
`[in]`	lemma	The entry lemma (optional)
`[in]`	part_of_speech	The entry part of speech (optional)
`[in]`	frequency	The entry frequency (optional)
`[in]`	probability	The entry probability (optional)
`[out]`	size	The number of entries in the lexicon after the insertion (optional)
`[in]`	by_force	Whether the insertion is forced when a duplicate exists. If set to `TRUE`, the function acts as if duplicates were never detected.

Returns:: TRUE if a duplicate has been detected and the insertion aborted, FALSE otherwise

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, CHARACTER)

See also:: lexiconInsert_UNSIGNED_LONG() lexiconInsert_CHARACTER()

Former(s) function(s):: Insere_Lexique, Insere_Lexique_De_Force, Insere_Lexique_Ulong & Insere_Lexique_De_Force_Ulong
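
The duplicate handling and the meaning of the return value are easy to get backwards, since TRUE signals that nothing was inserted. The sketch below reproduces that contract on a toy fixed-size store. `ToyStore`, `TOY_CAPACITY` and `toy_insert` are hypothetical names for this example; in particular, whether the real function updates `size` when the insertion is aborted is not stated in this documentation and is only an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAPACITY 8

/* Hypothetical toy store (NOT the SlpTk Lexicon type). */
typedef struct {
    const char *graphies[TOY_CAPACITY];
    size_t size;
} ToyStore;

/* Insert `graphy` into the store.
 * Returns 1 if a duplicate was detected and the insertion aborted,
 * 0 if the entry was inserted. When `by_force` is non-zero, duplicates
 * are not looked for at all. If `size` is non-NULL, it receives the
 * number of entries after the call (assumption: also when aborted). */
static int toy_insert(ToyStore *store, const char *graphy,
                      size_t *size, int by_force)
{
    size_t i;
    int aborted = 0;

    assert(store != NULL && graphy != NULL);
    assert(store->size < TOY_CAPACITY);

    if (!by_force) {
        for (i = 0; i < store->size; ++i) {
            if (strcmp(store->graphies[i], graphy) == 0) {
                aborted = 1; /* duplicate found: do not insert */
                break;
            }
        }
    }
    if (!aborted) {
        store->graphies[store->size++] = graphy;
    }
    if (size != NULL) {
        *size = store->size;
    }
    return aborted;
}
```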

gboolean lexiconInsert_UNSIGNED_LONG	(	Lexicon *	lexicon,
		const LongArray *	sequence,
		const Lemma *	lemma,
		const Morpho *	part_of_speech,
		const Frequency *	frequency,
		const Probability *	probability,
		LexicalEntryIndex *	size,
		const gboolean	by_force
	)

Insert an entry into a lexicon. The first insertion sets which fields are handled by the lexicon. Afterwards, a warning message is displayed each time a handled field is not specified in the call. If a similar entry is already stored in the lexicon and the insertion is not forced (by_force parameter set to FALSE), the entry is not inserted, an error message is printed, and the function returns TRUE.

Parameters:

`[in]`	lexicon	The lexicon to insert into
`[in]`	sequence	The entry graphy (required)
`[in]`	lemma	The entry lemma (optional)
`[in]`	part_of_speech	The entry part of speech (optional)
`[in]`	frequency	The entry frequency (optional)
`[in]`	probability	The entry probability (optional)
`[out]`	size	The number of entries in the lexicon after the insertion (optional)
`[in]`	by_force	Whether the insertion is forced when a duplicate exists. If set to `TRUE`, the function acts as if duplicates were never detected.

Returns:: TRUE if a duplicate has been detected and the insertion aborted, FALSE otherwise

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, UNSIGNED_LONG)

See also:: lexiconInsert_UNSIGNED_LONG() lexiconInsert_CHARACTER()

Former(s) function(s):: Insere_Lexique, Insere_Lexique_De_Force, Insere_Lexique_Ulong & Insere_Lexique_De_Force_Ulong

gboolean lexiconIsNormalized ( const Lexicon * lexicon )

Check if the probability field of a lexicon is normalized

Parameters:

[in] lexicon The lexicon to check

Returns:: TRUE if the given lexicon is normalized, FALSE otherwise.

Remarks:: The function returns TRUE if the provided lexicon doesn't handle the probability field

See also:: lexiconNormalizeProba()

Former(s) function(s):: Normalise_Proba

int lexiconLoad	(	Lexicon *	lexicon,
		const char *	filename
	)

Load the content of a lexicon from a set of binary files. See lexiconSave for more information about the required files.

Parameters:

`[in]`	lexicon	The lexicon to load into
`[in]`	filename	The lexicon header filename

Returns:: A non-zero error code if the operation fails

Remarks:: The data type and memory type of the destination lexicon must match those of the provided files

See also:: lexiconSave()

Former(s) function(s):: Read_Lexique

LexiconSearch lexiconLookFor_CHARACTER	(	const Lexicon *	lexicon,
		const char *	graphy
	)

Search for the first lexicon entry with a given graphy. Since several entries may share the same graphy, all results can be obtained iteratively using lexiconSearchNext.

Parameters:

`[in]`	lexicon	The lexicon to search
`[in]`	graphy	The graphy to search for

Returns:: The search result

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, CHARACTER)

See also:: lexiconSearchNext()

Former(s) function(s):: Recherche_Lexique & Recherche_Lexique_Ulong

LexiconSearch lexiconLookFor_UNSIGNED_LONG	(	const Lexicon *	lexicon,
		const LongArray *	graphy
	)

Search for the first lexicon entry with a given graphy. Since several entries may share the same graphy, all results can be obtained iteratively using lexiconSearchNext.

Parameters:

`[in]`	lexicon	The lexicon to search
`[in]`	graphy	The graphy to search for

Returns:: The search result

Remarks:: This function is only relevant and permitted on a lexicon whose data type matches the function suffix (here, UNSIGNED_LONG)

See also:: lexiconSearchNext()

Former(s) function(s):: Recherche_Lexique & Recherche_Lexique_Ulong

gboolean lexiconNormalizeProba ( Lexicon * lexicon )

Normalize the probability field of a lexicon

Parameters:

[in] lexicon The lexicon to normalize

Returns:: TRUE if the given lexicon was already normalized, FALSE otherwise.

Remarks:: The function returns TRUE if the provided lexicon doesn't handle the probability field

See also:: lexiconIsNormalized()

Former(s) function(s):: Normalise_Proba
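
The relationship between lexiconIsNormalized() and lexiconNormalizeProba() can be sketched as follows, assuming that "normalized" means the probabilities sum to 1 (the documentation above does not define the term) and that the values are plain, non-logarithmic probabilities. `toy_is_normalized`, `toy_normalize` and `TOY_EPSILON` are hypothetical names for this example. An empty table is treated as normalized, loosely mirroring the remark about lexicons that do not handle the probability field.

```c
#include <assert.h>
#include <stddef.h>

/* Tolerance used when comparing the sum to 1 (arbitrary choice). */
#define TOY_EPSILON 1e-9

/* Return 1 if the probabilities sum to 1 within TOY_EPSILON,
 * or if there is nothing to check; 0 otherwise. */
static int toy_is_normalized(const double *proba, size_t count)
{
    double sum = 0.0;
    double diff;
    size_t i;

    assert(proba != NULL || count == 0);
    if (count == 0) {
        return 1;
    }
    for (i = 0; i < count; ++i) {
        sum += proba[i];
    }
    diff = sum - 1.0;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= TOY_EPSILON;
}

/* Rescale the probabilities so that they sum to 1.
 * Returns 1 if they were already normalized (nothing changed),
 * 0 otherwise. A zero sum is left untouched. */
static int toy_normalize(double *proba, size_t count)
{
    double sum = 0.0;
    size_t i;

    if (toy_is_normalized(proba, count)) {
        return 1;
    }
    for (i = 0; i < count; ++i) {
        sum += proba[i];
    }
    if (sum > 0.0) {
        for (i = 0; i < count; ++i) {
            proba[i] /= sum;
        }
    }
    return 0;
}
```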

int lexiconSave	(	const Lexicon *	lexicon,
		const char *	input_filename
	)

Save the content of a lexicon in a set of binary files. The operation generates:

a (human-readable) header file named input_filename that defines the lexicon properties, such as the memory type, the handled fields and the open parts of speech information;
the lexical associative memory that stores the graphy field, saved via its LAMSave function (memory type dependent format);
a binary file for each optional field (frequency, probability, part of speech, lemma).

Parameters:

`[in]`	lexicon	The lexicon to save
`[in]`	input_filename	The input filename

Returns:: A non-zero error code if the operation fails

See also:: lexiconLoad()

Former(s) function(s):: Write_Lexique

gboolean lexiconSearchNext	(	LexicalAssocMem *	associative_memory,
		LexiconSearch *	lexicon_search
	)

Continue a graphy-based search in a lexicon. Each time this function is called, the fields of the provided search result are updated to reflect the properties of the next matching entry, until it returns FALSE, indicating that all matching results have been returned.

Parameters:

`[in]`	associative_memory	The associative memory on which to perform the search
`[in]`	lexicon_search	The lexicon search result to update

See also:: lexiconLookFor_CHARACTER() lexiconLookFor_UNSIGNED_LONG()

Former(s) function(s):: Suivant_Lexique
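
The "look for, then next" protocol described above is a cursor-style iteration: the first call positions the search result on the first match, and each subsequent call advances it until no match remains. The following sketch imitates this on a toy array with a linear scan. `ToyEntry`, `ToySearch`, `toy_look_for` and `toy_search_next` are hypothetical names for this example, and the fields of the real LexiconSearch structure are not documented here, so `found` and `index` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy entry and search cursor (NOT SlpTk types). */
typedef struct {
    const char *graphy;
    const char *part_of_speech;
} ToyEntry;

typedef struct {
    const char *graphy; /* the graphy being searched */
    size_t index;       /* index of the current match, if found */
    int found;          /* 1 while the cursor points at a match */
} ToySearch;

/* Move the cursor to the first match at or after `start`. */
static int toy_scan(const ToyEntry *entries, size_t count,
                    size_t start, ToySearch *search)
{
    size_t i;
    for (i = start; i < count; ++i) {
        if (strcmp(entries[i].graphy, search->graphy) == 0) {
            search->index = i;
            search->found = 1;
            return 1;
        }
    }
    search->found = 0;
    return 0;
}

/* Start a search: position the cursor on the first matching entry. */
static ToySearch toy_look_for(const ToyEntry *entries, size_t count,
                              const char *graphy)
{
    ToySearch search;
    assert(entries != NULL && graphy != NULL);
    search.graphy = graphy;
    search.index = 0;
    search.found = 0;
    toy_scan(entries, count, 0, &search);
    return search;
}

/* Advance to the next match; returns 0 once all matches were visited. */
static int toy_search_next(const ToyEntry *entries, size_t count,
                           ToySearch *search)
{
    assert(entries != NULL && search != NULL);
    if (!search->found) {
        return 0;
    }
    return toy_scan(entries, count, search->index + 1, search);
}
```

A typical loop over all matches would start with `toy_look_for`, process the entry while `found` is set, and call `toy_search_next` at the end of each iteration.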

Lexicon - Lexicon structure module
[Lexical tools level]

Data Structures

Defines

Typedefs

Functions

Detailed Description

Typedef Documentation

Function Documentation


Data Structures
struct	OpenPosTableItem
	Open parts of speech table entry. More...
struct	Lexicon
	Lexicon. More...
struct	LexiconAccess
	Lexicon access result. More...
struct	LexiconSearch
	Lexicon Search result. More...
Defines
#define	POS_DELIMITER_CHAR "#"
	Delimiter character used between lemma graphy and lemma part of speech.
#define	LEXICON_WHITE_CHAR ' '
	White character used by the lexicon (for example between "carte" and "bleue" for the "carte bleue" entry).
#define	LEXICON_FILE_EXT ".slplex"
	Default lexicon header file extension.
Typedefs
typedef LexicalEntryIndex	Lemma
	Lemma.
typedef unsigned short	Morpho
	Part of speech.
typedef unsigned int	Frequency
	Frequency of occurrence.
typedef double	Probability
	Probability of occurrence.
Functions
int	lexiconCreate (Lexicon *lexicon, const LexicalMemoryType memory_type, const LexicalDataType data_type)
gboolean	lexiconContains_CHARACTER (const Lexicon *lexicon, const char *sequence, const Lemma *lemma, const char *part_of_speech, const Frequency *frequency, const Probability *probability)
gboolean	lexiconContains_UNSIGNED_LONG (const Lexicon *lexicon, const LongArray *sequence, const Lemma *lemma, const Morpho *part_of_speech, const Frequency *frequency, const Probability *probability)
gboolean	lexiconInsert_CHARACTER (Lexicon *lexicon, const char *sequence, const Lemma *lemma, const char *part_of_speech, const Frequency *frequency, const Probability *probability, LexicalEntryIndex *size, const gboolean by_force)
gboolean	lexiconInsert_UNSIGNED_LONG (Lexicon *lexicon, const LongArray *sequence, const Lemma *lemma, const Morpho *part_of_speech, const Frequency *frequency, const Probability *probability, LexicalEntryIndex *size, const gboolean by_force)
int	lexiconExport (const Lexicon *lexicon, const char *input_filename)
int	lexiconImport (Lexicon *lexicon, const char *basefilename)
int	lexiconSave (const Lexicon *lexicon, const char *input_filename)
gboolean	lexiconNormalizeProba (Lexicon *lexicon)
gboolean	lexiconIsNormalized (const Lexicon *lexicon)
void	lexiconApplyLogOnProba (Lexicon *lexicon)
int	lexiconLoad (Lexicon *lexicon, const char *filename)
LexiconSearch	lexiconLookFor_CHARACTER (const Lexicon *lexicon, const char *graphy)
LexiconSearch	lexiconLookFor_UNSIGNED_LONG (const Lexicon *lexicon, const LongArray *graphy)
gboolean	lexiconSearchNext (LexicalAssocMem *associative_memory, LexiconSearch *lexicon_search)
LexiconAccess	lexiconAccess (const Lexicon *lexicon, const LexicalEntryIndex index)
LexicalEntry	lexiconAccessGetGraphy (const LexiconAccess *lexicon_access)
char *	lexiconAccessGetPartOfSpeech (const LexiconAccess *lexicon_access)
size_t	lexiconGetSize (const Lexicon *lexicon)
void	lexiconFree (Lexicon *lexicon)
void	lexiconDump (Lexicon *lexicon, int(*print)(const char *,...), const char *delimiter, gboolean all, gboolean numeration)
