Lingua::JA::RegEx - Japanese-friendly regular expression


NAME

Lingua::JA::RegEx - Japanese-friendly regular expression


SYNOPSIS

 use Lingua::JA::RegEx;

 my $re = Lingua::JA::RegEx->new('sjis');

 $re->set("^$japanese_word");

 while($re->match(\$text, 'g')) {
     print $re->{MATCH};
 }

 $re->replace(\$text, '($&)', 'g');
 print $text;


DESCRIPTION

In encodings such as Shift-JIS and EUC-JP, Japanese characters are mostly represented by double-byte codes, because the language has far more than 256 characters. This makes writing regular expressions that contain Japanese characters a rather complicated matter.

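For example, the character ``\x93\x82'', which represents the Tang dynasty of China, can be falsely matched inside the two characters ``\x89\x93\x82\xa2'', which mean 'far away', because its bytes straddle the boundary between them. So a pattern must be kept aligned on character boundaries, and a double-byte character must be grouped, as in ``(?:\x93\x82)'', before a quantifier can apply to the whole character.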
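The false match can be reproduced with plain Perl byte strings. The sketch below is an illustration only, not how Lingua::JA::RegEx is implemented: it assumes the usual Shift-JIS ranges (single bytes \x00-\x7F and \xA1-\xDF, lead bytes \x81-\x9F and \xE0-\xFC, trail bytes \x40-\x7E and \x80-\xFC) and keeps the search aligned by consuming whole characters from the start of the string, a technique also described in perlfaq6.

```perl
use strict;
use warnings;

# One Shift-JIS character (assumed ranges): a single-byte character
# (ASCII or half-width katakana) or a lead byte followed by a trail byte.
my $sjis_char = qr/[\x00-\x7F\xA1-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/;

my $text = "\x89\x93\x82\xA2";   # 'far away' in Shift-JIS (two characters)
my $tang = "\x93\x82";           # 'Tang' in Shift-JIS (one character)

# A plain byte-level search finds 'Tang' straddling the two characters.
my $naive = ($text =~ /\Q$tang\E/) ? 1 : 0;

# Consuming whole characters from the start of the string keeps the
# search aligned, so the straddling bytes no longer match.
my $aligned = ($text =~ /\A(?:$sjis_char)*?\Q$tang\E/) ? 1 : 0;

print "naive: $naive, aligned: $aligned\n";   # prints "naive: 1, aligned: 0"
```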

To make matters worse, many Shift-JIS characters contain a metacharacter as their second byte. The character ``\x8b\x5c'', which means 'deceive', must be escaped in regular expressions, because its second byte \x5c is a backslash.
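A minimal core-Perl sketch of the escaping problem, assuming a Perl byte string (no 'use utf8'): interpolated raw, the backslash byte breaks the pattern, while quotemeta (used here via \Q...\E) makes the bytes match literally. Note that quotemeta only handles escaping; it does nothing about the false-match problem described above, and it would also disable any metacharacters you actually meant to use.

```perl
use strict;
use warnings;

my $deceive = "\x8B\x5C";   # 'deceive' in Shift-JIS; \x5C is a backslash

# Interpolated raw, the trailing \x5C acts as a regex escape and the
# pattern fails to compile ("Trailing \ in regex").
my $raw_ok = eval { my $compiled = qr/$deceive/; 1 } ? 1 : 0;

# quotemeta (via \Q...\E) escapes the bytes so that they match literally.
my $quoted = qr/\Q$deceive\E/;
my $found  = ("\x82\xA0\x8B\x5C" =~ $quoted) ? 1 : 0;

print "raw compiles: $raw_ok, quoted finds it: $found\n";
# prints "raw compiles: 0, quoted finds it: 1"
```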

With Lingua::JA::RegEx, you can write regular expressions in Japanese without worrying about isolation or escaping.


CONSTRUCTOR

new( [CODE] )
Creates a regular expression object for CODE, which must be 'sjis' or 'euc'. If CODE is omitted, 'euc' is assumed.
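Presumably, what CODE changes is what counts as one character. The sketch below is hypothetical: %CHAR_RE and split_chars are illustrative names, and the byte ranges (including the three-byte EUC-JP form starting with \x8F) are this sketch's assumptions rather than anything the module documents. Like new(), it falls back to 'euc' when no code is given.

```perl
use strict;
use warnings;

# Hypothetical "one character" patterns for the two codes new() accepts.
my %CHAR_RE = (
    sjis => qr/[\x00-\x7F\xA1-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/,
    euc  => qr/[\x00-\x7F]|\x8E[\xA1-\xDF]|\x8F[\xA1-\xFE][\xA1-\xFE]|[\xA1-\xFE][\xA1-\xFE]/,
);

# Split a byte string into characters, stopping at the first byte that
# does not start a valid character. Defaults to 'euc', like new().
sub split_chars {
    my ($code, $bytes) = @_;
    $code = 'euc' unless defined $code;
    my $char = $CHAR_RE{$code} or die "unknown code: $code\n";
    return $bytes =~ /\G($char)/g;
}

my @sjis = split_chars('sjis', "\x89\x93\x82\xA2");   # 'far away' in Shift-JIS
my @euc  = split_chars(undef,  "\xB1\xF3\xA4\xA4");   # the same word in EUC-JP
print scalar(@sjis), ' ', scalar(@euc), "\n";        # prints "2 2"
```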


METHODS

set(REGEX, [MODIFIERS])
REGEX is a regular expression written in Japanese. When X and Y are double-byte characters, you can write 'X*Y' instead of '(?:X)*Y'. You also do not have to escape metacharacters that appear as the second byte of Shift-JIS characters.
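Why the grouping matters can be seen with plain Perl. In the hypothetical group_double_bytes helper below (an illustration, not the module's code), every Shift-JIS double-byte character in the pattern is wrapped in (?:...) and passed through quotemeta, while single bytes, and therefore ordinary regex syntax, are left alone. It assumes the pattern is a Shift-JIS byte string and does not address the character-boundary problem.

```perl
use strict;
use warnings;

# Hypothetical helper: group and escape each double-byte character,
# leave single-byte characters (including regex syntax) untouched.
sub group_double_bytes {
    my ($pattern) = @_;
    my @tokens = $pattern =~ /([\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|.)/gs;
    return join '', map {
        length($_) == 2 ? '(?:' . quotemeta($_) . ')' : $_
    } @tokens;
}

my $tang   = "\x93\x82";          # X: 'Tang'
my $hira_i = "\x82\xA2";          # Y: hiragana 'i'
my $text   = ($tang x 2) . $hira_i;   # X X Y

my $pattern = "$tang*$hira_i";    # written as 'X*Y'

# Unwrapped, '*' applies only to the last byte of X.
my ($naive) = $text =~ /($pattern)/;

# Wrapped, '*' applies to the whole character X.
my $grouped   = group_double_bytes($pattern);
my ($wrapped) = $text =~ /($grouped)/;

printf "naive: %d bytes, grouped: %d bytes\n", length $naive, length $wrapped;
# prints "naive: 4 bytes, grouped: 6 bytes"
```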

It returns 1 if the regular expression was successfully evaluated, otherwise it returns 0.

If given, MODIFIERS alter the way this regular expression is used by Perl. MODIFIERS is a combination of one or more of the letters 'i', 'm', 's' and 'x'. See the perlre man page for details.
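One way a MODIFIERS string could be applied at run time is an inline (?imsx:...) group. The compile_with_modifiers helper below is a hypothetical sketch of that idea, not the module's method; it rejects anything other than the four documented letters.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the pattern in an inline modifier group.
sub compile_with_modifiers {
    my ($pattern, $mods) = @_;
    $mods = '' unless defined $mods;
    die "invalid modifiers: $mods\n" unless $mods =~ /\A[imsx]*\z/;
    my $source = '(?' . $mods . ':' . $pattern . ')';
    return qr/$source/;
}

my $plain = compile_with_modifiers('abc');
my $ci    = compile_with_modifiers('abc', 'i');     # case-insensitive
my $ext   = compile_with_modifiers('a b c', 'x');   # whitespace ignored

my @results = ('ABC' =~ $plain ? 1 : 0, 'ABC' =~ $ci ? 1 : 0);
print "@results\n";   # prints "0 1"
```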

If the pattern evaluation fails, you can get the error message with the $re->error_message() method. You may get a bizarre error message when your original regular expression contains hex character escapes such as '\x80'. You can avoid this by writing '\x{80}' instead.
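The 1/0 return value combined with a stored error message can be sketched with eval around qr//. try_set and $error_message below are hypothetical stand-ins for set() and error_message(); the error text is whatever Perl itself reports for the failed compile.

```perl
use strict;
use warnings;

# Last compile error, or '' after a successful compile.
my $error_message = '';

# Hypothetical set()-like function: returns 1 or 0 and keeps the error.
sub try_set {
    my ($pattern) = @_;
    my $compiled = eval { qr/$pattern/ };
    if (defined $compiled) {
        $error_message = '';
        return 1;
    }
    $error_message = $@;
    return 0;
}

my $ok = try_set('a(b');     # unbalanced parenthesis
print "ok: $ok\n";           # prints "ok: 0"
print "error: $error_message";
```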

match(SCALARREF, [OPTIONS])
Matches the pattern set by the set() method against the string that SCALARREF refers to. Returns 1 on success, 0 on failure.

If successful, match() sets the MATCH, PREMATCH and POSTMATCH properties of the object, which are equivalent to the $&, $` and $' variables. You can refer to these properties as $re->{MATCH}, $re->{PREMATCH} and $re->{POSTMATCH}.

The substrings matched by the subpatterns are held in the BUFFERS property as an array reference, in the order ($&, $1, $2, $3, ...).
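For reference, these are the Perl match variables the properties correspond to, captured into a hash shaped like the one the object is described as holding. The %props hash is only an illustration; with Lingua::JA::RegEx you would read $re->{MATCH} and the other properties instead.

```perl
use strict;
use warnings;

my $text = 'set key: value;';
my %props;

# Perl's own match variables, gathered under the documented property names.
if ($text =~ /(\w+): (\w+)/) {
    %props = (
        MATCH     => $&,              # 'key: value'
        PREMATCH  => $`,              # 'set '
        POSTMATCH => $',              # ';'
        BUFFERS   => [ $&, $1, $2 ],  # ($&, $1, $2)
    );
}

print "$props{BUFFERS}[1] => $props{BUFFERS}[2]\n";   # prints "key => value"
```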

When given, OPTIONS must be 'g' or 'gc', whose meanings are identical to those of the same modifiers on m//. See the perlop man page for details.

Note that this method always returns a boolean value, even when it is called in list context. This behavior differs from that of m//.
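The difference from m// can be seen in core Perl. bool_match_g below is a hypothetical wrapper that, like match() as documented, returns only 1 or 0; the rest of the sketch shows what the 'g' and 'gc' options mean for plain m//.

```perl
use strict;
use warnings;

# Hypothetical wrapper: one scalar-context m//g step, reported as 1 or 0.
sub bool_match_g {
    my ($ref, $re) = @_;
    return scalar($$ref =~ /$re/g) ? 1 : 0;
}

# m//g in list context returns every captured substring at once.
my $s = 'a1b2c3';
my @digits = $s =~ /(\d)/g;                     # ('1', '2', '3')

# The wrapper returns a flag even in list context, so it is looped.
my $t = 'a1b2c3';
my @flags = bool_match_g(\$t, qr/\d/);          # (1), not ('1')
pos($t) = 0;                                    # restart the iteration
my $count = 0;
$count++ while bool_match_g(\$t, qr/\d/);       # ends at 3

# 'gc' keeps pos() after a failed match; plain 'g' resets it.
my $u = 'abc';
my $found_b = ($u =~ /b/g);    # matches, pos($u) is now 2
my $miss_gc = ($u =~ /z/gc);   # fails, pos($u) stays 2
my $kept    = pos($u);
my $miss_g  = ($u =~ /z/g);    # fails, pos($u) is reset to undef
my $reset   = pos($u);

print "@digits | @flags | $count | $kept | ",
      defined $reset ? $reset : 'undef', "\n";
# prints "1 2 3 | 1 | 3 | 2 | undef"
```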

replace(SCALARREF, REPLACE, [OPTIONS])
Replaces the pattern set by the set() method within the string referred to by SCALARREF with REPLACE after evaluation.

REPLACE can be a scalar ref or a code ref.

If REPLACE is a scalar ref, the matched string is replaced by the string referred to by REPLACE.

If REPLACE is a code ref, the replace() method calls this subroutine, which takes a hash ref that has the MATCH, PREMATCH, POSTMATCH and BUFFERS properties and returns a string.

OPTIONS must be 'g' or omitted.
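The code-ref form can be pictured with the core-Perl sketch below. replace_matches is a hypothetical stand-in, not the module's implementation: it takes a compiled qr// pattern and works on ASCII data, but it builds the same four properties for each match, passes them to the callback as a hash ref, and treats a true $global argument as the 'g' option.

```perl
use strict;
use warnings;

# Hypothetical stand-in for replace(SCALARREF, CODEREF, [OPTIONS]).
sub replace_matches {
    my ($ref, $re, $code, $global) = @_;
    my $str  = $$ref;
    my $out  = '';
    my $last = 0;    # offset just past the previous match
    while ($str =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # BUFFERS: ($&, $1, $2, ...) taken from the match offsets.
        my @buffers = map {
            defined $-[$_] ? substr($str, $-[$_], $+[$_] - $-[$_]) : undef
        } 0 .. $#+;
        my %props = (
            MATCH     => $buffers[0],
            PREMATCH  => substr($str, 0, $start),
            POSTMATCH => substr($str, $end),
            BUFFERS   => \@buffers,
        );
        $out .= substr($str, $last, $start - $last) . $code->(\%props);
        $last = $end;
        last unless $global;
    }
    $$ref = $out . substr($str, $last);
    return;
}

# Callback: swap each "name=digit" pair using the BUFFERS property.
my $swap = sub { my $p = shift; "$p->{BUFFERS}[2]=$p->{BUFFERS}[1]" };

my $all = 'a=1, b=2';
replace_matches(\$all, qr/(\w)=(\d)/, $swap, 1);   # like OPTIONS 'g'
my $first = 'a=1, b=2';
replace_matches(\$first, qr/(\w)=(\d)/, $swap);    # OPTIONS omitted

print "$all | $first\n";   # prints "1=a, 2=b | 1=a, b=2"
```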

is_set()
Returns 1 if the set() method has been called successfully, otherwise returns 0.

error_message()
Returns the error message of the last failed set() call.


AUTHOR

Tsutomu Kuroda <tkrd@mail.com>


COPYRIGHT

Copyright (c) 2002 Tsutomu Kuroda <tkrd@mail.com>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
