Lingua::JA::RegEx - Japanese-friendly regular expression
use Lingua::JA::RegEx;
my $re = Lingua::JA::RegEx->new('sjis');
$re->set("^$japanese_word");
while ($re->match(\$text, 'g')) {
    print $re->{MATCH};
}
$re->replace(\$text, '($&)', 'g');
print $text;
Japanese text is encoded with double-byte codes on computers because the language has more than 256 characters. This makes writing regular expressions that contain Japanese characters rather complicated.
For example, the character ``\x93\x82'', which represents the Tang dynasty of China, can match in the middle of the pair of characters ``\x89\x93\x82\xa2'', which means 'far away'. So you must isolate this character by writing ``(?:\x93\x82)''.
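The misaligned match described above can be reproduced with core Perl alone. The following is a minimal sketch and is not part of Lingua::JA::RegEx; it assumes the core Encode module and its cp932 (Shift-JIS) encoding, and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Shift-JIS bytes taken from the text above.
my $tang = "\x93\x82";          # one double-byte character
my $far  = "\x89\x93\x82\xa2";  # two double-byte characters

# Byte-wise matching finds the pattern at byte offset 1, straddling
# the boundary between the two characters: a false positive.
my $byte_hit    = ($far =~ /\x93\x82/) ? 1 : 0;
my $byte_offset = index($far, $tang);

# Decoding to Perl character strings first makes the match operate on
# whole characters, so the false positive disappears.
my $tang_chars = decode('cp932', $tang);
my $far_chars  = decode('cp932', $far);
my $char_hit   = ($far_chars =~ /\Q$tang_chars\E/) ? 1 : 0;

printf "byte match: %d (offset %d), character match: %d\n",
    $byte_hit, $byte_offset, $char_hit;
```

Decoding before matching is only one way to keep matches aligned on character boundaries; it is shown here to make the problem visible, not as a description of how the module works internally.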
To make matters worse, many characters contain a metacharacter as their second byte (in the case of Shift-JIS). The character ``\x8b\x5c'', which means 'deceive', must be escaped in regular expressions.
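The escaping problem can also be seen with core Perl only. This is a hedged sketch, independent of Lingua::JA::RegEx: it interpolates the raw Shift-JIS bytes into a pattern, then does the same through quotemeta (\Q...\E). The variable names are illustrative.

```perl
use strict;
use warnings;

# Shift-JIS bytes for the character cited above; the second byte,
# 0x5c, is the ASCII backslash.
my $word = "\x8b\x5c";

# Interpolated as-is, the trailing backslash makes the pattern invalid,
# so compiling it dies; eval captures the failure.
my $raw_ok = eval { my $re = qr/$word/; 1 } ? 1 : 0;

# \Q...\E (quotemeta) escapes the metacharacter byte, so the same
# bytes compile and match themselves literally.
my $quoted_ok = eval { my $re = qr/\Q$word\E/; 1 } ? 1 : 0;
my $matched   = ("abc${word}def" =~ /\Q$word\E/) ? 1 : 0;

print "raw: $raw_ok, quoted: $quoted_ok, matched: $matched\n";
```

Note that quoting handles only the metacharacter issue; it does nothing about matches that straddle character boundaries.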
With Lingua::JA::RegEx, you can write regular expressions in Japanese without worrying about isolation or escaping.
The set() method returns 1 if the regular expression was successfully evaluated; otherwise it returns 0.
If given, MODIFIERS alter the way this regular expression is used by Perl. MODIFIERS is a combination of one or more of the letters 'i', 'm', 's' and 'x'. See the perlre man page for details.
If the pattern evaluation failed, you can get the error message with the $re->error_message() method. You may get a bizarre error message when your original regular expression contains hex character escapes such as '\x80'. You can avoid this by writing '\x{80}' instead.
The match() method applies the regular expression defined with the set() method to the string that SCALARREF refers to. It returns 1 for success and 0 for failure.
If successful, the match method sets the MATCH, PREMATCH and POSTMATCH properties of the object, which are equivalent to the $&, $` and $' variables. You can refer to these properties as $re->{MATCH}, $re->{PREMATCH} and $re->{POSTMATCH}.
The substrings matched by the subpatterns are held in the BUFFERS property as an array reference, in the order ($&, $1, $2, $3, ...).
When given, OPTIONS must be 'g' or 'gc', whose meanings are identical to those of m//. See the perlop man page for details.
Note that this method always returns a boolean value, even when it is used in list context. This behavior is different from that of m//.
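For comparison, the context-dependent behavior of Perl's built-in m// that the note above refers to can be sketched as follows. This uses core Perl only and does not call Lingua::JA::RegEx; the sample text and variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "a1 b22 c333";

# List context: m//g returns every match at once.
my @all = $text =~ /\d+/g;

# Scalar context: each evaluation finds the next match and yields a
# boolean, which is the looping style that a boolean-only match
# method calls for.
my @one_by_one;
while ($text =~ /(\d+)/g) {
    push @one_by_one, $1;
}

print "list: @all\n";
print "loop: @one_by_one\n";
```

Because the match method described above returns only a boolean, collecting all matches would follow the loop form, reading each result from the object's properties.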
The replace() method replaces the text matched by the regular expression defined with the set() method, within the string referred to by SCALARREF, with REPLACE after evaluation.
REPLACE can be a scalar ref or a code ref.
If REPLACE is a scalar ref, the matched string is replaced by the string that REPLACE refers to.
If REPLACE is a code ref, the replace() method calls this subroutine, which takes a hash ref that has the MATCH, PREMATCH, POSTMATCH and BUFFERS properties and returns a string.
OPTIONS must be 'g' or omitted.
The is_set() method returns 1 if the set() method has been called successfully; otherwise it returns 0.
set()
call.
Tsutomu Kuroda <tkrd@mail.com>
Copyright (c) 2002 Tsutomu Kuroda <tkrd@mail.com> All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.