
FUNCTION zjnc_parse_artdesc.

*"----------------------------------------------------------------------
*"*"Local Interface:
*"  IMPORTING
*"     REFERENCE(INPTEXT) TYPE  CHAR2048
*"  EXPORTING
*"     REFERENCE(METAPHONE1) TYPE  CHAR4
*"     REFERENCE(METAPHONE2) TYPE  CHAR4
*"     REFERENCE(METAPHONE3) TYPE  CHAR4
*"     REFERENCE(METAPHONE4) TYPE  CHAR4
*"     REFERENCE(METAPHONE5) TYPE  CHAR4
*"     REFERENCE(METAPHONE6) TYPE  CHAR4
*"     REFERENCE(METAPHONE7) TYPE  CHAR4
*"     REFERENCE(METAPHONE8) TYPE  CHAR4
*"     REFERENCE(NUMBER1) TYPE  NUM8
*"     REFERENCE(NUMBER2) TYPE  NUM8
*"     REFERENCE(NUMBER3) TYPE  NUM8
*"     REFERENCE(NUMBER4) TYPE  NUM8
*"----------------------------------------------------------------------

  DATA: hit(1) TYPE c,
        wword  TYPE char5,
        len    TYPE i,
        off    TYPE i.

  TYPES: BEGIN OF t_token,
           token(40) TYPE c,
         END OF t_token.
  DATA: it_token TYPE STANDARD TABLE OF t_token.
  DATA: wa_token TYPE t_token.
  FIELD-SYMBOLS: <fs_token> TYPE t_token.

  TYPES: BEGIN OF t_number,
           numlen    TYPE i,
           number(8) TYPE n,
         END OF t_number.
  DATA: it_number TYPE STANDARD TABLE OF t_number.
  DATA: wa_number TYPE t_number.
  FIELD-SYMBOLS: <fs_number> TYPE t_number.

  TYPES: BEGIN OF t_word,
           wlen     TYPE i,
           word(16) TYPE c,
         END OF t_word.
  DATA: it_word TYPE STANDARD TABLE OF t_word.
  DATA: wa_word TYPE t_word.
  FIELD-SYMBOLS: <fs_word> TYPE t_word.

  CLEAR: metaphone1, metaphone2, metaphone3, metaphone4,
         metaphone5, metaphone6, metaphone7, metaphone8,
         number1, number2, number3, number4.

  SPLIT inptext AT space INTO TABLE it_token.

  LOOP AT it_token ASSIGNING <fs_token>.
    len = STRLEN( <fs_token>-token ).
    IF len = 0.
      CONTINUE.
    ENDIF.

    TRANSLATE <fs_token>-token TO UPPER CASE.

    " Remove other than A-Z 0-9
    REPLACE ALL OCCURRENCES OF REGEX '[^A-Z0-9]' IN <fs_token>-token WITH ''.

    CLEAR len.
    FIND REGEX '[0-9]+' IN <fs_token>-token MATCH OFFSET off MATCH LENGTH len.
    IF len > 0 AND len < 9.
      MOVE <fs_token>-token+off(len) TO wa_number-number.
      MOVE len TO wa_number-numlen.
      APPEND wa_number TO it_number.
    ENDIF.

    " Remove other than A-Z
    REPLACE ALL OCCURRENCES OF REGEX '[^A-Z]' IN <fs_token>-token WITH ''.

    len = STRLEN( <fs_token>-token ).
    IF len > 2.
      " Skip common stop words
      CASE <fs_token>-token.
        WHEN 'ALL' OR 'AND' OR 'ARE' OR 'BIG' OR 'BOOK' OR 'FOR'.
          MOVE 'Y' TO hit.
        WHEN 'FROM' OR 'HOW' OR 'NEW' OR 'NOT' OR 'NOW' OR 'OUT'.
          MOVE 'Y' TO hit.
        WHEN 'SET' OR 'THAT' OR 'THE' OR 'USE' OR 'WHEN'.
          MOVE 'Y' TO hit.
        WHEN 'WHO' OR 'WHY' OR 'WILL' OR 'WITH' OR 'YOU'.
          MOVE 'Y' TO hit.
        WHEN OTHERS.
          MOVE 'N' TO hit.
      ENDCASE.

      IF hit = 'N'.
        MOVE <fs_token>-token TO wa_word-word.
        IF len > 16.
          MOVE 16 TO len.
        ENDIF.
        MOVE len TO wa_word-wlen.
        APPEND wa_word TO it_word.
      ENDIF.
    ENDIF.
  ENDLOOP.

  " Longest numbers and longest words first
  SORT it_number BY numlen DESCENDING.
  SORT it_word BY wlen DESCENDING.

  LOOP AT it_number ASSIGNING <fs_number>.
    CASE sy-tabix.
      WHEN 1.
        MOVE <fs_number>-number TO number1.
      WHEN 2.
        MOVE <fs_number>-number TO number2.
      WHEN 3.
        MOVE <fs_number>-number TO number3.
      WHEN 4.
        MOVE <fs_number>-number TO number4.
      WHEN OTHERS.
        EXIT.
    ENDCASE.
  ENDLOOP.

  LOOP AT it_word ASSIGNING <fs_word>.
    CASE sy-tabix.
      WHEN 1.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone1.
      WHEN 2.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone2.
      WHEN 3.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone3.
      WHEN 4.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone4.
      WHEN 5.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone5.
      WHEN 6.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone6.
      WHEN 7.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone7.
      WHEN 8.
        PERFORM f_metaphone USING <fs_word>-word <fs_word>-wlen metaphone8.
      WHEN OTHERS.
        EXIT.
    ENDCASE.
  ENDLOOP.

ENDFUNCTION.

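*&---------------------------------------------------------------------*
*& Usage sketch (not part of the original listing).
*& Shows one way the function module above might be called. The form
*& name f_demo_parse_artdesc, the local variable names and the sample
*& description text are assumptions made purely for illustration.
*& Only some of the optional exporting parameters are received here.
*&---------------------------------------------------------------------*
FORM f_demo_parse_artdesc.

  DATA: lv_text  TYPE char2048,
        lv_meta1 TYPE char4,
        lv_meta2 TYPE char4,
        lv_num1  TYPE num8.

  " hypothetical article description
  lv_text = 'Stainless steel bolt M8 x 40'.

  CALL FUNCTION 'ZJNC_PARSE_ARTDESC'
    EXPORTING
      inptext    = lv_text
    IMPORTING
      metaphone1 = lv_meta1
      metaphone2 = lv_meta2
      number1    = lv_num1.

  " codes of the two longest words and the longest number found
  WRITE: / 'Metaphone 1:', lv_meta1,
         / 'Metaphone 2:', lv_meta2,
         / 'Number 1:',    lv_num1.

ENDFORM.                    " end f_demo_parse_artdesc
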
*&---------------------------------------------------------------------*
*&      Form  f_metaphone
*&---------------------------------------------------------------------*
FORM f_metaphone USING    inpword   TYPE c
                          inpwlen   TYPE i
                 CHANGING metaphone TYPE c.

  DATA: inpoff     TYPE i,
        inofp1     TYPE i,
        inofp2     TYPE i,
        inofm1     TYPE i,
        wlocal(40) TYPE c,
        wmetphn(8) TYPE c,
        wrdsiz     TYPE i,
        outoff     TYPE i,
        hard(1)    TYPE c,
        inpwlm1    TYPE i,
        inpchr(1)  TYPE c.

CONSTANTS: maxcodelen TYPE i VALUE 4.

  CLEAR: inpoff, outoff, wmetphn.
  outoff = 0.
  hard = 'N'.
  inpwlm1 = inpwlen - 1.

  " handle initial 2 characters exceptions
  CASE inpword+0(1).
    WHEN 'K' OR 'G' OR 'P'. " looking for KN, etc
      IF inpword+1(1) = 'N'.
        wlocal = inpword+1(inpwlm1).
      ELSE.
        wlocal = inpword.
      ENDIF.
    WHEN 'A'. " looking for AE
      IF inpword+1(1) = 'E'.
        wlocal = inpword+1(inpwlm1).
      ELSE.
        wlocal = inpword.
      ENDIF.
    WHEN 'W'. " looking for WR or WH
      IF inpword+1(1) = 'R'. " WR -> R
        wlocal = inpword+1(inpwlm1).
      ELSE.
        IF inpword+1(1) = 'H'.
          wlocal = inpword+1(inpwlm1).
          wlocal+0(1) = 'W'. " WH -> W
        ELSE.
          wlocal = inpword.
        ENDIF.
      ENDIF.
    WHEN 'X'. " initial X becomes S
      wlocal = inpword.
      wlocal+0(1) = 'S'.
    WHEN OTHERS.
      wlocal = inpword.
  ENDCASE.

  " now wlocal has working string with initials fixed
  wrdsiz = STRLEN( wlocal ).
  inpoff = 0.

  DO.
    IF outoff >= maxcodelen. " max metaphone size of 4 works well
      EXIT.
    ENDIF.
    IF inpoff >= wrdsiz.
      EXIT.
    ENDIF.

    inofm1 = inpoff - 1.
    inofp1 = inpoff + 1.
    inofp2 = inpoff + 2.
    inpchr = wlocal+inpoff(1).

    " remove duplicate letters except C
    IF ( inpchr <> 'C' ) AND ( inpoff > 0 ) AND ( wlocal+inofm1(1) = inpchr ).
      inpoff = inpoff + 1.
      CONTINUE.
    ENDIF.

    CASE inpchr.
      WHEN 'A' OR 'E' OR 'I' OR 'O' OR 'U'.
        IF inpoff = 0.
          wmetphn+outoff(1) = inpchr.
          outoff = outoff + 1.
        ENDIF. " only use vowel if leading char
      WHEN 'B'.
        IF ( inpoff > 0 AND wlocal+inofm1(1) = 'M' AND
             ( inofp1 = wrdsiz OR
               ( inofp2 = wrdsiz AND wlocal+inofp1(1) = 'E' ) ) ).
          " MB or MBE at end of word
          inpoff = inpoff + 1.
          CONTINUE.
        ELSE.
          wmetphn+outoff(1) = inpchr.
          outoff = outoff + 1.
        ENDIF.
      WHEN 'C'. " lots of C special cases
        " discard if SCI, SCE or SCY
        IF ( inpoff > 0 ) AND ( wlocal+inofm1(1) = 'S' ) AND ( inofp1 < wrdsiz ).
          CASE wlocal+inofp1(1).
            WHEN 'E' OR 'I' OR 'Y'.
              inpoff = inpoff + 1.
              CONTINUE.
          ENDCASE.
        ENDIF.
        IF wlocal+inpoff(3) = 'CIA'. " CIA -> X
          wmetphn+outoff(1) = 'X'.
          outoff = outoff + 1.
          inpoff = inpoff + 2.
          CONTINUE.
        ENDIF.

        IF inofp1 < wrdsiz.
          CASE wlocal+inofp1(1).
            WHEN 'E' OR 'I' OR 'Y'.
              wmetphn+outoff(1) = 'S'.
              outoff = outoff + 1. " CI,CE,CY -> S
              inpoff = inpoff + 1.
              CONTINUE.
          ENDCASE.
        ENDIF.
        IF ( inpoff > 0 ) AND ( wlocal+inofm1(3) = 'SCH' ). " SCH->sk
          wmetphn+outoff(1) = 'K'.

          outoff = outoff + 1.
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF.
        IF wlocal+inpoff(2) = 'CH'. " detect CH
          IF ( inpoff = 0 ) AND ( wrdsiz >= 3 ). " CH consonant -> K consonant
            CASE wlocal+2(1).
              WHEN 'A' OR 'E' OR 'I' OR 'O' OR 'U'.
                wmetphn+outoff(1) = 'X'. " CHvowel -> X
                outoff = outoff + 1.
                inpoff = inpoff + 1.
                CONTINUE.
            ENDCASE.
          ELSE.
            wmetphn+outoff(1) = 'K'.
            outoff = outoff + 1.
            inpoff = inpoff + 1.
            CONTINUE.
          ENDIF.
        ENDIF.
        wmetphn+outoff(1) = 'K'.
        outoff = outoff + 1.
      WHEN 'D'.
        IF ( inofp2 < wrdsiz ) AND ( wlocal+inofp1(1) = 'G' ).
          " DGE DGI DGY -> J
          CASE wlocal+inofp2(1).
            WHEN 'E' OR 'I' OR 'Y'.
              wmetphn+outoff(1) = 'J'.
              inpoff = inpoff + 2.
            WHEN OTHERS.
              " DG before any other letter: D -> T, so that the
              " output position advanced below is never left blank
              wmetphn+outoff(1) = 'T'.
          ENDCASE.
        ELSE.
          wmetphn+outoff(1) = 'T'.
        ENDIF.
        outoff = outoff + 1.
      WHEN 'G'.
        " GH silent at end or before consonant
        IF ( inofp2 = wrdsiz ) AND ( wlocal+inofp1(1) = 'H' ).
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF.
        IF ( inofp2 < wrdsiz ) AND ( wlocal+inofp1(1) = 'H' ).
          CASE wlocal+inofp2(1).
            WHEN 'A' OR 'E' OR 'I' OR 'O' OR 'U'.
              inpoff = inpoff + 0. " do Nothing!
            WHEN OTHERS.
              inpoff = inpoff + 1.
              CONTINUE.
          ENDCASE.
        ENDIF.
        IF ( inpoff > 0 ) AND
           ( wlocal+inpoff(2) = 'GN' OR wlocal+inpoff(4) = 'GNED' ).
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF. " silent G

        IF ( inpoff > 0 ) AND ( wlocal+inofm1(1) = 'G' ).
          hard = 'Y'.
        ELSE.
          hard = 'N'.
        ENDIF.
        IF ( inofp1 < wrdsiz ) AND ( hard = 'N' ).
          CASE wlocal+inofp1(1).
            WHEN 'E' OR 'I' OR 'Y'.
              wmetphn+outoff(1) = 'J'.
            WHEN OTHERS.
              wmetphn+outoff(1) = 'G'.
          ENDCASE.
        ELSE.
          wmetphn+outoff(1) = 'K'.
        ENDIF.
        outoff = outoff + 1.
      WHEN 'H'.
        IF inofp1 = wrdsiz.
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF. " terminal H
        IF ( inpoff > 0 ).
          CASE wlocal+inofm1(1).
            WHEN 'C' OR 'S' OR 'P' OR 'T' OR 'G'.
              inpoff = inpoff + 1.
              CONTINUE.
          ENDCASE.
        ENDIF.
        CASE wlocal+inofp1(1).
          WHEN 'A' OR 'E' OR 'I' OR 'O' OR 'U'.
            wmetphn+outoff(1) = 'H'.
            outoff = outoff + 1. " Hvowel
        ENDCASE.
      WHEN 'F' OR 'J' OR 'L' OR 'M' OR 'N' OR 'R'.
        wmetphn+outoff(1) = inpchr.
        outoff = outoff + 1.
      WHEN 'K'.
        IF inpoff > 0. " not initial
          IF wlocal+inofm1(1) <> 'C'.
            wmetphn+outoff(1) = inpchr.
            outoff = outoff + 1.
          ENDIF.
        ELSE.
          wmetphn+outoff(1) = inpchr. " initial K
          outoff = outoff + 1.
        ENDIF.
      WHEN 'P'.
        IF ( inofp1 < wrdsiz ) AND ( wlocal+inofp1(1) = 'H' ).
          " PH -> F
          wmetphn+outoff(1) = 'F'.
        ELSE.
          wmetphn+outoff(1) = inpchr.

        ENDIF.
        outoff = outoff + 1.
      WHEN 'Q'.
        wmetphn+outoff(1) = 'K'.
        outoff = outoff + 1.
      WHEN 'S'.
        IF ( wlocal+inpoff(2) = 'SH' ) OR
           ( wlocal+inpoff(3) = 'SIO' ) OR
           ( wlocal+inpoff(3) = 'SIA' ).
          wmetphn+outoff(1) = 'X'.
        ELSE.
          wmetphn+outoff(1) = 'S'.
        ENDIF.
        outoff = outoff + 1.
      WHEN 'T'.
        IF ( wlocal+inpoff(3) = 'TIA' ) OR ( wlocal+inpoff(3) = 'TIO' ).
          wmetphn+outoff(1) = 'X'.
          outoff = outoff + 1.
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF.
        IF wlocal+inpoff(3) = 'TCH'.
          inpoff = inpoff + 1.
          CONTINUE.
        ENDIF.
        " substitute numeral 0 for TH (resembles theta after all)
        IF wlocal+inpoff(2) = 'TH'.
          wmetphn+outoff(1) = '0'.
        ELSE.
          wmetphn+outoff(1) = 'T'.
        ENDIF.
        outoff = outoff + 1.
      WHEN 'V'.
        wmetphn+outoff(1) = 'F'.
        outoff = outoff + 1.
      WHEN 'W' OR 'Y'. " silent if not followed by vowel
        IF inofp1 < wrdsiz.
          CASE wlocal+inofp1(1).
            WHEN 'A' OR 'E' OR 'I' OR 'O' OR 'U'.
              wmetphn+outoff(1) = inpchr.
              outoff = outoff + 1.
          ENDCASE.
        ENDIF.
      WHEN 'X'.
        wmetphn+outoff(1) = 'K'.
        outoff = outoff + 1.
        wmetphn+outoff(1) = 'S'.
        outoff = outoff + 1.

      WHEN 'Z'.
        wmetphn+outoff(1) = 'S'.
        outoff = outoff + 1.
    ENDCASE.

    inpoff = inpoff + 1.
  ENDDO.

  metaphone = wmetphn+0(4).

ENDFORM.                    " end f_metaphone

* Ref: http://aspell.net/metaphone/
* Program Based on: http://www.wbrogden.com/
*                   http://www.wbrogden.com/phonetic/index.html
*
* Objective: get back a list of words that might have a similar pronunciation.
* This list might be useful if you are looking for alternate spellings.
*
* The original metaphone algorithm was published by Lawrence Philips
* in an article entitled "Hanging on the Metaphone" in the journal
* Computer Language v7 n12, December 1990, pp39-43.
* His algorithm - translated into Java, and with minor tweaks
* - is what we are using here.
* Naturally, a phonetic encoding system has to assume a particular
* language and culture. Here we are using essentially American English.
*
* The Metaphone Rules
* Metaphone reduces the alphabet to 16 consonant sounds:
*
*   B X S K J T F H L M N P R 0 W Y
*
* That isn't an O but a zero - representing the 'th' sound.
*
* Transformations
* Metaphone uses the following transformation rules:
*   Doubled letters except "c" -> drop 2nd letter.
*   Vowels are only kept when they are the first letter.
*
*   B -> B    unless at the end of a word after "m" as in "dumb"
*   C -> X    (sh) if -cia- or -ch-
*        S    if -ci-, -ce- or -cy-
*        K    otherwise, including -sch-
*   D -> J    if in -dge-, -dgy- or -dgi-
*        T    otherwise
*   F -> F
*   G ->      silent if in -gh- and not at end or before a vowel
*             in -gn- or -gned- (also see dge etc. above)
*        J    if before i or e or y if not double gg
*        K    otherwise
*   H ->      silent if after vowel and no vowel follows
*        H    otherwise
*   J -> J
*   K ->      silent if after "c"
*        K    otherwise
*   L -> L
*   M -> M
*   N -> N
*   P -> F    if before "h"
*        P    otherwise
*   Q -> K
*   R -> R
*   S -> X    (sh) if before "h" or in -sio- or -sia-
*        S    otherwise
*   T -> X    (sh) if -tia- or -tio-
*        0    (th) if before "h"
*             silent if in -tch-
*        T    otherwise
*   V -> F
*   W ->      silent if not followed by a vowel
*        W    if followed by a vowel
*   X -> KS
*   Y ->      silent if not followed by a vowel
*        Y    if followed by a vowel
*   Z -> S
*
* Initial Letter Exceptions
*   Initial kn-, gn-, pn-, ae- or wr-  -> drop first letter
*   Initial x-                         -> change to "s"
*   Initial wh-                        -> change to "w"
*
* The code is truncated at 4 characters in this example, but more could be used.
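
*&---------------------------------------------------------------------*
*& Usage sketch (not part of the original listing).
*& The stated objective is to find words that might be pronounced
*& alike. One possible way to use f_metaphone for that is to compare
*& the codes of two spellings. The form name f_demo_same_sound, its
*& parameters and the 'Y'/'N' result convention are assumptions.
*& The inputs are copied into 16-character fields and upper-cased,
*& mirroring how zjnc_parse_artdesc prepares its words; letters only
*& (A-Z) are assumed, since f_metaphone does not filter other characters.
*&---------------------------------------------------------------------*
FORM f_demo_same_sound USING    iv_word1 TYPE c
                                iv_word2 TYPE c
                       CHANGING cv_same  TYPE c.

  DATA: lv_word1(16) TYPE c,
        lv_word2(16) TYPE c,
        lv_len1      TYPE i,
        lv_len2      TYPE i,
        lv_code1     TYPE char4,
        lv_code2     TYPE char4.

  " fixed-length working copies keep the offset accesses in
  " f_metaphone inside the field, even for very short input
  lv_word1 = iv_word1.
  lv_word2 = iv_word2.
  TRANSLATE lv_word1 TO UPPER CASE.
  TRANSLATE lv_word2 TO UPPER CASE.

  lv_len1 = STRLEN( lv_word1 ).
  lv_len2 = STRLEN( lv_word2 ).

  PERFORM f_metaphone USING lv_word1 lv_len1 CHANGING lv_code1.
  PERFORM f_metaphone USING lv_word2 lv_len2 CHANGING lv_code2.

  " equal, non-empty codes suggest a similar pronunciation
  IF lv_code1 = lv_code2 AND lv_code1 IS NOT INITIAL.
    cv_same = 'Y'.
  ELSE.
    cv_same = 'N'.
  ENDIF.

ENDFORM.                    " end f_demo_same_sound
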
