11, 2008
*
The Study of Large-scale Web Term-pairs Extraction based on Regular Expressions
( 300222)
,
Web , Web ,
,
, , 66.7% , ,
100%
Web
TP391.3
( a z)
0 ( ) ,
Web HTML ,
, HTML , ,
, , Web
HTML , ,
, Web ,
Web , Web
,
/ 0 , 1
,
, 1.1 HTML
, <br>
,
</br>
<p> </p> <td> </td>
:
,
1.1.1 : <p>e1 c1 e2 c2 e3 c3, en cn</p>
; , ,
,
e1 , c1 ,
, ,
, <p> <br> <td>
:
, [1~7]
,
<p>library , linkage to load , , , location
: ;
logger , loop machine language magnetic
, storage magnetic tape matrix memory message
, Web , , microcomputer</p>
1
, , ( : http://www.ddsic.com/blog/category/8/39)
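The single-paragraph layout above can be illustrated with a minimal Python sketch. It is not the expression used in this study: the pattern `RUN_RE`, the function name `split_runs`, and the placeholder translations in the usage note are assumptions made purely for illustration. The sketch relies only on the observation that, inside one `<p>` element, each English term is a run of ASCII characters followed by a run of non-ASCII characters.

```python
import re

# Hypothetical pattern (not from the paper): an ASCII term followed by
# a run of non-ASCII characters that is taken as its translation.
RUN_RE = re.compile(r"([A-Za-z][A-Za-z0-9 .()/-]*?)\s*([^\x00-\x7f]+)")


def split_runs(paragraph_html):
    """Return (term, translation) pairs found inside one <p> element."""
    # Drop the surrounding <p> tags so only the paragraph text is scanned.
    text = re.sub(r"</?p\s*>", " ", paragraph_html, flags=re.IGNORECASE)
    return RUN_RE.findall(text)
```

For instance, `split_runs("<p>library 译文甲, linkage to load 译文乙</p>")` would pair `library` with the placeholder `译文甲` and `linkage to load` with `译文乙`. Full-width punctuation between entries would be absorbed into the translation run, so a real page would likely need an additional cleanup step.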
[8] [9] 1.1.2
, : <p>e1</p> <p>c1</p> <p>e2</p>
, <p>c2</p> , <p>en</p> <p>cn</p>
: / 0 ( : 20071303)
: , , 1981 , , ,
Journal of Information, No. 11, 2008
<P>&nbsp;&nbsp;&nbsp;valid and subsisting bill</p>
<P>&nbsp;&nbsp;&nbsp; </p>
<P>&nbsp;&nbsp;&nbsp;valid bilateral netting arrangement</p>
<P>&nbsp;&nbsp;&nbsp; </p>
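The alternating-paragraph layout shown in this example can be handled by reading the `<p>` elements in order and pairing each English paragraph with the one that follows it. The following Python sketch works under that assumption only; the names `P_RE`, `paragraph_texts` and `pair_alternating` are hypothetical, and the translations used in the usage note are placeholders rather than data from the study.

```python
import re
from html import unescape

# Hypothetical helper pattern: the inner text of each <p>...</p> element.
P_RE = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p\s*>", re.IGNORECASE | re.DOTALL)


def paragraph_texts(html):
    """Return the text of every <p> element with entities decoded."""
    texts = []
    for inner in P_RE.findall(html):
        # &nbsp; decodes to U+00A0; turn it into a plain space, then trim.
        texts.append(unescape(inner).replace("\xa0", " ").strip())
    return texts


def pair_alternating(html):
    """Pair paragraph 1 with 2, 3 with 4, and so on."""
    texts = paragraph_texts(html)
    pairs = []
    for i in range(0, len(texts) - 1, 2):
        term, translation = texts[i], texts[i + 1]
        # Keep a pair only if the first item is ASCII and the second is not.
        if term and translation and term.isascii() and not translation.isascii():
            pairs.append((term, translation))
    return pairs
```

With the two English entries above and placeholder translations in the empty paragraphs, `pair_alternating` would return two pairs. A fixed stride of two is a simplification: a page with a missing or extra paragraph would shift the alignment, which the ASCII check only partly guards against.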
2
( : http://www.0350edu.com/Article/kind/account/200604/850.html)
1.1.3
: <p>e1 c1</p> <p>e2 c2</p> , <p>en cn</p>
<br>Gross Registered Tonnage (GRT) ( ) <br>Net Registered
, 6 -
1 pattern1
[a-zA-Z\(\.]
[\s-.%-\.\(\/\[a-zA-Z0-9]*
[\)a-zA-Z0-9\.\]]
;{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}
[\x80-\xff][^a-zA-Z]{1,}
1.2.2 ,
, , , $pattern3
, , $pattern4
, , U i , U
:
$pattern1 = \[\s]*([a-zA-Z\(\.][\s-.%-\.\(\/\[a-zA-Z0-9]*[\)a-zA-Z0-9\.\]]);{0,1}\s*(?:&nbsp)*[\(\|]*\s*;{0,1}[\x80-\xff][^a-zA-Z]{1,}<BR\s{0,1}\/{0,1}>/Ui
,
, / aa0 , / a+ 0
/ a0, / a0
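To make the structure of $pattern1 easier to follow, the sketch below gives a simplified Python adaptation for `<br>`-separated entries. It is an assumption-laden illustration, not the expression used in the study: the name `PAIR_RE`, the function `extract_pairs`, the exact character classes, the use of lazy quantifiers in place of the `U` modifier, and the placeholder translations in the usage note are all choices made here. It matches an English term, optional separators, and then a run that starts with a non-ASCII character and contains no Latin letters, ending at a `<br>` tag.

```python
import re

# Simplified adaptation (an assumption, not the paper's exact pattern).
PAIR_RE = re.compile(
    r"\s*"
    r"([A-Za-z(.][A-Za-z0-9\s.%()/\[\]-]*?[A-Za-z0-9.)\]])"  # English term
    r";?\s*(?:&nbsp;?)*[(|]*\s*;?"                           # optional separators
    r"([^\x00-\x7f][^A-Za-z<]*?)"                            # translation run
    r"\s*<br\s?/?>",                                         # closing line break
    re.IGNORECASE,
)


def extract_pairs(html):
    """Return (term, translation) pairs from <br>-separated entries."""
    return [(term.strip(), trans.strip()) for term, trans in PAIR_RE.findall(html)]
```

Called on `"<br>Gross Registered Tonnage (GRT) 译文甲<br>Net Registered Tonnage 译文乙<br>"`, the function would yield one pair per line break. The `re.IGNORECASE` flag plays the role of the `i` modifier, so `<BR>` and `<br>` are treated alike.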
4 - (1) , , i
$pattern1 , ,
[\s]* ( 0 )
( ) 1 pattern1 1.2.3
, : a.
, UTF-8 HTML , ; b.
ASCII ; c. , (
, 127 , ) ; d. ,
, [^a-zA-Z] , 7
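The role of the byte range above 127 can be checked with a short Python sketch. In UTF-8, every byte of a non-ASCII character lies in 0x80 to 0xFF, so a byte-level class such as `[\x80-\xff]` marks the start of non-ASCII text, and `[^a-zA-Z]{1,}` then continues until the next Latin letter. The names `NON_ASCII_RUN` and `first_non_ascii_run` are hypothetical, and the sample strings in the usage note are placeholders.

```python
import re

# Byte-level pattern: one byte in 0x80-0xFF, then one or more bytes that
# are not ASCII letters (this mirrors the tail of $pattern1).
NON_ASCII_RUN = re.compile(rb"[\x80-\xff][^a-zA-Z]{1,}")


def first_non_ascii_run(text):
    """Return the first matching run in the UTF-8 bytes of text, or None."""
    match = NON_ASCII_RUN.search(text.encode("utf-8"))
    return match.group(0) if match else None
```

For `"abc 译文 def"` the returned bytes decode to `"译文 "`, including the trailing space, because a space is not a Latin letter. Pure ASCII input gives `None`.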
3
,
- 822 99.04% 100%
- 9874 99.06% 98%
, , - (l-p) 3910 99.54% 100%
, , - (a) 32 21.3% 93.75%
- 37 88.1% 100%
, , / > 0
- 3460 97.25% 100%
, ; - (w) 159 92.44% 100%
, 360 - (v) 283 99.65% 100%
- (t) 231 99.57% 98.7%
2 ,
, ,
,
, ,
100% , /
0
,
66.7% ,
,
,
3 ,
,
,
: a.
,
,
,
,
1 3 ,
b.
7 / t matrix t
0/ t matrix t0 ,
2
/ 0 , t
2.1
: a. V, ; b.
A: 30 , 3
; c. P: , -
/ ; d. R: , ,
, / , ,
,
2.2 Web 30 ,
, 2 3 , ,
2
,
V        A        P        R
12.5s    66.7%    99.9%    88.44%
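Assuming that P and R in the table above denote precision and recall (an interpretation made here, not a definition taken from the text), the two values could be computed as in the following Python sketch. The function name `precision_recall` and the idea of a hand-checked reference set are likewise assumptions for illustration.

```python
def precision_recall(extracted, reference):
    """Compute precision and recall of extracted pairs against a reference set."""
    extracted, reference = set(extracted), set(reference)
    correct = extracted & reference
    # Precision: share of extracted pairs that are correct.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # Recall: share of reference pairs that were extracted.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall
```

Under this reading, a high P with a lower R would mean that nearly every extracted pair is correct while some pairs present on the pages are missed.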
,
( 68 )
2
1 , , . [J]. , 2005, 2(3): 340-346
2 , , . [J]. , 2005, 2(4): 410-416
3 Gibson C F, Nolan R L. Managing the Four Stages of EDP Growth [J]. Harvard Business Review, 1974, 52(1): 76-88
4 Nolan R L. Managing the Computer Resource: A Stage Hypothesis [J]. Communications of ACM, 1973, 16(7): 399-405
5 Nolan R L, Croson D C, Seger K N. The Stages Theory: A Framework for IT Adoption and Organizational Learning [M]. Boston: Harvard Business School Publishing, 1993
6 Nolan R L. Managing the Crisis in Data Processing [J]. Harvard Business Review, 1979, 57(2): 115-126
7 Koen Brand, Harry Boonen. IT Governance. A Pocket Guide based on COBIT [M]. Van Haren Publishing, 2004: 56-135
8 Soumetra Dutta, Mazoni Jean-Fransowa. [M]. . : , 2001: 77-123
9 Luftman Jerry. Assessing Business-IT Alignment Maturity [R]. Communications of AIS, 2000(12): 1-49 assessment, 2002
10 . . http://it.city.sc.cn/HTMLS/200572611308054-2.html, 2005
11 . [J]. , 2005, 23(2): 9-13
12 , , . [J]. ( ), 2007, 37(4): 976-980
13 , . [J]. , 2007, (12): 136-138
14 , . [J]. , 2007, (8): 39-44
15 , , . [J]. , 2007, (8): 108-110
16 . 5 6 , 200 ( : )