
Infobright Data Loading Guide

Revision 2.1, November 11, 2010

Infobright includes a dedicated high-performance loader that differs from the standard MySQL Loader. The Infobright Loader is designed for speed, but supports less LOAD syntax than the MySQL Loader, and only supports variable-length text-formatted load files. IEE additionally supports the MySQL Loader and the INSERT statement.

Default Loader
ICE only supports the Infobright Loader.
IEE defaults to the MySQL Loader, which has more robust error handling but is not as fast as the Infobright Loader. For the fastest LOAD results, use the Infobright Loader by setting the @bh_dataformat session variable as described below.

To use the Infobright Loader with variable-length text in CSV format, enter:
mysql> set @bh_dataformat = 'txt_variable';

To use the Infobright Loader with binary data, enter the following command:
mysql> set @bh_dataformat = 'binary';

To return to the default MySQL Loader, set the data format to the standard MySQL format:
mysql> set @bh_dataformat = 'mysql';
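When a load is driven from a script, the loader selection can be generated rather than typed. The following is a minimal Python sketch under that assumption; the helper name loader_statement is hypothetical, and it only builds the statement text for the three values named in this guide. Sending the statement to the server is left to whichever client library is in use.

```python
# The three values of @bh_dataformat named in this guide.
SUPPORTED_FORMATS = ("txt_variable", "binary", "mysql")


def loader_statement(data_format: str) -> str:
    """Return the SET statement that selects a loader (hypothetical helper)."""
    if data_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported @bh_dataformat value: {data_format!r}")
    return f"SET @bh_dataformat = '{data_format}';"


if __name__ == "__main__":
    # Select the Infobright Loader for variable-length text.
    print(loader_statement("txt_variable"))
```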


Infobright Loader Syntax
Import your data into an Infobright table by using the following load syntax (no other MySQL Loader syntax is supported):
LOAD DATA INFILE '/full_path/file_name'
INTO TABLE tbl_name
[FIELDS
    [TERMINATED BY 'char']
    [ENCLOSED BY 'char']
    [ESCAPED BY 'char']
];
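As an illustration of the syntax above, the following Python sketch assembles a load statement from its parts. The function names sql_quote and build_load_statement are hypothetical, and the table name is assumed to come from a trusted source because it is not quoted.

```python
def sql_quote(text: str) -> str:
    """Quote text as a MySQL string literal, doubling backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def build_load_statement(path, table, terminated_by=None,
                         enclosed_by=None, escaped_by=None):
    """Assemble LOAD DATA INFILE using only the three FIELDS subclauses."""
    # Delimiter and escape must be single characters; the enclosure is not
    # checked here because the guide also allows the keyword-like 'NULL'.
    for name, value in (("terminated_by", terminated_by),
                        ("escaped_by", escaped_by)):
        if value is not None and len(value) != 1:
            raise ValueError(f"{name} must be a single character")
    parts = []
    if terminated_by is not None:
        parts.append(f"TERMINATED BY {sql_quote(terminated_by)}")
    if enclosed_by is not None:
        parts.append(f"ENCLOSED BY {sql_quote(enclosed_by)}")
    if escaped_by is not None:
        parts.append(f"ESCAPED BY {sql_quote(escaped_by)}")
    # The table name is inserted as-is (assumed trusted).
    statement = f"LOAD DATA INFILE {sql_quote(path)} INTO TABLE {table}"
    if parts:
        statement += " FIELDS " + " ".join(parts)
    return statement + ";"


if __name__ == "__main__":
    print(build_load_statement("/usr/tmp/file1.txt", "test_table1",
                               terminated_by=",", enclosed_by="NULL",
                               escaped_by="\\"))
```

Omitting all three keyword arguments yields a statement without a FIELDS clause, in which case the loader's default values apply.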

The data is committed when the load completes if AUTOCOMMIT is set to on. This is the default setting, but you can make it explicit by setting:
Set AUTOCOMMIT=1;

If you want to check the data via SELECT before committing, then set AUTOCOMMIT to off:
Set AUTOCOMMIT=0;

New data can be seen by the loading session even though it is not committed. If AUTOCOMMIT is off, you must complete the load using an explicit COMMIT (or it will roll back when the connection exits):
COMMIT;
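The check-before-commit flow described above can be scripted. The sketch below assumes conn is a connection object from any Python driver that follows the DB-API 2.0 interface (PEP 249); the function name load_and_verify and the is_acceptable callback are hypothetical, and the explicit ROLLBACK on failure is a choice made for this sketch.

```python
def load_and_verify(conn, load_sql, check_sql, is_acceptable):
    """Load with AUTOCOMMIT off, inspect the result, then commit or roll back.

    conn is assumed to be a DB-API 2.0 (PEP 249) connection to the server.
    Returns True if the load was committed, False if it was rolled back.
    """
    cur = conn.cursor()
    cur.execute("SET AUTOCOMMIT=0")
    cur.execute(load_sql)
    # Uncommitted rows are visible to the loading session, so they can be
    # inspected with an ordinary SELECT before deciding.
    cur.execute(check_sql)
    rows = cur.fetchall()
    if is_acceptable(rows):
        cur.execute("COMMIT")
        return True
    cur.execute("ROLLBACK")
    return False
```

A caller might pass a row-count query as check_sql and a lambda comparing the count with the number of lines in the load file.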

FIELDS Clause
FIELDS subclauses are optional. If not specified, the default values are used:

CLAUSE                  DEFAULT VALUE
FIELDS TERMINATED BY    ';' (semicolon)
FIELDS ENCLOSED BY      '"' (double quote)
FIELDS ESCAPED BY       '' (none)

Field delimiters must be single characters and not two (i.e., not //). The Loader does not have a default value for the escape character, so if you are using one it needs to be stated explicitly.

MySQL full syntax is available at http://dev.mysql.com/doc/refman/5.1/en/loaddata.html

FIELDS TERMINATED BY
The input file may be delimited using, for example, a semicolon, comma, pipe (|) or tab (\t); it must be a single character (i.e., not //). It is important that the character used as a delimiter does not appear in the actual data unless it is specifically escaped or enclosed (see details in FIELDS ESCAPED BY further on).


FIELDS ENCLOSED BY
The input file may have fields enclosed by a character as long as it is stated explicitly; otherwise the default enclosure of " is assumed. If there is no enclosure, then either ENCLOSED BY 'NULL' needs to be stated explicitly as the enclosure type, or alternatively, the ENCLOSED BY clause can be omitted. It is important that an enclosure character does not appear in the actual data unless it is specifically escaped (see details in FIELDS ESCAPED BY further on).

FIELDS ESCAPED BY Case 1: Delimiters
If a character that is used as a delimiter appears in the actual data, it must either be escaped or the entire field must be enclosed.
For example, if we want to import a text field of [one, two or three] where the data fields are also terminated by "," as in:
1,one,two or three,1234

then we can either use ESCAPED BY '\\', which requires adding the \ escape character to the data, or we can use ENCLOSED BY '"', which requires the text field to be enclosed by ".
The input file and corresponding load statement:

1,one\, two or three,1234

LOAD DATA INFILE '/usr/tmp/file1.txt' INTO TABLE test_table1 FIELDS TERMINATED BY ',' ENCLOSED BY 'NULL' ESCAPED BY '\\';

is equivalent to an input file and corresponding load statement of:

1,"one, two or three",1234

LOAD DATA INFILE '/usr/tmp/file2.txt' INTO TABLE test_table1 FIELDS TERMINATED BY ',' ENCLOSED BY '"';

so these two files and load statements will result in:
mysql> select * from test_table1;
+------+-------------------+-----------+
| id   | textfield         | numerical |
+------+-------------------+-----------+
|    1 | one, two or three |      1234 |
|    1 | one, two or three |      1234 |
+------+-------------------+-----------+
2 rows in set (0.00 sec)
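When the load file is generated by a program, either convention can be applied as each field is written. The following Python sketch reproduces the two input lines from this example; the helper names escape_field and enclose_field are hypothetical.

```python
def escape_field(value, delimiter=",", escape="\\"):
    """Put the escape character before every delimiter or escape in value."""
    out = []
    for ch in value:
        if ch in (delimiter, escape):
            out.append(escape)
        out.append(ch)
    return "".join(out)


def enclose_field(value, enclosure='"'):
    """Wrap value in the enclosure character; value must not contain it."""
    if enclosure in value:
        raise ValueError("embedded enclosure characters need escaping")
    return enclosure + value + enclosure


row = ["1", "one, two or three", "1234"]

# Variant for ENCLOSED BY 'NULL' ESCAPED BY '\\'
escaped_line = ",".join(escape_field(f) for f in row)

# Variant for ENCLOSED BY '"': only fields containing the delimiter are wrapped
enclosed_line = ",".join(enclose_field(f) if "," in f else f for f in row)

print(escaped_line)
print(enclosed_line)
```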


FIELDS ESCAPED BY Case 2: Enclosures
If a character used as an enclosure appears in the actual data, it must be escaped, otherwise it will not load.
For example, if we want to import the text field [one "and" two], then the input file required is:

1,"one \"and\" two",1234

and the corresponding load statement is:
LOAD DATA INFILE '/usr/tmp/file3.txt' INTO TABLE test_table1 FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';

so this third file was read in as:
mysql> select * from test_table1;
+------+-------------------+-----------+
| id   | textfield         | numerical |
+------+-------------------+-----------+
|    1 | one, two or three |      1234 |
|    1 | one, two or three |      1234 |
|    1 | one "and" two     |      1234 |
+------+-------------------+-----------+
3 rows in set (0.00 sec)
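A generator for such a file has to escape embedded enclosure characters before wrapping the field. The Python sketch below produces the input line used in this example; enclose_with_escape is a hypothetical helper, and doubling an embedded escape character is an assumption about what a file loaded with ESCAPED BY '\\' needs.

```python
def enclose_with_escape(value, enclosure='"', escape="\\"):
    """Enclose a field, escaping embedded escape and enclosure characters."""
    # Escape the escape character first so that the backslashes added for
    # the enclosures in the next step are not doubled again.
    body = value.replace(escape, escape + escape)
    body = body.replace(enclosure, escape + enclosure)
    return enclosure + body + enclosure


line = ",".join(["1", enclose_with_escape('one "and" two'), "1234"])
print(line)
```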

Note: There is currently a bug being tracked in ICE where unescaped embedded enclosures are handled in IB differently than in MySQL. If an even number of unescaped embedded enclosures is included in the text, the data loads without requiring an escape character and is handled as expected, as per MySQL. However, if there is an odd number of unescaped embedded enclosures within the text, then that row of data does not load and the load completes at the previous row; no error message is returned. With MySQL this case would load the data. This behaviour is being updated to reflect how MySQL handles this case.

ESCAPE CHARACTERS
If a text field includes escape characters, these must be escaped explicitly, otherwise the string is read in as straight text.
For the following input file:
2,other \t\t\ttext,4567

Using two different load commands (one with and one without an ESCAPED BY):

LOAD DATA INFILE '/usr/tmp/file4.txt' INTO TABLE test_table1 FIELDS TERMINATED BY ',' ENCLOSED BY 'NULL' ESCAPED BY '\\';

LOAD DATA INFILE '/usr/tmp/file4.txt' INTO TABLE test_table1 FIELDS TERMINATED BY ',' ENCLOSED BY 'NULL';


The data will be read in differently and produce different results respectively:
mysql> select * from test_table1;
+------+--------------------------------+-----------+
| id   | textfield                      | numerical |
+------+--------------------------------+-----------+
|    2 | other        text              |      4567 |
|    2 | other \t\t\ttext               |      4567 |
+------+--------------------------------+-----------+
2 rows in set (0.00 sec)
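Because the bug described in the earlier note makes a load stop silently at a row with an odd number of unescaped embedded enclosures, it may be worth scanning a load file before loading it. The following Python sketch is a heuristic only, not the loader's own parsing logic. It counts enclosure characters not protected by an escape character and flags lines where that count is odd; the function names are hypothetical.

```python
def count_unescaped(line, enclosure='"', escape="\\"):
    """Count enclosure characters that are not protected by an escape."""
    count = 0
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape:
            i += 2  # skip the escape character and the character it protects
            continue
        if ch == enclosure:
            count += 1
        i += 1
    return count


def suspicious_lines(lines, enclosure='"', escape="\\"):
    """Return 1-based numbers of lines with an odd count of unescaped enclosures."""
    return [number for number, line in enumerate(lines, start=1)
            if count_unescaped(line, enclosure, escape) % 2 == 1]


sample = [
    '1,"one, two or three",1234',   # balanced enclosures
    '1,"one "and two",1234',        # one unescaped embedded enclosure
    '1,"one \\"and\\" two",1234',   # embedded enclosures are escaped
]
print(suspicious_lines(sample))
```

A well-formed enclosed field contributes two enclosure characters, so an odd total on a line indicates at least one unbalanced or unescaped embedded enclosure.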

LINES TERMINATED Clause
It is important to note that the IB loader ignores the LINES TERMINATED BY clause. Instead it detects how records are terminated based on the data in the input file.
There are two supported EOL formats:
1. Windows-specific '\r\n'
2. Unix-specific '\n'
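Since the terminator is inferred from the data rather than declared, a pre-load check that a file uses one of the two formats consistently can be useful. The Python sketch below is an illustrative check with a hypothetical function name; it does not reproduce the loader's detection logic.

```python
def detect_eol(data: bytes) -> str:
    """Classify the record terminator used in a load file's contents."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # line feeds not preceded by \r
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "windows"
    if lf_only:
        return "unix"
    return "none"


if __name__ == "__main__":
    print(detect_eol(b"1;one;1234\r\n2;two;4567\r\n"))
    print(detect_eol(b"1;one;1234\n2;two;4567\n"))
```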


About IEE Data Import Methods
The following methods are supported for loading data into IEE, listed from slowest to fastest:

- Database INSERT: Ensure that "autocommit=0", then do an explicit COMMIT at the end of your process. INSERT is not designed to load very large amounts of data, so for hundreds of thousands of rows or more it is recommended that you use one of the Loader options.
- MySQL Loader: Loads from a flat file that includes text and field delimiters. Supports more features for escaping embedded characters, error handling, and transforming data using functions at the time of load than the Infobright Loader. This loader is set using a session variable (@bh_dataformat='mysql').
    o For more information see http://dev.mysql.com/doc/refman/5.1/en/loaddata.html
- ETL Tools: This uses an ETL tool to connect directly to your data source and load data directly to Infobright over a named pipe connection (data files not needed). When using IEE you can also load data in binary format, which is typically twice as fast as using text format. This also requires additional connectors available on the infobright.org contributed code page:
    o http://www.infobright.org/Downloads/ContributedSoftware/
- Infobright Loader, text data: Loads from a flat file that includes text and field delimiters. The Infobright Loader has reduced syntax support to optimize load speeds, and requires a clean load file. This loader is set using a session variable (@bh_dataformat='txt_variable').
  Note: When using text format, you must respect format conventions, particularly for DATE, DATETIME and TIMESTAMP types. The only accepted format for DATE is YYYY-MM-DD. The only accepted format for DATETIME and TIMESTAMP is YYYY-MM-DD HH:mm:ss, where HH represents hours on a 24-hour clock. In particular, the AM/PM modifier is not supported.
- Infobright Loader, binary data: Loads from a flat file in binary data format (see below for details), which is typically twice as fast as using text format. This loader and data format is set using a session variable (@bh_dataformat='binary').
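The date conventions in the note above map directly onto standard strftime format codes. The Python sketch below shows one way to emit the accepted formats when generating a text load file. The helper names are hypothetical, and the AM/PM source layout in parse_us_datetime is an assumed example of upstream data, not something defined in this guide.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"               # YYYY-MM-DD
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # %H is the hour on a 24-hour clock


def format_date(value: date) -> str:
    """Render a DATE column value in the only accepted text format."""
    return value.strftime(DATE_FORMAT)


def format_datetime(value: datetime) -> str:
    """Render a DATETIME or TIMESTAMP column value without an AM/PM modifier."""
    return value.strftime(DATETIME_FORMAT)


def parse_us_datetime(text: str) -> datetime:
    """Parse an assumed upstream layout such as '11/11/2010 03:04:05 PM'."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


if __name__ == "__main__":
    print(format_date(date(2010, 11, 11)))
    print(format_datetime(parse_us_datetime("11/11/2010 03:04:05 PM")))
```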


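IEE Binary Format
With Infobright's binary format load, individual rows are not separated by any special characters. There are also no value delimiters or qualifiers.
The following schema shows the format of one row in the BINARY format.

  2 bytes                            L bytes
+----------------------------------+-----------------+---------+---------+-----+---------+
| Number of bytes in the whole row | Null indicators | Value 1 | Value 2 | ... | Value N |
+----------------------------------+-----------------+---------+---------+-----+---------+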

Every row starts with L (a 2-byte integer) that specifies the number of following bytes of data.
Null indicators are an array of bits, one bit per column. A 1 on the m-th bit means that the m-th value in the row is NULL.

Null indicators example (bits): 0 1 0 0 1 0 0
The third and sixth values in the row are NULLs.

The number of columns in a record determines the number of bytes in the NULL indicators. For example, for a record that contains from one to eight columns, indicator bits are stored in one byte. If a record contains from nine to 16 columns, two bytes are used, and so on.
The NULL indicators array is followed by N values, where N is the number of columns in a row.
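The size rule for the NULL indicators is a ceiling division by eight. The Python sketch below computes that size and packs a list of NULL flags into indicator bytes. The function names are hypothetical. The bit order is an assumption made for this sketch and should be verified against a real load: the m-th column is mapped to bit (m - 1) mod 8, counting from the least significant bit, of byte (m - 1) div 8.

```python
def null_indicator_length(column_count: int) -> int:
    """Bytes needed for one indicator bit per column (8 columns per byte)."""
    return (column_count + 7) // 8


def null_indicators(is_null) -> bytes:
    """Pack a sequence of booleans (True = NULL) into indicator bytes.

    ASSUMPTION: column m (1-based) uses bit (m - 1) % 8, least significant
    bit first, of byte (m - 1) // 8. The bit order is not confirmed here.
    """
    out = bytearray(null_indicator_length(len(is_null)))
    for index, flag in enumerate(is_null):
        if flag:
            out[index // 8] |= 1 << (index % 8)
    return bytes(out)


if __name__ == "__main__":
    # Eight columns with the third and sixth values NULL.
    flags = [False, False, True, False, False, True, False, False]
    print(null_indicators(flags).hex())
```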


Formats and lengths in bytes for particular data types are shown in the following table.

Data type            Format                                  Length in bytes
TINYINT                                                      1
SMALLINT                                                     2
MEDIUMINT                                                    3
INTEGER                                                      4
BIGINT                                                       8
FLOAT                IEEE 4-byte Float                       4
DOUBLE               IEEE 8-byte Double                      8
DECIMAL(N, M)        (Actual value) * 10^M                   N in [1,2]: 1
                                                             N in [3,4]: 2
                                                             N in [5,9]: 4
                                                             N in [10,18]: 8
TIME                 [sign][h]hh:mm:ss                       8-10
YEAR                 2-byte integer                          2
DATE                 4-byte integer yyyymmdd,                4
                     where yyyy = year - 1900
TIMESTAMP/DATETIME   yyyy-mm-dd hh:mm:ss                     19
CHAR(N)              N characters                            N
VARCHAR(N)           2-byte integer of value L               2+L
                     followed by L characters
BINARY(N)            N bytes                                 N
VARBINARY(N)         2-byte integer of value L               2+L
                     followed by L bytes

Note that CHAR is constant-sized, whereas VARCHAR occupies only the size needed for the actual value. Integer and floating-point data are stored as a natural binary representation of these values (little endian).
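To make the layout concrete, the following Python sketch encodes a few value types and frames them as one row. It is an illustration under stated assumptions, not a reference encoder: all function names are hypothetical, integers are assumed to be signed, the 2-byte length fields are assumed to be unsigned little endian, text is assumed to be single-byte ASCII, and the example row contains no NULL values because this guide does not describe how a NULL value is represented among the values.

```python
import struct
from decimal import Decimal


def encode_integer(value: int) -> bytes:
    """INTEGER: 4 bytes, little endian (assumed signed)."""
    return struct.pack("<i", value)


def encode_varchar(text: str) -> bytes:
    """VARCHAR: 2-byte length L followed by L bytes (assumed ASCII)."""
    data = text.encode("ascii")
    return struct.pack("<H", len(data)) + data


def decimal_length(precision: int) -> int:
    """Storage size of DECIMAL(N, M) for precision N, per the table above."""
    for upper, size in ((2, 1), (4, 2), (9, 4), (18, 8)):
        if 1 <= precision <= upper:
            return size
    raise ValueError("precision must be between 1 and 18")


def encode_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """DECIMAL(N, M): the integer value * 10^M, little endian."""
    scaled = int(value.scaleb(scale))  # truncates digits beyond the scale
    return scaled.to_bytes(decimal_length(precision), "little", signed=True)


def encode_row(null_indicator_bytes: bytes, values) -> bytes:
    """Prefix the NULL indicators and values with their total length L."""
    body = null_indicator_bytes + b"".join(values)
    return struct.pack("<H", len(body)) + body


if __name__ == "__main__":
    # Three columns (INTEGER, VARCHAR, INTEGER), none of them NULL.
    row = encode_row(b"\x00", [encode_integer(1),
                               encode_varchar("abc"),
                               encode_integer(1234)])
    print(row.hex())
```

Decimal is used instead of float so that scaling by 10^M does not introduce binary rounding error.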
