
XML & XML with Informatica

XML

My best description of XML is this: XML is a cross-platform, software- and hardware-independent tool for transmitting information.

XML is used to Exchange Data

With XML, data can be exchanged between incompatible systems. In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.

XML, DTD, and XML Schema

Extensible Markup Language (XML) is a markup language generally regarded as the universal format for structured documents and data on the Web. Like HTML, XML contains element tags and attributes that define data. Unlike HTML, XML element tags and attributes are not based on a predefined, static set of elements and attributes. Every XML file can have a different set of tags and attributes.

Document Type Definition (DTD) files and XML schema files define the elements and attributes that can be used and the structure within which they fit in an XML file. DTD and XML schema files specify the structure and content of XML files in different ways. A DTD file defines the names of elements, the number of times they occur, and how they fit together. The XML schema file provides the same information plus the data types of the elements.

DTD

The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference.

The DTD file contains only metadata. It contains the description of the structure and the definition of the elements and attributes that can be found in the associated XML file. It does not contain any data.

A sample DTD looks like this:

    <!ELEMENT employees (companyname, employee)>
    <!ELEMENT companyname (id, name)>
    <!ELEMENT employee (emp+)>
    <!ELEMENT emp (id, info)>
    <!ELEMENT info (name, age, sex, job, sal)>
    <!ELEMENT created-date (format, timestamp)>
    <!ELEMENT id (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT format (#PCDATA)>
    <!ELEMENT timestamp (#PCDATA)>

For example:

    <employees>
      <companyname>
        <id>01</id>
        <name>Wipro Technologies</name>
      </companyname>
      <employee>
        <emp>
          <id>71000</id>
          <info>
            <name>Dileep</name>
            <age>29</age>
            <sex>Male</sex>
            <job>Project Engineer</job>
            <sal>20000</sal>
          </info>
        </emp>
      </employee>
    </employees>

XML Schema

The XML schema file, like the DTD file, contains only metadata. In addition to the definition and structure of elements and attributes, an XML schema contains a description of the type of elements and attributes found in the associated XML file.

A sample XML Schema file looks like this:

    <xs:element name="ECR">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="ECR_object"/>
          <xs:element ref="ECN_object" minOccurs="0" maxOccurs="n"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="ECR_object">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="number" type="xs:string"/>
          <xs:element name="summary" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="ECN_object">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="number" type="xs:string"/>
          <xs:element name="summary" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Here n is a placeholder for a specific number. In a real schema file, maxOccurs takes a non-negative integer or the value unbounded.

For example:

    <ECR>
      <ECR_object>
        <number>00776</number>
        <summary>Testing</summary>
      </ECR_object>
      <ECN_object>
        <number>00876</number>
        <summary>Test</summary>
      </ECN_object>
    </ECR>

Cardinality in XML

Declaring only one occurrence of the same element (only once):

    For DTD:          <!ELEMENT companyname (id, name)>
    For schema file:  <xs:element name="number" type="xs:string"/>

Declaring a minimum of one occurrence of the same element (one or more):

    For DTD:          <!ELEMENT employee (emp+)>
    For schema file:  <xs:element name="number" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
    or, with a limit: <xs:element name="number" type="xs:string" minOccurs="1" maxOccurs="n"/>

Declaring zero or more occurrences of the same element (zero or more):

    For DTD:          <!ELEMENT employee (emp*)>
    For schema file:  <xs:element name="number" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    or, with a limit: <xs:element name="number" type="xs:string" minOccurs="0" maxOccurs="n"/>

Declaring zero or one occurrence of the same element (zero or one):

    For DTD:          <!ELEMENT employee (emp?)>
    For schema file:  <xs:element name="number" type="xs:string" minOccurs="0" maxOccurs="1"/>
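The occurrence rules can be restated as minimum and maximum counts and checked against a parsed document. The following is a minimal Python sketch, not a validator: the standard library's ElementTree does not validate against a DTD or schema, so the check is done by hand. The names OCCURS and count_ok, and the emp id values 1 and 2, are made up for illustration.

```python
import xml.etree.ElementTree as ET

# DTD occurrence suffix -> (minOccurs, maxOccurs); None stands for "unbounded".
OCCURS = {
    "": (1, 1),      # exactly once
    "+": (1, None),  # one or more
    "*": (0, None),  # zero or more
    "?": (0, 1),     # zero or one
}


def count_ok(parent, child_tag, suffix):
    """Return True if parent has an allowed number of direct child_tag children."""
    low, high = OCCURS[suffix]
    n = len(parent.findall(child_tag))
    return n >= low and (high is None or n <= high)


# Hypothetical instance document with two emp elements under employee.
doc = ET.fromstring(
    "<employees>"
    "<companyname><id>01</id><name>Wipro Technologies</name></companyname>"
    "<employee><emp><id>1</id></emp><emp><id>2</id></emp></employee>"
    "</employees>"
)
employee = doc.find("employee")
print(count_ok(employee, "emp", "+"))  # two emp children are allowed by emp+
print(count_ok(employee, "emp", "?"))  # but not by emp?
```

A check like this is only a quick sanity test on element counts. A validating parser is the right tool when the DTD or schema itself must be enforced.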

XML Entity References

An entity reference is a group of characters used in text as a substitute for a single specific character that is also a markup delimiter in XML. Using the entity reference prevents a literal character from being mistaken for a markup delimiter. For example, if an attribute must contain a left angle bracket (<), you can substitute the entity reference "&lt;". Entity references always begin with an ampersand (&) and end with a semicolon (;). You can also substitute a numeric or hexadecimal reference. The entities predefined in XML are identified in the following table.

    Character   Entity reference   Numeric reference   Hexadecimal reference
    &           &amp;              &#38;               &#x26;
    <           &lt;               &#60;               &#x3C;
    >           &gt;               &#62;               &#x3E;
    "           &quot;             &#34;               &#x22;
    '           &apos;             &#39;               &#x27;
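The substitution in the table can be tried with Python's standard library. This is a small sketch; the rule text and the rule element name are invented for the example. Note that escape() handles only &, < and > unless extra mappings are passed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical text containing two markup delimiters.
raw = "if salary < 1000 & age > 30 then"

safe = escape(raw)  # replaces &, < and > with entity references
print(safe)

# The unescaped text is not well-formed inside an element ...
try:
    ET.fromstring("<rule>" + raw + "</rule>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# ... while the escaped form parses, and the parser hands back the original text.
rule = ET.fromstring("<rule>" + safe + "</rule>")
print(parsed_raw, rule.text == raw, unescape(safe) == raw)
```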

Character data

Character data can be either PCDATA or CDATA in XML.

PCDATA

PCDATA means parsed character data. If a character data element is declared as PCDATA, all text inside the XML tags is parsed by the XML parser. In this type of data, a character like "<" or "&" inside an XML element generates an error, because the parser interprets "<" as the start of a new element and "&" as the start of an entity reference.

You cannot write something like this:

    if salary < 1000 then

It raises an error. To avoid this, replace the "<" character with an entity reference, like this:

    if salary &lt; 1000 then

CDATA

CDATA means character data. If a character data element is declared as CDATA, the text inside the XML tags is not parsed by the XML parser.

If the text contains a lot of "<" or "&" characters, as program code often does, the XML element can be defined as a CDATA section.

Only the characters "<" and "&" are strictly illegal in XML element content. Apostrophes, quotation marks, and greater-than signs are legal, but it is a good habit to replace them.

Metadata from XML, DTD, and XML Schema Files

PowerMart and PowerCenter can create metadata for a source or target definition from XML, DTD, or XML schema files. XML files provide both data and metadata, while DTD and XML schema files provide only metadata.

The Designer requires a lot of memory and resources to parse very large XML files and extract metadata for source or target definitions. To ensure that the Designer creates an XML source or target definition quickly and efficiently, Informatica recommends that you import source or target definitions only from XML files that are no larger than 100 KB, or from DTD or XML schema files.

If you want to import from a very large XML file that has no DTD or XML schema file, decrease the size of the XML file by deleting duplicate data elements. You do not need all of your data to import an XML source or target definition. You need only enough data to accurately show the hierarchy of your XML file and enable the Designer to create a source or target definition.

Target from XML

You can create an XML target definition from an XML, DTD, or XML schema file. You can also create an XML target definition from an XML source definition or from one or more relational source definitions.
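For the advice above about shrinking a large XML file before import, one possible approach is to drop most repeated sibling elements while keeping the hierarchy. The Python sketch below is an illustration only. The function name prune_repeats and the default of keeping two copies (so that a repeating element still visibly repeats) are assumptions, not Informatica guidance. It also assumes repeated siblings share the same structure, so review the result before importing it.

```python
import xml.etree.ElementTree as ET


def prune_repeats(element, keep=2):
    """Remove all but the first `keep` children of each tag, recursively, in place."""
    counts = {}
    for child in list(element):  # copy the child list because we remove while iterating
        counts[child.tag] = counts.get(child.tag, 0) + 1
        if counts[child.tag] > keep:
            element.remove(child)
        else:
            prune_repeats(child, keep)
    return element


# Hypothetical oversized input: five emp elements under one employee.
big = ET.fromstring("<employee>" + "<emp><id>x</id></emp>" * 5 + "</employee>")
small = prune_repeats(big)
print(len(small.findall("emp")))
```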
Rules for a Valid Group

An XML group is valid when it follows these rules:

- Any element or attribute in an XML file can be included in a group.
- A group cannot contain two elements with a many-to-many relationship.
- Column names in the groups are unique within a source or target definition.
- Group names are unique within a source or target definition.

The Designer validates any group you create or modify. When you try to create a group that does not follow these constraints, the Designer returns an error message and does not create the group.

Note: If the target definition consists of only one group, then it does not require a primary key or a foreign key.

Normalized Groups

A normalized group is a valid group that contains only one multiple-occurring element. In most cases, XML sources contain more than one multiple-occurring element and convert to more than one normalized group.

The following rules apply to normalized groups:

- A normalized group must be a valid group.
- A normalized group cannot contain more than one multiple-occurring element.

Denormalized Groups

A denormalized group has more than one multiple-occurring element. The multiple-occurring elements can have a one-to-many relationship, but not a many-to-many relationship. All the elements in a denormalized group belong to the same parent chain.

Source definitions can have denormalized groups, but target definitions cannot have denormalized groups. Denormalized groups, like denormalized relational tables, generate duplicate data. They can also generate null data. Make sure you filter out any unwanted duplicate or null data before passing data to the target.

The following rules apply to denormalized groups:

- A denormalized group must be a valid group.
- A denormalized group can contain more than one multiple-occurring element.
- Multiple-occurring elements in a denormalized group must have a one-to-many relationship.
- Denormalized groups can exist in a source definition, but not in a target definition.

Group Keys and Relationships

The relationship between elements in the XML hierarchy translates into a combination of primary and foreign keys that define the relationship between XML groups. If you define a key in the XML hierarchy, the Designer uses it as a primary key in a group. The Designer handles group keys and relationships differently for sources and targets.

In a source definition, a group does not have to be related to any other group. A denormalized group can be independent of any other group. Therefore, groups in a source definition do not require primary or foreign keys. However, if a group is related to another group based on the XML hierarchy, and you do not designate any column as a key for the group, the Designer creates a column called the Generated Primary Key to hold a key for the group.
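To make the idea of groups related by keys concrete, the Python sketch below splits a small hierarchy into two row sets, a parent group and a child group, linked by a generated key. It only illustrates the concept and is not how the Designer or the Informatica Server works internally. The column names PK_employee and FK_employee, the counter-based key values, and the sample data are all assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical document: one employee element containing two emp elements.
SAMPLE = """
<employees>
  <companyname><id>01</id><name>Wipro Technologies</name></companyname>
  <employee>
    <emp><id>E1</id><info><name>First</name></info></emp>
    <emp><id>E2</id><info><name>Second</name></info></emp>
  </employee>
</employees>
"""


def to_groups(xml_text):
    """Split the hierarchy into a parent row set and a child row set."""
    root = ET.fromstring(xml_text)
    next_key = count(1)
    employee_rows, emp_rows = [], []
    for employee in root.findall("employee"):
        pk = next(next_key)  # stand-in for a generated primary key value
        employee_rows.append({"PK_employee": pk})
        for emp in employee.findall("emp"):
            emp_rows.append({
                "FK_employee": pk,  # foreign key pointing back at the parent row
                "id": emp.findtext("id"),
                "name": emp.findtext("info/name"),
            })
    return employee_rows, emp_rows


employee_rows, emp_rows = to_groups(SAMPLE)
print(employee_rows)
print(emp_rows)
```

Each row set here has a single multiple-occurring element, which is the shape a normalized group has.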

In a target definition, each group must be related to one other group. Therefore, each group needs at least one key to establish its relationship with another group. If you do not designate any column as a key for a group, the Designer creates a column called Group Link Key to hold a key for the group.

When you run a session with a mapping that contains an XML source, the Informatica Server generates the values for the generated primary key columns in the source definition. When you run a session with a mapping that contains an XML target, you need to pass the values to the group link columns in the target groups from the data in the pipeline.

Group keys and relationships follow these rules:

- Any element or attribute can be marked as a key.
- A group can have only one primary key.
- A group can be related to only one other group, and therefore can have only one foreign key.
- A column cannot be marked as both a primary key and a foreign key.
- A key column can be a column that points to an element in the hierarchy or a column created by the Designer. A group can have a combination of the two types of key columns.
- A source group does not require a key.
- A target group requires at least one key.
- The target root group requires a primary key. It does not require a foreign key.
- A target leaf group requires a foreign key. It does not require a primary key.
- A foreign key always refers to a primary key in another group. Self-referencing keys are not allowed.
- A foreign key column created by the Designer always refers to a primary key column created by the Designer.

Code Pages

XML files contain an encoding declaration that indicates the code page used in the file. The most commonly used code pages in XML are UTF-8 and UTF-16. All XML parsers support these two code pages. For information on the XML character encoding specification, go to the W3C website at http://www.w3c.org.
PowerCenter and PowerMart support the same set of code pages for XML files that they support for relational databases and other flat files. You can use any code page supported by both Informatica and the XML specification. For a list of code pages that Informatica supports, see "Code Pages" in the Installation and Configuration Guide. Informatica does not support any user-defined code page.

For XML source definitions, PowerCenter and PowerMart use the repository code page. When you import a source definition from an XML file, the Designer displays the code page declared in the file for verification only. It does not use the code page declared in the XML file.
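The encoding declaration can be inspected before a file is imported. The Python sketch below reads it from the first bytes of a document. It is a simplified check that assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 with no byte order mark, so it will not find the declaration in a UTF-16 file. The function name declared_encoding is made up for the example.

```python
import re

# Matches an XML declaration at the very start of the data and captures the encoding name.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if there is none."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><employees/>'))
print(declared_encoding(b"<employees/>"))
```

Comparing the declared name with the encoding the file was actually saved in helps catch a mismatch, such as a file declared as UTF-8 but saved in another code page.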

For XML target definitions, PowerCenter and PowerMart use the code page declared in the XML file. If Informatica does not support the declared code page, the Designer returns an error, and you cannot import the target definition.

XML writer: Verify that the XML environment is set up correctly: the environment variables are set properly, the .dll files are in the correct location on Windows (or the shared libraries on UNIX), and the supporting .dat files are present.

How XML Sources & Targets Look in Informatica

XML Source

Each group in an XML definition is analogous to a relational table, and the Designer treats each group within the XML Source Qualifier as a separate source of data.

In a mapping, the ports of one group in an XML Source Qualifier can be part of more than one data flow. However, the ports of more than one group in the same XML Source Qualifier cannot link to one transformation or be part of the same data flow. This is the biggest drawback with XML sources.

If you need to use data from two different XML source definitions, you can link a group from each source qualifier and join the data in a Joiner transformation.

You can also use the same source definition more than once in a mapping. Connect each source definition to a different XML Source Qualifier and join the groups in a Joiner transformation. The following figure shows how we can join two XML groups in the same mapping using a Joiner transformation.

If we need to load data from several groups to the same target based on the granularity, it is always better to divide the mapping into 2 or 3 mappings and load the data to the target.

When we create a session to extract data from an XML source, we need to configure source properties, such as the source file location, in the session properties. Define the XML source properties in the Properties settings on the Sources tab.

XML Target

The following figure shows how an XML target looks in Informatica Designer.

When you configure a session to load data to an XML target, you define properties on the Targets tab and the Transformations tab of the session properties. You can configure the following properties for XML targets:

- Output file options. You can configure the directory and file name to which the Informatica Server writes the target file.
- Code page. You can define the code page declared in the XML target file. Use the Set File Properties button to define the code page.
- Duplicate Group Row Handling. You can configure how the Informatica Server handles duplicate rows.
- DTD/Schema Reference. You can specify a DTD or an XML schema file name for the XML target.

Points to take care of while using XML as source or target

- The code page used in the XML/DTD/XML Schema file should be a valid one and supported by Informatica. Take care when creating the file that its actual format matches the declared one. For example, a file declared as UTF-8 should be saved as UTF-8, not as ANSI.
- If we have a DTD/XML Schema file associated with the source/target, then the XML data file should match the DTD/XML Schema file exactly.
- If we have a large amount of data in the XML source, or a large amount of data to load into the XML target, divide it into smaller modules according to the business requirement. Informatica will not be able to read or write very large XML files.
- If the source/target DTD/XML schema file changes, always re-import the source/target.
- Always make sure that the data type and size for the imported XML metadata are correct and match the requirement. By default, the import uses only number and string as data types, with a size of 10.
- Whenever we join two groups in a Joiner transformation, make sure that we select the smaller group/set as the Master group.
- If we have XML as target, always make sure that the data sent to the target matches the cardinality defined in the target DTD/XML Schema file/XML file.
- If we have XML as source, decide whether the groups in the source should be normalized or denormalized based on our requirement, and make sure that a normalized group contains only one multiple-occurring element.
- An XML target can never be denormalized.
