
Kapow Mashup Server 6.

User's Guide

RoboMaker


Copyright 1999-2007 Kapow Technologies http://www.kapowtech.com All rights reserved.

Contents

INTRODUCTION
    What is RoboMaker?
    Organization
    Before You Read On
    Other Resources
ROBOMAKER BASICS
    The Robot
    Objects
    Robot Libraries and Robot Projects
    The Robot State
    Steps
    Connections and Execution Flow
    Error Handling
GETTING STARTED
    A Tour of the RoboMaker User Interface
        The Robot View
        The State View
        The Step View
        The Objects View
    A Tour of the RoboDebugger User Interface
    Robot Navigation and Editing
        Undoing and Redoing Changes
        Cutting, Copying, and Pasting
        Manipulating Steps and Connections
    Step Actions and Data Converters
    Patterns
    Expressions
    Working with Robot Projects and Robot Libraries
    Putting It All Together
TUTORIAL 1: ESSENTIALS
TUTORIAL 2: FORM SUBMISSION
HOW TO CONFIGURE A ROBOT
HOW TO CONFIGURE THE OBJECTS OF A ROBOT
HOW TO USE THE TAG FINDERS
    Understanding Tag Paths
    How the Tag Finder Works
    Configuring the Tag Finders of the Current Step
HOW TO SUBMIT A FORM
    Simple Form Submission
    Form Basics
    Which Step Action Should I Use?
    Using the Submit Form Action
    Using the Loop Form Action
    Uploading Files
    Using the Pop-up Menu in the Page View
HOW TO LOOP THROUGH PAGES
    Pages where First Page Links to All Other Pages
    Pages where Each Page Links to Next
HOW TO EXTRACT CONTENT
    Extracting Text
    Extracting Clips
    Extracting Binary Data
    Using the Pop-up Menu in the Page View
    Performing Common Tasks
        Extracting Only Part of a Text
        Converting Content
        Number and Date Extraction and Formatting
        Extracting Only a Subset of the Tags in the Found Tag
HOW TO EXTRACT CONTENT FROM A TABLE
    Content Irregularities
    Structure Irregularities
HOW TO CLIP
    What is Clipping?
    How Clipping Works
    The Structure of a Clipping Robot
        A Simple Clipping Robot
        A Robot with Multiple Clip Branches
        A Robot with Automatic Navigation Sequences
    Creating a Clipping Robot
    The Portlet View
    Working with Clip Branches
        Adding a New Clip Branch
        Editing a Clip Branch
        Using another Clip Branch for a Page
        Using Clip Conditions
    Modifying Clips
        Selecting the Tags to Clip
        Changing Layout and Styles
        Modifying the Pages before Clipping
    Working with Windows and Frames
        Selecting the Window to Show in the Portlet
        Blocking Popup Windows
    Handling Login and Single-Sign-On
        Performing Automatic Login
        Performing Automatic Logout
        Supporting other Types of Single-Sign-On
    Adding an Automatic Navigation Sequence
    Other Topics
        Restricting the Clipping
        Clipping Protected Resources
        Configuring the Clipped User Actions
        Passing Additional Information to a Clipping Robot
    Deploying a Clipping Robot
        Generating the Clipping Portlet
        Handling Clipping Sessions on RoboServer
HOW TO HANDLE ERRORS
    Handling a Step's Own Errors
    Handling a Step's Received Errors
    Using the Until Successful Branch Branching Mode
        More Examples of Using Until Successful Branch
    Viewing the Error Handling in the Robot View
HOW TO WRITE A ROBOT WITH INPUT OBJECTS
HOW TO MAKE ROBOTS MORE ROBUST
HOW TO REUSE SESSIONS
HOW TO DEBUG A ROBOT
    Basic Debugging
    Debugging from the Current Location in RoboMaker
    Making RoboMaker Go to a Location from RoboDebugger
    Using Breakpoints
    Single-Stepping
    Using Environments
HOW TO USE THE BROWSER TRACER
    Setting Up a Browser
    Tracing
    The Difference View
    JavaScript Trace
    HTTP Trace
    Saving and Loading Trace Sessions
INDEX

Introduction
What is RoboMaker?
RoboMaker is the application for creating and debugging robots. In RoboMaker, you can create and debug robots of any kind, including data collection robots that extract objects from a web site, and clipping robots that clip a part of an HTML page to be shown in another context, e.g. a portal. RoboMaker is an integrated development environment (IDE) for robots. This means that RoboMaker is all you need for programming robots in an easy-to-understand programming language with its own syntax (structure) and semantics (meaning). To support you in the construction of robots, RoboMaker provides you with powerful programming features including interactive visual programming, full debugging capabilities, an overview of the program state, and easy access to context-sensitive on-line help.

Organization
The User's Guide is structured as follows: First, you are introduced to the essential concepts of RoboMaker. Then you are taken on a tour of the user interface and provided with an overview of the core building blocks of any robot. With the basics firmly in place, we get to the tutorials that show you how to use RoboMaker to create robots that do something useful. The tutorials get gradually more advanced until, finally, you are ready to create robots that perform tasks that you decide. The tutorials are the core of this User's Guide, and it is important that you master them before proceeding. The rest of the User's Guide is divided into various topics (How To...). You should skim through them to get an idea of what they cover. Then you can return later when you need more information or help on one of the topics.

Before You Read On


Before you proceed to the next chapter, make sure that you have installed RoboMaker correctly as described in the Installation Guide. It is recommended that you read the Quick Start Guide before reading this guide. Also, before proceeding, make sure that you fulfill the following reader requirements:

    A basic understanding of what programming is.
    A basic understanding of HTML.
    A basic understanding of JavaScript.

Other Resources
Additional, mostly referential documentation on RoboMaker is available in the RoboMaker entry in RoboHelp, which is accessible from the Help menu in RoboMaker. You should also check out the support site at this URL: http://support.kapowtech.com/

RoboMaker Basics
RoboMaker is a programming environment for programming robots in a special-purpose programming language with its own syntax and semantics. Like other programming environments, RoboMaker uses several concepts that you, as a robot designer, must understand in order to fully comprehend the workings of RoboMaker. It is the purpose of this chapter to introduce the most important of these concepts. Don't worry if on your first reading of this chapter you don't understand all the RoboMaker concepts described below; they will become clearer to you as you explore RoboMaker and start to write robots. However, it is recommended that you refer back to this chapter whenever you encounter a concept whose meaning you do not understand.

The Robot
The most important concept in RoboMaker is the robot. A robot is a program designed to accomplish some task, usually involving a web site. Typically, one robot is written per task per web site. For example, you would create one robot for extracting news from cnn.com, and another robot for extracting product information from an online product catalogue. In RoboMaker, you create one robot at a time. Basically, a robot can be programmed to do (automatically) everything you can do in a browser, such as Internet Explorer.

Objects
A robot outputs objects. An object is a collection of attributes. An attribute has an attribute name and may contain a single attribute value. For example, a robot that extracts news from some web site will output news objects; each news object has attributes with attribute names such as headline, body text, date, author, etc., and each outputted news object will have different attribute values for each attribute (unless, of course, the same news object is outputted more than once!). An outputted object is called a returned object. Some robots accept input objects as input. The input objects are a collection of objects that the robot can use in performing its task. For example, a shopping robot that orders books at amazon.com might accept input objects containing a user information object and a book object.
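The relationship between objects, attribute names, and attribute values can be pictured with a small data structure. The following Python sketch is purely illustrative: the class name `RobotObject`, its fields, and the sample values are assumptions made for this example and are not part of RoboMaker or ModelMaker.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RobotObject:
    """Hypothetical stand-in for an object: a named collection of attributes."""
    type_name: str
    # Each attribute name maps to a single value, or None if no value is set.
    attributes: Dict[str, Optional[str]] = field(default_factory=dict)


# A returned object, as a news-extraction robot might output it.
news = RobotObject(
    type_name="News",
    attributes={
        "headline": "Example headline",
        "body text": None,       # attribute exists but holds no value yet
        "date": "2007-01-01",
        "author": None,
    },
)

# Input objects for a hypothetical book-ordering robot:
# one user information object and one book object.
input_objects: List[RobotObject] = [
    RobotObject("UserInformation", {"name": "Jane Doe"}),
    RobotObject("Book", {"title": "Example Title"}),
]
```

The point of the sketch is only that an attribute always has a name but may or may not hold a value, and that input objects are passed to a robot as a collection.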

Figure 1: Robot Input-Output (input: optional input objects; output: output objects)

All objects are designed in the ModelMaker application and can be imported into RoboMaker. Objects are part of a domain model. See the ModelMaker User's Guide for more information on designing objects and domain models using ModelMaker.

Robot Libraries and Robot Projects


Robots are organized in robot libraries. A robot library is a collection of robot files and other files needed by the robots, such as domain model files containing the objects used by the robots. A robot library serves as the deployment unit for robots. This means that a robot library is how you bundle robots and their required files when you want to distribute and deploy the robots in a runtime environment, such as RoboServer. When you are working in RoboMaker or one of the other development applications in Kapow Mashup Server, you are working on a robot project. The purpose of a robot project is to develop a robot library. A robot project contains the robot library that you are developing, as well as other files that are useful for your work on the robot library but should not be part of the robot library itself. Thus, a robot project is what you work on when you are developing robots, and a robot library is how you distribute and deploy your work.

The Robot State


When a robot is executed, it works on a robot state. The robot state consists mainly of four elements:

    windows
    objects
    cookies
    authentications

The windows element is the currently open windows, each containing a web page or part of a web page. At least one window is always open, and one window is marked as the current window. The objects element contains the current values of the objects. The cookies and authentications elements are the HTTP cookies and authentications, respectively, received during communication with a web server.
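The four elements of the robot state can be sketched as a simple record. This is a minimal Python illustration, assuming hypothetical names (`Window`, `RobotState`, and their fields); it does not reflect how RoboMaker stores the robot state internally.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Window:
    """Hypothetical window holding a page or part of a page."""
    url: str
    html: str


@dataclass
class RobotState:
    """Hypothetical record of the four main robot state elements."""
    windows: List[Window]
    current_window: int = 0  # index of the window marked as current
    objects: Dict[str, Dict[str, str]] = field(default_factory=dict)
    cookies: Dict[str, str] = field(default_factory=dict)
    authentications: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # At least one window is always open, and one of them is current.
        if not self.windows:
            raise ValueError("a robot state needs at least one open window")
        if not 0 <= self.current_window < len(self.windows):
            raise ValueError("current_window must refer to an open window")


state = RobotState(windows=[Window("http://example.com/", "<html></html>")])

# An independent copy can be changed without affecting the original state.
snapshot = copy.deepcopy(state)
snapshot.cookies["session"] = "abc"
```

Treating the state as a value that can be copied makes it easier to reason about steps that output several states or about branches that each start from the same state.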

Steps
A robot is made up of steps. A step is a building block in a robot program.

Figure 2: A Step

A step accepts a robot state as input and, depending on the step configuration, outputs zero or more robot states. A step consists of several elements, including a step name, a list of tag finders and a step action.

The step name provides a symbolic name for the step, such as Extract Headline or Load Search Page. In Figure 2 above, the step name is MyStep.

The tag finders find the tags in the page that the step action (see below) should work on. Some step actions require a single tag, whereas others can handle more than one tag. Some step actions accept no tags at all.

The step action is the action that the step performs. For example, an Extract action might extract the text from a tag and store it in an object attribute. And a Click action might load the URL residing in an <a>-tag and replace the page of the current window in the robot state with the newly loaded web page. An action usually changes the robot state. For example, the Extract action changes the objects, and the Click action changes the pages/windows, the cookies and the authentications. The action is the heart and brain of the step, and it is the selection of the right step action that is the challenge of robot writing.

Some actions are termed loop actions. A loop action outputs zero or more robot states to the step that follows it. For example, a For Each Tag action looping through the <tr>-tags in a <tbody>-tag (inside a <table>-tag) will output one robot state for each <tr>-tag in the <tbody>-tag; if the <tbody>-tag has eight <tr>-tags, then the For Each Tag action will generate and output eight robot states. When a loop action is outputting its Nth robot state, its current iteration is said to be N. The step shown in Figure 2 contains a loop action and the current iteration is 3.

Some actions use data converters for converting data, e.g. converting text to a number or uppercasing it.

A step can be executed. A step that is executed accepts a robot state as input and, by applying the tag finders and step action in turn, produces zero or more robot states as output.

A step is valid if it has been properly configured so that execution can be attempted. For example, if a step has no action, it is invalid since execution cannot be attempted.
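The idea that an action maps one input robot state to zero or more output robot states can be sketched in a few lines of Python. The functions below are hypothetical illustrations, not RoboMaker actions: the robot state is reduced to a plain dictionary, and the tag finding is approximated with regular expressions on a simple, well-formed table.

```python
import re
from typing import Dict, List

State = Dict[str, object]


def for_each_row(state: State) -> List[State]:
    """Loop action sketch: one output state per <tr>-tag found in the page."""
    rows = re.findall(r"<tr>.*?</tr>", str(state["html"]), flags=re.S)
    return [
        {**state, "current_tag": row, "iteration": n}
        for n, row in enumerate(rows, start=1)
    ]


def extract_text(state: State) -> List[State]:
    """Ordinary action sketch: exactly one output state with an attribute set."""
    text = re.sub(r"<[^>]+>", "", str(state["current_tag"])).strip()
    return [{**state, "headline": text}]


def only_if_has_text(state: State) -> List[State]:
    """Conditional action sketch: outputs the state only if a condition holds."""
    return [state] if state.get("headline") else []


# A page whose <tbody>-tag contains eight <tr>-tags.
page = (
    "<table><tbody>"
    + "".join(f"<tr><td>Item {i}</td></tr>" for i in range(1, 9))
    + "</tbody></table>"
)

states = for_each_row({"html": page})      # eight output states
third = extract_text(states[2])[0]         # the state of iteration 3
```

Each output state is a new dictionary, so changes made while handling one iteration do not leak into the others.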

All robots containing at least one step have a first step. The first step is the step that first gets executed when the robot is executed.

Connections and Execution Flow


Steps can be connected to other steps via connections that affect how execution flows between steps. Consider the simple robot below:

This robot consists of three steps named step A, step B, and step C. Assuming that no errors occur, and that each step generates exactly one output robot state, then this robot is executed as follows: An initial robot state will be generated and inputted to step A (being the first step). Step A will produce an output robot state. This output robot state will be the input robot state to step B. Similarly, step B will produce a robot state and this will be the input robot state to step C. Once step C has executed and outputted a robot state, execution completes. In short, the execution of steps can be described as follows: A, B, C.

Sometimes, a step generates no output robot state when executed. This is quite normal for steps containing a conditional action, that is, an action that analyzes the input robot state and only outputs a robot state if the input robot state satisfies certain conditions. In the simple robot above, if step B outputs no robot state, then the execution of steps will be as follows: A, B. Note that step C will not get executed. The general rule is: If a step outputs no robot state, then execution will not proceed beyond that step.

Other steps, namely those containing a loop action, might output more than one robot state. Consider the robot below where step B contains a loop action:

Assuming there are no errors, that step B outputs three robot states, and that all other steps output exactly one robot state, then the steps will be executed in the following order: A, B[1], C, D, B[2], C, D, B[3], C, D, where B[N] refers to the Nth iteration of the loop action contained in step B. Note that the robot states outputted by step B will be new robot states; that is, each iteration will output a new robot state. Hence, step C will receive a new input robot state each time it is executed.
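The execution orders described above can be reproduced with a toy executor. The following Python sketch is an illustration under stated assumptions: the `Step` class, the `run` function, and the three sample actions are hypothetical, and the robot state is reduced to a dictionary. It is not how RoboMaker or RoboServer executes robots.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Step:
    """Hypothetical step: a name, an action, and the steps it connects to."""
    name: str
    action: Callable[[State], List[State]]
    is_loop: bool = False
    next_steps: List["Step"] = field(default_factory=list)


def run(step: Step, state: State, trace: List[str]) -> None:
    """Execute a step, then pass each output state on to the following steps."""
    outputs = step.action(state)
    if not step.is_loop:
        trace.append(step.name)
    for n, out in enumerate(outputs, start=1):
        if step.is_loop:
            trace.append(f"{step.name}[{n}]")
        for nxt in step.next_steps:
            run(nxt, dict(out), trace)  # every execution gets its own state


def passthrough(state: State) -> List[State]:
    return [dict(state)]                # exactly one output state


def three_times(state: State) -> List[State]:
    return [dict(state, item=i) for i in range(1, 4)]  # loop action


def never(state: State) -> List[State]:
    return []                           # conditional action that rejects


# Robot with a loop action in step B: A -> B -> C -> D.
d = Step("D", passthrough)
c = Step("C", passthrough, next_steps=[d])
b = Step("B", three_times, is_loop=True, next_steps=[c])
a = Step("A", passthrough, next_steps=[b])
loop_trace: List[str] = []
run(a, {}, loop_trace)

# Simple robot A -> B -> C where step B outputs no robot state.
c2 = Step("C", passthrough)
b2 = Step("B", never, next_steps=[c2])
a2 = Step("A", passthrough, next_steps=[b2])
conditional_trace: List[str] = []
run(a2, {}, conditional_trace)
```

Under these assumptions, `loop_trace` lists the steps in the order A, B[1], C, D, B[2], C, D, B[3], C, D, and `conditional_trace` stops after step B.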

A step can connect to more than one step. This is called branching. Consider the robot below:

In this robot, step A has two branches, one consisting of step B and step C, and another consisting of step D and step E. How branches are executed depends on the branching mode that has been selected for the step that has the branches (in this case step A). With the default branching mode, which is called All Branches, all of the branches are executed, one after another. Therefore, assuming that no errors occur and that each step generates exactly one output robot state, then the robot above will be executed as follows: A, B, C, D, E. However, it is important to note that step B and step D will each receive a copy of the same robot state outputted by step A. Were it not for the fact that some steps might have external effects, branches could in principle be executed in parallel.

Sometimes you want to select (i.e. execute) only one of several branches. One way to handle this is to let the first step in each branch contain a conditional action. In the robot above, step B and step D could each contain a conditional action, each configured so that they collectively ensure that either step C or step E gets executed, but not both.

Branches can be, and often are, mutually intertwined. Consider the following robot:

This robot illustrates how connections are ordered. Unless otherwise noted, connections are executed top-down. In this robot, however, the branches of step D are executed in the order specified by the numbers, that is, step E is executed before step C. Assuming no errors occur and that each step generates exactly one output robot state, then the robot is executed as follows: A, B, C, D, E, C. The first time step C is executed it will receive the robot state outputted by step B; the second time step C is executed it will receive the robot state outputted by step D.
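The ordering of intertwined branches can be checked with a short Python sketch. The connection layout below is an assumption reconstructed from the description above (step A connects to step B and step D, step B connects to step C, and step D connects first to step E and then to step C); the function and variable names are hypothetical.

```python
import copy
from typing import Dict, List, Optional

# Assumed layout of the intertwined robot. The order of each list is the
# order in which the connections of that step are executed.
CONNECTIONS: Dict[str, List[str]] = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": [],
    "D": ["E", "C"],  # numbered connections: step E before step C
    "E": [],
}


def run(step: str, state: dict, trace: List[str],
        inputs_seen: Dict[str, List[Optional[str]]]) -> None:
    """All Branches mode: execute every branch in turn, each on its own copy."""
    trace.append(step)
    # Remember which step produced the state this step received.
    inputs_seen.setdefault(step, []).append(state.get("last"))
    output = {**state, "last": step}  # each step outputs exactly one state
    for target in CONNECTIONS[step]:
        run(target, copy.deepcopy(output), trace, inputs_seen)


trace: List[str] = []
inputs_seen: Dict[str, List[Optional[str]]] = {}
run("A", {}, trace, inputs_seen)
```

With this layout the trace comes out as A, B, C, D, E, C, and `inputs_seen["C"]` records that step C first received the state outputted by step B and then the state outputted by step D. Steps B and D both receive a copy of the state outputted by step A.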

Error Handling
Steps generate an error if they fail to process the input state. For example, the tag finders might fail to find the tag due to a dramatic page layout change. A step that generates an error is said to fail. For the purposes of this section we will assume that a step that generates an error will report it immediately. Several other error handling possibilities exist; see the How to Handle Errors chapter for more information. Consider the simple robot below and assume that each step expects to output exactly one robot state:

If step A reports an error, then the execution of steps is as follows: A. If step B reports an error, then the execution of steps is as follows: A, B. If step C reports an error, then the execution of steps is as follows: A, B, C. The general rule is: If a step generates an error, then the steps beyond that step will not be executed. Error handling is affected by loop actions. Consider the robot below and assume that step B expects to output three robot states, and all other steps expect to output exactly one robot state:

If step C reports an error the second time it is executed, then the execution of steps is as follows: A, B[1], C, D, B[2], C, B[3], C, D. Note that the error causes the loop action to go to the next iteration. In another situation, if step B reports an error when generating its second output robot state, then the execution of steps is: A, B[1], C, D, B[2]. Note that the error causes step B to fail completely, and, hence, execution does not go to the next iteration. The general rule is: If a step containing a loop action fails, then it fails completely like any other step, i.e. execution does not proceed beyond the step. Error handling is also affected by branching. Consider the robot below and assume that each step expects to output exactly one robot state:

If step B reports an error, then the execution of steps is as follows: A, B, D, E. Note that execution will proceed to each branch regardless of whether a step on a previously executed branch has reported an error. When an error is reported then an error report is generated. An error report contains a message briefly describing the error, and a location and location code for the step that reported the error. The location of the step that reported the error is the list of steps (including iteration numbers) one needs to execute in order to reach that step from the first step. Consider the robot below:

If step C reports an error on the second iteration of step B, then the location is written as: step A - step B[2] - step C. Note that the location contains the step names and iteration numbers, separated by hyphens. The location code is similar to the location, but the name of each step is replaced by a unique identifier for that step, thereby avoiding name clashes. For the location example above, the location code may be: <0>.<1>[2].<2>. You can use the location code in RoboMaker to go directly to the step that reported the error.
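The interaction between errors, loop iterations, and locations can be illustrated with a small simulation. The Python sketch below hard-codes the robot A, B (loop action with three iterations), C, D and assumes that errors are reported immediately, as in this section. The names `StepError`, `STEP_IDS`, `format_location`, and `run_robot`, as well as the identifier values, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical unique identifiers used to build the location code.
STEP_IDS = {"A": 0, "B": 1, "C": 2, "D": 3}


class StepError(Exception):
    """Raised by a step that fails to process its input state."""


def format_location(path: List[Tuple[str, Optional[int]]]) -> Tuple[str, str]:
    """Build the location and the location code from (name, iteration) pairs."""
    names, codes = [], []
    for name, iteration in path:
        suffix = f"[{iteration}]" if iteration is not None else ""
        names.append(f"step {name}{suffix}")
        codes.append(f"<{STEP_IDS[name]}>{suffix}")
    return " - ".join(names), ".".join(codes)


def run_robot(failing_iteration: int) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run A, B (loop), C, D; step C fails in the given iteration of step B."""
    trace: List[str] = ["A"]
    reports: List[Tuple[str, str]] = []
    for n in range(1, 4):
        trace.append(f"B[{n}]")
        try:
            trace.append("C")
            if n == failing_iteration:
                raise StepError("tag not found")
            trace.append("D")
        except StepError:
            # The steps beyond C are skipped; the loop goes to its next iteration.
            reports.append(format_location([("A", None), ("B", n), ("C", None)]))
    return trace, reports


trace, reports = run_robot(failing_iteration=2)
location, location_code = reports[0]
```

In this sketch the trace is A, B[1], C, D, B[2], C, B[3], C, D, the location is `step A - step B[2] - step C`, and the location code is `<0>.<1>[2].<2>`, matching the examples in the text.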

Getting Started
This chapter gets you started with RoboMaker. It introduces you to the RoboMaker user interface, including selected menus and functions, and to the RoboDebugger sub-application. Then, the core building blocks of any robot, namely the step actions and data converters, are described. Finally, there are sections on patterns and expressions. When reading this chapter, it is recommended that you start up RoboMaker and explore the RoboMaker user interface as you follow the tour. Note, however, that the tour explores the RoboMaker user interface as it appears at startup if you do not create or load a robot. This means that the user interface will be quite empty, and many functions, such as debugging, will not make much sense. Don't worry about this; there will be plenty of opportunities to see many of these functions in action when you start on the tutorials following this chapter.

A Tour of the RoboMaker User Interface


When you start RoboMaker and cancel the welcome screen, the RoboMaker Main Window appears, as shown in Figure 3 below.

Figure 3: The RoboMaker Main Window (showing the Robot View, the State View, the Step View, and the Objects View)

At the top, you see the menu bar and the toolbar. Now, let us go through each of the views in the RoboMaker Main Window.

The Robot View


The Robot View is located just below the toolbar icons in the RoboMaker Main Window. In the Robot View, you view the robot program, that is, the steps and connections that make up the robot. The robot can be viewed in compact mode and in expanded mode. You can switch between the two modes by clicking the double-arrows in the right part of the Robot View or by dragging the divider (the line immediately below the Robot View) up or down. In compact mode, you can only view one branch for each step at a time. You can, however, select the branch to view by clicking the small up and down arrows appearing next to steps with more than one branch.

You can select steps and connections in the Robot View by holding down the Ctrl key and clicking the steps or connections. When steps or connections are selected, you can apply actions to them. For example, you can insert a new connection by first selecting the step that the connection should start at, then the step that it should end at, and finally clicking the corresponding icon in the toolbar. You can also right-click on a step or connection to bring up a pop-up menu. To deselect the currently selected steps and connections without applying any action, click outside of the robot.

Invalid steps are underlined in red, and if you move the mouse to an invalid step, an explanation of why the step is invalid is shown.

The State View


The State View is located below the Robot View to the left in the RoboMaker Main Window. In the State View, you view the current robot state, that is, the robot state that is input to the current step (marked with light green in the Robot View). You can view different elements of the robot state by clicking one of the tabs: Windows, Cookies, and Authentications.

In the Windows tab, you see the Page Views of the windows in the current robot state. When loading from a URL, several windows may be opened, each containing a page. The current window is marked with a blue arrow. For each window, the Page View is split into several sub views, as shown in Figure 4 below: The Tag Path View, the Browser View, the Tree View, and the Source View.

Figure 4: The Page View of a Window (showing the Tag Path View, the Browser View, the Tree View, and the Source View)

In the Tag Path View, you see the path from the root tag of the page to the selected tag.

In the Browser View, you see the page as it appears in a browser. This view has two modes, selected by clicking the icons in the lower left of the Page View. In Normal Browser View mode, you see the page exactly as it appears in a browser. In Boxed Browser View mode, you see the page as it appears in a browser, but with colored boxes around specific tags, for example <table>- and <form>-tags.

In the Source View, you see the HTML source of the page. The Source View has three modes that can be selected by clicking the icons to the right of the Browser View mode icons. In Normal Source View mode, you see the plain HTML. In Colored Source View mode, you see the HTML with color-highlighting. In JavaScript Source View mode, you see the HTML with JavaScript highlighted. In all three modes, you can choose whether to show line numbers in JavaScript using the Line Numbers checkbox in the lower right corner of the Page View.


You can select a tag in the Page View by left-clicking in any of the views. The currently selected tag is shown with a green, dashed box in the Browser View and the Source View, and with a green background in the Tag Path View and the Tree View. You can hold down Alt while clicking inside the currently selected tag to move the selection one level out, i.e. select the tag that encloses the selected tag. You can also hold down Alt and Shift while clicking inside the selected tag. This will move the selection one level in towards the tag that you clicked.

You can also change the current selection using the buttons to the right of the Tag Path View. Two of the icons move the selection one level out or in. One icon selects the root tag of the page, while another selects the innermost tag inside the selected tag. Two further icons select the tag above or below the selected tag. You can also search for a tag by clicking the search icon.

The Page View also shows the tags found by the Tag Finders of the current step. These tags are called found tags and will be shown with a red, dashed box in the Browser View and the Source View, and with a red background in the Tag Path View and the Tree View. If you edit the Tag Finders, you can click the corresponding icon to show the new tags found. You can also configure the Tag Finders by clicking one of four icons: to use only the currently selected tag, to use the currently selected tag as well as any other tags found, to not use the currently selected tag, or to not use any tags at all.

Furthermore, the Page View shows the current tags. Current tags are marker tags that are used as reference when finding other tags. Current tags can be set by step actions; for example, some loop actions use a current tag to mark the result of the current iteration of the loop. You can also set current tags manually. Current tags are shown with a blue, dashed box in the Browser View and the Source View, and with a blue background in the Tag Path View and the Tree View.

In all views of the Page View, you can right-click a tag to open a pop-up menu that allows you to configure the current step. This is very useful and will probably become your preferred way of configuring the current step. From the menu, you can choose Use only this Tag or Use this Tag to configure the Tag Finders to find the tag that you clicked. You can also choose an action such as Enter Text from the menu. This will configure the current step to use the corresponding action, in this case Enter Text, on the tag that you clicked.

You can copy text from the Tag Path View or the Source View by holding down Ctrl while selecting the text with the mouse. You can also copy the HTML text of the selected tag by clicking the corresponding icon.

In the Cookies tab, you see the Cookies View. Here, you see the cookies in the current robot state. Cookies are added to this list as the robot loads web pages that use cookies.


Similarly, in the Authentications tab, you see the Authentications View.

The Step View


The Step View is located to the right of the State View. The Step View is separated from the State View using a divider, which you can drag either left or right to make more room for one of the views. The Step View shows the configuration of the current step. You can view and edit the different elements of the step by clicking one of the tabs: Basic, Tag Finders, Action, and Error Handling.

In the Basic tab, you find the name of the step, as well as the comment attached to it. Steps that have an attached comment are marked with an icon in the Robot View. If you rest the mouse pointer on a step, the comment will be displayed as rollover text.

In the Tag Finders tab, you can view and configure the list of tag finders of the step. You normally configure the tag finders by right-clicking the tag in the Page View. See the How to Use the Tag Finders chapter for more information.

In the Action tab, you can select and view the action for the step. If you want help on choosing the step action, click the Guide... button. This will open the Step Action Selection Guide, which allows you to browse the available step actions by category (see Figure 5).


Figure 5: The Step Action Selection Guide

When you have selected an action in the Step View, it will be displayed immediately below the action selection box. For a description of the actions available, see the Step Actions section below.

In the Error Handling tab in the Step View, you can see how the current step handles errors, both its own errors and received errors. You can also select the branching mode for the step. See the How to Handle Errors chapter for more information.

The Objects View


The Objects View is located below the Step View. The Objects View is separated from the Step View using a divider, which you can drag either up or down to make more room for one of the views. The view shows the objects of the current step. The view has two tabs: Input Objects and Output Objects. In each tab, the left part of the view shows a list of the particular objects. When you select an object in this list, that particular object is shown in the right part of the view.

In the Input Objects tab, you find the list of the input objects of the robot. You can add, remove, or rearrange input objects by pressing the Add/Remove button. The view shows either the input values or the values at the step. The input values are used when writing and testing the robot. The input values can be edited and you can apply them by pressing the Apply button. When a robot is run on RoboServer, it will get the input values from the client. The values at the step are the values of the input objects at the current step, and these cannot be edited.

In the Output Objects tab, you find the list of the output objects of the robot. You can add, remove, or rearrange output objects by pressing the Add/Remove button. The view shows either the initial values or the values at the step. The initial values are the values that the object attributes will have at the start of the robot, i.e. at the first step. The initial values can be edited and you can apply them by pressing the Apply button. The values at the step are the values of the output objects at the current step, and these cannot be edited.

A Tour of the RoboDebugger User Interface


RoboMaker contains a sub-application, RoboDebugger, for debugging robots. You can open RoboDebugger by clicking the corresponding icon in the toolbar in the RoboMaker Main Window. Alternatively, if you want to debug from the current step in RoboMaker, you can click the icon for debugging from the current step. When you open RoboDebugger, the RoboDebugger Main Window appears, as shown in Figure 6 below.


Figure 6: The RoboDebugger Main Window

Below the menu bar and toolbar is the Robot View, similar to the one in the RoboMaker Main Window. Note, however, that the Robot View in RoboDebugger has a current step only when you are actually debugging the robot. This current step is not the same as the current step in the Robot View in the RoboMaker Main Window. The current step in RoboDebugger is the step currently being debugged.

Below the Robot View is a large panel divided into three sub-panels: the main panel and two panels named Summary and Stop When.

In the main panel, you see the results of the debugging process divided into four tabs. In the Input/Output tab, you see the input objects, if any, and a list of all returned objects so far during the debugging process. In the Error Reports tab, you see a list of the error reports generated so far during the debugging process. In the Log tab, you can see whatever has been written to the log so far during the debugging process. (Some actions, particularly those that take a while to execute, such as the Loop Form action, write status information to this log.) Whenever the debugging process has been temporarily stopped, the State tab shows the robot state that is input to the current step. The State tab contains five sub-tabs. The Objects sub-tab shows the list of objects. The Windows, Cookies, and Authentications sub-tabs show the state, in much the same way as the State View in RoboMaker. The Error Report sub-tab contains the error report generated at the current step, if any. For all error reports, you can click the Goto button to go directly to the step that generated the error, that is, the step that generated the error report will become the current step in RoboMaker.

In the Summary panel, you see an overview of the number of returned objects and generated error reports so far during the debugging process. In the Stop When panel, you can specify the criteria for when the debugging process should temporarily stop (besides ending normally). For more on using RoboDebugger, see the How to Debug a Robot chapter.

Robot Navigation and Editing


This section gives general hints related to navigating and editing robots in RoboMaker.

Undoing and Redoing Changes


In RoboMaker, you can undo a change by pressing Ctrl-Z or clicking the undo icon. Similarly, an undone change can be redone by pressing Ctrl-Y or clicking the redo icon.

Cutting, Copying, and Pasting


In RoboMaker, most items (e.g. steps, data converters and tag finders) can be cut, copied, and pasted using the shortcuts Ctrl-X, Ctrl-C, and Ctrl-V, respectively. In most lists, e.g. the list of tag finders for a step, all items can be copied using the shortcut Ctrl-Shift-C.

Manipulating Steps and Connections


If you left-click a step, the robot is executed up to that step, if possible, such that the step becomes the current step. However, any step in a robot can be right-clicked. When right-clicking a step, you can configure it by selecting Configure Step..., which opens a Step Configuration dialog. This gives you the same options as the Step View. You can select more than one step or connection by holding down the Ctrl key and left-clicking the steps or connections. You can also hold down Ctrl and mark an area of steps or connections that should be selected. A step can be cut, copied, and pasted (before or after a selected step). You can also move a selection of one or more steps by dragging and dropping it. You can make connections between two steps by placing the mouse cursor to the right of the step (where the connection should start). Then, a white arrowhead appears. You can now left-click the arrowhead and drag it to the step where the connection should end.


Step Actions and Data Converters


This section contains general information about the step actions and data converters available in RoboMaker. In RoboMaker, a short description will be shown together with each action and data converter, and you can get more information about the action or data converter by clicking the More... button associated with the description. Moreover, in the RoboMaker Help menu, you can find a help entry on every step action and data converter.

Several of the actions, e.g. Extract, include the possibility of running some text content through a list of data converters and then storing the result in some attribute. As mentioned earlier, a data converter processes some text, e.g. the Extract Number data converter accepts an input text containing a number in some format and outputs a text containing the same number in a standardized format. Because a data converter takes a text as input and outputs another text, data converters can be chained so that the output of one data converter becomes the input to the next data converter. The final output is the text produced by the last data converter in the list of data converters. For example, if the list of data converters consists of a Convert to Upper Case data converter followed by a Remove Spaces data converter, and the input text to the list is "R oboMa ker", then the output text will be "ROBOMAKER".
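The chaining of data converters can be illustrated outside RoboMaker. The following Python sketch is only an analogy: the function names `convert_to_upper_case`, `remove_spaces`, and `run_converters` are hypothetical stand-ins for the Convert to Upper Case and Remove Spaces data converters, not part of any RoboMaker API.

```python
from functools import reduce
from typing import Callable, Iterable

# A data converter is modeled as a function from text to text.
Converter = Callable[[str], str]


def convert_to_upper_case(text: str) -> str:
    """Hypothetical stand-in for the Convert to Upper Case data converter."""
    return text.upper()


def remove_spaces(text: str) -> str:
    """Hypothetical stand-in for the Remove Spaces data converter."""
    return text.replace(" ", "")


def run_converters(text: str, converters: Iterable[Converter]) -> str:
    """Feed the output of each converter into the next, in list order."""
    return reduce(lambda current, convert: convert(current), converters, text)


result = run_converters("R oboMa ker", [convert_to_upper_case, remove_spaces])
print(result)  # ROBOMAKER
```

The order of the list matters in general, since each converter only sees the text produced by the one before it.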

Patterns
A pattern is a way of describing a text. For example, the text "32" can be described as a text containing two digits. However, other texts also contain two digits, e.g. "12" and "00". We say that these texts match the pattern. (RoboMaker patterns follow the Perl5 syntax.) A pattern is composed of normal characters and special symbols. Each special symbol carries its own special meaning. For example, the special symbol "." (dot) means any single character and matches all single characters, e.g. "a", "b", "1", "2", ... Figure 7 below provides an overview of the most commonly used special symbols. For a complete overview of all the special symbols available, see the RoboHelp entry on Patterns.


Special symbol   Meaning
.                Any single character, e.g. "a", "1", "/", "?", ".", etc.
\d               Any decimal digit, e.g. "0", "1", ..., "9".
\D               Any non-digit, i.e. same as ".", but excluding "0", "1", ..., "9".
\s               Any white space character, e.g. " " and line break.
\S               Any non-white space character, i.e. same as ".", but excluding white space (such as " " and line break).
\w               Any word (alphanumeric) character, e.g. "a", ..., "z", "A", ..., "Z", "0", ..., "9".
\W               Any non-word (alphanumeric) character, i.e. same as ".", but excluding "a", ..., "z", "A", ..., "Z", "0", ..., "9".

Figure 7: The Most Commonly Used Pattern Special Symbols

Example: The pattern ".an" matches all texts of length three ending with "an", e.g. "can" and "man" but not "mcan".

Example: The pattern "\d\d\s\d\d" matches all texts of length five starting with two digits followed by a white space and ending with two digits, e.g. "01 23" and "72 13" but not "01 2s".

If you want a special character, such as "." or "\", to act as a normal character, you can escape it by adding a "\" (backslash) in front of it. So, if you wish to match exactly the "." character, instead of any single character, you should write "\.".
Example: The pattern "m\.n\\o" only matches the text "m.n\o".
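If you want to try the special symbols and escaping outside RoboMaker, the examples above can be reproduced with Python's `re` module. This is a sketch under two assumptions: Python's syntax agrees with Perl5 for these particular symbols, and a pattern must match the whole text, which is modeled here with `re.fullmatch`. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# "." matches any single character, so ".an" needs exactly three characters.
print(matches(r".an", "can"))    # True
print(matches(r".an", "mcan"))   # False

# Two digits, one white space character, two digits.
print(matches(r"\d\d\s\d\d", "01 23"))  # True
print(matches(r"\d\d\s\d\d", "01 2s"))  # False

# Escaped "." and "\" act as normal characters.
print(matches(r"m\.n\\o", "m.n\\o"))  # True: the text is m.n\o
print(matches(r"m\.n\\o", "mxn\\o"))  # False: "x" is not a literal "."
```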

You can organize a pattern into subpatterns by the use of parentheses: "(" and ")".
Example: The pattern "abc" can be organized as "(a)(bc)".

All single characters are considered subpatterns.


Example: In the pattern "abc", each single character "a", "b", and "c" is considered a subpattern.


Subpatterns are useful when applying pattern operators. Figure 8 below provides an overview of the pattern operators available.
Operator   Meaning
?          Matches the preceding subpattern, or the empty text.
*          Matches any number of repetitions of the preceding subpattern, or the empty text.
+          Matches one or more repetitions of the preceding subpattern.
{m}        Matches exactly m repetitions of the preceding subpattern.
{m,n}      Matches between m and n repetitions (inclusive) of the preceding subpattern.
{m,}       Matches m or more repetitions of the preceding subpattern.
a|b        Matches whatever the expression a would match, or whatever the expression b would match.

Figure 8: The Pattern Operators

Example: ".*" matches any text, e.g. "RoboMaker", "1213" and "" (the empty text).

Example: "(abc)*" matches any number of repetitions of the text "abc", e.g. "", "abc", "abcabc", and "abcabcabc", but not "abca".

Example: "(\d\d){1,2}" matches either two or four digits, e.g. "12" and "6789", but not "123".

Example: "(Robo)?Maker" matches "RoboMaker" and "Maker".

Example: "(Robo)|(Maker)" matches "Robo" and "Maker".
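The operator examples above can be checked mechanically. The sketch below uses Python's `re` module as a stand-in, assuming it agrees with Perl5 for these operators and that a pattern must match the whole text; the names `CASES`, `matches`, and `check_cases` are hypothetical.

```python
import re

# Each pattern maps to (texts that should match, texts that should not match).
CASES = {
    r".*": (["RoboMaker", "1213", ""], []),
    r"(abc)*": (["", "abc", "abcabc", "abcabcabc"], ["abca"]),
    r"(\d\d){1,2}": (["12", "6789"], ["123"]),
    r"(Robo)?Maker": (["RoboMaker", "Maker"], []),
    r"(Robo)|(Maker)": (["Robo", "Maker"], []),
}


def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def check_cases(cases):
    """Return the (pattern, text) pairs that did not behave as listed."""
    failures = []
    for pattern, (should_match, should_not_match) in cases.items():
        failures += [(pattern, t) for t in should_match if not matches(pattern, t)]
        failures += [(pattern, t) for t in should_not_match if matches(pattern, t)]
    return failures


failures = check_cases(CASES)
print(failures)  # []
```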

As with other special characters, you can escape the special characters that appear in pattern operators by adding a "\" (backslash) in front of the character.

Subpatterns are useful when you want to extract specific text pieces from a text. When you make a subpattern using parentheses, you can extract the part of the text that is matched by that subpattern. For example, consider the pattern "abc (.*) def (.*) ghi". This pattern has two subpatterns that are made by means of parentheses. If the pattern is matched against the text "abc 123 def 456 ghi", the first of those subpatterns will match the text "123", and the second subpattern will match the text "456". In an expression (see the section named Expressions), you can refer to these subpattern matches by writing "$1" and "$2". For example, the expression "X" + $1 + "Y" + $2 + "Z" will produce the result "X123Y456Z". This is a very important extraction technique in RoboMaker.

By default, the repetition pattern operators (*, +, {...}) will match as many repetitions of the preceding pattern as possible. You can put a "?" after the operator to turn it into an operator that matches as few repetitions as possible. For example, consider the pattern ".*(\d\d\d).*". If the pattern is matched against the text "abc 123 def 456 ghi", the subpattern "(\d\d\d)" will match the second number in the text ("456"), since the first *-operator will match as many repetitions as possible. If you put a "?" after the *-operator, so that the pattern becomes ".*?(\d\d\d).*", the subpattern "(\d\d\d)" will match the first number in the text ("123"), since the *?-operator will match as few repetitions as possible.

It is recommended that you experiment with patterns on your own. The best way to do this is to launch RoboMaker and find a place where you can enter a pattern, such as in the Test Tag action. Then, click the Edit... button to the right of the pattern field, to open the Pattern Editor Window, shown in Figure 9 below.

Figure 9: The Pattern Editor Window

In the Pattern Editor Window, you can enter a pattern and test whether it matches the test input text in the Input panel. When you open the window, RoboMaker will usually have set the test input text to the text that the pattern will be matched against if the given step is executed on the current input robot state. However, you can also edit the test input text yourself, to try the pattern on other inputs. To test the pattern, click the Test button. The result of the matching will then be shown in the Output panel.

The Symbol button is very useful when you want to enter a special symbol in the pattern. When you click it, a pop-up menu will be shown, from which you can choose the symbol to insert in the pattern. This way, you don't have to memorize all the special symbols and their meanings. For more on patterns, consult the RoboHelp entry on patterns.

Expressions
An expression evaluates to a text.
Example: The expression "a" + "b" evaluates to the text "ab".

An expression is composed of one or more sub-expressions, each separated by a "+" (plus). A sub-expression evaluates to a text. An expression is evaluated by concatenating the texts of its sub-expressions, one by one from left to right. Figure 10 below provides an overview of the most commonly used sub-expression types. For a complete overview of all sub-expression types available, see the RoboHelp entry on expressions.
Sub-Expression Type   Notation              Meaning
Text Constant         "text" or >>text<<    Evaluates to the specified text, e.g. "Stephen King", or >>Stephen King<<.
Attribute Value       object.attribute      Evaluates to the value of the specified attribute, e.g. Book.author might evaluate to "Stephen King".
Current URL           URL                   Evaluates to the URL of the current page.
Subpattern Match      $n                    Evaluates to the text matched by subpattern n in an associated pattern (if any). For example, this is used in the Advanced Extract data converter, as shown below. $0 evaluates to the text matched by the entire pattern.
Function              func(args)            Evaluates the specified function by passing it the specified arguments and converting its result to a text.

Figure 10: The Most Commonly Used Sub-Expression Types

Example: The expression "The author of the book " + Book.title + " is " + Book.author + "." evaluates to the text "The author of the book Pet Sematary is Stephen King.", if the attributes title and author in the Book object contain the texts "Pet Sematary" and "Stephen King", respectively.
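To make the left-to-right concatenation concrete, here is a small Python model of an expression built from text constants and attribute values. It is only an illustration: the helpers `attribute` and `evaluate`, and the dictionary used to hold the Book object, are hypothetical and do not reflect how RoboMaker stores objects internally.

```python
from typing import Callable, Mapping, Sequence, Union

# Objects are modeled as {object name: {attribute name: text}}.
Objects = Mapping[str, Mapping[str, str]]
# A sub-expression is either a text constant or a lookup evaluated later.
SubExpression = Union[str, Callable[[Objects], str]]


def attribute(object_name: str, attribute_name: str) -> Callable[[Objects], str]:
    """Model of the object.attribute notation, e.g. Book.author."""
    def lookup(objects: Objects) -> str:
        return objects[object_name][attribute_name]
    return lookup


def evaluate(sub_expressions: Sequence[SubExpression], objects: Objects) -> str:
    """Evaluate each sub-expression to a text and concatenate left to right."""
    parts = []
    for sub in sub_expressions:
        parts.append(sub if isinstance(sub, str) else sub(objects))
    return "".join(parts)


objects = {"Book": {"title": "Pet Sematary", "author": "Stephen King"}}
expression = [
    "The author of the book ", attribute("Book", "title"),
    " is ", attribute("Book", "author"), ".",
]
sentence = evaluate(expression, objects)
print(sentence)  # The author of the book Pet Sematary is Stephen King.
```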

Note that you can specify a text constant using either the quote notation or the >>text<< notation, for example "Stephen King" or >>Stephen King<<. If you use the quote notation, and you want a quote character to appear inside the text, you have to write it as two quote characters. For example, write "This is some ""quoted"" text" to get the text "This is some "quoted" text". If you use the >>text<< notation, anything can appear inside the text, except ">>" and "<<". Thus, you can write quotes directly, as in >>This is some "quoted" text<<. The >>text<< notation is useful for long texts that contain many quote characters, such as HTML. Figure 11 shows the most commonly used functions in expressions. For a complete overview of all functions available, see the RoboHelp entry on expressions.
Function     Meaning
eval(arg)    Evaluates to the numeric expression specified by the argument.
round(arg)   Evaluates to the nearest integer of the specified argument number.

Figure 11: The Most Commonly Used Sub-Expression Functions

Example: The expression "3+4 equals " + eval(3+4) + "." evaluates to the text "3+4 equals 7.".
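The two text constant notations described earlier in this section follow simple rules, which the Python sketch below spells out. The function name `parse_text_constant` is hypothetical, and the error handling for malformed input is an assumption; the guide does not say how RoboMaker reports such errors.

```python
def parse_text_constant(source: str) -> str:
    """Return the text denoted by a "text" or >>text<< constant."""
    if len(source) >= 4 and source.startswith(">>") and source.endswith("<<"):
        body = source[2:-2]
        # Anything may appear inside, except ">>" and "<<".
        if ">>" in body or "<<" in body:
            raise ValueError("'>>' and '<<' cannot appear inside >>text<<")
        return body
    if len(source) >= 2 and source[0] == '"' and source[-1] == '"':
        body = source[1:-1]
        # A quote character inside the text must be written as two quotes.
        if '"' in body.replace('""', ""):
            raise ValueError("a quote inside the text must be doubled")
        return body.replace('""', '"')
    raise ValueError("not a text constant")


quoted = parse_text_constant('"This is some ""quoted"" text"')
angled = parse_text_constant('>>This is some "quoted" text<<')
print(quoted)  # This is some "quoted" text
print(angled)  # This is some "quoted" text
```

Both notations denote the same text here, which is why the >>text<< form is convenient for content such as HTML that contains many quote characters.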

It is recommended that you experiment with expressions on your own. The best way to experiment with expressions is to launch RoboMaker, select the Extract action for the current step, and then add an Advanced Extract data converter. Click the corresponding icon to configure the data converter. This opens the Advanced Extract Configuration Window shown in Figure 12 below.


Figure 12: The Advanced Extract Configuration Window

In the example shown, note the use of the $n notation to extract parts of the input text. Try to type your own input text into the text area to the left of the Test button, your own pattern into the Pattern property, and your own expression into the Output Expression property. Then hit Test to view the text that the expression evaluates to in the text area to the right of the Test button. Also, try to click the Edit... button to the right of the expression field. This opens the Expression Editor Window shown in Figure 13 below.


Figure 13: The Expression Editor Window

In the Expression Editor Window, you can enter an expression and test what it evaluates to. If the expression is associated with a pattern, as in the Advanced Extract data converter, the result of matching the pattern against the current input text will be shown in the Input panel. You can see whether the pattern matches, and if so, what subpattern matches your expression can refer to using the $n notation. Note that the testing functionality is not available everywhere in RoboMaker.

Click the Expression button to open a useful pop-up menu, from which you can choose among the available sub-expression types and functions. This way, you don't have to memorize all of these. For more on expressions, consult the RoboHelp entry on expressions.

Working with Robot Projects and Robot Libraries


When you are working in RoboMaker or the other development applications in Kapow Mashup Server, such as ModelMaker, you are always working on a specific robot project. The purpose of a robot project is to develop a robot library containing a collection of robots and the files required by these robots. Typically, you create a robot project for each separate usage of robots, for example one robot project for each application in your company that uses robots.

A robot project is a folder located anywhere in the file system. The project folder can have any name you want, but must contain the following subfolder:
Library: this folder is the robot library in the project

In the Library folder, you should place all robot files, domain model files (containing objects used by the robots), and other files used by the robots, such as files that are loaded from the robot library. You can organize the files in the Library folder in any way you want, using sub-folders as appropriate. Figure 14 shows an example of the contents of a project folder named NewsAndStocksProject for a project that develops a robot library for extracting news from news sites and stock quotes from stock sites.
NewsAndStocksProject/
    Library/
        News/
            CNN.robot
            Reuters.robot
            News.model
        Stocks/
            Nasdaq.robot
            NYSE.robot
            Stocks.model

Figure 14: An Example of a Project for News Extraction
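Since a robot project is an ordinary folder with a Library subfolder, you can set one up with any file tool. The Python sketch below creates such a layout in a temporary folder; the helper names `create_robot_project` and `is_robot_project` are hypothetical, and the check covers only the Library requirement stated above.

```python
import tempfile
from pathlib import Path


def create_robot_project(parent: Path, name: str) -> Path:
    """Create a project folder containing the required Library subfolder."""
    project = parent / name
    (project / "Library").mkdir(parents=True, exist_ok=True)
    return project


def is_robot_project(folder: Path) -> bool:
    """A folder qualifies as a robot project if it has a Library subfolder."""
    return (folder / "Library").is_dir()


with tempfile.TemporaryDirectory() as tmp:
    project = create_robot_project(Path(tmp), "NewsAndStocksProject")
    # Sub-folders inside Library are optional and can be organized freely.
    for sub in ("News", "Stocks"):
        (project / "Library" / sub).mkdir()
    ok = is_robot_project(project)
    subfolders = sorted(p.name for p in (project / "Library").iterdir())

print(ok, subfolders)  # True ['News', 'Stocks']
```

Creating the folder does not by itself make it the current robot project; that is still done in the Settings application.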

As you can see, the project has a Library folder with robot and domain model files divided into News and Stocks sub-folders.

In RoboMaker and the other development applications in Kapow Mashup Server, you are always working on a specific project, referred to as the current robot project. When you install Kapow Mashup Server, a default project will be created for you in the installation folder. To create a new project, you simply create a project folder located anywhere you want and containing a Library folder, as well as other sub-folders that you want. To switch to your new project, you must change the current robot project. This is done by opening the Settings application, specifying the path to your new project folder in the Current Project Folder property in the Project tab, and then clicking OK to close Settings. After this, you need to restart all applications that you have running to make them work on the new project. You can switch back and forth between projects any time you want, but remember to restart the applications each time.

When you want to distribute and deploy your robot library in a runtime environment, such as RoboServer, you can pack the robot library into a single file called a robot library file. You can do this by choosing Create Robot Library File from the Tools menu in RoboMaker or ModelMaker. This will pack together all files contained in the robot library of the current robot project and save the result as a single file having a name that you choose. Note that you should save all your open files, such as robots and domain models, before doing this, to get the latest changes into the robot library file. You can then make the robot library file available to RoboServer and execute robots from the robot library. See the RoboServer Users Guide for more information on how to do this.

As mentioned, a robot library may contain files used by the robots. You can load a page from a file in the robot library that the robot belongs to. This is done using the special non-standard protocol named library. For example, if the file MyPage.html is located in the folder MyFolder in the robot library folder, you can load from that file using this URL:
library:/MyFolder/MyPage.html

This will work no matter whether the robot library is represented as a folder or has been packed into a robot library file.
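As a rough mental model of the library protocol, the Python sketch below maps a library URL to a file below the Library folder. It is purely illustrative: the function `resolve_library_url` is hypothetical, it covers only the case where the robot library is a folder, and RoboMaker's actual resolution logic is not described in this guide.

```python
from pathlib import Path, PurePosixPath

LIBRARY_SCHEME = "library:"


def resolve_library_url(url: str, library_folder: Path) -> Path:
    """Map a library:/... URL to a path below the given Library folder."""
    if not url.startswith(LIBRARY_SCHEME + "/"):
        raise ValueError("expected a URL of the form library:/path/to/file")
    relative = PurePosixPath(url[len(LIBRARY_SCHEME):].lstrip("/"))
    # Assumption: paths that climb out of the Library folder are rejected.
    if ".." in relative.parts:
        raise ValueError("the path must stay inside the robot library")
    return library_folder.joinpath(*relative.parts)


resolved = resolve_library_url(
    "library:/MyFolder/MyPage.html", Path("MyProject") / "Library"
)
print(resolved.as_posix())  # MyProject/Library/MyFolder/MyPage.html
```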

Putting It All Together


By now, you have been introduced to the major concepts in RoboMaker, you have taken a tour of the RoboMaker user interface, you know the concepts of step actions and data converters, and you have a feel for what patterns and expressions are. Let us now put all this RoboMaker knowledge to use and make some robots. However, first we need an overview of how robots are normally built, that is, their structure. Robots mimic human behavior, that is, they do (more or less) what you do when you are looking for content on the Internet using a browser: You start by searching for the content. Once found, you read and process it. Similarly, most robots can be roughly divided into two parts: a navigation part and an extraction part.
Navigation is concerned with getting to where the content is. Navigation mainly includes loading pages and submitting forms. When navigating in RoboMaker, you typically use the Click action to navigate to and between web pages.

Extraction is concerned with getting the right content. Extraction mainly includes selecting, copying, and normalizing content from a web page that you have navigated to. When extracting in RoboMaker, you typically use the Test Tag action to skip uninteresting (noisy) content, the Extract action to copy content into object attributes, and the data converters for normalizing the content so that it gets the format you want, e.g. the right date and number format. Once extracted, you return (output) the object with the Return Object action.

So, the typical robot starts with one or more steps, each containing a Click (or Load Page) action, in order to navigate to the interesting content on some web site. It proceeds with one or more steps, each containing an Extract action, and ends with a step having a Return Object action that returns the extracted object. Note that in many robots the navigation and extraction parts overlap because the content to extract is located on several pages. Again, this is similar to when you look for content yourself; often, you have to visit several pages to get the content you want.

Most robots include other actions than the ones mentioned above, e.g. a For Each Tag action for loading several similar looking pages or extracting attributes from several similar looking table rows. Because robots have different tasks, they have different needs. For this reason, we have included a considerable number of step actions and data converters in RoboMaker. Start by familiarizing yourself with the basic and most commonly used step actions and data converters, and then begin to explore. Experience shows that one can create most robots using only a handful of step actions and data converters. So, find your own favorite step actions and data converters and stick to them until you feel a need to explore others.

You are now ready for the first tutorial, where you will learn how to write your first robot.


Tutorial 1: Essentials
In this tutorial, you will write your first robot. You will learn how to:

navigate using the Click action,
loop through tags using the For Each Tag action,
extract attribute values using the Extract action,
test content using the Test Tag action,
return objects using the Return Object action,
debug your robot with RoboDebugger,
and plenty of other things to get you started using RoboMaker!

The robot you will create in this tutorial will navigate to a page containing a table, extract the person data contained in that table, and output several PersonOutput objects. Before proceeding, we recommend that you open your favorite browser, and navigate to http://www.kapowtech.com/tutorial/case1/index.html to take a look at the pages involved in this tutorial. Let us begin by starting up RoboMaker and selecting Create a new robot..., which starts the New Robot Wizard as shown below.


This wizard will assist you in configuring the robot. Choose Data collection robot as the robot type and continue to the next step of the wizard. As the URL to start from, enter "www.kapowtech.com/tutorial/case1/index.html".


In the next step of the wizard, add the object called PersonOutput. This object will be used to extract person data.


Click Finish to create the robot. The RoboMaker Main Window should now look like this:

As you can see, two steps have been inserted. The first step, called Load Page, loads the page using a Load Page action, and the second step, which is the current step, has not been configured yet.

Now let's configure the second step of the robot, so make sure it is the current step. In the Browser View, we see the link Go to Table, which leads to the page containing the table that we want to extract data from. To load this page, we choose Click as the action of the current step. In the Action tab in the Step View, select the Click action. (If you wish to read more about the Click action, click More.... This brings you to a help page in RoboHelp. All step actions and data converters in RoboMaker have such a help page.)

To select the link to be clicked by the Click action, click the Go to Table link in the Browser View. This will select the <a>-tag that defines the link (in the Tag Path View, you can see that the "a" is selected). Then, click the icon that configures the tag finders to find only that tag. To load the page, click the corresponding icon. This causes two things to happen: First, it adds a new step after the current step. Then the new step becomes the current step.

Changing the current step has some interesting effects: it always results in an update of the robot state shown in the State View, because the State View always shows the input state to the current step. The input state to the current step is always the output state of the previous step. To update the robot state in the State View, RoboMaker will execute as much of the robot as is needed to get the updated robot state. In our example, the output state of the previous step contains the loaded page.

An alternative and easier way to load the page would have been to simply right-click the link Go to Table and select Click in the pop-up menu. This would configure the current step to load the page referred to by the selected link, using the Click action, insert a new step after the current step and go to the new step. The RoboMaker Main Window should now look like this:

You have now reached the page containing the content that we want to extract. Hence, the navigation part of the robot is over. However, before starting on the extraction part, let us try to change which step is the current step without adding a new one. You can make any step in the Robot View the current step by simply clicking it. Try clicking the first step in the Robot View. In the Step View, we can see that the Load Page action has been selected as action, and that the URL from which the action loads is the URL that we entered in the New Robot Wizard. Now try making the second step the current step. Note how the State View updates itself to appear exactly as it did when you finished configuring that step a moment ago.

The changes of current step went pretty fast, didn't they? The reason for this is that RoboMaker caches (stores) the output robot states from selected steps in order to minimize the waiting time when the current step changes. The idea of caching is not unique to RoboMaker; your own browser also caches loaded pages so that you can quickly step back and forth between them. As in a normal browser, you sometimes want to refresh the cache. You can refresh the cache in RoboMaker by clicking the icon. Normally, however, it is not necessary to refresh the cache.

Let us return to the extraction part. Before continuing, make sure that the last (third) step is the current step. Taking a look at the table on the web page, we discover that the table contains three columns (PersonId, Name, and Age) and four rows (not counting the headline row). Furthermore, the trained eye will discover an irregularity in the table: Bill has no age! (As you will discover when you begin to write your own robots, these kinds of irregularities are quite common on real-world web sites.)

How do we deal with this irregularity? First, we need to decide whether we wish to extract a PersonOutput object when there is no age available for that person. This is an important question that you will probably encounter many times: How much information should, as a minimum, be available in a returned object? Fortunately, we can see the right answer to that question by looking at the Output Objects tab in the Objects View. As you can see, the object attributes personId and name have a small red dot next to them. This means that these two attributes are mandatory and must be given a value before the PersonOutput object can be returned by the Return Object action. The age attribute has no red dot next to it. This means that the attribute is optional (i.e. not mandatory): it may, but need not, be given a value before the PersonOutput object is returned. So, we should extract four PersonOutput objects from the table. How do we do this?
There are several approaches, but let us settle on one that uses the For Each Tag action to loop through (i.e. do the same for) each row in the table. Select the For Each Tag action in the Action tab in the Step View. Enter "tr" in the Tag property to tell the For Each Tag action that it should loop through the <tr>-tags (the table rows) contained in some tag. Next, type "1" in the First Tag Number property to skip the headline row. Finally, we need to identify the tag in which the For Each Tag action should look for the <tr>-tags. Try clicking on the table in the Browser View. Then look at the Tag Path View and select the innermost <tbody>-tag. Finally, click the icon to configure the tag finders to find this <tbody>-tag. (Another, much simpler way to do all this would have been to right-click the table, and, in the pop-up menu, select Loops, then For Each Table Row, and finally Exclude First Row.)

The RoboMaker Main Window should now look like this:

Click the icon to add a new step and make it the current step. The input to the current step is the output of the first iteration of the For Each Tag action (iteration 1). You can change the iteration number of the For Each Tag action by clicking the icon that decreases the iteration number by one or the icon that increases it by one, or by typing the iteration number directly into the small text field between them and pressing Return. You can also go to the first or last iteration by clicking the respective icons. Try changing the iteration number to 3.

The RoboMaker Main Window should now look like this:

Note that the current tag is now the third row in the table. Let us extract the content of the current row. Right-click the PersonId "2" in the Browser View, select Extraction in the pop-up menu, then Extract Number, and finally PersonOutput.personId. Because we are extracting a number, the Extract Number Configuration window pops up. Select the Convert to Integer option, and click OK. The current step will now be configured to use an Extract action, with the Attribute property set to PersonOutput.personId, and an Extract Number data converter added to the list of data converters. Do (more or less) the same for the Name "Jim": Right-click "Jim", select Extraction, then Extract Text, and finally PersonOutput.name. Do (more or less) the same for the Age "72": Right-click "72", select Extraction, then Extract Number, and finally PersonOutput.age. Select the Convert to Integer option in the Extract Number Configuration window that pops up, and then click OK to close the window. You have now extracted a PersonOutput object! Let us return it by selecting the Return Object action for the step. Remember to select PersonOutput in the drop-down box for the Object property in the Return Object action.

The RoboMaker Main Window should now look like this:

The robot now consists of seven steps: two steps concerned with navigation and five steps concerned with extraction. Let us have a closer look at how the objects change as the current step changes. As you can see in the Output Objects tab of the Objects View, you have extracted the personId, name, and age attributes of the PersonOutput object. Now, try to make the previous step (named "Extract Age") the current step by clicking on it. Note that this causes the value of the age attribute to become empty. The reason for this is that the Objects View shows the objects input to the current step when Show Values at Step is selected; and as the attribute value for age has not yet been extracted, it is empty. Try clicking the previous step (named "Extract Name") and note that the value of the name attribute becomes empty. Finally, if you make the step named "Extract Person Id" the current step, then the value of the personId attribute also becomes empty.

Now, change the iteration number of the For Each Tag action to 1 by clicking the icon twice (or by clicking the icon). Then change the current step back to "Return Object" and note how the values of the attributes change to match those for the second row (containing info on "Bob") in the table even though you created the extraction steps for the third row (containing info on "Jim") in the table. This is because the branch beyond the For Each Tag step is applied to all robot states output by the For Each Tag action. This is a general principle for all loop actions and it is highly useful when you need to do the same thing more than once.

Try clicking the icon and notice how the PersonOutput object changes. Also, notice that there is only one PersonOutput object at any time, not one per person in the table; the same PersonOutput object is reused in different iterations. Keep clicking the icon until the following message appears:

This error occurs because there is no age in the table row for Bill. This causes the tag finders in the step named "Extract Age" to fail. When you click "OK", this step will be made the current step.

How do we deal with this missing age attribute value? We will select this approach: Only extract an age attribute value if there is one. In other words, there are two cases: One in which there is an age value, and one in which there is not. We can represent these two cases by branching into two branches, each starting with a conditional step containing a Test Tag action. The first Test Tag action will only continue execution beyond the step if there is an age value, and the other Test Tag action only continues execution beyond the step if there is no age value.

Click the icon (and not the icon!) to insert a new step between the "Extract Name" and "Extract Age" steps. Click "3" or "Bill" in the Browser View, select "tr[4]" in the Tag Path View and click the icon to configure the tag finders to use the <tr>-tag as input. Select the Test Tag action in the Action tab in the Step View. We want this Test Tag action to continue execution beyond this step only if the row contains an age value. We enter the pattern ".*\d+" (which matches all texts ending with one or more digits) into the Pattern property, set the Action property to Continue if Pattern Matches Found Tag, and select Only Text in the Match Against property. The Only Text option is selected because we want the pattern to be matched against just the text contents of the found tag, without the tags.
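The effect of the pattern can be tried outside RoboMaker. The following Python sketch is only an illustration and not part of RoboMaker: it assumes that a pattern must match the entire text (modeled here with re.fullmatch), and the row texts are hypothetical approximations of what the Only Text option would yield for three rows of the tutorial table.

```python
import re

# The pattern from the tutorial: any text that ends with one or more digits.
AGE_PATTERN = re.compile(r".*\d+")


def has_age(row_text: str) -> bool:
    """Return True if the whole row text matches the pattern.

    Assumption: the pattern must match the entire text, so fullmatch is used.
    """
    return AGE_PATTERN.fullmatch(row_text) is not None


# Hypothetical text contents of three table rows with the tags stripped.
rows = ["1 Ted 25", "2 Jim 72", "3 Bill"]
results = {text: has_age(text) for text in rows}
print(results)
```

Note that under this assumption a row whose last cell is a name ending in a digit would also match, so the test is only as reliable as the table layout.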

The RoboMaker Main Window should now look like this:

To verify that the Test Tag action works correctly, click the "Extract Age" step. This will cause the following message to appear:

The message says that the Test Tag action has stopped the execution. Click OK to dismiss the message. Change the iteration to 2 by clicking the icon twice. Then click the "Extract Age" step again. This time the Test Tag action will not stop the execution because the pattern matches the text; that is, the row contains an age value.

Now, let us create the branch for the case in which there is no age value. Make the "Extract Name" step the current step by clicking on it. Then, click the icon to add a new branch to the "Extract Name" step. The new branch contains a single step that becomes the current step. As before, click "1", "Ted" or "25" in the Browser View, select "tr[2]" in the Tag Path View and click the icon to configure the tag finders. Select the Test Tag action in the Action tab in the Step View. This action should be configured so that it stops execution if the text contains an age value. Enter ".*\d+" in the Pattern property, set the Action property to Stop if Pattern Matches Found Tag, and select Only Text in the Match Against property. The RoboMaker Main Window should now look like this:

Now, let us create a connection to the "Return Object" step. To do this, place the mouse cursor just to the right of the current step until a white arrowhead appears, then drag this arrow to the "Return Object" step. An alternative way to connect the two steps is to hold down the Ctrl key, and then click first the current step and then the Return Object step, to select both steps. Then, right-click the Return Object step to bring up the pop-up menu for the step, and choose Add Connection from the pop-up menu. (To remove a connection between two steps, either hold down Ctrl and click the connection, then click the icon, or right-click the connection to bring up the pop-up menu and select Delete from the menu.)

The RoboMaker Main Window should now look like this:

Verify that the Test Tag action works correctly by changing the iteration using the iteration icons and then clicking the "Return Object" step. You should only be allowed to execute beyond the current step (containing a Test Tag action) when the iteration is 4. And this is exactly what we want. We have now achieved the desired behavior: The first (top-most) branch only allows iterations 1, 2, and 3 to continue (those with an age value). The second branch only allows iteration 4 to continue (which has no age value).

Have you noticed that the connections between steps are sometimes black and sometimes dark gray? This brings up the concept of the execution path. The execution path is a sequence of steps that runs from the first step to an end step (a step with no step after it) and passes through the current step. As you change the current step repeatedly between the two Test Tag steps, notice how the execution path changes. You can use the execution path to see which of several branches was taken to reach a step. For example, if you make the Return Object step the current step, then the execution path will tell you which of the two branches was executed to reach that step.

That's it! Congratulations, you have now created your first robot! Let us test the robot in RoboDebugger and verify that it extracts the PersonOutput objects we expect, namely four PersonOutput objects, one for each row in the table containing person data. Click the icon to open RoboDebugger. Then click the icon in the RoboDebugger Main Window to start the debugging process. As the debugging process runs, objects are returned and displayed. When the debugging process completes, the RoboDebugger Main Window should look like this:

If your RoboDebugger returns the same objects as suggested by this screenshot, then your robot is working as expected. Return to the RoboMaker Main Window (by either closing the RoboDebugger Main Window or simply switching to the RoboMaker Main Window) and press the icon to save the robot for later use.

It might seem that you had to do a lot of work to simply extract some person data from a table. Well, when you are trained in using RoboMaker, you can create a simple robot like the one in this tutorial in one or two minutes! Also, the robot is rather robust; for example, it will still work correctly if persons are added to, or removed from, the table, and if the age attribute value is missing for any person, not just "Bill". So what you have is a robot that can be reused as the table content grows or shrinks, and that can handle some table irregularities. And for many kinds of robot tasks, this flexibility is exactly what you want and need. For more on robot robustness, see the How to Make Robots More Robust chapter.
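For readers who think in code, the logic of the finished robot can be summarized in a few lines of Python. This is only a sketch of the idea, not how RoboMaker executes robots: the PersonOutput class, the extract_persons function and the table contents are hypothetical stand-ins (the table below holds three data rows taken from values mentioned in this tutorial, not the full tutorial page).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonOutput:
    person_id: int              # mandatory attribute
    name: str                   # mandatory attribute
    age: Optional[int] = None   # optional attribute, may stay empty


def extract_persons(rows: List[List[str]]) -> List[PersonOutput]:
    """Loop over all rows except the headline row and extract one object per row."""
    persons = []
    # rows[1:] plays the role of For Each Tag with First Tag Number set to 1.
    for cells in rows[1:]:
        person = PersonOutput(person_id=int(cells[0]), name=cells[1])
        # The two Test Tag branches: extract the age only if the row has one.
        if len(cells) > 2 and cells[2].strip().isdigit():
            person.age = int(cells[2])
        persons.append(person)  # corresponds to the Return Object step
    return persons


# Hypothetical table contents, headline row first.
table = [
    ["PersonId", "Name", "Age"],
    ["1", "Ted", "25"],
    ["2", "Jim", "72"],
    ["3", "Bill", ""],
]
persons = extract_persons(table)
for person in persons:
    print(person)
```

As in the robot, adding or removing rows does not require any change to the logic, and a missing age is tolerated for any row.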

Before you proceed to the next tutorial, we recommend that you read the RoboHelp online entries for the step actions you have used so far. Also, you might want to experiment with the robot you have created:

Try to modify the robot so that it only extracts PersonOutput objects for table rows that contain both a name and an age. (Hint: The solution involves removing the two Test Tag steps and the connection between the "Extract Name" and "Return Object" steps. To remove the connection, right-click it and select Delete from the pop-up menu. After this, a good idea would be to add a Test Tag step immediately after the For Each Tag step.)

Try to modify the robot so that it only extracts a PersonOutput object from the first table row. (Hint: The solution involves removing the For Each Tag step.)

Try changing the robot so that it loads the table page directly, without loading the http://www.kapowtech.com/tutorial/case1/index.html page first. (Hint: The solution involves changing the URL in the first "Load Page" step, and removing the "Click" step.)

Try recreating the robot from scratch without referring to this tutorial. This time, try right-clicking in the Page View to insert the steps.

Remember to test your modifications in RoboDebugger.

Tutorial 2: Form Submission


In this tutorial, you will create a robot that

submits a form, uses input objects, and converts values using data converters.

The robot you create in this tutorial will navigate to a page containing a form, fill out the form with data from an input object, and submit it. The robot is representative of all tasks related to form submission, including logon, registration, and order submission. Before proceeding, we recommend that you open your favorite browser, and navigate to http://www.kapowtech.com/tutorial/case2/index.html to take a look at the pages involved in this tutorial.

Let us begin by starting up RoboMaker. If you have just completed the previous tutorial, then you can simply click the icon to start the New Robot Wizard. Choose Integration robot as the robot type and continue to the next step of the wizard. As the URL to start from, enter "www.kapowtech.com/tutorial/case2/index.html". In the next step of the wizard, add the object called PersonInput. This object will contain the data that we need to fill in the form.

Now go to the next step of the wizard. In this step, you should not add any objects. In practice, this is highly unusual, but in this tutorial we will not extract any objects. In the last step of the wizard, give the robot the id "2" and press Finish. The RoboMaker Main Window should now look like this:

Let us provide some test values to the PersonInput object. These values will only be used when we are developing and debugging the robot in RoboMaker. When the robot is run on RoboServer, the value of the PersonInput object will be provided by the client. However, while developing and testing the robot in RoboMaker, it is useful (and often necessary) to have some test values in the input objects. In the Objects View, you can see the PersonInput object in the Input Objects tab. Make sure that Edit Input Values is selected. Type "John" into the firstName attribute, "Doe" into the lastName attribute, select true for the isMale attribute, and select false for the isMarried attribute.

The Objects View should now look like this:

Press Apply to apply the input values. Always remember to do this after you have entered or edited values in the Objects View, otherwise your new values will be lost. Now, RoboMaker executes to the second step using the new input values. When writing a robot that uses input objects, you can easily test different combinations of input values this way. The robot should work correctly for all valid input objects (e.g. a PersonInput object where the value of the isMale attribute is false), so later in the tutorial we'll try out other test values of the PersonInput object.

We will now load the page containing the form to submit. Right-click the "Go to Form" link, and select Click in the pop-up menu.

The RoboMaker Main Window should now look like this:

To submit a form, you first fill out the form and then click the submit button. In this tutorial, we need to put the values of the PersonInput object into the form. The text field for the first name should contain the value of the firstName attribute of the PersonInput object. Right-click on the "First Name" text field and select Enter Text from Attribute, and then select PersonInput.firstName.

The RoboMaker Main Window should now look like this:

Similarly, right-click the "Last Name" text field and select Enter Text from Attribute and then PersonInput.lastName.

The RoboMaker Main Window should now look like this:

Next, we handle the Male and Female radio buttons, which should be set according to the value of the isMale attribute of the PersonInput object. Therefore, we test this attribute value. In the Action tab of the Step View, click Select an Action and in the Conditions category, select Test Attributes. Add a condition by clicking the icon.

This opens the Attribute Condition Configuration window as shown below:

Select PersonInput.isMale as Attribute and enter true in the Value text field.

Click OK to close the Attribute Condition Configuration window.

The RoboMaker Main Window should now look like this:

In the Test Attributes action, we set the Action property to Continue if All Conditions are Satisfied (which is the default value). Now, after the Test Attributes step, we need a step that selects the Male radio button (execution will only continue past the Test Attributes step if the value of the isMale attribute is true). First, insert a new step by clicking the icon. Then, right-click the Male radio button (not the text Male) and select Forms and then Select Radio Button. This selects the radio button and inserts a new step.

The RoboMaker Main Window should now look like this:

If the value of the isMale attribute is false, the Test Attributes step will stop the execution. There should be an alternative execution path for this case. Make the Enter Last Name step the current step by clicking it. Now, let's change the test values of the input object. In the Input Objects tab of the Objects View, make sure that Edit Input Values is selected, and change the value of the isMale attribute to false. It is not strictly necessary to change the other values, but let us change the firstName attribute to "Joanna" and the isMarried attribute to true. Press Apply to use the new input values.

When RoboMaker has executed to the Enter Last Name step, add a new branch after this step by clicking the icon. The new branch contains a single step that becomes the current step. Select the Test Attributes action in the Action tab in the Step View. This action should be configured so that it stops execution if the value of the isMale attribute is true. As before, add a condition by clicking the icon, select PersonInput.isMale as Attribute, enter true in the Value text field and click OK to close the Attribute Condition Configuration window.

The RoboMaker Main Window should now look like this:

In the Test Attributes action, we set the Action property to Stop if All Conditions are Satisfied.

The RoboMaker Main Window should now look like this:

Insert a new step by clicking the icon and right-click the Female radio button (not the text Female), select Forms and then Select Radio Button. This selects the radio button and inserts a new step. We want to connect the two branches to get the same behavior regarding the Married checkbox, so we do not need this new step. Therefore, delete the last (unnamed) step of the second branch and create a connection between the Select Radio Button step of the second branch and the last (unnamed) step of the first branch.

The RoboMaker Main Window should now look like this:

Now, we need to set the Married checkbox. Go to the last (unnamed) step, right-click the Married checkbox (not the text Married) and select Forms and then Set Checkbox.

This opens the Set Checkbox window as shown below.

Now we need to specify whether the checkbox should be checked or unchecked, and we want this to depend on the isMarried attribute. The Set Checkbox window allows us to do this using a value selector. The value selector is a component that is used in many different places in RoboMaker. It allows you to specify a value in several different ways, depending on your needs. The value selector has a drop-down box to the right where you can choose how to specify the value. The value can be specified as a fixed value, a value from an attribute, a value from an expression, or a value from a list of data converters. Data converters are useful when you want to convert the value before using it.

In the Set Checkbox window, we must specify one of the values checked, true, 1, or on if we want the checkbox to be checked, and one of the values unchecked, false, 0, or off if we want it to be unchecked. In this case, we want to use the value of the isMarried attribute. This value needs no conversion, since it is true when we want the checkbox to be checked and false when we want it to be unchecked. If we needed to convert the value, we could specify the value using a list of data converters that could do the conversion.
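The accepted checkbox values can be pictured as a small lookup. The Python sketch below is an illustration only: the function name checkbox_state is hypothetical, and ignoring letter case and surrounding whitespace is an assumption, not documented RoboMaker behavior.

```python
# Values listed above for the Set Checkbox window.
CHECKED_VALUES = {"checked", "true", "1", "on"}
UNCHECKED_VALUES = {"unchecked", "false", "0", "off"}


def checkbox_state(value) -> bool:
    """Map a value to a checkbox state; True means checked."""
    # Assumption: values are compared as text, ignoring case and whitespace.
    text = str(value).strip().lower()
    if text in CHECKED_VALUES:
        return True
    if text in UNCHECKED_VALUES:
        return False
    raise ValueError(f"not a valid checkbox value: {value!r}")


# A boolean attribute such as isMarried needs no conversion.
print(checkbox_state(True), checkbox_state("off"))
```

A value outside both sets (for example "yes") is where a list of data converters would be needed to translate it first.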

So, in the right drop-down box, choose Attribute instead of Value. In the left drop-down box, choose the PersonInput.isMarried attribute. The Set Checkbox window should now look like this:

Click OK to close the Set Checkbox window. The RoboMaker Main Window should now look like this:

You have now filled out the form. Right-click the Send info button and select Click.

If you have done everything correctly, the RoboMaker Main Window should now look like this:

The tutorial ends here. Normally, after filling out and submitting a form, you would continue the navigation (perhaps by filling out and submitting more forms) to the page containing the information you want and then extract it. You might want to save your robot for future use as a template for how to fill out and submit forms in RoboMaker. We recommend that you consult the RoboHelp online entries on the step actions you have used in this tutorial. Also, you might want to experiment with the robot you have created:

Try changing the input values in the PersonInput object and submit the form again to verify that the robot fills out and submits the form correctly. You may need to shift between the two branches depending on the value of the isMale attribute. Remember to click Apply after changing the input values in the Objects View.

Try to modify the robot so that it no longer accepts an input object, but instead uses fixed values when submitting the form. (Hint: The solution involves deleting the PersonInput object from the Input Objects in the Objects View.)

Try recreating the robot from scratch without referring to this tutorial.
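As a recap, the form-filling logic built in this tutorial can be written as a short Python sketch. It is not RoboMaker code: the PersonInput class mirrors the attributes used above, while the fill_form function and the form field names ("firstName", "gender", and so on) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PersonInput:
    first_name: str
    last_name: str
    is_male: bool
    is_married: bool


def fill_form(person: PersonInput) -> dict:
    """Return the form values the robot would enter for the given input object."""
    form = {
        "firstName": person.first_name,  # Enter Text from Attribute
        "lastName": person.last_name,    # Enter Text from Attribute
    }
    # The two Test Attributes branches: exactly one of them continues.
    if person.is_male:
        form["gender"] = "male"
    else:
        form["gender"] = "female"
    # Both branches join again before the Married checkbox is set.
    form["married"] = person.is_married
    return form


john = fill_form(PersonInput("John", "Doe", True, False))
joanna = fill_form(PersonInput("Joanna", "Doe", False, True))
print(john)
print(joanna)
```

The two calls correspond to the two sets of test values used in the tutorial, one per branch.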

How to Configure a Robot


A robot has a number of properties that you can configure by clicking the icon in the RoboMaker toolbar, or selecting Configure Robot in the File menu of the RoboMaker Main Window. This brings up the Robot Configuration Window.

Figure 15: The Robot Configuration Window

The available properties in the Robot Configuration Window depend on the robot type. In this section, we explain the properties for non-clipping robots. For information about the properties of clipping robots, see the chapter How to Clip, and the online documentation.

In the Basic tab, you can configure default options which apply to all step actions of the robot. A step action can override these global options as needed. You can also enter a comment for the robot. This is useful if you want to document how the robot works, what should be taken into account when editing the robot, etc.

In the Advanced tab, you can specify an optional robot id for the robot. If you do so, the id must be unique among all robots in the robot library. If you use RoboManager, then you should use that application to keep track of robot ids. You can register your robot in RoboManager and get an id for it by clicking Register.... If you do not use RoboManager, you will have to keep track of the robot ids yourself. A robot must have a robot id if you use the Database Storage Environment or the Database Message Environment. See the RoboRunner User's Guide for more information about environments.

Also in the Advanced tab, you can specify an optional proxy server to use for all page and data loading done by this particular robot. You should use this property only rarely. Normally, it is better to specify one or more proxy servers for the entire installation. This is most easily done in the Settings application. See the Installation Guide for further details on this. The proxy server specified for a particular robot will override proxy servers specified any other way.

How to Configure the Objects of a Robot


When you create a new robot, you usually start by configuring its objects. Of course, you can reconfigure the objects at any time during the robot's lifetime, e.g. if, at some point, you want to change the initial value for an object attribute. You configure the objects of the robot in the Objects View, located below the Step View in the RoboMaker Main Window. The objects that you specify become part of the robot state that is input to the first step of the robot. The Objects View is divided into two tabs, the Input Objects tab and the Output Objects tab. The Input Objects tab is shown below with two input objects added.

Figure 16: The Input Objects Tab

The Input Objects tab shows the input objects that must be supplied to the robot when it is run on RoboServer. If these input objects are not supplied to the robot at runtime, then the robot run will fail. At the bottom of the view, you can select how input objects should be shown. If Edit Input Values is selected, the input values of the input objects are shown, and these can be edited. The input values can be applied by pressing Apply. Note that these values are only used when you are working with the robot in RoboMaker. When the robot is run on RoboServer, the input values will be overridden (i.e. replaced) by the values of the input objects supplied by the client. If Show Values at Step is selected, the values of the input objects at the current step are shown, and these cannot be edited. The Output Objects tab looks like this with a single output object added:

Figure 17: The Output Objects Tab

The Output Objects tab shows the objects that can be returned by the robot. This is an example of how the view looks. For each object, you can configure and apply the initial values for the object attributes when Edit Initial Values is selected. If Show Values at Step is selected, the values of the output objects at the current step are shown, and these cannot be edited.

How to Use the Tag Finders


A Tag Finder is used to find a tag on a page. The most common use of a Tag Finder is in a step, where the Tag Finder is used to find a tag on which the selected action should be applied. The list of Tag Finders of the current step is located in the Tag Finders tab in the Step View, and is shown below.

Figure 18: The Tag Finders Tab in the Step View

Understanding Tag Paths


To understand the Tag Finder, the concept of a tag path is important. A tag path is a compact text representation of where some tag is located on a page. Consider this tag path:

    html.body.div.a

This tag path refers to an <a>-tag inside a <div>-tag inside a <body>-tag inside an <html>-tag. A tag path can match more than one tag on the same page. For example, the tag path above will match all of the <a>-tags on this page, except the third one:

    <html>
      <body>
        <div>
          <a href="url...">Link 1</a>
          <a href="url...">Link 2</a>
        </div>
        <p>
          <a href="url...">Link 3</a>
        </p>
        <div>
          <a href="url...">Link 4</a>
          <a href="url...">Link 5</a>
          <a href="url...">Link 6</a>
        </div>
      </body>
    </html>

You can use indexes to refer to specific tags among tags of the same type at that level. Consider this tag path:

    html.body.div[1].a[0]

This tag path refers to the first <a>-tag in the second <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the Link 4 <a>-tag. Note that indexes in tag paths start from 0. If no index is specified for a given tag on a tag path, the path matches any tag of that type at that level, as we saw in the first tag path above. If the index is negative, the matching tags are counted backwards, i.e. starting with the last matching tag, which corresponds to index -1. Consider this tag path:

    html.body.div[-1].a[-2]

This tag path refers to the second-to-last <a>-tag in the last <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the Link 5 <a>-tag.

You can use an asterisk (*) to mean any number of tags of any type. For example, the tag path

    html.*.table.*.a

refers to an <a>-tag located anywhere inside a <table>-tag, which itself can be located anywhere inside an <html>-tag. There is an implicit asterisk in front of any tag path, so you can simply write "table" instead of "*.table" to refer to any table tag on the page. The only exception is a tag path starting with a period (.), which means that there is no implicit asterisk in front of the tag path, so the tag path must match from the first (i.e. top-level) tag of the page. With asterisks, you can create tag paths that are more robust against changes in the page, since you can leave out insignificant tags that are liable to change over time, such as layout-related tags. However, using asterisks also increases the risk of accidentally locating the wrong tag.

You can provide a list of possible tags by separating them with '|', as in this tag path:

    html.*.p|div|td.a

This tag path refers to an <a>-tag inside a <p>-, <div>-, or <td>-tag located anywhere inside an <html>-tag.

In a tag path, text on a page is referred to just as any other tag, using the keyword "text". Although text is not technically a tag, it is treated and viewed as such in a tag path. For example, consider this HTML:
<html> <body> <a href="url...">Link 1</a> <a href="url...">Link 2</a> </body> </html>

The tag path "html.body.a[1].text" would refer to the text "Link 2".
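The index rules described in this section can be illustrated outside RoboMaker with a small Python sketch. It is not RoboMaker's implementation: the function name find_by_tag_path is hypothetical, the page is parsed with Python's xml.etree.ElementTree (so it must be well-formed), and only absolute paths made of tag names with optional indexes are handled. Asterisks, the '|' separator, the implicit leading asterisk, and the "text" keyword are deliberately left out.

```python
import re
import xml.etree.ElementTree as ET

# The example page from this section, written as well-formed markup.
PAGE = """<html><body>
<div><a href="url1">Link 1</a><a href="url2">Link 2</a></div>
<p><a href="url3">Link 3</a></p>
<div><a href="url4">Link 4</a><a href="url5">Link 5</a><a href="url6">Link 6</a></div>
</body></html>"""

# One path segment: a tag name with an optional (possibly negative) index.
SEGMENT = re.compile(r"^([a-z0-9]+)(?:\[(-?\d+)\])?$")


def parse_segment(segment):
    match = SEGMENT.match(segment)
    if match is None:
        raise ValueError("unsupported tag path segment: " + segment)
    name, index = match.groups()
    return name, None if index is None else int(index)


def find_by_tag_path(root, path):
    """Return the elements matched by an absolute tag path (hypothetical helper)."""
    segments = [parse_segment(s) for s in path.split(".")]
    # The first segment must name the top-level tag; its index is ignored here.
    current = [root] if root.tag == segments[0][0] else []
    for name, index in segments[1:]:
        matched = []
        for parent in current:
            # Indexes count tags of the same type at this level only.
            same_type = [child for child in parent if child.tag == name]
            if index is None:
                matched.extend(same_type)          # no index: match all of them
            elif -len(same_type) <= index < len(same_type):
                matched.append(same_type[index])   # negative index counts backwards
        current = matched
    return current


root = ET.fromstring(PAGE)
print([a.text for a in find_by_tag_path(root, "html.body.div.a")])
print([a.text for a in find_by_tag_path(root, "html.body.div[1].a[0]")])    # ['Link 4']
print([a.text for a in find_by_tag_path(root, "html.body.div[-1].a[-2]")])  # ['Link 5']
```

The first call prints every link except Link 3, which sits in a <p>-tag rather than a <div>-tag.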

How the Tag Finder Works


A Tag Finder can be configured using the following properties:
Find Where: In this property, you can specify where to find the tag relative to a current tag. The default value is Anywhere in Page, meaning that current tags are not used to find the tag.

Tag Path: In this property, you can specify the tag path as described in the previous section.

Attribute Name: In this property, you can specify that the tag must have a specific attribute, for example "align".

Attribute Value: In this property, you can specify that the tag must have an attribute with a specific value. If the Attribute Name property is set, the attribute value is bound to that specific attribute name.

    Equals Text specifies that the attribute value must match a specified text. Note that the text must match the entire attribute value.

    Containing Text specifies that the attribute value must contain the specified text.

    Pattern specifies that the attribute value must match a pattern. Note that the pattern must match the entire attribute value.

Tag Pattern: In this property, you can specify a pattern that the HTML of the tag must match (including all tags inside it), for example ".*<b>.*Stock Quotes.*</b>.*". Note that the pattern must match the entire HTML of the tag.

Tag Depth: This property determines which tag to use if matching tags are contained inside each other. The default value is Any Depth, which accepts all matching tags. If you select Outermost Tag, only the outermost tags are accepted, and similarly, if you select Innermost Tag, only the innermost tags are accepted.

Tag Number: This property determines which tag to use if more than one tag matches the tag path and the other criteria. You specify the number of the tag to use, either counting forwards from the first tag or counting backwards from the last tag that matches.

For example, if you set the Tag Path property to "table", the Attribute Name property to "align", the Attribute Value property to "center", and the Tag Pattern property to ".*Business News.*", then the Tag Finder would locate the first <table>-tag that is center aligned and that contains the text "Business News".
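The difference between matching the entire value and merely containing a text is easy to get wrong, so here is a rough Python sketch of the three Attribute Value modes and of the Tag Pattern rule. It assumes that Python's re module is an acceptable stand-in for RoboMaker's pattern syntax and that a dot may match line breaks; both are assumptions, and the function names are hypothetical.

```python
import re


def attribute_value_matches(value, mode, expected):
    """Check an attribute value the way the three modes are described (sketch)."""
    if mode == "equals":
        return value == expected            # must equal the entire value
    if mode == "containing":
        return expected in value            # a substring is enough
    if mode == "pattern":
        # fullmatch: the pattern must cover the entire value, not just a part
        return re.fullmatch(expected, value, re.DOTALL) is not None
    raise ValueError("unknown mode: " + mode)


def tag_pattern_matches(tag_html, pattern):
    """The Tag Pattern must match the entire HTML of the tag (sketch)."""
    return re.fullmatch(pattern, tag_html, re.DOTALL) is not None


html = "<td><b>Today's Stock Quotes</b></td>"
print(tag_pattern_matches(html, ".*<b>.*Stock Quotes.*</b>.*"))  # True
print(tag_pattern_matches(html, "<b>.*Stock Quotes.*</b>"))      # False
```

The second pattern fails because it does not account for the surrounding <td>-tag, which is why the example pattern in the text begins and ends with ".*".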

Configuring the Tag Finders of the Current Step


In RoboMaker, you can configure the Tag Finders of the current step in several ways.

The first way is to configure it manually. Once configured (whether automatically or manually), you can click the [icon] icon in the Page View to see the tag found by the Tag Finder.

The second way to configure the Tag Finders is to select a tag in the Page View and click the [icon] icon. This will configure the Tag Finder to find the selected tag using a tag path in simple mode.

The third way is to right-click on a tag in the Page View and then select an action from the pop-up menu that appears. If you select Use Tag from the menu, the Tag Finder will be configured to find the right-clicked tag using a tag path in simple mode. Similarly, if you choose another action from the menu, this will select a corresponding step action and configure the Tag Finder to find the right-clicked tag.

The fourth way to configure the Tag Finders is to select a new step action. Some actions, when selected, configure the Tag Finders so that they find the tags typically used for that action. For example, the Submit Form action will use one Tag Finder and set its tag path to "form" to locate the first <form>-tag in the page.

How to Submit a Form


Submitting a form is a common task in a robot. For example, you may need to submit a search form to get the search results that you want to extract, or you may need to submit an order form to make an order transaction. In some cases, you do not need to actually submit the form, but simply want to create a URL that represents the form submission, or modify the current values in the form. In this chapter, you will learn how to do these things. If you haven't already read the chapter Tutorial 2: Form Submission, you should do so now, before proceeding with this chapter.

Simple Form Submission


The recommended and simplest way of submitting a form in RoboMaker is similar to the way you submit a form in an ordinary browser: First fill in the form and then click the form submission button. The chapter Tutorial 2: Form Submission is an example of how to do this. To fill in the form, you can use the following actions:

Enter Text
Select Option
Select Multiple Options
Set Checkbox
Select Radio Button

and to submit the form, you can use the Click action. You can also loop through options or radio buttons by using the following actions:

For Each Option
For Each Radio Button

Form Basics
This section describes some basic properties of forms. Consider the following example of a book search form, shown in Figure 19 as HTML, and in Figure 20 as it appears in a browser.

<html>
<body>
<form action="http://www.books.com/search.asp" method="get">
  Author: <input type="text" name="book_author">
  <p>
  Title: <input type="text" name="book_title">
  <p>
  Language:
  <select name="book_language">
    <option value="lang_0" selected>English</option>
    <option value="lang_1">French</option>
    <option value="lang_2">German</option>
    <option value="lang_3">Spanish</option>
  </select>
  <p>
  Format:
  <input type="checkbox" name="book_format" value="format_pb">Paperback
  <input type="checkbox" name="book_format" value="format_hc">Hardcover
  <input type="checkbox" name="book_format" value="format_ab">Audiobook
  <p>
  Reader Age:
  <input type="radio" name="reader_age" value="age_inf">Infant
  <input type="radio" name="reader_age" value="age_teen">Teenager
  <input type="radio" name="reader_age" value="age_adult" checked>Adult
  <p>
  <input type="submit" value="Search">
</form>
</body>
</html>

Figure 19: A Book Search Form (as HTML)

Figure 20: The Book Search Form in a Browser

A form contains a number of fields. For example, the first <input>-tag in the example form defines a field named book_author. Note that the name of a field is usually different from what the user sees in a browser. For example, the book_author field will appear to be named Author in the browser, not book_author.

A field can be defined by more than one tag. For example, the book_format field is defined by three <input>-tags in the example form. Tags that use the same field name and are of the same field type (text field, radio button, checkbox, etc.) define the same field.

A field can be assigned one or more values. For example, the book_format field can be assigned the value format_pb to select paperback format. Note that, like the field name, the value that is assigned to a field is usually different from what the user sees in a browser. For example, the user will see the text Paperback, not the value format_pb, when choosing the paperback format.

Depending on the field type, some fields can be assigned more than one value at the same time. For example, since book_format is a checkbox field, we could assign both the value format_pb and the value format_hc to the book_format field to select both the paperback format and the hardcover format.

Most fields have a default value. The default value is the value that is initially assigned to the field in the form. For example, the book_language field has the default value lang_0, because of the selected attribute.

A form is submitted by sending the current values of the fields to the web site. Only fields that have one or more current values are sent. For example, if none of the checkboxes of the book_format field in the example form are checked, no value is sent for that field.

In a browser, the submission of a form usually happens when the user clicks a submit button. There are two kinds of submit buttons: normal submit buttons and image submit buttons.

Normal submit buttons are defined using a <button>-tag or an <input>-tag, in both cases with the type attribute set to submit. If a normal submit button has a field name and value, that field will be sent with the specified value when the button is clicked.

Image submit buttons are defined using an <input>-tag with the type attribute set to image. An image submit button defines two fields, named name.x and name.y, where name is the value of the name attribute of the <input>-tag. If the <input>-tag has no name attribute, the fields will be named x and y. When an image submit button is clicked, these two fields are assigned the x- and y-coordinates of the position in the image where the mouse was clicked. Some web sites use this for creating image maps with different behavior depending on where the user clicks.

Some forms use JavaScript. For example, the <form>-tag may have an onsubmit attribute that contains JavaScript to be executed before the form is submitted. Similarly, an <input>-tag may have an onclick attribute that contains JavaScript to be executed when the user clicks on the field. Most forms use JavaScript to simply validate that the user has filled out the form correctly. In this case, you can simply ignore the JavaScript when submitting the form. However, some advanced forms use JavaScript to change the form dynamically as the user enters values into it, or to change the form before it is submitted. RoboMaker can handle these situations, with help from the Execute JavaScript action.
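To make the rules about field names, multiple values, and unsent fields concrete, the following Python sketch builds the URL that a submission of the example book search form (method "get") would correspond to. It only illustrates the form basics above and is not how RoboMaker builds its requests; the function name submission_url and the sample field values are made up for the example.

```python
from urllib.parse import urlencode

ACTION = "http://www.books.com/search.asp"  # from the example form


def submission_url(action, fields):
    """Build a GET submission URL from a mapping of field name -> current values."""
    # A field with several values contributes one name=value pair per value;
    # a field with no current values contributes nothing at all.
    pairs = [(name, value) for name, values in fields.items() for value in values]
    return action + "?" + urlencode(pairs)


fields = {
    "book_author": ["John Doe"],
    "book_title": ["Robots"],
    "book_language": ["lang_0"],   # default value, from the selected option
    "book_format": [],             # no checkbox checked: the field is not sent
    "reader_age": ["age_adult"],   # default value, from the checked radio button
}
print(submission_url(ACTION, fields))

# Checking two checkboxes assigns two values to the same field.
fields["book_format"] = ["format_pb", "format_hc"]
print(submission_url(ACTION, fields))
```

Note that the submit button in the example form has no name attribute, so clicking it adds no field of its own to the submission.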

Which Step Action Should I Use?


As mentioned, the simplest way to submit a form is to fill in the form using the appropriate actions for this as described earlier. The chapter Tutorial 2: Form Submission is an example of how to do this. The rest of this chapter describes some advanced alternatives, namely the Submit Form and the Loop Form actions.

Use the Submit Form action if you just want to submit the form once, create a URL that represents a submission of the form, or simply change some of the existing values in the form (something that can be very useful when the form uses JavaScript).

Use the Loop Form action when you want to submit the form more than once, i.e. loop through the form. You need to loop through the form if you cannot get the desired result in a single form submission. Consider the book search example form. If you want to search for books in all available languages and for all reader ages, you cannot do this in a single form submission, because the site will not allow such a general search. Instead, you have to loop through the languages and the reader ages, and make a form submission for each combination of language and age.

Using the Submit Form Action


The Submit Form action submits a form once.

Figure 21: The Submit Form Action

The Submit Form action requires the <form>-tag as the found tag. The basic principle of the Submit Form action is that you specify the values of all fields that you want to set to something other than their default values. You also choose which submit button to use. As the default, the Submit Form action performs the entire operation of submitting the form and loading the resulting page. However, you can also configure the Submit Form action to not submit the form, and instead either generate an <a>-tag containing the URL that represents the form submission, or to change the current values in the form.

If you want to set the values of one or more fields in the form to a value other than their default values, you have to look at the HTML of the form to find the names of the fields and the values to assign to them. For each field that you want to set, add a field value assignment in the Field Values property of the Submit Form action. A field value assignment assigns a specified value to a selected field.

Figure 22: Field Value Assignment Configuration

The value to assign in a field value assignment can be specified in several different ways depending on your needs, using a value selector. Use the drop-down menu on the right side of the value selector to select one of the following ways to specify the value (not all of these ways are available everywhere where a value selector occurs):
Value: Here, you enter the value directly as text or select a fixed value. This is useful if you want to specify a fixed value, without any computations or conversions.

Attribute: Here, you select the value of an attribute in an object. For example, this is useful in a robot taking input objects if you have the value for the field ready in an object attribute and do not need to convert it in any way.

Expression: Here, you enter an expression. This is useful if you want to make simple computations to get the value.

Converters: Here, you select a list of data converters, whose output is used as the value. The first data converter is given an empty text as input. This way of specifying the value provides the greatest flexibility, since you can make almost any kind of conversion or computation to get the value. For example, in a robot taking input objects, this is useful if you have the value in an object attribute, but need to convert it to the values used in the form.

You can assign multiple values to the same field by adding more than one field value assignment for that field.

After adding and configuring the field value assignments, you should select the submit button to be used. Normally, you can use the Submit Button property of the Submit Form action to do this. You can choose Default Submit Button to use the default button, which is the first submit button in the form. You can also choose a specific button in the form. The available buttons are shown together with the field names that correspond to the buttons, and with the values that will be assigned to the fields when the buttons are used.

Using the Loop Form Action


The Loop Form action is a loop action that loops through a form, i.e. makes multiple submissions of the form.

Figure 23: The Loop Form Action

The Loop Form action requires the <form>-tag as the found tag. To configure the Loop Form action, you must provide information about all fields that should be assigned values other than their default values. For each field, the Loop Form action needs to know whether to loop through the field, how to loop through the field, and which values to assign to the field. The Loop Form action also needs to know which submit button to use.

Like the Submit Form action, the default behavior of the Loop Form action is to perform the entire operation of submitting the form and loading the resulting page, for each iteration. But you can perform different actions, just as with the Submit Form action, by either generating <a>-tags containing the URLs that represent each form submission, or changing the current values in the form in each iteration.

Now, consider the book search example form from above. Assume that we want to search for all hardcover books by the author John Doe, in all languages, and for all reader ages. Since the form does not allow us to choose all languages or all reader ages at the same time, we need to loop through the form using the Loop Form action. What we want is to set the book_author field to John Doe, the book_format field to format_hc, and then make a form submission for all possible combinations of values that can be assigned to the book_language field and the reader_age field.

To do this, we must create a number of field groups. A field group is a group of one or more fields that must be looped through together. To create a field group, you first select the type of field group that you want. The type determines the number of fields in the group and how the fields are looped through. The two most common field group types are the following:

One field with one value: This is a field group containing one field which is assigned one value. Use this field group type if you want to assign a specific value to a field, without any looping through the field. A field group of this type is similar to a field value assignment in the Submit Form action.

One field with values to loop through: This is a field group containing one field with a list of values to loop through. Use this field group type when you want to loop through a list of values, assigning one value at a time to the field.

In most cases, these two field group types are sufficient. The two other available field group types, Multiple fields to loop through and Two fields that define a range, are only needed in rare cases. For more information on these field groups, look in the RoboHelp entry on the Loop Form action. In our example, we would create the following four field groups:

A field group containing the book_author field. This field should not be looped through, but simply assigned the value John Doe, so we use the One field with one value field group type.

A field group containing the book_format field. This field should not be looped through either, but assigned the value format_hc, so, again, we use the One field with one value field group type.

A field group containing the book_language field. This field should be looped through, using the available values in the form, i.e. lang_0, lang_1, etc. Therefore, we use the One field with values to loop through field group type.

A field group containing the reader_age field. This field should also be looped through, using the values from the form, i.e. age_inf, age_teen, and age_adult. So, again, we use the One field with values to loop through field group type.

Now, let us look at how to configure each field group. The One field with one value field groups are configured in exactly the same way as a field value assignment in the Submit Form action.

Figure 24: One Field with One Value

The One field with values to loop through field groups are configured by first choosing the field to loop through, and then specifying the values to loop through.

Figure 25: One Field with Values to Loop Through

The values are specified by choosing and configuring a value list. There are four types of value lists:
List of values: This is a fixed list of values that you specify yourself.

Values from form: This list contains the available values for a selected field, as they appear in the form.

Number range: This list contains a range of numbers.

Values depending on other field's value: This list is similar to the List of values type, except that the values to use can depend on the current value of another field.

The Values from form type is the typical choice when you want to loop through the available values of a field. This value list has the advantage that it will adapt to changes in the available values in the form, such as if a language was added to the list of available languages in the book_language field. In our example, we would use this value list in the field groups for the book_language and reader_age fields. For more information about each type of value list, see the RoboHelp entry for that list.

Remember that you only have to create field groups for the fields that you want to assign values other than their default values. Every field that is not included in a field group will be assigned its default value from the form, in every iteration.

If you want to assign more than one value to a field, you can create multiple field groups of type One field with one value containing that field. However, it is currently not possible to assign more than one value to a field when looping through it using the One field with values to loop through field group type.

When the Loop Form action loops through its field groups, it will make an iteration for each possible combination of field group iterations. So, in our example, it will make an iteration for each possible combination of language and reader age.

After adding and configuring the field groups, you should select the submit button to use. This is done in the same way as in the Submit Form action. As in the Submit Form action, you can select No Submit Button to control the assignments to the submit button fields yourself. For example, you may want to loop through multiple submit buttons.

Some web sites have an upper limit on the number of objects that they will show as the result of a form submission. For example, a book site may not show more than the first 200 matching books. If you want all matching books, you can use the Loop Form action in a special optimization mode. In this mode, the Loop Form action can optimize the looping through the form such that all objects are obtained, but without exceeding the maximum number of objects in each form submission. See the RoboHelp entry on the Loop Form action for more on this.
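The rule that the Loop Form action makes one iteration for each possible combination of field group iterations can be pictured as a Cartesian product. The Python sketch below enumerates the combinations for the book search example. The function name loop_form_iterations and the order in which the combinations are produced are assumptions made for illustration; the text above does not specify the order RoboMaker uses.

```python
from itertools import product

# "One field with one value" groups from the example.
fixed = {"book_author": "John Doe", "book_format": "format_hc"}

# "One field with values to loop through" groups, with the values from the form.
looped = {
    "book_language": ["lang_0", "lang_1", "lang_2", "lang_3"],
    "reader_age": ["age_inf", "age_teen", "age_adult"],
}


def loop_form_iterations(fixed, looped):
    """Yield one field assignment per combination of looped values (sketch)."""
    names = list(looped)
    for combination in product(*(looped[name] for name in names)):
        assignment = dict(fixed)                     # fixed values in every iteration
        assignment.update(zip(names, combination))   # one value per looped field
        yield assignment


iterations = list(loop_form_iterations(fixed, looped))
print(len(iterations))  # 12, i.e. 4 languages x 3 reader ages
print(iterations[0])
```

Fields that appear in neither mapping, such as book_title, would keep their default value from the form in every iteration.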

Uploading Files
Some forms contain file fields that allow you to upload files. A file field is defined by an <input>-tag of type file, such as the following:

<INPUT type="file" name="attachedFile">

In the Select File action, there are two ways to upload a file using a file field like this:

The first way is to upload a file from the file system. To do this, select File in Local File System from the drop-down box and enter the file name. When the form is submitted, the specified file will be loaded from the file system and uploaded as part of the form submission. Note that the file name must be an absolute file name, including the drive name, if any, and the directory path to the file.

The second and most common way to upload a file is to specify the file contents to upload, instead of loading the file from the file system. To do this, select File Contained in Attribute from the drop-down box. Then, select the attribute that holds the file contents from the drop-down box named File Content. Typically, you will get the contents from either a binary attribute in which you have downloaded the file earlier using the Load Data action, or from an attribute containing text that you have extracted earlier.

Optionally, you can specify the content type and the file name of the file. The content type should be the MIME type of the contents, optionally followed by a charset. You may use one of the predefined content types, acquire it from an attribute, or specify a custom content type. For example, the content type could look like this for an image:

image/gif

and like this for a plain text:

text/plain; charset=iso-8859-1

Note that when downloading files using Load Data, you can store the content type and file name of the downloaded data as part of the download. You can then use this content type and file name when uploading the file with the Select File action.

Using the Pop-up Menu in the Page View


You can use the pop-up menu in the Page View as a shortcut to selecting and configuring the Submit Form and Loop Form actions.

To select the Submit Form or Loop Form action in the current step, right-click inside a <form>-tag in the Page View and choose Use Submit Form or Use Loop Form in the Forms submenu of the pop-up menu.

If the current step contains a Submit Form or Loop Form action, you can right-click on a field in the Page View and choose Add Assignment to Field in the Forms submenu, to assign a value to that field. A dialog will appear where you can select the value to assign.

If the current step contains a Loop Form action, you can right-click on a field and choose Add Looping over Field in the Forms submenu, to loop through the field. A dialog will appear where you can configure a One field with values to loop through field group for the field.

If the current step contains a Submit Form or Loop Form action, you can also right-click on a submit button in the Page View and choose Select Submit Button in the Forms submenu to select that submit button.

How to Loop Through Pages


A robot often needs to loop through pages. For example, many web sites will present the results of a search request over several pages, each containing e.g. 20 results from the search. To get the search results, you need to loop through the pages and process one page at a time. This chapter explains how to do this.

Pages where First Page Links to All Other Pages


There are two common ways of linking pages together. The first one is shown in Figure 26.

Figure 26: First Page Links to All Other Pages

Here, the first page contains direct links to all other pages. That is, you can get to any page directly from the first page by following the corresponding link. The first page sometimes also contains a link to itself. Such pages can be looped through quite easily using a For Each Tag step, as shown in this excerpt from a robot:

Here, we are looping through the result pages from a search request, symbolized by the step named (Submit Form). The first result page can be processed directly, so there is a connection from the form submission step directly to the step that processes a page, symbolized by the step named (Process Page). The remaining pages are looped through by the For Each Tag action in the second branch from the form submission step. First, the Test Tag step checks that there is in fact more than one page. If so, we simply loop through the tags containing the links to the pages, load each page using a Click action, and then continue to the processing of the page. If the first page has a link to itself, the For Each Tag action should be configured to skip this first link, so that the first page isn't processed twice.
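The control flow of this robot can be summarized in a short Python sketch. It uses a made-up in-memory site (the dictionary LINKED_PAGES and the helper names are hypothetical) in place of real page loads, and it only mirrors the logic described above: process the first page directly, then follow every page link except the one pointing back to the first page.

```python
# Hypothetical stand-in for a web site: URL -> results and page links.
LINKED_PAGES = {
    "results?page=1": {
        "results": ["Book A", "Book B"],
        # The first page links to every page, including itself.
        "page_links": ["results?page=1", "results?page=2", "results?page=3"],
    },
    "results?page=2": {"results": ["Book C", "Book D"], "page_links": []},
    "results?page=3": {"results": ["Book E"], "page_links": []},
}


def load_page(url):
    """Stand-in for loading a page, e.g. with a Click action."""
    return LINKED_PAGES[url]


def collect_results(first_url):
    first_page = load_page(first_url)
    collected = list(first_page["results"])   # first branch: process page 1 directly
    for link in first_page["page_links"]:     # second branch: loop over the links
        if link == first_url:
            continue                          # skip the first page's link to itself
        collected.extend(load_page(link)["results"])
    return collected


print(collect_results("results?page=1"))
```

If a search returns a single page, page_links is empty and the loop body never runs, which plays the role of the Test Tag check.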

Pages where Each Page Links to Next


The other common way of linking pages together is shown in Figure 27.

Figure 27: Each Page Links to Next

Here, each page simply links to the next page, typically with a link or form button named something like Next. To loop through such pages, use the Repeat action. The Repeat action will loop through the pages that are supplied to it by another action named Next. The principle is as follows: The Repeat action must be given the first page as input. It will then loop through the pages, and in each iteration it will output a page. In each iteration, we can process the current page, and we must also give the Repeat action the next page, using the Next action. If we don't give the Repeat action a new page, it will not provide another iteration, i.e. the loop will end. This excerpt from a robot shows an example:

Here, like before, we are looping through the result pages from a search request, symbolized by the step named (Submit Form). The form submission step will output the first result page, which we give to the Repeat action. In the first branch from the Repeat action, we process the current page. In the second branch, we load the next page by clicking its link. The Next action will send the page back to the Repeat action, which will output it in its next iteration. When the last page is reached, the Click action will generate an error. Therefore, the Click step is configured to ignore errors and skip the rest of the branch. In the Click step, this is done in the Error Handling tab by setting the Own Errors property to Ignore and Skip Branch. Please see Handling a Step's Own Errors for more information on this.

An alternative way of handling the last page is shown in the robot excerpt below:

To detect when the last page has been reached, we use a Test Tag action in the second branch. The Test Tag action checks that the page contains a next-page link, for example by looking for an <a>-tag containing the text Next. If the page contains such a link, we load the next page and give it to the Next action. When the last page is reached, the Test Tag action will stop execution down the second branch, and no new page will be given to the Repeat action, causing the loop to end.

Note that finding the link to the next page can be tricky. A common mistake is to find the previous-page link on some pages instead of the next-page link, because the layout of the pages changes slightly between the first page, the subsequent pages, and the last page. Another common mistake is to not detect the last page reliably. You may have to configure the tag finders of the steps carefully to make things work (see the chapter How to Use the Tag Finders).

When you are working with a robot in RoboMaker, RoboMaker may not always be able to step correctly back and forth between iterations of a Repeat action. If you are not sure whether RoboMaker has got it right, click Refresh to update.
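The interplay between Repeat, Test Tag, and Next described in this section is essentially a loop that continues only while a next page is supplied. The Python sketch below shows that logic on a made-up chain of pages; CHAINED_PAGES and the function names are hypothetical, and the check for a "next" entry stands in for the Test Tag step.

```python
# Hypothetical stand-in for a web site where each page links to the next.
CHAINED_PAGES = {
    "p1": {"results": ["Book A", "Book B"], "next": "p2"},
    "p2": {"results": ["Book C", "Book D"], "next": "p3"},
    "p3": {"results": ["Book E"], "next": None},   # last page: no Next link
}


def load_page(url):
    """Stand-in for loading a page by clicking a link."""
    return CHAINED_PAGES[url]


def repeat_through_pages(first_url):
    """Yield each page in turn, like the iterations of a Repeat action (sketch)."""
    page = load_page(first_url)          # Repeat is given the first page as input
    while page is not None:
        yield page                       # first branch: process the current page
        next_url = page["next"]          # second branch: is there a next-page link?
        # Supplying a new page corresponds to the Next action; supplying none
        # ends the loop.
        page = load_page(next_url) if next_url else None


all_results = [r for page in repeat_through_pages("p1") for r in page["results"]]
print(all_results)
```

The mistakes mentioned above correspond to getting the "next" lookup wrong: picking up a previous-page link would loop forever between two pages, and failing to detect the last page would either raise an error or stop too early.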

How to Extract Content


RoboMaker has five important step actions for extracting content from a tag:

The Extract action is used to extract text content from the tag, optionally including the HTML tags.

The Extract URL action is used to extract a URL from a tag attribute containing a URL, and to make that URL absolute.

The Extract Clip action is used to extract a stand-alone HTML clip from the tag, with support for preparing the HTML to appear on its own apart from its original HTML page.

The Extract Tag Attribute action is used to extract the value of a tag attribute.

The Load Data action is used to extract binary data such as images and PDF files, but it handles any kind of binary data.

Often you need to reformat (or normalize) the extracted content, and the Extract, Extract Clip, and Extract Tag Attribute actions allow you to do this by configuring a list of data converters.
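As a side note on the Extract URL action listed above, "making a URL absolute" means resolving the URL found in the tag attribute against the address of the page it came from. The following Python sketch shows the idea using the standard library's urljoin; the page address and the relative URLs are invented for the example, and RoboMaker's own resolution may differ in detail.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that the tag was found on.
page_url = "http://www.books.com/search/results.asp"

# A relative URL is resolved against the page address.
print(urljoin(page_url, "details.asp?id=7"))
# http://www.books.com/search/details.asp?id=7

# ".." steps up one directory before resolving.
print(urljoin(page_url, "../covers/book1.gif"))
# http://www.books.com/covers/book1.gif

# A URL that is already absolute is returned unchanged.
print(urljoin(page_url, "http://www.kapowtech.com"))
# http://www.kapowtech.com
```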

Extracting Text
The Extract action is used for extracting text.

Figure 28: The Extract Action

For short text, like a product name or a price, extract as Only Text. This will simply extract the text between the tags. If you want to extract a longer text with sections, headings etc. as plain text, but still want the text to appear close to how it appears in a browser, you should extract the text as Structured Text. If some sort of special markup is desired, e.g. brackets surrounding the headings, then Structured Text has rudimentary support for that. If the markup requirements cannot be fulfilled with Structured Text, then use Advanced Structured Text which allows you to set mappings from the HTML tags into your proprietary markup.

Extracting Clips
The Extract Clip action is used for extracting a stand-alone HTML clip from a page.

Figure 29: The Extract Clip Action

The Extract Clip action is useful when you want to extract parts of a page, or an entire page, and want to preserve the HTML formatting of what you extract. For example, you can use Extract Clip to extract web content with the original formatting preserved. Note that if you want to clip functionality, not just content, from a web site, you should create a clipping robot instead of using Extract Clip. See the How to Clip chapter for more on this. The Extract Clip action allows you to extract more than one tag from the same page and combine them into a single clip. You can choose between several ways of combining the individual clips. The Extract Clip action can also modify and adjust the clipped HTML in various ways that make it suitable for appearing on its own, separate from its original HTML page. This includes handling of layout, URLs, and JavaScript.


Extracting Binary Data


Binary data is extracted using the Load Data action, which will load the data from a URL and store it in an attribute.

Figure 30: The Load Data Action

Only binary attribute types can be used to store the loaded data. The binary attribute types are Binary, Image, PDF, and Session. They are all equivalent, except that the Image, PDF, and Session types allow you to preview the data.

Using the Pop-up Menu in the Page View


You can use the pop-up menu in the Page View as a shortcut to selecting and configuring the extraction step actions. Simply right-click on the text or tag that you want to extract from, or the link that you want to load from, and select the appropriate option from the Extraction menu in the pop-up menu that appears.


Performing Common Tasks


In this section, we will take a look at some common extraction tasks that you should be familiar with.

Extracting Only Part of a Text


If you want to extract only a part of the text in a tag, then you can use patterns on the text in the tag. For example, you might want to extract the name "Bob Smith" from the following text: "The article is written by Bob Smith." To do this, use the Extract data converter (do not confuse this with the Extract step action) and configure it as shown below.

Figure 31: Using the Extract Data Converter

The principle is to configure the Pattern property to match the entire text, with the text to extract being matched by a subpattern, enclosed by parentheses. In this case, the pattern used is ".*by\s(.*)\.", which means that the text between by and the period will be matched by the subpattern. For more information on patterns, see the section Patterns in the chapter Getting Started.
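If you want to try the subpattern principle outside RoboMaker, the same pattern can be tested with Python's standard re module. This is only a sketch: it assumes that Python's regular expression syntax behaves like RoboMaker patterns for this particular expression, and the function name extract_subpattern is hypothetical.

```python
import re

def extract_subpattern(pattern, text):
    """Return the first subpattern if the pattern matches the ENTIRE text, else None."""
    match = re.fullmatch(pattern, text)
    return match.group(1) if match else None

# The pattern and the text are the ones used in the example above.
author = extract_subpattern(r".*by\s(.*)\.", "The article is written by Bob Smith.")
print(author)
```

Note that the pattern must cover the whole text, which is why the sketch uses fullmatch rather than a partial search.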

Converting Content
Conversion is used whenever you want to normalize content, such as when one text should be replaced by another text. For example, you might want to normalize country codes to their natural language description, e.g. "US" should be normalized to "United States". For plain text conversions, you should use the Convert Using List data converter. For conversions based on patterns or expressions, you should use the If Then data converter.
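The difference between the two kinds of conversion can be sketched in Python. This is an illustration of the idea only, not of how the data converters are implemented; the function names, the conversion list, and the patterns are all assumptions made for the example.

```python
import re

# Hypothetical conversion list: exact input text -> replacement text.
COUNTRY_NAMES = {"US": "United States", "DK": "Denmark", "DE": "Germany"}

def convert_using_list(text, conversions):
    """Plain text conversion: replace the text if it is in the list, else keep it."""
    return conversions.get(text, text)

# Hypothetical rules: (pattern that must match the entire text, replacement).
COUNTRY_RULES = [
    (r"(?i)u\.?s\.?(a\.?)?", "United States"),
    (r"(?i)united\s+states.*", "United States"),
]

def if_then(text, rules):
    """Pattern-based conversion: the first rule whose pattern matches wins."""
    for pattern, replacement in rules:
        if re.fullmatch(pattern, text):
            return replacement
    return text  # no rule matched: keep the text unchanged

print(convert_using_list("US", COUNTRY_NAMES))
print(if_then("U.S.A.", COUNTRY_RULES))
```

A list is enough when the set of input texts is known and fixed. Patterns are useful when the same value appears in many spellings.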

Number and Date Extraction and Formatting


Whenever you want to extract a number or a date from some content, you should use the Extract Number and Extract Date data converters. For further number and date formatting, you should use the Format Number and Format Date data converters. Often, when you want to extract a number from some content, you add an Extract Number data converter; if you need any further formatting, you add a Format Number data converter to reformat the text output by the Extract Number data converter.
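The two-stage idea (first extract, then format) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes numbers written with a comma as thousands separator and a period as decimal separator; the actual data converters are configured in RoboMaker and may support other formats.

```python
import re
from datetime import datetime

def extract_number(text):
    """Find the first number in the text and return it as a float, or None."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))

def format_number(value, decimals=2):
    """Format a number with thousands separators and a fixed number of decimals."""
    return f"{value:,.{decimals}f}"

def reformat_date(text, in_format, out_format):
    """Parse a date written in one format and write it in another."""
    return datetime.strptime(text, in_format).strftime(out_format)

price = extract_number("Price: $1,234.5 incl. VAT")
print(format_number(price))
print(reformat_date("03/15/2007", "%m/%d/%Y", "%Y-%m-%d"))
```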

Extracting Only a Subset of the Tags in the Found Tag


Sometimes, you want to extract from a range of tags rather than a single tag. The Extract action lets you specify a range of tags by specifying the first tag and the last tag in the range. For example, consider the case of extracting the body text of an article, where the body text is made up of individual sections, each in their own tag, and where information about the article title and author is contained in some other tags. To extract only the body text without the article title and author, use the Extract action to extract the text, and configure the action so that only the range of tags spanning the body is extracted.
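The effect of extracting a range of tags can be illustrated with a small Python sketch. The article is modeled as a simple list of (tag id, text) pairs, which is an assumption made for the example; in RoboMaker you select the first and last tag with tag finders instead.

```python
# Hypothetical article: each entry is (tag id, text in that tag).
ARTICLE = [
    ("title", "Big News"),
    ("author", "Bob Smith"),
    ("section1", "First section."),
    ("section2", "Second section."),
]

def extract_tag_range(tags, first, last):
    """Return the text of all tags from 'first' through 'last', inclusive."""
    ids = [tag_id for tag_id, _ in tags]
    start, end = ids.index(first), ids.index(last)
    return "\n".join(text for _, text in tags[start:end + 1])

print(extract_tag_range(ARTICLE, "section1", "section2"))
```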


How to Extract Content From a Table


Interesting content is often located in tables. Unfortunately, HTML tables are far too often irregular in both content and structure. Fortunately, RoboMaker has been designed to deal with such irregularities as described below. (Note that the techniques described in this chapter are not really restricted to dealing with table content and structure irregularities. They can be used when dealing with all kinds of tag irregularities.) If the table containing the interesting content is perfectly regular in both content and structure, then you can extract the content as described in the How to Extract Content chapter. The robot will typically look like this:

The first step contains a For Each Tag action that loops through the <tr>-tags in the <tbody>-tag of a <table>-tag. It is followed by several steps that each extract content from a cell (column-wise) in a table row.
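For readers who think in code, the loop over table rows followed by column-wise extraction corresponds roughly to the following Python sketch, which uses the standard html.parser module. The class name, function name, and sample table are assumptions made for the example, and the sketch only handles a perfectly regular table.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text parts of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

def extract_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

HTML = ("<table><tbody>"
        "<tr><td>Widget</td><td>9.95</td></tr>"
        "<tr><td>Gadget</td><td>19.50</td></tr>"
        "</tbody></table>")

# One iteration per table row, one value per column.
for name, price in extract_rows(HTML):
    print(name, price)
```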

Content Irregularities
Sometimes the content of cells in the same table column differs in format. For example, it might sometimes be empty, sometimes contain "Bob" (first name) and sometimes "Bob Smith" (first name and last name). One way, and probably the simplest way, to deal with content irregularities is to use the If Then data converter in the step doing the extraction of some attribute value. You configure its If and Else If properties so that they match each format variation. The corresponding Then properties then extract the matching subpattern. However, for the "Bob Smith" case, which contains two attribute values (first name and last name), you need to create two steps: one that extracts the first name and one that extracts the last name. This is because the Extract action only allows you to extract one attribute value. Each of the two steps would then contain an Extract action with an If Then data converter so that the first step extracts the first name (if any), and the second step extracts the last name (if any).
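The two extraction steps for the name example can be sketched in Python, with one function per attribute value. The patterns and function names are assumptions made for the example, and the sketch only covers the three format variations mentioned above (empty, first name only, first name and last name).

```python
import re

FULL_NAME = re.compile(r"(\S+)\s+(\S+)")  # e.g. "Bob Smith"
FIRST_ONLY = re.compile(r"(\S+)")         # e.g. "Bob"

def extract_first_name(cell):
    """If the cell is a full name or a single name, return the first name."""
    text = cell.strip()
    match = FULL_NAME.fullmatch(text) or FIRST_ONLY.fullmatch(text)
    return match.group(1) if match else ""

def extract_last_name(cell):
    """Only a full name contains a last name; otherwise return an empty text."""
    match = FULL_NAME.fullmatch(cell.strip())
    return match.group(2) if match else ""

for cell in ["", "Bob", "Bob Smith"]:
    print(repr(extract_first_name(cell)), repr(extract_last_name(cell)))
```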

Structure Irregularities
Sometimes the rows of a table vary in the number of cells they contain. A common way of dealing with such irregularities is to test the format of each table row. For example, you might want to consider only rows containing a certain number of cells, or only rows containing a specific text. To do this, you add branches after the For Each Tag step (that loops through the table rows), so that each branch starts with a conditional step (having a Test Tag action) that accepts all rows matching (or not matching) some format (written as a pattern). The conditional step is then followed by one or more extraction steps that assume the format accepted by the conditional action. The robot would look something like this:

When this robot is run, each conditional step will be executed in turn for each table row. Normally, we only want execution to proceed past at most one of the conditional steps. You can ensure this by writing the patterns of each conditional action in such a way that no text will match more than one pattern. Note that the branches beyond the conditional steps need not be kept separate. If two or more branches share extraction steps, you might want to merge the branches after the steps that are different.
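The branching described above can be sketched in Python. Each branch is a pair of a condition and an extraction function, and every condition is tried for every row, as in the robot. The row formats and field names are assumptions made for the example. Because the conditions are mutually exclusive, at most one branch produces output for a given row.

```python
# Hypothetical branches: (condition on the row, extraction for accepted rows).
BRANCHES = [
    (lambda cells: len(cells) == 1,
     lambda cells: {"category": cells[0]}),
    (lambda cells: len(cells) == 3,
     lambda cells: {"name": cells[0], "price": cells[2]}),
]

def process_row(cells):
    """Try every conditional step in turn; collect output from accepting branches."""
    results = []
    for accepts, extract in BRANCHES:
        if accepts(cells):
            results.append(extract(cells))
    return results

print(process_row(["Tools"]))
print(process_row(["Hammer", "steel", "12.00"]))
print(process_row(["unexpected", "row"]))
```

If two conditions could accept the same row, that row would produce two results, which is usually not what you want.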


How to Clip
This chapter describes how to use Kapow Mashup Server for clipping from web sites. This allows you to reuse existing web sites for new purposes, such as in portals. This chapter explains the new clipping functionality that was introduced in Kapow Mashup Server 6.0. The clipping functionality from Kapow Mashup Server 5.5 is still available, but you will need to refer to the Kapow Mashup Server 5.5 documentation for information about this. Also, note that if you just want to extract stand-alone clips, without any functionality and without using a clipping robot, refer to the Extracting Clips section in the How to Extract Content chapter.

What is Clipping?
Clipping means reusing existing web sites in a new context, typically as a portlet in a portal. Clipping allows you to reuse existing web sites without affecting the web sites or changing a single line of code in them. You can clip entire web sites, or selected parts of the web sites, such as selected pages or parts of pages. As part of the clipping process, you can also modify the clipped pages to suit your needs. For example, you can change the layout and styling of the pages to match the look-and-feel of your portal. You can also remove or change parts of the pages, such as removing advanced functionality that you do not want to expose in your portal. As part of the clipping, you can also do automatic login and logout on the web site that you are clipping from. This can be done as part of a single-sign-on solution that covers your portal. This way, the portal user only needs to log into the portal itself, after which he will be logged in automatically to all applications that he accesses from the portal. You can also do other types of automatic navigation as part of the clipping, such as automatic navigation to the first page to be shown in the portlet, and automatic pre-filling of forms.

How Clipping Works


When you want to clip from a web site, you create a clipping robot in RoboMaker. From this clipping robot, Kapow Mashup Server can then automatically create the web component that you need to deploy your robot, such as a portlet or a web page. For simplicity, in this User's Guide, and in most of Kapow Mashup Server, we refer to this web component as a clipping portlet, although it may be another type of web component, such as a standalone web page that does not require a portal system.


Figure 32 shows the basic setup when a clipping robot is deployed at runtime.
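[Diagram: the Browser accesses the Clipping Portlet on the Portal Server; the portlet executes the Clipping Robot on the RoboServer, which holds the Clipping Session and accesses the Web Site.]

Figure 32: A Clipping Robot Deployed at Runtime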

The generated clipping portlet is deployed on a portal server. When the portal user accesses the clipping portlet for the first time from his browser, the clipping portlet starts a new clipping session. It does this by executing the clipping robot on a RoboServer with a Begin Session command as input to the robot. On the RoboServer, the clipping robot will create a new clipping session that resides on that RoboServer. The robot will then perform the navigation necessary to reach the first page that should be clipped, clip from that page, and return the clip to the portlet. The portlet will then show the clip to the user. The robot state, e.g. windows, pages, JavaScript, cookies, etc., that results from the navigation on the web site will be kept on the RoboServer, as part of the clipping session.

The clip shown by the portlet is a specially modified version of the original page. Besides the modifications and layout changes that the robot may perform on the clip, the clipped page has also been specially instrumented for the clipping process. All original JavaScript has been removed from the page, since that JavaScript would not execute correctly in the new portal context that the clip appears in. Instead, the clipped page has been instrumented with special event handlers that capture selected user interactions with the page.

When the user interacts with the clip, e.g. clicks a button or presses a key, one of two things happens. For low-level user actions, such as pressing a key or moving the mouse, the user interaction is typically handled locally in the user's browser, since this does not normally require any page loading, JavaScript execution, or similar. For high-level user actions, such as clicking a button or submitting a form, the interaction is typically captured by an event handler and handled by the portlet.
The portlet will handle such an interaction by executing the robot with a command that specifies the type of user interaction and relevant additional information such as which button was clicked or which form was submitted. The robot will then perform the same interaction on the web site that is being clipped, using the robot state stored in the clipping session. If the interaction triggers JavaScript, that JavaScript will be executed by the robot in its original context of the web site. This way, the actual interaction with the web site is done by the robot, in the original context of the web site, using the full state necessary to do the interaction correctly. Thus, the actual interaction with the web site is unaffected by the clipping and the changes done to the clipped pages to make them suitable for view in the portlet.

When the user logs out of the portal, or his portal session times out, the clipping portlet will end the clipping session by executing the robot with an End Session command. This will cause the robot to do whatever is necessary to end the interaction with the web site, such as logging out from the web site, and then end that particular clipping session on the RoboServer.

The communication between the portlet and the clipping robot is done using Kapow Mashup Server objects, as for any other robot. The input to the clipping robot, which includes the command to perform, is represented by a ClipRequest input object. The output from the robot, which includes the resulting clip, is represented by a ClipResponse output object.
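The life cycle of a clipping session described above can be summarized in a toy Python model. This is only a conceptual sketch, not the Kapow Mashup Server API: the class ToyRoboServer, the command names, and all field names of ClipRequest and ClipResponse are assumptions made for the example, since the actual attributes of these objects are not described here.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

@dataclass
class ClipRequest:
    command: str                      # "begin", "interact", or "end" (assumed names)
    session_id: Optional[int] = None  # assumed field
    detail: str = ""                  # e.g. which button was clicked (assumed field)

@dataclass
class ClipResponse:
    session_id: int
    clip: str

class ToyRoboServer:
    """Keeps one list of performed actions per clipping session."""

    def __init__(self):
        self.sessions = {}
        self._next_id = count(1)

    def execute(self, request):
        if request.command == "begin":
            session_id = next(self._next_id)
            self.sessions[session_id] = ["load first page"]
        elif request.command == "interact":
            session_id = request.session_id
            self.sessions[session_id].append(request.detail)
        elif request.command == "end":
            del self.sessions[request.session_id]
            return ClipResponse(request.session_id, "")
        else:
            raise ValueError("unknown command: " + request.command)
        last_action = self.sessions[session_id][-1]
        return ClipResponse(session_id, "clip after " + last_action)

# The portlet side: begin a session, forward one user interaction, end the session.
server = ToyRoboServer()
first = server.execute(ClipRequest("begin"))
second = server.execute(ClipRequest("interact", first.session_id, "click Search"))
server.execute(ClipRequest("end", first.session_id))
```

The point of the model is that the state lives on the server side between executions, while the portlet only passes commands and receives clips.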

The Structure of a Clipping Robot


In this section, we will take a look at the structure of a clipping robot.

A Simple Clipping Robot


By default, a clipping robot has the following structure:

Figure 33: A Simple Clipping Robot

The robot starts with a step that uses the Begin Clip action. This step performs the command that the clipping portlet has requested:

For the Begin Session command, the Begin Clip step creates a new clipping session and loads the first page from the web site to clip from.

For a user interaction, the Begin Clip step performs the requested user action on the current robot state in the clipping session, e.g. clicks a button or submits a form.

For the End Session command, the Begin Clip step prepares for ending the clipping session.

After the Begin Clip step follows a default clip branch. The default clip branch handles the clipping from all pages that are not handled by any other clip branch in the robot. In this simple case, there are no other clip branches, so the default clip branch will handle the clipping from all pages. The default clip branch starts with a step named Default?, with a Test Default Clip action. The Test Default Clip action serves only to identify this branch as the default clip branch, and always lets execution continue along the branch.


After the Test Default Clip step is a step with a Clip action. This step performs the actual clipping and stores the resulting clip in the ClipResponse output object. The Clip step also stores the current robot state in the clipping session, for use in the next execution of the robot. The default clip branch ends with a step with an End Clip action. This step returns the ClipResponse object to the portlet and stops the robot execution.

A Robot with Multiple Clip Branches


If you do not want to clip all pages on the web site in the same way, you can add more clip branches to the robot, one clip branch for each set of clipping rules. Here is an example of a robot with multiple clip branches:

Figure 34: A Robot with Multiple Clip Branches

There are two clip branches besides the default clip branch. The non-default clip branches have the same structure as the default clip branch, except that the first step in the clip branch has a Test Clip action instead of a Test Default Clip action. The Test Clip action checks whether this clip branch should handle the clipping of the current page (or pages). The clip branches will be tried in turn until the Test Clip step of a clip branch accepts the current page, in which case that clip branch is used. If none of the non-default clip branches match, the default clip branch is used. The Test Clip action is configured by specifying one or more clip conditions that the current windows must match or not match. A clip condition can check the URLs of the current windows, the contents of the pages in the windows, etc. This way, you can specify exactly which pages this particular clip branch should handle. When you have multiple clip branches, the default clip branch is optional.
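The way clip branches are tried in turn can be sketched in Python. The branch names, the conditions, and the example.com URLs are assumptions made for the example, and the sketch only checks the URL of a single window, whereas real clip conditions can also check page contents.

```python
# Hypothetical non-default clip branches: (branch name, test on the page URL).
BRANCHES = [
    ("Search Results", lambda url: "/search" in url),
    ("Help Pages", lambda url: url.startswith("http://www.example.com/help/")),
]

def select_clip_branch(url, branches, default=None):
    """Return the first branch whose test accepts the page, else the default."""
    for name, accepts in branches:
        if accepts(url):
            return name
    return default  # None means that no branch handles the page

print(select_clip_branch("http://www.example.com/search?q=robots", BRANCHES, "Default"))
print(select_clip_branch("http://www.example.com/about", BRANCHES, "Default"))
```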

A Robot with Automatic Navigation Sequences


A robot can perform automatic navigation when a clipping session starts or ends. This is typically used for doing automatic login and logout on the web site that is being clipped. It can also be used for navigating from the front page to the first page to be clipped, or for pre-filling of forms.


Here is an example of a robot that does automatic login and logout:

Figure 35: A Robot with Automatic Login and Logout

After the Begin Clip step, the robot has three branches that are executed depending on the current command to the robot. Each branch starts with a step with a Test Clip Command step action that determines whether the branch should be executed for the current command:

If the command is Begin Session, the first branch is executed.

If the command is a normal clip command, i.e. one that represents a user action, the second branch is executed.

If the command is End Session, the third branch is executed.

The first branch performs the automatic login, in this case by entering the username and password in a login form and submitting the form. This branch then joins with the second branch. They join in the step named Logged In, whose only purpose is to serve as a joining point. After this join step, one or more clip branches follow, as in the simpler clipping robots described earlier. Thus, the clip branches are executed both for the Begin Session command and for the normal clip commands. The third branch from the Begin Clip step performs the logout when the session has ended. This is typically done by clicking on a logout button. Note that the steps named Begin Login, End Login, Begin Logout, and End Logout are just Do Nothing steps that serve only to mark the start and end of the login/logout sequences, and to make it easier to edit the sequences.
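The way the three branches relate to each other, including the join at the Logged In step, can be sketched in Python. The step names are taken from the description above, but the function itself and the wording of the individual actions are assumptions made for the example.

```python
def steps_for(command):
    """Return the sequence of steps executed after Begin Clip for a command."""
    if command == "End Session":
        # Third branch: log out; no clip branches are executed.
        return ["Begin Logout", "click logout button", "End Logout"]
    steps = []
    if command == "Begin Session":
        # First branch: automatic login, then join the second branch.
        steps += ["Begin Login", "submit login form", "End Login"]
    # Join point shared by the first and second branches.
    steps += ["Logged In", "clip branches"]
    return steps

print(steps_for("Begin Session"))
print(steps_for("Click"))
print(steps_for("End Session"))
```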


Creating a Clipping Robot


To create a clipping robot, click the icon in the RoboMaker Main Window. In the New Robot Wizard, choose Clipping robot and click Next:


On the second page, enter the URL that the robot should start from. This is the first page to clip from, or the page to start automatic navigation from. In this example, we enter www.google.com:

The next pages in the wizard help you to configure the robot for automatic login, if that is needed. In this example, we just click Finish after entering the URL. This will create a simple clipping robot with a default clip branch. By default, this robot will clip all pages that are reachable from the first page (directly or indirectly). The pages will be clipped in their entirety, without any modifications. You can subsequently configure the robot to behave differently, as explained in the rest of this chapter. When the robot has been created, RoboMaker will open up the Portlet View and show the first clip created by the robot.

The Portlet View


The Portlet View shows your clips as they will appear in the portlet. You can interact with the clips in the same way as a user interacts with the portlet.


Thus, the Portlet View allows you to navigate through the clips, refine your robot, and check whether everything looks and works as you want it to.

Figure 36: The Portlet View

You should use the Portlet View as your primary way of moving around in a clipping robot. In a clipping robot, you usually cannot click back and forth between branches as you would do in a normal robot. The reason is that the branches in a clipping robot depend on the current command to the robot, i.e. the current ClipRequest input object, as well as the robot state in the current clipping session. Therefore, if you click on a step in another branch than the current one, RoboMaker will typically tell you that the step cannot be reached. Instead, you need to use the Portlet View to navigate to the clip for which that branch applies. You can then click back and forth between the steps within the branch, in the usual way.

You can go back to earlier clips in the Portlet View by clicking the icon in the toolbar of the Portlet View. You go forward again by clicking the icon. This is similar to the back and forward buttons in a browser, but slightly different, since you are moving back and forward between the clips made by the robot, not the loaded pages as in a browser.

You can start a new clipping session in the Portlet View by clicking the icon in the toolbar of the Portlet View, or the icon in the Main Window of RoboMaker. This is useful when you want to go back to the first clip that will be shown in the portlet. It is also useful if you get stuck trying to click back and forth between branches in the robot. In that case, click the icon to start over, and navigate to the appropriate clip using the Portlet View.

By default, the Portlet View opens and closes automatically. You can also open and close it yourself, by clicking the icon in the Main Window of RoboMaker. If you do not want the Portlet View to open and close automatically, uncheck the Open/Close Automatically checkbox in the toolbar of the Portlet View.


Working with Clip Branches


By default, there is only one clip branch in a clipping robot, the default clip branch. This branch handles all pages, i.e. all pages will be clipped in the same way. If you want to clip some pages differently, you can add more clip branches to the robot, one for each set of clipping rules that you want to define.

Adding a New Clip Branch


To add a new clip branch, navigate to a page that you want the new clip branch to handle. Then click the icon in the Portlet View to open the Edit Clip Wizard:

Select Add new clipping rules and click Next.


On the next page, enter the name of the new clip branch:

Use the name to distinguish the new clip branch from other clip branches, so that it is easy to remember which pages this clip branch handles. The Edit Clip Wizard will often suggest a name for you based on how you navigated to the page, and the page itself.


When you have entered a name, you can either click Finish to create the new clip branch now, or click Next to configure the clip condition for the new clip branch:

The clip condition will be used in the Test Clip step of the new clip branch to determine which pages the branch should handle. The Edit Clip Wizard will suggest a clip condition that will usually work well, but in some cases, you may need to adjust the suggestion or change the configuration entirely. You can also adjust the clip condition after creating the branch. When you have configured the clip condition, click Finish to create the new branch.

When the new branch has been created, you will be placed at the Clip step of the new branch, so that you can configure how the clipping is done in the branch. See the section Modifying Clips later in this chapter for how to do this. When you have configured the new branch, click the icon in the Main Window of RoboMaker to see the resulting clip in the Portlet View.

Instead of using the Edit Clip Wizard, you can also add a new clip branch by clicking the icon in the toolbar of the Portlet View.


Editing a Clip Branch


You can edit the clip branch that you are currently on by clicking the icon to open the Edit Clip Wizard:

Select Edit current clipping rules (or Edit default clipping rules if you are on the default clip branch), and then click Finish. You will then be placed at the Clip step of the current clip branch, allowing you to configure the clip branch. When you are done, click the icon in the Main Window of RoboMaker to see the clip in the Portlet View.

You can also edit the current clip branch by clicking the icon in the Portlet View, or the Main Window of RoboMaker. If you are on the default clip branch, you will be asked for confirmation that you want to edit the default clipping rules, since this will affect all pages clipped by the default clip branch. Note that you can also go to the Clip step yourself by clicking on the step in the Robot View.


Using another Clip Branch for a Page


If you want to use another clip branch than the current one for the page that you are on in the Portlet View, click the icon to open the Edit Clip Wizard:

Select Clip using other rules and click Next.


On the next page, select the clip branch that you want to use instead:


Click Finish if you want to use the clip condition suggested by the Edit Clip Wizard, or click Next to configure the clip condition:

The clip condition will be used in the Test Clip steps of the affected clip branches, to ensure that the selected clip branch is the only one that matches the pages, besides the default clip branch. Click Finish when you are done. The Edit Clip Wizard will then reconfigure the clip branches accordingly, and show you the resulting clip, where the selected clip branch is used instead. Instead of using the Edit Clip Wizard, you can also click the icon in the Portlet View to clip using another branch.

Using Clip Conditions


Clip conditions are used in the Test Clip step of a clip branch to determine whether that clip branch should handle the current page (or pages). The various clipping wizards will typically suggest suitable clip conditions that you can use directly or adjust. The wizards will also take care of adding and removing the clip conditions in the Test Clip steps. However, in some cases, you may need to configure the Test Clip steps directly, and this section will explain how to do this.


The Test Clip action has two lists of clip conditions:

At least one of the clip conditions in the first list must be satisfied, and none of the clip conditions in the second list may be satisfied. You can use the second list as a list of exceptions, when the clip conditions in the first list are broader than what you want to handle in the branch.
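The rule for combining the two lists can be expressed compactly in Python. This is a sketch of the logic only; the function name and the example conditions are assumptions, and each condition is modeled as a function that tests a page URL.

```python
def clip_branch_accepts(url, must_match_any, must_match_none):
    """At least one condition in the first list, and none in the second list."""
    matches_one = any(condition(url) for condition in must_match_any)
    matches_exception = any(condition(url) for condition in must_match_none)
    return matches_one and not matches_exception

# Hypothetical configuration: handle all product pages except the printable ones.
INCLUDE = [lambda url: "/products/" in url]
EXCEPT = [lambda url: url.endswith("?print=1")]

print(clip_branch_accepts("http://www.example.com/products/42", INCLUDE, EXCEPT))
print(clip_branch_accepts("http://www.example.com/products/42?print=1", INCLUDE, EXCEPT))
```

Note that with this rule an empty first list can never be satisfied, so the branch would not handle any page.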


The configuration of a clip condition looks like this:

First, you select which window you want to check. Then, you select the condition that the window must satisfy. A number of conditions are available, such as conditions that check the URL of the window or the page contents of the window. If you select the Advanced condition, you can specify multiple conditions that must all be satisfied by the window.


In most cases, it is sufficient to check only one window. However, if you need to add conditions for other windows, you can do so in the Other Windows property:

Here, you can specify a list of additional conditions that must be satisfied. In each condition, you select a window and a condition, as in the clip condition itself. Using these additional conditions, you can create clip conditions that match only if a very specific set of windows is open, with specific URLs and page contents in each window. When you create a clip condition, it is often useful to enter a description for it. This makes it easier to distinguish the clip conditions, and remember what their purposes are.

Modifying Clips
By default, all pages are clipped in their entirety with the original sizes, layout, and styles preserved. In this section, we will explain how to modify the clips, such as clipping only parts of a page, changing layout and styles, and modifying the contents of the page.

You modify clips by configuring the clip branch that handles the clips that you want to change. If you make the modifications in the default clip branch, the modifications will apply to all pages that are clipped by the default branch. If you want to make modifications only on some pages, add one or more clip branches for these specific pages, and configure those clip branches to do the modifications.

To modify the clip for the clip branch that you are currently on, use the Edit Clip Wizard, or click the icon in the Portlet View, as explained earlier in the section Editing a Clip Branch. When you are done with the modifications, click the icon in the Main Window of RoboMaker to see the resulting clip in the Portlet View.

For changes in layout and styles, you can also specify default changes in the Robot Configuration Window. These changes will apply to all clip branches that have not been individually configured to use other changes than the default ones.

Selecting the Tags to Clip


You can configure a clip branch to clip only selected tags on a page, instead of the entire page. To do this, go to the Clip step in the clip branch. Configure the tag finders of the step to find the tags that you want to clip. You can clip more than one tag from the same page, in which case they will be combined into a single clip. Here is an example of clipping multiple tags from a page:


The resulting clip will look like this:

You can also clip a range of tags instead. To do this, choose Tag Range in the Clip From property of the Clip action. The Clip action will then clip all tags between the two found tags. Note that you can select the tags to clip only in the current window. If you have multiple windows open, the other windows will always be clipped in their entirety. See the section Working with Windows and Frames later in this chapter for more on working with multiple windows.

Changing Layout and Styles


You can change the layout and styles of the clips as part of the clipping. For example, you can change the clips to match the look-and-feel of your portal.


You can specify default layout changes that will be used by all clip branches by default, and you can specify individual layout changes in each clip branch. To specify the default layout changes, click the icon to open the Robot Configuration Window. Select the Layout Changes tab:

In the Original Layout property, you can specify what to do with the original layout and styles of the page. For example, you can specify that all layout and styles should be removed from the clips. This is useful if you want to completely restyle the clips without regard to the original layout. In the Sizing property, you can specify what to do with the original size specifications in the clip, e.g. widths of tables. For example, you can specify that all absolute sizes should be removed. This is useful if you want to adapt the overall size of the clip to fit into the portal.


In the Add new Style Sheet Link property, you can specify a style sheet link to be added to the clip. This is useful when you want to restyle the clips to use another style sheet, such as the standard style sheet in your portal. In the Layout Change Rules property, you can specify layout change rules that you want to apply to the clips. For example, this layout change rule changes all usages of the original style class headingb to the style class hd2, which could be a style class in the portal style sheet that you want to restyle the clip to use:

The default layout changes that you specify in the Robot Configuration Window will be used by all clip branches, except the clip branches that you explicitly configure to not use the default layout changes. To configure a clip branch to not use the default layout changes, go to the Clip step of that branch. In the Layout Changes tab, choose Specify, and then specify the layout change settings that you want to use for this particular clip branch.
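To make the effect of a style class rule concrete, here is a Python sketch that renames the style class headingb to hd2, as in the rule described above. It is an illustration of the outcome only, not of how layout change rules are implemented. The function name is hypothetical, and the sketch assumes that class attributes are written with double quotes.

```python
import re

def apply_class_rule(html, old_class, new_class):
    """Rename one style class inside class="..." attributes."""
    def rename(match):
        classes = [new_class if c == old_class else c
                   for c in match.group(2).split()]
        return match.group(1) + " ".join(classes) + '"'
    return re.sub(r'(class=")([^"]*)"', rename, html)

print(apply_class_rule('<h2 class="headingb">News</h2>', "headingb", "hd2"))
```

Only whole class names are replaced, so a class such as headingbig is left untouched.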


Modifying the Pages before Clipping


Another way to modify the clips is to modify the original pages before the Clip step. This is useful for more advanced changes, such as removing functionality from the clips, or adapting the functionality of the pages. For example, to remove a button from a form, go to the Clip step in the clip branch that clips the page with the form. In the Page View, right-click on the button that you want to remove, and choose Hide Tag:

This will insert a Hide Tag step before the Clip step. The Hide Tag step will use styling to hide the tag, so that it is invisible to the user. Click the icon to see the results.

Note that we hide the button instead of actually removing it from the page. This is to avoid breaking the functionality of the page. If you remove tags from the page, or otherwise change it, you may break things like JavaScript that rely on the page having a particular structure and contents. Therefore, it is usually safer to just hide tags, using styles, instead of removing them. However, if you subsequently remove all styles from the clips as part of your layout changes, the hiding will be lost. In that case, you will need to remove the tags instead.

Another example of modifying the pages is to adjust the functionality of a form by modifying or inserting hidden <input>-tags. For example, on Google, you can insert a hidden <input>-tag in the search form to reduce the number of search results shown on each page in the search results:

When modifying the original pages like this, you need to take into account that the clip branch may be applied multiple times to the same pages. This will happen if the user can make interactions with the page that cause him to stay on that page without loading a new page. In this case, the clipping robot will be executed multiple times to clip from the same page. This will also happen in some cases where the portlet needs to obtain the current clip again, such as if the user moves to another page in the portal and back again to the page containing the portlet.

Because of this, if you are not careful, the clip branch will perform the same modifications multiple times on the same page. To avoid this, you may need to check whether the change has already been made on the page. For example, you may need to check whether your hidden <input>-tag has already been inserted on the page, and skip the inserting in that case. Here is how this would look for the example above:

Note that this is not an issue if you are just hiding tags, using Hide Tag, since Hide Tag does nothing if the tag is already hidden.
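The check-before-insert idea can be sketched outside RoboMaker as follows. This is only an illustration in Python, not the robot steps themselves; the function ensure_hidden_input, the sample form, and the field name num are assumptions made for the example.

```python
def ensure_hidden_input(form_html: str, name: str, value: str) -> str:
    """Insert a hidden <input> before </form> unless one with this name exists."""
    marker = f'name="{name}"'
    if marker in form_html:
        # The modification was made on an earlier run, so skip the insertion.
        return form_html
    hidden = f'<input type="hidden" {marker} value="{value}">'
    return form_html.replace("</form>", hidden + "</form>", 1)


# Hypothetical search form; "num" is an assumed field name.
form = '<form action="/search"><input name="q"></form>'
once = ensure_hidden_input(form, "num", "10")
twice = ensure_hidden_input(once, "num", "10")
print(once)
print(once == twice)
```

Applying the function a second time leaves the form unchanged, which is the property you want when the clip branch runs repeatedly on the same page.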

Working with Windows and Frames


If you are clipping from a web site with multiple windows, all of the windows will be clipped by default. For example, if the site opens popup windows, these windows will be clipped and shown as popup windows in the portal, i.e. as separate windows opened from the portal page. If a page contains frames, those frames will be clipped as well and shown as part of the clipped page. In this section we will explain how to change this default handling of windows and frames.

Selecting the Window to Show in the Portlet


By default, the current window is shown in the portlet itself, and any other top-level windows are shown as popup windows opened from the portal page. You can change this in the Show in Portlet property of the Clip action:

You can select any window or frame as the window/frame to show in the portlet itself. If you are clipping from a page with frames, you can select a specific frame to show, instead of the entire page with all frames. This way, you can exclude the rest of the page from the clip.

Blocking Popup Windows


You can exclude some or all popup windows from the clip. This is useful if the popups contain unnecessary functionality or advertisements. You can configure the default handling of popup windows in the Popup Windows property in the Robot Configuration Window:

You can block (i.e. exclude) all popups, or block selected popups based on the window names, i.e. the names shown in the window tabs in the Page View. These default settings will be used in all clip branches where you have not configured the popup window handling individually. You can configure the popup window handling individually for a clip branch using the Popup Windows property in the Clip step of the branch.

Handling Login and Single-Sign-On


A clipping robot can perform login and logout automatically as part of the clipping, without the portal user having to worry about it or even know about it. The username and password for this can be obtained from various places, such as the credential vault of your portal. This way, the clipped web site can become part of the single-sign-on solution of your portal.

Performing Automatic Login


To configure your robot to do automatic login, choose Add Login Sequence in the Login menu of the Portlet View. This will open the Add Login Sequence Wizard:

On the first page of the wizard, select the type of login to perform. The most common type is form login, where the username and password are entered into a form on the login page, such as this one:

The other login type is HTTP login, where the username and password are entered into a special prompt window opened by the browser, such as this one:

When you have selected the login type in the wizard, click Next. On the next page, enter the username and password to use while editing the robot in RoboMaker:

The username and password that you enter are for development purposes only. For example, enter the username and password for a test account on the web site. When you deploy the robot, you can configure where the username and password should be obtained from at runtime. See the section Deploying a Clipping Robot later in this chapter for more on deploying a clipping robot.

When you have entered the username and password, click Finish. If you selected the HTTP login type, no actual login sequence needs to be added to the robot. The Begin Clip step at the start of the robot will do the necessary login. If you selected the form login type, the wizard will add a login sequence to the robot. The sequence will be executed when a new clipping session starts. See the section The Structure of a Clipping Robot earlier in this chapter for more on the structure of a clipping robot.

When the login sequence has been added, you will be placed at the location in the sequence where you should insert the login steps. The login steps should enter the username and password into the login form and submit the form. The username and password should be obtained from the ClipRequest.username and ClipRequest.password attributes. At runtime, the clipping portlet will send the appropriate username and password to the robot in these attributes. An easy way to enter the username and password is to right-click on the appropriate fields in the form and choose Enter Username or Enter Password:

When you have inserted the steps, click the icon to show the resulting, first clip in the Portlet View.

You can verify the login sequence by clicking the icon in the Portlet View, to start a new clipping session. If you want to edit the login sequence, choose Edit Login Sequence from the Login menu in the Portlet View. This will place you at the start of the login sequence. You can remove a login sequence by choosing Remove Login Sequence in the Login menu. You can edit the test username and password used when developing the robot in RoboMaker, by choosing Edit Test Username and Password from the Login menu.

Note that you can also define the automatic login in the New Robot Wizard when you create your new clipping robot, by clicking Next after entering the start URL of the robot.

Performing Automatic Logout


On most web sites, it is not strictly necessary to do a logout, since the user session will time out automatically. However, if you want to create a well-behaved clipping robot, you should configure the robot to do an automatic logout, when possible, to tie up as few resources as possible on the web site.

To configure your robot to log out automatically, choose Add Logout Sequence in the Login menu in the Portlet View. This will open the Add Logout Sequence Wizard. Simply click Finish in the wizard. The wizard will then add a logout sequence to the robot, and place you at the location in the sequence where the logout steps should be inserted.

The logout sequence will be executed whenever a clipping session ends, i.e. when an End Session command is sent to the robot. The logout sequence will be executed with the current robot state in the clipping session, i.e. the logout sequence will continue from the point that the user has currently navigated to.

In the logout sequence, insert the steps necessary to log out. For example, insert steps to navigate to a page where a logout button is present, and a step to click on the button. Remember that the logout sequence must work no matter which page the user is currently on.

When you have inserted the logout steps, you can test the logout sequence by first clicking the icon to start a new clipping session, and then choosing End Session in the View menu of the Portlet View to end the session. You can edit the logout sequence by choosing Edit Logout Sequence in the Login menu of the Portlet View, and you can remove it by choosing Remove Logout Sequence.

Supporting other Types of Single-Sign-On


If the web site that you are clipping from is already included in a single-sign-on solution that also covers your portal, you can configure your robot to work in that environment. Such single-sign-on solutions are typically based on tokens (cookies or HTTP headers) which are provided by the single-sign-on server that all web requests pass through. You need to configure the clipping portlet to pass on the single-sign-on tokens to the robot. The robot will in turn pass them on to the web site. The configuration is done when generating the clipping portlet from the robot. See the online documentation for the portlet generation wizards for more on this.

In production, the portlet passes the token to the robot as described above. However, when developing the robot in RoboMaker, you will need to obtain a suitable token through other means in order to access the source web site. For cookies, you can do this as follows:

1. Create a separate non-clipping robot that accesses one of the web sites covered by the single-sign-on solution and obtains a cookie from the single-sign-on server.
2. After accessing the web site, the robot should store the obtained session in an attribute of type Session, using the Save Session action.
3. Locate this session attribute in the Objects View and copy the value of the attribute to the clipboard by clicking the Copy button.
4. Open your clipping robot.
5. In the ClipRequest input object, click the Paste button in the cookies attribute, to paste the cookies of the copied session into the cookies attribute.
6. Click Apply to apply the changes.

Now, the robot is configured to use the obtained cookie while running in RoboMaker. Note that you will have to repeat this process if the cookie that you obtained times out.

If the single-sign-on solution uses HTTP headers, you probably need to ask your systems administrator to provide you with a valid header. You can then paste this header into the ClipRequest.headers attribute in the Objects View, to make the robot use the header while running in RoboMaker. Remember to click Apply in the Objects View after entering the header.

Adding an Automatic Navigation Sequence


If you want your robot to do automatic navigation, but do not need an automatic login sequence, you can add a plain automatic navigation sequence instead. This sequence will be executed when a new clipping session begins. It can be used for things such as navigating from the start page to the first page to clip from, or pre-filling a form on the first page to clip. If you also need to have an automatic login sequence, you should use the automatic login sequence to perform this additional navigation, pre-filling, etc., after the login. You cannot have both an automatic login sequence and a plain automatic navigation sequence at the same time.

To add a plain automatic navigation sequence, choose Add Automatic Navigation Sequence in the Automation menu of the Portlet View. Click Finish in the wizard to add the sequence. You will then be placed at the point in the robot where you should insert the automatic navigation steps.

When you have inserted the steps, click the icon to see the first clip. You can try out the automatic navigation sequence again by clicking the icon to start a new clipping session. You can edit the sequence by choosing Edit Automatic Navigation Sequence in the Automation menu, and you can remove the sequence by choosing Remove Automatic Navigation Sequence.

Note that you can pass additional information to the robot from the clipping portlet for use in the automatic navigation sequence. For example, you can pass selected user preferences from the portal to the robot and use them for pre-filling a form, so that the user does not have to enter the same information every time he accesses the portlet. See the section Passing Additional Information to a Clipping Robot later in this chapter for more on this.

Other Topics
This section explains various other topics in relation to clipping.

Restricting the Clipping


By default, a clipping robot will clip all pages that are reachable from the first page (directly or indirectly). That is, the user can navigate wherever he wants, and stay in the clipped portlet. You can change this in the Basic tab of the Robot Configuration Window:

In the Clipping Restrictions property, you can restrict the clipping by specifying which links can be followed. In the Excluded Links property, you can specify what should happen if the user tries to follow a link that has been excluded. Here is an example of a configuration:

Here, the clipping has been restricted to domains ending with app1.mycompany.com. Links to other domains will be disabled, and the user will see the message This link has been disabled if he tries to follow them. Instead of disabling the links, you can also specify that the links should be opened in another window and not be clipped. As an example, if you create a Google search portlet, you would probably configure all links away from google.com to open in a new window and not be clipped. Note that opening links in other windows without clipping works only if the links can be accessed directly without a session, i.e. are not protected by a firewall and do not require cookies, authentications, etc.
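The decision made for each link in this example configuration can be sketched in Python as follows. This is an illustration of the rule only, not RoboMaker's implementation; the function name decide and the sample URLs are hypothetical, while the domain suffix and the message come from the example above.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIX = "app1.mycompany.com"
DISABLED_MESSAGE = "This link has been disabled"


def decide(url: str) -> str:
    """Return "clip" for links inside the restriction, else the disabled message."""
    host = urlparse(url).hostname or ""
    # The example restricts clipping to domains ending with the suffix.
    if host.endswith(ALLOWED_SUFFIX):
        return "clip"
    return DISABLED_MESSAGE


for link in ("http://app1.mycompany.com/orders",
             "http://eu.app1.mycompany.com/",
             "http://www.example.com/"):
    print(link, "->", decide(link))
```

A variant that opens excluded links in another window would return a different action in the last branch instead of the message.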

Clipping Protected Resources


Most web pages contain resources, such as images, applets, etc. By default, such resources will be loaded directly from the portal user's browser, without going through the portal or the clipping robot:
Figure 37: Direct Resource Loading (diagram with the elements Browser, Portal Server, Clipping Portlet, RoboServer, Clipping Robot, Clipping Session, Web Site, and Resources)

This requires the resources to be directly accessible from the portal user's browser. However, in some cases, the resources are protected from direct access. For example, the resources may be protected by a firewall between the user and the web site, or the resources may be dependent on the user session.

You can solve this problem using resource clipping. Resource clipping means that the resource loading from the portal user's browser is channeled through the clipping portlet and the clipping robot. This way, the resources will be loaded by the clipping robot itself, which does have access to the resources. Note that resource clipping costs more in performance than loading the resources directly from the portal user's browser, so you should only use resource clipping for protected resources.

The default resource clipping settings can be found in the Resource Clipping tab of the Robot Configuration Window:

These default settings will be used by all clip branches that have not been configured to use individual resource clipping settings. To enable resource clipping for all resources, choose All. To enable resource clipping for selected resources only, choose Resources Matching these Rules, and adjust the default rules to cover the particular resources. To configure resource clipping individually for a specific clip branch, go to the Clip step of the branch. In the Resource Clipping tab, select Specify, and configure the resource clipping there.

Configuring the Clipped User Actions


As mentioned earlier, when the portal user interacts with a clipping portlet, that interaction is either handled locally in the user's browser, or captured by the clipping portlet and forwarded to the clipping robot for execution. You can configure which user actions should be handled locally in the browser and which should be captured and forwarded to the robot. The default settings for this can be found in the User Actions tab of the Robot Configuration Window:

These default settings will be used in all clip branches that have not been configured with individual settings. To configure a clip branch with individual settings, go to the Clip step of that branch. In the User Actions tab, choose Specify, and configure the settings there.

By default, all high-level user actions, such as clicking and submitting forms, will be captured and forwarded to the robot. Low-level user actions, such as moving the mouse or entering characters on the keyboard, will be handled locally in the portal user's browser. These default settings reflect that low-level user actions typically occur in rapid succession and require quick feedback to the user, so it is usually not desirable to trigger robot executions for such actions. On the other hand, handling these actions locally in the user's browser means that no page loading, JavaScript, etc. can be triggered by these actions, since this requires a robot execution. So, for example, if the user enters something in a text field, the text will be entered, but no JavaScript will be triggered for the individual key presses, even if there are JavaScript event handlers registered for the individual key presses in the text field.

If you are clipping from a site where it is important to trigger JavaScript for specific low-level user actions, try enabling robot execution for these actions. For example, if you are clipping from a site that has JavaScript-based menus, and these menus do not work correctly with the default settings, try enabling robot execution for some of the mouse actions, such as the Move Mouse To and Move Mouse From actions.

An alternative approach for such cases is to rewrite the JavaScript in the pages to not be dependent on low-level user actions. This can be done on-the-fly as part of the clipping, without affecting the original web site, using the principles described in the section Modifying Clips earlier in this chapter.

The Portlet View has a special view mode that is useful when you want to see which user actions will trigger a robot execution. To switch to this view mode, click the icon in the Portlet View toolbar. This will show green boxes around the elements on the page for which the user actions will trigger robot execution, and red boxes around the ones that will be handled locally in the user's browser. To switch back to the normal view mode, click the icon.

Passing Additional Information to a Clipping Robot


You can pass additional information to a clipping robot from the clipping portlet. For example, you can pass selected user preferences to the robot for use in automatic navigation sequences and pre-filling of forms.

The additional information is passed to the robot as name-value-pair properties in the ClipRequest.properties attribute:

When you generate the clipping portlet from the robot, you can configure where to obtain the properties from in the clipping portlet, such as the user preferences configured for the portlet. See the online documentation for the portlet generation wizards. In the clipping robot, you can retrieve the properties from the ClipRequest.properties attribute using the Get Property data converter. Here is an example of an Enter Text step that retrieves a property and enters the value into a text field:

When you are working with the robot in RoboMaker, you can enter test values for the properties in the ClipRequest.properties attribute in the Input Objects tab of the Objects View. Remember to click Apply after editing the properties.
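Conceptually, retrieving a property is a lookup by name in a set of name-value pairs. The following Python sketch illustrates the idea only. The one-pair-per-line name=value text format, the function get_property, and the sample property names are assumptions for the illustration; they do not describe the actual storage format of ClipRequest.properties or the Get Property data converter.

```python
def get_property(properties: str, name: str, default: str = "") -> str:
    """Look up a value in text that holds one name=value pair per line."""
    for line in properties.splitlines():
        key, separator, value = line.partition("=")
        # Lines without "=" are ignored; names are compared exactly.
        if separator and key.strip() == name:
            return value.strip()
    return default


# Hypothetical test values, such as user preferences passed from a portal.
test_properties = "city=Copenhagen\nlanguage=en"
print(get_property(test_properties, "city"))
print(get_property(test_properties, "zip", "n/a"))
```

Returning a default for a missing name is a design choice in this sketch, which keeps a pre-fill step from failing when a preference has not been set.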

Deploying a Clipping Robot


In this section we will explain how to deploy a clipping robot as a portlet.

Generating the Clipping Portlet


When your clipping robot is ready to be deployed as a portlet, you can generate the clipping portlet automatically using one of the code generation wizards in RoboMaker. In the Clip menu in the Main Window of RoboMaker, select the wizard that corresponds to the portal that you want to deploy to. For example, if you want to generate a standard Java Portlet that complies with the JSR-168 standard, select the Java Portlet Clip Wizard in the Standard Java menu:

Follow the instructions in the wizard. Refer to the online documentation for the wizards, or the Code Generation Guide, for help on this.

Note that you can choose where the clipping portlet should obtain the robot library containing the clipping robot from. This is done in the Robot Library property of the Deployment page of the wizard. If you want the robot library to be included in the clipping portlet, so that the clipping portlet is self-contained, select Embedded in Request.

During development of the robot, it can be useful to select Default Robot Library instead. The default robot library is the library in the current project of the installation, i.e. the project that you are currently working on. With this selection, the clipping robot will be loaded from your current project before each robot execution. This means that any changes you make to the robot will take effect immediately in the portlet, so you do not have to re-generate the portlet for changes in the robot to take effect. Note that this applies only if you are running against a RoboServer on your local machine.

Handling Clipping Sessions on RoboServer


When a clipping portlet is accessed by a user, a new clipping session will be created on one of the available RoboServers. That RoboServer will then continue to handle that user's interaction with the clipping portlet, for the duration of the clipping session.

Clipping sessions reside in memory on the RoboServers. Therefore, it may be necessary to adjust how many active clipping sessions are allowed on each RoboServer. You can do this using the maxClippingSessions parameter to RoboServer. See the RoboServer User's Guide for more on this.

You can also specify various timeout values for the clipping sessions. This is done individually for each clipping robot, in the Robot Configuration Window:

In the Session Timeout property, you can specify the basic timeout of the clipping sessions created for this robot. When a clipping session has been inactive for the specified period of time (in minutes), the clipping session will be ended automatically by RoboServer. If the user interacts with the clipping portlet after this, a new clipping session will be created. In the Allow Session Termination after property, you can specify whether it is allowed for RoboServer to end a clipping session early when it needs to make space for a new clipping session. If you leave the property empty, RoboServer will never end a clipping session early, i.e. before the timeout specified in the Session Timeout property. If you specify a value in the property, this means that RoboServer is allowed to end a session early if the session has been inactive for at least the specified period, and RoboServer has reached its maximum allowed number of clipping sessions and needs to create a new clipping session.
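The interplay of the two properties can be sketched in Python as follows. This is a simplified model of the rules described above, not RoboServer's implementation. The class ClipSession, the function sessions_to_end, and its parameters are hypothetical, and the choice of ending the longest-idle session first is an assumption, since the text does not say which eligible session is picked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClipSession:
    name: str
    idle_minutes: float  # time since the user's last interaction


def sessions_to_end(
    sessions: List[ClipSession],
    session_timeout: float,
    allow_termination_after: Optional[float],
    max_sessions: int,
    new_session_needed: bool,
) -> List[str]:
    """Return the names of the sessions that would be ended."""
    # Sessions that have been inactive for the full timeout are always ended.
    ended = [s.name for s in sessions if s.idle_minutes >= session_timeout]
    remaining = [s for s in sessions if s.idle_minutes < session_timeout]

    # Early termination only applies when the server is full, a new session
    # is needed, and the property has a value (None models an empty property).
    server_full = len(remaining) >= max_sessions
    if new_session_needed and server_full and allow_termination_after is not None:
        candidates = [s for s in remaining if s.idle_minutes >= allow_termination_after]
        if candidates:
            # Assumption: the session that has been idle longest is ended first.
            ended.append(max(candidates, key=lambda s: s.idle_minutes).name)
    return ended


sessions = [ClipSession("a", 45), ClipSession("b", 12), ClipSession("c", 3)]
print(sessions_to_end(sessions, 30, None, 2, True))  # early termination not allowed
print(sessions_to_end(sessions, 30, 10, 2, True))    # an idle session may be ended early
```

In the second call, one session has passed the 10-minute threshold without reaching the 30-minute timeout, so it becomes eligible only because the server is full and a new session is needed.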

How to Handle Errors


A step in a robot may generate an error when it is executed. For example, this will happen if the tag finders cannot find the tag to work on, or if the step action generates an error. The default behavior of a robot is to report the error immediately, and to abort the execution of the steps beyond the one that failed. However, by configuring the error handling properties of the steps in the robot, you can change this behavior. For example, you can make the robot skip a step that generates an error, or you can make it try alternative branches. In this chapter, you will learn how to do this. Before we start, note that the error handling behavior that we describe here applies to runtime execution of a robot (i.e. execution in RoboRunner, RoboServer or RoboDebugger), not to the execution in the Main Window of RoboMaker. In the Main Window of RoboMaker, an error is normally reported immediately, and the execution of the subsequent steps is aborted. One exception to this is when the Ignore and Go to Next Step option in the Own Errors property is used (see below), in which case RoboMaker does ignore the error and executes the next step, just as it would during runtime execution.

Handling a Step's Own Errors


When a step generates an error, this is referred to as the step's own error. Such an error is handled according to the Own Errors property of the step (found in the Error Handling tab in the Step View). This property has four options: Report Here, Send Backwards, Ignore and Go to Next Step, and Ignore and Skip Branch.

The first option, Report Here, is the default one. It causes the error to be reported immediately, and the execution of the steps beyond the given step to be aborted. For example, consider this robot:
[Figure: robot diagram in which one step is marked "Generates Error"]

Assume that an error occurs in step B. Since it has the default Report Here option selected (as indicated by the absence of an icon in the step), an error report will be generated immediately, and steps C and D will not be executed. The error report will specify that it was generated at the location of step B, and it will contain a single error message describing the error and the location where it occurred (also at step B). The second option for the Own Errors property is Send Backwards. This option sends the error backwards to the preceding step in the robot, without executing the steps beyond the step that failed. What happens to the error in the preceding step depends on how that step has been configured to handle received errors. This will be explained in the next section. The third option for the Own Errors property is Ignore and Go to Next Step. This option causes the error to be ignored and the execution to proceed with the next steps after the one that failed. In other words, the step that failed is simply skipped. Take a look at the robot below:
[Figure: robot diagram in which one step is marked "Generates Error"]

Here, again, step B generates an error. However, the Ignore and Go to Next Step option has been selected for the step, as indicated by the icon. This causes the error to be ignored, and the execution to continue with steps C and D. These steps will both be given the same input robot state as was given to step B. The Ignore and Go to Next Step option is useful if you have a step that will succeed only in some cases, and which should simply be skipped in the cases where it fails. Note that the Ignore and Go to Next Step option is not allowed if the step has a loop action.

The fourth option for the Own Errors property is Ignore and Skip Branch. This option causes the error to be ignored and the execution of the steps beyond the given step to be aborted. In other words, the step that failed and all steps following it are simply skipped. Please see the chapter How to Loop Through Pages for an example of how to use this option.
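The three options that do not involve a preceding step can be summarized with a small Python model of a linear robot. This is a sketch of the semantics described above, not RoboMaker code; the names OwnErrors and run_chain are hypothetical, and Send Backwards is left out because its outcome depends on the Received Errors property of the preceding steps.

```python
from enum import Enum


class OwnErrors(Enum):
    REPORT_HERE = "Report Here"
    IGNORE_NEXT = "Ignore and Go to Next Step"
    IGNORE_SKIP = "Ignore and Skip Branch"


def run_chain(steps, failing, policy):
    """Run step names in order; steps in `failing` generate an error.

    `policy` maps a step name to its Own Errors option (default Report Here).
    Returns the steps that executed successfully and the error reports.
    """
    executed, reports = [], []
    for name in steps:
        if name not in failing:
            executed.append(name)
            continue
        option = policy.get(name, OwnErrors.REPORT_HERE)
        if option is OwnErrors.IGNORE_NEXT:
            continue  # skip only the failing step and carry on
        if option is OwnErrors.REPORT_HERE:
            reports.append(f"error at step {name}")
        break  # both remaining options abort the steps beyond this one
    return executed, reports


print(run_chain("ABCD", {"B"}, {}))
print(run_chain("ABCD", {"B"}, {"B": OwnErrors.IGNORE_NEXT}))
print(run_chain("ABCD", {"B"}, {"B": OwnErrors.IGNORE_SKIP}))
```

The only difference between Report Here and Ignore and Skip Branch in this model is whether an error report is produced; in both cases steps C and D are not executed.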

Handling a Step's Received Errors


When a step receives one or more errors that have been sent backwards from a subsequent step in the robot, these errors are called received errors. Such errors are handled according to the Received Errors property of the step. There are two possible options: Report Here and Send Backwards. To see what these options do, take a look at this robot:
[Figure: robot diagram in which one step is marked "Reports Error" and another "Generates Error"]

Here, step D generates an error and has been configured to send its own errors backwards, as indicated by the icon. Therefore, the error is sent backwards to step C, without executing step E. In step C, the Received Errors property has been set to Send Backwards, as indicated by the icon. This means that received errors are simply sent further backwards to the preceding step, in this case step B. The Received Errors property of step B has been set to Report Here (as indicated by the absence of an icon). This means that received errors are reported at this point. The generated error report will specify that it was created at the location of step B, and will contain an error message specifying that the error was generated at the location of step D. After reporting the error, execution will proceed to the next branch or iteration that is to be executed, i.e. no execution of step B or the subsequent steps is done. In the example shown here, sending back the error serves no real purpose. It simply changes the location where the error is reported. However, sending back errors becomes useful if you combine it with the branching mode called Until Successful Branch.
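The way an error travels backwards until some step reports it can be sketched in Python. This models only the example above; the function report_location is hypothetical, the presence of a step A before step B is an assumption, and the sketch returns None if every preceding step sends the error backwards, since that case is not covered here.

```python
def report_location(steps, failing, received):
    """Return the step at which an error sent backwards from `failing` is reported.

    `received` maps a step name to "report" or "send" (default "report"),
    mirroring the Received Errors options Report Here and Send Backwards.
    """
    index = steps.index(failing)
    # Walk backwards through the steps that precede the failing one.
    for name in reversed(steps[:index]):
        if received.get(name, "report") == "report":
            return name
    return None  # not covered by this sketch


steps = list("ABCDE")
print(report_location(steps, "D", {"C": "send"}))  # C passes the error on
print(report_location(steps, "D", {}))             # C reports it directly
```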

Using the Until Successful Branch Branching Mode


Consider the robot below:
[Figure: robot diagram in which steps are marked "Generates Error"]

In this robot, we have used the Until Successful Branch branching mode in step B. This is indicated by the dashed connections from the step. In this branching mode, the branches will be executed one at a time until one of them is successful. Successful means that the branch does not send any errors backwards. In this example, step D generates an error, and the steps have been configured to send this error backwards to step B. This means that the branch is considered to have failed, and the second branch is executed, according to the branching mode. In the second branch, step G generates an error, which is also sent backwards to step B. Since this branch was also unsuccessful, the third branch is executed. This branch sends no errors backwards, and is therefore considered successful. Because of the branching mode, no more branches are then executed, i.e. the fourth branch is not executed. If the third branch had sent back an error, too, the fourth branch would have been executed. If the fourth branch had also sent back an error, all branches would have been considered to have failed. In this case, the errors that would have been collected at step B would then have been handled according to the Received Errors property of step B. In this example, step B has been configured to report its received errors. So, an error report containing the four received errors would have been generated, and no more execution would have been done. If step B had instead been configured to send its received errors backwards, all four errors would have been sent backwards to step A. Thus, more than one error can be sent backwards at a time. As you may have guessed, the Until Successful Branch branching mode is useful if you want a robot to try more than one approach to achieving something. Add a branch for each approach. The robot will then try the approaches one at a time until one of them succeeds. If all approaches fail, you can report the errors from all approaches. 
You can then examine the error report to figure out why none of the approaches worked. In some cases, you want to ignore the errors when all approaches fail. This can be achieved by adding an extra branch as shown below.

The extra branch has a single step containing the action named Do Nothing. As the name suggests, this action does nothing, so, in the example above, the extra branch will always execute successfully without sending back any errors. Therefore, the errors that may occur in the preceding branches will be discarded. Instead of the Do Nothing action, you can use other actions that do not generate errors. For example, you can use the Write Log action if you want an entry in the log in case all the preceding branches generated errors. Using the Until Successful Branch branching mode can be rather complex. In many cases, it is easier to use the default All Branches branching mode, and then put a step with a conditional action in front of every branch, to determine when that branch should be executed. However, the Until Successful Branch mode is useful in the cases where it is difficult or impossible to use a conditional action to determine when a particular branch should be executed.
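The Until Successful Branch mode, including the Do Nothing fallback, can be modeled in a few lines of Python. This is a sketch of the described behavior, not RoboMaker code. Here a branch is a plain function, raising RuntimeError stands in for sending an error backwards, and the names until_successful, failing, and do_nothing are hypothetical.

```python
def until_successful(branches):
    """Try branches in order until one succeeds.

    Returns (index_of_successful_branch, []) on success, or
    (None, collected_errors) if every branch sent an error backwards.
    """
    errors = []
    for index, branch in enumerate(branches):
        try:
            branch()
            # Errors from earlier branches are discarded, and later
            # branches are not executed.
            return index, []
        except RuntimeError as exc:
            errors.append(str(exc))
    # All branches failed: the collected errors would now be handled
    # according to the Received Errors property of the branching step.
    return None, errors


def failing(message):
    def branch():
        raise RuntimeError(message)
    return branch


def do_nothing():
    return None


print(until_successful([failing("step D"), failing("step G"), do_nothing, failing("unused")]))
print(until_successful([failing("first"), failing("second")]))
print(until_successful([failing("first"), do_nothing]))
```

The last call shows the fallback pattern: because the extra branch always succeeds, the error from the preceding branch is never reported.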

More Examples of Using Until Successful Branch


Now, let us look at a few more examples of using the Until Successful Branch branching mode. First, consider this robot:

In this case, we use the Until Successful Branch branching mode to try three different approaches to something. The interesting thing is that the three branches join together at the end, to share the steps that they all have in common. You can join branches as much as you like, and you can even send back errors from the common steps if you want to. Another common example is shown below:

Here, we use a branch to jump past three steps if one of them fails. This is useful if you want to skip more than one step in the case of an error. If you want to skip just one step, use the Ignore and Go to Next Step option in Own Errors for that step.

Viewing the Error Handling in the Robot View


By default, the error handling configuration of a robot is shown in the Robot View of RoboMaker and RoboDebugger. To hide the error handling configuration, uncheck the Show Error Handling checkbox in the Robot View Options submenu, which can be found in the View menu in RoboMaker and RoboDebugger.


How to Write a Robot with Input Objects


Robots taking input objects are probably the most advanced kinds of robots that you can write in RoboMaker because the requirements of these robots are often quite strict. Generally, they should be speedy and highly reliable. (These requirements might, of course, apply to all robots; however, typically robots taking input objects are executed in real-time whereas normal extraction robots are executed in batch; also, extracting news objects incorrectly is typically less critical than filling out a bank transfer incorrectly.) Let us look at the speed and reliability requirements in turn.

The speed requirement can be fulfilled through optimization. In RoboMaker, the step actions involved with navigation, i.e. Click and similar actions, consume by far the larger part of the total execution time (80% or more). Hence, you should avoid any needless navigation whenever possible. One way to eliminate needless navigation is by linking directly to the relevant information. For example, instead of navigating to a login form from the front page you should load the login form directly. Sometimes, such direct navigation is not possible because of sessions and cookies, but often it is.

The reliability requirement involves writing the robot in such a way that it either carries out its task smoothly, or reports that it cannot. This is not as easy as it may sound. In fact, there are situations in which a robot cannot possibly know whether it succeeded or not. For example, if the robot submits an order and the web site does not transmit a response, then what happened? (You have probably experienced something similar yourself in a normal browser.) However, a robot should be written so that it minimizes such uncertainties whenever possible. In more concrete terms, each time the robot interacts with some uncertain element (such as a web site), it should analyze the interaction thoroughly in order to detect possible errors. If an error is detected, then it should be reported, usually in the form of a returned object that somehow tells the application that invoked the robot what went wrong. If no errors are detected, then the robot can proceed to its next action. See the chapter How to Make Robots More Robust for more information.

When writing robots taking input objects, you use, more or less, the same step actions and data converters as for any other kind of robot. However, some step actions and data converters are especially useful with robots taking input objects. You should check the RoboHelp online entries on the following:

The Convert Attributes action for converting attribute values extracted from a web site, or converting input object attribute values before being inserted into a form.
The Test Attributes action for testing an attribute value, e.g. an input object attribute value, according to one or more conditions, such as "price < 5000".
The Get Attribute data converter for fetching an attribute value for subsequent processing or insertion into a form input field.
The Convert Using List data converter for converting content. This data converter is useful for normalizing input object attribute values for insertion into a web site form, or normalizing content extracted from a web site.
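To make the idea of normalizing and testing input values more concrete, here is a minimal Python sketch. It only illustrates the general technique (a lookup list for normalization, and a simple condition such as "price < 5000"); it is not the RoboMaker implementation, and the names `COUNTRY_LIST`, `convert_using_list`, and `price_is_acceptable` as well as the sample values are assumptions made for this example.

```python
from typing import Dict

# Hypothetical conversion list: maps values as a caller might supply them
# to the values that a web site form expects.
COUNTRY_LIST: Dict[str, str] = {
    "usa": "US",
    "united states": "US",
    "denmark": "DK",
}


def convert_using_list(value: str, conversions: Dict[str, str]) -> str:
    """Normalize a value through a lookup list.

    Raises ValueError when the value is not in the list, so that the caller
    can report the problem instead of submitting a wrong value.
    """
    key = value.strip().lower()
    if key not in conversions:
        raise ValueError(f"no conversion for {value!r}")
    return conversions[key]


def price_is_acceptable(price: float, limit: float = 5000) -> bool:
    """Condition in the spirit of "price < 5000"."""
    return price < limit


if __name__ == "__main__":
    print(convert_using_list(" United States ", COUNTRY_LIST))
    print(price_is_acceptable(4999.0))
```

Rejecting unknown values, rather than passing them through unchanged, fits the reliability requirement discussed above: the robot either fills in the form correctly or reports that it cannot.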


How to Make Robots More Robust


Web sites often change without notice. Such changes may result in the robot failing to do its task, unless you are careful. Robustness is the term used to describe how well robots cope with web site changes. The more changes the robot can deal with (and still work correctly), the more robust it is.

Robustness, however, comes at a price. It is more challenging and time-consuming to write robust robots than shaky ones. (The same goes for writing a program in any programming language.) It involves analyzing the web site in question and understanding how it responds in various situations, such as when a registration form is filled out incorrectly. In a sense, writing robust robots involves a kind of reverse engineering of the web site logic, and usually the only way to do this is through exploration.

There are two different approaches to robustness, each serving a different purpose:

Succeed as much as possible.
Fail if not perfect.

Let us look at each approach in turn.


Succeeding as much as possible might, for a robot extracting news objects, mean that it should extract as many news items as possible, as well as possible. In RoboMaker, you will use conditional actions, branches, and data converters to deal with different layouts, missing information, and strangely formatted content.

Failing when things are not perfect might, for an order submission robot, mean that it should fail immediately if it cannot figure out how to enter a field correctly, or if the order result page does not match an exact layout. In this sense, failing does not mean generating an error report. Instead, it means that the robot should return an object dedicated to describing errors and failure causes. Robots taking input objects will often choose to fail, rather than to succeed as much as possible. In RoboMaker, you will use dedicated error objects, error handling, and conditional actions to detect and handle unexpected situations.
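The contrast between the two approaches can be sketched in Python. This is a hedged illustration only, not how RoboMaker represents objects: `ErrorObject`, `extract_news_lenient`, and `validate_order_strict` are hypothetical names, and the field names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class ErrorObject:
    """Hypothetical dedicated error object returned instead of a result."""
    cause: str


def extract_news_lenient(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Succeed as much as possible: keep every usable item."""
    results = []
    for item in items:
        title = item.get("title", "").strip()
        if not title:
            continue  # skip what cannot be extracted and keep going
        date = item.get("date", "").strip() or "unknown"
        results.append({"title": title, "date": date})
    return results


def validate_order_strict(
    order: Dict[str, str],
    required: Sequence[str] = ("account", "amount"),
) -> Union[Dict[str, str], ErrorObject]:
    """Fail if not perfect: return an error object at the first problem."""
    for field in required:
        if not order.get(field, "").strip():
            return ErrorObject(cause=f"missing field: {field}")
    # Accept only plain non-negative decimal amounts such as "12.50".
    if not order["amount"].replace(".", "", 1).isdigit():
        return ErrorObject(cause="amount is not a number")
    return order


if __name__ == "__main__":
    print(extract_news_lenient([{"title": "A"}, {"date": "2007-01-01"}]))
    print(validate_order_strict({"account": "123"}))
```

The lenient function silently drops what it cannot handle, which is acceptable for news items. The strict function never guesses, and describes the failure in a returned object so that the invoking application can react to it.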

For more information on RoboMaker techniques that can be used to make robots more robust, you should consult the following chapters: How to Extract Content, How to Extract Content From a Table, How to Handle Errors, and How to Use the Tag Finders.


How to Reuse Sessions


A session is the result of browsing a website, and consists of the page, the page URL, and the cookies and authentications obtained along the way. However, obtaining a session where the wanted information is easily reached can require a number of navigation steps, such as logging in. If a robot is run frequently, and the response time needs to be very short, getting to a suitable session in the robot can require more time than is available. However, if the session could be obtained once, and then shared between robots and robot runs, great time savings could be achieved. This is exactly what session reuse provides by means of a session pool. There are two step actions used for session reuse:

The Save Session action, which saves a session in the session pool or an attribute.
The Restore Session action, which restores a session from the session pool or an attribute.

In order to restore a session from the session pool, it is necessary to identify it. A session is identified by a site name, a username and a password. Let's look at the robot for a website that requires that you log in. We want to share the session of a logged-in user. The robot would look something like this:

When the robot is run, it will first ask for a session from the session pool, and if one exists with the given identification parameters, that session will be used. If no session with the given parameters exists, the step will fail, and the second branch will be executed, which does the logging in by actually going through the necessary web pages, and finally stores the obtained session so that other robot runs can make use of it.

After a session has been obtained, a page should be loaded, and on that page some conditional action should be applied to see that the session is truly still active. The conditional action should be set to generate an error when stopping, and these errors should be sent back, so the second branch is used if a session obtained from the session pool has become inactive.

The session pool optimization need not be used only for logins, but could also be used for long navigations that are necessary to obtain certain cookies, or other time-consuming tasks. A robot should normally never rely on a session being available, but always provide a fallback for obtaining a session.

In RoboMaker it is important to understand a little about the inner workings of the session pool if you want to utilize it. This is because the execution of a robot in RoboMaker is not controlled by the natural flow of a robot run, but by the user interaction. First a session should be stored by executing the step containing the Save Session action. Selecting the step following the Save Session step does this. After this the Restore Session action will be able to pick up the stored session. Saved sessions will remain in the session pool even if you refresh the cache by clicking the corresponding icon, or after loading different robots. There is currently no way to remove a session from the session pool besides restarting RoboMaker.
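The restore-or-log-in pattern described in this chapter can be sketched in Python. This is an illustrative model under stated assumptions, not the RoboMaker session pool: `SessionPool`, `obtain_session`, and the `log_in` and `is_active` callbacks are hypothetical names, and a plain dictionary stands in for a session.

```python
from typing import Callable, Dict, Optional, Tuple

# A session is identified by a site name, a username, and a password.
SessionKey = Tuple[str, str, str]


class SessionPool:
    """Hypothetical in-memory pool of saved sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionKey, dict] = {}

    def save(self, key: SessionKey, session: dict) -> None:
        self._sessions[key] = session

    def restore(self, key: SessionKey) -> Optional[dict]:
        return self._sessions.get(key)


def obtain_session(
    pool: SessionPool,
    key: SessionKey,
    log_in: Callable[[], dict],
    is_active: Callable[[dict], bool],
) -> dict:
    """First branch: reuse a pooled session that is still active.
    Second branch (fallback): log in and save the new session."""
    session = pool.restore(key)
    if session is not None and is_active(session):
        return session
    session = log_in()
    pool.save(key, session)
    return session
```

The `is_active` check plays the role of the conditional action applied after the session has been restored: if the pooled session has become inactive, the fallback branch logs in again and replaces it.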


How to Debug a Robot


In this chapter, we will take a closer look at how to debug a robot using RoboDebugger, the debugger that is built into RoboMaker. RoboDebugger allows you to execute a robot in runtime mode, i.e. in the same way as it will be executed by RoboRunner or RoboServer. This allows you to check that the robot does what you expect.

Basic Debugging
To open RoboDebugger, click the corresponding icon in RoboMaker. This opens the RoboDebugger Main Window, which is shown below. RoboDebugger always works on the current robot in RoboMaker. To start debugging the robot, click the corresponding icon.

Figure 38: The RoboDebugger Main Window

As the robot is being executed in RoboDebugger, you can watch the current location in the Robot View of RoboDebugger. You can also watch the results of the execution in the main panel. In the Input/Output tab, the Input panel shows the input objects, if any, and the Output panel shows all objects that have been returned so far during the execution. If the robot has no input objects, the Input panel is not shown. In the Error Reports tab, you can see all error reports that have been generated so far during the execution. In the Log tab, you can see what has been written to the log so far during execution. In the State tab, you can see the robot state, if any. Also, in the Summary panel to the right of the main panel, you can see a summary of the execution, containing the number of returned objects and the number of error reports generated.

It is important to understand that RoboDebugger performs its own execution of the robot, independently of the execution done in RoboMaker. Therefore, RoboDebugger has its own current step and its own current robot state, independent of the current step and current robot state in RoboMaker. In RoboDebugger, the current step is the step that is about to be executed, or is being executed, in the debugging process, and the current robot state is the input to that step.

You can stop the debugging at any time by clicking the corresponding icon. You can also make the debugging stop when certain events occur. This is done in the Stop When panel. Here, you can choose whether the debugging should stop when objects are returned, when errors are reported, and when breakpoints (see below) are reached. Of course, debugging will always stop when the execution of the robot has completed.

When debugging has stopped, you can see the reason for the stop in the status bar at the bottom of the RoboDebugger window. If the debugging has stopped before the execution of the robot is complete, you can watch the current robot state in the State tab. The Objects, Windows, Cookies, and Authentications sub-tabs show the robot state in the same way as in the State View in RoboMaker. The Global Variables sub-tab shows the global variables, if any. The Error sub-tab shows the error report, if the execution stopped because an error report was generated.

If debugging has stopped before the execution of the robot is complete, you can resume the debugging by clicking the corresponding icon. You can also restart the debugging by clicking the corresponding icon. This will abort the current debugging process and make RoboDebugger ready to start a new debug from the start of the robot. The debugging is also restarted automatically whenever the current robot is modified or replaced by another robot in RoboMaker.

If the robot has input objects, the input values of these can be edited in the Input panel, and when you press Enter, the debugging will be restarted with the new input values. The input values cannot be edited while a debug is running, so if you want to change the input values, you must first restart the debugging.

If you have a really big or long-running robot, you may want to uncheck the Show Location During Execution option in the Robot View Options submenu of the View menu. This will cause the current location not to be shown in the Robot View during the execution, which will speed up the execution.


Debugging from the Current Location in RoboMaker


You can start debugging from the current location in RoboMaker by clicking the corresponding icon in RoboMaker. This will open RoboDebugger and make it execute as directly as possible to the current location in RoboMaker. When the location is reached, the execution will stop. You can then start the debugging from this location by clicking the corresponding icon.

This feature is useful if you just want to debug a part of the robot, like a specific branch or a specific iteration of a loop action.

If RoboDebugger is already debugging the robot when you press the icon, it will have to restart the debugging before it executes to the location, and it will ask you for permission to do this.

Making RoboMaker Go to a Location from RoboDebugger


When debugging has stopped at some location in the robot, you can make RoboMaker go to that location by clicking the corresponding icon in RoboDebugger. This allows you to examine that location closer in RoboMaker, and perhaps modify the steps around that location, or modify some other part of the robot.

You can also make RoboMaker go to the location where an object was returned. To do this, select the object in the Output panel in the Input/Output tab and click the Goto button in the lower right corner of the tab. This is useful if an object has not been extracted correctly, and you want to find out why.

You can also make RoboMaker go to the location where an error report was generated, or where an error occurred. When you view an error report in the Error Reports tab or the Error Report sub-tab in the State tab, you can click the Goto button in the upper right corner of the error report to go to the location where the report was generated. You can also click on a Goto button to the right of a specific error to go to the location where that error occurred. This is, of course, very useful when you want to find the reason for the error and fix the problem.

When you have made RoboMaker go to a location, and have done what you want in RoboMaker, you can resume the debugging. If you haven't modified the robot, you can simply click the corresponding icon in RoboDebugger. If you have modified the robot, the debugging will have been automatically restarted, so you cannot resume it directly. Instead, you can start a new debugging session from the current location in RoboMaker by clicking the corresponding icon in RoboMaker.


Using Breakpoints
You can make RoboDebugger stop at a specific step in the robot by setting a breakpoint on that step. The easiest way to do this is to right-click on the step in the Robot View and select Toggle Breakpoint in the pop-up menu. The breakpoint will be indicated by a small icon in the step.

When RoboDebugger reaches a breakpoint during debugging, it will stop, unless you have chosen not to stop at breakpoints in the Stop When panel. You can resume the debugging by clicking the corresponding icon.

You can remove the breakpoint from a step by right-clicking on it and selecting Toggle Breakpoint. If you select one or more steps, you can remove all breakpoints of these steps by selecting Remove Breakpoints. You can also remove all breakpoints in the robot by clicking the corresponding icon.

Single-Stepping
You can make RoboDebugger execute one step at a time. This is called single-stepping. It is useful if you want to examine the execution very closely. You can single-step when RoboDebugger is ready to start a new debug, or when it has stopped during a debug. To execute the next step, click the corresponding icon. RoboDebugger will then execute that step and stop. You can then click the icon again to execute the next step, and so on. At any step, you can also resume normal execution by clicking the corresponding icon.

Using Environments
RoboDebugger includes a feature for running a robot using environments. Generally, the environments determine how returned objects are stored, how messages (including error reports) are processed and stored, etc. This section will only describe how you can use the environments to run a robot that generates a file containing the returned objects. The returned objects will be stored in either CSV or XML format. If you wish to learn more about environments, you should consult the RoboRunner User's Guide, and the RoboHelp online entry on environments.


To output the returned objects to a file, click the icon in RoboDebugger to enable the use of environments. (You can click it again to disable the use of environments.) Click the icon to configure the environments. This opens the Configure Environments window as shown below:

Select the "File Storage Environment", and click the icon to configure it. This opens the File Storage Environment Configuration window as shown below:

For the File Name property, enter the name of the file that the returned objects should be stored in. For the File Format property, select the format in which to store the returned objects. When done, click "OK" to return to the RoboDebugger Main Window.

To start the debugging process, click the corresponding icon. When the debugging process completes, the file you specified above will have been created.

Note that you cannot change the current environment settings during a debug. To change the settings, you need to restart the debug first by clicking the corresponding icon.
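To give a feel for the difference between the two file formats, the following Python sketch serializes a list of returned objects as CSV and as XML. The exact file layout that RoboMaker produces is not shown in this guide, so the column order, the tag names `objects` and `object`, and the function names are assumptions made purely for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_csv(objects: List[Dict[str, str]]) -> str:
    """One header row with the attribute names, then one row per object."""
    if not objects:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(objects[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(objects)
    return buffer.getvalue()


def to_xml(
    objects: List[Dict[str, str]],
    root_tag: str = "objects",
    item_tag: str = "object",
) -> str:
    """One element per object, with one child element per attribute.

    Attribute names must be valid XML element names.
    """
    root = ET.Element(root_tag)
    for obj in objects:
        item = ET.SubElement(root, item_tag)
        for name, value in obj.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    returned = [{"title": "A", "price": "10"}, {"title": "B", "price": "20"}]
    print(to_csv(returned))
    print(to_xml(returned))
```

CSV suits flat objects that all share the same attributes and is easy to open in a spreadsheet, while XML keeps the attribute names next to each value.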


How to Use the Browser Tracer


Sometimes things do not behave as expected and it can be difficult to figure out just what is going on on a complex website. The Browser Tracer can assist you in this. It can trace JavaScript execution and HTTP traffic in RoboMaker and external browsers and compare these traces side-by-side so that differences can easily be identified. The Browser Tracer is available from the Tools menu in RoboMaker.

Setting Up a Browser
A browser, such as Internet Explorer, can be traced by setting it up to use a special proxy server which is built into RoboMaker and started when RoboMaker starts. This proxy server typically runs on port 9999, but if you start multiple instances of RoboMaker, additional instances will use different ports. You can see the exact port number in the Browser Tracer window.

In Internet Explorer, set up the proxy server by opening Internet Options and choosing LAN Settings from the Connections tab. Enable "Use a proxy server for your LAN" and type "localhost" in the Address field, and 9999 in the Port field. You should also clear the browser's cache because cached JavaScript files cannot be traced.

Tracing
To record a trace for either RoboMaker or a browser connected through the Browser Tracer's proxy, click the recording icon for the source you want to trace. While recording, things may run much slower than normal since vast amounts of data are collected. Thus, you should make sure to disable recording by clicking the icon again once you have traced what you wanted.

In a typical tracing scenario you would do the following:
1. Enable trace recording for RoboMaker.
2. Execute the step action in RoboMaker that you are interested in, say, a Load Page.
3. Disable trace recording for RoboMaker.
4. Enable trace recording for the proxy.
5. Perform the exact same actions in your browser, say, load a page.
6. Disable trace recording for the proxy.

Now, you have produced two traces which you can compare side-by-side in the difference view.


The Difference View


In the difference view two traces are shown side-by-side. Conflicts are highlighted with a blue color, trace entries that only occur in the left view are gray, and entries that only occur in the right view are green.

Figure 39: The Browser Tracer Window
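The three kinds of differences shown in the difference view (conflicts, entries only on the left, and entries only on the right) can be illustrated with a small Python sketch. The guide does not describe how the Browser Tracer compares traces, so this example substitutes the `difflib.SequenceMatcher` class from the Python standard library; the function name `classify_differences` and the sample trace entries are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Difference = Tuple[str, List[str], List[str]]

# Map difflib opcode tags to the three categories of the difference view.
KINDS = {"replace": "conflict", "delete": "left only", "insert": "right only"}


def classify_differences(left: List[str], right: List[str]) -> List[Difference]:
    """Return (kind, left entries, right entries) for every differing span."""
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    differences: List[Difference] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical entries are not differences
        differences.append((KINDS[tag], left[i1:i2], right[j1:j2]))
    return differences


if __name__ == "__main__":
    robomaker_trace = ["GET /", "GET /a.js", "POST /login"]
    browser_trace = ["GET /", "GET /a.js?v=2", "POST /login", "GET /home"]
    for difference in classify_differences(robomaker_trace, browser_trace):
        print(difference)
```

Each returned tuple corresponds to one highlighted region: a conflict has entries on both sides, while the other two kinds have entries on one side only.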

JavaScript Trace
Below each JavaScript trace, the JavaScript source code for the currently selected trace entry is shown. When a trace entry is selected, the corresponding source code line is highlighted in the source view. The trace entry is the runtime result of the execution of the highlighted source code line. Each source code line may, of course, be executed multiple times, in which case multiple trace entries are produced - all corresponding to the same source code line. Stepping through trace entries can help you understand how a piece of JavaScript code works.

HTTP Trace
The HTTP trace shows HTTP traffic. Selecting a trace entry shows the details about that HTTP event in the detail view below the trace. The detail view shows the request and response headers, as well as the request and response data sent. Normally, only POST requests will contain request data.

Saving and Loading Trace Sessions


Trace sessions can be saved and loaded at a later time. A trace session contains both the RoboMaker trace and the proxy trace, and both JavaScript and HTTP traces. Saving a trace session can be useful if it is large and you want to look at it in detail at a later time, or if you want to mail it to someone else. Note that bug reports submitted from RoboMaker will automatically contain the current trace session, if any.

