RoboSuite Technical White Paper
A Technical Introduction to the World-Leading Web Integration Platform
Revision 2.0, March 2005 Kapow Technologies http://www.kapowtech.com
Copyright 2004 Kapow Technologies ApS. http://www.kapowtech.com All rights reserved.
ROBOSUITE TECHNICAL WHITE PAPER
Contents
Introduction
  The Challenges of Integration
  Types of Integration
  The RoboSuite Solution
  Benefits of the RoboSuite Approach
Solutions
  Web Clipping
  Data Collection
  Application Integration
Technology
  RoboSuite Overview
    RoboSuite Architecture
    Robot Basics
    Handling Changes in Web Interfaces
  The Visual Development Environment
    ModelMaker
    RoboMaker
  The Runtime Environment
    RoboServer
    RoboSuite Control Center
    RoboRunner
    RoboManager
Platform Requirements
More Information
About Kapow Technologies
Introduction
The RoboSuite product from Kapow Technologies is a unique middleware platform for integration to web-enabled applications. This document will give a technical introduction to the RoboSuite platform, and explain the advantages that are gained over traditional integration methods.
The Challenges of Integration
Enterprises face increasingly serious IT integration problems, with numerous critical applications that are not designed to interact with each other. Enterprises also need connections to applications outside of the enterprise, such as applications of customers, suppliers, distributors, trading partners, or government agencies.
The traditional application integration approach is to modify the applications to be integrated, adding new interfaces that expose the desired functionality and data. This approach results in expensive, long-running projects involving highly skilled people, high risk, high architectural impact, and considerable investments in new integration middleware such as messaging systems or object request brokers. For applications outside of the enterprise's realm, such modifications may not even be possible, or will require joint projects between enterprises.
At the same time, enterprise IT organizations are pressed on budgets, with fixed or reduced resources. As a result, many important integration projects are never started. For the enterprise, the consequences are reduced productivity, increased operational costs, missed business opportunities, and reduced business agility.
The industry efforts around web services promise to deliver a standard for programmatic interfaces to applications. This will potentially allow plug-and-play integration between applications, once web services have been created for them. However, this still leaves the problem of actually implementing these web services. Modifying an existing application to expose its functionality and data in a programmatic interface will still be a considerable task, whether the interface is designed as a web service, or using one of the traditional approaches, such as RPC, RMI, or messaging.
Types of Integration
Figure 1 is a generalized illustration of possible integration points into a given application. Conceptually, three types of connections can be made: connections at the presentation layer, at the functional layer, or at the data layer.
[Figure: an application with integration points at three layers: presentation (UI, Web), function (API, Web Service), and data (Database).]
Figure 1: Application Integration Points
The presentation layer is the user interface, either web-based or a platform-specific GUI or terminal interface. This user interface allows a human to interact with the application, and is not as such intended for use by other applications to connect.
The functional layer is the most natural place for programmatic access to an application, and is done by integrating with an API or using web services. This will provide direct access to the business logic of an application. In the cases where these integration points do not exist, they must be created.
Integration at the data layer will typically be done with connections to one or more databases. This is an effective way to access data, but does not provide access to the business logic.
The RoboSuite Solution
Kapow Technologies provides a unique and innovative solution to the challenges of integrating and web service-enabling applications. With RoboSuite, integration can be done fast, iteratively, at low cost, non-intrusively, and with low skill requirements. The key to the RoboSuite solution is to exploit the fact that many applications already have an HTML-based web interface to their functionality and data. Although this web interface is intended for human users, Kapow's RoboSuite product can turn the web interface into a well-defined programmatic interface that exposes the full functionality and data of the application. This is illustrated in Figure 2.
[Figure: RoboSuite connects to the web UI at the presentation layer of an application and exposes new integration points: a web interface, an API, a web service, and data.]
Figure 2: RoboSuite Integration Points
As the figure shows, RoboSuite accesses the presentation layer of an application through the web interface. Depending on the type of integration, RoboSuite can make one or more of the following integration points available: a modified web interface, an API, a web service, or just data. In terms of web services, RoboSuite can be used to easily web service-enable an application that has a web interface.
The new integration points are created in an easy-to-use visual development environment, without changing a single line of code in the web interface or application. Using RoboSuite, other applications can access the full functionality and data of the application, as if the application itself had been changed to provide that interface. This can be applied in powerful solutions, such as:
• Enterprise portals: Content and functionality of existing web-enabled applications can be used in an enterprise portal.
• Web Services: Any web-enabled application can be turned into a web service.
• Data Migration: Data from various sources in the enterprise can be migrated into a new context, such as a content management system.
• Supply-Chain Integration: Data and functionality from customers and suppliers can be integrated into supply-chain related applications.
• Data Aggregation: Services characterized by a large number of source web sites, such as price comparison sites and e-procurement sites, can be implemented efficiently.
• Market Intelligence: Information from competitors, media, patent applications, and so on, can be collected regularly for market intelligence purposes.
• Information Process Automation: In general, RoboSuite is very suitable for replacing human tasks involving an information flow between a range of web-enabled applications.
RoboSuite can be used as a stand-alone application suite, but is more typically deployed as middleware in a larger application framework. Any project that integrates to one or more web-enabled applications can benefit from RoboSuite.
Benefits of the RoboSuite Approach
RoboSuite has a number of important benefits compared to traditional integration approaches:
• Low cost: Using RoboSuite, integration becomes orders-of-magnitude less expensive than traditional integration. Using RoboSuite, integration requires lower developer skills, can be done in significantly shorter time, and can be done iteratively.
• Non-intrusive integration: Using RoboSuite, integration is done without modifying the application to be integrated, thereby lowering the risk and impact of the integration project, eliminating the need for resource consuming architectural changes and refactoring, and avoiding cross-enterprise projects. Furthermore, RoboSuite is independent of and has no impact on the existing enterprise IT infrastructure. Because of its server-based architecture, RoboSuite can be used from any application almost regardless of platform and infrastructure, without requiring any architectural or infrastructure changes.
• Iterative/incremental integration: Using RoboSuite, integration can be done iteratively. The first integration can be up and running in a matter of days or weeks, and further integration can be done iteratively, as experiences and ROI from the first integrations are analyzed. This allows enterprises to try out new business opportunities with much lower commitment than using traditional methods.
• Time-to-market: Using RoboSuite, integration projects can be completed in weeks rather than months or years. Enterprises can gain competitive advantages by leveraging their enterprise applications much faster than competitors and increasing business agility.
• Low skill requirements: A traditional integration project requires highly skilled and scarcely available developers. Extensive knowledge of the applications to integrate and of enterprise application integration in general is required. With RoboSuite, the work that involves connecting to web-enabled applications will require developers with just basic programming experience and HTML knowledge. This will reduce, or even eliminate, the need for high-skill developers in the integration project.
Solutions
This section describes the three main solution areas that RoboSuite can be used for:
• Web Clipping: Reusing the web interface of existing web-enabled applications in your portal.
• Data Collection: Collecting data from any web-enabled data source.
• Application Integration: Integrating functionality and data of any web-enabled application into your application.
Web Clipping
Web clipping is an integration method that reuses selected parts of an existing web interface in another web interface. This is done by clipping the HTML of the existing web interface and modifying the HTML for use in the new web interface. Web clipping is typically used for developing enterprise portals. A portal consists of portlets with real-time information and functionality from a variety of sources. Developing a portal typically involves significant integration tasks, because the portal requires integration to the functionality and data of other applications, some of which may not even be inside the realm of the enterprise. However, using RoboSuite, the web interfaces of existing web-enabled applications can easily be turned into portlets in the portal, by clipping the required segments of those web interfaces and reusing them in the portlets, possibly adapting them to match the visual style of the portal. No changes to the underlying applications or databases are required, no traditional integration work is required, and no new portlet-based user interfaces need to be implemented for the applications. Figure 3 shows a portal connected to two applications using RoboSuite. RoboSuite generates the portlet content dynamically by clipping the relevant content and functionality from the applications. This could be a stock chart from an external financial website, or selected functionality from a company’s web-enabled ERP system.
[Figure: RoboSuite sits between the web-enabled applications and the enterprise portal, supplying the content of the portlets.]
Figure 3: Web Clipping
A unique feature of RoboSuite is that it supports continuous clipping. This means that clipping can continue after the user interacts with a clip in a portlet. For example, assume that a search form is clipped and shown in a portlet. With continuous clipping, if the user fills in the form and submits it, the search result page can be clipped as well, and shown in the portlet. Without continuous clipping, the search result page would open up in a new browser window, causing the user to leave the portal. Thus, with continuous clipping, the user can stay inside the portal while interacting with clipped content. See Figure 4.
[Figure: a search form is clipped from the web-enabled application into the portal; after user interaction, the results page is clipped into the portal as well.]
Figure 4: Continuous Clipping
Using RoboSuite for clipping has a number of advantages for the portal developer:
• Existing applications and external content can be made available in the portal without any programming, and without any modification to those applications, using RoboSuite’s visual development environment.
• The portal developer does not need to design a new user interface. If the portal has a different visual style than the existing applications, RoboSuite can modify the clipped segments to match that style.
• Clipping can be used as the first iteration of integration, to quickly get the new portal up and running. If a closer integration to some applications is subsequently required, the clipping-based portlets can be replaced by application integration-based portlets, also developed using RoboSuite.
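The clipping idea described in this section can be illustrated with a minimal sketch. It is not RoboSuite code: it uses only Python's standard library, and the names `Clipper`, `clip_html`, and the example URL are hypothetical. It cuts one element out of a page by its `id` and rewrites relative links to absolute ones, so that links and form actions in the clipped segment still point at the source application.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta"}
URL_ATTRS = {"href", "src", "action"}


class Clipper(HTMLParser):
    """Collects the HTML of the first element whose id equals clip_id."""

    def __init__(self, clip_id, base_url):
        super().__init__()
        self.clip_id = clip_id
        self.base_url = base_url
        self.depth = 0      # nesting depth inside the clipped element
        self.done = False   # True once the clipped element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.clip_id:
            return  # still outside the segment we want
        rendered = []
        for name, value in attrs:
            if value is None:
                rendered.append(f" {name}")
                continue
            if name in URL_ATTRS:
                # Make the URL absolute so it keeps working inside the portal.
                value = urljoin(self.base_url, value)
            rendered.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(rendered)}>")
        if tag in VOID_TAGS:
            self.done = self.depth == 0
        else:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(escape(data, quote=False))


def clip_html(page, clip_id, base_url):
    """Return the clipped segment as an HTML string (empty if not found)."""
    clipper = Clipper(clip_id, base_url)
    clipper.feed(page)
    clipper.close()
    return "".join(clipper.parts)


page = (
    '<html><body><div id="nav">x</div>'
    '<form id="search" action="/find"><input name="q">'
    '<a href="help.html">Help &amp; tips</a></form>'
    "<p>footer</p></body></html>"
)
clip = clip_html(page, "search", "http://app.example.com/catalog/")
print(clip)
```

A real clipping solution would also have to restyle the segment and route form submissions back through the clipping layer, which is what continuous clipping refers to; this sketch covers only the cut-and-rewrite step.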
Data Collection
Data collection is the process of extracting data from a number of different data sources. The data collection may involve data transformation and validation, because the source data models may not match the desired output data model. There are several scenarios where RoboSuite can be used for data collection purposes, including:
• Data migration: implementing a content management system and feeding it with data from legacy web sites.
• Competitor tracking: gathering competitive product information, press releases, patent information, etc. from competitor web sites.
• Market places: gathering product information from a large number of websites, and implementing a price comparison function, or similar.
Data collection can be done on a one-off basis or on a continual basis. RoboSuite handles all the issues around data sources being temporarily inaccessible, data sources that change their web interface over time, data validation, data transformation, and so on. The result of the data collection can be made available in a number of formats, as shown in Figure 5. Typically, the data would be stored in a database, written in XML format to a file, or submitted to a content management system.
[Figure: RoboSuite collects data from the data sources and delivers the collected data to a database, to XML, or to a CMS.]
Figure 5: Data Collection
RoboSuite includes advanced support for data transformation and validation, as well as functionality that optimizes the data collection process when it is done regularly.
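As a rough illustration of the transformation, validation, and output steps described above, the following sketch normalizes raw records, drops invalid ones, and writes the result as XML and CSV. It is written in plain Python and does not use any RoboSuite interface; the record fields, the validation rules, and the function names are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def transform(raw):
    """Normalize one collected record; return None if it fails validation."""
    title = raw.get("title", "").strip()
    price_text = raw.get("price", "").replace("$", "").replace(",", "").strip()
    try:
        price = Decimal(price_text)
    except InvalidOperation:
        return None  # price is not a number
    if not title or not price.is_finite() or price < 0:
        return None
    return {"title": title, "price": str(price)}


def to_xml(records):
    """Serialize records as a small XML document."""
    root = ET.Element("books")
    for rec in records:
        book = ET.SubElement(root, "book")
        for key, value in rec.items():
            ET.SubElement(book, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(records):
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


raw_records = [
    {"title": "  Dune ", "price": "$9.99"},
    {"title": "Broken", "price": "n/a"},   # rejected: price is not numeric
    {"title": "", "price": "5"},           # rejected: title is missing
]
collected = [rec for rec in map(transform, raw_records) if rec is not None]
print(to_xml(collected))
print(to_csv(collected))
```

Keeping validation separate from serialization means the same cleaned records can be sent to any of the output formats in Figure 5.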
Application Integration
Application integration is about inter-connecting data and functionality between applications. Typically, an application connects with another application using an adapter, which is a component that translates the functions and data models between the two applications. Adapters may be written to connect two specific applications, or they may translate one application’s functions and data to an industry standard API, or to an API understood by an integration middleware component. RoboSuite acts as an adapter to any web-enabled application. Figure 6 shows an enterprise application, integration middleware, or similar, that connects to a number of applications using adapters. Applications A and B are connected with adapters that are specific to those applications and to the enterprise application. Application C has a web service interface, which connects directly to the enterprise application’s web service interface. Applications D, E and F have web interfaces, and therefore they can be connected with RoboSuite. RoboSuite acts as an adapter that connects to any web-enabled application via the web interface, and exposes the functionality and data of that application as an API or web service.
[Figure: an enterprise application (corporate portal, integration middleware, or CMS) connects to Applications A and B through Adapters A and B, to Application C through web service access, and to Applications D, E and F through RoboSuite, which uses their web UIs and exposes an API and a web service.]
Figure 6: Application Integration
Application integration is the most advanced form of integration using RoboSuite. It allows the developer full access to the functionality and data of the web-enabled applications. The definitions of how to interface the web-enabled applications are created in RoboSuite’s visual development environment, and programmatic access to applications is done through the RoboSuite APIs.
RoboSuite APIs include a Java API, a JSP tag library, and a .NET API. Interfaces for specific application frameworks are also available, including BEA WebLogic Workshop and BEA WebLogic Portal. Using the BEA WebLogic Workshop interface, web services to web-enabled applications can be created easily. Additional interfaces will become available, or can be developed on request.
Examples of application integration using RoboSuite include:
• Integration to enterprise applications such as CRM, ERP, legacy systems, etc.
• Integration to customer systems, suppliers or partners, such as order entry or shipment tracking systems.
• Account aggregation, such as building a uniform enterprise portal interface to multiple external web sites, e.g. e-procurement sites.
• Making existing web-enabled applications or portals available as portlets with re-engineered user interfaces, as opposed to just reusing the existing web interface by means of web clipping.
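The adapter role described in this section can be sketched in a few lines. The example below is a hypothetical illustration in Python, not the RoboSuite Java or .NET API: `TrackingAdapter`, `ShipmentStatus`, and `fake_tracking_page` are invented names, and the web page is produced by a stand-in function instead of a real HTTP request. The point is only that callers see a typed method call while the adapter deals with the HTML.

```python
import re
from dataclasses import dataclass


@dataclass
class ShipmentStatus:
    tracking_id: str
    status: str


def fake_tracking_page(tracking_id):
    """Stand-in for fetching a page from a web-enabled application (hypothetical)."""
    return (
        f"<html><body><h1>Shipment {tracking_id}</h1>"
        '<span class="status">In transit</span></body></html>'
    )


class TrackingAdapter:
    """Exposes a web page as a programmatic call returning a typed object."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page

    def get_status(self, tracking_id):
        html = self._fetch_page(tracking_id)
        # A regular expression is enough for this fixed page; real pages
        # usually call for a proper HTML parser.
        match = re.search(r'<span class="status">([^<]*)</span>', html)
        if match is None:
            raise ValueError("status element not found; the web interface may have changed")
        return ShipmentStatus(tracking_id, match.group(1).strip())


adapter = TrackingAdapter(fake_tracking_page)
result = adapter.get_status("ABC123")
print(result)
```

Passing the page-fetching function into the adapter keeps the parsing logic testable without network access.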
Technology
This section will provide an overview of the technical components of RoboSuite.
RoboSuite Overview
RoboSuite is a suite of applications and middleware components. It consists of a visual development environment and a runtime environment. Fundamental to RoboSuite is the concept of robots. A robot is a program that operates as an interface to a web-enabled application. A robot may be invoked as a result of a call to a RoboSuite API, and it will return one or more objects as output. It may also be scheduled to run regularly in order to perform data collection tasks. The RoboSuite platform is used to create, execute, and manage collections of robots, and to create a programmatic interface to the robots.
RoboSuite Architecture
An overview of the RoboSuite architecture is shown in Figure 7. It shows the RoboSuite runtime and development environments, the robots and object models, and the interfaces to external components.
[Figure: the RoboSuite development environment (RoboMaker, ModelMaker) produces robots and object models; the RoboSuite runtime environment (RoboServer, RoboRunner, RoboSuite Control Center, RoboManager) executes them against web-enabled applications. The RoboSuite APIs (Java, JSP, .NET), the code generation tools (Java, .NET, BEA WebLogic Workshop, BEA WebLogic Portal), and the storage environments (files as XML or CSV, SQL database) form the interfaces to external components.]
Figure 7: RoboSuite Architecture
Robots and object models are created in the visual development environment and executed in the runtime environment. A robot contains instructions on how to interface to one or more web-enabled applications, and object models define the format for input and output from robots. Robots and object models are stored as XML definitions in files, along with other files of an integration project.
The visual development environment consists of the RoboMaker and ModelMaker applications. RoboMaker is used for creating and editing robots, and ModelMaker is used for defining object models.
Robots are executed in the runtime environment, using either RoboServer or RoboRunner. RoboServer is a server application that provides a number of interfaces for invoking robots through the RoboSuite APIs. RoboRunner is an application that runs robots in “batch mode” from the command line, typically outputting the results into a storage environment, such as a database or the file system.
The runtime environment also includes monitoring tools: RoboSuite Control Center is a client application used to monitor one or more RoboServers, and RoboManager is used to view the logs created by robots that are executed in either RoboServer or RoboRunner. The RoboSuite APIs are interfaces to RoboServer functions. A number of different APIs exist for different programming environments, including Java, JSP, and .NET. The code generation tools are code wizards that automatically create ready-to-use interfacing code for specific programming environments, including Java, .NET, BEA WebLogic Workshop, and BEA WebLogic Portal. Storage environments are used by RoboRunner and RoboServer to output data that have been collected by robots. This option is typically used in data collection solutions.
Robot Basics
A robot consists of a number of steps. Steps are connected with connections that control the program flow. A robot will typically start by loading a URL to a web-enabled application. As the steps are executed, the robot navigates the web interface, assigning values to object attributes, iterating through page structures, and so on. All useful robots will also have steps where objects are returned as output (and thus either passed back through the RoboSuite API, or written to a storage environment). Figure 8 shows an example of a simple robot. It is a data collection robot that extracts book information (title, price, author) from a web site. It will submit a form, and iterate through the resulting web pages.
Figure 8: Example Robot
A step contains zero or more tag finders, a tag processor, and an error handling mechanism. The tag finders define the parts of the current page that should be processed by the tag processor. Tag finders either operate on the logical document structure, or use various types of pattern matching, or both. For example, the tag finder may be instructed to locate the second cell in the first row of the first table, or it may locate the cell with the style definition “price”, and so on.
The tag processor operates on the tags identified by the tag finders, and performs the actual action of the step. A large number of tag processors are available, all designed to accommodate the variety of situations one encounters in a web interface.
Using these building blocks, robots can be instructed to do any number of things as they navigate one or more web-enabled applications, including:
• Load pages, and follow links.
• Crawl and iterate over pages.
• Submit forms.
• Execute JavaScript.
• Iterate over page elements, such as links, paragraphs, and table rows.
• Do conditional execution based on content, object attributes, error situations, and so on.
• Extract and process data, including text, numbers, dates, images, and binary data.
• Clip and modify HTML.
• Return and process objects.
• ... and more
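The two tag finder examples mentioned above (a cell located by its position in the document structure, and a cell located by its style class) can be sketched as follows. This is a plain-Python illustration using the standard `html.parser` module; `CellCollector`, `find_by_position`, `find_by_class`, and `extract_text` are hypothetical names that only mimic the finder/processor split and are not RoboSuite components. Nested tables are not handled.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Records every <td> as [table_no, row_no, cell_no, css_class, text]."""

    def __init__(self):
        super().__init__()
        self.table = self.row = self.cell = 0
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table += 1
            self.row = 0
        elif tag == "tr":
            self.row += 1
            self.cell = 0
        elif tag == "td":
            self.cell += 1
            self.in_cell = True
            self.cells.append([self.table, self.row, self.cell, dict(attrs).get("class"), ""])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1][4] += data


def parse_cells(html):
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return collector.cells


def find_by_position(cells, table, row, cell):
    """Tag finder working on the document structure."""
    return [c for c in cells if c[:3] == [table, row, cell]]


def find_by_class(cells, css_class):
    """Tag finder working on an attribute pattern."""
    return [c for c in cells if c[3] == css_class]


def extract_text(found):
    """Tag processor: extract the text of whatever the finder located."""
    return [c[4].strip() for c in found]


page = (
    "<table>"
    '<tr><td>Dune</td><td>F. Herbert</td><td class="price">9.99</td></tr>'
    '<tr><td>Emma</td><td>J. Austen</td><td class="price">7.50</td></tr>'
    "</table>"
)
cells = parse_cells(page)
second_cell = extract_text(find_by_position(cells, 1, 1, 2))
prices = extract_text(find_by_class(cells, "price"))
print(second_cell, prices)
```

The positional finder breaks if a column is inserted, while the class-based finder survives that change but depends on the site keeping its class names.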
Handling Changes in Web Interfaces
Since a robot interacts directly with the web interface of the target application, the robot may be sensitive to changes in the structure or content of the web interface. RoboSuite includes a number of mechanisms to make robots robust against such changes or to detect changes and fail with an error. Furthermore, the RoboSuite runtime environment includes tools to monitor robots and notify administrators if problems occur.
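The source does not describe how these mechanisms work internally. One common way to get both behaviours, shown here as an assumption-laden Python sketch with hypothetical names (`extract_price`, `InterfaceChangedError`), is to try several alternative patterns in order and to fail with an explicit error when none of them matches, so that a monitoring tool can report the broken robot.

```python
import re


class InterfaceChangedError(Exception):
    """Raised when no known page layout matches any more."""


# Patterns are tried in order; the second is a fallback for a changed layout.
PRICE_PATTERNS = [
    r'<td class="price">\s*(\d+(?:\.\d+)?)\s*</td>',
    r"Price:\s*\$?(\d+(?:\.\d+)?)",
]


def extract_price(html):
    for pattern in PRICE_PATTERNS:
        match = re.search(pattern, html)
        if match:
            return float(match.group(1))
    # Failing loudly is better than silently returning wrong or empty data.
    raise InterfaceChangedError("no known price pattern matched")


old_layout = '<tr><td class="price"> 9.99 </td></tr>'
new_layout = "<p>Price: $9.99</p>"
print(extract_price(old_layout), extract_price(new_layout))
```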
The Visual Development Environment
The RoboSuite Visual Development Environment consists of two applications — ModelMaker and RoboMaker. They are used at development time to create and debug robots and object models.
ModelMaker
ModelMaker is used to create and edit object models. An object model is like a type definition in a programming language — it defines the structure of the objects that form the input and output of a robot.
Figure 9: ModelMaker
An object model consists of one or more attribute definitions, each of which defines an attribute name, type, and other information. A given robot will return (or store) objects defined by one or more object models. For example, a data collection robot for book information could return objects defined by the object model Book. Book would contain attributes such as title and author (short text types), price (number type), rating (integer), description (long text) and so on. This example is shown in the screenshot in Figure 9.
In case the objects are stored in a database at runtime, the database will have a table definition matching the object model. ModelMaker can generate the SQL necessary to create the required tables in the database.
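To make the relationship between an object model and its table definition concrete, here is a minimal Python sketch that derives a CREATE TABLE statement from the Book example. It is not ModelMaker output: the dictionary layout, the `create_table_sql` function, and the mapping from attribute types to SQL column types are all assumptions chosen for illustration.

```python
import sqlite3

# The Book object model from the text, written as plain data (hypothetical layout).
BOOK_MODEL = {
    "name": "Book",
    "attributes": [
        ("title", "short_text"),
        ("author", "short_text"),
        ("price", "number"),
        ("rating", "integer"),
        ("description", "long_text"),
    ],
}

# Assumed mapping from attribute types to SQL column types.
SQL_TYPES = {
    "short_text": "VARCHAR(255)",
    "long_text": "TEXT",
    "number": "NUMERIC",
    "integer": "INTEGER",
}


def create_table_sql(model):
    """Build a CREATE TABLE statement with one column per attribute."""
    columns = ",\n  ".join(
        f"{name} {SQL_TYPES[kind]}" for name, kind in model["attributes"]
    )
    return f"CREATE TABLE {model['name']} (\n  {columns}\n)"


sql = create_table_sql(BOOK_MODEL)
print(sql)

# Check the statement against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(sql)
conn.execute(
    "INSERT INTO Book VALUES (?, ?, ?, ?, ?)",
    ("Dune", "F. Herbert", 9.99, 5, "A desert planet."),
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Book)")]
```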
RoboMaker
RoboMaker is an easy-to-use visual environment for creating, editing, and debugging robots. RoboMaker includes wizards for creating robots that are used for web clipping, data collection and application integration.
Figure 10: RoboMaker
RoboMaker interactively executes the robot as it is being written. This means that the user can always see the web page that the current step will operate on. Using point-and-click on the web page, the user can define the action to perform on the page, such as following a link, submitting a form, or extracting information into an object. In effect, creating a robot is similar to clicking through a site with a web browser.
RoboMaker also includes RoboDebugger for debugging robots, shown in Figure 11. In RoboDebugger, you can do everything you would expect from a debugger, including single-stepping, setting breakpoints, viewing the current execution state, and viewing the extracted objects and generated error messages. At any point in the execution, you can stop the robot, go to the RoboMaker window and modify the robot, and then resume debugging in RoboDebugger from that point or from the start.
Figure 11: RoboDebugger
The Runtime Environment
Web-enabled applications can be accessed through RoboSuite using either RoboServer or RoboRunner. These two applications are described in the following sections, along with the monitoring tools RoboSuite Control Center and RoboManager.
RoboServer
RoboServer is a server application for executing robots. There are a number of ways to interact with RoboServer, each method suitable for a given programming environment. The native API is RQL (Robot Query Language), but most applications will use one of the higher-level RoboSuite APIs. Currently, they include:
• RoboSuite Java API: The API for Java.
• RoboSuite JSP Tag Library: A JSP tag library that provides an easy way to interface to RoboServer from web applications written using JSP.
• RoboSuite .NET API: The API for Microsoft .NET.
To minimize the programming effort involved in using these APIs, a number of code generation tools can be used to create ready-to-use code for specific APIs and programming environments, including these:
• Java: Automatic generation of Java classes that interface using the RoboSuite Java API.
• .NET: Automatic generation of C# classes that interface using the RoboSuite .NET API.
• BEA WebLogic Portal: Automatic generation of web clipping portlets for use in BEA WebLogic Portal.
• BEA WebLogic Workshop: Automatic generation of controls for BEA WebLogic Workshop, allowing robots to be accessed as controls.
The APIs may be invoked either synchronously or asynchronously. Synchronous calls block until the result is available and return it from the same function call. Asynchronous calls return immediately, and the result is delivered to the caller later, when it becomes available.
RoboServer may be configured to write status information into the RoboManager database, which includes information about robots that are invoked, and error messages. This information is viewed using RoboManager. The status information can also be written to files, or sent as emails to the system administrator.
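The difference between the two calling styles can be shown with a generic Python sketch. It does not use the RoboSuite APIs: `run_robot` is a hypothetical stand-in for a robot invocation, and `call_sync` and `call_async` are invented names built on the standard `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor


def run_robot(name, query):
    """Stand-in for invoking a robot on a server (hypothetical)."""
    return [f"{name} result for {query}"]


def call_sync(name, query):
    """Synchronous style: the caller waits and gets the result directly."""
    return run_robot(name, query)


_executor = ThreadPoolExecutor(max_workers=4)


def call_async(name, query, on_result):
    """Asynchronous style: returns at once; on_result is called later."""
    future = _executor.submit(run_robot, name, query)
    future.add_done_callback(lambda done: on_result(done.result()))
    return future


sync_result = call_sync("books", "dune")

async_results = []
call_async("books", "emma", async_results.append)
# ... the caller is free to do other work here ...
_executor.shutdown(wait=True)  # wait for outstanding calls before exiting
print(sync_result, async_results)
```

The asynchronous style suits callers such as portals that must stay responsive while a slow web application is being queried.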
RoboSuite Control Center
RoboSuite Control Center is a visual client application that connects to RoboServer. It is used to control the functions of RoboServer, and to monitor the state of running robots.
Figure 12: RoboSuite Control Center
RoboRunner
RoboRunner is the RoboSuite application used for executing robots as a command-line invocation. In conjunction with a scheduler, such as crontab in Unix systems, RoboRunner can run robots in batch at specific time intervals. RoboRunner is used for data collection applications.
RoboRunner is highly configurable, in terms of allocation of processing power and storage of extracted objects, timeout strategies, limits on extracted objects, etc. Resulting objects are written to a storage environment. Currently, the available storage environments include:
• SQL database: Stores objects in an SQL database. All major databases are supported.
• Files: Stores objects in files, for example as XML or CSV (comma separated values).
As with RoboServer, RoboRunner can store status information in the RoboManager database, or in files. It can also send emails to the system administrator in case of errors.
RoboManager
RoboManager is used to view status and error messages generated by robots executed using RoboServer and RoboRunner. Administrators can identify robots that are broken because of inaccessible sites, or because of significant web interface changes.
Figure 13: RoboManager
Platform Requirements
RoboSuite is available for the following platforms; on some of them, only the RoboSuite runtime environment is available:
• Windows
• Linux
• Solaris
• HP-UX
• AIX
RoboSuite supports the following databases:
• Oracle
• IBM DB2
• Microsoft SQL Server
• Sybase
• PointBase Server
• Solid
• MySQL
More Information
Kapow Technologies web site: http://www.kapowtech.com
Kapow Developer Connection web site: http://kdc.kapowtech.com
RoboSuite BEA WebLogic Edition: http://bea.com/framework.jsp?CNT=index.htm&FP=/content/products/kapow
Patent applications: PCT/DK00/00163, PCT/DK00/00429, PCT/DK00/00700
About Kapow Technologies
Profile: Founded in June 1998, Kapow Technologies is a world-leading supplier of software for integration to web-enabled applications. Kapow Technologies offers a wide variety of options to customers, ranging from web clipping and data collection to advanced application integration, all using the unique RoboSuite platform.
Selected partners: BEA, IBM, Software AG, ATG, Autonomy
Selected customers: Lycos, NATO, Danske Bank, NPD, APR Smartlogik, The Arlington Institute, TDC, BetBrain, The Danish National IT and Telecom Agency, Krak.
Contact information:
Kapow Technologies
Dr. Neergaards Vej 5A
DK-2970 Hørsholm
Denmark
Tel +45 70 33 10 00
Fax +45 70 33 10 01
http://www.kapowtech.com
Copyright 2003 Kapow Technologies ApS. http://www.kapowtech.com All rights reserved.