"XML Web Services Toxics Release Inventory"
XML Web Services: Toxics Release Inventory
Brand Niemann
"XML Web Services Evangelist"
Data Standards Branch
January 12, 2002
Disclaimer: Any reference to or depiction of the commercial product of any vendor is for illustrative purposes only and does not constitute an endorsement by EPA or the trainer.

Overview
• 1. Background
• 2. National Database to FileMaker XML
• 3. Web Pages and PDF to XML Documents
• 4. Data Tables to XML Data Islands
• 5. Some Future Steps
• 6. Questions and Answers

1. Background
• The Toxics Release Inventory (TRI), published by the U.S. EPA, is a valuable source of information regarding toxic chemicals that are being used, manufactured, treated, transported, or released into the environment.
• Two statutes, Section 313 of the Emergency Planning and Community Right-To-Know Act (EPCRA) and Section 6607 of the Pollution Prevention Act (PPA), mandate that a publicly accessible toxic chemical database be developed and maintained by U.S. EPA. This database, known as the Toxics Release Inventory (TRI), contains information concerning waste management activities and the release of toxic chemicals by facilities that manufacture, process, or otherwise use these materials. Using this information, citizens, businesses, and governments can work together to protect the quality of their land, air, and water.

2. National Database to FileMaker XML
• 2.1 FileMaker 5.5
  – http://www.filemaker.com
• 2.2 Steps:
  – Download National.exe (16.7 MB) and extract.
    • http://epa.gov/tri/tri99/data/
  – Import each of 4 files into FileMaker 5.5 (164 MB).
  – Make the 4 files sharable on the Web.
  – Use the FileMaker URL syntax for XML output.
• 2.3 Interface Customization Possibilities.
  – http://www.filemaker.com/products/fmu_home.html

2.1 FileMaker 5.5
• Subsidiary of Apple Computer with powerful desktop database functionality that supports multiple platforms, including the Web.
• The workgroup database of choice for organizations: more than 65% of the 1.2 million units shipped in 2000-2001 were volume license sales; second to Microsoft Access.
• Third party developer resources:
  – Macromedia Dreamweaver
  – Adobe GoLive
  – Allaire ColdFusion

2.1 FileMaker 5.5 Database-to-XML

2.2 1999 Toxics Release Inventory (TRI) Data Files
• File Type 1: Facility, Chemical, Releases and Other Waste Management Summary Information. This file contains facility information (Part I on Form R and Form A) as well as most chemical information (Part II on Form R and Form A). Data elements are reported individually. The information is also disaggregated based on Waste Management code (i.e., "M" code), and aggregated up to On-site Releases, Off-site Releases, Other On-site Waste Management, and Transfers Off-site for Further Waste Management categories. (84,079 records)
• File Type 2: Detailed Waste Management and Source Reduction Activities. This file contains facility information (Part I on Form R and Form A) as well as the detailed information regarding source reduction and recycling activities (Part II, Section 8 on Form R) and on-site waste treatment methods (Part II, Section 7 on Form R). (84,079 records)
• File Type 3A: Details of Transfers Off-site. This file contains facility information (Part I on Form R and Form A) as well as details of individual transfers off-site (Part II, Section 6.2 on Form R). (100,033 records)
• File Type 3B: Details of Transfers to Publicly Owned Treatment Works (POTW). This file contains facility information (Part I on Form R and Form A) as well as a list of POTWs (Part II, Section 6.1.B on Form R). (84,079 records)

2.2 TRI National File Type 1 in FileMaker 5.5

2.2 TRI National File Type 1 in Web Browser

2.2 TRI National File Type 1 in IE 6 (XML)

2.3 Interface Customization Possibilities
• Replace default.htm with your own.
• Use your own stylesheet (XSL); this requires the Developer version.
• Use HTML and Java to build a Web application or portal.
  – Local Emergency Planning Committee database
    • http://www.epa.gov/ceppo/lepclist.htm
  – List of Lists database
    • http://184.108.40.206/lol/
  – Population Estimation from Year 2000 Census Blocks
    • http://220.127.116.11:591/population

3. PDF and Web Pages to XML Documents
• 3.1 Content Re-design and Re-publishing.
• 3.2 Repurposing PDF to Excel.
• 3.3 Repurposing PDF to XML.
• 3.4 Repurposing PDF to Folio Views.
• 3.5 NextPage Folio Views, LivePublish, and NXT 3.
• 3.6 Comments.

3.1 Content Re-design and Re-publishing
• Background:
  – backgrd_factors.pdf
• Database:
  – National.exe
• Press:
  – 40 pdf files at http://epa.gov/tri/tri99/press/press.htm
  – Tri99press.xsl (34 tables)
• Previous:
  – Tri97.nfo and tri97.xls
• Questions and Answers:
  – Qa.pdf (file error)
• Report:
  – 1999pdr.pdf, completereport.pdf, sfs_introduction.pdf (Tri99.xsl - 23 tables).

3.1 Content Re-design and Re-publishing (screen capture)

3.2 Repurposing PDF to Excel
• See Adobe Acrobat Help pages 103-109 & 82-84:
  – See next two slides for background.
  – Do: Edit, Preferences, Text/Formatted Text Preferences, Default Selection Type: Table, OK.
  – Select: Table/Formatted Text Select Tool and draw a box around the table to be converted.
  – Do: Edit, Copy (or Ctrl+C).
  – In a blank Excel worksheet do: Edit, Paste (or Ctrl+V).
  – Results: tri99.xls and tri99press.xls.

Acrobat 5.0 Repurposing and Extracting
• Acrobat 5.0 gives you powerful commands for repurposing or extracting text and graphics in PDF files. You can use the Save As command to save all text in a PDF file in Rich Text Format (RTF) for import into your favorite authoring application. If your PDF files use tagged Adobe PDF, you can extract the text without losing the formatting. For example, you can save pages of tables from a PDF file for import into an application such as Adobe FrameMaker or Microsoft Word and the table formatting will be preserved.
Both PDFMaker and Acrobat Web Capture create tagged Adobe PDF automatically. (See “About the different types of Adobe PDF documents” on next slide.) You can also use the Save As command to save each page in a PDF file to an image format. You can use the Export command to export all images in a PDF file; each image is saved in a separate file. In addition, Acrobat provides several tools (the text select tool, the column select tool, the table/formatted text select tool, and the graphics select tool) for copying and pasting small amounts of text and graphics from a PDF file to your clipboard. You can also paste text from a PDF document into a comment or bookmark name. While in a PDF document, you select the text or graphic and copy it onto the clipboard. Once the text or graphic is on the clipboard, you can launch the other application and paste the text or graphic into a file.

About the different types of Adobe PDF documents
• There are three types of Adobe PDF documents: unstructured, structured, and tagged. These document types differ in what they contain and how their contents can be repurposed. In general, the more structural information the Adobe PDF document contains, the more options you have for repurposing its contents.
  – 1. Unstructured Adobe PDF: You can save unstructured Adobe PDF files to other formats such as RTF with good results. An unstructured Adobe PDF file saved to RTF recognizes paragraphs, but not basic text formatting, lists, or tables. You can’t reflow unstructured Adobe PDF files into different-sized devices, such as eBook reading devices. Unstructured Adobe PDF files aren’t reliably accessible using a screen reader for Windows.
  – 2. Structured Adobe PDF: You can save structured Adobe PDF files to other formats such as RTF with results that are better than unstructured Adobe PDF files but not as good as tagged Adobe PDF files.
Structured Adobe PDF files saved to RTF recognize paragraphs and basic text formatting, but not lists or tables. You can’t reflow structured Adobe PDF files into different-sized devices. Structured Adobe PDF files can be accessed using a screen reader for Windows, but without the reliability of tagged Adobe PDF files.
  – 3. Tagged Adobe PDF: You can save tagged Adobe PDF files to other formats such as RTF with the best results, including the recognition of paragraphs, basic text formatting, lists, and tables. You can reflow tagged Adobe PDF files so that they’re readable in different-sized devices. Tagged Adobe PDF files have been optimized for accessibility, so they can be accessed reliably using a screen reader for Windows.

3.2 Repurposing PDF to Excel

3.3 Repurposing PDF to XML
• Adobe PDF Document as HTML:
  – http://access.adobe.com/simple_form.html
• Save As XML Plug-In for Windows (B2):
  – http://www.adobe.com/support/downloads/detail.jsp?hexID=89a2
  – Install and do: Help, About Adobe Acrobat Plugins, and select SaveasXML.
  – Do: File, Save as, XML-1.00 without styling (*.xml) or XHTML-1.00 with CSS-1.00 (*.htm). (Note: Must be a tagged Acrobat PDF.)
  – See SaveAsXML Developer Information for Creating and Modifying Mapping Tables (DeveloperInfo.pdf).

3.4 Repurposing PDF to Folio Views
• Imports major word processing and Web formats.
• Use Adobe Acrobat 5.0.5.
  – Not the free Acrobat Reader.
  – Do: File, Open as Adobe PDF, then File, Save as, RTF.
• Use Folio Views 4.2.
  – Do: File, New and give it a name, Open; or File, Import, select RTF, Open.
  – Also do: File, Import URL for Web formats.
  – Apply structure, links, formatting, etc. using the GUI.

3.5 NextPage Folio Views, LivePublish, and NXT 3
• NextPage: http://www.nextpage.com
  – Folio Views: SGML-like markup (pre-XML) in a GUI.
    • CD-ROM distribution.
    • Web Server (Markup-to-HTML on the fly).
  – LivePublish: Basic XML support (uses DTD; see next slide).
    • Site Administrator.
    • Personal Edition (Desktop and CD-ROM).
    • Web Server (Markup-to-HTML on the fly).
  – NXT 3: Advanced support for XML (LivePublish plus XSL, SOAP, etc.; see later slide).
    • Content Network Manager.
    • Content Network Server.

3.5 NextPage LivePublish
• Uses of XML (see separate handout):
  – Serve up native XML.
  – Convert XML to HTML using a CSS or XSL at run time using the Display Filter API.
  – Convert XML to HTML at build time.
  – Uses an XML-based file to define site look and feel.
  – The build Makefiles are XML files that define the structure and contents of the information collections.
  – XML-based legacy conversion tools simplify the conversion of existing content into HTML.
  – Indexsheets (XIL) define and control the indexing of content, just as stylesheets (XSL) define and control the formatting (see separate handout).

3.5 NextPage Folio Views

3.5 NextPage LivePublish Site Administrator

3.5 NextPage LivePublish Personal Edition

3.5 NextPage LivePublish Web Server

3.5 NextPage NXT 3 Content Network
• NextPage Web Services White Paper:
  – NXT 3 has been delivering XML Web Services since July 2000, based on early SOAP recommendations from before SOAP became a standard.
  – NextPage is developing full support for SOAP, WSDL, and UDDI standards and conforming Web service frameworks such as .NET and Sun ONE (Java).
  – Basic XML Web services provide low-level communication, and NXT 3 provides high-level data coordination when intelligent evaluation of distributed content and collaborative capabilities in the context of business processes are needed (just released: Matrix).

3.5 NextPage NXT 3 Content Network Manager

3.5 NextPage NXT 3 Content Network Web Server

3.6 Comments
• Previous work:
  – 1995 Folio Views Infobase and Excel files.
  – TRI 1997 CD-ROM Users Guide Infobase.
• Could add Year 2000 easily to Year 1999.
• Organized files by folders for indexing with the NXT 3 File Service (recall section 3.1 screen capture and see next slide).
• Can/should create tagged PDF files when you use Acrobat PDFMaker 5.0 to create PDF files from within Microsoft Office 2000 applications.

3.6 Discussion

4. Excel Data Tables to XML Data Islands
• 4.1 Excel-to-XML and XML-to-Excel Round-tripping.
• 4.2 XML Spy 4.2.
• 4.3 Application of XML Step by Step, Second Edition, Data Binding.
• 4.4 Comments.

4.1 Excel-to-HTML(XML) and HTML(XML)-to-Excel Round-tripping
• In Excel do: File, Save as Web Page, select Republish: Sheet, Publish, Open in Browser, Publish.
• In IE 5 or 6 do: View Source and explore the XML-like markup.
• In Excel do: File, Open, Files of type: Web pages.

4.2 Data Tables to XML Data Islands
• XML Spy 4.2 (see Tutorial):
  – Copying XML data to and from third party products:
    • XML Spy allows you to easily copy data to and from third party products. The copied data can be used within XML Spy as well as third-party products, enabling you to transfer XML data to spreadsheet-like applications (e.g. Microsoft Excel).
    • The "Copy as Structured Text" command copies elements to the clipboard as they appear on screen. This command is useful for copying table-like data from the Enhanced Grid View as well as the integrated Database/Table View.

4.3 Application of XML Step by Step, Second Edition, Data Binding
• Re-format Excel worksheet with appropriate field names (Upper Camel Case). (See next slide.)
• Import to FileMaker 5.5 using field names.
• Query FileMaker on the Web for XML output:
  – http://localhost/FMPro?-db=tri99table1.fp5&-format=-dso_xml&-findall
• Add the XML output as a data island in the HTML file and display in IE 5-6.
  – See tri1999table1.xml and tri1999table1.htm

4.3 Application of XML Step by Step, Second Edition, Data Binding
<?xml version="1.0" encoding="UTF-8"?>
<!-- File Name: tri99table1.xml -->
<TRI99Table1>
  <Table1>
    <State>Nevada</State>
    <Rank>1</Rank>
    <FugitiveAir>1529022</FugitiveAir>
    <StackAir>1868475</StackAir>
    <SurfaceWater>136431</SurfaceWater>
    <UndgInjection>2797</UndgInjection>
    <LandReleases>1.1647E+09</LandReleases>
    <OnSiteReleases>1.1682E+09</OnSiteReleases>
    <OffSiteReleases>212998</OffSiteReleases>
    <TotalReleases>1.1684E+09</TotalReleases>
  </Table1>
  …..
</TRI99Table1>

5. Some Future Steps
• 5.1 Microsoft Excel 2002 lets you open or save workbooks in XML format.
• 5.2 Access 2002 allows you to create a database table by importing an XML document or to export a database table or other object to an XML document.

5.1 Microsoft Excel 2002
• Source: Chapter 15. Publishing Information on the Web, Step by Step Microsoft Excel 2002:
  – Previously, Excel 2000 workbooks and worksheets could be saved as Web files, and queries could bring Web data into workbooks.
  – Excel 2002 extends those capabilities by providing live links from Excel to Web files and by providing import and export of XML and "Smart Tags" (e.g. have Excel look for known stock symbols and connect to a Web site that has information related to that symbol).

5.1 Microsoft Excel 2002
• Working with Structured Data:
  – XML can identify rows and cells within the spreadsheet and allow spreadsheet data to move freely to other applications.
  – Do: File, Save As, Save as type, select XML Spreadsheet (*.xml), and click Save. Click Yes when the message box appears.
  – Open the XML file in XML Spy to examine its structure and content (PivotXML.xml).
  – Open the XML file in Excel 2002 to see it re-display.

5.2 Access 2002
• Source: Chapter 3. Getting Information Into and Out of a Database, Step by Step Microsoft Access 2002:
  – Best practices:
    • Link to other databases rather than import, so you can view and edit in both systems.
    • Share databases by exporting to XML (universal format).
    • http://office.microsoft.com/assistance/2002/articles/acExOfScenariosUsingXML.aspx
  – Import:
    • Open Access 2002 database.
    • File, Get External Data, Import, Files of type, XML Documents, Import both XML and XSD, select file to be imported, Import, Import XML, Options, Structure and Data, OK.
    • Open and view database tables to confirm data was imported.

5.2 Access 2002
• Exporting to other applications:
  – Works for Table, Query, Form, and Report.
  – Open Access 2002 database and select a table.
  – File, Export, select XML Documents, Save as type, Export, Export XML, select both Data (XML) and Schema (XSD) of the data, OK.
    • See screen captures on next pages.
    • See Advanced, Schema tab and select appropriate option.
  – Look at XML and XSD files (see examples below) in XML Spy 4.2:
    • Orders.xml, Order Details.xml, and Order Details.xsd.

5.2 Access 2002 (screen captures)

6. Questions and Answers
Brand Niemann, Ph.D.
USEPA Headquarters, EPA West, Room 6143D
Office of Environmental Information, MC 2822T
1200 Pennsylvania Avenue, NW, Washington, DC 20460
202-566-1657
email@example.com
EPA: http://18.104.22.168
Outside EPA: http://22.214.171.124
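
Supplementary sketch for section 4.3 (Query FileMaker on the Web for XML output). The deck itself works with FileMaker 5.5 and IE data islands; the Python below is only an illustration of the same two steps, building the find-all query URL and reading the returned records. It assumes the dashes in the slide's URL are plain hyphens, uses only the single Nevada record shown on the tri99table1.xml slide, and assumes from the element names alone that TotalReleases is the sum of OnSiteReleases and OffSiteReleases. The function names `filemaker_xml_url` and `parse_records` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# The one record shown on the tri99table1.xml slide. The XML declaration is
# left out and the records elided on the slide are not reconstructed.
SAMPLE_XML = """
<TRI99Table1>
  <Table1>
    <State>Nevada</State>
    <Rank>1</Rank>
    <FugitiveAir>1529022</FugitiveAir>
    <StackAir>1868475</StackAir>
    <SurfaceWater>136431</SurfaceWater>
    <UndgInjection>2797</UndgInjection>
    <LandReleases>1.1647E+09</LandReleases>
    <OnSiteReleases>1.1682E+09</OnSiteReleases>
    <OffSiteReleases>212998</OffSiteReleases>
    <TotalReleases>1.1684E+09</TotalReleases>
  </Table1>
</TRI99Table1>
"""


def filemaker_xml_url(host, database):
    """Build a find-all XML query URL in the form shown on the 4.3 slide."""
    return f"http://{host}/FMPro?-db={database}&-format=-dso_xml&-findall"


def parse_records(xml_text):
    """Return one dict per <Table1> element, converting numeric fields."""
    records = []
    for row in ET.fromstring(xml_text.strip()).findall("Table1"):
        record = {}
        for field in row:
            text = (field.text or "").strip()
            if field.tag == "State":
                record[field.tag] = text
            elif field.tag == "Rank":
                record[field.tag] = int(text)
            else:
                # float() accepts scientific notation such as 1.1647E+09
                record[field.tag] = float(text)
        records.append(record)
    return records


records = parse_records(SAMPLE_XML)
nevada = records[0]
url = filemaker_xml_url("localhost", "tri99table1.fp5")

# The exported values are rounded to five significant digits, so compare loosely.
totals_consistent = math.isclose(
    nevada["OnSiteReleases"] + nevada["OffSiteReleases"],
    nevada["TotalReleases"],
    rel_tol=1e-4,
)
print(url)
print(nevada["State"], nevada["TotalReleases"], totals_consistent)
```

The values stored in scientific notation carry only five significant digits in the sample, which is why the comparison uses a tolerance rather than exact equality.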