Database Concepts Objectives o Why should a database be used? o What do databases have to offer? o What should be taken into account when designing a database and setting it up? o What are objects of a database, and how can they be used? o Considerations in deciding whether to use a database Purpose Increase your awareness of the importance of managing information, and the vital role that databases play in accomplishing that. Learn to set up data properly using a systematic approach that starts with general planning and ends with a database ready for use. End results should include ability to: o describe information management, what it is composed of, and why it is important; o develop appropriate data structures and management procedures for a study / project of your choosing. History o 1980s: dramatic reduction in the differences in capabilities of mainframe, mini- computer, and microcomputer systems o Virtual explosion in computer power, accessibility Capability of storing terabytes of data Information Management o Common usage: worksheets (eg. Excel, Lotus) ascii files o Databases: can do the same thing as worksheets or ascii files, but allow you to have the data represented as part of a bigger picture “Relational” part of the name means that a given table in the database IS RELATED TO, or can be linked with other related datasets (tables) in the database. Relating Tables By using variables that are found in both tables of a relational database, you can merge information from these tables to generate 1 view of the results. TABLE 1 TABLE 2 Name Name Location Email address Phone Position Department Match up names, select people found in Peavy Hall, and give me their e-mail addresses. Peavy Hall Email Addresses Name Location Email address Database Design Step 1: Determine what datasets you are interested in maintaining and viewing regularly. How would this data appear if it all had to be represented in a table? o No introductory paragraphs o No end comments EXCEL Worksheet Containing Meteorology Data: Dataset MetLocId Date / AirT3m AirTdel3m1 AirTMax AirTMin AirT Code Time Comments HOEM_AIRT HOEM 12/18/1993 -2.63 0.15 -2.58 -2.69 19:30:00 HOEM_AIRT HOEM 12/18/1993 -2.65 0.08 -2.54 -2.73 19:40:00 HOEM_AIRT HOEM 12/18/1993 -2.42 0.02 -2.35 -2.51 19:50:00 HOEM_AIRT HOEM 12/18/1993 -2.56 0.11 -2.40 -2.74 20:00:00 HOEM_AIRT HOEM 12/18/1993 -2.81 0.09 -2.76 -2.85 20:10:00 HOEM_AIRT HOEM 12/18/1993 -2.67 0.04 -2.57 -2.77 20:20:00 HOEM_AIRT HOEM 12/18/1993 -2.45 0.04 -2.40 -2.55 20:30:00 HOEM_AIRT HOEM 12/18/1993 -2.56 0.02 -2.48 -2.63 20:40:00 HOEM_AIRT HOEM 12/18/1993 -2.56 0.01 -2.52 -2.61 20:50:00 HOEM_AIRT HOEM 12/18/1993 -2.51 -4.0e-03 -2.48 -2.54 21:00:00 The file is already organized with 1 variable per column and is straightforward for moving into a database table. Ascii file: PROGRAM NAME - Breckenridge LOCATION - MacDONALD ELEVATION IN FEET - 542 DATE(S) MEASUREMENTS WERE TAKEN – every 15 minutes from 1/3/91 7:30:00 to 1/3/91 13:45:00 VARIABLES: Discharge Rate,Discharge Quality,Water Temperature, Water Temperature Quality,Conductivity,Conductivity Quality 45.11,good,3.40,good,29.60,good 45.11,good,3.60,good,29.60,good 44.26,good,3.80,good,29.60,good 43.44,good,3.90,good,29.60,good 43.44,good,4.00,good,29.60,good 43.44,good,4.60,good,29.60,good 42.62,good,4.80,good,29.60,good 41.80,good,4.90,good,29.60,good 41.80,good,5.20,good,29.60,good 42.62,good,5.00,good,29.60,good 42.62,good,4.70,good,29.60,good 42.62,good,4.40,good,30.30,good 42.62,good,5.60,good,30.30,good 42.62,good,6.20,good,28.90,good 43.44,good,5.70,good,29.60,good 44.26,good,5.60,good,29.60,good 45.93,good,5.40,good,29.60,good 46.78,good,5.60,good,29.60,good 47.63,good,6.10,good,28.90,good 48.51,good,6.50,good,28.30,good 50.26,good,6.60,good,28.30,good 51.11,good,6.70,good,28.30,good 52.02,good,6.60,good,28.20,good 52.90,good,6.10,good,28.90,good 52.90,good,6.00,good,30.30,good 52.90,good,6.60,good,29.60,good You will need to go through this file and figure out how it would be represented if it were all tabular. The headings will need to be represented as columns or variables in the table. Possible result of cleaning up ascii file for stream hydrology: Breckenridge,MacDONALD,1/3/91 7:30:00,45.11,good,3.40,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,45.11,good,3.60,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,44.26,good,3.80,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,3.90,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,4.00,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,4.60,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.80,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,41.80,good,4.90,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,41.80,good,5.20,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,5.00,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.70,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,4.40,good,30.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,5.60,good,30.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,42.62,good,6.20,good,28.90,good Breckenridge,MacDONALD,1/3/91 7:30:00,43.44,good,5.70,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,44.26,good,5.60,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,45.93,good,5.40,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,46.78,good,5.60,good,29.60,good Breckenridge,MacDONALD,1/3/91 7:30:00,47.63,good,6.10,good,28.90,good Breckenridge,MacDONALD,1/3/91 7:30:00,48.51,good,6.50,good,28.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,50.26,good,6.60,good,28.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,51.11,good,6.70,good,28.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,52.02,good,6.60,good,28.20,good Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.10,good,28.90,good Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.00,good,30.30,good Breckenridge,MacDONALD,1/3/91 7:30:00,52.90,good,6.60,good,29.60,good It may look like the first two columns of this table are repeated a lot, but it allows you to load all the information in a database table, and the more populated / complex your database becomes, the more you’ll appreciate having this information within the table. If you think about what would happen if you had datasets like this for 10 different programs covering 20 different locations spanning a 5 year period, you’ll begin to understand why it is helpful to add the program, location and date fields to the table. Database Design Step 2: Determine how datasets you’re interested in relate to one another. o Step back and figure out how the datasets fit into a larger picture. o What would be a logical / useful way of linking the tables together to allow fields from them to be viewed simultaneously? Meteorology & Stream Discharge Measurements: Meteorology station: measures precipitation, temperature, humidity; averaged every 10 minutes Stream gauge: measures discharge, temperature, conductivity; averaged every 15 minutes It may not look like meteorological data has much in common with streamflow. On a larger scale, time and location would be a logical way of linking them. Relating meteorology and hydrology tables: Meteorology Hydrology Station Station Date / Time Date / Time Precipitation Discharge Air Temperature Water Temperature Humidity Conductivity Select meteorology station A, stream gauge B, match by date & show average air temperature, total precipitation, and total discharge by day. Temperature, Precipitation & Stream Discharge Date Average Air Temperature Total Precipitation @ Meteorology Station Total Discharge @ Stream Gauge If the tables do not already contain the date fields, which are important in pulling them together, you need to be sure to add them. If this is not determined and taken care of at the start, it creates a lot of extra work to fix this in the future. Relationships between the meteorology and hydrology tables can be even more formalized by setting up a location table. It may look something like this: MEASURING SITES: Location LocID Latitude Longitude Elevation Description Basin Name (ft) Beacon BENM 49.681 N 38.422 E 176 Cloverdale met station Borney BOYS 56.994 N 50.981 E 264.43 Lorendcoop stream gauge Boonsworth BRHM 56.483 N 50.05647 E 279.07546 Lorendcoop met station Canada CAAM 45.753 N 48.42128 E 264.42892 Frosty met station Cornwealth COHS 49.664 N 38.52343 E 290.24256 Cloverdale stream gauge Beacon met station and Cornwealth stream gauge are found in the same basin according to this table. Since the meteorology table and streamflow tables both contain a column for station name, you can then refer to any records in the meteorology table and the stream table that have a station found in a given basin: MEASURING SITES location name locid latitude longitude elevation description Meteorology Hydrology basin Station Station Date / Time Date / Time Precipitation Discharge Air Temperature Water Temperature Humidity Conductivity This type of setup would allow you to: 1) Find all records in the measuring sites table where the basin is “Lorendcoop”. 2) List the station name for those records. 3) Using that narrowed down list of station names, show the total daily precipitation and stream discharges, and average air and water temperatures sorted by date for the month of October 2001. Resulting Table / View: Stream Meteorology Basin Date Total Avg Air Total Avg Location Location Ppt Temp Discharge Water Temp Borney stream Boonsworth Lorendcoop 10/1/01 0 54 2.09 38 gauge met station Borney stream Boonsworth Lorendcoop 10/2/01 0.02 49 2.73 39 gauge met station Borney stream Boonsworth Lorendcoop 10/3/01 0.01 51 2.51 37 gauge met station Database Design Step 3: Formally set up tables and their relationships in the database. Databases are populated through tables within the database. o If you think of a database like a house, that database will be made up of tables, like a house is made up of lots of rooms. o Doorways in a house tie the rooms together. Relationships, or links using fields in common tie tables of a database together. o To access information from more than one table of a relational database at once, the links that are set up in the design of the database will be used. MEASURING SITES location name (primary key) locid latitude longitude elevation description Meteorology Hydrology basin Station (foreign key) Station (foreign key) Date / Time Date / Time Precipitation Discharge Air Temperature Water Temperature Humidity Conductivity If we go back to the example we’ve been working with, we have 3 tables to set up in the database: measuring sites, meteorology, and hydrology. o Location name in the measuring sites table needs to be present in order for the hydrologists and meteorologists to link up with that information: location name will serve the role as PRIMARY KEY. PRIMARY KEY: unique identifier of a given field; no duplicate entries are allowed for the field serving this role. Any locations (stations) we want to use in our database must have an entry for that location before using it in linked tables. o Entry for station fields in the meteorology and hydrology tables must be represented in the location name field of the measuring sites table: station fields will serve the role as FOREIGN KEY. FOREIGN KEY: field found in a dependent table which must contain a matching entry in the primary key table / field. Specifying the primary keys and foreign keys of a relational database tells how you link the tables together. If you try to make an entry in a foreign key field that is not present in your primary key table, the database won’t allow it. It would return a primary key – foreign key constraint error and will disregard the entry. This provides database integrity. In the house analogy, the primary key – foreign key relationships provide doorways to go from room-to-room in the house. MEASURING SITES location name (primary key) locid latitude longitude elevation description Meteorology Hydrology basin Station (foreign key) Station (foreign key) Date / Time Date / Time Precipitation Discharge Air Temperature Water Temperature Humidity Conductivity In our example: o the station names for the meteorology and hydrology tables aren’t the same, but those stations would be represented in the measuring sites table. o Using records in the measuring sites table, we are able to find all hydrology and meteorology stations from a given basin, and o merge fields from the hydrology and meteorology tables using the basin and dates they have in common. Database Usage MS Access is one of the most common database software packages. This package contains a series of “objects” that it would be useful to become familiar with. MS Access objects: o tables o queries o forms o reports o pages o macros o modules MS Access Objects MS Access objects are different ways of either viewing or processing / updating the data stored in your database. MS Access Tables A table is a “collection of data about a specific topic”, such as hydrology or meteorology. Using a separate table for each topic means that you store that data only once, which makes your database more efficient, and reduces data-entry errors. Tables are the official place where database records are stored. All the other objects in MS Access are based on records in the tables. MS Access Queries Queries are used to view, change, and analyze data in different ways. They can also be used as the source of records for forms, reports, and data access pages. Four basic types of queries include; o select – calls up a given group of records matching the constraints specified; o update – changes / updates the values of field / record combinations meeting specified criteria; o append – adds new records to the bottom of a table; o delete – removes records matching specified constraints from a table. Basic statistical functions (eg. mean, max, min, standard deviation) can also be performed with queries. MS Access Forms Forms are a way of viewing the information in your tables in other types of layouts. This can serve a number of purposes: o create a data-entry form to enter data into a table; o create a custom dialog box to accept user input and then carry out an action based on that input; o create a switchboard form to open other forms or reports. Most of the information in a form comes from an underlying record source (eg. table or query). Other information in the form is stored in the form’s design. MS Access Reports Reports are a way to present your data in a printed format. You have control over the size and appearance of everything on a report, so you can display the information the way you want to see it. Most of the information in a report comes from an underlying record source (eg. table or query). Other information in the report is stored in the report’s design. MS Access Pages A data access page is a special type of Web page designed for viewing and working with data from an Internet or intranet – data that is stored in an MS Access database or an MS SQL server database. A data access page may also include data from other sources such as MS Excel. Data access pages are designed in MS Access. However, they’re stored in a separate file outside of MS Access. A shortcut to that file is automatically added in the database window. MS Access Macros A macro is a set of one or more actions that each perform a particular operation, such as opening a form or printing a report. Macros can help you to automate common tasks. For example, you can run a macro that prints a report when a user clicks a command button on a form. MS Access Modules A module is a collection of Visual Basic for Applications declarations and procedures that are stored together as a unit. If you know how to write what you want in Visual Basic, you can bypass the queries and macros, and go right to the modules. Considerations in Deciding to use a Database Under what circumstances is it worth my time to put my datasets into a database program? When am I better off with an ascii, or some other representation of my data? CONSIDER: o size of file o amount of data / number of tables o what will you be doing with the data? searches, grouping, calculations, summary stats can be very efficient in databases plotting, mapping, more complex statistics (eg. regression, ANOVA) should be done in other software. Considerations in Deciding to use a Database (cont’d) What pieces of software can easily deal with database files? MS Access and SQL Server tables can easily be saved / exported in the following formats: dbase III, IV or 5 (*.dbf) Microsoft Excel 3, 4, 5-7 or 97-2000 (*.xls) HTML documents (*.html, *.htm) Lotus 1-2-3 WJ2, WK1 or WK3 Paradox 3, 4, 5 or 7-8 (*.db) Text files (*.txt, *.csv, *.tab, *.asc) Microsoft Active Server Pages (*.asp) Microsoft IIS 1-2 (*.htx, *.idc) Rich text format (*.rtf) Microsoft Word Merge (*.txt) Microsoft Access databases (*.mdb) ODBC databases Considerations in Deciding to use a Database (cont’d) What pieces of software can easily deal with database files? The following software packages have options for calling up database tables: MS Excel: Data Get External Data New Database Query ArcView: o Using ODBC (open database connectivity), database tables can be linked to ArcView. o ArcView’s SQL connect feature allows you to query a database using SQL and store the returned records in an ArcView table. SAS: File Import Microsoft Access table specify database, table S-Plus: File Import From file Files of type MS Access file (*.mdb) CONCLUSIONS Why should a database be used? To make the whole greater than the individual parts. What do databases have to offer? They pull information from various sources together so you can see how 1 table / dataset fits in to the greater picture. What should be considered to set up a database? What does the data consist of? How will the data be used? How should the data be organized to fit into the overall picture?