Writing Illiad SQL Queries for MS SQL Server Williad 5.5.06 Why write SQL? - You need data in a machine readable format (like a comma delimited file) - You need to go beyond the generation of manual reports in something like Access: o To manipulate data o To create complex queries that automatically run at certain times or via events (through jobs and triggers) - You want to build a user friendly front end to your data and do so abstractly (ie: only give them what they need and protect your data). A few possibilities include an ASP or PHP web front end that spits out a web report, or a comma delimited file. - SQL code is a standard meaning your queries will work in any SQL compliant program (easily migrateable – if that‟s a word?) - It‟s fun and will impress your friends at parties! Tools for writing/using SQL queries Some tools for writing SQL include: - Query Analyzer (MS Standard) - Access (GUI – but adds all sorts of nonsense to code) - Toad (I use this; it‟s more robust than Query Analyzer. There is a free version at: www.toadsoft.com) The above programs let you write your ad-hoc queries, the SQL itself. If you are building an application to utilize the code (like a web front end or a script), you will either move the SQL into that language or store it on the server as a view or a stored procedure. More on that later. You use Enterprise Manager to administer the SQL Server (ie: add logins/users, administer rights, etc). MS SQL Server Security Basics To access data from a SQL Server database, a user must pass through two stages of authentication: 1) SQL Server level (login names) to login or connect to the actual server and 2) at the database level (user accounts) to access data from a database. - LOGIN: A valid login name used to connect to the SQL Server. If the SQL server is setup in mixed mode authentication, then this can be either a user manually created in the SQL server (lives in dbo.master DB) or a NT/W2K domain user. If it is setup in windows authentication mode, it only allows domain users to login. Note that in either setup, you can always use domain logins. - USER: A valid user account granted rights within a database to access/manipulate data. Specific to databases (ie: must be explicitly granted per DB). All permissions and ownership are controlled by the user account (ie: DML/DDL commands (SELECT/INSERT/CREATE TABLE/etc). Logins are associated with user accounts. A login can have different user accounts for different databases – but only one user per DB. The difference between Windows user and a SQL user is: - SQL: unique user/password combo stored/maintained in SQL server - Windows: account in the windows domain – if no SQL user is specified upon connection, the server will attempt windows authentication based on your current credentials (an “invisible” login) Choosing a SQL Server login or a Windows login is ultimately a matter of your needs, however the benefits of using Windows logins means: - Easier to know who did what (in some cases institutions use a single SQL user among staff, which makes this impossible) - Login information (user/password) is not sent over network (can‟t be sniffed) - Easier to maintain if you have many users (ie: create a windows group on server, add users you want to give read access to DB to it, add group as a login in enterprise manager, then give that group an appropriate user account with db_datareader role – now you can add/remove users to the widows group and never have to set rights within SQL server for those people) - At the user level, rights and ownership to a given DB and its objects can be manually configured, or are more easily controlled by fixed roles. There are server roles associated with logins and database roles associated with users. Common roles: o Server: sysadmin = can perform any activity in SQL Server o Database: public = equivalent to the “everybody” group in windows; initially has no rights to the db and every user is a automatically a member of it when added to a DB db_owner = has all permissions in the database db_datareader = can select all data from any table in DB db_datawriter = can modify any data from any table in DB In addition, individual rights can be explicitly granted (outside of roles for more granularity) via enterprise manager or SQL commands: GRANT/REVOKE/DENY. Or, you can create your own custom roles and assign them to users. Diving into SQL SQL stands for „Structured Query Language,‟ which is an ANSI compliant language for accessing and manipulating database systems. Well, it‟s MOSTLY standard across different proprietary DB systems like MS SQL Server, Oracle, MySQL, etc – but each company likes to write its own quirks and general wackiness into each of their database platforms. So a query written in SQL Server will not always work out-of-the-box on an Oracle Server unless you mess with the query a bit. SQL is a query language, but it includes a syntax to update, insert and delete records as well. The Data Manipulation Language (DML) represents these all these commands together: - SELECT – extract record(s) from table - UPDATE – update records(s) in table - DELETE – delete record(s) from table - INSERT – insert new records(s) in table The most basic SELECT statements look like these: SELECT field1, field2 FROM table_name SELECT * FROM UsersAll SELECT LastName, FirstName, EMa ilAddress FROM UsersAll In both queries, we will get a result set of all records in the table. The first returns all columns (*), the second the three columns we specified. If we want to conditionally select data according to some logical operation, we use the WHERE clause: SELECT LastName, FirstName, Department FROM UsersAll WHERE Status = 'Faculty/Staff' Notice the use of single quotes („) to denote a string, do not use them with number data types. Some other operators include: - <> : Not Equal - > : Greater than - >= : Greater than or equal to - < : Less than - <= : Less than or equal to - LIKE The LIKE operator allows us to use wildcards (the percent sign: %): SELECT LastName, FirstName, Number FROM UsersAll WHERE Number LIKE ‘20000%’ This query will return all records where the Number field starts with „20000‟ (ie: get all users with OHSU barcodes) What if we want to extract data from the database from several tables at once? We do so by JOINING the tables. There needs to be an established KEY relationship between them: a primary key (row with unique value) in one table correlates to one or more row(s) in another table. This relationship allows us to join two or more tables and ensures our result set will not return duplicate records between the tables (note: you will still get dupes within the table if they exist! More on that in a bit) Example that returns Loans by User (joining UsersAll and Transactions tables in WHERE clause): SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle, t.LoanPublisher, t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary FROM dbo.Transactions t, dbo.UsersAll u WHERE LOWER(t.TransactionStatus) = 'request finished' AND LOWER(t.DocumentType) = 'loan' AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation') AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05')) AND (t.Username = u.UserName) - Notes: o Table is specified via explicit schema (where it is located in the DB: dbo.Transactions) and is followed by an „alias‟ or shortcut designation that makes it easier when denoting the table throughout the rest of the query o I am using a SQL Server string function called LOWER to convert values from the table to all lowercase, ensuring data consistency and that everything is caught o I am using ANDs to logically specify multiple conditions that must be true (you can also use ORs and NOTs; use parenthesis to control order of operations) o I am also using the IN keyword to easily specify that I want to match on a value within the parenthesis (easier than writing out „equals‟ conditions for each of those values) o JOIN occurs at the bottom of our WHERE clause: here I specify which fields I want to join the two tables on. You can also join in the FROM clause (I prefer this way as it is cleaner and more ANSI compliant): SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle, t.LoanPublisher, t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary FROM dbo.Transactions t INNER JOIN dbo.UsersAll u ON (t.Username = u.UserName) WHERE LOWER(t.TransactionStatus) = 'request finished' AND LOWER(t.DocumentType) = 'loan' AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation') AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05')) - Notes: o INNER JOIN: returns all rows where both tables have matching rows, unmatched rows are excluded from our result set (ie: an average join) o LEFT and RIGHT JOIN: LEFT returns all rows where there are matches, but also returns rows from the LEFT or first table that have no match on the right, or second table. RIGHT is the opposite. But I am anal and want my data sorted! Well, use an ORDER BY clause then: SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle, t.LoanPublisher, t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary FROM dbo.Transactions t INNER JOIN dbo.UsersAll u ON (t.Username = u.UserName) WHERE LOWER(t.TransactionStatus) = 'request finished' AND LOWER(t.DocumentType) = 'loan' AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation') AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05')) ORDER BY t.TransactionDate, u.UserName - Notes: o Here, we specify which column we want our result set sorted on. You can also specify (separated by columns) what to sort on after that initial column if there more than one value. Wait a minute buddy, I am getting duplicate records in my result set. How do I deal with those? The DISTINCT keyword after SELECT will tell SQL Server to only return unique rows (i.e.: per our result set it will not report rows that are the same) if it encounters them. An example that returns users with visa cards: SELECT DISTINCT a.Type AS 'Type', u.Username as 'UserName', u.LastName as 'LastName', u.FirstName as 'FirstName', u.Status as 'Status', u.EMailAddress as 'EmailAddress' FROM UsersAll u JOIN UserAccountsLink l ON (u.UserName = l.UserName) JOIN UserAccounts a ON (l.InternalNo = a.InternalNo) WHERE u.NVTGC = 'ILL' AND u.Cleared = 'Yes' And a.Active = 'Yes' AND a.Type like 'VISA%' ORDER BY u.LastName - Notes: o I am using “Aliases” for columns returned in our result set (specified after AS keyword) for a cleaner report o To get a result set that joins users with their accounts (one user can have any accounts) we need to use a “link table” called UserAccountsLink, which is essentially a representation of the correlation (that rhymes!) between UsersAll.Username and UserAccounts.InternalNo. So we join three tables… o My queries are ran against an Illiad setup with multiple sites, as such I am querying a UsersAll table and am using the NVTGC field in that table to specify which site I want to restrict my query to. Otherwise, the query would include User data from all sites. You may be working with table in a single site installation, so your mileage may vary. In the above example, we were still getting dupe records (ie: some users had multiple records in the UsersAll table with slightly different date in them). Realistically, this data should be cleaned up pronto, but we needed a way to get an accurate count of users and their contact details so I used a GROUP BY clause and the aggregate function MAX to discard these erroneous rows: SELECT DISTINCT MAX(a.Type) AS 'Type', MAX(u.Username) as 'UserName', MAX(u.LastName) as 'LastName', MAX(u.FirstName) as 'FirstName', MAX(u.Status) as 'Status', MAX(u.EMailAddress) as 'EmailAddress' FROM UsersAll u JOIN UserAccountsLink l ON (u.UserName = l.UserName) JOIN UserAccounts a ON (l.InternalNo = a.InternalNo) WHERE u.NVTGC = 'ILL' AND u.Cleared = 'Yes' And a.Active = 'Y es' AND a.Type like 'VISA%' GROUP BY u.UserName ORDER BY u.LastName - Notes: o GROUP BY takes a field and groups the output rows based on it. In this case, any row with the same UserName is merged into a single row. In order for the GROPU BY clause to work, you typically have to use aggregate functions in the SELECT statement. Here I am using the MAX function, which returns the highest value (in numeric terms), but in this case it returns the first value and thus will ignore any dupes below it. Other aggregate functions include COUNT, MIN, AVG, etc – COUNT in this case would tell us HOW many dupe records we had – which would yield helpful for cleaning up the data… INSERT, UPDATE, DELETE statements are used for manipulating data. Use INSERT to add new records to table(s), UPDATE to modify fields of existing record(s) and DELETE to remove records. INSERT example (I don‟t have a tested example, but here is the context: INSERT INTO table_name (column1, column2,...) VALUES (value1, value2,....) - Notes: o table_name is the table to insert values into, followed by: o columns we want data inserted into (typically in parenthesis), followed by: o VALUES clause and values to insert into those columns, followed by: o If you do not specify columns after the table name, values will be inserted starting with the first column in the table‟s design and working up (so, you need to specify enough values for the entire table, otherwise SQL Server will yell at you) UPDATE example (setting user accounts to inactive who have visa cards): UPDATE UserAccounts SET UserAccounts.Active = 'No' WHERE UserAccounts.InternalNo IN (SELECT a.InternalNo FROM UserAccounts a JOIN UserAccountsLink l ON (l.InternalNo = a.InternalNo) JOIN UsersAll u ON (u.UserName = l.Username) WHERE u.NVTGC = 'ILL' And a.Active = 'Yes' AND a.Type like 'VISA%') - Notes: o Here I am updating a single field “Active” in the UserAccounts table to „No‟ (specified via the SET clause) o Note how I am using TWO (count them, TWO!) SQL queries in one. In this case, the inner most query, the SELECT query is ran first and returns a result set (in this example, it is pulling InternalNo from UserAccounts who are in the „ILL‟ site, active and have visa cards) . Then, the Update query updates fields for records in that results set (using the IN keyword). DELETE example (delete accounts for users with visa cards): DELETE FROM UserAccounts a2 WHERE a2.InternalNo IN (SELECT a.InternalNo FROM UserAccounts a JOIN UserAccountsLink l ON (l.InternalNo = a.InternalNo) JOIN UsersAll u ON (u.UserName = l.Username) WHERE u.NVTGC = 'ILL' And a.Active = 'Yes' AND a.Type like 'VISA%') Disclaimer: Obviously it is a good idea to read up on and test out modification queries like these on test data before messing around with them! Different ways of storing your SQL on the server: All of the above details ways of writing what are known as ad-hoc queries, or queries where you typically are using a program like Query Analyzer, TOAD, or Access to run the query against the DB. When you reach a point where you want to start exploring the possibilities of utilizing SQL queries within applications and scripts, it is highly beneficial to write queries that are stored on the SQL server. Some examples include: - A view (easier to write, but is very limited: cannot manipulate data, has no control flow, no parameterization, and is limited to a single select statement; good choice for dealing with simple reports where you want some abstraction of your original tables) - A stored procedure/user defined function (UDF) (more difficult, start getting into T-SQL, but is exponentially more powerful: allows all those above, better choice when building applications, and thinking „modular‟; UDF returns a value, stored procedure does not) - Notes: o There is really no performance gain in using stored procedures over views; in both cases the execution plan is stored in the server‟s cache, making the SQL run much faster than if it is delivered to the system at query time (ad- hoc). o A well written stored procedure or function leverages your data processing entirely onto the SQL Server (it is built for that; hit it hard and be happy!) thus providing data abstraction (read: protection), requires less manual coding on the programming/scripting end, and always executes faster than ad-hoc queries. Resources online (I didn’t make this stuff up, I stole it): Straight forward/no nonsense SQL reference (geared towards web work): http://www.w3schools.com/sql/default.asp SQL Server security model and security best practices, tips: http://vyaskn.tripod.com/sql_server_security_best_practices.htm Should I use a view, a stored procedure, or a user-defined function? http://www.aspfaq.com/show.asp?id=2537 MSDN reference for SQL syntax, Server functions, T-SQL, etc…: http://msdn.microsoft.com/library/en-us/tsqlref/ts_tsqlcon_6lyk.asp Contact info: Nick Peterson Oregon Health & Science University Library Computer User Support Analyst Phone: 503.494.1361 Email: email@example.com Questions? Insults?
Pages to are hidden for
"Illiad SQL Notes"Please download to view full document