Illiad SQL Notes by ygq15756


									                            Writing Illiad SQL Queries for MS SQL Server
                                                                              Williad 5.5.06

Why write SQL?
   -    You need data in a machine readable format (like a comma delimited file)
   -    You need to go beyond the generation of manual reports in something like
              o To manipulate data
              o To create complex queries that automatically run at certain times or via
                 events (through jobs and triggers)
   -    You want to build a user friendly front end to your data and do so abstractly (ie:
        only give them what they need and protect your data). A few possibilities include
        an ASP or PHP web front end that spits out a web report, or a comma delimited
   -    SQL code is a standard meaning your queries will work in any SQL compliant
        program (easily migrateable – if that‟s a word?)
   -    It‟s fun and will impress your friends at parties!

Tools for writing/using SQL queries
Some   tools for writing SQL include:
   -     Query Analyzer (MS Standard)
   -     Access (GUI – but adds all sorts of nonsense to code)
   -     Toad (I use this; it‟s more robust than Query Analyzer. There is a free version at:

The above programs let you write your ad-hoc queries, the SQL itself. If you are building
an application to utilize the code (like a web front end or a script), you will either move
the SQL into that language or store it on the server as a view or a stored procedure. More
on that later.

You use Enterprise Manager to administer the SQL Server (ie: add logins/users,
administer rights, etc).

MS SQL Server Security Basics
To access data from a SQL Server database, a user must pass through two stages of
authentication: 1) SQL Server level (login names) to login or connect to the actual server
and 2) at the database level (user accounts) to access data from a database.
    - LOGIN: A valid login name used to connect to the SQL Server. If the SQL server
        is setup in mixed mode authentication, then this can be either a user manually
        created in the SQL server (lives in dbo.master DB) or a NT/W2K domain user. If
        it is setup in windows authentication mode, it only allows domain users to login.
        Note that in either setup, you can always use domain logins.
    - USER: A valid user account granted rights within a database to access/manipulate
        data. Specific to databases (ie: must be explicitly granted per DB). All
        permissions and ownership are controlled by the user account (ie: DML/DDL
       commands (SELECT/INSERT/CREATE TABLE/etc). Logins are associated
       with user accounts. A login can have different user accounts for different
       databases – but only one user per DB.

The difference between Windows user and a SQL user is:
   - SQL: unique user/password combo stored/maintained in SQL server
   - Windows: account in the windows domain – if no SQL user is specified upon
       connection, the server will attempt windows authentication based on your current
       credentials (an “invisible” login)

Choosing a SQL Server login or a Windows login is ultimately a matter of your needs,
however the benefits of using Windows logins means:
   - Easier to know who did what (in some cases institutions use a single SQL user
      among staff, which makes this impossible)
   - Login information (user/password) is not sent over network (can‟t be sniffed)
   - Easier to maintain if you have many users (ie: create a windows group on server,
      add users you want to give read access to DB to it, add group as a login in
      enterprise manager, then give that group an appropriate user account with
      db_datareader role – now you can add/remove users to the widows group and
      never have to set rights within SQL server for those people)
   - At the user level, rights and ownership to a given DB and its objects can be
      manually configured, or are more easily controlled by fixed roles. There are
      server roles associated with logins and database roles associated with users.
      Common roles:
          o Server:
                   sysadmin = can perform any activity in SQL Server
          o Database:
                   public = equivalent to the “everybody” group in windows; initially
                      has no rights to the db and every user is a automatically a member
                      of it when added to a DB
                   db_owner = has all permissions in the database
                   db_datareader = can select all data from any table in DB
                   db_datawriter = can modify any data from any table in DB

In addition, individual rights can be explicitly granted (outside of roles for more
granularity) via enterprise manager or SQL commands: GRANT/REVOKE/DENY. Or,
you can create your own custom roles and assign them to users.

Diving into SQL
SQL stands for „Structured Query Language,‟ which is an ANSI compliant language for
accessing and manipulating database systems. Well, it‟s MOSTLY standard across
different proprietary DB systems like MS SQL Server, Oracle, MySQL, etc – but each
company likes to write its own quirks and general wackiness into each of their database
platforms. So a query written in SQL Server will not always work out-of-the-box on an
Oracle Server unless you mess with the query a bit.
SQL is a query language, but it includes a syntax to update, insert and delete records as
well. The Data Manipulation Language (DML) represents these all these commands
    - SELECT – extract record(s) from table
    - UPDATE – update records(s) in table
    - DELETE – delete record(s) from table
    - INSERT – insert new records(s) in table

The most basic SELECT statements look like these:

SELECT field1, field2 FROM table_name


SELECT LastName, FirstName, EMa ilAddress FROM UsersAll

In both queries, we will get a result set of all records in the table. The first returns all
columns (*), the second the three columns we specified. If we want to conditionally
select data according to some logical operation, we use the WHERE clause:
SELECT LastName, FirstName, Department
FROM UsersAll
WHERE Status = 'Faculty/Staff'

Notice the use of single quotes („) to denote a string, do not use them with number data
types. Some other operators include:
    - <> : Not Equal
    - > : Greater than
    - >= : Greater than or equal to
    - < : Less than
    - <= : Less than or equal to
    - LIKE

The LIKE operator allows us to use wildcards (the percent sign: %):
SELECT LastName, FirstName, Number
FROM UsersAll
WHERE Number LIKE ‘20000%’

This query will return all records where the Number field starts with „20000‟ (ie: get all
users with OHSU barcodes)

What if we want to extract data from the database from several tables at once? We do so
by JOINING the tables. There needs to be an established KEY relationship between
them: a primary key (row with unique value) in one table correlates to one or more
row(s) in another table. This relationship allows us to join two or more tables and ensures
our result set will not return duplicate records between the tables (note: you will still get
dupes within the table if they exist! More on that in a bit)

Example that returns Loans by User (joining UsersAll and Transactions tables in
WHERE clause):
SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle,
t.LoanPublisher, t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary
FROM dbo.Transactions t, dbo.UsersAll u
WHERE LOWER(t.TransactionStatus) = 'request finished'
       AND LOWER(t.DocumentType) = 'loan'
       AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation')
       AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05'))
       AND (t.Username = u.UserName)

   -   Notes:
          o Table is specified via explicit schema (where it is located in the DB:
              dbo.Transactions) and is followed by an „alias‟ or shortcut designation that
              makes it easier when denoting the table throughout the rest of the query
          o I am using a SQL Server string function called LOWER to convert values
              from the table to all lowercase, ensuring data consistency and that
              everything is caught
          o I am using ANDs to logically specify multiple conditions that must be true
              (you can also use ORs and NOTs; use parenthesis to control order of
          o I am also using the IN keyword to easily specify that I want to match on a
              value within the parenthesis (easier than writing out „equals‟ conditions
              for each of those values)
          o JOIN occurs at the bottom of our WHERE clause: here I specify which
              fields I want to join the two tables on.

You can also join in the FROM clause (I prefer this way as it is cleaner and more ANSI

SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle, t.LoanPublisher,
t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary
FROM dbo.Transactions t INNER JOIN dbo.UsersAll u ON (t.Username = u.UserName)
WHERE LOWER(t.TransactionStatus) = 'request finished'
       AND LOWER(t.DocumentType) = 'loan'
       AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation')
       AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05'))

   -   Notes:
          o INNER JOIN: returns all rows where both tables have matching rows,
              unmatched rows are excluded from our result set (ie: an average join)
          o LEFT and RIGHT JOIN: LEFT returns all rows where there are matches,
              but also returns rows from the LEFT or first table that have no match on
              the right, or second table. RIGHT is the opposite.

But I am anal and want my data sorted! Well, use an ORDER BY clause then:
SELECT u.Status, u.Department, t.TransactionDate, t.LoanTitle, t.LoanPublisher,
t.LoanDate, t.LoanEdition, t.ISSN, t.LendingLibrary
FROM dbo.Transactions t INNER JOIN dbo.UsersAll u ON (t.Username = u.UserName)
WHERE LOWER(t.TransactionStatus) = 'request finished'
       AND LOWER(t.DocumentType) = 'loan'
       AND LOWER(u.Status) IN ('faculty/staff', 'student', 'distanceeducation')
       AND ((t.TransactionDate < '1/1/06') AND (t.transactionDate >= '1/1/05'))
ORDER BY t.TransactionDate, u.UserName

   -   Notes:
          o Here, we specify which column we want our result set sorted on. You can
              also specify (separated by columns) what to sort on after that initial
              column if there more than one value.

Wait a minute buddy, I am getting duplicate records in my result set. How do I deal with

The DISTINCT keyword after SELECT will tell SQL Server to only return unique rows
(i.e.: per our result set it will not report rows that are the same) if it encounters them. An
example that returns users with visa cards:

              u.Username as 'UserName',
              u.LastName as 'LastName',
              u.FirstName as 'FirstName',
              u.Status as 'Status',
              u.EMailAddress as 'EmailAddress'
FROM UsersAll u
       JOIN UserAccountsLink l ON (u.UserName = l.UserName)
       JOIN UserAccounts a ON (l.InternalNo = a.InternalNo)
WHERE u.NVTGC = 'ILL' AND u.Cleared = 'Yes' And a.Active = 'Yes' AND a.Type
like 'VISA%'
ORDER BY u.LastName

   -   Notes:
          o I am using “Aliases” for columns returned in our result set (specified after
              AS keyword) for a cleaner report
          o To get a result set that joins users with their accounts (one user can have
              any accounts) we need to use a “link table” called UserAccountsLink,
              which is essentially a representation of the correlation (that rhymes!)
              between UsersAll.Username and UserAccounts.InternalNo. So we join
              three tables…
          o My queries are ran against an Illiad setup with multiple sites, as such I am
              querying a UsersAll table and am using the NVTGC field in that table to
              specify which site I want to restrict my query to. Otherwise, the query
              would include User data from all sites. You may be working with table in
              a single site installation, so your mileage may vary.

In the above example, we were still getting dupe records (ie: some users had multiple
records in the UsersAll table with slightly different date in them). Realistically, this data
should be cleaned up pronto, but we needed a way to get an accurate count of users and
their contact details so I used a GROUP BY clause and the aggregate function MAX to
discard these erroneous rows:

             MAX(u.Username) as 'UserName',
             MAX(u.LastName) as 'LastName',
             MAX(u.FirstName) as 'FirstName',
              MAX(u.Status) as 'Status',
              MAX(u.EMailAddress) as 'EmailAddress'
FROM UsersAll u
       JOIN UserAccountsLink l ON (u.UserName = l.UserName)
       JOIN UserAccounts a ON (l.InternalNo = a.InternalNo)
WHERE u.NVTGC = 'ILL' AND u.Cleared = 'Yes' And a.Active = 'Y es' AND a.Type
like 'VISA%'
GROUP BY u.UserName
ORDER BY u.LastName

   -   Notes:
          o GROUP BY takes a field and groups the output rows based on it. In this
              case, any row with the same UserName is merged into a single row. In
              order for the GROPU BY clause to work, you typically have to use
              aggregate functions in the SELECT statement. Here I am using the MAX
              function, which returns the highest value (in numeric terms), but in this
              case it returns the first value and thus will ignore any dupes below it.
              Other aggregate functions include COUNT, MIN, AVG, etc – COUNT in
              this case would tell us HOW many dupe records we had – which would
              yield helpful for cleaning up the data…

INSERT, UPDATE, DELETE statements are used for manipulating data. Use INSERT to
add new records to table(s), UPDATE to modify fields of existing record(s) and DELETE
to remove records.

INSERT example (I don‟t have a tested example, but here is the context:

INSERT INTO table_name (column1, column2,...)
VALUES (value1, value2,....)

   -   Notes:
          o     table_name is the table to insert values into, followed by:
          o     columns we want data inserted into (typically in parenthesis), followed by:
          o     VALUES clause and values to insert into those columns, followed by:
          o     If you do not specify columns after the table name, values will be inserted
                starting with the first column in the table‟s design and working up (so, you
                need to specify enough values for the entire table, otherwise SQL Server
                will yell at you)

UPDATE example (setting user accounts to inactive who have visa cards):

UPDATE UserAccounts
SET UserAccounts.Active = 'No'
WHERE UserAccounts.InternalNo IN
       (SELECT a.InternalNo FROM UserAccounts a
              JOIN UserAccountsLink l ON (l.InternalNo = a.InternalNo)
              JOIN UsersAll u ON (u.UserName = l.Username)
       WHERE u.NVTGC = 'ILL' And a.Active = 'Yes' AND a.Type like 'VISA%')

   -   Notes:
          o Here I am updating a single field “Active” in the UserAccounts table to
              „No‟ (specified via the SET clause)
           o Note how I am using TWO (count them, TWO!) SQL queries in one. In
             this case, the inner most query, the SELECT query is ran first and returns
             a result set (in this example, it is pulling InternalNo from UserAccounts
             who are in the „ILL‟ site, active and have visa cards) . Then, the Update
             query updates fields for records in that results set (using the IN keyword).

DELETE example (delete accounts for users with visa cards):

DELETE FROM UserAccounts a2
WHERE a2.InternalNo IN
       (SELECT a.InternalNo FROM UserAccounts a
              JOIN UserAccountsLink l ON (l.InternalNo = a.InternalNo)
              JOIN UsersAll u ON (u.UserName = l.Username)
       WHERE u.NVTGC = 'ILL' And a.Active = 'Yes' AND a.Type like 'VISA%')

Disclaimer: Obviously it is a good idea to read up on and test out modification queries
like these on test data before messing around with them!

Different ways of storing your SQL on the server:
All of the above details ways of writing what are known as ad-hoc queries, or queries
where you typically are using a program like Query Analyzer, TOAD, or Access to run
the query against the DB. When you reach a point where you want to start exploring the
possibilities of utilizing SQL queries within applications and scripts, it is highly
beneficial to write queries that are stored on the SQL server. Some examples include:
    - A view (easier to write, but is very limited: cannot manipulate data, has no control
        flow, no parameterization, and is limited to a single select statement; good choice
        for dealing with simple reports where you want some abstraction of your original
    - A stored procedure/user defined function (UDF) (more difficult, start getting into
        T-SQL, but is exponentially more powerful: allows all those above, better choice
        when building applications, and thinking „modular‟; UDF returns a value, stored
        procedure does not)

   -   Notes:
          o There is really no performance gain in using stored procedures over views;
              in both cases the execution plan is stored in the server‟s cache, making the
              SQL run much faster than if it is delivered to the system at query time (ad-
          o A well written stored procedure or function leverages your data processing
              entirely onto the SQL Server (it is built for that; hit it hard and be happy!)
              thus providing data abstraction (read: protection), requires less manual
              coding on the programming/scripting end, and always executes faster than
              ad-hoc queries.

Resources online (I didn’t make this stuff up, I stole it):
Straight forward/no nonsense SQL reference (geared towards web work):
SQL Server security model and security best practices, tips:

Should I use a view, a stored procedure, or a user-defined function?

MSDN reference for SQL syntax, Server functions, T-SQL, etc…:

Contact info:
Nick Peterson
Oregon Health & Science University
Library Computer User Support Analyst
Phone: 503.494.1361

Questions? Insults?

To top