Enterprise development (in .NET), Day 7
Databases and Data Access
Andrey Shchekin http://ashmind.com St. Petersburg, 2009
Images © Mattahan Productions, Inc | http://mattahan.deviantart.com/ | http://mattahan.insocada.com/
Magic words
RDBMS
Relational database management system
Database features
1. Consistency 2.Performance 3. Scalability 4.Portability
1. Consistency: Constraints
What
A rich system of built-in rules. Data is required to conform.
Why
Incorrect data is hard to debug. Fixing live systems can get messy.
Types of Constraints
Data Types Checks Unique Keys Foreign Keys Triggers Assertions
Sample Constraints
CREATE TABLE Users ( Id int Login varchar(32) Age int InvitedById int )
NOT NOT NOT NOT
NULL NULL NULL NULL
PRIMARY KEY, UNIQUE, CHECK (Age > 17), REFERENCES Users(Id)
Transactions
ACID
Atomicity Consistency Isolation Durability
Transaction workflow
Step 1
Step 2
?
Transactions and debugging
BEGIN TRANSACTION -- Your debug code here
ROLLBACK TRANSACTION
2. Performance: Indexes
Indexes are quite similar to keys in hash tables. Clustered Data in table is ordered by the index.
Non-clustered
3. Scalability
Size
~O(1) selection over millions of rows ~O(1) inserts to large tables
Users
Heavy scaling is mostly outside of our scope. Suppose it just works with lots of users.
4. Portability
SQL = structured query language
Version SQL-86 Basic Features Advanced Features
SQL-89
SQL-92 SQL:1999 SQL:2003 JOIN, ALTER TABLE, UNION, EXCEPT, INTERSECT MERGE, CTEs MULTISET Deferrable constraints, assertions OOP XML
Most databases support basic features of SQL-92.
5 rules for fast queries
1. 2. 3. 4. Think in sets Know your needs Watch the indexes Experiment
2. Know your needs
Less important Do not use SELECT *. More important Do not ask for 100000 rows if you need 10. Storing 10000 items in a combo-box is not a technical question. Large data – Do not aggregate in business logic. Do not sort in business logic.
4. Watch the indexes
SELECT Login FROM dbo.Users WHERE LEFT(Login, 1) = 'a'
Table scan (very slow)
SELECT Login FROM dbo.Users WHERE Login LIKE 'a%'
Index scan (fast)
5. Experiment
Even small changes can affect performance.
Different engines have different optimizations.
Engines
MS SQL Server Oracle DB2 MySQL PostgreSQL VistaDB SQLite
MS SQL Server
A Microsoft RDBMS. Fast, simple, useful. Should be the first RDBMS to consider under Windows.
Pro
Great tool support Great performance Great usability
Contra
Requires Windows Costs money (Express is free)
Native XML support Native XQuery support
Missing useful SQL features
MySQL
A GPL/Commercial licensed database server. Has pluggable engine support. Has some useful extensions to standard SQL.
http://www.mysql.com/
MySQL: Engines
MyISAM
No transactions No foreign keys Table-level locks
InnoDB
Transactions Foreign keys Fine-grained locks
Pro
Cross-platform Free Small footprint
Contra
Missing critical tools (backup/restore) Lot of configuration required for optimal performance Very slow on index changes
Useful SQL extensions: REPLACE, INSERT IGNORE, LIMIT
IBM DB2
An enterprise DB server from IBM. Cross-platform. Free version is almost not limited.
http://www.ibm.com/db2
Pro
Cross-platform Useful free version Fast and powerful
Contra
Terrible tool UI Bad documentation Extremely expensive advanced features
Beyond relational
Full-text search
Supported by a lot of RDBMS. Implemented as an index text column.
For better full-text in MySQL look at Sphinx.
Full-text search in SQL Server
SELECT Id, Title, Body FROM Posts WHERE CONTAINS(Body, '"install*" NEAR mysql')
Data warehouses
Large bodies of denormalized data.
Database Reporting Server
OLAP Cube (Data Warehouse)
Application Server
Other approaches
Scalability
1. Sharding 2. Google BigTable 3. Yahoo Hbase
OODB: db4o
Data Access
Levels of data access
1. Direct
ADO.NET, JDBC
2. O/R: Active Record
Castle ActiveRecord, Rails LINQ to SQL
3. O/R: Data Mappers
(N)Hibernate, ADO.NET Entity Framework
Direct Access
string commandText = "SELECT Login FROM dbo.Users WHERE Id = @Id"; using (var connection = new SqlConnection(connectionString)) using (var command = new SqlCommand(commandText, connection)) { command.Parameters.AddWithValue("@Id", 5); using (var reader = command.ExecuteReader()) { while (reader.Read()) { // do something with data } }
}
O|R
Object
Relational
Common features
Reference-collection mapping Inheritance-table mapping Table per hierarchy Table per class Table per concrete class
Common terms
Lazy Load Unit of Work All changes to your objects are remembered and replayed against the database.
Identity Map Same Id == same entity.
Active Record
Database-driven design: Object ~ database row. Base class inheritance (ActiveRecordBase). ‘Classic’ (Ruby-like) Active Record requires explicit lifecycle.
LINQ to SQL
LINQ + ActiveRecord-like O/R mapper.
var ashminds = from user in db.Users where user.Login == "ashmind" select new { user.Id, user.Name }; var ashmind = ashminds.Single();
Good database-to-object model generator. Nearly not testable. Nice for small projects.
Magic words
POCO
Plain old CLR objects
NHibernate
Supports mapping of any POCO. Completely abstracts storage structure. Possible mappings: XML Attributes Code
Nhibernate: Sample mapping
ADO.NET Entity Framerwork
Data Mapper (NHibernate-like). Written by Microsoft. Good tools, but
1. complex to maintain 2. hard to test 3. missing important features
Questions