Real-Time Databases
• Krithi Ramamritham, “Real-Time Databases,” International Journal of Distributed and Parallel Databases, 1(2), pp. 199-226, 1993.
• J. Stankovic, S. H. Son, and J. Hansson, “Misconceptions About Real-Time Databases,” IEEE Computer, vol. 32, no. 6, pp. 29-36, June 1999.
Outline
• Motivation
• Characteristics of data in RTDB
• Characteristics of transactions in RTDB
• Relations between active DB and RTDB
• Transaction processing in RTDB
• Research issues
Motivation
• Many applications involve:
– Time-constrained access to data
– Data temporal validity
– Examples: agile manufacturing, stock trading, e-commerce, command and control, network management, target tracking, ...
• Requirements
– Timely transaction/query processing
– Use fresh, i.e., temporally consistent, data
Traditional Databases
• (Traditional) DBs
– Deal with persistent data
– Transactions access (persistent) data while maintaining its consistency
– Serializability is the correctness criterion
– Support good throughput and response time
Background: Serializability
• Correctness criterion for concurrent transaction executions
• Why concurrent transactions?
– Better performance than serial executions
• Definition
– A concurrent execution of transactions is serializable if it is equivalent to some serial execution of the same transactions
– That is, a correct concurrent execution produces the same result as if the transactions were executed one at a time
Background: Conflict Serializability
• Two operations conflict if:
– they are issued by two different transactions;
– they access the same data item; and
– at least one of them is a write
• Two transaction schedules are conflict-equivalent if all conflicting operations are in the same order in the two schedules
• A concurrent schedule is conflict-serializable if it is conflict-equivalent to a serial schedule
Background: Conflict Graph
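• Conflict graph
– Nodes: transactions
– Directed edges: conflicts (Ti → Tj if an operation of Ti precedes a conflicting operation of Tj)
• Example: schedule S = w1(x) r2(x) r3(y) w2(z) r3(z) w3(y) r1(y)
[Figure: conflict graph of S over nodes T1, T2, T3, with edges T1 → T2 (on x), T2 → T3 (on z), and T3 → T1 (on y)]
• A schedule is conflict-serializable if and only if there is no cycle in its conflict graph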
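The conflict-graph test can be sketched in a few lines of Python. This is a minimal illustration, not a production checker: the helper names `parse_schedule`, `conflict_graph`, and `has_cycle` are made up for this example, and the schedule is assumed to be written as space-separated tokens such as `w1(x)`.

```python
from collections import defaultdict

def parse_schedule(text):
    """Turn 'w1(x) r2(x)' into a list of (op, txn, item) triples."""
    ops = []
    for token in text.split():
        txn, item = token[1:].rstrip(")").split("(")
        ops.append((token[0], int(txn), item))
    return ops

def conflict_graph(ops):
    """Edge Ti -> Tj when an operation of Ti precedes a conflicting one of Tj."""
    edges = defaultdict(set)
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node still on the DFS stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))

S = "w1(x) r2(x) r3(y) w2(z) r3(z) w3(y) r1(y)"
graph = conflict_graph(parse_schedule(S))
print(has_cycle(graph))  # True: T1 -> T2 -> T3 -> T1, so S is not conflict-serializable
```

For the example schedule S the edges come from x (T1 before T2), z (T2 before T3), and y (T3's write before T1's read), which closes the cycle.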
Background: Concurrency Control - Locking
• A transaction should get a lock on a data item before accessing it
• Shared lock: More than one transaction can hold a shared lock on a data item at the same time
• Exclusive lock: Only one transaction can hold an exclusive lock on a data item at a time
• If a data item has a shared lock, other transactions can get a shared lock to read it
• If a data item is already locked through either a shared or exclusive lock, another transaction cannot get an exclusive lock on it -> It has to block
• This simple mechanism alone does not guarantee conflict serializability
Background: 2PL (Two Phase Locking) for Conflict Serializability
• A transaction execution can be divided into two phases:
– Growing phase: The transaction can only acquire locks
– Shrinking phase: The transaction can only release locks
[Figure: #locks held by a transaction across the two phases]
• Strict 2PL: Hold exclusive locks until the transaction commits
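The locking rules and the two-phase restriction can be sketched as follows. This is a simplified, single-threaded illustration: `LockTable` and `TwoPhaseTxn` are hypothetical names, a refused request simply returns `False` instead of blocking, and modes are the strings `"S"` (shared) and `"X"` (exclusive).

```python
class LockTable:
    """Tracks, per data item, the lock mode and the set of owning transactions."""

    def __init__(self):
        self.table = {}  # item -> (mode, owners)

    def acquire(self, txn, item, mode):
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {txn})
            return True
        held_mode, owners = held
        if owners == {txn}:
            # Sole owner: keep the stronger of the two modes (lock upgrade).
            stronger = "X" if "X" in (mode, held_mode) else "S"
            self.table[item] = (stronger, owners)
            return True
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible with each other
            return True
        return False  # incompatible: the caller would have to block

    def release(self, txn, item):
        held = self.table.get(item)
        if held is None:
            return
        owners = held[1]
        owners.discard(txn)
        if not owners:
            del self.table[item]


class TwoPhaseTxn:
    """Enforces 2PL: no lock may be acquired after the first release."""

    def __init__(self, txn_id, lock_table):
        self.txn_id = txn_id
        self.lock_table = lock_table
        self.shrinking = False

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested in the shrinking phase")
        return self.lock_table.acquire(self.txn_id, item, mode)

    def unlock(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.lock_table.release(self.txn_id, item)
```

Under strict 2PL a transaction would call `unlock` on its exclusive locks only at commit time; that policy is left to the caller in this sketch.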
RT systems
• Meet timing constraints
• Deal with temporal data that become outdated after a certain time
• Recall real-time ≠ fast: See the next slide
Real-time ≠ Fast
Time-cognizant transaction scheduling & concurrency control required!
Why RTDB?
• RT applications may deal with large amounts of data, e.g., for target tracking, agile manufacturing, stock trading, ...
• DB can facilitate:
(a) Description of data: schemas help avoid redundancy of data
(b) Maintenance of correctness & integrity of data
(c) Efficient access to data: indexing
(d) Correct execution of transactions in spite of concurrency and failures: ACID properties (Atomicity, Consistency, Isolation, Durability)
RTDB Features
• Not all data are permanent; some are temporal, e.g., sensor data or stock prices
• Temporally-correct serializable schedules are a subset of serializable schedules
• Timeliness is more important than correctness
– Tradeoff between timeliness & serializability
– Tradeoff between timeliness & atomicity
• Monotonic queries and transactions supported by the milestone approach
• Data similarity concept
• Adaptive update policy
– Tradeoff between timeliness & data temporal consistency
• Both real-time scheduling & database technologies can be applied to real-time data management
Data Characteristics in RTDB
• Temporal data consistency: Keep track of the real world status
– Absolute consistency between the state of the environment, e.g., manufacturing or market status, and its reflection in databases
– Relative consistency among the temporal data used to derive other data
    • Example: relative consistency of stock price data used to derive the SP500 index
Absolute consistency
• Denote a temporal data item in RTDB by d: (value, avi, timestamp)
– d_value denotes the current value of d
– d_timestamp denotes the time when d was last updated
– d_avi denotes d’s absolute validity interval, i.e., the length of the time interval following d_timestamp during which d is considered to have absolute validity
– d is absolutely consistent if (current time - d_timestamp) ≤ d_avi
Relative Consistency
• Relative consistency set R: a set of data items used to derive a new data item
• Each set R is associated with a relative validity interval (rvi)
• Example:
– SP500 index is an average of 500 stock prices
– Target position can be computed using, e.g., aircraft heading, air speed, wind speed & direction, barometric pressure, ...
Relative Consistency
• Assume a data item d in R (relative consistency set)
• d has a correct state iff
– d_value is logically consistent, i.e., it satisfies all integrity constraints
– d is temporally consistent
    • Absolute consistency: (current time - d_timestamp) ≤ d_avi
    • Relative consistency: for every d’ in R, |d_timestamp - d’_timestamp| ≤ R_rvi
Relative Consistency
• Examples
– temperature_avi = 5, pressure_avi = 10, R = {temperature, pressure}, R_rvi = 2
– If current time = 100 (items written as {value, avi, timestamp}):
    • temperature = {347, 5, 95} & pressure = {50, 10, 97} are temporally consistent
    • temperature = {347, 5, 95} & pressure = {50, 10, 92} are not, because (95 - 92) > R_rvi = 2, although both temperature and pressure meet the absolute consistency requirements
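The two checks can be written out directly. The sketch below replays the temperature/pressure example; the class `TemporalItem` and the function names are illustrative choices, not part of any RTDB API.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class TemporalItem:
    value: float
    avi: float        # absolute validity interval
    timestamp: float  # time of the last update

def absolutely_consistent(d, now):
    """(current time - d_timestamp) <= d_avi"""
    return (now - d.timestamp) <= d.avi

def relatively_consistent(items, rvi):
    """Every pair of timestamps in the set differs by at most rvi."""
    return all(abs(a.timestamp - b.timestamp) <= rvi
               for a, b in combinations(items, 2))

def temporally_consistent(items, rvi, now):
    return (all(absolutely_consistent(d, now) for d in items)
            and relatively_consistent(items, rvi))

temperature = TemporalItem(347, 5, 95)
print(temporally_consistent([temperature, TemporalItem(50, 10, 97)], rvi=2, now=100))  # True
print(temporally_consistent([temperature, TemporalItem(50, 10, 92)], rvi=2, now=100))  # False
```

In the second call both items pass the absolute check (ages 5 and 8), and only the pairwise timestamp gap of 3 violates the rvi of 2.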
Relative consistency
• At time 100, temperature = {347, 5, 95} & pressure = {50, 10, 92} are not temporally consistent because (95 - 92) > R_rvi = 2, although temperature and pressure meet the absolute consistency requirements
• Is this good?
• Users may expect relative consistency to be satisfied if the absolute consistency of all the data in R is met!
• The avi of pressure should be reduced to 5 to meet the required rvi of 2, and the updates of pressure and temperature should always be done within 2 time units of each other
• A better metric is required! But not much work has been done to address this issue!
Transaction characteristics in RTDB
• Transaction types
– Write-only transactions obtain the real-world status and write it into the RTDB (also called sensor transactions)
– Update transactions derive and store new data in the RTDB (also called derived data recomputations)
– Read-only transactions, i.e., queries
    • Read sensor data and compute actuation signals
– User transactions that read temporal data and read/write non-temporal data
Transaction characteristics in RTDB
• Example transactions
– Sample wind velocity every 10s
– Update robot positions every 20s
– If temperature > 100, add coolant to the reactor within 10s
– If the average stock price of a user portfolio changes by more than 10%, sell the stocks within 5s
Transaction characteristics in RTDB
• Deadlines
– Hard: Negative infinite value upon a deadline miss
– Soft: Value decreases as time goes on after the deadline
– Firm: No value after a deadline miss
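The three deadline types can be viewed as value functions of the completion time. The sketch below is one possible encoding: the function name `transaction_value`, the linear decay for soft deadlines, and the `decay_window` parameter are assumptions made for illustration, since the slide only says that the value of a soft transaction decreases after the deadline.

```python
import math

def transaction_value(kind, finish, deadline, full_value=1.0, decay_window=10.0):
    """Value gained by a transaction that completes at time `finish`."""
    if finish <= deadline:
        return full_value          # on time: full value for every deadline type
    if kind == "hard":
        return -math.inf           # a miss is treated as catastrophic
    if kind == "firm":
        return 0.0                 # a late result is worthless
    if kind == "soft":
        # Assumed shape: value falls linearly to zero over decay_window time units.
        late = finish - deadline
        return max(0.0, full_value * (1 - late / decay_window))
    raise ValueError(f"unknown deadline type: {kind}")
```

A firm or soft transaction that can no longer earn any value is a natural candidate for being aborted, which frees resources for transactions that can still meet their deadlines.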
Transaction characteristics in RTDB
• How often do we need to execute a sensor transaction to update data x?
– Period = 0.5 * avi(x): Half-half principle
– If period = avi: x can become stale before the next update is installed
– If period = 0.5 * avi: x stays fresh as long as each sensor transaction finishes within its period
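The reasoning behind the half-half principle can be checked with a small calculation. The sketch assumes the usual worst case for a periodic sensor transaction: one instance samples x right at the start of its period, while the next instance completes only at the end of the following period, so two full periods can pass before x is refreshed. The function names are illustrative.

```python
def worst_case_age(period):
    """Oldest the stored value can get before its replacement is installed.

    Assumption: sampling may happen at the very start of one period and the
    next update may finish at the very end of the next period.
    """
    return 2 * period

def stays_fresh(period, avi):
    """x is always fresh if its worst-case age never exceeds avi."""
    return worst_case_age(period) <= avi

avi_x = 10
print(stays_fresh(avi_x, avi_x))        # False: x may be up to 20 time units old
print(stays_fresh(0.5 * avi_x, avi_x))  # True: the worst-case age equals avi
```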
Transaction characteristics in RTDB
• How often do we need to recompute a derived data?
– More complex
– Ideally, a derived data item stays fresh if it is recomputed at every rvi
– Alternatively, impose precedence constraints on the transactions to conform to the derived-from relationship
Relationship to Active Databases
• Basic building block in active DB: Event, Condition & Action (ECA)
– On event
    If condition
    Do action
– Upon the occurrence of the specified event, if the condition holds, then trigger the specified action
– Good model for triggering periodic/aperiodic activities based on events and conditions
– Timing constraints are not explicitly considered
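A bare ECA rule engine fits in a few lines. The sketch below is a toy model rather than the interface of any active database product: `Rule`, `ActiveDB`, `on`, and `signal` are hypothetical names, and, like the basic ECA model, it carries no timing constraints.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

class ActiveDB:
    def __init__(self):
        self.state = {}
        self.rules = []

    def on(self, event, condition, action):
        """Register: On event, If condition, Do action."""
        self.rules.append(Rule(event, condition, action))

    def signal(self, event):
        """Fire every rule registered for this event whose condition holds."""
        for rule in self.rules:
            if rule.event == event and rule.condition(self.state):
                rule.action(self.state)

db = ActiveDB()
db.on("temperature_updated",
      lambda state: state["temperature"] > 100,
      lambda state: state.update(coolant_added=True))
db.state["temperature"] = 120
db.signal("temperature_updated")
print(db.state)  # {'temperature': 120, 'coolant_added': True}
```

A real-time variant would additionally have to bound how long after the event the action may start and by when it must finish.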
Relationship to Active Databases
• Active DB has necessary features for real-time data management
• Timing constraints should also be considered
• Example
On (10 seconds after “initiating landing preparations”)
If (steps are not completed)
Do (within 5 seconds “abort landing”)
Transaction Processing in RTDB
• Key issue: predictability
– Will the transaction meet its timing constraint?
– Sources of unpredictability
– Processing hard real-time transactions
– Processing soft real-time transactions
Sources of unpredictability in DB
• Dependence of transaction exec sequence on data values
– Very hard to predict the worst case exec time
– Avoid unbounded loops, recursion, and dynamically constructed data structures
– In RTDB, the data items accessed by a transaction are likely to be known once its functionality in the controlled environment is known
Sources of unpredictability in DB
• Data & resource conflicts
– Wait for data and resources, e.g., CPU & I/O device
– Data consistency requirements exacerbate the problem
    • Long blocking due to concurrency control
    • Priority inversion
    • Deadlock: 2PL is not free of deadlock
Sources of unpredictability in DB
• Dynamic paging & I/O
– Demand paging in disk-resident databases
– Very pessimistic worst case where all data need to be fetched from disk
– Disk scheduling & buffering
– Main memory databases eliminate these problems
Aborts, rollbacks, and restarts
• Transaction aborts, rollbacks, and restarts
– A transaction can be aborted and restarted several times before it commits
    • Total exec time increases. If the total number of aborts cannot be controlled, the exec time can be unbounded
    • Resources & time needed to deal with aborts & restarts are denied to other transactions
Preanalysis of transactions
• Get an estimate of a transaction’s exec time & data/resource requirements
• Impossible for complex transactions
• Two-phase transaction exec
– Pre-fetch phase
    • A transaction is run once, bringing the necessary data into main memory
    • Access invariance [15]: A transaction’s exec path does not change due to possible concurrent changes done to the data by other transactions while the transaction is going through its pre-fetch phase
    • No writes are performed
    • Conflicts with other transactions are not considered
    • Determine computation demands
Preanalysis of transactions
• Two-phase transaction exec
– Try to guarantee the transaction will commit by its deadline in the 2nd phase
– Ensure the necessary data & processing resources are available at the appropriate times via planning
• If access invariance holds, a transaction will complete by its deadline
• No recovery such as undo is necessary if a transaction is unable to execute
• How much overhead?? Worth it?
Dealing with Hard Deadlines
• Must meet all deadlines • Requirements
– Transactions should be periodic
– WCET & resource requirements must be determined
– Many restrictions on the structure & characteristics of RT transactions
-> RT scheduling techniques can be applied
Dealing with Soft Deadlines
• More leeway
• Most DB applications are not hard but soft real-time
• Meet as many deadlines as possible
• Abort a transaction upon its deadline miss
– Don’t waste resources on tardy transactions
– Always good? Different application semantics?
• Real-time scheduling and conflict resolution are required
Scheduling
• EDF (Earliest Deadline First)
• Least slack first
– Schedule the transaction with the least slack (i.e., deadline - current time - remaining exec. time) first
– High overhead
– Priority changes very often
• Highest value first
• Highest value density (value/exec time)
– How to determine value???
• Longest executed transaction first
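Three of these policies differ only in the key used to pick the next transaction from the ready queue. The sketch below illustrates that; the `Txn` record and the `pick_*` functions are hypothetical names, and the execution times and values are assumed to be known estimates.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    deadline: float
    exec_time: float   # estimated total execution time
    remaining: float   # estimated remaining execution time
    value: float = 1.0

def slack(txn, now):
    """deadline - current time - remaining exec. time"""
    return txn.deadline - now - txn.remaining

def pick_edf(ready):
    return min(ready, key=lambda t: t.deadline)

def pick_least_slack(ready, now):
    # Slack depends on `now`, so this key must be re-evaluated at every decision.
    return min(ready, key=lambda t: slack(t, now))

def pick_highest_value_density(ready):
    return max(ready, key=lambda t: t.value / t.exec_time)

ready = [
    Txn("a", deadline=20, exec_time=5, remaining=5, value=10),
    Txn("b", deadline=25, exec_time=22, remaining=22, value=22),
    Txn("c", deadline=40, exec_time=2, remaining=2, value=8),
]
print(pick_edf(ready).name)                    # a
print(pick_least_slack(ready, now=0).name)     # b
print(pick_highest_value_density(ready).name)  # c
```

The same ready queue yields three different choices, which is why the policy has to match the application's notion of value.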
Conflict resolution: 2PL variations
• Priority inheritance
– If a high-priority transaction is blocked by a low-priority transaction, the low-priority transaction inherits the high priority
– Reduces blocking time; however,
– Blocking time can still be as long as the duration of a transaction under strict 2PL
• Priority abort
– A high-priority transaction aborts a low-priority transaction upon a data conflict
– Better real-time performance than priority inheritance
– 2PL-PA/2PL-HP is well accepted in RTDB
– Low-priority transactions may suffer repeated aborts and restarts, which can be a problem in, e.g., e-commerce
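The two conflict-resolution rules can be contrasted in a small decision function. This is a sketch of the decision only, with no real lock manager behind it: `Txn`, `resolve_conflict`, and the policy strings are made-up names, and a larger `priority` number is assumed to mean a more urgent transaction.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    priority: int      # assumption: larger number = higher priority
    restarts: int = 0

def resolve_conflict(policy, requester, holder):
    """Decide what happens when `requester` needs a lock that `holder` owns."""
    if requester.priority <= holder.priority:
        return "requester blocks"          # no priority inversion to resolve
    if policy == "inherit":
        # Priority inheritance: the holder runs at the requester's priority
        # until it releases the lock; the requester still waits.
        holder.priority = requester.priority
        return "requester blocks"
    if policy == "abort":
        # Priority abort (2PL-HP): the holder is aborted and restarted later.
        holder.restarts += 1
        return "holder aborted"
    raise ValueError(f"unknown policy: {policy}")
```

Counting `restarts` makes the drawback visible: under priority abort a low-priority transaction can be restarted again and again.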
Conflict resolution: Optimistic concurrency control
• Assume there is no data conflict during a transaction execution
• Keep executing the transaction without blocking
• After finishing all operations of the transaction, enter the validation phase
• If validation succeeds, the transaction commits
• Otherwise, it is aborted
Conflict resolution: Optimistic concurrency control
• Backward validation
– A validating transaction is aborted if it conflicts with transactions already committed
– Characteristics of the validating or ongoing transactions cannot be considered for conflict resolution
• Forward validation
– A validating transaction aborts ongoing transactions if there’s a conflict
– More applicable to RTDB
– Wait-50: A validating transaction blocks as long as more than half the transactions that conflict with it have earlier deadlines
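The Wait-50 rule reduces to a simple test at validation time. The sketch below encodes only the rule as stated on the slide; the function name and its arguments are illustrative, and the surrounding validation machinery is omitted.

```python
def wait_50_should_wait(validator_deadline, conflicting_deadlines):
    """True if the validating transaction should keep waiting.

    It waits while more than half of the ongoing transactions that conflict
    with it have earlier deadlines; otherwise it commits and the conflicting
    transactions are aborted.
    """
    if not conflicting_deadlines:
        return False  # nothing conflicts: commit right away
    earlier = sum(1 for d in conflicting_deadlines if d < validator_deadline)
    return earlier > len(conflicting_deadlines) / 2

print(wait_50_should_wait(10, [5, 6, 20]))  # True: 2 of 3 conflicting deadlines are earlier
print(wait_50_should_wait(10, [5, 20]))     # False: exactly half is not more than half
```

The test is re-evaluated as conflicting transactions commit or abort, so a waiting validator is eventually released.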
Distributed RTDB
• Very little work has been done • Challenges
– Transaction commitment protocol, e.g., 2PC (Two Phase Commit), has high overhead
– Unpredictable network delay
• Opportunities
– Data & resource availability at remote nodes
– Load balancing
– Fault/intrusion tolerance
Two Phase Commit (2PC) Protocol
• Supports integrity in distributed databases used in, e.g., airline reservation, banking, and stock trading
• All participating databases must either commit, or abort and roll back
• Prepare phase: Each database informs the coordinator whether it will commit or abort a transaction
• Commit phase: Commit if every database intends to commit; otherwise, abort & roll back
• Drawback
– If even one database is unavailable, none of the other databases can commit
– Too much overhead for real-time applications
– Better approaches are required!
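The coordinator's decision logic can be sketched as follows. This is an in-process toy with no networking, logging, or recovery: `two_phase_commit` is a hypothetical function, each participant is modeled as a callable that returns its vote, and an unreachable participant is assumed to count as an abort vote.

```python
def two_phase_commit(participants):
    """participants maps a database name to a callable returning True to vote commit."""
    # Prepare phase: collect one vote per participant.
    votes = {}
    for name, prepare in participants.items():
        try:
            votes[name] = bool(prepare())
        except Exception:
            votes[name] = False  # assumption: unavailable counts as a vote to abort
    # Commit phase: commit only if every participant voted to commit.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes

def unavailable():
    raise ConnectionError("database unreachable")

print(two_phase_commit({"bank_a": lambda: True, "bank_b": lambda: True})[0])  # commit
print(two_phase_commit({"bank_a": lambda: True, "bank_b": unavailable})[0])   # abort
```

The second call shows the drawback listed above: a single unavailable database prevents all the others from committing.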
QoS Tradeoff & Overload Management
• APPROXIMATE
– Monotonically increase the accuracy of the answer to a query as more exec time is spent
– Provide an approximate answer, if necessary, to meet the deadline
• Epsilon serializability
– Allow transactions to read data while concurrent writes are going on
– Bound the error to be below the specified epsilon
• Timeliness & security tradeoff
– Apply a weaker security mechanism under overload
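The idea of a monotonically improving answer can be illustrated with an average query that returns bounds instead of a single number. This sketch is in the spirit of the APPROXIMATE approach but is not its actual algorithm: the function name, the step `budget` standing in for available execution time, and the assumption that every value lies in a known range [lo, hi] are all choices made for this example.

```python
def approximate_average(values, lo, hi, budget):
    """Bounds on the average after reading only the first `budget` values.

    Unread values are replaced by their known limits lo and hi, so the true
    average always lies inside the returned interval, and the interval can
    only narrow as the budget grows.
    """
    if not values:
        raise ValueError("values must be non-empty")
    seen = values[:max(0, budget)]
    unseen = len(values) - len(seen)
    partial = sum(seen)
    n = len(values)
    return (partial + unseen * lo) / n, (partial + unseen * hi) / n

prices = [10, 20, 30, 40]
print(approximate_average(prices, 0, 50, budget=0))  # (0.0, 50.0)
print(approximate_average(prices, 0, 50, budget=2))  # (7.5, 32.5)
print(approximate_average(prices, 0, 50, budget=4))  # (25.0, 25.0)
```

A query that runs out of time at budget 2 can still return the interval (7.5, 32.5) by its deadline instead of missing it.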
Research issues
• QoS guarantees in RTDB
– Transaction timeliness & data freshness
• Distributed real-time data management
• Security
– Access control for RTDB?
• New applications
– e-commerce: QoS guarantees given dynamic workloads
– Embedded applications: Timeliness, data temporal consistency, energy-efficiency, composability, security, real-time data-centric routing and sensor data aggregation, ...
Questions?