
HPSAA2004

AUTOMATION RELIABILITY IN UNMANNED AERIAL VEHICLE FLIGHT CONTROL



Stephen R. Dixon & Christopher D. Wickens

University of Illinois at Urbana-Champaign



ABSTRACT



Twenty-four students flew a simulated unmanned aerial vehicle (UAV) through ten mission legs

while searching for targets of opportunity and monitoring system parameters. Participants were

assisted by automation which provided auditory alerts in response to system failures (SF). The

auto-alerts were either 80% reliable or 60% reliable; the latter condition resulted in either a 3:1

ratio of false alarms to misses, or vice versa. Results indicated that the 80% reliable automation

exceeded baseline (no automation) performance in the target search task. The two 60% reliable

conditions provided no benefits to performance; both false alarms and misses hurt performance

in the automated task and concurrent tasks, but did so in qualitatively different ways. The results

suggest that automated aids must be fairly reliable to provide global benefits; data

regarding the relative costs of misses versus false alarms on performance were equivocal.



Keywords: unmanned aerial vehicle, automation, false alarm, miss



INTRODUCTION



Flying a single unmanned aerial vehicle (UAV) includes navigating the UAV, monitoring craft

parameters, and searching for possible targets (Dixon & Wickens, 2003). The military currently

employs different forms of automation to aid pilots in these tasks; however, very few automated

aids are perfectly reliable, and imperfect aids can create states of overtrust, undertrust, or calibrated

trust (Parasuraman & Riley, 1997). It is unclear how unreliable the automation needs to be to

cause performance to drop below that of baseline (no automation), and while a 70% “threshold”

has been offered (Dixon & Wickens, 2003; Lee & See, in press), there are noted exceptions both

above and below that level (e.g. Dzindolet et al., 1999; Rovira, Zinni, & Parasuraman, 2002).

Dixon & Wickens (2003) found benefits for an auto-pilot with 67% reliability, but costs for an

auto-alerting system at the same reliability level, and reasoned that under conditions of high

workload, an operator may rely upon imperfect automation even if the automation is not fully

trusted. Such reliance will degrade performance of the automated task itself even as it helps

concurrent tasks (e.g. Rovira et al., 2002).

Within the class of automation that guides attention to notice or diagnose a failure

(Parasuraman et al., 2000), unreliable aids will create false alarms (alarm with no event) and/or

misses (no alarm with an event). False alarms tend to cause distrust in the aid (Meyer & Ballas,

1997), while misses lead to reallocation of visual resources to the raw data in order to “catch” the

automation miss (Cotté, Meyer & Coughlin, 2001). Using target recognition automation, Maltz

& Shinar (2003) found that increasing false alarm rates caused greater disruption to performance

than did increasing miss rates. Dixon & Wickens (2003) also made such a contrast by having

pilots perform a high-fidelity UAV simulation under conditions with either no automation,

perfectly reliable auto-alerts, or 67% reliable auto-alerts with either false alarms or misses.

Results revealed that while the perfectly reliable auto-alerts benefited the automated task, the two

imperfect auto-alert conditions equally hurt performance in both the automated task and

concurrent tasks.

While Dixon & Wickens (2003) used conditions with only false alarms or only misses,

the current study included an 80% reliable condition with an equal number of false alarms and

misses, as well as two 60% reliable conditions with a 3:1 ratio of false alarms to misses and vice

versa. We hypothesized that (1) 80% reliability would consistently improve performance above

baseline; (2) both 60% reliability conditions would degrade performance below baseline; (3)

decrements due to unreliability would be more pronounced on the automated task than on

concurrent tasks; and (4) miss-prone automation would disrupt concurrent tasks more than false-

alarm prone automation, because of the former’s requirement for more continuous visual

monitoring of SF status. Please refer to Dixon & Wickens (2004) for a more thorough

presentation of the experimental methods.



METHOD



Participants and Equipment. Thirty-two students at the University of Illinois received $8 per

hour, plus bonuses of $20, $10, and $5, for 1st, 2nd, and 3rd place finishes, respectively, in their

group of eight pilots. Figure 1 presents a sample display for a UAV simulation, with verbal

explanations for each display window and task.









Figure 1. A UAV display with explanations for different visual areas.



Procedure. Each pilot flew one UAV through ten different mission legs, in one of the four

experimental conditions, while searching for targets of opportunity and monitoring system

parameters. Pilots obtained flight instructions via the Message Box, including fly-to coordinates

and a report question pertaining to the command target (CT). These instructions were present for

15 seconds, and pressing a repeat key automatically refreshed the flight instructions for an

additional 15 seconds.

CT reports required that pilots loiter around the target, manipulate a camera for closer

target inspection, and report back relevant information to mission command. Along each

mission leg, pilots were also responsible for detecting and reporting targets of opportunity

(TOO), a task similar to the CT report, except that the TOOs were much smaller (1-2 degrees of

visual angle) and camouflaged. TOOs could occur during simple tracking (low workload) or

during a pilot response to a system failure (high workload).

Concurrently, pilots were also required to monitor system gauges for possible system

failures (SF), which were indicated by the white needle moving into a red zone (at the top or

bottom of the gauges). SFs were designed to fail either during simple tracking (i.e. low

workload) or during TOO and CT inspection (i.e. high workload). The SFs lasted only 30

seconds, after which the screen flashed bright red and a salient auditory alarm announced that the

pilot had failed to detect the SF.

Automation aids, in the form of auditory auto-alerts during SFs, were provided for three

out of the four conditions. The A80 condition (A = automation; 80% reliable) failed by giving

one false alarm (i.e. alarm with no actual SF), and one miss (i.e. a SF with no alarm) during each

mission. The A60f condition (f = false alarm; 60% reliable) resulted in more false alarms (3)

than misses (1), while the A60m condition (m = miss; 60% reliable) resulted in more misses (3)

than false alarms (1). Pilots were told that the automation was either “fairly reliable” or “not very

reliable”, as well as the bias setting (i.e. more false alarms or more misses). Ratings of

subjective trust were given by each pilot at the end of the mission.
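
The reliability labels are consistent with the failure counts if each mission contains ten alert opportunities. That count is not stated in this section; it is inferred here from the reported percentages. A minimal sketch in Python under that assumption (all names are illustrative):

```python
# Assumed: 10 alert opportunities per mission (inferred from the reported
# percentages, not stated in the text).
ALERT_OPPORTUNITIES = 10

# (false alarms, misses) per mission, as described for each condition.
CONDITIONS = {
    "A80": (1, 1),
    "A60f": (3, 1),
    "A60m": (1, 3),
}


def reliability(false_alarms: int, misses: int,
                opportunities: int = ALERT_OPPORTUNITIES) -> float:
    """Fraction of alert opportunities on which the automation was correct."""
    return (opportunities - false_alarms - misses) / opportunities


for name, (fa, miss) in CONDITIONS.items():
    print(f"{name}: reliability = {reliability(fa, miss):.0%}, "
          f"false alarms:misses = {fa}:{miss}")
```

Under this reading, the two 60% conditions differ only in how the same four failures are split between false alarms and misses.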



RESULTS



3.1 Mission Completion. Tracking error was not affected by condition [F(3, 27) = 1.24, p >

.10]. The number of repeats was affected by condition [F(3, 25) = 3.56, p = .029]; however, only

the A60m condition (mean = 8.5) suffered relative to the baseline condition (mean = 3) [p < .01].



3.2 Targets of Opportunity (TOO) and Command Targets (CT). For TOO detection rates,

only the A80 condition (mean = 93%) improved performance relative to baseline (mean = 76%)

[p < .05]. For TOO detection times, as shown in Figure 2, an interaction between condition and

load [F(3, 23) = 4.82, p = .01] indicates that the condition effect was only present at high load.





[Figure 2 plot: TOO Detection Times (High Load vs. Low Load). Y-axis: Detection Times (secs), 0 to 20. X-axis: Condition (Man, A80, A60f, A60m). Legend: High Load, Low Load.]



Figure 2. TOO detection times across condition and workload. SE bars are included.










Figure 2 reveals that the penalty for increased load was higher for both the A60f (mean =

14.73) and the A60m (mean = 11.87) conditions relative to baseline (mean = 6.04) [all p < .05].

Only the A60f condition differed from the A80 condition (mean = 8.58) [p < .01]. For CT

detection times, there was a main effect of condition [F(3, 27) = 6.16, p < .01], and both the A60f

(mean = 4.17) and the A60m (mean = 4.11) conditions suffered relative to baseline (mean =

2.45) [all p < .05].
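
To make the size of these differences concrete, the following sketch subtracts the baseline mean from each automation condition's mean for the TOO detection-time measure. It uses only the means reported above and treats them as directly comparable values in seconds; the function name is illustrative.

```python
# TOO detection-time means reported in the text (seconds).
too_means = {"baseline": 6.04, "A80": 8.58, "A60f": 14.73, "A60m": 11.87}


def excess_over_baseline(means: dict) -> dict:
    """Difference between each condition's mean and the baseline mean."""
    base = means["baseline"]
    return {cond: round(value - base, 2)
            for cond, value in means.items() if cond != "baseline"}


for cond, diff in excess_over_baseline(too_means).items():
    print(f"{cond}: {diff:+.2f} s relative to baseline")
```

The differences are descriptive only; the significance tests are those reported in the text.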



3.3 System Failures (SF). For SF detection rates, higher load reduced detection rates [F(1, 27)

= 21.46]; however, there was no main effect of condition [F(3, 27) < 1.0], or interaction [F(3, 27)

< 1.0]. For SF detection times, as shown in Figure 3, higher load increased detection times [F(1,

27) = 93.3, p < .001]. The main effect of condition [F(3, 27) = 3.62, p = .026] can only be

interpreted in the context of the interaction [F(3, 27) = 3.06, p = .045], which reveals that the

A60f condition (mean = 19.99) suffered more due to high load than the other conditions.



[Figure 3 plot: SF Detection Times (High load vs. Low load). Y-axis: Detection Times (secs), 0 to 25. X-axis: Condition (Man, A80, A60f, A60m). Legend: Low Load, High Load.]



Figure 3. SF detection times across condition and workload. SE bars are included.



Figure 3 reveals that the penalty due to high load was approximately 6-9 seconds more

for the A60f condition than the other three conditions [all p < .03]. We note that each of the 60%

condition means is actually composed of two different components: responses when an alert

correctly sounded, and those when the alert failed to sound. Table 1 shows the resulting four

means, within the high workload condition.



Table 1. Component means in the A60f and A60m conditions. SE is in parentheses.

                          CONDITION
EVENT               A60f                 A60m
Miss (failure)      26.05 sec (1.83)     23.29 sec (2.77)
Alarm (correct)     13.93 sec (4.85)     3.96 sec (1.17)
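
The decomposition can be checked with simple arithmetic. The sketch below averages the two components of each condition with equal weight and computes the extra time taken when the alert failed to sound. Equal weighting is an assumption, since the number of trials in each component is not given here; under it, the A60f components average to 19.99 sec, matching the A60f high-load mean quoted above. Function names are illustrative.

```python
# Table 1 component means (seconds) at high workload.
table1 = {
    "A60f": {"miss": 26.05, "alarm": 13.93},
    "A60m": {"miss": 23.29, "alarm": 3.96},
}


def equal_weight_mean(components: dict) -> float:
    """Unweighted average of the component means (assumed equal weighting)."""
    return sum(components.values()) / len(components)


def miss_cost(components: dict) -> float:
    """Extra detection time when the alert failed to sound."""
    return components["miss"] - components["alarm"]


for cond, comps in table1.items():
    print(f"{cond}: equal-weight mean = {equal_weight_mean(comps):.3f} s, "
          f"miss cost = {miss_cost(comps):.2f} s")
```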



The data reveal a clear slowing of RT when the alarm “missed” the SF event,

indicating that in both conditions, pilots had relied heavily upon the automation, and their

detection suffered when it failed. Pilots responded to correct alerts more rapidly with the

miss-prone automation (mean = 3.96) than with the false-alarm-prone automation (mean = 13.93) [p < .05],

reflecting the pilots’ immediate compliance with the auditory alert (Meyer, 2001) in the former

condition, in contrast to the false-alarm prone condition, where pilots were less likely to interrupt

target inspection to deal with the alarms. We also infer that greater compliance in the miss

condition is coupled with an ongoing greater awareness of the SF gauges, fostered by a reduced

reliance on that automation, and causing greater disruption to memory recall.



3.4 Subjective ratings of trust. Pilots were surprisingly accurate in their overall assessment of

the automation reliability [A80 = 82%; A60f = 54%; A60m = 56%], in contrast to Dixon &

Wickens (2003), who concluded that pilot trust in the automation was poorly calibrated when

they did not receive any prior information as to reliability levels or bias setting.
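
As a simple illustration of how close the ratings were, the sketch below compares each reported rating with the nominal reliability of its condition. The signed-error and mean-absolute-error summaries are illustrative choices and are not measures reported in the study.

```python
# Nominal reliability and mean subjective rating per condition (percent).
actual = {"A80": 80, "A60f": 60, "A60m": 60}
rated = {"A80": 82, "A60f": 54, "A60m": 56}

# Positive values mean the automation was rated as more reliable than it was.
calibration_error = {cond: rated[cond] - actual[cond] for cond in actual}

mean_abs_error = (sum(abs(e) for e in calibration_error.values())
                  / len(calibration_error))

print(calibration_error)
print(f"mean absolute error = {mean_abs_error:.1f} percentage points")
```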



DISCUSSION



The A80 condition (80% reliability) supported a significant increase in concurrent task

performance, confirming our first hypothesis. This indicates that the automation, while

imperfect, still allowed pilots to save visual and cognitive resources, which they could reallocate

to the concurrent target search task (Rovira et al., 2002).

At 60% reliability, neither the false alarm nor miss conditions (A60f and A60m) provided

any benefits, and in some instances performance was well below baseline during high workload

conditions, thereby confirming hypothesis 2. In general, however, the costs of imperfection were

as heavily borne by the concurrent tasks as by the SF task itself, a pattern inconsistent with

hypothesis 3.

Finally, regarding hypothesis 4, the false alarm condition (on average, across

performance measures) resulted in slightly poorer performance in the SF detection task than did

the miss condition. On the one hand, the miss condition degraded CT memory (requiring more

repeats) to a greater extent than did the false alarm condition, supporting hypothesis 4. That is,

more continuous monitoring of the raw system data was required in the miss condition. On the

other hand, the false-alarm condition (in high workload) appeared to delay detection of a TOO

that became visible while the failure was present, more than the miss condition. This difference

we attribute to pilots’ need, when an alarm sounds in the A60f condition, to double check the

raw data (visual system gauges) to assess its consistency with the auditory alert (a distrust, or

reduced compliance). Thus the two types of automation imperfection had opposing effects on the

concurrent tasks, both replicating prior findings of Dixon & Wickens (2003).

With regard to SF performance itself, Figure 3 and Table 1 clearly indicate reduced costs

for the miss condition than for the false alarm condition at high workload, a pattern at odds with

that reported by Dixon & Wickens (2003). We can account for the current pattern in terms of the

greater compliance with, and lesser reliance on, the imperfect automation in the miss than in the

false alarm condition (Meyer, 2001). Compliance is increased because of the belief that if an

alarm sounds, it is quite likely to be true. Reliance on the alert is reduced because of the subjects’

knowledge that it may frequently fail to signal a true system failure. The reason for the

discrepancy of the current pattern of results with those of Dixon and Wickens requires further

research.

The implications of this study are that higher reliability automation is necessary to

facilitate improvements in overall performance relative to baseline, and that false alarms may be

more detrimental to overall alerted task performance than misses.










ACKNOWLEDGMENTS



This research was sponsored by a subcontract # ARMY MAAD 6021.000-01 from

Microanalysis and Design, as part of the Army Human Engineering Laboratory Robotics CTA,

contracted to General Dynamics. David Dahn and Marc Gacy were the scientific/technical

monitors. Any opinions, findings, and conclusions or recommendations expressed in this paper

are those of the authors and do not necessarily reflect the views of the Army CTA. The authors

also wish to acknowledge the support of Ron Carbonari and Jonathan Sivier (in developing the

UAV simulation), and of Dervon Chang for assisting with data collection.



REFERENCES



Cotté, N., Meyer, J., & Coughlin, J. F. (2001). Older and younger drivers’ reliance on collision

warning systems. Proceedings of the 45th Annual Meeting of the Human Factors and Ergonomics Society (pp.

277-280). Santa Monica, CA: Human Factors and Ergonomics Society.

Dixon, S. & Wickens, C.D. (2003). Imperfect Automation in Unmanned Aerial Vehicle Flight

Control. (AHFD-03-17/MAAD-03-1). Savoy, IL: University of Illinois, Aviation Research

Lab.

Dzindolet, M. T., Pierce, L. G., Beck, H. P., & Dawe, L. A. (1999). Misuse and disuse of

automated aids. Proceedings of the 43rd Annual Meeting of the Human Factors and

Ergonomics Society (pp. 339-343). Santa Monica, CA: Human Factors and Ergonomics

Society.

Lee, J. D., & See, K. A. (in press). Trust in automation: Designing for appropriate reliance.

Human Factors.

Maltz, M., & Shinar, D. (2003). New alternative methods in analyzing human behavior in cued

target acquisition. Human Factors, 45, 281-295.

Meyer, J. (2001). Effects of warning validity and proximity on responses to warnings. Human

Factors, 43, 563-572.

Meyer, J., & Ballas, E. (1997). A two-detector signal detection analysis of learning to use alarms.

Proceedings of the 41st Annual Meeting of the Human Factors and Ergonomics Society (pp. 186-189). Santa

Monica, CA: Human Factors and Ergonomics Society.

Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse.

Human Factors, 39, 230-253.

Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of

human interaction with automation. IEEE Transactions on Systems, Man, & Cybernetics,

30(3), 286-297.

Rovira, E., Zinni, M., & Parasuraman, R. (2002). Effects of information and decision automation

on multi-task performance. Proceedings of the 46th Annual Meeting of the Human Factors

and Ergonomics Society (pp. 327-331). Santa Monica, CA: Human Factors and Ergonomics

Society.








