USE OF ARTIFICIAL NEURAL NETWORKS IN A STREAMFLOW PREDICTION SYSTEM

BERNARD B. HSIEH
USAE R&D CENTER, WATERWAYS EXP. STATION, VICKSBURG, MS 39180 USA

CPT CHARLES L. BARTOS
USAE R&D CENTER, WATERWAYS EXP. STATION, VICKSBURG, MS 39180 USA

BIN ZHANG
SCHOOL OF CIVIL ENG., PURDUE UNIVERSITY, WEST LAFAYETTE, IN 47907 USA

ABSTRACT

A streamflow prediction system is developed using Artificial Neural Networks (ANN) to address flood forecasting in two watersheds of different scale: the Sava River, Croatia, and a segment of the lower Mississippi River. The study investigated the prediction system for single-point river stage forecasting, upstream-downstream riverflow forecasting, and the rainfall-runoff hydrological process. For the Sava River, the minimum length of river stage record required to achieve about 90 percent reliability for forecasts of up to 3 days was about 3 months. Reasonable downstream riverflow predictions from upstream gauges were obtained for the Sava River even though only half a year of daily values was available for model training. On the Mississippi River, with 16 years of daily data, the ANN can provide a very high precision riverflow forecasting system for Memphis, TN, from two upstream inputs near the confluence with the Ohio River, without a significant rainfall contribution in this river segment.

INTRODUCTION

Riverflow forecasting provides warning of impending stages during floods and assists in regulating reservoir outflows during low river flows for water resources management. For military applications, accurate forecasting of river stage and flow is critical during operations because it directly influences force mobility. Most hydrologic processes exhibit a high degree of temporal and spatial variability and are further complicated by nonlinearity of the physical processes, conflicting spatial and temporal scales, and uncertainty in parameter estimates.
ANNs are able to extract the relationship between the inputs and outputs of a process without the physics being explicitly provided. They can also provide a map from one multivariable space to another, given a set of data representing that mapping. These properties of ANNs may be well suited to the problems of estimation and prediction in hydrology.

Two major approaches to modeling rainfall-runoff or predicting riverflow have been explored in the literature: conceptual (physically based) modeling and system theoretic modeling. While conceptual models are important for understanding hydrologic processes, there are many practical situations, such as streamflow forecasting, where the main concern is making accurate predictions at specific watershed locations. In such a situation, a hydrologist may prefer not to expend the time and effort required to develop and implement a conceptual or numerical model, but instead implement a simpler system theoretic model, such as an ANN.

Applications of ANN in rainfall-runoff modeling and streamflow forecasting have been described in many sources. The algorithms used in these applications range from backpropagation (Hjelmfelt and Wang 1996), time-delayed (Karunanithi et al. 1994), recurrent (Carriere, Mohaghegh, and Gaskari 1996), radial-basis function (Fernando and Jayawardena 1998), and modular (Zhang and Govindaraju 1998) to self-organizing (Hsu, Gupta, and Sorooshian 1998) networks. Only one reference is cited for each algorithm.

The objective of this study is to demonstrate the applicability of the system theoretic ANN approach in developing effective nonlinear models of the river stage and riverflow forecasting process for military use, without the need to explicitly represent the internal hydrologic structure of the watershed. In addition, a large-scale watershed, the Mississippi River, is used to demonstrate the capability of flood forecasting by ANN.
ALGORITHMS AND SOFTWARE USED FOR THIS STUDY

For hydrological applications, the backpropagation network, a classic example of supervised learning, is a popular computational tool. However, because natural signals take time to travel from input to output locations, the time-delayed ANN and the recurrent ANN are also used for comparison. The fundamental theory of these algorithms is not discussed here. The software used for this study is NeuroSolutions (version 3.0); the simulations were run with its Excel environment module. The data set is divided into training, cross-validation, and testing portions. To keep the model structure simple, a single hidden layer is used in the neural architecture. Performance is quantified by several measures, including mean square error (MSE), normalized mean square error (NMSE), mean/maximum/minimum absolute errors, and correlation coefficient (CC).

RIVERFLOW PREDICTION IN A SEGMENT OF LOWER MISSISSIPPI RIVER

The Lower Mississippi River is considered to begin at Cairo, IL, at the confluence of the Ohio and Upper Mississippi Rivers. It travels southward approximately 954 miles to Head of Passes, LA. In 1973, major flooding occurred on the Lower Mississippi River, with peak flows at the crest stages of over 1.5 million cfs. Flooding of this magnitude illustrates the need for a forecasting system as an essential tool for reducing flood damage. In this study, the ANN was used to predict the riverflow at Memphis, TN, from two upstream gauges: Thebes, IL, on the Mississippi River above its confluence with the Ohio River, and Metropolis, IL, on the Ohio River near the confluence. Lateral contributions within this river segment come from the Obion, Hatchie, Loosahatchie, and Wolf Rivers in West Tennessee and from rainfall over the river basin.
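The performance measures listed in the software description (MSE, NMSE, absolute errors, and CC) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the NeuroSolutions implementation. The function name `performance_measures` is hypothetical, and normalizing the MSE by the variance of the observed series is an assumption about how NMSE is defined.

```python
import numpy as np


def performance_measures(observed, predicted):
    """Return error statistics comparing a predicted series with observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - observed
    abs_error = np.abs(error)
    mse = float(np.mean(error ** 2))
    return {
        "MSE": mse,
        # Assumption: NMSE is the MSE divided by the variance of the observations.
        "NMSE": mse / float(np.var(observed)),
        "MAE": float(np.mean(abs_error)),
        "MinAbsE": float(np.min(abs_error)),
        "MaxAbsE": float(np.max(abs_error)),
        # Pearson correlation coefficient between observed and predicted values.
        "CC": float(np.corrcoef(observed, predicted)[0, 1]),
    }


# Toy values for illustration only (not data from the study).
stats = performance_measures([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(stats)
```

Under this definition an NMSE of 0 indicates a perfect fit, and an NMSE of 1 is what a model that always predicts the observed mean would score.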
The purpose of this study is to identify the prediction capability of the ANN with minimum hydrologic information and to determine the contribution of the Ohio River to flooding. A database was developed using 16 years (1975 to 1990) of daily riverflow from these three stations and daily rainfall from ten stations, which are approximately uniformly distributed over the river basin. A rainfall-runoff model was constructed using the two upstream flows (Ohio and Mississippi Rivers) and total daily rainfall as the inputs and the downstream riverflow as the output. The first 6 years of data are used for training, the next 2 years for cross-validation, and the last 8 years for testing. A multilayer perceptron feed-forward architecture trained by backpropagation was used. Fairly accurate results were obtained for the three data subsets in terms of NMSE and CC; the CC values were 0.95, 0.93, and 0.94. However, graphical comparisons show that, although the spikes match very well, a phase shift exists between the observed values and the simulated outputs. This difference implies that time lags must be considered. Hence, a second test was conducted with lags of up to 2 days for each input series, forming a system of four inputs and one output. This modification produced a significant improvement. Figure 1 shows the model testing results. Although the cross-validation overestimated the flow values, the testing set presents excellent results. A sensitivity analysis was performed to rank these four inputs. It indicated that the highest correlation with the downstream gauge was associated with the 2-day lagged riverflow at both upstream gauges.

Figure 1. Testing results for desired riverflow and actual network riverflow at Memphis station. [Plot of testing data, 1983-1990, daily; vertical axis: flow rate (cfs), 0 to 1,800,000; series: observed, ANN output.]
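The lagged-input arrangement and the chronological 6/2/8-year split described here can be sketched as below. This is an illustrative Python/NumPy sketch under stated assumptions, not the procedure used in NeuroSolutions: the helper names `make_lagged_inputs` and `chronological_split` are hypothetical, the data are synthetic stand-ins, and using lags of 1 and 2 days for each of the two upstream series is only one reading of "up to 2-day lag" that yields four inputs.

```python
import numpy as np


def make_lagged_inputs(series_list, target, lags=(1, 2)):
    """Build an input matrix of lagged upstream series aligned with the target."""
    target = np.asarray(target, dtype=float)
    n = len(target)
    max_lag = max(lags)
    columns = []
    for series in series_list:
        series = np.asarray(series, dtype=float)
        for lag in lags:
            # Output row i corresponds to day t = i + max_lag;
            # this column holds the value series[t - lag].
            columns.append(series[max_lag - lag : n - lag])
    return np.column_stack(columns), target[max_lag:]


def chronological_split(X, y, n_train, n_cv):
    """Split in time order: training first, then cross-validation, then testing."""
    end_cv = n_train + n_cv
    train = (X[:n_train], y[:n_train])
    cv = (X[n_train:end_cv], y[n_train:end_cv])
    test = (X[end_cv:], y[end_cv:])
    return train, cv, test


# Synthetic stand-in data (NOT the study's records): 16 "years" of 365 days.
rng = np.random.default_rng(0)
n_days = 16 * 365
upstream_a = rng.uniform(1e5, 8e5, n_days)  # hypothetical upstream flow, cfs
upstream_b = rng.uniform(1e5, 8e5, n_days)  # hypothetical upstream flow, cfs
downstream = np.empty(n_days)
downstream[2:] = upstream_a[:-2] + upstream_b[:-2]  # toy 2-day travel time
downstream[:2] = downstream[2]

X, y = make_lagged_inputs([upstream_a, upstream_b], downstream, lags=(1, 2))
train, cv, test = chronological_split(X, y, n_train=6 * 365, n_cv=2 * 365)
print(X.shape, len(train[1]), len(cv[1]), len(test[1]))
```

Splitting in time order, rather than shuffling, keeps the testing period strictly after the training period, which mirrors how a forecasting system would be used in practice.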
The second scenario for this model added rainfall (with up to a 2-day lag) as an input variable. There was very little difference between the two runs, which seems to indicate that the downstream prediction is not sensitive to rainfall. This also suggests that the watershed's contribution to inflow comes from the tributaries. However, it may not be necessary to test the sensitivity to tributary flow, since the ANN forecasting system performs well enough with the upstream gauges alone.

SAVA RIVER FLOW AND STAGE PREDICTION MODEL

The Sava River is the largest river in the former Yugoslavia. Since Yugoslavia was divided into several new republics, the Sava River now starts in Slovenia. Going downstream, the Sava flows through Croatia, Bosnia, and Serbia. The total drainage area at the confluence of the Sava and Danube Rivers is 96 thousand square kilometers, and the watershed length is 2,255 km. The length of the Sava River is 950 km. During the Bosnian war, the prediction of river stages for military crossings became particularly important. The accuracy of prediction was critical in scheduling military operations and especially in choosing the locations at which to construct bridges. Therefore, a riverflow and stage forecasting system was required to address changes in weather conditions. An upstream-downstream modeling study can provide the necessary answers.

Riverflow and stage records are available for nearly two dozen gauges on both the main stem and some tributaries. The best data files currently available for constructing the model cover eight stations for riverflow and two stations for river stage. The riverflow stations are along the main river, and the data set is one year of daily mean flow (the most downstream station is the bridge site for the military operation). Daily river stage data (2-1/2 years) exist only for the two most downstream stations, both of which have military bridges.
The aim of this modeling effort is to find an alternative to numerical watershed model simulation for predicting downstream flow and stage from minimum upstream information.

Sava River Stage Forecasting Model

Using the data available, a river stage forecasting model was constructed that uses the Slavonski Brod gauge (upstream) to predict the Zupanja gauge (downstream). Following the regular ANN modeling procedure, the data file (2.5 years) was divided into a training set (1 year), a cross-validation set (6 months), and a testing set. This model was again trained as a multilayer perceptron with the backpropagation algorithm and one hidden layer. Since these two gauge stations are not far from each other, the results show very good agreement with observations. To test the forecasting capability of the model, the forecast range was extended to up to 3 days ahead and several scenarios were run. The lowest correlation coefficient was 0.911, with a mean absolute error of 0.52 m, for the 3-day-ahead prediction using the current and previous 2 days of stage at the upstream location.

Sava River Flow Prediction Model

As described above, riverflow data exist only at eight stations from upstream to downstream. Station 8 (Slavonski Brod) is the most downstream gauge and the only one that has a bridge. A preliminary analysis of the riverflow distribution shows that the first four stations have similar flow patterns. The pattern starts to change at station 5 (Ornac) because of merging tributary flow. The flow patterns change rapidly from station 5 to station 7 (Davor) because of the more complicated hydrographic conditions and geomorphology. It was therefore decided to build a model using two locations, station 1 and station 5, with time lags, to predict the flow at station 8. While the training (6 months of data) shows fair agreement, the cross-validation (1 month of data) and testing (5 months of data) results overestimate the observations to some degree.
The explanation for these differences is that the flow patterns in the first 6 months are quite different from those in the last 6 months, and the training record is not long enough to capture the difference between stations 1 and 5 and station 8. Some improvement was found when the input series also included station 7. Even better agreement was obtained by using the time-lag recurrent algorithm. The final result for model testing is shown in Figure 2.

Figure 2. Testing results for desired riverflow and actual network output, Slavonski Brod. [Plot over testing date (Jul 1 - Dec 31, 87); vertical axis: flow rate (cms), 0 to 2000; series: observed, ANN output.]

PREDICTION RELIABILITY ANALYSIS DUE TO THE LENGTH OF RECORD

It is of interest to know how reliable the prediction would be if only limited data were available. This is demonstrated by selecting a single station and repeating the model run with different lengths of record. The river stage at Zupanja was selected for this test. Nine test runs were conducted with different lengths of training, cross-validation, and testing data and with forecasting ranges from 1 day to 3 days. The results are summarized in Table 1. Six parameters were used to determine the prediction reliability. The table gives the prediction reliability for a given length of record and chosen criterion. For example, with only a 3-month record, a 3-day prediction has a mean absolute error of over 1 m and a correlation coefficient of about 0.90.

Training record  Forecast  MSE     NMSE    MAE     Min Abs E  Max Abs E  r
1 yr             1 day     0.0821  0.0233  0.2015  0.0002     1.3868     0.9884
1 yr             2 day     0.2838  0.0808  0.4020  0.0027     2.3479     0.9595
1 yr             3 day     0.5554  0.1583  0.5802  0.0036     3.2460     0.9203
6 mo             1 day     0.1125  0.0241  0.2336  0.0004     2.0161     0.9885
6 mo             2 day     0.3413  0.0731  0.4075  0.0001     3.3268     0.9628
6 mo             3 day     0.6491  0.1390  0.5990  0.0024     3.8329     0.9283
3 mo             1 day     0.6955  0.2053  0.6783  0.0041     1.5347     0.9731
3 mo             2 day     0.9933  0.3041  0.8544  0.0034     1.7320     0.9489
3 mo             3 day     1.3350  0.4227  1.0194  0.0201     1.9215     0.9174

Table 1. Prediction reliability due to the length of training record for the Sava River stage prediction model

MODEL RELIABILITY ANALYSIS DUE TO APPROACH ALGORITHMS

With improvements in computing speed, the training time required by different algorithms may no longer be a critical factor, provided the training record is not too long and the network architecture is not too complicated. Testing accuracy can suffer, however, if the algorithm selected is not appropriate for the problem. Four different algorithms are compared here using the Mississippi River riverflow prediction example. Table 2 summarizes the results. While the traditional backpropagation algorithm without time delay showed the least accuracy, the recurrent algorithm gave the best results. The results indicate that an input-to-output mapping with a certain memory length and strong nonlinearity best describes this hydrological phenomenon.

Data set  Measure  Backpropagation  Backpropagation,  Time-Delay  Recurrent
                                    time shift input
Training  NMSE     0.0966           0.0303            0.0194      0.0168
Training  R        0.9505           0.9847            0.9903      0.9918
Cross-V   NMSE     0.1448           0.0416            0.0665      0.0652
Cross-V   R        0.9286           0.9807            0.9679      0.9680
Testing   NMSE     0.1042           0.0344            0.0210      0.0171
Testing   R        0.9475           0.9834            0.9909      0.9922

Table 2. Riverflow prediction reliability due to approach algorithms for the Mississippi River segment model

CONCLUSIONS

ANN algorithms were successfully applied to two watershed systems of different scale for riverflow and stage prediction, addressing the two primary hydrological forecasting issues: time delay and nonlinearity. In a segment of the Lower Mississippi River, the riverflow at Memphis, TN, can be predicted from two upstream gauges with a high degree of accuracy, even with no rainfall data provided. This model can also be used to simulate the influence of the Ohio River on the Mississippi River downstream.
Less accurate results were obtained for the Sava River, mainly because of the limited length of record. The reliability of river stage/flow prediction can be assessed from the relationship between training length and the performance parameters. Proper selection of the solution algorithm can help to increase model accuracy. The best performance of an ANN for flow prediction depends heavily not only on the length of the data set but also on whether the most significant patterns are included.

ACKNOWLEDGEMENTS

This work was funded by the U.S. Army Engineer Research and Development Center, Waterways Experiment Station. Permission to publish this information was granted by the Chief of Engineers.

REFERENCES

Carriere, P., S. Mohaghegh, and R. Gaskari, 1996, “Performance of a Virtual Runoff Hydrographic System,” Journal of Water Resources Planning and Management, 122(6), 120-125.

Fernando, D. A. K. and A. W. Jayawardena, 1998, “Runoff Forecasting Using RBF Networks with OLS Algorithm,” Journal of Hydrologic Engineering, 3(3), 203-209.

Hjelmfelt, A. T. and M. Wang, 1996, “Predicting Runoff Using Artificial Neural Networks,” Surface Water Hydrology, 233-244.

Hsu, K., H. V. Gupta, and S. Sorooshian, 1998, “Streamflow Forecasting Using Artificial Neural Networks,” ASCE Water Resources Engineering Conference ’98, 967-972.

Karunanithi, N., W. J. Grenney, D. Whitley, and K. Bovee, 1994, “Neural Networks for River Flow Prediction,” ASCE Journal of Computing in Civil Engineering, 8(2), 201-220.

Zhang, B. and R. S. Govindaraju, 1998, “Using Modular Neural Networks to Predict Watershed Runoff,” ASCE Water Resources Engineering Conference ’98, 897-902.