2008. 05. 15
권오진, HPC Laboratory, University of Seoul
Simple Analysis of NN in DM
8.1 Feed-Forward Neural Networks
8.2 Neural Network Training: A Conceptual View
8.3 Neural Network Explanation
8.4 General Considerations
8.5 Neural Network Training: A Detailed View
Simple Analysis of NN in DM
• TS="data Mining" and TS="Neural Network"
Professor 박상찬, Department of Industrial Engineering, KAIST
Figure: A taxonomy of feed-forward and recurrent/feedback network architectures. Each architecture is labeled S (supervised) or U (unsupervised).
Jain, A.K., Mao, J., "Artificial Neural Networks: A Tutorial," Computer, March, 31-44
Neural network learning can be supervised or unsupervised.
Learning is accomplished by modifying network connection weights while a set of input instances is repeatedly passed through the network.
Once the network is trained, an unknown instance passed through it is classified according to the value(s) seen at the output layer.
History of Neural Networks
• 1943: McCulloch and Pitts - Modeling the Neuron for Parallel Distributed Processing
• 1958: Rosenblatt - Perceptron
• 1969: Minsky and Papert publish limits on the ability of a perceptron to generalize
• 1970's and 1980's: ANN renaissance
• 1986: Rumelhart, Hinton, and Williams present backpropagation
• 1989: Tsividis - Neural Network on a chip
8.1 Feed-Forward Neural Networks
Figure 8.1 A fully connected feed-forward neural network. Input layer: Node 1 (input 1.0), Node 2 (input 0.4), Node 3 (input 0.7). Hidden layer: Node j and Node i. Output layer: Node k. Connection weights: W1j, W1i, W2j, W2i, W3j, W3i, Wjk, Wik.
Table 8.1 • Initial Weight Values for the Neural Network Shown in Figure 8.1
W1j     W1i     W2j     W2i     W3j     W3i     Wjk     Wik
0.20    0.10    0.30    -0.10   -0.10   0.20    0.10    0.50
The user specifies the number of hidden layers as well as the number of nodes within each hidden layer.
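Neural Network Input Format (1/2)
Categorical data: Color = {red, green, blue, yellow}
Ex 1) Straightforward technique: map each value to a number in [0,1]
red = 0.00, green = 0.33, blue = 0.67, yellow = 1.00
Pitfall: the numeric coding implies an ordering and distances between colors that the data do not have (for example, red appears closer to green than to yellow).
Ex 2) Additional input nodes: use two input nodes for the attribute
red = [0,0], green = [0,1]
blue = [1,0], yellow = [1,1]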
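The two encodings can be sketched in a few lines of Python. This is a minimal illustration, assuming the four colors are listed in the order shown on the slide; the function names `encode_scalar` and `encode_binary` are hypothetical and not part of the source.

```python
# Hypothetical helper names; the color list and target values come from the slide.
COLORS = ["red", "green", "blue", "yellow"]

def encode_scalar(color):
    """Ex 1: map each color to an evenly spaced value in [0, 1]."""
    index = COLORS.index(color)
    return round(index / (len(COLORS) - 1), 2)

def encode_binary(color):
    """Ex 2: map each color to two input nodes (a 2-bit code)."""
    index = COLORS.index(color)
    return [index // 2, index % 2]

for color in COLORS:
    print(color, encode_scalar(color), encode_binary(color))
```

The scalar form needs one input node but imposes an order on the colors; the two-node form costs an extra input node for this attribute.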
Neural Network Input Format (2/2)
Conversion of numerical data
1. Divide all attribute values by the largest attribute value. (Drawback: works poorly when no values are close to 0.)
2. Use the minimum and maximum values:
$$\text{newValue} = \frac{\text{originalValue} - \text{minimumValue}}{\text{maximumValue} - \text{minimumValue}}$$
where
newValue is the computed value falling in the [0,1] interval range
originalValue is the value to be converted
minimumValue is the smallest possible value for the attribute
maximumValue is the largest possible attribute value
3. For skewed data, apply a logarithmic transformation with base 2 or 10.
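The three conversions above can be sketched as follows. This is only an illustration: the function names and the sample attribute values are made up, and the minimum and maximum are passed in explicitly because the slide defines them as the smallest and largest possible values for the attribute.

```python
import math

def scale_by_max(values):
    """Method 1: divide every value by the largest attribute value."""
    largest = max(values)
    return [v / largest for v in values]

def min_max(value, minimum, maximum):
    """Method 2: (originalValue - minimumValue) / (maximumValue - minimumValue)."""
    return (value - minimum) / (maximum - minimum)

def log_transform(values, base=10):
    """Method 3: base-2 or base-10 logarithm to compress skewed data."""
    log = math.log2 if base == 2 else math.log10
    return [log(v) for v in values]

ages = [20, 35, 50, 80]  # made-up attribute values
print(scale_by_max(ages))                    # smallest result is 0.25, not 0
print([min_max(a, 20, 80) for a in ages])    # spans the full [0, 1] range
print(log_transform([1, 10, 100, 1000]))
```

With these sample values, method 1 leaves the lower part of the interval unused, which is the drawback noted in item 1.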
Neural Network Output Format
Ambiguous output value
Example: credit card promotion
Promotion = Yes: Node 1 = 1, Node 2 = 0
Promotion = No: Node 1 = 0, Node 2 = 1
Network output Node 1 = 0.9, Node 2 = 0.2: Promotion = Yes
Network output Node 1 = 0.2, Node 2 = 0.3: ?
Example: if an output value of 0.8 or higher is taken to mean that the promotion is likely Yes, how should an output of 0.45 be handled? Use the KNN approach.
Prediction
Example: when predicting a stock price with a minimum value of $10.00 and a maximum value of $100.00, a network output of 0.35 converts to ($100.00 - $10.00)(0.35) + $10.00 = $41.50.
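The conversion from a network output back to the original scale is the inverse of min-max normalization. A minimal sketch, with a hypothetical function name:

```python
def to_original_scale(output, minimum, maximum):
    """Map a network output in [0, 1] back to the attribute's original range."""
    return (maximum - minimum) * output + minimum

price = to_original_scale(0.35, 10.00, 100.00)
print(round(price, 2))  # 41.5
```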
The Sigmoid Function
Evaluation function: the output lies in the interval [0,1], with a maximum output of 1.
$$f(x) = \frac{1}{1 + e^{-x}}$$
where e is the base of natural logarithms, approximated by 2.718282.
Figure: plot of the sigmoid function f(x) for x from -6 to 6 (vertical axis from 0.000 to 1.200).
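The sigmoid function translates directly into code. The sketch below evaluates it at a few points across the plotted range and at 0.25, the input to node j in the worked example.

```python
import math

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)); the output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-6, 0, 0.25, 6):
    print(x, round(sigmoid(x), 3))
```

The output approaches 0 for large negative inputs and 1 for large positive inputs, and equals 0.5 at x = 0.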
Figure 8.1 (repeated). Annotation: the sigmoid function is used.
Input to node j = (1.0)(0.2) + (0.4)(0.3) + (0.7)(-0.1) = 0.25
Output from node j = f(0.25) = 0.562
8.2 Neural Network Training: A Conceptual View
Supervised Learning with Feed-Forward Networks
• Backpropagation Learning
Figure 8.1 (repeated). Annotation: direction of weight adjustment.
Unsupervised Clustering with Self-Organizing Maps
Figure 8.3 A 3x3 Kohonen network with two input layer nodes (Node 1 and Node 2) connected to the output layer
8.3 Neural Network Explanation
• Sensitivity Analysis
• Average Member Technique
Sensitivity analysis (Supervised)
Sensitivity analysis gives insight into the effect individual attributes have on neural network output.
1. Divide the data into a training set and a test dataset.
2. Train the network with the training data.
3. Use the test data to create a new instance I. Each attribute value for I is the average of all attribute values within the test data.
4. For each attribute:
   a. Vary the attribute value within instance I and present the modification of I to the network for classification.
   b. Determine the effect the variations have on the output of the neural network.
   c. The relative importance of each attribute is measured by the effect of attribute variations on network output.
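Steps 3 and 4 of the procedure above can be sketched as follows. This is an illustration under stated assumptions: `toy_predict` is a made-up stand-in for a trained network with one output, the variation sizes in `deltas` are arbitrary, and the function names are hypothetical.

```python
def average_instance(test_data):
    """Step 3: build instance I from the per-attribute averages of the test data."""
    n = len(test_data)
    return [sum(row[a] for row in test_data) / n for a in range(len(test_data[0]))]

def sensitivity(predict, test_data, deltas=(-0.1, 0.1)):
    """Step 4: vary one attribute of I at a time and record the largest output change."""
    base = average_instance(test_data)
    base_output = predict(base)
    scores = []
    for a in range(len(base)):
        changes = []
        for delta in deltas:
            varied = list(base)
            varied[a] += delta          # modify only attribute a
            changes.append(abs(predict(varied) - base_output))
        scores.append(max(changes))
    return scores

def toy_predict(instance):
    """Stand-in for a trained network (assumption, not from the source)."""
    return 0.8 * instance[0] + 0.1 * instance[1]

test_data = [[0.2, 0.4], [0.6, 0.8]]
scores = sensitivity(toy_predict, test_data)
print(scores)  # the first attribute has the larger effect on the output
```

A larger score means the output reacts more strongly to that attribute, which is the relative importance measure described in step 4c.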
Average member technique
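The average or most typical member of each class is computed by finding the average value for each class attribute. The average member technique (AMT) is used with unsupervised clustering.
Using supervised learning to explain unsupervised learning:
1. Transform the data for unsupervised clustering.
2. Cluster the data with the neural network.
3. Name each cluster as a class.
4. Use the labeled instances as training data for a supervised classification model that has a rule generator.
5. Examine the generated rules to understand the content of each class.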
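The average member of each class (or cluster) is a per-attribute mean. A minimal sketch, assuming instances are lists of numeric attribute values and each instance already carries a class or cluster label; the function name and sample data are made up.

```python
def average_members(instances, labels):
    """Return the per-attribute average of the instances in each class."""
    sums = {}
    counts = {}
    for row, label in zip(instances, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

instances = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
labels = ["cluster_a", "cluster_a", "cluster_b"]
print(average_members(instances, labels))
```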
8.4 General Considerations
• What input attributes will be used to build the network?
• How will the network output be represented?
• How many hidden layers should the network contain?
• How many nodes should there be in each hidden layer?
• What condition will terminate network training?
– Minimum Total Error, Specific Time Criterion, Maximum number of iterations
The process of building a neural network is both an art and a science
Neural Network Strengths
• Work well with noisy data.
• Can process numeric and categorical data.
• Appropriate for applications requiring a time element.
• Have performed well in several domains.
• Appropriate for supervised learning and unsupervised clustering.
Weaknesses
• Lack explanation capabilities.
• May not provide optimal solutions to problems.
• Overtraining can be a problem.
8.5 Neural Network Training: A Detailed View
The Backpropagation Algorithm: An Example
Backpropagation works by making modifications in weight values starting at the output layer and then moving backward through the hidden layers.
Input to node j = (0.2)(1.0) + (0.3)(0.4) + (-0.1)(0.7) = 0.250
Output from node j = 0.562
Input to node i = (0.1)(1.0) + (-0.1)(0.4) + (0.2)(0.7) = 0.200
Output from node i = 0.550
Input to node k = (0.1)(0.562) + (0.5)(0.550) = 0.331
Output from node k = 0.582
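The forward pass above can be reproduced with a short script. This is a sketch using the inputs of Figure 8.1 and the weights of Table 8.1; the variable names are illustrative, and the network is assumed to have no bias terms because none appear in the source.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_sum(values, weights):
    return sum(v * w for v, w in zip(values, weights))

inputs = [1.0, 0.4, 0.7]        # values at input nodes 1, 2, 3
w_j = [0.2, 0.3, -0.1]          # W1j, W2j, W3j
w_i = [0.1, -0.1, 0.2]          # W1i, W2i, W3i
w_jk, w_ik = 0.1, 0.5           # hidden-to-output weights

in_j = weighted_sum(inputs, w_j)
out_j = sigmoid(in_j)
in_i = weighted_sum(inputs, w_i)
out_i = sigmoid(in_i)
in_k = w_jk * out_j + w_ik * out_i
out_k = sigmoid(in_k)

for name, value in [("in_j", in_j), ("out_j", out_j), ("in_i", in_i),
                    ("out_i", out_i), ("in_k", in_k), ("out_k", out_k)]:
    print(name, round(value, 3))
```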
Backpropagation Error Output Layer
$$\text{Error}(k) = (T - O_k)\,[f'(x_k)]$$
where
T = the target output
O_k = the computed output at node k
(T - O_k) = the actual output error
f'(x_k) = the first-order derivative of the sigmoid function
x_k = the input to the sigmoid function at node k
Because f'(x_k) = O_k(1 - O_k) for the sigmoid function:
$$\text{Error}(k) = (T - O_k)\,O_k\,(1 - O_k)$$
With target output T = 0.65:
Error(k) = (0.65 - 0.582)(0.582)(1 - 0.582) = 0.017
Hidden layer error at node j, Error(j) = Error(k) Wjk Oj (1 - Oj):
Error(j) = (0.017)(0.1)(0.562)(1 - 0.562) = 0.00042
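Weight updates with a learning rate of 0.5 (△W = learning rate × error at the receiving node × output of the sending node):
△Wjk = (0.5)(0.017)(0.562) = 0.0048    The updated value for Wjk = 0.1 + 0.0048 = 0.1048
△W1j = (0.5)(0.00042)(1.0) = 0.0002    The updated value for W1j = 0.2 + 0.0002 = 0.2002
△W2j = (0.5)(0.00042)(0.4) = 0.000084    The updated value for W2j = 0.3 + 0.000084 = 0.300084
△W3j = (0.5)(0.00042)(0.7) = 0.000147    The updated value for W3j = -0.1 + 0.000147 = -0.099853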
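The error and weight-update arithmetic can be checked with the sketch below. It assumes the Figure 8.1 inputs, the Table 8.1 weights, a target of 0.65, and a learning rate of 0.5; variable names are illustrative. The script keeps full precision, while the slide rounds intermediate values (for example, Error(k) to 0.017), so the last digits can differ slightly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rate = 0.5                      # learning rate
target = 0.65
inputs = [1.0, 0.4, 0.7]
w_j = [0.2, 0.3, -0.1]          # W1j, W2j, W3j
w_i = [0.1, -0.1, 0.2]          # W1i, W2i, W3i
w_jk, w_ik = 0.1, 0.5

# Forward pass
out_j = sigmoid(sum(x * w for x, w in zip(inputs, w_j)))
out_i = sigmoid(sum(x * w for x, w in zip(inputs, w_i)))
out_k = sigmoid(w_jk * out_j + w_ik * out_i)

# Output-layer error, then hidden-layer error at node j (uses the old Wjk)
error_k = (target - out_k) * out_k * (1 - out_k)
error_j = error_k * w_jk * out_j * (1 - out_j)

# Weight updates: delta = rate * error at receiving node * output of sending node
new_w_jk = w_jk + rate * error_k * out_j
new_w_j = [w + rate * error_j * x for w, x in zip(w_j, inputs)]

print(round(error_k, 3), round(error_j, 5))
print(round(new_w_jk, 4), [round(w, 6) for w in new_w_j])
```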
Backpropagation learning algorithm
1. Initialize the network.
   a. Create the network topology by choosing the number of nodes for the input, hidden, and output layers.
   b. Initialize the weights for all node connections to arbitrary values between -1.0 and 1.0.
   c. Choose a value between 0 and 1.0 for the learning parameter.
2. For all training set instances:
   a. Feed the training instance through the network.
   b. Determine the output error.
   c. Update the network weights using the previously described method.
3. If the terminating condition has not been met, repeat step 2.
4. Test the accuracy of the network on a test dataset. If the accuracy is less than optimal, change one or more parameters of the network topology and start over.
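Steps 1 through 3 of the algorithm above can be sketched for a network with one hidden layer and a single output node. This is a minimal illustration, not a definitive implementation: the function name `train`, its parameters, the fixed random seed, and the single training instance are assumptions, no bias terms are used, and step 4 (evaluation on a test dataset) is left out.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_hidden=2, rate=0.5, max_epochs=1000, rms_goal=0.10, seed=0):
    """data: list of (inputs, target) pairs with targets in [0, 1]."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: topology and random initial weights between -1.0 and 1.0
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    rms = float("inf")
    for _ in range(max_epochs):
        squared = 0.0
        for inputs, target in data:
            # Step 2a: feed the instance through the network
            hidden = [sigmoid(sum(x * w for x, w in zip(inputs, ws))) for ws in w_hidden]
            out = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
            # Step 2b: output error, then hidden errors (using the old weights)
            err_out = (target - out) * out * (1 - out)
            err_hidden = [err_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
            # Step 2c: update the weights
            w_out = [w + rate * err_out * h for w, h in zip(w_out, hidden)]
            w_hidden = [[w + rate * e * x for w, x in zip(ws, inputs)]
                        for ws, e in zip(w_hidden, err_hidden)]
            squared += (target - out) ** 2
        # Step 3: stop when the RMS error for this pass falls below the goal
        rms = math.sqrt(squared / len(data))
        if rms < rms_goal:
            break
    return w_hidden, w_out, rms

data = [([1.0, 0.4, 0.7], 0.65)]   # the single instance from the worked example
w_hidden, w_out, rms = train(data)
print(round(rms, 3))
```

The loop stops on either of two terminating conditions listed in Section 8.4: an error threshold or a maximum number of iterations.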
Root Mean Squared Error
$$\text{RMS} = \sqrt{\frac{\sum_{n}\sum_{i}(T_{in} - O_{in})^{2}}{n\,i}}$$   (Equation 8.8)
where
n = the total number of training set instances
i = the total number of output nodes
T_in = the target output for the nth instance and the ith output node
O_in = the computed output for the nth instance and the ith output node
A common criterion is to terminate backpropagation learning when RMS < 0.10.
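The root mean squared error can be computed as in the sketch below. The function name and the sample target and output values are made up for illustration; each inner list holds the output-node values for one training instance.

```python
import math

def rms_error(targets, outputs):
    """Root mean squared error over n instances and i output nodes."""
    n = len(targets)        # number of training set instances
    i = len(targets[0])     # number of output nodes
    total = sum((t - o) ** 2
                for t_row, o_row in zip(targets, outputs)
                for t, o in zip(t_row, o_row))
    return math.sqrt(total / (n * i))

targets = [[0.65], [1.0]]
outputs = [[0.582], [0.9]]
error = rms_error(targets, outputs)
print(round(error, 4), error < 0.10)
```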
Kohonen Self-Organizing Maps: An Example
Figure 8.4 Connections for two output layer nodes. Input layer: Node 1 (input 0.4), Node 2 (input 0.7). Output layer: Node i and Node j. Weights: W1i = .2, W2i = .1, W1j = .3, W2j = .6.
Node i: $\sqrt{(0.4 - 0.2)^2 + (0.7 - 0.1)^2} = 0.632$
Node j: $\sqrt{(0.4 - 0.3)^2 + (0.7 - 0.6)^2} = 0.141$
Node j has the smaller distance (0.141), so its weights are adjusted with r = 0.5:
△w1j = (0.5)(0.4 - 0.3) = .05
△w2j = (0.5)(0.7 - 0.6) = .05
w1j(new) = 0.3 + .05 = .35
w2j(new) = 0.6 + .05 = .65
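The Kohonen example can be sketched as follows, using the input values and weights of Figure 8.4 and r = 0.5. The variable names are illustrative, and only the winning node is updated, as in the worked example.

```python
import math

inputs = [0.4, 0.7]
weights = {"i": [0.2, 0.1], "j": [0.3, 0.6]}   # [W1, W2] for each output node
r = 0.5                                        # learning rate

def distance(x, w):
    """Euclidean distance between the input vector and a node's weight vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))

distances = {node: distance(inputs, w) for node, w in weights.items()}
winner = min(distances, key=distances.get)

# Move the winning node's weights toward the input: w(new) = w + r * (input - w)
weights[winner] = [w + r * (x - w) for x, w in zip(inputs, weights[winner])]

print({node: round(d, 3) for node, d in distances.items()})
print(winner, [round(w, 2) for w in weights[winner]])
```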