					                                                         (IJCSIS) International Journal of Computer Science and Information Security,
                                                         Vol. 8, No. 6, September 2010

      Information realization with statistical predictive
                inferences and coding form
                 D. Mukherjee, P. Chakrabarti*, A. Khanna, V. Gupta
                        Sir Padampat Singhania University
                        Udaipur-313601, Rajasthan, India

Abstract—The paper deals with information realization in the case of grid
topology. Nodal communication strategies with clusters have also been cited.
Information prediction has been pointed out with a relevant statistical
method, forward sensing, backward sensing and a cumulative frequency form.
Binary tree classifier theory has been applied for information grouping. The
paper also deals with a comparative analysis of information coding.

Keywords—grid topology, forward sensing, backward sensing, binary tree
classifier, information coding

        I. INFORMATION MERGING IN GRID IN DIAGONAL APPROACH

In order to solve complex problems in artificial intelligence, one needs both
a large amount of knowledge and some mechanism for manipulating that
knowledge to create solutions to new problems. Basically, knowledge is a
mapping of different facts with the help of appropriate functions; e.g.
"Earth is a planet" can be realized as the function planet(Earth).

Information merging can be realized as combining different pieces of
information to arrive at a conclusion. The different information elements can
be related in different ways, i.e. either in a hierarchy, in the form of a
graph, or even a mesh. Consider a mesh of size m x n, i.e. m rows and n
columns. If each intersection point has an information element placed on it,
then one way of merging element A with element B is covering a path of length
(5XN) (here m = 8 and n = 9). If the weight of covering each path is
considered the same, then in the diagonal approach we can find a path of
diagonal nature of length 5√2 and then travel a length (N-5) in linear
fashion, thus finding a shorter path; the same can also be determined by
graph algorithms like Dijkstra's algorithm, or Kruskal's algorithm for the
minimum spanning tree. If each path is considered to be of zero weight, then
interestingly there is no sense in travelling a path from A to B, i.e. we can
directly merge the two points: we take point A and directly merge it with
point B. In such a case we need some stack-like mechanism to determine the
order in which the nodes arrive and are merged.

[Fig1: Information merging in mesh/grid. An M-row ((m-1) paths) by N-column
((n-1) paths) grid; * shows the path of traversal]

The above concept can be realized in DDM (Distributed Data Mining), where a
large amount of geographically scattered knowledge is merged and mined to
derive conclusions and make decisions, e.g. in GIS, the Geographical
Information System, which uses cartography (the art of making maps) with
various information elements (sources) to derive decision-support results,
such as which route to choose for a given destination.

        II. INFORMATION MERGING IN CLUSTER NETWORKS

This section mainly focuses on the nodal communication between the farthest
nodes in an N*N structure [1]; here information realization indicates nodal
messages. Let us assume each cluster consists of 16 nodes and then try to
communicate between the source and the destination node as described in
Fig.1. The point to be noted here is that, to establish the communication
link between adjacent elements or units of the cluster, we have to have the
communication in just the reverse order in the two adjacent elements. The
order of the communication is as follows.

                                                                                                          ISSN 1947-5500
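One plausible reading of this reverse-order scheme is a serpentine traversal, in which each unit visits its nodes in the direction opposite to its neighbour. This is an assumption on our part (the paper's figure showing the order is not reproduced here), sketched below:

```python
# Sketch (assumption): adjacent cluster units traverse their nodes in
# opposite orders, so the exit node of one unit lies next to the entry
# node of the following unit. Unit layout and node labels are
# hypothetical, not taken from the paper.

def unit_order(nodes, reverse=False):
    """Return the traversal order of one cluster unit."""
    return list(reversed(nodes)) if reverse else list(nodes)

def path_through_units(units):
    """Chain units together, alternating direction in adjacent units."""
    path = []
    for i, unit in enumerate(units):
        path.extend(unit_order(unit, reverse=(i % 2 == 1)))
    return path

units = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(path_through_units(units))  # [1, 2, 3, 4, 8, 7, 6, 5]
```

With two 4-node units, the second unit is walked backwards, which is one way of realizing "communication in just the reverse order" between adjacent elements.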

The condition can be visually imagined as in the original figures (not
reproduced here). Now let us first talk about the case when there is only one
element, i.e. 1*1. In this particular case, if we want to communicate between
the farthest nodes, then there will be only 1 node between the source and the
destination. If we denote this by the function f(x), then the value of f(x)
will be 1: f(x) = 1. Now let us consider the case of a 2*2 matrix; the value
here will be f(x) = 1 + 2 = 3. The intermediate nodes are 1(2,3), 2(4). For
the 3*3 matrix the value of the function is f(x) = 1 + 2 + 2 = 5. Similarly,
for the 4*4 matrix we get f(x) = 1 + 2 + 2 + 2 = 7.

Here each element was a ring of only 4 nodes. Suppose we have 8 nodes in the
ring; in that case we have to compute the number of nodes required to
establish the connection between the farthest nodes.

Justification - Let us consider the case of a 1*1 matrix: to communicate
between the farthest nodes we need 3 nodes, i.e. f(x) = 3. In the case of a
2*2 matrix we need 7 nodes, i.e. f(x) = 3 + 4; in the case of a 3*3 matrix we
need 11 nodes, i.e. f(x) = 3 + 4 + 4; in the case of a 4*4 matrix we need 15
nodes, i.e. f(x) = 3 + 4 + 4 + 4. In the case of 16 nodes per element in a
ring, we can proceed as follows: for a 1*1 matrix, to communicate between the
farthest nodes we need 7 nodes, i.e. f(x) = 7; for a 2*2 matrix we need 15
nodes, i.e. f(x) = 7 + 8; for a 3*3 matrix we need 23 nodes, i.e.
f(x) = 7 + 8 + 8; for a 4*4 matrix we need 31 nodes, i.e. f(x) = 7 + 8 + 8 + 8.
The total number of nodes can be derived from the general formula
(N/2 - 1) + (M - 1)*(N/2), where N = number of nodes present in the unit or
element and M = dimension of the square matrix; e.g. for N = 16 and M = 3,
(16/2 - 1) + (3 - 1)*(16/2) = 7 + 16 = 23. The data can be represented in
tabular form as follows:

   No. of nodes        1*1       2*2       3*3       4*4
        4               1         3         5         7
        8               3         7        11        15
       16               7        15        23        31

[Fig.2: Nodal communication in cluster (bar chart; not reproduced here)]

The x-axis represents the M*M matrix, where M varies from 1 to 4. The y-axis
represents the optimum number of communication nodes required to establish
the path between the source node and the farthest node. The number of nodes
per element is indicated by the three colors.

        III. STATISTICAL INFERENCE OF FUTURISTIC VALUES

In statistical inference the input and output of a situation are related by a
certain relation or function, based on which we infer futuristic values.
Consider a real-time situation in which a given input parameter is observed
over time between instants T1 and T2, given the relation [2]

Mt = a.e^t ; then Mavg = √(Mt1 . Mt2)


Case 1:
If we take observations at equal instants of time, then
Mt1 = a.e^t1
Mt2 = a.e^(t1+k)
Mt3 = a.e^(t1+2k)
General term: Mtn = a.e^(t1+(n-1)k), i.e. the values of the output M form a
G.P. series of increasing order with common ratio e^k.

Case 2:
If we take observations at unequal timing intervals, then
T1 = t1 => Mt1 = a.e^t1
T2 = t1 + k1 => Mt2 = a.e^(t1+k1)
T3 = t2 + k2 = t1 + (k1 + k2) => Mt3 = a.e^(t2+k2) = a.e^(t1+k1+k2)
General term: Tn = tn-1 + kn-1 = t1 + (k1 + k2 + k3 + … + kn-1) =>
Mtn = a.e^(tn-1+kn-1) = a.e^(t1+k1+k2+k3+…+kn-1) = a.e^(t1+Ktotal), i.e. any
futuristic value, say at instant tn, is
Mtn = a.e^t1 . e^Ktotal (observed value)
Given Mt = a.e^t, taking logs on both sides we have
ln(Mt) = ln(a) + t
i.e. ln(Mtn) = ln(a) + tn
ln(Mtn) = ln(a) + t1 + k1 + k2 + k3 + … + kn-1
Thus we have obtained a log-linear model, a straight line Y = m.X + C, for
the function Mt = a.e^t, using which we can calculate or predict futuristic
values over increased ranges. If we try to minimize the value of Ktotal, we
can do so by making k1 = k2 = k3 = … = kn-1, which is the same as Case 1.

        IV. PROJECTION OF SENSED INFORMATION

Let I = {i1, i2, …, in} be the set of sensed information. In the process of
appropriate feature observation, forward selection, backward elimination and
decision-based induction methods are applied.

A. Forward selection based information sensing

Let I = {i1, i2, …, in} be the set of information estimates of various trends
noted after observation at the respective timing instants Y = {y1, y2, …, yn}.
The accuracy measurement is calculated first, based on comparison analysis.
The minimum deviation reflects a high accuracy level of prediction, and that
information will be selected. In this manner, { }, {best information},
{first two}, … will be selected.

B. Backward elimination based information sensing

Using backward elimination, at each stage one information element is
eliminated, and after the final screening stage the projected set reveals the
final optimum information space.

C. Cumulative frequency based information sensing

   OBSERVATIONS        INFORMATION INVOLVED
        g1                 i1, i3, i4, i6
        g2                 i3, i5
        g3                 i4, i5, i6
        g4                 i2, i3, i5
        g5                 i1, i2
        g6                 i1, i2, i3, i6
   Table 1: Association of information against each observation

   Features    Initial value    Count    Value    (Value)^2
      i1           0.1            3       0.3       0.09
      i2           0.2            3       0.6       0.36
      i3           0.3            4       1.2       1.44
      i4           0.4            2       0.8       0.64
      i5           0.5            3       1.5       2.25
      i6           0.6            3       1.8       3.24
   Table 2: Determination of count and value

Now CF = (x, y, z), where x = number of elements, y = linear sum of the
elements and z = sum of the squares of the elements [3].

        V. BINARY TREE BASED GAIN CLASSIFIER

In this section, information represents gain analysis. A search [4] can be
formed based on the initial search term and its gradual sub-terms during the
process of matching; thereby the level is increased. The initial search term
is the root, and the final term fully matching the context of the user's
desire is a leaf node.

[Fig3: Binary tree based gain classifier. G0 at level 0; G1,1 and G1,2 at
level 1; G2,1 to G2,4 at level 2; G3,1 to G3,8 at level 3]

In the above figure, G0 is the root, i.e. the initial search term. If a user
wants to analyze gain classification further, then each search term is
identified by a binary code, and by giving the code number he can analyze the
position of a gain estimate in the model. The concept of coding is as
follows:


Value = 0 if the search term is a left child of its parent node
      = 1 otherwise

Theorem: In the process of coding, ∑(i=1 to N) 1/2^Li = 1, where Li is the
length of the code of the i-th leaf node in the tree, N is the total number
of leaf nodes and 1 ≤ i ≤ N.

Proof:

From Fig.3 the codes of the leaf nodes are as follows:

          Nodes                Respective code
          G3,1                      000
          G3,2                      001
          G3,3                      010
          G3,4                      011
          G3,5                      100
          G3,6                      101
          G3,7                      110
          G3,8                      111

So N = 8, and each leaf node has identical code length Li = 3. Therefore
1/2^Li = 1/2^3 = 1/8 for every leaf node, and the sum is 8 × 1/8 = 1.

We now design a binary tree based classifier taking some parameters for
examination purposes, represent each point on the basis of a code generated
by arithmetic coding, and finally represent the same on the basis of set
theory. We assume that the available gain set is G = {g1, g2, g3, g4}. The
parameters based on which the examination is to be carried out are the
elements of the set P = {p1, p2, p3}. The results of the examination are
denoted by Boolean variables such that the outputs are
NO = 0
YES = 1

At the initial timing instant, the parameter p1 is applied for testing
purposes. Hence, in the initial stage, there will be at least one class and a
maximum of two classes. In the second level, the parameter p2 is applied and
the classes are defined accordingly. In the final stage, the parameter p3 is
applied. If we view the classifier as a binary tree representation, we can
apply arithmetic coding to each class such that a 'NO' of a particular exam
is denoted by '0' and a 'YES' by '1'. In the initial stage, the class which
contains the elements for negative supply of p1 is C1 = { d1,d2 }, while
C2 = { d2,d4 }. In this manner, the tree is constructed such that the code
word for each class is denoted by ijk, where i Є { 0,1 }, j Є { 0,1 } and
k Є { 0,1 }.

For p1:
                        C1 = { d1,d2 }
                        C2 = { d2,d4 }
For p2:
                        C1 = { 00 }
                        C2 = { 01 }
                        C3 = { 10 }
                        C4 = { 11 }
For p3:
    Class       ijk         False          True
     C1         000        p1,p2,p3          -
     C2         001         p1,p2           p3
     C3         010         p1,p3           p2
     C4         011          p1           p2,p3
     C5         100         p2,p3           p1
     C6         101          p2           p1,p3
     C7         110          p3           p1,p2
     C8         111           -          p1,p2,p3

In the initial stage, the classes are C1 and C2, based on the parameter p1.
In the second stage, the classes are C1, C2, C3, C4 based on p2. In the last
stage, the classes are C1, C2, …, C8 based on p3. This means that if we
assume 'n' is the number of parameters involved in the system for examination
purposes, then the maximum length of a code word for a particular class is
'n', and the number of classes is 2^n, provided that the classes are distinct
in nature.

        VI. CODED INFORMATION SENSING

Let the original message be "FATHER". For the first alphabet, µvalue =
1/((position of that alphabet) + π/100). Hence its offset value = ceiling of
(µvalue × 10). The weight is given by its position in the alphabet string [5].
Therefore total_value = offset value × weight. From the next character
onwards, µvalue_next = 1/(|position of next - position of previous| + π/100),
and total_value is calculated in a similar manner. Now, the bias value equals
the total number of characters in the message. Compute net_value as the
accumulated sum of all total_values minus the bias value, and let it be x
(say).

Mode               Operation
0 ≤ x < 100        Reverse the message.
100 ≤ x < 150      Circular left shift of the message by n/2 bits,
                   where n = bias value.
150 ≤ x < 200      Circular right shift of the message by n/2 bits.

Iteration 1: µF = 1/((position of 'F' in the alphabet list) + π/100) =
1/(6 + π/100) = 0.165798547. Offset value = ceiling of (0.165798547 × 10) = 2.
Weight = position of 'F' in the alphabet list = 6. Thus total_value =
2 × 6 = 12.

Iteration 2: µA = 1/(|position of 'A' - position of 'F'| + π/100) =
1/(|1 - 6| + π/100) = 1/5.031415927 = 0.198751209. Offset value = ceiling of
(0.198751209 × 10) = 2. Weight = 1. Thus total_value = 2 × 1 = 2.

Iteration 3: µT = 1/(|position of 'T' - position of 'A'| + π/100) =
1/(|20 - 1| + π/100) = 1/19.03141593 = 0.052544697. Offset value = ceiling of
(0.052544697 × 10) = 1. Weight = 20. Thus total_value = 1 × 20 = 20.
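The per-character rule used in these iterations can be sketched in code. The helper name total_value and the A=1 alphabet indexing are our own naming, not the paper's:

```python
import math

def total_value(ch, prev=None):
    """Per-character value: ceil(mu * 10) * weight, per Section VI."""
    pos = ord(ch) - ord('A') + 1              # position in the alphabet (A=1)
    if prev is None:                          # first character of the message
        mu = 1 / (pos + math.pi / 100)
    else:                                     # subsequent characters
        prev_pos = ord(prev) - ord('A') + 1
        mu = 1 / (abs(pos - prev_pos) + math.pi / 100)
    offset = math.ceil(mu * 10)               # offset value
    return offset * pos                       # offset value * weight

msg = "FATHER"
values = [total_value(msg[0])] + [
    total_value(c, p) for p, c in zip(msg, msg[1:])
]
net = sum(values) - len(msg)                  # subtract the bias value
print(values, net)  # [12, 2, 20, 8, 20, 18] 74
# 0 <= 74 < 100, so the mode table says: reverse the message -> "REHTAF"
```

The printed values match the six worked iterations and the net_value of 74 derived in the text.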


Iteration 4: µH = 1/(|position of 'H' - position of 'T'| + π/100) =
1/(|8 - 20| + π/100) = 1/12.03141593 = 0.083115736. Offset value = ceiling of
(0.083115736 × 10) = 1. Weight = 8. Thus total_value = 1 × 8 = 8.

Iteration 5: µE = 1/(|position of 'E' - position of 'H'| + π/100) =
1/(|5 - 8| + π/100) = 1/3.031415927 = 0.32987885. Offset value = ceiling of
(0.32987885 × 10) = 4. Weight = 5. Thus total_value = 4 × 5 = 20.

Iteration 6: µR = 1/(|position of 'R' - position of 'E'| + π/100) =
1/(|18 - 5| + π/100) = 1/13.03141593 = 0.076737632. Offset value = ceiling of
(0.076737632 × 10) = 1. Weight = 18. Thus total_value = 1 × 18 = 18.

Now wb = bias value = number of characters in "FATHER" = 6. So net_value =
accumulated sum of all total_values - wb = (12 + 2 + 20 + 8 + 20 + 18) - 6 =
74. This falls in the range 0 ≤ x < 100, so "FATHER" is reversed. Therefore
the resultant cipher is "REHTAF".

[Fig 4: Coding Model. Offset values Ci (i = 1 to n), weighted by wi and
combined with the bias value wb, feed the encryption system, which produces
the output cipher]

        VII. CONCLUSION

The paper points out information merging in grid and cluster network models.
Statistical means of information prediction, as well as forward, backward and
cumulative frequency based schemes, have been analyzed. Binary tree based
information classification and coded information have been justified with
relevant mathematical analysis.

REFERENCES

[1] A. Kumar, P. Chakrabarti, P. Saini, V. Gupta, "Proposed techniques of
random walk, statistical and cluster based node realization", communicated to
IEEE conf. Advances in Computer Science ACS 2010, India, Dec 2010.

[2] P. Chakrabarti, S.K. De, S.C. Sikdar, "Statistical Quantification of Gain
Analysis in Strategic Management", IJCSNS, Korea, Vol. 9, No. 11, pp. 315-318,
2009.

[3] P. Chakrabarti, "Data mining - A Mathematical Realization and cryptic
application using variable key", Advances in Information Mining, Vol. 2,
No. 1, pp. 18-22, 2010.

[4] P. Chakrabarti, P.S. Goswami, "Approach towards realizing resource mining
and secured information transfer", IJCSNS, Korea, Vol. 8, No. 7, pp. 345-350,
2008.

[5] P. Chakrabarti, "Attacking Attackers in Relevant to Information Security",
Proceedings of RIMT-IET, Mandi Gobindgarh, pp. 69-71, March 29, 2008.

About the authors:

Debasis Mukherjee (20/08/80) has been pursuing a Ph.D. at USIT, GGSIPU,
Delhi, India since 2010. He received the M.Tech. degree in VLSI Design from
CDAC Noida in 2008 and a bachelor degree in Electronics and Instrumentation
Engineering from BUIE, Bankura, West Bengal, India in 2003. He achieved first
place in his district in the "Science Talent Search Test" 1991. He has
publications of repute in IEEE conferences.

Dr. P. Chakrabarti (09/03/81) is currently serving as Associate Professor in
the Department of Computer Science and Engineering of Sir Padampat Singhania
University, Udaipur. Previously he worked at Bengal Institute of Technology
and Management, Oriental Institute of Science and Technology, Dr. B.C. Roy
Engineering College, Heritage Institute of Technology, and Sammilani College.
He obtained his Ph.D. (Engg.) degree from Jadavpur University in September
2009, an M.E. in Computer Science and Engineering in 2005, an Executive MBA
in 2008 and a B.Tech. in Computer Science and Engineering in 2003. He is a
life member of the Indian Science Congress Association, Calcutta Mathematical
Society, Calcutta


Statistical Association, Indian Society for Technical Education, Cryptology
Research Society of India, IAENG (Hong Kong) and CSTA (USA); an annual member
of the Computer Society of India, the VLSI Society of India and IEEE (USA); a
senior member of IACSIT (Singapore); and a selected member of the IAENG
Societies of Artificial Intelligence, Computer Science and Data Mining. He is
a reviewer for Information Processing and Management (Elsevier), the
International Journal of Computers and Applications, Canada, and the
International Journal of Computer Science and Information Security (IJCSIS,
USA), and an editorial board member of the International Journal of
Engineering and Technology, Singapore and the International Journal of
Computer and Electrical Engineering. He has about 100 papers in national and
international journals and conferences to his credit and two patents (filed).
He has had several visiting assignments at BHU Varanasi, IIT Kharagpur, Amity
University Kolkata, and elsewhere.

A. Khanna and V. Gupta are third-year students of the Information Technology
and Computer Science & Engineering branches, respectively, of Sir Padampat
Singhania University.

