									 P2P INSPECTOR GADGET
                   PROJECT
PORTAUTHORITY® TECHNOLOGIES & AMOS TEAM




   Ben Gurion University – Software Engineering department
P2P Inspector Gadget – Learning P2P agent                                        AMOS


Document Properties
File Name        AMOS_ARD.doc
Last Printed     7/26/2011 9:48:00 PM



Revision Properties
Author           AMOS TEAM                                          25-Nov-05
Checked by
Approved by



Revision History
Revision   Date        Edited By                                        Notes
0.1        25-Nov-05   Miki & Shai Koffman                              First Draft
0.2        26-Nov-05   Amir Freiman
0.3        26-Nov-05   Shai Koffman                                     Added functional requirements
0.4        03-Dec-05   Miki Koffman                                     Fixes
0.5        04-Dec-05   Amir & Ohad                                      Added use cases.
0.6        05-Dec-05   Shai Koffman                                     Refine Document and add Diagrams
0.7        07-Dec-05   AMOS team                                        Change document due to meeting with Boris
0.8        11-Dec-05   Lidror Troyansky (PortAuthority® Technologies)   Overview and remarks
0.9        11-Dec-05   AMOS team                                        Fixing Lidror's remarks.




26 July 2011                 Revision 0.3             Page 2


Table of Contents
1    INTRODUCTION
  1.1 VISION
  1.2 THE PROBLEM DOMAIN
  1.3 STAKEHOLDERS
  1.4 SYSTEM CONTEXT
  1.5 SYSTEM INTERFACES
     1.5.1 Hardware Interfaces
     1.5.2 Software Interfaces
     1.5.3 Events
2    FUNCTIONAL REQUIREMENTS
  2.1 CONNECTION TO A P2P NETWORK (GNUTELLA).
  2.2 DEFINING THE SEARCH PARAMETERS AND KEYWORDS BY THE USER.
  2.3 SCANNING AND LOOKING FOR SUSPICIOUS TARGET (E.G. CONFIDENTIAL) INFORMATION IN THE
      P2P NETWORK (GNUTELLA).
  2.4 DOWNLOADING THE SUSPICIOUS TARGET (E.G. CONFIDENTIAL) INFORMATION FROM THE P2P
      NETWORK (GNUTELLA).
  2.5 ANALYZING THE SCANNED RESULTS (DETERMINE THE VALUE OF THE DOCUMENTS).
  2.6 PERIODIC DATA LEAKAGE INSPECTION
  2.7 INFORMATION LEAKAGE DISTRIBUTION STATISTICS
  2.8 STATISTICS GATHERING
  2.9 REPORTS EXTRACTION
  2.10 CONNECTING TO OTHER P2P NETWORKS.
  2.11 ANTIBOT AVOIDANCE.
3    NON-FUNCTIONAL REQUIREMENTS
  3.1 PERFORMANCE CONSTRAINTS
     3.1.1 Speed Capacity and Throughput
     3.1.2 Reliability
     3.1.3 Safety and Security
  3.2 PLATFORM CONSTRAINTS
     3.2.1 Personal Computer.
     3.2.2 A Database server.
     3.2.3 Programming languages.
  3.3 SE PROJECT CONSTRAINTS
     3.3.1 Feasibility study
     3.3.2 Project progress
  3.4 SPECIAL RESTRICTIONS AND LIMITATIONS
     3.4.1 Simulated environment
4    USAGE SCENARIOS
  4.1 USER PROFILES – THE ACTORS
  4.2 USE-CASES
     Start System
     Disconnect from network
     Connect to the network
     Shutdown system
     Scan network.
     Scan network - Use Case Diagram
     Analyze downloaded files
     Analyze downloaded files - Use Case Diagram
     Update system parameters.
     View algorithm's results and give feedback on the results.
     View statistics
  4.3 SPECIAL USAGE CONSIDERATIONS
5    APPENDICES
  5.1 P2P / PEER TO PEER NETWORK
  5.2 BAYESIAN FILTERING ALGORITHM EXPLANATION
  5.3 GNUTELLA NETWORK EXPLANATION





1 INTRODUCTION
This document describes the functional and non-functional requirements of the system.
It serves as an informal contract between Ben Gurion University, the AMOS team members and
PortAuthority® Technologies.
The system ("P2P Inspector Gadget") is designed as a research platform for smart search and
tracking of target (confidential) information over P2P networks (Gnutella), using advanced
search technologies and machine learning mechanisms.
PortAuthority® Technologies is a pioneering company in the field of organizational data-leak
detection and prevention.
Peer-to-peer (P2P) networks have become a major part of worldwide communication. They are
mainly used for file sharing (e.g. Kazaa, Gnutella) and for real-time data such as VoIP
(Skype).
Sensitive data that leaks into such a file-sharing network can pose a great danger to an
organization.




                 * Red dots represent peers holding the target sensitive information.




1.1 VISION
The P2P Inspector Gadget system should provide a solution for detecting information leaks in P2P
networks. It does so by connecting to the P2P network, searching it for relevant files, and
using machine learning algorithms to recognize the leaked information.




1.2 THE PROBLEM DOMAIN




[Figure: the problem domain. Computer A shares non-confidential files; Laptop B contains an
organization's confidential file; PDA C searches for and downloads the organization's
confidential file. These peers, the P2P Inspector Gadget, and the Client Organization (behind
the Organization Firewall) are each connected through a router to the Gnutella network.]

Our system works over the Gnutella network and should implement the Gnutella client API.
Definitions:
      “Target information” – The kind of data the system is configured to look for (an
          organization's confidential information).
          During development the target information may be defined as, for example, eBooks (in
          order to examine the correctness of the algorithms).
      “P2P” – Peer-to-peer networks. We will use Gnutella.
      “Machine learning” – The area of Artificial Intelligence concerned with developing
          techniques that allow computers to “learn” (we will use Bayesian filtering).
      “IP Geolocation” – Mapping an IP address to its real-world geographic location.



1.3 STAKEHOLDERS
The stakeholders for our product are:
       PortAuthority® Technologies – Our research work may help them extend their
        products into P2P networks and present better solutions to the industry.
       Companies and organizations – Our research work will improve the data security of
        companies and organizations by detecting their data leaks in the P2P network and
        alerting on them.




1.4 SYSTEM CONTEXT
The system shall scan the network for “target information” that resides in shared files across the
network.
Major inputs include rules that “teach” the system what counts as “target information”.
The system shall evolve these rules and extract new rules by itself.


1.5 SYSTEM INTERFACES
  1.5.1   HARDWARE INTERFACES
          No special hardware will be used, only a PC connected to the Internet.
          The firewall rules must permit the system to use P2P networking.
  1.5.2   SOFTWARE INTERFACES
          The system shall interact with P2P networks, and will therefore use the APIs
           those networks expose.
          The system should integrate with several libraries from PortAuthority® Technologies
           for document format conversion (PDF to ASCII and DOC to ASCII).
          The system shall interact with a database that stores the machine learning results of
           searches in the P2P network, for later statistics and report analysis.
  1.5.3   EVENTS
          The system shall scan the network for its target information.
          The system will not be subject to outside events (e.g. although the system should act
           like a P2P network client, it will not upload data to other users).






2 FUNCTIONAL REQUIREMENTS
Number & Name:               2.1 CONNECT TO A P2P NETWORK (GNUTELLA).
Description:                 The system shall connect to the Gnutella network as a client.
Customer Benefit:            Fundamental requirement: the customer must have this ability in order
                             to search the P2P network.
Customer Priority:           High
Target Release:              1.0
Change History:              25-11-2005 Created.
                             2-12-2005 Edited.



Number & Name:               2.2 DEFINE THE SEARCH PARAMETERS AND KEYWORDS BY THE USER.
Description:                 The system shall enable the user to define the search parameters and
                             keywords. The system uses these parameters to decide how to scan
                             for the target information within the P2P network.
Customer Benefit:            The customer will be able to define the keywords that characterize the
                             suspicious target information, resulting in more flexible and more
                             reliable search results.
Customer Priority:           High
Target Release:              1.0
Change History:              2-12-2005 Created.
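
Illustration (not part of the requirement): a minimal Python sketch of how user-defined
search parameters might be represented and applied to a file name returned by the network.
The class name SearchParameters, its fields, the default extensions and the helper
is_suspicious are all assumptions made for this example; the ARD does not define them.

```python
from dataclasses import dataclass, field


@dataclass
class SearchParameters:
    """User-defined scan settings. All field names here are illustrative."""
    keywords: list = field(default_factory=list)
    file_extensions: tuple = (".doc", ".pdf", ".txt")  # assumed defaults


def is_suspicious(file_name, params):
    """True if the name has an allowed extension and contains any keyword."""
    name = file_name.lower()
    allowed = tuple(ext.lower() for ext in params.file_extensions)
    if not name.endswith(allowed):
        return False
    return any(keyword.lower() in name for keyword in params.keywords)
```

A file that passes this check would only be a candidate for download (2.4); deciding whether
it really is target information is left to the analysis stage (2.5).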



Number & Name:               2.3 SCANNING AND LOOKING FOR SUSPICIOUS TARGET (E.G. CONFIDENTIAL)
                                 INFORMATION IN THE P2P NETWORK (GNUTELLA).
Description:                 The system shall scan the network for the suspicious target (e.g.
                             confidential) information defined by the user, and download the relevant
                             files to the client.
                             Optional: Improve search method (not in version 1.0).
Customer Benefit:            Knowledge of the possible existence of target information in the P2P
                             network.
Customer Priority:           High
Target Release:              1.0
Change History:              25-11-2005 Created.
                             2-12-2005 Edited.




Number & Name:               2.4 DOWNLOAD THE SUSPICIOUS TARGET (E.G. CONFIDENTIAL) INFORMATION
                                 FROM THE P2P NETWORK (GNUTELLA).
Description:                 The system shall download from the network the suspicious target (e.g.
                             confidential) information defined by the user.
                             The system will download files that were detected as suspicious by
                             the keywords and parameters that the user entered into the system.
Customer Benefit:            The ability to keep the suspicious target information in order to analyze
                             it later.
Customer Priority:           High
Target Release:              1.0
Change History:              25-11-2005 Created.
                             2-12-2005 Edited.



Number & Name:               2.5 ANALYZE THE SCANNED RESULTS (DETERMINE THE VALUE OF THE DOCUMENTS).
Description:                 The system will use machine learning, based on the Bayesian filtering
                             algorithm, to classify the documents.
                             This stage takes place after the files suspected of containing target
                             data (suspected classified documents) have been downloaded.
                             The system should alert the user when it finds target data.
Customer Benefit:            The client will be alerted to documents detected as confidential and
                             will not have to look for them manually.
Customer Priority:           High
Target Release:              1.0
Change History:              26-11-2005 Created
                             2-12-2005 Edited.
                             5-12-2005 Edited – changed phrasing.
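
Illustration (not part of the requirement): a minimal Python sketch of Bayesian filtering
applied to document text, here a two-class multinomial naive Bayes classifier with add-one
smoothing. This is one common form of Bayesian filtering and is an assumption; the ARD does
not specify the variant, the tokenization, or the names NaiveBayesFilter, train and
is_target used below.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"target": Counter(), "other": Counter()}
        self.doc_counts = {"target": 0, "other": 0}

    def train(self, text, label):
        """Record one labelled document; label is 'target' or 'other'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        # log P(label) + sum over tokens of log P(token | label)
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in tokens:
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_target(self, text):
        """Return True when 'target' is the more probable class."""
        tokens = tokenize(text)
        vocab = set(self.word_counts["target"]) | set(self.word_counts["other"])
        target = self._log_score(tokens, "target", len(vocab))
        other = self._log_score(tokens, "other", len(vocab))
        return target > other
```

The sketch assumes at least one training document of each class has been supplied before
is_target is called. In the system described here, the text would come from downloaded
files after conversion to ASCII (see 1.5.2).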



Number & Name:               2.6 PERIODIC DATA LEAKAGE INSPECTION
Description:                 The system will be able to check periodically for leakage of
                             confidential information, in order to find out whether there have been
                             any leaks related to the company/organization since the last check,
                             and to save the results in a database.
Customer Benefit:            The users of the system will have a history of information leaks without
                             anyone having to operate the system.
                             The system shall be configurable to perform routine checks that track
                             the data leaks of the organization.
Customer Priority:           Medium
Target Release:              1.1
Change History:              26-11-2005 Created




Number & Name:               2.7 INFORMATION LEAKAGE DISTRIBUTION STATISTICS
Description:                 The system will gather statistics on the distribution of traced confidential
                             information on the P2P network from the time of discovery onward, and save
                             the results in a database.
Customer Benefit:            The users of the system will have an idea of the potential damage
                             that the information leak causes the company/organization.
Customer Priority:           Medium
Target Release:              1.1
Change History:              26-11-2005 Created




Number & Name:               2.8 STATISTICS GATHERING
Description:                 The system will gather the following statistics and store them in the
                             system's database:
                                   The number of users who currently hold the target information.
                                   The geographic location of the leaked information, found using
                                      IP Geolocation.
                                   The history of files searched for, downloaded and analyzed.
Customer Benefit:            The user will get more reliable information-leak results and will be able
                             to track the distribution of their confidential information.
Customer Priority:           Medium
Target Release:              1.1
Change History:              03-12-2005 Created
                             05-12-2005 Edited.
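
Illustration (not part of the requirement): a minimal Python sketch of how the first two
statistics could be stored and queried, using the standard sqlite3 module. The table name
sightings, its columns and the function names are assumptions made for this example. The
IP Geolocation lookup itself is not shown; the country value is assumed to be supplied by
whatever geolocation source the system ends up using.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) sightings table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sightings (
               file_name TEXT NOT NULL,
               peer_ip   TEXT NOT NULL,
               country   TEXT,
               seen_at   TEXT NOT NULL,
               UNIQUE (file_name, peer_ip)
           )"""
    )
    return conn


def record_sighting(conn, file_name, peer_ip, country, seen_at):
    """Store one peer holding a target file; repeat sightings are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO sightings VALUES (?, ?, ?, ?)",
        (file_name, peer_ip, country, seen_at),
    )
    conn.commit()


def holder_count(conn, file_name):
    """Number of distinct peers recorded as holding the file."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sightings WHERE file_name = ?", (file_name,)
    ).fetchone()
    return row[0]


def holders_by_country(conn, file_name):
    """Map country -> number of holders, for a distribution report."""
    rows = conn.execute(
        "SELECT country, COUNT(*) FROM sightings WHERE file_name = ? "
        "GROUP BY country ORDER BY country",
        (file_name,),
    ).fetchall()
    return dict(rows)
```

The UNIQUE constraint keeps one row per file and peer, so the holder count is not inflated
when the same peer is seen again in a later scan.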



Number & Name:               2.9 REPORTS EXTRACTION
Description:                 The system will let the user obtain the following reports:
                                 List of the target information documents that were found in the
                                    P2P network.
                                 Geographic distribution of a tracked document (not in version
                                    1.0).
Customer Benefit:            The user will be able to get reports that help the organization
                             estimate its P2P data leakage correctly.
Customer Priority:           High
Target Release:              1.0
Change History:              03-12-2005 Created.
                             05-12-2005 Edited.






Number & Name:                  2.10 CONNECT TO OTHER P2P NETWORKS.
Description:                    The system will be able to connect to additional P2P networks and perform
                                all the related functionality on them.
Customer Benefit:               The system will potentially cover a wider share of Internet P2P traffic.
Customer Priority:              Medium.
Target Release:                 1.1
Change History:                 07-12-2005 Created.



Number & Name:                  2.11 ANTIBOT AVOIDANCE.
Description:                    The system should be able to avoid the AntiBot programs running on
                                the P2P networks (in case the P2P networks supported by the system
                                have AntiBot protection).
Customer Benefit:               Many AntiBot programs, with various purposes, run on the P2P network
                                looking for computerized clients.
                                Improving the system's AntiBot avoidance makes the system more
                                reliable.
Customer Priority:              Medium.
Target Release:                 1.1
Change History:                 3-12-2005 Created.



3 NON-FUNCTIONAL REQUIREMENTS
3.1 PERFORMANCE CONSTRAINTS
  3.1.1   SPEED CAPACITY AND THROUGHPUT

     3.1.1.1    The system should return a search result for a suspicious target within 15
                minutes.

     3.1.1.2    The system should report a bad Internet connection after 5 failed attempts to
                establish a connection.

     3.1.1.3    The system will let the user set the download time-out. If the user does not
                configure the time-out, a default value should be used.

     3.1.1.4    The system should keep historical results and statistics for no more than one year.
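
Illustration (not part of the requirements): a minimal Python sketch of 3.1.1.2 and
3.1.1.3. The connect argument stands for whatever function opens the P2P connection and is
assumed to raise OSError on failure. The names and the 300-second default are assumptions;
the ARD does not fix a default time-out value.

```python
MAX_CONNECT_ATTEMPTS = 5          # from requirement 3.1.1.2
DEFAULT_DOWNLOAD_TIMEOUT_S = 300  # assumed value; the ARD does not specify one


class BadConnectionError(Exception):
    """Raised after every connection attempt has failed."""


def connect_with_retries(connect, max_attempts=MAX_CONNECT_ATTEMPTS):
    """Call connect() until it succeeds or max_attempts have failed."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
    raise BadConnectionError(
        f"no connection after {max_attempts} attempts"
    ) from last_error


def effective_timeout(user_timeout=None):
    """Use the user's download time-out when set, otherwise the default."""
    return DEFAULT_DOWNLOAD_TIMEOUT_S if user_timeout is None else user_timeout
```

The caller would catch BadConnectionError and show the bad-connection report to the user.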
  3.1.2   RELIABILITY

      3.1.2.1   The system's decisions will be based only on mathematical formulas and facts (there
                will be no arbitrary decisions).




 3.1.3 SAFETY AND SECURITY

     3.1.3.1   The system will not be used for any purpose other than finding information leaks
               in P2P networks (e.g. it will not be used to find MP3 shares).

     3.1.3.2   The system will not expose the confidential documents it downloads, nor the
               documents used by the Machine Learning algorithm.


3.2 PLATFORM CONSTRAINTS
Number & Name:                 3.2.1    PERSONAL COMPUTER
Description:                   The system will run on an end console residing on a PC that is
                               connected to the Internet.
Customer Benefit:              n/a
Customer Priority:             High
Target Release:                1.0
Change History:                26-11-2005 Created.



Number & Name:                 3.2.2    A DATABASE SERVER
Description:                   The system will need a server to hold statistical information and
                               functional information (needed by the algorithm).
Customer Benefit:              n/a
Customer Priority:             Medium
Target Release:                1.1
Change History:                05-12-2005 Created.



Number & Name:                 3.2.3    PROGRAMMING LANGUAGES
Description:                   The system should be programmed in Python, Java/J2EE, C++ and
                               C#.
Customer Benefit:              n/a
Customer Priority:             High
Target Release:                1.1
Change History:                11-12-2005 Created.


3.3 SE PROJECT CONSTRAINTS
 3.3.1   FEASIBILITY STUDY

         The project is mainly a research project. Therefore, we will conduct a feasibility
         study to see whether the Bayesian algorithm is suitable for identifying confidential
         documents.
         For the feasibility test date, see the Gantt chart on the AMOS website.
         The project's progress will be determined by the results of this study.


 3.3.2   PROJECT PROGRESS
         Some of the progress in the project will be based on the feasibility study results:

         3.3.2.1   If the feasibility study results are satisfactory (over 90% of the
                   positives are true positives, less than 10% false negatives), all the
                   requirements for version 1.1 will be included in the product.

         3.3.2.2   If the feasibility study results are not satisfactory, the rest of the
                   project will focus on finding or creating a satisfactory algorithm to identify
                   confidential information, and all the requirements defined for version 1.1
                   will be optional and included if time permits.
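
Illustration (not part of the requirement): a minimal Python sketch of the check in
3.3.2.1. It reads "over 90% of the positives are true positives" as precision,
TP / (TP + FP), and "less than 10% false negatives" as the share of real target documents
that were missed, FN / (TP + FN). The second reading is an assumption, since the ARD does
not say what the 10% is measured against. The function name is hypothetical.

```python
def feasibility_satisfied(tp, fp, fn, min_precision=0.90, max_fn_rate=0.10):
    """Check the two thresholds from 3.3.2.1 against confusion counts.

    tp, fp, fn: true positives, false positives, false negatives.
    """
    if tp + fp == 0 or tp + fn == 0:
        return False  # nothing flagged, or no target documents in the test set
    precision = tp / (tp + fp)  # share of flagged documents that are targets
    fn_rate = fn / (tp + fn)    # share of target documents that were missed
    return precision > min_precision and fn_rate < max_fn_rate
```

For example, 95 true positives, 5 false positives and 4 false negatives give a precision of
0.95 and a miss rate of about 0.04, which satisfies both thresholds.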



3.4 SPECIAL RESTRICTIONS AND LIMITATIONS
 3.4.1   SIMULATED ENVIRONMENT
         Because it is difficult to find much leaked information in the P2P networks at any
         specific time, we will prepare a "simulated environment" running several clients that
         traffic in "confidential information". The system will work in this environment so that
         we can research the Machine learning algorithm and test the results of the system.



4 USAGE SCENARIOS


[Figure: use-case diagram. The User actor is linked to eight use cases: 1. Start System,
2. Disconnect from Network, 3. Connect to the network, 4. Shutdown system, 5. Scan network,
6. Analyze downloaded files, 7. Update system parameters, 8. View statistics.]




4.1 USER PROFILES – THE ACTORS
User – The system will have several users. Only one type of user will exist, and every user is also
classified as an administrator of the system.

System – The system will initiate several actions.

P2P networks – Passive actor.






4.2 USE-CASES
Use Case ID:                 1
Use Case Name:               Start System
Created By:                  Ohad                      Last Updated By:                  Shai Koffman
Date Created:                2-12-2005                 Date Last Updated:                05-12-2005
Actor:                       User
Description:                 On system start, the system connects to the database, reads the relevant
                             parameters and connects to the P2P network.
Preconditions:               An available Internet connection that enables peer-to-peer connectivity (all
                             the needed ports are open, and the firewall will not deny the connection).
Postconditions:              The system is connected to the P2P network as an ordinary client and is
                             able to receive input commands.
Priority:                    High
Frequency of Use:            A few times per day
Normal Course Of
Events:                      Actor Actions                              System Actions
                             1. User starts system.                     2. System connects to the
                                                                        database.
                                                                        3. System reads all relevant
                                                                        parameters.
                                                                        4. System establishes connection
                                                                        to the network (see UC 3).



Alternative Courses:         Use Case 3




Exceptional Courses:




Includes:                    Use Case 3
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a





Use Case ID:                 2
Use Case Name:               Disconnect from network
Created By:                  Ohad & Amir.              Last Updated By:
Date Created:                3-12-2005                 Date Last Updated:
Actor:                       User
Description:                 The user may want to work offline.
Preconditions:               System is already connected to the network, or is attempting to connect.
Postconditions:              The system is disconnected from the P2P network.
Priority:                    High
Frequency of Use:            A few times per day
Normal Course Of
Events:                      Actor Actions                              System Actions
                             1. User chooses to disconnect.             2. System disconnects from the
                                                                        network.




Alternative Courses:
                             Actor Actions                              System Actions




Exceptional Courses:




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a






Use Case ID:                 3
Use Case Name:               Connect to the network
Created By:                  Ohad & Amir.              Last Updated By:
Date Created:                3-12-2005                 Date Last Updated:
Actor:                       User
Description:                 The user may want to work online.
Preconditions:               System is disconnected.
Postconditions:              The system is connected to the P2P network.
Priority:                    High
Frequency of Use:            A few times per day
Normal Course Of
Events:                      Actor Actions                                 System Actions
                             1. User chooses to connect to the             2. System establishes connection
                             network                                       to the network.
                                                                           3. System notifies user of success.
                                                                           4. System waits for input.




Alternative Courses:         In case of unsuccessful attempt to connect to the network
                             Actor Actions                                 System Actions
                                                                           2. System attempts to connect to
                                                                           the network 5 times without
                                                                           success.
                             4. User must acknowledge.                     3. System outputs an error
                                                                           message to the user.



Alternative Courses:         User explicitly aborts connection
                             2.5 User aborts connection to the             2. System establishes connection
                             network                                       to the network.
                                                                           3. System waits for input.




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a
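
The connect flow of use case 3, including the alternative course that gives up after 5 failed attempts, can be sketched as follows. This is only an illustration: the names `connect_with_retry`, `try_connect` and `notify` are hypothetical and are not part of the requirements.

```python
from typing import Callable

MAX_ATTEMPTS = 5  # from the alternative course: give up after 5 failed attempts


def connect_with_retry(try_connect: Callable[[], bool],
                       notify: Callable[[str], None],
                       max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Return True once connected; report an error after max_attempts failures."""
    for _ in range(max_attempts):
        if try_connect():
            notify("Connected to the P2P network.")
            return True
    notify(f"Could not connect after {max_attempts} attempts.")
    return False


# Usage with a stand-in connector that fails twice, then succeeds.
outcomes = iter([False, False, True])
messages = []
connected = connect_with_retry(lambda: next(outcomes), messages.append)
```

Passing the connector and the notifier as callables keeps the sketch independent of any particular network library or user interface.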






Use Case ID:                 4
Use Case Name:               Shutdown system
Created By:                  Ohad & Amir.              Last Updated By:
Date Created:                3-12-2005                 Date Last Updated:
Actor:                       User
Description:                 The user wants to end the session, without any loss of information.
Preconditions:               System is up.
Postconditions:              The system is shut down.
Priority:                    High
Frequency of Use:            A few times per day
Normal Course Of
Events:                      Actor Actions                             System Actions
                             1. User chooses to shut down the          2. System closes all open
                             system.                                   connections to the network.
                                                                       3. System saves all relevant
                                                                       information in the database.
                                                                       4. System closes connection to the
                                                                       database.
                                                                       5. System shuts down.



Alternative Courses:         In case system is not connected to the network.
                                                                       Step 2 is skipped.




Exceptional Courses:         Problems in network connection.
                                                                       2. System attempts to close all
                                                                       open connections to the network.
                                                                       2.5 System notifies user.




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a
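
The ordering of the shutdown steps in use case 4 can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that saving to the database still proceeds when closing the network connections fails, which the use case implies ("without any loss of information") but does not state.

```python
from typing import Callable, List


def shutdown(connected: bool,
             close_network: Callable[[], None],
             save_state: Callable[[], None],
             close_database: Callable[[], None],
             notify: Callable[[str], None]) -> List[str]:
    """Run the shutdown steps in order and return the names of the completed steps."""
    done = []
    if connected:  # alternative course: step 2 is skipped when offline
        try:
            close_network()
            done.append("close_network")
        except OSError as exc:  # exceptional course: problems in network connection
            notify(f"Problem closing network connections: {exc}")
    save_state()          # step 3: save relevant information
    done.append("save_state")
    close_database()      # step 4: close the database connection
    done.append("close_database")
    return done


# Usage with a stand-in network layer that fails to close.
def failing_close() -> None:
    raise OSError("link down")


log = []
steps = shutdown(True, failing_close,
                 lambda: log.append("saved"),
                 lambda: log.append("db closed"),
                 log.append)
```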






Use Case ID:                 5
Use Case Name:               Scan network.
Created By:                  Ohad & Amir.              Last Updated By:                  Shai Koffman
Date Created:                3-12-2005                 Date Last Updated:                5-12-2005
Actor:                       User
Description:                 The system scans the network for the target information.
Preconditions:               After successful use case 3.
Postconditions:              Search hits are downloaded.
Priority:                    High
Frequency of Use:            Many times per day (a few dozen).
Normal Course Of
Events:                      Actor Actions                              System Actions
                             1. User chooses to search the              2. System awaits user input (such
                             network.                                   as search keywords).
                             3. User enters search parameters.          4. System scans the network
                                                                        according to an algorithm which
                                                                        will be decided upon at a later
                                                                        phase.
                                                                        5. System notifies user of end of
                                                                        scan.
                                                                        6. The system downloads all the
                                                                        search hits to a local machine.
                                                                        7. Continues on use case 6.

Alternative Courses:         User aborts search.
                             4.5 User chooses to abort the              4. System scans the network
                             search or download activity.               according to an algorithm which
                                                                        will be decided upon at a later
                                                                        phase.

Exceptional Courses:         Connection is lost.
                                                                        4. System scans the network
                                                                        according to an algorithm which
                                                                        will be decided upon at a later
                                                                        phase.
                                                                        4.5 System notifies user of
                                                                        connection loss.
                                                                        4.6 System saves status of
                                                                        downloaded files for follow-up.
Includes:                    Use case 6
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a




Scan network - Use Case Diagram

        User                     System
               1: start scan
                                      2: Scan the network
                                      3: Download results to disk
               4: end of scan
                                      5: start Use case 6




Use Case ID:                  6
Use Case Name:               Analyze downloaded files
Created By:                  Ohad & Amir.               Last Updated By:
Date Created:                3-12-05                    Date Last Updated:
Actor:                       System
Description:                 The system analyzes the downloaded files for the target information.
Preconditions:               Successful use case 5 (files are downloaded).
Postconditions:              Target information is analyzed and statistics are updated.
Priority:                    High
Frequency of Use:            Many times per day (a few dozen).
Normal Course Of
Events:                      Actor Actions                               System Actions
                                                                         1. System converts downloaded
                                                                         files to a text format.
                                                                         2. System scans through the
                                                                         converted files for target
                                                                         information using special
                                                                         algorithms (such as Bayesian).
                                                                         3. System updates statistics and
                                                                         history tracking.




Alternative Courses:         System is unable to convert file(s) to a text format.
                                                                         1. System converts downloaded
                                                                         files to a text format.
                                                                         1.5 System notifies user and
                                                                         records the file's status.
                                                                         1.6 System moves on to the next
                                                                         file.
Exceptional Courses:




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a
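
The normal and alternative courses of use case 6 can be sketched as a small loop. All names here (`analyze_files`, `to_text`, `is_confidential`, the statistics keys) are hypothetical, and a conversion failure is modelled as a `ValueError` purely for illustration.

```python
from typing import Callable, Dict, Iterable


def analyze_files(paths: Iterable[str],
                  to_text: Callable[[str], str],
                  is_confidential: Callable[[str], bool],
                  notify: Callable[[str], None]) -> Dict[str, int]:
    """Convert, classify and count files; unconvertible files are recorded and skipped."""
    stats = {"scanned": 0, "confidential": 0, "unconverted": 0}
    for path in paths:
        try:
            text = to_text(path)               # step 1: convert to text
        except ValueError:
            stats["unconverted"] += 1          # step 1.5: record the file's status
            notify(f"Could not convert {path}")
            continue                           # step 1.6: move on to the next file
        stats["scanned"] += 1
        if is_confidential(text):              # step 2: scan for target information
            stats["confidential"] += 1         # step 3: update statistics
    return stats


# Usage with stand-in converter and classifier.
def fake_to_text(path: str) -> str:
    if path.endswith(".bin"):
        raise ValueError("unsupported format")
    return "secret plan" if "plan" in path else "holiday photos"


notes = []
result = analyze_files(["plan.doc", "pics.doc", "blob.bin"],
                       fake_to_text, lambda text: "secret" in text, notes.append)
```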




Analyze downloaded files - Use Case Diagram

     System
          1: Convert Files on disk to text format
          2: Scan files using "smart" algorithm
          3: Save results to statistics database




Use Case ID:                  7
Use Case Name:               Update system parameters.
Created By:                  Ohad & Amir.              Last Updated By:
Date Created:                3-12-05                   Date Last Updated:
Actor:                       User.
Description:                 The system holds several parameters and will allow the user to view and edit
                             them.
Preconditions:               n/a
Postconditions:              The parameters are updated in the system's database.
Priority:                    High
Frequency of Use:            Very low
Normal Course Of
Events:                      Actor Actions                                System Actions
                             1. The user chooses to edit/view             2. The system displays all the
                             the system's parameters.                     relevant parameters
                                                                          3. The system allows the user to
                                                                          edit any parameter.
                                                                          4. The system will allow the user
                                                                          to save or discard changes.




Alternative Courses:




Exceptional Courses:




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a






Use Case ID:                 8
Use Case Name:               View the algorithm's results and give feedback on them.
Created By:                  Miki & Shai &               Last Updated By:
                             Ohad
Date Created:                7-12-2005                   Date Last Updated:
Actor:                       User.
Description:                 The system will show the results of the algorithm, including the files that
                             were recognized as confidential and those that were not. The system will
                             show the user the total number of files scanned and the percentages
                             found to be confidential and non-confidential.
Preconditions:               Use case 3, use case 5 and use case 6 were performed.
Postconditions:              n/a
Priority:                    High
Frequency of Use:            Few times a day.
Normal Course Of
Events:                      Actor Actions                                System Actions
                             1. The user chooses to view the              2. The system displays the
                             algorithm's results.                         algorithm's results and allows
                                                                          the user to view a file's contents
                                                                          as text.
                             3. The user feeds back to the                4. The system uses the feedback
                             system the correct and incorrect             to improve the algorithm.
                             decisions the algorithm made.
Alternative Courses:         n/a




Exceptional Courses:         n/a




Includes:                    n/a
Special Requirements:        n/a
Assumptions:                 n/a
Notes:                       n/a




Use Case ID:                 9
Use Case Name:               View statistics
Created By:                  Ohad & Amir.                Last Updated By:
Date Created:                3-12-05                     Date Last Updated:
Actor:                       User.
Description:                 The system will maintain statistics and history tracking of its actions and will
                             allow old users to inspect periodic data leakage.


Preconditions:                Statistics are available in the database.
Postconditions:                n/a
Priority:                      High
Frequency of Use:              Few times a day.
Normal Course Of
Events:                        Actor Actions                               System Actions
                               1. The user chooses to view the             2. The system displays the
                               system's statistics/history tracking.       statistics graphically and
                                                                           allows the user to manipulate
                                                                           the data.

Alternative Courses:




Exceptional Courses:




Includes:                      n/a
Special Requirements:          n/a
Assumptions:                   n/a
Notes:                         n/a





4.3 SPECIAL USAGE CONSIDERATIONS

5 APPENDICES
The basis for all appendices is taken from the Wikipedia website.


5.1 P2P / PEER TO PEER NETWORK
A peer-to-peer (or P2P) computer network is a network that relies on the computing power and
bandwidth of the participants in the network rather than concentrating it in a relatively small
number of servers. P2P networks are typically used for connecting nodes via largely ad hoc connections.
Such networks are useful for many purposes. Sharing content files containing audio, video, data
or anything in digital format is very common, and real-time data, such as VoIP traffic, is also
passed using P2P technology.

A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer
nodes that simultaneously function as both "clients" and "servers" to the other nodes on the
network. This model of network arrangement differs from the client-server model where
communication is usually to and from a central server.

Peer-to-peer architecture embodies one of the key technical concepts of the internet, described in
the first internet Request for Comments, dated 7 April 1969.

More recently, the concept has achieved recognition in the general public in the context of the
absence of central indexing servers in architectures used for exchanging multimedia files.




5.2 BAYESIAN FILTERING ALGORITHM EXPLANATION
Bayesian filtering is the process of using Bayesian statistical methods to classify documents into
categories.

Bayesian filtering gained attention when it was described in the paper A Plan for Spam by Paul
Graham, and has become a popular mechanism to distinguish illegitimate spam email from
legitimate "ham" email.

Bayesian filtering takes advantage of Bayes' theorem, which says that the probability that a
document belongs to a certain group (confidential documents), given that it contains certain words,
is equal to the probability of finding those words in a document from that group, times the
probability that any document belongs to that group, divided by the probability of finding those
words in a document of any group:

$P(\text{confidential} \mid \text{words}) = \dfrac{P(\text{words} \mid \text{confidential}) \, P(\text{confidential})}{P(\text{words})}$
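
To make the rule concrete, the following minimal sketch plugs numbers into Bayes' theorem for two groups (confidential and everything else). The function name and all of the probabilities are hypothetical, chosen only for illustration.

```python
def posterior(p_words_given_group: float,
              p_group: float,
              p_words_given_other: float) -> float:
    """P(group | words) by Bayes' theorem, for a group and its complement."""
    # Total probability of seeing the words in a document of any group.
    p_words = (p_words_given_group * p_group
               + p_words_given_other * (1.0 - p_group))
    return p_words_given_group * p_group / p_words


# Assumed figures: 10% of documents are confidential; the words appear in
# 50% of confidential documents and in 5% of the others.
p = posterior(p_words_given_group=0.5, p_group=0.1, p_words_given_other=0.05)
print(round(p, 3))  # prints 0.526
```

With these assumed figures, seeing the words raises the probability that a document is confidential from 10% to roughly 53%.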




Furthermore, Bayes' theorem is part of statistical inference, which is a large area of statistics
and mathematics.
The canonical example in the Bayesian world is a man who is unsure whether or not to take an
umbrella: if he takes the umbrella and it does not rain, he will have carried it in vain, and if it
rains and he has not taken the umbrella, he will get wet.
The man looks out of the window, sees black clouds, and decides to take the umbrella.
In Bayesian terms there are the following definitions:

•  $\Omega = \{\omega_i\}$ - "World states": the states that can occur.
   The states are mutually exclusive, and the union of all of the states equals $\Omega$:
   $\forall i \neq j : \omega_i \cap \omega_j = \emptyset$ and $\bigcup_i \omega_i = \Omega$.
•  Observations: $X = \{x_1, \ldots, x_n\}$. This is the data we have, the facts.
   From these observations we try to infer the state of the world.
•  Probabilistic model of the world: $P = \{P_0(\omega_i), P(X \mid \omega_i)\}$.
   In the Bayesian approach we assume that we hold complete probabilistic knowledge of
   the world. This knowledge includes the a priori probability $P_0(\omega_i)$ of the world
   being in state $\omega_i$.
•  Available activities: $A = \{a_1, \ldots, a_k\}$. This is the set of activities we can choose from.
   Every activity has its "price", and we would like to choose the most suitable activity for the
   state of the world.
•  Activities' "prices": $\Lambda = \{\lambda(a_k, \omega_i)\}$. Every activity we choose has a price that
   depends on the state of the world: activities that do not suit the current state of the world
   have a positive price, and activities that suit it have a negative price.
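
The definitions above can be tied together in a short sketch of the umbrella example: compute the posterior probability of each world state from the a priori probabilities and the observation, then pick the activity with the lowest expected price. All numbers and names below are hypothetical and serve only as an illustration.

```python
from typing import Dict, Tuple


def choose_activity(prior: Dict[str, float],
                    likelihood: Dict[str, float],
                    prices: Dict[str, Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Return the activity with the lowest expected price, and all expected prices."""
    # Posterior over world states given the observation (Bayes' theorem).
    joint = {state: prior[state] * likelihood[state] for state in prior}
    evidence = sum(joint.values())
    posterior = {state: p / evidence for state, p in joint.items()}
    # Expected price of each activity under the posterior.
    expected = {
        activity: sum(posterior[state] * price[state] for state in posterior)
        for activity, price in prices.items()
    }
    return min(expected, key=expected.get), expected


prior = {"rain": 0.3, "dry": 0.7}        # assumed a priori probabilities
likelihood = {"rain": 0.9, "dry": 0.2}   # assumed P(black clouds | state)
prices = {                               # negative price = suitable activity
    "take umbrella": {"rain": -1.0, "dry": 1.0},
    "leave umbrella": {"rain": 5.0, "dry": -1.0},
}
best, expected = choose_activity(prior, likelihood, prices)
print(best)  # prints: take umbrella
```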
We can use Bayes' theorem in the "naïve" way to classify the downloaded documents.
We treat each "token" as independent of the others, multiply the tokens' conditional
probabilities, and obtain the result.
Tokens can be almost anything, from specific words to the number of consecutive capital
letters in a document (anything that can help classify the document).
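
A minimal sketch of such a naïve classifier is given below. It is not the project's algorithm: the class name, the add-one smoothing, and the use of summed logarithms (equivalent to multiplying the probabilities, but safer against numerical underflow) are assumptions made for illustration. Calling `train` again with a label corrected by the user is one possible way to realize the feedback of use case 8.

```python
import math
from collections import Counter
from typing import Iterable, List


class NaiveBayes:
    """Two-class naive Bayes over tokens with add-one smoothing."""

    def __init__(self) -> None:
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, tokens: Iterable[str], confidential: bool) -> None:
        """Add one labelled document (also usable for user feedback)."""
        self.token_counts[confidential].update(tokens)
        self.doc_counts[confidential] += 1

    def _log_score(self, tokens: List[str], label: bool) -> float:
        counts = self.token_counts[label]
        vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        total = sum(counts.values())
        # Smoothed prior probability of the class.
        score = math.log((self.doc_counts[label] + 1)
                         / (sum(self.doc_counts.values()) + 2))
        # Tokens are treated as independent: add their log-probabilities.
        for token in tokens:
            score += math.log((counts[token] + 1) / (total + len(vocab) + 1))
        return score

    def is_confidential(self, tokens: List[str]) -> bool:
        return self._log_score(tokens, True) > self._log_score(tokens, False)


nb = NaiveBayes()
nb.train("quarterly revenue forecast internal only".split(), True)
nb.train("holiday photos from the beach".split(), False)
verdict = nb.is_confidential("internal revenue numbers".split())
```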


5.3 GNUTELLA NETWORK EXPLANATION
Gnutella is a file sharing network used primarily to exchange music, films and software. It is a
true peer-to-peer network: it operates without a central server, and files are exchanged directly
between users.

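Gnutella client programs connect to the network and share files. Search queries are passed from
one node to its neighbors, which forward them in turn (in the original protocol each query is
flooded to all neighbors until its time-to-live expires). Gnutella clients are available for a
number of platforms.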
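
How a query spreads from node to node can be sketched as follows, assuming the time-to-live (TTL) bounded flooding of the original Gnutella protocol, in which each node forwards a query to its neighbors and drops queries it has already seen. The graph and function name are hypothetical, and no real Gnutella messages are modelled.

```python
from collections import deque
from typing import Dict, List, Set


def flood_query(graph: Dict[str, List[str]], origin: str, ttl: int) -> Set[str]:
    """Return the nodes reached by a query flooded from origin with the given TTL."""
    seen = {origin}                       # nodes that already handled this query
    frontier = deque([(origin, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue                      # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor not in seen:      # duplicate queries are dropped
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return seen - {origin}


# A small example overlay: a-b-c-d in a chain, plus a-e.
graph = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": ["a"]}
reached = flood_query(graph, "a", ttl=2)
```

With a TTL of 2 the query reaches `b`, `e` and `c`, but not `d`, which is three hops away.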

According to the file sharing website Slyck.com, Gnutella is the fourth-most-popular file sharing
network on the Internet, following eDonkey 2000, BitTorrent, and FastTrack. While figures vary
from hour to hour and day to day, Gnutella is thought to host approximately 1.8 million users on
average, although only around 400,000-500,000 are connected at any one time.



