									Open Source Science Journal                                                 Vol. 2, No. 3, 2010

            Improving Email Host Security using Maintainability of
                    Open Source Bayesian Spam Filters

                                   Mădălina ZURINI
                     Academy of Economic Studies, Bucharest, Romania

        Abstract: The security concept is presented in the field of electronic messages.
Different types of spam filters are characterized along with the principles of email analysis.
The concept of Bayesian spam filter is debated within the open source distribution. The
maintainability characteristic is highlighted in terms of spam filtering. The impact given by
the cost of assuring a high quality product during the life cycle phases transforms the problem
of maintainability into a solution for decreasing the final cost of a software product. The main
functions of BayesianCS open source spam filter are analyzed for creating measurement
metrics and evaluating the maintainability of the software product. A procedural model is
described for increasing the quality of the maintainability process of an open source Bayesian
spam filter.

       Keywords: Bayesian spam filter, maintainability, security, software quality
characteristics, open source, software life cycle.

   1. Email host security

       As the most widely used service on the Internet, email has become one of today’s
standard means of communication. Although it was originally created to send simple text
messages, it has become more robust over the last years. Being based on HTML, an email can
now use the same code as a web page to incorporate text formatting, colors and images inside
the message.
       Compared with security problems such as computer break-ins, problems with the security
of electronic messages are hard to detect. Security is defined as the environment in which
all data sent from the sender to the receiver is original, complete and accurate, and is not
read by any other person. The main characteristics of a secure message in an intensive
communication system, as described in [01], are:
    • authenticity – the message received at the destination is the original one sent by the
       sender;
    • integrity – the sent data is complete and accurate;
    • confidentiality – the data in the sent message has not been read by anyone other than
       the receiver;
    • non-repudiation – the sender cannot deny a message that was sent and received by the
       intended destination.
       In [01] the security threats to email communications are presented, as follows:
    • eavesdropping – situations in which a person can read and copy a private message;
    • identity theft – the username and password are stolen and used for sending
       messages under the stolen identity;
    • invasion of privacy – sending an email over SMTP carries the risk of exposing the IP
       address of the computer from which the email was sent;
    • message modification – the system administrator of the email server can read or delete
        the message before it arrives at its destination;
    • false messages – false messages, also called spam, can carry viruses, in addition to
        the problems caused by the time and money wasted by the receiver of the message;
    • message replay – resending emails that were previously saved by an attacker;
    • unprotected backups – the plain text copy of the message saved on the email server
        can be read by anyone who can access it, even after the receiver of the email has
        deleted it;
    • repudiation – even though a person has sent an electronic message, he can
        successfully deny it.
        One of the sources of these security threats is the existence of unsolicited
electronic messages. Spam is a problem both for the users of email addresses and for
Internet Service Providers. There is no standard definition of spam. The difficulty in
treating these messages comes from the subjective view of the clients that receive them.
The concept therefore has to be explained through both an objective and a subjective
analysis.
        From an objective and legal point of view, spam is an advertising message that was
not requested and that the receiver has not agreed to be sent. The subjective definition, on
the other hand, is much more restrictive, and thus harder to follow: spam is everything that
the recipient considers to be spam.
        The costs resulting from the existence of such emails are passed on to both the users
and the Internet Service Providers. Unfortunately, these costs are often not understood by
the owners of email addresses, who see spam messages as annoying emails that they can easily
delete. Internet Service Providers experience dramatically increased costs on two levels:
one is the overloading of the email servers and the other is the narrowing of the bandwidth
available to the emails that travel from the sender to the receiver. These costs in turn
affect the clients by diminishing the quality of the Internet services provided.

   2. Open source spam filters

        The increasing number of spam messages has generated the need for spam filters. Not
even the initiatives taken so far, some left unfinished and others given legal form, such as
the 2003 conference at MIT, Cambridge, where CAUCE (Coalition Against Unsolicited
Commercial E-mail) was elaborated, have diminished the undesirable effects that these
unwanted messages have on people and companies, consuming important human and material
resources.
        At this moment, research and development in this field follow more than one direction,
resulting in many types of spam filters that implement various technologies. In the following,
seven types are listed, each with a short presentation.
        Client and server solutions. Both solutions have advantages and are used differently
depending on the desired results. For an individual user who has problems with spam messages,
integrating a client-side spam filter brings major benefits in blocking such unwanted
messages. Within an organization, however, server-side spam filters that can be customized
and controlled by the network administrator are recommended. This way, the time each
employee spends adjusting and training the spam filter is reduced, and filter criteria can be
defined that are accepted by the organization as a whole rather than by individual
employees.
         Challenge and response. A question is addressed to the sender of the email, who has to
answer it correctly in order to be distinguished from a machine. This type of spam filter
allows legitimate senders to be identified and added to the “whitelists”.
         Header analysis. This technique is used in the heuristic analysis performed by spam
filters. The header contains information that helps trace emails, information that is
generally hidden from the receiver by the default settings of the email servers. Elements
that indicate the spam character of an email can be identified, such as a sender IP address
that does not match the domain name, the distribution list, or errors that cannot be
observed in the usual viewing mode.
         Digital signature. Also known as a fingerprint, the digital signature identifies a
message. The signatures of known spam emails are stored in a database, and every time an
email with such a signature is received within the network, it is blocked. The major
disadvantage of this approach is that spammers constantly change the signatures of their
emails by adding random text.
         Address list. There are two types of lists: the white list, also known in the
literature as “whitelist”, which includes the addresses of the senders from which a receiver
accepts all email messages, and the “blacklist”, which includes the addresses of spammers,
whose emails are blocked and never reach the user. These lists can be implemented so that
emails can also be classified based on their content. Whitelists are usually filled in by
each user, while for blacklists a collection named “Real-time blackhole” is kept up to date
within the network.
         Keyword list. As in the previous approach, keywords can be classified in two lists, a
white list and a black list, based on the same principles. When an email containing a word
from the white list is received, it is accepted, while an email containing a word from the
black list is rejected. The disadvantages of this technique come from the fact that it is
oriented mainly towards the client and less towards the server, and from the indecision that
occurs when a received email contains words from both the white list and the black list.
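The indecision described for keyword lists can be made concrete with a minimal C# sketch. The names Verdict, KeywordListFilter and Classify are hypothetical, and treating an email that matches neither list as undecided is an assumption; the text does not say how such an email is handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Verdict { Accept, Reject, Undecided }

public static class KeywordListFilter
{
    // Splits the body on whitespace and common punctuation, then checks
    // each word against the white list and the black list.
    public static Verdict Classify(string body, ISet<string> whiteWords, ISet<string> blackWords)
    {
        string[] words = body.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);

        bool hasWhite = words.Any(w => whiteWords.Contains(w));
        bool hasBlack = words.Any(w => blackWords.Contains(w));

        // Words from both lists: the technique cannot decide.
        if (hasWhite && hasBlack) return Verdict.Undecided;
        if (hasBlack) return Verdict.Reject;
        if (hasWhite) return Verdict.Accept;

        // Assumption: no listed word at all is also left undecided.
        return Verdict.Undecided;
    }
}
```

An email such as "Cheap pills and the quarterly invoice", checked against a white list containing "invoice" and a black list containing "pills", falls into the undecided case, which is the weakness noted above.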
         Bayesian statistical method. Statistical techniques for detecting spam rest on
advanced principles that allow continuous and controlled updating. Each email receives a
rating that represents its probability of being spam. The email is divided into words, and
each word receives an individual spam probability; the individual probabilities are then
multiplied to obtain a cumulated probability for the whole email.
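The multiplication of individual probabilities can be sketched in C#. The combination rule used here is the one given by Graham in "A Plan for Spam": the product of the word probabilities, normalized by the sum of that product and the product of their complements. That a particular filter combines scores in exactly this way is an assumption, and the names SpamScore and Combine are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SpamScore
{
    // Combines per-word spam probabilities p_i into one message score:
    //   score = prod(p_i) / (prod(p_i) + prod(1 - p_i)).
    // Assumes every p_i lies strictly between 0 and 1 (clamped scores).
    public static double Combine(IEnumerable<double> wordProbabilities)
    {
        double spamProduct = 1.0; // product of p_i
        double hamProduct = 1.0;  // product of (1 - p_i)

        foreach (double p in wordProbabilities)
        {
            spamProduct *= p;
            hamProduct *= 1.0 - p;
        }

        return spamProduct / (spamProduct + hamProduct);
    }
}
```

For two words with probabilities 0.9 and 0.8 the combined score is 0.72 / (0.72 + 0.02), about 0.973, which is higher than either individual value: several moderately suspicious words reinforce each other.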
         The Bayesian filter is the best-known filter that uses this detection technique, and
over time it has proven to be an efficient method. Its advantages are the high percentage of
detected spam emails, which most of the time exceeds the 99% threshold, and the low
percentage of legitimate messages, named “ham”, that are mistaken for spam. A disadvantage
is the training stage, the first step of the classification process, which must be done
correctly in order to guarantee successful use, a step that is eventually found in every
type of spam filter. A collection of spam and ham emails has to be available beforehand in
order to shape the network with the probabilities given to each word for every user. One of
the first researchers to develop a filter of this kind, based on a Bayesian network, was
Paul Graham, who in 2001 described in [15] and [16] the principles of the method and
possible optimizations that were subsequently used successfully.
         As for the position of the spam filter in the process of transmission of a message, there
are four categories of filters, presented in contrast in figure 1:
     • filter as local proxy;
     • filter as an extension of the user’s email address;
     • filter as a local parallel process;
     • filter that runs on the email server for more users.

                           Fig. 1. Taxonomy of spam filter location [2]

         Spam filters have also seen a surprising evolution thanks to the opening offered by
open source projects. In [3], open source software is defined as a freely distributed program
whose source code is open and visible, and whose main features are:
     • free distribution – the license may not restrict distribution;
     • source code – it must be included and open in any product distributed as open
         source;
     • changes can be made to these products, and the resulting programs can themselves be
         distributed;
     • integrity of the author’s code – the product’s license states clearly whether
         programs resulting from changes can be distributed under the same name as the
         original product or not;
     • lack of discrimination – the license does not discriminate against any group of
         persons or any field in which the product is intended to be used.
         Open source projects offer willing developers the freedom to improve a specific
field of activity. This freedom allows solutions to be built faster, but only if the field
is of interest. As the assault of spam emails grew and called for a solution, open source
spam filters were quick to appear to combat this global phenomenon.

   3. Maintainability in the life cycle of a product

        Since the 1970s, different authors have studied the maintainability phenomenon with
the purpose of determining the factors that drive the need for change, and the frequency and
costs that come with it, as presented in [4]. As a result of these studies, classifications
of maintenance were proposed for a better understanding of its significance. Lientz and
Swanson divided maintenance into three categories: corrective, adaptive and perfective, to
which IEEE, in [6], adds a fourth one, emergency maintenance. They are defined as follows:
    • corrective – changes made to software following the discovery of errors;
    • adaptive – changes made to software in order to preserve the product’s objectives in
        an environment that is constantly changing;
    • perfective – changes made to software to improve performance;
    • emergency – unscheduled corrections made to keep the system operational.
         The term maintenance has a different meaning in the software area than in the
production area, where it is defined as the process of keeping a product in use at the same
level of performance. As opposed to material products, software wear has other causes,
which are well described in [7] by the term obsolescence.
         Maintainability, as defined in [3], is the ability of software to be maintained at an
appropriate use level after changes are made in instructions and the related databases, to
reflect the changes of the algorithm in accordance with new requirements of the users and not
         Maintenance accounts for 60-80% of the final cost of software, so this characteristic
should be considered throughout the whole product life cycle.
         The steps taken in developing a computer application, also called the life cycle,
include the following phases, defined in [10]:
     • gathering the specifications;
     • analysis;
     • design;
     • implementation;
     • testing;
     • putting into service;
     • maintaining;
     • withdrawal or replacement.
         Starting as early as the specification gathering step, when the problem, the
available resources, the input data and the desired results are presented, a detailed
analysis should be carried out by testing similar products that are available and by
studying the domain, so that the updates likely to occur from this step forward can be
integrated. The earlier in software development this is done, the lower the cost of the
staff and resources involved later, when the necessary updates are made.
         The obsolescence that occurs during the use of the product, [7], can be seen in all
three components that come together in the software, namely the data, the software and the
results, and is expressed through:
     • reliability drop;
     • structure corruption;
     • loss of consistency in the documentation;
     • decrease in maintainability;
     • data corruption;
     • ageing of the presentation format of the results;
     • existence of new methods and techniques for development.
         Maintainability is part of the collection of quality characteristics that define a
software application. The notion of quality has its origins in Greek philosophy, where it
was associated with excellence and ideals. In the 18th century, when the focus was on trade,
markets and traded products, quality was defined as a value. Quality through compliance
emerged in the 19th century, with the advent of mass production. The evolution of the market
for goods and services transformed quality into meeting or exceeding customer expectations, a
modern notion that is encountered in marketing and other developing areas.
         As seen from the producer’s point of view, in [08], software is a development
process that comprises the steps mentioned in this chapter, whereas for the user it is
judged by its cost and by the agreement between the requirements on the processed data and
the final result that the product actually delivers.
        Quality characteristics are analyzed by the developers starting from the application
structure, its objectives and its results as input data, and are materialized in values
directly correlated with the satisfaction of the users of the software. In [9], software
quality is seen as being composed of a set of components that collect, process, store and
distribute information that is further used to support decision making and control within an
organization.
        The quality characteristic system is made up of:
     • maintainability – effort required to make certain changes;
     • reliability – capacity to maintain a level of performance;
     • efficiency – relation between the level of performance and the volume of resources
        used;
     • usability – effort required to use the software product;
     • portability – ability to be transferred from one work environment to another;
     • functionality – satisfaction of requirements set by the users.
        Maintenance is a complex process, precisely because of the costs involved and the way
effort is distributed throughout all the steps of the life cycle of a computer application.
It can thus be analyzed and controlled in terms of the 4 directions defined in [4]:
     • the software’s ability to be analyzed, given by the level of effort needed to
        identify the parts and modules of the application that have to be amended;
     • the software’s ability to be modified, which includes the impact of the changes and
        the complete elimination of some modules and functionalities;
     • the software’s stability, meaning that unexpected effects are unlikely to appear
        after the changes are made;
     • testability, which is encountered at the moment when the modifications made to the
        software product are validated.
        In [8], a new direction for maintainability is introduced, the correctness indicator,
which represents the level of effort necessary to correct software errors in order to meet
the users’ requests.
        In [3], the indicator used to calculate the level of maintainability is defined as
                                                           , where:
   • LO – the number of instructions of the software product before applying the
       changes;
   • LM – the number of instructions that are modified because of the changes made;
   • LE – the number of eliminated instructions;
   • LA – the number of added instructions.
       There is also a last stage in the life cycle of software, namely withdrawal from the
market or replacement. In [7], the concept of optimal duration of use of a computer
application is defined, depending on the following elements:
   • application intrinsic factors including data organization, software architecture, results
   • factors that characterize the environment in which the application is used;
   • technical factors that refer to the hardware part;
   • economic factors;
   • legislative factors for applications whose input data depend on the
       Changes in the environment and in requirements bring modifications to software
applications, increasing the risk of prolonged use of the software. Thus we come to the idea
that maintainability is a process that must be taken into account in order to maintain a
standard level of market production of software applications. The objective is to see
maintainability not as a problem, but as a solution.

   4. Open source BayesianCS filter

        The BayesianCS application implements a Bayesian filter for dealing with spam
messages. It is an open source product, downloaded from the address given in [17]. The
logic of the application is held in two classes: Corpus and SpamFilter.
        The Corpus class consists of a list of words and the number of their occurrences in
the given text; the class is used for training the filter.
        The constructors are:
    • public Corpus() – default constructor, included for serialization;
    • public Corpus(TextReader reader);
    • public Corpus(string filepath).
        The principal methods are:
    • public void LoadFromReader(TextReader reader) – using regular expressions, all the
        words found in the text are loaded into the class list;
    • public void AddToken(string rawPhrase) – adds a word to the class list or, if the
        word is already present, increments its number of occurrences.
        The SpamFilter class implements the Bayesian spam filter. The class uses 2 variables of
Corpus type (one for the words found in spam messages, and one for those found in ham
messages) to calculate the probability of a word appearing in a spam message. The
calculated probabilities are saved in a sorted dictionary, of string – double type, and
are then used in the Test method, which determines whether a message is spam or not.
        The principal attributes are:
    • public Corpus Bad – the list of words that tend to appear in spam messages;
    • public Corpus Good – the list of words that tend to appear in ham messages;
    • public SortedDictionary<string, double> Prob – the probability of each word
        appearing in a spam message.
        The principal methods are:
    • public void Load(Corpus good, Corpus bad) and public void Load(TextReader
        goodReader, TextReader badReader) – used for initializing the spam filter with the
        specified input data;
    • private void CalculateProbabilities() – calculates the probabilities of the words from
        both lists;
    • private void CalculateTokenProbability(string token) – for a given token, calculates
        the probability of the word appearing in spam compared to the probability of it
        appearing in ham messages;
    • public double Test(string body) – calculates, for a given text, the probability of
        its being spam.
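The role of the Corpus class can be illustrated with a minimal sketch. This is not the BayesianCS source: the class name CorpusSketch, the token pattern and the choice of a SortedDictionary for the word list are assumptions made for illustration. Only the member names Tokens, LoadFromReader and AddToken follow the description in the text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical re-creation of a word-count corpus as described in the text.
public class CorpusSketch
{
    // Assumed token pattern: runs of letters, digits, apostrophes, '$' and '-'.
    private static readonly Regex TokenPattern = new Regex(@"[A-Za-z0-9$'\-]+");

    // Word -> number of occurrences in the loaded text.
    public SortedDictionary<string, int> Tokens { get; } =
        new SortedDictionary<string, int>();

    public CorpusSketch() { }

    public CorpusSketch(TextReader reader)
    {
        LoadFromReader(reader);
    }

    // Reads the text line by line and counts every token that matches the pattern.
    public void LoadFromReader(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match match in TokenPattern.Matches(line))
            {
                AddToken(match.Value);
            }
        }
    }

    // Adds a new word with count 1, or increments the count of a known word.
    public void AddToken(string rawPhrase)
    {
        Tokens.TryGetValue(rawPhrase, out int count); // count is 0 when absent
        Tokens[rawPhrase] = count + 1;
    }
}
```

Two such objects, one loaded from ham text and one from spam text, are what a filter of this kind needs in order to compare how often a word occurs in each collection.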

                           Fig. 2. Class of the module application

        The application comes with a testing module already implemented, shown in figure
2. Clicking on “Load TestData” trains the spam filter with a predefined set of data: the two
Corpus objects are created from the spam.txt and good.txt files and a SpamFilter object is
initialized.
        Once the filter is trained, three messages are available for testing the application:
a ham one, a spam one and a possible spam. The user can also add their own test message.
        The classification using the Bayesian filter follows the method defined by Paul
Graham in [15] and contains the following steps:
    • a collection of ham emails and a collection of spam emails are assembled;
    • these collections are divided into words, named tokens by Graham, with the help of
        predefined separators;
    • the occurrences of every word in the two collections are counted;
    • the result of the stages above consists of 2 lists of words, one for the ham set and
        the other for the spam set, along with the number of occurrences of each word in
        its list;
    • the spam probability of every word is calculated from the 2 numbers of occurrences;
        this stage is detailed in the following, together with the source code that
        calculates the probability for every token; note that for the ham words the number
        of occurrences is doubled, to avoid treating ham as spam.

           private void CalculateTokenProbability(string token)
           {
               // Ham occurrences are weighted by GoodTokenWeight (the doubling noted above).
               int g = _good.Tokens.ContainsKey(token)
                   ? _good.Tokens[token] * Knobs.GoodTokenWeight : 0;
               int b = _bad.Tokens.ContainsKey(token) ? _bad.Tokens[token] : 0;

               // Tokens that occur too rarely are not scored.
               if (g + b >= Knobs.MinCountForInclusion)
               {
                   double goodfactor = Min(1, (double)g / (double)_ngood);
                   double badfactor = Min(1, (double)b / (double)_nbad);

                   // Keep the spam probability within [MinScore, MaxScore].
                   double prob = Max(Knobs.MinScore,
                       Min(Knobs.MaxScore, badfactor / (goodfactor + badfactor)));

                   // Tokens that appear only in spam receive a fixed score.
                   if (g == 0)
                       prob = (b > Knobs.CertainSpamCount)
                           ? Knobs.CertainSpamScore : Knobs.LikelySpamScore;

                   _prob[token] = prob;
               }
           }
   • token represents the word for which the spam probability is calculated;
   • _good represents the collection of words from the ham mails;
   • _bad represents the collection of words from the spam mails.
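A worked example may help in following the calculation. All numbers are hypothetical: the token is assumed to occur 3 times in the ham collection and 12 times in the spam collection, with _ngood = _nbad = 100. The knob values GoodTokenWeight = 2 (the doubling mentioned above), MinCountForInclusion = 5, MinScore = 0.01 and MaxScore = 0.99 are assumptions, not values read from the BayesianCS source.

```latex
\begin{align*}
g &= 3 \times 2 = 6, \qquad b = 12, \qquad g + b = 18 \ge 5 \\
\text{goodfactor} &= \min(1,\ 6/100) = 0.06 \\
\text{badfactor}  &= \min(1,\ 12/100) = 0.12 \\
\text{prob} &= \max\bigl(0.01,\ \min(0.99,\ 0.12 / (0.06 + 0.12))\bigr)
             = \frac{0.12}{0.18} \approx 0.667
\end{align*}
```

Without the doubling of the ham count the same token would score 0.12 / (0.03 + 0.12) = 0.8, which shows how the weighting biases the filter against marking legitimate words as spam.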
       This technique gives every word a probability calculated using Bayes’ formula for
conditional probabilities, namely:
                    $P(H \mid D) = \frac{P(D \mid H) \cdot P(H)}{P(D)}$, where:
   •   H – the hypothesis;
   •   D – the data;
   •   $P(H)$ – prior probability of H, the probability that H is true before the data D is
       observed;
   •   $P(D \mid H)$ – conditional probability of the data D, given that the hypothesis H is
       true;
   •   $P(D)$ – probability of the data D being realized;
   •   $P(H \mid D)$ – probability that the hypothesis H is satisfied, given the data D, in
       reference to the earlier assumption.
       From this formula, the Bayesian probability that a word belongs to a spam email is
derived as:
                    $P(S \mid W) = \frac{P(W \mid S) \cdot P(S)}{P(W)}$, where:

   •   $P(S \mid W)$ – probability that an email containing the word W is spam;
   •   $P(W \mid S)$ – probability of occurrence of the word W in spam emails;
   •   $P(S)$ – prior probability that an email is spam;
   •   $P(W)$ – probability of occurrence of the word W in all emails held, spam and ham.
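A short derivation may help connect Bayes’ formula with the ratio badfactor / (goodfactor + badfactor) computed in CalculateTokenProbability. It assumes, which the text does not state, that spam and ham are equally likely a priori. S denotes the event that the email is spam, \bar{S} that it is ham, and W that it contains the word.

```latex
\begin{align*}
P(S \mid W)
  &= \frac{P(W \mid S)\,P(S)}{P(W \mid S)\,P(S) + P(W \mid \bar{S})\,P(\bar{S})} \\
  &= \frac{P(W \mid S)}{P(W \mid S) + P(W \mid \bar{S})}
     \qquad \text{when } P(S) = P(\bar{S}) = \tfrac{1}{2} \\
  &\approx \frac{\text{badfactor}}{\text{goodfactor} + \text{badfactor}}
\end{align*}
```

In the last step badfactor = b / _nbad serves as the estimate of P(W | S) and goodfactor = g / _ngood as the estimate of P(W | \bar{S}), with the ham count g already weighted.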

   5. Maintainability’s analysis of the open source application

        As presented in the previous chapters, maintainability is a characteristic that
needs to be taken into account in every step of the product’s life cycle. In the following,
the actions that need to be carried out in order to increase the level of maintainability of
the BayesianCS open source spam filter are described, divided into the five stages of
development.
        Gathering the specifications. This stage has the purpose of guiding the development
process towards a correct system that conforms to the users’ requests. Starting from an
understanding of the system’s environment, modules of the application can be added or
deleted.
        The BayesianCS filter is a software application that classifies emails into two
categories: ham and spam. This is done by training on existing data. The principles that
the product needs to follow are given by the collaboration between different client
filters in order to offer server facilities in which the data of a number of users are
stored. To satisfy this need, the degree of personalization must be taken into account,
along with the level of security demanded by applications that interact through the Internet
and operate on personal information.
         In this stage, the level of maintainability can be measured as the level of
functionality that the application implements, a measure given by the indicator IFI, which
is equal to:
                    $IFI = \frac{NRFI}{NRTFD}$, where:
   •     NRFI represents the number of functionalities implemented by the application;
   •     NRTFD represents the total number of functionalities discovered in the process of
         gathering specifications.
         This metric, expressed as a percentage, represents the degree to which the modules
that are possible in the area of interest in which the application can be used are actually
implemented. The aim of this analysis is to position the software product among the
available applications similar to it.
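As an illustration, reading IFI as the ratio of implemented functionalities to discovered functionalities, and taking purely hypothetical counts of 6 implemented functionalities out of 8 discovered during specification gathering:

```latex
\[
IFI = \frac{NRFI}{NRTFD} = \frac{6}{8} = 0.75 = 75\%
\]
```

A value below 100% indicates functionalities that exist in comparable products but are missing from the application, and that are therefore candidates for later maintenance work.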
         Analysis. This is the process in which the requests captured in the previous stage
are analyzed, refined and structured.
         The BayesianCS filter has 3 components, namely:
     • data input;
     • data processing;
     • data output.
         For input, there are 2 files, named good and bad, which contain a collection of ham
emails and a collection of spam emails, respectively. The problem here lies in the way this
information is acquired, which needs to be improved by adding a module that interacts
directly with the Inbox folder of the user’s email account.
         Storage needs to be extended to a database, in order to offer facilities for
manipulation, security and collaborative work between different implementations of the
application.
         A module for retraining the probabilities of the existing tokens needs to be
implemented in order to maintain the balance between the calculated probabilities in the
database and the preferences of the users, which change continuously. Also, the user must
have the possibility of setting the filter manually, along with the automatic one made by the
         Data processing means manipulating the information stored in the database by
calculating the Bayesian probability. Nothing will be modified in this component, because
the algorithm used is correct, is well described in comment lines and uses variables with
suggestive names, giving the module a high level of maintainability.
         The output of the application is the classification of the emails. These results can
be used to move the emails into special folders, according to their classification. This
additional facility offers superior structuring and organization.
         Figure 3 includes the schema of the database resulting from the analysis described
above.

         Fig. 3. The database diagram resulting from the maintainability analysis

       It consists of the following tables:
    • EmailUsers – the information stored about the users enrolled in the system;
    • Mails – all the stored information about the users’ emails; it is necessary if a
       further analysis of the defining components of an email is needed;
    • Words – the table in which the tokens are stored; the tokens have unique values for
       each user;
    • Whitelists – the addresses considered legitimate by the users;
    • Blacklists – data about the addresses suspected of belonging to spammers.
       Adding these tables brings the advantage of modularization: when new modifications
are needed, the effort is reduced, which increases the application’s maintainability.
       Designing. This consists of understanding the nonfunctional requirements and
constraints. It also implies creating a starting point for the implementation, along with
the decomposition of the implementation activities.
       The newly defined modules must be integrated into the existing structure using
modularization and hierarchy. Figure 4 contains the graph associated with the BayesianCS
application after applying the maintainability processes.

                     Fig. 4. The graph associated with the BayesianCS filter

         The modules marked in red represent the new facilities added during the
maintainability process of the Bayesian spam filter, and the blue ones are those belonging to
the initial application.
         The graph highlights the impact that a module has on the other parts of the product,
which helps in analyzing possible errors that are propagated in the software.
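
         One way to read such a graph is as a reachability problem: the modules that can be
affected by an error in a given module are those reachable from it along the "is used by"
edges. The sketch below illustrates this idea in Python; the module names and edges are
hypothetical and do not reproduce Figure 4.

```python
from collections import deque


def impacted_modules(dependents, changed):
    """Return every module reachable from `changed` through `dependents`.

    `dependents` maps a module to the modules that directly use it, so an
    error introduced in `changed` may propagate to all returned modules.
    """
    seen = set()
    queue = deque([changed])
    while queue:
        module = queue.popleft()
        for user in dependents.get(module, ()):
            if user not in seen and user != changed:
                seen.add(user)
                queue.append(user)
    return seen


# Hypothetical module graph, for illustration only.
DEPENDENTS = {
    "Tokenizer": ["BayesianClassifier"],
    "BayesianClassifier": ["FolderSorter"],
    "ListChecker": ["FolderSorter"],
    "FolderSorter": [],
}
```

         For the hypothetical graph above, a change in Tokenizer may reach
BayesianClassifier and FolderSorter, while a change in FolderSorter affects no other module.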
         Implementation is the step in which the modifications are integrated into the source
code. The parts that interact with the user's email account can be implemented with the help
of the namespace Microsoft.Office.Interop.Outlook.

       To connect to an email account, the following lines of code are used:
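       // Assumed completion of each statement with the standard Outlook interop calls.
       Microsoft.Office.Interop.Outlook.Application myApp = new Microsoft.Office.Interop.Outlook.Application();
       Microsoft.Office.Interop.Outlook.NameSpace mapiNameSpace = myApp.GetNamespace("MAPI");
       Microsoft.Office.Interop.Outlook.MAPIFolder myInbox = mapiNameSpace.GetDefaultFolder(
           Microsoft.Office.Interop.Outlook.OlDefaultFolders.olFolderInbox);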

         The classification of the emails into different folders, according to the resulting
probabilities, uses the same namespace, and the code is:


        Testing the efficiency of the changes made covers the impact that the new modules
have on the entire system and the verification of the operations’ logic. For this purpose, the
metric ITEFA is defined as the percentage of successful tests out of the total number of tests
performed:
                        $ITEFA = \frac{NRTC}{NRTT} \cdot 100$, where:
   •   NRTC is the number of successful tests;
   •   NRTT is the total number of tests; this value is calculated using the graph described

        For every level of the graph, the cardinality, denoted Card_i, is calculated; it is
equal to the number of modules on that level. So, NRTT is:
                                                        , where:
    • n represents the number of the graph’s levels.
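
        The two metrics can be sketched in Python as follows. The level layout and the
module names are hypothetical. The way the values Card_i are combined into NRTT is also an
assumption: a plain sum over the levels is used here, and a different aggregation would
change only the marked line.

```python
def level_cardinalities(levels):
    """Return Card_i, the number of modules on each level of the graph."""
    return [len(modules) for modules in levels]


def itefa(successful_tests, total_tests):
    """Return ITEFA: the percentage of successful tests (NRTC out of NRTT)."""
    if total_tests <= 0:
        raise ValueError("NRTT must be positive")
    if not 0 <= successful_tests <= total_tests:
        raise ValueError("NRTC must lie between 0 and NRTT")
    return 100.0 * successful_tests / total_tests


# Hypothetical three-level module graph.
LEVELS = [
    ["Main"],
    ["ListChecker", "BayesianClassifier"],
    ["Tokenizer", "Database", "FolderSorter"],
]

cards = level_cardinalities(LEVELS)
# Assumption: NRTT is taken as the sum of Card_i; another aggregation
# (for example a product over the levels) would only change this line.
nrtt = sum(cards)
print(cards, nrtt, itefa(5, nrtt))
```

        With this layout there are six modules in total, so five successful tests out of six
give an ITEFA of roughly 83 percent.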
        The steps described above are the components of the procedural model presented in
Figure 5. The evolution over time of the changes made follows the consecutive steps of a
software product's development.

               Fig. 5. The procedural model for increasing the maintainability
                           of BayesianCS open source spam filter

        The architectural analysis can offer valuable information for easily localizing the
effects of a change in the software. This analysis is used to prevent the modifications from
affecting other modules of the application, a view also supported in [12].
        Correctness is assured by verifying the model, its structure and all the connections
between the application’s functionalities.
        The objective of the approach is to highlight the number of relations, whether existing
before or added afterwards, in order to support the passage through the development process,
increase the quality of the software product and minimize the propagated errors.
        Despite the sustained effort to construct efficient and effective mechanisms for
predicting and assuring software quality [13], this challenge still remains a need to be
fulfilled.

   6. Conclusions

        The direction that spam filters have taken over the past few years and the enormous
increase in the number of unpleasant messages that email users receive have pushed spam
filtering solutions towards a more offensive approach. Those messages are a real source of security

         Designing a competitive security tool for classifying received emails and diminishing
the number of unsolicited ones must take into account the quality characteristics of the
system.
         The new perspective of complexity, in which new modules with special techniques of
analysis, classification and blocking can always be integrated, has made spam filtering a
domain in continuous and tremendous change, due to the high number of spam threats which
invade the privacy of ordinary email users.
         Just as spammers adapt to the new methods of rejecting spam emails, the applications
which implement such facilities need a correct implementation of their modules, so that new
functionalities can easily be integrated in the future for responding to spammers’
         To increase the maintainability of the open source BayesianCS application by
structuring the newly added processes and decreasing the effort for later modifications, a
procedural analysis has been carried out in chapter 5. This way, the maintainability process
has turned from a problem into a solution for the following versions of the software
application.
         The need of systems to develop further in a short time while keeping their quality
characteristics unchanged is met through the development of evolved and easily maintainable
informatics systems, as presented in [11].

   References

[1] M. Doinea, “Security Process Development in Collaborative Systems,” The Journal of
Applied Collaborative Systems, Vol. 1, No. 1, 2009, pp. 32-40, ISSN 2066-7450.

[2] A. Gray and M. Haahr, “Personalised, Collaborative Spam Filtering,” Proceedings of 1st
Conference on Email and Anti-spam CEAS, 2004.

[3] M. Doinea, “Open Source Security – Quality Requests,” Open Source Science Journal,
Vol. 1, No. 1, 2009, pp. 126-135, ISSN 2066-740X.

[4] G. Candora, A. De Lucia, G. A. Di Lucca, V. V. Patriciu and I. Bica, “An incremental
Object-Oriented Migration Strategy for RPG legacy Systems,” International Journal of
Software Engineering and Knowledge Engineering, Vol. 9, 1999, pp. 5-25, ISSN 1532-060X.

[5] P. Grubb and A. A. Takang, “Book Review Software Maintenance Concepts and Practice,”
Journal of Software Maintenance and Evolution Research and Practice, Vol. 20, 2008, pp.
463-466, ISSN 1532-060X.

[6] IEEE Standard 1219 – 1998, Standard for Software Maintenance, IEEE Computer Society
Press, 1998.

[7] I. Ivan and L. Teodorescu, Software Quality Management, Inforec Publishing House,
Bucharest, 1999, ISBN 1018-046x.

[8] I. Ivan, M. Popa and I. Gh. Roşca, Quality management of software applications, ASE
Publishing House, 2007.

[9] I. Ivan and C. Boja, Practice for software application optimization, ASE Printing House,
Bucharest, 2007, pg. 483.

[10] I. Lungu, Gh. Sabău, M. Velicanu, M. Muntean, S. Ionescu, E. Posdarie and D. Sandu,
Software systems. Analysis, design and implementation, Economica Press, Bucharest, 2003.

[11] Editorial Board, “Special Issue of the 12th Conference on Software Maintenance and
Reengineering,” Journal of Software Maintenance and Evolution Research and Practice,
Vol. 21, 2009, pp. 79-80, ISSN 1532-060X.

[12] M. Anan, H. Saiedian and J. Ryoo, “An architecture – metric software maintainability
assessment using information theory,” Journal of Software Maintenance and Evolution
Research and Practice, Vol. 21, 2009, pp. 1-18, ISSN 1532-060X.

[13] D. Kozlov, J. Koskinen, M. Jakkinen and J. Markkula, Assessing maintainability change
over multiple software releases, Economica Press, Bucharest, 2008, Vol. 20, pp. 31-58,
ISSN 1532-060X.





Mădălina ZURINI is currently a PhD candidate in the field of Economic Informatics. She
graduated from the Faculty of Cybernetics, Statistics and Economic Informatics in 2008 and
completed a master's degree in computer science in 2010, researching the implications of
Bayesian classification for optimizing spam filters. She is also engaged in the Pedagogical
Program of the Department of Pedagogical Studies. Her fields of interest are data
classification, artificial intelligence, algorithm analysis and optimization. She wants to
pursue a pedagogical career.

