Web-Object Rank Algorithm For Efficient Information Computing
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, February 2011
WEB-OBJECT RANK ALGORITHM FOR
EFFICIENT INFORMATION COMPUTING
Dr. Pushpa R. Suri
Department of Computer Science and Applications, Kurukshetra University, Kurukshetra, Haryana-136119, India
pushpa.suri@yahoo.com

Harmunish Taneja
Department of Information Technology, Maharishi Markandeshwar University, Mullana, Haryana-133203, India
harmunish.taneja@gmail.com
Abstract - In recent years there has been considerable interest in analyzing the relative trust level of web objects. Because the web contains facts and assumptions on a global scale, there are various criteria for trusting a web page. In this paper an algorithm is proposed that assigns a rank to every web object, such as a requested document on the web, which specifies the quality of that object or the relative level of trust one can place in that web page. It is used for object-level information extraction for ranking search results and is implemented in C++. The behavior of the object rank for different values of the moister factor in a domain is analyzed. The results emphasize that the moister factor is useful in rank computation and helps to explore more web pages in alignment with the user's requirements.

Keywords- Random Surfer Model, Information Computing, Web Objects, Information Retrieval System, Web Graph, Ranking, Object Rank.

I. INTRODUCTION
Information computing in various web domains is broadly the extraction of web objects of an unstructured nature, such as text objects that satisfy an information need, from within large collections using document-level ranking, as well as of the structured information about real-world objects that is embedded in static web pages. Online databases exist on the web in huge amounts and are of an unstructured nature. Unstructured data refers to data that does not have a clear, semantically obvious structure [7]. In other words, information computing is the process of searching, retrieving, and understanding information from huge amounts of stored data. Information can be retrieved from the web by searching techniques such as keyword-based searching, concept-based searching, hybrid search, and knowledge-base search. In the case of object-level information computing, domain-based search is required. Every commercial information retrieval system tries to facilitate a user's access to information that is relevant to his or her information needs. This paper highlights the ranking problem for domain-based information retrieval: every owner of a document wants to improve its ranking, and can do so through many manipulations of the document, such as increasing the number of links to the page from dummy pages [1]. Object-based information computing maintains the integrity of search results based upon various lexicons. As the web contains contradictions and hypotheses on a huge scale, finding relevant information using search engines is a tedious job. With the help of object-level ranking [22], the various objects in a domain can be prioritized independently of the query, according to the relative trust of each web page. The object rank of a page depends upon various factors associated with the web object.

The organization of the paper is as follows. Related work is presented in Section II. Section III discusses the challenges of obtaining high-quality search results. In Section IV, the Web_Object_Rank algorithm is proposed and discussed. The algorithm is implemented in Section V. Finally, Section VI concludes the paper on the basis of the results obtained.

II. RELATED WORK
Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext [1]. Google is designed to crawl and index the web efficiently and to produce much more satisfying search results than existing systems. Link analysis ranking [16] uses hyperlink structures to determine the relative authority of a web page and to produce improved algorithms for ranking search results. The prototype, with a full-text and hyperlink database of web pages, is available at [8]. In the current era there is much interest in using random graph models for the web. The Random Surfer model [9] and the PageRank-based selection model [11] are described as two major models [10]. The PageRank-based selection model tries to capture the effect that search engines have on the growth of the web by adding new links according to PageRank. The PageRank algorithm is used in the Google search engine [12] for ranking search results. PageRank is a link analysis algorithm used by the Google Internet search engine that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web (WWW), with the purpose of "measuring" its relative importance within the set. Google is designed to be a scalable search engine whose primary goal is to provide high-quality search results over a rapidly growing WWW [18]. The PageRank theory suggests that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the surfer will continue is a damping factor α [2]. The
162 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
damping factor (α) is eminently empirical, and in most cases the value of α can be taken as 0.85 [1]. PageRank is the stationary state of a Markov chain [2, 7]. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor that spreads part of the rank uniformly over all pages. The behavior of PageRank with respect to changes in α is useful in link-spam detection [3]. The mathematical analysis of PageRank as α changes shows that, contrary to popular belief, for real-world graphs values of α close to 1 do not give a more meaningful ranking [2, 21]. The search engine Google computes the order of displayed web pages as the PageRank vector, whose entries are the PageRanks of the web pages [4]. The PageRank vector is the stationary distribution of a stochastic matrix, the Google matrix. The Google matrix in turn is a convex combination of two stochastic matrices: one matrix represents the link structure of the web graph, and a second, rank-one matrix mimics the random behavior of web surfers and can also be used to fight web spamming. As a consequence, PageRank depends mainly on the link structure of the web graph and not on the contents of the web pages. Also, the PageRank of the first vertex, the root of the graph, follows a power law [10]; however, the exponent of this power law undergoes a phase transition as the parameters of the model vary.

Link-based ranking algorithms rank web pages by using the dominant eigenvector of certain matrices, such as the co-citation matrix or its variations [17]. Distributed page ranking on top of structured peer-to-peer networks is needed because the size of the web grows at a remarkable speed and centralized page ranking is not scalable [5].

Page ranking can use propagation rates that depend on the types of the links and on the user's specific set of interests [6]. Page filtering can be decided based on link types combined with other information relevant to the links. For ranking, a profile containing a set of ranking rules to be followed in the task can be specified to reflect the user's specific interests [20]. Similarities of content between hyperlinked pages are useful for producing a better global ranking of web pages [19].

III. CHALLENGES
The primary focus of a Web Information Retrieval Support System (WIRSS) is to address the aspects of search that consider the specific needs and goals of the individuals conducting web searches [15]. The major goal is to provide high-quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality, including PageRank, anchor text, and proximity information. Decentralized content publishing is the main reason for the explosive growth of the web. For a given user query there are many documents that a search engine can retrieve, and every owner of a document wants to improve its ranking. Commercial search engines have to maintain the integrity of their search results, and this is one reason why their efforts are not made public. The democratization of content creation on the web generates new challenges for WIRSS and raises the question of the integrity of web pages. In a simplistic approach, one might argue that only some publishers are trustworthy and others are not. Another challenge is that fast crawling technology is needed to gather web objects and keep them up to date.

IV. WEB_OBJECT_RANK ALGORITHM AND IMPLEMENTATION
The PageRank of a web object can be defined as the fraction of time that the surfer spends, on average, on that object. The probability that the random surfer visits a web page is its PageRank [1]. Evidently, web objects that are hyperlinked by many other pages are visited more often. The random surfer gets bored and restarts from another random web object with a probability termed the moister factor (m). The probability that the surfer follows a randomly chosen outlink is (1 − m). The moister factor therefore plays the role of 1 − α, where α is the damping factor discussed in Section II.

A Markov chain is a discrete-time stochastic process: a process that occurs in a series of time steps, in each of which a random choice is made [7]. There is one state corresponding to each web object; hence a Markov chain consists of N states if there are N web objects in the collection. A Markov chain is characterized by an N × N Probability Transition Matrix P, each of whose entries is in the interval [0, 1]; the entries in each row of P add up to 1. The Markov property states that each entry Pij, the probability of moving from state i to state j, depends only on the current state i. A Markov chain's probability distribution over its states may be viewed as a Probability Vector: a vector all of whose entries are in the interval [0, 1] and add up to 1. If x is the Probability Vector at one time step, the Probability Vector at the next step is the product xP. In [7, 14], the problem of computing bounds on the conditional steady-state Probability Vector of a subset of states in finite, discrete-time Markov chains is considered.

A. Web_Object_Rank Algorithm: Features
Features of the Object Rank algorithm are as follows:
- It is a query-independent algorithm (it assigns a value to every document independent of the query).
- It is a content-independent algorithm.
- It is concerned with the static quality of a web page.
- The Object Rank value can be computed offline using only the web graph.
- Object Rank is based upon the linking structure of the whole web.
- Object Rank does not rank a website as a whole; it is determined for each web page individually.
- The Object Rank of the web pages Ti that link to page A does not influence the rank of page A uniformly: the more outbound links a page T has, the less page A benefits from a link to it on T.
- Object Rank is a model of user behavior.

B. Web_Object_Rank Algorithm: Assumptions
- If there are multiple links between two web objects, only a single edge is placed.
- No self loops are allowed.
- The edges could be weighted, but we assume that no weight is assigned to edges in the graph.
- Links within the same web site are removed.
- Isolated nodes are removed from the graph.
C. Web_Object_Rank Algorithm
This is a query-independent algorithm that takes a web graph as input and assigns a rank to every object, which can specify the relative authority of that web page. The proposed algorithm uses the following variables:
- moist_fact (m) is the moister factor: the probability that the random surfer restarts the search from another web object
- 1-m is the probability that the random surfer searches web objects by following randomly chosen outlinks
- outlinks is the number of web objects linked from a particular page
- N is the number of objects in the domain
- prob[i][j] is the Probability Transition Matrix, for all i, j ∈ 1 to N
- adj[i][j] is the Adjacency Matrix, for all i, j ∈ 1 to N
- x is the Probability Vector
- itr is the iteration count

D. Web_Object_Rank Algorithm
Step 1. Create a web graph of the various objects in a domain.
Step 2. Set prob[i][j] = adj[i][j].
Step 3. Compute the number of outlinks from a particular node, say counter.
        IF a web object has no outlinks
        THEN prob[i][j] is equally distributed over all j
        ELSE prob values are distributed according to the number of outlinks
        For all i, j:
            IF (counter = 0)
            THEN prob[i][j] = 1/N
            ELSE IF (prob[i][j] = 1)
            THEN prob[i][j] = 1.0/counter
Step 4. Multiply the resulting matrix by 1 − m.
Step 5. Add m/N to every entry of the resulting matrix to obtain the Probability Transition Matrix:
        For all i, j do
            prob[i][j] = (prob[i][j] * (1 - m)) + (m/N);
Step 6. Randomly select a node from 0 to N-1 to start the walk, say s_int.
Step 7. Initialize the random surfer (x[s_int] = 1 and every other entry of x = 0) and set itr, which counts the iterations, to 0.
Step 8. Try to reach the steady state within 200 iterations; otherwise toggling occurs.
Step 9. Multiply the Probability Vector by the Probability Transition Matrix (x = x * prob) to obtain the next Probability Vector.
Step 10. Check whether the system has entered the steady state.
Step 11. Print the ranks stored in the Probability Vector x and EXIT.

V. IMPLEMENTATION
This implementation is based upon the random surfer model [7] and Markov chains [13, 14]. The random surfer visits the objects in the web graph according to the probability distribution, and at any time it can be in one of the following four possible states.

Initial state is the state of the system from which it starts its walk. The system is set in the initial state by randomly selecting an object using a random function; the value corresponding to that web object in the Probability Vector is set to unity, and the rest of the values in the Probability Vector are zero.

Steady state is the state in which the Probability Vector of the random surfer no longer changes from one step to the next; a finite Markov chain that is irreducible and aperiodic is guaranteed to reach it. To check whether the system has reached the steady state, two successive values of the Probability Vector must be the same.

Ideal state is the state of the random surfer in which the system achieves the steady state but the web object ranks are distributed uniformly across all documents.

Toggling state is reached when the system is not able to reach the steady state and just toggles between two sets of object ranks.

[Fig. 1. Web Graph: web objects O1 to O10]
C. Results and Discussion
The web graph shown in Fig. 1 is used to analyze various factors of the proposed algorithm; variations in the structure of the graph used for analysis change the performance of the algorithm. The graph shows 10 web objects in a domain that are interlinked as a strongly connected graph, in which every two nodes are joined by a path with a small number of links. Oi is the ith web object in the domain, where i varies from 1 to 10. The adjacency matrix for the web graph of Fig. 1 is shown in Fig. 2.

    0 1 0 0 0 0 0 0 0 0
    0 0 1 0 0 0 0 0 0 0
    0 0 0 1 0 0 0 0 0 0
    0 0 0 0 1 0 0 0 0 0
    0 0 0 0 0 0 1 0 0 0
    0 0 0 0 1 0 0 0 0 0
    0 0 0 0 0 1 0 0 1 0
    0 0 0 0 0 1 0 0 0 0
    0 0 0 0 0 0 0 0 0 1
    0 0 0 0 0 0 0 1 0 0
Fig. 2. Adjacency Matrix for all i, j ∈ 1 to 10

To analyze the convergence speed, the number of iterations required by the random surfer to reach the steady state is recorded in Table 1, and the corresponding graph is shown in Fig. 3. In Fig. 3, an infinite value is shown as a large number of iterations (200 or more). It shows that as the moister factor approaches 1, the number of iterations is generally reduced.

Table 1: Moister Factor vs. No. of Iterations
Moister Factor    No. of Iterations
0                 Infinity
0.05              Infinity
0.1               Infinity
0.15              Infinity
0.2               83
0.25              73
0.3               62
0.35              46
0.4               41
0.45              33
0.5               35
0.55              39
0.6               24
0.65              21
0.7               20
0.75              22
0.8               16
0.85              12
0.9               11
0.95              10
1                 2

[Fig. 3. Moister Factor vs Number of Iterations: x-axis Moister Factor (0 to 1), y-axis No. of Iterations (0 to 250)]

It is further observed that when the moister factor is equal to 1, the random surfer enters the ideal state, and the corresponding rank values of all web objects are the same, as shown in Table 2. The graph for the ideal state is shown in Fig. 4.

Table 2: Ranks of objects at moister factor 1
Object    Computed Rank
O1        0.1
O2        0.1
O3        0.1
O4        0.1
O5        0.1
O6        0.1
O7        0.1
O8        0.1
O9        0.1
O10       0.1

[Fig. 4. Random Surfer Ideal State: Computed Rank at Moister factor 1 for web objects O1 to O10; y-axis Computed Rank (0 to 0.12)]

Figure 5 shows that for a moister factor less than 0.2, no rank is provided to any web object and the system enters the toggling state with a large number of
iterations for the given domain. It also shows the ranks computed by the proposed algorithm for moister factor values from 0.2 to 1.

[Fig. 5. Moister factor (>0.2) to different documents: Computed Object Ranks at various Moister Factor values (MF = 0.2 to 1.0 in steps of 0.05); x-axis Web Object (O1 to O10), y-axis Computed Rank (0 to 0.25)]

From the above graphs and analysis, we can say that the moister factor plays a major role in this algorithm, and the performance of the algorithm can be improved if this factor is selected properly. The value of the moister factor can vary from 0 to 1, but in most cases the system enters the toggling state if the value selected is less than 0.2, and at the value 1 the system enters the ideal state, giving insignificant results. The value should therefore be close to 1 but cannot be 1. As shown in Fig. 3, the system reaches the steady state in fewer iterations when the moister factor is closer to 1.

VI. CONCLUSION
The current study was conducted to demonstrate how the link structure of the web can be used to rank various documents. This ranking can be computed offline. With the help of this approach, one can prioritize the various documents on the web independently of the query. However, a complete score computation is based on various other factors. In the proposed algorithm, the moister factor m (the restart probability, which plays the role of 1 − α for the damping factor α) plays a very important role in the behavior of the algorithm. The analysis shows that the moister factor must not be selected close to zero. At a moister factor of one, the system enters the ideal state and the ranking provided is insignificant. Based on the evaluation, the moister factor should be selected greater than or equal to 0.5. However, if convergence speed is the only factor used to evaluate performance, then the best moister factor is 0.95. The proposed algorithm is query independent and does not consider the query during ranking.

REFERENCES
[1] Sergey Brin, Lawrence Page, "The anatomy of a large-scale hypertextual web search engine", Proceedings of the 7th International Conference on World Wide Web (WWW7), Brisbane, Australia, pp. 107-117, April 1998.
[2] Paolo Boldi, Massimo Santini, S. Vigna, "PageRank as a Function of the Damping Factor", Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, pp. 557-566, 2005.
[3] Hui Zhang, Ashish Goel, Ramesh Govindan, Kahn Mason, Benjamin Van Roy, "Making eigenvector-based reputation systems robust to collusion", in Stefano Leonardi (Ed.), Proceedings of WAW 2004, LNCS 3243, Springer-Verlag, pp. 92-104, 2004.
[4] Nie Z., Wu F., Wen J.R., Ma W.Y., "Extracting Objects from the Web", 22nd International Conference on Data Engineering (ICDE'06), pp. 1-3, 2006.
[5] Jianfeng Zheng, Zaiqing Nie, "Architecture of an Object-level Vertical Search", Proceedings of the International Conference on Web Information Systems and Mining, IEEE, pp. 51-55, 2009.
[6] Zhanzi Qiu, Matthias Hemmje, Erich J. Neuhold, "Using Link Types in Web Page Ranking and Filtering", Proceedings of the Second International Conference on Web Information Systems Engineering (WISE'01), Volume 1, IEEE Computer Society, p. 311, 2001.
[7] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, "An Introduction to Information Retrieval", Cambridge University Press, New York, NY, USA, pp. 461-470, 2008.
[8] http://google.stanford.edu/
[9] Blum, T.-H. H. Chan, M. R. Rwebangira, "A random-surfer web-graph model", ANALCO '06: Proceedings of the 8th Workshop on Algorithm Engineering and Experiments and the 3rd Workshop on Analytic Algorithmics and Combinatorics, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 238-246, 2006.
[10] Prasad Chebolu, Páll Melsted, "PageRank and the random surfer model", Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1010-1018, 2008.
[11] Gopal Pandurangan, Prabhakar Raghavan, Eli Upfal, "Using PageRank to Characterize Web Structure", Proceedings of the 8th Annual International Conference on Computing and Combinatorics, pp. 330-339, August 15-17, 2002.
[12] Google technology overview, http://www.google.com/intl/en/corporate/tech.html, 2004.
[13] R. Montenegro, P. Tetali, "Mathematical aspects of mixing times in Markov chains", Foundations and Trends in Theoretical Computer Science, Vol. 1, Issue 3, pp. 237-354, May 2006.
[14] Tugrul Dayar, Nihal Pekergin, Sana Younes, "Conditional steady-state bounds for a subset of states in Markov chains", Proceedings of the 2006 Workshop on Tools for Solving Structured Markov Chains, ACM International Conference Proceeding Series, Vol. 201, Article No. 3, 2006.
[15] Orland Hoeber, "Web Information Retrieval Support Systems: The Future of Web Search", Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 03, pp. 29-32, 2008.
[16] Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas, "Link analysis ranking: algorithms, theory, and experiments", ACM Transactions on Internet Technology (TOIT), Vol. 5, Issue 1, pp. 231-297, Feb. 2005.
[17] R. Lempel, S. Moran, "Rank-Stability and Rank-Similarity of Link-Based Web Ranking Algorithms in Authority-Connected Graphs", Information Retrieval, Vol. 8, Issue 2, Kluwer Academic Publishers, pp. 245-264, April 2005.
[18] Umesh Sehgal, Kuljeet Kaur, Pawan Kumar, "The Anatomy of a Large-Scale Hyper Textual Web Search Engine", Second International Conference on Computer and Electrical Engineering (ICCEE '09), Vol. 2, pp. 491-495, 28-30 Dec. 2009.
[19] Kritikopoulos, A., Sideri, M., Varlamis, "Wordrank: A Method for Ranking Web Pages Based on Content Similarity", 24th British National Conference on Databases (BNCOD '07), pp. 92-100, 3-5 July 2007.
[20] Zaiqing Nie, Ji-Rong Wen, Wei-Ying Ma, "Object-level Vertical Search", 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, January 7-10, 2007.
[21] Zhi-Xiong Zhang, Jian Xu, Jian-Hua Liu, Qi Zhao, Na Hong, Si-Zhu Wu, Dai-Qing Yang, "Extraction knowledge objects in scientific web resource for research profiling", Eighth International Conference on Machine Learning and Cybernetics, IEEE, Baoding, pp. 3475-3480, 12-15 July 2009.
[22] Nie Z., Zhang Y., Wen J.R., Ma W.Y., "Object-level Ranking: Bringing Order to Web Objects", Proceedings of World Wide Web (WWW), 2007.

Dr. Pushpa R. Suri received her Ph.D. degree from Kurukshetra University, Kurukshetra. She is working as Associate Professor in the Department of Computer Science and Applications at Kurukshetra University, Kurukshetra, Haryana, India. She has many publications in international and national journals and conferences. Her teaching and research activities include Discrete Mathematical Structures, Data Structures, Information Computing and Database Systems.

Harmunish Taneja received his M.Phil. degree (Computer Science) from Alagappa University, Tamil Nadu, and his Master of Computer Applications from Guru Jambheshwar University of Science and Technology, Hissar, Haryana, India. Presently he is working as Assistant Professor in the Information Technology Department of M.M. University, Mullana, Haryana, India. He is pursuing a Ph.D. (Computer Science) from Kurukshetra University, Kurukshetra. He has published 11 papers in international and national conferences and seminars. His teaching and research areas include Database Systems, Web Information Retrieval, and Object Oriented Information Computing.