Weblogs
Department of Computer Science, Institute for System Architecture, Chair for Computer Networks


• At first glance, weblogs appear to be simple online diaries that let
  users publish their day-to-day experiences, thoughts and opinions
Weblog – an example of a social network

• In reality, weblogs form an enormous social network with very
  dynamic behaviour and a high information potential
• Through their linkage structure, information can spread around
  the world in seconds

• These properties make weblogs highly interesting for companies,
  mass media and political parties

Weblogs and the blogosphere

• Used as a communication medium, for documenting software
  projects, for new forms of journalism (“citizen journalism”), ...
• Example: impact on news coverage during conflicts in the Middle East


• Schematic anatomy of a weblog interface
• Management functionality of a weblog
• Blogosphere as a social community
   – Linkage principles
   – Search engine considerations
• Problem considerations
   – Avoiding comment spam
   – Legal aspects
• Weblog compared to Content Management System (CMS)

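Anatomy of a weblog page

[Figure: anatomy of a weblog page]

Anatomy of a weblog page – comment page

[Figure: comment page of a weblog, annotated with the individual comments (“Comment by …”), the feed and the comment area]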
      Anatomy of a weblog page

• A navigation area appears on every page and usually includes a variety of
  navigation links, e.g. for categories, the archive and favourite sites / blogs
  (blogroll), as well as other features such as a search function or a news feed
• A news feed is a special format that allows a user to subscribe
  to blog content
   – XML format that represents the structured content of a weblog
   – Using feed reader software, articles from many sites can be
     accessed automatically and checked for updates

[Figure: a feed reader subscribes to Blog1, Blog2 and Blog3 via HTTP, sends update requests, receives the feed files (articles or comments of an article) and displays the feed content]
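A feed reader essentially fetches each subscribed feed at regular intervals and compares its items with the ones it has already seen. The following Python sketch shows only this comparison step for one common feed format, RSS 2.0, with the feed given as a string; the sample feed, its URLs and the function name `new_items` are hypothetical, and the HTTP download is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed of "Blog1"; a real feed reader would download
# this document via HTTP at regular intervals.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Blog1</title>
    <link>http://example.org/blog1/</link>
    <item>
      <title>ABC</title>
      <link>http://example.org/blog1/2008/01/15/abc/</link>
      <guid>http://example.org/blog1/2008/01/15/abc/</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen):
    """Return (title, link) pairs of items not seen before and remember them.

    Items are identified by their <guid>, falling back to <link>.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if key not in seen:
            seen.add(key)
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


seen = set()
print(new_items(SAMPLE_FEED, seen))  # first check: one new item
print(new_items(SAMPLE_FEED, seen))  # second check: nothing new
```

Keeping the set of seen identifiers per subscription is what allows the reader to show only updates instead of the whole feed on every check.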

Weblog management

•   The basic advantage of using weblogs for web publishing is ease of use:
    no knowledge of web technologies (HTML, ...) is necessary
•   A common administrative interface offers various functions, e.g.
     – Editor for creating and editing posts
     – Layout management
     – Plugin management
     – User management
     – Publishing support via WYSIWYG (What You See Is What You Get)
     – ...
•   Example of a widespread weblog system:
     – Wordpress (open source, easy to install, comfortable management facilities)
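Wordpress – administration features via AJAX

[Figure: client side: actions in the admin area (e.g. “delete”) trigger a JavaScript call to the AJAX engine, which sends HTTP requests to the Apache web server, receives HTTP responses (e.g. text/plain) and passes HTML+CSS data back to the admin area; requests from the reader area are answered with HTTP responses (e.g. text/html); server side: an engine behind the web server queries the MySQL database]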

Wordpress – deleting a category item with AJAX

Client side: web browser (categories.html) and Ajax engine
Server side: web server (/wordpress/wp-admin/list_manipulation.php) and MySQL database

1. User activity: the user clicks “delete” next to a category (e.g. id=2);
   the handler onclick="return delete_something(what, id, msg)" is called
2. The JavaScript confirm function displays msg (“Are you sure you want to
   delete this category?”) and waits for a user action (OK / Cancel)
3. If confirm(msg) == true, ajaxDelete(what, id) sends an HTTP POST request
   (action=delete-what&id=….&cookie…) to list_manipulation.php
4. Server-side processing:
   – Fetch the user data with the cookie
   – IF the user cannot manage categories: respond “-1”
   – ELSE IF it is a default category: respond “0”
   – ELSE: QUERY("Delete From $wpdb Where cat_ID='id'"); the category is deleted
5. After a successful HTTP response, the row is removed from the page, so only
   id=1 (default) remains:
      removeThisItem(id) {
        theItem = document.getElementById(id);
        theItem.parentNode.removeChild(theItem);
      }

• Asynchronous transfer between user activity and server-side processing lets
  user actions (e.g. deleting a category) take effect without a complete refresh
  of categories.html; only the changes have to be transferred to the client
• Document Object Model (DOM): the category's element is removed from its
  parentNode in categories.html, e.g. a <tr> child element from its parent <tbody>
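The server-side branch of this workflow (permission check, default-category check, deletion) can be sketched in Python with an in-memory SQLite database. The table layout, the function name `handle_delete` and the success response "1" are assumptions for illustration; only the responses "-1" and "0" come from the workflow above, and the real Wordpress implementation is written in PHP.

```python
import sqlite3


def handle_delete(conn, user_can_manage_categories, cat_id, default_cat_id):
    """Mimic the decision logic of the category deletion request.

    Returns "-1" if the user lacks permission, "0" for the default
    category and (assumed here) "1" after a successful deletion.
    """
    if not user_can_manage_categories:
        return "-1"
    if cat_id == default_cat_id:
        return "0"
    # Parameterised query instead of string concatenation (avoids SQL injection)
    conn.execute("DELETE FROM categories WHERE cat_ID = ?", (cat_id,))
    conn.commit()
    return "1"


# Hypothetical table with the two categories shown in the workflow
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (cat_ID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "default"), (2, "news")])

print(handle_delete(conn, False, 2, default_cat_id=1))  # -1: no permission
print(handle_delete(conn, True, 1, default_cat_id=1))   # 0: default category is kept
print(handle_delete(conn, True, 2, default_cat_id=1))   # 1: category 2 is deleted
print(conn.execute("SELECT cat_ID FROM categories").fetchall())  # [(1,)]
```

The short plain-text codes are what keeps the AJAX response small: the client only needs to know whether to call removeThisItem(id) or to show an error.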
[Figure: excerpt from the “blogosphere”]

• The most important technological principles of weblogs are the comment
  functionality, the linkage mechanisms and the possibilities to search for
  information available in the blogosphere
• References from weblog entries to entries of other blogs are more important
  than regular links (e.g. as part of the blogroll)
• Commenting on information or opinions available on the web (not only in
  weblogs) is one building block of this social community
Linkage mechanisms

The following applies to most widespread weblog systems:
• If a user refers within an entry to an entry in another weblog by giving
  its URI, the referenced weblog system is informed about it automatically
• The referencing weblog entry is listed as a special comment of the
  referenced entry
• Two frequently used notification mechanisms are:
   1. Pingback: uses an RPC (Remote Procedure Call) server to inform the
      referenced weblog system
   2. Trackback: uses a special URI to inform the referenced weblog system


• Pingback uses an XML-RPC call to notify the target site of a link to it
• Principle:
    1. Each post made in a weblog is analysed for contained URIs
    2. If a URI is found, the associated resource is examined for an
       advertised pingback server; this examination is realised by an
       autodiscovery mechanism
    3. If an associated pingback server is found, an XML-RPC request is
       sent to it
        • The XML-RPC request passes two parameters to the server:
             – Source URI: the absolute address of the entry on the site
               containing the link
             – Target URI: the target (absolute URI) of the link on the
               source site
        • In the case of an error, a fault code is returned
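Steps 1 and 3 can be sketched with the Python standard library: `html.parser` collects the link targets of a post, and `xmlrpc.client.dumps` serialises the `pingback.ping(sourceURI, targetURI)` call defined by the Pingback 1.0 specification. All URIs and the helper names `LinkCollector` and `pingback_requests` are hypothetical; autodiscovery (step 2) is omitted, and the requests are only built, not sent.

```python
from html.parser import HTMLParser
import xmlrpc.client


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> elements in a post (step 1)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def pingback_requests(source_uri, post_html):
    """Return one XML-RPC request body per link found in the post (step 3)."""
    collector = LinkCollector()
    collector.feed(post_html)
    return [
        xmlrpc.client.dumps((source_uri, target), methodname="pingback.ping")
        for target in collector.links
    ]


post = '<p>See <a href="http://blog-a.example/2008/01/10/study/">this post</a>.</p>'
for body in pingback_requests("http://blog-b.example/2008/01/15/reply/", post):
    print(body)
```

Once the server URI is known, `xmlrpc.client.ServerProxy(server_uri).pingback.ping(source, target)` would send the same call; a returned fault code surfaces in Python as an `xmlrpc.client.Fault` exception.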

Pingback autodiscovery

• To discover the XML-RPC server associated with a referenced resource,
  the resource itself is requested
• The pingback mechanism defines two ways of specifying the address of
  the pingback server:
  1. The HTTP header of the response contains the extension field “X-Pingback”
      HTTP/1.1 200 OK
      Content-Length: 988
      Content-Type: text/html
      X-Pingback:          (absolute URI of the pingback server)

  2. The referenced resource contains a special (X)HTML <link> element
      <link rel="pingback" href=""/>

Complete pingback mechanism

Pingback autodiscovery workflow between the local web server and the remote web
server that runs the blog being referred to:
1. Write a comment article with some permalinks to an external blog entry and
   save the blog entry
2. Parse the content for a link; if no (further) links are found, publish the entry
3. For each link found, send a request for the linked resource to the remote
   web server
4. Parse the HTTP header of the response for an X-Pingback entry; if none is
   found, parse the (X)HTML document for a link element; if no link element is
   found either, continue with the next link
5. Otherwise, extract the XML-RPC server URI and send a pingback ping
   (XML-RPC request) to it
6. The remote web server processes the incoming pingback ping, returns an
   XML-RPC response and registers the comment
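Before registering the comment, the receiving server is expected by the Pingback 1.0 specification to check that the source page really links to the target; the specification defines fault codes such as 0x0011 (source does not link to target) and 0x0030 (pingback already registered). The sketch below covers only this check on an already fetched source document; the function name, the in-memory registry and the returned message are hypothetical, and the substring test is a deliberately crude stand-in for proper HTML parsing. Raising `xmlrpc.client.Fault` is how a Python XML-RPC server would report the fault code.

```python
import xmlrpc.client

SOURCE_HAS_NO_LINK = 0x0011   # fault code from the Pingback 1.0 specification
ALREADY_REGISTERED = 0x0030   # fault code from the Pingback 1.0 specification


def register_pingback(registry, source_uri, target_uri, source_html):
    """Register a pingback as a comment after a simple plausibility check."""
    # Crude check: the target URI must occur as an href value in the source page
    if f'href="{target_uri}"' not in source_html:
        raise xmlrpc.client.Fault(
            SOURCE_HAS_NO_LINK,
            "The source URI does not contain a link to the target URI.")
    if (source_uri, target_uri) in registry:
        raise xmlrpc.client.Fault(
            ALREADY_REGISTERED, "The pingback has already been registered.")
    registry.add((source_uri, target_uri))
    return "Pingback from %s to %s registered" % (source_uri, target_uri)


registry = set()
page = '<p><a href="http://blog-a.example/2008/01/10/study/">Study</a></p>'
print(register_pingback(registry, "http://blog-b.example/reply/",
                        "http://blog-a.example/2008/01/10/study/", page))
```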
 Trackback mechanism

 •   Quite similar to the Pingback mechanism
 •   Instead of using an RPC interface, it is based on the existence of a
     special URI for each entry, the “Trackback Ping URI”
 •   If an entry refers to an entry inside another weblog, this special URI of
     the referenced entry is used to transmit information about the
     reference via an HTTP POST
 •   Autodiscovery should be done with the help of embedded RDF data (see
     Trackback specification)

[Figure: Trackback example
 – Blog B (comment article, “Title: Study”, “Content: Media computer science is more multifarious.”) sends an HTTP request (“POST”) to the Trackback Ping URI of the referenced entry; the request contains information about blog B and about the entry created in blog B
 – Blog A (original article, “Title: Study”, “Content: Computer science is the best study course.”) shows the trackback as a comment: link + excerpt of the post of blog B
 – The HTTP response (content transmitted in a simple XML format) tells blog B whether the Trackback Ping has been successful]
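A Trackback ping is an ordinary form-encoded HTTP POST, which the following sketch builds with `urllib` before evaluating the XML response. The field names `title`, `excerpt`, `url` and `blog_name` and the `<error>` element of the response follow the TrackBack specification as we understand it; all URIs and the function names are hypothetical, and the request is constructed but not sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_trackback_ping(ping_uri, title, excerpt, url, blog_name):
    """Build (but do not send) the HTTP POST request of a Trackback ping."""
    body = urllib.parse.urlencode({
        "title": title,
        "excerpt": excerpt,
        "url": url,              # permalink of the referencing entry in blog B
        "blog_name": blog_name,
    }).encode("utf-8")
    return urllib.request.Request(
        ping_uri,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def trackback_succeeded(response_xml):
    """Evaluate the simple XML response: <error>0</error> means success."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="").strip() == "0"


req = build_trackback_ping(
    "http://blog-a.example/trackback/42",   # hypothetical Trackback Ping URI
    "Study", "Media computer science is more multifarious.",
    "http://blog-b.example/2008/01/15/study/", "Blog B")
print(req.get_method(), req.full_url)
print(req.data.decode())
# urllib.request.urlopen(req) would actually send the ping
print(trackback_succeeded("<response><error>0</error></response>"))  # True
```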
Weblog information retrieval

• Quite a few search engines specialise in the blogosphere (e.g. Technorati)
• The ranking of a weblog is mainly determined by regular search engine
  algorithms (similar to the PageRank algorithm):

      When many links lead to a weblog, it is assumed to be highly
      relevant (“democratic principle”)

• Due to the high linkage density, weblogs rank highly in common search engines
• The possibility to assign thematic categories to posts and to annotate their
  content with meaningful tags eases searching

      Categories and tags may be considered as classification
      hints by search engines
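The principle that many incoming links imply high relevance can be illustrated with a simplified PageRank power iteration over a tiny, hypothetical link graph of four weblogs. The damping factor 0.85 is the value commonly quoted for PageRank; real search engines combine link analysis with many further signals, so this is only a sketch of the idea.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each weblog to the list of weblogs it links to.
    Weblogs without outgoing links spread their rank evenly over all weblogs.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical blogosphere excerpt: blogs B, C and D all link to blog A
links = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A", "C"]}
for blog, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{blog}: {score:.3f}")
```

Blog A, which receives the most links, ends up with the highest score, while D, which nobody links to, keeps only the base share.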


• In order to avoid link rot (“dead links”), weblogs use permalinks
• Permalinks remain unchanged indefinitely and still point to a weblog
  entry even after it has been moved to the archive
• Many permalink schemes reflect the content of the entry they address

   Example scheme: http://domain/yyyy/mm/dd/post-name/
   (yyyy/mm/dd: date of the entry; post-name: e.g. the headline of the entry)

• Because search engines often evaluate the components of URIs, such
  meaningful URI schemes are search engine friendly
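A sketch of how a weblog system might derive a permalink following the example scheme. The slug rules (lower case, every run of characters other than a-z and 0-9 collapsed into one hyphen) and the function name are assumptions, since each weblog system applies its own rules; non-ASCII letters such as umlauts are replaced as well, so a real implementation would transliterate them first.

```python
import re
from datetime import date


def permalink(domain, published, headline):
    """Build a permalink following the scheme http://domain/yyyy/mm/dd/post-name/."""
    # Derive the post name ("slug") from the headline
    slug = re.sub(r"[^a-z0-9]+", "-", headline.lower()).strip("-")
    return f"http://{domain}/{published:%Y/%m/%d}/{slug}/"


print(permalink("example.org", date(2008, 1, 15), "Did you pass math?"))
# http://example.org/2008/01/15/did-you-pass-math/
```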

Problem considerations

• High participation by arbitrary parties and the dense linkage structure
  lead to some fundamental problems, three of which are discussed here:

   – Viral effects
      • Does the easy spreading of information have only positive effects?
   – Comment spam
      • How can abuse of the comment functionality be avoided?
   – Legal aspects
      • How can weblog content be protected against illegal
         copying and distribution?

 Comment spam

 •     The dense linkage of the blogosphere is abused by web crawlers (spam
       bots) that post comment spam to arbitrary weblogs
 •     Simple mechanism for avoiding comment spam: disable comments
          BUT: this would contradict a fundamental principle of weblogs

[Figure: a scheduler drives a multithreaded downloader that fetches pages from the World Wide Web; a URI analyser extracts further URIs from regular pages (1) and returns the relevant ones to the scheduler; a page analyser extracts the comment form information (comment form, POST destination) from comment pages (2); a comment poster then posts comments to the weblog]

Schematic spam bot architecture for misusing standard comment forms

1 Based on a heuristic algorithm, further URIs are extracted from regular pages
2 If the downloaded page is a comment page (detection logic needed), it is
  passed to the page analyser
Anti spam methods
  • Verifying that a human being posted a comment by
    demanding additional input that has to be answered:
     – Human-readable picture showing some digits that
       have to be entered into a form
     – The result of a simple calculation entered into a form
  • Advantage:
         – Simple implementation
  • Disadvantages:
         – Only standard comments are protected against automatic posting
         – No protection against Trackback or Pingback comment spam

[Figure: the web browser shows a “Leave Comment” form with the question “Please add 2 and 2”; submitting it without an answer sends an HTTP POST comment request (…&answer=&comment=cool&…) to the web server, which replies with an HTTP comment response (text/html) asking to “Answer the question”]
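A minimal sketch of the calculation-based check; the function names are hypothetical. In a real weblog the expected answer would have to be kept on the server side (e.g. in the session) rather than in the form itself, otherwise a spam bot could simply read it from the page.

```python
import random


def make_challenge(rng=random):
    """Create a simple addition question and its expected answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"Please add {a} and {b}", a + b


def check_answer(expected, submitted):
    """Accept the comment only if the submitted form field holds the right sum."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:  # empty or non-numeric answer, e.g. "answer=" in the request
        return False


question, expected = make_challenge()
print(question)
print(check_answer(expected, str(expected)))  # True
print(check_answer(expected, ""))             # False: "Answer the question"
```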
Anti spam methods

• Distributed approach
   – Spam information is captured by a central server
   – This server manages spam rules / blacklists that are updated
     by information extracted from all participating blogs
   – Before a comment is published, the comment is sent to the
     remote server to check the URI, the author and the comment
     against the stored spam information
   – E.g. Akismet – a spam filter service
       • Available for many weblog implementations (often as an
         additional plugin)
       • Integral part of the Wordpress platform
       • Free for personal use
       • An API key is required to use the service

       Anti spam methods

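[Figure: the web browser sends the comment (“Leave Comment”, e.g. “cool”, “Submit comment”) as an HTTP POST comment request to the web server; the web server sends an HTTP POST comment-check request to the Akismet server, receives an HTTP comment-check response (text/plain) and answers the browser with an HTTP comment response]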

   • The HTTP POST Comment-Check-Request contains the following parameters:
                •   comment_type (comment, trackback or pingback)
                •   user_ID (Name of the comment author)
                •   user_ip (IP of the comment author)
                •   user_agent (Information about the browser and the operating
                    system of the comment author)

Anti spam methods

• HTTP Comment-Check-Response
   – IF (the information inside the HTTP POST Comment-Check-Request
     leads to a classification as spam)
     THEN return “true”
     ELSE return “false”
• Akismet also checks comments received via trackback and pingback
• Every user can report spam to the Akismet server (by
  marking a comment as spam)
   – The Akismet engine captures this information, evaluates it
     and possibly adds spam information to its blacklist
   – Spam information contains the comment and post ID,
     content of the comment, author information,
     posting time and the request method
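A sketch of how a weblog might prepare the comment-check request and interpret the plain-text answer. The endpoint form and the parameter names (`blog`, `user_ip`, `user_agent`, `comment_type`, `comment_author`, `comment_content`) follow Akismet's public REST API documentation as we recall it and may differ from the names listed above or change over time, so check the current documentation before use; the API key, URIs and function names are placeholders.

```python
import urllib.parse
import urllib.request


def build_comment_check(api_key, blog_url, comment):
    """Build (but do not send) an Akismet comment-check request.

    `comment` holds the comment data, e.g. user_ip, user_agent,
    comment_type ("comment", "trackback" or "pingback"),
    comment_author and comment_content.
    """
    params = {"blog": blog_url, **comment}
    body = urllib.parse.urlencode(params).encode("utf-8")
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


def is_spam(response_text):
    """The response body is the plain text "true" (spam) or "false"."""
    return response_text.strip() == "true"


req = build_comment_check("YOUR_API_KEY", "http://blog-a.example/", {
    "user_ip": "192.0.2.10",
    "user_agent": "Mozilla/5.0",
    "comment_type": "comment",
    "comment_author": "Anonymous",
    "comment_content": "cool",
})
print(req.full_url)
# urllib.request.urlopen(req).read().decode() would return "true" or "false"
print(is_spam("false"))  # False
```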

Legal aspects

• Publishing information in a weblog does not mean that it can
  be used and distributed by everybody

      How can a convergence of national laws be achieved?
      How can a user know that some content is protected by copyright?

• The Creative Commons project enables copyright holders
  to grant some or all of their rights to the public while
  retaining others
• It distinguishes four conditions that may apply when someone wants to
  redistribute content created by someone else:

          The original author has to be credited (“by”)

          Commercial use is not allowed (“nc”)

          Content may only be distributed without any changes (“nd”)

          Changing the content is permitted if the result is
          published under the same license as the original (“sa”)
Legal aspects

•   Besides specialised licenses, Creative Commons defines six general-purpose
    licenses
•   Apart from referencing a license, content can be annotated with metadata
    (e.g. RDF-based, to support machine processing)

License

Attribution Non-commercial No Derivatives (by-nc-nd)

Attribution Non-commercial Share Alike (by-nc-sa)

Attribution Non-commercial (by-nc)

Attribution No Derivatives (by-nd)

Attribution Share Alike (by-sa)

Attribution (by)
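The six general-purpose licenses follow from combining the four conditions under two rules: attribution (“by”) is always included, and “nd” and “sa” exclude each other, because share alike only concerns modified works. The sketch enumerates these combinations (the function name is hypothetical); the result matches the table above.

```python
from itertools import product


def general_licenses():
    """Derive the six general-purpose Creative Commons license codes."""
    codes = []
    # nc: non-commercial or not; modification rule: "nd", "sa" or none
    for nc, modification in product([True, False], ["nd", "sa", None]):
        parts = ["by"]                 # attribution is always required
        if nc:
            parts.append("nc")
        if modification:
            parts.append(modification)
        codes.append("-".join(parts))
    return codes


print(general_licenses())
# ['by-nc-nd', 'by-nc-sa', 'by-nc', 'by-nd', 'by-sa', 'by']
```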

  Legal aspects

  •   A license hint can be generated automatically on the website of the
      Creative Commons project
  •   XHTML representation of an “Attribution Non-commercial Share Alike”
      license hint for publishing text:

<a rel="license" href="">
<img alt="Creative Commons License" style="border-width:0"
        src="" />
</a><br />This
<span xmlns:dc=""
         href="" rel="dc:type">Work</span>
is licensed under a
<a rel="license" href="">
    Creative Commons Attribution-Noncommercial-Share Alike 2.0 Germany License</a>.
The hint can be embedded, e.g., into the page footer or attached to each post

 Weblog systems vs. Content Management Systems (CMS)
Features of a basic CMS       Realisation in weblog systems
Admission control/ Access     • Users are assigned to user roles
control                       • But no separate allocation of read or write rights to
                                users for news entries or web pages
Log function                  • No extra log function
                              • A time stamp and user information are assigned
                                to every news entry
Multi user ability            • Various weblog articles can be edited independently
                                by many users, but conflicts caused by concurrent
                                editing of a news entry are not reported
Check-in / check-out          • All changes to an article are saved at once
                              • No support for check-in and check-out
Meta information and          • Weblog systems only maintain minimal metadata
maintenance                   • E.g. posted time of a comment or an article,
functionality                   picture
Search function/ Retrieval    • Full text search or picture search in news entries
Mass operation                • In general: No need for mass operations
(modification of large        • If needed, they have to be implemented explicitly
amounts of data)
Backup / rollback             • In general: Not necessary
(reversal of modifications)   • No automatic backup
                              • No rollback function, since there is no log function
 Weblog systems vs. Content Management Systems (CMS)

Features of a basic CMS    Realisation in weblog systems
WYSIWYG                    • WYSIWYG-Editor is mostly implemented

Page integrity/ Link       • There are no tools to check the consistency of
management                   links
Time-independent           • The author can decide to publish an article at a
publishing                   later time
Version management         • In general: No need for version management
                           • No version management
Workflow                   • Weblogs do not support a content lifecycle
Separation of structure,   • The separation is most often implemented
layout and raw data

 • Weblogs have few basic CMS features
 • In general, a weblog is not a multi-user web application; hence there is
   no need for version management, workflow management, rollback, etc.
 • Both have different scopes of application with a small area of overlap



Technorati search

[Figure: Technorati search results for “Did you pass math”]


