Department of Computer Science, Institute for System Architecture, Chair for Computer Networks

Weblogs




• At first glance, weblogs appear to be simple online diaries in which
  users publish their day-to-day experiences, thoughts and opinions
Weblog – example for a social network

• In reality, weblogs form an enormous social network with highly
  dynamic behaviour and great potential in the information age
• Through their linkage structure, ‘information’ can spread around
  the world in seconds

• These properties make weblogs highly interesting to companies,
  mass media and political parties

Weblogs and the blogosphere

• Used as a communication medium, for documenting software
  projects, for new forms of journalism (“citizen journalism”), ...
• Example: impact on news coverage during conflicts in the
  Middle East

Outline

• Schematic anatomy of a weblog interface
• Management functionality of a weblog
• Blogosphere as a social community
   – Linkage principles
   – Search engine considerations
• Problem considerations
   – Avoiding comment spam
   – Legal aspects
• Weblog compared to Content Management System (CMS)




Anatomy of a weblog page
[Figure: annotated screenshot of a weblog page showing its four regions: header, posts, sidebar and footer]

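Anatomy of a weblog page – comment page

[Figure: annotated screenshot of a comment page. From top to bottom: the post, a standard comment, a comment created by Trackback, a comment created by Pingback, the comment feed and Trackback URI, and the comment form]
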
      Anatomy of a weblog page

Sidebar
• Appears on every page and usually includes a variety of navigation
  links, e.g. for categories, the archive and favourite sites / blogs
  (blogroll), as well as other features such as search functionality
  or a news feed
• A news feed is a special format that allows a user to subscribe
  to blog content
   – XML format which represents the structured content of a weblog
   – With feed reader software, articles from many sites can be
     automatically accessed and checked for updates

[Figure: a feed reader subscribes to the feed services of Blog1, Blog2 and Blog3, sends update requests via HTTP, receives the feed files (articles, or the comments of an article) by file transfer, and displays the feed content]

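As a rough illustration of what a feed reader does with such an XML file, the following sketch parses a small RSS 2.0 document and reports the items it has not seen before. The sample feed, its URLs and the function names are assumptions made for this example; real feeds may use other formats (e.g. Atom) and are fetched via HTTP rather than read from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for the feed file of "Blog1".
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Blog1</title>
    <link>http://blog1.example/</link>
    <item>
      <title>First post</title>
      <guid>http://blog1.example/2010/05/06/first-post/</guid>
    </item>
    <item>
      <title>Second post</title>
      <guid>http://blog1.example/2010/05/07/second-post/</guid>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (guid, title) pairs for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("guid"), item.findtext("title"))
            for item in root.iter("item")]


def new_items(feed_xml, seen_guids):
    """Return only the items whose guid the reader has not stored yet."""
    return [(guid, title) for guid, title in read_items(feed_xml)
            if guid not in seen_guids]


if __name__ == "__main__":
    seen = {"http://blog1.example/2010/05/06/first-post/"}
    for guid, title in new_items(SAMPLE_FEED, seen):
        print(title, guid)
```

A real reader would repeat the update request periodically and persist the set of seen identifiers between runs.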
Weblog management

• The basic advantage of using weblogs for web publishing is ease of use
  (no knowledge of web technologies such as HTML is necessary)
• A common administrative interface offers various functions:
   – Editor for creating posts
   – Editing posts and comments
   – Layout management
   – Plugin management
   – User management
   – Publishing support via a WYSIWYG (What You See Is What You Get) editor
   – ...
• Example of a widespread weblog system:
   – Wordpress (open source, easy to install, convenient management facilities)

Wordpress – Administration abilities via AJAX


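[Figure: client-side and server-side components. In the admin area, a user action (e.g. delete) triggers a JavaScript call to the AJAX engine, which sends an HTTP request to the Apache web server; the PHP engine runs a database query against MySQL, the HTTP response (e.g. text/plain) returns to the AJAX engine, and the engine updates the page with HTML+CSS data. In the reader area, the browser sends an HTTP request directly and receives an HTTP response (e.g. text/html)]
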
Wordpress – deleting a category item with AJAX

[Figure: sequence between the web browser and the Ajax engine (client side, categories.html) and the web server and MySQL database (server side, /wordpress/wp-admin/list_manipulation.php); the legend distinguishes possible transfers from the successful transfer]

Client side:
1. The user pushes the "delete" link of a category (e.g. id=2; the category
   with id=1 is the default category); the handler is
   onclick=return delete_something (what, id, msg)
2. confirm (msg) is a JavaScript function that displays the message
   "Are you sure you want to delete this category" with the buttons OK and
   Cancel, and waits for an answer
3. If confirm (msg)==true, ajaxDelete (what, id) sends an HTTP POST request
   (action=delete-what&id= ….&cookie…) to the web server

Server-side processing:
4. Fetch the user data associated with the cookie
5. IF (the user can manage categories) is false, respond with “-1”
6. IF (it is a default category) is true, respond with “0”
7. Otherwise run QUERY(“Delete From &wpdb Where cat_ID=‘id’”); the category
   is deleted and the HTTP response is text/plain=“1”

Client side, after the successful response:
8. removeThisItem (id) {…
     theItem=document.getElementById (id);
     theItem.parentNode.removeChild (theItem);…}

• Asynchronous transfer between user activity and server-side processing
  lets user actions (e.g. deleting a category) take effect without a complete
  refresh of the categories.html page; only the changes have to be
  transferred to the client
• Document Object Model (DOM): the child element is removed from its
  parentNode in categories.html, e.g. the child element <tr> from the
  parentNode <tbody>

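The server-side decision described above can be sketched as a small function. This is a Python stand-in written for illustration only, not the PHP code of Wordpress: the function name, its parameters and the dictionary that replaces the database table are assumptions; only the three response values “-1”, “0” and “1” come from the slide.

```python
def handle_delete_category(user_can_manage, category_id,
                           default_category_id, categories):
    """Return the text/plain response body for a delete request.

    `categories` is a dict {id: name} standing in for the database table.
    """
    if not user_can_manage:
        return "-1"  # the user is not allowed to manage categories
    if category_id == default_category_id:
        return "0"   # the default category must not be deleted
    # Stands in for the DELETE query of the real system.
    categories.pop(category_id, None)
    return "1"       # success: the client may now remove the row from the DOM


if __name__ == "__main__":
    table = {1: "default", 2: "news"}
    print(handle_delete_category(True, 2, 1, table), table)
```

Returning a short status string keeps the response tiny, which is the point of the asynchronous design: the client only needs to know whether it may remove the row.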
   Blogosphere
• The most important technological principles of weblogs are the
  comment functionality, the linkage mechanisms and the possibilities
  to search for information available in the blogosphere
• More important than regular links (e.g. as part of the blogroll) are
  references from weblog entries to entries of other blogs
• Commenting on information or opinions available on the web (not only
  in weblogs) is one building block of this social community

[Figure: excerpt from the “blogosphere”]

Linkage mechanisms

The following applies to most widespread weblog systems:
• If a user refers within an entry to an entry in another weblog
  by giving its URI, the referenced weblog system is informed
  automatically
• The referencing weblog entry is listed as a special comment
  on the referenced entry
• Two commonly used notification mechanisms are:
   1. Pingback: uses an RPC (Remote Procedure Call) server to
      inform the referenced weblog system
   2. Trackback: uses a special URI to inform the referenced
      weblog system

Pingback

• Pingback uses an XML-RPC call to notify the target site of a
  link
• Principle:
    1. Each post made in a weblog is analysed for contained URIs
    2. If a URI is found, the associated resource is examined
       for an advertised pingback server; this examination is
       realised by an autodiscovery mechanism
    3. If an associated pingback server is found, an XML-RPC
       request is sent to it
        • The XML-RPC request passes two parameters to the
          server:
             – Source URI: the absolute URI of the entry on the
               source site that contains the link
             – Target URI: the absolute URI of the link target,
               as given on the source site
        • In the case of an error, a fault code is returned

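To make the XML-RPC request more concrete, the sketch below serialises a ping with Python's standard xmlrpc.client module and decodes a success and a fault response. The method name pingback.ping is the one defined in the Pingback specification; the URIs, the helper names and the response texts are assumptions for this example, and nothing is sent over the network.

```python
import xmlrpc.client


def build_pingback_request(source_uri, target_uri):
    """Serialise a pingback.ping call carrying the two URI parameters."""
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")


def parse_pingback_response(response_xml):
    """Return (True, message) on success or (False, fault code) on error."""
    try:
        params, _ = xmlrpc.client.loads(response_xml)
    except xmlrpc.client.Fault as fault:
        return False, fault.faultCode
    return True, params[0]


# Hypothetical source and target entries.
SOURCE = "http://blogb.example/2010/05/06/comment-on-study/"
TARGET = "http://www.bloga.com/study/"

request_xml = build_pingback_request(SOURCE, TARGET)

# Locally generated stand-ins for what a pingback server might answer.
ok_xml = xmlrpc.client.dumps(("Pingback registered",), methodresponse=True)
fault_xml = xmlrpc.client.dumps(xmlrpc.client.Fault(0, "generic fault"))

if __name__ == "__main__":
    print(request_xml)
    print(parse_pingback_response(ok_xml))
    print(parse_pingback_response(fault_xml))
```

In a real system the request body would be sent as an HTTP POST to the discovered pingback server, for example through xmlrpc.client.ServerProxy.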
Pingback autodiscovery

• To autodiscover the XML-RPC server associated with a
  referenced resource, the resource itself is requested
• The pingback mechanism defines two possibilities for specifying
  the address of the pingback server:
  1. The HTTP header of the response contains the extension
     header “X-Pingback”, whose value is the absolute URI of the
     pingback server
      HTTP/1.1 200 OK
      ….
      Content-Length: 988
      Content-Type: text/html
      X-Pingback: http://www.bloga.com/xmlrpc.php
      ...

  2. The referenced resource contains a special (X)HTML
     <link> element
      <link rel="pingback" href="http://www.bloga.com/xmlrpc.php"/>

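The two discovery options can be sketched as follows. The function takes the response headers and the document body as plain Python values, so no request is made here; the class and function names are assumptions for this example, and the header is checked before the <link> element, in the order the two options are listed.

```python
from html.parser import HTMLParser


class _PingbackLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="pingback"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "pingback" and self.href is None:
            self.href = attributes.get("href")


def discover_pingback_server(headers, body):
    """Return the pingback server URI, or None if none is advertised."""
    # Option 1: the X-Pingback response header (header names are
    # case-insensitive in HTTP).
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    # Option 2: the (X)HTML <link> element inside the document.
    finder = _PingbackLinkFinder()
    finder.feed(body)
    return finder.href


if __name__ == "__main__":
    page = '<html><head><link rel="pingback" href="http://www.bloga.com/xmlrpc.php"/></head></html>'
    print(discover_pingback_server({}, page))
```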
Complete pingback mechanism

[Figure: pingback autodiscovery workflow between the local web server and the remote web server that runs the referenced blog]

1. The author writes a comment article containing permalinks to an
   external blog, and the blog entry is saved
2. The local web server parses the content for a link; if no links are
   found, the entry is published
3. If a link is found, a request for the linked resource is sent to the
   remote web server, which answers the incoming request with a response
4. The local web server parses the HTTP header of the response for an
   X-Pingback entry; if none is found, it parses the (X)HTML document
   for the link element
5. If neither is found, parsing continues with the next link; otherwise
   the XML-RPC server URI is extracted and a Pingback ping is sent to it
   as an XML-RPC request
6. The remote web server processes the incoming Pingback ping, registers
   the comment on the referenced blog entry and returns a Pingback
   response (XML-RPC response)
7. After the incoming Pingback response, parsing continues with the next
   link
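The loop over the links of a saved post might look like the sketch below. It only covers the local side of the workflow; discover and send_ping are hypothetical callables that the caller supplies (for instance an autodiscovery routine and an XML-RPC client), so the example runs without any network access.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the absolute http(s) targets of all <a href> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)


def extract_links(post_html):
    """Return the link targets contained in a post, in document order."""
    collector = _LinkCollector()
    collector.feed(post_html)
    return collector.links


def process_post(post_html, source_uri, discover, send_ping):
    """Ping every linked resource that advertises a pingback server.

    discover(target_uri) returns a server URI or None;
    send_ping(server_uri, source_uri, target_uri) performs the ping.
    Returns the list of target URIs that were pinged.
    """
    pinged = []
    for target_uri in extract_links(post_html):
        server_uri = discover(target_uri)
        if server_uri is None:
            continue  # nothing advertised: go on with the next link
        send_ping(server_uri, source_uri, target_uri)
        pinged.append(target_uri)
    return pinged
```

Passing the two network steps in as parameters keeps the parsing logic testable on its own.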
 Trackback mechanism

•   Quite similar to the Pingback mechanism
•   Instead of using an RPC interface, it relies on a special URI that
    exists for each entry, the “Trackback Ping URI”
•   If an entry refers to an entry in another weblog, this special URI of
    the referenced entry is used to transmit information about the
    reference via an HTTP POST
•   Autodiscovery should be done with the help of embedded RDF data (see
    Trackback specification)

[Figure: Blog B holds the comment article (title: Study; content: "Media Computer science is more multifarious."). Blog A holds the original article (title: Study; content: "Computer science is the best study course."; Trackback Ping URI: www.blogA.com/study/trackback). Blog B sends an HTTP request (“POST”) to the Trackback Ping URI of the referenced entry; the request contains information about blog B and about the entry created in blog B. Blog A adds a comment consisting of a link and an excerpt of the post of blog B, and answers with an HTTP response whose content is transmitted in a simple XML format and tells blog B whether the Trackback ping has been successful]

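A Trackback ping can be sketched without any special library, since it is an ordinary form-encoded POST body plus a small XML answer. The parameter names (url, title, excerpt, blog_name) and the <response>/<error> layout follow my reading of the Trackback specification; the helper names and sample values are assumptions, and the actual HTTP transfer is left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_trackback_body(url, title, excerpt, blog_name):
    """Form-encode the information about blog B and its new entry."""
    return urlencode({
        "url": url,              # permalink of the referencing entry
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    })


def trackback_succeeded(response_xml):
    """True if the XML answer reports error code 0."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


if __name__ == "__main__":
    body = build_trackback_body(
        "http://blogb.example/study/",   # hypothetical entry in blog B
        "Study",
        "Media Computer science is more multifarious.",
        "Blog B",
    )
    print(body)
    print(trackback_succeeded("<response><error>0</error></response>"))
```

The body would be posted to the Trackback Ping URI of the referenced entry with the content type application/x-www-form-urlencoded.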
Weblog information retrieval

• Quite a few search engines specialise in the blogosphere
  (e.g. Technorati)
• The ranking of a weblog is mainly determined by regular
  search engine algorithms (similar to the PageRank algorithm):

      When many links lead to a weblog, it is assumed to have
      high relevance (“democratic principle”)

• Due to their high linkage density, weblogs rank highly
  in common search engines
• The possibility of introducing thematic categories and annotating
  the content of a post with meaningful tags eases searching in
  weblogs

      Categories and tags may be considered as classification
      hints by search engines

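The idea that incoming links signal relevance can be illustrated with a simplified PageRank-style iteration. This is a teaching sketch, not the ranking algorithm of any particular search engine; the damping factor of 0.85, the iteration count and the sample link graph are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Compute a PageRank-style score for every blog in a link graph.

    `links` maps each blog to the list of blogs it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if targets:
                # A blog passes its score on in equal shares to its targets.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A blog without outgoing links spreads its score evenly.
                share = damping * rank[source] / count
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {
        "blogA": [],
        "blogB": ["blogA"],
        "blogC": ["blogA"],
        "blogD": ["blogA", "blogB"],
    }
    for blog, score in sorted(link_rank(graph).items(), key=lambda p: -p[1]):
        print(blog, round(score, 3))
```

In this sample graph blogA receives links from all other blogs and therefore ends up with the highest score.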
Permalinks

• To avoid link rot (“dead links”), weblogs use permalinks
• Permalinks remain unchanged indefinitely and still point to a
  weblog entry even after it has been moved to the archive
• Many permalink schemes are meaningful with regard to the content
  of the entry they address

   Example scheme: http://domain/yyyy/mm/dd/post-name/
      – yyyy/mm/dd: date of the entry
      – post-name: e.g. the headline of the entry

   http://blog.com/2010/05/06/comment-on-timbuktu-president-election

• Because search engines often evaluate the components of URIs,
  such meaningful URI schemes are search engine friendly

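A permalink following the example scheme can be derived from the publication date and the headline, as in this minimal sketch. The slug rule used here (lower case, runs of other characters replaced by a hyphen) is an assumption; weblog systems differ in how they normalise headlines.

```python
import re
from datetime import date


def slugify(headline):
    """Turn a headline into a URI-friendly post name."""
    slug = re.sub(r"[^a-z0-9]+", "-", headline.lower())
    return slug.strip("-")


def permalink(domain, published, headline):
    """Build a permalink of the form http://domain/yyyy/mm/dd/post-name/."""
    return f"http://{domain}/{published:%Y/%m/%d}/{slugify(headline)}/"


if __name__ == "__main__":
    print(permalink("blog.com", date(2010, 5, 6),
                    "Comment on Timbuktu president election"))
```

Because the URI is computed once from values that never change, it stays valid after the entry moves to the archive.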
Problem considerations

• High participation by arbitrary parties and the dense linkage
  structure lead to some fundamental problems, of which three
  are discussed here:

   – Viral effects
      • Does the easy spreading of information have only positive
         effects?
   – Comment spam
      • How can abuse of the comment functionality be
         avoided?
   – Legal aspects
      • How can weblog content be protected against illegal
         copying and distribution?

 Comment spam

•     The dense linkage of the blogosphere is abused by spam bots: web
      crawlers that send comment spam to arbitrary weblogs
•     Simple mechanism for avoiding comment spam: disable the comment
      functionality
         BUT: this would contradict one fundamental principle of the
      blogosphere

[Figure: schematic spam bot architecture for misusing standard comment forms. A scheduler takes URIs from a queue and passes them to a multithreaded downloader, which fetches pages from the World Wide Web. Regular pages (1) go to a URI analyser, which extracts URIs and feeds the relevant ones back into the queue. Comment pages (2) go to a page analyser, which extracts the comment form information and the POST destination and hands them to a comment poster, which posts the comment to the weblog]

1 Based on a heuristic algorithm, further URIs are extracted from regular pages
2 If the downloaded page is a comment page (logic for detection needed), it is
  passed to the page analyser
  Anti spam methods
  • Verifying that a human being posted a comment by demanding
    additional input that has to be answered:
     – A human-readable picture showing some digits that have to be
       entered into a form
     – The result of a simple calculation entered into a form
  • Advantage:
     – Simple implementation
  • Disadvantages:
     – Only standard comments are protected against automatic posting
     – No protection against Trackback or Pingback comment spam

[Figure: the web browser shows a "Leave Comment" form with the question "Please add 2 and 2", an empty answer field and the comment "cool". Submitting it sends an HTTP POST Comment-Request (…&answer=&comment=cool&…) to the web server, which replies with an HTTP Comment-Response (text/html) containing "Answer the question"]

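The calculation check can be sketched in a few lines. The question text and the "Answer the question" reply follow the slide; the function names, the other reply texts and the way the expected sum is passed in are assumptions (a real plugin would keep the expected value in the session or in a signed form field).

```python
def make_challenge(a, b):
    """Return the question shown in the form and the expected answer."""
    return f"Please add {a} and {b}", a + b


def check_comment(form, expected_sum):
    """Validate the 'answer' field of a submitted comment form.

    Returns (accepted, message).
    """
    answer = form.get("answer", "").strip()
    if not answer:
        return False, "Answer the question"
    try:
        value = int(answer)
    except ValueError:
        return False, "Wrong answer"
    if value != expected_sum:
        return False, "Wrong answer"
    return True, "Comment accepted"


if __name__ == "__main__":
    question, expected = make_challenge(2, 2)
    print(question)
    print(check_comment({"answer": "", "comment": "cool"}, expected))
    print(check_comment({"answer": "4", "comment": "cool"}, expected))
```

As the slide notes, such a check only guards the standard comment form; Trackback and Pingback requests never pass through it.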
Anti spam methods

• Distributed approach
   – Spam information is collected by a central server
   – This server manages spam rules / blacklists that are updated
     with information extracted from all participating blogs
   – Before a comment is published, it is sent to the remote
     server, which checks the URI, the author and the comment
     against the stored spam information
   – E.g. Akismet, a spam filter service
       • Available for many weblog implementations (often as an
         additional plugin)
       • Integral part of the Wordpress platform
       • Free for personal use
       • A wordpress.com API key is required to use the service

       Anti spam methods

       Akismet

[Figure: the web browser submits the "Leave Comment" form (comment "cool") to the web server as an HTTP POST Comment-Request; the web server sends an HTTP POST Comment-Check-Request to the Akismet server and receives an HTTP Comment-Check-Response (text/plain); the web server then returns the HTTP Comment-Response to the browser]

   • The HTTP POST Comment-Check-Request contains parameters such as:
       •   comment_type (comment, trackback or pingback)
       •   user_ID (name of the comment author)
       •   user_ip (IP address of the comment author)
       •   user_agent (information about the browser and the operating
           system of the comment author)

Anti spam methods

• HTTP Comment-Check-Response
   – IF (the information inside the HTTP POST Comment-Check-Request
     leads to a classification as spam)
     THEN return “true”
     ELSE return “false”
• Akismet also checks comments created by trackback and pingback
• Every user can report spam to the Akismet server (by
  marking a comment as spam)
   – The Akismet engine captures this information, evaluates it
     and possibly adds spam information to its blacklist
   – Spam information contains the comment and post ID, the
     content of the comment, author information, the
     submission time and the request method

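The exchange with the spam filter service can be sketched as building a form-encoded request body and interpreting the plain-text answer. The parameter names are the ones listed on the slides; the helper names and sample values are assumptions, the real service expects further parameters and an API key, and no request is sent here.

```python
from urllib.parse import urlencode

ALLOWED_TYPES = {"comment", "trackback", "pingback"}


def build_comment_check_body(comment_type, user_id, user_ip, user_agent):
    """Form-encode the parameters of a Comment-Check-Request."""
    if comment_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown comment type: {comment_type}")
    return urlencode({
        "comment_type": comment_type,
        "user_ID": user_id,
        "user_ip": user_ip,
        "user_agent": user_agent,
    })


def is_spam(response_text):
    """Interpret the text/plain answer: "true" means spam."""
    return response_text.strip() == "true"


if __name__ == "__main__":
    body = build_comment_check_body(
        "comment", "alice", "192.0.2.1", "ExampleBrowser/1.0")
    print(body)
    print(is_spam("false"))
```

A weblog would publish the comment only if the check returns "false", and hold or discard it otherwise.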
Legal aspects

• Publishing information in a weblog does not mean that it can
  be used and distributed by everybody

      How can a convergence of national laws be achieved?
      How can a user know that some content is protected by
      copyright?

• The Creative Commons project enables copyright holders
  to grant some or all of their rights to the public while
  retaining others
• It considers four conditions that apply when someone wants to
  redistribute content generated by someone else:

   – The original author has to be mentioned (“by”)
   – Commercial use is not allowed (“nc”)
   – Content can only be distributed without any changes (“nd”)
   – Changing the content is permitted if the result is
     published under the same license as the original (“sa”)

Legal aspects

•   Besides specialised licenses, Creative Commons defines six general
    purpose licenses
•   Apart from referencing a license, content can be annotated with
    metadata (e.g. RDF-based, to support machine processability)

License                                                  Code

Attribution Non-commercial No Derivatives                by-nc-nd
Attribution Non-commercial Share Alike                   by-nc-sa
Attribution Non-commercial                               by-nc
Attribution No Derivatives                               by-nd
Attribution Share Alike                                  by-sa
Attribution                                              by

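Why there are exactly six general purpose licenses follows from how the four conditions combine: attribution is always present, the non-commercial condition is optional, and "no derivatives" and "share alike" exclude each other. The sketch below enumerates these combinations; the function names are assumptions, and the URL pattern and default version are taken from the license hint shown on the following slide.

```python
from itertools import product


def license_codes():
    """Enumerate the codes of the six general purpose licenses."""
    codes = []
    # "nd" and "sa" are mutually exclusive, so at most one is chosen.
    for non_commercial, change_rule in product((True, False),
                                               ("nd", "sa", None)):
        parts = ["by"]  # attribution is part of every license
        if non_commercial:
            parts.append("nc")
        if change_rule:
            parts.append(change_rule)
        codes.append("-".join(parts))
    return codes


def license_url(code, version="3.0"):
    """Build the license URL for a code such as "by-nc-sa"."""
    return f"http://creativecommons.org/licenses/{code}/{version}/"


if __name__ == "__main__":
    for code in license_codes():
        print(code, license_url(code))
```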
  Legal aspects

  •   A license hint can be generated automatically on the website of the
      Creative Commons project
  •   XHTML representation of an “Attribution Non-commercial Share Alike”
      license hint for publishing text, embedded e.g. into the footer or
      attached to each post:

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">
<img alt="Creative Commons License" style="border-width:0"
        src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" />
</a><br />This
<span xmlns:dc="http://purl.org/dc/elements/1.1/"
         href="http://purl.org/dc/dcmitype/Text" rel="dc:type">Work</span>
is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">
    Creative Commons Attribution-Noncommercial-Share Alike 3.0 License</a>.

 Weblog systems vs. Content Management Systems (CMS)
Feature of a basic CMS: realisation in weblog systems

Admission control / access control
   • Users are assigned to user roles
   • But no separate allocation of read or write rights to users for
     news entries or web pages
Log function
   • No extra log function
   • A time stamp and user information are assigned to every news entry
Multi-user ability
   • Various weblog articles can be edited independently by many users;
     conflicts caused by concurrent editing of a news entry are not reported
Check-in / check-out
   • All changes to an article are applied at once
   • No support for check-in and check-out
Meta information and maintenance functionality
   • Weblog systems only maintain minimal metadata
   • E.g. posted time of a comment or an article, picture
Search function / retrieval
   • Full text search or picture search in news entries
Mass operation (modification of any mass of data)
   • In general: no need for mass operations
   • If needed, they have to be implemented explicitly
Backup / rollback (reversal of modifications)
   • In general: not necessary
   • No automatic backup
   • No rollback function because there is no log function

 Weblog systems vs. Content Management Systems (CMS)

Feature of a basic CMS: realisation in weblog systems

WYSIWYG
   • A WYSIWYG editor is usually implemented
Page integrity / link management
   • There are no tools to check the consistency of links
Time-independent publishing
   • The author can decide to publish an article at a different time
Version management
   • In general: no need for version management
   • No version management
Workflow
   • Weblogs have no notion of a content lifecycle
Separation of structure, layout and raw data
   • The separation is implemented in most cases

 • Weblogs have few of the basic CMS features
 • In general a weblog is not a multi-user web application, so there is
   no need for version management, workflow management, rollback, etc.
 • Both have different scopes of application with a small area of
   intersection

References

Pingback specification: http://www.hixie.ch/specs/pingback/pingback-1.0
Trackback specification: http://www.sixapart.com/pronet/docs/trackback_spec
Wordpress documentation: http://codex.wordpress.org/Main_Page
Technorati search engine: http://technorati.com/
“Did you pass math” plugin: http://www.herod.net/dypm/
Creative Commons: http://creativecommons.org/

				