                         Video Hosting Architecture
    Phillip Sutton, TBD, Technical Advisor, Dale Callahan, Ph.D., P.E., Managing Advisor

Abstract – This paper presents a high-level overview of a possible architecture
that might be used to build a video hosting and sharing service with little to no
upfront cost. While recent advances in commodity hardware have helped to drive
down the costs of hosting video files across the board, it can still be very
expensive for a new startup to establish its IT infrastructure. By implementing
the architecture presented here, a new venture can build a very inexpensive
infrastructure for storing and delivering video content.

                                  I. PROBLEM DESCRIPTION

    According to PricewaterhouseCoopers, the worldwide filmed entertainment market will
reach $118 billion in 2009 [1]. In the U.S. alone, consumers spent $36.4 billion in 2007 on
movie entertainment, with 68% going towards DVD rentals [2]. By 2013, Insight Research
predicts streaming content on the web will produce $70 billion in the U.S. alone [3].
Demand for online video content in one form or another is huge. Current technological
trends combined with deep fragmentation in this market are creating numerous niche
opportunities for entrepreneurs to deliver content to content-hungry viewers. However, for
the small entrepreneur, building an infrastructure for the storage and delivery of
high-quality video content can require a substantial upfront investment. How then does a
small company build an infrastructure able to store and deliver massive amounts of
high-quality video with as little upfront cost as possible?

    A potential option would be to craft a solution based on current trends in technology
such as Web 2.0, SaaS, utility computing, and so forth. Web 2.0 is the use of the Internet
as a platform that aims to facilitate the sharing of information, collaboration, and creativity
[5]. Software as a Service (SaaS) is an application model where customers do not pay to
own software but rather pay for usage instead [6]. Utility computing, also known as on-
demand computing, packages computation and storage as a metered utility [7].

    What makes my idea unique is using a combination of existing technologies, Web 2.0
technologies, SaaS, and utility computing services to build an alternative solution to
traditional video hosting services and content delivery networks. A preliminary architecture
could be built using Amazon’s Simple Storage Service (S3), Simple Queue Service (SQS), and
Elastic Compute Cloud (EC2). I believe that by combining these services a simple
infrastructure for media hosting can be created that is easily scalable, incurs no cost for
idle capacity, and is able to handle the spikes in bandwidth that often occur when today’s
websites suddenly find themselves at the center of the blogosphere.

    In summary, I plan to research some of the various methods used to build a scalable
media hosting infrastructure. My goal will be to determine a low-cost solution for the small
business that doesn’t have deep pockets to compete with larger, more established companies.

                              II. SUPPORTING DOCUMENTATION

    Video sharing and distribution is big money. For example, YouTube claims over 100
million daily viewings and accounts for 60% of all videos watched online [10]. Quantcast
estimates YouTube attracts 60 million unique visitors per month [11]. There are plenty of
YouTube clones, with the vast majority of them serving up short segments of low to
medium quality videos with limited file size. In general, video sharing sites are extremely
popular and form a very competitive marketplace to enter. However, there seem to be plenty
of opportunities for making niche video websites.

    One such niche opportunity would be to deliver full-length DVDs to customers over the
Internet. While online video viewing is in the process of exploding, many still prefer to see
high-quality DVD content on their home theatre systems from the comfort of their living
rooms with friends and family, an experience not yet easily replicated through the Internet
and personal computers.

    A small business that needs to deliver DVD quality videos for download will most likely
start by building a library of content for users to choose from. The library would consist of
DVD ISO images, video clips for each DVD, and video trailers. Storage and bandwidth
requirements of such a library could easily grow by terabytes each year.

     For any small business there’s always a price point to take into consideration.
Reliability, scalability, and resources must be factored into that price point. Reliability refers
to the guaranteed availability of your resources including uptime and connectivity.
Scalability covers increases in storage, bandwidth, and computing power. Resources refer to
those human factors such as system administrators or network technicians required to
maintain the system.

    What are the options today for the storage and delivery of massive amounts of video?
Does a better option exist? The majority of video sites build their infrastructure in-house,
through a web host-provider offering either hosted or dedicated server space, or on content
distribution networks (CDN). Other possibilities include using YouTube or its clones as an
infrastructure or using Amazon S3 services.

   A. YouTube

    YouTube is really great at serving short video clips to a massive number of viewers.
Serving up to 100 million videos on a daily basis is no small feat. Following on the success
of YouTube, several dozen serious video sharing sites have sprung up and have been
growing in quality. There exist at least another 50 direct clones of YouTube offering various
levels of video sharing capabilities.

    Most YouTube-type sites offer free hosting of uploaded videos. Why then
would one not just build a content library based on these free services? Table I, below,
lists some of the specifications for several of the most popular video
sharing websites being used today. Since being acquired by Google, YouTube has the
advantage of leveraging Google’s highly reliable and scalable infrastructure [12]. Virtually
unlimited storage and bandwidth exist; however, the major limitations to hosting
high-quality content on popular sites seem to be imposed limits on file size, playing time,
and resolution, along with widely varying quality from one provider to the next.

    Furthermore, no real content management system exists for these sites, and there is
no way for an individual to monetize their content on them other than whatever small
ad-based revenue sharing program a particular site may employ. Perhaps in the future, as
Google/YouTube opens up its APIs and increases limits on file size uploads, a
viable architecture for storing and delivering high-quality content can be devised.

                                                Table I
                                       Video Website Comparisons

Website                          YouTube          Yahoo Video      Veoh          Vimeo

Unique Visitors per Year         205,593,000      48,026,000       11,476,000    569,000
Max Video Bit Rate (kbps)        ~200 (1)         300 (3)          1,500         1,600
Max Upload File Size (MB)        100 (2)          150              250           500/wk
Max Length (min)                 10               N/A              N/A           N/A
Max Screen Size                  320x240          320x240          640x480       1280x720 (4)
Host Format (streaming)          FLV              FLV              FLV           FLV
Processing Time                  Up to several    Up to several    Few hours     Minutes (5)
                                 hours            hours

(1) estimated   (2) increasing to 1 GB   (3) upcoming 700 kbps   (4) claims this capability

    B. In-House Hosting

    Hosting high-quality videos on your own servers can be a very expensive proposition.
Initial acquisition of equipment, ongoing maintenance, support, and expansion can lead to
significant expenditures. For example, suppose you wanted to build a library of independent
films containing at least 5,000 videos at a minimum DVD-quality file size of 4.7 GB each.
That is approximately 23 TB of storage, and it will need to sit on a quality redundant RAID
array, possibly with multiple copies of each file, not to mention multiple versions and
formats for different devices. Infrastructure needs such as bandwidth,
incoming/outgoing connections, and ongoing maintenance will also factor into costs.

    C. Managed Hosting

    A second popular option is to use a hosting solution provider such as HostGator.
Selecting a dedicated hosting option with a quad-core dedicated server, 4 GB of memory,
500 GB of storage, and 2,500 GB of monthly bandwidth will cost at least $374 per month,
including support. On average, each additional 500 MB of storage will cost $5 per month and
each additional 5 GB of bandwidth will cost an additional $5 per month. So for 23 TB of
additional storage the cost will be roughly $241,000 per month. Additional bandwidth costs
will be around $141,312 per month, assuming only 60% of your catalog is requested each month
and roughly 10 copies of each requested title are downloaded. That’s a whopping $4,587,744
per year.
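
    The arithmetic behind these figures can be reproduced with a short sketch. It assumes
binary units (1 TB = 1024 GB, 1 GB = 1024 MB) and rounds the monthly transfer to whole
terabytes, since those conventions reproduce the quoted totals; the function names are
illustrative only.

```python
# Back-of-the-envelope managed-hosting estimate using the add-on rates quoted above.
# Assumption: binary units, and transfer rounded to whole terabytes before pricing.

GB_PER_TB = 1024
MB_PER_GB = 1024


def addon_storage_cost(storage_tb, block_mb=500, block_price=5.0):
    """Monthly cost of extra storage sold in fixed-size blocks (default $5 per 500 MB)."""
    total_mb = storage_tb * GB_PER_TB * MB_PER_GB
    return total_mb / block_mb * block_price


def addon_bandwidth_cost(transfer_tb, block_gb=5, block_price=5.0):
    """Monthly cost of extra bandwidth sold in fixed-size blocks (default $5 per 5 GB)."""
    total_gb = transfer_tb * GB_PER_TB
    return total_gb / block_gb * block_price


def monthly_transfer_gb(titles, fraction_requested, downloads_per_title, size_gb):
    """Gigabytes downloaded per month for a catalog of equally sized titles."""
    return titles * fraction_requested * downloads_per_title * size_gb


transfer_gb = monthly_transfer_gb(5000, 0.60, 10, 4.7)  # about 141,000 GB
transfer_tb = round(transfer_gb / GB_PER_TB)            # whole terabytes
storage_cost = addon_storage_cost(23)
bandwidth_cost = addon_bandwidth_cost(transfer_tb)

print("storage per month:   ${:,.0f}".format(storage_cost))
print("bandwidth per month: ${:,.0f}".format(bandwidth_cost))
```

Changing the catalog size, request fraction, or block prices shows how quickly fixed-block
pricing grows with a video library of this size.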

   Managed hosting can’t scale with you and you can’t control hardware or make favorable
networking agreements with providers.

    D. Content Distribution Networks

    The third traditional hosting solution is the use of content distribution networks. A CDN
is a system of computers networked together across the Internet that cooperate transparently
to deliver content (especially large media content) to end users [8]. CDNs have
many advantages over in-house and managed hosting, such as direct backbone access, multiple
data centers, and thousands of nodes with tens of thousands of servers per node. Some of the
optimizations come in the form of reduced bandwidth costs and improved end-user
performance.

    The average price to deliver over a CDN varies by a number of factors. The going rate
for 100 TB per month is between $0.19 and $0.29 per gigabyte [9]. That roughly equates
to between $19,456 and $29,696 per month in bandwidth costs, and those costs come in
chunks of tiered rates. Storage rates in the terabyte range could average $1.00 per GB. A
big drawback to CDNs is the monthly commitment: you pay for bandwidth and storage you
may never actually use.
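
    A minimal sketch of this calculation follows. It assumes 1 TB = 1024 GB (which reproduces
the range quoted above) and uses a hypothetical 40 TB of actual usage to illustrate how an
unused commitment raises the effective price per gigabyte.

```python
# CDN cost sketch based on the per-gigabyte rates quoted above.
# Assumption: 1 TB = 1024 GB; the 40 TB usage figure below is hypothetical.

GB_PER_TB = 1024


def cdn_monthly_cost(committed_tb, price_per_gb):
    """Monthly bill for a committed transfer volume at a flat per-GB rate."""
    return committed_tb * GB_PER_TB * price_per_gb


def effective_price_per_gb(committed_tb, used_tb, price_per_gb):
    """Price actually paid per delivered GB when only part of the commitment is used."""
    return cdn_monthly_cost(committed_tb, price_per_gb) / (used_tb * GB_PER_TB)


low = cdn_monthly_cost(100, 0.19)
high = cdn_monthly_cost(100, 0.29)
storage = 23 * GB_PER_TB * 1.00  # 23 TB library at $1.00 per GB

print("bandwidth: ${:,.0f} to ${:,.0f} per month".format(low, high))
print("storage:   ${:,.0f} per month".format(storage))
print("effective rate at 40 TB used: ${:.3f} per GB".format(
    effective_price_per_gb(100, 40, 0.19)))
```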

    CDNs replicate content in multiple places, so there is a better chance of content being
closer to the user with fewer hops, and content will run over a friendlier network. CDNs
have traditionally been designed for performance and marketed to the enterprise crowd.

   E. Amazon S3

    The post-Google world has begun to see the development of distributed, on-demand,
redundant, failure-tolerant, and scalable grid/cloud-computing system architectures. Amazon
sorted out the fundamentals of S3 while developing the infrastructure for its own e-commerce
operations and in the process has opened up its proprietary infrastructure to the world at
minimal cost.

    Amazon Simple Storage Service (S3) provides a managed, Internet-accessible storage
service where anyone can store any amount of data and retrieve it later. The
maximum amount of data per object is 5 GB, and the number of objects is not
limited. Amazon has a stable and predictable pricing model that’s fairly competitive with the
industry. Table II, below, lists the pricing structure provided by Amazon’s S3 service [13].

                                                 Table II
                                                S3 Pricing

          Storage         $0.10 per GB per month of storage used

          Data Transfer   $0.10 per GB for all data transferred in
                          $0.18 per GB for the first 10 TB/month of data transferred out
                          $0.16 per GB for the next 40 TB/month of data transferred out
                          $0.13 per GB for data transferred out over 50 TB/month

          Requests        $0.01 per 1,000 PUT or LIST requests
                          $0.13 per 10,000 GET and all other requests
                          $0.00 for DELETE requests
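
    Tiered transfer pricing of this kind can be evaluated with a short sketch. The tier
boundaries and rates come from Table II; the 1 TB = 1024 GB convention is an assumption, so
the output is not guaranteed to reproduce other figures in this paper to the dollar.

```python
# Tiered "data transferred out" pricing, using the tiers listed in Table II.
# Assumption: 1 TB = 1024 GB. Request and storage charges are not included.

GB_PER_TB = 1024

# (tier size in TB, price per GB); None marks the final, unbounded tier.
TRANSFER_OUT_TIERS = [(10, 0.18), (40, 0.16), (None, 0.13)]


def transfer_out_cost(total_tb, tiers=TRANSFER_OUT_TIERS):
    """Monthly cost of transferring total_tb out, billed tier by tier."""
    remaining = total_tb
    cost = 0.0
    for size_tb, price_per_gb in tiers:
        if remaining <= 0:
            break
        billed_tb = remaining if size_tb is None else min(remaining, size_tb)
        cost += billed_tb * GB_PER_TB * price_per_gb
        remaining -= billed_tb
    return cost


for volume_tb in (5, 50, 100):
    print("{:>3} TB out: ${:,.2f}".format(volume_tb, transfer_out_cost(volume_tb)))
```

Because each tier is billed only for the volume that falls inside it, the average price per
gigabyte falls gradually as traffic grows rather than dropping at a threshold.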

    Amazon S3 certainly provides an interesting alternative to traditional video hosting.
It offers virtually zero startup costs and fairly competitive pricing, coupled with standard
REST and SOAP interfaces and HTTP transfer protocols, with the option of building protocol or
functional layers on top. S3 was built to be scalable, reliable, fast, inexpensive, and
simple to use. Table III, below, lists the average monthly costs of hosting 5,000 DVDs of
4.7 GB each and delivering 100 TB of data.

                                          Table III
                                Costs associated with Hosting

                                         Hosted      CDN     Amazon S3
                     Storage            $241,000   $23,552     $3,523
                     Bandwidth          $141,312   $29,696    $15,153
                     Total Per Month    $382,312   $53,248    $18,676

                         III. AMAZON SIMPLE STORAGE SERVICE

A. Overview of S3

    S3 (Simple Storage Service) is Amazon’s online storage web service providing unlimited
storage through a web services interface. The design of S3 is intended to provide scalability,
high availability, and low latency at commodity prices. Amazon uses the same scalable
storage infrastructure to run its own global e-commerce network [50]. Furthermore,
objects stored in S3 can be accessed by unmodified HTTP clients, thereby providing the
possibility of replacing a portion of existing web hosting infrastructures.

   Highlights of Amazon’s S3 service:

      •   Storage of arbitrary objects up to 5 GB in size with 2 KB of metadata.
      •   Objects stored in buckets.
      •   Unlimited number of objects per bucket.
      •   Each bucket is owned by an Amazon Web Service (AWS) account.
      •   Each object is identified within each bucket by a unique user assigned key.
      •   Use REST-style HTTP, SOAP, or HTTP GET/PUT interfaces to create, list, and
          retrieve objects.
      •   Supports BitTorrent protocol.
      •   Requests authorized using access control lists associated with each bucket and
          object.
      •   Authenticated URLs can be created with time-bounded validity.
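
    The last highlight, time-bounded authenticated URLs, can be illustrated with the
HMAC-SHA1 query-string signing scheme that S3 has historically offered. The sketch below is
an illustration under that assumption (newer AWS regions require a different signature
version), and the bucket name, object key, and credentials are placeholders.

```python
# Sketch of a time-limited S3 GET URL using legacy query-string authentication.
# Assumptions: HMAC-SHA1 signing, path-style addressing, placeholder credentials.
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def signed_get_url(bucket, key, access_key_id, secret_key, expires_epoch):
    """Return a URL that grants GET access to one object until expires_epoch."""
    resource = "/{}/{}".format(bucket, quote(key))
    # Verb, empty Content-MD5, empty Content-Type, expiry, then the resource path.
    string_to_sign = "GET\n\n\n{}\n{}".format(expires_epoch, resource)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return ("https://s3.amazonaws.com{}?AWSAccessKeyId={}&Expires={}&Signature={}"
            .format(resource, quote(access_key_id, safe=""), expires_epoch,
                    quote(signature, safe="")))


one_hour_from_now = int(time.time()) + 3600
print(signed_get_url("example-bucket", "trailers/clip 1.flv",
                     "EXAMPLEKEYID", "example-secret", one_hour_from_now))
```

Because the expiry time is part of the signed string, a client cannot extend the validity
of a link without the secret key.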

   Buckets are a simple way for S3 to group objects together much like a folder does.
Bucket names have global scope and no one else can create a bucket of the same name.
HTTP log information can also be configured for sibling buckets which can later be used for
data mining tasks.

    Objects are the actual files, along with their metadata, that get stored on the platform.
Objects can be created or deleted, and associated with a set of permissions. Every object is
assigned a key that uniquely identifies the object within a bucket.

                            IV. VIDEO HOSTING ARCHITECTURE

A. Proposed S3 Architecture

    S3 is an online storage service and economy hosting/bandwidth provider. It is an ideal
solution for a small startup just beginning to build a video content storage and sharing
service. Figure 1, below, represents an oversimplified architecture using
Amazon S3 storage as the backbone of a video distribution service. The Content
Management System (CMS) keeps track of all assets contained in the S3 storage space.
When the web client is ready to upload a video, a request is made to the web server, which
in turn creates a unique bucket for the user, if necessary, then creates a unique object for
the client’s file. Next, the web server issues the appropriate unique user-assigned key and
authorization so that the web client can upload the file to the appropriate bucket on S3.
When the web client is ready to access a file, a request is made to the web server, which
then queries the CMS for the proper Amazon Web Services access identifiers, with which the
web client can then access the file directly from S3.
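
    The bookkeeping role of the CMS in this flow can be sketched as follows. This is a
minimal in-memory stand-in, not a design for the actual system: the class and method names
are hypothetical, nothing here talks to S3, and a real implementation would persist the
catalog in a database and issue signed upload and download authorizations.

```python
# In-memory sketch of the CMS bookkeeping described above (hypothetical names).
import uuid
from dataclasses import dataclass, field


@dataclass
class Asset:
    owner: str
    filename: str
    bucket: str
    key: str
    uploaded: bool = False


@dataclass
class Catalog:
    """Maps users to buckets and asset ids to S3 locations."""
    bucket_prefix: str = "videohost"
    buckets: dict = field(default_factory=dict)  # user -> bucket name
    assets: dict = field(default_factory=dict)   # asset id -> Asset

    def bucket_for(self, user):
        # Allocate the user's bucket name on first use ("if necessary").
        if user not in self.buckets:
            self.buckets[user] = "{}-{}".format(self.bucket_prefix, user.lower())
        return self.buckets[user]

    def request_upload(self, user, filename):
        """Reserve a unique object key and return (asset_id, bucket, key)."""
        asset_id = uuid.uuid4().hex
        bucket = self.bucket_for(user)
        key = "{}/{}".format(asset_id, filename)
        self.assets[asset_id] = Asset(user, filename, bucket, key)
        return asset_id, bucket, key

    def confirm_upload(self, asset_id):
        """Mark an asset as available once the client reports a finished upload."""
        self.assets[asset_id].uploaded = True

    def locate(self, asset_id):
        """Return (bucket, key) for an available asset, or None otherwise."""
        asset = self.assets.get(asset_id)
        if asset is None or not asset.uploaded:
            return None
        return asset.bucket, asset.key


catalog = Catalog()
asset_id, bucket, key = catalog.request_upload("Alice", "film.iso")
catalog.confirm_upload(asset_id)
print(catalog.locate(asset_id))
```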

                   [Diagram: Client, Web server / CMS, and S3]

                       Figure 1 Oversimplified Video Hosting Architecture

    S3 is built with a minimal set of features. Though APIs are provided to interface with S3,
actual software utilities are sparse. Some tools do exist, such as S3 Organizer, which
integrates into the Firefox browser, S3 Sync, written in Ruby, and professional offerings such
as JungleDisk. However, these offerings are geared more towards backup operations
between a user’s client machine and S3 than towards large-scale management of assets
between S3, web servers, and client browsers.

C. Issues

  •   May suffer from latency when compared to CDNs.
  •   May still need to host the most popular content on CDNs.
  •   No server-side processing; a separate server is still needed to run scripts or
      to access a database.
  •   Need a mechanism to handle read/write failures.
  •   Must build your own software.
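
    One common way to handle the read/write failures listed above is to retry the operation
with exponential backoff. The sketch below is a generic illustration, not part of any S3
library; the helper name, default delays, and the choice of IOError as the retryable error
are assumptions.

```python
# Generic retry-with-backoff helper (hypothetical; not tied to any S3 client library).
import random
import time


def with_retries(operation, attempts=5, base_delay=0.5, max_delay=8.0,
                 retry_on=(IOError,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a stand-in operation that fails twice before succeeding.
state = {"calls": 0}


def flaky_upload():
    state["calls"] += 1
    if state["calls"] < 3:
        raise IOError("transient failure")
    return "stored"


print(with_retries(flaky_upload, sleep=lambda seconds: None))
```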

D. What’s Left

  •   Still lots of work left to do.
  •   Create more detailed architecture.
  •   Work out coding details.
  •   Begin implementing architecture and judging performance.

E. Future

  •   Fully integrate into content management systems.
  •   Integrate Amazon EC2 services for on-demand computing power.
  •   Experiment with Amazon’s BitTorrent services for greater throughput.

                                         V. SUMMARY

A. Conclusion

    Video sharing is hugely popular today, and online distribution of video is becoming more
accepted as the quality and speed of video downloads continue to increase. For a small
startup, it can be dauntingly hard to enter the market given the amount of hardware
required to build a reliable and scalable hosting solution. Old standards like content delivery
networks usually require prepayments for chunks of storage and bandwidth that may never
be used. Of course, surpassing the allocated storage or bandwidth limits usually results in
steep costs as well.

    Amazon Web Services is paving the way for developing new applications based on
utility-style computing and storage services. With an on-demand supply of inexpensive
storage and CPU power, infrastructures no longer have to be sized for anticipated
traffic, and their owners no longer have to pay for idle capacity. Furthermore, investments
won’t have to be made in a large amount of hosting infrastructure or services just to handle
occasional traffic spikes.

    The video hosting architecture presented herein allows a scalable and reliable solution
to be built at minimal cost. It makes it possible for the smaller
startup to compete with huge, deep-pocketed companies without having to raise substantial
amounts of cash for hardware.


[1] Independent Movie Market, indie, February 2008,
[2] Apple tunes into movie-rental market by Wailin Wong, Chicago Tribune Web Edition,
    Chicago, IL, January 16, 2008, http://www.chicagotribune.com/business/chi-
    wed_applejan16, 0,7918475.story?coll=chi-home page-fea.
[3] Insight Research: Streaming Content to Generate $70 Billion By 2013, Seeking Alpha,
    April 2008, http://seekingalpha.com/article/70673-streaming-content-to-generate-70-
[4] Theatrical Market Statistics 2007, MPAA, March 2008,
[5] Web 2.0, Wikipedia, March 2008, http: //
[6] SaaS, Wikipedia, March 2008,
[7] Utility Computing, Wikipedia, March 2008,
[8] Content Delivery Network, Wikipedia, March 2008, tribution_network
[9] Content Delivery Video Pricing Rises In the First Half of This Year, April 2008,
[10],2933,203959,00.html July 18, 2006.
[11] Guide to Video Marketing on YouTube, Search Engine Journal, February 2008,
[12] Google Architecture, January 2008,
[13] Amazon Simple Storage Service, March 2008,
