wikimedia-media-recoding

Shared by: Kerala g
posted:
12/15/2011
Comprehensive Automated Multimedia Recoding for Wikimedia



Benefits to Wikimedia:

Wikimedia will be poised to present audio and video contributions, uploaded in a wide
range of multimedia formats, through an in-browser player.



Synopsis:

A quick analysis of contributions will occur at upload time to determine their type and, in
the case of audio or video files, whether the server knows how to recode them to an Ogg
format. In the case of files that can be recoded, the user will receive a simple “upload
successful” message, and the source file will be added to a queue of recoding jobs. In the
case of eccentric formats that the server can’t decode, the user will be informed that their
contribution won’t be available on the in-browser player, and directed to help on how to
upload contributions in compatible formats.
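
The upload-time check described above could start by inspecting a file's leading bytes. The following is a minimal sketch in Python (used here only for illustration; the proposal itself targets PHP). The function names and the RECODABLE set are hypothetical, and a real analyzer would rely on a library such as getID3 for far more thorough detection.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    RECODABLE = "recodable"      # enqueue for recoding, report "upload successful"
    UNSUPPORTED = "unsupported"  # tell the contributor, point them to the help page


# Assumption: containers this hypothetical server has decoders for.
RECODABLE = {"ogg", "flac", "matroska", "wav", "avi", "mp4", "mp3"}


def sniff_container(header: bytes) -> Optional[str]:
    """Guess a container format from the first bytes of an upload."""
    if header.startswith(b"OggS"):
        return "ogg"
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "matroska"  # EBML header, shared by Matroska and WebM
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[4:8] == b"ftyp":
        return "mp4"  # ISO base media family (MP4, MOV, 3GP)
    if header.startswith(b"ID3"):
        return "mp3"  # only MP3 files that begin with an ID3v2 tag
    return None


def analyze_upload(header: bytes) -> Verdict:
    """Map the sniffed container to the message the contributor should see."""
    container = sniff_container(header)
    return Verdict.RECODABLE if container in RECODABLE else Verdict.UNSUPPORTED
```

Signature sniffing only identifies the container; deciding whether the codecs inside can actually be decoded would need a deeper probe.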

A second script (running perhaps as a cron job) will routinely monitor the job queue,
dispatching new jobs to the appropriate recoding software and monitoring its progress.
Upon job completion, the script will update the necessary databases to reflect the recoded
file’s availability. The script could also report jobs that fail unexpectedly at this stage to
editors/administrators, much as unsupported uploads are reported to contributors.
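
One pass of such a monitoring script might look like the sketch below. It is written in Python for illustration and makes several assumptions: jobs are plain dictionaries with a "status" field, and build_command and report_failure are hypothetical callbacks supplied by the caller. Jobs run sequentially, one at a time.

```python
import subprocess
import sys
from typing import Callable, Dict, List

Job = Dict[str, object]


def run_pass(
    jobs: List[Job],
    build_command: Callable[[Job], List[str]],
    report_failure: Callable[[Job, str], None],
) -> None:
    """Run every queued job once, recording success or failure on the job."""
    for job in jobs:
        if job["status"] != "queued":
            continue
        job["status"] = "running"
        try:
            result = subprocess.run(build_command(job), capture_output=True)
        except OSError as exc:
            # The recoding program is missing or could not be started.
            job["status"] = "failed"
            report_failure(job, str(exc))
            continue
        if result.returncode == 0:
            job["status"] = "done"  # the recoded file can now be advertised
        else:
            job["status"] = "failed"
            report_failure(job, result.stderr.decode(errors="replace"))
```

A cron entry would call run_pass once per invocation, while a daemon-style variant would wrap it in a loop with a sleep between passes.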



Deliverables:

* An upload analyzer script that rapidly determines if a given file is a valid audio/video
type, and if so determines the feasibility of recoding it to Vorbis or Theora. It will report
its findings to the contributors at upload time, and enqueue suitable jobs for recoding.

* A recode manager script, designed to run either daemon-style or as a frequently
recurring cron job, that monitors the recoding job queue, issues commands to the necessary
applications to decode/recode many file types, and updates the appropriate databases when
jobs finish. At minimum, I would make use of ffmpeg and mplayer to support the formats
they can decode. One job would be run at a time per server.

* Minor tweaks to existing media playback pages to make them aware of complete/not
yet available recode jobs. Ideally this would be in collaboration with another student
working on the in-browser player.



Development Methodology:

There is a wealth of software I can make use of to help in the basic task, some of it even
native to PHP – see for example getID3.sourceforge.net, which can tell me a whole lot
about what an upload is or isn’t right from the get-go. Using information from this and
other utilities accessed through command-line interfaces, I can establish exactly what an
uploaded file is and use this information to determine what software / commands will be
necessary to decode/resize/downsample it for use in streaming to a browser-based player.
I anticipate using ffmpeg2theora as the final step in all video encoding jobs.
ffmpeg would also be a reasonable means of encoding Vorbis files, but I might choose
something else.
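
To make the command-selection step concrete, here is a hedged Python sketch that builds (but does not run) the final encoding command. The function name, the "kind" argument, and the output directory are hypothetical. The ffmpeg line is one possible choice for Vorbis audio, not a settled decision, and it assumes an ffmpeg build that includes the libvorbis encoder.

```python
from pathlib import Path
from typing import List


def build_recode_command(source: str, kind: str, out_dir: str) -> List[str]:
    """Return the argument list for the last encoding step of a job."""
    src = Path(source)
    if kind == "video":
        # ffmpeg2theora writes Ogg Theora; -o names the output file.
        target = Path(out_dir) / (src.stem + ".ogv")
        return ["ffmpeg2theora", str(src), "-o", str(target)]
    if kind == "audio":
        # -vn drops any video stream; -acodec selects the audio encoder.
        target = Path(out_dir) / (src.stem + ".ogg")
        return ["ffmpeg", "-i", str(src), "-vn", "-acodec", "libvorbis", str(target)]
    raise ValueError("unknown media kind: %r" % kind)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual upload file names.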

The queue would likely be implemented as a single MySQL database table, containing
source file information, job status, and queue order. This design should be easily
adaptable to have the recode processing occur on another machine (or even multiple
machines) in a server cluster, especially if the servers have access to a shared MySQL
database. With that, all that is necessary is a means for the recoding system(s) to access
the original file over the network. The recoding script would then just download or mount
the source file after selecting an unclaimed job from the queue.
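
A single-table queue with a safe "claim" step could be sketched as follows. This example uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table and column names are hypothetical, and the auto-incrementing id doubles as the queue order. The conditional UPDATE is what prevents two recoding machines from claiming the same job.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE recode_queue (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- also serves as queue order
    source_path TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'queued',     -- queued / running / done / failed
    claimed_by  TEXT
)
"""


def enqueue(db: sqlite3.Connection, source_path: str) -> int:
    """Add a source file to the end of the queue and return its job id."""
    cur = db.execute(
        "INSERT INTO recode_queue (source_path) VALUES (?)", (source_path,)
    )
    db.commit()
    return cur.lastrowid


def claim_next(db: sqlite3.Connection, worker: str) -> Optional[int]:
    """Claim the oldest unclaimed job for this worker, or return None."""
    while True:
        row = db.execute(
            "SELECT id FROM recode_queue WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # Only succeeds if nobody else claimed the job since the SELECT.
        cur = db.execute(
            "UPDATE recode_queue SET status = 'running', claimed_by = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row[0]
```

For example, two workers sharing one database would each call claim_next with their own name and then fetch or mount the source file named in the claimed row.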



In projects like this, I have found that 98% of the work is maximizing compatibility with
as many media sources as possible. This means researching the most common formats
users are likely to upload content in (existing usage data from Commons would be a good
starting point), obtaining as many sample files in those formats as possible, and testing
them with both the upload analyzer and recode components. Often someone’s $40
hardware encoder or meta-information writer isn’t quite following official specifications,
causing either the upload analyzer or decoder to choke; uncovering problems like this
and finding workarounds to as many of them as possible is where I expect much of my time
will be invested in the later stages of the project.



An interesting challenge is investigating the possibility of developing code that behaves
reasonably on platforms not administered by Wikimedia, since they would not
necessarily provide the same set of multimedia analyzing and recoding utilities. This
would be a necessary step if this project were to be incorporated in the mainstream
MediaWiki. While my primary development goal will be achieving flawless performance
with one homogeneous set of utilities for use on properly configured servers, if I have time
I will also take a look at probing Linux-like systems for common codecs and utilities. The
upload analyzer and recode manager would then take available utilities into consideration
when determining if an upload can be recoded.
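
Probing a system for the utilities named in this proposal could be as simple as searching the executable path. The sketch below is in Python for illustration; the function names and the rule mapping media kinds to tools are assumptions, and it checks only that a program exists, not which codecs it was built with.

```python
import shutil
from typing import Dict, Iterable, Optional

# Utilities mentioned in this proposal.
CANDIDATE_TOOLS = ("ffmpeg", "ffmpeg2theora", "mplayer")


def probe_tools(names: Iterable[str] = CANDIDATE_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None when it is not installed."""
    return {name: shutil.which(name) for name in names}


def can_recode(kind: str, tools: Dict[str, Optional[str]]) -> bool:
    """Decide whether this system has what is needed for a media kind."""
    if kind == "video":
        return tools.get("ffmpeg2theora") is not None
    if kind == "audio":
        return tools.get("ffmpeg") is not None
    return False
```

The upload analyzer could call probe_tools once at startup and consult can_recode before promising a contributor that a file will be recoded.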



Timeline:

Week 1: Search for and become familiar with additional open-source utilities that could
assist me in analyzing or recoding uploaded media.

Weeks 2-3: Build it. I expect a first implementation of the main deliverables can be
completed in this time.

A day or two around this time could be spent adapting media viewing pages on test
servers to make use of the new database information on ready/not yet available resources.

Weeks 4-8?: Testing against the numerous multimedia formats. Scour the Commons and
other multimedia repositories for content in a diverse range of formats, run it all through
the recoder, and fix issues as they are found.

Remainder: Assuming quality results are reliably obtained by this time, work on
extending the project to do the best/most possible with the set of utilities given on a
particular system. The intent would be to make it usable in mainstream MediaWiki.



Bio:

I am an undergraduate at the University of Minnesota Duluth, where I major in Computer
Science. I am self-taught in web development, but have had direct hands-on experience
with PHP and MySQL for nearly five years, mostly building from-scratch web applications
for various clients acquired by word of mouth. Of these, two have required in-depth
analysis of multimedia files at upload time to determine their specifications, metadata,
etc. One required real-time audio recoding on a Linux platform (see
http://www.wazee.org/wmd/). In another I implemented an asynchronous recoding job
queue capable of providing done/not done status directly to the site’s end users, as would
be required for this project.


