LUCENE REVOLUTION San Francisco 2011




Welcome to San Francisco!
We are excited to be bringing you the second Lucene Revolution event, following quickly on the
success of our 2010 conference in Boston. In addition to all the great feedback we received
after Boston, many people asked about bringing the conference to the West Coast – and here we
are. It’s great to host the community here in our home state of California.
There’s now no question: the revolution is in full swing, and Lucene and Solr are shaping the future
of search. The diverse range of search technology and applications is without a doubt one of its
greatest strengths. For the extended community and ecosystem of open source search, Lucene
Revolution is an unmatched opportunity to learn, network, share experiences, and see how others have
changed the world of search.
Speakers here at the conference hail from companies large and small, from innovative startups and
established companies, as well as from government, academia and non-profits. Even better, the
range of experience and application interests of your fellow attendees should inspire you to seek out
new ways to put search technology to work. We’ve allotted ample time in breaks for formal and
informal conversations. And be sure to join the Revolution social network at:
http://lucene.crowdvine.com/. Keep an eye out at the Registration Desk for agenda changes and
updates.
One group you should definitely seek out here is the core group of developers and committers who
are the heart and soul of the Apache Lucene/Solr project. You know them from the mailing lists;
these are the people who do the hard work of making the code do its magic, resolving challenging
technical and architectural issues that we all benefit from. Don’t just attend their roadmap panel and
technical sessions; make sure you avail yourself of the opportunity to put faces to names, so that
when you’re on the mailing lists, you’ll have more than a ‘to’ and a ‘from’ to go by.
As the commercial entity for Lucene/Solr, we at Lucid Imagination are always looking for new ways
to help make the most of open source search. Be sure to tell us what you like, what could be
improved, and what topics should be covered in future events. Think about sharing your own
successes with the community by speaking at the next Lucene Revolution.
Let the conference staff, or anyone on the Lucid Imagination team, know if you have any questions,
or if there’s anything you need.
Onward to the revolution!
Eric Gries, CEO
Lucid Imagination



Opening Letter
Contents
Timetable at a Glance
Agenda
About Lucid Imagination
About Our Sponsors
Training
Keynotes
Sessions–Day 1
Lightning Talks
Sessions–Day 2
Speaker Bios
Hotel, Maps & Transportation Info




Lucene, Apache Lucene, Solr, Apache Solr, Hadoop, Apache Hadoop and other Apache projects mentioned are trademarks of The Apache Software Foundation.



SUNDAY, MAY 22
16:00 – 18:00   REGISTRATION OPEN
                Sandpebble Foyer outside Grand Peninsula Ballroom

MONDAY, MAY 23
8:00 – 9:00     TRAINING REGISTRATION OPEN
9:00 – 17:00    Training Workshops, Day 1
                • Solr Application Development Workshop
                • Developing Search Applications with LucidWorks Enterprise
                • Lucene Application Development Workshop
                • Scaling Search with Solr and Big Data
                See the registration desk in the Sandpebble Foyer for room assignments.

TUESDAY, MAY 24
8:00 – 9:00     TRAINING REGISTRATION OPEN
9:00 – 17:00    Training Workshops, Day 2
                • Solr Application Development Workshop
                • Developing Search Applications with LucidWorks Enterprise
                • Lucene Application Development Workshop
                • Scaling Search with Solr and Big Data
16:00 – 18:00   Ticket pickup for Giants Game (advance tickets required). Tickets may be picked up
                at the Conference Registration Desk in the Sandpebble Foyer.
18:00           Buses depart for Giants Game from the front entrance of the Hyatt Hotel




WEDNESDAY, MAY 25
7:30 – 18:00    REGISTRATION OPEN
7:30 – 8:30     Light Breakfast Available
8:30 – 10:05    Welcome & Keynotes
                Welcome: Eric Gries, Lucid Imagination
                Keynotes: Marc Krellenstein, Lucid Imagination
                          Stephen Dunn, The Guardian News and Media
10:05 – 10:35   BREAK
10:35 – 11:25   Technical Track Sessions
11:25 – 11:35   BREAK
11:35 – 12:25   Technical Track Sessions
12:25 – 13:30   LUNCH AND SPONSOR EXHIBITS
13:30 – 14:20   Technical Track Sessions
14:20 – 14:30   BREAK
14:30 – 15:20   Technical Track Sessions
15:20 – 15:50   BREAK
15:50 – 16:40   Panel: “Stump the Chump”
16:40 – 17:00   BREAK
17:00 – 18:30   Lightning Talks
18:30           REVOLUTION PARTY

THURSDAY, MAY 26
7:45 – 8:45     Light Breakfast Available
8:45 – 10:15    Keynote: Stephen O’Grady, Redmonk
                Panel: Committers Q&A, Lucene/Solr Roadmap
10:15 – 10:45   BREAK
10:45 – 11:35   Technical Track Sessions
11:35 – 11:45   BREAK
11:45 – 12:35   Technical Track Sessions
12:35 – 13:45   LUNCH AND SPONSOR EXHIBITS
13:45 – 14:35   Technical Track Sessions
14:35 – 14:45   BREAK
14:45 – 15:35   Technical Track Sessions
15:35 – 15:45   BREAK
15:45 – 16:35   Technical Track Sessions
16:35 – 17:30   Panel: “Search for Tomorrow (RDBMS for Yesterday)”
17:30           CONFERENCE ENDS




LOGISTICS
       •   REGISTRATION is in the Grand Peninsula Foyer
       •   KEYNOTES and PANEL DISCUSSIONS are in Grand Peninsula Ballroom D
       •   TRACK 1 is in Grand Peninsula Ballroom A/B/C
       •   TRACK 2 is in Grand Peninsula Ballroom D
       •   TRACK 3 is in Grand Peninsula Ballroom E/F/G
       •   TRACK 4 is in Sandpebble A/B/C
       •   LUNCHES are in the Atrium (upstairs, above the Ballroom)
       •   THE REVOLUTION PARTY is in the Grand Peninsula Foyer
       •   TRAINING CLASSES will be held in the Sandpebble Conference Rooms
       •   TRAINING REGISTRATION is outside the Sandpebble Conference Rooms
           (please contact charelm@gmail.com if you are unsure which class you are in).




As the world’s leading source of expertise in open source search technology and the
commercial company for Apache Solr/Lucene, Lucid Imagination offers the products and
services you need for cost-effective development and production deployment of cutting-edge search
applications that lower your cost of growth. Thousands of organizations around the world have
turned to the power of Apache Solr/Lucene open source technology to drive their search applications.

LucidWorks: Enterprise-Grade Solr/Lucene
LucidWorks Enterprise is a flexible, cost-effective, scalable platform that simplifies development,
tuning, configuration and deployment of Solr/Lucene open source search technology. It features:

POWERFUL SEARCH
•   Complete Apache Solr 4.x Release: integrated and tested, with powerful enhancements
•   Scalability: distributed search and indexing
•   Cloud-Ready: centrally managed search replication and configuration
•   REST API: simplifies integration
SIMPLIFIED ADMINISTRATION
•   Easy-to-use Installer & Admin UI: streamlines startup and common configuration tasks
•   Data Connectors for databases, file systems, Web sites, SharePoint and more
•   Multiple file types: MS Office, PDF, native XML format documents and more
•   Security: LDAP-aware, document-level, role-based, policy-driven
ADVANCED USER EXPERIENCE
•   Enriched Query Parsing: more resilient interpretation of user input
•   Click Scoring: boosts results based on user behavior
•   User Alerts: automatic notification of new results
•   Integrated auto-complete and spellchecking




Global Expertise: Training & 24x7 Services
Lucid Imagination offers a deep bench of resources in search and open source, backed by
unmatched experience with thousands of diverse search applications at the world’s largest
companies.
TRAINING
A comprehensive selection of courses and classes for developers, system administrators, managers,
and search application users on LucidWorks Enterprise, Solr and Lucene; instruction is offered in
a variety of formats around the world.
CONSULTING
Our unique ExpertLink Advisory Services provide consultative guidance on design and
optimization for search applications during development and production to ensure your
Lucene/Solr implementations meet the requirements of your business.
ENTERPRISE SUPPORT AND SUBSCRIPTIONS
Lucid Imagination offers attractively priced subscriptions that deliver Solr/Lucene technology in an
integrated, well-packaged format. Subscriptions combine stability, security, robust interfaces, and
predictable release schedules with unmatched support resources within reach 24x7x365 across the
globe.




Platinum Sponsor: Basis Technology
Basis Technology provides software solutions for multilingual text analytics, information retrieval,
and name resolution. Our Rosette© Linguistics Platform is the text analysis engine behind many
commercial and government search-based applications, adding language support to Lucene and Solr
for better search precision and recall in English or 27 other languages. Starting with language
identification in 55 languages, our high quality linguistic analysis seamlessly integrates into Lucene
and Solr via a connector — enabling customizable tokenization and stemming/lemmatization for
languages like Chinese, Japanese, Arabic, and Persian. Dictionary-based decompounding is available
in German, Dutch, Danish, Swedish, Norwegian, and Korean. Entity extraction enriches search by
adding auto-generated metadata and faceted navigation to results. Adding support for new
languages to Solr takes less than a day’s work.
The Rosette Platform powers search, business intelligence, e-discovery, and other enterprise and
government applications for customers worldwide including: Microsoft/Bing, Cisco, EMC, Endeca,
Oracle, and Yahoo!
www.basistech.com




Exhibitors
SALESFORCE.COM
Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing
company. We’re also one of the “Best Places to Work” (FORTUNE). Salesforce.com’s Search Team
is strong and experienced, with deep architecture expertise. We’re dedicated to delivering the fastest,
most reliable cloud-scale enterprise search in the industry. In addition to innovating around
scalability and security, we strive to delight our end users with an original, intuitive user experience
and relevancy that’s adaptive, robust, and deeply satisfying. If you share our passion for search and
for solving tough problems, swing by our booth to chat.
www.salesforce.com
SEARCH TECHNOLOGIES
Search Technologies is the leading independent provider of search engine integration and support
services. Operating internationally, we help clients to gain business advantage using search. Our
technical team of more than 80 experts is the most experienced group of search implementation
professionals globally, and this mitigates risk for our customers. In short, we are the experts at fine-
tuning search applications to deliver business benefits.
www.searchtechnologies.com
DOCUMILL
Documill is an independent software vendor (ISV) enabling browser-based access to Microsoft
Office and PDF documents and empowering high volume server-side content processing
solutions. Documill Visual Search dramatically improves search user experience and discoverability
of multi-page documents. Instant document previews and page-level search results improve
document data mining experience and accuracy. With page-level bookmarking features, Documill
Visual Search enables collaborative search, allowing users to take actions based on their findings,
share results and syndicate relevant pages into new documents.
www.documill.com




Community Sponsors
SEMATEXT
Sematext is a software products and services company focused on Search & Analytics using Lucene,
Solr, Nutch, Hadoop, HBase, Flume, Mahout, and other open-source technologies. Sematext also
offers Lucene & Solr technical support subscriptions, consulting packages, and training. The
company also runs the popular search-hadoop.com and search-lucene.com sites. Founded in 2007 in
New York, Sematext is privately held and self-funded with presence in North America and Europe.
Sematext’s customers include The Library of Congress, Lockheed Martin, Simon & Schuster,
Salesforce, NAVTEQ, Comcast, Cox Communications, ProQuest, Citysearch, Gilt Groupe,
Autodesk, and many others.
www.sematext.com
EMC CORPORATION
EMC Corporation is the world’s leading developer and provider of information infrastructure
technology and solutions that enable organizations of all sizes to transform the way they compete
and create value from their information. We can help you design, build, and manage flexible, scalable,
and secure information infrastructures. And with these infrastructures, you’ll be able to intelligently
and efficiently store, protect, and manage your information so that it can be made accessible,
searchable, shareable, and, ultimately, actionable. In short, with an information infrastructure, you
can avoid the potentially serious risks and reduce the significant costs associated with managing
information, while fully exploiting its value for business advantage.
www.emc.com
SPRINGSOURCE, A DIVISION OF VMWARE, INC.
SpringSource, a division of VMware, Inc. (NYSE: VMW), employs the open source leaders who
created and drive innovation for Spring, the de facto standard programming model for enterprise
Java applications, as well as the Java and web thought leaders within the Apache Tomcat, Apache
HTTP Server, RabbitMQ, Hyperic, Groovy and Grails open source communities. SpringSource
forges open source innovations to create lean and powerful technology that people love to use.
From high-productivity developer tools and frameworks to lightweight application server runtimes
and data management solutions for the hardest enterprise and cloud-scale problems,
SpringSource provides solutions for tomorrow’s enterprise challenges.
www.springsource.com




MANNING PUBLICATIONS
Manning Publications offers computer books for professionals—programmers, system
administrators, designers, architects, managers and others. Manning’s focus is on computing titles at
professional levels. We care about the quality of our books. Our books are designed without
gimmicks. Their main goal is elegance and readability—we feel the two are often the same. Our
covers are understated, decorated with pictures of worldwide regional dress habits of two hundred
years ago. Many of our books come with online reader support: authors answer the questions of
their readers in our Web-based Author Online discussion forums.
www.manning.com
DZONE
DZone is a social linking and blogging network for the developer and IT communities. According to
PC Magazine, “DZone is a developer’s dream—a vast network of user-submitted links to message
boards, news, coding tricks, and more.” Launched in June 2006, DZone is in Alexa’s top 3000 sites,
surpassing established leaders like DevX, Sys-con, FTP Online and TheServerSide.com. DZone is
the only vertically focused site regularly listed among the web’s largest social bookmarking sites. In
its first year of operation DZone sent over 5 million visitors to other developer websites. Today,
DZone has curated topic pages for Java, Solr/Lucene, Cloud Computing, PHP, Agile, Mobile, and
much more.
www.dzone.com
TNR GLOBAL
TNR Global is a systems design and integration company focused on enterprise search and cloud
computing solutions. TNR develops scalable, fault-tolerant web-based search solutions built on the
open source LAMP stack and utilizing Amazon Web Services and/or physical servers. TNR has
over ten years of experience in web systems and enterprise search implementations, both proprietary
and open source, and specializes in Lucene/Solr and FAST ESP search applications. TNR Global
builds solutions for vertical search engines, publishing, web directories, news sites, information
portals, web catalogs, and education. We also work with web-based startups to build scalable services.
www.tnrglobal.com
UCHIDA SPECTRUM
Uchida Spectrum, Inc. (USI) is a leader in the Japanese search market. USI provides SMART/InSight, a
search application that integrates and analyzes enterprise information. SMART/InSight is used by
leading blue-chip companies such as Canon and Moody’s. USI is working with Lucid Imagination as its Strategic
Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support
services. In 2011, USI expanded its offerings to Enterprise Search and Web Services/Ecommerce
companies across Asia. USI now serves clients and partners in Japan, India, China and Singapore.
www.spectrum.co.jp



Scaling Search With Big Data And Solr
Scaling Search with Big Data and Solr is a 2-day instructor-led, hands-on classroom training course
delivered by instructors certified by Lucid in a shared classroom setting. The class is for Solr
developers who want to know how to leverage the flexible search functionality of Apache Solr and
the Big Data processing of Apache Hadoop, to create the indexes for both general search and
augmented data analytics. Lab exercises and real-world examples will be used to reinforce content.
We’ll start with Hadoop from the ground up, and cover MapReduce, HDFS—the Hadoop
Distributed File System, cluster management, “the shuffle,” etc., before continuing on to connecting
it to Solr. We’ll look at common use cases for generating search indexes from big data, typical
patterns for the data processing workflow, and how to make it all work reliably at scale. We will
explore in-depth an example of processing 1 billion records to create a faceted Solr search solution.
You’ll learn how Solr can be used as a NoSQL solution, and how it compares to classic NoSQL
projects such as Cassandra and HBase.
The class will continue with techniques for scaling your Solr installation, how to identify bottlenecks
in your Solr installation, how to monitor your installation, and how to determine resource usage. We’ll
also cover various Solr architectures, their characteristics and use cases. We’ll examine how to apply
these to make appropriate tradeoffs to effectively scale your Solr installation.
THE COURSE COVERS
       •   An overview of Hadoop.
       •   Understanding MapReduce.
       •   Principles of Hadoop development, operations & ecosystem.
       •   How to use Hadoop with Solr.
       •   How to index large volumes of data.
       •   How to effectively search large indexes.
       •   Understanding NoSQL.
       •   How to shard/federate/replicate your data for large indexes.
       •   Understanding resource costs & tradeoffs for Solr features.
PREREQUISITES
Prospective students should be familiar with Solr, obtained either through work experience with
Solr, or having completed the Lucid Imagination Solr training course. It is assumed the student does
not have prior Hadoop experience.




Developing Search Applications With LucidWorks Enterprise
Developing Search Applications with LucidWorks Enterprise is a 2-day instructor-led, hands-on
classroom training course designed and developed by the engineers who developed LucidWorks
Enterprise (LWE), and delivered by instructors certified by Lucid in a shared classroom setting.
The objective of this course is to introduce LucidWorks Enterprise to users with no previous
experience working with search applications. Through a combination of lectures and hands-on lab
exercises you will learn how to get up and running with LucidWorks Enterprise, what the
components of a search application are, and how to make your content searchable and findable in a
search application built on LucidWorks Enterprise. There will be time for questions and discussion
to enhance your learning experience.
At the end of the course you will know what a search application is, and how to set up and use
LucidWorks Enterprise to index and search your content. You will also learn about all of the
features of LWE, such as highlighting, spell checking, and custom alerts, and how to use these features
to build a satisfying search experience for end users who will search your content.
THE COURSE COVERS
       •   What a search application is and how to build one with LucidWorks Enterprise.
       •   How to install and configure LWE.
       •   How to make your content searchable and findable.
       •   How to work with different data sources such as web pages, relational databases, and
           rich content files.
       •   How to build queries to search for content in LWE.
       •   Techniques and features in LWE that can be used to make results for end users more
           relevant.
       •   Different ways to process search results returned by LWE.
PREREQUISITES
No programming skills are necessary; however, some technical background and familiarity with
application development will be helpful. There will be labs accompanying the lectures that will
require basic computer skills, including how to run a simple command from the command line. No
previous experience with search applications is necessary.




Solr Application Development Workshop
Solr Application Development Workshop is a two-day hands-on training course designed and
developed by the engineers who helped write the Apache Lucene/Solr code, and delivered by
instructors certified by Lucid in a shared classroom setting. The workshop is targeted at developers
who want to build applications with Apache Solr, the Lucene Search Server. You will learn how to
set up and use Solr to index and search, how to analyze and solve common problems, and how to
use optional Solr modules such as facets, spell check, and highlighting. Lab exercises and real-world
examples will be used to reinforce content.
There will be time for questions and discussion to enhance your learning experience. At the end of
the course you will understand how to set up and use Solr to index and search, how to analyze and
solve common problems, and how to use optional Solr modules such as facets, spell check, and
highlighting.
THE COURSE COVERS
       •   Principles of search application development
       •   Common search use cases and their application
       •   How to make content searchable
       •   Key Solr and Lucene concepts
       •   Basics of indexing and searching using Solr
       •   How to design and run a Solr application
       •   Best practices for indexing, searching and performance
       •   Techniques to analyze and resolve common search problems
       •   How to leverage Solr’s optional modules including spell checking, highlighting, Data
           Import Handler, Tika Integration and other popular capabilities
       •   Advanced topics in designing Solr apps and running a site
       •   Solr operations and deployment tools and strategies
       •   How to customize and extend Solr
PREREQUISITES
Some programming skill and experience with a modern programming language such as Java, PHP,
Perl, Ruby, .NET, or any language that supports HTTP and/or XML.






Lucene Application Development Workshop
Lucene Application Development Workshop is a two-day, instructor-led, hands-on training
workshop, written and led by the engineers who helped write the Apache Lucene/Solr code. The
objective of this course is to provide you with real-life use cases and teach you how to apply Lucene
to real business requirements. During the course you will learn to apply best practices in developing
scalable, highly available and high-performance search applications.
There will be time for questions and discussion to enhance your learning experience.
THE COURSE COVERS
       •   Principles of search application development.
       •   Common search use cases and their application.
       •   How to make content searchable.
       •   Key Lucene concepts.
       •   Basics of indexing and searching with the Lucene APIs.
       •   Best practices for indexing, searching and performance.
       •   Analysis techniques for solving common search problems.
       •   Lucene internals.
       •   Lucene’s optional modules to enable spell checking, highlighting and other common
           search features.
PREREQUISITES
Basic Java programming skills








The Once and Future History
of Enterprise Search and Open Source
MARC KRELLENSTEIN | LUCID IMAGINATION
While it remains challenging to build best practice search applications, core search technology has
become commoditized. Open source Lucene/Solr represents the best form of that commodity, as
good as or better than any commercial search technology while also providing the cost, control and
flexibility advantages of open source. In this talk, we’ll look at how past challenges in search were
met and new ones evolved, and the place of Lucene/Solr in that evolution.

From Publisher To Platform: How The Guardian
Embraced the Internet using Content, Search, and Open Source
STEPHEN DUNN | GUARDIAN NEWS AND MEDIA UK
In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable
content partners and developers to build applications with The Guardian’s rich content. The content
API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back
to 1999 (over 1 million articles) and is an increasingly complete representation of the output of the
organization. The DataStore contains curated data sets for use in applications and visualizations.
This talk will cover how The Guardian opened up its business, enriched it, and reached new
markets with its Open Platform strategy. Stephen will cover the technical architecture, the
implementation of Solr (the key technology powering the platform), and how The Guardian has
used it to embrace disruption in the media space while finding new sources of revenue and
innovation. Two years after the launch, Stephen will share some of the lessons learned, and
explain how The Guardian complements its use of Solr with other open-source non-relational
technology as its platform evolves.

All Data Big and Small
STEPHEN O’GRADY | REDMONK
The last twenty-four months have seen a veritable explosion in discussion around what is commonly
referred to as Big Data and the infrastructure technology employed to manage it. The wealth of
available open source software means that businesses from any industry have easily accessible tools
with which to tackle projects that would have been out of their reach just a few years prior. Less
heralded, however, has been the fact that making data actually useful, whatever its size, remains a
challenge. In this session we’ll explore the role of search in putting data, big and small, to work
answering the important questions for businesses and society by reducing the friction between
question and answer.





Integrating Advanced Text Analytics into Solr
STEVE KEARNS | BASIS TECHNOLOGY
Text analytics provides a number of interesting analytic capabilities that can enhance enterprise
search applications, though in practice it is not always obvious how these can be integrated
effectively into Solr. This presentation will describe some of the practical ways that leading
organizations are using text analytics by integrating it directly into Solr and their user interfaces to
improve relevance, navigate results, and discover new information. The combination of Solr and
quality text analytics can improve existing keyword search solutions, and enable new ways of
discovering knowledge hidden in existing data.

Finite State Automata in Lucene: Internals and Applications
DAWID WEISS | POZNAN UNIVERSITY OF TECHNOLOGY, POLAND
Finite state automata and transducers made it into Lucene fairly recently, but already show a very
promising impact on search performance. This data structure is rarely exploited because it is
commonly (and unfairly) associated with high complexity. During the talk, I will try to show that
automata and transducers are in fact very simple, their construction can be very efficient (memory
and time-wise) and their field of applications very broad. This will be backed by an introduction to
how FSTs are implemented in Lucene (construction and traversals) and practical use cases of where
FSTs have been useful so far. If you’d like to see how to squeeze 150 MB of text data into a 1.8 MB
compact data structure, this talk is for you.

Case Study - Panasonic Europe Powered by Apache Solr
DANIEL POTZINGER | AOE MEDIA GMBH
In 2010 Panasonic decided to replace their legacy enterprise search tool and switched the
search for all their European websites to an Apache Solr-based solution. Now their customers benefit
from an incredibly fast and feature-rich solution that is much more than just a search and has
become a valuable sales-driving tool for Panasonic. Features like relevancy manipulation,
autosuggest, and contextual filtering on properties such as color or product category were implemented
under less than ideal circumstances, chiefly because there was no access to structured data. So far the
search has been rolled out in close to 30 countries, which has also put Solr’s multilingual handling to the test.






Real-time Search at Yammer
BORIS ALEKSANDROVSKY | YAMMER, INC.
This talk will focus on the architecture, scalability concerns, performance bottlenecks,
operational characteristics and lessons learned while designing and implementing Yammer’s
distributed real-time search system. Yammer is an enterprise social network SaaS offering with over
100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system
we developed scales well up to 1 billion messages and serves as the foundation for the knowledge base
analysis services Yammer is developing.

Boosting Documents in Solr by Recency,
Popularity and Personal Preferences
TIMOTHY POTTER | NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
Attendees will come away from this presentation with a good understanding of, and access to source
code for, boosting and/or filtering documents by recency, popularity, and personal preferences. My
solution improves upon the common “recipe”-based solution for boosting by document age. The
framework also supports boosting documents by a popularity score, which is calculated and
managed outside the index. I will present a few different ways to calculate popularity in a scalable
manner. Lastly, my solution supports the concept of a personal document collection, where each
user is only interested in a subset of the documents in the index. My presentation
will provide a good example of how to filter and/or boost results based on user preferences, which
is a very common requirement of many Web applications.
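For readers unfamiliar with the common “recipe” mentioned above, here is a minimal Python sketch. It assumes the widely cited Solr function query `recip(ms(NOW,<date_field>),3.16e-11,1,1)`, where Solr's `recip(x,m,a,b)` computes a/(m*x+b). The sketch only mirrors the arithmetic outside Solr to show how the boost decays with document age. It is the baseline recipe, not the speaker's improved solution, and the helper names are hypothetical.

```python
# Hypothetical helpers that reproduce, outside Solr, the arithmetic of the
# common date-boost recipe: recip(ms(NOW,<date_field>), 3.16e-11, 1, 1).

MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # roughly 3.156e10 milliseconds


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms: float, m: float = 3.16e-11) -> float:
    """Boost for a document that is age_ms milliseconds old.

    With m close to 1/MS_PER_YEAR, a brand-new document gets a boost of 1.0,
    a one-year-old document gets about 0.5, and older documents keep
    decaying toward 0 without ever becoming negative.
    """
    if age_ms < 0:
        raise ValueError("document age cannot be negative")
    return recip(age_ms, m, 1.0, 1.0)


if __name__ == "__main__":
    for years in (0, 1, 2, 5):
        print(f"{years} year(s) old -> boost {recency_boost(years * MS_PER_YEAR):.3f}")
```

In Solr itself, the same expression would typically be supplied as a boost function (for example, through the dismax `bf` parameter) against a date field in your schema. `<date_field>` above is a placeholder and does not refer to a field from the talk.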

Jazzed about Solr: People as a Search Problem
JOSHUA TUBERVILLE | EHARMONY
Search-oriented architectures are obvious approaches for web pages, emails, documents, and other
text-based entities. With traditional structured data, text searching is often “added on” to the
traditional Boolean queries in relational stores. When Jazzed was initiated we wanted search to be
front and center. When we evaluated Solr we realized we could take the opposite approach and “add on”
Boolean components to textual searches. This hybrid query approach makes transitioning to flexible
ranking easy and straightforward. In this talk we will cover:
       •   How we model semi-structured user data in Solr
       •   Indexing strategies and their tradeoffs
       •   Where in the Jazzed architecture Solr does and doesn’t fit
       •   What aspects of Solr we are using
       •   Future considerations






Heavy Committing: DocValues
aka Column Stride Fields in Lucene 4.0
SIMON WILLNAUER | APACHE LUCENE PMC
Lucene 4.0 is on its way to delivering a tremendous number of new features and improvements. Besides
Real-Time Search and Flexible Indexing, DocValues (aka Column Stride Fields) is one of the “next
generation” features. DocValues enable Lucene to efficiently store and retrieve type-safe document/value
pairs in a column-stride fashion, either entirely memory-resident with random access or disk-resident
with iterator-based access, without the need to un-invert fields. The ultimate goal is to provide
independently updatable per-document storage for scoring, sorting or even filtering. This talk will
introduce the current state of development, implementation details, its features, and how DocValues
have been integrated into Lucene’s Codec API for full extensibility.

Search, APIs, capability management and the Sensis journey
CRAIG REES | SENSIS
Earlier this year, Sensis launched its Business Search API, which allows publishers to develop local
search propositions powered by the two million business listings contained in the Australian Yellow
Pages® and White Pages® directories.
This case study will explore Sensis’ strategic direction for search and explain how the framework and
metrics by which search is managed at Sensis were used to define our search roadmap. Key
architectural decisions including our use of Solr and MongoDB will be discussed as well as our
approach to real-time search tuning and quality management.

A Study of I/O and Virtualization Performance with
a Search Engine based on an XML database and Lucene
ED BUECHE | EMC
Documentum xPlore provides an integrated search facility for the Documentum Content Server.
The standalone search engine is based on EMC’s xDB (a native XML database) and Lucene. In this
talk we will introduce xPlore and some of its key components and capabilities. These include aspects
of the tight integration of Lucene with the XML database: XQuery translation and optimization into
Lucene queries/APIs, as well as transactional updates to Lucene. In addition, xPlore is being deployed
aggressively into virtualized environments (both disk I/O and VM). We will cover some performance
results and tuning tips in these areas.







Four Pillars of Designing the Search Experience
TYLER TATE | TWIGKIT
Lucene and Solr provide many excellent tools for presenting information to users, but what makes
some search user interfaces better than others? Should you aim for a rich, advanced UI or should
you “just make it look like Google”? Through his work at TwigKit with blue-chip corporations,
scientific institutes, and governments, Tyler has identified four guiding pillars of the search
experience:
       •   User Expertise - Novices orienteer, experts teleport
       •   User Behaviour - Lookup, learn, and investigate
       •   Information Diversity - homogeneous vs. heterogeneous data
       •   Situational Context - factors from the surrounding environment
We’ll delve deep into each dimension and discuss how to achieve useful, useable, and beautiful
search interfaces using design patterns including: autocomplete, faceted navigation, breadcrumbs,
best bets, related searches, spelling suggestions, clickable metadata, result clustering, saved searches,
data visualisation, and more.

Using Solr in Online
Travel Shopping to Improve User Experience
ESTEBAN DONATO, SUDHAKARA KAREGOWDRA AND RAMON RESMA | TRAVELOCITY
In this talk we present three different use cases of Solr in the travel industry. First, we
describe how we implemented faceted navigation for hotel shopping. Then we show how we
implemented destination search functionality such as auto-complete and misspelling handling.
Lastly, we show how we integrated Solr to provide better experiences for mobile
users.

Solr @ eBay Kleinanzeigen
OLAF ZSCHIEDRICH | EBAY.DE
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr
features are utilized, and how Solr is configured and used in production. Recommended best
practices will be profiled along with eBay Kleinanzeigen’s plans for future deployment of Solr.






Rapid Prototyping with Solr
ERIK HATCHER | LUCID IMAGINATION
Got data? Let’s make it searchable! This interactive presentation will demonstrate getting documents
into Solr quickly, will provide some tips in adjusting Solr’s schema to match your needs better, and
finally will discuss how to showcase your data in a flexible search user interface. We’ll see how to
rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will
be enough time left to outline the next steps in developing your search application and taking it to
production.

Search Analytics: What? Why? How?
OTIS GOSPODNETIC | SEMATEXT
You’ve indexed your data and people are searching it. But how do you know if they are happy with
the results? How do you know if they are finding what they need? With search increasingly
becoming the primary information access mechanism, knowing how your search is doing is not just
a matter of mere curiosity, but often has direct business impact. In this talk we’ll look at Search
Analytics and how it can be used to answer questions like:
       •   Are too many users getting the dreaded “no matches” results?
       •   How deep into search results do people dig?
       •   Which hits are they clicking on, or what percentage of them don’t click on any hits?
       •   How much do they use the Did You Mean or Auto-Complete suggestions?
We’ll explore what specific Search Analytics reports tell us and what specific actions you should take
based on those reports.
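The reports listed above come down to simple ratios over a search log. The sketch below is a minimal Python illustration. It assumes a hypothetical log record that holds the hit count, the deepest results page viewed, and the rank of the first clicked hit (if any). The `SearchEvent` type and the metric names are illustrative only and do not reflect the schema of any particular analytics product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """One logged search (hypothetical record layout)."""
    query: str
    num_hits: int                 # number of matching documents
    deepest_page: int             # highest 1-based results page the user viewed
    clicked_rank: Optional[int]   # rank of the first clicked hit, None if no click


def zero_result_rate(events: List[SearchEvent]) -> float:
    """Share of searches that returned the dreaded "no matches" page."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.num_hits == 0) / len(events)


def no_click_rate(events: List[SearchEvent]) -> float:
    """Among searches that did return hits, the share with no click at all."""
    with_hits = [e for e in events if e.num_hits > 0]
    if not with_hits:
        return 0.0
    return sum(1 for e in with_hits if e.clicked_rank is None) / len(with_hits)


def mean_deepest_page(events: List[SearchEvent]) -> float:
    """How deep into the results people dig, on average."""
    if not events:
        return 0.0
    return sum(e.deepest_page for e in events) / len(events)


if __name__ == "__main__":
    log = [
        SearchEvent("solr faceting", 120, 1, 1),
        SearchEvent("lucene fst", 8, 2, 3),
        SearchEvent("slor", 0, 1, None),
        SearchEvent("spell check", 45, 1, None),
    ]
    print(f"zero-result rate:  {zero_result_rate(log):.2f}")
    print(f"no-click rate:     {no_click_rate(log):.2f}")
    print(f"mean deepest page: {mean_deepest_page(log):.2f}")
```

The sample records are invented for illustration. A zero-result hit on a misspelled query like `slor` is the kind of signal that would point toward spell-checking or synonym work.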





“Stump The Chump”: Get
On The Spot Solutions To Your Real Life Solr/Lucene Challenges
GRANT INGERSOLL | LUCID IMAGINATION
Got a tough problem with your Solr or Lucene application? Facing challenges that you’d like some
advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the
results you expected? Don’t know where to get started? Then this session is for you.
Now, you can get your questions answered live, in front of an audience of hundreds of Lucene
Revolution attendees! Back again by popular demand, “Stump the Chump” at Lucene Revolution
2011 is hosted by PMC chairman and Lucid Imagination co-founder Grant Ingersoll. All you need
to do is send in your questions to us here at info@lucenerevolution.org. You can ask anything you
like, but consider topics in areas like:
       •   Data modelling
       •   Query parsing
       •   Tricky faceting
       •   Text analysis
       •   Scalability
When you send your questions, please describe in detail the challenge you have faced and any
approach you have taken to solve the problem. Anything related to
Solr/Lucene is fair game. Our MC will read the questions, and Grant will have to formulate a
solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will
be awarded by the panel for the best question—and for those deemed to have “stumped the
chump”.








Improve Relevance by Using
Morphology and Named Entity Recognition
CHRISTOPH GOLLER, DIRECTOR, RESEARCH | INTRAFIND SOFTWARE AG
This talk will show how the relevance of search results can be improved by using morphology and
named entity recognition. After briefly explaining the purpose of morphological analysis and of
named entity recognition we will analyze their potential advantages for search, faceting, and
clustering of search results. Based on these ideas, we will briefly sketch how to implement a
morphological analyzer in Lucene and how to implement a natural language question answering
system based on Lucene using named entity recognition. The talk will be accompanied by a live
demo of these ideas.
BIO:
Christoph Goller has more than 10 years of experience in the search industry. He received a Ph.D. in computer science from
the Technical University of Munich, where he worked on several research projects in artificial intelligence, machine
learning and neural networks. Christoph started his career at Lernout & Hauspie. Since 2002 he has been Director of
Research at Intrafind Software AG (www.intrafind.de), a German company specializing in full-text search and text
mining based on Lucene/Solr. Christoph has been a Lucene committer since 2004. He has accompanied dozens of
commercial projects using Lucene and Solr. Christoph is the author of more than 15 scientific papers, frequently gives
presentations on search-related topics, and is responsible for partner training at Intrafind.

Scientific Data Search
in the Pharmaceutical Industry with Solr
JEFFREY GUO, CEO | SEMTIFIC SOFTWARE, INC.
A tremendous amount of experimental information and scientific knowledge has been locked away or lost
in data silos, in the form of semi-structured or unstructured data, in today’s pharmaceutical industry.
Out-of-the-box full-text search engines do not understand embedded scientific terms and objects
or their relationships, and so cannot support context-sensitive, relevant searches. This presentation will
discuss a successful implementation at a major pharmaceutical company that uses Solr as its
enterprise search platform and enhances it with chemistry (molecular entities and reactions) search
capabilities. The scope of the document indexing process is expanded to cover embedded chemistry
objects and terms of various types, such as common chemical names, corporate IDs, SMILES, and
InChI, from documents. This enables scientifically aware search based on query structure drawings or
chemical terms. Enterprise scientific search strategies and lessons learned will be
discussed during the presentation.
Bio: Founder of Semtific Software, Inc., a company that provides products and services that streamline drug discovery
workflow and enterprise search of scientific research data.


Using Lucene’s Test Framework
ROBERT MUIR | LUCID IMAGINATION
The Lucene/Solr community takes testing seriously: we have a suite of over 3500 tests to ensure
software quality. Over time we accumulated some useful extensions to JUnit testing, and several
people found themselves using our extensions for other projects. We released this “test framework”
for the first time in Lucene 3.1, and this talk is a short summary of its features, intended to
encourage you to check it out for yourself. Find out how you can:
       •   Improve test coverage for custom Lucene components.
       •   Speed up your unit test suite by running tests in parallel.
       •   Find resource leaks, localization or timezone-sensitive bugs in your application.
       •   Use our extensions to make unit tests easier to write.
Bio: Robert Muir, software engineer for Lucid Imagination, is a Lucene/Solr committer & PMC member.

Using Apache Solr and Active Directory to
unify data access across Intranet, ERP and Filesystem Cluster
ROBERT WEIßGRAEBER, PROJECT DIRECTOR | LIGHTWERK
Solr is tightly linked into all available data and business intelligence sources in the enterprise,
indexing the TYPO3 CMS-based intranet, downloads, forms and handbooks, an Oxaion-based ERP
database, and the file system cluster running Microsoft Distributed File System, using Tika for
full-text content extraction. All data is connected via Active Directory servers into user-based,
fine-grained access control lists, which are evaluated by Solr in real time and in early-binding mode. A
worldwide Solr cluster using different shards gives additional security for worldwide deployment,
e.g. keeping confidential data inside the headquarters’ own data centers.
Bio: Robert Weißgraeber is Project Director at Lightwerk, primarily specializing in designing, planning and executing
corporate portals.






Thousands of Indexes in the Cloud
SHANEAL MANEK, LEAD SEARCH ENGINEER | GREPLIN
Indexes at Greplin are strange: instead of having one giant index that is searched all the time and
updated infrequently, there are thousands of relatively small indexes that are updated much more
frequently than they are searched. These unorthodox requirements lead to an unorthodox
architecture that uses techniques inspired by Zoie and Bobo. We will discuss techniques that allowed
us to exploit the inherent shardability and access patterns of our data to build an extremely
high-throughput information retrieval architecture. We will also examine some of the challenges and
opportunities presented by running Lucene on Amazon’s Elastic Compute Cloud.
Bio: Shaneal Manek is the lead search engineer at Greplin. He was previously the founder and CTO of Signpost.com,
which built a geospatial search and recommendation engine on top of Lucene and Lisp.








Intuit’s Live Community
FLOYD MORGAN | INTUIT
TurboTax Live Community is a large-scale web application that uses user contributions and open
source technology to help millions of TurboTax users complete their tax returns. Other benefits
of Live Community include reduced support calls, highly effective advertising campaigns,
usability engineering and, new this year, conversion prediction analytics. I will present how
Solr/Lucene powers the many facets of TurboTax Live Community now and in the future.

Highly Relevant Search Result Ranking for
Large Law Enforcement Information Sharing Systems
RONALD MAYER | FORENSIC LOGIC
Law enforcement data has many interesting complexities for search. Cross-agency searches are even
more challenging because each agency has its own shorthand. Many different types of similarity
between search clauses and documents should influence the ranking of results. For example, a
search clause mentioning a “tall suspect” might want to include results with “6 foot 4 suspect”.
Spatial clusters are important, as are temporal patterns. Different fields may be more or less
important depending on the type of crime—for example, a victim’s race may matter more than a
vehicle’s make in a sex crime but less in an auto theft. Also, documents may be related to each other
in various ways that may also affect their ideal search ranking.
Solr’s great flexibility in its analyzers, filters, synonyms, and boosting makes it an excellent tool for such
diverse requirements. We’ve contributed a patch to Solr (#SOLR-2058) that helped further improve
search result ranking for cases where a search for a suspect with a “red baseball cap, black leather
jacket” is compared against many documents mentioning red caps, black caps, etc. This presentation
will describe how we addressed some domain-specific challenges of our data.

Using Solr/Lucene/LWE for eCommerce
GRANT INGERSOLL | LUCID IMAGINATION
If your users can’t find it, they can’t buy it, right? In this talk, Apache Lucene and Solr committer
Grant Ingersoll will discuss architecture, techniques and tips for successfully deploying search tools
like Lucene, Solr and LucidWorks Enterprise in eCommerce environments.






Flexible Indexing in Lucene 4.0
UWE SCHINDLER | SD DATASOLUTIONS
Apache Lucene’s next major release, 4.0, will introduce lots of flexibility into indexing, but also
fundamental changes to the well-known APIs: it features a new, consistent, four-dimensional
iteration API on top of a low-level, pluggable codec API that gives applications full control over the
postings data. Terms are now arbitrary opaque bytes, enabling users to store terms natively in the
index in any encoding, not necessarily UTF-8 (e.g. numeric fields). Currently under development is a
higher-performance postings iteration API that will allow interesting codecs based on recent encoding
algorithms to work effectively. Several codecs have already been created, including the default
“standard” codec, which enables sizable RAM reduction for searchers, and a “pulsing” codec that
inlines postings data directly into the terms dictionary, which provides a solid performance boost for
primary key fields. Many new codecs are under development, such as “PFOR”, “FOR”, “AFOR”, and
“Simple64”. In this talk, Uwe presents an overview of all of these exciting changes, as well as several
concrete, real-world examples of how applications can tap into these new features.

Transforming the House Hunting Experience: How Solr is Helping
Trulia Reshape the Real Estate Industry
ALEXANDER KANARSKY | TRULIA
Trulia is a real estate search company that helps customers find homes for sale or to rent and
provides them with information to help them make better decisions in the process. It is also a hub
for real estate professionals to market their listings, view real estate data and promote their services.
The presentation describes how Solr helped Trulia to transform the traditional real estate experience
and make real estate data accessible and understandable to millions of users. It discusses approaches
we took to achieve this by using custom-built distributed index management, indexing integration
with Hadoop and geospatial search enhancements to Solr.






Extending Solr: Behind CareerBuilder’s
Cloud-like Knowledge Discovery Platform
TREY GRAINGER | CAREERBUILDER
For CareerBuilder, a 1% deviance in search relevancy can mean millions of missed job opportunities
for our users. When CareerBuilder moved to Solr from an expensive, proprietary search vendor, our
top priorities were maintaining the quality of our search results and drastically improving our agility.
This talk will describe how we addressed both needs. For search quality, we’ll cover some of our
internal studies and resulting methods for dealing with multi-lingual content across dozens of
languages, as well as customizing and experimenting with relevancy calculations. For platform agility,
we’ll discuss CareerBuilder’s cloud-like search API framework which seamlessly handles millions of
searches an hour, processes hundreds of millions of documents, and is powered by hundreds of
globally-distributed servers. Come hear the results of our studies and some best practices for quality
and performance. Learn how our framework has led to staggering improvements in both
maintainability and technology innovation, allowing us to learn from our content, not just find it.

Handy Installation Tool “Anuenue” for Solr Cluster &
Implementation of “Did you mean” Facility for Queries in Japanese
TAKAHIKO ITO | MIXI
mixi is one of the largest social networking services in Japan, providing various communication
services for over 14M monthly active users. The latest internal mixi project is to replace the in-house
search engine with Apache Solr. This session covers two topics: a simple packaging system for Solr
that eases the installation process and daily operations, and the implementation of a “Did you mean”
facility for Japanese queries using a log mining tool. Both tools have been released as OSS
projects.

Implementing Click-through
Relevance Ranking in Solr and LucidWorks Enterprise
ANDRZEJ BIALECKI | LUCID IMAGINATION
This talk will explain what click-through events are and how to process them with LucidWorks
Enterprise. This innovative technique puts powerful search and relevancy at your fingertips, at a
fraction of the time and effort required to program them yourself with native Apache Solr. Andrzej
will discuss and present how you can use LucidWorks Enterprise for:
       •   Click Scoring to automatically configure relevance for most popular results
       •   Simplified implementation of auto-complete and “did-you-mean” functionality
       •   Unsupervised feedback to automatically provide relevance improvement on every query





Using Solr to find the Right Person for the Right Job
LAURA KANG | THELADDERS
In this talk, we’ll describe how TheLadders.com uses Lucene/Solr to instantly recommend
candidates to a recruiter when he/she posts a job on the recruiter site. Our matching algorithm
scores candidates from our job seeker site based on the criteria and description of jobs and job
seekers’ resume and profile data. This helps recruiters quickly identify candidates that are right for
the job and increases the chance of our job seekers getting hired.
The talk covers an overview of our Solr architecture and a description of our matching algorithm.
We’ll also discuss criteria for evaluating the algorithm, including an overview of our testing
sessions and their format. Finally, we’ll demo the feature so you can see how it works in
practice.

Using Solr For Enabling Highly Customized Sitewide Navigation
SHANTANU DEO | AT&T
The organization needed to enable a very customizable form of Global Navigation for the various
types of users (based on their profile and other factors). This would normally have involved complex
logic to figure out the appropriate set of links to show for a customer, and would have been a
maintenance nightmare. Instead, we approached it as a search problem. Coupled with a
novel encoding scheme, this let us solve the problem simply by searching on the customer’s
profile groups and returning a coherent global navigation, using Solr to index the data. The result
is a solution that is simple to understand and maintain and that will stand us in good stead in the
future. The presentation describes how we used Solr to implement a real-world application.

Building Specialized Industry Applications
Using Solr, And Migration From FAST ESP
RAHUL AGARWALLA | UCHIDA SPECTRUM INC.
Uchida Spectrum, Inc. (USI) is a leader in the Japanese search market. USI provides SMART/InSight,
a search application used by many Fortune 500 companies for specialized industry applications such
as R&D, quality assurance for manufacturing, and claims and customer management.
Originally, SMART/InSight was based on Microsoft FAST. This talk will review how
SMART/InSight has migrated from FAST ESP to LucidWorks Enterprise, and how
SMART/InSight combines virtual data integration and enterprise search to give users a unified way
to navigate diverse data sources, analyze data more easily, and personalize results.
Several real-world use cases will be profiled, with demonstrations.




The Seven Deadly Sins of Solr
JAY HILL | LUCID IMAGINATION
Sloth. Greed. Pride. Lust. Envy. Gluttony. Wrath. Getting started with Solr can present some pitfalls
and temptations, often turning into a trial-and-error process. (Confess: some or all of these may
have been part of your development project.) Drawing on a broad swath of experience across Solr
implementations running in some of the largest Fortune 500 companies as well as some of the
smallest start-ups, this talk will cover common mistakes made by newbies and even veteran
developers, and how to avoid them. You’ll learn how best to face the challenges that can occur
either when starting out with a new Solr implementation, or in keeping up with the latest
improvements and changes.

Advanced Search and Analytics in 20 Minutes
MARK DAVIS | KITENGA
Kitenga’s ZettaVox and ZettaSearch products support Solr and Lucene ecosystems at both the
ingestion point and for the search user. In this talk, I will show how ZettaVox, our professional
content mining platform on Hadoop, can be used to index content and rich metadata into a
LucidWorks Enterprise installation. Being built on Hadoop, ZettaVox scales up by scaling out. I will
then create an end-user search and analytics experience using our ZettaSearch solution that leverages
the faceted metadata to enhance information discovery and analysis. All in about 20 minutes.

Building SaaS Solutions for Online Media Using Apache Solr
ALBERTO MIJARES | CANOO ENGINEERING AG
SaaS applications offer two advantages: remote web deployment, which lets potentially any consumer
on the Internet use them instantly, and the cost reduction that web-based deployment provides. In
this talk, the speaker explains the architecture of an innovative SaaS solution built for the
Axel Springer media group (Switzerland). The application remotely extracts the content of articles
from multiple online newspapers, analyzes and classifies them, determines which articles are the
most similar to a given one, and integrates the results back into the article to provide readers with a
“related articles” feature. The core components of the analysis process are language-specific tools
(used to filter out superfluous language terms) and semantic knowledge bases (like Wikipedia, used
to enrich the indexed information with new context-specific terms or to disambiguate the extracted
terms). On a more technical level, the speaker will explain the criteria for selecting the emerging
enterprise search framework Apache Solr as the platform and how it drastically reduced the
development effort required.




Solr Performance: Key Innovations
YONIK SEELEY | LUCID IMAGINATION
Recent developments in Solr/Lucene have made significant contributions to distributed search
processing, scalability, and throughput. In this talk, Yonik Seeley, creator of Solr, will survey key
performance strategies for building search applications with Solr, and review innovations included in
Solr 3.1, as well as forthcoming development work in Solr 4.0 and beyond.

Solr and Lucene at Etsy
GREGG DONOVAN | ETSY
Etsy is using Solr and Lucene to serve queries at a rate of more than 8 billion per year (and growing).
In this case study, we will describe how Etsy has integrated Solr/Lucene into our continuous
deployment infrastructure (see: http://codeascraft.etsy.com/2010/05/20/quantum-of-deployment/),
allowing Solr configuration, Java-based indexers, and query parsing logic to go from passing tests
to production code in minutes. We’ll also discuss how we’re leveraging Solr’s new Geo-search to
power both local item search and GeoIP-personalized location autosuggest.
We’ll also share how we’ve extended Solr, adding personalized faceting and filtering as well as
multi-currency sorting and filtering that accounts for real-time currency fluctuation (contributed in
SOLR-2202). Code for both of these features will be open-sourced/contributed. We will share
our real-time monitoring techniques, including how we track Solr replication, query, and GC times
in Ganglia. Finally, we’ll discuss how we’ve used Hadoop-based user analytics to improve relevance
and power data-driven spelling corrections, autocomplete suggestions, and related searches.




Lucene @ Yelp
SUDARSHAN GAIKAIWARI | YELP
This talk describes how Yelp uses Lucene to provide search services. It includes:
       •   Statistics on Yelp search usage
       •   An overview of Yelp’s search architecture: Yelp uses different services to provide searches
           for different types of data. Some are based on Lucene and some on Solr.
       •   A deeper dive into business and review search, the most important search service at
           Yelp.
We will cover:
       •   Yelp’s implementation of a micro-sharded architecture and how it differs from Katta.
       •   Yelp’s extensions to Lucene to implement features such as filters, and a performance
           comparison with Solr/Bobo.
       •   Yelp’s implementation of index replication.
       •   Various tricks used at Yelp to make the service faster.

Using Solr Cloud to Tame an Index Explosion
JON GIFFORD | LOGGLY
We have hundreds of customers, each of whom may have dozens of shards. I’ll describe how we’re
using Solr Cloud to manage this explosion of indexes, handling every index from creation, through
migration from box to box, to final destruction. I’ll also describe some of the performance issues
we had to deal with, especially with ZooKeeper.

Lots of Facets, Fast
ANNE VELING | BEYONDTREES
We created a web application for a well-known US newspaper: a maps-like zooming interface over
the 60,000 newspapers published since 1850, using Solr over the 28,000,000 articles to create an
interactive heatmap. Using domain knowledge, we made the out-of-the-box faceting solution an
order of magnitude faster, which allowed us to create a great visual way of exploring trends in
historical newspapers.




CPython Embedded in Solr - Search Solution
for Python Lovers With the Speed of Native Java
ROMAN CHYLA | CERN
SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest
repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library
that merges the two. We must work with result sets of more than 1 million records for
citation-related queries, and our partners in Astrophysics work with sets of 6 million; however,
INSPIRE is written in Python. So how do we move result sets of several million records between the
two systems fast? How do we take advantage of our special NLP processing pipeline written in
Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr
inside INSPIRE. We embed INSPIRE into Solr! The talk shows the benefits and challenges of this
surprisingly elegant solution.




Rahul Agarwalla
HEAD OF INTERNATIONAL BUSINESS, UCHIDA SPECTRUM INC
Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously, he
built and exited two content/technology ventures including Matrix Information, the pioneer of
digital content syndication in India. He has over 14 years of experience with various search
technologies like Verity, FAST ESP and Solr/Lucene.

Boris Aleksandrovsky
SEARCH ARCHITECT, YAMMER
Boris Aleksandrovsky works for Yammer, the Enterprise Social Network company, where the team is
working to bring the benefits of social media to enterprises by creating discoverable knowledge
bases. He specializes in solving problems of search, machine learning and data analysis at large
scale by employing distributed and scalable software architectures. Boris has almost completed his
PhD in Computer Science and Neuroscience at the University of California at Irvine.

Josh Berkus
CORE TEAM, POSTGRESQL
Josh Berkus has been working as a database application consultant for 8 years. Josh primarily builds
applications for the legal and HR industries and does performance tuning. He was also head of Sun
Microsystems' PostgreSQL support staff for 2 years and helped launch BI startup Greenplum.




Ed Bueche
DISTINGUISHED ENGINEER, EMC
Ed Bueche is an EMC Distinguished Engineer and one of the architects of the Documentum xPlore
search engine (part of EMC’s Information Intelligence Group). He has been with Documentum/EMC
for 12+ years and has more than 23 years of experience in performance/development in the industry,
including companies like AT&T Bell Labs and Sybase. At Documentum he worked to improve
performance & scalability for all previous Documentum full-text integrations (Verity and FAST). Ed has
been a regular speaker for over 11 years at the Documentum worldwide user conferences (in both
America and Europe) as well as at EMC World.

Andrzej Bialecki
TECHNICAL ADVISOR, LUCID IMAGINATION
Andrzej Bialecki, Apache Lucene PMC Member, also serves as project lead for Nutch, and as a committer
in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as
information retrieval, systems architecture, embedded systems, networking and business process/e-
commerce modeling. He’s also author of the popular Luke index inspection utility.

Roman Chyla
RESEARCH FELLOW, CERN
Roman Chyla is a research fellow at CERN, Switzerland. He works in the INSPIRE team to build
the biggest digital library for High Energy Physics. He is a developer and information specialist,
and has presented at four conferences, two of them international: Knihovny soucasnosti 2006,
CASLIN 2007, IKI 2009, CASLIN 2009.

Mark Davis
CTO, KITENGA, INC
Mark Davis is Founder and CTO of Kitenga, Inc. Previously he served as Principal Engineer at
Xerox PARC spin-out InXight (acquired by Business Objects) and designed their enterprise product
suite, as well as at Microsoft as a Program Manager for enterprise search and SharePoint. Mark spent
nearly a decade as an academic researcher in the defense/intelligence community specializing in
cross-language search and computational linguistics. He has extensive speaking experience in
professional and academic forums.

Shantanu Deo
TECHNICAL DIRECTOR, AT&T
Shantanu Deo is a Technical Director at AT&T, in charge of their ecommerce CMS team. He is a
patent holder and has presented and published his work at the INFORMS conference on
Optimization. His interests include web technologies, optimization and, lately, mobile web
communications. Shantanu holds a BS in Computer Engineering from the University of Poona, India,
and MS degrees in Operations Research and Computer Science from Louisiana State
University.

Esteban Donato
LEAD ARCHITECT, TRAVELOCITY
Esteban Donato works as Lead Architect for Travelocity. He has worked as a Java Developer,
Technical Leader and Architect for the last 10 years in different industries. Esteban has been
working with Solr and Lucene technology for the last 2 years, implementing it in different projects.
Esteban has given talks about Solr and Data Mining at Travelocity and at universities in
Buenos Aires, Argentina.

Gregg Donovan
TECHNICAL LEAD, SEARCH, ETSY
Gregg Donovan is currently Technical Lead, Search at Etsy.com, the world’s most vibrant
handmade marketplace. He has worked extensively with Solr and Lucene at Etsy, and, previously, at
TheLadders.com. At Etsy, located in Brooklyn, NY, he leads the search engineering team as it
tackles the challenges presented by a growing international marketplace with a half-million different
sellers in 150 different countries selling tens of millions of items.

Stephen Dunn
HEAD OF TECHNOLOGY STRATEGY, GUARDIAN NEWS AND MEDIA UK
Stephen Dunn is Head of Technology Strategy for Guardian News and Media in the UK. He joined
The Guardian in 1999, where he helps guide the technology strategy for its multiple-award-winning
network of web sites and services. His professional interests include open web technologies, digital
identity and security. Prior to joining the Guardian, Stephen completed his PhD at the Center for
Computational Neuroscience and Robotics at Sussex University, UK.

Sudarshan Gaikaiwari
SOFTWARE ENGINEER, YELP INC
Sudarshan Gaikaiwari is a software engineer working on Yelp’s search team. Prior to Yelp he
worked on various information retrieval technologies at Symantec’s Data Loss Prevention group.

Jon Gifford
CO-FOUNDER, LOGGLY
Jon Gifford is the CTO and co-founder of Loggly, where he spends all day coercing Solr into
playing nice with the cloud, and with high-volume real-time data streams. An active user and
frequent hacker of Lucene since 2004, he’s happy to let Solr take care of some of the hard work for
a change. Prior to Loggly, he spent more than a decade working on Search systems at Minimal
Loop, Scout Labs, Technorati and LookSmart. He is concerned that his near-complete web-
anonymity is under threat.

Otis Gospodnetic
FOUNDER, SEMATEXT
Otis Gospodnetic is a coauthor of Lucene in Action (1st and 2nd edition). He has been involved with
Lucene since 2000 and Solr since 2006. He is also a member of the Nutch and Mahout development
teams, as well as the Lucene Project Management Committee. Otis is an Apache Software Foundation
member and the founder of Sematext, a software development and consulting company focused on
Search & Analytics using open-source technologies like Lucene, Solr, Nutch, Hadoop, HBase,
Flume, and more.




Trey Grainger
SEARCH TECHNOLOGY DEVELOPMENT TEAM LEAD, CAREERBUILDER
Trey Grainger leads the Search Technology Development group at CareerBuilder.com. He
introduced Solr to CareerBuilder and led the successful conversion away from the Microsoft FAST
ESP platform. He has been with CareerBuilder for 4 years, and his search experience includes
handling multi-lingual content across dozens of markets/languages, genetic algorithm and user
group based relevancy tuning, geo-spatial search and validation, and work on customized payload
scoring models, data mining, clustering, and recommendations. He is responsible for architecting
CareerBuilder’s cloud-like search API exposing search as a simple, dynamic, and powerful generic
service abstracted away from a large, globally-distributed architecture. Trey is also the founder and
Chief Architect of Celiaccess.com, a gluten-free search engine and networking site.

Eric Gries
PRESIDENT AND CEO, LUCID IMAGINATION
Eric Gries joined Lucid Imagination as President and CEO after spending more than 20 years in
executive leadership roles, where he built high-growth technology-based businesses. Prior to joining
the company, Eric was an Executive-in-Residence at Granite Ventures. Eric has served as CEO,
general manager and vice president for companies in application development, systems
management, networking, financial services and hardware systems, in both the U.S. and Europe.
Prior to joining Granite Ventures, Eric led XACCT, a pioneering network mediation market leader,
as its president and CEO. XACCT was acquired by Amdocs in 2004, at which time Eric joined
Amdocs’ executive team as Senior Vice President. Earlier in his career, Eric served as general
manager of Compuware’s Network and Systems Management division, and held product
management, marketing, sales and engineering positions at companies such as ACI, Cullinet
Software and DEC.

Erik Hatcher
TECHNICAL STAFF, LUCID IMAGINATION
Erik Hatcher is the co-author of two books, Lucene in Action and Java Development with Ant.
Erik has been an active member of the Lucene community - a leading Lucene and Solr committer,
member of the Lucene Project Management Committee, member of the Apache Software
Foundation as well as a frequent invited speaker at various industry events. Erik earned his B.S. in
Computer Science from University of Virginia, Charlottesville, VA.

Jay Hill
SENIOR SEARCH ARCHITECT, LUCID IMAGINATION
Jay Hill has been building enterprise search applications since 2003, and has worked extensively with
Autonomy IDOL, Lucene, and Solr. He is a certified Solr trainer, and is lead author for Lucid
Imagination’s Solr training courses.

Grant Ingersoll
CO-FOUNDER, LUCID IMAGINATION
Grant Ingersoll is a founder and member of the technical staff at Lucid Imagination. Grant’s
programming interests include information retrieval, machine learning, text categorization, and
extraction. Grant is a regularly featured speaker at ApacheCon and other industry events. He has
been an active member of the Lucene community – a Lucene and Solr committer, co-founder of the
Apache Mahout machine learning project, chairman of the Lucene Project Management Committee
(PMC) as well as a Vice President at the Apache Software Foundation. He is also the co-author of
Taming Text (Manning, forthcoming) covering open source tools for natural-language processing.
Grant’s prior experience includes work at the Center for Natural Language Processing at Syracuse
University in natural language processing and information retrieval. Grant earned his B.S. from
Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse
University, NY.

Takahiko Ito
SOFTWARE ENGINEER, MIXI, INC
Takahiko Ito received his Ph.D. in Engineering at Nara Institute of Science and Technology,
specializing in graph mining. He was a specialist for Japanese and Asian language processing at Fast
Search and Transfer prior to joining mixi, Inc. as an R&D engineer. Selected papers include:
       •   Masashi Shimbo, Takahiko Ito, Daichi Mochihashi, Yuji Matsumoto. On the Properties
           of von Neumann Kernels for Link Analysis. Machine Learning, 75:37-67, 2009.
       •   Takahiko Ito, Masashi Shimbo, Taku Kudo, Yuji Matsumoto. Application of Kernels to
           Link Analysis. The Eleventh ACM SIGKDD International Conference on Knowledge
           Discovery and Data Mining, 2005.




Alexander Kanarsky
SENIOR SOFTWARE ENGINEER, TRULIA
Alexander Kanarsky is responsible for managing day-to-day operations of Trulia’s indexing and
search infrastructure and oversees the search-related development there. Prior to Trulia he was a
member of the core development team for Autonomy’s Digital Safe, the world’s largest private
archive of electronic documents.

Laura Kang
TECHNICAL LEAD, SEARCH AND MATCHING, THELADDERS
Laura Kang holds a B.A. in computer science, mathematics, and economics from the University of
California at Berkeley, and M.S. and Ph.D. in computational mechanism design from Harvard
University. She has presented her work at several conferences, including the International
Conference for Electronic Commerce and the ACM Conference on Electronic Commerce. Before
joining TheLadders, she was a manager at a NYC technology startup. At TheLadders, she focuses
on search and matching algorithms.

Sudhakara Karegowdra
PRINCIPAL ARCHITECT, TRAVELOCITY
Sudhakara Karegowdra works as Principal Architect for Travelocity. He has worked as a Java
Developer, Technical Leader and Architect for the last 14 years in different industries, 10 of
them in the travel industry. Sudhakara has been working with Solr and Lucene technology for the
last 3 years, implementing it in different projects. Sudhakara has given talks about Solr at
Travelocity.




Steve Kearns
ROSETTE PRODUCT MANAGER
Steve Kearns is the product manager for the Rosette Platform and is also the subject matter expert
for the international compliance market within Basis Technology. Prior to Basis Technology, Steve
worked at BBN Technologies on the Broadcast and Web Monitoring Systems, which
capture and extract open-source intelligence from live television and internet news websites. He has
experience in information visualization and distributed systems architecture, and received his MS in
Information Technology and BS in Computer Information Systems from Bentley University. He
also spoke at the Apache Lucene EuroCon 2010 in Prague, on the topic of Building Multilingual
Search Based Applications.

Marc Krellenstein
FOUNDER, LUCID IMAGINATION
Marc Krellenstein is the founder of Lucid Imagination. Marc has 30 years’ experience in the
computer industry, focusing for the last 20 years on information retrieval technology and
applications. Marc was previously Chief Technology Officer and Vice President for Search and
Discovery Technology at Elsevier, the scientific, technical and medical publishing division of Reed-
Elsevier. Prior to Elsevier Marc was Chief Technology Officer and Senior Vice President of
Engineering at Northern Light Technology, where he was the founding technologist and led the
design and development of the Northern Light search service, including designing the data model,
query interpretation, relevancy ranking, automatic document classification and patented technology
for document clustering. Marc has an A.B. in philosophy from Cornell, an M.S. in computer science
from the University of Wisconsin at Madison, and a Ph.D. in psychology (cognitive science) from the
New School for Social Research, NY.

Ronald Mayer
CTO, FORENSIC LOGIC, INC.
Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from
medical devices to digital video to law enforcement software. Ron has also been involved in Open
Source for decades, with code that has been incorporated in the LAME MP3 library, the
PostgreSQL database, and the PostGIS geospatial extension. His most recent speaking engagement
was a presentation on a broader aspect of this system to the SD Forum’s Emerging Tech SIG, titled
“Fighting Crime: Information Chokepoints & New Software Solutions”.

Alberto Mijares
CANOO ENGINEERING AG
Alberto Mijares is a software engineer with more than 10 years of experience. He is a Scrum Master
and agile practitioner. He has an extensive background in web technologies and Java, having
participated in W3C activities related to the Semantic Web. His usual role is either leading
projects or designing architectures for web applications. He joined Canoo Engineering
AG (Switzerland) in 2008 and speaks Spanish, English and German. He has a degree in Computer
Engineering and has given talks at Java and web-related conferences and user groups in
Switzerland and Spain.

Floyd Morgan
INTUIT
Floyd Morgan is a Principal Software Engineer in the Central Technology Organization at Intuit,
makers of TurboTax, QuickBooks, Quicken and Intuit Payroll, to name a few. Floyd has developed
core features of the flagship TurboTax product line and recently co-founded Intuit’s newest
social-driven technology, Live Community. Under Floyd’s direction, Live Community has grown from a
small project into a widely adopted platform used by most Intuit products and services. Floyd earned
his B.S. in Computer Science from San Diego State University.

Stephen O’Grady
CO-FOUNDER AND PRINCIPAL ANALYST, REDMONK
Stephen O’Grady is the co-founder and Principal Analyst of RedMonk, a boutique industry analyst
firm focused on developers. Founded in 2002, RedMonk provides strategic advisory services to
some of the most successful technology firms in the world. Stephen’s focus is on infrastructure
software such as programming languages, operating systems and databases, with a special focus on
open source and big data. Before setting up RedMonk, Stephen worked as an analyst at Illuminata.
Prior to joining Illuminata, Stephen served in various senior capacities with large systems integration
firms like Keane and consultancies like Blue Hammock. Stephen is regularly cited by outlets such as
the New York Times, NPR, the Boston Globe, and the Wall Street Journal, and is a popular speaker and
moderator on the conference circuit. His advice and opinion are well respected throughout the
industry.




Timothy Potter
SENIOR ENGINEER, NATIONAL RENEWABLE ENERGY LABORATORY (NREL)
Timothy is a highly skilled technologist with over 13 years of experience delivering innovative software
solutions that encompass a wide range of technologies and business sectors. Currently, Mr. Potter is
a Senior Engineer at the National Renewable Energy Laboratory (NREL) where he leads the effort
to build a large-scale distributed platform for handling smart grid related energy data using Hadoop
and NoSQL technologies. Prior to NREL, Timothy was the CTO for Viyya Technologies, where he
developed a large-scale content recommendation system based on Solr, Mahout, and Hadoop
running in the Amazon Cloud. As a Senior Software Engineer for the WebLogic Platform at BEA
Systems, he was the chief inventor of several US Patents that helped revolutionize J2EE-based
enterprise application integration. His technical blog (http://thelabdude.blogspot.com/) is highly
respected as a guide for other developers in the open-source Java community. Mr. Potter has a BS in
Mathematics and BA in Economics with honors (summa cum laude) from the University of
Colorado.

Daniel Potzinger
AOE MEDIA GMBH
Daniel Potzinger has more than 10 years of web development experience under his belt. He is
skilled at developing clean solutions, with a particular love of elegant, easily maintained and
reusable code. Daniel is always open to new projects and development methods, such as Agile
software development.
In the years since joining AOE media, Daniel has played “midwife” to more than 60
Enterprise CMS projects for such renowned clients as congstar, Cisco WebEx, VMware and
Panasonic: taking care of client requirements, directing the development and launching
the results.




Craig Rees
SENSIS
sensis.com.au
Craig Rees has been at Sensis since 2008. Craig heads up the content and search groups which
manage the search capabilities, platforms and operational teams that support the Yellow Pages® and
White Pages® businesses. Craig is the author of the Sensis Content Strategy and the technology
owner of the Sensis Business Search API. Prior to joining Sensis, Craig worked in digital strategy
development and implementation roles in the United Kingdom with companies including BBC, Sky
and Argos.

Ramon Resma
ARCHITECT, TRAVELOCITY
www.travelocity.com
Ramon Resma works as an Architect for Travelocity Mobile. He has over 22 years of experience in
the travel industry and has held technical leadership roles in Travelocity Architecture, Sabre
Airline Solutions Architecture, and American Airlines. Ramon has been working with Solr and
Lucene technology for the last 2 years. Recently, he worked on implementing Solr functions that
serve location-based content in travel mobile applications.

Yonik Seeley
CREATOR OF APACHE SOLR & CO-FOUNDER, LUCID IMAGINATION
www.lucidimagination.com
Yonik Seeley is the creator of Solr. He is an expert in distributed search systems architecture and
performance. Yonik has been a prolific Lucene/Solr committer, a member of the Lucene PMC, and
a member of the Apache Software Foundation. Yonik’s work experience includes CNET Networks,
BEA and Telcordia. He earned his M.S. in Computer Science from Stanford University.




Uwe Schindler
MANAGING DIRECTOR, SD DATASOLUTIONS GMBH
www.pangaea.de
Uwe is a committer and PMC member of Apache Lucene and Solr. His main focus is the development
of Lucene Java. He implemented fast numerical search and maintains the new attribute-based
text analysis API. He studied Physics at the University of Erlangen-Nuremberg and works as
managing director of SD DataSolutions GmbH in Bremen, Germany, a company that provides
consulting and support for Apache Lucene and Solr. A primary customer of his company is
“PANGAEA – Publishing Network for Geoscientific & Environmental Data”, for which he
implemented the portal’s geo-spatial retrieval functions with Lucene Java. Uwe has given talks about
Lucene at international conferences such as the previous Lucene Revolution, ApacheCon
EU/US, Lucene Eurocon and Berlin Buzzwords, as well as at various local meetups.

Tyler Tate
HEAD OF USER EXPERIENCE, TWIGKIT
www.twigkit.com
Tyler Tate leads user experience at TwigKit where he has helped governments, not-for-profits, and
blue-chip corporations build superb search experiences. Tyler also organises the Enterprise Search
London meetup and has written for a number of publications including UX Magazine, Johnny
Holland, Smashing Magazine, and UX Booth. Tyler lives in London with his wife Ruth and son
Galileo, and you can keep up with him on Twitter.

Joshua Tuberville
SEARCH ARCHITECT
www.eharmony.com
Joshua Tuberville is a Software Architect with eHarmony.com. With over 15 years of Internet
technology experience, he specializes in high-scale online architectures. He has been with eHarmony
for the past 9 years and previously worked with Sony and Disney, as well as several startups. He
regularly speaks at user groups and conferences. His recent focus has been leading the architecture of
jazzed.com, a new dating site that uses Solr to help people find highly relevant profiles.




Anne Veling
SEARCH ARCHITECT, BEYONDTREES
www.beyondtrees.com
After earning an M.Sc. in Computer Science/Artificial Intelligence, Anne worked for several years in the
search engine industry, designing highly scalable knowledge extraction, clustering and visualization
modules for search applications. Anne is currently self-employed, helping global companies create web
applications that involve search. Anne also does performance troubleshooting and gives
Lucene and Solr workshops.

Dawid Weiss
ASSOCIATE PROFESSOR, INSTITUTE OF COMPUTING SCIENCE
POZNAN UNIVERSITY OF TECHNOLOGY, POLAND
www.carrotsearch.com
Dawid Weiss has both an academic and an industrial background: he is an associate professor at the Institute
of Computing Science of Poznan University of Technology in Poland (PhD in Information
Retrieval) and co-owns Carrot Search, a company that provides commercial services in
text processing, text mining and text clustering. In his spare time Dawid contributes to
several open source projects, including Carrot2.org, reads books and plays basketball passionately
with a bunch of his old friends. He lives in Poznan, Poland with his wife and two children.

Simon Willnauer
SOLR / LUCENE COMMITTER, APACHE LUCENE PMC
www.apache.org
Simon is a Lucene core committer and PMC member. Over the last couple of years he has worked on the
design and implementation of scalable software systems and search infrastructure. He studied
Computer Science at the University of Applied Sciences Berlin. He currently works as a consultant for
Apache Solr, Lucene Java and Hadoop and is a co-organizer of Berlin Buzzwords, a conference
on scalability taking place in June 2011 in Berlin, Germany.




Olaf Zschiedrich
HEAD OF TECHNOLOGY, EBAY KLEINANZEIGEN
kleinanzeigen.ebay.de
Olaf leads development for eBay Kleinanzeigen, Germany’s number one classified ads site. Before
that he was part of the core architecture team at mobile.international GmbH. He also worked
for Siemens TS, where he was involved in building the Customer Information System for the MTA
New York City Transit subway system. He has a passion for high-traffic web applications, search
technologies and agile development methods, and is a believer in open source.




Hotel Information

ADDRESS
     Hyatt Regency San Francisco Airport
     1333 Bayshore Highway,
     Burlingame, California, USA 94010
     Tel: +1 650 347 1234 Fax: +1 650 696 2669
     http://www.sanfranciscoairporthyatt.com

DIRECTIONS
FROM SAN FRANCISCO INTERNATIONAL AIRPORT (2 MILES):
     Take 101 South toward San Jose. Exit Millbrae Ave. Turn left on Millbrae Ave. Turn right at
     the second stoplight onto Bayshore Hwy. Proceed through 4 stoplights. The hotel is on the
     right-hand side.
FROM OAKLAND AIRPORT (APPROXIMATELY 30 MILES) AND POINTS EAST:
     Take I-880 South toward San Jose. Merge onto CA-92 W toward the San Mateo Bridge. Merge onto
     US-101 N toward San Francisco to the Broadway Exit. Take the Airport Blvd ramp toward
     Bayshore Blvd, then turn left onto Bayshore Hwy to the hotel.
FROM SAN JOSE AIRPORT (APPROXIMATELY 30 MILES) AND POINTS SOUTH:
     Take 101 North to the Broadway Exit. Take the Airport Blvd ramp toward Bayshore Blvd,
     then turn left onto Bayshore Hwy to the hotel.




HOTEL MAPS
MEETING ROOMS

[Map: meeting rooms, Hyatt Regency San Francisco Airport]




MAP OF HOTEL AND AIRPORT

[Map: Hyatt Regency San Francisco Airport and San Francisco International Airport]




PUBLIC TRANSPORTATION (BART):




SAN FRANCISCO DOWNTOWN




Cloud-scale enterprise
  search begins here

Salesforce.com is the enterprise cloud computing leader and the world’s 4th fastest-growing company.
Our Search Team is experienced, with deep architecture expertise. We’re dedicated to delivering the
fastest, most reliable cloud-scale enterprise search. If you share our passion, come introduce yourself.




                                        www.salesforce.com
                   Documill Visual Search




www.documill.com



