Search-Comparison_Solr-FAST

Lightcrest Enterprise Search Practice

Solr/Lucene vs. FAST ESP

- Introduction
- Business Goals
- Technical Goals
- Looking Glass: FAST ESP
- Looking Glass: Solr/Lucene
- Technical Juxtaposition
- Business Juxtaposition
- Early POC/Migration Path

The future of enterprise search must be driven by data and business opportunities, not by vendor precedence or legacy infrastructure. Search obviously provides value to the customer, but it also provides value within the enterprise, behind the corporate firewall.

While the search architecture as a whole might not be logically consolidated, finding a common platform that unifies the search needs of the enterprise into a coherent, comprehensive solution without vendor lock-in or heavy licensing costs is critical to maintaining growth, lowering total cost of ownership, and driving future innovation.

Enterprise data is at the core of the search value proposition; search is the fulcrum that provides the leverage. It’s imperative that stakeholders explore opportunities to compare, analyze, and prove a long-term search solution that will continue to provide the enterprise with a competitive advantage.
  
- Identify features that will continue to provide customers with the best possible search experience and capabilities.
- Identify features that can provide leverage internally with respect to human and system resources (behind the corporate firewall).
- Identify opportunities for cost savings across the platform as a whole (licensing, maintenance, customization).
- Business and technology teams are to align on a roadmap for the unified search strategy.
  
- Gather a list of requirements from key stakeholders (technical owners, business owners).
- Identify platform candidates (Solr/LucidWorks, FAST, Autonomy, Vivisimo).
- Iterate over platforms and identify overlap between features and requirements (e.g., internationalization, CJK languages, custom tokenizers and filters, advanced linguistics).
- Iterate over platforms and determine costs with respect to five-year growth forecasts, including hardware footprint, licensing, and ancillary management.
- Iterate over platforms and determine vendor-specific roadmaps to pre-empt any vendor-induced environmental dependencies (e.g., FAST is moving to a SharePoint/Windows-centric model with FSIS). Other proprietary search platforms may face a similar outlook in the future.
  
FAST is the leading proprietary platform in the enterprise search space and competes aggressively with firms like Autonomy, Vivisimo, and Funnelback for market share. Born of a Norwegian startup, FAST quickly became the de facto “Oracle” of search platforms: a robust and modular UNIX-based search system that leverages both open source and proprietary code in its document processing stack.

Since Microsoft’s acquisition of FAST in 2008, the FAST strategy has shifted. Microsoft’s vision for FAST is primarily as a SharePoint complement. While FAST maintains excellent performance for applications like web-facing search platforms, Microsoft sees a greater sales opportunity in up-selling existing SharePoint customers with a “portal search” behind the firewall.

Microsoft plans to deprecate the UNIX variant of ESP and has announced it will end support in coming years. Customers on UNIX will eventually have to migrate to FAST FSIS or move to another search platform.
  
FAST ESP features:
- Stackable document processing pipeline for custom linguistics, tokenization, and data enrichment.
- Requires Python.
- Support for a multitude of mainstream document formats, including PDF, MS Office, HTML, XML, etc.
- A multitude of enterprise-grade connectors, including Oracle, ODBC, MySQL, SharePoint, and Documentum.
- A multitude of feeding mechanisms, including filetraverser, the Content API, and a web crawler.
- Support for all major Western languages and Asian languages (CJK+).
- Navigators and taxonomy tools for building taxonomy trees.
- Dynamic rank and boost (both query side and index side).
- Geo-search capabilities, including distance, polygon bounds, etc.
- Spell-check/“Did you mean” (Levenshtein distance) and other query transformation tools.
- Search federation (Unity).
- Entity extraction (dictionary- and rule-based) and location extraction (dictionary- and rule-based).
- Customizable query pipeline and response templates.
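
The “Did you mean” feature listed above is described as being based on Levenshtein distance. Below is a minimal Python sketch of that idea. It assumes a plain in-memory list of known terms. The function names, the lowercase comparison, and the two-edit threshold are illustrative choices, not FAST’s API or its actual suggestion logic.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion, or substitution costs 1."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def did_you_mean(query: str, vocabulary: list[str], max_distance: int = 2) -> str | None:
    """Return the closest known term within max_distance edits, or None.

    Ties go to the term that appears first in the vocabulary.
    """
    best, best_dist = None, max_distance + 1
    for term in vocabulary:
        dist = levenshtein(query.lower(), term.lower())
        if dist < best_dist:
            best, best_dist = term, dist
    return best


print(did_you_mean("serch", ["search", "solr", "fast"]))  # prints: search
```

A production speller would avoid comparing the query against every term, for example by indexing n-grams or restricting candidates to terms that share a prefix. The linear scan here keeps the sketch short.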
  	
  
FAST ESP architecture:
- Tiered architecture in the form of rows and columns.
- Rows represent entire data sets (think of a RAID stripe).
- Column members represent duplicate data.
- Rows grow horizontally as the data corpus expands over time.
- Columns grow vertically as traffic (load) increases.
- The architecture is flexible, but limited by purchased licensing for peak QPS and index size.
- Software based on a modular CORBA architecture, allowing granular per-process control.
- Custom document routing and index partitioning schemes.
- Custom query routing and index querying schemes.
- Failover between rows allows for search and index redundancy.
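
The rows-and-columns model above can be made concrete with a toy routing table. This Python sketch is an illustration under stated assumptions, not FAST’s routing code:
- Documents are assigned to a column by a CRC32 hash of a hypothetical document ID.
- Queries pick a row round-robin.
- An offline node fails over to the same column in another row.

```python
from __future__ import annotations

import itertools
import zlib


class SearchGrid:
    """Toy rows x columns cluster.

    Each row holds the whole data set, striped across its columns; the members
    of a column (one per row) hold duplicate copies of the same partition.
    """

    def __init__(self, rows: int, columns: int):
        self.rows = rows
        self.columns = columns
        self.down = set()  # (row, column) nodes that are currently offline
        self._row_cycle = itertools.cycle(range(rows))

    def column_for(self, doc_id: str) -> int:
        # Stable hash partitioning: a document always maps to the same column.
        return zlib.crc32(doc_id.encode("utf-8")) % self.columns

    def index_targets(self, doc_id: str) -> list[tuple[int, int]]:
        # The document is written to its column in every row (redundant copies).
        col = self.column_for(doc_id)
        return [(row, col) for row in range(self.rows)]

    def query_plan(self) -> list[tuple[int, int]]:
        # Choose a row round-robin and fan out to every column in it; if a node
        # is offline, fail over to the same column in the next row.
        start = next(self._row_cycle)
        plan = []
        for col in range(self.columns):
            for offset in range(self.rows):
                row = (start + offset) % self.rows
                if (row, col) not in self.down:
                    plan.append((row, col))
                    break
            else:
                raise RuntimeError(f"column {col} has no live replica in any row")
        return plan
```

In this simplified model, the column is chosen as `hash % columns`. Widening a row (adding a column as the corpus grows) therefore remaps most documents and implies a re-partitioning step. Adding a row for more query capacity only requires copying existing partitions.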
  
FAST ESP administration:
- Mature monitoring tools for measuring QPS, DPS (documents per second), and indexer performance.
- Mature benchmarking software (Clarity, sbench, HTTP tools).
- Semi-extensible GUI for the document processing pipeline.
- FAST Impulse for e-commerce management (product promotions, campaigns, synonym management).
- FAST SBC for managing view->collection relationships, SFE testing, and synonym management.
- Somewhat cryptic but accessible interfaces for clearing fsearch caches and query caches, and for monitoring the memory consumption of navigators.
  
FAST ESP cost structure:
- Recurring fees for service, support, and custom development.
- Graduated licensing costs largely based on QPS, index size, and linguistics capabilities.
- FAST is licensed per cluster: whatever you pay for in QPS/data, multiply by N operating clusters.
- FAST ESP is expensive compared to other solutions, albeit high-performing.
- FAST add-ons such as Unity, Unity Dev, custom tokenizers, Impulse, and other products are separate, additional costs.
- Many FAST add-ons are based on OEM deals (e.g., Basis Tech) and, while well integrated, force the customer to pay a markup on these products.
- FAST professional services are expensive compared to third-party search consultancies and managed search providers.
- FAST plans to move to a per-node licensing model, but it is uncertain to what extent this will apply retroactively to customers on the legacy platform.
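
The per-cluster multiplier above is easy to underestimate, so a small calculator helps. Every number in this Python sketch is hypothetical: the tier boundaries and prices are placeholders in arbitrary currency units, not FAST list prices. The sketch only shows how graduated tiers on QPS and index size combine with a per-cluster license.

```python
from __future__ import annotations

# Hypothetical graduated tiers: (upper bound, price in arbitrary units).
QPS_TIERS = [(50, 40_000), (200, 90_000), (1_000, 200_000)]
INDEX_GB_TIERS = [(100, 30_000), (500, 75_000), (2_000, 150_000)]


def tier_price(value: float, tiers: list[tuple[float, int]]) -> int:
    """Price of the smallest tier whose upper bound covers value."""
    for upper, price in tiers:
        if value <= upper:
            return price
    raise ValueError(f"{value} exceeds the largest tier in this sketch")


def per_cluster_license(peak_qps: float, index_gb: float) -> int:
    # Each cluster is licensed for its own peak QPS and index size.
    return tier_price(peak_qps, QPS_TIERS) + tier_price(index_gb, INDEX_GB_TIERS)


def total_license(peak_qps: float, index_gb: float, clusters: int) -> int:
    # Per-cluster licensing: every operating cluster repeats the full fee.
    return clusters * per_cluster_license(peak_qps, index_gb)


print(total_license(peak_qps=120, index_gb=300, clusters=1))  # 165000
print(total_license(peak_qps=120, index_gb=300, clusters=3))  # 495000
```

In this example, going from one to three operating clusters (for example, adding disaster-recovery and regional copies) triples the licensing cost. The licensed QPS and index size per cluster are unchanged.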
  
With respect to the future, FAST has these strengths to offer:
- Mature product, best-of-breed enterprise search features
- Very strong internationalization and linguistics capabilities
- Very strong document processing pipeline
- Leading-edge scale and performance
- Mature administrative and testing tools
- Deep bench of search expertise at the Microsoft Enterprise Search group (FAST subsidiary)
  
With respect to the future, there are certain risks associated with FAST:
- Traffic volume from customers will only increase, bringing more costs for clusters and licensing.
- The per-cluster licensing model acts as a cost multiplier in a redundant operation.
- FAST is proprietary and difficult to extend; extending it often requires reverse engineering. The document processing pipeline is extensible, but core indexing and querying capabilities are proprietary.
- FAST is moving away from UNIX and from standalone product support for ESP.
- Microsoft is moving to FAST FSIS/CTS to replace the document processing pipeline; such a move would deprecate one of the most attractive components of the FAST ESP platform.
- Costs will go up as data sources diversify and require additional connectors.
- Internal corporate search will incur its own licensing and other associated costs.
- Search federation (FAST Unity), which may be on the horizon, will be an additional cost.
  
Created by Doug Cutting in 1999, Lucene is built on ideas from search projects created at Xerox PARC and Apple. It was donated to the Apache Software Foundation in 2001 and became an Apache top-level project in 2005. Lucene is an open source, Java-based information retrieval (IR) library with best-practice indexing, search, and query capabilities.
  
Lucene and Solr “merged” development in early 2010.
  
Solr is now the de facto open source enterprise search platform, managed under the Apache Lucene Project. Solr uses Lucene as the “engine”, but adds full enterprise search server features and capabilities. The implementation is 100% Java, has a stable, mature API, and has been continuously improved for more than 10 years.
  
Lucid Imagination, the “Red Hat of enterprise search”, is funded by the CIA’s In-Q-Tel fund and other venture capital firms, and is quickly defining itself as a source of innovation in enterprise search. Many of the original contributors to the Apache Lucene Project work for Lucid Imagination in professional services and R&D. Likewise, many key personnel from proprietary search platform vendors have transitioned to Lucid Imagination.
  
Solr has been deployed at the following organizations: eBay, Ford, IBM, Nike, FedEx, HP, CNET, Monster, MySpace, and the Central Intelligence Agency. Some Solr customers report indexes containing up to 6 billion documents.
  
Solr/Lucene features:
- Support for a multitude of mainstream document formats, including PDF, MS Office, HTML, and XML.
- DataImportHandlers that can pull data from a variety of sources, e.g., databases, RSS/Atom feeds, XML via XPath, and mail server repositories. All major RDBMSs are supported, including Oracle and MySQL.
- Crawling and classification mechanisms such as Nutch and Tika.
- Aesthetically basic but powerful administrative interfaces (not quite as polished as FAST).
- Faceted categories (equivalent to FAST navigators).
- Support for all major Western languages and Asian languages (CJK+).
- Geosearch.
- Dynamic rank and boost (both query side and index side).
- Spellcheck/“Did you mean”.
- Entity extraction.
- Filter queries.
- Function queries.
- Dynamic query-side boost.
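
Both feature lists mention geo-search (distance and, for FAST, polygon bounds). The Python sketch below shows the underlying math under simplifying assumptions:
- Coordinates are latitude/longitude in degrees.
- The Earth is treated as a sphere of radius 6371 km.
- The polygon test treats coordinates as planar, which is only reasonable for small areas away from the poles and the antimeridian.

It is not either product’s implementation, and the document fields `id`, `lat`, and `lon` are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_polygon(lat: float, lon: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: inside if a ray toward +lon crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        if (lat_a > lat) != (lat_b > lat):  # edge straddles the point's latitude
            lon_cross = lon_a + (lat - lat_a) * (lon_b - lon_a) / (lat_b - lat_a)
            if lon < lon_cross:
                inside = not inside
    return inside


def within_radius(docs: list[dict], center: tuple[float, float], radius_km: float) -> list[str]:
    """IDs of documents within radius_km of center, nearest first."""
    lat0, lon0 = center
    hits = []
    for doc in docs:
        dist = haversine_km(lat0, lon0, doc["lat"], doc["lon"])
        if dist <= radius_km:
            hits.append((dist, doc["id"]))
    return [doc_id for _, doc_id in sorted(hits)]
```

Sorting by distance doubles as a simple distance-based rank. A search engine would normally pre-filter candidates with a cheap bounding box before computing exact distances.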
  	
  
Solr architecture:
- Large indexes are physically split into “shards”; e.g., 10 shards could together represent an entire data set (analogous to a FAST row).
- Index “masters” build indexes and replicate them to “slave” searchers (think MySQL-style replication).
- Solr has no concept of rows and columns; it is up to the user to configure load balancers and network infrastructure to handle failures in any given shard.
- Index segments can be “optimized” via a manual merge operation, prompting Lucene to merge many fragmented segments into fewer, larger ones (ideally a single segment).
- Replication of indices across nodes allows load to be distributed (analogous to a FAST column).
- Solr is relatively monolithic: all components reside in the JVM and the application container.
- Caches do not expire; they remain valid for the lifetime of an IndexSearcher.
- The JVM provides automatic memory management for the container that hosts Solr.
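
The shard model above relies on a scatter/gather step: each shard returns its own best hits, and a coordinator merges them. Solr performs this merge itself when a request carries the `shards` parameter. The Python sketch below only illustrates the idea. It assumes each shard returns `(score, doc_id)` pairs already sorted by descending score, and at least `k` of them when available.

```python
from __future__ import annotations

import heapq
import itertools


def merge_shard_results(shard_results: list[list[tuple[float, str]]], k: int) -> list[tuple[float, str]]:
    """Merge per-shard hit lists (each sorted by descending score) into a global top k."""
    # heapq.merge streams the inputs lazily; reverse=True because inputs are descending.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, k))


shard_a = [(9.0, "a1"), (5.0, "a2"), (1.0, "a3")]
shard_b = [(7.0, "b1"), (6.0, "b2")]
print(merge_shard_results([shard_a, shard_b], k=3))
# [(9.0, 'a1'), (7.0, 'b1'), (6.0, 'b2')]
```

The merge is only as fair as the scores. In a basic setup, each shard scores hits with its own local term statistics, so documents are usually spread evenly across shards (for example, by hashing document IDs) to keep scores comparable.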
  
Solr/Lucene administration:
- Administrative web front end with aesthetically basic but powerful management tools. LucidWorks Enterprise will provide a more comprehensive UI and monitoring capabilities.
- Debugging tools such as Luke provide in-depth debugging and index exploration.
- Flat-file XML configurations allow users to tune Solr and even extend it with custom Java classes.
  
Solr/Lucene cost structure:
- Solr is open source: it’s free to use and license.
- If prime vendor support is desired, Lucid Imagination provides subscription-based support, bundled and billed per hour.
- Some ancillary features that work with FAST out of the box are simply not present in Solr/Lucene. If developed in-house, these are development costs that must be quantified as part of the cost structure.
- There is a strong network of Solr/Lucene consultancies (e.g., Lightcrest) that provide prime-vendor-like services, from complete maintenance to managed/hosted search clusters.
- Solr’s hardware footprint is expected to be on par with FAST’s when index size, DPS, and QPS are held constant. Depending on the environment, Solr requires more hardware than FAST to do the same work in some cases and less in others.
  
With respect to the future, Solr/Lucene offers these strengths:
- Mature product containing real-world, best-of-breed enterprise search features.
- Solr is free to license. Multiple instances do not act as a cost multiplier with regard to licensing.
- Very strong internationalization and linguistics capabilities, provided the customer integrates Basis Tech products.
- Strong tool ecosystem, including machine learning (Mahout), distributed computing (Hadoop), data classification (Tika), and web crawling (Nutch). All are unified under the Apache Software Foundation and related commercial entities (Red Hat style).
- Horizontally scalable indexing and search in the form of user-defined shards (consistent hashing algorithms, etc.).
- Vertically scalable indexing in the form of user-defined segments and federation mechanisms.
- Solr is 100% Java and can be easily extended by enterprises with Java talent.
- Lucene can be used standalone, outside of Solr, to provide an embedded search runtime for any Java-based application. There are hooks for Jython and .NET, as well as libraries in Python, C#, etc.
- Solr can be easily extended to build home-grown features, and all source code for the engine itself is available to the development team.
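
The strengths list mentions user-defined shards placed with consistent hashing. Below is a minimal Python sketch of a consistent-hash ring with virtual nodes. It assumes that document-to-shard assignment happens in the indexing application, uses hypothetical shard names, and uses MD5 purely as a fast, well-spread hash (not for security).

```python
from __future__ import annotations

import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes for spreading documents over shards."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # ring position -> shard name
        for shard in shards:
            self.add(shard)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add(self, shard: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{shard}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = shard

    def remove(self, shard: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def shard_for(self, doc_id: str) -> str:
        if not self._points:
            raise LookupError("no shards on the ring")
        # First ring position at or after the key's hash, wrapping around to the start.
        idx = bisect.bisect_left(self._points, self._hash(doc_id)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a fourth shard moves only the keys that fall into the new shard’s ring segments, roughly a quarter of them on average, and every moved key lands on the new shard. With plain `hash % shards` placement, about three quarters of the keys would move when going from three to four shards.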
  
With respect to the future, there are risks associated with Solr/Lucene:
- Solr/Lucene has no document processing pipeline; building one is up to the customer.
- Solr/Lucene is an open source project, and releases are dictated by the contributing community, not by any single vendor.
- The administrative interfaces for Solr/Lucene are relatively rudimentary and do not integrate with monitoring tools. This issue is addressed by LucidWorks Enterprise.
- Solr/Lucene may or may not provide a performance boost over FAST, and may require more hardware depending on the features being leveraged. Benchmarking is key before selection.
- For certain internationalization requirements, products from Basis Tech must be purchased, introducing another vendor into the equation. FAST, however, also uses Basis Tech; it simply acts as the middleman, baking the relationship into its product and charging customers a premium.
- Solr/Lucene is not a canned solution like FAST, but its features are growing rapidly.
- The lack of a central vendor gives the customer more control but also more responsibility. This risk can be mitigated with an experienced third-party search partner. Customers who want enterprise backing on Solr can centralize it through Lightcrest and Lucid Imagination.
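
Solr/Lucene leaves the document processing pipeline to the customer, so the usual approach is a chain of small stages that run before documents are sent to Solr. The Python sketch below is one hypothetical shape for such a chain. The stage names, the `id`/`body`/`entities` fields, and the tiny entity dictionary are all illustrative. The regex tag stripping is deliberately crude; a real pipeline would use a proper parser or Tika for rich formats.

```python
from __future__ import annotations

import json
import re
from typing import Callable, Iterable, Iterator, Optional

Doc = dict
Stage = Callable[[Doc], Optional[Doc]]  # a stage returns the document, or None to drop it

# Hypothetical dictionary for dictionary-based entity extraction.
KNOWN_ENTITIES = {"apache": "Apache", "microsoft": "Microsoft"}


def strip_markup(doc: Doc) -> Doc:
    doc["body"] = re.sub(r"<[^>]+>", " ", doc.get("body", ""))
    return doc


def normalize_whitespace(doc: Doc) -> Doc:
    doc["body"] = " ".join(doc["body"].split())
    return doc


def drop_empty(doc: Doc) -> Optional[Doc]:
    return doc if doc["body"] else None


def extract_entities(doc: Doc) -> Doc:
    words = set(re.findall(r"[a-z]+", doc["body"].lower()))
    doc["entities"] = sorted(KNOWN_ENTITIES[w] for w in words if w in KNOWN_ENTITIES)
    return doc


PIPELINE = [strip_markup, normalize_whitespace, drop_empty, extract_entities]


def run_pipeline(docs: Iterable[Doc], stages: list[Stage]) -> Iterator[Doc]:
    for original in docs:
        doc = dict(original)  # copy so the caller's input is not mutated
        for stage in stages:
            doc = stage(doc)
            if doc is None:
                break  # a stage rejected the document
        else:
            yield doc


def to_json_payload(docs: Iterable[Doc]) -> str:
    # Serialized batch, to be handed to whatever Solr client or update endpoint is in use.
    return json.dumps(list(docs))
```

Keeping each stage a plain function makes the chain easy to reorder and unit-test, which is roughly the role FAST’s stackable pipeline plays out of the box.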
  
Technical Juxtaposition (FAST ESP vs. Solr/Lucene)

Comparison criteria: best-of-breed search and indexing; multiple content formats and data sources; mature feeding mechanisms (crawlers, JDBC, flat files); document processing pipeline; function queries; dynamic boost; scalable architecture; faceted search/navigators; fault tolerance; multiple languages; baked-in query-side language detection; strong admin tools.

- Solr does not have a structured document processing pipeline or document ingest framework.
- Solr offers granular query functionality, such as function queries, that FAST does not.
- FAST has more mature admin tools.
- Proprietary code (FAST) has strong central development support and service contracts. Open source code (Solr) can be unstable, but stable releases get more worldwide QA than their proprietary counterparts.
- FAST’s proprietary codebase makes it difficult to extend, though it has more features baked in.
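
The point that Solr offers granular query functionality such as function queries can be illustrated with the request parameters a client sends. This Python sketch only builds a query string. `q`, `fq`, `facet`, `facet.field`, `defType`, `rows`, and the edismax `boost` parameter are standard Solr request parameters. The host, the `/solr/select` path, and the field names (`in_stock`, `category`, `pub_date`) are hypothetical. The `recip(ms(NOW,...),3.16e-11,1,1)` recency function follows the common pattern from the Solr function query documentation and assumes a Solr version that ships the edismax parser.

```python
from __future__ import annotations

from urllib.parse import urlencode


def solr_params(text: str, filters: dict[str, str], facet_fields: list[str],
                recency_field: str | None = None, rows: int = 10) -> list[tuple[str, str]]:
    """Build Solr request parameters as (name, value) pairs; names may repeat."""
    params = [("q", text), ("defType", "edismax"), ("rows", str(rows))]
    for field, value in filters.items():
        # Each fq restricts results without affecting scores and is cached separately.
        # Values are inserted verbatim; spaces or special characters would need escaping.
        params.append(("fq", f"{field}:{value}"))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    if recency_field:
        # Function query used as a multiplicative boost: newer documents score higher.
        params.append(("boost", f"recip(ms(NOW,{recency_field}),3.16e-11,1,1)"))
    return params


def solr_url(base: str, params: list[tuple[str, str]]) -> str:
    return base + "?" + urlencode(params)


params = solr_params("ipod", {"in_stock": "true"}, ["category"], recency_field="pub_date")
print(solr_url("http://localhost:8983/solr/select", params))
```

Filters are usually kept in `fq` rather than folded into `q`. Repeated filters such as stock status can then be served from Solr’s filter cache, and they do not distort relevance scores.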
  
Business Juxtaposition (FAST ESP vs. Solr/Lucene)

Comparison criteria: central commercial vendor; strong third-party vendors; open source code base; licensing costs per cluster/instance; graduated license costs per query and GB indexed; internationalization capabilities; large hardware footprint and associated costs; product roadmap that parallels customer needs.

- FAST has a strong central vendor (Microsoft), as does Solr (Lucid Imagination, via LucidWorks).
- Solr is free. FAST is expensive.
- Solr and FAST hardware outlay is roughly the same in most cases, so hardware costs should be comparable.
- Microsoft is deprecating UNIX support and moving FAST to .NET. Solr is tied to Java (typically deployed on UNIX) and to your particular application container.
- Build vs. buy: with FAST, you buy product licenses and deploy with costly service contracts; with Solr, you build extensions with your staff and a partner ecosystem.
  
- Run through a POC and sizing exercise with Solr to determine query latency, index expansion, and the resulting hardware footprint.
- Explore the feasibility of federated search with parallel systems (FAST/Solr) for particular index regions.
- Verify that all existing features required by the enterprise can carry over to the Solr query language and feature set.
- Compare costs, both actual versus best projected estimates.
- Define a feature-related search engine roadmap, including the new architecture and data model (index profile if FAST, schema if Solr).
- Execute the implementation plan.
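
The POC and sizing step above reduces to a few numbers. Below is a minimal Python sketch of how they might be summarized. It assumes latency samples collected from a load test against the candidate Solr setup. The nearest-rank percentile, the 30% headroom default, and all function names are illustrative choices rather than a prescribed method.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def index_expansion(raw_corpus_gb: float, index_gb: float) -> float:
    # Ratio of on-disk index size to raw source data; above 1.0 means the index is larger.
    return index_gb / raw_corpus_gb


def nodes_needed(peak_qps: float, qps_per_node: float, headroom: float = 0.3) -> int:
    # Replicas needed to serve peak traffic with spare capacity for failover and spikes.
    return math.ceil(peak_qps * (1 + headroom) / qps_per_node)


latencies_ms = [float(ms) for ms in range(1, 101)]  # stand-in for measured samples
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 50.0 95.0
print(index_expansion(raw_corpus_gb=40, index_gb=30))  # 0.75
print(nodes_needed(peak_qps=400, qps_per_node=150))    # 4
```

Computing the same figures for the existing FAST deployment gives a like-for-like baseline for the cost comparison.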
  

				
Posted: 2/9/2012 (24 pages, English)