Docstoc

Implementing Netezza Spatial

Document Sample
Implementing Netezza Spatial Powered By Docstoc
					in	
  conjunc(on	
  with	
  
Data Management & Warehousing   http://www.datamgmt.com
What	
  is	
  the	
  Spa(al	
  Module?	
  

•  It’s	
  the	
  ability	
  to	
  analyse	
  informa(on	
  in	
  a	
  
   geographic	
  context:	
  
          –  Where	
  is	
  the	
  nearest	
  petrol	
  sta(on?	
  
          –  Which	
  road	
  am	
  I	
  on?	
  
          –  How	
  many	
  ATMs	
  are	
  in	
  this	
  area?	
  
•  It’s	
  not	
  maps	
  and	
  images	
  
          –  These	
  come	
  later	
  with	
  tools	
  that	
  help	
  present	
  the	
  
             informa(on	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     2	
  
The	
  three	
  types	
  of	
  data	
  &	
  many	
  ques(ons	
  

•  Points	
                                                                •  How	
  close	
  are	
  two	
  
          –  OS	
  Grid	
                                                     points?	
  
          –  La(tude	
  &	
  Longitude	
  	
                               •  Does	
  a	
  point	
  touch	
  a	
  
•  Lines	
                                                                    line?	
  
          –  Pairs	
  of	
  points	
                                       •  Is	
  a	
  point	
  inside	
  or	
  
          –  e.g.	
  Road	
  Segments	
                                       outside	
  a	
  polygon?	
  
•  Polygons	
                                                              •  Does	
  a	
  line	
  cross	
  a	
  
          –  A	
  series	
  of	
  points	
  that	
                            polygon?	
  
             define	
  a	
  boundary	
                                      •  How	
  many	
  points	
  are	
  in	
  
          –  e.g.	
  Postcode	
  Boundaries	
                                 a	
  polygon?	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
                   3	
  
Using	
  Spa(al	
  Data	
  Is	
  Complex	
  
•  Different	
  distances	
  
   between	
  points	
  at	
  
   different	
  longitudes	
  and	
  
   la(tudes	
  
•  Measurement	
  over	
  a	
  
   curved	
  irregular	
  surface	
  
•  Mul(ple	
  input	
  and	
  output	
  
   formats	
  
•  Mul(ple	
  co-­‐ordinate	
  
   systems	
  see:
   A	
  Guide	
  to	
  Coordinate	
  
   Systems	
  in	
  Great	
  Britain	
  	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     4	
  
Sources	
  of	
  Informa(on	
  –	
  GPS	
  
•  In	
  Car	
  Device	
  
          –  Sends	
  frequent	
  data	
  sets	
  to	
  
             processing	
  centre	
  
          –  Point	
  Data	
  
                    •  Speed,	
  Direc(on,	
  	
  
                       Loca(on	
  and	
  G-­‐force	
  
          –  Aggregate	
  Data	
  
                    •  Speed	
  and	
  Direc(on	
  
•  Other	
  Devices	
  
          –  Sat	
  Nav	
  Systems	
  
          –  Smart	
  Phone	
  Apps	
  	
  
             e.g.	
  ‘GPS	
  Tracker’	
  
          –  Cameras	
  

Wednesday,	
  July	
  28,	
  2010	
          ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     5	
  
Sources	
  of	
  Informa(on	
  –	
  Ordnance	
  Survey	
  

•  Integrated	
  Road	
  Network:	
  
   A	
  series	
  of	
  3	
  million	
  
   ‘linestrings’	
  and	
  17	
  million	
  
   points	
  that	
  describe	
  every	
  
   road	
  in	
  the	
  UK	
  
•  Linestrings	
  have	
  between	
  2	
  
   and	
  655	
  points,	
  most	
  have	
  
   less	
  than	
  10	
  
•  23	
  points	
  for	
  this	
  picture	
  	
  	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     6	
  
Sources	
  of	
  Informa(on	
  –	
  Post	
  Office/GAdm	
  

•  Postal	
  Address	
  File:	
  
   A	
  series	
  of	
  c.1.75M	
  UK	
  
   postcodes	
  
          –  Postcode	
  Boundaries	
  	
  
          –  Over	
  28M	
  complete	
  
             addresses	
  
•  Global	
  Admin	
  Boundaries	
  
          –  Na(onal	
  and	
  regional	
  
             boundaries	
  for	
  c.245	
  
             countries	
  
          –  hgp://www.gadm.org	
  	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     7	
  
Data	
  Layers	
  –	
  Enriching	
  what	
  you	
  have	
  

•  Data	
  Layers	
  are	
  sets	
  of	
  informa(on	
  (ed	
  to	
  a	
  
   geographic	
  point	
  
          –  Road	
  Speed	
  for	
  a	
  given	
  road	
  segment	
  
          –  ATM	
  Loca(on	
  
          –  House	
  Price	
  for	
  a	
  postcode	
  
•  Where	
  data	
  has	
  loca(on	
  informa(on	
  it	
  is	
  
   known	
  as	
  ‘Geo-­‐tagged’	
  


Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     8	
  
Data	
  Layer	
  Sources	
  (1)	
  

•  Ordnance	
  Survey	
  
          –  Road	
  Types,	
  Limits,	
  Closures,	
  etc.	
  
•  Government	
  
          –  UK	
  Government	
  now	
  providing	
  masses	
  of	
  	
  
             geo-­‐tagged	
  info	
  (hgp://data.gov.uk)	
  
•  Met	
  Office	
  /	
  HM	
  Nau(cal	
  Almanac	
  Office	
  	
  
          –  Weather,	
  Daylight	
  to	
  Postcode	
  Level	
  


Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     9	
  
Data	
  Layer	
  Sources	
  (2)	
  
•  Wikipedia	
  
          –  Geo-­‐tag	
  Access	
  API	
  –	
  what’s	
  nearby?	
  
•  Google	
  Maps	
  
          –  Road	
  level	
  photographic	
  images	
  
•  Commercial	
  Sources	
  
          –  Fast	
  Food	
  Outlets,	
  Supermarkets,	
  Petrol	
  Sta(ons,	
  ATMs,	
  
             etc.	
  

•  Massive	
  growth	
  in	
  both	
  commercial	
  and	
  public	
  domain	
  
   geo-­‐tagged	
  data	
  


Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     10	
  
Issues	
  with	
  Geo-­‐tagged	
  data	
  

•  Geo-­‐tagging	
  uses	
  different	
  formats	
  
          –  Longitude	
  &	
  La(tude,	
  OS	
  Grid	
  Reference,	
  etc	
  
•  Geo-­‐tagging	
  at	
  different	
  levels	
  
          –  Data	
  for	
  a	
  postcode	
  or	
  a	
  an	
  en(re	
  county	
  which	
  makes	
  
             it	
  difficult	
  to	
  compare	
  
•  Geo-­‐tagging	
  coverage	
  is	
  patchy	
  and/or	
  historic	
  
          –  Rate	
  of	
  change	
  of	
  fine	
  detail	
  data	
  is	
  very	
  high	
  	
  
          –  e.g.	
  OS	
  issues	
  monthly	
  updates	
  to	
  the	
  UK	
  mapping	
  
•  Mul(ple	
  standards	
  and	
  formats	
  
          –  XML	
  &	
  CSV,	
  different	
  file	
  formats,	
  etc.	
  	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     11	
  
Our	
  Model	
  For	
  Delivering	
  Spa(al	
  Data	
  

             Source	
                   1.       Load	
  Mul(ple	
  File	
  Formats	
                                                          Netezza	
  
                                        2.       Standardise	
  Geo-­‐Tagging	
  
                                        3.       Extract	
  &	
  Load	
  CSVs	
  




                                                                                                             	
  (Proximity,	
  Contains,	
  Excludes)	
  




                                                                                                                                                                                                         (Tableau,	
  Google	
  Maps,	
  etc.)	
  
                                                                                                                                                                                                          Query	
  &	
  Presenta(on	
  Tools	
  
             Source	
                   4.       Perform	
  Spa(al	
  Analysis	
  




                                                                                                                                                             (Sets	
  of	
  data	
  with	
  spa(al	
  
                                                                                                                                                               Spa(al	
  Presenta(on	
  
                                        5.       Create	
  User	
  Access	
  Area	
  




                                                                                                                        Spa(al	
  Analysis	
  




                                                                                                                                                                       agributes)	
  
             Source	
  


             Source	
                                      (Small)	
  
                                         1	
              Postgres	
                     3	
  

                                                          Database	
  
             Source	
  

                                                                  2	
  
             Source	
                                                                                                               4	
                                        5	
  




Wednesday,	
  July	
  28,	
  2010	
                     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
                                                                                                            12	
  
Netezza	
  Spa(al	
  Value	
  Add	
  
•  Netezza	
  Spa(al	
  is	
  fast	
                                            •  Netezza	
  Spa(al	
  is	
  easy	
  
          –  Analysis	
                                                                  –  Distance	
  and	
  proximity	
  
                    •  Look	
  up	
  a	
  typical	
  18	
  point	
                          calcula(ons	
  are	
  simple	
  
                       trip	
  in	
  the	
  3M	
  linestrings	
  to	
                    –  ‘Touches’,	
  ‘Overlaps’	
  &	
  
                       find	
  the	
  roads	
  that	
  the	
                                 ‘Contains’	
  queries	
  allow	
  
                       vehicle	
  was	
  on	
  in	
  less	
  than	
  
                       1	
  second	
                                                        instant	
  value	
  add	
  	
  
                    •  Overnight	
  batch	
  process	
  of	
  
                       300,000	
  points	
  to	
  matching	
                    •  Netezza	
  Spa(al	
  integrates	
  
                       road	
  names	
  in	
  under	
  30	
  
                       minutes	
                                                         –  Works	
  well	
  with	
  Tableau	
  
          –  Presenta(on	
                                                               –  Easy	
  to	
  generate	
  KML	
  for	
  
                    •  Tools	
  rely	
  on	
  fast	
  query	
                               use	
  with	
  Google	
  Earth	
  and	
  
                       access	
  to	
  render	
  any	
                                      Google	
  Maps	
  
                       queried	
  map	
  with	
  sub-­‐
                       second	
  response	
  (mes	
  


Wednesday,	
  July	
  28,	
  2010	
               ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
                            13	
  
Netezza	
  Spa(al	
  Limita(ons	
  
•  Fails	
  the	
  Slar(barpast	
  Test:	
  
          –  Polygons	
  for	
  very	
  detailed	
  maps	
  
             are	
  too	
  big	
  to	
  be	
  loaded	
  as	
  
             Netezza	
  limits	
  the	
  maximum	
  
             block	
  size	
  to	
  64000	
  characters	
                                                   Norway	
  
          –  Named	
  aqer	
  the	
  Hitch-­‐Hikers	
  
             Guide	
  to	
  the	
  Galaxy	
  coastline	
  
             designer	
  responsible	
  for	
  the	
  
             twiddly	
  bits	
  around	
  the	
  
             Norwegian	
  rords	
  
•  Work-­‐around:	
  
          –  Use	
  regional	
  boundaries	
  (e.g.	
  
             UK	
  Coun(es,	
  US	
  States,	
  etc.)	
  
             and	
  then	
  aggregate	
  into	
  
             na(onal	
  boundaries	
  
          –  If	
  a	
  point	
  is	
  in	
  Berkshire	
  then	
  by	
                                         Slar(barpast	
  
             defini(on	
  it	
  is	
  also	
  in	
  England	
  

Wednesday,	
  July	
  28,	
  2010	
               ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
                           Page	
  14	
  
Current	
  Uses	
  …	
  

•        M/A/B	
  road	
  driving	
  profiles	
  
•        Time	
  of	
  day	
  driving	
  profiles	
  
•        Speed	
  Limits	
  vs.	
  Driven	
  Speed	
  
•        Matching	
  GPS	
  posi(ons	
  to	
  road	
  names	
  
•        Out	
  of	
  bounds	
  driving	
  
•        Customer	
  Demographic	
  Profiles	
  
     	
  …	
  but	
  this	
  is	
  only	
  the	
  start	
  in	
  a	
  very	
  short	
  (me	
  

Wednesday,	
  July	
  28,	
  2010	
     ©	
  2010	
  Data	
  Management	
  &	
  Warehousing	
     15	
  
in	
  conjunc(on	
  with	
  
Data Management & Warehousing   http://www.datamgmt.com

				
DOCUMENT INFO
Shared By:
Tags:
Stats:
views:20
posted:4/24/2012
language:
pages:16