
    Data Quality:
    Opportunities, Data, and Examples




1
        Data Woes
                         We are agents of CHANGE!




    The Kübler-Ross grief cycle
    …roller-coaster ride of activity and passivity as the person wriggles and turns
    in their desperate efforts to avoid the change.

2
3
    Better and More Data

      – Level of analysis
           • Take a quick look at what data to use and why
           • Link data from disparate and third-party sources
      – Explore data types
      – Typical issues and tricks
           • Cross-validation and sourcing
           • Reverse look-up
           • GIS layering
           • Backfill from text correlated to codes
      – Information from operations
           • Text analytics

4
    General Organizational Overview
    An information business focused on risk taking.
    Make. Sell. Serve.

    Sales and Distribution
      • Producer Segmentation
      • Market Planning
      • Revenue Forecasting
      • Cross-sell and Up-sell
      • Retention and Profitability

    Underwriting
      • Risk Selection and Pricing
      • Portfolio Management
      • Premium Adequacy
      • Billing and Collections Management

    Claims
      • Payment Accuracy
      • Claim Collaboration
           > Fraud Detection
           > Subrogation
           > Risk Transfer
           > 3rd Party Deductible
           > Reinsurance Recoverable
5
    Same Problems – Different Lines of Business


    • Personal – Auto, HO, Umbrella
    • Small Commercial – BOP, CPP
    • Middle Market Commercial – CPP w/GL, CP, Crime,
        CIM, B&M, WC, Auto
    •   Large Commercial Accounts
    •   Commercial Auto
    •   Workers Comp
    •   Umbrella/Excess
    •   Specialty Lines – D&O, EPL, E&O, Farm, FI



6
    Data Types and Forms


                           Structured data
                           Semi-structured data
                           Unstructured data
                           Text
                           Spatial
                           Pictographic
                           Graphic
                           Voice
                           Video

7
    Multiple data systems must be pulled together for
    analysis. This is a great opportunity for cross-validation and
    sourcing.

        Central data: Claim, Policy

        Sources shown around the central data:
          • Archive, legacy systems, current system
          • Multiple underwriting systems
          • External data
          • Vendors/Partners
          • Medical data
               - Bill Review
               - PPO
               - Case Management
               - Paradigm
          • Multiple states
          • Billing systems
          • Finance systems
          • CRM systems, other data

        ACTIONS
          • Identify data systems
          • Get the right data from the right systems
          • Overcome internal organizational barriers
          • Bridge to legacy systems and archived data
          • Augment to create a rich data mining environment
          • Expect the need to negotiate for resources


8
    Some typical external data sources and vendors


           Dun & Bradstreet
           Experian
           Bureau of Labor Statistics
           Market Stance
           AM Best
           Equifax
           US Census
           Claritas
           Melissa Data
           ISO
           GIS vendors
           Usual and customary (U&C) data sets
           Code sets for ICD and CPT codes
           …




9
     Data Glitches – historical and on-going

     Systemic changes to data that are not process-related
        – Changes in data layout / data types
        – Changes in scale / format
        – Temporary reversion to defaults
        – Missing and default values
        – Gaps in time series
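
     Two of the glitches above (gaps in a time series, and missing or default values) can be flagged with very little code. The following is a minimal sketch only: the monthly layout, the sample records, and the choice of 0 as the system default are assumptions made for illustration, not details of any system described in this deck.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month count so consecutive months differ by 1."""
    return d.year * 12 + (d.month - 1)

def find_month_gaps(months: list[date]) -> list[tuple[date, date]]:
    """Return (last_seen, next_seen) pairs where at least one month is missing."""
    ordered = sorted(months)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if month_index(curr) - month_index(prev) > 1:
            gaps.append((prev, curr))
    return gaps

def default_share(values: list, defaults: set) -> float:
    """Fraction of values that are missing (None) or equal to a known default."""
    if not values:
        return 0.0
    flagged = sum(1 for v in values if v is None or v in defaults)
    return flagged / len(values)

# Hypothetical monthly feed with March and April 2020 missing.
observed = [date(2020, 1, 1), date(2020, 2, 1), date(2020, 5, 1), date(2020, 6, 1)]
gaps = find_month_gaps(observed)

# Hypothetical field where 0 is assumed to be the system default.
reported_wages = [520, 0, 0, None, 610, 0]
share = default_share(reported_wages, defaults={0})
```

     A sudden jump in the default share for one period is one possible sign of a temporary reversion to defaults; what threshold counts as "sudden" would have to be set per field.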




10
     Process Reasons for poor data entry




11
 Defining Issues (sample)

     [Figure: Source Data; step 1, Define Issues]




12
MORE ISSUES…
Mapping across sources: Same Fact, Different Terms

     Data Element Concept
          Name: Country Identifiers
          Context:
          Definition:
          Unique ID: 5769
          Conceptual Domain:
          Maintenance Org.:
          Steward:
          Classification:
          Registration Authority:
          Others
          Countries shown: Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe

     Data Elements
          Name:
          Context:
          Definition:
          Unique ID: 4572
          Value Domain:
          Maintenance Org.:
          Steward:
          Classification:
          Registration Authority:
          Others

     ISO 3166        ISO 3166       ISO 3166        ISO 3166        ISO 3166
     English Name    French Name    2-Alpha Code    3-Alpha Code    3-Numeric Code
     ------------    -----------    ------------    ------------    --------------
     Algeria         L'Algérie      DZ              DZA             012
     Belgium         Belgique       BE              BEL             056
     China           Chine          CN              CHN             156
     Denmark         Danemark       DK              DNK             208
     Egypt           Égypte         EG              EGY             818
     France          La France      FR              FRA             250
     ...             ...            ...             ...             ...
     Zimbabwe        Zimbabwe       ZW              ZWE             716
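
     One way to handle "same fact, different terms" is to map every representation onto a single canonical code before joining sources. The sketch below uses only the rows from this slide's ISO 3166 table; the choice of the 2-alpha code as the canonical form and the function names are assumptions for illustration.

```python
# Rows taken from the slide's ISO 3166 table:
# (English name, French name, 2-alpha, 3-alpha, 3-numeric)
ISO_3166_ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Égypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]

def build_lookup(rows):
    """Index every representation of a country under its 2-alpha code."""
    lookup = {}
    for row in rows:
        alpha2 = row[2]
        for term in row:
            lookup[term.casefold()] = alpha2
    return lookup

LOOKUP = build_lookup(ISO_3166_ROWS)

def to_alpha2(value: str):
    """Return the canonical 2-alpha code, or None if the term is unknown."""
    return LOOKUP.get(value.strip().casefold())

# Three sources describing the same country in different terms.
codes = [to_alpha2("Belgique"), to_alpha2("BEL"), to_alpha2("056")]
```

     Returning None for unknown terms, rather than guessing, keeps unmapped values visible so they can be reviewed and added to the table.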

13
 Data Filling

        •   Manual
        •   Statistical Imputation
        •   Temporal
        •   Spatial
        •   Spatial-temporal
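
     As a rough illustration of two of the filling approaches listed above, the sketch below shows a temporal fill (carry the last observed value forward) and a very simple statistical imputation (replace gaps with the observed median). The sample series is invented, and real imputation would usually condition on other fields; this only shows the mechanics.

```python
from statistics import median

def fill_forward(series):
    """Temporal fill: carry the last observed value forward over gaps."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def fill_with_median(series):
    """Simple statistical imputation: replace gaps with the observed median."""
    observed = [v for v in series if v is not None]
    if not observed:
        return list(series)
    mid = median(observed)
    return [mid if v is None else v for v in series]

# Hypothetical weekly wage history with gaps (None = missing).
weekly_wage = [None, 500, None, None, 540, None]
temporal = fill_forward(weekly_wage)
imputed = fill_with_median(weekly_wage)
```

     Note that the forward fill leaves the leading gap unfilled because there is no earlier observation to carry; whether to backfill it is a judgment call for the analyst.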




14
     Geographic Hierarchy




15
     Deriving Data = Power


       •   Totals: Household Income
       •   Trends: Rate of Medical Bill Increases
       •   Ratios: Claims/Premium, Target/Median
       •   Friction: Level of inconvenience, ratio of rental to damage
       •   Sequences: Lawyer-Doctor, Auto-Life Policy
       •   Circumstances: Minimal Impact, Severe Trauma
       •   Temporal: Loss shortly after adding collision
       •   Spatial: Distance to Service, proximity of stakeholders
       •   Logged: Progress Notes, Diaries
                    – Who did it, When, “Why”
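
     Two of the derived fields above, a ratio (claims/premium) and a temporal flag (loss shortly after adding collision), might be computed along these lines. This is a sketch under assumptions: the function names, the sample amounts and dates, and the 30-day window are all hypothetical.

```python
from datetime import date

def loss_ratio(claims_paid: float, premium: float):
    """Ratio feature: claims / premium; None when premium is zero or missing."""
    if not premium:
        return None
    return claims_paid / premium

def days_from_coverage_to_loss(coverage_added: date, loss_date: date) -> int:
    """Temporal feature: days between adding a coverage and the reported loss."""
    return (loss_date - coverage_added).days

def loss_shortly_after_coverage(coverage_added: date, loss_date: date,
                                window_days: int = 30) -> bool:
    """Flag losses inside an assumed window after the coverage was added."""
    elapsed = days_from_coverage_to_loss(coverage_added, loss_date)
    return 0 <= elapsed <= window_days

# Hypothetical policy: collision added 1 March 2021, loss reported 13 March 2021.
ratio = loss_ratio(7500.0, 10000.0)
elapsed = days_from_coverage_to_loss(date(2021, 3, 1), date(2021, 3, 13))
flagged = loss_shortly_after_coverage(date(2021, 3, 1), date(2021, 3, 13))
```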




16
     Deriving Data = Power (Cont’d)



       •   Behavioral: Deviation from past usage, spike buying
       •   Experience Profiles: Vendor, Doctor, Premium Audit
       •   Channel: How applied, How reported, Service Chain
       •   Legal Jurisdiction: Venue Disposition, Rules
       •   Demographics: Working, Weekly wage, lost income
       •   Firmographics: Industry Class Code vs. Injuries Claimed
       •   Inflation: Wage, Medical, Goods, Auto, COLA
       •   Gov’t Statistics: Crime Rate, Employment, Traffic
       •   Other Stats: Rents, Occupancy, Zoning, Managed Care
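
     The behavioral item above, deviation from past usage, could be expressed as a standard score against the account's own history. A minimal sketch follows; the z-score is one assumed way to measure deviation, and the sample counts are invented.

```python
from statistics import mean, stdev

def deviation_from_history(history, current):
    """Standard score of the current value against past usage.

    Returns None when there is too little history or no variation,
    because the score is undefined in those cases.
    """
    if len(history) < 2:
        return None
    spread = stdev(history)
    if spread == 0:
        return None
    return (current - mean(history)) / spread

# Hypothetical monthly service counts for one account, then a spike.
past_usage = [10, 12, 11, 9, 13]
score = deviation_from_history(past_usage, 25)
```

     How large a score should count as a spike is a business decision; the sketch only produces the number.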




17
          “Search” versus “Discover”


                          Search             Discover
                          (goal-oriented)    (opportunistic)

     Structured data      Data retrieval     Data mining

     Unstructured data    Information        Text mining
     (text)               retrieval

18
 Searching


     Input value:               [Jim]
     Word replacement list:     Jimmy, Jim, James
     Transformed input value:   [JAMES]
     Transformed values:        JAMES, JAMES, JAMES

     Returns “Similar Matches”
     All records found: Jimmy, Jim, James
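
     The search flow on this slide (transform the input through a word replacement list, then match on the transformed value) can be sketched as follows. The replacement entries come from the slide; the extra record "Janet" and the function names are assumptions added to show a non-match.

```python
# Word replacement list from the slide: nicknames map to one canonical form.
WORD_REPLACEMENTS = {
    "JIM": "JAMES",
    "JIMMY": "JAMES",
    "JAMES": "JAMES",
}

def transform(value: str) -> str:
    """Upper-case the value and apply the word replacement list."""
    key = value.strip().upper()
    return WORD_REPLACEMENTS.get(key, key)

def search(records, query):
    """Return every record whose transformed value equals the transformed query."""
    target = transform(query)
    return [record for record in records if transform(record) == target]

# "Janet" is a hypothetical extra record that should not match.
records = ["Jimmy", "Jim", "James", "Janet"]
matches = search(records, "Jim")
```

     Transforming both the query and the stored values is what lets "Jim" find "Jimmy" and "James" without fuzzy matching.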




19
             Motivation for Text Mining

     •   Approximately 90% of the world’s data is held in
         unstructured formats (source: Oracle Corporation)
     •   Information-intensive business processes demand that we
         move beyond simple document retrieval to “knowledge”
         discovery.



     [Figure: 10% structured numerical or coded information;
      90% unstructured or semi-structured information]


20
     Convergence of Disciplines Example




21
 Techniques for attacking text data:


     • Rules-based
     • Statistical Text Analysis and Clustering
     • Linguistic and Semantic Clustering
     • Support Vector Machines
     • Pattern Matching or other statistical algorithms
     • Neural Networks

     • Combination of methods from above



             Text is like a data iceberg
22
 Claims processing – Progress notes and Diaries


     Service

     Parties around the CLAIMS ADJUSTER:
     • Medical Management Staff
     • Special Investigation Unit
     • NICB
     • Vendor Management
     • Consulting Engineers
     • Hearing Representative
     • Structured Settlement Unit
     • Recovery Staff
     • Legal Staff
     • Home Office Staff
     • Field Office Claim Staff
     • Insured Risk Manager
     • Agent or Broker

     Progress note and diary entries:
     • Diary forward – “call Dr Jones next week”
     • Business Rule – large loss review
     • System Reminder – update case reserves
     • Correspondence Tracking – legal letter sent
23
     Semantic processing:
      Named Entity Extraction



     • Identify and classify (type) language features
     • Examples:
          • People names
          • Company names
          • Geographic location names
          • Dates
          • Monetary amounts
          • Phone numbers, ZIP codes, SSNs, FEINs
          • Others… (domain specific)
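
     For the pattern-like entity types above (dates, monetary amounts, phone numbers, ZIP codes, SSNs), a rules-based extractor can be sketched with regular expressions. The patterns below assume US-style formats and are deliberately narrow; the sample note and every value in it are made up. Names of people, companies, and places generally need dictionaries or statistical models and are not attempted here.

```python
import re

# Assumed US-style formats; each pattern is intentionally strict.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MONEY": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}

def extract_entities(text: str):
    """Return (label, matched text) pairs in the order they appear."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()
    return [(label, value) for _, label, value in found]

# Invented progress note; no real person or claim is referenced.
note = ("Called Dr Jones on 03/14/2019 at 555-010-1234; reserve raised to "
        "$12,500.00. Claimant SSN 123-45-6789, ZIP 06103.")
entities = extract_entities(note)
```

     Each extracted pair can then be stored as a structured field next to the free-text note, which is what makes the text usable alongside coded data.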
24
     Feedback to UW




25

								