					   An Investigation into the
Relationship between Semantic
 and Content Based Similarity
          Using LIDC
        Grace Dasovich
           Robert Kim
      Midterm Presentation
        August 21 2009
                      Outline

•       Related Work
•       Data
•       Modeling Approach and Results
    –     Similarity Measures
    –     Artificial Neural Network
    –     Multivariate Linear Regression
•       Conclusions
•       Future Work
              Related Work


• Computer-Aided Diagnosis (CADx) based
  on low-level image features
  – Armato et al. developed a linear discriminant
    classifier using features of lung nodules
  – Need to find the relationship between the
    image features and radiologists’ ratings
               Related Work


• Image features and the semantic ratings
  – Lung Interpretations
    • Barb et al. developed the Evolutionary System for
      Semantic Exchange of Information in Collaborative
      Environments (ESSENCE)
    • Raicu et al. used ensemble classifiers and decision
      trees to predict semantic ratings
    • Samala et al. used several combinations of image
      features and the radiologists’ ratings to classify
      nodules
                 Related Work


– Similarity
   • Li et al. investigated four different methods to
     compute similarity measures for lung nodules
      –   Feature-based
      –   Pixel-value-difference
      –   Cross correlation
      –   ANN
                   Data
                  Materials
• LIDC Dataset
• 149 Unique Nodules
  – One slice per nodule (the slice with the largest
    nodule area)
• 9 Semantic Characteristics
  – Calcification and Internal Structure showed little
    variation and were therefore excluded
• 64 Content Features
  – Shape, size, intensity, and texture

         Similarity Measures

• Cosine Similarity


• Jeffrey Divergence


• Euclidean Distance
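A minimal NumPy sketch of the three measures is given below. It uses the standard textbook definitions, which may differ in detail from the variants used in this work. The Jeffrey divergence is written in its common symmetric form, d_J(H, K) = Σ_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)] with m_i = (h_i + k_i) / 2. We assume that cosine similarity and Jeffrey divergence are applied to the semantic ratings (the latter treating ratings as histograms) and that Euclidean distance is applied to content feature vectors. All function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def _kl_to_mean(p, m):
    # Sum of p * log(p / m) over bins with p > 0 (0 * log 0 is taken as 0).
    # Wherever p > 0 the mean m is also > 0, so the log is always defined.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / m[mask])))


def jeffrey_divergence(h, k):
    """Symmetric Jeffrey divergence between two histograms (0 = identical)."""
    h = np.asarray(h, dtype=float)
    k = np.asarray(k, dtype=float)
    h, k = h / h.sum(), k / k.sum()   # treat counts as probability distributions
    m = (h + k) / 2.0                 # bin-wise mean distribution
    return _kl_to_mean(h, m) + _kl_to_mean(k, m)


def euclidean_distance(x, y):
    """Straight-line distance between two feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff))
```

Cosine similarity grows as two nodules become more alike, whereas the Jeffrey divergence and Euclidean distance shrink, which is worth keeping in mind when comparing them on the same plot.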
                             Similarity Measures
[Figure: plot of Cosine Similarity (y-axis, -0.1 to 0.6) vs. Euclidean Distance (x-axis, 0 to 1)]
                            Similarity Measures
[Figure: plot of Jeffrey Divergence (y-axis, 0 to 4) vs. Euclidean Distance (x-axis, 0 to 1)]
        Similarity Measures


• Computed feature distance measures
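The results later list three feature distance transforms, d(i, j), d²(i, j) and exp(d(i, j)). The sketch below assumes that d(i, j) is the per-feature absolute difference between nodules i and j, producing one 64-value input row per nodule pair. This definition and the function names are assumptions for illustration.

```python
import numpy as np


def feature_distances(f_i, f_j, transform="d"):
    """Per-feature distances between two nodules' feature vectors.

    transform: "d" -> |f_i - f_j|, "d2" -> |f_i - f_j|**2, "exp" -> exp(|f_i - f_j|)
    """
    d = np.abs(np.asarray(f_i, dtype=float) - np.asarray(f_j, dtype=float))
    if transform == "d":
        return d
    if transform == "d2":
        return d ** 2
    if transform == "exp":
        return np.exp(d)
    raise ValueError(f"unknown transform: {transform!r}")


def pair_distance_matrix(features, pairs, transform="d"):
    """Stack one row of feature distances per nodule pair.

    features: (n_nodules, n_features) array, e.g. 64 content features per nodule
    pairs: iterable of (i, j) row indices into `features`
    """
    features = np.asarray(features, dtype=float)
    return np.array([feature_distances(features[i], features[j], transform)
                     for i, j in pairs])
```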
                    Methods

• Two three-layer ANNs
  – Input layer (64 neurons), hidden layer (5 neurons),
    output layer (1 neuron)
  – Input layer (64 neurons), hidden layer (5 neurons),
    output layer (7 neurons)
• Input = 64 feature distances
• Output = semantic similarity or difference in
  semantic ratings
• Hyperbolic tangent activation function, backpropagation
  algorithm, 200 iterations
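A minimal NumPy sketch of the 64-5-1 (or 64-5-7) network trained by backpropagation for 200 iterations follows. Several details are assumptions not stated on the slide: full-batch gradient descent on mean squared error, a learning rate of 0.05, tanh only in the hidden layer with a linear output layer, and the hypothetical class name `TanhMLP`.

```python
import numpy as np


class TanhMLP:
    """Three-layer network: n_in -> n_hidden (tanh) -> n_out (linear)."""

    def __init__(self, n_in=64, n_hidden=5, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights scaled by fan-in keep tanh out of saturation.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)   # hidden activations
        return H, H @ self.W2 + self.b2      # linear output layer

    def predict(self, X):
        return self._forward(np.asarray(X, dtype=float))[1]

    def fit(self, X, Y, iterations=200, lr=0.05):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
        n = len(X)
        for _ in range(iterations):
            H, out = self._forward(X)
            err = out - Y                     # gradient of 0.5 * squared error
            # Backpropagate through the linear output layer.
            gW2 = H.T @ err / n
            gb2 = err.mean(axis=0)
            # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)**2.
            dH = (err @ self.W2.T) * (1.0 - H ** 2)
            gW1 = X.T @ dH / n
            gb1 = dH.mean(axis=0)
            # Plain gradient-descent updates.
            self.W2 -= lr * gW2
            self.b2 -= lr * gb2
            self.W1 -= lr * gW1
            self.b1 -= lr * gb1
        return self
```

For the seven-output network, `TanhMLP(n_out=7)` would be fitted to an (n_pairs, 7) array of rating differences.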
                 Methods

• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
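The two kinds of pair sets can be sketched as follows. Note that 231 = 22·21/2 and 496 = 32·31/2, which is consistent with (though not confirmed as) taking every pair among 22 and 32 qualifying nodules. The dictionary keys `id`, `malignancy` and `area_mm2`, along with the function names, are hypothetical.

```python
import itertools
import random


def random_pairs(nodule_ids, n_pairs, seed=0):
    """Sample n_pairs distinct unordered nodule pairs without replacement."""
    all_pairs = list(itertools.combinations(nodule_ids, 2))
    return random.Random(seed).sample(all_pairs, n_pairs)


def all_pairs_where(nodules, predicate):
    """Every unordered pair among the nodules that satisfy `predicate`."""
    selected = [n["id"] for n in nodules if predicate(n)]
    return list(itertools.combinations(selected, 2))
```

Usage might look like `random_pairs(range(109), 640)`, `all_pairs_where(nodules, lambda n: n["malignancy"] > 3)` and `all_pairs_where(nodules, lambda n: n["area_mm2"] > 122)`.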
                 Methods


• ANN with seven outputs
  – 640 random pairs from all 109 nodules
                 Methods

• Leave-one-out method
  – Cosine similarity, Jeffrey divergence, or
    difference in semantic ratings used as
    teaching data
  – An ANN trained on the entire dataset minus one
    image pair
  – The left-out pair used for testing
  – Correlation computed between the radiologists'
    semantic similarity and the ANN output
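The leave-one-out procedure can be sketched as below. To keep the block self-contained, an ordinary least-squares regressor stands in for the ANN; any model, such as the 64-5-1 network described in the Methods, could be wrapped in a function with the same `(X_train, y_train, X_test)` signature. All function names are hypothetical.

```python
import numpy as np


def leave_one_out_predictions(X, y, fit_predict):
    """Predict each sample with a model trained on all the other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i        # boolean mask: every pair except i
        preds[i] = fit_predict(X[train], y[train], X[i:i + 1])
    return preds


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model (least squares with intercept) returning one prediction."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return float((A_test @ coef)[0])


def loo_correlation(X, y, fit_predict):
    """Pearson correlation between the teaching data and left-out predictions."""
    preds = leave_one_out_predictions(X, y, fit_predict)
    return float(np.corrcoef(y, preds)[0, 1])
```

With 640 pairs this trains 640 models, which is cheap for a network this small.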
               Results


• ANN using 640 random pairs
                Results


• ANN using 231 pairs with malignancy
  rating > 3
                Results


• ANN using 496 pairs with area > 122 mm²
                                  Results

• ANN output vs. target values using Jeffrey
  divergence for the 640 pairs (r = 0.438)
[Figure: plot of Target (y-axis, 0 to 1) vs. ANN Output (x-axis, 0 to 0.8)]
                Results


• ANN using random 640 pairs and the
  Jeffrey divergence with seven semantic
  ratings
                   Methods
• Normalization of Features
  – Min-Max Technique
  – Z-Score Technique
• Pair Selection
  – Looked for matches between the k most similar
    images based on semantic similarity and the k
    most similar images based on content
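Both normalization techniques and the top-k matching step can be sketched as follows. The slide does not specify the matching rule, so this sketch assumes square distance matrices in which smaller values mean more similar (a similarity such as cosine would first be converted, e.g. as 1 - s) and counts, for each query image, how many of its k nearest neighbors agree. Function names are hypothetical.

```python
import numpy as np


def min_max_normalize(F):
    """Scale each feature (column) to [0, 1]."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # avoid dividing by zero for constant features
    return (F - lo) / span


def z_score_normalize(F):
    """Shift each feature to mean 0 and scale it to unit standard deviation."""
    F = np.asarray(F, dtype=float)
    sd = F.std(axis=0)
    return (F - F.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


def top_k_matches(sem_dist, con_dist, k):
    """Per query image, overlap between its k nearest semantic and content neighbors."""
    sem_dist = np.asarray(sem_dist, dtype=float)
    con_dist = np.asarray(con_dist, dtype=float)
    n = len(sem_dist)
    overlaps = np.empty(n, dtype=int)
    for q in range(n):
        others = np.delete(np.arange(n), q)   # exclude the query itself
        sem_nn = set(others[np.argsort(sem_dist[q, others])[:k]].tolist())
        con_nn = set(others[np.argsort(con_dist[q, others])[:k]].tolist())
        overlaps[q] = len(sem_nn & con_nn)
    return overlaps
```

Min-max scaling is sensitive to outliers, since a single extreme value compresses the rest of a feature's range, while z-scores are unbounded but less distorted by one extreme nodule.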




                  Methods
• Multivariate Regression Analysis
  – Select the features with the highest correlation
    coefficients



  – Feature distance measures
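Correlation-based feature selection followed by a multivariate linear regression can be sketched as below. Ranking by absolute Pearson correlation, ordinary least squares with an intercept, and the function names are all assumptions; the results list ten selected features, which would correspond to `n_features=10`.

```python
import numpy as np


def select_by_correlation(X, y, n_features):
    """Indices of the n_features columns of X with the largest |Pearson r| with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc.T @ yc) / np.where(denom > 0, denom, 1.0)   # constant columns get r = 0
    return np.argsort(-np.abs(r))[:n_features], r


def fit_linear_regression(X, y):
    """Ordinary least squares with intercept; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)
```

Here X would hold the pairwise feature distances and y the semantic similarity of each pair.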




                 Methods


• Nodule Analysis
  – Determine differences between selected and
    non-selected nodules
  – Define requirements for our model
                            Results
[Figure: Correlation (left y-axis, 0 to 1) and Number of Pairs (right y-axis, 0 to 2000) vs. Threshold (x-axis, 0 to 20)]
             Results




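R² of the linear model by semantic measure (rows) and feature distance transform (columns)
Semantic measure   d(i, j)   d²(i, j)   exp(d(i, j))
Cosine             0.871     0.849      0.866
Jeffrey            0.647     0.633      0.608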
                  Results
R² = 0.871
Correlation Coefficient   Feature
0.1175                    Equivalent Diameter
0.1085                    Energy (Haralick)
0.0823                    Gabor Mean 135_05
0.0647                    Convex Area
0.0467                    Gabor STD 135_04
0.0322                    Min Intensity BG
0.0295                    Markov 4
0.0280                    Variance (Haralick)
0.0265                    Gabor STD 45_05
0.0238                    SD Intensity

                                        Results
[Figure: plot of Semantic (y-axis, -0.1 to 0.6) vs. Content (x-axis, 0 to 1)]
                                Results
[Figure: rating distributions (ratings 1 to 5) for the 79-nodule and 70-nodule groups; panels: Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture]
                                                       Results
[Figure: feature value distributions for the 79-nodule and 70-nodule groups; panels: Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, SD Intensity]
                                        Results
[Figure: four panels comparing the 79-nodule and 70-nodule groups]
A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety
             Conclusions
          Preliminary Issues

• The ANN is also not yet sufficient to
  predict semantic similarity from content
  – Best correlation: 0.438 (Jeffrey divergence,
    640 random pairs)
  – Malignancy correlation: 0.521
  – Jeffrey divergence performed better, unlike
    in the linear model


• A semantic gap still exists
               Conclusions

• Our linear model applies to a specific type
  of nodule
  – Characteristics: High malignancy, high
    texture, low lobulation, and low spiculation
  – Features: Larger diameter, greater intensity
• Linear models are not sufficient to determine
  similarity in general
  – R² of 0.871 only with the selected nodules
              Future Work

• Reduce variability among radiologists
  – Use only nodules with radiologists’ agreement


• Find best combination of content features
  – 64 may be too many
  – Currently using only 2D features
               Future Work


• Different semantic distance measures
  – Some ratings are ordinal, whereas the Jeffrey
    divergence is designed for categorical data


• Different methods of machine learning
  – Incorporate radiologists’ feedback into training
  – Ensemble of classifiers
Thanks for Listening


Any Questions?


