An Investigation into the Relationship between Semantic
and Content-Based Similarity Using LIDC
        Grace Dasovich
           Robert Kim
      Midterm Presentation
        August 21 2009
                      Outline

•       Related Work
•       Data
•       Modeling Approach and Results
    –     Similarity Measures
    –     Artificial Neural Network
    –     Multivariate Linear Regression
•       Conclusions
•       Future Work
              Related Work


• Computer-Aided Diagnosis (CADx) based
  on low-level image features
  – Armato et al. developed a linear discriminant
    classifier using features of lung nodules
  – Need to find the relationship between the
    image features and radiologists’ ratings
               Related Work


• Image features and the semantic ratings
  – Lung Interpretations
    • Barb et al. developed Evolutionary System for
      Semantic Exchange of Information in Collaborative
      Environments (ESSENCE)
    • Raicu et al. used ensemble classifiers and decision
      trees to predict semantic ratings
    • Samala et al. used several combinations of image
      features and the radiologists’ ratings to classify
      nodules
                 Related Work


– Similarity
   • Li et al. investigated four different methods to
     compute similarity measures for lung nodules
      –   Feature-based
      –   Pixel-value-difference
      –   Cross correlation
      –   ANN
                   Data
                  Materials
• LIDC Dataset
• 149 Unique Nodules
  – One slice per nodule, largest nodule area
• 9 Semantic Characteristics
  – Calcification and Internal Structure had little
    variation, thus were not used
• 64 Content Features
  – Shape, size, intensity, and texture

         Similarity Measures

• Cosine Similarity


• Jeffrey Divergence


• Euclidean Distance
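
The formulas for these three measures did not survive on the slide. The sketch below uses their standard textbook definitions. It assumes the Jeffrey divergence in its symmetric form with the mean distribution m = (p + q) / 2, and the function names are illustrative. How the semantic ratings were converted into vectors or distributions is not stated in the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jeffrey_divergence(p, q):
    """Jeffrey divergence between two discrete distributions.

    Assumed form: sum_i p_i*log(p_i/m_i) + q_i*log(q_i/m_i), m_i = (p_i + q_i)/2.
    Terms with zero probability contribute 0 by convention.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        mi = (pi + qi) / 2.0
        if pi > 0:
            total += pi * math.log(pi / mi)
        if qi > 0:
            total += qi * math.log(qi / mi)
    return total
```

Cosine similarity grows with agreement, while the other two are distances that shrink with agreement, so the direction of each measure matters when they are compared.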
                             Similarity Measures
[Figure: Cosine Similarity (y-axis) plotted against Euclidean Distance (x-axis)]
                            Similarity Measures
[Figure: Jeffrey Divergence (y-axis) plotted against Euclidean Distance (x-axis)]
        Similarity Measures


• Computed feature distance measures
                    Methods

• Two three-layer ANNs
  – Input (64 neurons), hidden layer (5 neurons), output
    (1)
  – Input (64 neurons), hidden layer (5 neurons), output
    (7)
• Input = 64 feature distances
• Output = Semantic similarity or difference in
  semantic ratings
• Hyperbolic tangent function, backpropagation
  algorithm, 200 iterations
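
A minimal NumPy sketch of a network with this shape (64 inputs, 5 hidden neurons, 1 or 7 outputs, tanh activation, 200 backpropagation iterations). The weight initialisation, learning rate, squared-error loss, and the use of tanh on the output layer are assumptions. The slides do not specify them, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(n_in=64, n_hidden=5, n_out=1):
    """Small random weights for a three-layer (input-hidden-output) network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(net, X):
    """Return hidden activations and outputs; tanh is applied in both layers."""
    H = np.tanh(X @ net["W1"] + net["b1"])
    Y = np.tanh(H @ net["W2"] + net["b2"])
    return H, Y

def train(net, X, T, iterations=200, lr=0.05):
    """Full-batch gradient descent on squared error via backpropagation."""
    n = X.shape[0]
    for _ in range(iterations):
        H, Y = forward(net, X)
        # Error signal at the tanh output layer (derivative of tanh is 1 - y^2).
        delta_out = (Y - T) * (1.0 - Y ** 2)
        # Error signal propagated back to the tanh hidden layer.
        delta_hid = (delta_out @ net["W2"].T) * (1.0 - H ** 2)
        net["W2"] -= lr * H.T @ delta_out / n
        net["b2"] -= lr * delta_out.mean(axis=0)
        net["W1"] -= lr * X.T @ delta_hid / n
        net["b1"] -= lr * delta_hid.mean(axis=0)
    return net
```

Each row of X holds the 64 feature distances for one nodule pair. T has one column for the single-output network and seven columns for the network that predicts per-rating differences.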
                 Methods

• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
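
The pair counts above can be checked with a little arithmetic. The counts 231 and 496 equal the number of unordered pairs among 22 and 32 items respectively. That would fit subsets of 22 and 32 nodules with every pair used, but the slides do not state the subset sizes, so this is only an observation. The helper names are illustrative.

```python
import random
from itertools import combinations
from math import comb

def all_pairs(nodule_ids):
    """Every unordered pair of distinct nodules."""
    return list(combinations(nodule_ids, 2))

def sample_pairs(nodule_ids, k, seed=0):
    """Draw k distinct pairs at random, without replacement."""
    return random.Random(seed).sample(all_pairs(nodule_ids), k)

print(comb(22, 2))   # 231
print(comb(32, 2))   # 496
print(comb(109, 2))  # 5886 possible pairs among 109 nodules
print(len(sample_pairs(range(109), 640)))  # 640
```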
                 Methods


• ANN with seven outputs
  – 640 random pairs from all 109 nodules
                 Methods

• Leave-one-out method
  – Cosine similarity or Jeffrey divergence or
    difference in Semantic ratings used as
    teaching data
  – An ANN trained with entire dataset minus one
    image pair
  – The pair left out used for testing
  – Correlation computed between the radiologists’
    calculated similarity and the ANN output
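
The leave-one-out procedure above can be sketched as a loop that is independent of the model. The nearest-neighbour model used here is only a stand-in so that the example runs without a trained ANN. It is not the method from the slides, and all names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(inputs, targets, fit, predict):
    """Train on all pairs but one, test on the held-out pair, repeat for every pair."""
    outputs = []
    for i in range(len(inputs)):
        train_x = inputs[:i] + inputs[i + 1:]
        train_t = targets[:i] + targets[i + 1:]
        model = fit(train_x, train_t)
        outputs.append(predict(model, inputs[i]))
    # Correlation between the held-out predictions and the teaching data.
    return pearson_r(outputs, targets)

# Stand-in model (an assumption, not the ANN): 1-nearest-neighbour regression.
def fit_nearest(train_x, train_t):
    return list(zip(train_x, train_t))

def predict_nearest(model, x):
    _, nearest_t = min(model, key=lambda xt: math.dist(xt[0], x))
    return nearest_t

inputs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [0.0, 1.0, 2.0, 3.0, 4.0]
r = leave_one_out(inputs, targets, fit_nearest, predict_nearest)
```

Every pair is predicted by a model that never saw it, so the resulting correlation reflects generalisation instead of fit to the training data.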
               Results


• ANN using 640 random pairs
                Results


• ANN using 231 pairs with malignancy
  rating > 3
                Results


• ANN using 496 pairs with area > 122 mm²
                                  Results

• ANN output vs. target values using Jeffrey
  divergence for the 640 pairs (r = 0.438)
[Figure: Target (y-axis) plotted against ANN Output (x-axis)]
                Results


• ANN using random 640 pairs and the
  Jeffrey divergence with seven semantic
  ratings
                   Methods
• Normalization of Features
  – Min-Max Technique
  – Z-Score Technique
• Pair Selection
  – Looked for matches between the k most similar
    images based on semantic similarity and on
    content similarity
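
A short sketch of the two normalization techniques and one possible reading of the pair-selection step. The z-score here uses the population standard deviation. "Match" is interpreted as an image that is among the k nearest neighbours under both the semantic and the content distance. Both choices are assumptions, since the slides do not define them.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre values on 0 with unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def k_most_similar(dist_row, self_index, k):
    """Indices of the k smallest distances in one row, excluding the image itself."""
    others = (j for j in range(len(dist_row)) if j != self_index)
    return set(sorted(others, key=lambda j: dist_row[j])[:k])

def matched_pairs(semantic_dist, content_dist, k):
    """Pairs (i, j) where j is among i's k nearest under BOTH distance matrices."""
    pairs = set()
    for i in range(len(semantic_dist)):
        common = (k_most_similar(semantic_dist[i], i, k)
                  & k_most_similar(content_dist[i], i, k))
        pairs.update((min(i, j), max(i, j)) for j in common)
    return pairs
```

Increasing k admits more pairs but weakens the agreement between the two rankings.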




                  Methods
• Multivariate Regression Analysis
  – Select features with highest correlation
    coefficients



  – Feature distance measures
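
The regression equation itself did not survive on the slide. As a rough sketch under that caveat, the following selects the columns most correlated with the target and fits an ordinary least-squares model with an intercept. The function names and the toy data are illustrative and are not taken from the study.

```python
import numpy as np

def top_correlated_features(X, y, n_keep):
    """Indices of the n_keep columns of X with the largest |Pearson r| against y."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:n_keep]

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    r2 = 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Toy data: the target depends on columns 0 and 1 only.
X = np.array([[1.0,  1.0, 0.0],
              [2.0, -1.0, 0.0],
              [3.0,  1.0, 1.0],
              [4.0, -1.0, 1.0],
              [5.0,  1.0, 0.0],
              [6.0, -1.0, 0.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]
keep = top_correlated_features(X, y, 2)
coef, r2 = fit_linear_model(X[:, sorted(keep)], y)
```

In the setting of the slides, each row of X would hold feature distance measures for one nodule pair and y the corresponding semantic similarity.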




                 Methods


• Nodule Analysis
  – Determine differences between selected and
    non-selected nodules
  – Define requirements for our model
                            Results
[Figure: Correlation (left axis, 0 to 1) and Number of Pairs (right axis, 0 to 2000) plotted against Threshold (0 to 20)]
             Results




          d(i, j)   d²(i, j)   exp(d(i, j))
Cosine    0.871     0.849      0.866
Jeffrey   0.647     0.633      0.608
                  Results
R² = 0.871
   Correlation Coefficient   Feature
             0.1175          Equivalent Diameter
             0.1085          Energy (Haralick)
             0.0823          Gabor Mean 135_05
             0.0647          Convex Area
             0.0467          Gabor STD 135_04
             0.0322          Min Intensity BG
             0.0295          Markov 4
             0.0280          Variance (Haralick)
             0.0265          Gabor STD 45_05
             0.0238          SD Intensity

                                        Results
[Figure: Semantic (y-axis) plotted against Content (x-axis)]
                                Results
[Figure: seven panels, one per semantic characteristic (Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, Texture), each over ratings 1 to 5, comparing the 79-nodule and 70-nodule groups]
                                                       Results
[Figure: ten panels, one per feature (Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, SD Intensity), comparing the 79-nodule and 70-nodule groups]
                                        Results
[Figure: four panels comparing the 79-nodule and 70-nodule groups. A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety]
             Conclusions
          Preliminary Issues

• The ANN is also not yet sufficient to
  predict semantic similarity from content
  – Best correlation 0.438
  – Malignancy correlation 0.521
  – Jeffrey divergence performed better, unlike in
    the linear model


• A semantic gap still exists
               Conclusions

• Our linear model applies to a specific type
  of nodule
  – Characteristics: High malignancy, high
    texture, low lobulation, and low spiculation
  – Features: Larger diameter, greater intensity
• Linear models are not sufficient for
  determination of similarities
  – R² of 0.871 with chosen nodules
              Future Work

• Reduce variability among radiologists
  – Use only nodules with radiologists’ agreement


• Find best combination of content features
  – 64 may be too many
  – Currently only using 2D
               Future Work


• Different semantic distance measures
  – Some ratings are ordinal, whereas Jeffrey
    divergence is for categorical data


• Different methods of machine learning
  – Incorporate radiologists’ feedback into training
  – Ensemble of classifiers
Thanks for Listening


Any Questions?


