									                  Data Mining
            on ICDM Submission Data

                          Shusaku Tsumoto
                     Ning Zhong and Xindong Wu



ICDM 2004 Business Meeting 11/4/2004
                                     Data Mining
                               on ICDM Submission Data
         38 countries, 445 submissions
         Regular Papers: 39 (8.8%)
         Short Papers: 66 (14.8%)

         High Acceptance Ratio (Regular)
            – Germany:                 4/15   (26.7%)
            – Finland:                 2/9    (22.2%)
            – USA:                     20/109 (18.3%)



         Country       Regular   Short   Total    Ratio
         USA                20      28     109    44.0%
         China               3       4      55    12.7%
         UK                  1       6      39    17.9%
         Japan               0       5      28    17.9%
         Canada              3       3      25    24.0%
         Taiwan              0       1      18     5.6%
         Australia           2       1      17    17.6%
         Germany             4       5      15    60.0%
         France              0       2      14    14.3%
         India               1       0      14     7.1%
         Singapore           0       3      12    25.0%
         Brazil              0       1      12     8.3%
         Italy               2       1      10    30.0%
         Finland             2       1       9    33.3%
         Spain               0       1       7    14.3%
         HongKong            1       1       6    33.3%
         Top 15             39      63     390    26.2%
         Total              39      66     445    23.8%
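The Ratio column in the country table counts regular and short papers together, while the "High Acceptance Ratio (Regular)" list counts regular papers only. The sketch below recomputes both figures for three countries from the table's counts. The helper name `pct` is an assumption introduced for illustration and is not part of the original analysis.

```python
# (regular, short, total) counts copied from the country table.
counts = {
    "USA": (20, 28, 109),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
}


def pct(accepted: int, total: int) -> float:
    """Acceptance ratio as a percentage, rounded to one decimal place."""
    return round(100.0 * accepted / total, 1)


for country, (regular, short, total) in counts.items():
    regular_only = pct(regular, total)       # ratio used on the "Regular" slide
    combined = pct(regular + short, total)   # ratio used in the table
    print(f"{country}: regular {regular_only}%, regular+short {combined}%")
```

Germany, for example, has a regular-only ratio of 4/15 and a combined ratio of 9/15, which is why it appears with two different percentages in the slides.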

                                         Data Mining
                                   on ICDM Submission Data
    Top 5 Areas of Submissions:
       – Data mining applications
       – Data mining and machine learning algorithms and methods
       – Mining text and semi-structured data, and mining temporal, spatial and multimedia data
       – Data pre-processing, data reduction, feature selection and feature transformation
       – Soft computing and uncertainty management for data mining

    High Acceptance Ratio Areas (Regular+Short)
       – Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
       – Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
       – Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)
         Topic                                                                                   Regular   Short   Total   Ratio
         Data mining applications                                                                      4      10      84   16.7%
         Data mining and machine learning algorithms and methods                                       9      20      81   35.8%
         Mining text and semi-structured data, and mining temporal, spatial and multimedia data        3       8      44   25.0%
         Data pre-processing, data reduction, feature selection and feature transformation             7       7      35   40.0%
         Soft computing and uncertainty management for data mining                                     0       3      34    8.8%
         Foundations of data mining                                                                    2       1      26   11.5%
         Mining data streams                                                                           3       4      25   28.0%
         Human-machine interaction and visual data mining                                              0       1      16    6.3%
         Security, privacy and social impact of data mining                                            2       1      15   20.0%
         Data and knowledge representation for data mining                                             1       1      12   16.7%
         Pattern recognition and trend analysis                                                        0       1      11    9.1%
         Complexity, efficiency, and scalability issues in data mining                                 2       2      11   36.4%
         Quality assessment and interestingness metrics of data mining results                         2       3      10   50.0%
         Statistics and probability in large-scale data mining                                         1       0       9   11.1%
         Integration of data warehousing, OLAP and data mining                                         0       1       9   11.1%
         Collaborative filtering/personalization                                                       0       2       7   28.6%
         Post-processing of data mining results                                                        1       1       7   28.6%
         Others                                                                                        2       0       6   33.3%
         High performance and parallel/distributed data mining                                         1       0       2   50.0%
         Query languages and user interfaces for mining                                                0       0       1    0.0%
         Total                                                                                        39      66     445   23.8%
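The three "High Acceptance Ratio Areas" can be recovered from the topic table by ranking topics on (Regular + Short) / Total. The slides do not say how small topics were handled, so the sketch below assumes a cutoff of at least 10 submissions (`MIN_TOTAL`, a hypothetical parameter). Without such a cutoff, "High performance and parallel/distributed data mining" (1 of 2 accepted) would also rank near the top.

```python
MIN_TOTAL = 10  # assumed cutoff; the slides do not state one

# (topic, regular, short, total) rows copied from the topic table.
topics = [
    ("Data mining applications", 4, 10, 84),
    ("Data mining and machine learning algorithms and methods", 9, 20, 81),
    ("Mining text and semi-structured data, temporal, spatial and multimedia data", 3, 8, 44),
    ("Data pre-processing, data reduction, feature selection and feature transformation", 7, 7, 35),
    ("Soft computing and uncertainty management for data mining", 0, 3, 34),
    ("Foundations of data mining", 2, 1, 26),
    ("Mining data streams", 3, 4, 25),
    ("Human-machine interaction and visual data mining", 0, 1, 16),
    ("Security, privacy and social impact of data mining", 2, 1, 15),
    ("Data and knowledge representation for data mining", 1, 1, 12),
    ("Pattern recognition and trend analysis", 0, 1, 11),
    ("Complexity, efficiency, and scalability issues in data mining", 2, 2, 11),
    ("Quality assessment and interestingness metrics of data mining results", 2, 3, 10),
    ("Statistics and probability in large-scale data mining", 1, 0, 9),
    ("Integration of data warehousing, OLAP and data mining", 0, 1, 9),
    ("Collaborative filtering/personalization", 0, 2, 7),
    ("Post-processing of data mining results", 1, 1, 7),
    ("Others", 2, 0, 6),
    ("High performance and parallel/distributed data mining", 1, 0, 2),
    ("Query languages and user interfaces for mining", 0, 0, 1),
]


def acceptance_ratio(row):
    """(Regular + Short) / Total for one table row."""
    _, regular, short, total = row
    return (regular + short) / total


eligible = [row for row in topics if row[3] >= MIN_TOTAL]
top3 = sorted(eligible, key=acceptance_ratio, reverse=True)[:3]

for name, regular, short, total in top3:
    accepted = regular + short
    print(f"{name}: {accepted}/{total} ({100 * accepted / total:.1f}%)")
```

Under this assumed cutoff the ranking returns quality assessment, pre-processing and complexity, in that order, matching the slide.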
                                     Correspondence Analysis
                                   (Country vs Final Decision)
    [Figure: two-dimensional correspondence analysis map. Horizontal axis labeled r1=0.378
     (ticks from -2 to 1.5); vertical axis labeled r2=0.177 (ticks from -1.5 to 2).
     Decision points: Regular, Short, Reject. Country points: Slovenia, Finland, Italy,
     Australia, India, Hong Kong, Canada, Germany, USA, UK, France, Japan.]
                                     Correspondence Analysis
                                   (Topics vs Final Decision)
    [Figure: two-dimensional correspondence analysis map. Horizontal axis labeled r1=0.280
     (ticks from -1.5 to 2.5); vertical axis labeled r2=0.184 (ticks from -3 to 1.5).
     Decision points: Regular, Short, Reject. Topic points: Collaborative Filtering,
     Applications, DM Methods, Quality-assessment, Soft-computing,
     Preprocessing/Feature Selection, Security/privacy, Statistics and probability,
     High-performance, Post-processing.]
                                Correspondence Analysis
         Country vs Final Decision
            – Regular: Germany, USA
            – Short: ?
            – Reject: Most of the countries are located near this region.


         Topics vs Final Decision
            – Regular: Quality Assessment,
                       Preprocessing/Feature Selection
            – Short: DM/ML Methods, Collaborative Filtering
            – Reject: DM Applications


                                         Rule Mining
                                   on ICDM Submission Data
   Dataset
      – Sample Size: 445
      – Attributes: 5
         • Paper No.: ordered by submission date
         • # of Authors
         • # of Characters in Title
         • Country
         • Category
      – Analyzed with Clementine 7.1 (and SPSS 12.0J)



                                           Rule Mining (C5.0)
                                       on ICDM Submission Data
   C5.0
      – [Topic=Mining semi-structured data,…] & [129 < Paper No. <= 369]
          => Reject (Confidence 0.87, Support 10)
      – [Country=USA] & [Topic=Mining semi-structured data,…] &
        [Paper No. > 369] & [# of Authors <= 3]
          => Accept (Confidence 0.667, Support 3)
      – [Topic=Preprocessing/Feature Selection] & [# of Authors > 4]
          => Accept (Confidence 1.0, Support 3)

      – Topic, Paper No., # of Authors: Important Features
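To make the Confidence and Support figures concrete, the sketch below evaluates one rule of this form against a list of records. It assumes that Support is the number of records matching the rule's conditions and that Confidence is the fraction of those records with the predicted decision; the slides do not define either term. The records are invented for illustration and are not the ICDM data, and `evaluate_rule` is a hypothetical helper, not a Clementine function.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical records, invented for illustration only (not the ICDM data).
records: List[Record] = [
    {"topic": "Preprocessing/Feature Selection", "authors": 5, "decision": "Accept"},
    {"topic": "Preprocessing/Feature Selection", "authors": 6, "decision": "Accept"},
    {"topic": "Preprocessing/Feature Selection", "authors": 2, "decision": "Reject"},
    {"topic": "DM Applications", "authors": 5, "decision": "Reject"},
]


def evaluate_rule(
    records: List[Record],
    condition: Callable[[Record], bool],
    outcome: str,
) -> Tuple[int, float]:
    """Return (support, confidence) of the rule `condition => outcome`.

    support    = number of records matching the condition (assumed definition)
    confidence = share of matching records whose decision equals `outcome`
    """
    matched = [r for r in records if condition(r)]
    support = len(matched)
    if support == 0:
        return 0, 0.0
    hits = sum(1 for r in matched if r["decision"] == outcome)
    return support, hits / support


# [Topic=Preprocessing/Feature Selection] & [# of Authors > 4] => Accept
support, confidence = evaluate_rule(
    records,
    lambda r: r["topic"] == "Preprocessing/Feature Selection" and r["authors"] > 4,
    "Accept",
)
print(f"support={support}, confidence={confidence:.3f}")
```

With supports as small as 3, a confidence of 1.0 rests on very few papers, so such rules are better read as hints than as reliable predictors.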



                                           Rule Mining (GRI)
                                       on ICDM Submission Data
    Generalized Rule Induction
       – [# of Authors <2] & [Paper No. <120.5]
           => Rejected (Confidence 96.0%, Support 24)
       – [# of Chars in Title< 27] & [Paper No. > 212]
           => Accepted (Confidence 100%, Support 5)

    Paper No., # of Chars in Title, # of Authors: Important Features




                               Multidimensional Scaling
                                       (2004)
    [Figure: two-dimensional MDS map of the attributes Country, Decision, Topics,
     Review Score, Paper No., # of Authors and # of Chars in Title. Horizontal axis
     ticks run from -1 to 1.5; vertical axis ticks run from -0.6 to 0.8.]
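The slides do not say which MDS variant or dissimilarity measure was used. As a rough illustration of the idea only, the sketch below implements classical (Torgerson) MDS with NumPy, which is a stand-in technique and not necessarily the SPSS procedure behind the figure. The input is a synthetic distance matrix built from the corners of a unit square, chosen so that the result can be checked; it has nothing to do with the ICDM attributes.

```python
import numpy as np


def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Embed points in `dims` dimensions from a matrix of pairwise distances."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J            # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]  # indices of the largest ones
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)


# Synthetic example: the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print("distances preserved:", np.allclose(recovered, dist))
```

The recovered coordinates match the original configuration only up to rotation and reflection, which is why an MDS map is read through relative positions rather than axis values.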
                                       Summary (2004) of Mining
                                       on ICDM Submission Data
   Do not submit a paper too hastily!
     – Careful reflection is needed not only on the contents but also on the title.
   Mining Text/Web/Semi-structured Data is very popular.
   The number of Application papers is growing (but many are rejected).
   Strong Topics
     – Preprocessing/Feature-Selection
     – Postprocessing
     – Security and Privacy
   Several topics are emerging in ICDM2004:
     – Mining Data Streams
     – Collaborative Filtering
     – Quality Assessment
                                   Comparison between 02-04
                                   Review Scores: Box-plot
    [Figure: box-plots of review score (vertical axis, 0.00 to 5.00) by year
     (horizontal axis: 2002, 2003, 2004). Two points are labeled 1,176 and 1,169.]
                                   Comparison between 02-04
                                          Countries

         Country       Acceptance      Country       Acceptance      Country       Acceptance
                       Ratio (2002)                  Ratio (2003)                  Ratio (2004)
         Hong Kong          64.7%      Israel             55.0%      Germany            60.0%
         USA                47.9%      Hong Kong          50.0%      USA                44.0%
         Canada             45.5%      Japan              37.0%      Finland            33.3%
         Finland            33.3%      USA                33.0%      Hong Kong          33.3%
         France             33.3%      Germany            32.0%      Italy              30.0%


                                   Comparison between 02 and 04
                                             Topics

         Top 5 in 2002    Acceptance    Top 5 in 2003                 Acceptance    Top 5 in 2004                       Acceptance
                          Ratio                                       Ratio                                             Ratio
         Graph Mining          75.0%    Process-centric DM                 80.0%    Quality Assessment                       50.0%
         Temporal Data         52.6%    Security, privacy                  57.0%    Preprocessing, Feature Selection         40.0%
         Theory                42.9%    Statistics and Probability         47.0%    Complexity/Scalability                   36.4%
         Text Mining           42.1%    Visual Data Mining                 38.0%    DM and ML Methods                        35.8%
         Rule                  41.7%    Post-processing                    41.7%    Collaborative Filtering                  28.6%
                                                                                    Post-processing                          28.6%
                                        Multidimensional Scaling
                                            (2003 and 2004)
    The topological structure with respect to similarities seems unchanged between 2003 and 2004.
    [Figure: two MDS maps, one labeled 2004 and one labeled 2003, each plotting Country,
     Decision, Topics, Review Score, Paper No., # of Authors and # of Chars in Title.
     In both panels the horizontal axis ticks run from -1 to 1.5 and the vertical axis
     ticks from -0.6 to 0.8.]
                                     Data Mining
                               on ICDM Submission Data
       Acknowledgements
        – Many thanks to
           • PC chairs, Vice Chairs and PC members
           • All the authors
           • All the contributors to ICDM2004
        – See you again in ICDM2005!

                                 Multidimensional Scaling
                                         (2003)
    [Figure: two-dimensional MDS map of the attributes Country, Decision, Topics,
     Review Score, Paper No., # of Authors and # of Chars in Title. Horizontal axis
     ticks run from -1 to 1.5; vertical axis ticks run from -0.6 to 0.8.]

								