Min Occ  Acc Topic (%)  Acc Local (%)  Features Topic  Features Local
1        91             55             10218           9994
2        92.5           73.5           5595            3720
3        92.5           82.5           4027            2393
4        91.5           83             3202            1762
5        91             84             2649            1370
6        90.5           83             2308            1133
7        90.25          81             2061            975
8        90.25          82             1851            844
9        90             81.5           1670            741
IG Threshold  Size Topic  Size Local  Acc Topic (%)  Acc Local (%)
0.5           6           3           50.5           98.5
0.4           12          3           62.5           98.5
0.3           15          4           68.5           98.5
0.2           27          4           70             98.5
0.1           35          4           74.25          98.5
0             46          6           78.5           98.5
-0.1          70          7           83.75          98.5
-0.2          102         9           85.25          98.5
-0.3          139         11          88.25          98
-0.4          208         13          88.75          98
-0.5          306         23          90.25          98
-0.6          431         39          91.75          97
-0.7          651         70          92.25          98.5
-0.8          908         95          93             99.5
-0.9          1461        248         93             97.5
-1            1929        529         92.5           96
-1.1          2616        1030        92             97
-1.2          4076        2147        91.5           92
-1.3          4814        2810        91.5           92
-1.4          10218       9993        91             55
-1.5          10218       9993        91             55
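The table above selects features by an information gain threshold. The following is a minimal Python sketch of information gain for a binary term-presence feature, assuming each document is a set of terms; `entropy`, `information_gain` and `select_by_ig` are hypothetical names, not the original code. Raw information gain is never negative, so the negative thresholds in the table suggest the original scores were on a transformed scale (for example a logarithm); the sketch compares the raw gain in bits and does not reproduce that scale.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the class labels.

    docs: list of sets of terms; labels: parallel list of class labels.
    """
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the labels after splitting on presence/absence of the term.
    conditional = sum(len(part) / n * entropy(part) for part in (with_t, without_t) if part)
    return entropy(labels) - conditional


def select_by_ig(docs, labels, threshold):
    """Hypothetical helper: keep every vocabulary term whose raw IG is at least `threshold`."""
    vocab = set().union(*docs)
    return {t for t in vocab if information_gain(docs, labels, t) >= threshold}


docs = [{"goal", "match"}, {"goal", "vote"}, {"vote", "poll"}, {"poll", "match"}]
labels = ["sport", "sport", "politics", "politics"]
print(sorted(select_by_ig(docs, labels, 0.5)))  # ['goal', 'poll']
```

In the toy data, "goal" and "poll" each separate the two classes perfectly (1 bit of gain), while "match" and "vote" occur equally in both classes and carry no gain.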
[Figure: Accuracy in %]
Extraction Success & Exact & Too Much &
Quantity & 191 & 4 &
Percentage & 95.50\% & 2.00\% &
[Figure: Number of Features vs. Information Gain Threshold]
[Figure: Accuracy in % vs. Information Gain Threshold]
55 & 9994 \\
73.5 & 3720 \\
82.5 & 2393 \\
83 & 1762 \\
84 & 1370 \\
83 & 1133 \\
81 & 975 \\
82 & 844 \\
81.5 & 741 \\
bristol
Ranking
Bristol
Durham
Manchester
London
0.5 & 6 & 50.5 \\
0.4 & 12 & 62.5 \\
0.3 & 15 & 68.5 \\
0.2 & 27 & 70 \\
0.1 & 35 & 74.25 \\
0 & 46 & 78.5 \\
-0.1 & 70 & 83.75 \\
-0.2 & 102 & 85.25 \\
-0.3 & 139 & 88.25 \\
-0.4 & 208 & 88.75 \\
-0.5 & 306 & 90.25 \\
-0.6 & 431 & 91.75 \\
-0.7 & 651 & 92.25 \\
-0.8 & 908 & 93 \\
-0.9 & 1461 & 93 \\
-1 & 1929 & 92.5 \\
-1.1 & 2616 & 92 \\
-1.2 & 4076 & 91.5 \\
-1.3 & 4814 & 91.5 \\
-1.4 & 10218 & 91 \\
-1.5 & 10218 & 91 \\
[Figure: Topic and Location vs. Document Frequency Threshold (1-9), y-axis 0-12000]
Too Little & None
3 & 2
1.50\% & 1.00\%
[Figure: Topic and Location vs. Document Frequency Threshold (1-9)]
Size Topic  Size Local  Acc Topic (%)  Acc Local (%)
10218       9993        91             55
10218       9993        91             55
4814        2810        91.5           92
4076        2147        91.5           92
2616        1030        92             97
1929        529         92.5           96
1461        248         93             97.5
908         95          93             99.5
651         70          92.25          98.5
431         39          91.75          97
306         23          90.25          98
208         13          88.75          98
139         11          88.25          98
102         9           85.25          98.5
70          7           83.75          98.5
46          6           78.5           98.5
35          4           74.25          98.5
27          4           70             98.5
15          4           68.5           98.5
12          3           62.5           98.5
6           3           50.5           98.5

Threshold  tp
0.5        138
0.55       137
0.6        136
0.65       136
0.7        135
0.75       131
0.8        127
0.85       126
0.9        119
0.95       109

0.28119684
2 sets 2
3 sets 1
2 sets 6
3 sets
Threshold  tp   fp  fn  tn
0.50       138  12  12  38
0.55       137  13  12  38
0.60       136  14  12  38
0.65       136  14  13  37
0.70       134  16  14  36
0.75       131  19  14  36
0.80       127  23  14  36
0.85       126  24  16  34
0.90       119  31  19  31
0.95       109  41  22  28

Accuracy = 86.0% (172/200) (classification)
50 89
Accuracy = 87.5% (175/200) (classification)
60 94
Accuracy = 90.5% (181/200) (classification)
70 89
Accuracy = 92.0% (184/200) (classification)
80 88
Accuracy = 87.5% (175/200) (classification)
90 86
Accuracy = 88.0% (176/200) (classification)
100 85
Accuracy = 83.5% (167/200) (classification)
110 87
Accuracy = 86.5% (173/200) (classification)
120 86
Accuracy = 87.5% (175/200) (classification)
130 95
50 124
60 128
70 119
80 118
90 131
100 132
110 131
120 125
130 134
35 18 3 3 11
36 17 2 3 14
34 15 3 3 13
34 11 5 2 16
[Figure: Classification Accuracy]
& Acc (\%) & False Positive Rate (\%) & False Negative Rate (\%) \\
& 82.86 & 8.57 & 8.57 \\
& 86.11 & 5.56 & 8.33 \\
& 82.35 & 8.82 & 8.82 \\
& 79.41 & 14.71 & 5.88 \\
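The percentages in this table can be recomputed from the four count rows listed before it (`35 18 3 3 11`, `36 17 2 3 14`, `34 15 3 3 13`, `34 11 5 2 16`): accuracy is the share of correctly classified documents, and each error rate is that error count divided by the total number of documents rather than by the number of positives or negatives. The Python check below assumes the count columns are total, tp, fp, fn, tn; whether the second and fifth columns are tp or tn is an assumption, and swapping them does not change any of the three percentages. `error_breakdown` is a hypothetical helper.

```python
def error_breakdown(total, tp, fp, fn, tn):
    """Return (accuracy, FP rate, FN rate), each in percent of *all* test documents.

    Assumed column order of the count rows: total, tp, fp, fn, tn.
    """
    if tp + fp + fn + tn != total:
        raise ValueError("counts do not add up to the total")

    def pct(k):
        # Share of all documents, rounded to two decimals like the table.
        return round(100 * k / total, 2)

    return pct(tp + tn), pct(fp), pct(fn)


rows = [(35, 18, 3, 3, 11), (36, 17, 2, 3, 14), (34, 15, 3, 3, 13), (34, 11, 5, 2, 16)]
for row in rows:
    print(row, error_breakdown(*row))
```

The first row, for example, gives (82.86, 8.57, 8.57), the first line of the table.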
Document Frequency Threshold & Vocabulary Size & Classification Accuracy (\%) \\
1 & 10218 & 91 \\
2 & 5595 & 92.5 \\
3 & 4027 & 92.5 \\
4 & 3202 & 91.5 \\
5 & 2649 & 91 \\
6 & 2308 & 90.5 \\
7 & 2061 & 90.25 \\
8 & 1851 & 90.25 \\
9 & 1670 & 90 \\
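A document frequency threshold presumably keeps only the terms that occur in at least that many training documents, which fits the vocabulary shrinking from 10218 terms at threshold 1 (where nothing is pruned) to 1670 at threshold 9. A minimal Python sketch, assuming each document is a list of tokens; `df_filter` is a hypothetical name, not the original implementation.

```python
from collections import Counter


def df_filter(docs, min_df):
    """Return the set of terms that appear in at least `min_df` documents.

    Each document counts once per term, however often the term repeats in it.
    """
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count >= min_df}


docs = [["goal", "match", "goal"], ["goal", "vote"], ["match", "goal"]]
for min_df in (1, 2, 3):
    print(min_df, sorted(df_filter(docs, min_df)))
```

Raising `min_df` can only remove terms, so the vocabulary size is non-increasing in the threshold, as in the table.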
Kernel Type & Accuracy (\%) & Time (s) \\
Linear & 86.5 & 28.306 \\
Polynomial & 86.25 & 29.762 \\
RBF & 91.5 & 29.896 \\
fp fn tn TPR FPR
15 7 40 0.95172414 0.27272727
13 9 41 0.93835616 0.24074074
9 11 44 0.92517007 0.16981132
7 11 46 0.92517007 0.13207547
4 14 47 0.90604027 0.07843137
2 18 49 0.87919463 0.03921569
1 22 50 0.85234899 0.01960784
1 23 50 0.84563758 0.01960784
1 30 50 0.79865772 0.01960784
1 40 50 0.73154362 0.01960784
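The TPR and FPR columns are consistent with the standard ROC coordinates, TPR = tp/(tp + fn) and FPR = fp/(fp + tn), if tp is recovered as the remainder of a 200-document test set (the size in the `(172/200)` accuracy lines). For the first row, tp = 200 - 15 - 7 - 40 = 138, giving TPR = 138/145 ≈ 0.9517 and FPR = 15/55 ≈ 0.2727; the recovered tp values (138, 137, 136, 136, 135, 131, 127, 126, 119, 109) also match the tp column listed against thresholds 0.5 to 0.95. The fixed total and the helper name `roc_point` are assumptions in this Python sketch.

```python
def roc_point(fp, fn, tn, total=200):
    """Return (TPR, FPR) for one row, recovering tp from the assumed test-set size."""
    tp = total - fp - fn - tn
    tpr = tp / (tp + fn)  # true positive rate (recall, sensitivity)
    fpr = fp / (fp + tn)  # false positive rate (1 - specificity)
    return tpr, fpr


rows = [(15, 7, 40), (13, 9, 41), (9, 11, 44), (7, 11, 46), (4, 14, 47),
        (2, 18, 49), (1, 22, 50), (1, 23, 50), (1, 30, 50), (1, 40, 50)]
for fp, fn, tn in rows:
    tpr, fpr = roc_point(fp, fn, tn)
    print(f"{fp:>3} {fn:>3} {tn:>3}  TPR={tpr:.8f}  FPR={fpr:.8f}")
```

Plotting the (FPR, TPR) pairs gives the points of an ROC curve; moving down the table trades a lower false positive rate for a lower true positive rate.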
3 48 47
4 95 50
2 44 48
[Figure: True Positive Rate]
Accuracy = 86.0% (172/200) (classification)
21 3 37
Accuracy = 87.5% (175/200) (classification)
18 2 36
Accuracy = 90.5% (181/200) (classification)
12 5 44
Accuracy = 92.0% (184/200) (classification)
4 11 47
Accuracy = 87.5% (175/200) (classification)
11 11 42
Accuracy = 88.0% (176/200) (classification)
10 11 44
Accuracy = 83.5% (167/200) (classification)
29 0 34
Accuracy = 86.5% (173/200) (classification)
16 6 42
Accuracy = 87.5% (175/200) (classification)
18 1 36
fp fn tn Acc Precision (199)
19 14 42 0.83417085 0.86713287
16 13 42 0.85427136 0.88888889
15 21 44 0.81909548 0.8880597
14 23 44 0.81407035 0.89393939
14 11 43 0.87437186 0.90344828
13 8 47 0.895 0.91034483
32 3 33 0.8241206 0.80368098
20 12 42 0.83919598 0.86206897
28 4 33 0.83919598 0.82716049
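The Acc and Precision columns, and the separate Recall column further down, can be reproduced if each row is paired with the tp counts in the `50 124` to `130 134` list (124, 128, 119, 118, 131, 132, 131, 125, 134). That pairing is an inference from the arithmetic, not stated in the source. With it, every row sums to 199 documents except the sixth (`13 8 47`, tp = 132), which sums to 200 and gives Acc = 179/200 = 0.895. A Python sketch with the hypothetical helper `scores`:

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# tp taken from the "50 124" to "130 134" list; fp, fn, tn from the table rows (assumed pairing).
rows = [(124, 19, 14, 42), (128, 16, 13, 42), (119, 15, 21, 44), (118, 14, 23, 44),
        (131, 14, 11, 43), (132, 13, 8, 47), (131, 32, 3, 33), (125, 20, 12, 42),
        (134, 28, 4, 33)]
for tp, fp, fn, tn in rows:
    acc, prec, rec = scores(tp, fp, fn, tn)
    print(f"Acc={acc:.8f}  Precision={prec:.8f}  Recall={rec:.8f}")
```

Computing the total from the four counts, instead of hard-coding 199, keeps the sixth row consistent with its reported accuracy.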
[Figure: Location and Topic vs. Number of Classes (2, 3, 4); data labels as extracted: 92, 91, 98.6, 94.3, 99.5, 93]
[Figure: Accuracy Rate vs. Size of Training Set (per Class), Location and Topic]
Total Acc
0.035 0.2 0.89
0.045 0.205 0.89
0.055 0.22 0.9
0.055 0.23 0.91
0.07 0.235 0.91
0.09 0.245 0.9
0.11 0.25 0.885
0.115 0.25 0.88
0.15 0.25 0.845
0.2 0.25 0.795
[Figure: x-axis False Positive Rate (0.05-0.3)]
Recall
0.89855072
0.90780142
0.85
0.83687943
0.92253521
0.94285714
0.97761194
0.91240876
0.97101449
[Figure: Topic vs. Size of Training Set (per Class), 70-130]