Template-Filtered Headline Summarization

Liang Zhou and Eduard Hovy
USC/ISI
{liangz, hovy}@isi.edu

Talk Outline
• Previous work.
• Key word and phrase selection.
• Template creation.
• Headline refinement with templates.
• Evaluations.
• Future work.

General Method
• Two-step process:
  – Content selection.
  – Surface realization.
• Related work:
  – Banko et al. (2000): MT-inspired.
  – Zajic et al. (2002): HMM model.
  – Dorr et al. (2003): non-statistical, parse-and-trim.

Selection Models
• Bag-of-words models:
  1. Sentence position
     $$\mathrm{Count\_Pos}_i = \sum_{k=1}^{M} \sum_{j=1}^{N} P(H_k \mid W_j)$$
     $$P(H \mid \mathrm{Pos}_i) = \frac{\mathrm{Count\_Pos}_i}{\sum_{i=1}^{Q} \mathrm{Count\_Pos}_i}$$
  2. Headline word position
     $$P(\mathrm{Pos}_i \mid W_h) = \frac{\mathrm{Count}(\mathrm{Pos}_i, W_h)}{\sum_{i=1}^{Q} \mathrm{Count}(\mathrm{Pos}_i, W_h)}$$
  3. Text model
     $$P(H_w \mid T_w) = \frac{\sum_{j=1}^{M} \left( \mathrm{doc\_tf}(w, j) \cdot \mathrm{title\_tf}(w, j) \right)}{\sum_{j=1}^{M} \mathrm{doc\_tf}(w, j)}$$
  4. Unigram headline model
  5. Bigram headline model

Keyword Selection
• Scoring method:
  – Unigram overlap.
  – No stop words.
  – Apply each model.
  – Apply all combinations of models.

Unigram overlap for each model combination at headline lengths of 10 to 50 words:

Model(s)   10w   20w   30w   40w   50w
12345       79   118   147   189   216
2345        74   110   145   178   206
1345        74   116   146   176   208
1245        63    99   144   176   202
1235        87   122   155   187   223
1234        96   149   187   214   230
345         61   103   134   170   199
245         54    94   137   168   192
235         82   117   148   183   212
234         67   119   167   192   217
145         55   101   126   149   193
135         84   113   144   181   216
134         97   144   186   212   234
125         70   102   146   179   208
145         55   101   126   149   193
123        131   181   205   230   250
45          46    84   117   140   182
35          72   107   134   166   204
34          58   103   136   165   196
25          62    96   135   172   204
24          38    80   114   144   179
23         100   150   187   215   235
15          72    98   139   158   203
14          69   111   144   169   193
13         154   204   244   271   292
12          74   138   174   199   232
5           58    84   114   140   171
4           35    60    87   111   136
3           86   137   169   208   227
2           45    94   135   163   197
1          113   234   275   298   310

Models: 1. Sentence position; 2. Headline word position; 3. Text model; 4. Headline model; 5. Bigram headline model.

• Decision:
  – Sentence position & text model.
• Problem:
  – Just keywords.
  – Need readability.

Readability: Phrase Clustering
• Best explained with an example:

We Ask the Question…
• Coherence problem.
• Humans still perform at a much superior level.
• Growing headlines from seed words works reasonably well.
• But there is little control over how they are grown, which hurts coherence.
• Is there a set of rules for writing headlines?
• Can templates help?

Template Creation
• Inspired by Information Extraction (IE):
  – Create structured info from unstructured texts using patterns.
• Using existing headlines → potential templates
  – Need an abstract representation.
  – Templates at Part-of-Speech (POS) level: 60933 from training data.
  – Size (template collection) = 1/2 size (headlines)
    • Not much reduction.
    • Maybe it won’t help!

[Figure: Distribution of frequent templates. Bar chart; x-axis: POS templates, y-axis: number of occurrences (0 to 600). Bar labels: 529, 184, 345, 320, 70.]

Eval: Template Hypothesis
• Validate hypothesis:
  – Better headlines would be produced by using structured patterns.
  – Method:
    • Take (headline, text).
    • Get template for its headline.
    • Fill it with text body.
    • How much of the text is needed to fill the template?

Text Size                Files from corpus (%)
First sentence           20.01
First two sentences      32.41
First three sentences    41.90
All sentences            75.55

Refinement
• Maintain balance between:
  – Grammaticality.
  – Important content.
• Approach: integrate
  – Clustering of key headline phrases.
  – Fine-tuning using headline templates.
• Templates act as grammatical filters.
  $$\mathrm{score\_t}(i) = \frac{\sum_{j=1}^{N} W_j}{\left| \mathrm{desired\_length} - \mathrm{template\_length} \right| + 1}$$
• Generous tag-matching, but no partial template match allowed.
• Measuring fullness:
  $$f_{t_i} = \frac{\mathrm{length}(t_i) + \mathrm{matched\_length}(h_i)}{\mathrm{length}(t_i) + \mathrm{length}(h_i)}$$

Standalone Evaluation
• Unigram overlap:
  – Created headlines at various lengths.
  – 615 files from DUC03 testing set.
  – Compared with assessors’ headlines.
• What about comparing it with other systems?
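
The unigram-overlap comparison used in the standalone evaluation can be sketched as follows. This is a minimal illustration and not the authors' scoring script: the stop-word list, the tokenization, and the function names `content_words` and `unigram_overlap` are all assumptions made for the example.

```python
# Hypothetical stop-word list for illustration; the slides do not specify one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "for", "is"}


def content_words(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def unigram_overlap(candidate, references):
    """Count distinct candidate content words found in any reference headline."""
    reference_vocab = set()
    for ref in references:
        reference_vocab.update(content_words(ref))
    return len(set(content_words(candidate)) & reference_vocab)


candidate = "The senate votes on budget plan"
references = ["Senate passes budget", "Budget plan approved in the Senate"]
print(unigram_overlap(candidate, references))  # shared: senate, budget, plan
```

Counting distinct words rather than tokens is a design choice of this sketch; a token-level count would reward repeated words in the candidate.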

DUC Evaluation
• From DUC2004:
  – Scored by ROUGE (Lin and Hovy, 2003): a measure of n-gram recall between candidate headlines and a set of reference headlines.
  – ROUGE-L: based on Longest Common Subsequence (LCS) overlap.

[Figure: scores compared for phrase clustering alone and for using templates.]

• Why? How come?
  – Need a measure for grammaticality.

Discussion and Future Work
• The merging of content and structure.
• Structural abstraction is helpful.
• But:
  – POS tags do not generalize well (large number of templates), and
  – fail to model subcategorization.
• Need a more refined pattern language.
  – Incorporate named-entity and verb clusters…
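
The LCS overlap behind the ROUGE-L measure mentioned in the DUC evaluation can be illustrated with a short sketch. It computes only a recall-style ratio against a single reference and is not the official ROUGE implementation; the whitespace tokenization and the function names `lcs_length` and `lcs_recall` are assumptions for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the subsequence ending before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Best result after skipping a token on either side.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_recall(candidate, reference):
    """LCS length divided by reference length (simple whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


print(lcs_recall("police kill the gunman", "police killed the gunman"))
```

Because a subsequence must preserve word order but not adjacency, this measure is sensitive to word order in a way that a unigram count is not, though it still says nothing direct about grammaticality.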
