Hash Collisions: A Crisis in Cryptography?
dr. Benne de Weger
Eindhoven University of Technology
(thanks to prof. Arjen Lenstra, EPFL Lausanne)
presentation available from www.win.tue.nl/~bdeweger
GOVCERT.NL Symposium, September 14, 2006

‘Breaking’ News
• Hash Function MD5 Broken (August 2004)
• Hash Function SHA-1 Under Attack (February 2005)
• by prof. Xiaoyun Wang and students from Shandong, China
• your company certainly uses MD5 and/or SHA-1 in many security systems
• what should you do? and why?

my opinion
• short term
  – abuse of attacks is
    • unlikely to occur
    • easy to detect
  – immediate replacement of hash functions
    • is not always necessary
    • may be costly
  – replacement of MD5 is necessary
    • new attacks can be expected any day
    • MD5 is broken, start replacing now
• medium to long term
  – replacement of SHA-1 is necessary
    • SHA-1 will fall soon, do not use it in new applications
    • NIST: replace by 2010
  – more flexible design of cryptographic systems needed

outline
• introduction
  – hash functions: theory, use, collisions
  – overview of recent events
  – business impact questions
• more on collisions
  – requirements for hash functions
  – risks of collisions
  – practical abuse of collisions
• the future
  – what more to expect
  – recommendations

what is a hash?
• hash input: arbitrary length
• hash function: bit shuffling
• hash output: fixed short length
• no key involved
• computing a hash is very fast
• output ‘identifies’ input
• hash value also called: message digest, fingerprint
  – a checksum is not a cryptographic hash

uses of hash functions
• digital signatures
  – strong binding between document and private key
  – non-repudiation
• other uses
  – integrity protection
    • for file systems (Tripwire)
    • for downloads
  – password protection
  – authentication codes
    • keyed hashes
  – key derivation
  – many uses in cryptographic protocols
  – …
[figure: document → hash value → cryptographic digital signature method → digital signature]

hash collision
[figure: two different documents, “I owe you € 100” and “I owe you € 5000”, have an identical hash (= collision); the cryptographic digital signature method then produces an identical digital signature for both]

hash functions in use today
• MD5 and SHA-1
  – from the same ‘family’
  – market share: > 99%
  – the only hash functions allowed in many standards
• newer hash functions
  – SHA-2 family: SHA-224, SHA-256, SHA-384, SHA-512
  – Whirlpool, Tiger, …
  – not yet thoroughly scrutinized by cryptanalysts
  – not yet widely adopted
• MS Vista will support SHA-256

hash function internals
• iterative construction
  – a compression function updates the chaining value (IV) with each data block
  – starts from a fixed input IV
  – hash = output IV after the last block
• enables streaming data

recent events: MD5
• 1992: MD5 published by Ron Rivest
• 1996: design was shown to be weaker than intended
  – collisions for the compression function found
  – but no full attack (Den Boer, Bosselaers, Dobbertin)
• Aug. 2004: collisions found for the full MD5
  – by Xiaoyun Wang and students (Shandong, P.R. China)
  – produce a 2-block collision in a few hours on a supercomputer
• Mar. 2005: speedup of collision finding
  – by Vlastimil Klima (Prague, Czech Republic)
  – produce a collision in a few hours on a PC
• Mar. to May 2006: further speedups
  – by V. Klima and Marc Stevens (Eindhoven)
  – produce a collision in a few seconds on a PC
  – executable available on www.win.tue.nl/hashclash

recent events: SHA-1
• 1995: published by NIST
• Feb. to Aug. 2005: shown to be weaker than intended
  – by Xiaoyun Wang and students
  – idea similar to MD5, but much more complicated
  – collisions can be found about 100,000 times faster than SHA-1 was designed for
  – runtime still huge
    • no collisions found at this moment
    • massively distributed computation might be feasible
• reduced-round SHA-1 (full SHA-1: 80 rounds)
  – Feb. 2005: 53-round SHA-1 collision (Biham-Chen)
  – Aug. 2005: 58-round SHA-1 collision (Wang et al.)
  – Aug. 2006: 64-round SHA-1 collision (De Cannière and Rechberger)

a collision example
two inputs of 128 bytes each (in hex) that differ in only a few bytes but have an identical hash value:
input 1:
CAB9E742C4B626871AB9A524846B05C1
8895FB9365E9A69F480392FF2C3B3F79
41AD3406FFADB4034BDF847A4D37014F
DB3283CB19D46FA8A765C6B3F016BF30
6AFF7C2E5773689B3319B81564ABE7F5
B9CF66C5E4FE790CEE047D36CC77B0AE
5D087F30B560EB8872B34D406778662D
D88464677DBD9B80989EF24FB82E0EA3
input 2:
CAB9E742C4B626871AB9A524846B05C1
8895FB1365E9A69F480392FF2C3B3F79
41AD3406FFADB4034BDF847A4DB7014F
DB3283CB19D46FA8A765C633F016BF30
6AFF7C2E5773689B3319B81564ABE7F5
B9CF6645E4FE790CEE047D36CC77B0AE
5D087F30B560EB8872B34D4067F8652D
D88464677DBD9B80989EF2CFB82E0EA3

visualizing collisions
[figure: “what happens” (MD5 collision, block 1 and block 2) versus “what should have happened” (1-bit difference); each line of pixels shows the difference in internal state at one step inside the compression function (64 steps for MD5)]

business impact questions
• what does ‘broken’ mean?
  – answer: there’s a difference between breaking the ‘collision resistance’ and breaking the hash function
• can recent attacks be abused?
  – answer: known attack scenarios are not very convincing
• business impact
  – what are the risks involved?
  – is replacement of MD5 and SHA-1 worthwhile?
  – are there workarounds?
• lesson to be learned
  – be prepared for the real break that will come some day

1st requirement: pre-image resistance
• pre-image resistance, one-wayness
  – given a hash value, you cannot find data with that hash value

2nd requirement: 2nd pre-image resistance
• 2nd pre-image resistance
  – given data, you cannot find other data with an identical hash value
  – closely related to pre-image resistance

3rd requirement: collision resistance
• collision resistance
  – you cannot find two different sets of data with an identical hash value
  – much easier to violate than 2nd pre-image resistance, because the attacker may choose both sets of data

4th requirement: pseudo-randomness
• the hash function is a “pseudo-random function”
  – for any input one can come up with, the hash output is sufficiently random
• useful property, e.g. for key derivation functions
  – many protocols such as SSL use this

hash function theory
• virtually nothing is known in theory
• precise requirements for hashes can be formulated
  – (2nd) pre-image and collision resistance, pseudo-randomness, …
• it is not known whether MD5 and SHA-1 satisfy these requirements
  – or any other hash function, for that matter
• expert opinions on hash function design:
  – “we do not understand what we are doing”
  – “we do not really know what we want”
  – panel at NIST 2nd Hash Workshop, Santa Barbara CA, August 24, 2006; panelists: Niels Ferguson, Antoine Joux, Bart Preneel, Ron Rivest, Adi Shamir

keyed hashes
• for a given hash function H, the input data can be combined with a key
• this gives a “Message Authentication Code” (MAC)
• HMAC is a popular construction:
  HMAC(key, data) = H(k1 ║ H(k2 ║ data)), with k1 and k2 derived from key
• used in protocols such as SSL
• Bellare, Canetti, Krawczyk, theoretical result (1996): HMAC is pseudo-random, provided that
  – the compression function inside H is pseudo-random, and
  – the hash function H is collision resistant

did Bellare save HMAC from Wang?
• Wang: MD5 (and maybe SHA-1) not collision resistant
  – does this make HMAC questionable? the proof no longer applies, as it requires collision resistance
• Bellare (August 2006), new result: HMAC is pseudo-random, provided only that
  – the compression function inside H is pseudo-random
  – so collision resistance is no longer needed
• does this save HMAC?
  – well, randomness properties of the MD5 and SHA-1 compression functions have never been established…

attack work factor
• generic attack: brute-force search
• on an $n$-bit hash function
  – MD5: $n = 128$, SHA-1: $n = 160$
• work factor: number of compression function evaluations needed in the attack
• theoretical work factors:
  – for (2nd) pre-images: brute-force search, work factor $2^{n}$
  – for collisions: birthday attack, work factor $2^{n/2}$
• practical work factors:
  – $2^{50}$ is not a big problem for an average programmer (student) with access to a reasonable number of PCs
  – $2^{64}$ has never been done, but might be feasible by massively parallel computation

collision attack status, Sept. 2006
• MD5 work factors:
  – brute-force birthday attack: $2^{64}$
  – Wang’s original attack, Aug. 2004: $2^{39}$
  – Klima / Stevens improvements, March 2006: $2^{32}$
• SHA-1 work factors:
  – brute-force birthday attack: $2^{80}$
  – Wang’s original attack, Feb. 2005: $2^{69}$
  – Wang’s improvements, Aug. 2005: $2^{63}$
  – further improvements, until Aug. 2006: unclear
• you see, we’re making progress…

what’s unsafe now
• attacks are on collision resistance only
• when the attacker can choose the data at will, MD5 can be misused
• two-block collisions can be extended
  – appending data keeps the collision intact, due to the iterative structure
  – a common prefix can be chosen first, because Wang’s methods work for any input IV: the collision is then computed for the IV that the prefix produces

what’s still safe
• attacks are not on (2nd) pre-image resistance
• HMAC: security reassured by Bellare
• when the attacker could not have chosen the data, even MD5 can still be trusted
• SHA-1 has some collision resistance left
  – but decreasing…
• SHA-2: SHA-256, SHA-384, SHA-512
  – from the same family
  – somewhat different construction
  – no attacks known, not even weakened

risks of not meeting the requirements
• when (2nd) pre-image resistance is broken:
  – forging a given digital signature becomes possible
  – most uses of hash functions rely on (2nd) pre-image resistance
    • authentication codes, password protection, … all broken
• when only collision resistance is broken:
  – in principle, the security of digital signatures is compromised
    • risk of two ‘documents’ with an identical signature
    • but no direct risk of forging a given digital signature
  – trust is no longer as large as the system was designed for
  – not many other applications directly rely on collision resistance
• the good news
  – (2nd) pre-image resistance is not broken, for MD5 and SHA-1
  – risk of collision attack seems low
• but…

who controls data to be hashed / signed?
• if the good guy controls the data
  – the bad guy can only mount a (2nd) pre-image attack
• if the bad guy controls the data
  – the bad guy can mount a collision attack
  – this produces two documents with identical hashes, so with identical digital signatures
  – maybe one good document, one bad
• when someone asks you to digitally sign an MS Word document, do you really know what you’re signing?
  – signing is a bit-level operation
• as a relying party, can you always tell who constructed / controlled the signed data?

the “meaningful message” argument
• colliding data cannot be chosen at will, but follow from Wang’s construction method
  – e.g.: two colliding blocks of data that differ in only a few bit positions will most probably not constitute a “meaningful message” as input
• this makes attacks more difficult
  – but not impossible, as we’ll see in a moment
  – the meaningful message argument can be weakened by hiding collisions inside the bit-level structure of a document

abuse example
• hide a collision in a macro
  – inside a document (MS Word, Adobe pdf, PostScript)
  – by Daum-Lucks (May 2005) and Illies et al. (Oct. 2005)
• file 1: macro | coll.blk. 1 | document 1 | document 2; have this signed by a trusted party
• file 2: macro | coll.blk. 2 | document 1 | document 2; has an identical signature
• relies on superficial inspection by signer and verifier
• fraud easily detected by code inspection of only one file
  – two complete documents in there
  – strange block of random-looking data

another abuse example
• hide a collision in the public key inside an X.509 certificate
  – by Lenstra, Wang, de Weger (Mar. 2005)
  – http://www.win.tue.nl/~bdeweger/CollidingCertificates/
• two different certificates with an identical CA signature
  – cert. 1: name | public key with coll.blk. 1 | CA signature
  – cert. 2: name | public key with coll.blk. 2 | CA signature
• code inspection of only one certificate reveals nothing
  – a cryptographic key is random-looking anyway

does PKI have a problem?
yes:
• these colliding certificates should not be possible
  – a fundamental principle of PKI is violated
  – the CA can no longer guarantee proof of possession of the private key
no:
• no reasonable abuse scenario known
  – you cannot forge other people’s certificates
  – both certificates have an identical owner name
  – sufficient control over the CA is required
yet:
• it seems prudent to not use MD5 in certificates anymore
  – certainly not after March 2005, when these colliding certificates were widely announced

a real life certificate
• do you trust this one? I’m one of this ISP’s happy customers
• on the other hand: cryptographers warned already in the late 1990s that MD5 had weaknesses

conclusion on collisions
• at this moment, ‘meaningful’ hash collisions are
  – easy to make
  – but also easy to detect
  – still hard to abuse realistically
• to do real harm, a ‘real’ 2nd pre-image attack is needed
  – real harm is e.g. forging digital signatures
  – this is not possible yet, not even with MD5

the future
• old saying: “attacks only get better, never worse”
• to be expected in the next few years:
  – Moore’s law continues
  – complexity improvements of known collision attacks
  – new variants of collision attacks
    • leading to more dangerous applications
    • example on next slide
  – attacks on other hash functions (SHA-256 etc.)?
    • breaking hash functions is fashionable now
  – 2nd pre-image attacks?? hopefully not… (no signs yet)
  – new hash function designs
    • too early to tell the good ones from the bad ones
    • NIST will organize an Advanced Hash Standard competition
    • time frame: 2007 to 2012

speculative collision variants
• more flexible (MD5) collisions might be found
  – e.g. based on different prefixes that are made to collide
  – would e.g. allow X.509 certificates with identical signatures but different owner names:
    cert. 1: Alice | public key with coll.blk. 1 | CA signature
    cert. 2: Bob | public key with coll.blk. 2 | CA signature
  – apparently higher risk
  – but still some control over the CA needed
• this will not be the end…

similar variant in documents
• hide a collision in an image (not a macro)
  – inside a document (MS Word, Adobe pdf, PostScript)
• file 1: document 1 | image with coll.blk. 1; have this signed by a trusted party
• file 2: document 2 | image with coll.blk. 2; has an identical signature
• code inspection of one document reveals almost nothing
  – the collision covers only a few pixels in the image
  – macro features are not needed anymore

hash functions in cryptographic systems: the problem
• changes in systems
  – MD5 and SHA-1 are often built in very deeply and inflexibly
  – forced replacement of a hash function is a major operation
    • bit length of hash value sometimes hard-coded
  – especially so in networked environments (web)
    • many different platforms and applications
    • most are not under your control
    • old and new will have to coexist for a long time
• changes in protocols
  – protocols not prepared for new (types of) hash functions
    • e.g. TLS will need an extension
  – risk of downgrade attacks

hash functions in cryptographic systems: approaches, from high security to high risk
• very prudent approach (but expensive)
  – do a risk analysis for all systems using hash functions
  – assess potential damage and replacement cost
  – start using SHA-256 immediately in all newly built systems
• fundamental approach (saves costs in the long term)
  – when designing new systems, build in flexibility so that broken crypto can be easily replaced
  – more standardization effort needed
• cheap approach (quick and dirty, postponing cost)
  – do a quick scan of applications using MD5 for digital signatures only
    • here replace MD5 by SHA-1
  – continue use of MD5 and SHA-1 everywhere else
    • until SHA-256 is widely deployed
• shortsighted approach
  – do nothing, wait for the catastrophe to happen, and then panic

hash functions in cryptographic systems: practical approach
• follow the NIST recommendation (March 2006)
  – SHA-2 may be used for all applications
  – stop using SHA-1 for digital signatures, etc., as soon as practical
  – must use SHA-2 for these applications after 2010
    • phasing out SHA-1 by 2010 was planned anyway, due to Moore’s law
  – after 2010, SHA-1 may be used only for the following applications: HMACs, key derivation, random number generators
  – regardless of use, application and protocol designers are encouraged to use SHA-2 for all new applications and protocols
• your competitors will do this too
• NIST Suite B Cryptography
  – new standard of the US Government, but use with care: unfortunately, adaptability and flexibility are not design requirements

hash functions in cryptographic systems: workarounds
• continued use of SHA-1 (or even MD5), but in a more secure way
  – hash twice, e.g. using MD5 and SHA-1
    • prepend hash to message, then hash again
    • performance problems (streaming not possible)
    • there are also fundamental problems
  – randomized hashing
    • prefix message with random ‘salt’
    • attacker cannot choose salt
  – other message preprocessing techniques
• all these remain workarounds

conclusion
• the cryptographer’s view:
• main conclusion
  – be prepared by default to replace broken crypto
  – invest in more flexible designs and standards
• hope that no one finds a 2nd pre-image attack on SHA-1

web resources
• NIST Hash Function pages (recommendations; Oct. 2005 and Aug. 2006 workshop proceedings): http://www.csrc.nist.gov/pki/HashWorkshop/
• IETF RFC 4270, Paul Hoffman and Bruce Schneier, ‘Attacks on Cryptographic Hashes in Internet Protocols’: http://ietf.org/rfc/rfc4270.txt
• ECRYPT Position Paper, ‘Recent Collision Attacks on Hash Functions’: http://www.ecrypt.eu.org/documents.html
• Arjen K. Lenstra, ‘Further Progress in Hashing Cryptanalysis’: http://cm.bell-labs.com/who/akl/hash.pdf
• Benne de Weger, ‘Hash-functies onder vuur’ (‘Hash functions under fire’), Informatiebeveiliging, April 2005: http://www.win.tue.nl/~bdeweger/Artikel GvIB Hashbotsingen.pdf
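The workarounds slide names "hash twice" and randomized hashing but does not show how either would look. The following minimal Python sketch gives one possible reading of each. It is not the speaker's construction and not any standardized scheme. In this reading, hash_twice prepends an MD5 digest to the message and hashes the result with SHA-1, and randomized_hash prefixes the message with a random salt chosen by the signer. The function names, the 16-byte salt length and the use of SHA-1 for the salted hash are illustrative assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_twice(message: bytes) -> bytes:
    """'Hash twice' workaround, under one assumed reading of the slide:
    hash the message with MD5, prepend that digest to the message, then
    hash the result with SHA-1. The message is processed twice, so the
    hash can no longer be computed in a single streaming pass."""
    inner = hashlib.md5(message).digest()
    return hashlib.sha1(inner + message).digest()


def randomized_hash(message: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Simplified randomized hashing (illustrative, not a standard scheme):
    the signer picks a fresh random salt and hashes salt || message.
    The salt must travel with the signature so a verifier can recompute it."""
    if salt is None:
        salt = os.urandom(16)  # chosen by the signer, not by the attacker
    return salt, hashlib.sha1(salt + message).digest()


def verify_randomized(message: bytes, salt: bytes, digest: bytes) -> bool:
    """Recompute SHA-1(salt || message) and compare in constant time."""
    return hmac.compare_digest(hashlib.sha1(salt + message).digest(), digest)


if __name__ == "__main__":
    doc = b"I owe you EUR 100"
    print("hash twice:", hash_twice(doc).hex())

    salt, digest = randomized_hash(doc)
    print("salt:", salt.hex())
    print("verifies:", verify_randomized(doc, salt, digest))
    print("other document verifies:", verify_randomized(b"I owe you EUR 5000", salt, digest))
```

The salt only helps if the signer picks it after the other party has fixed the document. Wang-style collision searches need to know the input IV, and an unpredictable prefix changes the state from which the attacker's prepared collision blocks are processed. As the slide says, all of this remains a workaround rather than a replacement for a stronger hash function.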