Password Entropy and Length Correlations

February 27, 2007

By: Gary Hammock

Passwords are one of the oldest and most readily available methods of securing computers, data, and other sensitive information. They are also among the pieces of information most sought after by malicious crackers and other nefarious individuals. Passwords are a double-edged sword for those seeking to secure information: the data must be protected from unauthorized access, yet remain easily accessible to legitimate users.

Complicated passwords that draw on a wide character set are a pain for users, yet long passwords built from a small, simple character set are not always good enough either. The best passwords make good use of both string length and character variety, which together determine password entropy. Password entropy is a measure of how unpredictable a password is, expressed in bits; it grows with the number of characters the password could have been drawn from. Think of a fair coin. One side is labeled "heads" and the other "tails". On any given flip, the two outcomes are equally likely, so a single coin flip has one bit of entropy.

Entropy is calculated as

h = L log2 W,

where W is the character set width (the number of distinct characters available) and L is the password string length. Because the logarithm increases without bound, h grows as W grows, so a larger character set gives the password more entropy; the growth is only logarithmic in W, however, while it is linear in L. For most uses and applications, passwords are restricted to a subset of the 256-character extended ASCII set, namely the 94 printable characters of the standard ASCII set (not counting the space). With W = 94, a password that may contain the characters A-Z, a-z, 0-9, and the 32 "special" characters and punctuation marks has an entropy of (6.555)L bits. By contrast, a password restricted to 62 characters (A-Z, a-z, and 0-9) has an entropy of (5.954)L bits, a decrease of roughly nine percent; equivalently, the larger set provides about ten percent more entropy per character. It is left to the reader to weigh that 10% gain in entropy against the added complexity for the user (i.e., is it worth the cost of more password resets?).
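The per-character figures above can be reproduced with a few lines of Python. This is a minimal sketch of the formula h = L log2 W; the function names bits_per_character and password_entropy are chosen here for illustration and do not come from the original article.

```python
import math


def bits_per_character(width: int) -> float:
    """Entropy of one character drawn uniformly from `width` symbols."""
    return math.log2(width)


def password_entropy(length: int, width: int) -> float:
    """h = L * log2(W) for a password of `length` characters."""
    return length * bits_per_character(width)


for width in (62, 94):
    print(f"W = {width}: {bits_per_character(width):.3f} bits per character")

# Relative gain per character when moving from the 62 to the 94 character set.
gain = bits_per_character(94) / bits_per_character(62) - 1
print(f"Gain from W = 62 to W = 94: {gain:.1%}")
```

Because the width enters only through a logarithm, doubling W adds just one bit per character, whereas each extra character adds a full log2 W bits.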

Keep in mind that these calculations assume every character is chosen independently and at random, so they are upper bounds. In the examples that follow, a character that an attacker knows to be repeated is treated as adding no entropy, and results are rounded up to whole bits. For example, the password "password7", drawn from the 62 character set described above, has an entropy, h, of 54 bits. If an attacker knows that one character appears twice (leaving 8 distinct characters), the entropy is reduced to h = 48 bits. If it is also known that the password uses only lowercase letters and the digits 0-9 (W = 36), the entropy is h = 42 bits.

Now assume that the password "{1seCRet}" is used with the larger, 94 character set. This password has an entropy of h = 59 bits. If it is known that one character appears twice, the entropy is reduced to 53 bits.
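The worked figures for "password7" and "{1seCRet}" can be checked with a short script. This is only a sketch under two assumptions that match the numbers quoted above: a repeated character is discounted by counting distinct characters only, and results are rounded up to whole bits. The helper name entropy_bits is hypothetical.

```python
import math


def entropy_bits(symbol_count: int, width: int) -> int:
    """h = L * log2(W), rounded up to a whole number of bits."""
    return math.ceil(symbol_count * math.log2(width))


# (password, assumed character set width)
examples = [("password7", 62), ("password7", 36), ("{1seCRet}", 94)]

results = []
for password, width in examples:
    distinct = len(set(password))  # a repeated character is counted once
    full = entropy_bits(len(password), width)
    reduced = entropy_bits(distinct, width)
    results.append((password, width, full, reduced))
    print(f"{password!r}, W = {width}: {full} bits at full length, "
          f"{reduced} bits counting {distinct} distinct characters")
```

Counting distinct characters is a simplification; it is used here because it reproduces the reductions described in the text, not because it is a general measure of what an attacker gains from knowing about repetition.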

This shows that password strength depends on how the password is constructed and on what an attacker knows about it. Now compare the entropy and relative strength of the password "MyLittleSEcReT40" from the 62 character set with those of the password "7a*G4$Nb" from the 94 character set. (Many IT departments require passwords to have two uppercase letters, two lowercase letters, two numbers, and two "special" characters.) Counting only its 14 distinct characters, "MyLittleSEcReT40" has an entropy of h = 84 bits. The password "7a*G4$Nb" has an entropy of 53 bits. In this example, the password from the smaller character set has more entropy than the one from the larger set because of its length (even with character repetition). For most people, "MyLittleSEcReT40" would also be much easier to remember than "7a*G4$Nb" without the use of a mnemonic.
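The comparison can be evaluated directly from h = L log2 W. The exact figure for the longer password depends on how repeated characters are discounted, so the sketch below reports both the full-length value and the value counting distinct characters only, each rounded up to whole bits. The helper name entropy_bits and the distinct-character discount are assumptions made for illustration.

```python
import math


def entropy_bits(symbol_count: int, width: int) -> int:
    """h = L * log2(W), rounded up to a whole number of bits."""
    return math.ceil(symbol_count * math.log2(width))


# (password, character set width it is assumed to be drawn from)
candidates = [("MyLittleSEcReT40", 62), ("7a*G4$Nb", 94)]

results = {}
for password, width in candidates:
    distinct = len(set(password))
    full = entropy_bits(len(password), width)
    reduced = entropy_bits(distinct, width)
    results[password] = (full, reduced)
    print(f"{password!r}: length {len(password)}, {distinct} distinct, "
          f"{full} bits at full length, {reduced} bits counting distinct only")
```

Under either way of counting, the 16-character password from the smaller set comes out well ahead of the 8-character password from the larger set.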

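In summary, the two character sets can be compared by holding the entropy constant and solving for the password string lengths. For instance, setting the two entropies equal,

(6.555) L_94 = (5.954) L_62

yields

L_62 / L_94 = 1.1

where L_94 and L_62 are the string lengths of passwords drawn from the 94 and 62 character sets, respectively.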

This means that for two passwords (one from each of the two character sets) to have the same entropy, the password from the 62 character set must have approximately 10% more non-repeated characters. For example, a password of 9 unique characters from the 62 character set has roughly the same entropy as a password of 8 unique characters from the 94 character set. It is left to the reader to weigh password entropy against password complexity for specific purposes. One primary factor in determining acceptable password entropy requirements is cost. It is prudent to determine which is greater: the increased information technology support cost (for password resets), or the cost of leaking proprietary, confidential, or personal data through improper password management.
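The length trade-off can also be tabulated. The sketch below solves L_from log2(W_from) = L_to log2(W_to) for the length in the target character set; the function name equivalent_length and the sample lengths 8, 12, and 16 are illustrative choices, not values from the article.

```python
import math


def equivalent_length(length: int, from_width: int, to_width: int) -> float:
    """Length in the `to_width` set with the same entropy as
    `length` characters from the `from_width` set."""
    return length * math.log2(from_width) / math.log2(to_width)


ratio = math.log2(94) / math.log2(62)
print(f"L_62 / L_94 = {ratio:.3f}")

for length_94 in (8, 12, 16):
    exact = equivalent_length(length_94, 94, 62)
    # Round up: a password cannot contain a fraction of a character.
    print(f"{length_94} characters from W = 94 ~ {exact:.2f} characters "
          f"from W = 62 (use {math.ceil(exact)})")
```

Since the ratio is only about 1.1, adding one or two characters to an alphanumeric password is enough to offset dropping the special characters at typical password lengths.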