Thilan Dissanayaka Cryptography April 16, 2020

AES — The Advanced Encryption Standard

Every time you connect to a website over HTTPS, send a WhatsApp message, encrypt a file with 7-Zip, or connect to a VPN — AES is doing the heavy lifting. It’s the most widely deployed encryption algorithm in history, trusted by everyone from banks to intelligence agencies.

In the DES article, we explored how the first modern cipher worked — a Feistel network with 16 rounds, S-boxes, and permutations. In the 3DES article, we saw the hack that extended DES’s life. Now it’s time for the algorithm that replaced them both.

AES is fundamentally different from DES. It’s not a Feistel cipher. It doesn’t split the block in half. Instead, it uses a Substitution-Permutation Network (SPN) — applying transformations to the entire block in every round. And unlike DES, the math behind AES is rooted in abstract algebra — specifically, finite field arithmetic in GF(2^8).

Don’t worry if that sounds intimidating. We’ll build up from first principles.

The NIST Competition

In 1997, NIST announced an open competition to find a replacement for DES. The requirements:

128-bit block size (DES used 64 bits — too small, as Sweet32 later proved)
Support for 128, 192, and 256-bit keys
Efficient in both hardware and software
Secure against all known attacks
Royalty-free and publicly available

Fifteen candidates were submitted from around the world. After three years of intense public analysis, NIST announced the winner in October 2000: Rijndael, designed by Belgian cryptographers Joan Daemen and Vincent Rijmen.

Rijndael was chosen for its combination of security, performance, and elegance. It was standardized as AES (FIPS 197) in November 2001.

Some of the other notable candidates:

Serpent — More conservative design, higher security margin, but slower
Twofish — Bruce Schneier’s entry, a Feistel cipher with strong security
RC6 — From RSA Labs, based on RC5
MARS — IBM’s entry (from the same lab that created DES)

Rijndael won because it was the best all-rounder — fast in software, efficient in hardware, clean mathematical foundation, and no exploitable weaknesses found during the competition.

AES at a Glance

Property	Value
Block size	128 bits (16 bytes)
Key sizes	128, 192, or 256 bits
Structure	Substitution-Permutation Network (SPN)
Rounds	10 (AES-128), 12 (AES-192), 14 (AES-256)
Operations	SubBytes, ShiftRows, MixColumns, AddRoundKey
Designed by	Joan Daemen and Vincent Rijmen (Belgium)
Standardized	2001 (FIPS 197)

The number of rounds increases with key size: a longer key carries more key material, and more rounds are needed to mix all of it thoroughly into the state and to keep a security margin that matches the larger key.

The State Matrix

AES operates on a 4x4 matrix of bytes called the state. The 128-bit (16-byte) plaintext block is arranged into this matrix column by column:

Plaintext bytes: b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15

State matrix:
┌─────┬─────┬─────┬─────┐
│ b0  │ b4  │ b8  │ b12 │   ← Row 0
├─────┼─────┼─────┼─────┤
│ b1  │ b5  │ b9  │ b13 │   ← Row 1
├─────┼─────┼─────┼─────┤
│ b2  │ b6  │ b10 │ b14 │   ← Row 2
├─────┼─────┼─────┼─────┤
│ b3  │ b7  │ b11 │ b15 │   ← Row 3
└─────┴─────┴─────┴─────┘
  Col 0  Col 1  Col 2  Col 3

Notice: bytes fill column-first (column-major order). b0-b3 go in column 0, b4-b7 in column 1, and so on.

Every AES operation transforms this state matrix. After all rounds, the state is read out (column by column) as the ciphertext.

Let’s work with a concrete example. If our plaintext is the hex string:

32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34

The state matrix becomes:

┌──────┬──────┬──────┬──────┐
│  32  │  88  │  31  │  e0  │
├──────┼──────┼──────┼──────┤
│  43  │  5a  │  31  │  37  │
├──────┼──────┼──────┼──────┤
│  f6  │  30  │  98  │  07  │
├──────┼──────┼──────┼──────┤
│  a8  │  8d  │  a2  │  34  │
└──────┴──────┴──────┴──────┘
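The column-major layout can be sketched in a few lines of Python. The helper names `bytes_to_state` and `state_to_bytes` match the calls used later in `aes_encrypt`, but the bodies here are only one possible implementation, assuming the state is a list of four rows:

```python
def bytes_to_state(block):
    """Arrange 16 bytes into a 4x4 state, filling column by column."""
    assert len(block) == 16
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]

def state_to_bytes(state):
    """Read the 4x4 state back out column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))

block = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
state = bytes_to_state(block)
for row in state:
    print(" ".join(f"{b:02x}" for b in row))
# 32 88 31 e0
# 43 5a 31 37
# f6 30 98 07
# a8 8d a2 34
```

Reading the state back with `state_to_bytes(state)` returns the original 16 bytes.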

The Math Foundation — Galois Fields GF(2^8)

Before we look at the AES operations, we need to understand the math. AES performs arithmetic on bytes, but not ordinary arithmetic — it uses finite field arithmetic in GF(2^8), also called a Galois Field.

If you’ve never heard of this, don’t panic. It’s simpler than it sounds.

Why Not Regular Arithmetic?

In AES, every value is a byte (0x00 to 0xFF). If we used regular addition and multiplication:

0xFF + 0x01 = 0x100 — That’s 9 bits. Doesn’t fit in a byte.
0xFF × 0x02 = 0x1FE — Also too big.

We need arithmetic where the result is always a byte. That’s what GF(2^8) gives us.

Bytes as Polynomials

In GF(2^8), each byte is interpreted as a polynomial with coefficients that are either 0 or 1. The byte 0x57 (binary 01010111) represents:

0·x⁷ + 1·x⁶ + 0·x⁵ + 1·x⁴ + 0·x³ + 1·x² + 1·x¹ + 1·x⁰
= x⁶ + x⁴ + x² + x + 1

Each bit position corresponds to a power of x. The most significant bit is x⁷, the least significant is x⁰.
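To make the byte-to-polynomial mapping concrete, here is a small illustrative helper. The name `byte_to_poly` is made up for this sketch; it simply prints one term per set bit:

```python
def byte_to_poly(b):
    """Return the GF(2^8) polynomial for byte b as a readable string."""
    terms = []
    for power in range(7, -1, -1):
        if b & (1 << power):
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"

print(byte_to_poly(0x57))  # x^6 + x^4 + x^2 + x + 1
print(byte_to_poly(0x83))  # x^7 + x + 1
```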

Addition in GF(2^8)

Addition is simply XOR. No carries, no overflow.

  0x57 = 01010111
⊕ 0x83 = 10000011
─────────────────
  0xD4 = 11010100

In polynomial form:

(x⁶ + x⁴ + x² + x + 1) + (x⁷ + x + 1) = x⁷ + x⁶ + x⁴ + x²

The x and 1 terms cancel out (1 + 1 = 0 in GF(2)). This is just XOR at the bit level.

Properties:

Every element is its own additive inverse: a + a = 0 (XOR with itself = zero)
Subtraction is the same as addition: a - b = a + b = a XOR b
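In Python, both properties are easy to check with the `^` operator. This is just the 0x57 + 0x83 example from above, nothing AES-specific yet:

```python
a, b = 0x57, 0x83

total = a ^ b          # addition in GF(2^8) is XOR
print(hex(total))      # 0xd4
print(hex(total ^ b))  # 0x57: "subtracting" b is the same XOR
print(a ^ a)           # 0: every element is its own additive inverse
```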

Multiplication in GF(2^8)

This is where it gets interesting. Multiplication is polynomial multiplication modulo an irreducible polynomial. For AES, that polynomial is:

m(x) = x⁸ + x⁴ + x³ + x + 1     (which is 0x11B in hex)

This polynomial is “irreducible” — it can’t be factored into smaller polynomials over GF(2). Think of it as a prime number, but for polynomials.

Let’s multiply 0x57 × 0x83:

0x57 = x⁶ + x⁴ + x² + x + 1
0x83 = x⁷ + x + 1

Multiply (like regular polynomial multiplication, but coefficients mod 2):

(x⁶ + x⁴ + x² + x + 1) × (x⁷ + x + 1)

= x¹³ + x¹¹ + x⁹ + x⁸ + x⁷
  + x⁷ + x⁵ + x³ + x² + x
  + x⁶ + x⁴ + x² + x + 1

Combine like terms (mod 2, so pairs cancel):
= x¹³ + x¹¹ + x⁹ + x⁸ + x⁶ + x⁵ + x⁴ + x³ + 1

This result has degree 13 — too big for a byte. So we reduce modulo m(x):

(x¹³ + x¹¹ + x⁹ + x⁸ + x⁶ + x⁵ + x⁴ + x³ + 1) mod (x⁸ + x⁴ + x³ + x + 1)

This is polynomial long division. The result is a polynomial of degree 7 or less — which fits in a byte.

The final answer: 0x57 × 0x83 = 0xC1 in GF(2^8).
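The multiply-then-reduce steps can be sketched directly in Python, treating integers as bit vectors of polynomial coefficients. The function names (`poly_mul`, `poly_mod`, `gf_mul`) are chosen for this example only, and this is a readable version rather than a fast one:

```python
def poly_mul(a, b):
    """Carry-less polynomial multiplication (coefficients mod 2), no reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_mod(value, modulus=0x11B):
    """Polynomial long division over GF(2): return value mod modulus."""
    mod_degree = modulus.bit_length() - 1
    while value.bit_length() - 1 >= mod_degree:
        shift = (value.bit_length() - 1) - mod_degree
        value ^= modulus << shift  # subtract (XOR) a shifted copy of m(x)
    return value

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return poly_mod(poly_mul(a, b))

print(hex(poly_mul(0x57, 0x83)))  # 0x2b79 (the degree-13 polynomial)
print(hex(gf_mul(0x57, 0x83)))    # 0xc1
```

The long division takes two steps here: XOR with m(x) shifted left by 5 clears the x¹³ term, then XOR with m(x) shifted left by 3 clears the x¹¹ term, leaving x⁷ + x⁶ + 1 = 0xC1.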

The xtime() Shortcut

In practice, AES only needs to multiply by small constants (0x01, 0x02, 0x03, 0x09, 0x0B, 0x0D, 0x0E). Multiplying by 0x02 is especially common and has a simple implementation called xtime:

xtime(a):
    1. Shift a left by 1 bit
    2. If the high bit (bit 7) was set before the shift,
       XOR the result with 0x1B (the lower 8 bits of the irreducible polynomial)

In code:

def xtime(a):
    result = (a << 1) & 0xFF
    if a & 0x80:      # If bit 7 was set
        result ^= 0x1B
    return result

Multiplying by 0x03 is xtime(a) XOR a (since 0x03 = 0x02 + 0x01, and multiplication distributes over addition/XOR).

Multiplying by higher constants uses repeated application of xtime:

× 0x04 = xtime(xtime(a))
× 0x09 = xtime(xtime(xtime(a))) XOR a
× 0x0B = xtime(xtime(xtime(a))) XOR xtime(a) XOR a
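The same pattern generalizes to any constant: walk the bits of the constant, XOR in the current value when a bit is set, and apply xtime to move to the next power of two. A minimal sketch (the helper name `mul_by_constant` is an assumption for this example):

```python
def xtime(a):
    result = (a << 1) & 0xFF
    if a & 0x80:          # bit 7 was set before the shift
        result ^= 0x1B
    return result

def mul_by_constant(a, c):
    """Multiply byte a by constant c in GF(2^8) using only xtime and XOR."""
    result = 0
    while c:
        if c & 1:
            result ^= a
        a = xtime(a)
        c >>= 1
    return result

print(hex(xtime(0x57)))                  # 0xae
print(hex(mul_by_constant(0x57, 0x04)))  # 0x47
print(hex(mul_by_constant(0x57, 0x13)))  # 0xfe
```

For 0x13 = 0x10 + 0x02 + 0x01, the result is 0x07 XOR 0xAE XOR 0x57 = 0xFE, built from four successive xtime calls on 0x57.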

This is all the math AES needs. Now let’s see how it’s used.

The Four Round Operations

Each AES round applies four operations to the state matrix, in this order:

SubBytes — Byte-level substitution (confusion)
ShiftRows — Row-level shifting (diffusion)
MixColumns — Column-level mixing (diffusion)
AddRoundKey — XOR with the round key (key mixing)

The last round omits MixColumns (for mathematical reasons related to making decryption symmetrical).

Let’s examine each operation in detail.

Operation 1: SubBytes

SubBytes is a nonlinear byte substitution. Each byte in the state is independently replaced using a lookup table called the S-box. It’s the same idea as DES’s S-boxes, but the AES S-box is mathematically constructed rather than hand-designed.

How the S-box is Built

The AES S-box isn’t arbitrary. Each byte is transformed in two steps:

Multiplicative inverse in GF(2^8). The byte a is replaced by a⁻¹ (the element such that a × a⁻¹ = 1). The byte 0x00 has no inverse, so it maps to itself.
Affine transformation over GF(2). The inverse is then put through a bit-level affine (linear + constant) transformation.

The combination of inversion (highly nonlinear) and the affine transformation creates an S-box with excellent cryptographic properties — high nonlinearity, no fixed points (no byte maps to itself), and no opposite fixed points (no byte maps to its bitwise complement).
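Both steps are short enough to compute rather than copy from a table. The sketch below finds inverses by brute force (fine for 256 values) and writes the affine step as XORs of bit rotations plus the constant 0x63. The helper names are made up for this example:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inverse(a):
    """Brute-force multiplicative inverse; 0x00 has none and maps to itself."""
    if a == 0:
        return 0
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def affine(b):
    """Affine transformation over GF(2), expressed with rotations."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [affine(gf_inverse(a)) for a in range(256)]

print(hex(SBOX[0x00]), hex(SBOX[0x53]))              # 0x63 0xed
print(all(SBOX[a] != a for a in range(256)))         # True: no fixed points
print(all(SBOX[a] != a ^ 0xFF for a in range(256)))  # True: no opposite fixed points
```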

The S-box Table

Here’s the complete AES S-box. To look up byte 0xXY, find row X and column Y:

     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0 [ 63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76 ]
1 [ CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0 ]
2 [ B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15 ]
3 [ 04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75 ]
4 [ 09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84 ]
5 [ 53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF ]
6 [ D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8 ]
7 [ 51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2 ]
8 [ CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73 ]
9 [ 60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB ]
A [ E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79 ]
B [ E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08 ]
C [ BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A ]
D [ 70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E ]
E [ E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF ]
F [ 8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16 ]

Example: SubBytes(0x53) = row 5, column 3 = 0xED

In code:

SBOX = [
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    # ... (256 entries total)
]

def sub_bytes(state):
    for i in range(4):
        for j in range(4):
            state[i][j] = SBOX[state[i][j]]
    return state

SubBytes provides confusion — it makes the relationship between the key and the ciphertext as complex and nonlinear as possible.

Operation 2: ShiftRows

ShiftRows is a simple byte-level transposition. Each row of the state matrix is cyclically shifted left by a different number of positions:

Before ShiftRows:
┌─────┬─────┬─────┬─────┐
│ s00 │ s01 │ s02 │ s03 │   Row 0: no shift
├─────┼─────┼─────┼─────┤
│ s10 │ s11 │ s12 │ s13 │   Row 1: shift left by 1
├─────┼─────┼─────┼─────┤
│ s20 │ s21 │ s22 │ s23 │   Row 2: shift left by 2
├─────┼─────┼─────┼─────┤
│ s30 │ s31 │ s32 │ s33 │   Row 3: shift left by 3
└─────┴─────┴─────┴─────┘

After ShiftRows:
┌─────┬─────┬─────┬─────┐
│ s00 │ s01 │ s02 │ s03 │   Row 0: unchanged
├─────┼─────┼─────┼─────┤
│ s11 │ s12 │ s13 │ s10 │   Row 1: shifted left 1
├─────┼─────┼─────┼─────┤
│ s22 │ s23 │ s20 │ s21 │   Row 2: shifted left 2
├─────┼─────┼─────┼─────┤
│ s33 │ s30 │ s31 │ s32 │   Row 3: shifted left 3
└─────┴─────┴─────┴─────┘

In code:

def shift_rows(state):
    # Row 0: no shift
    # Row 1: shift left by 1
    state[1] = state[1][1:] + state[1][:1]
    # Row 2: shift left by 2
    state[2] = state[2][2:] + state[2][:2]
    # Row 3: shift left by 3
    state[3] = state[3][3:] + state[3][:3]
    return state

Why does this matter? Without ShiftRows, each column would be processed independently throughout the entire cipher. ShiftRows moves bytes between columns, so that the MixColumns operation in the next round mixes bytes that originally came from different columns. After a few rounds, every byte of the output depends on every byte of the input.

ShiftRows provides diffusion in the horizontal direction.
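A quick way to see the effect is to label every cell with the column it starts in and look at the columns after the shift. This sketch uses a non-mutating variant of `shift_rows`, written only for this demonstration:

```python
def shift_rows(state):
    """Return a new state with row r rotated left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

# Label each cell with its starting column: c0, c1, c2, c3.
state = [[f"c{col}" for col in range(4)] for _ in range(4)]
shifted = shift_rows(state)

columns = [[shifted[row][col] for row in range(4)] for col in range(4)]
for column in columns:
    print(column)
# ['c0', 'c1', 'c2', 'c3']
# ['c1', 'c2', 'c3', 'c0']
# ['c2', 'c3', 'c0', 'c1']
# ['c3', 'c0', 'c1', 'c2']
```

Every column now holds exactly one byte from each original column, which is what lets the following MixColumns step blend all four.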

Operation 3: MixColumns

MixColumns is the most mathematically complex operation. It treats each column of the state as a 4-element vector and multiplies it by a fixed matrix — with all arithmetic in GF(2^8).

The Matrix Multiplication

Each column [s0, s1, s2, s3] is multiplied by:

┌             ┐   ┌    ┐   ┌     ┐
│ 02 03 01 01 │   │ s0 │   │ s0' │
│ 01 02 03 01 │ × │ s1 │ = │ s1' │
│ 01 01 02 03 │   │ s2 │   │ s2' │
│ 03 01 01 02 │   │ s3 │   │ s3' │
└             ┘   └    ┘   └     ┘

Remember — this is GF(2^8) arithmetic. Multiplication by 02 is xtime(), multiplication by 03 is xtime(a) XOR a, multiplication by 01 is identity, and addition is XOR.

Expanding the first output byte:

s0' = (02 × s0) XOR (03 × s1) XOR (01 × s2) XOR (01 × s3)

In code:

def mix_single_column(col):
    a = col[:]
    result = [0, 0, 0, 0]
    result[0] = xtime(a[0]) ^ xtime(a[1]) ^ a[1] ^ a[2] ^ a[3]
    result[1] = a[0] ^ xtime(a[1]) ^ xtime(a[2]) ^ a[2] ^ a[3]
    result[2] = a[0] ^ a[1] ^ xtime(a[2]) ^ xtime(a[3]) ^ a[3]
    result[3] = xtime(a[0]) ^ a[0] ^ a[1] ^ a[2] ^ xtime(a[3])
    return result

def mix_columns(state):
    for j in range(4):
        col = [state[i][j] for i in range(4)]
        mixed = mix_single_column(col)
        for i in range(4):
            state[i][j] = mixed[i]
    return state
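A hand-checkable example helps here. The sketch below repeats `xtime` and a compact `mix_single_column` so it runs on its own, then mixes the column db 13 53 45 (an arbitrary example column, worked out by hand for this snippet):

```python
def xtime(a):
    result = (a << 1) & 0xFF
    if a & 0x80:
        result ^= 0x1B
    return result

def mix_single_column(a):
    """Multiply one 4-byte column by the MixColumns matrix in GF(2^8)."""
    return [
        xtime(a[0]) ^ xtime(a[1]) ^ a[1] ^ a[2] ^ a[3],
        a[0] ^ xtime(a[1]) ^ xtime(a[2]) ^ a[2] ^ a[3],
        a[0] ^ a[1] ^ xtime(a[2]) ^ xtime(a[3]) ^ a[3],
        xtime(a[0]) ^ a[0] ^ a[1] ^ a[2] ^ xtime(a[3]),
    ]

mixed = mix_single_column([0xDB, 0x13, 0x53, 0x45])
print([f"{b:02x}" for b in mixed])            # ['8e', '4d', 'a1', 'bc']

# A column that differs in a single byte produces a difference in all 4 outputs.
print(mix_single_column([0x01, 0x00, 0x00, 0x00]))  # [2, 1, 1, 3]
```

For the first output byte: 02 × db = ad, 03 × 13 = 35, and ad XOR 35 XOR 53 XOR 45 = 8e.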

Why This Matrix?

The matrix {02, 03, 01, 01} (and its rotations) was chosen because:

It’s invertible in GF(2^8) — necessary for decryption. The inverse matrix is {0E, 0B, 0D, 09}.
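It has the maximum branch number (5): the number of nonzero bytes in a column's input difference plus the number in its output difference is always at least 5. Changing a single input byte therefore changes all 4 output bytes, which is optimal diffusion for a 4-byte column.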
It uses only small multipliers (01, 02, 03) — making it fast to compute without full GF(2^8) multiplication tables.

What MixColumns Achieves

While ShiftRows moves bytes between columns (horizontal diffusion), MixColumns mixes bytes within each column (vertical diffusion). Together, they ensure that after just a few rounds, changing a single input bit avalanches to affect every output bit.

After 2 rounds of ShiftRows + MixColumns, every output byte depends on every input byte. This is called full diffusion, and it’s a critical property for security.

MixColumns is not applied in the final round. This is a deliberate design choice — it makes encryption and decryption more symmetrical and simplifies hardware implementations. It doesn’t affect security because a MixColumns at the end would just be a linear transformation that could be absorbed into the key.

Operation 4: AddRoundKey

The simplest operation — XOR the entire state matrix with the round key.

state[i][j] = state[i][j] XOR round_key[i][j]

The round key is also a 4x4 matrix of bytes, derived from the cipher key through the key schedule (which we’ll cover next).

This is the only operation that introduces key material into the cipher. Without it, AES would be a fixed transformation — the same for everyone — which is obviously useless.

In code:

def add_round_key(state, round_key):
    for i in range(4):
        for j in range(4):
            state[i][j] ^= round_key[i][j]
    return state

The AES Key Schedule

The user provides a single key (128, 192, or 256 bits), but AES needs a separate 128-bit round key for each round, plus one for the initial AddRoundKey. For AES-128, that’s 11 round keys (176 bytes total) from a 16-byte key.

The key schedule generates these round keys through a process called key expansion.

AES-128 Key Expansion

The 16-byte key is viewed as four 32-bit words: W[0], W[1], W[2], W[3].

The expansion generates 40 more words (W[4] through W[43]), giving us 44 words total = 11 round keys.

For most words, the computation is simple:

W[i] = W[i-4] XOR W[i-1]

But every 4th word (W[4], W[8], W[12], …) gets special treatment:

W[i] = W[i-4] XOR SubWord(RotWord(W[i-1])) XOR Rcon[i/4]

Where:

RotWord — Rotates the 4 bytes left by one position: [a, b, c, d] -> [b, c, d, a]
SubWord — Applies the S-box to each byte
Rcon — Round constant, a fixed value that differs each round

Round Constants (Rcon)

The round constants prevent symmetry in the key schedule. They’re powers of 2 in GF(2^8):

Rcon[1]  = 01 00 00 00
Rcon[2]  = 02 00 00 00
Rcon[3]  = 04 00 00 00
Rcon[4]  = 08 00 00 00
Rcon[5]  = 10 00 00 00
Rcon[6]  = 20 00 00 00
Rcon[7]  = 40 00 00 00
Rcon[8]  = 80 00 00 00
Rcon[9]  = 1B 00 00 00   (0x80 × 2 mod 0x11B)
Rcon[10] = 36 00 00 00
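Since each constant is the previous one multiplied by 0x02, the table can be generated with xtime instead of being typed in. A small sketch (only the first byte of each Rcon word is stored, because the other three are always zero):

```python
def xtime(a):
    result = (a << 1) & 0xFF
    if a & 0x80:
        result ^= 0x1B
    return result

RCON = [0x00]        # index 0 is unused so that RCON[1] is the first constant
value = 0x01
for _ in range(10):
    RCON.append(value)
    value = xtime(value)

print([f"{r:02x}" for r in RCON[1:]])
# ['01', '02', '04', '08', '10', '20', '40', '80', '1b', '36']
```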

Key Expansion in Code

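# First byte of each round constant; index 0 is unused so RCON[i // 4] lines up with Rcon[1]..Rcon[10]
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def key_expansion(key):
    # key is 16 bytes -> 4 words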
    words = []
    for i in range(4):
        words.append(key[4*i : 4*i+4])

    for i in range(4, 44):
        temp = words[i-1][:]
        if i % 4 == 0:
            # RotWord
            temp = temp[1:] + temp[:1]
            # SubWord
            temp = [SBOX[b] for b in temp]
            # XOR with Rcon
            temp[0] ^= RCON[i // 4]
        words.append([words[i-4][j] ^ temp[j] for j in range(4)])

    # Convert 44 words into 11 round keys (each is a 4x4 matrix)
    round_keys = []
    for r in range(11):
        rk = [[0]*4 for _ in range(4)]
        for j in range(4):
            for i in range(4):
                rk[i][j] = words[r*4 + j][i]
        round_keys.append(rk)

    return round_keys

AES-192 and AES-256

The key expansion for larger keys follows the same principle but with more words:

Key Size	Key Words	Total Words	Round Keys
AES-128	4 (Nk=4)	44	11
AES-192	6 (Nk=6)	52	13
AES-256	8 (Nk=8)	60	15

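For every key size, the RotWord/SubWord/Rcon step happens when i % Nk == 0 and uses Rcon[i/Nk]. AES-256 has one additional step: when i % Nk == 4 (once per 8 words, halfway between the Rcon steps), SubWord is applied to W[i-1] before the XOR. This adds more nonlinearity to the key schedule.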
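A key expansion that handles all three key sizes might look like the sketch below. It computes the S-box instead of listing it so the block runs on its own, and the function name `expand_key` is an assumption for this example. The keys are the example keys from FIPS 197 Appendix A:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    sbox = []
    for a in range(256):
        inv = next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)
        s = inv ^ 0x63
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s)
    return sbox

SBOX = build_sbox()

def expand_key(key):
    """Expand a 16-, 24-, or 32-byte key into 4 * (Nr + 1) four-byte words."""
    nk = len(key) // 4          # 4, 6, or 8 key words
    nr = nk + 6                 # 10, 12, or 14 rounds
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]                 # extra SubWord (AES-256 only)
        words.append([x ^ y for x, y in zip(words[i - nk], temp)])
    return words

w128 = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
w192 = expand_key(bytes.fromhex("8e73b0f7da0e6452c810f32b809079e562f8ead2522c6b7b"))
w256 = expand_key(bytes.fromhex(
    "603deb1015ca71be2b73aef0857d77811f352c073b6108d72d9810a30914dff4"))

print(len(w128), len(w192), len(w256))                # 44 52 60
print(bytes(w128[4]).hex())                           # a0fafe17
print(bytes(w256[8]).hex(), bytes(w256[12]).hex())    # 9ba35411 a8b09c1a
```

W[12] of the AES-256 schedule is the first word produced by the extra SubWord step (12 % 8 == 4).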

Putting It All Together — AES Encryption

Now let’s assemble the complete algorithm.

Plaintext (128 bits)
      |
      v
[Arrange into 4x4 state matrix]
      |
      v
[AddRoundKey with Round Key 0]     ← Initial round (just key XOR)
      |
      v
┌──── Rounds 1 through 9 ─────┐    (for AES-128)
│  1. SubBytes                  │
│  2. ShiftRows                 │
│  3. MixColumns                │
│  4. AddRoundKey               │
└───────────────────────────────┘
      |
      v
┌──── Final Round (Round 10) ──┐
│  1. SubBytes                  │
│  2. ShiftRows                 │
│  3. AddRoundKey               │    ← NO MixColumns
└───────────────────────────────┘
      |
      v
[Read state matrix column-by-column]
      |
      v
Ciphertext (128 bits)

In code:

def aes_encrypt(plaintext, key):
    state = bytes_to_state(plaintext)
    round_keys = key_expansion(key)
    Nr = len(round_keys) - 1  # number of rounds (10 for the AES-128 key schedule above)

    # Initial round key addition
    state = add_round_key(state, round_keys[0])

    # Rounds 1 through Nr-1
    for r in range(1, Nr):
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[r])

    # Final round (no MixColumns)
    state = sub_bytes(state)
    state = shift_rows(state)
    state = add_round_key(state, round_keys[Nr])

    return state_to_bytes(state)

Where Nr is 10 for AES-128, 12 for AES-192, and 14 for AES-256.
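For a version that runs end to end, here is a compact, self-contained AES-128 sketch. It departs from the snippets above in two ways that are choices of this example, not of the standard: the state is a flat list of 16 bytes in column order (index 4*col + row) rather than a 4x4 matrix, and the S-box is computed rather than listed. It is meant for study, not for production use. The plaintext and key are the FIPS 197 Appendix B example:

```python
def xtime(a):
    """Multiply by 0x02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a, b):
    """General GF(2^8) multiplication (only used to build the S-box)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """Inverse in GF(2^8) followed by the affine transformation."""
    sbox = []
    for a in range(256):
        inv = next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)
        s = inv ^ 0x63
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s)
    return sbox

SBOX = build_sbox()

def expand_key_128(key):
    """Return 11 round keys, each a flat list of 16 bytes in column order."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([x ^ y for x, y in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]

def shift_rows(s):
    # Row r of the new state takes its byte from column (col + r) % 4.
    return [s[4 * ((col + row) % 4) + row] for col in range(4) for row in range(4)]

def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [xtime(a[i]) ^ xtime(a[(i + 1) % 4]) ^ a[(i + 1) % 4]
                ^ a[(i + 2) % 4] ^ a[(i + 3) % 4] for i in range(4)]
    return out

def aes128_encrypt_block(plaintext, key):
    round_keys = expand_key_128(key)
    s = [p ^ k for p, k in zip(plaintext, round_keys[0])]  # initial AddRoundKey
    for rnd in range(1, 11):
        s = [SBOX[b] for b in s]                           # SubBytes
        s = shift_rows(s)                                  # ShiftRows
        if rnd < 10:
            s = mix_columns(s)                             # skipped in the final round
        s = [b ^ k for b, k in zip(s, round_keys[rnd])]    # AddRoundKey
    return bytes(s)

pt = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
ct = aes128_encrypt_block(pt, key)
print(ct.hex())  # 3925841d02dc09fbdc118597196a0b32
```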

AES Decryption

Decryption reverses each operation:

Encryption	Decryption
SubBytes	InvSubBytes (uses the inverse S-box)
ShiftRows	InvShiftRows (shift right instead of left)
MixColumns	InvMixColumns (multiply by the inverse matrix: {0E, 0B, 0D, 09})
AddRoundKey	AddRoundKey (XOR is its own inverse)

The rounds are applied in reverse order, with the round keys used in reverse:

[AddRoundKey with Round Key Nr]
      |
      v
┌──── Rounds Nr-1 down to 1 ──┐
│  1. InvShiftRows              │
│  2. InvSubBytes               │
│  3. AddRoundKey               │
│  4. InvMixColumns             │
└───────────────────────────────┘
      |
      v
[InvShiftRows]
[InvSubBytes]
[AddRoundKey with Round Key 0]
      |
      v
Plaintext

Note that InvShiftRows comes before InvSubBytes, the reverse of the encryption order. The two could just as well be applied the other way around: SubBytes changes individual byte values without looking at their position, while ShiftRows only moves bytes without changing their values, so the two operations commute.

The Inverse MixColumns Matrix

The decryption MixColumns uses a different matrix:

┌             ┐
│ 0E 0B 0D 09 │
│ 09 0E 0B 0D │
│ 0D 09 0E 0B │
│ 0B 0D 09 0E │
└             ┘

These are larger constants than encryption (0E, 0B, 0D, 09 vs 02, 03, 01, 01), which makes decryption slightly slower than encryption. This is one reason why AES modes like CTR and GCM (which only use the encryption function, even for decryption) are preferred in practice.
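To check that this really is the inverse, one can multiply a column by the encryption matrix and then by the decryption matrix and confirm the original bytes come back. The sketch below uses a general `gf_mul` and a hypothetical `circulant_mul` helper that builds each matrix from its first row:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def circulant_mul(first_row, col):
    """Multiply a column by the 4x4 matrix whose row r is first_row rotated right by r."""
    out = []
    for r in range(4):
        row = first_row[-r:] + first_row[:-r] if r else first_row
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out

MIX = [0x02, 0x03, 0x01, 0x01]
INV_MIX = [0x0E, 0x0B, 0x0D, 0x09]

col = [0xDB, 0x13, 0x53, 0x45]
mixed = circulant_mul(MIX, col)
restored = circulant_mul(INV_MIX, mixed)

print([f"{b:02x}" for b in mixed])     # ['8e', '4d', 'a1', 'bc']
print([f"{b:02x}" for b in restored])  # ['db', '13', '53', '45']
```

The inverse direction needs general multiplications by 0x09 to 0x0E (three xtime calls each), compared with at most one xtime per term when encrypting.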

The Avalanche Effect — Watching Diffusion Work

Let’s trace how a single bit change propagates through AES.

Start with a plaintext where we flip just one bit:

Plaintext A: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Plaintext B: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01  (last bit flipped)

After each round, count how many bits differ between the two intermediate states (the exact counts depend on the key; these are typical values):

After Round	Bits Different (out of 128)
0 (AddRoundKey)	1
1	~22
2	~60
3	~64
4	~64
…	~64
10 (final)	~64

By round 2, roughly half the bits have changed — which is the ideal. By round 3-4, the diffusion is complete and stays around 64 bits (half of 128), which is the expected value for random outputs.

This is the avalanche effect in action, and it’s why AES needs 10 rounds — it takes about 2-3 rounds for full diffusion, and the remaining rounds provide security margin against cryptanalysis.

Why Is AES Secure?

AES has been the subject of intense cryptanalytic research for over 25 years. No practical attack has ever been found. Here’s why:

The S-box Provides Confusion

The S-box is the only nonlinear component, and it was designed to resist:

Linear cryptanalysis: the S-box has the highest nonlinearity known for an 8-bit bijective S-box
Differential cryptanalysis: the S-box has a differential uniformity of 4, the lowest known for an 8-bit bijective S-box
Algebraic attacks: while the S-box has a clean algebraic description (inversion in GF(2^8)), the combination with the linear layer makes algebraic attacks infeasible

ShiftRows + MixColumns Provide Diffusion

Together, these two operations ensure that after 2 rounds, every output byte depends on all 16 input bytes. The MixColumns matrix has the maximum branch number (5), which guarantees at least 5 active S-boxes across any two consecutive rounds and at least 25 across any four rounds.

The Key Schedule

The key schedule uses the S-box and round constants to generate round keys that are:

Different enough from each other (no obvious patterns)
Dependent on all key bits (changing one key bit changes all round keys)

The Security Margin

AES-128 has 10 rounds. The biclique attack (2011) covers all 10 rounds, but it only reduces the complexity from 2^128 to about 2^126.1. That is technically faster than brute force, but completely impractical. Setting aside such brute-force-like attacks, the best single-key attacks reach 7 rounds, which leaves a margin of about 3 rounds.

For comparison (single-key setting):

AES-128: 10 rounds, best attack breaks ~7 rounds
AES-192: 12 rounds, best attack breaks ~8 rounds
AES-256: 14 rounds, best attack breaks ~9 rounds

In the related-key setting, theoretical attacks exist against the full AES-192 and AES-256. These exploit weaknesses in the key schedule when the attacker can observe encryptions under keys with a known relationship. In the real world, this scenario almost never applies.

AES Modes of Operation

AES encrypts a single 128-bit block. But real data is almost always longer than 16 bytes. Modes of operation define how to apply a block cipher to multi-block messages.

ECB — Electronic Codebook (Don’t Use This)

Each block is encrypted independently with the same key.

C1 = E_K(P1)
C2 = E_K(P2)
C3 = E_K(P3)

The problem: identical plaintext blocks produce identical ciphertext blocks. If you encrypt an image in ECB mode, you can still see the image in the ciphertext — the patterns are preserved. This is the famous “ECB penguin” example.

Never use ECB for anything except single-block encryption.

CBC — Cipher Block Chaining

Each plaintext block is XORed with the previous ciphertext block before encryption:

C1 = E_K(P1 XOR IV)
C2 = E_K(P2 XOR C1)
C3 = E_K(P3 XOR C2)

The IV (Initialization Vector) is a random value used for the first block. This ensures that encrypting the same plaintext twice produces different ciphertext (as long as the IV is different).

CBC was the standard mode for decades and is still widely used. Its main drawbacks: encryption can’t be parallelized (each block depends on the previous one), and it requires padding.
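The difference between the ECB and CBC formulas is easy to see in code. To keep the sketch runnable with only the standard library, `toy_block_encrypt` is a stand-in for E_K built from SHA-256: it is NOT AES and it is not invertible, so this shows the encryption-side chaining only. The fixed IV is for reproducibility; a real IV must be random.

```python
import hashlib

BLOCK = 16

def toy_block_encrypt(key, block):
    """Stand-in for E_K (NOT AES, not invertible): deterministic 16-byte output."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def ecb_encrypt(key, blocks):
    """C_i = E_K(P_i): every block is handled independently."""
    return [toy_block_encrypt(key, p) for p in blocks]

def cbc_encrypt(key, iv, blocks):
    """C_i = E_K(P_i XOR C_{i-1}), with C_0 = IV."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_block_encrypt(key, xor_bytes(p, prev))
        out.append(prev)
    return out

key = b"k" * 16
iv = bytes(range(16))                         # fixed only for this demo
blocks = [b"A" * 16, b"A" * 16, b"B" * 16]    # two identical plaintext blocks

ecb = ecb_encrypt(key, blocks)
cbc = cbc_encrypt(key, iv, blocks)
print(ecb[0] == ecb[1])  # True: ECB leaks the repetition
print(cbc[0] == cbc[1])  # False: chaining hides it
```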

CTR — Counter Mode

Turn a block cipher into a stream cipher. Encrypt a counter value and XOR the result with the plaintext:

C1 = P1 XOR E_K(Nonce || Counter_1)
C2 = P2 XOR E_K(Nonce || Counter_2)
C3 = P3 XOR E_K(Nonce || Counter_3)

Advantages:

Parallelizable — each block is independent
Random access — you can decrypt block N without decrypting blocks 1 through N-1
Only uses the encryption function — even for decryption (no need for InvMixColumns)
No padding needed — the last block can be truncated to match the plaintext length
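The points in this list show up directly in a sketch of the mode. As an assumption to keep the example free of third-party packages, `toy_block_encrypt` stands in for E_K (it is built from SHA-256 and is NOT AES), and the 12-byte nonce plus 4-byte counter layout is one common convention:

```python
import hashlib

def toy_block_encrypt(key, block):
    """Stand-in for E_K (NOT AES): deterministic 16-byte output."""
    return hashlib.sha256(key + block).digest()[:16]

def ctr_xor(key, nonce, data):
    """CTR mode: the same function both encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 16):
        counter = (offset // 16 + 1).to_bytes(4, "big")
        keystream = toy_block_encrypt(key, nonce + counter)
        chunk = data[offset:offset + 16]
        # zip stops at the shorter input, so a short last block needs no padding
        out += bytes(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)

key = b"k" * 16
nonce = b"n" * 12
msg = b"CTR needs no padding at all!"   # length is not a multiple of 16

ct = ctr_xor(key, nonce, msg)
print(len(ct) == len(msg))             # True
print(ctr_xor(key, nonce, ct) == msg)  # True
```

Only the forward direction of the block function is ever called, which is why a non-invertible stand-in is enough to demonstrate the mode.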

GCM — Galois/Counter Mode

The current gold standard. GCM combines CTR mode encryption with a GHASH authentication tag, providing both confidentiality and integrity in a single operation (authenticated encryption).

AES-256-GCM is what you should use by default in 2024.

GCM is used in TLS 1.3, SSH, IPsec, and most modern protocols.

AES in Hardware — AES-NI

In 2010, Intel introduced AES-NI, a set of dedicated CPU instructions for AES. AESENC and AESDEC each perform a full AES round in a single instruction, and because the hardware pipelines them, a core can have several rounds in flight at once. AESKEYGENASSIST helps with key expansion.

The impact was dramatic:

Implementation	Speed (cycles/byte)
Software (lookup tables)	~15-20 cycles/byte
Software (bitsliced)	~7-10 cycles/byte
AES-NI	~1-3 cycles/byte

AES-NI makes encryption essentially free — it’s faster than memory access. This removed the last argument against using encryption everywhere.

You can check if your CPU supports AES-NI:

# Linux
$ grep -o aes /proc/cpuinfo | head -1
aes

# macOS
$ sysctl -a | grep -i aes
hw.optional.aes: 1

Nearly every CPU made after 2012 (Intel, AMD, ARM) has hardware AES support.

AES in Practice — OpenSSL Examples

Encrypting and decrypting files:

# Encrypt with AES-256-CBC
$ openssl enc -aes-256-cbc -salt -in secret.txt -out secret.enc -pbkdf2

# Decrypt
$ openssl enc -aes-256-cbc -d -in secret.enc -out recovered.txt -pbkdf2

# Note: `openssl enc` does not support AEAD modes such as GCM and rejects -aes-256-gcm.
# For AES-256-GCM (recommended), use a library instead, as in the Python example.

In Python:

from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

# AES-256-GCM encryption
key = get_random_bytes(32)  # 256-bit key
nonce = get_random_bytes(12)

cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
ciphertext, tag = cipher.encrypt_and_digest(b"Hello, AES!")

print(f"Ciphertext: {ciphertext.hex()}")
print(f"Tag: {tag.hex()}")
print(f"Nonce: {nonce.hex()}")

# Decryption
decipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
plaintext = decipher.decrypt_and_verify(ciphertext, tag)
print(f"Decrypted: {plaintext.decode()}")

Always use authenticated encryption (GCM, CCM, or ChaCha20-Poly1305). Unauthenticated modes (ECB, CBC without HMAC) are vulnerable to padding oracle attacks and ciphertext manipulation.

AES-128 vs AES-256 — Which to Use?

Aspect	AES-128	AES-256
Key size	128 bits	256 bits
Rounds	10	14
Speed	~40% faster	Baseline
Security (brute force)	2^128	2^256
Best known attack	2^126.1 (biclique)	2^254.4 (biclique)
Quantum resistance	~2^64 (Grover’s)	~2^128 (Grover’s)

For most applications, AES-128 is sufficient. 2^128 operations is far beyond the reach of any technology we can foresee.

The main argument for AES-256 is post-quantum security. Grover’s algorithm (a quantum computing algorithm) can search a key space of size N in sqrt(N) operations. For AES-128, that reduces the effective security to 2^64 — still expensive, but potentially reachable by a large quantum computer. AES-256 under Grover’s gives 2^128 — which is considered safe even against quantum attacks.

If you’re deciding today: AES-256-GCM is the safe default. The performance difference is negligible on modern hardware with AES-NI.

Common Mistakes

1. Using ECB mode. Never use ECB for more than one block. Identical plaintexts produce identical ciphertexts.

2. Reusing IVs/nonces. In CBC, reusing an IV leaks information. In GCM, reusing a nonce with the same key is catastrophic: it exposes the XOR of the plaintexts and lets an attacker recover the authentication key and forge messages.

3. Not authenticating ciphertext. Using AES-CBC without HMAC allows an attacker to manipulate the ciphertext without detection (padding oracle attacks, bit-flipping). Always use authenticated encryption (GCM) or encrypt-then-MAC.

4. Using a password directly as a key. A password is not a key. Use a key derivation function (PBKDF2, scrypt, Argon2) to derive a proper key from a password.

5. Hardcoding keys in source code. Keys belong in secure storage (environment variables, key management services, HSMs), not in your codebase.
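For mistake 4, here is a minimal sketch of deriving an AES-256 key from a password with PBKDF2 from Python's standard library. The function name `derive_key`, the 16-byte salt, and the iteration count are assumptions for illustration; pick parameters according to current guidance for your platform.

```python
import hashlib
import os

def derive_key(password, salt, iterations=600_000):
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = os.urandom(16)   # store the salt next to the ciphertext; it is not secret
key = derive_key("correct horse battery staple", salt)
print(len(key))         # 32
```

The same password and salt always give the same key, so decryption only needs the password plus the stored salt.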

Final Thoughts

AES is a masterpiece of cryptographic engineering. It takes four simple operations — a byte substitution, a row shift, a column mix, and a key XOR — and combines them in a way that resists every known attack after 25 years of analysis by the world’s best cryptographers.

What makes AES special isn’t just its security. It’s the combination of security, performance, simplicity, and versatility. It runs on everything from 8-bit smartcards to 64-core servers. It has dedicated CPU instructions. It’s the default choice for TLS, disk encryption, file encryption, database encryption, and virtually every other encryption use case.

If you’ve followed the entire cryptography series — from Feistel ciphers and DES through 3DES to AES — you now understand the evolution of symmetric encryption over 50 years. From IBM’s Lucifer in the 1970s to Daemen and Rijmen’s Rijndael in 2000, each algorithm built on the lessons of its predecessor.

DES taught us about S-boxes, key schedules, and the Feistel structure. 3DES taught us about backward compatibility and the limits of patching old designs. AES taught us about mathematical elegance, GF(2^8) arithmetic, and how to build a cipher that stands the test of time.

Thanks for reading!
