Salsa20
Salsa20 and the closely related ChaCha are stream ciphers developed by Daniel J. Bernstein. Salsa20, the original cipher, was designed in 2005, then later submitted to the eSTREAM European Union cryptographic validation process by Bernstein. ChaCha is a modification of Salsa20 published in 2008. It uses a new round function that increases diffusion and increases performance on some architectures.
Both ciphers are built on a pseudorandom function based on add–rotate–XOR (ARX) operations: 32-bit addition, bitwise addition (XOR) and rotation. The core function maps a 256-bit key, a 64-bit nonce, and a 64-bit counter to a 512-bit block of the key stream (a variant with a 128-bit key also exists). Because each block is computed independently from its counter value, Salsa20 and ChaCha have the unusual advantage that the user can efficiently seek to any position in the key stream in constant time. Salsa20 offers speeds of around 4–14 cycles per byte in software on modern x86 processors, and reasonable hardware performance. It is not patented, and Bernstein has written several public domain implementations optimized for common architectures.
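Because every 64-byte block depends only on the key, the nonce and its own counter value, seeking amounts to an integer division. The following C sketch illustrates this. The helper name `keystream_locate` is hypothetical, and the function only locates the block. It does not compute it.

```c
#include <stdint.h>

/* Each call of the core function yields one 512-bit (64-byte) block. */
#define BLOCK_BYTES 64u

/* Hypothetical helper: for byte `pos` of the key stream, report which
 * block counter value to feed to the core function and which byte of
 * that 64-byte block to use. The cost is the same for every `pos`. */
static void keystream_locate(uint64_t pos, uint64_t *counter, unsigned *offset)
{
    *counter = pos / BLOCK_BYTES;
    *offset = (unsigned)(pos % BLOCK_BYTES);
}
```

For example, byte 130 of the key stream is byte 2 of the block with counter 2. Only that single block has to be computed.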
Structure
Internally, the cipher uses bitwise addition ⊕ (exclusive OR), 32-bit addition mod 2^32 ⊞, and constant-distance rotation operations <<<. Because these operations take the same time regardless of their inputs on typical processors, using only add–rotate–XOR operations avoids the possibility of timing attacks in software implementations. The internal state is made of sixteen 32-bit words arranged as a 4×4 matrix:
| 0 | 1 | 2 | 3 |
| 4 | 5 | 6 | 7 |
| 8 | 9 | 10 | 11 |
| 12 | 13 | 14 | 15 |
The initial state is made up of eight words of key, two words of stream position, two words of nonce, and four fixed words (constants):
| "expa" | Key | Key | Key |
| Key | "nd 3" | Nonce | Nonce |
| Pos. | Pos. | "2-by" | Key |
| Key | Key | Key | "te k" |
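A minimal C sketch of filling this initial state follows. It assumes that the key, nonce and counter are read as little-endian 32-bit words, as in Bernstein's specification. The function names `load32_le` and `salsa20_init_state` are illustrative and not part of any library.

```c
#include <stdint.h>

/* Read four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill the 16-word Salsa20 state: constants on the diagonal (words 0, 5,
 * 10, 15), key in words 1-4 and 11-14, nonce in 6-7, position in 8-9. */
static void salsa20_init_state(uint32_t s[16], const uint8_t key[32],
                               const uint8_t nonce[8], uint64_t counter)
{
    const uint8_t *sigma = (const uint8_t *)"expand 32-byte k";
    s[0] = load32_le(sigma);
    s[5] = load32_le(sigma + 4);
    s[10] = load32_le(sigma + 8);
    s[15] = load32_le(sigma + 12);
    for (int i = 0; i < 4; ++i) {
        s[1 + i] = load32_le(key + 4 * i);       /* first half of key */
        s[11 + i] = load32_le(key + 16 + 4 * i); /* second half of key */
    }
    s[6] = load32_le(nonce);
    s[7] = load32_le(nonce + 4);
    s[8] = (uint32_t)counter;         /* low word of stream position */
    s[9] = (uint32_t)(counter >> 32); /* high word of stream position */
}
```

Read this way, the four constant words are 0x61707865 ("expa"), 0x3320646e ("nd 3"), 0x79622d32 ("2-by") and 0x6b206574 ("te k").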
The constant words spell "expand 32-byte k" in ASCII. This is an example of a nothing-up-my-sleeve number. The core operation in Salsa20 is the quarter-round QR(a, b, c, d), which takes a four-word input and produces a four-word output:
b ^= (a + d) <<< 7;
c ^= (b + a) <<< 9;
d ^= (c + b) <<< 13;
a ^= (d + c) <<< 18;
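The quarter-round translates directly into C. This sketch uses hypothetical function names and updates the four words in place. The rotation helper assumes a shift count strictly between 0 and 32, which holds for every constant used here.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Salsa20 quarter-round: each word is updated once, in place,
 * using 32-bit wrapping addition, XOR and rotation. */
static void salsa20_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *b ^= rotl32(*a + *d, 7);
    *c ^= rotl32(*b + *a, 9);
    *d ^= rotl32(*c + *b, 13);
    *a ^= rotl32(*d + *c, 18);
}
```

For example, the input (0x00000001, 0, 0, 0) becomes (0x08008145, 0x00000080, 0x00010200, 0x20500000). This matches the quarter-round example in Bernstein's Salsa20 specification.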
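Odd-numbered rounds apply QR(a, b, c, d) to each of the four columns in the 4×4 matrix, and even-numbered rounds apply it to each of the four rows. Two consecutive rounds together are called a double-round (the arguments are state word indices):
// Odd round
QR( 0,  4,  8, 12) // column 1
QR( 5,  9, 13,  1) // column 2
QR(10, 14,  2,  6) // column 3
QR(15,  3,  7, 11) // column 4
// Even round
QR( 0,  1,  2,  3) // row 1
QR( 5,  6,  7,  4) // row 2
QR(10, 11,  8,  9) // row 3
QR(15, 12, 13, 14) // row 4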
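The same double-round can be written with explicit state indices. In this sketch, `qr` and `salsa20_doubleround` are hypothetical names. The column round runs first and is followed by the row round, in the order listed for the double-round.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Salsa20 quarter-round applied to state words at indices a, b, c, d. */
static void qr(uint32_t x[16], int a, int b, int c, int d)
{
    x[b] ^= rotl32(x[a] + x[d], 7);
    x[c] ^= rotl32(x[b] + x[a], 9);
    x[d] ^= rotl32(x[c] + x[b], 13);
    x[a] ^= rotl32(x[d] + x[c], 18);
}

/* One Salsa20 double-round: a column round followed by a row round. */
static void salsa20_doubleround(uint32_t x[16])
{
    qr(x, 0, 4, 8, 12);    /* column 1 */
    qr(x, 5, 9, 13, 1);    /* column 2 */
    qr(x, 10, 14, 2, 6);   /* column 3 */
    qr(x, 15, 3, 7, 11);   /* column 4 */
    qr(x, 0, 1, 2, 3);     /* row 1 */
    qr(x, 5, 6, 7, 4);     /* row 2 */
    qr(x, 10, 11, 8, 9);   /* row 3 */
    qr(x, 15, 12, 13, 14); /* row 4 */
}
```

Applied to a state whose only nonzero word is x[0] = 1, one double-round yields x[0] = 0x8186a22d and x[15] = 0x612a8020.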
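In C/C++, the rotation and the quarter-round can be written as macros:
#include <stdint.h>
/* Rotate a 32-bit word left by b bits (0 < b < 32). */
#define ROTL(a, b) (((a) << (b)) | ((a) >> (32 - (b))))
/* Salsa20 quarter-round on four uint32_t lvalues. */
#define QR(a, b, c, d) ( \
    b ^= ROTL(a + d, 7), \
    c ^= ROTL(b + a, 9), \
    d ^= ROTL(c + b, 13), \
    a ^= ROTL(d + c, 18))
#define ROUNDS 20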
After the rounds, the mixed array is added, word by word, to the original array to obtain the 64-byte key stream block. This step is important because the mixing rounds on their own are invertible: applying the reverse operations would recover the original 4×4 matrix, including the key. Adding the original array to the mixed one prevents the input from being recovered this way.
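The following sketch makes the invertibility argument concrete. It assumes that the 16-word state has already been filled. `salsa20_rounds_inv` is a hypothetical helper that undoes the 20 mixing rounds by running each quarter-round's four steps in reverse order, because XORing with the same value a second time cancels it. `salsa20_block` adds the input after mixing, as the key stream computation does.

```c
#include <stdint.h>
#include <string.h>

#define ROUNDS 20

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Quarter-round on state words at indices a, b, c, d. */
static void qr(uint32_t x[16], int a, int b, int c, int d)
{
    x[b] ^= rotl32(x[a] + x[d], 7);
    x[c] ^= rotl32(x[b] + x[a], 9);
    x[d] ^= rotl32(x[c] + x[b], 13);
    x[a] ^= rotl32(x[d] + x[c], 18);
}

/* Inverse quarter-round: the same four steps in reverse order. Each step
 * XORs one word with a value computed from two words it leaves untouched,
 * so repeating the XOR restores the earlier value. */
static void qr_inv(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] ^= rotl32(x[d] + x[c], 18);
    x[d] ^= rotl32(x[c] + x[b], 13);
    x[c] ^= rotl32(x[b] + x[a], 9);
    x[b] ^= rotl32(x[a] + x[d], 7);
}

/* Index quadruples of one double-round: four columns, then four rows. */
static const int IDX[8][4] = {
    {0, 4, 8, 12}, {5, 9, 13, 1}, {10, 14, 2, 6}, {15, 3, 7, 11},
    {0, 1, 2, 3},  {5, 6, 7, 4},  {10, 11, 8, 9}, {15, 12, 13, 14}};

/* The mixing rounds alone: ROUNDS / 2 double-rounds, in place. */
static void salsa20_rounds(uint32_t x[16])
{
    for (int r = 0; r < ROUNDS; r += 2)
        for (int q = 0; q < 8; ++q)
            qr(x, IDX[q][0], IDX[q][1], IDX[q][2], IDX[q][3]);
}

/* Hypothetical helper: undo salsa20_rounds by inverting each
 * quarter-round in the opposite order. */
static void salsa20_rounds_inv(uint32_t x[16])
{
    for (int r = 0; r < ROUNDS; r += 2)
        for (int q = 7; q >= 0; --q)
            qr_inv(x, IDX[q][0], IDX[q][1], IDX[q][2], IDX[q][3]);
}

/* Key stream block: mix a copy of the input, then add the input back. */
static void salsa20_block(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    salsa20_rounds(x);
    for (int i = 0; i < 16; ++i)
        out[i] = x[i] + in[i];
}
```

Passing any state through `salsa20_rounds` and then `salsa20_rounds_inv` returns the original words, key included. The addition in `salsa20_block` cannot be undone in the same way without already knowing the input.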
Salsa20 performs 20 rounds of mixing on its input. However, reduced-round variants Salsa20/8 and Salsa20/12 using 8 and 12 rounds respectively have also been introduced. These variants were introduced to complement the original Salsa20, not to replace it, and perform better in the eSTREAM benchmarks than Salsa20, though with a correspondingly lower security margin.
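The reduced-round variants differ only in how many rounds are applied before the final addition. This sketch parameterizes the round count. The name `salsa20r_block` is hypothetical, and `r` is assumed to be even because the loop performs whole double-rounds.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Salsa20 quarter-round on state words at indices a, b, c, d. */
static void qr(uint32_t x[16], int a, int b, int c, int d)
{
    x[b] ^= rotl32(x[a] + x[d], 7);
    x[c] ^= rotl32(x[b] + x[a], 9);
    x[d] ^= rotl32(x[c] + x[b], 13);
    x[a] ^= rotl32(x[d] + x[c], 18);
}

/* Salsa20/r: r rounds of mixing (r assumed even, i.e. r/2 double-rounds),
 * then word-by-word addition of the input. r = 20 gives Salsa20;
 * r = 12 and r = 8 give the reduced-round variants. */
static void salsa20r_block(uint32_t out[16], const uint32_t in[16], int r)
{
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    for (int i = 0; i < r; i += 2) {
        qr(x, 0, 4, 8, 12);  qr(x, 5, 9, 13, 1);  /* column round */
        qr(x, 10, 14, 2, 6); qr(x, 15, 3, 7, 11);
        qr(x, 0, 1, 2, 3);   qr(x, 5, 6, 7, 4);   /* row round */
        qr(x, 10, 11, 8, 9); qr(x, 15, 12, 13, 14);
    }
    for (int i = 0; i < 16; ++i)
        out[i] = x[i] + in[i];
}
```

Calling `salsa20r_block(out, in, 8)`, `(…, 12)` or `(…, 20)` selects Salsa20/8, Salsa20/12 or Salsa20/20. Fewer rounds mean less work per block and a smaller security margin.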
XSalsa20 with 192-bit nonce
In 2008, Bernstein proposed a variant of Salsa20 with 192-bit nonces called XSalsa20. XSalsa20 is provably secure if Salsa20 is secure, and it is better suited to applications where longer nonces are desired. XSalsa20 feeds the key and the first 128 bits of the nonce into one block of Salsa20 and uses 256 bits of the output as the key for standard Salsa20, together with the last 64 bits of the nonce and the stream position. Specifically, the 256 bits of output used are those corresponding to the non-secret portions of the input: indexes 0, 5, 10, 15, 6, 7, 8 and 9.
eSTREAM selection of Salsa20
Salsa20/12 has been selected as a Phase 3 design for Profile 1 by the eSTREAM project, receiving the highest weighted voting score of any Profile 1 algorithm at the end of Phase 2. Salsa20 had previously been selected as a Phase 2 Focus design for Profile 1 and as a Phase 2 design for Profile 2 by the eSTREAM project. It was not advanced to Phase 3 for Profile 2 because eSTREAM felt that it was probably not a good candidate for extremely resource-constrained hardware environments.
The eSTREAM committee recommends the use of Salsa20/12, the 12-round variant, for "combining very good performance with a comfortable margin of security."
Cryptanalysis of Salsa20
At the time of writing, there are no published attacks on Salsa20/12 or the full Salsa20/20. The best known attack breaks 8 of the 12 or 20 rounds.
In 2005, Paul Crowley reported an attack on Salsa20/5 with an estimated time complexity of 2^165 and won Bernstein's US$1000 prize for "most interesting Salsa20 cryptanalysis". This attack and all subsequent attacks are based on truncated differential cryptanalysis. In 2006, Fischer, Meier, Berbain, Biasse, and Robshaw reported an attack on Salsa20/6 with an estimated time complexity of 2^177, and a related-key attack on Salsa20/7 with an estimated time complexity of 2^217.
In 2007, Tsunoo et al. announced a cryptanalysis of Salsa20 that breaks 8 out of 20 rounds to recover the 256-bit secret key in 2^255 operations, using 2^11.37 keystream pairs. However, this attack does not seem to be competitive with a brute-force attack.
In 2008, Aumasson, Fischer, Khazaei, Meier, and Rechberger reported a cryptanalytic attack against Salsa20/7 with a time complexity of 2^151, and they reported an attack against Salsa20/8 with an estimated time complexity of 2^251. This attack makes use of the new concept of probabilistic neutral key bits for probabilistic detection of a truncated differential. The attack can be adapted to break Salsa20/7 with a 128-bit key.
In 2012, Shi et al. improved the attack by Aumasson et al., reducing its time complexity to 2^109 against Salsa20/7 and to 2^250 against Salsa20/8.
In 2013, Mouha and Preneel published a proof that 15 rounds of Salsa20 are 128-bit secure against differential cryptanalysis.
In 2025, Dey et al. reported a cryptanalytic attack against Salsa20/8 with a time complexity of 2^245.84 and data amounting to 2^99.47.
ChaCha variant
In 2008, Bernstein published the closely related ChaCha family of ciphers, which aim to increase the diffusion per round while achieving the same or slightly better performance. The Aumasson et al. paper also attacks ChaCha and breaks one round fewer than it does against Salsa20. However, the authors state that the attack fails to break 128-bit ChaCha7.
Like Salsa20, ChaCha's initial state includes a 128-bit constant, a 256-bit key, a 64-bit counter, and a 64-bit nonce, arranged as a 4×4 matrix of 32-bit words. However, ChaCha rearranges some of the words in the initial state:
| "expa" | "nd 3" | "2-by" | "te k" |
| Key | Key | Key | Key |
| Key | Key | Key | Key |
| Counter | Counter | Nonce | Nonce |
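Here is a sketch of building this state in C. It assumes little-endian word loading, as in Salsa20, and uses hypothetical function names. The IETF variant standardized in RFC 8439 fills the last row differently, with a 32-bit counter in word 12 and a 96-bit nonce in words 13 to 15.

```c
#include <stdint.h>

/* Read four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Original ChaCha layout: row 0 constants, rows 1-2 key, row 3 holds the
 * 64-bit block counter (words 12-13) and the 64-bit nonce (words 14-15). */
static void chacha_init_state(uint32_t s[16], const uint8_t key[32],
                              const uint8_t nonce[8], uint64_t counter)
{
    const uint8_t *sigma = (const uint8_t *)"expand 32-byte k";
    for (int i = 0; i < 4; ++i)
        s[i] = load32_le(sigma + 4 * i);
    for (int i = 0; i < 8; ++i)
        s[4 + i] = load32_le(key + 4 * i);
    s[12] = (uint32_t)counter;         /* low word of counter */
    s[13] = (uint32_t)(counter >> 32); /* high word of counter */
    s[14] = load32_le(nonce);
    s[15] = load32_le(nonce + 4);
}
```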
The constant is the same as in Salsa20. ChaCha replaces the Salsa20 quarter-round QR(a, b, c, d) with:
a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<= 8;
c += d; b ^= c; b <<<= 7;
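In C, the ChaCha quarter-round can be sketched as follows. The function name is hypothetical. The sketch was checked against the quarter-round example in RFC 8439, section 2.1.1.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* ChaCha quarter-round: add, XOR, rotate; a, b, c and d are each
 * updated twice, in place. */
static void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

Starting from a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567, it produces a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.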
Notice that this version updates each word twice, while Salsa20's quarter round updates each word only once. In addition, the ChaCha quarter-round diffuses changes more quickly. On average, after changing 1 input bit the Salsa20 quarter-round will change 8 output bits while ChaCha will change 12.5 output bits.
The ChaCha quarter round has the same number of adds, xors, and bit rotates as the Salsa20 quarter-round, but the fact that two of the rotates are multiples of 8 allows for a small optimization on some architectures including x86. Additionally, the input formatting has been rearranged to support an efficient SSE implementation optimization discovered for Salsa20. Rather than alternating rounds down columns and across rows, they are performed down columns and along diagonals. Like Salsa20, ChaCha arranges the sixteen 32-bit words in a 4×4 matrix. If we index the matrix elements from 0 to 15
| 0 | 1 | 2 | 3 |
| 4 | 5 | 6 | 7 |
| 8 | 9 | 10 | 11 |
| 12 | 13 | 14 | 15 |
then a double round in ChaCha is:
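// Odd round
QR(0, 4,  8, 12) // column 1
QR(1, 5,  9, 13) // column 2
QR(2, 6, 10, 14) // column 3
QR(3, 7, 11, 15) // column 4
// Even round
QR(0, 5, 10, 15) // diagonal 1 (main diagonal)
QR(1, 6, 11, 12) // diagonal 2
QR(2, 7,  8, 13) // diagonal 3
QR(3, 4,  9, 14) // diagonal 4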
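ChaCha20 uses 10 iterations of the double round. In C/C++, the rotation and the ChaCha quarter-round can be written as macros:
#include <stdint.h>
/* Rotate a 32-bit word left by b bits (0 < b < 32). */
#define ROTL(a, b) (((a) << (b)) | ((a) >> (32 - (b))))
/* ChaCha quarter-round on four uint32_t lvalues. */
#define QR(a, b, c, d) ( \
    a += b, d ^= a, d = ROTL(d, 16), \
    c += d, b ^= c, b = ROTL(b, 12), \
    a += b, d ^= a, d = ROTL(d, 8), \
    c += d, b ^= c, b = ROTL(b, 7))
/* 20 rounds = 10 double rounds */
#define ROUNDS 20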
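Here is a sketch of a complete ChaCha20 block function built from the column and diagonal pattern. The names `qr` and `chacha_block` are hypothetical. It was checked against the block test vector in RFC 8439, section 2.3.2. That vector uses the IETF layout with a 32-bit counter and a 96-bit nonce, which changes only how words 12 to 15 are filled and does not affect the block function.

```c
#include <stdint.h>
#include <string.h>

#define ROUNDS 20

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* ChaCha quarter-round on state words at indices a, b, c, d. */
static void qr(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 16);
    x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 12);
    x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 8);
    x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 7);
}

/* ChaCha block: ROUNDS / 2 double-rounds (columns, then diagonals) on a
 * copy of the input, followed by word-by-word addition of the input. */
static void chacha_block(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    memcpy(x, in, sizeof x);
    for (int i = 0; i < ROUNDS; i += 2) {
        qr(x, 0, 4, 8, 12);  /* column 1 */
        qr(x, 1, 5, 9, 13);  /* column 2 */
        qr(x, 2, 6, 10, 14); /* column 3 */
        qr(x, 3, 7, 11, 15); /* column 4 */
        qr(x, 0, 5, 10, 15); /* diagonal 1 */
        qr(x, 1, 6, 11, 12); /* diagonal 2 */
        qr(x, 2, 7, 8, 13);  /* diagonal 3 */
        qr(x, 3, 4, 9, 14);  /* diagonal 4 */
    }
    for (int i = 0; i < 16; ++i)
        out[i] = x[i] + in[i];
}
```

With the RFC 8439 example input (the constants, key bytes 0x00 to 0x1f, counter 1 and nonce 00:00:00:09:00:00:00:4a:00:00:00:00), the first output word is 0xe4e7f110.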
ChaCha is the basis of the BLAKE hash function, a finalist in the NIST hash function competition, and of its faster successors BLAKE2 and BLAKE3. BLAKE also defines a variant using sixteen 64-bit words, with correspondingly adjusted rotation constants.