The goal of smcryptoR is to make China’s national standard cryptographic algorithms (the SM family) available in R. smcryptoR uses Rust FFI (Foreign Function Interface) bindings to a Rust crate.
SM3: message digest
SM2: encrypt/decrypt, sign/verify, key exchange
SM4: encrypt/decrypt
SM3 is similar to well-known hash functions such as SHA-256 in its security properties and structure, and it produces a fixed-size output of 256 bits.
The sm3_hash() function accepts a raw vector, which is R's equivalent of a byte array and is printed in hexadecimal. In R, the charToRaw() or serialize() functions can be used to convert strings or objects into the raw vector type.
msg <- charToRaw('abc')
sm3_hash(msg)
#> [1] "66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0"
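As a small illustration of the conversion step described above, the following base R sketch shows what charToRaw() and serialize() produce before anything is hashed. It uses only base R; the object `obj` is a made-up example. Note that serialize() writes a header that includes R version information, so a hash of serialized objects may not be stable across R versions.

```r
# Text: charToRaw() yields one raw element per byte of the string.
msg <- charToRaw("abc")
print(msg)                      # [1] 61 62 63
stopifnot(is.raw(msg), length(msg) == 3L)

# Arbitrary R objects: serialize() with connection = NULL returns a raw vector.
# `obj` is a hypothetical example object.
obj <- list(id = 1L, tags = c("a", "b"))
obj_raw <- serialize(obj, connection = NULL)
stopifnot(is.raw(obj_raw))

# Both conversions are reversible.
stopifnot(identical(rawToChar(msg), "abc"))
stopifnot(identical(unserialize(obj_raw), obj))
```

Either `msg` or `obj_raw` could then be passed to sm3_hash().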
You can also use sm3_hash_string() to hash a character string directly.
sm3_hash_string('abc')
#> [1] "66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0"
sm3_hash_file() is provided to hash a local file on your machine. For example, use sm3_hash_file('/etc/hosts').
SM2 is based on elliptic curve cryptography (ECC), which provides comparable or stronger security with shorter keys than traditional public-key algorithms such as RSA.
In asymmetric encryption, public keys and private keys appear in pairs. The public key is used for encryption and verification, while the private key is used for decryption and signing. The public key can be derived from the private key, but not the other way around.
## generate a keypair
keypair <- sm2_gen_keypair()
sk <- keypair$private_key
pk <- keypair$public_key
sk
#> [1] "0dbf3ea63efd867a41822a1cd2ee485ebe3993432fbcf2e96bfbadd3be2b6ac3"
pk
#> [1] "eab671dfeed05b8c4c89f7d5c4872f3985501dd9c7764063c13303d97ef899d611387457af41cb9ada26bb99559452fe19a88d74e16107a600f76e4b10d087c5"
You can also export the public key from a private key.
pk <- sm2_pk_from_sk(sk)
pk
#> [1] "eab671dfeed05b8c4c89f7d5c4872f3985501dd9c7764063c13303d97ef899d611387457af41cb9ada26bb99559452fe19a88d74e16107a600f76e4b10d087c5"
Signing and verification ensure the integrity of the data and guarantee its authenticity. Typically, the data owner uses the SM3 message digest algorithm to calculate the hash value and signs it with the private key, generating signed data. The owner then distributes the original data together with its signature to the receiver. The receiver uses the public key and the received signature to perform the verification operation. If verification succeeds, the receiver can conclude that the original data has not been tampered with.
id <- 'someone@company.com' |> charToRaw()
data <- 'abc' |> charToRaw()
sign <- sm2_sign(id, data, sk)
## return 1 or 0
sm2_verify(id, data, sign, pk)
#> [1] 1
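To make the tamper-detection claim concrete, here is a minimal sketch that verifies the same signature against modified data and against an unrelated public key. It uses only the functions shown above. The variable names `tampered` and `other_pk` are illustrative, and the failing checks are assumed to return 0, following the "1 or 0" convention noted in the example.

```r
library(smcryptoR)

# Fresh key pair for the signer.
keypair <- sm2_gen_keypair()
sk <- keypair$private_key
pk <- keypair$public_key

id   <- charToRaw("someone@company.com")
data <- charToRaw("abc")
sign <- sm2_sign(id, data, sk)

# Untouched data with the matching public key: verification should succeed.
ok <- sm2_verify(id, data, sign, pk)

# One changed byte in the data: verification should fail.
tampered <- charToRaw("abd")
bad_data <- sm2_verify(id, tampered, sign, pk)

# A public key from a different key pair: verification should also fail.
other_pk <- sm2_gen_keypair()$public_key
bad_key <- sm2_verify(id, data, sign, other_pk)

c(ok = ok, bad_data = bad_data, bad_key = bad_key)
```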
SM2 is an asymmetric encryption algorithm that can also be used to encrypt data directly. Typically, A encrypts a file or data using B's public key and passes the ciphertext to B, who decrypts it using the corresponding private key. SM2 encryption and decryption are suitable only for short inputs; for larger files the process can be very slow.
## encrypt using public key
enc <- sm2_encrypt(data, pk)
## cipher text
enc
#> [1] b2 dd a3 01 79 64 de 69 5c a2 ea 7e 61 61 5f 2c fe dc 4d 1c a6 af ec 40 51
#> [26] e3 84 34 83 3a 94 7c d4 e2 bc e2 ac 57 90 a0 8a a9 95 c7 d3 d2 23 7f a0 b1
#> [51] 72 f5 dd 02 e8 70 3e 81 89 64 a2 b4 bf ac 0c 5e c6 7f 99 e9 13 67 af 1c ea
#> [76] 37 35 5b a9 8d bc 01 9b f9 77 07 ec 51 0e 73 de 3b 77 1f c4 0f 3f 43 ca
## decrypt using private key
dec <- sm2_decrypt(enc, sk)
## plain text
dec
#> [1] 61 62 63
## convert to character string
rawToChar(dec)
#> [1] "abc"
For ease of use, we have provided functions to encrypt data into hex or base64 format and decrypt them from these formats.
enc <- sm2_encrypt_base64(data, pk)
## cipher text as base64
enc
#> [1] "IKKpuCTG0TgI0OwLek/nY/i7/iy9737Xe57GbmiTOxyBB4Ua+N/cZ5oVLrHknHM1EXL488JUiaDmU2d6rYu6lEGWvpTD+qyNS5t3a98u2VI8n+ZjoUx33PXVM2W6Vm7Lzmf2"
sm2_decrypt_base64(enc, sk) |> rawToChar()
#> [1] "abc"
Or you can use hex as output instead.
enc <- sm2_encrypt_hex(data, pk)
## cipher text as hex
enc
#> [1] "7dad0f006314f93d1e30126d1e436b5a104f1ffd9555cfa03e245b399f8933df8238109021ffc3c75df633c3d8be2efd605f39d9163823ff788b5dbf2402f386ffc486cb32aedb05bf72e679d76d2b2f50952e5bd2b6caf79f946516aabe2dc45bdcc1"
sm2_decrypt_hex(enc, sk) |> rawToChar()
#> [1] "abc"
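Because SM2 is best kept to short inputs, a common workaround for larger payloads is hybrid encryption: encrypt the payload with a one-off SM4 key, and use SM2 only to protect that 16-byte key. The sketch below is one possible arrangement, not something the package prescribes. It assumes the SM2 and SM4 functions behave as in the examples in this document. The names `session_key` and `payload` are illustrative, and sample() is used only for brevity, since it is not a cryptographically secure random source.

```r
library(smcryptoR)

# Recipient's SM2 key pair.
keypair <- sm2_gen_keypair()
pk <- keypair$public_key
sk <- keypair$private_key

# A payload that is larger than SM2 handles comfortably.
payload <- charToRaw(strrep("some longer message ", 1000))

# One-off 16-byte SM4 key and IV.
# NOTE: sample() is NOT a cryptographically secure generator; illustration only.
session_key <- as.raw(sample(0:255, 16, replace = TRUE))
iv          <- as.raw(sample(0:255, 16, replace = TRUE))

# Sender: SM4 for the bulk data, SM2 only for the short session key.
enc_payload <- sm4_encrypt_cbc(payload, session_key, iv)
enc_key     <- sm2_encrypt(session_key, pk)

# Recipient: recover the session key with SM2, then the payload with SM4.
dec_key     <- sm2_decrypt(enc_key, sk)
dec_payload <- sm4_decrypt_cbc(enc_payload, dec_key, iv)

identical(dec_payload, payload)
```

The sender would transmit `enc_key`, `iv`, and `enc_payload`; only the holder of `sk` can recover the session key.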
If A and B want to agree on a shared key for encryption or authentication, the SM2 key exchange algorithm ensures that the key itself is never transmitted over untrusted channels and that the private keys of A and B are not disclosed. Even if an attacker intercepts the data exchanged by A and B, they cannot calculate the key that A and B agree upon.
## Step 1
klen <- 16
id_a <- "a@company.com" |> charToRaw()
id_b <- "b@company.com" |> charToRaw()
private_key_a <- sm2_gen_keypair()$private_key
private_key_b <- sm2_gen_keypair()$private_key
step_1_a <- sm2_keyexchange_1ab(klen, id_a, private_key_a)
step_1_b <- sm2_keyexchange_1ab(klen, id_b, private_key_b)
## Step 2
step_2_a <- sm2_keyexchange_2a(id_a, private_key_a, step_1_a$private_key_r, step_1_b$data)
step_2_b <- sm2_keyexchange_2b(id_b, private_key_b, step_1_b$private_key_r, step_1_a$data)
step_2_a$k
#> [1] "00c365484451c918e6e30a43ffa33478"
step_2_b$k
#> [1] "00c365484451c918e6e30a43ffa33478"
The output key k should be 16 bytes long (32 hexadecimal characters, matching klen), and step_2_a$k and step_2_b$k should be equal.
SM4 is a symmetric block cipher with a block size and key length of 128 bits. It supports both ECB (Electronic Codebook) mode and CBC (Cipher Block Chaining) mode. ECB is a simple mode that encrypts each data block independently of the others. CBC is a chained mode in which the encryption of each block depends on the previous ciphertext block, so it requires an initialization vector (IV) of the same 128-bit length for the first block. CBC provides better security than ECB.
In ECB mode, each block of plaintext is encrypted independently, without any chaining with previous blocks. This means that the same plaintext block will always produce the same ciphertext block, given the same key.
## ecb mode
key <- '1234567812345678' |> charToRaw()
enc <- sm4_encrypt_ecb(data, key)
## cipher text
enc
#> [1] 06 6f eb d7 55 4a 8f ed 55 5b a2 6c f8 2a ff 3b
## plain text
sm4_decrypt_ecb(enc, key) |> rawToChar()
#> [1] "abc"
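The ECB property described above, where identical plaintext blocks yield identical ciphertext blocks, can be observed directly by encrypting two copies of the same 16-byte block. This sketch assumes sm4_encrypt_ecb() returns the bare ciphertext bytes, as in the example above. The `block` value is an arbitrary illustration.

```r
library(smcryptoR)

key   <- charToRaw("1234567812345678")
block <- charToRaw("ABCDEFGHIJKLMNOP")   # exactly one 16-byte SM4 block

# Two identical plaintext blocks back to back.
repeated <- c(block, block)
enc <- sm4_encrypt_ecb(repeated, key)

# Under ECB the first two ciphertext blocks are expected to match,
# which reveals the repetition in the plaintext.
blocks_match <- identical(enc[1:16], enc[17:32])
blocks_match
```

This leak of plaintext structure is the main reason CBC is generally preferred over ECB.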
In CBC mode, each block of plaintext is combined (usually through XOR operation) with the previous ciphertext block before being encrypted. This chaining of blocks ensures that even if there are repeated blocks in the plaintext, the resulting ciphertext blocks will be different due to the influence of the previous ciphertext blocks.
iv <- '0000000000000000' |> charToRaw()
enc <- sm4_encrypt_cbc(data, key, iv)
## cipher text
enc
#> [1] 4d 2b cf dc f0 c1 13 34 4b 54 0e 76 fa a2 2f 08
sm4_decrypt_cbc(enc, key, iv) |> rawToChar()
#> [1] "abc"
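To show the chaining mechanics of CBC in isolation, the following base R sketch implements the XOR-then-encrypt loop with a deliberately trivial stand-in for the block cipher (XOR with the key). This is not SM4 and is not secure; it only illustrates how each ciphertext block feeds into the next. All function names here are hypothetical helpers and are not part of smcryptoR.

```r
# Bitwise XOR of two raw vectors of equal length.
xor_raw <- function(a, b) as.raw(bitwXor(as.integer(a), as.integer(b)))

# Toy "block cipher": XOR with the key. NOT SM4, NOT secure; it is its own inverse.
toy_block <- function(block, key) xor_raw(block, key)

# CBC encryption: c_i = E(p_i XOR c_{i-1}), with c_0 = IV.
toy_cbc_encrypt <- function(plain, key, iv, block_size = 16L) {
  stopifnot(length(plain) %% block_size == 0L)
  out <- raw(0)
  prev <- iv
  for (i in seq_len(length(plain) %/% block_size)) {
    idx <- ((i - 1L) * block_size + 1L):(i * block_size)
    prev <- toy_block(xor_raw(plain[idx], prev), key)
    out <- c(out, prev)
  }
  out
}

# CBC decryption: p_i = D(c_i) XOR c_{i-1}.
toy_cbc_decrypt <- function(cipher, key, iv, block_size = 16L) {
  stopifnot(length(cipher) %% block_size == 0L)
  out <- raw(0)
  prev <- iv
  for (i in seq_len(length(cipher) %/% block_size)) {
    idx <- ((i - 1L) * block_size + 1L):(i * block_size)
    out <- c(out, xor_raw(toy_block(cipher[idx], key), prev))
    prev <- cipher[idx]
  }
  out
}

key   <- charToRaw("1234567812345678")
iv    <- charToRaw("0000000000000000")
block <- charToRaw("ABCDEFGHIJKLMNOP")
plain <- c(block, block)                      # two identical plaintext blocks

cbc <- toy_cbc_encrypt(plain, key, iv)
identical(cbc[1:16], cbc[17:32])              # FALSE: chaining hides the repetition
identical(toy_cbc_decrypt(cbc, key, iv), plain)  # TRUE: round trip recovers the input
```

With a real cipher such as SM4 in place of `toy_block`, the loop structure stays the same; padding of the final block is omitted here.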