
[crypto] The Armv8 Cryptographic Extension in Linux


Related reading:
An introduction to the Armv8 Cryptographic Extension
The Linux kernel aarch64 crypto framework: principles and architecture
How Linux kernel code invokes crypto algorithms
A walkthrough of the low-level ARM-CE aes-ecb code on Linux kernel aarch64

Note: unless stated otherwise, everything in this article assumes the armv8-aarch64 architecture and a Linux kernel 4.14 arm64 software environment.




1. In the Linux crypto lower layer, there are three ways to implement the AES/hash algorithms:
(1) A pure software implementation on the CPU, doing the arithmetic with the ALU and the general-purpose registers x0-x30. (Not covered in this article.)
(2) ARM-CE, i.e. the Armv8 Cryptographic Extension: the arithmetic is done with the arm-ce instructions and registers.
(3) ARM-NEON: the arithmetic is done with the ARM NEON instructions and the 128-bit registers v0-v31.


To elaborate on ARM-CE: it too runs on the NEON floating-point/SIMD unit; it reads and writes the arm-ce registers and uses the arm-ce instructions to carry out the encryption and decryption.


2. One point must be made clear:
The Armv8 Cryptographic Extension provides instructions for the acceleration of encryption and decryption.
The Armv8 Cryptographic Extension is not a separate piece of hardware; ARM merely extended the architecture with a set of registers and instructions, and the computation is still done by the CPU.
The same holds for ARM NEON: it is also just a set of registers and instructions, still executed by the CPU.
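Whether a given CPU actually implements the Cryptographic Extension is advertised to userspace through the HWCAP auxiliary vector (the same `aes`/`pmull`/`sha1`/`sha2` flags that appear in the Features line of /proc/cpuinfo). A minimal probe, assuming an aarch64 Linux host; the helper name is mine:

```c
/* Sketch: detect the ARMv8 Crypto Extension AES instructions from userspace
 * via the HWCAP auxiliary vector. HWCAP_AES only exists in the aarch64
 * uapi headers, so the code is guarded and falls back to 0 elsewhere. */
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>    /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>   /* HWCAP_AES */
#endif

/* Returns 1 if the CPU advertises the AES instructions, 0 otherwise. */
int cpu_has_arm_ce_aes(void)
{
#if defined(__aarch64__) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_AES) ? 1 : 0;
#else
	return 0; /* not an aarch64 Linux system */
#endif
}
```

The kernel performs the equivalent check itself and only registers the "ce" algorithms when the extension is present.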


3. Taking AES as an example, the interfaces that the arm-ce and arm-neon crypto code exposes to the upper layers all live in the file below.
In aes-glue.c, the AES algorithm interfaces are registered:


static struct crypto_alg aes_algs[] = { {
	.cra_name = "__ecb-aes-" MODE,
	.cra_driver_name = "__driver-ecb-aes-" MODE,
	.cra_priority = 0,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_INTERNAL,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct crypto_aes_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_blkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = 0,
		.setkey = aes_setkey,
		.encrypt = ecb_encrypt,
		.decrypt = ecb_decrypt,
	},
}, {
	.cra_name = "__cbc-aes-" MODE,
	.cra_driver_name = "__driver-cbc-aes-" MODE,
	.cra_priority = 0,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_INTERNAL,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct crypto_aes_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_blkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = aes_setkey,
		.encrypt = cbc_encrypt,
		.decrypt = cbc_decrypt,
	},
}, {
	.cra_name = "__ctr-aes-" MODE,
	.cra_driver_name = "__driver-ctr-aes-" MODE,
	.cra_priority = 0,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_INTERNAL,
	.cra_blocksize = 1,
	.cra_ctxsize = sizeof(struct crypto_aes_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_blkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = aes_setkey,
		.encrypt = ctr_encrypt,
		.decrypt = ctr_encrypt,	/* CTR mode: decrypt is the same operation as encrypt */
	},
}, {
	.cra_name = "__xts-aes-" MODE,
	.cra_driver_name = "__driver-xts-aes-" MODE,
	.cra_priority = 0,
	.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_INTERNAL,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct crypto_aes_xts_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_blkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_blkcipher = {
		.min_keysize = 2 * AES_MIN_KEY_SIZE,
		.max_keysize = 2 * AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = xts_set_key,
		.encrypt = xts_encrypt,
		.decrypt = xts_decrypt,
	},
}, {
	.cra_name = "ecb(aes)",
	.cra_driver_name = "ecb-aes-" MODE,
	.cra_priority = PRIO,
	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct async_helper_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_ablkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_init = ablk_init,
	.cra_exit = ablk_exit,
	.cra_ablkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = 0,
		.setkey = ablk_set_key,
		.encrypt = ablk_encrypt,
		.decrypt = ablk_decrypt,
	}
}, {
	.cra_name = "cbc(aes)",
	.cra_driver_name = "cbc-aes-" MODE,
	.cra_priority = PRIO,
	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct async_helper_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_ablkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_init = ablk_init,
	.cra_exit = ablk_exit,
	.cra_ablkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = ablk_set_key,
		.encrypt = ablk_encrypt,
		.decrypt = ablk_decrypt,
	}
}, {
	.cra_name = "ctr(aes)",
	.cra_driver_name = "ctr-aes-" MODE,
	.cra_priority = PRIO,
	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
	.cra_blocksize = 1,
	.cra_ctxsize = sizeof(struct async_helper_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_ablkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_init = ablk_init,
	.cra_exit = ablk_exit,
	.cra_ablkcipher = {
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = ablk_set_key,
		.encrypt = ablk_encrypt,
		.decrypt = ablk_decrypt,
	}
}, {
	.cra_name = "xts(aes)",
	.cra_driver_name = "xts-aes-" MODE,
	.cra_priority = PRIO,
	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct async_helper_ctx),
	.cra_alignmask = 7,
	.cra_type = &crypto_ablkcipher_type,
	.cra_module = THIS_MODULE,
	.cra_init = ablk_init,
	.cra_exit = ablk_exit,
	.cra_ablkcipher = {
		.min_keysize = 2 * AES_MIN_KEY_SIZE,
		.max_keysize = 2 * AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.setkey = ablk_set_key,
		.encrypt = ablk_encrypt,
		.decrypt = ablk_decrypt,
	}
} };

4. aes-glue.c also defines which low-level function each interface points to. Depending on USE_V8_CRYPTO_EXTENSIONS this splits into two cases:
(1) computing AES with ARM's Crypto Extension instructions;
(2) computing AES with ARM's SIMD instructions (ARM NEON).


#ifdef USE_V8_CRYPTO_EXTENSIONS
#define MODE "ce"
#define PRIO 300
#define aes_setkey ce_aes_setkey
#define aes_expandkey ce_aes_expandkey
#define aes_ecb_encrypt ce_aes_ecb_encrypt
#define aes_ecb_decrypt ce_aes_ecb_decrypt
#define aes_cbc_encrypt ce_aes_cbc_encrypt
#define aes_cbc_decrypt ce_aes_cbc_decrypt
#define aes_ctr_encrypt ce_aes_ctr_encrypt
#define aes_xts_encrypt ce_aes_xts_encrypt
#define aes_xts_decrypt ce_aes_xts_decrypt
MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions");
#else
#define MODE "neon"
#define PRIO 200
#define aes_setkey crypto_aes_set_key
#define aes_expandkey crypto_aes_expand_key
#define aes_ecb_encrypt neon_aes_ecb_encrypt
#define aes_ecb_decrypt neon_aes_ecb_decrypt
#define aes_cbc_encrypt neon_aes_cbc_encrypt
#define aes_cbc_decrypt neon_aes_cbc_decrypt
#define aes_ctr_encrypt neon_aes_ctr_encrypt
#define aes_xts_encrypt neon_aes_xts_encrypt
#define aes_xts_decrypt neon_aes_xts_decrypt
MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 NEON");
#endif
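Both variants come from the same aes-glue.c: the kernel build compiles the file twice, defining USE_V8_CRYPTO_EXTENSIONS only for the CE object. From memory of the 4.14 arch/arm64/crypto/Makefile (check your own tree, the rule names may differ slightly):

```makefile
obj-$(CONFIG_CRYPTO_AES_ARM64_CE_BLK) += aes-ce-blk.o
aes-ce-blk-y := aes-glue-ce.o aes-ce.o

obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
aes-neon-blk-y := aes-glue-neon.o aes-neon.o

# aes-glue-ce.o and aes-glue-neon.o are both generated from aes-glue.c;
# only the CE object is built with -DUSE_V8_CRYPTO_EXTENSIONS
CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
```

Because of the PRIO values above (300 for "ce", 200 for "neon"), when both modules are loaded the crypto core prefers the CE implementation.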

5. Taking ce_aes_cbc_encrypt as an example, let's look at the Crypto Extension implementation.
It is in aes-ce-core.S. As you can see, what ultimately gets executed are the Crypto Extension instructions aese, aesmc, etc. (aesd/aesimc on the decrypt path):


ENTRY(ce_aes_cbc_encrypt)
	push	{r4-r6, lr}
	ldrd	r4, r5, [sp, #16]
	vld1.8	{q0}, [r5]
	prepare_key r2, r3
.Lcbcencloop:
	vld1.8	{q1}, [r1, :64]!	@ get next pt block
	veor	q0, q0, q1		@ ..and xor with iv
	bl	aes_encrypt
	vst1.8	{q0}, [r0, :64]!
	subs	r4, r4, #1
	bne	.Lcbcencloop
	vst1.8	{q0}, [r5]
	pop	{r4-r6, pc}
ENDPROC(ce_aes_cbc_encrypt)

aes_encrypt:
	add	ip, r2, #32		@ 3rd round key
.Laes_encrypt_tweak:
	do_block enc_dround, enc_fround
ENDPROC(aes_encrypt)

	.macro	enc_dround, key1, key2
	enc_round q0, \key1
	enc_round q0, \key2
	.endm

	.macro	enc_round, state, key
	aese.8	\state, \key
	aesmc.8	\state, \state
	.endm
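The enc_round macro above pairs AESE (AddRoundKey + SubBytes + ShiftRows) with AESMC (MixColumns), i.e. one full AES round per two instructions. For readers more comfortable in C, the same pairing is exposed through the ACLE intrinsics; a minimal sketch (the helper names are mine, and the round function only compiles when the compiler targets armv8-a+crypto):

```c
/* Sketch: one AES encryption round via the ARMv8 Crypto Extension
 * intrinsics, guarded so the file still compiles on other targets. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRYPTO)
#include <arm_neon.h>

/* AESE = AddRoundKey + SubBytes + ShiftRows; AESMC = MixColumns.
 * Together they perform one inner AES round, as enc_round does above. */
uint8x16_t aes_enc_round(uint8x16_t state, uint8x16_t round_key)
{
	return vaesmcq_u8(vaeseq_u8(state, round_key));
}
#endif

/* Returns 1 when the CE intrinsics were compiled in, 0 otherwise. */
int have_ce_intrinsics(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRYPTO)
	return 1;
#else
	return 0;
#endif
}
```

Note the asymmetry with the decrypt path: AESD/AESIMC (vaesdq_u8/vaesimcq_u8) are separate instructions, which is why the assembly has distinct enc/dec round macros.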

6. Now the NEON implementation of the same CBC encrypt path, aes_cbc_encrypt:
It is in aes-neon.S/aes-modes.S. As you can see, what ultimately gets used here are the 128-bit SIMD registers v0-v31:


AES_ENTRY(aes_cbc_encrypt)
	cbz	w6, .Lcbcencloop

	ld1	{v0.16b}, [x5]			/* get iv */
	enc_prepare w3, x2, x6

	/* do preload for encryption */
	.macro	enc_prepare, ignore0, ignore1, temp
	prepare	.LForward_Sbox, .LForward_ShiftRows, \temp
	.endm

	/* preload the entire Sbox */
	.macro	prepare, sbox, shiftrows, temp
	adr	\temp, \sbox
	movi	v12.16b, #0x40
	ldr	q13, \shiftrows
	movi	v14.16b, #0x1b
	ld1	{v16.16b-v19.16b}, [\temp], #64
	ld1	{v20.16b-v23.16b}, [\temp], #64
	ld1	{v24.16b-v27.16b}, [\temp], #64
	ld1	{v28.16b-v31.16b}, [\temp]
	.endm

7. A question and its answer
When kernel thread A running on cpu0 is scheduled out, its general-purpose registers (x0-x30, sp, pc, etc.) are saved; when the thread is later resumed (say, on cpu1), those registers are restored and the program continues where it left off.
But the scheduler only saves the general-purpose registers x0-x30; it does not save the NEON registers v0-v31 or the ARM-CE state.
So if kernel thread A is in the middle of NEON (SIMD) computation on cpu0 and gets scheduled out (with v0-v31 unsaved), and then resumes on cpu1, how can its previous state be recovered?


The answer is in Documentation/arm/kernel_mode_neon.txt:
Use only NEON instructions, or VFP instructions that don't rely on support code
(1) Isolate your NEON code in a separate compilation unit, and compile it with '-mfpu=neon -mfloat-abi=softfp'
(2) Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your NEON code
(3) Don't sleep in your NEON code, and be aware that it will be executed with preemption disabled


That is, NEON code must be wrapped in kernel_neon_begin()/kernel_neon_end(). Inside that region the code cannot be preempted, so it effectively runs atomically and the scheduling problem above never arises.
Example: see arch/arm64/crypto/aes-glue.c, where every call into ARM-NEON or ARM-CE code is wrapped in kernel_neon_begin()/kernel_neon_end().


As the code below shows, both the arm-neon and arm-ce builds use this same ecb_encrypt; aes_ecb_encrypt is a macro whose target is selected by USE_V8_CRYPTO_EXTENSIONS, so it points either at the ARM-CE function or at the ARM-NEON function.
The body of ecb_encrypt is wrapped in kernel_neon_begin()/kernel_neon_end(), so that section runs without preemption.


static int ecb_encrypt(struct blkcipher_desc *desc, struct scatterlist *dst,
		       struct scatterlist *src, unsigned int nbytes)
{
	struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
	int err, first, rounds = 6 + ctx->key_length / 4;
	struct blkcipher_walk walk;
	unsigned int blocks;

	desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
	blkcipher_walk_init(&walk, dst, src, nbytes);
	err = blkcipher_walk_virt(desc, &walk);

	kernel_neon_begin();
	for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
		aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
				(u8 *)ctx->key_enc, rounds, blocks, first);
		err = blkcipher_walk_done(desc, &walk, walk.nbytes % AES_BLOCK_SIZE);
	}
	kernel_neon_end();
	return err;
}

8. Userspace calls, physical-address contiguity, alignment, and blocks
The Linux kernel crypto API talks to userspace through sockets (the AF_ALG interface). In the kernel's algif_skcipher.c, the data arriving from the userspace socket is stored in scatterlists (discrete physical chunks).
Then in aes-ce-glue.c the low-level encrypt/decrypt functions are invoked on each physically contiguous chunk in turn.
Those low-level functions have to handle alignment and block splitting themselves.
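To see this path end to end, you can drive ecb(aes) from userspace over an AF_ALG socket: the kernel resolves the name against cra_name and picks the registered implementation with the highest cra_priority (arm-ce at 300 beats arm-neon at 200). A minimal sketch, assuming CONFIG_CRYPTO_USER_API_SKCIPHER is enabled in the kernel; the helper name is mine, and it returns -1 when AF_ALG is unavailable:

```c
/* Sketch: encrypt one 16-byte block with ecb(aes) via AF_ALG. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef SOL_ALG
#define SOL_ALG 279	/* not defined by older uapi headers */
#endif

int aes_ecb_one_block(const unsigned char key[16],
		      const unsigned char in[16], unsigned char out[16])
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "skcipher",
		.salg_name   = "ecb(aes)",	/* matched against cra_name */
	};
	struct msghdr msg = { 0 };
	char cbuf[CMSG_SPACE(sizeof(unsigned int))] = { 0 };
	struct iovec iov = { .iov_base = (void *)in, .iov_len = 16 };
	struct cmsghdr *cmsg;
	int tfmfd, opfd, ret = -1;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd < 0)
		return -1;			/* kernel lacks AF_ALG support */
	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, 16) < 0) {
		close(tfmfd);
		return -1;
	}
	opfd = accept(tfmfd, NULL, NULL);	/* per-request operation socket */
	if (opfd < 0) {
		close(tfmfd);
		return -1;
	}

	/* The direction (encrypt/decrypt) travels as ancillary data. */
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned int));
	*(unsigned int *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;

	if (sendmsg(opfd, &msg, 0) == 16 && read(opfd, out, 16) == 16)
		ret = 0;
	close(opfd);
	close(tfmfd);
	return ret;
}
```

With an all-zero key and an all-zero plaintext block, a successful run produces the well-known AES-128 test vector 66e94bd4ef8a2c3b884cfa59ca342b2e, regardless of which backend (pure software, arm-ce, or arm-neon) the kernel selected.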


