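This is an old article I wrote while at the company.

I recently ran into a rather thorny problem: every test case of the deep learning framework Torch7 fails on ARM64 (AArch64). Given how immature the ARM64 ecosystem still is, that is not much of a surprise.

The ARM64 chip in question currently supports only 48-bit virtual addresses (VA), while LuaJIT, famous for its efficiency, restricts pointers on 64-bit platforms to 47 bits in its code. The upper bits are used for NaN tagging.

To see why it works this way, we have to start with Lua's data types.

As is well known, Lua represents all numbers as double-precision floats. The first thing a Lua virtual machine implementation has to settle is how to represent a value of each type.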

Lua has 8 basic data types:

nil
boolean
string
heavy userdata
table
function
number
thread

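/* A Lua value: the union holds the payload, t says which member is valid. */
typedef union {
  GCObject *gc;   /* collectable objects */
  void *p;        /* light userdata */
  lua_Number n;   /* numbers */
  int b;          /* booleans */
} Value;

typedef struct {
  int t;          /* type tag */
  Value V;
} TObject;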

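GCObject *gc

A pointer to a garbage-collected object. It is used for the complex data types (string, table, function, heavy userdata, thread).

b
A boolean value.

n
A number.

t
The type tag. nil needs no member of the Value union at all; t alone is enough to represent it.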

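Don't forget that there is one more special data type.

p

Stores a raw pointer: the so-called light userdata type. Our problem lies in exactly this data type.
Lua users can call lua_pushlightuserdata(lua_State *L, void *p) to push a pointer onto the stack.
A light userdata is not managed by the GC and can point at any userdata. A full userdata is an object and is only equal to itself, whereas a light userdata is just an address and is usually used to point to where a userdata (or any other C object) lives.
The API used by applications: void lua_pushlightuserdata (lua_State *L, void *p); (https://www.lua.org/pil/28.5.html)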

So all 8 Lua data types can be represented by a single TObject. The Value union is 64 bits wide; a TObject takes 16 bytes when alignment is required and 12 bytes when it is not.
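The size claim can be checked with a small sketch. It assumes a mainstream ABI (4-byte int, 8-byte double, pointers no wider than 8 bytes); GCObject is an opaque stand-in here, and the helper names tobject_payload_size and tobject_size are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for Lua's collectable-object type (assumption:
   only a pointer to it is stored, so its contents do not matter here). */
typedef struct GCObject GCObject;
typedef double lua_Number;   /* Lua's default number type */

typedef union {
  GCObject *gc;
  void *p;
  lua_Number n;
  int b;
} Value;

typedef struct {
  int t;
  Value V;
} TObject;

/* Bytes actually needed: the tag plus the widest union member. */
static size_t tobject_payload_size(void) {
  return sizeof(int) + sizeof(Value);
}

/* Bytes the compiler really allocates, including alignment padding. */
static size_t tobject_size(void) {
  return sizeof(TObject);
}
```

On an ABI that aligns double to 8 bytes, tobject_size() is larger than tobject_payload_size() because padding is inserted after t; on an ABI that aligns double to 4 bytes the two are equal.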

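Now for LuaJIT. LuaJIT is a JIT implementation of Lua, and it makes the representation of values more complicated still. This is where the VA configuration of 64-bit systems comes in.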

1, LJ_GC64 mode disabled

For non-LJ_GC64 mode:
Internal tags overlap the MSW of a number object (must be a double).
Interpreted as a double these are special NaNs. The FPU only generates
one type of NaN (0xfff8_0000_0000_0000). So MSWs > 0xfff80000 are available
for use as internal tags. Small negative numbers are used to shorten the
encoding of type comparisons (reg/mem against sign-ext. 8 bit immediate).

                    ---MSW---.---LSW---
primitive types    |  itype  |         |
lightuserdata      |  itype  |  void * |  (32 bit platforms)
lightuserdata      |ffff|    void *    |  (64 bit platforms, 47 bit pointers)
GC objects         |  itype  |  GCRef  |
int (LJ_DUALNUM)   |  itype  |   int   |
number              -------double------

2, LJ_GC64 mode enabled

                    ------MSW------.------LSW------
primitive types    |1..1|itype|1..................1|
GC objects/lightud |1..1|itype|-------GCRef--------|
int (LJ_DUALNUM)   |1..1|itype|0..0|-----int-------|
number              ------------double-------------

For LJ_GC64 mode:
The upper 13 bits must be 1 (0xfff8...) for a special NaN. The next
4 bits hold the internal tag. The lowest 47 bits either hold a pointer,
a zero-extended 32 bit integer or all bits set to 1 for primitive types.

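So here is the problem: the upper 13 bits are taken by the NaN marker, 4 more bits are always needed for the type tag, and only the remaining 47 bits are left to hold data.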
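The 13 + 4 + 47 bit split can be illustrated with a minimal sketch. This is not LuaJIT's code: the helper names box, unbox_addr, unbox_tag and bits_are_nan are invented for this example, and the tag value is an arbitrary 4-bit number rather than one of LuaJIT's real internal tag constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_SHIFT  47
#define PTR_MASK   ((UINT64_C(1) << TAG_SHIFT) - 1)   /* low 47 bits */
#define NAN_PREFIX (UINT64_C(0x1fff) << 51)           /* upper 13 bits set */

/* Pack a 4-bit tag and an address into one 64-bit NaN pattern.
   Address bits 47 and above are dropped, which is the whole problem. */
static uint64_t box(uint64_t addr, unsigned tag) {
  return NAN_PREFIX | ((uint64_t)(tag & 0xfu) << TAG_SHIFT) | (addr & PTR_MASK);
}

/* Recover the (at most 47-bit) address. */
static uint64_t unbox_addr(uint64_t v) {
  return v & PTR_MASK;
}

/* Recover the 4-bit tag. */
static unsigned unbox_tag(uint64_t v) {
  return (unsigned)((v >> TAG_SHIFT) & 0xfu);
}

/* Reinterpret the bits as a double and test for NaN (NaN != NaN). */
static int bits_are_nan(uint64_t v) {
  double d;
  memcpy(&d, &v, sizeof d);
  return d != d;
}
```

An address such as 0x00007fffdeadbee0 survives a round trip through box and unbox_addr, while 0x0000ffffdeadbee0, which has bit 47 set as a 48-bit VA allows, comes back as 0x00007fffdeadbee0.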

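You will surely ask: why not just change the value layout? It is not that simple. LuaJIT's entire garbage collector is built on this layout. The amount of work is one thing; the stability of the whole code base is the real concern.

So the author does not recommend changing it. The simplest fix is to configure the kernel with a smaller VA size.

In the current situation this means that if an application calls lua_pushlightuserdata with a 64-bit pointer that has not been checked against the 47-bit range, the pointer gets corrupted and ends up pointing somewhere else.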
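The check an application could make before pushing a pointer might look like the sketch below. The function name lightud_in_range is hypothetical and not part of the Lua or LuaJIT API; it only tests whether every bit from bit 47 upward is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p fits in 47 bits and is therefore safe to store as a
   light userdata under the layout described above, 0 otherwise.
   Hypothetical helper: not part of the Lua or LuaJIT API. */
static int lightud_in_range(const void *p) {
  return ((uint64_t)(uintptr_t)p >> 47) == 0;
}
```

A caller would test the pointer with lightud_in_range first and fall back to a full userdata or an FFI cdata pointer when the test fails.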

As soon as this problem surfaced, the open-source communities were thrown into turmoil:

Firefox community: how to run Firefox on SPARC64 and ARM64
https://bugzilla.mozilla.org/show_bug.cgi?id=1143022
https://bugzilla.mozilla.org/show_bug.cgi?id=1275204

Linaro and Cisco set out to fix the ARM64 ecosystem problem and started supporting LuaJIT:
https://collaborate.linaro.org/display/TCWGPUB/LuaJIT+for+ARM64
https://github.com/sindrom91/LuaJIT/commit/d63e0af3978f24e230cccc017ca98ed1653497de

Debian community:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818616

OpenResty community:
https://github.com/openresty/lua-nginx-module/issues/757
https://github.com/openresty/lua-nginx-module/issues/1152

The LuaJIT community itself:
https://github.com/LuaJIT/LuaJIT/issues/156

LuaJIT official release change list:
http://www.luatex.org/svn/tags/0.98.1/source/libs/luajit/LuaJIT-src/doc/changes.html

The official status page says that, in the current state, GC64 mode does not support VAs wider than 47 bits for lightuserdata:
file:///E:/luajit-2.0/doc/status.html

LuaJIT on 64 bit systems provides a limited range of 47 bits for the legacy lightuserdata data type. This is only relevant on x64 systems which use the negative part of the virtual address space in user mode, e.g. Solaris/x64, and on ARM64 systems configured with a 48 bit or 52 bit VA. Avoid using lightuserdata to hold pointers that may point outside of that range, e.g. variables on the stack. In general, avoid this data type for new code and replace it with (much more performant) FFI bindings. FFI cdata pointers can address the full 64 bit range.

LuaJIT beta3 change log:
http://repo.or.cz/luajit-2.0.git/shortlog/refs/tags/v2.1.0-beta3

The work that supported Mike Pall's development of the ARM64 port:
https://github.com/cbaylis/luajit-aarch64

A personal proposal from a Linaro engineer:
https://github.com/zhongweiy/LuaJIT/commit/3fa648b6d8b7ecd2d06ce8397089f244564265bb
https://github.com/zhongweiy/LuaJIT/commit/dc3bd1626b4c28da7aa58da30bbd174e73badb65
https://www.freelists.org/post/luajit/fix-lightud-type-for-48bit-virtual-address
https://www.freelists.org/post/luajit/Proposals-for-fixing-light-userdata-issue-on-virtual-address-47-bits-platform

Linaro effort to port LuaJIT to ARM64 (Sep 29, 2016):
https://www.youtube.com/watch?v=ZTtCHF4FoqM&feature=youtu.be

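Summing up the community status collected above, this bug can only be fixed by adapting the software that uses LuaJIT. The author, Mike Pall, does not plan to add any workaround to LuaJIT for it.

The next step is to follow Mike's advice and replace lightuserdata with FFI bindings. The JIT compiler's own code is best left untouched (the LuaJIT source is quite hard to read: Lua + C + assembly).

The Torch7 code that needs to change is listed below (already filed for discussion as an issue: https://github.com/torch/torch7/issues/1035)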

01 update_on_definition in ctype.c (extraluaffifb) : lua_pushlightuserdata(L, &to_define_key);
02 update_on_definition in ctype.c (extraluaffifb) : lua_pushlightuserdata(L, &to_define_key);
03 set_defined in ctype.c (extraluaffifb) : lua_pushlightuserdata(L, &to_define_key);
04 set_defined in ctype.c (extraluaffifb) : lua_pushlightuserdata(L, &to_define_key);
05 push_upval in ffi.c (extraluaffifb) : lua_pushlightuserdata(L, key);
06 set_upval in ffi.c (extraluaffifb) : lua_pushlightuserdata(L, key);
07 ffi_metatype in ffi.c (extraluaffifb) : lua_pushlightuserdata(L, &user_mt_key);
08 push_user_mt in ffi.c (extraluaffifb) : lua_pushlightuserdata(L, &user_mt_key);
09 luaopen_libcutorch in init.c (extracutorch) : lua_pushlightuserdata(L, state);
10 hookf in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, (void *)&KEY_HOOK);
11 hookf in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, L);
12 gethooktable in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, (void *)&KEY_HOOK);
13 gethooktable in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, (void *)&KEY_HOOK);
14 db_sethook in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, L1);
15 db_gethook in ldblib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, L1);
16 hookf in lib_debug.c (exeluajit-rocksluajit-2.1src) : lua_pushlightuserdata(L, KEY_HOOK);
17 LJLIB_CF in lib_debug.c (exeluajit-rocksluajit-2.1src) : lua_pushlightuserdata(L, KEY_HOOK);
18 LJLIB_CF in lib_debug.c (exeluajit-rocksluajit-2.1src) : lua_pushlightuserdata(L, KEY_HOOK);
19 lj_cf_package_require in lib_package.c (exeluajit-rocksluajit-2.1src) : lua_pushlightuserdata(L, sentinel);
20 lj_api.c (exeluajit-rocksluajit-2.1src) line 697 : LUA_API void lua_pushlightuserdata(lua_State *L, void *p)
21 ll_require in loadlib.c (exeluajit-rockslua-5.1src) : lua_pushlightuserdata(L, sentinel);
22 lua.h (exeluajit-rockslua-5.1src) line 170 : LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p);
23 lua.h (exeluajit-rocksluajit-2.1src) line 171 : LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p);
24 lua.h (exeluajit-rocksluarockswin32lua5.1include) line 170 : LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p);
25 lua.h (exeluarockswin32lua5.1include) line 170 : LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p);
26 lua.h (installinclude) line 171 : LUA_API void (lua_pushlightuserdata) (lua_State *L, void *p);
27 luaT_iscdata in luaT.c (pkgtorchlibluaT) : lua_pushlightuserdata(L, CDATA_MT_KEY);
28 luaT_iscdata in luaT.c (pkgtorchlibluaT) : lua_pushlightuserdata(L, CDATA_MT_KEY);
29 json_process_value in lua_cjson.c (extralua-cjson) : lua_pushlightuserdata(l, NULL);
30 lua_cjson_new in lua_cjson.c (extralua-cjson) : lua_pushlightuserdata(l, NULL);
31 parse_record in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_name_key);
32 parse_record in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_name_key);
33 append_type_name in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_name_key);
34 append_type_name in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_front_name_key);
35 append_type_name in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_back_name_key);
36 find_canonical_usr in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_name_key);
37 find_canonical_usr in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_front_name_key);
38 find_canonical_usr in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_back_name_key);
39 find_canonical_usr in parser.c (extraluaffifb) : lua_pushlightuserdata(L, &g_name_key);
40 calculate_constant1 in parser.c (extraluaffifb) : lua_pushlightuserdata(L, P);
41 add_tmpname in paths.c (pkgpaths) : lua_pushlightuserdata(L, (void*)tmpnames_key);
42 add_tmpname in paths.c (pkgpaths) : lua_pushlightuserdata(L, (void*)tmpnames_key);
43 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)engineKey);
44 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)d);
45 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)metaKey);
46 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
47 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)objectKey);
48 luaQ_setup in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
49 luaQ_private_noerr in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)engineKey);
50 QtLuaEngine::Private::~Private in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)engineKey);
51 luaQ_pushqt in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
52 luaQ_buildmetaclass in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
53 luaQ_buildmetaclass in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
54 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)metaKey);
55 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)((((size_t)type)<<1)|1));
56 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)((((size_t)type)<<1)|1));
57 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
58 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)metaKey);
59 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)mo);
60 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)mo);
61 luaQ_pushmeta in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)qtKey);
62 luaQ_pushqt in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)objectKey);
63 luaQ_pushqt in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)obj);
64 luaQ_pushqt in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)obj);
65 QtLuaEngine::Receiver::universal in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
66 QtLuaEngine::Receiver::universal in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)this);
67 QtLuaEngine::Private::processQueuedSignals in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
68 QtLuaEngine::Private::processQueuedSignals in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)this);
69 QtLuaEngine::Private::processQueuedSignals in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
70 QtLuaEngine::Private::processQueuedSignals in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)receiver);
71 luaQ_connect in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
72 luaQ_connect in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)r);
73 luaQ_disconnect in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)signalKey);
74 luaQ_disconnect in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)r);
75 luaQ_disconnect in qtluaengine.cpp (exeqtluaqtlua) : lua_pushlightuserdata(L, (void*)r);

Update: NVIDIA solved the ARM64 porting problem with its own JetPack.
https://jkjung-avt.github.io/torch7-on-tx1/

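The author, Mike Pall, advises users of the lightuserdata data type to switch to the FFI (Foreign Function Interface) component. That is a considerable amount of work.

However, Mike Pall overlooks the fact that the FFI code itself uses lightuserdata. On top of that, Lua's own API uses this data type as well.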

1, lightuserdata is a legacy data type, primarily intended for the classic Lua/C API. And even there, it doesn't have many good use cases. It's easy enough to work around the 47 bit limit.
2, The Solaris/x64 community has had the same problem, due to their use of the negative part of the address space in user mode. Have a look what they did to change the (few) libraries that didn't play nicely.
3, lightuserdata used as a unique table key, e.g. for the registry: with lua_pushlightuserdata(L, &x) and x on the C stack, replace x with a static variable.
4, lightuserdata used as some kind of cross-object pointer: don't do that -- it doesn't play nicely with the GC, anyway. Replace with proper userdata references.
5, lightuserdata used as a misguided userdata replacement, possibly even setting the lightuserdata metatable (ugh): throw away any such library, replace with a proper FFI binding.
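The third point of this advice, using a static variable rather than a stack variable as the unique key, can be sketched as follows. The names module_registry_key, registry_key and registry_key_nested are hypothetical and chosen only for this illustration; only the address of the static variable is ever used.

```c
#include <assert.h>

/* Hypothetical module key: only its address is used, never its value. */
static char module_registry_key;

/* Returns the same address on every call, from any stack depth. */
static void *registry_key(void) {
  return &module_registry_key;
}

/* Same key, obtained from one call frame deeper. */
static void *registry_key_nested(void) {
  return registry_key();
}
```

With the real API this address would be passed as lua_pushlightuserdata(L, registry_key()). Whether a static address actually falls below 2^47 depends on the platform's memory layout, so it is worth verifying on the target system rather than assuming it.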

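lua_cpcall: because lua_cpcall uses light userdata internally, this API breaks with 48-bit virtual addresses. FFI cannot help with this kind of internal use of light userdata either. The luaJIT_setmode API is in a similar situation. For more details see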
https://www.freelists.org/post/luajit/Proposals-for-fixing-light-userdata-issue-on-virtual-address-47-bits-platform

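Resolution strategy:

1, Modify FFI's internal functions. Where an interface calls lua_pushlightuserdata, use a static variable as the key instead.
2, Modify the lua_cpcall interface: add a new interface that avoids lightuserdata.
3, Modify every Torch interface that uses this data type, using FFI and the new API to avoid lightuserdata. This preserves the stability and performance of the LuaJIT virtual machine as far as possible.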

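P.S. There is one last option: wait for a second Mike Pall to appear. Could Peter Cawley be that person?

LuaJIT's virtual machine code really is neither easy to read nor easy to modify. It is a masterpiece all the same: the most elegant, fastest, smallest and most portable JIT available today.

Hats off to Mike!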
