Tracking Down AOSP Build Bugs

I’ve been asked how I tracked down an AOSP build issue on OS X (http://goo.gl/J9mOL), so I thought it might be worth putting the process up here so others can get an idea of what’s involved:

  1. I picked the first error that was stopping the build. A broken build may produce many error messages, but if you take them one at a time, in the order they occur, fixing an early failure often fixes several later ones too, so you don’t waste time fixing symptoms of a problem rather than its cause.

In this case the first problem was an undefined symbol:

Undefined symbols for architecture i386:
  "llvm::Module::dump() const", referenced from:
      glsl_ir_to_llvm_module(exec_list*, llvm::Module*, GGLState const*, char const*) in libMesa.a(ir_to_llvm.o)

  2. I found where the build failed by running a single-threaded build (i.e. without passing -j to make) and looking for the last build message before the failure, which in this case was:

host Executable: mesa (out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/mesa)
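When the log is long, the same two pieces of information (the first error and the last build step announced before it) can be pulled out of a saved log. The following is only a sketch: the file name build.log, the function name first_failure, and the patterns used to recognise build-step lines and error lines are all assumptions, and they will need adjusting for other toolchains.

```shell
#!/bin/sh
# Sketch: report the first error in a saved build log, plus the last
# build-step line printed before it.
# Assumptions (not from the AOSP build system itself):
#   - build-step lines start with "host " or "target ", as in
#     "host Executable: mesa (...)"
#   - a failure shows up as a line containing "Undefined symbols",
#     "error:" or make's "***" marker
first_failure() {
    awk '
        /^(host|target) / { last_step = $0 }
        /Undefined symbols|error:|[*][*][*]/ {
            print "last build step: " last_step
            print "first error: " $0
            exit
        }
    ' "$1"
}

# Example (build.log is a hypothetical file name):
#   make mesa > build.log 2>&1
#   first_failure build.log
```

This only makes sense for a single-threaded build, since with -j the last build step printed is not necessarily the one that failed.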

  3. I found the module where the build command is created by grepping the Android build files for mesa with this command:

find . -name Android.mk | xargs grep mesa

  4. This gave me a module to rebuild to reproduce the problem. In this case the output included the following:

./external/mesa3d/test/Android.mk:LOCAL_MODULE := mesa

which meant I could reproduce the build problem using:

make clean ; make mesa

Note: Most problems will be limited to that project (in this case external/mesa3d). Unfortunately, this one was a little more complex.
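A plain grep for mesa matches every mention of the word, so on a large tree it can help to match only the line that defines the module. This is a sketch under my own assumptions: find_module_makefile is a hypothetical helper name, and it assumes the module is declared on a single line of the form LOCAL_MODULE := name.

```shell
#!/bin/sh
# Sketch: list the Android.mk files that define a given module.
# Assumes the declaration is on one line: "LOCAL_MODULE := <name>".
# The module name is used as a regular expression, so names containing
# regex metacharacters (for example "+") would need escaping.
find_module_makefile() {
    module="$1"
    root="${2:-.}"
    find "$root" -name Android.mk -exec \
        grep -lE "^[[:space:]]*LOCAL_MODULE[[:space:]]*:=[[:space:]]*${module}[[:space:]]*\$" {} +
}

# Example, run from the top of the source tree:
#   find_module_makefile mesa
```

The directory holding the makefile it prints is the project to look at first.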

  5. I could see that llvm::Module was not defined in the mesa3d project, which indicated the link error was a symptom of an earlier problem, so I used the showcommands make target to give me a complete build dump:

make clean ; make showcommands mesa

Warning: This generates a lot of output. In this case the command which was causing the build to fail was:

g++ -o out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/mesa -headerpad_max_install_names -Lout/host/darwin-x86/obj/lib -m32 out/host/darwin-x86/obj/lib/libbcc.dylib out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/egl.o out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/main.o out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/cmain.o out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/m_matrix.o out/host/darwin-x86/obj/STATIC_LIBRARIES/libMesa_intermediates/libMesa.a -Wl,-undefined,error -lpthread -ldl

As the definition of llvm::Module wasn’t in the mesa3d project, I knew that the problem must be in one of the other referenced libraries. From this link line I could see that the only library built during the AOSP build that wasn’t in the mesa project was libbcc, so I looked for where that was built using a similar command line to step 3:

find . -name Android.mk | xargs grep libbcc

which had the following line in its output:

./frameworks/compile/libbcc/Android.mk:LOCAL_MODULE := libbcc

so my attention switched to frameworks/compile/libbcc.
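Because showcommands produces so much output, it can be easier to save it to a file and pick out the failing link command and its library inputs mechanically. The sketch below assumes the output was saved to a file (build.log is a hypothetical name) and that each command sits on a single line; both helper names are my own.

```shell
#!/bin/sh
# Sketch: print every logged command that writes the given output file,
# i.e. one whose arguments include "-o <output>".
find_command_for_output() {
    awk -v out="$1" '
        { for (i = 1; i < NF; i++)
              if ($i == "-o" && $(i + 1) == out) { print; next } }
    ' "$2"
}

# Sketch: read a command line on stdin and print the arguments that
# name a library file (.a, .so or .dylib). Libraries passed as -lfoo
# are not listed.
link_libraries() {
    tr -s ' ' '\n' | grep -E '[.](a|so|dylib)$'
}

# Example (build.log is a hypothetical file name):
#   make showcommands mesa > build.log 2>&1
#   find_command_for_output \
#       out/host/darwin-x86/obj/EXECUTABLES/mesa_intermediates/mesa build.log |
#       link_libraries
```

For the link line shown earlier, this would list libbcc.dylib and libMesa.a, which is the short list of candidates to check for the missing symbol.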

  6. I checked the libbcc project for references to external/llvm (where the missing symbol was defined) and found:

Android.mk:include $(LLVM_ROOT_PATH)/llvm-device-build.mk
Android.mk:include $(LLVM_ROOT_PATH)/llvm-host-build.mk

which indicated I was in the right place.

  7. I have access to a copy of gitweb configured for the AOSP source code, so I could see that only a few commits in the frameworks/compile/libbcc project existed solely on master, and I decided to try rolling back the last few commits one by one. This is a bit of a heavy-handed approach, but I could see libbcc was the problem (because it was the only non-mesa library being built), I knew that the problem had started recently, and I believed stepping back through the few most recent check-ins would be quicker than analysing each patch and its effect on the build process (especially given my lack of familiarity with libbcc and mesa).

It’s always worth looking through the patches in case anything pops out at you as being wrong, but in this case there was nothing obvious I could see.

  8. I cycled through git checkout [commit_id] in the libbcc project followed by make clean ; make mesa, going back through each commit in the log until the build completed.
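I did this by hand, but the same walk back through the log could be scripted. The following is a sketch rather than what I actually ran: first_passing and try_libbcc_rev are hypothetical helper names, and the limit of 10 commits in the usage line is arbitrary.

```shell
#!/bin/sh
# Sketch: read candidate revisions (newest first) on stdin, run the
# given command with each revision appended as its last argument, and
# print the first revision for which the command succeeds.
first_passing() {
    while read -r rev; do
        # stdin is redirected so the command cannot swallow the
        # remaining revisions
        if "$@" "$rev" </dev/null >/dev/null 2>&1; then
            echo "$rev"
            return 0
        fi
    done
    return 1
}

# Hypothetical check for this case: check out one libbcc commit, then
# rebuild mesa from clean. Run from the top of the AOSP tree.
try_libbcc_rev() {
    git -C frameworks/compile/libbcc checkout -q "$1" &&
        make clean && make mesa
}

# Example:
#   git -C frameworks/compile/libbcc log --format=%H -n 10 |
#       first_passing try_libbcc_rev
```

The libbcc project is left checked out at the first commit that builds (or at the oldest one tried if none do). For longer histories, git bisect run performs the same kind of search with fewer builds.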

  9. I posted the email showing which commit to check out to fix the build problem.