Wednesday, July 6, 2016

Keras Installation on Windows - CPU performance


1. Install the basics. Start with this Stack Overflow post:

  • Install TDM-GCC x64. (Does it need to go at the beginning of PATH? Probably not.)
  • Install Anaconda x64.
  • Open the Anaconda prompt.
  • Run conda update conda (you might need to open the command prompt with admin privileges).
  • Run conda update --all
  • Run conda install mingw libpython
  • Install the latest version of Theano: pip install git+git://github.com/Theano/Theano.git
  • Install the latest Keras: pip install git+git://github.com/fchollet/keras.git
  • Point Theano at the TDM-GCC compiler by setting the cxx option in the THEANO_FLAGS environment variable: set THEANO_FLAGS=floatX=float32,device=cpu,cxx=E:\\TDM-GCC-64\\bin\\g++.exe
  • Check that a simple Keras "Hello World" works.
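
Before blaming Keras for a failing Hello World, it can help to confirm that THEANO_FLAGS really points cxx at an existing g++. The script below is a minimal sketch of such a check; parse_theano_flags and check_cxx are hypothetical helper names, not Theano APIs. It splits the variable on commas and on the first "=" of each entry, which is my understanding of how Theano reads it. Run it from the same Anaconda prompt where you ran set, since set only affects that session.

```python
import os


def parse_theano_flags(flags):
    """Split a THEANO_FLAGS-style string into a {name: value} dict.

    Entries are separated by commas; each entry is split on its first '='.
    Values may contain spaces (e.g. the blas.ldflags entry).
    """
    result = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("THEANO_FLAGS entry without '=': %r" % item)
        result[key.strip()] = value.strip()
    return result


def check_cxx(flags):
    """Return a list of problems with the cxx entry (empty list means OK)."""
    cxx = flags.get("cxx")
    if not cxx:
        return ["cxx is not set"]
    # On Windows, normpath collapses doubled backslashes like E:\\TDM-GCC-64
    if not os.path.isfile(os.path.normpath(cxx)):
        return ["cxx does not point to an existing file: %s" % cxx]
    return []


if __name__ == "__main__":
    flags = parse_theano_flags(os.environ.get("THEANO_FLAGS", ""))
    problems = check_cxx(flags)
    print("THEANO_FLAGS entries: %r" % flags)
    print("\n".join(problems) if problems else "cxx looks OK")
```

It uses only the standard library and runs on both Python 2.7 and 3, so it works whichever Anaconda you installed.
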
2. Now let's improve performance. Here I use the CPU, not the GPU. If you have a good NVIDIA GPU, check the Keras CUDA installation guide instead.

  • Change your BLAS library: OpenBLAS or MKL can give you a 400% speedup. Download a good BLAS library and add its \bin folder to the system PATH. I used OpenBLAS 0.2.14 and got a huge boost. Then update the THEANO_FLAGS environment variable:
    set THEANO_FLAGS=floatX=float32,device=cpu,cxx=E:\\TDM-GCC-64\\bin\\g++.exe,blas.ldflags=-LE:\\code\\openblas\\bin -lopenblas

    I also tried the Intel MKL library, which should be faster, but could not configure it successfully for Keras. (If you did, please leave a comment.)
  • OMP (OpenMP): usually a 10-20% boost. Set the number of threads (benchmark different values on your system; maybe 4 is better than 2):
    set OMP_NUM_THREADS=2
    Then update THEANO_FLAGS:
    set THEANO_FLAGS=openmp=True,floatX=float32,device=cpu,cxx=E:\\TDM-GCC-64\\bin\\g++.exe,blas.ldflags=-LE:\\code\\openblas\\bin -lopenblas

    If you get this error:
    UserWarning: Your g++ compiler fails to compile OpenMP code. We know this happen with some version...
    then make sure your cxx is configured properly. On some machine configurations, though (my very old desktop, for example), I could not solve the issue.
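
Picking OMP_NUM_THREADS comes down to timing the same workload with different values. Here is a minimal Python 3.5+ sketch of that loop (my own helper, not part of Keras or Theano). It reruns a command in a child process with only OMP_NUM_THREADS overridden and keeps the best wall-clock time per thread count. It assumes THEANO_FLAGS (with openmp=True) is already set in the prompt you launch it from, and that you have a short Keras training script to time; the script name in the usage comment is hypothetical.

```python
import os
import subprocess
import sys
import time


def time_with_threads(command, thread_counts, repeats=3):
    """Run `command` under each OMP_NUM_THREADS value.

    Returns {thread_count: best wall-clock seconds over `repeats` runs}.
    Taking the best run reduces noise and helps keep one-off costs, such as
    Theano compiling its C code on a cold cache, out of the comparison.
    """
    results = {}
    for n in thread_counts:
        # Copy the current environment (including THEANO_FLAGS) and override
        # only the thread count for the child process.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        best = None
        for _ in range(repeats):
            start = time.perf_counter()
            subprocess.run(command, env=env, check=True)
            elapsed = time.perf_counter() - start
            if best is None or elapsed < best:
                best = elapsed
        results[n] = best
    return results


# Example usage; "my_keras_benchmark.py" is a hypothetical script name,
# substitute a short Keras training run of your own:
#
#   timings = time_with_threads([sys.executable, "my_keras_benchmark.py"],
#                               [1, 2, 4])
#   for n in sorted(timings):
#       print("OMP_NUM_THREADS=%d: %.2f s" % (n, timings[n]))
```

Running each setting in a fresh process matters because the thread count is read from the environment, so changing it inside an already running Python session may have no effect.
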

FAQ:
InvalidValueError: InvalidValueError  ...
        type(variable) = TensorType(float32, (True, True))
        variable       = TensorConstant{(1L, 1L) of inf}
        context        = ...
  TensorConstant{(1L, 1L) of inf} [id A]

Check THEANO_FLAGS: mode should not be set to debug mode (debug_mode). If that does not help, run "conda update --all".
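
To check the mode entry without reading the whole variable by eye, here is a small standard-library sketch; flag_value and uses_debug_mode are hypothetical helper names. It compares case-insensitively against DebugMode and DEBUG_MODE, the spellings I believe Theano accepts for its debug mode, which also catches the lowercase debug_mode written above.

```python
import os

# Spellings of Theano's debug mode, compared case-insensitively.
# (Assumption: DebugMode and DEBUG_MODE are the names Theano accepts.)
DEBUG_MODE_NAMES = {"debugmode", "debug_mode"}


def flag_value(flags, name):
    """Return the value of `name` in a THEANO_FLAGS-style string, or None.

    If the entry appears more than once, this sketch keeps the last one.
    """
    value = None
    for item in flags.split(","):
        key, sep, val = item.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


def uses_debug_mode(flags):
    """True if the mode entry selects Theano's (slow, strict) debug mode."""
    mode = flag_value(flags, "mode")
    return mode is not None and mode.lower() in DEBUG_MODE_NAMES


if __name__ == "__main__":
    flags = os.environ.get("THEANO_FLAGS", "")
    if uses_debug_mode(flags):
        print("THEANO_FLAGS selects debug mode; remove mode=... or use FAST_RUN")
    else:
        print("mode is %r (None means Theano's default)" % flag_value(flags, "mode"))
```
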
