Sunday, May 1, 2016

Python after C#/Java - Part 2 - Classes


Classes

class Account(object):  # inherits object directly

    instanceCount = 0  # like a static variable in Java

    # this is similar to a Java constructor, called automatically during creation: Account(500)
    def __init__(self, balance, someBool=True):
        self.Balance = balance    # note: no need to declare members beforehand, nice!!!
        self.SomeBool = someBool  # public style
        self._XProtected = 5  # one '_' prefix is like Java-public, but the programmer asks politely
        self.__YPrivate = 8   # two '_' prefix is like Java-private. you'll get an exception when accessing it
        Account.instanceCount += 1

    # destructor (usually unused, like in Java), called when no reference exists any more,
    # for example after x = None, or an explicit del(x)
    def __del__(self):
        Account.instanceCount -= 1  # static variable: note the usage of Account. and not self.

    # like the toString method. optional, of course
    def __str__(self):
        return "balance: {0:f}".format(self.Balance)

    # regular method
    def deposit(self, x):  # note: when calling it, "self" is not passed explicitly
        self.Balance += x

Usage:
account = Account(500)
account._XProtected = 666          # works, but the '_' means the programmer asked you not to do it
print(account._XProtected)         --> 666
print(account.__YPrivate)          --> AttributeError
print(account._Account__YPrivate)  --> 8   # showing there is no real way in Python to defend against this. if someone wants to, he can access it anyway



inheritance and static variables


class Counter:
    instanceCount = 0
    def __init__(self):
        type(self).instanceCount += 1  # and not Counter.instanceCount, because that would lump all the subclasses together
    def __del__(self):
        type(self).instanceCount -= 1

class Account(Counter):
    def __init__(self, x, y, z):
        Counter.__init__(self)  # an explicit call is needed!

class MultipleInherit(Counter, Shouter):
    def __init__(self, x, y, z):
        Counter.__init__(self)
        Shouter.__init__(self, y)
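To see why type(self) matters, here is a minimal sketch (class names invented) where each subclass keeps its own count while the base class attribute is left untouched:

```python
class CounterBase:
    instanceCount = 0
    def __init__(self):
        # type(self) resolves to the subclass, so the += creates/updates
        # an attribute on the subclass itself, not on CounterBase
        type(self).instanceCount += 1

class BankAccount(CounterBase):
    pass

class Wallet(CounterBase):
    pass

BankAccount(); BankAccount(); Wallet()
print(BankAccount.instanceCount)  # 2 -- each subclass gets its own counter
print(Wallet.instanceCount)       # 1
print(CounterBase.instanceCount)  # 0 -- the base class attribute was never touched
```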

storage optimization (__slots__)

each instance has a built-in __dict__ hashmap, which contains all the dynamic members. It means that you can always add members to an instance, but it is heavier in RAM.
account = Account(500)
account.dynamicNewMember = 8  # works just fine

If you instantiate millions of these instances, instead of using the built-in __dict__, you can use a tighter static structure. Note: Don't optimize this way unless you have millions of instances.
Use __slots__ and define the member names beforehand, like:
class AccountWithLessStorage:
    __slots__ = ['Balance', 'SomeBool', '_XProtected', '__YPrivate']
    # everything else in the class is exactly the same, including the usage in __init__
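A minimal sketch of the trade-off (class name invented): with __slots__ there is no per-instance __dict__, so adding an undeclared member fails:

```python
class SlottedAccount:
    __slots__ = ['balance']  # the only member an instance may have
    def __init__(self, balance):
        self.balance = balance

s = SlottedAccount(100)
s.balance = 200  # fine: 'balance' is declared in __slots__
try:
    s.dynamicNewMember = 8  # worked on a regular class, but not here
    rejected = False
except AttributeError:
    rejected = True
print(rejected)  # True -- there is no __dict__ to put the new member in
```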
   

MetaClass , annotations and Reflection

Java reflection-like operations are much easier in Python:
instance.__dict__   # is a dictionary of the instance members, and the class __dict__ holds the methods, so calling the method append on a list can be done "in reflection" very easily.
lst = [1, 2, 3]
non-reflection:  lst.append(4)
reflection:      list.__dict__["append"](lst, 4)  # the 1st parameter of an instance method is "self"
Use decorators to wrap a method
Use metaclass for decorators like count call/timer/logging.
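As a sketch of the decorator idea, here is a hypothetical @timed wrapper (not a standard-library decorator) that counts calls and accumulates elapsed time:

```python
import functools
import time

def timed(func):
    @functools.wraps(func)  # keeps the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.elapsed += time.time() - start
    wrapper.calls = 0
    wrapper.elapsed = 0.0
    return wrapper

@timed
def deposit_fee(amount):
    return amount * 0.01

deposit_fee(100)
deposit_fee(200)
print(deposit_fee.calls)  # 2
```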



Monday, April 25, 2016

TensorFlow Installation


TensorFlow installation

Tried to install on GCE using two methods.
The first is the default installation instructions; when running the MNIST demo, it appears a bit slow (650-700ms per step).
Also tried this script.
Also tried to install from source: see original page

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow -b r0.7
$ sudo apt-get install python-numpy swig python-dev
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip
$ wget https://github.com/bazelbuild/bazel/releases/download/0.2.1/bazel_0.2.1-linux-x86_64.deb
then google "how to install .deb"
$ ./configure    # --> say you don't have a GPU (my case; if you do, read up yourself...)
$ bazel build -c opt --copt=-mavx //tensorflow/cc:tutorials_example_trainer


Jupyter notebook
look for their installation instructions (pip..., including the dev packages too)
I also installed plotting:
sudo apt-get install libfreetype6-dev libxft-dev
pip install matplotlib
open the port in the GCE console:  gcloud compute firewall-rules create tcp8888 --allow=tcp:8888
run in the linux shell (ip 0.0.0.0 is a must, otherwise only localhost will work):
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser
plotting: if the plot is not visible, add this to the cell: %matplotlib inline.  see a permanent solution here



installed via:
sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl

$ python -m tensorflow.models.image.mnist.convolutional
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 7.2 ms
Minibatch loss: 12.053, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 698.7 ms
Minibatch loss: 3.279, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.1%
Step 200 (epoch 0.23), 670.1 ms
Minibatch loss: 3.503, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.6%
Step 300 (epoch 0.35), 666.0 ms
Minibatch loss: 3.199, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 3.4%
Step 400 (epoch 0.47), 658.3 ms
Minibatch loss: 3.239, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 2.6%
Step 500 (epoch 0.58), 657.4 ms
Minibatch loss: 3.283, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 2.6%

Wednesday, April 20, 2016

Python over C# / Java

Most people say that Python is more elegant than Java. They are correct :)

Let's go over the basics, and I believe you will agree in the end...


Syntax


Curly brackets for scopes are gone. We use indentation instead.
if x > 10:
    print("somewhat big")
    if x > 50:
        print("big")

Strings can be anything between 'x' or "x" or """x"""; in the latter case, a new line in the editor is translated to a new line (\n) in the string, so what you see is what you get.


and, or and not are the actual operators(!) instead of &&, ||, !.
"in" replaces the contains operator: if 'a' in ('a', 'b', 'c')
[not so great?] instead of max = (a>b) ? a : b, use max = a if (a > b) else b
very easy to wrap a function with a decorator, like @args-check




loops

while and for loops have an optional "else" clause, which triggers only when the loop exits normally on the condition's failure (and not via "break"). This is very useful and saves the ugly code at the end of Java loops, where you are unsure what caused you to leave the loop.

for loops always use "in":
for odd in range(1, 10, 2): print(odd)   # from 1, as long as <10, jumps of 2
for item in list: print(item)
for key in dictionary: print(key)
for value in dictionary.values(): print(value)   # (itervalues() in Python 2)
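The loop "else" mentioned above can be sketched like this:

```python
# the else clause runs only when the loop finished without hitting a break
items = [1, 3, 5, 9]
found = None
for item in items:
    if item % 2 == 0:
        found = item
        break
else:
    print("no even number found")  # printed, because no break happened

print(found)  # None
```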

built in data structures

no arrays! Python asks us to use a higher-level structure, like list.
list = []   is like a Java ArrayList of Objects
tuple = ()  is an immutable ArrayList of Objects

some syntactic sugar for all the sequence data structures (list, tuple, and string too!):

list[0]
list[0:3]  # returns a slice with elements 0,1,2
list += ["hello", "world"]   # adds all elements of the second list
if "hello" in list: print("world")

d = { "key1": "value1", "key2": "value2" }
len(d)  # returns 2
d.update({ "key1": "updated1", "key100": "value100" })  # update by another dict
if "key100" in d: print("this was expected")
d["key100"] = "updated100"
del(d["key100"])  # both key and value are deleted
d.items()  # a list of (key, value) tuples: [("key1", "updated1"), ("key2", "value2")]
dict(d.items())  # transforms a list of tuples back into a dictionary

set is similar to a Java HashSet
mySet = { "key1", "key2", "key3" }
mySet = set(["key1", "key2", "key3"])


Better than the ugly:  if (list != null && list.length > 0)

if list:   # None is falsy, and an empty list is falsy too
  # do something with the list
else:
  print("empty list or None; take the exception path!")


Super cool documentation and unit-testing feature

the first statement inside a method can be a string, with code examples.
def add(x, y):
  """ adds x to y and returns the sum
  >>> add(1, 1)
  2
  >>> add(-1, 101)
  100
  """
  return x + y

the documentation is accessed using add.__doc__
doctest can run all the code samples in the documentation.

import doctest
if __name__ == "__main__":  # optional "if": run this code only when the module is not imported
  doctest.testmod()
it will search all the method documentation and run each example. if there is a problem, it will output the expected and the actual value.


printing


simple. in this case we left-align the name in a column of size 20, then the price as a float:
print("Name={name:<20s} Price:{price:8.2f}".format(name="golden-crown", price=2000.01))
print("Name={0:<20s} Price:{1:8.2f}".format("golden-crown", 2000.01))  # same, positional


in many cases you have variables instead; you can do this trick and use the general function locals(), which passes the dictionary of the current scope's values:
print("Name={name:<20s} Price:{price:8.2f}".format(**locals()))  # assuming local variables name and price exist
As a sidenote: You can use the same trick with your own dictionary.
  def foo(a, b, c, d): print(a, b, c, d)  # method with four arguments
  myDic = { "c": 0.4, "d": "description", "a": 55, "b": True }  # dictionary somewhere in the code
  foo(**myDic)   # will pass the right values from the dictionary to the right arguments
  myList = [55, True, 0.4, "description"]
  foo(*myList)   # does the same, but here the order is important

print itself is quite strong:
print(a, b, sep="\n")   # separates with a newline; the default is one space
fh = open("data.txt", "w")
print("let's write to a file. why should it be difficult or different than a regular print?", file=fh)
fh.close()



exceptions (see the else part)

try:
    f = open("file.txt")
except IOError as e:
    print("IO exception number {0}: {1}".format(e.errno, e.strerror))
except (ValueError, InventedError):
    print("other known error happened")
except:  # any other, unknown error
    print("unknown error; this one we propagate up:", sys.exc_info()[0])
    raise
else:
    # no exception happened, read the file
    print(f.readlines())
    f.close()
finally:
    print("just like in java, and very optional")

assert is syntactic sugar for:  if not <condition>: raise AssertionError("msg")
assert x > 0, "x must be above zero, it wasn't! fix this for the assertion to pass"




Thursday, April 14, 2016

Choosing a Home Router

It's remarkable how many options there are, and how unorganized the information is.
Let's jump to the bottom line, and then get back to details and spec:
As of 2016 my recommendations:
If your modem speed is 50-100Mbps, buy VDSL2, 802.11n (or better) and 10/100Mbps or better.
If your modem speed is higher, or if you pass large files inside the network, make sure your wireless and wired speeds match it.

And for the gory details:

Wireless speed (this is the total: if you have 2 clients, they will share the bandwidth)
2.4GHz/5GHz need to be supported on your laptop/phone. Most devices support 2.4GHz, and the newer ones (>2014) also support 5GHz, which is much less crowded.
In my neighborhood, the former caps me at 6-10MBs due to interference, while the latter can achieve 30MBs.

 
802.11ac - (both 2.4 and 5GHz in parallel) - up to ~2Gbps
802.11n  - (both 2.4 and 5GHz in parallel) - max of 150Mbps (1 antenna) to 450Mbps (3 antennas)
802.11a  - (only 5GHz)   - 54Mbps (often less)
802.11g  - (only 2.4GHz) - max 54Mbps (sometimes more, often less)
802.11b  - 11Mbps

Wire speed
10/100/1000 Mbps - means it supports network cards of 10Mbps, of 100Mbps and of 1000Mbps.
10/100 Mbps - only supports up to 100Mbps
As of 2016, I suggest buying at least 10/100Mbps.

Modem speed
Do you have a cable-type or a DSL-type modem? The list below covers the DSL types only:
VDSL, VDSL2 : max of 100Mbps download and upload (usually lower with distance)
ADSL2+ : max of 24Mbps download and 3.3Mbps upload
ADSL   : slower

Monday, January 11, 2016

Priority based decision of event


This is a domain-specific problem, not a general technology post, so read it as a computer-science riddle...

The problem

Match-making site between 2 people (like a dating site) or 2+ people activity (like a community basketball game of 5 on 5).
  • Each person can have a list of properties and preferences for activities, for example: age-group, distance from home, activity-type.
  • A person can choose multiple overlapping activities with priorities between them.
  • The selection should be bi-directional, and we should make sure there will not be "sterile" selections, where the other person never even had a chance to choose back.
Few other requirements:
  • There will be a rating system for people, after enough activities were made.

Discussion

Approaches:
CLUSTERING - TBD
One approach will be to try and cluster the data before any user-query is given, so that later queries will be answered faster. Many search engines use this approach.
The issue here is defining the clustering criteria. It is clear that we can cluster by location (NYC is in a different cluster than San Francisco).
We may also cluster within NYC by age (a year per cluster).
And by degrees (none, 1, or 2 and above).
So within NYC there are 40 age clusters (the 20-60 age group) x 3 degree-types = 120 clusters.
Although this can reduce the query time, a query on NYC women, any degree, age 25-35 will still yield a lot of data (8M people in the NYC area; this can yield 8M/(2x4) = 1M. A considerable reduction, but still a big cluster...)

QUERY

Storage: Each of the (1M) users has a document/description of up to 1KB, of which only a small part can be used for filtering. A total of about 1GB of storage.
Calculation: We assume the DB sort-by field is not enough, and that we need a smarter score function (like this)
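To make the "smarter score function" concrete, here is a hypothetical sketch; the fields, weights and scales are all invented for illustration:

```python
def match_score(person, candidate):
    """A made-up weighted score; higher means a better match."""
    score = 0.0
    # closer in age => higher score (weight and scale are invented)
    score += max(0.0, 10.0 - abs(person["age"] - candidate["age"]))
    # reward shared activity types
    shared = set(person["activities"]) & set(candidate["activities"])
    score += 5.0 * len(shared)
    # penalize distance (field name is invented)
    score -= 0.5 * abs(person["km_from_city"] - candidate["km_from_city"])
    return score

a = {"age": 30, "activities": ["basketball"], "km_from_city": 2}
b = {"age": 28, "activities": ["basketball", "chess"], "km_from_city": 4}
print(match_score(a, b))  # 8 + 5 - 1 = 12.0
```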

[V1] Let's first start with the trivial but slow solution.

  • On init, we will load all N persons into memory, check the match between every pair (N^2), and save a mapping of person1-person2 = 50%, person1-person3 = 95%, etc. We will have N^2/2 such mappings.
  • On a change in a person's preferences, we will recheck all his matches (N).
  • On a new person, we will check all his matches (N).

Assuming there are 1M persons, the match matrix can grow to be quite big (1M x 1M).
In addition, we need to save the history of every user's choices (let's say 1K each) and filter them out, so they will not appear again. The issue here is that when we sort the results, for an old user we will always filter out the first few thousand results.


[V2] Let's try to refine this. Since we know that only people with close location proximity and age proximity are relevant, we first filter out the big mismatches (using a database query).

On-demand, but cache query results
On Init, we do nothing
On change in preferences, we do the N/100 query, sort, and cache the results which will serve our paging result.
On new-person we do another query, and add the recent results to any cached result, as a secondary list.

To illustrate:
PersonA has a cached query [10,000 sorted results, of which we keep the first 5,000] and a list of people already ranked by the user [the last 320 people he saw].
A new PersonZ arrives and creates a query. PersonC just changed his location to far away.
During the next pagination, PersonA will now also have a new candidate to merge in [PersonZ, score 54] and a removal [-MINUS- PersonC].

The first query (each time a user changes his preferences) requires ranking N/100, but it is cached.
The bigger issue, is how to save the updates to the cache. If a new user arrives, we may need to update 1,000-10,000 options.  No DB can simply persist this load. It may be solved by having the second-table in-memory with no-persistence.
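One possible sketch of serving a page from the cached sorted list plus a small secondary list of new arrivals, as the V2 refinement describes (all names and scores invented):

```python
import heapq

cached = [("personB", 95), ("personC", 80), ("personD", 60)]  # cached query, sorted by score desc
recent = [("personZ", 85)]                                    # arrivals since the query ran

def next_page(cached, recent, page_size):
    # lazily merge the two score-sorted lists and take one page;
    # the key turns descending score into the ascending order merge expects
    merged = heapq.merge(cached, recent, key=lambda p: -p[1])
    return list(merged)[:page_size]

print(next_page(cached, recent, 3))  # [('personB', 95), ('personZ', 85), ('personC', 80)]
```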

[V3] Only on-demand, no caching
Starting at a certain hour, only the logged-in users are the subset to query from. This further reduces the user set considerably. Also use ElasticSearch to do the calculation in the server farm,
see function_score there.
There will be no caching (except maybe a client-side cache of the next page of results, 5-10 tops).
Problem: since we re-rank everything on each call, it must be super fast, and it is hard to imagine how we can achieve that when we need to rank N/100 results.

[V4] On the first call, do a query on N/100 and save the result and the time of the query.
On the second call, query only the new data whose preferences changed, and merge it to create a new snapshot.
Problem: a snapshot change can be an added new person or an updated (higher) score, and merging those correctly is tricky.






BackEnd as a service (BaaS)

Standing on the shoulders of giants


Building software is becoming simpler and simpler.
Server-logic abstractions come in the form of web services (like Google APIs and cloud databases).
Choosing the right level of abstraction means developing a website/app very fast, with most of the scaling problems already solved; instead of having an 'operations' team, you just use a credit card.

Why wasn't it done until now? Why is BaaS becoming so popular today?
Because of mobile. On the web, the developer already had his web server and the skill set to configure it to his needs. iOS/Android applications are hosted in the iOS/Google Play stores, so there is no web server, and their developers' skill set typically does not include the web. Hence the need arose, and many try to solve it.

A lot of the mobile apps, have common requirements from the BaaS:
Single-player games require: user-login, high-score, cloud save-game (upload file)
Social apps require: user-login, chat, uploading text and files,
All need analytics


  1. User Login abstraction
    1. Allow using identity from other site, like facebook/google.
    2. Allow email+password pair, including activation and password recovery
    3. Create the relevant user-tables and track their frequency of visiting 
  2. Analytics
  3. Push-notifications
  4. Database
    1. Usually non-SQL
    2. ** There are numerous hosted databases (MongoLab / Compose), but they each cover only one aspect
    3. ** ElasticSearch capabilities (an uncommon need - Compose / Bonsai / Qbox)
  5. Files hosting (images/videos, but not the app itself)
    1. simple upload
    2. good download performance using CDNs.
  6. Chat messaging
    1. real-time chat (only a second delay between messages)
    2. history of the chat
    3. ** Pusher/PubNub



BaaS providers, see here
The most known: "Parse" and "Firebase" (but they do not provide it all...)
The less known: Buddy, QuickBlox , appriaries

Parse (now part of facebook)

REST Based (polling)
Mature, with great modules for Users, Analytics, Push, and a DB similar to MongoDB (not sure if it is actually the same interface) with a great online editor.
File hosting is lacking; polling based (no chat!); can be expensive.


Firebase (now part of Google)

Persistent connection based, which is totally great, except the cost per connection (user)
As of (11-Mar-2015) free: 50 Max connections,5 GB transfer, 100MB storage.
For 49$ a month -> 200 Max connections, 20GB transfer , 3GB storage



Pusher
Free: Max 20 connections, 100K messages.
For 49$ Max of 500 connections, 1M messages

HTML5 cross-platform (mobile/desktop) games


CocoonJS by Ludei

Replaces PhoneGap (+PG Build) and supplies a Chromium browser.
It also provides libraries for some native features/payment/push etc.
Your code should work on both desktop (without it) and mobile (with it). Need to check how the native/payment/push parts work on desktop.
No special API for the game code itself, you can use any js libraries you want.

Compilation is done in the cloud. The result is a zip with debug & release apks, but sadly they are unsigned. you need to download them and, using the Android Java SDK:
%JAVA_HOME%\bin\jarsigner -verbose -keystore <keystore> -storepass <store-pass> -keypass <key-pass> <apk_unsigned.apk> <key>
<android-sdk>\build-tools\21.1.2\zipalign.exe -v 4 <apk_unsigned.apk> <apk_signed.apk>
Then copy the signed apk to the Android device (or send it via Dropbox) and install.


Famo.us

Again replace the deployment with Chromium.
It has a special API for the game/app code: you don't work with the DOM regularly, and thus can't use jQuery for example, see here. Not sure about the level of abstraction of the pre-made widgets (ready-to-use list-view etc.)
Great performance see this codepen on mobile


Game engines
nice list . another list review

low-level
pixi.js is a popular low-level engine
createjs - a set of libs (easeljs, tweenjs, soundjs, preloadjs)

turbulenz (no real mobile support)
http://biz.turbulenz.com/developers  - (like the game 'polycraft')
It uses WebGL for rendering, even for 2D games, with great performance on desktops; see this series of articles on moving to HTML5.
But this means that on a lot of mobile devices the game will not work at all, not even in Chrome... There is still no native app for it, but the 2013 news says they are working on it.
Another note on the platform: the game engine can be used free of charge, and they encourage you to use their servers for hosting, multi-player, badges etc. The way they make money is 30% of the payment services, if you use theirs. Note that you don't need to.


TO TEST:
http://phaser.io/  ,  http://www.kiwijs.org/ ??
http://www.gameclosure.com/ - movie - comes with a few pre-made "engines" for platformer/menu/maps etc
http://impactjs.com/


TOO BASIC  - http://craftyjs.com/ , http://melonjs.org/
 http://www.pandajs.net/ - no big games as of yet