Tuesday, January 27, 2015

Scale your projects with Eventlet (a concurrent networking library) + monkeypatching, without changing your existing code base

So what do you do if you have an existing Python code base but are unable to scale it because of various blocking dependencies like file I/O, network connections, etc.?

Look into Eventlet! (Thanks to my co-worker who suggested not looking into Twisted to solve the problem, since that was just plain painful, but into Eventlet instead.)

It's easy to install and easy to integrate into an existing code base (if it's not too late, and if you are not using any C-based libraries in your Python code base).

They call it "greening"

So I ran into a similar blocking issue, and the way I greened my application was simply by monkeypatching the standard library.

pip install eventlet

Then add these lines as early as possible when your program starts (to avoid late-binding problems):

import eventlet
eventlet.monkey_patch()  # replaces blocking stdlib modules (socket, time, thread, ...) with green versions
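By default monkey_patch() greens everything it can; if you only want specific modules patched, it also accepts per-module flags. A minimal sketch:

import eventlet
# green only the socket and time modules, leaving the rest of the stdlib alone
eventlet.monkey_patch(socket=True, time=True)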

Now think, at a high level, about where the blocking calls are, so you can spawn green threads for each blocking call.

(For example, let's say the blocking function makes a network connection or does some other blocking I/O, is called foo(p1, p2), and uses a standard Python library.)

pool = eventlet.GreenPool()  # a pool of green threads
pool.spawn_n(foo, p1, p2)    # run foo(p1, p2) in a green thread, discarding its result
pool.waitall()               # wait until every spawned green thread has finished

Voila! You are async again! No matter how much time foo takes to complete, you can move ahead :)
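To make that concrete, here is a minimal self-contained sketch of the whole pattern (foo and its sleep are hypothetical stand-ins for a real blocking call):

import eventlet
eventlet.monkey_patch()  # do this first, so the stdlib imported below is greened

import time

def foo(p1, p2):
    time.sleep(2)  # stand-in for blocking I/O; the greened sleep yields to other green threads
    print("done:", p1, p2)

pool = eventlet.GreenPool()
for i in range(5):
    pool.spawn_n(foo, i, i * 2)
pool.waitall()  # all five calls overlap, so this takes about 2 seconds, not 10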

Friday, January 16, 2015

Metrics on your way: What do you do when you want to see statistics of your project? (graphite + statsd)

Okay, so you have your project working spick and span, but now you want to monitor its health. How much data is it handling? How well is it handled? How long does this call take? How many successes and how many failures? Etc., etc.
Basically, it's like getting the health stats of your project. So how do you do it?
These are the phases my project went through:
  1. We all start with good old print statements :)
  2. At some point they have to be removed, so I introduced logging.
  3. Big log files were generated; we needed a better understanding of all this data.
  4. Splunk (www.splunk.com) came to the rescue: feed it any log file, put in the proper queries, and it will give you nice data, graphs, trends, etc.
  5. When Splunk and the big log files became the concern, I introduced a database (SQLite), did some smart queries, and displayed it all neatly on a Django dashboard. It worked like a charm (I still love this option) and looked very professional, but then, alas, scalability issues!
  6. Then came "statsd + graphite": generate counters/stats using statsd; run statsd, configure it to feed graphite, run graphite, and watch the magic.
Here are very nice instructions on installing and configuring statsd and graphite, thanks to DigitalOcean: https://www.digitalocean.com/community/tutorials/installing-and-configuring-graphite-and-statsd-on-an-ubuntu-12-04-vps

I used the Python "statsd" library, and using it was as simple as this:

pip install statsd

>>> import statsd
>>> c = statsd.StatsClient('localhost', 8125)
>>> c.incr('foo')  # Increment the 'foo' counter.
>>> c.timing('stats.timed', 320)  # Record a 320ms 'stats.timed'.
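Beyond counters, the same client can time whole blocks of code and report gauges. Here's a small sketch (the metric names and the sleep are just illustrative placeholders):

import time
import statsd

c = statsd.StatsClient('localhost', 8125)  # assumes a statsd daemon on the default port

with c.timer('db.query'):  # elapsed milliseconds are sent as 'db.query' on exit
    time.sleep(0.1)        # stand-in for real work

c.gauge('queue.depth', 42)   # report an absolute value
c.incr('requests', count=2)  # counters can step by more than one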

One issue I hit during the installation and configuration part was a Twisted error: "unknown command: carbon-cache".
I was able to bypass it by manually deleting Twisted (I was not using Twisted anywhere).

When I ran graphite locally, I was able to see my stats there and graph them.
Pretty cool!

Tuesday, January 6, 2015

Happy new year



Here are a few notes to self:
1) learn more new stuff
2) openstack
3) more python practice
4) more outreach "girls/women in tech"
5) more volunteering
6) network more and increase visibility
7) have fun!
8) take some relevant MOOCs