This and That!

Explorations in programming and computers.

Understanding Python Concurrency - Introduction

This post explores the core concepts of concurrency, so that we understand the terminology we will use in the rest of the series.

What is Concurrency?

From Wikipedia, Concurrency is a property of systems in which several computations are executing simultaneously and potentially interacting with each other.

Simple examples of concurrent systems are a network server that processes multiple client requests and a number-crunching job running across several CPUs.

Multitasking

Multitasking means a computer running multiple processes over the same period of time. Even on a single-core machine you can run multiple programs, because your computer is multitasking. At any instant it is executing instructions from only one program, but it constantly switches between programs and gives you the illusion that you are running many programs at once. Even on a multicore machine, when the number of tasks exceeds the number of CPUs, your computer still performs multitasking.

Concurrency implies multitasking.
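To make the switching concrete, here is a toy sketch of a single "CPU" taking turns between tasks. It is only an illustration: the names task and run_round_robin are made up for this post, and the tasks give up control voluntarily at each yield, whereas a real operating system can interrupt a program at any point.

```python
from collections import deque

def task(name, steps):
    """A toy task: each yield marks a point where the scheduler may switch."""
    for i in range(1, steps + 1):
        yield "{} step {}".format(name, i)

def run_round_robin(tasks):
    """Run one step of each task in turn until every task has finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # run exactly one step
        except StopIteration:
            continue                     # task finished; do not reschedule it
        queue.append(current)            # go to the back of the line
    return trace

trace = run_round_robin([task("A", 2), task("B", 3)])
print(trace)
# ['A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3']
```

Only one task runs at any moment, yet both make progress in interleaved steps, which is the illusion described above.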

What is Parallelism?

When we have multiple CPUs, each CPU can process a task at the same time as the others. This is called parallelism.

Difference between Concurrency and Parallelism

This SO thread explains it much better than I ever could.

Concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn’t necessarily mean they’ll ever both be running at the same instant. Eg. multitasking on a single-core machine.

Parallelism is when tasks literally run at the same time, eg. on a multicore processor.

Quoting Sun’s Multithreaded Programming Guide:

  • Parallelism: A condition that arises when at least two threads are executing simultaneously.

  • Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
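A small sketch of the "overlapping time periods" part of the definition, using Python's threading module. The worker function and the intervals dictionary are names I made up for the example. It only shows that the two tasks overlap in time; whether they ever execute at the same instant depends on the interpreter and the hardware.

```python
import threading
import time

intervals = {}                        # task name -> (start, end) timestamps
both_started = threading.Barrier(2)   # released once both tasks have started

def worker(name):
    start = time.monotonic()
    both_started.wait()   # neither task can finish before both have started
    time.sleep(0.05)      # stand-in for some work
    intervals[name] = (start, time.monotonic())

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

(a_start, a_end), (b_start, b_end) = intervals["A"], intervals["B"]
# Two time periods overlap when each one starts before the other ends.
overlap = a_start < b_end and b_start < a_end
print(overlap)  # True
```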

Nature of Programs

Programs typically execute by alternating between CPU processing and I/O handling. When a task performs I/O it must wait (sleep) while the underlying system carries out the I/O operation, and it is woken up when the operation is done.

A task is said to be CPU bound if it spends most of its time processing, with little I/O. A simple example is image processing.

A task is said to be I/O bound if it spends most of its time waiting for I/O. A simple example is file processing.

Typically most programs are I/O bound.
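Here is a rough sketch of why this distinction matters for concurrency. I am assuming time.sleep is a fair stand-in for waiting on I/O, and fake_io, run_sequential and run_threaded are hypothetical names. While one task waits, the others can use the time, so the threaded version finishes sooner.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay=0.1):
    """Hypothetical stand-in for an I/O call: the task just waits."""
    time.sleep(delay)
    return delay

def timed(fn):
    """Return how many seconds fn() took to run."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start

def run_sequential(n=4):
    for _ in range(n):
        fake_io()

def run_threaded(n=4):
    # One thread per task, so all the waiting happens at the same time.
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(lambda _: fake_io(), range(n)))

sequential_time = timed(run_sequential)
threaded_time = timed(run_threaded)
print("sequential: %.2fs, threaded: %.2fs" % (sequential_time, threaded_time))
```

The exact timings depend on the machine, so I have not included expected numbers.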

Nature of Concurrent Programs

Concurrent programs can come in different flavors.

  • A program having tasks running in the same memory space. They have simultaneous access to objects.
  • Tasks running in separate processes.
  • Tasks running on separate machines.
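The first two flavors can be sketched with the standard library. In this example (all names are my own), three threads append to one shared list, guarded by a lock because they have simultaneous access to it. The child process started with subprocess has its own memory and cannot see that list, so it reports back over stdout instead.

```python
import subprocess
import sys
import threading

shared = []               # one object, visible to every thread in this process
lock = threading.Lock()   # guards the simultaneous access

def append_item(item):
    with lock:
        shared.append(item)

threads = [threading.Thread(target=append_item, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))     # [0, 1, 2]

# A separate process has its own memory: its `shared` is a different object,
# so any result has to be sent back explicitly (here through stdout).
child_code = "shared = ['child']; print(len(shared))"
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True, check=True)
child_output = result.stdout.strip()
print(child_output, len(shared))   # 1 3
```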

There is a fair bit of commentary in Dabeaz’s slides, but I will skip it here as it seems more like fodder for programmer debates. I would still recommend going through it, if only to understand why “Python is slow” is often wrong.

Understanding Python Concurrency - Prologue

I have been programming in Python for a couple of years, and one thing I do not understand very well is how to write concurrent code in Python. If I am being very honest, I don’t understand the concept of concurrency very well either. This series of blog posts is an attempt to understand both concurrency and the tools that Python offers for it.

I will base my self-study on the slides from Dabeaz’s course on concurrency. The goal is not to become an expert on threads, but rather to stop being so ignorant about them.

I will try to understand each concept in depth, so this might be a long task, but hopefully it will be worth it. It’s very rare that one needs to use threads in Python, but I guess an understanding of the concepts will stand me in good stead.

Here goes nothing.

A presentation on slightly advanced python

As part of a talk series at work, I had to give a presentation. I chose to give it on the topic of “Advanced Python”, though I renamed the talk “Slightly Advanced Python” because I could not objectively decide what constitutes advanced Python.

The topics I covered as part of the talk were:

1. Python’s function calling mechanism

2. Magic Methods

3. Decorators

4. Iterators and Iterables

5. Generators

6. Using VirtualEnv.

Should I have included any other topic?

C,s&s from sid6376

Efficiently get a random item from Postgres using Django ORM

At my work, we often have to pick a random item from a queryset. The initial code that we wrote was a lot like this:

from random import randint

# Two queries: one to count the matching rows, one to fetch the row at a random offset.
count = MyModel.objects.filter(myfield__isnull=True).count()
rand_item = MyModel.objects.filter(myfield__isnull=True)[randint(0, count - 1)]

While this worked without many problems when we had a smaller dataset, we started having issues once the dataset reached the low hundreds of thousands of rows. The CPU usage of the database box would often reach 100% and the performance was very bad.

After some digging around I figured out it was because of the above lines of code. The count query was computationally intensive and horribly slow.

So I was stuck with finding a solution that did not completely overwhelm my somewhat modest database box. It turns out the Django ORM has a nice trick to order items randomly. Using it, we can fetch a random item in a more efficient way:

rand_item = MyModel.objects.filter(myfield__isnull=True).order_by('?')[0]

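Not only does it make fewer queries, it was also less computationally intensive for us than doing a .count(). The Django documentation does warn that order_by('?') can itself be expensive and slow depending on the database backend, so it is worth measuring on your own data.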
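To see what the two approaches ask of the database, here is a self-contained sketch using the standard library's sqlite3 module instead of Django and Postgres. The item table and its contents are made up, and the SQL is only my approximation of what the ORM sends: as far as I know, order_by('?') becomes roughly ORDER BY RANDOM() on Postgres, but the exact SQL depends on the backend.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, myfield TEXT)")
# Ten rows; every second one has myfield set to NULL.
conn.executemany("INSERT INTO item (myfield) VALUES (?)",
                 [(None,) if i % 2 else ("x",) for i in range(10)])

# Approach 1: count the matching rows, then fetch one at a random offset.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM item WHERE myfield IS NULL").fetchone()
offset = random.randint(0, count - 1)
row_by_offset = conn.execute(
    "SELECT id FROM item WHERE myfield IS NULL LIMIT 1 OFFSET ?",
    (offset,)).fetchone()

# Approach 2: let the database shuffle the matches and return the first one.
row_by_random = conn.execute(
    "SELECT id FROM item WHERE myfield IS NULL ORDER BY RANDOM() LIMIT 1"
).fetchone()

print(count, row_by_offset, row_by_random)
```

The first approach needs two round trips to the database, the second only one. Which is cheaper in practice depends on the table and the database, so this sketch makes no claim about timings.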

My database box was much happier and so were my co-workers who no longer had to deal with an unresponsive server. 

Python’s function calling model

Though people much more intelligent than me have explained Python’s calling model numerous times [1][2], I will still try to elaborate, as I find I understand things better when I explain them to somebody and when I try out examples.

Everything is an object

In Python, everything is an object.[3] Variables are essentially names bound to objects, not placeholders in memory as in languages like C++. Where a name binding can be used is defined by the scope of the variable, which is the block in which the variable is defined.

# Illustrating how variables behave in Python.
# Variables are just names assigned to objects.
a = "abc"
b = a
a += 'd'   # builds a new string and rebinds a; b still names the old one
print(a)
# Output: abcd
print(b)
# Output: abc
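The same behaviour can be checked with the is operator, which tells us whether two names refer to the same object. This is a small sketch of my own; same_before and same_after are just illustrative names.

```python
a = "abc"
b = a
same_before = a is b   # both names refer to one object
a += "d"               # strings are immutable, so this creates a new object
same_after = a is b    # a was rebound; b still names the original object
print(same_before, same_after, a, b)
# True False abcd abc
```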

Mutable and Immutable objects

In Python, objects are of two kinds: mutable objects, whose value can be changed after they are created, and immutable objects, whose value cannot.

Lists, dictionaries etc. are mutable objects, whereas tuples, strings etc. are immutable objects.
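A quick sketch of the difference (the variable names are mine): a list can be changed in place, and every name bound to it sees the change, while trying to change a tuple raises a TypeError.

```python
numbers = [1, 2, 3]
alias = numbers        # a second name for the same list
numbers.append(4)      # lists are mutable: the object is changed in place
print(alias)           # [1, 2, 3, 4]

point = (1, 2)
tuple_rejected = False
try:
    point[0] = 10      # tuples are immutable: item assignment is not allowed
except TypeError:
    tuple_rejected = True
print(tuple_rejected)  # True
```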

Calling mechanisms in python

So what calling mechanism does Python use?

Let’s see by means of some examples:

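a = ['hello', 'world']

def func1(a):
    # Rebinds the local name a to a new list; the caller's list is untouched.
    a = ['goodbye', 'world']
    print(a)

def func2(a):
    # Mutates the list object that the caller also refers to.
    a[0] = "goodbye"
    print(a)

func1(a)
# Output: ['goodbye', 'world']
print(a)
# Output: ['hello', 'world']
func2(a)
# Output: ['goodbye', 'world']
print(a)
# Output: ['goodbye', 'world']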

In func1, the changes made to a inside the function are not reflected outside it, so it’s not call by reference. In func2, the changes made to a inside the function are reflected outside it, so it’s not call by value.

In func1 the local name a is assigned to a different object, but no change is made to the original object, and hence nothing shows up outside func1.

In func2 the object referred to by a is modified, and this change does indeed show up outside the function.

The function calling method used in Python is neither call by value nor call by reference. It is instead call by object, also known as call by object reference or call by sharing.

Python follows a calling convention similar to that of a language called CLU[4], whose manual explains this calling mechanism wonderfully well.

“We call the argument passing technique _call by sharing_, because the argument objects are shared between the caller and the called routine. This technique does not correspond to most traditional argument passing techniques (it is similar to argument passing in LISP). In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects.”
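One practical consequence of call by sharing: if a function should not change the caller's object, it has to work on its own copy. The sketch below uses hypothetical function names (rebind, mutate, mutate_copy) to show all three cases side by side.

```python
def rebind(items):
    items = ["new"]           # rebinds the local name only
    return items

def mutate(items):
    items.append("changed")   # mutates the object shared with the caller

def mutate_copy(items):
    items = list(items)       # work on a private copy instead
    items.append("changed")
    return items

original = ["hello"]

rebind(original)
after_rebind = list(original)   # ['hello']: the caller's list is untouched

copied = mutate_copy(original)
after_copy = list(original)     # ['hello']: only the copy was changed

mutate(original)
print(after_rebind, copied, after_copy, original)
# ['hello'] ['hello', 'changed'] ['hello'] ['hello', 'changed']
```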

That’s it. I wrote this post because I have often been stumped by this, as the first languages I learnt were C and C++. Hopefully this helps you too.

[1] http://effbot.org/zone/call-by-object.htm

[2] http://www.jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/

[3] http://www.diveintopython.net/getting_to_know_python/everything_is_an_object.html

[4] http://en.wikipedia.org/wiki/CLU_(programming_language)

Analyzing the typical python website set-up

The typical Python website setup involves a WSGI-compatible web app, probably written with the help of a web framework like Django or Flask, served by an Apache/mod_wsgi combination or an Nginx/(Gunicorn, uWSGI) combination.

I was curious about the part each component played in responding to a request. This is a description of what I learnt.

WSGI:

WSGI, or Web Server Gateway Interface, defines an interface between web servers and web applications or frameworks. It was primarily defined to remove the coupling between an application framework and a server. It was inspired by Java’s servlet API. Additionally, it supports middleware components for pre- and post-handling of requests.

It has two components:

1. Web Application: This can be any Python callable which takes two arguments: an environment object (a dictionary) and a start_response callable. The start_response callable sets the HTTP response status and the headers, while the environment object captures information about the request made. The callable must return an iterable.

2. Web Server: This calls the Python callable with the environment and the start_response callable. The primary responsibility of the server is to set the environment variables that describe the request and the server it is running on, and to pass the status and the headers on to the browser.

Hopefully this gives some context about how a WSGI app and a WSGI server play together.
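To see both sides in one place, here is a minimal sketch. The application and the middleware are made-up examples, and call_app is a hypothetical helper that plays the server's role just enough to call the app once. It uses setup_testing_defaults from the standard library's wsgiref package to fill in the environment.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """The application side: a callable taking environ and start_response."""
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]   # an iterable of byte strings

def upper_middleware(wrapped):
    """Hypothetical middleware: post-processes the wrapped app's response."""
    def middleware(environ, start_response):
        return [chunk.upper() for chunk in wrapped(environ, start_response)]
    return middleware

def call_app(application, path):
    """Play the server side: build environ, call the app, collect the reply."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body

status, body = call_app(app, "/demo")
print(status, body)   # 200 OK b'Hello from /demo'

status_upper, body_upper = call_app(upper_middleware(app), "/demo")
print(body_upper)     # b'HELLO FROM /DEMO'
```

To serve the same app over HTTP for local experiments, the standard library also provides wsgiref.simple_server.make_server. In production that role is played by a server such as Gunicorn or uWSGI.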

So where does Nginx/Apache come in?

While it’s perfectly possible to serve your web application using the application server alone, most sites in production put Nginx or Apache in front of it. Nginx acts as a reverse proxy, whereas with Apache the application server runs inside Apache as a module.

This is done for many reasons. Some of them are given below.

1. The SSL encryption can be taken care of by the reverse proxy.

2. Compression and caching of resources can be taken care of by these servers instead of the application server, which would consume more system resources doing the same work.

3. They can help in serving up static content.

At styloot.com we use Nginx with Gunicorn. We use Nginx in interesting ways to fit our needs.

1. We statically generate some of our db-intensive views and then instruct Nginx to point to the generated HTML.

2. We host most of our images on S3. A part of our workflow requires manually identifying colors in items. We use HTML5 Canvas to do that, but HTML5 Canvas doesn’t allow getting pixel values for images loaded from a different server, so we proxy S3.

That’s it from us. Please let me know if something was incorrect.

Resources:

[1] http://en.wikipedia.org/wiki/Application_server

[2] http://en.wikipedia.org/wiki/Web_Server_Gateway_Interface

[3] http://en.wikipedia.org/wiki/Reverse_proxy

[4] http://pylonsbook.com/en/1.1/the-web-server-gateway-interface-wsgi.html

[5] http://www.python.org/dev/peps/pep-0333/

Declarative vs Imperative Knowledge

While going through SICP, I came across a wonderful quote about the difference between declarative and imperative knowledge:

“The contrast between function and procedure is a reflection of the general distinction between describing properties of things and describing how to do things, or, as it is sometimes referred to, the distinction between declarative knowledge and imperative knowledge.”

While I did understand the difference between declarative and imperative approaches to programming, this simple quote does a better job of explaining it than probably any other text.