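We've been hiring lately, looking for Pythonistas to work on OpenStack with us, so I wanted a few small questions that probe the fundamentals and give us something to chat about, ideals and life included. Then I came across a blog post that really caught my eye, so I'm reposting it here for everyone. Those who casually claim to have "mastered A, B, and C" on their résumés should really go home and do some soul-searching.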


For the last few weeks I have been interviewing several people for Python/Django developer positions, so I thought it might be helpful to show the questions I am asking together with the answers. The reason is … OK, let me tell you a story first. I remember when one of my university professors introduced us to his professor, the one who taught him. It was a really short visit but I still remember one of the things he said: “Ignorance is not bad, the bad thing is when you do not want to learn.” So back to the reason: if you have at least taken care to prepare for the interview, looked for standard questions and their answers, and learned them, this is a good start. Answering these questions may not get you the job you are applying for, but learning them will give you some valuable knowledge about Python. This post includes the questions that are Python specific and I’ll post the Django questions separately.

1.How are arguments passed – by reference or by value?

The short answer is “neither”; it is actually called “call by object” or “call by sharing” (you can check here for more info). The longer one starts with the fact that this terminology is probably not the best one to describe how Python works. In Python everything is an object and all variables hold references to objects. The values of these references are what gets passed to functions. As a result you cannot change which object the caller's variable refers to, but you can modify the object itself if it is mutable. Remember that numbers, strings and tuples are immutable, while lists and dicts are mutable.

Maybe a clearer answer would be something like this (there is no short answer):
Python works differently from other languages, and there is no such thing as passing an argument by reference or by value. If we want to compare, it is closer to passing by reference, because the object is not copied in memory; instead a new name is bound to it. I say closer, not identical, because in languages where you can pass an argument by reference you can also replace the caller's value. In Python you can modify the passed object, but only if it is of a mutable type (lists, dicts, sets, etc.). If the passed object is a string, int, tuple or some other immutable type, you cannot modify it inside the function.
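The difference between mutating a shared object and rebinding a local name can be sketched as follows. This is a minimal illustration written for Python 3; the function names `append_item` and `rebind` are made up for the example.

```python
def append_item(items):
    # Mutates the object that the caller's variable also refers to.
    items.append(4)


def rebind(items):
    # Rebinds only the local name; the caller's list is untouched.
    items = [0]
    return items


nums = [1, 2, 3]
append_item(nums)
rebound = rebind(nums)

print(nums)     # [1, 2, 3, 4]
print(rebound)  # [0]
```

The caller sees the appended element because both names point at the same list, while the assignment inside `rebind` never reaches the caller.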

2.Do you know what list and dict comprehensions are? Can you give an example?

List/dict comprehensions are syntax constructions that ease the creation of a list/dict based on an existing iterable. According to the 3rd edition of “Learning Python”, list comprehensions are generally faster than normal loops, but this is something that may change between releases. Examples:

    # simple iteration
    a = []
    for x in range(10):
        a.append(x*2)
    # a == [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

    # list comprehension
    a = [x*2 for x in range(10)]

    # dict comprehension
    a = {x: x*2 for x in range(10)}
    # a == {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}

3.What is PEP 8?

PEP 8 is a coding convention (a set of recommendations) on how to write your Python code in order to make it more readable and useful for those who come after you. For more information check PEP 8.

4.Do you use virtual environments?

I personally, and most (by my observation) Python developers, find the virtual environment tool extremely useful. Yes, you can probably live without it, but that makes working on and supporting multiple projects that require different package versions a living hell.

5.Can you sum all of the elements in a list? How about multiplying them and getting the result?

    # the basic way
    s = 0
    for x in range(10):
        s += x

    # the right way
    s = sum(range(10))


    # the basic way
    s = 1
    for x in range(1, 10):
        s = s * x

    # the other way, this is cool!
    from functools import reduce  # a builtin on Python 2; this import is required on Python 3
    from operator import mul
    reduce(mul, range(1, 10))

As for the last example, I know Guido van Rossum is not a fan of reduce (more info here), but for some simple tasks reduce can still come in quite handy.
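For readers on Python 3, here is a small sketch of the same sum and product. It assumes Python 3.8 or later, where `math.prod` is available as an alternative to `reduce`; the variable names are only illustrative.

```python
import math
from functools import reduce
from operator import mul

numbers = range(1, 10)

total = sum(numbers)                       # 1 + 2 + ... + 9
product_reduce = reduce(mul, numbers, 1)   # 1 * 2 * ... * 9 via reduce
product_prod = math.prod(numbers)          # same product, Python 3.8+

print(total, product_reduce, product_prod)  # 45 362880 362880
```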

6.Do you know what the difference between lists and tuples is? Can you give me an example of their usage?

First, lists are mutable while tuples are not; second, tuples can be hashed (as long as their elements are hashable), e.g. to be used as keys for dictionaries. As an example of their usage, tuples are used when the order of the elements in the sequence matters, e.g. geographic coordinates, a “list” of points in a path or route, or a set of actions that should be executed in a specific order. Don’t forget that you can use them as dictionary keys. For everything else use lists.
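A short sketch of the hashing point, written for Python 3. The coordinates and city name are hypothetical sample data, not taken from the original post.

```python
# A tuple of coordinates works as a dictionary key because it is hashable.
city_by_coords = {(42.70, 23.32): "Sofia"}  # hypothetical sample data
city = city_by_coords[(42.70, 23.32)]

# A list is mutable and therefore unhashable, so it is rejected as a key.
try:
    {[42.70, 23.32]: "Sofia"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(city, list_key_allowed)  # Sofia False
```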

7.Do you know the difference between range and xrange?

range returns a list while xrange returns an xrange object, which takes the same amount of memory no matter the size of the range. In the first case all items are generated up front (this can take a lot of time and memory), while in the second you get the elements one by one, i.e. only one element is generated and available per iteration. A simple example of generator usage can be found in problem 2 of the “homework” for my presentation Functions in Python.

Just doing my duty by noting that xrange is NOT a generator:
xrange can be indexed (generators cannot)
xrange has no next method! It is iterable but not an iterator.
Because of the previous item, xrange objects can be iterated over multiple times (generators cannot)
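The three points above can be checked with a short sketch. It is written for Python 3, where `range` behaves like Python 2's `xrange`, so `range` is used here as a stand-in; that substitution is an assumption of this example.

```python
numbers = range(5)                          # lazy sequence, like xrange in Python 2
third = numbers[2]                          # indexing works
is_iterator = hasattr(numbers, "__next__")  # iterable, but not an iterator
first_pass = list(numbers)
second_pass = list(numbers)                 # can be iterated again

squares = (x * x for x in range(5))         # a real generator
gen_first_pass = list(squares)
gen_second_pass = list(squares)             # already exhausted

try:
    squares[2]
    gen_indexable = True
except TypeError:
    gen_indexable = False

print(third, is_iterator, second_pass, gen_second_pass, gen_indexable)
# 2 False [0, 1, 2, 3, 4] [] False
```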

8.Tell me a few differences between Python 2.x and 3.x

There are many answers here, but for me some of the major changes in Python 3.x are: all strings are now Unicode, and print is now a function, not a statement. The list-returning range is gone: range now behaves like the old xrange, and the name xrange has been removed. All classes are new style, and dividing integers with / now returns a float (use // for floor division).
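A few of these differences can be observed directly. The snippet below is a small sketch meant to be run on Python 3; the sample string is arbitrary.

```python
true_division = 7 / 2            # / on integers returns a float
floor_division = 7 // 2          # // keeps the old integer behaviour
numbers = range(3)               # a lazy range object, not a list
text = "naïve"                   # str holds Unicode code points
encoded = text.encode("utf-8")   # bytes is a separate type

print(true_division, floor_division, type(numbers).__name__, len(text), len(encoded))
# 3.5 3 range 5 6
```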

9.What are decorators and what is their usage?

According to Bruce Eckel’s Introduction to Python Decorators “Decorators allow you to inject or modify code in functions or classes”. In other words decorators allow you to wrap a function or class method call and execute some code before or after the execution of the original code. And also you can nest them e.g. to use more than one decorator for a specific function. Usage examples include – logging the calls to specific method, checking for permission(s), checking and/or modifying the arguments passed to the method etc.
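As an illustration of the logging use case, here is a minimal sketch of a decorator that records every call to the wrapped function. It is written for Python 3; `log_calls` and `add` are hypothetical names invented for this example.

```python
import functools


def log_calls(func):
    """Record the arguments of every call made to func."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))  # code run before the original
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    return a + b


result = add(1, 2)
print(result, add.calls)  # 3 [((1, 2), {})]
```

Storing the log on the wrapper keeps the example self-contained; a real project would more likely hand the data to the `logging` module.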

10.The with statement and its usage.

In a few words, the with statement allows you to execute code before and/or after a specific set of operations. For example, if you open a file for reading and parsing, then no matter what happens during the parsing you want to be sure that the file is closed at the end. This is normally achieved using the try… finally construction, but the with statement simplifies it using the so-called “context management protocol”. To use it with your own objects you just have to define __enter__ and __exit__ methods. Some standard objects, like the file object, support this protocol automatically. For more information you may check Understanding Python’s “with” statement.
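A minimal sketch of the protocol, assuming Python 3: the hypothetical `Tracker` class below defines `__enter__` and `__exit__`, and the exit hook runs even though the body raises.

```python
class Tracker:
    """Toy context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress the exception


tracker = Tracker()
try:
    with tracker:
        raise ValueError("parsing failed")
except ValueError:
    pass

print(tracker.events)  # ['enter', 'exit']
```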


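I have long wanted to take a close look at Python's yield keyword. Today I finally settled down enough to write a demo and get a feel for how this rather odd feature is used.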

    iter_a = [x*x for x in range(5)]
    iter_b = (x*x for x in range(5))

What is the difference between a and b?
type(iter_a) is list
iter_b is a generator; its repr is <generator object <genexpr> at 0x00000000>

yield expressions

The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.

generator

Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
Generators are iterators, but you can only iterate over them once. That is because they do not store all the values in memory; they generate the values on the fly.

iterator

Everything you can use “for…in…” on is an iterable: lists, strings, files… These iterables are handy because you can read them as often as you wish, but (for containers such as lists) all the values are stored in memory, and that is not always what you want when you have a lot of values.
An iterator is an object that defines the method next(), which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception, which tells the for loop to terminate.

for…in

The for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate.
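The steps the for statement performs can be spelled out by hand. The sketch below is written for Python 3, where the method is named `__next__` and is called through the built-in `next()`; `manual_for` is a hypothetical helper name.

```python
def manual_for(container):
    """Collect the items of container the way a for loop would visit them."""
    collected = []
    iterator = iter(container)      # step 1: ask the container for an iterator
    while True:
        try:
            item = next(iterator)   # step 2: fetch one element at a time
        except StopIteration:       # step 3: stop when the iterator is exhausted
            break
        collected.append(item)
    return collected


letters = manual_for("abc")
print(letters)  # ['a', 'b', 'c']
```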

Example: the Fibonacci sequence

Suppose we have a function fab(n) that produces the Fibonacci sequence. My test case looks like this:

    >>> for n in fab(5):
    ...     print n
    ...
    1
    1
    2
    3
    5

We implement the first version in the most obvious way possible:

    def fab_by_list(n):
        i, a, b = 0, 0, 1
        result = []

        while i < n:
            result.append(b)
            a, b = b, a+b
            i = i + 1

        return result
    fab = fab_by_list

Let's test it:

    for i in fab(1): print i
    for i in fab(5): print i
    for i in fab(100): print i

Yes, the results are correct and performance is great. Time to call it a day.

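Since for…in… first calls iter() on the container and then calls next(), we can also implement this in an OOP style. The output is just as correct, but the object does not support being iterated more than once; a reset method would have to be added for that.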

    class Fab(object):

        def __init__(self, max):
            self.max = max
            self.n, self.a, self.b = 0, 0, 1

        def __iter__(self):
            return self

        def next(self):
            if self.n < self.max:
                r = self.b
                self.a, self.b = self.b, self.a + self.b
                self.n = self.n + 1
                return r
            raise StopIteration()

    l = fab(500000)

Ouch, who got so trigger-happy? Why is this so slow, slow, slow? Clearly the code needs optimizing.

Trying yield

    def fab_by_yield(n):
        i, a, b = 0, 0, 1

        while i < n:
            yield b
            a, b = b, a+b
            i = i + 1

        return
    fab = fab_by_yield

Let's test it:

    for i in fab(1): print i
    for i in fab(5): print i
    for i in fab(100): print i

Yes, the results are correct and performance is great.
Now let's retry that trigger-happy test:

    l = fab(500000)

:-O It finishes instantly.

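Simply put, yield turns a function into a generator. A function containing yield is no longer an ordinary function; the Python interpreter treats it as a generator function. Calling fab(5) does not execute the body of fab; it returns an iterable object instead. While the for loop runs, each iteration executes code inside fab. When execution reaches yield b, fab hands back one value; on the next iteration, execution resumes from the statement after yield b, with the function's local variables looking exactly as they did before the interruption, and the function keeps running until it hits yield again. You can also call the next() method of fab(5) by hand (fab(5) is a generator object, which has a next() method), which makes fab's execution flow easier to see: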

    >>> f = fab(5)
    >>> f.next()
    1
    >>> f.next()
    1
    >>> f.next()
    2
    >>> f.next()
    3
    >>> f.next()
    5
    >>> f.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
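The lazy start can be made visible with a side effect. The sketch below is written for Python 3 (so it uses the built-in `next()` rather than the `.next()` method) and adds a hypothetical `log` list to the generator purely for demonstration.

```python
log = []


def fab(n):
    log.append("body started")  # runs only once iteration begins
    i, a, b = 0, 0, 1
    while i < n:
        yield b
        a, b = b, a + b
        i = i + 1


gen = fab(5)
log_after_call = list(log)   # still empty: calling fab() ran no body code
first = next(gen)            # the body now runs up to the first yield
rest = list(gen)             # resumes after each yield until exhaustion

print(log_after_call, first, rest, log)
# [] 1 [1, 2, 3, 5] ['body started']
```

This is also why the assignment `l = fab(500000)` returns at once with the yield version: no Fibonacci numbers are computed until something iterates over the result.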

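A function containing yield is a generator function, and it differs from an ordinary function: creating a generator looks like a function call, but none of the function's code runs until next() is called on it (a for loop calls next() automatically). Execution still follows the function's normal flow, but it is suspended at every yield statement, which returns one value; the next time, execution resumes from the statement after the yield. It looks as if the function were interrupted several times by yield during normal execution, each interruption returning the current value. The benefit of yield is obvious: rewriting a function as a generator makes it iterable, and compared with storing state in a class instance to compute the next() value, the code is more concise and the execution flow is remarkably clear. How can you tell whether a function is a generator function? Use isgeneratorfunction: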

    >>> from inspect import isgeneratorfunction
    >>> isgeneratorfunction(fab)
    True

    >>> import types
    >>> isinstance(fab, types.GeneratorType)
    False
    >>> isinstance(fab(5), types.GeneratorType)
    True

    >>> from collections import Iterable
    >>> isinstance(fab, Iterable)
    False
    >>> isinstance(fab(5), Iterable)
    True

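exception

    SyntaxError: 'return' with argument inside generator

In Python 2 you cannot use return with a value to exit a generator. You need to use yield plus a return without an expression. (Since Python 3.3, return with a value is allowed inside a generator; the value is attached to the StopIteration exception.)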

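Use cases

When reading a large file whose memory footprint cannot be predicted, it is best to read the contents repeatedly through a fixed-size buffer. With yield we no longer need to write an iterator class for reading the file; reading it becomes easy: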

    def read_file(fpath):
        BLOCK_SIZE = 1024
        with open(fpath, 'rb') as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if block:
                    yield block
                else:
                    return
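To see the block-wise reading in action, here is a usage sketch for Python 3. It repeats the generator with a `block_size` parameter (an addition made for this example, not part of the original function) and reads back a temporary file of 2500 bytes.

```python
import os
import tempfile


def read_file(fpath, block_size=1024):
    """Yield successive blocks of at most block_size bytes from fpath."""
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield block


# Write 2500 bytes to a temporary file, then read it back block by block.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 2500)
    tmp_path = tmp.name

block_lengths = [len(block) for block in read_file(tmp_path)]
os.remove(tmp_path)

print(block_lengths)  # [1024, 1024, 452]
```

Only one block is held in memory at a time, regardless of how large the file is.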

[References]
[The Python yield keyword explained]
[Python yield 使用浅析 (A Brief Look at Using Python yield)]
[return with argument inside generator]
[python doc for yield]

The office machine is a desktop running Ubuntu 12.04.

uname -a

    Linux zhanghui-pc 3.2.0-32-generic #51-Ubuntu SMP Wed Sep 26 21:33:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/lsb-release

    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=12.04
    DISTRIB_CODENAME=precise
    DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS"

tmux

    sudo apt-get install tmux

[tmux]

terminator

    sudo apt-get install terminator

Make terminator start tmux automatically:
terminator preferences -> profiles -> command -> run a custom command instead of my shell ->

    ([[ -f "$TMUX" ]] && tmux -2 -S $TMUX) || (TMUX="" tmux -2)

zsh

    sudo apt-get install zsh

Run sudo chsh, enter /bin/zsh, and press Enter. My zsh configuration

git

    sudo apt-get install git

vim

    sudo apt-get install vim

My vim configuration

    call pathogen#helptags() " generate helptags for everything in 'runtimepath'
    filetype plugin indent on

    syntax on
    set tabstop=4
    set softtabstop=4
    set shiftwidth=4
    set nu
    set autoindent
    set smartindent
    set expandtab
    set mouse=nv
    set cursorline

    set list
    set listchars=tab:>-,trail:<

    autocmd FileType c set expandtab
    autocmd FileType python set expandtab

    set hlsearch
    set statusline=[%n]\ %f%m%r%h\ %=\|\ %l,%c\ %p%%\ \|\ %{((&fenc==\"\")?\"\":\"\ \".&fenc)}\ \|\ %{hostname()}

Configuring jedi-vim
Install pathogen

    mkdir -p ~/.vim/autoload ~/.vim/bundle;
    curl -Sso ~/.vim/autoload/pathogen.vim https://raw.github.com/tpope/vim-pathogen/master/autoload/pathogen.vim

Edit vimrc and add the following at the top of the file.

    " Pathogen
    execute pathogen#infect()
    call pathogen#helptags() " generate helptags for everything in 'runtimepath'
    syntax on
    filetype plugin indent on

Install jedi

    sudo pip install jedi
    cd ~/.vim/bundle
    git clone https://github.com/davidhalter/jedi-vim.git

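Once jedi is configured, vim offers smart completion while you write code; by default it is triggered by . and ctrl+space. On a desktop system, however, ctrl+space usually switches the input method, so edit .vimrc to change jedi's default shortcut.

    let g:jedi#autocompletion_command = "<C-j>"

You can now write Python code in vim fairly fluently. If you like, add more vim plugins for syntax highlighting and so on; tinker to taste.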

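The Python standard library contains many useful utility classes, but the official documentation is often unclear about the details of using them; the urllib2 HTTP client library is one example. Here is a summary of some urllib2 usage details:
1.Proxy settings
2.Timeout settings
3.Adding specific headers to an HTTP request
4.Redirect
5.Cookie
6.Using the HTTP PUT and DELETE methods
7.Getting the HTTP status code
8.Debug log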

    import urllib2

    def test_proxy(enable_proxy=True):
        '''
        By default urllib2 uses the http_proxy environment variable to set the HTTP proxy. To control the proxy explicitly in the program, independent of the environment, use the approach below.
        One detail to note: urllib2.install_opener() sets urllib2's global opener. That is convenient for later calls but rules out finer-grained control, such as using two different proxy settings in one program. A better practice is to leave the global setting alone and call the opener's open() method directly instead of the global urlopen().
        '''

        proxy_handler = urllib2.ProxyHandler({'http':'http://www.baidu.com/'})
        null_proxy_handler = urllib2.ProxyHandler({})
        if enable_proxy:
            opener = urllib2.build_opener(proxy_handler)
        else:
            opener = urllib2.build_opener(null_proxy_handler)

        urllib2.install_opener(opener)

    def test_timeout(timeout=10):
        '''
        In older versions the urllib2 API did not expose a timeout setting; the only way to set one was to change the socket module's global timeout.
        Since Python 2.6 the timeout can be set directly through the timeout parameter of urllib2.urlopen().
        '''

        urllib2.socket.setdefaulttimeout(timeout)
        #socket.setdefaulttimeout(10)
        #response = urllib2.urlopen('http://www.facebook.com/',timeout=10)

    def test_httpheader(url):
        '''
        To add a header you need to use a Request object.
        Pay special attention to some headers, because the server checks them:

        User-Agent: some servers or proxies check this value to decide whether the request comes from a browser.
        Content-Type: when a REST interface is used, the server checks this value to decide how to parse the HTTP body.

        Common values are:
        application/xml: used for XML RPC calls, e.g. RESTful/SOAP
        application/json: used for JSON RPC calls

        application/x-www-form-urlencoded: used when a browser submits a web form
        ……

        When calling a server's RESTful or SOAP service via RPC, a wrong Content-Type causes the server to reject the request.

        '''

        request = urllib2.Request(url)
        request.add_header('User-Agent','fake-client')
        response = urllib2.urlopen(request)

    def test_redirect(url):
        '''
        By default urllib2 follows redirects automatically for 3xx HTTP status codes; no configuration is needed. To detect whether a redirect happened, just check whether the response URL differs from the request URL.
        If you do not want automatic redirects, you can use a custom HTTPRedirectHandler class instead of dropping down to the lower-level httplib library.
        '''

        response = urllib2.urlopen(url)
        # a redirect happened if the final URL differs from the requested one
        is_redirected = response.geturl() != url

        class RedirectHandler(urllib2.HTTPRedirectHandler):
            def http_error_301(self,req,fp,code,msg,headers):
                pass
            def http_error_302(self,req,fp,code,msg,headers):
                pass
        opener = urllib2.build_opener(RedirectHandler)
        opener.open(url)

    def test_cookie(url):
        '''
        urllib2 also handles cookies automatically. To get the value of a particular cookie, you can do this:
        '''

        import cookielib
        cookie = cookielib.CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
        response = opener.open(url)
        for item in cookie:
            if item.name == 'xxx':
                print item.value

    def test_http_put_delete(url,data=None):
        '''
        urllib2 only supports the HTTP GET and POST methods; for HTTP PUT and DELETE you would normally have to use the lower-level httplib library. Even so, the approach below makes urllib2 send an HTTP PUT or DELETE request. It is a hack, but it works fine in practice.
        '''

        request = urllib2.Request(url,data=data)
        request.get_method = lambda:'PUT' # or 'DELETE'
        response = urllib2.urlopen(request)

    def test_http_status_code(url):
        '''
        For 200 OK, the getcode() method of the response object returned by urlopen gives you the HTTP status code. For other status codes urlopen raises an exception, in which case you check the code attribute of the exception object.
        '''

        try:
            response = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            print e.code

    def test_debug_log(url):
        '''
        With urllib2 you can turn on the debug log as shown below, so that the content sent and received is printed to the screen. This helps with debugging and can partly replace packet capturing.
        '''

        httpHandler = urllib2.HTTPHandler(debuglevel=1)
        httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
        opener = urllib2.build_opener(httpHandler,httpsHandler)

        urllib2.install_opener(opener)
        response=urllib2.urlopen(url)
        print '-'*30
        import pprint as p
        p.pprint(response.__dict__)  # pprint prints by itself and returns None
        print response.readlines()

    if __name__ == '__main__':
        url = 'http://www.baidu.com/'
        urllib2.socket.setdefaulttimeout(6)
        #test_proxy()
        #test_timeout()
        #test_httpheader(url)
        #test_redirect(url)
        #test_cookie(url)
        #test_http_put_delete(url)
        #test_http_status_code(url)
        test_debug_log(url)

        print 'done...'
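For readers on Python 3, where urllib2 was merged into `urllib.request`, two of the points above can be sketched without any hack: `Request` accepts a `method` argument (Python 3.3+), and a private opener avoids touching the global one. The helper names `build_request` and `build_direct_opener` and the example.com URL are hypothetical, and nothing is actually sent over the network.

```python
import urllib.request


def build_request(url, data=None, method='PUT'):
    """Build (but do not send) a request that uses the given HTTP method."""
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header('User-Agent', 'fake-client')
    return request


def build_direct_opener():
    """Return a private opener that ignores proxy environment variables."""
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))


request = build_request('http://www.example.com/resource', data=b'{}', method='DELETE')
opener = build_direct_opener()
# opener.open(request, timeout=10) would send it; omitted here to avoid network access.

print(request.get_method())  # DELETE
```

Because the opener is never passed to `install_opener()`, other code that calls `urlopen()` keeps its own proxy behaviour.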

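###Requirement: encrypting Python code

Recently I was asked to encrypt our OpenStack code. As a first step I had already used pycompile to compile the .py files into .pyc and deleted the .py sources. But as everyone knows, .pyc was never meant for encryption, so compiling .py files to .pyc was little more than self-deception: a .pyc can still be easily decompiled back into a .py file; see decompile python2.7 pyc.

When programmers run out of options there is only one place to go: let’s go to stackoverflow.com and search for the keywords: python encrypt. We found this question: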
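Why a .pyc file offers no secrecy can be shown with a small sketch. It assumes Python 3 and uses `marshal`, the serialization format in which the body of a .pyc stores the compiled code object; the module source and the key string are made up for the example.

```python
import marshal

source = 'LICENSE_KEY = "s3cr3t"\n'        # hypothetical module source
code = compile(source, "<demo>", "exec")   # what compiling a .py file produces
blob = marshal.dumps(code)                 # roughly the payload of a .pyc file

# Anyone holding the bytes can load them back and inspect the constants.
recovered = marshal.loads(blob)
found = "s3cr3t" in recovered.co_consts

print(found)  # True
```

Nothing here is encrypted; the constants and bytecode are merely serialized, which is why decompilers can rebuild readable source.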

How do I protect python code?

question:

I am developing a piece of software in python that will be distributed to my employer’s customers. My employer wants to limit the usage of the software with a time restricted license file.
If we distribute the .py files or even .pyc files it will be easy to (decompile), and remove the code that checks the license file.
Another aspect is that my employer does not want the code to be read by our customers, fearing that the code may be stolen or at least the “novel ideas”.
Is there a good way to handle this problem? Preferably with an off-the-shelf solution.
The software will run on Linux systems (so I don’t think py2exe will do the trick)

answer:

Python, being a byte-code-compiled interpreted language, is very difficult to lock down. Even if you use a exe-packager like py2exe, the layout of the executable is well-known, and the Python byte-codes are well understood.
Usually in cases like this, you have to make a tradeoff. How important is it really to protect the code? Are there real secrets in there (such as a key for symmetric encryption of bank transfers), or are you just being paranoid? Choose the language that lets you develop the best product quickest, and be realistic about how valuable your novel ideas are.
If you decide you really need to enforce the license check securely, write it as a small C extension so that the license check code can be extra-hard (but not impossible!) to reverse engineer, and leave the bulk of your code in Python.

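From this question we can conclude that there is no really good way to encrypt Python code, and that encrypting a scripting language, in an open-source project no less, runs counter to the spirit of the open-source community.

However, the replies also show that if our Python code contains core business logic, that logic should be extracted and implemented as a C extension. There are mature encryption and obfuscation solutions for C to choose from, which we will look into later. I probably won't try this approach on the OpenStack project, though, for reasons you can guess.

Regarding py2exe, one reply says: py2exe just stores the .pyc byte code files in a .zip archive, so this is definitely not a solution. Still, that can be useful when combined with a suitable startup script to make it run under Linux. So py2exe simply packages the .pyc files; I had not noticed that before.

Another reply to this question is also well worth thinking about; it is quoted below: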

Python is not the tool you need
You must use the right tool to do the right thing, and Python was not designed to be obfuscated. It’s the contrary; everything is open or easy to reveal or modify in Python because that’s the language’s philosophy. If you want something you can’t see through, look for another tool. This is not a bad thing, it is important that several different tools exist for different usages.

Obfuscation is really hard
Even compiled programs can be reverse-engineered so don’t think that you can fully protect any code. You can analyze obfuscated PHP, break the flash encryption key, etc. Newer versions of Windows are cracked every time.

Having a legal requirement is a good way to go
You cannot prevent somebody from misusing your code, but you can easily discover if someone does. Therefore, it’s just a casual legal issue.

Code protection is overrated
Nowadays, business models tend to go for selling services instead of products. You cannot copy a service, pirate nor steal it. Maybe it’s time to consider to go with the flow…

So, do you see code encryption in a new light now?


[References]
[1]pycompile
[2]decompile
[3]py2exe
[4]how-do-i-protect-python-code