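I. Overview

In Python, WSGI (Web Server Gateway Interface) defines the standard interface between web servers and web applications (or web frameworks). Any server and any framework that follow the WSGI specification can work together.

Thanks to WSGI, writing a simple web framework in Python is quite easy. As with most powerful software, though, building one that is feature-rich, robust and efficient is another matter; if that is what you need, an existing framework (Django, Tornado, web.py and so on) is probably the better choice.

This article tries to write a web framework similar to web.py. All right, I admit that is an exaggeration: first, web.py is not that simple; second, this article only really implements the URL dispatch part.

II. Starting from demo_app

As a first taste, we can use wsgiref.simple_server to put together a trivial web application: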

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# demo_app is a simple application implementation shipped with wsgiref
from wsgiref.simple_server import make_server, demo_app

httpd = make_server('', 8086, demo_app)
sa = httpd.socket.getsockname()
print('http://{0}:{1}/'.format(*sa))

# Respond to requests until process is killed
httpd.serve_forever()

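Run the script:

$ python code.py
http://0.0.0.0:8086/

Open http://localhost:8086/ in a browser (0.0.0.0 only means the server is listening on every interface) and you will see a line reading "Hello world!" followed by a long list of environment variables.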
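
The same output can be inspected without opening a socket. The sketch below, assuming Python 3, calls demo_app directly; fake_start_response and captured are names made up for this illustration, and setup_testing_defaults is the wsgiref helper that fills in a minimal environ.

```python
from wsgiref.simple_server import demo_app
from wsgiref.util import setup_testing_defaults

captured = {}

def fake_start_response(status, response_headers, exc_info=None):
    # Record what the application hands to the server.
    captured['status'] = status
    captured['headers'] = response_headers

environ = {}
setup_testing_defaults(environ)   # fills in REQUEST_METHOD, PATH_INFO, ...

# demo_app returns an iterable of byte strings; join them into one body.
body = b''.join(demo_app(environ, fake_start_response))
first_line = body.decode('utf-8').splitlines()[0]

print(captured['status'])
print(first_line)
```

The remaining lines of the body are the environ entries, which is the long list the browser shows.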

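III. The application in WSGI

WSGI specifies that the application is a callable object that accepts two arguments, environ and start_response, and returns an iterable of byte strings (bytes in Python 3), where:

  • the callable object can be a function, a method, or an instance with a __call__ method;
  • environ is a dictionary containing CGI-style environment variables and the WSGI-required variables;
  • start_response is a callable that accepts two positional arguments (status, response_headers) and one optional argument (exc_info);
  • the iterable can be a list of byte strings, a generator, or any instance with an __iter__ method. See Specification Details in PEP 333 for more.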
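
As a rough illustration of the list above, the three kinds of callable can be written side by side and driven by the same tiny harness. This is a sketch assuming Python 3; func_app, Holder, CallableApp and run are hypothetical names, and the environ passed in is deliberately minimal rather than a complete WSGI environ.

```python
def func_app(environ, start_response):
    """A plain function."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'from a function\n']

class Holder:
    def app(self, environ, start_response):
        """A bound method."""
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'from a method\n']

class CallableApp:
    def __call__(self, environ, start_response):
        """An instance with __call__."""
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'from an instance\n']

def run(app):
    """Call app once the way a server would; return (status, body)."""
    seen = []

    def start_response(status, response_headers, exc_info=None):
        seen.append(status)

    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    body = b''.join(app(environ, start_response))
    return seen[0], body

results = [run(func_app), run(Holder().app), run(CallableApp())]
for status, body in results:
    print(status, body)
```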

The Application/Framework Side section of PEP 333 gives a typical application implementation:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""application.py"""

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']

Now replace demo_app with simple_app:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""code.py"""

from wsgiref.simple_server import make_server
from application import simple_app as app

if __name__ == '__main__':
    httpd = make_server('', 8086, app)
    sa = httpd.socket.getsockname()
    print('http://{0}:{1}/'.format(*sa))

    # Respond to requests until process is killed
    httpd.serve_forever()

After running code.py, visit http://localhost:8086/ and you will see the familiar line: Hello world!

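IV. Telling URLs apart

After playing with it for a while, you will notice that no matter how you change the path part of the URL, the response is always the same. That is because simple_app never looks at the path: every request that reaches this host and port gets the same answer.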
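
This is easy to confirm without a browser. In the sketch below (Python 3 assumed), get is a hypothetical helper that calls the application with a minimal environ and a do-nothing start_response; simple_app is repeated so that the snippet stands alone.

```python
def simple_app(environ, start_response):
    """Same application as before: it never reads environ."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']

def get(app, path):
    """Return the response body the app produces for a GET on path."""
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': path}
    ignore = lambda status, headers, exc_info=None: None
    return b''.join(app(environ, ignore))

responses = {path: get(simple_app, path)
             for path in ('/', '/hello', '/no/such/page')}
print(responses)
```

All three paths map to the same body, because the path only exists as environ['PATH_INFO'] and nothing reads it.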

To handle different paths differently, the implementation in application.py has to change.

First, switch to a class to implement the application:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""application.py"""

class my_app:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Hello world!\n"

Then add path-specific handling:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""application.py"""

class my_app:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        path = self.environ['PATH_INFO']
        if path == "/":
            return self.GET_index()
        elif path == "/hello":
            return self.GET_hello()
        else:
            return self.notfound()

    def GET_index(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Welcome!\n"

    def GET_hello(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Hello world!\n"

    def notfound(self):
        status = '404 Not Found'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Not Found\n"

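In code.py, change from application import simple_app as app so that it imports my_app instead of simple_app, and you can try it out.

Why can a class that implements __iter__() serve as a WSGI application?

A class is itself a callable: when the server calls my_app(environ, start_response) it gets back an instance, and that instance is the application's return value. The server then iterates over the return value to send the response, which invokes __iter__(), so __iter__() is where the response is produced and handed to the WSGI server.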
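
The two steps can be watched separately. The following sketch (Python 3 assumed; calls and the local start_response are illustrative names) shows that constructing the instance does nothing visible, and that start_response only fires once iteration begins.

```python
class my_app:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield b"Hello world!\n"

calls = []

def start_response(status, response_headers, exc_info=None):
    calls.append(status)

# Step 1: the server "calls the application", which builds an instance.
result = my_app({'PATH_INFO': '/'}, start_response)
calls_before = list(calls)      # start_response has not run yet

# Step 2: the server iterates over the result, which runs __iter__().
body = list(result)
calls_after = list(calls)

print(calls_before, calls_after, body)
```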

V. Refactoring

The code above works, but it has plenty of problems in coding style and flexibility. Let's refactor it step by step.

1. Extracting the URLs

# inside class my_app

urls = [
    ("/", "index"),
    ("/hello", "hello"),
]

def __iter__(self):
    path_info = self.environ['PATH_INFO']
    method = self.environ['REQUEST_METHOD']

    for path, name in self.urls:
        if path == path_info:
            funcname = method.upper() + "_" + name
            func = getattr(self, funcname)
            return func()
    return self.notfound()
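
The name-based lookup is the heart of this step, so here it is on its own. This is only a sketch: Handlers and lookup are hypothetical names. Note that two-argument getattr raises AttributeError when no handler exists for the request method (for example a POST to "/"); the three-argument form used here returns a default instead.

```python
class Handlers:
    def GET_index(self):
        return "Welcome!\n"

def lookup(obj, method, name):
    """Build the handler name from the HTTP method and the URL name."""
    funcname = method.upper() + "_" + name
    # getattr with a default avoids AttributeError for missing handlers.
    return getattr(obj, funcname, None)

h = Handlers()
found = lookup(h, "get", "index")      # resolves to h.GET_index
missing = lookup(h, "POST", "index")   # there is no POST_index

print(found())
print(missing)
```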

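2. Matching URLs with regular expressions

The code above removes the hard-coded URLs and looks a little better, but how can a URL carry parameters? Regular expressions are the answer: they allow URLs such as /hello/foo and make URL dispatch more flexible: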

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""application.py"""

import re  # changed

class my_app:

    urls = (
        ("/", "index"),
        ("/hello/(.*)", "hello"),
    )  # changed

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):  # changed
        path = self.environ['PATH_INFO']
        method = self.environ['REQUEST_METHOD']

        for pattern, name in self.urls:
            m = re.match('^' + pattern + '$', path)
            if m:
                # pass the matched groups as arguments to the function
                args = m.groups()
                funcname = method.upper() + '_' + name
                if hasattr(self, funcname):
                    func = getattr(self, funcname)
                    return func(*args)

        return self.notfound()

    def GET_index(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Welcome!\n"

    def GET_hello(self, name):  # changed
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield ("Hello %s!\n" % name).encode('utf-8')

    def notfound(self):
        status = '404 Not Found'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Not Found\n"
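
The matching step can be tried in isolation. In this sketch, resolve is a hypothetical helper that repeats the loop from __iter__ but returns the handler name and captured groups instead of calling anything.

```python
import re

urls = (
    ("/", "index"),
    ("/hello/(.*)", "hello"),
)

def resolve(path):
    """Return (name, groups) for the first matching pattern, else None."""
    for pattern, name in urls:
        # Anchoring with ^ and $ forces the whole path to match.
        m = re.match('^' + pattern + '$', path)
        if m:
            return name, m.groups()
    return None

for path in ("/", "/hello/foo", "/hello/a/b", "/hello"):
    print(path, resolve(path))
```

Two details are worth noticing: (.*) also matches slashes, so /hello/a/b captures "a/b", and /hello without a trailing slash matches nothing and falls through to notfound.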

3. Moving URL handling into its own method

The __iter__ method is really just doing delegation. Don't you think that deserves a method of its own?

def __iter__(self):
    return self.delegate()

def delegate(self):
    path = self.environ['PATH_INFO']
    method = self.environ['REQUEST_METHOD']

    for pattern, name in self.urls:
        m = re.match('^' + pattern + '$', path)
        if m:
            # pass the matched groups as arguments to the function
            args = m.groups()
            funcname = method.upper() + "_" + name
            func = getattr(self, funcname)
            return func(*args)

    return self.notfound()

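4. Creating a base class

You know what? A lot of this code has nothing to do with my application in particular. Moving it into a base class seems like a good idea.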

import re

class wsgiapp:
    """Base class for my wsgi application."""
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        return self.delegate()

    def delegate(self):
        path = self.environ['PATH_INFO']
        method = self.environ['REQUEST_METHOD']

        for pattern, name in self.urls:
            m = re.match('^' + pattern + '$', path)
            if m:
                # pass the matched groups as arguments to the function
                args = m.groups()
                funcname = method.upper() + "_" + name
                func = getattr(self, funcname)
                return func(*args)

        return self.notfound()

    def notfound(self):
        status = '404 Not Found'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Not Found\n"

class application(wsgiapp):
    urls = [
        ("/", "index"),
        ("/hello/(.*)", "hello"),
    ]

    def GET_index(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Welcome!\n"

    def GET_hello(self, name):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield ("Hello %s!\n" % name).encode('utf-8')

5. Removing duplicated code

Much better now.

But did you notice there is still duplication? self.start is called in almost the same way in both GET_ methods. How about keeping the status and headers in wsgiapp and providing a header method to add any new headers when needed?

class wsgiapp:
    """Base class for my wsgi application."""
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
        self.status = "200 OK"
        self._headers = []
            
    def header(self, name, value):
        self._headers.append((name, value))
            
    def __iter__(self):
        x = self.delegate()
        self.start(self.status, self._headers)
            
        # The return value can be a string or a list; return an
        # iterator of byte strings in both cases.
        if isinstance(x, str):
            return iter([x.encode('utf-8')])
        elif isinstance(x, bytes):
            return iter([x])
        else:
            return iter(x)
    
    ...

class application(wsgiapp):
    urls = [
        ("/", "index"),
        ("/hello/(.*)", "hello"),
    ]

    def GET_index(self):
        self.header("Content-Type", "text/plain")
        return "Welcome!\n"

    def GET_hello(self, name):
        self.header("Content-Type", "text/plain")
        return "Hello %s!\n" % name
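
Why wrap a string in a list before calling iter? Iterating a string directly yields one character at a time, which is not what a server should be sent. The sketch below isolates that normalization step; to_body_iter is a hypothetical helper, and encoding to UTF-8 is an assumption made here because Python 3 servers expect bytes.

```python
def to_body_iter(x):
    """Turn a handler's return value into an iterator of byte strings."""
    if isinstance(x, str):
        return iter([x.encode('utf-8')])   # one chunk, not one per character
    if isinstance(x, bytes):
        return iter([x])
    return iter(x)                         # already a list or generator

naive = list(iter("hi"))                   # what plain iter() would give
from_str = list(to_body_iter("hi"))
from_list = list(to_body_iter([b"a", b"b"]))

print(naive, from_str, from_list)
```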

Looks much better now, doesn't it?

Then I introduced an error by mistake and the server started returning a blank response. Believe me, the mistake was very hard to spot. Let me add a check for that.

# needs "import traceback" at the top of the module

def __iter__(self):
    try:
        x = self.delegate()
        self.start(self.status, self._headers)
    except Exception:
        headers = [("Content-Type", "text/plain")]
        self.start("500 Internal Error", headers)
        x = "Internal Error:\n\n" + traceback.format_exc()

    # The return value can be a string or a list; return an
    # iterator of byte strings in both cases.
    if isinstance(x, str):
        return iter([x.encode('utf-8')])
    elif isinstance(x, bytes):
        return iter([x])
    else:
        return iter(x)
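
To see the check at work, here is a cut-down sketch (Python 3 assumed) in which delegate is replaced by a stand-in that always fails. The stand-in and the local start_response are made up for the demonstration; the try/except is the same shape as above.

```python
import traceback

class wsgiapp:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
        self.status = "200 OK"
        self._headers = []

    def delegate(self):
        # Hypothetical stand-in for a handler with a bug in it.
        raise ValueError("boom")

    def __iter__(self):
        try:
            x = self.delegate()
            self.start(self.status, self._headers)
        except Exception:
            headers = [("Content-Type", "text/plain")]
            self.start("500 Internal Error", headers)
            x = "Internal Error:\n\n" + traceback.format_exc()
        return iter([x.encode("utf-8")])

statuses = []

def start_response(status, headers, exc_info=None):
    statuses.append(status)

body = b"".join(wsgiapp({}, start_response))
print(statuses[0])
print(body.decode("utf-8"))
```

Instead of a blank page, the client now gets a 500 status and the traceback. Showing tracebacks to clients is handy while developing, though probably not something to leave on in production.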

Oh my god! It looks like I just wrote a new web framework 😂.

Here is the complete code:

import re
import traceback
    
class wsgiapp:
    """The most beautiful micro web framework.

    How to use:

        class application(wsgiapp):
            urls = [
                ("/(.*)", "hello"),
            ]
            def GET_hello(self, name):
                self.header("Content-Type", "text/plain")
                return "Hello, %s!" % name
    """
        
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
        self.status = "200 OK"
        self._headers = []
            
    def header(self, name, value):
        self._headers.append((name, value))
            
    def __iter__(self):
        try:
            x = self.delegate()
            self.start(self.status, self._headers)
        except Exception:
            headers = [("Content-Type", "text/plain")]
            self.start("500 Internal Error", headers)
            x = "Internal Error:\n\n" + traceback.format_exc()

        # The return value can be a string or a list; return an
        # iterator of byte strings in both cases.
        if isinstance(x, str):
            return iter([x.encode('utf-8')])
        elif isinstance(x, bytes):
            return iter([x])
        else:
            return iter(x)

    def delegate(self):
        path = self.environ['PATH_INFO']
        method = self.environ['REQUEST_METHOD']
            
        for pattern, name in self.urls:
            m = re.match('^' + pattern + '$', path)
            if m:
                # pass the matched groups as arguments to the function
                args = m.groups() 
                funcname = method.upper() + "_" + name
                func = getattr(self, funcname)
                return func(*args)
                    
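        return self.notfound()

    def notfound(self):
        self.status = "404 Not Found"
        self.header("Content-Type", "text/plain")
        return "Not Found\n"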

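6. Abstracting out a framework

To turn the class my_app into a standalone framework, the following changes are needed:

  • strip out the application-specific details: the urls configuration and the GET_* methods (the GET methods move into separate handler classes)
  • make header a classmethod so that outside code can call it like a utility function
  • implement the application as an instance with a __call__ method

The modified application.py (final version):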

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""application.py"""

import re

class my_app:
    """my simple web framework"""

    headers = []

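    def __init__(self, urls=(), fvars=None):
        self._urls = urls
        # avoid a shared mutable default argument
        self._fvars = fvars if fvars is not None else {}

    def __call__(self, environ, start_response):
        self._status = '200 OK'  # default status: OK
        del self.headers[:]  # clear the headers left over from the previous request

        result = self._delegate(environ)
        start_response(self._status, self.headers)

        # turn the return value (a string or a list of strings) into
        # an iterator of byte strings
        if isinstance(result, (str, bytes)):
            result = [result]
        return iter([s.encode('utf-8') if isinstance(s, str) else s
                     for s in result])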

    def _delegate(self, environ):
        path = environ['PATH_INFO']
        method = environ['REQUEST_METHOD']

        for pattern, name in self._urls:
            m = re.match('^' + pattern + '$', path)
            if m:
                # pass the matched groups as arguments to the function
                args = m.groups()
                funcname = method.upper()  # upper-case method name (GET, POST, ...)
                klass = self._fvars.get(name)  # look up the class object by name
                if hasattr(klass, funcname):
                    func = getattr(klass, funcname)
                    return func(klass(), *args)

        return self._notfound()

    def _notfound(self):
        self._status = '404 Not Found'
        self.header('Content-type', 'text/plain')
        return "Not Found\n"

    @classmethod
    def header(cls, name, value):
        cls.headers.append((name, value))
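
One consequence of making header a classmethod is that headers lives on the class, so every instance (and every request) sees the same list. The stripped-down sketch below shows that sharing and why __call__ has to empty the list first; it keeps only the two relevant members of my_app. It also suggests the design is only safe when requests are handled one at a time, which is an observation about this sketch rather than something the framework addresses.

```python
class my_app:
    headers = []          # class attribute: one list shared by everyone

    @classmethod
    def header(cls, name, value):
        cls.headers.append((name, value))

a = my_app()
b = my_app()

my_app.header('Content-type', 'text/plain')
shared = a.headers is b.headers      # both names point at the same list
snapshot = list(b.headers)

del my_app.headers[:]                # what __call__ does for each request
after_clear = list(a.headers)

print(shared, snapshot, after_clear)
```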

The correspondingly modified code.py (final version):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""code.py"""

from application import my_app

urls = (
    ("/", "index"),
    ("/hello/(.*)", "hello"),
)

wsgiapp = my_app(urls, globals())

class index:
    def GET(self):
        my_app.header('Content-type', 'text/plain')
        return "Welcome!\n"

class hello:
    def GET(self, name):
        my_app.header('Content-type', 'text/plain')
        return "Hello %s!\n" % name

if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('', 8086, wsgiapp)

    sa = httpd.socket.getsockname()
    print('http://{0}:{1}/'.format(*sa))

    # Respond to requests until process is killed
    httpd.serve_forever()

Of course, you can configure more URL mappings in code.py and implement the corresponding classes to respond to those requests.
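
A detail that is easy to miss: code.py passes globals() to my_app before the index and hello classes are defined, yet the lookup still works. The reason is that the framework keeps a reference to the mapping and only reads it when a request arrives. The sketch below imitates that with a plain dict standing in for globals(); Registry and namespace are hypothetical names used only here.

```python
class Registry:
    """Keeps a reference to the mapping, the way my_app keeps fvars."""
    def __init__(self, fvars):
        self._fvars = fvars

    def find(self, name):
        return self._fvars.get(name)

namespace = {}                  # stands in for globals()
reg = Registry(namespace)

before = reg.find('hello')      # the handler class does not exist yet

class hello:
    def GET(self, name):
        return "Hello %s!\n" % name

# A module-level class statement adds the name to globals() like this.
namespace['hello'] = hello

klass = reg.find('hello')
reply = klass.GET(klass(), 'foo')   # same call shape as func(klass(), *args)

print(before, reply)
```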

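VI. References

This article is mainly based on How to write a web framework in Python (its author, anandology, is one of the two maintainers of web.py; the other was the renowned Aaron Swartz, who died far too young), with some adjustments, modifications and a few ideas of my own mixed in.

If you still want more, Why so many Python web frameworks? is also a great read; it may well strip away any awe you feel for Python web frameworks :-)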