Reader Question: Adding Methods to Python Lists
Reddit user /u/erkelep asks the question, "is there a way to add new methods to the Python list class?"
Specifically, can you add methods to lists in a way that makes those methods available to all list objects in python without subclassing them? While this is something you might not actually ever want to do, it is an interesting exercise to determine whether or not we can do it.
This question was first asked on reddit, and I'm reposting my answer here with a few edits for clarity. And yes, I do moonlight as a supervillain on /r/learnpython.
At first glance, the answer to whether you can extend python builtins is a cursory yes, simply because the answer to many such questions about the workings of python tends to be 'yes'. Creating new methods ex post facto is commonly called 'monkey patching' and is something Ruby-folk tend to use as an everyday technique, and I know for a fact that it can be done in general. Observe.
>>> class Foo(object):
...     pass
...
First, we create a class with no methods on it.
>>> my_foo = Foo() # Create a new object
>>> my_foo.monkey_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'monkey_method'
At this point, the method monkey_method does not yet exist, so the call raises an AttributeError.
>>> def monkey_method(self): # Create said method
...     print("Imma monkey")
...
Now, we create the method as a standalone function, then we attach to the class via assignment.
>>> Foo.monkey_method = monkey_method # Attach it to the class; instances will bind it on lookup
>>> my_foo.monkey_method() # Call it, and all is well.
Imma monkey
Suddenly, the method works without having changed anything about the already-existing instance.
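For anyone who would rather run a file than type into the REPL, here is the same experiment as a small script. It is only a sketch, and the name early_bird is made up for illustration. It also hints at why the old instance picks up the method: the function is stored on the class, and attribute lookup falls through from the instance to its class at call time.

```python
class Foo(object):
    pass


early_bird = Foo()  # instance created before the patch


def monkey_method(self):
    return "Imma monkey"


# Neither the instance nor the class knows the method yet.
assert not hasattr(early_bird, "monkey_method")

# Assigning a plain function to the class is all it takes.
Foo.monkey_method = monkey_method

# Lookup goes instance -> class, so the pre-existing instance sees it too.
print(early_bird.monkey_method())

# The function lives in the class namespace, not in the instance's.
print("monkey_method" in vars(Foo), "monkey_method" in vars(early_bird))
```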
Knowing this, we attempt the same mechanism with a filter function and attach it to the list class.
>>> def filter(self, filter_fn):
...     filtered_items = []
...     for item in self:
...         if filter_fn(item):
...             filtered_items.append(item)
...     return filtered_items
...
>>> list.filter = filter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'list'
So that worked just fine with our class Foo, but it's failing with the built-in list. Just to be sure, let's try to do the assignment another way.
>>> setattr(list, 'filter', filter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'list'
It seems we've run afoul of python's protection mechanisms. Built-ins are special classes and python will have none of your nonsense. That's something I just can't accept, and python can piss off. There's gotta be a way.
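To see both outcomes side by side without littering the REPL with tracebacks, the refusal can be caught. This is a rough sketch: try_patch and filter_items are hypothetical names invented for the example, and the exact wording of the TypeError message differs between Python versions, so the script only reports it.

```python
def filter_items(self, filter_fn):
    # Same idea as the filter function above, written as a comprehension.
    return [item for item in self if filter_fn(item)]


def try_patch(cls, name, func):
    """Return None if the patch was accepted, else the TypeError message."""
    try:
        setattr(cls, name, func)
    except TypeError as exc:
        return str(exc)
    return None


class Foo(object):
    pass


# A user-defined class accepts the new attribute without complaint.
print(try_patch(Foo, "filter_items", filter_items))

# The built-in refuses; the message text depends on the Python version.
print(try_patch(list, "filter_items", filter_items))
```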
Working in python long enough you'll find that 'dictionaries are always the answer'. Indeed, it is true on a fundamental level. Secretly, all data in python is an object and all objects are secretly dictionaries in their heart of hearts. If we have access to those dictionaries, the world opens before us. Returning to our Foo class, we repeat the exact same process by modifying the dictionary underlying the class, rather than the class itself.
>>> class Foo():
...     pass
...
>>> def monkeymethod(self):
...     print "Imma monkey"
...
>>> a = Foo()
>>> dir(Foo)
['__doc__', '__module__']
Notice that it only has two attributes, __doc__ and __module__.
>>> Foo.__dict__
{'__module__': '__main__', '__doc__': None}
But when we explicitly access it, the __dict__ attribute appears suddenly. We can then add entries into the dictionary the usual way.
>>> Foo.__dict__['monkeymethod'] = monkeymethod
When we inspect the class again, we can see that modifying the dictionary modified the class as well.
>>> dir(Foo)
['__doc__', '__module__', 'monkeymethod']
>>> a.monkeymethod()
Imma monkey
Subtly, I've switched syntax to Python2 and used something called 'old style classes'. I just wanted to show that the theory is sound. When you're working with new-style classes (mandatory in Python3, optional in Python2 by subclassing from object), we see some protection schemes in place that prevent this.
>>> class Foo(object):
...     pass
...
>>> Foo.__dict__['monkeymethod'] = monkeymethod
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dictproxy' object does not support item assignment
The dictproxy error was unexpected. This is the first time I had encountered it, and it is vexing that it prevents me from doing what I want. On further investigation, it appears that it exists specifically to vex me in doing exactly what I'm trying to do. Still, it's a start.
Near as I can tell, the dictproxy class is a wrapper around a dictionary that explicitly prevents you from modifying the dictionary. Accessing the dictionary is still permitted. The takeaway from that statement is that there is indeed a dictionary hiding somewhere inside that dictproxy, but there's no obvious way to get at it.
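On Python 3 the same read-only wrapper goes by the name mappingproxy and is exposed as types.MappingProxyType. The sketch below pokes at it from the outside, under the assumption that you are on Python 3: reads go through, writes are refused, and the proxy turns out to be a live view of whatever dictionary it is hiding.

```python
import types


class Foo(object):
    pass


def monkeymethod(self):
    return "Imma monkey"


proxy = Foo.__dict__
print(type(proxy).__name__)
print(isinstance(proxy, types.MappingProxyType))

# Reading through the proxy is permitted.
print("__doc__" in proxy)

# Writing through it is not.
try:
    proxy["monkeymethod"] = monkeymethod
except TypeError as exc:
    print("refused:", exc)

# The proxy is a live view: a change made with ordinary attribute
# assignment shows up in the proxy we grabbed earlier.
Foo.monkeymethod = monkeymethod
print("monkeymethod" in proxy)
```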
After researching for a bit, I have to conclude that there's no way to get at that dictionary through any of the usual python mechanisms. It's an implementation detail of CPython, hidden in the bowels of the C internals. For you experienced folks, that should get your wheels turning. Python (the one most of us use, anyway) is not only written in C, but has a fairly robust FFI (foreign function interface) for accessing C functions and data structures from within the language.
I hope you feel a knot in your stomach, because what we're going to attempt is pretty disgusting, all things considered.
The ctypes module lets us point at a chunk of memory and say, "Here is the shape of the C struct sitting at that memory location." Having done that, we can treat the chunk of memory as a C structure and access its members in python. And all Python objects are backed by a C structure called PyObject, savvy? You may yet be in disbelief, so let me outline exactly what I'm proposing.
- Find the location of the list class's dictproxy in memory.
- Cast the data at that location into the C version of the dictproxy.
- Once the data is a C structure, we have unfettered access to the dict object underneath (in C struct form).
- Take the struct representing the dictionary and recast it back into a Python dictionary.
- Laugh at Python's feeble attempt to keep us from our true heart's desire.
If you haven't picked up on it, this is not something anyone should be doing on a regular basis. Explicitly bypassing type protection is generally considered poor form. Those of you with weak stomachs can turn back now, because we're going to start looking at memory layouts.
To cast a location in memory into a struct, we need to know what the shape of the data is at that location. Poking around in the C documentation, we can see that all Python objects start the same way.
struct PyObject {          // note: this is fudged quite a bit.
    Py_ssize_t ob_refcnt;  // The number of references to this object (pointer-sized, not a plain int)
    PyObject *ob_type;     // A pointer to the object representing the Python class of the object
};
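Before building on that header, it is worth a quick sanity check from Python. The sketch below overlays a two-field structure on an ordinary list and confirms that the second field really points at the list type. ObjectHeader and sample are names invented for the example, and the layout is an assumption that holds for a standard CPython build (debug and free-threaded builds arrange the header differently). It also relies on the CPython detail that id() returns the object's memory address.

```python
import ctypes


class ObjectHeader(ctypes.Structure):
    # Assumed layout for a standard (non-debug, non-free-threaded) CPython build.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count, pointer-sized
        ("ob_type", ctypes.c_void_p),     # address of the object's type
    ]


sample = ["any", "object", "will", "do"]

# In CPython, id() is the address of the object, so we can overlay the struct there.
header = ObjectHeader.from_address(id(sample))

# If the layout guess is right, ob_type holds the address of the list type.
print(header.ob_type == id(type(sample)))

# The reference count is readable too; its exact value depends on the interpreter.
print(header.ob_refcnt)
```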
dictproxy objects are an extension of that. The extension is complicated and I couldn't make sense of the entire thing, but we don't need the entire thing, just the location of the dictionary that it hides.
struct DictProxy {
    Py_ssize_t ob_refcnt;  // Standard PyObject headers
    PyObject *ob_type;
    PyObject *dict;        // The beginning of DictProxy specific attributes
};
Taking cues from forbiddenfruit, a module that does exactly this, we have the following definitions.
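import ctypes
class PyObject(ctypes.Structure):
    pass
# c_ssize_t matches the real width of ob_refcnt (Py_ssize_t in C)
PyObject._fields_ = [('ob_refcnt', ctypes.c_ssize_t),
                     ('ob_type', ctypes.POINTER(PyObject))]
class DictProxy(PyObject):
    _fields_ = [('dict', ctypes.POINTER(PyObject))]
list_proxy = list.__dict__  # keep a reference so the proxy object stays alive
list_proxy_dict = DictProxy.from_address(id(list_proxy))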
These are python classes which map directly to C structures which are imposed onto a chunk of memory that contains a python object. In no less confusing terms, we've cast the Python object to a C struct in Python's C structure interface. Yeah. It may take a bit of chewing to swallow that one. Now all we need to do is extract the dict member and cast it back to a regular python dictionary.
This turns out to be an equally tricky process, and credit once again goes to the forbiddenfruit module for this trick. Without dragging it on too much, here's what we do:
- Create a dictionary in Python.
- Using C functions, assign the C version of the DictProxy's dictionary to an element in that dictionary.
- Access that element using regular Python dictionary access, and hopefully it's been converted back to a regular Python object.
In practice, it looks like this:
conversion_dict = {}
ctypes.pythonapi.PyDict_SetItem(
    ctypes.py_object(conversion_dict),
    ctypes.py_object("list_dict"),
    list_proxy_dict.dict
)
hackable_list = conversion_dict['list_dict']
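The whole round trip can be folded into one helper, which makes it easier to try on something less precious than list first. What follows is a sketch under the same assumptions as above (standard CPython build, header layout as described). underlying_dict and Guinea are hypothetical names for the example, and the PyType_Modified call is my own addition rather than part of the steps listed: it asks CPython to drop any cached attribute lookups for the class so the new entry is noticed.

```python
import ctypes


class PyObject(ctypes.Structure):
    pass


PyObject._fields_ = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.POINTER(PyObject)),
]


class DictProxy(PyObject):
    _fields_ = [("dict", ctypes.POINTER(PyObject))]


def underlying_dict(cls):
    """Return the real, writable dict behind cls.__dict__ (CPython only)."""
    proxy_obj = cls.__dict__  # hold a reference so the proxy is not freed
    proxy = DictProxy.from_address(id(proxy_obj))
    holder = {}
    # Let the C API store the raw pointer; reading it back yields a Python dict.
    ctypes.pythonapi.PyDict_SetItem(
        ctypes.py_object(holder),
        ctypes.py_object("real_dict"),
        proxy.dict,
    )
    return holder["real_dict"]


class Guinea(object):  # throwaway class, so list itself stays untouched
    pass


def monkeymethod(self):
    return "Imma monkey"


underlying_dict(Guinea)["monkeymethod"] = monkeymethod
# Invalidate cached attribute lookups for the class (an addition in this sketch).
ctypes.pythonapi.PyType_Modified(ctypes.py_object(Guinea))

print(Guinea().monkeymethod())
```

Practising on a class you own means a mistake in the layout guess costs you one interpreter session rather than every list in the process.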
It took some doing and crossing some language boundaries that were never really meant to be crossed, but we have access to the dictionary that powers lists, which means we can indeed add arbitrary methods to builtins. Using our filter from earlier,
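hackable_list['filter'] = filter
# Ask CPython to drop cached attribute lookups for list so the new entry is seen
ctypes.pythonapi.PyType_Modified(ctypes.py_object(list))
my_l = list(range(5))  # range() is not a list in Python 3, so convert it
def odd(item):
    return item % 2 == 1
print(my_l.filter(odd))
>> [1, 3]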
TL;DR: Yes. You can monkeypatch new methods into python's built-in classes.