How I Patched Python to Include This Ruby Feature
How I modified Python’s source code and compiled it to accept “else-less” if expressions
In this post, I'll present how I changed Python’s source code and compiled it from scratch to accept "else-less" if expressions, similar to Ruby's "inline if", also known as a conditional modifier.
Why?
The idea of having an else-less if expression in Python came to my mind when I had to work with a Ruby service at my past job. Ruby, contrary to Python, makes lots of things implicit [Citation needed], and this kind of if expression is one of them. I say it's implicit because it returns nil if the condition evaluates to false. This is also called a conditional modifier.
$ irb
irb(main):001:0> RUBY_VERSION
=> "2.7.1"
irb(main):002:0> a = 42 if true
=> 42
irb(main):003:0> b = 21 if false
=> nil
irb(main):004:0> b
=> nil
irb(main):005:0> a
=> 42
In Python, one cannot do that without explicitly adding an else to the expression. In fact, as of this PR, the interpreter will tell you right away in the SyntaxError message that the else is mandatory.
$ ./python
Python 3.11.0a0 (heads/main:938e84b4fa, Aug 6 2021, 08:59:36) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 42 if True
File "<stdin>", line 1
a = 42 if True
^^^^^^^^^^
SyntaxError: expected 'else' after 'if' expression
However, I find Ruby's if actually very convenient. This convenience became more evident when I had to go back to Python and write things like this:
>>> my_var = 42 if some_cond else None
So I thought to myself, what would it be like if Python had a similar feature? Could I do it myself? How hard would that be?
How?
Me trying to make sense of CPython's source code.
Digging into CPython's code and changing the language's syntax did not sound trivial to me.
Luckily, during the same week, I found out on Twitter that Anthony Shaw had just written a book on CPython Internals and it was available for pre-release. I didn't think twice and bought the book.
I've got to be honest, I'm the kind of person who buys things and doesn't use them immediately. As I had other plans in mind, I let it gather dust in my home folder for a while.
Until... I had to work with that Ruby service again. It reminded me of the CPython Internals book and how challenging hacking the guts of Python would be.
The first thing was to go through the book from the very start and try to follow each step. The book focuses on Python 3.9, so in order to follow along, one needs to check out the 3.9 tag, and that's what I did.
I learned about how the code is structured and then how to compile it. The next chapters show how to extend the grammar and add new things, such as a new operator.
As I got familiar with the code base and how to tweak the grammar, I decided to give it a spin and make my own changes to it.
The First (Failed) Attempt
As I started finding my way around CPython's code from the latest main branch, I noticed that lots of things had changed since Python 3.9, yet some fundamental concepts hadn't.
My first shot was to dig into the grammar definition and find the if expression rule. The file is currently named Grammar/python.gram. Locating the rule was not difficult; an ordinary CTRL+F for the 'else' keyword was enough.
file: Grammar/python.gram
...
expression[expr_ty] (memo):
| invalid_expression
| a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
| disjunction
| lambdef
....
Now with the rule in hand, my idea was to add one more alternative to the current if expression rule, one that would match a=disjunction 'if' b=disjunction, with the c expression being NULL.
This new alternative should be placed immediately after the complete one. The parser tries the alternatives in order, so if the shorter one came first it would always match a=disjunction 'if' b=disjunction, leave a trailing 'else' unconsumed, and the parse would end in a SyntaxError.
expression[expr_ty] (memo):
| invalid_expression
| a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
| a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, NULL, EXTRA) }
| disjunction
| lambdef
....
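To make the ordering issue more concrete, here is a toy sketch in plain Python. It is not how CPython's generated parser works internally; the match_seq and parse helpers, the token handling, and the rule names if_else and if_only are all made up for illustration. It only models the "first alternative that matches wins" behaviour.

```python
# Toy model of ordered choice; a simplification, not CPython's real parser.
KEYWORDS = {"if", "else"}


def match_seq(tokens, pos, pattern):
    """Match `pattern` starting at `pos`; return the new position or None."""
    for expected in pattern:
        if pos >= len(tokens):
            return None
        tok = tokens[pos]
        if expected == "NAME":
            if not tok.isidentifier() or tok in KEYWORDS:
                return None
        elif tok != expected:
            return None
        pos += 1
    return pos


def parse(tokens, alternatives):
    """Ordered choice: the first alternative that matches wins."""
    for name, pattern in alternatives:
        end = match_seq(tokens, 0, pattern)
        if end is not None:
            if end != len(tokens):
                # The winning alternative left input behind, and the
                # remaining alternatives are never tried.
                raise SyntaxError(f"unexpected {tokens[end]!r} after {name}")
            return name
    raise SyntaxError("no alternative matched")


FULL = ("if_else", ["NAME", "if", "NAME", "else", "NAME"])
SHORT = ("if_only", ["NAME", "if", "NAME"])

good_order = [FULL, SHORT]   # complete rule first, as in my grammar change
bad_order = [SHORT, FULL]    # shorter rule first

print(parse("a if b else c".split(), good_order))
print(parse("a if b".split(), good_order))
try:
    parse("a if b else c".split(), bad_order)
except SyntaxError as exc:
    print("SyntaxError:", exc)
```

With the complete alternative first, both inputs are accepted. With the shorter one first, the input containing an else fails because the 'else' token is left over.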
Regenerating the Parser and Compiling Python From Source
CPython comes with a Makefile containing lots of useful commands. One of them is the regen-pegen command, which converts Grammar/python.gram into Parser/parser.c.
Besides changing the grammar, I had to modify the AST for the if expression. AST stands for Abstract Syntax Tree, and it is a way of representing the syntactic structure of the source code as a tree. For more information about ASTs, I highly recommend the Crafting Interpreters book by Robert Nystrom.
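If you have never looked at an AST before, the standard ast module lets you inspect one on an unmodified Python. The snippet below is only a small illustration (the variable names are mine); it parses the explicit else version of the expression and shows the node the parser builds for it.

```python
import ast

# Parse the explicit form that stock Python already accepts.
tree = ast.parse("my_var = 42 if some_cond else None")
assign = tree.body[0]    # the assignment statement
if_exp = assign.value    # the conditional expression on the right-hand side

print(type(if_exp).__name__)   # the node type used for if expressions
print(ast.IfExp._fields)       # the child slots of that node
print(ast.dump(if_exp))
```

The three fields test, body and orelse are the same three values that the grammar action passes to _PyAST_IfExp.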
Moving on, if you look at the rule for the if expression, it goes like this:
| a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
This means that when the parser matches this rule, it calls _PyAST_IfExp, which gives back an expr_ty data structure. This gave me a clue: in order to implement the behavior of the new rule, I'd need to change _PyAST_IfExp.
To find where it is located, I used my ripgrep skills and searched for it inside the source root.
$ rg _PyAST_IfExp -C2 .
[OMITTED]
Python/Python-ast.c
2686-
2687-expr_ty
2688:_PyAST_IfExp(expr_ty test, expr_ty body, expr_ty orelse, int lineno, int
2689- col_offset, int end_lineno, int end_col_offset, PyArena *arena)
2690-{
[OMITTED]
... And the implementation goes like this:
expr_ty
_PyAST_IfExp(expr_ty test, expr_ty body, expr_ty orelse, int lineno, int
             col_offset, int end_lineno, int end_col_offset, PyArena *arena)
{
    expr_ty p;
    if (!test) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'test' is required for IfExp");
        return NULL;
    }
    if (!body) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'body' is required for IfExp");
        return NULL;
    }
    if (!orelse) {
        PyErr_SetString(PyExc_ValueError,
                        "field 'orelse' is required for IfExp");
        return NULL;
    }
    p = (expr_ty)_PyArena_Malloc(arena, sizeof(*p));
    if (!p)
        return NULL;
    p->kind = IfExp_kind;
    p->v.IfExp.test = test;
    p->v.IfExp.body = body;
    p->v.IfExp.orelse = orelse;
    p->lineno = lineno;
    p->col_offset = col_offset;
    p->end_lineno = end_lineno;
    p->end_col_offset = end_col_offset;
    return p;
}
Since I pass orelse as NULL, I thought it was just a matter of changing the body of if (!orelse) and assigning None to orelse.
     if (!orelse) {
-        PyErr_SetString(PyExc_ValueError,
-                        "field 'orelse' is required for IfExp");
-        return NULL;
+        orelse = Py_None;
     }
Now it's time to test it. I compile the code with make -j8 -s and fire up the interpreter.
$ make -j8 -s
Python/Python-ast.c: In function ‘_PyAST_IfExp’:
Python/Python-ast.c:2703:16: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
orelse = Py_None;
Despite the glaringly obvious warning, I decided to ignore it just to see what would happen. 😅
$ ./python
Python 3.11.0a0 (heads/ruby-if-new-dirty:f92b9133ef, Aug 2 2021, 09:13:02) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 42 if True
>>> a
42
>>> b = 21 if False
[1] 16805 segmentation fault (core dumped) ./python
Ouch! It works for the if True case, but assigning Py_None to expr_ty orelse causes a segfault.
Time to go back to see what is wrong.
The Second Attempt
It wasn't too difficult to figure out where I messed up. orelse is an expr_ty, and I'm assigning Py_None to it, which is a PyObject *. Again, thanks to ripgrep, I found the relevant definition.
$ rg constant -tc -C2
Include/internal/pycore_asdl.h
14-typedef PyObject * string;
15-typedef PyObject * object;
16:typedef PyObject * constant;
Now, how did I find out Py_None was a constant?
Whilst reviewing the Grammar/python.gram file, I found that one of the rules for the new pattern matching syntax is defined like this:
# Literal patterns are used for equality and identity constraints
literal_pattern[pattern_ty]:
| value=signed_number !('+' | '-') { _PyAST_MatchValue(value, EXTRA) }
| value=complex_number { _PyAST_MatchValue(value, EXTRA) }
| value=strings { _PyAST_MatchValue(value, EXTRA) }
| 'None' { _PyAST_MatchSingleton(Py_None, EXTRA) }
However, this rule is a pattern_ty, not an expr_ty. But that's fine. What really matters is to understand what _PyAST_MatchSingleton actually is. So I searched for it in Python/Python-ast.c.
file: Python/Python-ast.c
...
pattern_ty
_PyAST_MatchSingleton(constant value, int lineno, int col_offset, int
end_lineno, int end_col_offset, PyArena *arena)
...
Now back to the "drawing board", I look for the definition of a None node in the grammar. To my great relief, I find it!
atom[expr_ty]:
| NAME
| 'True' { _PyAST_Constant(Py_True, NULL, EXTRA) }
| 'False' { _PyAST_Constant(Py_False, NULL, EXTRA) }
| 'None' { _PyAST_Constant(Py_None, NULL, EXTRA) }
....
At this point, I had all the information I needed. To return an expr_ty representing None, I need to create a constant node in the AST by using the _PyAST_Constant function.
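The same distinction between a bare None and a constant node that represents None can be observed from Python with the ast module, on an unmodified interpreter. This is only an analogy to the C code, and build_ifexp is a helper name I made up for the example. I catch both TypeError and ValueError because I don't want to rely on the exact exception a given Python version raises.

```python
import ast


def build_ifexp(orelse):
    """Build the AST for `21 if False else <orelse>` as an eval-able tree."""
    node = ast.IfExp(
        test=ast.Constant(value=False),
        body=ast.Constant(value=21),
        orelse=orelse,
    )
    return ast.fix_missing_locations(ast.Expression(body=node))


# A bare None is not an expression node, so the compiler rejects the tree.
try:
    compile(build_ifexp(None), "<ast>", "eval")
    raw_none_error = None
except (TypeError, ValueError) as exc:
    raw_none_error = exc
print("bare None rejected:", raw_none_error)

# A Constant node wrapping None is a valid expression node.
code = compile(build_ifexp(ast.Constant(value=None)), "<ast>", "eval")
result = eval(code)
print("with a Constant node:", result)
```

Feeding the node constructor a real Constant node is the Python-level counterpart of calling _PyAST_Constant(Py_None, NULL, EXTRA) in the grammar action.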
| a=disjunction 'if' b=disjunction 'else' c=expression { _PyAST_IfExp(b, a, c, EXTRA) }
- | a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, NULL, EXTRA) }
+ | a=disjunction 'if' b=disjunction { _PyAST_IfExp(b, a, _PyAST_Constant(Py_None, NULL, EXTRA), EXTRA) }
| disjunction
Now I must revert Python/Python-ast.c as well. Since I'm feeding it a valid expr_ty, it will never be NULL.
file: Python/Python-ast.c
...
     if (!orelse) {
-        orelse = Py_None;
+        PyErr_SetString(PyExc_ValueError,
+                        "field 'orelse' is required for IfExp");
+        return NULL;
     }
...
Let's compile it again and see what happens!
$ make -j8 -s && ./python
Python 3.11.0a0 (heads/ruby-if-new-dirty:25c439ebef, Aug 2 2021, 09:25:18) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> c = 42 if True
>>> c
42
>>> b = 21 if False
>>> type(b)
<class 'NoneType'>
>>>
WOT!? It works! 🎉🎉🎉
Now, we need to do one more test. Ruby methods can return a value if a condition matches; if it doesn't, the rest of the method body gets executed.
At this point I wonder if that would work out-of-the-box. I rush to the interpreter again and write the same function.
>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> f(False)
>>> f(True)
42
>>>
Ooopss...
The function returns None if test is False... To help me debug this, I summoned the ast module. The official docs define it like so:
The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.
Let's print the AST for this function...
>>> import ast
>>> fc = '''
... def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
... '''
>>> print(ast.dump(ast.parse(fc), indent=4))
Module(
    body=[
        FunctionDef(
            name='f',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='test')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Return(
                    value=IfExp(
                        test=Name(id='test', ctx=Load()),
                        body=Constant(value=42),
                        orelse=Constant(value=None))),
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='missed return')],
                        keywords=[])),
                Return(
                    value=Constant(value=21))],
            decorator_list=[])],
    type_ignores=[])
Now things make more sense: my change to the grammar was just syntactic sugar. It turns an expression like a if b into a if b else None. The problem here is that Python will return no matter what, so the rest of the function is ignored.
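You can reproduce this behaviour on an unmodified Python by writing the desugared form by hand. The function below is what my patched interpreter effectively sees.

```python
def f(test):
    # What `return 42 if test` desugars to with my grammar change.
    return 42 if test else None
    print("missed return")  # never reached: the return above always fires
    return 21


print(f(True))
print(f(False))
```

Both calls return from the first line of the function, so the print is never executed.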
We can also look at the generated bytecode to understand what exactly is executed by the interpreter. For that, we can use the dis module. According to the docs:
The dis module supports the analysis of CPython bytecode by disassembling it.
>>> import dis
>>> dis.dis(f)
2 0 LOAD_FAST 0 (test)
2 POP_JUMP_IF_FALSE 4 (to 8)
4 LOAD_CONST 1 (42)
6 RETURN_VALUE
>> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
What this basically means is that if the test is false, execution jumps to offset 8, which loads None onto the top of the stack and returns it.
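If you want to poke at the instructions programmatically instead of reading the listing, dis.get_instructions yields them one by one. The sketch below runs on a stock interpreter using the explicit else form. Opcode names and offsets change between Python versions, so the exact list it prints will likely differ from the listing above.

```python
import dis


def f(test):
    return 42 if test else None


# Collect the opcode names in order; the exact names are version-specific.
opnames = [ins.opname for ins in dis.get_instructions(f)]
print(opnames)

# Whatever the version, the function should contain a return-style instruction.
has_return = any(name.startswith("RETURN") for name in opnames)
print(has_return)
```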
Supporting "return-if"
To support the same Ruby feature, I can turn the expression return 42 if test into a regular if statement that returns if test is true.
To do that, I needed to add one more rule. This time, it would be a rule that matches the return <value> if <test> piece of code. Not only that, we need a _PyAST_ function that creates the node for us. Let's call it _PyAST_ReturnIfExpr.
file: Grammar/python.gram
return_stmt[stmt_ty]:
+ | 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }
| 'return' a=[star_expressions] { _PyAST_Return(a, EXTRA) }
As mentioned previously, the implementations of all these functions reside in Python/Python-ast.c, and their declarations in Include/internal/pycore_ast.h, so I added _PyAST_ReturnIfExpr to both.
file: Include/internal/pycore_ast.h
 stmt_ty _PyAST_Return(expr_ty value, int lineno, int col_offset, int
                       end_lineno, int end_col_offset, PyArena *arena);
+stmt_ty _PyAST_ReturnIfExpr(expr_ty value, expr_ty test, int lineno, int col_offset, int
+                            end_lineno, int end_col_offset, PyArena *arena);
 stmt_ty _PyAST_Delete(asdl_expr_seq * targets, int lineno, int col_offset, int
                       end_lineno, int end_col_offset, PyArena *arena);
file: Python/Python-ast.c
}
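+stmt_ty
+_PyAST_ReturnIfExpr(expr_ty value, expr_ty test, int lineno, int col_offset, int end_lineno, int
+                    end_col_offset, PyArena *arena)
+{
+    stmt_ty ret, p;
+    asdl_stmt_seq *body;
+
+    /* Build the `return <value>` statement. */
+    ret = _PyAST_Return(value, lineno, col_offset, end_lineno, end_col_offset, arena);
+    if (!ret)
+        return NULL;
+
+    /* Wrap it in a one-element statement sequence to use as the `if` body. */
+    body = _Py_asdl_stmt_seq_new(1, arena);
+    if (!body)
+        return NULL;
+    asdl_seq_SET(body, 0, ret);
+
+    /* Equivalent to `if <test>: return <value>` with no else branch. */
+    p = _PyAST_If(test, body, NULL, lineno, col_offset, end_lineno, end_col_offset, arena);
+
+    return p;
+}
+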
stmt_ty
Let's pause for a bit to examine the implementation of _PyAST_ReturnIfExpr. Like I mentioned previously, I want to turn return <value> if <test> into if <test>: return <value>.
Both return and the regular if are statements, so in CPython they're represented as stmt_ty. _PyAST_If expects an expr_ty test and a body, which is a sequence of statements. In this case, body is asdl_stmt_seq *body.
As a result, what we really want here is an if statement with a body where the only statement is a return <value> one.
CPython provides some convenient functions to build an asdl_stmt_seq *, and one of them is _Py_asdl_stmt_seq_new. So I used it to create the body and add the return statement I created a few lines before with _PyAST_Return.
Once that's done, the last step is to pass the test as well as the body to _PyAST_If.
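The same construction can be sketched from Python with the ast module, on an unmodified interpreter. This is only an analogue of the C helper, not a replacement for it: return_if is a name I made up, and I splice the node into a parsed function by hand because stock Python cannot parse the new syntax.

```python
import ast


def return_if(value, test):
    """Python analogue of the C helper: build `if <test>: return <value>`."""
    ret = ast.Return(value=value)
    # The body is a one-element list, mirroring the one-element asdl_stmt_seq.
    return ast.If(test=test, body=[ret], orelse=[])


# Start from a function that only contains the fall-through return.
module = ast.parse("def f(test):\n    return 21\n")
func = module.body[0]

# Insert the guarded return as the first statement of the function.
guard = return_if(ast.Constant(value=42), ast.Name(id="test", ctx=ast.Load()))
func.body.insert(0, guard)
ast.fix_missing_locations(module)

namespace = {}
exec(compile(module, "<ast>", "exec"), namespace)
f = namespace["f"]

print(f(True))
print(f(False))
```

Because the return lives inside an if statement, a false test falls through to the rest of the function body.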
And before I forget, you may be wondering what on earth PyArena *arena is. PyArena is a CPython abstraction for memory allocation used by the parser and the compiler. AST nodes are allocated out of larger blocks of memory owned by the arena, and they are all released together when the arena is freed, so nodes never have to be freed one by one [reference].
Now it's time to regenerate the parser and test it one more time.
>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> import dis
>>> f(False)
>>> f(True)
42
Oh no! It doesn't work...
Let's check the bytecodes...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (test)
2 POP_JUMP_IF_FALSE 4 (to 8)
4 LOAD_CONST 1 (42)
6 RETURN_VALUE
>> 8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>>
... the same bloody bytecode instructions again!
Going Back to the Compilers Class
At that point, I was clueless. I had no idea what was going on until... I decided to go down the rabbit hole of expanding the grammar rules.
The new rule I added went like this: 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }.
My only hypothesis is that a=star_expressions 'if' b=disjunction is being resolved to the else-less rule I added in the beginning.
By going over the grammar one more time, I figure that my theory holds. star_expressions eventually expands to expression, so it matches the else-less alternative a=disjunction 'if' b=disjunction and swallows the 'if' before my new return rule gets to see it.
The fix I came up with is to get rid of star_expressions. So I change the rule to:
return_stmt[stmt_ty]:
- | 'return' a=star_expressions 'if' b=disjunction { _PyAST_ReturnIfExpr(a, b, EXTRA) }
+ | 'return' a=disjunction guard=guard !'else' { _PyAST_ReturnIfExpr(a, guard, EXTRA) }
| 'return' a=[star_expressions] { _PyAST_Return(a, EXTRA) }
You might be wondering: what is guard, what is !'else', and what is star_expressions?
This 'guard' is a rule that is part of the pattern matching rules. The new pattern matching feature added in Python 3.10 allows things like this:
match point:
    case Point(x, y) if x == y:
        print(f"Y=X at {x}")
    case Point(x, y):
        print("Not on the diagonal")
And the rule goes like this:
guard[expr_ty]: 'if' guard=named_expression { guard }
With that, I added one more check. To avoid failing with a SyntaxError, we need to make sure the rule matches only code like return value if cond. Thus, to prevent code such as return a if cond else b from being matched prematurely, I added a !'else' to the rule.
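The !'else' part is a negative lookahead: it peeks at the next token without consuming it and makes the rule fail if the token is there. Here is a toy version of that idea in plain Python. It works on a flat list of tokens and has nothing to do with the real generated parser; matches_return_if is a made-up helper.

```python
def matches_return_if(tokens):
    """True for `return <x> if <y>` that is NOT followed by `else`."""
    if len(tokens) < 4:
        return False
    if tokens[0] != "return" or tokens[2] != "if":
        return False
    # Negative lookahead: inspect the next token without consuming it.
    next_tok = tokens[4] if len(tokens) > 4 else None
    return next_tok != "else"


print(matches_return_if("return value if cond".split()))
print(matches_return_if("return a if cond else b".split()))
```

The first input is accepted by the toy rule, while the second one is rejected and would be left to the regular return rule.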
Last, but not least, star_expressions allows us to return unpacked iterables. For example:
>>> def f():
...:     a = [1, 2]
...:     return 0, *a
...:
>>> f()
(0, 1, 2)
In this case, 0, *a is a tuple, which falls under the category of star_expressions. The regular if-expression doesn't allow using star_expressions with it AFAIK, so changing our new return rule won't be an issue.
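A quick check on a stock interpreter shows how tuples and if-expressions interact: the comma binds more loosely than the conditional, so the if-expression only applies to a single element, never to the whole tuple. The function names below are just for the example.

```python
def unpacked():
    a = [1, 2]
    return 0, *a  # a tuple built with unpacking


def conditional_element(flag):
    # Parsed as (0, (1 if flag else 2)): the condition covers one element only.
    return 0, 1 if flag else 2


print(unpacked())
print(conditional_element(True))
print(conditional_element(False))
```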
Does it work yet?
After fixing the return rule, I regenerate the grammar one more time and compile it.
>>> def f(test):
...     return 42 if test
...     print('missed return')
...     return 21
...
>>> f(False)
missed return
21
>>> f(True)
42
And... IT WORKS!!
Let's check the bytecode then...
>>> import dis
>>> dis.dis(f)
2 0 LOAD_FAST 0 (test)
2 POP_JUMP_IF_FALSE 4 (to 8)
4 LOAD_CONST 1 (42)
6 RETURN_VALUE
3 >> 8 LOAD_GLOBAL 0 (print)
10 LOAD_CONST 2 ('missed return')
12 CALL_FUNCTION 1
14 POP_TOP
4 16 LOAD_CONST 3 (21)
18 RETURN_VALUE
>>>
That's precisely what I wanted. In fact, to make sure, let's also see if the AST is the same as the one with a regular if.
>>> import ast
>>> print(ast.dump(ast.parse(fc), indent=4))
Module(
    body=[
        FunctionDef(
            name='f',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='test')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                If(
                    test=Name(id='test', ctx=Load()),
                    body=[
                        Return(
                            value=Constant(value=42))],
                    orelse=[]),
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='missed return')],
                        keywords=[])),
                Return(
                    value=Constant(value=21))],
            decorator_list=[])],
    type_ignores=[])
>>>
And indeed it is!
If(
    test=Name(id='test', ctx=Load()),
    body=[
        Return(
            value=Constant(value=42))],
    orelse=[]),
This node is the same as the one that would be generated by
if test: return 42
If It's Not Tested, It's Broken?
To conclude this journey, I thought it'd be a good idea to add some unit tests as well. Before writing anything new, I wanted to get an idea of what I had broken.
With the code tested manually, I run all tests using the test module: python -m test -j8. The -j8 means we'll use 8 processes to run the tests in parallel.
$ ./python -m test -j8
To my surprise, only one test fails! 😱
== Tests result: FAILURE ==
406 tests OK.
1 test failed:
test_grammar
Since I ran all tests, it's hard to navigate the output, so I run only this one again in isolation.
======================================================================
FAIL: test_listcomps (test.test_grammar.GrammarTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/miguel/projects/cpython/Lib/test/test_grammar.py", line 1732, in test_listcomps
check_syntax_error(self, "[x if y]")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miguel/projects/cpython/Lib/test/support/__init__.py", line 497, in check_syntax_error
with testcase.assertRaisesRegex(SyntaxError, errtext) as cm:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: SyntaxError not raised
----------------------------------------------------------------------
Ran 76 tests in 0.038s
FAILED (failures=1)
test test_grammar failed
test_grammar failed (1 failure)
== Tests result: FAILURE ==
1 test failed:
test_grammar
1 re-run test:
test_grammar
Total duration: 82 ms
Tests result: FAILURE
And there it is! It expects a syntax error when running a [x if y] expression. We can safely remove that check and re-run the tests.
== Tests result: SUCCESS ==
1 test OK.
Total duration: 112 ms
Tests result: SUCCESS
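For reference, this is the behaviour the removed assertion was checking. On an unmodified Python, compiling [x if y] raises a SyntaxError, which is exactly what my patch changes. The is_syntax_error helper is my own and not part of CPython's test support code.

```python
def is_syntax_error(source):
    """Return True if `source` fails to compile as an expression."""
    try:
        compile(source, "<test>", "eval")
    except SyntaxError:
        return True
    return False


# On a stock interpreter the else-less form is rejected...
print(is_syntax_error("[x if y]"))
# ...while the complete conditional expression compiles fine.
print(is_syntax_error("[x if y else z]"))
```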
Now that everything is OK, it's time to add a few more tests. It's important to test not only the new "else-less if" but also the new return statement.
By navigating through the test_grammar.py file we can find a test for pretty much every grammar rule. The first one I look for is test_if_else_expr. This test doesn't fail, so it only tests for the happy case. To make it more robust we need to add two new tests to check the if True and if False cases.
self.assertEqual((6 < 4 if 0), None)
self.assertEqual((6 < 4 if 1), False)
I run everything again, all tests pass this time.
ps: bool in Python is a subclass of int, so you can use 1 to denote True and 0 for False.
Ran 76 tests in 0.087s
OK
== Tests result: SUCCESS ==
1 test OK.
Total duration: 174 ms
Tests result: SUCCESS
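The note about bool being a subclass of int, and the two new assertions, can be checked on a stock interpreter by spelling out the else None that my patch adds implicitly.

```python
# bool is a subclass of int, so 1 and 0 behave as true and false conditions.
print(issubclass(bool, int))

# Stock-Python equivalents of the two new assertions in test_if_else_expr.
with_false_cond = (6 < 4 if 0 else None)  # condition is false: the else value
with_true_cond = (6 < 4 if 1 else None)   # condition is true: value of 6 < 4

print(with_false_cond)
print(with_true_cond)
```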
Lastly, we need the tests for the return rule. They're defined in the test_return test. Just like the if expression one, this test passes with no modification.
To test this new use case, I create a function that receives a bool argument and returns if the argument is true. When it's false, it skips the return, just like the manual tests I have been doing up to this point.
def g4(test):
    a = 1
    return a if test
    a += 1
    return a
self.assertEqual(g4(False), 2)
self.assertEqual(g4(True), 1)
Now, save the file and re-run test_grammar one more time.
----------------------------------------------------------------------
Ran 76 tests in 0.087s
OK
== Tests result: SUCCESS ==
1 test OK.
Total duration: 174 ms
Tests result: SUCCESS
All good in the hood! test_grammar passes with flying colors and the last thing, just in case, is to re-run the full test suite.
$ ./python -m test -j8
After a while, all tests pass and I'm very happy with the result.
Limitations
If you know Ruby well, by this point you've probably noticed that what I did here is not 100% the same as a conditional modifier. For example, in Ruby you can run actual expressions in these modifiers.
irb(main):002:0> a = 42
irb(main):003:0> a += 1 if false
=> nil
irb(main):004:0> a
=> 42
irb(main):005:0> a += 1 if true
=> 43
I cannot do the same with my implementation.
>>> a = 42
>>> a += 1 if False
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
>>> a += 1 if True
>>> a
43
What this reveals is that the return rule I created is just a workaround. If I want to make it as close as possible to Ruby's conditional modifier, I'll need to make it work with other statements as well, not just return.
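The TypeError is easy to explain once you remember that the else-less form is only sugar for else None. Writing the desugared statement on a stock interpreter fails in the same way:

```python
a = 42
try:
    # What `a += 1 if False` turns into under my patch: the right-hand side
    # evaluates to None, and int += None is not supported.
    a += 1 if False else None
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
print(a)  # unchanged, because the augmented assignment never completed
```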
Nevertheless, this is fine. My goal with this experiment was just to learn more about Python internals and see how I would navigate a little-known code base written in C and make the appropriate changes to it. And I have to admit that I'm pretty happy with the results!
Conclusion
Adding a new syntax inspired by Ruby is a really nice exercise to learn more about the internals of Python. Of course, if I had to turn this into a PR, the core developers would probably find a few shortcomings, as I have already found and described in the previous section. However, since I did this just for fun, I'm very happy with the results.
The source code with all my changes is on my CPython fork under the branch ruby-if-new.
I hope you've found this post cool and let's see what comes next!
See you next time!