CodeBytes: Transform your Python code with functional programming

While writing my "markdown2gist" project (a small Python script that extracts code blocks from Markdown files and uploads them as a GitHub Gist), I needed a function that extracts the code blocks from the contents of a Markdown file.

The function accepts the Markdown file content as a list of strings and returns the code blocks as a list of lists of strings: each code block is its own list of lines, so the snippets stay separated. All this results in the following function:

https://gist.github.com/lyubolp/de37154285a87eff85a5249aaee229a9

My initial idea for the implementation was the following: loop through each line. If the line contains the Markdown code block marker "```", the function checks the boolean variable that signals whether a code block has started. If a code block has not started, it sets the variable to True and saves the current index. If a code block has started (meaning that it ends at the current line), it appends the lines between the previously saved index and the current line to the result list and sets the has_code_block_started variable to False.

A bit confusing. Let's look at the code:

https://gist.github.com/lyubolp/169d3fdcb488ecfa9952f8474e8403d0
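
For readers who cannot open the gist, here is a rough sketch of what a flag-based loop along those lines might look like. It is a reconstruction from the description above, not necessarily the gist's exact code: the names extract_code_blocks_imperative and start_index are made up for illustration, and I am assuming the fence lines are kept in the result (to match the content[start:end+1] slice used later in this post).

```python
# Three backticks, built this way so the marker never appears literally.
MARKDOWN_CODE_SNIPPET_SYNTAX = "`" * 3


def extract_code_blocks_imperative(content: list[str]) -> list[list[str]]:
    """Hypothetical flag-based version: track state while looping."""
    result = []
    has_code_block_started = False
    start_index = 0  # index of the most recent opening fence

    for i, line in enumerate(content):
        if MARKDOWN_CODE_SNIPPET_SYNTAX in line:
            if not has_code_block_started:
                # Opening fence: remember where the block begins.
                has_code_block_started = True
                start_index = i
            else:
                # Closing fence: store the block, fences included.
                result.append(content[start_index:i + 1])
                has_code_block_started = False

    return result


fence = MARKDOWN_CODE_SNIPPET_SYNTAX
sample = ["Intro", fence + "python", "print('hi')", fence, "Outro"]
blocks = extract_code_blocks_imperative(sample)
```

Note how the result depends on three variables that change as the loop runs; that is the part the rest of this post tries to get rid of.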

The code looks simpler than my explanation, but still, it's not that easy to read and/or debug. The implementation relies on mutating variables and flags, which alter the flow of the code. Having a lot of moving parts increases the complexity, which makes our code harder to read, harder to test, and harder to debug.

But what does functional programming have to do with this? Well, the main complexity comes from the approach: loop through the lines, set the flags to the proper values, and change the action based on those flags. If we rewrite this function to eliminate the flags and the mutable variables, we can improve its readability, testability, and debuggability.

Leaving the code aside, another way to solve this problem is to first gather the indices where a code block starts or ends (so any line containing "```"). From there, we can pair up the indices: if we have a list of [1, 2, 3, 4], we should pair it the following way: [(1, 2), (3, 4)]. Once we have that, we can use slicing to take the lines we care about (we know where each segment starts and ends).

How do we do that in Python then?

To take the indices where a code block starts or ends, we can use the following list comprehension - [i for i, line in enumerate(content) if MARKDOWN_CODE_SNIPPET_SYNTAX in line]. Here we keep only the lines containing the '```' symbols and return their indices.
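
As a small illustration of that comprehension, here is a sketch on a made-up input (the sample content is invented for this example, and the marker constant is assumed to be three backticks):

```python
# Three backticks, built this way so the marker never appears literally.
MARKDOWN_CODE_SNIPPET_SYNTAX = "`" * 3
fence = MARKDOWN_CODE_SNIPPET_SYNTAX

# Made-up Markdown content, one string per line.
content = [
    "Intro",             # 0
    fence + "python",    # 1  opening fence
    "x = 1",             # 2
    fence,               # 3  closing fence
    "Some text",         # 4
    fence,               # 5  opening fence
    "y = 2",             # 6
    fence,               # 7  closing fence
]

# Keep the index of every line that contains the fence marker.
code_block_indexes = [
    i for i, line in enumerate(content) if MARKDOWN_CODE_SNIPPET_SYNTAX in line
]
print(code_block_indexes)  # [1, 3, 5, 7]
```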
To pair up the indices, we can use some more functional magic:
- The first part of the magic is the zip function - it takes two (or more) collections and pairs the n-th elements of each collection together. As an example, if we have two lists a = [1, 2, 3] and b = ['a', 'b', 'c'], zip(a, b) would give us the list [(1, 'a'), (2, 'b'), (3, 'c')] (Okay, not exactly true - zip returns a lazy iterator, so if we want the list, we need to do list(zip(a, b))).
- The second part is how we get the two lists that we will pass to zip. We need a list containing only the 1st, 3rd, 5th, etc. items (so indices 0, 2, 4, etc. - the even indices), and another list containing the 2nd, 4th, etc. items (so the odd indices). For this, we can use slicing. As we know, a slice is built from three values - the start, the end, and the step. For both of our lists, the step will be 2. The only difference is the start - to get the even indices, we start from 0, and for the odd indices, we start from 1.
- Putting both things together, we can pair the indices in the required way by using this one-liner - zip(code_block_indexes[::2], code_block_indexes[1::2])
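
To make the pairing concrete, here is a short sketch that runs each piece on its own. The index list is made up for the example, and starts and ends are just illustrative names:

```python
# zip pairs the n-th elements of each input; list() materializes the result.
a = [1, 2, 3]
b = ["a", "b", "c"]
pairs = list(zip(a, b))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Made-up fence indices: opening and closing fences alternate.
code_block_indexes = [1, 3, 5, 7]

starts = code_block_indexes[::2]   # positions 0, 2, ... -> [1, 5]
ends = code_block_indexes[1::2]    # positions 1, 3, ... -> [3, 7]

code_block_bounds = list(zip(starts, ends))
print(code_block_bounds)  # [(1, 3), (5, 7)]

# zip stops at the shorter input, so an unmatched trailing fence is dropped.
unbalanced = [1, 3, 5]
unbalanced_bounds = list(zip(unbalanced[::2], unbalanced[1::2]))
print(unbalanced_bounds)  # [(1, 3)]
```

One thing worth knowing about this design: because zip stops at the shorter input, an opening fence without a closing one is silently ignored rather than raising an error.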
Now we need to transform each pair of start/end indices into a list of lines. For each (start, end) pair, we need the slice starting at start and ending at end, and since a slice excludes its end index, we write end+1 - [content[start:end+1] for start, end in code_block_bounds]
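
A quick sketch of that last step on made-up data. The content and bounds are invented for the example, and the blocks_without_fences variant is my own aside, not something the project necessarily does:

```python
fence = "`" * 3  # three backticks

# Made-up Markdown content and the bounds of its two code blocks.
content = [
    "Intro",
    fence + "python",
    "x = 1",
    fence,
    "Some text",
    fence,
    "y = 2",
    fence,
]
code_block_bounds = [(1, 3), (5, 7)]

# end + 1 because a slice stops just before its end index,
# and here we want the closing fence included.
code_blocks = [content[start:end + 1] for start, end in code_block_bounds]

# Alternative (an assumption, not from the post): drop both fence lines.
blocks_without_fences = [content[start + 1:end] for start, end in code_block_bounds]
```

With this input, code_blocks holds two lists of three lines each (opening fence, code line, closing fence), while blocks_without_fences holds only the code lines.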

Putting it all together, our refactored extract_code_blocks will look something like this:

https://gist.github.com/lyubolp/bfb0be58ff80585155c2fe0c28e4493b
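
For readers who cannot open the gist, here is a sketch assembled from the three steps described above. It is a reconstruction and may differ from the gist in details; the sample input is made up, and I am assuming Python 3.9+ for the list[str] annotations.

```python
# Three backticks, built this way so the marker never appears literally.
MARKDOWN_CODE_SNIPPET_SYNTAX = "`" * 3


def extract_code_blocks(content: list[str]) -> list[list[str]]:
    """Return each fenced code block (fence lines included) as a list of lines."""
    # Step 1: indices of every line that contains the fence marker.
    code_block_indexes = [
        i for i, line in enumerate(content) if MARKDOWN_CODE_SNIPPET_SYNTAX in line
    ]
    # Step 2: pair them up as (opening, closing).
    code_block_bounds = zip(code_block_indexes[::2], code_block_indexes[1::2])
    # Step 3: slice the lines of each block out of the content.
    return [content[start:end + 1] for start, end in code_block_bounds]


fence = MARKDOWN_CODE_SNIPPET_SYNTAX
sample = ["Intro", fence + "python", "x = 1", fence, "Text", fence, "y = 2", fence]
blocks = extract_code_blocks(sample)
```

No variable is reassigned after it is created, so each step can be inspected or tested in isolation. One caveat of this sketch: the "in line" check matches the marker anywhere in a line, not only at the start.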

Much better. There are no flags to keep track of and no variables that change along the way, so this code is easier to read, easier to test, and easier to debug.

I hope you found this CodeByte useful! If you did, make sure to leave a like. If you want to read more CodeBytes, you can find them here. If you want to read more about functional programming in Python, click here. If you want to read something longer, you can take a look here. As always, happy coding!