themodernscientist

data scientist, biophysicist, mac-unix zealot, pythonista

Easy Syntax Highlighting With Pygments

As a scientist who splits her time between both wet lab and computational work, I have learned the importance of saving code snippets for future reference. These snippets are handy when I later can't recall a clever Pandas indexing trick or when I want to share code with a colleague.

I spend most of my time writing code in text editors[1] that have syntax highlighting, and I prefer my snippets similarly highlighted. While there are many syntax highlighting engines and websites in existence, one of the most powerful engines is Pygments, which, as the name implies, is Python-based.

Pygments supports an extensive list of input languages, each handled by a component called a lexer, and has numerous formatters for output, including HTML, LaTeX, RTF, and terminal escape codes. There are also many built-in color styles, and lexers, formatters, and styles can all be customized. Lastly, Pygments has few dependencies and can be installed using most Python package managers.
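To see what a given installation supports, the lexers, formatters, and styles can be listed from Python. This is a minimal sketch that assumes only that Pygments is installed. The exact counts and names depend on the installed version, so none are shown here.

```python
from pygments.lexers import get_all_lexers
from pygments.formatters import get_all_formatters
from pygments.styles import get_all_styles

# Each lexer entry is a tuple: (name, aliases, filename patterns, mimetypes)
lexer_names = sorted(name for name, aliases, patterns, mimetypes in get_all_lexers())

# get_all_formatters() yields formatter classes; each class has a .name attribute
formatter_names = sorted(cls.name for cls in get_all_formatters())

# get_all_styles() yields the short names accepted by the style= option
style_names = sorted(get_all_styles())

print(len(lexer_names), "lexers")
print(len(formatter_names), "formatters")
print(len(style_names), "styles:", ", ".join(style_names))
```

The names printed for styles are the values that can be passed to style= in Python or to -O style=... with pygmentize.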

While Pygments does include many styles, including my favorite, Monokai, it doesn't have a version of Monokai with a light background. For storing and sharing snippets, I prefer a light background to a dark one. So, I created my own Monokai Light style based on the color scheme here. The style I created can be downloaded directly from here and placed in the style directory within Pygments, or it can be installed using either this conda package or with MacPorts by cloning this GitHub repository.
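For readers curious what a custom style looks like, here is a minimal sketch of a style class. The class name ExampleLightStyle and every color in it are hypothetical placeholders chosen for illustration; they are not the Monokai Light definitions. A style class can be passed directly to a formatter, which avoids installing it into the Pygments style directory.

```python
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter
from pygments.style import Style
from pygments.token import Comment, Keyword, Name, Number, String


class ExampleLightStyle(Style):
    """Hypothetical light style; the colors are placeholders."""

    background_color = "#fafafa"
    styles = {
        Comment: "italic #75715e",
        Keyword: "bold #f92672",
        Name.Function: "#6aaf19",
        String: "#998f2f",
        Number: "#ae81ff",
    }


sample = "def area(r):\n    return 3.14 * r ** 2  # circle area\n"

# Passing the class itself works in place of a registered style name
formatter = HtmlFormatter(full=True, style=ExampleLightStyle)
html = highlight(sample, PythonLexer(), formatter)
```

Token types that are not listed inherit from their parent type, so Keyword also covers subtypes such as Keyword.Namespace.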

Pygments can be run from within a Python script or from the terminal using a helper command called pygmentize. Both methods are demonstrated below for comparison.

Input the Python Snippet

The code snippet to be highlighted will usually reside in a file or on the clipboard. For the purpose of a self-contained tutorial, it will be input as a string. Since this is predominantly a python blog, it is easy to guess which language is being used.

In [1]:
python_snippet = r"""
#!/usr/bin/env python

import numpy as np

A = np.random.random((4,4))
Ainv = np.linalg.inv(A)

print 'A ', A
print 'Ainv', Ainv
print 'A*Ainv', np.dot(A,Ainv)"""

Syntax Highlighting in Python

Pygments is a Python module and can be used in the usual fashion to perform syntax highlighting within a Python script. This requires the highlight function, a lexer, and a formatter to be imported. Two formatter options are also set. First, full=True outputs a complete HTML document whose header embeds the CSS for the style. Without this option, the formatter returns only an HTML fragment with no stylesheet, so the colors will not appear unless the CSS is supplied separately. Second, the selected style is specified with style='monokailight'.

In [2]:
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

html_snippet = highlight(python_snippet, PythonLexer(),
                         HtmlFormatter(full=True, style='monokailight'))
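When the highlighted code is meant to be embedded in an existing page, an alternative to full=True is to generate the fragment and the stylesheet separately. The sketch below uses the built-in default style as a stand-in, since monokailight is only available once it has been installed, and the short input string is just an illustration.

```python
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

# full=False is the default, so this formatter emits only a fragment
formatter = HtmlFormatter(style="default")
fragment = highlight("import numpy as np\n", PythonLexer(), formatter)

# CSS rules for the same style, scoped to the wrapping <div class="highlight">
css = formatter.get_style_defs(".highlight")

# Combine them by hand; in practice the CSS would usually go in a .css file
page = "<style>\n%s\n</style>\n%s" % (css, fragment)
```

Keeping the CSS separate means that many snippets on one page can share a single stylesheet.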

The CSS class names of the snippet conflict with those used by this blog, so a regular expression substitution is needed to add a prefix to them. This substitution should not be necessary in most other use cases.

In [3]:
import re
html_snippet_sub = re.sub(r"""(body \.|class=")""", r"""\1pygm""", html_snippet)
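To make the effect of this substitution concrete, here it is applied to a two-line sample. The sample string is made up for illustration and only mimics the shape of the Pygments output: one CSS rule and one highlighted token.

```python
import re

# Hypothetical sample: a CSS rule followed by a highlighted token
sample = 'body .k { color: #f92672 }\n<span class="k">import</span>'

# Insert the prefix "pygm" after every 'body .' selector and every 'class="'
renamed = re.sub(r'(body \.|class=")', r'\1pygm', sample)

print(renamed)
# body .pygmk { color: #f92672 }
# <span class="pygmk">import</span>
```

Because the same prefix is added to both the selectors and the class attributes, the rules still match the tokens after renaming.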

Normally, the output would be saved to a file or sent to the clipboard. For this tutorial, it will be displayed using the HTML class, which is one of IPython's powerful display options.

In [4]:
from IPython.display import HTML
HTML(html_snippet_sub)
Out[4]:

#!/usr/bin/env python

import numpy as np

A = np.random.random((4,4))
Ainv = np.linalg.inv(A)

print 'A ', A
print 'Ainv', Ainv
print 'A*Ainv', np.dot(A,Ainv)
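Outside of a notebook, the result would more typically be written to a file. The highlight function accepts an optional outfile argument for this. In the sketch below, the output location in the temporary directory and the use of the built-in default style are assumptions made to keep the example self-contained.

```python
import os
import tempfile

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

code = "import numpy as np\nA = np.random.random((4, 4))\n"

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "python_snippet.html")

# With outfile set, highlight() writes to the file instead of returning a string
with open(out_path, "w", encoding="utf-8") as handle:
    highlight(code, PythonLexer(), HtmlFormatter(full=True, style="default"),
              outfile=handle)
```

The resulting file is a complete HTML document that can be opened directly in a browser.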

Highlighting in the Terminal Using Pygmentize

The most common way I use Pygments is from the terminal via pygmentize, which is an accessory command to Pygments. Here is an example with options similar to those used in the Python code above:

pygmentize -l python -f html -O full,style=monokailight \
           -o ./python_snippet.html ./python_snippet.py

This command will read the file python_snippet.py and write to the file whose name follows the -o flag, which is python_snippet.html in this case. If the output file is omitted, the HTML result is printed to the terminal. The language (lexer), output format (formatter), and options are set with the -l, -f, and -O flags, respectively. The options are full, which is analogous to the Python full=True option, and style=monokailight, which selects the style. In many cases, pygmentize can guess the lexer and formatter from the input and output file extensions, so the -l and -f flags are not always necessary.
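The same guessing by file extension is available from Python. This sketch uses the file names from the command above purely as strings, so the files do not need to exist. The default style is used as a stand-in for monokailight.

```python
from pygments.lexers import get_lexer_for_filename
from pygments.formatters import get_formatter_for_filename

# The lexer is chosen from the input file's extension
lexer = get_lexer_for_filename("python_snippet.py")

# The formatter is chosen from the output file's extension;
# extra keyword arguments are passed through as formatter options
formatter = get_formatter_for_filename("python_snippet.html",
                                       full=True, style="default")

print(lexer.name, formatter.name)
```

Both functions raise pygments.util.ClassNotFound when no lexer or formatter matches the file name, which is a reasonable cue to fall back to explicit choices.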

On Mac OS X, code snippets can be copied to the clipboard, sent to pygmentize, and then the syntax highlighted result returned to the clipboard using the pbpaste and pbcopy commands.

pbpaste | pygmentize -l python -f html -O full,style=monokailight | pbcopy

Customizing Output: Adding Line Numbers

There are various options for altering the appearance of the output. Line numbers are one common option, and they can be added with linenos=True.

In [5]:
html_snippet_linenos = highlight(python_snippet, PythonLexer(), 
                                 HtmlFormatter(full=True, style='monokailight', linenos=True))

Alternatively, line numbers can be included in pygmentize output by adding the option linenos=1.

pygmentize -l python -f html -O full,style=monokailight,linenos=1 \
-o ./python_snippet_linenos.html ./python_snippet.py
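The HTML formatter has a few related options beyond simply turning line numbers on. The sketch below, which uses a short made-up input, compares the table layout (the same as linenos=True) with the inline layout, and also sets a starting number and a highlighted line.

```python
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

code = "import numpy as np\n\nA = np.random.random((4, 4))\n"

# 'table' puts the numbers in a separate table column, so copying the code
# from the page does not pick up the numbers
table_html = highlight(code, PythonLexer(), HtmlFormatter(linenos="table"))

# 'inline' puts the numbers inside the code block itself.
# linenostart sets the first displayed number; hl_lines marks lines to
# emphasize, counted from 1 in the input regardless of linenostart.
inline_html = highlight(
    code,
    PythonLexer(),
    HtmlFormatter(linenos="inline", linenostart=10, hl_lines=[3]),
)
```

The table layout is usually the better choice for snippets that readers will copy and paste.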

Here is the result with line numbers after again making the necessary modifications to avoid conflicts with styles associated with this blog.

In [6]:
html_snippet_linenos_sub = re.sub(r"""(body \.|class=")""", 
                                  r"""\1pygm""", html_snippet_linenos)
HTML(html_snippet_linenos_sub)
Out[6]:

 1 #!/usr/bin/env python
 2
 3 import numpy as np
 4
 5 A = np.random.random((4,4))
 6 Ainv = np.linalg.inv(A)
 7
 8 print 'A ', A
 9 print 'Ainv', Ainv
10 print 'A*Ainv', np.dot(A,Ainv)

That's it! Pygments makes it extremely easy to create syntax highlighted code snippets.

This post was written in an IPython notebook, which can be downloaded here, or viewed statically here.


  1. Vim and Sublime Text really are the only editors worth using.