Umar Ahmad

Faster PHP syntax highlighting with tree-sitter

PHP mode in Emacs is sadly slow on large files. At work I sometimes have to deal with large PHP code bases, and in certain cases it becomes so slow that php-mode is practically unusable. In those cases I grudgingly switch to fundamental-mode to make small changes. There are alternatives like web-mode, which is better suited to highlighting files that mix multiple web languages, but php-mode has other goodies, such as support for different coding styles and better indentation, that make it too useful to discard completely.

I recently decided to dig into php-mode's performance issues and found that syntax highlighting was the primary cause of the slowness. To solve it, I eventually ended up bypassing php-mode's own syntax highlighting code and replacing it with tree-sitter. This made highlighting much faster and considerably reduced typing latency in larger files.

This is what I ended up doing to achieve that:

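;; Replace php-mode's syntax-propertize functions with no-ops.
(advice-add 'php-syntax-propertize-function :override #'return-false)
(advice-add 'php-syntax-propertize-extend-region :override #'return-false)
;; Also remove the extend-region function from the hook that would call it.
(remove-hook 'syntax-propertize-extend-region-functions #'php-syntax-propertize-extend-region)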

Here return-false is a small utility function, defined as follows:

(defun return-false (&rest _)
  "Return nil regardless of the arguments.
Useful for overriding a function so that it does nothing."
  nil)
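
Emacs also ships a built-in function, ignore, that accepts any arguments and returns nil, so the same overrides could be written without the return-false helper. A minimal sketch, assuming the php-mode function names shown above; the two my/... command names are made up here so the change can be switched off again:

```elisp
(defun my/php-disable-syntax-propertize ()
  "Turn php-mode's syntax-propertize functions into no-ops.
Hypothetical helper; uses the built-in `ignore' instead of `return-false'."
  (interactive)
  (advice-add 'php-syntax-propertize-function :override #'ignore)
  (advice-add 'php-syntax-propertize-extend-region :override #'ignore))

(defun my/php-restore-syntax-propertize ()
  "Undo `my/php-disable-syntax-propertize' by removing both advices."
  (interactive)
  (advice-remove 'php-syntax-propertize-function #'ignore)
  (advice-remove 'php-syntax-propertize-extend-region #'ignore))
```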

The tree-sitter configuration is as follows:

(use-package tree-sitter
  :ensure tree-sitter
  :ensure tree-sitter-langs
  :defer 2
  :config
  (add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode))
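
The snippet above attaches tree-sitter-hl-mode to tree-sitter-mode but does not itself turn tree-sitter-mode on, so that presumably happens elsewhere in my setup. A sketch of one way to do it for PHP buffers only, assuming the tree-sitter and tree-sitter-langs packages are already installed:

```elisp
;; Load the core package and the bundled grammars (which include PHP).
(require 'tree-sitter)
(require 'tree-sitter-langs)

;; Enable tree-sitter only in php-mode buffers rather than globally.
;; tree-sitter-hl-mode then follows via `tree-sitter-after-on-hook'.
(add-hook 'php-mode-hook #'tree-sitter-mode)
```

To enable it in every supported major mode instead, the package provides (global-tree-sitter-mode).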

For comparison, my typing latency dropped from 600-2500 ms to 35-50 ms. This is close to fundamental-mode, where I get around 20-25 ms of typing latency in the same file.
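
Per-keystroke latency is awkward to measure by hand. As a rough, repeatable proxy (not the same thing as typing latency, and not how the numbers above were obtained), one could time a full refontification of the buffer before and after the change. The command name below is made up for illustration:

```elisp
(require 'benchmark)

(defun my/time-refontify ()
  "Refontify the whole current buffer and report the elapsed time.
Hypothetical helper for comparing highlighting setups."
  (interactive)
  (syntax-ppss-flush-cache (point-min)) ; discard cached syntax state
  (font-lock-flush)                     ; mark all text as needing fontification
  ;; `benchmark-run' returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
  (let ((elapsed (car (benchmark-run 1 (font-lock-ensure)))))
    (message "Refontified %d characters in %.0f ms"
             (buffer-size) (* 1000 elapsed))
    elapsed))
```

Running M-x my/time-refontify in the same large PHP file under each setup gives comparable figures, though the absolute values will differ from the typing latencies quoted above.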

The resulting syntax highlighting differs slightly.

Before:

/images/php-highlight-sans-tree-sitter.png

After:

/images/php-highlight-tree-sitter.png

As is evident from the screenshots, tree-sitter provides slightly more consistent syntax highlighting, correctly rendering function names in a single color. A clear drawback is the missing doc-block highlighting: the tree-sitter highlighter treats doc blocks as plain comments.

It might make sense for php-mode to adopt tree-sitter as its primary syntax highlighting system, with some effort diverted there to improve the highlighting that tree-sitter produces. Integration with tree-sitter might also become standard for major modes once Emacs has native tree-sitter support. Until then, hacks like this one can work around such issues.

#emacs