One of the fun parts of the ESP32-S3 microcontroller is that it got upgraded to the newer Cadence Xtensa LX7 processor core, which turns out to have a range of SIMD instructions that can help to significantly speed up a range of tasks. [Shranav Palakurthi] recently used this to speed up the processing of video frames to detect corners using the FAST method. By moving some operations that benefit from SIMD over to an optimized version written in LX7 ASM, the algorithm’s throughput was increased by 220%, from 5.1 MP/s to 11.2 MP/s, albeit with some caveats.
The problem with the SIMD instructions in the LX7 other than them being very poorly documented – unless you sign an NDA with Cadence – is that it misses many instructions that would be really useful. For [Shranav] the lack of support for direct misaligned reads and comparing of unsigned 8-bit numbers were hurdles, but could be worked around, with the results available on GitHub.
Much of the groundwork for this SIMD implementation was laid by [Larry Bank], who reverse-engineered the SIMD instructions from available documentation and code samples, finding that the ESP32-S3 misses quite a few common SIMD instructions, including various shifts and unaligned reads and writes. Still, it’s good enough for quite a few tasks, as long as you can make it work with the available instructions.