Validating UTF-8 In Less Than One Instruction Per Byte
Created on 2021-10-26T00:20:59-05:00
- Testing a range of ASCII chars by loading them in to SIMD instruction, AND'ing by 0x8080(cont'd) to see if anything in the range has a header flag.
- Folding two words of ASCII by OR'ing two eight byte sections then AND'ing the remaining word by 0x8080(cont'd)
- Cramming validation in to a state machine
- Taking half of a byte from each byte in a two byte set then looking the pattern up in a dictionary to see if it matches a known probable error pattern.