S2 string operations operate on bytes instead of characters
Short, concise description of the idea
S2 string functions, such as substr() and length(), should count in characters, not in bytes.
Full description of the idea
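For an example and further details, see the original discussion in the s2styles community. The main issue is that counting in bytes makes UTF-8 encoded strings behave strangely: any character outside ASCII takes up more than one byte, so length() reports more than the number of visible characters, and substr() can cut a character in half.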
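To make the difference concrete, here is a small illustration in Python rather than S2. It only assumes that S2's length() and substr() behave like the byte-oriented operations below; the sample word is made up for the example.

```python
# Illustration only: Python stands in for S2 here.
text = "na\u00efve"                # "naive" with a diaeresis: 5 characters
raw = text.encode("utf-8")         # what a byte-counting function sees

char_length = len(text)            # counts characters
byte_length = len(raw)             # counts bytes; the accented letter takes 2

# Taking the "first 3" units gives different results.
first_three_chars = text[:3]       # ends with the whole accented letter
first_three_bytes = raw[:3]        # ends halfway through the accented letter

# The byte-truncated string is no longer valid UTF-8.
try:
    first_three_bytes.decode("utf-8")
    truncated_is_valid = True
except UnicodeDecodeError:
    truncated_is_valid = False
```

The character count matches what a reader sees on the page, while the byte count depends on which characters happen to be in the string.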
An ordered list of benefits
- It is what most people would expect such functions to do.
- It makes internationalisation work more smoothly.
An ordered list of problems/issues involved
- Code using the byte counting functionality would break.
- Code with workarounds for this 'feature' would probably break too.
An organized list, or a few short paragraphs detailing suggestions for implementation
- My preferred solution is to simply change the existing functions so that they count in characters.
- An alternative is to add new string functions that count in characters, leaving the original functions to count in bytes.
- Another alternative is to add an optional parameter to the string functions that tells them to work in characters.
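A rough sketch of how the three options differ, again in Python rather than S2. It assumes S2 strings are stored as UTF-8 bytes, and every function and parameter name here is hypothetical; none of them exist in S2.

```python
# Hypothetical sketch: Python bytes stand in for S2 strings, which are
# assumed to be UTF-8 encoded. All names below are invented for illustration.

def length_bytes(s: bytes) -> int:
    """Current behaviour as described in this suggestion: count bytes."""
    return len(s)

def length_chars(s: bytes) -> int:
    """Options 1 and 2: count characters (code points) instead of bytes."""
    return len(s.decode("utf-8"))

def substr_chars(s: bytes, start: int, count: int) -> bytes:
    """Character-based substring that never splits a multi-byte character."""
    return s.decode("utf-8")[start:start + count].encode("utf-8")

def length(s: bytes, in_chars: bool = False) -> int:
    """Option 3: one function with an optional flag selecting the unit."""
    return length_chars(s) if in_chars else length_bytes(s)

# Three CJK characters, each three bytes long in UTF-8.
sample = "\u65e5\u672c\u8a9e".encode("utf-8")
bytes_by_default = length(sample)             # old callers see no change
chars_on_request = length(sample, in_chars=True)
```

With the optional parameter, existing code keeps its current results unless it opts in. With the first option, the same call would start returning the character count for every caller, which is where the breakage listed above would come from.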