> It’s also pretty safe for Unicode symbols to make them active and add some extra logic to them if you like.

I would add "AFAWK" (as far as we know) here. Also, we do not yet have a standard interface for doing this.

More generally, any such processing of the input text must make it possible to simply keep the character tokens that were actually input. This information is needed for output to internal files such as .aux, and for output to other systems that require character data, such as .html and friends and some parts of .pdf files.
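
To illustrate the point about keeping the input character, here is a minimal sketch, not a standard interface. It assumes a Unicode engine (LuaLaTeX or XeLaTeX) and uses `→` purely as an example character. Because the active character is defined with `\protected`, it should not expand inside a `\write`, so the character itself, rather than its definition, is what ends up in the .aux and .toc files.

```latex
% Sketch only: assumes LuaLaTeX or XeLaTeX, not pdfLaTeX.
\documentclass{article}

% Make the example character active and give it some extra logic.
\catcode`\→=\active
% \protected stops expansion in \edef and \write, so the
% character token is written out as the character itself.
\protected\def→{\ensuremath{\rightarrow}}

\begin{document}
\tableofcontents
\section{From A → B} % the heading text goes to the .aux and .toc files
Typeset text: A → B.
\end{document}
```

After a run, the .aux file should contain the heading with the literal `→` in it. Other output routes (for example PDF strings or HTML conversion) may need their own handling, which this sketch does not attempt to cover.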