> Don't we need support for UCS-2 encoded files?

UCS-2-encoded TeX input is not feasible, since TeX assumes, e.g., that \
is a single byte, not a multi-byte sequence. In UCS-2 every character,
including \, occupies two bytes.
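To illustrate the byte-level problem, here is a small Python sketch. Python has no "ucs-2" codec, so it uses big-endian UTF-16 as a stand-in. For characters in the Basic Multilingual Plane, such as \ and ASCII letters, that produces the same bytes UCS-2 would. The sample string "\relax" is just an arbitrary example.

```python
# Assumption: for BMP characters, big-endian UTF-16 gives the same bytes
# as UCS-2, so "utf-16-be" stands in for a UCS-2 codec here.
source = "\\relax"  # a backslash followed by the letters "relax"

utf8_bytes = source.encode("utf-8")
ucs2_bytes = source.encode("utf-16-be")

# In UTF-8, ASCII characters stay single bytes: the backslash is 0x5C.
print(list(utf8_bytes))
# In UCS-2 each character takes two bytes, so a byte-wise reader
# meets 0x00 before it ever sees the backslash byte 0x5C.
print(list(ucs2_bytes))
```

A byte-oriented program reading the second sequence sees twice as many "characters" as intended, with a null byte in front of every ASCII one.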

> And what about Unicode surrogates (i.e., characters that consist of
> four octets)?

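In UTF-8 these are not needed: UTF-8 encodes the whole Unicode range
(up to U+10FFFF) directly, using at most four bytes per character, and
the surrogate code points U+D800 to U+DFFF are not even allowed in
UTF-8. Think of the surrogates as a relic of UTF-16.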
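As a rough illustration, the Python sketch below takes one character outside the BMP (U+1D11E, MUSICAL SYMBOL G CLEF, chosen arbitrarily). It compares the direct four-byte UTF-8 encoding with the UTF-16 surrogate pair. The helper `utf16_surrogates` is a hypothetical name written here to show the surrogate arithmetic; it is not a library function.

```python
def utf16_surrogates(cp):
    """Hypothetical helper: split a supplementary code point
    (U+10000..U+10FFFF) into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low

clef = "\U0001D11E"

# UTF-8 encodes the code point directly in four bytes.
utf8 = clef.encode("utf-8")
# UTF-16 needs a surrogate pair (also four bytes) for the same character.
utf16 = clef.encode("utf-16-be")
pair = utf16_surrogates(ord(clef))

print(utf8.hex(" "))
print(utf16.hex(" "))
print([hex(u) for u in pair])

# A lone surrogate is not a valid character in UTF-8; Python's strict
# UTF-8 encoder rejects it.
try:
    "\ud834".encode("utf-8")
    lone_surrogate_ok = True
except UnicodeEncodeError:
    lone_surrogate_ok = False
```

The surrogate split exists only because UTF-16 code units are 16 bits wide. UTF-8 never needs it.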

DniQ.