Developers’ Weblog

FS — field separator?

Tags: bug

I’ve been using “a Unicode (and ASCII) field separator” for my SSV flavour of CSV. I picked the FS control character for it, since “FS”, according to much documentation, stands for field separator.

Turns out most Unicode control characters have shitty official names and/or acronyms/abbreviations… such as…

  • PLD: partial line forward (not: partial line down)
  • SPA: start of guarded area (not: start of protected area)
  • VTS: line tabulation set (not: vertical tabulation set)
  • DC1: device control one (not: XON)
  • RI: reverse line feed (not: reverse index)
  • NP: form feed (probably for “new page”)
  • NL: line feed (not: newline, but we weren’t expecting that either, since an ASCII newline is CR+LF, and Unicode C1 has NEL (next line) anyway)
  • Adding insult to injury, U+0080, U+0081, U+0084 and U+0099 do not even have a name; they only have Unicode “name aliases”, namely an abbreviation (which, of course, WTF knows about) and at least one longer name.
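To poke at these names yourself, Python’s `unicodedata` module is handy: `unicodedata.name()` returns nothing for control characters (UnicodeData.txt only lists them as `<control>`), while `unicodedata.lookup()` has accepted name aliases since Python 3.3. A small sketch, assuming Python 3.3 or later; the `EXPECTED` table and the `resolve()` helper are illustrative names only:

```python
import unicodedata

# Formal names (Unicode "control" name aliases) of some of the characters
# listed above, mapped to the code point each should resolve to.
EXPECTED = {
    "PARTIAL LINE FORWARD": 0x008B,   # PLD
    "START OF GUARDED AREA": 0x0096,  # SPA
    "LINE TABULATION SET": 0x008A,    # VTS
    "DEVICE CONTROL ONE": 0x0011,     # DC1
    "REVERSE LINE FEED": 0x008D,      # RI
    "FORM FEED": 0x000C,
    "LINE FEED": 0x000A,
}


def resolve(name: str) -> int:
    """Resolve a Unicode character name or name alias to a code point."""
    return ord(unicodedata.lookup(name))


for alias, cp in EXPECTED.items():
    # Control characters have no regular name, so name() falls back to
    # the default we pass in...
    assert unicodedata.name(chr(cp), None) is None
    # ...but lookup() also understands the name aliases.
    print(f"U+{resolve(alias):04X} {alias}")
```

So the official names are there, they just live in the alias table rather than in the character’s “real” name.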

… and so forth. There are separators, too!

  • FS: [U+001C] [␜] INFORMATION SEPARATOR FOUR [file separator]
  • GS: [U+001D] [␝] INFORMATION SEPARATOR THREE [group separator]
  • RS: [U+001E] [␞] INFORMATION SEPARATOR TWO [record separator]
  • US: [U+001F] [␟] INFORMATION SEPARATOR ONE [unit separator]
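ASCII intends these four to be used hierarchically, FS being the most inclusive and US the least. A minimal Python sketch, assuming Python 3.3 or later for alias lookup, which checks that both naming schemes resolve to the same code points and then nests units (fields) inside records. `SEPARATORS` and `separator()` are illustrative names, and this shows the generic ASCII convention, not the SSV spec:

```python
import unicodedata

# The four information separators, from most to least inclusive:
# (formal Unicode alias, older ASCII-style alias)
SEPARATORS = [
    ("INFORMATION SEPARATOR FOUR", "FILE SEPARATOR"),
    ("INFORMATION SEPARATOR THREE", "GROUP SEPARATOR"),
    ("INFORMATION SEPARATOR TWO", "RECORD SEPARATOR"),
    ("INFORMATION SEPARATOR ONE", "UNIT SEPARATOR"),
]


def separator(formal: str, legacy: str) -> str:
    """Look up a separator and check that both of its names agree."""
    ch = unicodedata.lookup(formal)
    assert ch == unicodedata.lookup(legacy)
    return ch


FS, GS, RS, US = (separator(formal, legacy) for formal, legacy in SEPARATORS)

# Hierarchical use: units (fields) inside records.
table = [["a", "b"], ["c", "d"]]
encoded = RS.join(US.join(row) for row in table)
decoded = [rec.split(US) for rec in encoded.split(RS)]

print([f"U+{ord(c):04X}" for c in (FS, GS, RS, US)])
print(repr(encoded))
```

Note that the field-level separator in this scheme is US, not FS.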

And guess what… in both ASCII and Unicode, FS is the file separator (the field-level one is US, the unit separator). Oops. Sorry.

So… I guess the next time I use SSV, I’ll update the spec (in an incompatible way). Again, sorry about that.
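For existing data, the incompatible change could be as small as swapping one character. A hypothetical sketch, assuming the old format put FS between fields, the caller has already split the input into records, and no field contains a literal US; `migrate_record()` is an invented name, not part of any SSV tooling:

```python
# Hypothetical migration helper: rewrite the field separators of one
# already-split record from FS (U+001C) to US (U+001F).
FS_OLD = "\x1c"  # INFORMATION SEPARATOR FOUR (file separator)
US_NEW = "\x1f"  # INFORMATION SEPARATOR ONE (unit separator)


def migrate_record(record: str) -> str:
    """Return the record with every FS replaced by US."""
    # A literal US already in the data would become indistinguishable
    # from the new field separator, so refuse instead of guessing.
    if US_NEW in record:
        raise ValueError("record already contains US; cannot migrate unambiguously")
    return record.replace(FS_OLD, US_NEW)


old = "name\x1cvalue\x1ccomment"
new = migrate_record(old)
print(new.split(US_NEW))  # ['name', 'value', 'comment']
```

Refusing input that already contains US keeps the conversion unambiguous rather than silently merging or splitting fields.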

It’s only in another 48 minutes, but enjoy the Solstice! Blessed be!