I recently ran into a weird bug in a project I was working on. The tl;dr: a text string contained U+FEFF, better known as the zero width no-break space or Byte Order Mark (BOM), and it was throwing off a string.blank? check.

It all started when some text that looked blank was failing the string.blank? check. For those of you who don't know, in Ruby on Rails object.blank? is basically .nil? and .empty? rolled into one, with whitespace-only strings also counting as blank... or so I thought. I went into the database where my string lived and copied it out.

puts "   \n   \n".blank?

To my surprise, the output was false. Scratching my head, I wrote some additional test cases to see how blank? behaved:

puts "       ".blank?
puts "\n\n\n\n\n\n".blank?

As expected, both returned true. Now I was really puzzled. I looked at the string in hex form, and that's when I found the culprit: a U+FEFF character had somehow gotten in there, and my otherwise blank string now returned false. The string was actually "\uFEFF \n \n"! A-ha!
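That hex inspection can be done in plain Ruby by printing the code points and the raw UTF-8 bytes. This is a minimal sketch; `suspect` is a hypothetical stand-in for the value copied out of the database.

```ruby
# Hypothetical stand-in for the value copied out of the database.
# The leading "\uFEFF" is invisible when the string is printed normally.
suspect = "\uFEFF \n \n"

# One "U+XXXX" label per code point makes the hidden character obvious.
code_points = suspect.codepoints.map { |cp| format("U+%04X", cp) }
puts code_points.join(" ") # prints "U+FEFF U+0020 U+000A U+0020 U+000A"

# The raw UTF-8 bytes as a hex string; U+FEFF encodes as EF BB BF.
utf8_hex = suspect.unpack1("H*")
puts utf8_hex # prints "efbbbf200a200a"
```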

Since the regular non-breaking space (nbsp) is U+00A0, it counts as whitespace and is matched by the whitespace regex blank? relies on. I checked a bunch of other Unicode no-break spaces, such as U+202F (narrow no-break space) and U+2007 (figure space), and those were matched too. U+FEFF is the odd one out: despite its name, it is a format character (Unicode category Cf), not whitespace. So I decided to wrap blank? to also cover my invisible BOM case.
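This is easy to check without Rails, because the POSIX bracket class [[:space:]] in a Ruby regex follows Unicode's whitespace definition when matching UTF-8 strings. A small sketch testing each code point mentioned above on its own (the constant names are just for illustration):

```ruby
# Matches a string consisting of exactly one [[:space:]] character.
SINGLE_SPACE_RE = /\A[[:space:]]\z/

SPACE_CANDIDATES = {
  "U+00A0 no-break space"                  => "\u00A0",
  "U+2007 figure space"                    => "\u2007",
  "U+202F narrow no-break space"           => "\u202F",
  "U+FEFF zero width no-break space (BOM)" => "\uFEFF"
}.freeze

# true when [[:space:]] treats the character as whitespace
space_results = SPACE_CANDIDATES.transform_values { |char| SINGLE_SPACE_RE.match?(char) }
space_results.each { |name, is_space| puts "#{name}: #{is_space}" }
```

The first three print true; only the BOM prints false, which is why it alone trips up blank?.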

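I originally took the naive approach of deleting the BOM characters from the string and running the check again, which resulted in this monstrosity:

require "active_support/core_ext/object/blank" # already loaded in a Rails app

def blank_or_invisible?(string)
  # String#delete takes a character-set string, not a Regexp
  string.blank? || string.delete("\uFEFF").blank?
end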

Running some perf tests on the above yielded some gross results:

require "benchmark"

puts Benchmark.measure {
  50_000.times do
    blank_or_invisible?("\uFEFF\uFEFF\uFEFF\uFEFF \n\n")
  end
}

puts "blank?"
puts Benchmark.measure {
  50_000.times do
    "\uFEFF\uFEFF\uFEFF\uFEFF \n\n".blank?
  end
}

Output (user, system, and total CPU time, then real time in parentheses, all in seconds):

  0.126713   0.000700   0.127413 (  0.127421)
blank?
  0.007405   0.000004   0.007409 (  0.007422)

That is roughly 17x slower! How terrible! I then decided to look into how obj.blank? is actually implemented. For many of the built-in types the answer is hard-coded (NilClass#blank? simply returns true, for example), but the String class is where the magic happens. It's simply the following method (newer versions of ActiveSupport also handle encoding compatibility errors around the match):

class String
  BLANK_RE = /\A[[:space:]]*\z/

  def blank?
    empty? || BLANK_RE.match?(self)
  end
end
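To see why the BOM string slips through, the same check can be reproduced in plain Ruby against the strings from the earlier tests. The method name `blank_like?` is hypothetical; it mirrors the regex shown above but is a standalone sketch, not ActiveSupport's code.

```ruby
# Same pattern as the BLANK_RE shown above.
BLANK_LIKE_RE = /\A[[:space:]]*\z/

# Hypothetical standalone stand-in for String#blank? (no ActiveSupport needed).
def blank_like?(string)
  string.empty? || BLANK_LIKE_RE.match?(string)
end

samples = {
  "spaces"       => "       ",
  "newlines"     => "\n\n\n\n\n\n",
  "BOM + spaces" => "\uFEFF \n \n"
}

verdicts = samples.transform_values { |s| blank_like?(s) }
verdicts.each { |label, blank| puts "#{label}: #{blank}" }
```

The first two print true and the BOM string prints false: U+FEFF is not in [[:space:]], so the anchored pattern cannot match the whole string.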

I didn't want to change how .blank? behaved across my whole app for this one use case, so I wrote a simple function that emulates it with a different regular expression:

# Same idea as BLANK_RE, with U+FEFF added to the character class
BLANK_RE2 = /\A[[:space:]\uFEFF]*\z/.freeze

def blank_or_invisible?(string)
  string.blank? || BLANK_RE2.match?(string)
end
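Because `*` also matches zero characters, BLANK_RE2 on its own already accepts empty and whitespace-only strings, so a variant could skip the initial blank? call for strings and only special-case nil. This is a sketch that was not part of the benchmark here; `blank_or_bom?` is a hypothetical name, and it deliberately handles only nil and String values.

```ruby
# [[:space:]] plus U+FEFF; `*` means the empty string matches too.
BLANK_OR_BOM_RE = /\A[[:space:]\uFEFF]*\z/

# Hypothetical variant: one regex pass instead of blank? followed by a second regex.
# Only nil and String are handled; other objects are out of scope for this sketch.
def blank_or_bom?(string)
  string.nil? || string.empty? || BLANK_OR_BOM_RE.match?(string)
end

puts blank_or_bom?(nil)                            # prints "true"
puts blank_or_bom?("")                             # prints "true"
puts blank_or_bom?("\uFEFF\uFEFF\uFEFF\uFEFF \n\n") # prints "true"
puts blank_or_bom?("\uFEFFhello")                  # prints "false"
```

The empty? shortcut mirrors the one in String#blank?. Whether skipping the first regex is measurably faster would need the same kind of benchmark.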

Re-running the benchmark gave me:

  0.019259   0.000166   0.019425 (  0.019426)
blank?
  0.007405   0.000004   0.007409 (  0.007422)

That's roughly 2.6x the cost of plain .blank? (down from about 17x), and about 6.5x faster than the delete-based version... much better.
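To rerun the comparison on another machine or Ruby version without pulling in Rails, the three approaches can be timed side by side with Benchmark.bm. This is a sketch: the helper names are hypothetical, `plain_blank?` only mirrors the String case of blank?, and no timings are shown since they depend on the machine.

```ruby
require "benchmark"

# Standalone stand-ins (hypothetical names) so this runs without ActiveSupport.
BLANK_RE_SKETCH = /\A[[:space:]]*\z/
BLANK_OR_BOM_RE_SKETCH = /\A[[:space:]\uFEFF]*\z/

def plain_blank?(s)
  s.empty? || BLANK_RE_SKETCH.match?(s)
end

# The naive approach: strip BOMs, then check again.
def delete_then_check?(s)
  plain_blank?(s) || plain_blank?(s.delete("\uFEFF"))
end

# The regex approach: fall back to a pattern that also allows U+FEFF.
def regex_fallback?(s)
  plain_blank?(s) || BLANK_OR_BOM_RE_SKETCH.match?(s)
end

SAMPLE = "\uFEFF\uFEFF\uFEFF\uFEFF \n\n"
ITERATIONS = 50_000

Benchmark.bm(18) do |x|
  x.report("delete + blank?") { ITERATIONS.times { delete_then_check?(SAMPLE) } }
  x.report("regex fallback")  { ITERATIONS.times { regex_fallback?(SAMPLE) } }
  x.report("plain blank?")    { ITERATIONS.times { plain_blank?(SAMPLE) } }
end
```

Both BOM-aware helpers return true for SAMPLE while `plain_blank?` returns false, so the comparison measures equivalent answers for the two fixes.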