Managing UTF8-Strings in PHP

PHP has good multibyte support, but some functions that I often need when dealing with UTF8 strings are missing.

1. Testing if a string is UTF8 encoded

This function relies on the built-in PHP function mb_detect_encoding(), which is good at “guessing” the correct character encoding of a multibyte string, and turns its result into a boolean TRUE or FALSE.

function is_utf8($str){
      $ret = false;
      // Strict detection against the configured detection order ('auto')
      if (mb_detect_encoding($str, 'auto', true) === 'UTF-8'){
            $ret = true;
      }
      return $ret;
}
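
As a complement to the guessing approach above, here is a minimal sketch that validates the bytes directly instead of detecting an encoding. It assumes the mbstring extension is loaded; the helper name is_valid_utf8() is hypothetical and not part of the original code. Note that it answers a slightly different question: plain ASCII text counts as valid UTF8 here.

```php
<?php
// Hypothetical helper: true when $str is a well-formed UTF-8 byte sequence.
// mb_check_encoding() is provided by the mbstring extension.
function is_valid_utf8(string $str): bool {
    return mb_check_encoding($str, 'UTF-8');
}

var_dump(is_valid_utf8("h\xC3\xA9llo")); // "é" encoded as UTF-8: bool(true)
var_dump(is_valid_utf8("h\xE9llo"));     // "é" encoded as ISO-8859-1: bool(false)
```

Which of the two checks fits better depends on whether you want "looks like UTF8 rather than something else" or simply "is safe to treat as UTF8".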

2. Transform a string to UTF8 encoding

This function first checks whether the string is already UTF8 or whether mb_detect_encoding() returned an empty result. The latter mostly happens with strings that mix several encodings, which would get broken in the conversion process. In both cases the string is returned unchanged.

Before the conversion, PHP is told to discard unsupported characters instead of printing a “?”, both via ini_set('mbstring.substitute_character', 'none') and via the //IGNORE directive. In principle one of the two should be enough, but for me it works in more cases when both are set together.

The TRANSLIT directive ensures that when a character can’t be represented in UTF8, it is approximated by one or several similar-looking characters.

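function to_utf8($str){
      $ret = $str;

      // On failure mb_detect_encoding() returns false, which loosely equals ''
      $enc = mb_detect_encoding($str, 'auto', true);
      if($enc != 'UTF-8' && $enc != ''){
            // Drop characters that cannot be converted instead of substituting them
            ini_set('mbstring.substitute_character', 'none');
            $ret = iconv($enc, 'UTF-8//TRANSLIT//IGNORE', $str);
      }
      return $ret;
}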
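
Whether detection succeeds depends on which encodings mb_detect_encoding() is allowed to consider. The following is only a sketch of a variant that takes an explicit candidate list. The name to_utf8_from() and the default candidate ISO-8859-1 are assumptions for illustration, and it swaps iconv() for mb_convert_encoding() so that it depends on mbstring alone.

```php
<?php
// Hypothetical variant of to_utf8(); the candidate list is an assumption
// and should name the legacy encodings you actually expect as input.
function to_utf8_from(string $str, array $candidates = ['ISO-8859-1']): string {
    // Well-formed UTF-8 (including plain ASCII) is returned untouched.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Strict detection, limited to the candidate encodings.
    $enc = mb_detect_encoding($str, $candidates, true);
    if ($enc === false) {
        return $str; // nothing matched: leave the bytes alone
    }
    return mb_convert_encoding($str, 'UTF-8', $enc);
}

// "é" is the single byte E9 in ISO-8859-1 and the two bytes C3 A9 in UTF-8.
echo bin2hex(to_utf8_from("h\xE9llo")), "\n"; // 68c3a96c6c6f
```

Checking for valid UTF8 first keeps the function safe to call more than once on the same string.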

You are free to use my code samples if you respect this small license.