PHP's substr() cuts Chinese text mid-character and produces garbled output (revised version)

woff posted on 2011-11-6 13:48:50

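You can now use mb_substr() to truncate safely. Later, while reading Comsenz's UCHome, I found a getstr function that does the same job in plain PHP code. It is really well written, so I copied it here.
string substr ( string $string , int $start [, int $length ] )
Returns the part of string that starts at position start and is length long.
substr() cuts by bytes. A Chinese character takes 2 bytes in GB2312 and 3 bytes in UTF-8, so when a cut at the requested length lands inside a Chinese character, the returned string displays as garbled text.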
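
To make the byte-versus-character problem concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the sample string and variable names are only for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$text = "中文abc";

// strlen() counts bytes: 3 + 3 + 1 + 1 + 1 = 9.
$byteLength = strlen($text);

// Bytes 0 to 3 hold all of "中" (e4 b8 ad) plus only the first byte of "文" (e6).
$cut = substr($text, 0, 4);

// The dangling lead byte makes the result invalid UTF-8.
$isValid = mb_check_encoding($cut, 'UTF-8');

echo $byteLength, "\n";    // 9
echo bin2hex($cut), "\n";  // e4b8ade6
var_dump($isValid);        // bool(false)
```

The stray e6 byte at the end is what shows up as a garbled character in the browser.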

The old approach, which could cut a Chinese character in half:
function cutstr($string, $length, $dot = ' ...') {
    global $charset;
    if(strlen($string) <= $length) {
        return $string;
    }
    // Decode the four common HTML entities so that each counts as one character
    $string = str_replace(array('&amp;', '&quot;', '&lt;', '&gt;'), array('&', '"', '<', '>'), $string);
    $strcut = '';
    if(strtolower($charset) == 'utf-8') {
        // $n: byte offset, $tn: byte length of the last character, $noc: display units so far
        $n = $tn = $noc = 0;
        while($n < strlen($string)) {
            $t = ord($string[$n]);

            if($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                $tn = 1; $n++; $noc++;
            } elseif(194 <= $t && $t <= 223) {
                $tn = 2; $n += 2; $noc += 2;
            } elseif(224 <= $t && $t <= 239) {
                $tn = 3; $n += 3; $noc += 2;
            } elseif(240 <= $t && $t <= 247) {
                $tn = 4; $n += 4; $noc += 2;
            } elseif(248 <= $t && $t <= 251) {
                $tn = 5; $n += 5; $noc += 2;
            } elseif($t == 252 || $t == 253) {
                $tn = 6; $n += 6; $noc += 2;
            } else {
                $n++;
            }

            if($noc >= $length) {
                break;
            }
        }
        if($noc > $length) {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
    } else {
        // Other encodings: a byte above 127 is treated as the start of a 2-byte character
        for($i = 0; $i < $length; $i++) {
            $strcut .= ord($string[$i]) > 127 ? $string[$i].$string[++$i] : $string[$i];
        }
    }
    // Re-encode the entities that were decoded above
    $strcut = str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $strcut);
    return $strcut.$dot;
}
Solutions:
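1. Use mb_substr() instead
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
Works like substr(), except that it counts characters rather than bytes, so it never splits a character.
mb_substr() guarantees no garbled output, but the downside is that the length is now a character count rather than a byte count. For display purposes, Chinese and English results of the same length can differ considerably in on-screen width.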
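
A short sketch of that width difference, assuming UTF-8 input and the mbstring extension. mb_strwidth() is used here only as a rough stand-in for on-screen width (it counts East Asian wide characters as 2); the sample strings are made up for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$chinese = mb_substr("中文字符串截取", 0, 4, 'UTF-8'); // "中文字符"
$english = mb_substr("substring", 0, 4, 'UTF-8');      // "subs"

// Both results are 4 characters long...
$chineseChars = mb_strlen($chinese, 'UTF-8'); // 4
$englishChars = mb_strlen($english, 'UTF-8'); // 4

// ...but the Chinese one is roughly twice as wide when displayed.
$chineseWidth = mb_strwidth($chinese, 'UTF-8'); // 8
$englishWidth = mb_strwidth($english, 'UTF-8'); // 4

echo $chinese, " ", $chineseWidth, "\n";
echo $english, " ", $englishWidth, "\n";
```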
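2. The substr replacement from Comsenz
It counts each Chinese character as 2 length units, so in mixed Chinese/English text the truncated results end up at a similar display width; it drops a trailing incomplete character, so nothing garbled is displayed; and it supports both UTF-8 and GB2312, the encodings most commonly used for Chinese, which makes it broadly applicable.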

PHP code
function getstr($string, $length, $encoding = 'utf-8') {
    $string = trim($string);
    if ($length && strlen($string) > $length) {
        // Truncate the string
        $wordscut = '';
        if (strtolower($encoding) == 'utf-8') {
            // UTF-8 encoding
            $n = 0;   // byte offset into $string
            $tn = 0;  // byte length of the character just read
            $noc = 0; // display units so far (ASCII = 1, multi-byte = 2)
            while ($n < strlen($string)) {
                $t = ord($string[$n]);
                if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126)) {
                    $tn = 1;
                    $n++;
                    $noc++;
                } elseif (194 <= $t && $t <= 223) {
                    $tn = 2;
                    $n += 2;
                    $noc += 2;
                } elseif (224 <= $t && $t <= 239) {
                    // 3-byte lead bytes span 224 to 239 (0xE0 to 0xEF)
                    $tn = 3;
                    $n += 3;
                    $noc += 2;
                } elseif (240 <= $t && $t <= 247) {
                    $tn = 4;
                    $n += 4;
                    $noc += 2;
                } elseif (248 <= $t && $t <= 251) {
                    $tn = 5;
                    $n += 5;
                    $noc += 2;
                } elseif ($t == 252 || $t == 253) {
                    $tn = 6;
                    $n += 6;
                    $noc += 2;
                } else {
                    $n++;
                }
                if ($noc >= $length) {
                    break;
                }
            }
            if ($noc > $length) {
                // The last character overshot the limit, so drop it
                $n -= $tn;
            }
            $wordscut = substr($string, 0, $n);
        } else {
            // Other encodings such as GB2312: a byte above 127 starts a 2-byte character
            for ($i = 0; $i < $length - 1; $i++) {
                if (ord($string[$i]) > 127) {
                    $wordscut .= $string[$i].$string[$i + 1];
                    $i++;
                } else {
                    $wordscut .= $string[$i];
                }
            }
        }
        $string = $wordscut;
    }
    return trim($string);
}
Very powerful code.
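
For comparison, the same "a Chinese character counts as 2 units" idea can be sketched with mbstring's built-in mb_strimwidth(). This is not part of the Comsenz code, just an alternative under the assumption that mbstring is available and the input is UTF-8. Its widths come from East Asian Width rules rather than from getstr's byte-range checks, so the two may disagree on some characters.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$mixed = "中文abc字符";

// Width budget 6: 中(2) + 文(2) + a(1) + b(1) fits exactly.
$cut = mb_strimwidth($mixed, 0, 6, '', 'UTF-8');    // "中文ab"

// Width budget 5: the cut stops after "a" and never splits a character.
$narrow = mb_strimwidth($mixed, 0, 5, '', 'UTF-8'); // "中文a"

echo $cut, "\n";
echo $narrow, "\n";
```

The fourth argument is a trim marker (for example '...') that mb_strimwidth() appends when it truncates; the marker's own width is counted against the budget.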
