id summary reporter owner description type status priority milestone component version resolution keywords cc blockedby blocking
1371 Problems after update when using HTML compression and MS Word product descriptions Torsten Riemer somebody "After updating from 1.0x to 2.x, the new ""/includes/external/compactor/compactor.php"" can in some cases produce blank (white) product detail pages. The culprit is the regex on line 261:
{{{
$html = preg_replace('//', '', $html);
}}}
If this line is commented out, blank pages no longer occur even with MS Word product descriptions, however ugly that MS Word XML markup may be.
For this reason I once put together a Smarty modifier from the following function:
{{{
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
mb_regex_encoding('UTF-8');
//replace MS special characters first
$search = array('/‘/u', '/’/u', '/“/u', '/”/u', '/—/u');
$replace = array('\'', '\'', '""', '""', '-');
$text = preg_replace($search, $replace, $text);
//make sure _all_ html entities are converted to the plain ascii equivalents - it appears
//in some MS headers, some html entities are encoded and some aren't
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
//try to strip out any C style comments first, since these, embedded in html comments, seem to
//prevent strip_tags from removing html comments (MS Word introduced combination)
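if(mb_stripos($text, '/*') !== FALSE){
//preg_replace is used here: the mb_ereg functions expect a pattern without '#' delimiters or trailing modifiers
$text = preg_replace('#/\*.*?\*/#su', '', $text);
}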
//introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be stripped:
//'<1' becomes '< 1' (note: somewhat application specific)
$text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
$text = strip_tags($text, $allowed_tags);
//eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
$text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
//strip out inline css and simplify style tags
$search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
$replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
$text = preg_replace($search, $replace, $text);
//on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
//that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
//some MS Style Definitions - this last bit gets rid of any leftover comments */
$num_matches = preg_match_all(""/\<!--/u"", $text, $matches);
if($num_matches){
$text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
}
$text = preg_replace('//isu', '
', $text);
return $text;
}
}}}
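A minimal sketch of how such a function could be exposed as a Smarty modifier, assuming the usual Smarty plugin naming convention (a file named modifier.strip_word_html.php containing a function smarty_modifier_strip_word_html). The file name, the modifier name and the stand-in function body are assumptions made for illustration; in a real setup the full function above would be loaded instead of the stand-in.
{{{
<?php
// Hypothetical plugin file: modifier.strip_word_html.php

// Stand-in so this sketch runs on its own. It only calls strip_tags();
// the full filter function shown above would normally be defined instead.
if (!function_exists('strip_word_html')) {
    function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
    {
        return strip_tags($text, $allowed_tags);
    }
}

// Smarty passes the template value as the first argument of a modifier.
function smarty_modifier_strip_word_html($string, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
    return strip_word_html($string, $allowed_tags);
}

// Direct call, as Smarty would do internally for {$description|strip_word_html}
$filtered = smarty_modifier_strip_word_html('<p>Hello <b>World</b></p>');
}}}
In a template the modifier would then be applied as {$description|strip_word_html}, where $description is a hypothetical template variable.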
As a test, one could replace line 261 in ""/includes/external/compactor/compactor.php"":
{{{
$html = preg_replace('//', '', $html);
}}}
with:
{{{
$html = preg_replace('//isu', '', $html);
}}}
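One possible explanation for the blank pages, which is an assumption and not confirmed in this ticket, is that preg_replace() returns NULL when PCRE reports an error (for example an exhausted backtrack limit, or invalid UTF-8 when the u modifier is used), and that NULL is then assigned back to $html. The sketch below shows a guard that keeps the original markup in that case. The helper name safe_preg_replace and the comment pattern are illustrative only and are not taken from compactor.php.
{{{
<?php
// Hypothetical helper: apply a regex replacement, but fall back to the
// unmodified input if PCRE reports an error.
function safe_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_replace() returns null on error; preg_last_error() holds the reason.
        return $subject;
    }
    return $result;
}

// Illustrative pattern for HTML comments (an assumption, not the pattern from line 261).
$pattern = '/<!--.*?-->/su';

// Normal case: the comment is removed.
$html = '<p>Text</p><!-- note --><p>More</p>';
$clean = safe_preg_replace($pattern, '', $html);

// Error case: chr(255) is not valid UTF-8, so the u modifier makes
// preg_replace() fail and the input is returned unchanged.
$broken = '<p>' . chr(255) . '</p><!-- x -->';
$kept = safe_preg_replace($pattern, '', $broken);
}}}
With such a guard a failed replacement leaves the Word markup in the page instead of emptying it, which matches the behaviour seen when the line is commented out.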
But in general I am in favor of us providing a proper MS Word filter as an option." Bug/Fehler new hoch modified-shop-2.0.4.0 Shop 2.0.3.0