This function replaces any sort of curly or "smart" quotation marks with regular straight quotation marks.

The most common scenario is text copied from a Microsoft Word document. By default, Word automatically replaces the straight quotation marks you type with typographic ("curly") quotation marks, which are different characters from the plain ASCII " and '. When that text is used as-is in a web page, the curly quotes often run into encoding issues, for example when Windows-1252 text is served as UTF-8 or the other way round. Instead of quotes, the page then shows little boxes, question marks, or garbled sequences such as â€œ.

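What many people don't know is that you can turn off smart quotes in Microsoft Word (File > Options > Proofing > AutoCorrect Options, then clear "Straight quotes" with "smart quotes" on the AutoFormat As You Type tab). However, you do not always control where the text comes from, so this convertCurlyQuotes function solves the encoding problem on the receiving end by mapping every type of double quote and single quote to " and '.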
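The `\xC2\x82` to `\xC2\x9B` keys at the top of the mapping below cover one specific mix-up. In Windows-1252, curly quotes are single bytes such as 0x91 to 0x94. If those bytes are decoded as ISO-8859-1 instead, each one becomes a C1 control character (U+0080 to U+009F), which UTF-8 stores as `\xC2` followed by the original byte. The minimal sketch below reproduces that, assuming the input was saved as Windows-1252; `latin1ToUtf8()` is a hypothetical helper written out by hand so the example needs no extensions (`mb_convert_encoding()` with `'ISO-8859-1'` as the source encoding would do the same job).

```php
<?php
// Hypothetical helper: decode bytes as ISO-8859-1 (each byte is one code
// point U+0000-U+00FF) and re-encode those code points as UTF-8.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII bytes are unchanged
        } else {
            // Two-byte UTF-8 form: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "“Hi”" as Word would save it in Windows-1252 (assumption):
// 0x93 = left double quote, 0x94 = right double quote.
$cp1252 = "\x93Hi\x94";

// Mis-decoded as ISO-8859-1, 0x93 becomes U+0093 (a C1 control character),
// which UTF-8 encodes as the two bytes C2 93.
$mojibake = latin1ToUtf8($cp1252);

echo bin2hex($mojibake), PHP_EOL; // c2934869c294
```

Decoding the same bytes as Windows-1252 instead yields the intended U+201C and U+201D, which the `\xE2\x80\x9C` and `\xE2\x80\x9D` entries handle.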

/**
 * Convert curly/smart quotes to regular quotes
 *
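 * This function converts curly single and double quotes, whether they are UTF-8 encoded Unicode characters or
 * Windows-1252 (CP-1252) quote bytes that were mis-decoded as C1 control characters, to regular quotes:
 * Unicode U+0022 quotation mark (") and U+0027 apostrophe ('). These are plain ASCII, so they are encoded identically
 * in UTF-8, Windows-1252, and ISO-8859-1 and do not run into the encoding issues the others do.
 *
 * Note that HTML entities in $text are decoded first, so non-quote entities such as &amp; and &lt; also become
 * literal characters in the result.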
 *
 * @param string $text The text that contains curly quotes
 * @return string Normalized text using regular quotes
 */
function convertCurlyQuotes($text): string
{
    $quoteMapping = [
        // U+0082⇒U+201A single low-9 quotation mark
        "\xC2\x82"     => "'",

        // U+0084⇒U+201E double low-9 quotation mark
        "\xC2\x84"     => '"',

        // U+008B⇒U+2039 single left-pointing angle quotation mark
        "\xC2\x8B"     => "'",

        // U+0091⇒U+2018 left single quotation mark
        "\xC2\x91"     => "'",

        // U+0092⇒U+2019 right single quotation mark
        "\xC2\x92"     => "'",

        // U+0093⇒U+201C left double quotation mark
        "\xC2\x93"     => '"',

        // U+0094⇒U+201D right double quotation mark
        "\xC2\x94"     => '"',

        // U+009B⇒U+203A single right-pointing angle quotation mark
        "\xC2\x9B"     => "'",

        // U+00AB left-pointing double angle quotation mark
        "\xC2\xAB"     => '"',

        // U+00BB right-pointing double angle quotation mark
        "\xC2\xBB"     => '"',

        // U+2018 left single quotation mark
        "\xE2\x80\x98" => "'",

        // U+2019 right single quotation mark
        "\xE2\x80\x99" => "'",

        // U+201A single low-9 quotation mark
        "\xE2\x80\x9A" => "'",

        // U+201B single high-reversed-9 quotation mark
        "\xE2\x80\x9B" => "'",

        // U+201C left double quotation mark
        "\xE2\x80\x9C" => '"',

        // U+201D right double quotation mark
        "\xE2\x80\x9D" => '"',

        // U+201E double low-9 quotation mark
        "\xE2\x80\x9E" => '"',

        // U+201F double high-reversed-9 quotation mark
        "\xE2\x80\x9F" => '"',

        // U+2039 single left-pointing angle quotation mark
        "\xE2\x80\xB9" => "'",

        // U+203A single right-pointing angle quotation mark
        "\xE2\x80\xBA" => "'",

        // HTML left double quote
        "&ldquo;"      => '"',

        // HTML right double quote
        "&rdquo;"      => '"',

        // HTML left single quote
        "&lsquo;"      => "'",

        // HTML right single quote
        "&rsquo;"      => "'",
    ];

    return strtr(html_entity_decode($text, ENT_QUOTES, "UTF-8"), $quoteMapping);
}
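A usage sketch follows. To keep it self-contained, it restates the same byte mapping in condensed form, grouped by replacement character, under the hypothetical name `convertCurlyQuotesCondensed()`; the four HTML-entity entries are left out for brevity. It then runs Unicode curly quotes, HTML entities, and the C1 mojibake through it.

```php
<?php
// Hypothetical condensed restatement of convertCurlyQuotes(): same byte
// sequences as above, grouped by replacement (HTML-entity keys omitted).
function convertCurlyQuotesCondensed(string $text): string
{
    $single = [
        "\xC2\x82", "\xC2\x8B", "\xC2\x91", "\xC2\x92", "\xC2\x9B",     // mis-decoded Windows-1252
        "\xE2\x80\x98", "\xE2\x80\x99", "\xE2\x80\x9A", "\xE2\x80\x9B", // U+2018 to U+201B
        "\xE2\x80\xB9", "\xE2\x80\xBA",                                 // U+2039, U+203A
    ];
    $double = [
        "\xC2\x84", "\xC2\x93", "\xC2\x94",                             // mis-decoded Windows-1252
        "\xC2\xAB", "\xC2\xBB",                                         // U+00AB, U+00BB
        "\xE2\x80\x9C", "\xE2\x80\x9D", "\xE2\x80\x9E", "\xE2\x80\x9F", // U+201C to U+201F
    ];
    $map = array_fill_keys($single, "'") + array_fill_keys($double, '"');

    // Decode entities first (&ldquo; -> U+201C, &#8217; -> U+2019), then translate.
    return strtr(html_entity_decode($text, ENT_QUOTES, 'UTF-8'), $map);
}

echo convertCurlyQuotesCondensed("\u{201C}It\u{2019}s fine,\u{201D} she said."), PHP_EOL;
// "It's fine," she said.
echo convertCurlyQuotesCondensed("&ldquo;Quoted&rdquo; &amp; more"), PHP_EOL;
// "Quoted" & more
echo convertCurlyQuotesCondensed("\xC2\x93Hi\xC2\x94"), PHP_EOL;
// "Hi"
```

Note the `&amp;` in the second call: because `html_entity_decode()` runs first, every HTML entity in the input is decoded, not just the quote entities, so escape the result again (for example with `htmlspecialchars()`) before echoing it into HTML. `strtr()` with an array tries the longest keys first and never rescans text it has already replaced, which is why a single pass over the mapping is enough.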

2 Comments

This actually came in handy just now! Thanks!

Thanks for publishing this code snippet. I had a situation where my legacy code was failing when it appeared that it should be succeeding and the issue turned out to be a case of receiving "curly quotes" instead of vanilla ASCII quotes. I hate dealing with Unicode in any case, but your function saved me a ton of work in having to cover all of those bases (including some I probably would not have thought of).

You rock!
