Categories
Code WordPress

How to fix UTF-8 filename issues when using wp_handle_upload()

PHP said the file didn’t exist. Except it did.

file_exists() wasn’t working for a file uploaded using wp_handle_upload(). It worked for every other file, except for one provided by a German customer.

I confirmed the following:

  • The file did exist in the correct wp-uploads sub-directory
  • The file had correct permissions
  • The directory and its parents had correct permissions

I was stumped.

Then I renamed the file

I converted indlæsning to indlaesning and it worked. The problem was with UTF-8 characters in the filename.

But how can this be? Let’s see where the file name comes from:

  • _wp_handle_upload() calls
  • wp_unique_filename() to generate a unique file name, which calls
  • sanitize_file_name() to prepare the file name
  • sanitize_file_name() then checks against a list of special characters that are not allowed in file names:
$special_chars = array("?", "[", "]", "/", "\\", "=", "", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "%", "+", chr(0));

This list of characters can be modified using the sanitize_file_name_chars filter, but that seemed more complicated than I wanted.

The sanitize_file_name() function also includes a sanitize_file_name filter, which allows you to modify the name of the file after it’s already been sanitized.

Luckily, WordPress already has a function to convert other UTF-8 characters into Latin equivalents: remove_accents(). I used that to convert the filename into something file_exists() could handle.

Here’s how to fix UTF-8 issues with wp_handle_upload()

That converted the filename from: indlæsning.csv to indlaesning.csv. Note that the æ character got converted to ae.

The result? Finally, file_exists().

By Zack Katz

Zack Katz is the founder of GravityKit and TrustedLogin. He lives in Leverett, Massachusetts with his wife Juniper.