How to fix UTF-8 filename issues when using wp_handle_upload()

PHP said the file didn’t exist. Except it did.

file_exists() was returning false for a file uploaded using wp_handle_upload(). It worked for every other file; the only one that failed came from a German customer.

I confirmed the following:

  • The file did exist in the correct wp-content/uploads subdirectory
  • The file had correct permissions
  • The directory and its parents had correct permissions
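
Those checks can be scripted. A minimal sketch, where describe_upload_path() is a hypothetical helper name (it is not part of PHP or WordPress) and the path you pass in is whatever your upload code produced:

```php
<?php
// Hypothetical helper: report what PHP itself sees for a given path.
function describe_upload_path( string $path ): array {
    // Drop cached stat results so we are not looking at stale data.
    clearstatcache( true, $path );

    $exists = file_exists( $path );

    return array(
        'exists'      => $exists,
        'readable'    => is_readable( $path ),
        // Last four octal digits of the mode, e.g. "0644"; null if the file is missing.
        'permissions' => $exists ? substr( sprintf( '%o', fileperms( $path ) ), -4 ) : null,
        'dir_exists'  => is_dir( dirname( $path ) ),
    );
}
```

If 'dir_exists' is true but 'exists' is false for a file you can see on disk, permissions are probably not the culprit and the name itself deserves a closer look.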

I was stumped.

Then I renamed the file

I renamed the file from indlæsning to indlaesning and file_exists() found it. The problem was the non-ASCII (multibyte UTF-8) character in the filename.

But how can this be? Let’s see where the file name comes from:

  • wp_handle_upload() passes the file to _wp_handle_upload(), which calls
  • wp_unique_filename() to generate a unique file name, which calls
  • sanitize_file_name() to prepare the file name
  • sanitize_file_name() then strips any characters found in a list of special characters that are not allowed in file names:
$special_chars = array("?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "%", "+", chr(0));
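
To see why æ survives this step, here is a simplified sketch of just the stripping part. strip_special_chars() is a hypothetical stand-in, not the real sanitize_file_name(), which does more than this; the deny list mirrors the one above.

```php
<?php
// Simplified stand-in for one step of sanitize_file_name():
// remove every character that appears on the deny list.
function strip_special_chars( string $filename ): string {
    $special_chars = array( '?', '[', ']', '/', '\\', '=', '<', '>', ':', ';', ',', "'", '"', '&', '$', '#', '*', '(', ')', '|', '~', '`', '!', '{', '}', '%', '+', chr( 0 ) );

    return str_replace( $special_chars, '', $filename );
}

echo strip_special_chars( 'report[1]?.csv' ), "\n"; // report1.csv
echo strip_special_chars( 'indlæsning.csv' ), "\n"; // indlæsning.csv (unchanged)
```

Every entry on the list is plain ASCII, so a character like æ passes straight through.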

This list of characters can be modified using the sanitize_file_name_chars filter, but that seemed more complicated than I wanted.

The sanitize_file_name() function also includes a sanitize_file_name filter, which allows you to modify the name of the file after it’s already been sanitized.

Luckily, WordPress already has a function to convert accented and other non-ASCII characters into their closest ASCII equivalents: remove_accents(). I used that to convert the filename into something file_exists() could handle.

Here’s how to fix UTF-8 issues with wp_handle_upload()
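
A minimal sketch of that approach, assuming the code lives in a plugin file or a theme's functions.php: hook remove_accents() onto the sanitize_file_name filter so every uploaded file name is transliterated after WordPress has sanitized it. Both the filter and the function are WordPress core; where you place the line is up to you.

```php
<?php
// Run remove_accents() on every file name after sanitize_file_name() has
// done its own cleanup. With add_filter()'s defaults, only the sanitized
// file name is passed to the callback.
add_filter( 'sanitize_file_name', 'remove_accents' );
```

Because the filter applies to every call to sanitize_file_name(), this affects all uploads on the site, not only those going through your own wp_handle_upload() call.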

That converted the filename from indlæsning.csv to indlaesning.csv. Note that the æ character was converted to ae.

The result? Finally, file_exists().

Author: Zack Katz

Zack Katz is the President of Katz Web Services and the developer of WordPress plugins with over 700,000 downloads. He lives in Southwest Colorado with his wife and two cats.
