I'm in the middle of a project that needs to create URLs from user-defined titles, the same way blog post URLs are derived from post titles.
This is the typical problem a programmer can easily underestimate, and I came close to doing so. At the end of the day you just have to substitute non-English characters, replace spaces with hyphens and remove symbols and you are good to go, right? Yeah, you wish.
I didn't even try to do it myself; I just Googled a little and came up with a couple of very easy regexp-based PHP functions to do the job.
Too easy. They were doing the job, but my arachnid sense was telling me something wasn't quite working.
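To make the "too easy" approach concrete, here is a minimal sketch of it. It is written in Python rather than PHP, it is not one of the functions I found and it is not WordPress's code; the name `naive_slug` is made up for illustration. It strips accents, removes symbols and turns spaces into hyphens, exactly the recipe above.

```python
import re
import unicodedata


def naive_slug(title: str) -> str:
    """Hypothetical naive slug builder (illustration only)."""
    # Decompose accented letters (e.g. "é" becomes "e" plus a combining
    # accent), then drop every character that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase, remove symbols, collapse runs of spaces/hyphens into one hyphen.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_only.lower())
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


print(naive_slug("Crème brûlée: a recipe!"))  # creme-brulee-a-recipe
print(naive_slug("Straße nach Łódź"))         # strae-nach-odz
print(repr(naive_slug("東京の天気")))          # ''
```

The first title comes out fine, which is why this kind of function looks like it works. The second silently loses "ß" and "Ł" because they have no accent to strip, and the third produces an empty slug.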
That's when WordPress came to mind. I thought nobody would be better placed to solve the problem than one of the top blogging platforms. They surely deal with people from all over the world, and they must be doing a good job of creating URLs from titles. And guess what? WP is Open Source, so you can go there and, rather than steal or reinvent the wheel, reuse their code. I'm not going to copy the functions here (200+ lines of code); they are in wp-includes/formatting.php and are "remove_accents", "seems_utf8", "utf8_uri_encode" and some other code. So, thanks very much again to the WordPress guys for sharing. Open Source rocks.
And this got me thinking about what a HUGE amount of time and what a HUGE pain in the arse handling different languages is. I was talking to Martin Kleppmann (a fellow developer here in Cambridge) about it, and it turns out he has just blogged about it in "i18n and social web: We still haven’t figured it out". I quote:
i18n has been an issue which software engineers have loved to ignore, because (a) it’s difficult, (b) it’s not cool, and (c) if you’re in North America, you can find enough customers in North America for the first few years, so there isn’t a strong business requirement to work internationally.
Amen. If whoever created "the computers" 50 or 60 years ago had thought about the problem back then, things would be much easier now. This is the classic example of "let future me take care of a future problem" or "this is not a problem now, go on with it and we'll figure it out later". Well, fast forward 60 years and here we are. Guess what again? Yes, the problem still sucks.
And if you have an application and you think you've done your work regarding i18n, give it a go with Turkish, which is apparently the final test: "What's Wrong With Turkey?"
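As a small taste of why Turkish is such a good test, here is a Python sketch of the dotted and dotless "i" problem, which as far as I can tell is what that article is about. Turkish has four letters where English has two: "I"/"ı" and "İ"/"i". The helper `turkish_lower` is a hypothetical name and a deliberately simplistic workaround, not a complete solution.

```python
# Python's str.upper()/str.lower() use the default Unicode case mappings
# and ignore locale, so Turkish rules are never applied.
print("i".upper())       # I  (Turkish expects İ, U+0130)
print("I".lower())       # i  (Turkish expects ı, U+0131)
print(len("İ".lower()))  # 2: "i" followed by U+0307 COMBINING DOT ABOVE


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish i/ı pairing."""
    # Map the two problematic capitals by hand before the default lowering.
    return text.replace("İ", "i").replace("I", "ı").lower()


print(turkish_lower("ISPARTA İLİ"))  # ısparta ili
```

Any code that lowercases a title before turning it into a URL quietly makes this decision for its users.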