Note: Somewhere in this post it says "know your audience", so let's try that: If you don't do software or work in IT, you may find this post of very little use, although "The Great Timezone Panic of 2011" is an entertaining ditty on misunderstood copyright or - depending on your view of the world - attempted abuse.
Notes on Date, Time & Time Zones and the Great Time Zone Panic of 2011
Processing date, time and time zone data is a fundamental operation across most software products. The importance of datetime and timezone handling depends very much on the purpose of the given product. The algorithms and data used are largely dependent on the programming stack plus additional libraries or external data. Calendar systems organize date and time for "social, religious, commercial, or administrative purposes" (http://en.wikipedia.org/wiki/Calendar).
In this document, calendar systems are treated as "materialized views" of date and time. Only minimal attention is given to the display and data entering considerations for user interface artifacts that represent calendars and calendar-based operations, such as scheduling.
Definitions
Day: 24 hour day (see en.wikipedia.org/wiki/Day)
Time: time of day.
Time zone: region of the earth that keeps the same time (for a map, see this time zone map).
Legal time zone or Legal Entity time zone: time zone used in trading or time-sensitive business transactions that span more than one time zone.
View locale: think of this as the text and its layout in a program window, following the rules of a certain locale (or just a language). Available in some of the fancier development frameworks.
Formatting locale: the formatting of certain bits and pieces on the screen in accordance with the rules for a certain locale.
You can use a view locale that is different from a formatting locale and, magically, show, for example, text in English but dates in French. Available in some of the fancier development frameworks.
Don't forget that AM/PM may be returned by a programming language (Java, right) despite the fact that the desired locale does not use am/pm at all.
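The split between view locale and formatting locale, and the AM/PM caveat, can be sketched in a few lines of Java. This is only an illustration: the class name, the helper names and the "Due date:" label are made up for this example, and the exact strings you get back depend on the locale data shipped with your JDK.

```java
import java.text.DateFormatSymbols;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

final class LocaleSplitSketch {

    // View locale: the label text stays in English.
    // Formatting locale: the date is rendered by the rules of the given locale.
    static String dueLabel(LocalDate date, Locale formattingLocale) {
        DateTimeFormatter formatter =
                DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(formattingLocale);
        return "Due date: " + formatter.format(date);
    }

    // Java hands back AM/PM marker strings for any locale,
    // including locales whose everyday time format is 24-hour.
    static String[] amPmStrings(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getAmPmStrings();
    }

    public static void main(String[] args) {
        // English text, French date formatting.
        System.out.println(dueLabel(LocalDate.of(2011, 10, 6), Locale.FRANCE));
        // Strings exist for Germany, but that says nothing about German time formats.
        System.out.println(String.join(" / ", amPmStrings(Locale.GERMANY)));
    }
}
```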
What's in a timezone name
There are a number of ways to "name" a time zone.
Legal name: in the United States, the legal names are defined in the U.S. Code, Title 15, Chapter 6, Subchapter IX - Standard Time, for example, "The standard time of the first zone shall be known and designated as Atlantic standard time; that of the second zone shall be known and designated as eastern standard time".
In Canada, "Time Zones and Daylight Saving Time usually have been regulated by provincial and territorial governments. Starting in 2007, clocks following the new North American standard for Daylight Saving Time are to be turned forward by one hour on the second Sunday in March and turned back on the first Sunday of November." (National Research Council Canada). NOTE: This is national naming (with some transnational elements).
Other countries do the same, that is, national laws and sometimes transnational agreements determine the timezone and the official name (for example, German time law). NOTE: This is unique national naming (with some transnational elements).
Common name: many time zones have common names, often shortened versions of legal names. For example, in the U.S., "Pacific Time" or "Eastern Time". These names are unique for a region but not in terms of actual time (no indication of standard or DST).
Letters: the Admiralty of the United Kingdom publishes a time zone scheme that uses letters as names, for example, "Z", "A", "B", etc. (Timezone map). NOTE: This is a transnational naming.
tz database: (or Olson, or zoneinfo database); Wikipedia definition: "Within the zoneinfo database, a time zone is any national or sub-national region where local clocks have all agreed since 1970".
The tz database is not authoritative and turns out to be the main cause for problematic practices in the software industry. It is neither unique nor standardized.
Offset based: this is the practice of giving a value together with its offset from UTC, for example, 2008-10-05-04:00. NOTE: This is a unique transnational naming.
Recommendation: when referring to a time zone use "name" only for official, legal names. Use "display name" or "id" in all other cases.
Problems with the Olson tz implementation
The tz database is a collection of entries by different people using different naming conventions for almost anything. The main problem we can find with tz based timezone libraries is bad naming.
For example, some database systems and other software allow either a correct zone setting (add or subtract a number of hours) or set the time zone implicitly as a "location within a timezone".
SET TIME_ZONE='-05:00';
SET TIME_ZONE='Europe/London';
The second one is awful: "Europe/London" is not a timezone.
"Europe/London" - and all other entries like it - is a geographic location that can be associated with a timezone offset to identify which timezone the city of London belongs to. Without an offset, an id like this cannot be correctly mapped to a timezone "name". Internal use in software is fine, but exposing these to any end user is not a good idea.
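To see why an id like this only becomes meaningful once it is tied to an offset, here is a small Java sketch using java.time. The class and method names are invented for the example; the point is that the same id resolves to different offsets depending on the instant you ask about.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class ZoneIdOffsetSketch {

    // Resolve a tz id to the UTC offset that applies at one specific instant.
    static ZoneOffset offsetAt(String tzId, String isoInstant) {
        return ZoneId.of(tzId).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        // Same id, two different offsets: the id names a location, not an offset.
        System.out.println(offsetAt("Europe/London", "2011-01-15T12:00:00Z")); // Z
        System.out.println(offsetAt("Europe/London", "2011-07-15T12:00:00Z")); // +01:00
    }
}
```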
The CLDR, Java and .NET
Unicode.org appears to have recognized the implicit naming problems of the tz. They have chosen "zone" with the attribute "type" for IDs of the kind "Africa/Algiers" and "metazone" for actual timezone names and common abbreviations. The CLDR is at this location: http://cldr.unicode.org/.
The CLDR also contains translations for timezone names and identifiers.
Recommendation: use CLDR translations to get translated ids and zone names for your product if you read timezone ids and names from a file.
Java terminology is as follows: TimeZone.getAvailableIDs() returns the list that includes the continent/city values. The method getDisplayName() returns legal names and offset based values.
For a standard timezone, .NET will show a timezone "name" that matches the displayNames value used in Java. Use .StandardName and .DaylightName to get the respective names.
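On the Java side, the difference between ids and display names looks roughly like the sketch below. The class and helper names are hypothetical, and the display names you actually get depend on the JDK version and its locale data, so none are shown here.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TimeZone;

final class TimeZoneNamesSketch {

    // True if the id is in the list that getAvailableIDs() returns.
    // Worth checking first: TimeZone.getTimeZone() silently falls back to GMT for unknown ids.
    static boolean isKnownId(String id) {
        return Arrays.asList(TimeZone.getAvailableIDs()).contains(id);
    }

    // Long display name for the standard or the daylight variant of a zone.
    static String displayName(String id, boolean daylight, Locale locale) {
        return TimeZone.getTimeZone(id).getDisplayName(daylight, TimeZone.LONG, locale);
    }

    public static void main(String[] args) {
        String id = "America/New_York";
        System.out.println(id + " known: " + isKnownId(id));
        System.out.println("standard: " + displayName(id, false, Locale.US));
        System.out.println("daylight: " + displayName(id, true, Locale.US));
    }
}
```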
UTC and GMT
Coordinated Universal Time (UTC) is the civil standard time that replaced Greenwich Mean Time (GMT). In everyday usage, GMT continues to be used as the "name". Just listen to the BBC News once to understand.
Issue: the end user is faced with an offset displayed with the label "UTC", for example, "UTC+1".
Solution: use only "GMT" in the end-user UI, for example, "GMT + 1". Unless, of course, you are coding software for scientific users.
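In Java, one way to get such a label is the "O" pattern letter of DateTimeFormatter, which produces a localized "GMT" offset. A minimal sketch, with a made-up class and helper name, and with the English locale pinned so the prefix is predictable (note that it prints the label without spaces):

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class GmtLabelSketch {

    // Pattern letter "O" is the short localized offset, for example "GMT+8".
    private static final DateTimeFormatter SHORT_GMT =
            DateTimeFormatter.ofPattern("O", Locale.ENGLISH);

    // Hypothetical helper: turn an offset into an end-user label.
    static String gmtLabel(ZoneOffset offset) {
        return SHORT_GMT.format(offset);
    }

    public static void main(String[] args) {
        System.out.println(gmtLabel(ZoneOffset.ofHours(1)));             // GMT+1
        System.out.println(gmtLabel(ZoneOffset.ofHours(-4)));            // GMT-4
        System.out.println(gmtLabel(ZoneOffset.ofHoursMinutes(5, 30)));  // GMT+5:30
        System.out.println(gmtLabel(ZoneOffset.UTC));                    // GMT
    }
}
```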
User interface requirements
End-user date and time: display date and time in accordance with the end-user expectations. If the ui is not translated, it should appear in English but date, time and possibly other items should be formatted in accordance with the locale of the user.
Timezone selection lists: you can use a "main cities" list plus offset (which is what MS Windows does); or you can use "standard names". The least preferable option is to use ids (Olson tz ids, getAvailableIDs) plus an offset.
Calendar display, selection, entry: support for the default Gregorian calendar is obvious. The display should take into account the "official" first day of the week. ISO 8601 calls for Monday as the first day of the week. Note that (usually) printed calendars in many countries still have different first days, and calls to standard libraries, for example, Java, are not guaranteed to return "Monday". See The week for more information.
AM/PM: never assume that the existence of an "am"/"pm" value in a standard library means that the datetime format used in a country/region uses the AM/PM format. Java, for example, lets you retrieve "AM/PM strings" but they are just that: strings.
Date (time) formats: in many regions, there are multiple usable date (time) formats, not just a short, medium, or long. For example, on official U.S. government forms, you may find the date format to be day-month-year.
Non-Gregorian calendars: The website Frequently Asked Questions about Calendars is a great resource for calendar information. The nuts and bolts around Java 6 calendar support are at Supported Calendars. For .NET, see System.Globalization Namespace.
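The first-day-of-the-week point from the calendar item above is easy to check for yourself. A short Java sketch, assuming java.time (Java 8 or later) and an invented class name:

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

final class FirstDayOfWeekSketch {

    // Ask the library which day starts the week for a locale, instead of assuming Monday.
    static DayOfWeek firstDayOfWeek(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println(firstDayOfWeek(Locale.US));            // SUNDAY
        System.out.println(firstDayOfWeek(Locale.GERMANY));       // MONDAY
        System.out.println(WeekFields.ISO.getFirstDayOfWeek());   // MONDAY, per ISO 8601
    }
}
```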
Persistence and transfer requirements
UTC: store date and time values as UTC values, with ISO 8601 as the preferred format.
Issue: what about the timezone and daylight saving time?
Solution: you can use a UTC value (for example, 1998-12-01T12:03Z) and separately store the offset and a timezone name or even a daylight saving flag. Alternatively, you can store the local time together with its UTC offset, for example 1953-08-02T18:50:00+04:30, and add a timezone name and a daylight saving flag if needed.
Duration/intervals: ISO 8601 has a mechanism for this. See, for example, time interval.
Domain-specific standards: do not forget to check for domain-specific requirements around date, time, and time zone information, for example, standard email datetime formats.
Interoperability: if your application/component receives time zone data from other products or external libraries, be aware that you may need to handle tz ids.
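The "UTC value plus separately stored offset" idea from the list above could look like this in Java. Treat it as a sketch: StoredTimestamp and its fields are hypothetical, the zone id is just a label picked for the illustration, and a real schema would depend on your database.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class UtcStorageSketch {

    // Hypothetical storage record: UTC instant plus the original offset and a zone label.
    static final class StoredTimestamp {
        final Instant utc;
        final int offsetSeconds;
        final String zoneId;

        StoredTimestamp(Instant utc, int offsetSeconds, String zoneId) {
            this.utc = utc;
            this.offsetSeconds = offsetSeconds;
            this.zoneId = zoneId;
        }
    }

    // Parse an ISO 8601 value with offset and split it into the parts to persist.
    static StoredTimestamp fromIso(String isoWithOffset, String zoneId) {
        OffsetDateTime parsed = OffsetDateTime.parse(isoWithOffset);
        return new StoredTimestamp(parsed.toInstant(), parsed.getOffset().getTotalSeconds(), zoneId);
    }

    // Rebuild the original local time from the stored UTC value and offset.
    static OffsetDateTime toLocal(StoredTimestamp stored) {
        return stored.utc.atOffset(ZoneOffset.ofTotalSeconds(stored.offsetSeconds));
    }

    public static void main(String[] args) {
        StoredTimestamp stored = fromIso("1953-08-02T18:50:00+04:30", "Asia/Kabul");
        System.out.println(stored.utc);       // 1953-08-02T14:20:00Z
        System.out.println(toLocal(stored));  // 1953-08-02T18:50+04:30
        // ISO 8601 durations parse directly as well.
        System.out.println(Duration.parse("PT1H30M").toMinutes()); // 90
    }
}
```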
Translation considerations
For date, times, time zones and related text (long day of the week, short day, month name, etc.) the best practice is: never send lists to translation. The only exception is when you plan to use the CLDR values. But these can be extracted and translators never need to work on them. You cannot expect unique translations for geographic locations: what is the "name" of a given Chinese city in English?
Think of time zone ids, "names", "names" of weekdays, etc. only as labels. And then remember that you never base programming logic on string matching operations against labels.
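A tiny Java sketch of "labels are labels": the logic works on the enum value, and the localized string is produced only for display. The class and method names are invented, and the Saturday/Sunday weekend is an assumption made for the example (weekends differ by country).

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;

final class LabelsNotLogicSketch {

    // Logic compares enum values, never translated day names.
    // Assumption for this example: the weekend is Saturday and Sunday.
    static boolean isWeekend(LocalDate date) {
        DayOfWeek day = date.getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }

    // The label is generated for display only, in whatever locale the UI needs.
    static String dayLabel(LocalDate date, Locale locale) {
        return date.getDayOfWeek().getDisplayName(TextStyle.FULL, locale);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2011, 10, 8); // a Saturday
        System.out.println(dayLabel(date, Locale.GERMAN) + " weekend: " + isWeekend(date));
        System.out.println(dayLabel(date, Locale.FRENCH) + " weekend: " + isWeekend(date));
    }
}
```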
MM/DD/YYYY - pattern considerations
This section provides a slightly different take on patterns than many other practitioners have.
Claim: For any date/time pattern considerations within (mainly) Gregorian systems, we are dealing with "mechanics".
Reason: Many man-years have been spent on this, and there are really only two stumbling blocks in general patterns. One is that pattern parsing may incorrectly assume that pattern chars like m, d, h, etc. exist only in the pattern. The second is the assumption that single quotes never occur as literals in a formatted datetime string. A common manifestation of this is incorrect handling of, say, Portuguese, where we can find that the expanded pattern contains both "d" and a single quote. Tip: If you use regular expressions to populate pattern characters, especially those with the 'd', change the pattern chars to something never found in a date, some odd Unicode chars, then populate and put the "d" back in.
Claim: Yes, you can display pattern strings to show a user what to enter.
Reason: You can, for example, tell a German user that the required time format is: hh:mm:ss
This is possible despite the fact that any dictionary you may consult will tell you that "hour" == "Stunde" in German. If you continue to look, you will then find that the scientific or mathematical abbreviation commonly used by German speakers is "h".
Conclusion: it really is more mechanical than you might think, and: know your audience.
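The Portuguese stumbling block mentioned above is easy to reproduce. The Java sketch below contrasts a naive search-and-replace fill with a formatter that understands quoted literals. The pattern string is a typical Portuguese long date pattern chosen for illustration, and the class and method names are made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class QuotedPatternSketch {

    // The literal word 'de' contains the pattern char "d".
    static final String PT_PATTERN = "d 'de' MMMM 'de' yyyy";

    // Naive approach: replace pattern chars wherever they occur, quoted literals included.
    static String naiveFill(String pattern, int day, String month, int year) {
        return pattern
                .replace("yyyy", String.valueOf(year))
                .replace("MMMM", month)
                .replace("d", String.valueOf(day));
    }

    // The formatter treats text in single quotes as a literal and leaves it alone.
    static String format(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofPattern(PT_PATTERN, locale).format(date);
    }

    public static void main(String[] args) {
        // The "d" inside 'de' gets clobbered and the quotes survive into the output.
        System.out.println(naiveFill(PT_PATTERN, 6, "outubro", 2011)); // 6 '6e' outubro '6e' 2011
        System.out.println(format(LocalDate.of(2011, 10, 6), Locale.forLanguageTag("pt-BR")));
    }
}
```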
Other Calendars
You can have lots of fun with other calendars, such as the Jewish and the Islamic calendar.
If you need to code one of these, good luck.
The Great Timezone Panic of 2011
The “tz” authors/maintainers Olson and Eggert faced a copyright infringement lawsuit by Astrolabe. There had been rumors in the industry about “the tz going away soon”, but the bosses kept mum. I read the paperwork and promptly sent a nice email to their lawyer, but not the kind you might expect. Instead of focusing on “historic data”, science or fair use, I noted that the parts dealing with international time zones drew on numerous publications from around the world, and that a copyright infringement suit against Olson and Eggert might be a good time to look at how the copyright situation in the “ACS International Atlas” held up. In other words: the ACS compilation was very likely doing exactly what the new owners accused Olson and Eggert of.
The Electronic Frontier Foundation worked on the issue. And in all fairness, Astrolabe deserves a thank you.
(Published with permission of the original author)