White Paper: International Text-Encoding

QuickBase Applications with International Text-Encoding

by Zachary Glennie, Technical Director
The Cambridge Institute of International Education
Eric Segal, President
The Data Collaborative, Inc.


As the marketplace continues to expand globally, more and more companies are facing the challenge of integrating information gathered not only in multiple languages, but also in languages that use non-Western characters. The Cambridge Institute of International Education is a case in point. Headquartered outside of Boston, its mission is to extend quality education to students from every nation.

That’s a lot of languages.

The main problem with handling and displaying non-Western characters in QuickBase is that QuickBase tells every web browser to use the Western (ISO-8859-1) character encoding when loading QuickBase webpages. When using Chrome (20.0) as your web browser, for example, you have to change the encoding to Unicode (UTF-8) every time you access the database in order to get Chinese characters to display. If you open a new tab, the encoding reverts to Western and you can no longer read the Chinese.
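To make the failure concrete, here is a minimal sketch of what happens when text stored as UTF-8 is read back as Western. The helper name misreadAsLatin1 is hypothetical, and the sketch assumes strict ISO-8859-1, where each byte maps to the code point with the same number (real browsers treat the label as Windows-1252, which differs for bytes 0x80 to 0x9F).

```javascript
// Sketch only: misreadAsLatin1 is a hypothetical helper, not part of QuickBase.
function misreadAsLatin1(text) {
  // TextEncoder always produces UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  // Assumption: strict ISO-8859-1, so byte N becomes code point N.
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

console.log(misreadAsLatin1("\u00e9"));    // Ã©  (é is two bytes in UTF-8)
console.log(misreadAsLatin1("台").length); // 3   (one character becomes three)
```

Each multi-byte character turns into several unrelated Western characters, which is why Chinese text is unreadable until the browser is switched to UTF-8.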

There is a workaround for this issue. It isn't pretty, but it is pretty simple:

  • To begin, click the wrench icon in the top right-hand corner of the window.
  • Click Settings, then Show advanced settings.
  • Under Web Content, click Customize fonts.
  • At the bottom of the pop-up window, click the Encoding dropdown.
  • Select Unicode (UTF-8), which should be near the top of the list.

By doing this, you’re telling Chrome to treat every page as Unicode, which has the potential to break some websites (they might show up funny), but the Cambridge Institute has been using this setting for a while without any issues, so it shouldn't cause many problems. If you run into anything weird, you can just run through these options again and switch your encoding back to Western (ISO-8859-1). This should save Chrome users some time in the long run.

If you require other non-Western encodings, there are workarounds for most situations, but there will be things that you can't do.

For starters, field names with non-Western characters will not behave consistently. Column headings and field names on forms will also be problematic. Some interface elements can be written with HTML, but many will simply not be able to use non-Western characters. (If your goal is to create a fully non-English-language application, you may need to look elsewhere.)

 

Using Numeric Character References (NCRs)

If you need to include tooltips or form text in Chinese or another non-Western character set, the best solution is HTML Numeric Character References (NCRs). For example, 台北, the characters for "Taipei", can be written as &#21488;&#21271;. This is a useful webpage for converting to NCR, but there are likely many others: http://www.pinyin.info/tools/converter/chars2uninumbers.html
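The conversion itself is simple enough to do in a few lines of JavaScript. The following is a minimal sketch, not the converter used by the webpage above; the function name toNcr is an assumption made for illustration.

```javascript
// Sketch only: toNcr is a hypothetical helper name.
// Replaces every non-ASCII character with a decimal NCR such as &#21488;
function toNcr(text) {
  let out = "";
  // for...of walks the string by code point, so characters outside
  // the Basic Multilingual Plane are not split in half.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNcr("台北"));         // &#21488;&#21271;
console.log(toNcr("Taipei 台北"));  // Taipei &#21488;&#21271;
```

ASCII characters pass through untouched, so the output can be pasted directly into a form text element or tooltip.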

The advantage of NCRs is that they will display correctly regardless of the user's browser encoding choice. The disadvantage is that this will inflate the size of the page (because "&#21488;" is eight full bytes, whereas 台 in UTF-8 is only three bytes). This is not a problem for small interface elements, such as text elements and tooltips in a form.
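The size difference can be checked directly. This is a small sketch that counts bytes with the standard TextEncoder API; the helper name utf8ByteLength is hypothetical.

```javascript
// Sketch only: utf8ByteLength is a hypothetical helper name.
// TextEncoder always encodes to UTF-8, so the array length is the byte count.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("台"));        // 3
console.log(utf8ByteLength("&#21488;"));  // 8
```

The NCR consists only of ASCII characters, which is exactly why it survives any browser encoding setting, at the cost of the extra bytes.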

This capability is a feature of HTML, so any text fields which include NCR-coded letters should be set to use HTML.

 

In Formula - Text fields, check the option that allows HTML in the field.

 

On Text elements of Forms, check the option that allows HTML in the text.

 

Data Entered By Users

If your users need to submit and retrieve text in non-English languages, you face a greater challenge. QuickBase will store the text as ones and zeros without making a record of the encoding with which the text is submitted. There is a full explanation of this here, which you should definitely read carefully if you plan to store non-Western data.

http://quickbase.intuit.com/developer/knowledge-base/does-quickbase-support-unicode-or-other-multi-byte-character-encodings-big5-gb-hz-shift-jis-/

To make this work, you will need to cause (or even “force”) your users to use the desired encoding in their web browser whenever they submit or retrieve data. The Cambridge Institute employs a number of strategies: 

Visible Feedback 
There is a "Custom Page Banner" which includes UTF-8 encoded text. (To add a custom page banner, go to Customize > Application > Properties > Branding, check Custom Page Banner, and configure it.) One example of this text looks like [ SUCCESS:ã€€ï¼µï¼®ï¼©ï¼£ï¼ under Western encoding, but displays as “[  SUCCESS: UNICODE ENABLED! ]” when Unicode is enabled. These are full-width characters, so they are easy for English speakers to read, but they don't display correctly unless they are decoded as UTF-8.
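Banner text of this kind can be generated rather than typed by hand. The sketch below assumes the banner uses the Unicode full-width (double-width) forms, which sit at a fixed offset of 0xFEE0 from printable ASCII, plus the ideographic space U+3000; the function name toFullWidth is hypothetical.

```javascript
// Sketch only: toFullWidth is a hypothetical helper name.
function toFullWidth(text) {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint === 0x20) {
      out += "\u3000"; // ideographic (full-width) space
    } else if (codePoint >= 0x21 && codePoint <= 0x7e) {
      out += String.fromCodePoint(codePoint + 0xfee0); // "U" becomes "Ｕ"
    } else {
      out += ch; // leave everything else alone
    }
  }
  return out;
}

console.log(toFullWidth("UNICODE ENABLED!")); // ＵＮＩＣＯＤＥ　ＥＮＡＢＬＥＤ！
```

Because every full-width character takes three bytes in UTF-8, the result is legible only when the page is decoded as UTF-8, which is what makes it a useful visual check.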

User Browser Configuration

  • Most users are on Chrome. They’re instructed to set UTF-8 as the default for all webpages (rather than "detect"), using the steps outlined in the introduction. This overrides QuickBase's header, which instructs the browser to use Western encoding.
  • For Firefox, there is an extension called charset-switcher which allows domain-specific overriding of the text encoding setting.

Policing with JavaScript - warning users and locking a field

  • JavaScript has been injected into some of the pages. If you are running JavaScript in this manner, you may include code like this, which displays an alert and locks a field (in this case, FID #8) if the user is not using your preferred encoding.

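// document.characterSet is the standard property; document.inputEncoding is its legacy alias.
var encoding = document.characterSet || document.inputEncoding;

// Compare case-insensitively, since browsers may report the name in either case.
if (encoding.toUpperCase() != "UTF-8") {
  alert("WARNING!\n\nYour encoding is set to << " + encoding + " >>.\n\nIf you enter any Non-Western characters, they will not appear correctly. For best results, use UTF-8 (Unicode).\n\n~Zack");
  // Lock the field (FID #8) so that mis-encoded text cannot be entered.
  document.getElementById("_fid_8").disabled = true;
}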

Note on Importing Data

You can import multiple records containing UTF-8 data, but you may need to be careful. Here's one proven method:

  1. Use CSV or TSV as your import file format (this may not be necessary, but I always do it).
     
  2. Set your browser to UTF-8 before submitting your file for import.
     
  3. If you have trouble, review the file in a good text editor such as Notepad++. In Notepad++, you can create and edit non-Western text files by setting the encoding to UTF-8. 
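As a complement to reviewing the file by eye, the bytes can be checked programmatically before import. This is a minimal sketch, assuming a JavaScript environment with the standard TextDecoder API; the function name isValidUtf8 and the sample data are made up for illustration.

```javascript
// Sketch only: isValidUtf8 is a hypothetical helper name.
// With { fatal: true }, decode() throws on any byte sequence that is not valid UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// A CSV fragment encoded as UTF-8.
const good = new TextEncoder().encode("name,city\nZack,台北\n");
// "Zé\n" as it would be saved in ISO-8859-1: 0xE9 is not valid UTF-8 on its own.
const bad = Uint8Array.from([0x5a, 0xe9, 0x0a]);

console.log(isValidUtf8(good)); // true
console.log(isValidUtf8(bad));  // false
```

In Node.js the bytes could come from fs.readFileSync, which returns a Buffer that can be passed to the function as is. Note that a plain ASCII file also passes the check, since ASCII is a subset of UTF-8.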

 

 

_________________________________________________________________

Published: August 2012

 

Comments

This whitepaper provides a comprehensive guide to setup and full implementation, along with limits that might impact the user experience. To date there appears to have been limited demand for use of QuickBase outside of "western" languages, but this write-up should help open the door for customers who want more flexibility in their international interfaces. There can be interesting possibilities for including two languages in the same application.
