Quick Base Applications with International Text-Encoding
by Zachary Glennie, Technical Director
The Cambridge Institute of International Education
Eric Segal, President
The Data Collaborative, Inc.
As the marketplace continues to expand globally, more and more companies are facing the challenge of integrating information gathered not only in multiple languages, but also in languages that use non-Western characters. The Cambridge Institute of International Education is a case in point. Headquartered outside of Boston, its mission is to extend quality education to students from every nation.
That’s a lot of languages.
The main problem with handling and displaying non-Western characters in Quick Base is that Quick Base tells every web browser to use the Western (ISO-8859-1) character encoding when loading Quick Base webpages. When using Chrome (20.0) as your web browser, for example, you have to change the encoding to Unicode (UTF-8) every time you access the database to get Chinese characters to show up. If you open a new tab, the encoding reverts to Western and you can no longer read the Chinese.
There is a workaround for this issue. It isn't pretty, but it is pretty simple:
- To begin, click the wrench icon in the top right-hand corner of the window.
- Then click Settings, then Show advanced settings.
- Under Web content, click Customize fonts.
- At the bottom of the pop-up window, click the Encoding dropdown.
- Select Unicode (UTF-8), which should be near the top of the list.
By doing this, you’re telling Chrome to read every page as Unicode, which has the potential to break some websites (they might show up funny), but the Cambridge Institute has been using this setting for a while without any issues, so it shouldn't cause many problems. If you run into anything weird, you can run through these steps again and switch your encoding back to Western (ISO-8859-1). This should save Chrome users some time in the long run.
If you require other non-Western encodings, there are workarounds for most situations. But there will be things that you can't do.
For starters, field names with non-Western characters will not behave consistently. Column headings and field names on forms will also be problematic. Some interface elements can be written with HTML, but many will simply not be able to use non-Western characters. (If your goal is to create a fully non-English-language application, you may need to look elsewhere.)
Using Numerical Character References (NCRs)
If you need to include tooltips or form text in Chinese or another non-Western character set, the best solution is HTML Numerical Character References (NCRs). For example, 台北, the characters for "Taipei", can be written as &#21488;&#21271;. This is a useful webpage for converting to NCR, but there are likely many others: http://www.pinyin.info/tools/converter/chars2uninumbers.html
The advantage of NCRs is that they will display correctly regardless of the user's browser encoding choice. The disadvantage is that they inflate the size of the page (because "&#21488;" is eight full bytes, whereas 台 in UTF-8 is only three bytes). This is not a problem for small interface elements, such as text elements and tooltips in a form.
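As an illustration of the conversion and the size trade-off described above, here is a minimal sketch in JavaScript. The function names toNcr and utf8Bytes are made up for this example and are not part of Quick Base; the sketch assumes an environment that provides TextEncoder (current browsers and Node.js do).

```javascript
// Convert every non-ASCII character in a string to a decimal HTML
// numeric character reference, e.g. "台" becomes "&#21488;".
// toNcr is a hypothetical helper name used only for this example.
function toNcr(text) {
  let out = "";
  for (const ch of text) {              // for...of iterates by code point
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

// Size in bytes of a string when it is encoded as UTF-8.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

const taipei = "台北";
const taipeiNcr = toNcr(taipei);

console.log(taipeiNcr);             // &#21488;&#21271;
console.log(utf8Bytes(taipei));     // 6
console.log(utf8Bytes(taipeiNcr));  // 16
```

The NCR form uses only ASCII characters, which is why it survives any browser encoding setting, at the cost of more bytes per character.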
This capability is a feature of HTML, so any text fields which include NCR-coded letters should be set to use HTML.
in Formula - Text fields, check
on Text elements of Forms, check
If your users need to submit and retrieve text in non-English languages, you face a greater challenge. Quick Base will store the text as ones and zeros without making a record of the encoding with which the text is submitted. There is a full explanation of this here, which you should definitely read carefully if you plan to store non-Western data.
To make this work, you will need to cause (or even “force”) your users to use the desired encoding in their web browser whenever they submit or retrieve data. The Cambridge Institute employs a number of strategies:
There is a "Custom Page Banner" which includes UTF-8 encoded text. (To add a custom page banner, go to Customize > Application > Properties > Branding, check Custom Page Banner, and configure it.) One example of this text is: ï¼»ã€€ï¼³ï¼µï¼£ï¼£ï¼¥ï¼³ï¼³:ã€€ï¼µï¼®ï¼©ï¼£ï¼ which displays as “［ ＳＵＣＣＥＳＳ: ＵＮＩＣＯＤＥ ＥＮＡＢＬＥＤ！ ］” when Unicode is enabled. These are full-width characters, so they are easy to read for English speakers, but they don't display correctly unless they are decoded with UTF-8.
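To see why the banner works as an indicator, the following sketch reproduces the garbling in JavaScript. It is an illustration only: misreadAsWestern and rereadAsUtf8 are hypothetical names, and the sketch maps each byte to the character with the same number (strict ISO-8859-1). Browsers actually treat the ISO-8859-1 label as windows-1252, which differs only for bytes 0x80 to 0x9F; that is why the banner example above shows € signs where the wide spaces are.

```javascript
// Encode text as UTF-8, then read each byte back as if it were a
// single Western (ISO-8859-1) character. This mimics what a browser
// shows when it decodes UTF-8 data with the Western encoding.
function misreadAsWestern(text) {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// Reverse the mistake: turn each character back into one byte and
// decode the byte sequence as UTF-8.
function rereadAsUtf8(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const banner = "ＳＵＣＣＥＳＳ";
const garbled = misreadAsWestern(banner);

console.log(garbled);                // ï¼³ï¼µï¼£ï¼£ï¼¥ï¼³ï¼³
console.log(rereadAsUtf8(garbled));  // ＳＵＣＣＥＳＳ
```

Each full-width letter takes three bytes in UTF-8, so a browser reading the page as Western shows three unrelated characters per letter.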
User Browser Configuration
- Most users are on Chrome. They’re instructed to set UTF-8 as the default for all webpages (rather than "detect"), using the steps outlined in the introduction. This overrides Quick Base's header, which instructs the browser to use Western encoding.
- For Firefox, there is an extension called charset-switcher which allows domain-specific overriding of the text encoding setting.
// Warn the user when the page is not being read as UTF-8.
var encoding = document.inputEncoding;
if (encoding != "UTF-8")
  alert("WARNING!\n\nYour encoding is set to << " + encoding + " >>.\n\nIf you enter any Non-Western characters, they will not appear correctly. For best results, use UTF-8 (Unicode).\n\n~Zack");
// Disable the form field with ID _fid_8.
document.getElementById("_fid_8").disabled = true;
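The script above reads document.inputEncoding, which is a legacy alias of the standard document.characterSet property. A minimal sketch of the same check using characterSet might look like the following. The helper name encodingWarning is hypothetical, and disabling the field only when the encoding is wrong is an assumption about the intent, not something the original script states.

```javascript
// Returns a warning message when the page encoding is not UTF-8,
// or null when it is. encodingWarning is a hypothetical helper name.
function encodingWarning(charset) {
  if (String(charset).toUpperCase() === "UTF-8") {
    return null;
  }
  return "WARNING!\n\nYour encoding is set to << " + charset + " >>.\n\n" +
    "If you enter any non-Western characters, they will not appear correctly. " +
    "For best results, use UTF-8 (Unicode).";
}

// Only touch the page when running in a browser.
if (typeof document !== "undefined") {
  const message = encodingWarning(document.characterSet);
  if (message !== null) {
    alert(message);
    // Field ID taken from the script above. Assumption: the field is
    // locked only when the encoding is wrong.
    const field = document.getElementById("_fid_8");
    if (field) {
      field.disabled = true;
    }
  }
}
```

Keeping the message logic in a plain function makes it possible to check the wording outside a browser.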
You can import multiple records containing UTF-8 data, but you may need to be careful. Here's one proven method:
- Use CSV or TSV as your import file format (this may not be necessary, but I always do it).
- Set your browser to UTF-8 before submitting your file for import.
- If you have trouble, review the file in a good text editor such as Notepad++. In Notepad++, you can create and edit non-Western text files by setting the encoding to UTF-8.
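Before importing, it can help to confirm that the file really is valid UTF-8. The sketch below shows one way to do that in JavaScript; isValidUtf8 is a hypothetical helper name, not a Quick Base feature, and the sample bytes are invented for the example. It assumes an environment where TextDecoder supports the fatal option (current browsers and standard Node.js builds do).

```javascript
// Returns true when the bytes form valid UTF-8.
// isValidUtf8 is a hypothetical helper name used only for this example.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw on malformed input instead of
    // silently substituting replacement characters.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Sample data: a small CSV encoded as UTF-8, and a byte sequence that
// ends in the middle of a three-byte character.
const goodCsv = new TextEncoder().encode("name,city\nZack,台北\n");
const brokenCsv = new Uint8Array([0x5a, 0x61, 0xe5, 0x8f]);

console.log(isValidUtf8(goodCsv));    // true
console.log(isValidUtf8(brokenCsv));  // false
```

In Node.js, the bytes of a real file could be obtained with require("fs").readFileSync("import.csv"), where import.csv is a placeholder file name.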