This was a quick romp around “plain text”: Plain Text – Dylan Beattie – NDC Oslo 2021. Would recommend if you don’t already know about such things.
Fixing character encoding issue in ViewVC
So I was having an issue with ViewVC wherein UTF-8 content (a copyright symbol) was being garbled in the web browser.
I chased a number of red herrings (Content-Type headers, http-equiv, XHTML vs HTML5) but eventually found the culprit in the viewvc.conf settings.
I needed to change the ‘detect_encoding’ setting from ‘1’ to ‘0’. Once that was done my content was presented correctly:
## detect_encoding: Should we attempt to detect versioned file
## character encodings? [Requires 'chardet' module, and is currently
## used only by the syntax coloration logic -- if enabled -- for the
## 'markup' and 'annotate' views; see 'enable_syntax_coloration'.]
##
# 2019-06-02 jj5 - OLD: this was bollocksing things up...
#detect_encoding = 1
# 2019-06-02 jj5 - NEW: so I changed it...
detect_encoding = 0
# 2019-06-02 jj5 - END
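For illustration, here is a minimal Python sketch of the kind of garbling a wrong encoding guess produces. It does not reproduce what ViewVC or chardet actually did in this case; the Latin-1 guess is an assumption chosen because it is the most common way a UTF-8 copyright symbol ends up mangled.

```python
# Illustration only: what a wrong encoding guess does to UTF-8 bytes.
copyright_utf8 = "©".encode("utf-8")  # two bytes: 0xC2 0xA9

# Decoding with the correct encoding round-trips cleanly.
correct = copyright_utf8.decode("utf-8")

# Decoding the same bytes as Latin-1 (a hypothetical wrong guess) maps
# each byte to its own character, which is the classic mojibake.
garbled = copyright_utf8.decode("latin-1")

print(repr(correct))  # '©'
print(repr(garbled))  # 'Â©'
```

If the browser shows "Â©" where "©" should be, the bytes are probably fine and something in the pipeline is decoding them with the wrong encoding.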
Salt file.recurse source file not found (file encoding issue)
So I was running this:
/var/www/jj-web-1-www.jj5.net-sixsigma:
  file.recurse:
    - clean: True
    - user: root
    - group: root
    - dir_mode: 755
    - file_mode: 644
    - source: salt://inst/mediawiki-1.29
    - require:
      - pkg: apache2
And getting an error like this:
----------
          ID: /var/www/jj-web-1-www.jj5.net-sixsigma
    Function: file.recurse
      Result: False
     Comment: #### /var/www/jj-web-1-www.jj5.net-sixsigma/vendor/james-heinrich/getid3/getid3/module.audio.ac3.php ####
              Source file 'salt://inst/mediawiki-1.29/vendor/james-heinrich/getid3/getid3/module.audio.ac3.php?saltenv=base' not found
              #### /var/www/jj-web-1-www.jj5.net-sixsigma/vendor/james-heinrich/getid3/getid3/module.audio-video.mpeg.php ####
              Source file 'salt://inst/mediawiki-1.29/vendor/james-heinrich/getid3/getid3/module.audio-video.mpeg.php?saltenv=base' not found
     Started: 14:27:18.352264
    Duration: 134735.945 ms
     Changes:
----------
The issue was that the source files mentioned weren’t in UTF-8 format. To convert the files I ran, e.g.:
$ iconv -f WINDOWS-1252 -t UTF-8//TRANSLIT < module.audio-video.mpeg.php.bak > module.audio-video.mpeg.php
(Actually I couldn’t get the ‘iconv’ command to work so I edited manually in Vim)
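With a large source tree it helps to find every offending file up front instead of waiting for Salt to report them one at a time. The following is a rough sketch, not anything Salt provides: the function names and the default `.php` suffix are made up for this example, and it only checks whether each file decodes as strict UTF-8.

```python
from pathlib import Path
from typing import List


def is_valid_utf8(path: Path) -> bool:
    """Return True if the whole file decodes as strict UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root: str, suffix: str = ".php") -> List[Path]:
    """List files under root (with the given suffix) that are not valid UTF-8."""
    return sorted(
        p for p in Path(root).rglob(f"*{suffix}")
        if p.is_file() and not is_valid_utf8(p)
    )
```

Calling `find_non_utf8_files()` on the directory that backs `salt://inst/mediawiki-1.29` (wherever that lives on the master) should list the files that need converting, whether by iconv or by hand in an editor.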
Fixing bug in /etc/cron.daily/etckeeper on Ubuntu Lucid
I was getting an error like this:
/etc/cron.daily/etckeeper:
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 34: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 853, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 1055, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 661, in run_argv_aliases
    return self.run_direct(**all_cmd_args)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 665, in run_direct
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/builtins.py", line 659, in run
    no_recurse, action=action, save=not dry_run)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 50, in tree_write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 521, in smart_add
    for subf in sorted(os.listdir(abspath)):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 34: ordinal not in range(128)

bzr 2.1.4 on python 2.6.5 (Linux-2.6.35.4-rscloud-x86_64-with-Ubuntu-10.04-lucid)
arguments: ['/usr/bin/bzr', 'add', '-q', '.']
encoding: 'ANSI_X3.4-1968', fsenc: 'ANSI_X3.4-1968', lang: None
plugins:
  bzrtools /usr/lib/python2.6/dist-packages/bzrlib/plugins/bzrtools [2.1.0]
  etckeeper /usr/lib/python2.6/dist-packages/bzrlib/plugins/etckeeper [unknown]
  launchpad /usr/lib/python2.6/dist-packages/bzrlib/plugins/launchpad [2.1.4]
  netrc_credential_store /usr/lib/python2.6/dist-packages/bzrlib/plugins/netrc_credential_store [2.1.4]
  news_merge /usr/lib/python2.6/dist-packages/bzrlib/plugins/news_merge [2.1.4]
*** Bazaar has encountered an internal error.
    This probably indicates a bug in Bazaar. You can help us fix it by filing a bug report at
    https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.
etckeeper warning: bzr add failed
Committing to: /etc/
modified apache2/passwd.htdigest
modified apache2/sites-available/svn.jj5.net-ssl
Committed revision 87.
I’ve tried to fix it by adding:
export LANG=en_AU.UTF-8
export LANGUAGE=en_AU:en
As lines 2 and 3 in /etc/cron.daily/etckeeper.
Now I’ll wait a day or two and see if it worked…
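The traceback reports `encoding: 'ANSI_X3.4-1968'` and `lang: None`, meaning bzr was running under cron with a plain ASCII locale. The sketch below (Python 3, not the Python 2.6 that bzr used) shows why an ASCII decode chokes on byte 0xc4 while a UTF-8 decode may not. The file name is invented for the example; the real name under /etc is not shown in the traceback, and whether it was valid UTF-8 at all is an assumption.

```python
# A made-up file name containing byte 0xc4, standing in for whatever
# entry under /etc tripped up bzr (the real name is not in the traceback).
raw_name = b"example-\xc4\x8d.conf"

# ASCII only covers bytes 0x00-0x7f, so decoding fails at the first
# byte outside that range.
try:
    raw_name.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Assuming the name is valid UTF-8, the same bytes decode cleanly.
utf8_name = raw_name.decode("utf-8")

print(ascii_error)
print(utf8_name)
```

If the name turned out to be in some other encoding (Latin-1, say), setting a UTF-8 locale would not necessarily be enough, so it is worth checking the file names themselves as well.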
PHP htmlentities encoding
I needed to know my options for htmlentities character encoding support today. The PHP documentation had everything I needed to know. I ended up adding these constants to my code:
const UTF8_ENCODING = 'UTF-8';
// Note: ISO-8859-1 (Latin-1) is a superset of ASCII, not ASCII itself.
const ASCII_ENCODING = 'ISO-8859-1';
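The encoding argument matters because entity conversion works on characters, so the bytes have to be decoded correctly first. Below is a rough Python analogue of that idea; it is not PHP and does not mimic the exact output of htmlentities (which emits named entities such as `&copy;`). The `to_entities` helper is hypothetical and built only on Python's standard `html.escape` and the `xmlcharrefreplace` error handler.

```python
import html


def to_entities(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding, then HTML-escape the text.

    Markup characters are escaped by html.escape; any remaining non-ASCII
    character becomes a numeric character reference.
    """
    text = raw.decode(encoding)
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


utf8_bytes = "© <b>".encode("utf-8")

# Right encoding: one character, one entity.
print(to_entities(utf8_bytes, "utf-8"))       # &#169; &lt;b&gt;

# Wrong encoding: the two UTF-8 bytes are treated as two characters.
print(to_entities(utf8_bytes, "iso-8859-1"))  # &#194;&#169; &lt;b&gt;
```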
Changing content type with iconv
Today I needed to convert a UTF-16 file to UTF-8 and I did it with iconv:
iconv -f UTF-16 -t UTF-8 /path/to/input > /path/to/output
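Where iconv isn't available, the same conversion can be sketched in Python. This is a minimal example under a few assumptions: the `transcode` name and its defaults are made up here, and the input is expected to carry a byte order mark, which Python's `utf-16` codec uses to pick the byte order and then drops from the decoded text.

```python
import shutil


def transcode(src_path: str, dst_path: str,
              src_encoding: str = "utf-16",
              dst_encoding: str = "utf-8") -> None:
    """Re-encode a text file, streaming so large files are not loaded whole."""
    # newline="" disables newline translation so line endings pass through as-is.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        shutil.copyfileobj(src, dst)
```

Usage mirrors the iconv command: `transcode("/path/to/input", "/path/to/output")`. The output is written without a BOM.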
XML on the Web Has Failed
Here’s an interesting article from way back in 2004: XML on the Web Has Failed.
Handling character encodings in HTML and CSS
On my list of things to do is read the document Handling character encodings in HTML and CSS from the W3C. For some reason I can’t quite bring myself to concentrate on it right now.