FastBlogIt Internationalization and DBCS

seth: 2006-01-14 10:36:21
Mark 2006-01-14 10:32:07 [item 2523]
One interesting question is how we could possibly monitor for Chinese spam and maliciousness.
I would not allow foreign characters here at fastblogit.  This is an English domain.  That would be for a different fastblogit domain that was devoted to a different country.  Presumably the person running the domain would police their own spam.
Mark: 2006-01-14 10:42:48
seth 2006-01-14 10:39:00 [item 2523]
Mark 2006-01-14 10:32:07 [item 2523]
The double-byte character sets, like the ones used for East Asian languages (Japanese, Korean Hangul, Chinese), require 2 bytes per character where English needs only one. Wikipedia describes it here under DBCS and internationalization.
Yes, note that your [item 2512] has one of those characters in it by virtue of the title in the site you blogged.  I know this because it is fucking up the feeds.
I fixed the fucking character
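A quick way to see the 2-bytes-per-character point is to encode a few sample characters and count the bytes. This is only a sketch: the sample characters and the helper name `byte_length` are made up for illustration, and the legacy encodings listed are just common examples of double-byte sets, not anything fastblogit is known to use.

```python
# Compare how many bytes a single character occupies in several encodings.
# The sample characters below are arbitrary illustrations.
samples = {
    "A": ["ascii", "utf-8"],
    "日": ["shift_jis", "euc_jp", "utf-8"],  # Japanese kanji
    "한": ["euc_kr", "utf-8"],               # Korean Hangul syllable
    "中": ["gbk", "big5", "utf-8"],          # Chinese character
}


def byte_length(ch: str, encoding: str) -> int:
    """Return the number of bytes needed to store ch in the given encoding."""
    return len(ch.encode(encoding))


for ch, encodings in samples.items():
    for enc in encodings:
        print(f"{ch!r} in {enc}: {byte_length(ch, enc)} byte(s)")
```

Note that UTF-8 is not a double-byte set: it stores ASCII in one byte and these East Asian characters in three, which matters if the feed is declared as UTF-8.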
Mark: 2006-01-14 11:10:54
Internationalization is usually done at the end of a project, since a moving target is hard to deal with. If they mess up the feeds, it sounds like the way to go is a filter. I am NOT an expert in this area.
Mark: 2006-01-14 11:14:43
maybe the fix to the feed is something in the character set line of the xml
seth: 2006-01-14 12:16:03
Mark 2006-01-14 11:14:43 [item 2523]
maybe the fix to the feed is something in the character set line of the xml
Well, in the feed element there is the property specification xml:lang="en-US".  The question is what the xml:lang specification should be such that we allow what we want to allow, and how we can exclude that which the lang specification would barf upon.
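One thing worth checking here: xml:lang only labels the natural language of the content, while the bytes allowed in the document are governed by the encoding named in the XML declaration. The sketch below builds a tiny Atom-style feed with xml:lang="en-US" and a non-English title, serializes it as UTF-8, and parses it back. The function name `build_feed` and the sample title are hypothetical; this is not fastblogit's feed code.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_feed(title: str) -> bytes:
    """Serialize a minimal feed as UTF-8 bytes with an XML declaration."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    # xml:lang is a language hint only; it does not restrict characters.
    feed.set(f"{{{XML_NS}}}lang", "en-US")
    title_el = ET.SubElement(feed, f"{{{ATOM_NS}}}title")
    title_el.text = title
    # The declared encoding is what tells parsers how to read the bytes.
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)


data = build_feed("日本語のタイトル")
root = ET.fromstring(data)  # parses without error despite xml:lang="en-US"
print(root.find(f"{{{ATOM_NS}}}title").text)
```

If that holds for the feed here, excluding characters would need a separate filter on input rather than a change to xml:lang.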
The foregoing was moved here from 2523, where it was slightly off-topic, to give it a better resting place.

Tags

  1. internationalization
  2. dbcs
  3. problem~solution
  4. off-topic
  5. character sets
  6. feeds

Comments


Mark de LA says
Apparently an & in the title is not a good idea either. This article bombed the feed with a sql error and I replaced the & with an "and" & then the feed worked.

Seth says
Mark 2006-01-15 06:55:10 2529
Apparently an & in the title is not a good idea either. This article bombed the feed with a sql error and I replaced the & with an "and" & then the feed worked.
Take a look at the title element of url http://www.drudgereport.com/flashp.htm and you see 
source:
<title>DRUDGE REPORT FLASH 2006®</title>
However when you fastblogit that url the title in the add screen ends up being expressed as
source:
DRUDGE REPORT FLASH 2006

Mark de LA says
Yes, I acknowledged and fixed the registered trademark symbol in a report on the drudgereport. I discovered a new one when (before I fixed it) I had an ampersand in the title of [this: item 2529]. I can change it back if you want to debug it or you can make one of your own.

Mark de LA says
Actually, it is kinda fascinating to search the left-hand list for a title with an ampersand by putting an ampersand followed by a space and then pressing the enter button - seems to wait forever.

Seth says
Incidentally, there is a workaround for the mutating character, which crops up a lot. When you fast blog something, just edit the title to remove the extra character and all comes out correctly.

The & character in the title is another matter.
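For the feed side of the & problem, the usual approach is to escape the title before writing it into the XML rather than asking people to type "and". A minimal sketch, assuming the title is dropped into a title element as text; the helper name `title_element` and the sample title are made up. The sql error mentioned above is not covered, since the thread does not show the query that failed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def title_element(raw_title: str) -> str:
    """Wrap a title in a <title> element, escaping &, < and > first."""
    return f"<title>{escape(raw_title)}</title>"


fragment = title_element("Tom & Jerry <live>")
print(fragment)  # <title>Tom &amp; Jerry &lt;live&gt;</title>

# A parser reads the escaped form back as the original text.
print(ET.fromstring(fragment).text)
```

An unescaped & would make the fragment ill-formed, and a strict XML parser would reject the whole feed.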

Seth says
source: Repairing broken documents that mix UTF-8 and ISO-8859-1
A perpetual (if thankfully not too frequent) problem on the web are documents claiming to be encoded in either UTF-8 or ISO-8859-1, but containing characters encoded according to the respective other charset. Such documents will display incorrectly, regardless of which way you look at them. Worse, if the document in question is XML (such as, say, a newsfeed) and claims to be encoded in UTF-8, upset ensues that leads the XML parser to halt and catch fire as soon as it encounters the first invalid byte.
...
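The simplest case of this kind of repair can be sketched in a few lines. It assumes the whole string was UTF-8 that got decoded as ISO-8859-1, which may or may not be what happened to the registered trademark title discussed above; the thread does not confirm the cause. The function name `repair_mojibake` is made up, and the article quoted above deals with the harder case of documents that mix both encodings, which this does not handle.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as ISO-8859-1.

    If the text does not fit that pattern, return it unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or its bytes
        # are not valid UTF-8: assume it was never garbled this way.
        return text


# Simulate the damage: UTF-8 bytes read back with the wrong charset.
garbled = "DRUDGE REPORT FLASH 2006®".encode("utf-8").decode("latin-1")
print(garbled)                   # an extra character appears before the ®
print(repair_mojibake(garbled))  # DRUDGE REPORT FLASH 2006®
```

Text that was genuinely Latin-1 to begin with, such as "café", fails the UTF-8 decode and is returned as-is.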


Seth says
source: Sam Ruby
Ian Hixie: I think it may be time to retire the Content-Type
header, putting to sleep the myth that it is in any way
authoritative, and instead have well-defined content-sniffing rules
for Web content.

The reason why people can safely enter non-Latin-1 characters in
my comments and have them presented properly to all consumers that
have installed the appropriate fonts is that these pages specify
charset=utf-8 in the content-type header.

Sniffing for the character encoding used is clearly not the
answer.  Nor am I convinced that meta http-equiv is either.
...
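Reading the charset out of a Content-Type header, rather than sniffing for it, can be sketched with the standard library. The function name `charset_from_content_type` and the fallback value are assumptions for illustration; the ISO-8859-1 default mirrors the old HTTP/1.1 convention for text types and is not something the quoted post specifies.

```python
from email.message import Message


def charset_from_content_type(header_value: str,
                              default: str = "iso-8859-1") -> str:
    """Return the lowercased charset parameter of a Content-Type value.

    Falls back to `default` (an assumed value) when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # iso-8859-1
```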

