String functions that handle UTF-8 properly

Hello.

As you probably already know, Lua's string functions work on bytes, not characters,
so they give wrong results for text containing multibyte characters such as Japanese, Chinese, Greek, etc.
These are the functions concerned:

string.byte()
string.char()
string.find()
string.format()
string.gmatch()
string.gsub()
string.len()
string.lower()
string.match()
string.rep()
string.reverse()
string.sub()
string.upper()
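A quick illustration of the byte-oriented behavior in standard Lua 5.1. It assumes the source file is saved as UTF-8, so each kana below takes 3 bytes.

```lua
-- assumes this file is saved as UTF-8: each kana below is 3 bytes
local s = "あいうえおabc"

-- string.len counts bytes: 5 * 3 + 3 = 18, not 8 characters
print(string.len(s))                    --> 18

-- string.sub takes byte positions: the first character spans bytes 1..3,
-- and cutting at byte 1 leaves a lone lead byte, not a character
print(string.sub(s, 1, 3))              --> あ
print(string.len(string.sub(s, 1, 1)))  --> 1

-- string.reverse reverses bytes, so the result is not "いあ"
-- (it is not even valid UTF-8, since it starts with a continuation byte)
print(string.reverse("あい") == "いあ")  --> false
```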

Ansca team, please add UTF-8-aware versions of these string functions.
Here is my UTF-8 version of string.len.

How to use:
print( utf8Len( "あいうえおabc" ) ) -- outputs 8

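-- true if byte b exists and lies in the inclusive range lo..hi
local function inRange(b, lo, hi)
   return b ~= nil and b >= lo and b <= hi
end

-- returns the number of bytes used by the UTF-8 character at byte i in s,
-- or nil if no well-formed character starts there, so it also doubles
-- as a UTF-8 character validator
function utf8CharBytes(s, i)
   -- argument defaults
   i = i or 1
   local c = string.byte(s, i)
   if c == nil then
      return nil -- i is past the end of s
   end

   -- determine bytes needed for character, based on RFC 3629
   if c <= 127 then
      -- UTF8-1 (includes the NUL byte)
      return 1
   elseif c >= 194 and c <= 223 then
      -- UTF8-2: one continuation byte 80-BF
      local c2 = string.byte(s, i + 1)
      if inRange(c2, 128, 191) then return 2 end
   elseif c >= 224 and c <= 239 then
      -- UTF8-3: after E0 the next byte must be A0-BF (no overlongs),
      -- after ED it must be 80-9F (no UTF-16 surrogates)
      local c2, c3 = string.byte(s, i + 1, i + 2)
      local lo, hi = 128, 191
      if c == 224 then lo = 160 elseif c == 237 then hi = 159 end
      if inRange(c2, lo, hi) and inRange(c3, 128, 191) then return 3 end
   elseif c >= 240 and c <= 244 then
      -- UTF8-4: after F0 the next byte must be 90-BF (no overlongs),
      -- after F4 it must be 80-8F (nothing above U+10FFFF)
      local c2, c3, c4 = string.byte(s, i + 1, i + 3)
      local lo, hi = 128, 191
      if c == 240 then lo = 144 elseif c == 244 then hi = 143 end
      if inRange(c2, lo, hi) and inRange(c3, 128, 191)
         and inRange(c4, 128, 191) then
         return 4
      end
   end

   -- invalid lead byte (80-C1, F5-FF) or malformed/truncated sequence
   return nil
end

-- returns the number of characters in a UTF-8 string;
-- raises an error if s is not valid UTF-8
function utf8Len(s)
   local pos = 1
   local bytes = string.len(s)
   local len = 0

   while pos <= bytes do
      local n = utf8CharBytes(s, pos)
      if n == nil then
         error("invalid UTF-8 sequence at byte " .. pos, 2)
      end
      len = len + 1
      pos = pos + n
   end

   return len
end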
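The same approach extends to other functions in the list. Below is a sketch of hypothetical utf8Chars, utf8Sub and utf8Reverse helpers (illustrative names, not part of Lua or Corona). To stay short and self-contained, it sizes each character from its lead byte alone and assumes the input is already well-formed UTF-8; untrusted input should be validated first.

```lua
-- Hypothetical helpers, not a Lua or Corona API.
-- Assumes well-formed UTF-8: a character's length is taken from its
-- lead byte only, without checking the continuation bytes.

-- number of bytes in the character whose lead byte is b
local function leadByteLen(b)
   if b < 128 then return 1        -- ASCII
   elseif b >= 240 then return 4   -- 4-byte sequence
   elseif b >= 224 then return 3   -- 3-byte sequence
   elseif b >= 192 then return 2   -- 2-byte sequence
   else return 1 end               -- stray continuation byte
end

-- returns an array of the characters of s, in order
function utf8Chars(s)
   local chars = {}
   local pos = 1
   while pos <= #s do
      local n = leadByteLen(string.byte(s, pos))
      chars[#chars + 1] = string.sub(s, pos, pos + n - 1)
      pos = pos + n
   end
   return chars
end

-- string.sub with character positions; negative indices count
-- from the end and out-of-range indices are clamped, as in string.sub
function utf8Sub(s, i, j)
   local chars = utf8Chars(s)
   local n = #chars
   i = i or 1
   j = j or -1
   if i < 0 then i = n + i + 1 end
   if j < 0 then j = n + j + 1 end
   if i < 1 then i = 1 end
   if j > n then j = n end
   if i > j then return "" end
   return table.concat(chars, "", i, j)
end

-- string.reverse that reverses characters instead of bytes
function utf8Reverse(s)
   local chars = utf8Chars(s)
   local out = {}
   for k = #chars, 1, -1 do
      out[#out + 1] = chars[k]
   end
   return table.concat(out)
end
```

For example, utf8Sub("あいうえおabc", 2, 4) returns "いうえ", and utf8Reverse("あいうabc") returns "cbaういあ".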

Passing this on to the team :)

views:1785 update:2012/2/6 12:03:31
corona forums © 2003-2011