(svn r9012) -Fix/Feature (UTF8): When splitting strings into multiple lines, also take into account whitespace characters longer than 1 byte (e.g. IDEOGRAPHIC SPACE; see the IsWhitespace() function). When trimming such strings, account for multi-byte sequences by terminating with *Utf8PrevChar(v) = '\0'.

-Codechange: Add a function Utf8TrimString() that properly trims a string at a UTF8 character boundary instead of at an arbitrary byte (and use it in the chat area)
2025-08-14 18:19:11 +00:00 · 2007-03-05 00:45:56 +00:00
parent aea64adbb9
commit 915ae8ffc2
4 changed files with 76 additions and 6 deletions
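The first part of the commit message describes cutting at the start of a multi-byte character rather than one byte back. The callers that do this are not shown here, so the following is only a minimal sketch of the idea. `SketchPrevChar`, `SketchIsWhitespaceAt` and `SketchTrimTrailingWhitespace` are hypothetical stand-ins, not the project's `Utf8PrevChar()` or `IsWhitespace()`. The sketch recognises only ASCII space and IDEOGRAPHIC SPACE (U+3000, encoded as the bytes E3 80 80).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Utf8PrevChar(): step back from s over
// continuation bytes (10xxxxxx) to the first byte of the previous character.
// Requires s > begin on entry; never moves before begin.
static char *SketchPrevChar(char *begin, char *s)
{
	do {
		--s;
	} while (s > begin && (static_cast<unsigned char>(*s) & 0xC0) == 0x80);
	return s;
}

// Hypothetical, reduced whitespace test: ASCII space or IDEOGRAPHIC SPACE.
static bool SketchIsWhitespaceAt(const char *s)
{
	if (*s == ' ') return true;
	return std::strncmp(s, "\xE3\x80\x80", 3) == 0;
}

// Remove trailing whitespace by writing '\0' at the start of each trailing
// whitespace character, so a 3-byte space is removed as a whole.
// Returns the new length in bytes.
static size_t SketchTrimTrailingWhitespace(char *str)
{
	char *end = str + std::strlen(str);
	while (end > str) {
		char *prev = SketchPrevChar(str, end);
		if (!SketchIsWhitespaceAt(prev)) break;
		*prev = '\0';
		end = prev;
	}
	return static_cast<size_t>(end - str);
}
```

Writing the terminator at `end - 1` instead would leave the first two bytes of a trailing IDEOGRAPHIC SPACE in place as an invalid sequence, which is presumably the case the message refers to.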
--- a/src/string.cpp
+++ b/src/string.cpp
@@ -268,3 +268,29 @@ size_t Utf8Encode(char *buf, WChar c)
 	*buf = '?';
 	return 1;
 }
+
+/**
+ * Properly terminate a UTF8 string so that it fits within some maximum length
+ * @param s string to check if it needs additional trimming
+ * @param maxlen the maximum length the buffer can have.
+ * @return the new length in bytes of the string (eg. strlen(new_string))
+ * @NOTE maxlen is the string length _INCLUDING_ the terminating '\0'
+ */
+size_t Utf8TrimString(char *s, size_t maxlen)
+{
+	size_t length = 0;
+
+	for (const char *ptr = strchr(s, '\0'); *s != '\0';) {
+		size_t len = Utf8EncodedCharLen(*s);
+		if (len == 0) break; // invalid encoding
+
+		/* Take care when a hard cutoff was made for the string and
+		 * the last UTF8 sequence is invalid */
+		if (length + len >= maxlen || (s + len > ptr)) break;
+		s += len;
+		length += len;
+	}
+
+	*s = '\0';
+	return length;
+}
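To see how the trimming loop in the hunk above behaves, here is a self-contained sketch that can be compiled outside the project. `SketchEncodedCharLen` is a hypothetical stand-in for `Utf8EncodedCharLen()`, whose real definition is not part of this diff. `SketchTrimString` copies the loop shape of `Utf8TrimString()` under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Utf8EncodedCharLen(): sequence length implied by
// a lead byte, or 0 when the byte cannot start a sequence.
static size_t SketchEncodedCharLen(char c)
{
	unsigned char u = static_cast<unsigned char>(c);
	if (u < 0x80) return 1;            // 0xxxxxxx: ASCII
	if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
	if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
	if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
	return 0;                          // continuation byte or invalid lead
}

// Same loop shape as the function in the diff: advance one whole character
// at a time and stop before the character that would not fit in maxlen
// bytes (terminator included) or that runs past the existing terminator.
static size_t SketchTrimString(char *s, size_t maxlen)
{
	size_t length = 0;
	const char *end = s + std::strlen(s);

	while (*s != '\0') {
		size_t len = SketchEncodedCharLen(*s);
		if (len == 0) break; // invalid encoding
		if (length + len >= maxlen || s + len > end) break;
		s += len;
		length += len;
	}

	*s = '\0';
	return length;
}
```

For example, the 5-byte string "a€b" (the euro sign takes 3 bytes) trimmed with `maxlen` 4 becomes "a": the euro sign plus the terminator would not fit, so it is dropped whole. With `maxlen` 5 the result is "a€". A string that ends in an incomplete sequence, such as "a" followed by only the first two bytes of the euro sign, is cut back to "a".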