Wednesday, December 8, 2021

How can I tell which characters inside a string are parts of a single accented character in C?

My native language is not English; it is Brazilian Portuguese, which has accented characters (á, à, ã, õ, and so on).

My problem is that if I put one of these characters inside a string and iterate over the string one char at a time, I find that two chars are needed to display "ã" on the screen.

I iterated over the string "(Não Informado)", which means "Uninformed". The string should have a length of 15 if we count each character one by one, but strlen("(Não Informado)") returns 16.

This is the code I used to print each character:

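#include <stdio.h>
#include <string.h>

/* Prints the string and its strlen(), then the numeric value of each char. */
void print_buffer (const char * buffer) {
    size_t size = strlen(buffer); /* strlen returns size_t, not int */
    printf("BUFFER: %s / %zu\n", buffer, size);

    for (int i = 0; buffer[i] != '\0'; ++i) {
        printf("[%i]: %i\n", i, (unsigned char) buffer[i]);
    }
}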

So, in a graphical application, a buffer could display "ãbc" while the raw string holds not 3 chars but 4.

So here's my question: is there a way to know which characters inside a string are parts of one of those special characters? Is there a rule that defines and limits when this happens? Is it always a composition of 2 characters, or could a special character be composed of 3 or 4, for example?

Thanks
