C String Extraction: Standard, Custom & Multibyte Guide

1. Introduction

String manipulation is a core skill in C programming. In particular, string slicing (extracting substrings) is frequently used when processing data or converting between formats.

In this article, we explain in detail how to slice strings in C, including methods using standard library functions, creating custom functions, handling multibyte characters (Japanese), and techniques for splitting strings. Practical examples and error handling are covered as well.

What You’ll Learn in This Article

By reading this article, you can acquire the following skills.

  • Basic concepts of C strings and the role of the terminating character
  • Substring extraction using standard library functions such as strncpy and strchr
  • Writing custom substring functions
  • Handling multibyte characters (Japanese)
  • Splitting strings with strtok
  • How to retrieve the text before and after a specific character

We will explain the concepts with code examples so that even beginners can understand easily.

Why Is String Slicing Important in C Language?

C treats strings as arrays of type char, so unlike higher-level languages such as Python or JavaScript, it has no built-in way to obtain a substring. Therefore, it is important to choose the appropriate method in situations like the following.

1. Processing Input Data

For example, when analyzing log data or CSV files, you need to extract specific fields.

2. Searching for Specific Keywords

Finding a specific keyword within a string and retrieving the surrounding information is essential for search functionality and data extraction.

3. Improving Program Safety

By using functions such as strncpy appropriately, you can prevent buffer overflows (writing data beyond the buffer size). This is important for avoiding security risks.

Article Structure

In this article, we will proceed with the explanation in the following order.

  1. What is a string in C? Basic concepts and the importance of the terminating character
  2. How to extract a substring in C language [Standard Library Edition]
  3. How to extract a substring in C language [Custom Function Edition]
  4. Method of extracting strings by character encoding
  5. How to split strings in C
  6. Application example: How to extract text before and after a specific character
  7. Summary
  8. FAQ

Now, let’s first take a detailed look at “What Are Strings in C Language? Basic Concepts and the Importance of the Null Terminator.”

2. What are strings in C? Basic concepts and the importance of the terminating character

2.1 Basic concepts of strings in C

A string is an array of char

In C, strings are treated as arrays of characters (arrays of type char). For example, the following code is a basic example of defining and displaying a string.

#include <stdio.h>

int main() {
    char str[] = "Hello, World!"; // Define a string literal as an array
    printf("%s\n", str); // Output the string
    return 0;
}

In this code, the string "Hello, World!" is stored as an array of type char, and is output by printf("%s\n", str);.

Internal structure of a string

The string "Hello" is stored in memory as follows.

Index      0  1  2  3  4  5
Character  H  e  l  l  o  \0

In C, a special character that indicates the end of a string (the null character '\0') is automatically added at the end of a string literal, so the storage a string needs is “actual number of characters + 1” bytes.
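
The memory layout described here can be checked directly. The following is a minimal sketch, assuming a hypothetical helper named find_terminator (it is not a standard function) that scans a buffer of known size for the null character.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first '\0' in buf,
   or size if no '\0' exists within the first size bytes. */
size_t find_terminator(const char *buf, size_t size) {
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == '\0') {
            return i; /* Index of the terminator = number of characters */
        }
    }
    return size; /* The buffer is not null-terminated */
}
```

For char str[] = "Hello";, sizeof(str) is 6 and find_terminator(str, sizeof(str)) returns 5, which matches the table: five characters followed by '\0' at index 5.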

2.2 Importance of the terminating character (null character '\0')

What is a null character?

The null character ('\0') is a special character that indicates the end of a string. To handle C strings correctly, you need to understand the existence of this null character.

#include <stdio.h>

int main() {
    char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'}; // Explicitly specify the null terminator
    printf("%s\n", str);                           // Display correctly
    return 0;
}

In the code above, if there is no '\0', the termination of "Hello" is not recognized, which can lead to unintended behavior.

Problems when the null character is missing

As shown below, forgetting the terminating character can cause abnormal memory behavior.

#include <stdio.h>

int main() {
    char str[5] = {'H', 'e', 'l', 'l', 'o'}; // Does not include the null terminator
    printf("%s\n", str);                     // May cause unexpected behavior
    return 0;
}

Cause of error

  • printf("%s\n", str); continues to output characters until it finds the null character '\0'.
  • If '\0' is not present, other data in memory may be output.

2.3 Correct ways to define strings

Method 1: Use string literals

The most common way to define a string is to use a string literal.

char str[] = "Hello";

With this method, the C compiler automatically adds the null character '\0', so no special handling is required.

Method 2: Define the array explicitly

If you define it manually including '\0', write it as follows.

char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
  • Specify a size of “number of characters + 1”, and put '\0' at the end.
  • If you forget to put '\0' into the array, unexpected behavior will occur.

2.4 How to check the size of a string

strlen behavior

To obtain the length of a string (number of characters), use the strlen function.

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello";
    printf("Length of the string: %zu\n", strlen(str)); // Outputs 5 (does not include the null terminator)
    return 0;
}

  • strlen counts the characters up to, but not including, the first null character '\0'.
  • sizeof(str) returns the size of the whole array in bytes, including the null character (6 for char str[] = "Hello";).

2.5 Summary

  1. C strings are represented by char arrays.
  2. The terminating character (null character '\0') indicates the end of a string, so it must always be included.
  3. To get the length of a string, use strlen.
  4. If you don’t define strings in an appropriate way, unexpected errors may occur.

3. How to Extract Substrings in C Language [Standard Library Edition]

In C, you can extract substrings by leveraging the standard library. In this section, we explain how to obtain parts of a string using standard library functions such as strncpy and strchr.

3.1 Obtaining Substrings Using strncpy

strncpy is a function that copies a portion of a string to another buffer.

Basic Syntax of strncpy

char *strncpy(char *dest, const char *src, size_t n);
  • dest: destination buffer
  • src: source string
  • n: maximum number of characters to copy ('\0' is not appended when src has n or more characters)

Basic Usage Example

#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];  // Buffer to store the substring

    strncpy(dest, src, 5); // Copy the first 5 characters "Hello"
    dest[5] = '\0';        // Manually add the null terminator

    printf("Substring: %s\n", dest);  // Output "Hello"

    return 0;
}
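
The two manual steps in the example above (limiting the copy length and writing '\0') can be wrapped in one helper. This is a minimal sketch, not part of the standard library: copy_prefix and its dest_size parameter are names assumed for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most n characters of src into dest,
   never writes more than dest_size bytes, and always null-terminates.
   Returns the number of characters actually copied. */
size_t copy_prefix(char *dest, size_t dest_size, const char *src, size_t n) {
    if (dest_size == 0) {
        return 0; /* No room even for '\0' */
    }
    size_t src_len = strlen(src);
    if (n > src_len) {
        n = src_len; /* Do not read past the end of src */
    }
    if (n > dest_size - 1) {
        n = dest_size - 1; /* Leave room for '\0' */
    }
    memcpy(dest, src, n);
    dest[n] = '\0';
    return n;
}
```

With char dest[6];, calling copy_prefix(dest, sizeof(dest), "Hello, World!", 5) stores "Hello". If the buffer is too small, the result is truncated rather than overflowing, and the return value tells the caller how much was copied.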

Cautions for strncpy

  1. The null character '\0' needs to be added manually: when strncpy copies n characters from a longer string, '\0' is not automatically appended, so write dest[n] = '\0'; yourself.
  2. Beware of buffer overflow: dest must have room for at least n + 1 bytes.

3.2 Safe String Copy Using strncpy_s

strncpy_s is a version of strncpy with enhanced safety that can prevent buffer overflows.

Basic Syntax of strncpy_s

errno_t strncpy_s(char *dest, rsize_t destsz, const char *src, rsize_t n);
  • dest: destination buffer
  • destsz: size of dest in bytes
  • src: source string
  • n: maximum number of characters to copy

Example

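#define __STDC_WANT_LIB_EXT1__ 1 // Request the optional C11 Annex K functions before including <string.h>
#include <stdio.h>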
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];

    if (strncpy_s(dest, sizeof(dest), src, 5) == 0) {
        dest[5] = '\0';  // Add null terminator just in case
        printf("Substring: %s\n", dest);
    } else {
        printf("Copy error\n");
    }

    return 0;
}
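
Where strncpy_s is not available, one portable alternative is snprintf with a precision on %s. This is a sketch of a different technique from the one the article uses, and copy_prefix_snprintf is a hypothetical name chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies at most n characters of src into dest using
   snprintf. dest is always null-terminated when destsz > 0.
   Returns 0 if everything fit, -1 if the result was truncated or on error. */
int copy_prefix_snprintf(char *dest, size_t destsz, const char *src, int n) {
    if (n < 0) {
        return -1; /* A negative precision would mean "no limit" */
    }
    /* "%.*s" prints at most n characters of src */
    int needed = snprintf(dest, destsz, "%.*s", n, src);
    if (needed < 0 || (size_t)needed >= destsz) {
        return -1; /* Output error, or dest was too small */
    }
    return 0;
}
```

Like strncpy_s, this reports a buffer that is too small through its return value, but on truncation it leaves the shortened, null-terminated text in dest.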

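Benefits of strncpy_s

  • You specify the buffer size (destsz), so the copy is bounds-checked.
  • If the requested n characters do not fit in destsz, the call is treated as an error instead of writing past the end of dest.

However, strncpy_s was added in the C11 standard as an optional feature (Annex K), so you need to be aware that it may not be available in some environments.

3.3 Extracting Up to a Specific Character Using strchr

strchr allows you to find the position of a specific character and retrieve the string up to that point.

Basic Syntax of strchr

char *strchr(const char *str, int c);
  • str: string to search
  • c: character to search for (passed as an int and converted to char)

Example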

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello, World!";
    char *pos = strchr(str, ','); // Find the position of ','

    if (pos != NULL) {
        int length = pos - str; // Calculate the number of characters up to ','
        char result[20];

        strncpy(result, str, length);
        result[length] = '\0'; // Add the null terminator

        printf("Substring: %s\n", result);  // Output "Hello"
    }

    return 0;
}

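Key Points

  • strchr returns a pointer to the first occurrence of c, or NULL if c is not found.
  • pos - str gives the number of characters before the match, which is then passed to strncpy.

3.4 Keyword Search and Extraction Using strstr

strstr is useful for searching for a substring and retrieving the string that follows it.

Basic Syntax of strstr

char *strstr(const char *haystack, const char *needle);
  • haystack: string to search
  • needle: substring to search for

Example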

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello, World!";
    char *pos = strstr(str, "World"); // Search for the position of "World"

    if (pos != NULL) {
        printf("Found substring: %s\n", pos);
    } else {
        printf("Substring not found.\n");
    }

    return 0;
}

Key Points

  • strstr returns a pointer to the first occurrence of needle.
  • It returns NULL if needle is not found in haystack.

3.5 Summary

  1. strncpy copies a substring of limited length, but you need to add the null character manually.
  2. strncpy_s lets you specify destsz, improving safety.
  3. strchr lets you obtain the substring up to a specific character.
  4. strstr lets you find the position of a specific keyword and extract the text from there.

By leveraging the standard library, you can implement string handling in C in a simple and safe manner.

4. How to Extract Substrings in C Language [Custom Function Edition]

While you can perform basic substring extraction using the standard library, sometimes a more flexible approach is required. In this section, we will explain substring extraction using custom functions.

4.1 Benefits of Creating Custom Functions

Using the standard library allows copying and searching substrings, but there are issues such as the following.

  • strncpy does not automatically add the null character '\0'
  • strchr and strstr only locate a position; copying the substring is a separate step
  • Extracting from an arbitrary position takes several manual steps

Therefore, creating a custom function that can be tailored to specific needs is effective.

4.2 Basic Substring Extraction Function

First, we create a basic function that extracts a string from a specified position.

Function Specification

  • Arguments
      • const char *source: source string
      • int start: start position
      • int length: number of characters to extract
      • char *dest: buffer that receives the extracted string
  • Processing
      • Copies length characters starting at start into dest
      • Appends '\0' at the end

Implementation Code

#include <stdio.h>
#include <string.h>

void substring(const char *source, int start, int length, char *dest) {
    int i;
    for (i = 0; i < length && source[start + i] != '\0'; i++) {
        dest[i] = source[start + i];
    }
    dest[i] = '\0'; // Add null terminator
}

int main() {
    char text[] = "Hello, World!";
    char result[10];

    substring(text, 7, 5, result); // Extract "World"
    printf("Substring: %s\n", result);

    return 0;
}
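
The substring function above trusts its caller: it reads out of bounds if start lies beyond the end of source, and it does not know how large dest is. The following is a hedged variation, assuming a hypothetical substring_checked with an extra dest_size parameter, that validates both before copying.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of substring: returns 0 on success, or -1 if start is
   past the end of source or the result does not fit in dest_size bytes. */
int substring_checked(const char *source, size_t start, size_t length,
                      char *dest, size_t dest_size) {
    size_t src_len = strlen(source);
    if (dest_size == 0 || start > src_len) {
        return -1; /* Nothing can be stored, or start is out of range */
    }
    size_t available = src_len - start;
    if (length > available) {
        length = available; /* Clamp to the end of source */
    }
    if (length >= dest_size) {
        return -1; /* No room for the characters plus '\0' */
    }
    memcpy(dest, source + start, length);
    dest[length] = '\0';
    return 0;
}
```

Called as substring_checked("Hello, World!", 7, 5, result, sizeof(result)) with char result[10];, it stores "World" and returns 0. Returning an error instead of truncating is a design choice made for this sketch.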

Key Points

  • In substring, the for loop copies at most length characters.
  • Copying stops early if the '\0' at the end of source is reached.
  • dest[i] = '\0'; always places the null character at the end.

4.3 Dynamic Substring Acquisition Using malloc

In the substring function above, you need to allocate the size of dest in advance. However, if you can allocate the required size dynamically, the function becomes more versatile.

Function Specification

  • Allocate the required memory with malloc.
  • Copy length characters starting at start.
  • The caller must free the returned memory.

Implementation Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring_dynamic(const char *source, int start, int length) {
    char *dest = (char *)malloc(length + 1); // +1 for the null terminator
    if (dest == NULL) {
        return NULL; // Memory allocation failed
    }

    int i;
    for (i = 0; i < length && source[start + i] != '\0'; i++) {
        dest[i] = source[start + i];
    }
    dest[i] = '\0';

    return dest;
}

int main() {
    char text[] = "Hello, World!";
    char *result = substring_dynamic(text, 7, 5);

    if (result != NULL) {
        printf("Substring: %s\n", result);
        free(result); // Free allocated memory
    } else {
        printf("Memory allocation failed.\n");
    }

    return 0;
}

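Key Points

  • substring_dynamic obtains a buffer of length + 1 bytes by dynamically allocating memory with malloc.
  • After use, you need to free the memory with free.

4.4 Multibyte Character (Japanese) Support

When handling Japanese (UTF-8 or other multibyte characters), a character is not necessarily 1 byte, so a simple substring function will not work correctly.

Implementation Considering Multibyte Characters

  • Convert the multibyte string to a wide-character string (wchar_t) with mbstowcs.
  • Extract the substring with wcsncpy.
  • Convert the result back to a multibyte string with wcstombs.

Implementation Code (UTF-8 Compatible)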

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

void substring_utf8(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, ""); // Set the locale

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert UTF-8 string to wide-character string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Extract substring in wide characters
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert back to multibyte string
}

int main() {
    char text[] = "こんにちは、世界!"; // UTF-8 string
    char result[20];

    substring_utf8(text, 6, 2, result); // Extract "世界" (characters 6 and 7, counting from 0)
    printf("Substring: %s\n", result);

    return 0;
}

Key Points

  • setlocale(LC_ALL, ""); applies the environment's locale, which tells the conversion functions which encoding to expect.
  • mbstowcs converts the multibyte string to wide characters.
  • wcsncpy extracts the substring, and wcstombs converts it back to a multibyte string.

4.5 Summary

  1. If you write a substring function yourself, you can flexibly obtain substrings.
  2. Using dynamic memory allocation (malloc), you can obtain variable-sized substrings.
  3. When handling multibyte characters (Japanese), use mbstowcs / wcstombs.

When the standard library functions such as strncpy or strchr are insufficient, creating custom functions can make C string handling more powerful.

5. Methods for extracting substrings by character encoding

In C, if you don’t pay attention to differences in character encoding, substring extraction may not work correctly. Especially when handling multibyte characters such as Japanese (UTF-8, Shift_JIS, EUC-JP, etc.), one character is not equal to one byte, so a simple strncpy or substring function cannot handle them properly.

In this section, we will explain in detail methods for extracting substrings by character encoding.

5.1 When dealing with ASCII (1-byte characters)

Basic substring retrieval

ASCII characters are 1 character = 1 byte, so they can be easily processed with strncpy or substring functions.

Implementation example

#include <stdio.h>
#include <string.h>

void substring_ascii(const char *source, int start, int length, char *dest) {
    strncpy(dest, source + start, length);
    dest[length] = '\0'; // Add null terminator
}

int main() {
    char text[] = "Hello, World!";
    char result[6];

    substring_ascii(text, 7, 5, result); // Extract "World"
    printf("Substring: %s\n", result);

    return 0;
}

Key points

  • For ASCII text (alphanumeric characters and symbols only), strncpy is sufficient.
  • '\0' (null character) must always be added.

5.2 When dealing with UTF-8 (multibyte characters)

Characteristics of UTF-8

In UTF-8, the number of bytes per character varies from 1 to 4, so using strncpy directly may cut a character in the middle.

Correct processing method

In C, to safely handle UTF-8, it is recommended to convert it to a wide string (wchar_t) using mbstowcs and then obtain substrings.

Substring retrieval supporting UTF-8

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_utf8(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, ""); // Set the locale

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert multibyte string to wide-character string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Get the substring
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte
}

int main() {
    char text[] = "こんにちは、世界!"; // UTF-8 string
    char result[20];

    substring_utf8(text, 6, 2, result); // Extract "世界" (characters 6 and 7, counting from 0)
    printf("Substring: %s\n", result);

    return 0;
}
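
The wide-character approach depends on the process locale being UTF-8. As a complement, here is a sketch of a different technique that does not use the locale at all: it walks the bytes and treats every byte of the form 10xxxxxx as a continuation of the previous character. The names utf8_advance and utf8_substring are hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the start of character number `count` after s,
   or to the terminating '\0' if the string is shorter. */
static const char *utf8_advance(const char *s, size_t count) {
    while (*s != '\0' && count > 0) {
        s++; /* Skip the lead byte */
        while (((unsigned char)*s & 0xC0) == 0x80) {
            s++; /* Skip continuation bytes (10xxxxxx) */
        }
        count--;
    }
    return s;
}

/* Copies `length` characters starting at character index `start`.
   Returns 0 on success, -1 if the bytes do not fit in dest_size. */
int utf8_substring(const char *source, size_t start, size_t length,
                   char *dest, size_t dest_size) {
    const char *begin = utf8_advance(source, start);
    const char *end = utf8_advance(begin, length);
    size_t bytes = (size_t)(end - begin);
    if (bytes >= dest_size) {
        return -1; /* No room for the bytes plus '\0' */
    }
    memcpy(dest, begin, bytes);
    dest[bytes] = '\0';
    return 0;
}
```

For the UTF-8 string "こんにちは、世界!", utf8_substring(text, 6, 2, result, sizeof(result)) copies the six bytes of "世界". This counts code points, so combining character sequences would still need separate handling.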

Key points

  • In substring_utf8, setlocale(LC_ALL, ""); applies the environment's locale (it must be a UTF-8 locale for this example).
  • mbstowcs converts the string to wchar_t, and wcsncpy extracts the substring.
  • wcstombs converts the result back to a multibyte string.

5.3 When dealing with Shift_JIS (multibyte characters)

Characteristics of Shift_JIS

In Shift_JIS, a character can be 1 byte or 2 bytes, so using simple strncpy can cause garbled output.

Substring retrieval supporting Shift_JIS

For Shift_JIS, the method of converting to a wide string and processing it is also recommended.

Implementation for Shift_JIS

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_sjis(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, "Japanese"); // Set locale to handle Shift_JIS

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert multibyte string (Shift_JIS) to wide-character string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Extract substring
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte (Shift_JIS)
}

int main() {
    char text[] = "こんにちは、世界!"; // Shift_JIS string (depending on environment)
    char result[20];

    substring_sjis(text, 6, 2, result); // Extract "世界" (characters 6 and 7, counting from 0)
    printf("Substring: %s\n", result);

    return 0;
}

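Key points

  • To correctly process Shift_JIS, select a Shift_JIS locale with setlocale(LC_ALL, "Japanese"); (locale names depend on the platform).
  • Use mbstowcs / wcstombs to convert to and from wide characters.

5.4 When dealing with EUC-JP (multibyte characters)

Characteristics of EUC-JP

EUC-JP, like Shift_JIS, has variable byte lengths per character, so conversion using wide characters is required.

Substring retrieval supporting EUC-JP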

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_eucjp(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, "ja_JP.eucJP"); // Set locale to handle EUC-JP

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert multibyte string (EUC-JP) to wide-character string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Extract substring
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte (EUC-JP)
}

int main() {
    char text[] = "こんにちは、世界!"; // EUC-JP string (depending on environment)
    char result[20];

    substring_eucjp(text, 6, 2, result); // Extract "世界" (characters 6 and 7, counting from 0)
    printf("Substring: %s\n", result);

    return 0;
}

Key points

  • setlocale(LC_ALL, "ja_JP.eucJP"); selects an EUC-JP locale.
  • Using mbstowcs / wcstombs, multibyte characters are processed correctly.

5.5 Summary

Character encoding | Bytes per character | Recommended processing method
ASCII | 1 byte | strncpy
UTF-8 | 1-4 bytes | mbstowcs / wcstombs
Shift_JIS | 1 or 2 bytes | mbstowcs / wcstombs
EUC-JP | 1-3 bytes | mbstowcs / wcstombs

  • If only ASCII text, strncpy is OK
  • In the case of UTF-8, Shift_JIS, EUC-JP, use mbstowcs / wcstombs
  • Set setlocale(LC_ALL, "..."); appropriately according to the environment

6. How to Split Strings in C Language

String-splitting operations are needed in many situations, such as CSV data parsing, command-line argument processing, and log data analysis. In C, you can use library functions like strtok (standard C) and strtok_r (POSIX), or create your own functions.

In this section, we will discuss in detail how to split strings using a specific delimiter.

6.1 String Splitting Using strtok

strtok is a function that splits a string using the specified delimiter.

Basic Syntax

char *strtok(char *str, const char *delim);
  • str: string to split (pass NULL on later calls to continue with the same string)
  • delim: set of delimiter characters
  • Return value: pointer to the next token, or NULL when no tokens remain
  • Point to note: strtok overwrites the delimiters in the original string with '\0'

Example: Split a string with a comma ,

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "apple,banana,orange,grape"; // String to be split
    char *token = strtok(str, ",");            // Get the first token

    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok(NULL, ",");             // Get the next token
    }

    return 0;
}

Output

Token: apple
Token: banana
Token: orange
Token: grape

strtok Notes

  1. It changes the original string
      • strtok replaces each delimiter with '\0'.
  2. It is not thread-safe
      • strtok keeps its position in a global static variable internally.

6.2 Thread-Safe String Splitting Using strtok_r

strtok_r is the thread-safe version of strtok (defined by POSIX), which stores its state in saveptr, making it safe to use in multithreaded environments.

Basic Syntax

char *strtok_r(char *str, const char *delim, char **saveptr);
  • str: string to split (NULL on later calls)
  • delim: set of delimiter characters
  • saveptr: pointer that strtok_r uses to remember its position between calls

Example: Split a string with a space

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello World from C"; // String to be split
    char *token;
    char *saveptr; // Pointer to store internal state

    token = strtok_r(str, " ", &saveptr); // Get the first token
    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok_r(NULL, " ", &saveptr); // Get the next token
    }

    return 0;
}
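
One behavior that the examples so far do not exercise: both strtok and strtok_r treat a run of consecutive delimiters as a single separator, so empty fields such as the middle one in "a,,b" are silently skipped. The sketch below makes this visible by counting tokens two ways. Both function names are hypothetical, and count_fields assumes delim is not '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Counts tokens the way strtok sees them. Modifies s. */
size_t count_strtok_tokens(char *s, const char *delims) {
    size_t count = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims)) {
        count++;
    }
    return count;
}

/* Counts fields separated by a single delimiter character, keeping empty
   fields. Does not modify s. */
size_t count_fields(const char *s, char delim) {
    size_t count = 1; /* Even an empty string has one (empty) field */
    for (const char *p = strchr(s, delim); p != NULL; p = strchr(p + 1, delim)) {
        count++;
    }
    return count;
}
```

For the input "a,,b", count_strtok_tokens reports 2 while count_fields reports 3. When empty fields carry meaning, as they often do in CSV data, a strchr-based loop like the second one is a safer starting point.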

Advantages of strtok_r

  • Thread-safe
  • Can process multiple strings concurrently

6.3 Splitting Strings with a Custom Function (Method that does not use strtok)

strtok modifies the original string, so it is also possible to create a custom function that splits a string without altering it.

Specification of the Custom Function

  • Input
      • const char *source: source string
      • const char delim: delimiter character
      • char tokens[][50]: array that receives the tokens
  • Processing
      • Scans source one character at a time
      • Splits at delim and stores each piece in tokens

Implementation Code

#include <stdio.h>
#include <string.h>

void split_string(const char *source, char delim, char tokens[][50], int *count) {
    int i = 0, j = 0, token_index = 0;

    while (source[i] != '\0') {
        if (source[i] == delim) {
            tokens[token_index][j] = '\0';
            token_index++;
            j = 0;
        } else {
            tokens[token_index][j] = source[i];
            j++;
        }
        i++;
    }
    tokens[token_index][j] = '\0';
    *count = token_index + 1;
}

int main() {
    char text[] = "dog,cat,bird,fish";
    char tokens[10][50]; // Can store up to 10 words
    int count;

    split_string(text, ',', tokens, &count);

    for (int i = 0; i < count; i++) {
        printf("Token: %s\n", tokens[i]);
    }

    return 0;
}
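
Output

Token: dog
Token: cat
Token: bird
Token: fish

Key Points

  • source is only read, so the original string is not modified.
  • Each token is copied into tokens, so it remains valid independently of source.

6.4 Applications of String Splitting (Processing CSV Data)

CSV (comma-separated) data can be parsed with the strtok family of functions.

Example of CSV Data Parsing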

#include <stdio.h>
#include <string.h>

int main() {
    char csv[] = "Alice,24,Female\nBob,30,Male\nCharlie,28,Male"; // CSV data
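    char *line_state; // Position of the outer (line) tokenizer
    char *line = strtok_r(csv, "\n", &line_state); // Process line by line

    while (line != NULL) {
        // strtok keeps only one position, so nesting it would end the outer loop
        // after the first line; strtok_r (POSIX) keeps a separate state per loop.
        char *field_state;
        char *name = strtok_r(line, ",", &field_state);
        char *age = strtok_r(NULL, ",", &field_state);
        char *gender = strtok_r(NULL, ",", &field_state);

        printf("Name: %s, Age: %s, Gender: %s\n", name, age, gender);

        line = strtok_r(NULL, "\n", &line_state);
    }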

    return 0;
}

Output

Name: Alice, Age: 24, Gender: Female
Name: Bob, Age: 30, Gender: Male
Name: Charlie, Age: 28, Gender: Male

6.5 Summary

Method | Advantage | Disadvantage
strtok | Splits strings easily | Changes the original string
strtok_r | Thread-safe | Usage is a bit more complicated
Custom function | Does not change the original string | The code becomes longer
CSV parsing | Convenient for data processing | Be careful of the limitations of strtok

Conclusion

  • For a simple split, use strtok
  • If you need multithreading, use strtok_r
  • If you don’t want to change the original, use a custom function
  • All of these can also be applied to CSV data analysis

In the next section, we will discuss in detail the application example: “How to Extract Text Before and After a Specific Character”.

7. Application Example: How to Extract the Characters Before and After a Specific Character

In string processing, you often need to extract the text before or after a specific character or keyword. For example, the following cases can be considered.

  • Get only the domain part from a URL
  • Extract the file name from a file path
  • Retrieve the string before and after specific tags or symbols

In C, you can achieve such processing by using strchr and strstr. When more flexible handling is required, creating custom functions is also effective.

7.1 Using strchr to Retrieve the String Before a Specific Character

Using strchr, you can identify the position of a specific character (the first one found). Its counterpart strrchr, used in the example below, finds the last occurrence instead.

Basic Syntax

char *strchr(const char *str, int c);
  • str: string to search
  • c: character to search for (passed as an int and converted to char)

strchr returns the address of the character if it finds c, and NULL otherwise.

Example: Retrieve File Name from File Path

#include <stdio.h>
#include <string.h>

void get_filename(const char *path, char *filename) {
    char *pos = strrchr(path, '/'); // Search for the last '/'

    if (pos != NULL) {
        strcpy(filename, pos + 1); // Copy from the character after '/'
    } else {
        strcpy(filename, path); // If no '/', copy the whole path
    }
}

int main() {
    char path[] = "/home/user/documents/report.txt";
    char filename[50];

    get_filename(path, filename);
    printf("Filename: %s\n", filename);

    return 0;
}

Result

Filename: report.txt

Key Points

  • strrchr finds the position of the last occurrence of a specific character (/).
  • Copying from pos + 1 yields only the file name.

7.2 Using strstr to Retrieve the String After a Specific Keyword

Using strstr, you can search for a specific string (keyword) and retrieve the substring that follows its position.

Basic Syntax

char *strstr(const char *haystack, const char *needle);
  • haystack: string to search
  • needle: string to search for

strstr returns the address of the position if it finds needle, and NULL otherwise.

Example: Retrieve Domain from URL

#include <stdio.h>
#include <string.h>

void get_domain(const char *url, char *domain) {
    char *pos = strstr(url, "://"); // Search for the position of "://"

    if (pos != NULL) {
        strcpy(domain, pos + 3); // Copy from the character after "://"
    } else {
        strcpy(domain, url); // If "://" is not found, copy the entire string
    }
}

int main() {
    char url[] = "https://www.example.com/page.html";
    char domain[50];

    get_domain(url, domain);
    printf("Domain part: %s\n", domain);

    return 0;
}

Result

Domain part: www.example.com/page.html

Key Points

  • strstr searches for "://", so the same code handles both "https://" and "http://".
  • pos + 3 skips the three characters of "://" and points at the text that follows. Note that the result still includes the path (/page.html).

7.3 Using strchr to Split the Sections Before and After a Specific Character

By leveraging strchr, you can split and retrieve the strings before and after a specific character.

Example: Separate Username and Domain from an Email Address

#include <stdio.h>
#include <string.h>

void split_email(const char *email, char *username, char *domain) {
    char *pos = strchr(email, '@'); // Search for the position of '@'

    if (pos != NULL) {
        strncpy(username, email, pos - email); // Copy the part before '@'
        username[pos - email] = '\0';          // Add null terminator
        strcpy(domain, pos + 1);               // Copy the part after '@'
    }
}

int main() {
    char email[] = "user@example.com";
    char username[50], domain[50];

    split_email(email, username, domain);
    printf("Username: %s\n", username);
    printf("Domain: %s\n", domain);

    return 0;
}

Result

Username: user
Domain: example.com

Key Points

  • strchr finds the position of '@'.
  • strncpy copies the part before '@', and a null character is added after it.
  • strcpy copies the part after '@'.

7.4 Advanced: Extract Specific Attributes Within HTML Tags

When extracting a specific attribute from an HTML tag, you can also leverage strstr.

Example: Retrieve the URL from <a href="URL">

#include <stdio.h>
#include <string.h>

void get_href(const char *html, char *url) {
    url[0] = '\0'; // Leave an empty string in url if no href is found
    char *start = strstr(html, "href=\""); // Search for the position of href="
    if (start != NULL) {
        start += 6; // Move past href="
        char *end = strchr(start, '"'); // Search for the next "
        if (end != NULL) {
            strncpy(url, start, end - start);
            url[end - start] = '\0'; // Add null terminator
        }
    }
}

int main() {
    char html[] = "<a href=\"https://example.com\">Click Here</a>";
    char url[100];

    get_href(html, url);
    printf("Extracted URL: %s\n", url);

    return 0;
}

Result

Extracted URL: https://example.com

Key Points

  • strstr finds the position of href=".
  • strchr finds the closing " that ends the attribute value.

7.5 Summary

Processing content | Function to use | Merits
Get the part before a specific character | strchr / strrchr | Simple and fast
Get the part after a specific keyword | strstr | Keyword search is possible
Split before and after a specific character | strchr + strncpy | Convenient for splitting usernames and domains, etc.
Retrieve attributes of HTML tags | strstr + strchr | Applicable to web scraping

Conclusion

  • Using strchr or strstr, you can easily obtain the text before and after a specific character or keyword
  • This is useful in many situations, such as file path handling, URL parsing, and email address splitting
  • It is applicable even to more advanced processing such as web scraping

8. Summary

In this article, we explained methods for extracting substrings in C in detail, from basics to advanced topics. Here, we review the key points of each section and organize the optimal methods by use case.

8.1 Article Review

Section | Content | Key Points
Basics of C strings | Strings are treated as char arrays, and the terminating character is important | When handling strings, always account for '\0'
Extraction with the standard library | strncpy, strchr | strncpy needs a null terminator added manually
Extraction using a custom function | Create a flexible function | With malloc, variable-length substring retrieval is possible
Processing per character encoding | How to support UTF-8, Shift_JIS, EUC-JP | Converting to wide characters with mbstowcs / wcstombs is safe
String splitting | strtok, strtok_r | Be careful that strtok changes the original string
Extracting text before and after a specific character | strchr, strstr | Obtaining file names, URL parsing, HTML parsing

8.2 Optimal Methods by Use Case

1. Extracting Substrings

Use case | Optimal method
Obtain a string of a fixed length | strncpy or substring()
Copy with bounds checking | strncpy_s
Handle multibyte characters (UTF-8, Shift_JIS, EUC-JP) | mbstowcs / wcstombs

2. Splitting Strings

Use case | Optimal method
Simply split a string | strtok
Split in a thread-safe way | strtok_r
Split without changing the original string | Custom function (split_string())

3. Getting Text Before and After a Specific Character

Use case | Optimal method
Get the file name from a file path | strrchr(path, '/')
Get the domain part from a URL | strstr(url, "://")
Separate the username and domain of an email address | strchr(email, '@')
Get attribute values from HTML tags | strstr(tag, "href=\"") + strchr(tag, '"')

8.3 C Language String Handling Considerations

1. Rigorously Manage Null Terminator '\0'

In C string handling, proper management of the terminating character '\0' is most important. Especially when copying with strncpy (for example, after locating a position with strchr), be careful to add the null character manually.

Example of Safe String Copy

#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];

    strncpy(dest, src, 5);
    dest[5] = '\0'; // Add null terminator for safety

    printf("Substring: %s\n", dest);

    return 0;
}

2. Watch Out for Buffer Overflows

When manipulating strings in C, you need to implement carefully to avoid accessing memory outside array bounds. Especially when using strncpy, controlling the number of bytes copied is important.

Example of Safe String Copy

#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];

    strncpy(dest, src, sizeof(dest) - 1);
    dest[sizeof(dest) - 1] = '\0'; // Explicitly add null terminator at the last index

    printf("Substring: %s\n", dest);
    return 0;
}

3. Use mbstowcs for Multibyte Character Handling

When dealing with multibyte characters such as UTF-8 or Shift_JIS, simple strncpy or strlen will not work correctly, because they operate on bytes rather than characters.

Therefore, when handling multibyte characters, it is recommended to first convert to a wide string using mbstowcs and then process appropriately.

#include <stdio.h>
#include <stdlib.h> // mbstowcs
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, ""); // Set the locale

    char text[] = "こんにちは、世界!"; // UTF-8
    wchar_t wtext[256];

    mbstowcs(wtext, text, 256); // Convert to wide-character string

    printf("Converted wide-character string: %ls\n", wtext);

    return 0;
}
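
The gap between bytes and characters can also be measured without any conversion. As a minimal sketch, assuming valid UTF-8 input and a hypothetical helper named utf8_count, counting every byte that is not a continuation byte (10xxxxxx) gives the number of characters.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters (code points) in a UTF-8
   string by skipping continuation bytes. Assumes valid UTF-8. */
size_t utf8_count(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++; /* Lead byte or ASCII byte: a new character starts here */
        }
    }
    return count;
}
```

For the UTF-8 string "こんにちは、世界!", strlen returns 25 (eight 3-byte characters plus one ASCII character), while utf8_count returns 9.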

4. Managing Buffer Size

In string handling, it is important to calculate the required memory size in advance and prevent buffer overflow. Especially when using malloc to allocate dynamic memory, be sure to know the exact size.

8.4 Towards Further Learning

C string handling is an important skill that improves program safety and readability. Based on the content introduced in this article, learning the following topics will enable more advanced string handling.

Topics to Deepen Your Learning

  1. Regular expressions (regex)
  2. File operations (string processing using fgets, fscanf)
  3. Memory management (dynamic string processing using malloc, realloc)
  4. Data analysis (JSON, XML parsing methods)

8.5 Summary

  1. In C, strings are managed as char arrays, so handling the terminating character '\0' is important
  2. To extract a substring, use strncpy, substring(), or malloc
  3. For splitting strings, use strtok / strtok_r / custom functions
  4. To get the text before and after a specific character, use strchr or strstr
  5. When handling multibyte characters (Japanese), use mbstowcs
  6. Be mindful of secure string handling, and beware of buffer overflows

If you apply the content of this article, you will be able to handle strings in C in practical programs. Once you understand the basic functions, try writing custom functions and more advanced processing to write more efficient code.