C String Extraction: Standard, Custom & Multibyte Techniques


1. Introduction

String manipulation is one of the core skills in C programming. Substring extraction in particular comes up often when processing data or converting formats. This article explains in detail how to extract strings in C: using standard library functions, writing custom functions, handling multibyte characters (Japanese), and splitting strings. It also presents practical examples and error handling.

What You’ll Learn in This Article

By reading this article, you will learn the following.
  • Basic concepts of C strings and the role of the terminating character '\0'
  • Extracting substrings with standard library functions such as strncpy and strchr
  • Implementing custom functions for string manipulation
  • Processing UTF-8 and Shift_JIS strings with multibyte characters (Japanese) in mind
  • Splitting strings with strtok
  • Obtaining the text before and after a specific character, with use cases
Each topic is explained with code examples so that beginners can follow along.

Why Is String Extraction Important in C?

C treats strings as arrays (arrays of char), so unlike higher-level languages (such as Python or JavaScript), you cannot easily obtain substrings. Therefore, selecting the appropriate method is important in situations like the following.

1. Processing Input Data

For example, when analyzing log data or CSV files, you need to extract specific fields.

2. Searching for Specific Keywords

Finding a specific keyword within a string and retrieving the surrounding information is essential for search functionality and data extraction.

3. Improving Program Safety

By using functions like strncpy correctly, you can prevent buffer overflows (writing data beyond the buffer size). This is important for avoiding security risks.

Article Structure

This article covers the topics in the following order.
  1. What Are C Strings? Basic Concepts and the Importance of the Terminator
  2. How to Extract Substrings in C – Standard Library Edition
  3. How to Extract Substrings in C – Custom Function Edition
  4. String Extraction Methods for Different Character Encodings
  5. How to Split Strings in C
  6. Example: Extracting Text Before and After a Specific Character
  7. Conclusion
  8. FAQ
Now, let’s take a detailed look at “What Are C Strings? Basic Concepts and the Importance of the Terminator.”

2. What are C strings? Basic concepts and the importance of the null terminator

2.1 Basic concepts of C strings

Strings are “char arrays”

In C, strings are treated as arrays of characters (char type arrays). For example, the code below is a basic example of defining and displaying a string.
#include <stdio.h>

int main() {
    char str[] = "Hello, World!"; // Define string literal as an array
    printf("%s\n", str); // Output the string
    return 0;
}
In this code, the string "Hello, World!" is stored as a char type array and is output by printf("%s", str);.

Internal structure of strings

The string "Hello" is stored in memory as follows.
Index | 0 | 1 | 2 | 3 | 4 | 5
Character | H | e | l | l | o | \0
In C, a special character that indicates the end of a string (null character '\0') is automatically added at the end of a string literal, so the storage a string needs is “actual character count + 1”.

2.2 Importance of the terminating character (null character '\0')

What is a null character?

A null character ('\0') is a special character that indicates the end of a string. To handle C strings correctly, you need to understand the existence of this null character.
#include <stdio.h>

int main() {
    char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'}; // Explicitly specify the terminating character
    printf("%s\n", str); // Displays correctly
    return 0;
}
In the code above, if '\0' is missing, the end of "Hello" is not recognized, and unintended behavior may occur.

Problems when the null character is missing

As shown below, forgetting the terminating character can cause abnormal memory behavior.
#include <stdio.h>

int main() {
    char str[5] = {'H', 'e', 'l', 'l', 'o'}; // Does not include the null character
    printf("%s\n", str); // May cause unexpected behavior
    return 0;
}
Cause of the error
  • printf("%s", str); continues outputting characters until it finds the null character '\0'.
  • If '\0' is not present, data from other memory locations may be output.
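One way to make this failure mode visible before printing is to check that a buffer really contains a terminator. The sketch below is illustrative only: is_terminated is a hypothetical helper name (not a standard function), and it relies on memchr from <string.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if buf contains a '\0'
   somewhere within its first `size` bytes. */
static bool is_terminated(const char *buf, size_t size) {
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

Called as is_terminated(str, sizeof str), this reports false for the 5-element array above and true for an array initialized from the literal "Hello".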

2.3 Correct ways to define strings

Method 1: Use string literals

The most common way to define a string is to use a string literal.
char str[] = "Hello";
With this method, the C compiler automatically adds the null character '\0', so no special handling is needed.

Method 2: Define the array explicitly

If you manually include '\0' in the definition, write it as follows.
char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
  • It is important to specify a size of character count + 1 and place the null character '\0' at the end.
  • If you forget to put '\0' in str[5], unexpected behavior may occur.

2.4 How to check the size of a string

To obtain the length (number of characters) of a string, use the strlen function.
#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello";
    printf("String length: %zu\n", strlen(str)); // outputs 5 (excluding the null character)
    return 0;
}

Behavior of strlen

  • strlen counts the number of characters until the null character '\0' appears.
  • Unlike sizeof(str), it obtains the “actual string length” rather than the array size.

2.5 Summary

  1. C strings are represented as char arrays.
  2. The terminating character (null character '\0') indicates the end of the string, so it must always be included.
  3. Use strlen to obtain the length of a string.
  4. If you do not define strings properly, unexpected errors may occur.

3. Extracting Substrings in C Using the Standard Library

In C, you can extract substrings by leveraging the standard library. This section explains how to use strncpy, strchr, and other standard library functions to retrieve a portion of a string.

3.1 Obtaining Substrings Using strncpy

strncpy is a function that copies a portion of a string into another buffer.

Basic syntax of strncpy

char *strncpy(char *dest, const char *src, size_t n);
  • dest: Destination buffer
  • src: Source string
  • n: Maximum number of characters to copy (no '\0' is appended if src has n or more characters)

Basic usage example

#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];  // buffer to store the substring

    strncpy(dest, src, 5); // copy the first 5 characters "Hello"
    dest[5] = '\0';  // manually add the null terminator

    printf("Substring: %s\n", dest);  // outputs "Hello"

    return 0;
}
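The pattern above can be wrapped in a small helper that also clamps the copy length to the destination size, so the terminator is never forgotten and the buffer is never overrun. This is a minimal sketch, not part of the standard library: copy_prefix and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n characters of src into dest,
   never writing more than dest_size bytes, and always terminate dest.
   Returns the number of characters actually copied. */
static size_t copy_prefix(char *dest, size_t dest_size, const char *src, size_t n) {
    assert(dest != NULL && src != NULL && dest_size > 0);

    size_t src_len = strlen(src);
    if (n > src_len) n = src_len;             /* do not read past the end of src */
    if (n > dest_size - 1) n = dest_size - 1; /* leave room for '\0' */

    memcpy(dest, src, n);
    dest[n] = '\0';                           /* terminator is always written */
    return n;
}
```

With char dest[6];, the call copy_prefix(dest, sizeof dest, "Hello, World!", 5) stores "Hello". The return value lets the caller detect truncation by comparing it with the requested length.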

Cautions for strncpy

  1. You must add the null character '\0' manually. strncpy copies up to n characters but does not append '\0' when src has n or more characters, so you must explicitly write dest[n] = '\0';.
  2. Be careful of buffer overflows. If n is larger than the size of dest, strncpy may write beyond the buffer.

3.2 Safe string copy using strncpy_s

strncpy_s is a version of strncpy with enhanced safety that can prevent buffer overflows.

Basic syntax of strncpy_s

errno_t strncpy_s(char *dest, rsize_t destsz, const char *src, rsize_t n);
  • dest: Destination buffer
  • destsz: Size of dest
  • src: Source string
  • n: Maximum number of characters to copy

Example

#define __STDC_WANT_LIB_EXT1__ 1 // Request the Annex K functions (must precede the includes)
#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];

    if (strncpy_s(dest, sizeof(dest), src, 5) == 0) {
        dest[5] = '\0';  // strncpy_s already terminates on success; kept as a safeguard
        printf("Substring: %s\n", dest);
    } else {
        printf("Copy error\n");
    }

    return 0;
}

Benefits of strncpy_s

  • Specifying the buffer size (destsz) allows safe copying.
  • If the characters to copy plus the terminator do not fit in destsz, an error is returned instead of overflowing the buffer.
However, note that strncpy_s belongs to the optional Annex K of the C11 standard and is not available in many environments (glibc, for example, does not provide it).

3.3 Extracting up to a specific character using strchr

Using strchr, you can find the position of a specific character and obtain the substring up to that point.

Basic syntax of strchr

char *strchr(const char *str, int c);
  • str: String to search
  • c: Character to find (passed as int and converted to char)

Example

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello, World!";
    char *pos = strchr(str, ','); // find the position of ','

    if (pos != NULL) {
        int length = pos - str; // calculate number of characters up to ','
        char result[20];

        strncpy(result, str, length);
        result[length] = '\0'; // add null character

        printf("Substring: %s\n", result);  // outputs "Hello"
    }

    return 0;
}
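The example above assumes that the text before ',' fits into result[20]. A minimal sketch of the same strchr technique with an explicit size check follows; copy_until is a hypothetical name, and the -1 return convention is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the part of src before the first delim into dest.
   Returns 0 on success, -1 if delim is absent or the prefix does not fit. */
static int copy_until(char *dest, size_t dest_size, const char *src, char delim) {
    assert(dest != NULL && src != NULL && dest_size > 0);

    const char *pos = strchr(src, delim);   /* first occurrence of delim */
    if (pos == NULL) {
        return -1;
    }

    size_t len = (size_t)(pos - src);        /* characters before delim */
    if (len >= dest_size) {
        return -1;                           /* no room for the terminator */
    }

    memcpy(dest, src, len);
    dest[len] = '\0';
    return 0;
}
```

Returning an error rather than silently truncating keeps the caller aware that the destination was too small.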

Key points

  • strchr returns the address of the first occurrence of c, so it can be used to determine the range of a substring.
  • By calculating pos - str for the substring length and copying with strncpy, you can obtain the substring up to the specific character.

3.4 Keyword search and extraction using strstr

strstr is useful for searching for a substring and retrieving the string that follows it.

Basic syntax of strstr

char *strstr(const char *haystack, const char *needle);
  • haystack: String to search
  • needle: Substring to find

Example

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello, World!";
    char *pos = strstr(str, "World"); // search for "World"

    if (pos != NULL) {
        printf("Found substring: %s\n", pos);
    } else {
        printf("Substring not found.\n");
    }

    return 0;
}
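The pointer returned by strstr points at the keyword itself. When only the text after the keyword is wanted, the pointer can be advanced by the keyword length. A small sketch, where text_after is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the text that follows the
   first occurrence of needle in haystack, or NULL if needle is absent.
   The result points into haystack; nothing is copied. */
static const char *text_after(const char *haystack, const char *needle) {
    assert(haystack != NULL && needle != NULL);

    const char *pos = strstr(haystack, needle);
    if (pos == NULL) {
        return NULL;
    }
    return pos + strlen(needle);   /* skip over the keyword itself */
}
```

For instance, text_after("Hello, World!", ", ") points at "World!". Because no copy is made, the result is only valid while the original string is.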

Key points

  • strstr returns a pointer to the first occurrence of needle.
  • If NULL is returned, needle does not exist in haystack.

3.5 Summary

  1. Using strncpy lets you copy a substring, but you need to manually add the null character.
  2. strncpy_s lets you specify destsz, improving safety.
  3. With strchr, you can obtain a substring up to a specific character.
  4. With strstr, you can locate a specific keyword and extract the string that follows.
By leveraging the standard library, you can implement string handling in C in a simple and safe way.

4. How to Extract Substrings in C (Custom Function Edition)

While the standard library can handle basic substring extraction, sometimes a more flexible approach is needed. This section explains how to extract substrings using custom functions.

4.1 Benefits of Creating Custom Functions

Using the standard library allows copying and searching substrings, but there are issues such as:
  • strncpy does not automatically add a null character '\0'
  • strchr and strstr only locate a position; they do not extract the substring themselves
  • More flexible string manipulation is difficult
Therefore, creating custom functions that can be tailored to specific needs is effective.

4.2 Basic Substring Extraction Function

First, we create a basic function that extracts a string from a specified position.

Function Specification

  • Parameters
    • const char *source (original string)
    • int start (starting index)
    • int length (number of characters to extract)
    • char *dest (buffer to store the extracted string)
  • Operation
    • Copy the substring of length length starting from start into dest
    • Automatically add a '\0' character

Implementation Code

#include <stdio.h>
#include <string.h>

void substring(const char *source, int start, int length, char *dest) {
    int i;
    for (i = 0; i < length && source[start + i] != '\0'; i++) {
        dest[i] = source[start + i];
    }
    dest[i] = '\0'; // Add null character
}

int main() {
    char text[] = "Hello, World!";
    char result[10];

    substring(text, 7, 5, result); // Extract "World"
    printf("Substring: %s\n", result);

    return 0;
}
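The function above trusts the caller: it reads past the end of source if start is larger than the string length, and it does not know how big dest is. A hedged sketch of a stricter variant is shown below; substring_checked, the extra dest_size parameter, and the return convention are assumptions for illustration, not part of the article's function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of substring(): validates start against the source
   length and never writes more than dest_size bytes.
   Returns the number of characters copied, or -1 if start is out of range. */
static int substring_checked(const char *source, size_t start, size_t length,
                             char *dest, size_t dest_size) {
    assert(source != NULL && dest != NULL && dest_size > 0);

    size_t src_len = strlen(source);
    if (start > src_len) {
        dest[0] = '\0';                 /* leave dest as a valid empty string */
        return -1;
    }

    size_t avail = src_len - start;     /* characters available from start */
    if (length > avail) length = avail;
    if (length > dest_size - 1) length = dest_size - 1;

    memcpy(dest, source + start, length);
    dest[length] = '\0';
    return (int)length;
}
```

With char result[10];, calling substring_checked("Hello, World!", 7, 5, result, sizeof result) stores "World" and returns 5.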

Key Points

  • Copy the specified length characters using a for loop.
  • If a '\0' is reached, the loop terminates.
  • Add dest[i] = '\0'; to ensure a null terminator at the end.

4.3 Dynamically Obtaining Substrings Using malloc

The above function requires the size of dest to be allocated in advance. However, if the required size can be allocated dynamically, the function becomes more versatile.

Function Specification

  • Allocate the necessary memory with malloc
  • Copy the substring of length characters starting from start into a new buffer
  • The caller must free the memory

Implementation Code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring_dynamic(const char *source, int start, int length) {
    char *dest = (char *)malloc(length + 1); // +1 for the null terminator
    if (dest == NULL) {
        return NULL; // Memory allocation failed
    }

    int i;
    for (i = 0; i < length && source[start + i] != '\0'; i++) {
        dest[i] = source[start + i];
    }
    dest[i] = '\0';

    return dest;
}

int main() {
    char text[] = "Hello, World!";
    char *result = substring_dynamic(text, 7, 5);

    if (result != NULL) {
        printf("Substring: %s\n", result);
        free(result); // Free memory
    } else {
        printf("Failed to allocate memory.\n");
    }

    return 0;
}
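As with the fixed-buffer version, the dynamic function can be made stricter by validating start and allocating only as many bytes as will actually be copied. The following is a sketch under those assumptions; substring_alloc is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: returns a newly allocated copy of up to `length`
   characters starting at `start`. Returns NULL if start is out of range
   or the allocation fails. The caller must free the result. */
static char *substring_alloc(const char *source, size_t start, size_t length) {
    assert(source != NULL);

    size_t src_len = strlen(source);
    if (start > src_len) {
        return NULL;
    }
    if (length > src_len - start) {
        length = src_len - start;       /* clamp to what is available */
    }

    char *dest = malloc(length + 1);    /* +1 for the null terminator */
    if (dest == NULL) {
        return NULL;
    }
    memcpy(dest, source + start, length);
    dest[length] = '\0';
    return dest;
}
```

Clamping before the allocation means a very large length request does not reserve memory that will never be used.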

Key Points

  • Using malloc to dynamically allocate memory means the caller does not have to size a buffer in advance.
  • After use, the memory must be released with free(result);.

4.4 Support for Multibyte Characters (Japanese)

When handling Japanese (UTF‑8 or other multibyte characters), a character is not necessarily one byte, so a simple substring function will not work correctly.

Implementation Considering Multibyte Characters

  • Use mbstowcs to convert a multibyte string to a wide‑character string (wchar_t)
  • Use wcsncpy to obtain the substring
  • Convert back to a multibyte string with wcstombs

Implementation Code (UTF‑8 Support)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

void substring_utf8(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, ""); // Set locale

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert UTF‑8 string to wide‑character string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Extract substring in wide characters
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert back to multibyte string
}

int main() {
    char text[] = "Hello, World!"; // UTF‑8 string
    char result[20];

    substring_utf8(text, 7, 5, result); // Get "World"
    printf("Substring: %s\n", result);

    return 0;
}

Key Points

  • Set the locale with setlocale(LC_ALL, ""); to enable multibyte support.
  • Convert multibyte strings to wide strings using mbstowcs.
  • After obtaining the substring with wcsncpy, convert back to multibyte with wcstombs.

4.5 Summary

  1. By implementing your own substring, you can flexibly obtain substrings.
  2. Using dynamic memory allocation (malloc) allows you to obtain variable‑size substrings.
  3. When handling multibyte characters (Japanese), leverage mbstowcs / wcstombs.
When the standard library functions like strncpy or strchr are insufficient, creating custom functions can make C string handling more powerful.

5. Extracting Substrings by Character Encoding

In C, if you don’t pay attention to differences in character encoding, substring extraction may not work correctly. In particular, when handling multibyte characters such as Japanese (UTF-8, Shift_JIS, EUC-JP, etc.), because one character ≠ one byte, simple strncpy or substring functions cannot process it properly. This section provides a detailed explanation of methods for extracting substrings by character encoding.

5.1 ASCII (1-byte characters)

Basic Substring Retrieval

ASCII characters are 1 character = 1 byte, so they can be easily processed with strncpy or substring functions.

Implementation Example

#include <stdio.h>
#include <string.h>

void substring_ascii(const char *source, int start, int length, char *dest) {
    strncpy(dest, source + start, length);
    dest[length] = '\0'; // Add null terminator
}

int main() {
    char text[] = "Hello, World!";
    char result[6];

    substring_ascii(text, 7, 5, result); // Get "World"
    printf("Substring: %s\n", result);

    return 0;
}
Key Points
  • For ASCII-only text (letters, digits, and basic symbols), strncpy is sufficient
  • Always add '\0' (null character)

5.2 UTF-8 (multibyte characters)

Characteristics of UTF-8

In UTF-8, the byte count per character varies from 1 to 4 bytes, so using strncpy directly can result in cutting a character in the middle.

Correct Processing Method

In C, a common approach to handling UTF-8 safely is to convert it to a wide string (wchar_t) using mbstowcs and then obtain substrings.

UTF-8-Compatible Substring Retrieval

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_utf8(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, ""); // Set locale

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256); // Convert multibyte string to wide string

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length); // Get substring
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256); // Convert wide string back to multibyte
}

int main() {
    char text[] = "Hello, World!"; // UTF-8 string
    char result[20];

    substring_utf8(text, 7, 5, result); // Get "World"
    printf("Substring: %s\n", result);

    return 0;
}
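As a locale-independent alternative (not the method used in this article), a UTF-8 string can also be sliced by walking its bytes: every continuation byte has the bit pattern 10xxxxxx, so characters can be counted by skipping those bytes. The sketch below assumes the input is valid UTF-8; utf8_skip and utf8_substring are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Advance p by n UTF-8 characters, stopping early at the end of the string. */
static const char *utf8_skip(const char *p, size_t n) {
    while (n > 0 && *p != '\0') {
        p++;                                         /* step over the lead byte */
        while (((unsigned char)*p & 0xC0) == 0x80) {
            p++;                                     /* step over continuation bytes */
        }
        n--;
    }
    return p;
}

/* Copy `length` characters starting at character index `start` into dest.
   Returns 0 on success, -1 if the result does not fit in dest_size bytes. */
static int utf8_substring(const char *source, size_t start, size_t length,
                          char *dest, size_t dest_size) {
    assert(source != NULL && dest != NULL && dest_size > 0);

    const char *begin = utf8_skip(source, start);
    const char *end = utf8_skip(begin, length);
    size_t bytes = (size_t)(end - begin);            /* byte length of the slice */

    if (bytes >= dest_size) {
        dest[0] = '\0';
        return -1;
    }
    memcpy(dest, begin, bytes);
    dest[bytes] = '\0';
    return 0;
}
```

This version counts code points, not user-perceived characters, so combining sequences may still be split. It avoids setlocale entirely, which can be convenient when the runtime locale is not UTF-8.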
Key Points
  • Without setlocale(LC_ALL, "");, the program stays in the "C" locale and multibyte characters are not converted correctly. The environment's locale must also use UTF-8.
  • Convert the multibyte string to wchar_t with mbstowcs, then safely process it with wcsncpy.
  • Convert back to a multibyte string with wcstombs.

5.3 Shift_JIS (multibyte characters)

Characteristics of Shift_JIS

In Shift_JIS, a character can be 1 byte or 2 bytes, so using simple strncpy can cause garbled text.

Shift_JIS-Compatible Substring Retrieval

For Shift_JIS as well, the recommended method is to convert to a wide string and process it.

Implementation for Shift_JIS

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_sjis(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, "Japanese"); // Set locale for handling Shift_JIS

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256);

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length);
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256);
}

int main() {
    char text[] = "Hello, World!"; // Shift_JIS string (environment dependent)
    char result[20];

    substring_sjis(text, 5, 3, result);
    printf("Substring: %s\n", result);

    return 0;
}
Key Points
  • To handle Shift_JIS correctly, set setlocale(LC_ALL, "Japanese"); (the locale name is environment dependent).
  • Use mbstowcs and wcstombs to safely process strings.

5.4 EUC-JP (multibyte characters)

Characteristics of EUC-JP

EUC-JP, like Shift_JIS, has varying byte counts per character, so conversion using wide characters is required.

EUC-JP-Compatible Substring Retrieval

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

void substring_eucjp(const char *source, int start, int length, char *dest) {
    setlocale(LC_ALL, "ja_JP.eucJP"); // Set locale for handling EUC-JP

    wchar_t wsource[256];
    mbstowcs(wsource, source, 256);

    wchar_t wresult[256];
    wcsncpy(wresult, wsource + start, length);
    wresult[length] = L'\0';

    wcstombs(dest, wresult, 256);
}

int main() {
    char text[] = "Hello, World!"; // EUC-JP string (environment dependent)
    char result[20];

    substring_eucjp(text, 5, 3, result);
    printf("Substring: %s\n", result);

    return 0;
}
Key Points
  • Set the EUC-JP locale with setlocale(LC_ALL, "ja_JP.eucJP");.
  • Use mbstowcs / wcstombs to correctly handle multibyte characters.

5.5 Summary

Character Encoding | Byte Count | Recommended Processing Method
ASCII | 1 byte | strncpy OK
UTF-8 | 1–4 bytes | Use mbstowcs / wcstombs
Shift_JIS | 1 or 2 bytes | Use mbstowcs / wcstombs
EUC-JP | 1–3 bytes | Use mbstowcs / wcstombs

  • If only ASCII characters, strncpy is fine
  • For UTF-8, Shift_JIS, EUC-JP, use mbstowcs / wcstombs
  • Set setlocale(LC_ALL, "..."); appropriately for the environment

6. How to Split Strings in C

String splitting is needed in many situations such as CSV data parsing, command-line argument processing, and log data analysis. In C, you can use standard library functions like strtok and strtok_r, or create your own functions. This section discusses in detail how to split strings using a specific delimiter.

6.1 String splitting using strtok

strtok is a function that splits a string using the specified delimiter.

Basic syntax

char *strtok(char *str, const char *delim);
  • str: the string to be split (specified on the first call; pass NULL on later calls)
  • delim: delimiter characters (multiple can be specified)
  • Return value: a pointer to the next token, or NULL when no tokens remain
  • Note: strtok modifies the original string (replaces delimiter characters with '\0')

Example: Splitting a string with comma ,

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "apple,banana,orange,grape"; // string to split
    char *token = strtok(str, ","); // get the first token

    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok(NULL, ","); // get the next token
    }

    return 0;
}

Output

Token: apple
Token: banana
Token: orange
Token: grape

strtok considerations

  1. Modifies the original string
    • strtok rewrites delimiter characters as '\0', so the original string is not preserved.
  2. Not thread-safe
    • strtok keeps its position in internal static state, so it is advisable not to use it in multithreaded environments.

6.2 Thread-safe string splitting using strtok_r

strtok_r is a thread-safe version of strtok (defined by POSIX) that stores its state in saveptr, making it safe to use in multithreaded environments.

Basic syntax

char *strtok_r(char *str, const char *delim, char **saveptr);
  • str: the string to be split (specified on the first call)
  • delim: delimiter characters (multiple can be specified)
  • saveptr: pointer that holds internal state (updated on each call)

Example: Splitting a string by spaces

#include <stdio.h>
#include <string.h>

int main() {
    char str[] = "Hello World from C"; // string to split
    char *token;
    char *saveptr; // pointer to store internal state

    token = strtok_r(str, " ", &saveptr); // get the first token
    while (token != NULL) {
        printf("Token: %s\n", token);
        token = strtok_r(NULL, " ", &saveptr); // get the next token
    }

    return 0;
}

strtok_r advantages

  • Thread-safe
  • Can process multiple strings concurrently

6.3 Splitting strings with a custom function (method without using strtok)

strtok modifies the original string, so it is also possible to create a custom function that splits strings without altering them.

Specification of the custom function

  • Input
    • const char *source (original string)
    • const char delim (delimiter character)
    • char tokens[][50] (array to store split strings)
    • int *count (receives the number of tokens)
  • Processing
    • Read source character by character without modifying it
    • Copy the split results into tokens based on delim

Implementation code

#include <stdio.h>
#include <string.h>

void split_string(const char *source, char delim, char tokens[][50], int *count) {
    int i = 0, j = 0, token_index = 0;

    while (source[i] != '\0') {
        if (source[i] == delim) {
            tokens[token_index][j] = '\0'; // Terminate the current token
            token_index++;
            j = 0;
        } else {
            tokens[token_index][j] = source[i];
            j++;
        }
        i++;
    }
    tokens[token_index][j] = '\0'; // Terminate the last token
    *count = token_index + 1;
}

int main() {
    char text[] = "dog,cat,bird,fish";
    char tokens[10][50]; // can store up to 10 words
    int count;

    split_string(text, ',', tokens, &count);

    for (int i = 0; i < count; i++) {
        printf("Token: %s\n", tokens[i]);
    }

    return 0;
}

Output

Token: dog
Token: cat
Token: bird
Token: fish
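split_string above assumes that the input has at most 10 tokens of fewer than 50 characters each. A sketch of the same idea with explicit limits is shown below; split_bounded, TOKEN_SIZE, and the truncation behavior are assumptions chosen for illustration. It uses strcspn to find each field, so empty fields (as in "a,,b") are preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_SIZE 50   /* assumed per-token capacity, matching the example above */

/* Hypothetical variant of split_string(): stores at most max_tokens tokens,
   truncates tokens longer than TOKEN_SIZE - 1 characters, and never
   modifies source. Returns the number of tokens stored. */
static int split_bounded(const char *source, char delim,
                         char tokens[][TOKEN_SIZE], int max_tokens) {
    assert(source != NULL && tokens != NULL && max_tokens > 0);

    char set[2] = { delim, '\0' };      /* strcspn expects a string of delimiters */
    const char *field = source;
    int count = 0;

    while (count < max_tokens) {
        size_t len = strcspn(field, set);   /* length up to next delim or end */
        size_t copy = len < TOKEN_SIZE - 1 ? len : TOKEN_SIZE - 1;

        memcpy(tokens[count], field, copy);
        tokens[count][copy] = '\0';
        count++;

        if (field[len] == '\0') {
            break;                          /* reached the end of source */
        }
        field += len + 1;                   /* skip the delimiter */
    }
    return count;
}
```

Returning the count instead of writing through a pointer is only a stylistic choice here; the original int *count parameter would work just as well.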

Key points

  • source is only read, never modified.
  • The split results are copied into the tokens array, preserving the original string.

6.4 Applications of string splitting (CSV data processing)

CSV (comma-separated) data can be parsed with the strtok family of functions.

Example of CSV data parsing

#include <stdio.h>
#include <string.h>

int main() {
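    char csv[] = "Alice,24,Female\nBob,30,Male\nCharlie,28,Male"; // CSV data
    char *line_state;  // tokenizer state for the line loop
    char *field_state; // tokenizer state for the fields of one line
    char *line = strtok_r(csv, "\n", &line_state); // process line by line

    while (line != NULL) {
        // Nested plain strtok calls would overwrite each other's state,
        // so each level uses strtok_r with its own state pointer
        char *name = strtok_r(line, ",", &field_state);
        char *age = strtok_r(NULL, ",", &field_state);
        char *gender = strtok_r(NULL, ",", &field_state);

        printf("Name: %s, Age: %s, Gender: %s\n", name, age, gender);

        line = strtok_r(NULL, "\n", &line_state);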
    }

    return 0;
}

Output

Name: Alice, Age: 24, Gender: Female
Name: Bob, Age: 30, Gender: Male
Name: Charlie, Age: 28, Gender: Male

6.5 Summary

Method | Advantages | Disadvantages
strtok | Easy to split | Modifies the original string
strtok_r | Thread-safe | Usage is a bit more complex
Custom function | Does not modify the original string | Code becomes longer
CSV parsing | Convenient for data processing | Be aware of strtok's limitations

Conclusion

  • If you need simple splitting, use strtok
  • For multithreading, use strtok_r
  • If you don’t want to modify the original, use a custom function
  • Can also be applied to CSV data parsing
In the next section, we will discuss in detail an application example: extracting text before and after a specific character.

7. Example: How to Extract Text Before and After a Specific Character

When processing strings, you often need to extract the text before and after a specific character or keyword. Typical cases include the following.
  • Retrieve only the domain part from a URL
  • Extract the filename from a file path
  • Get the text before and after a specific tag or symbol
In C, you can achieve this using strchr and strstr. When more flexible processing is needed, creating custom functions is also effective.

7.1 Using strchr to Get the Text Before a Specific Character

Using strchr, you can locate the position of a specific character (the first occurrence).

Basic Syntax

char *strchr(const char *str, int c);
  • str: The string to be searched
  • c: The character to find (passed as int and converted to char)
strchr returns the address of c when it finds it, and NULL otherwise. The related function strrchr returns the last occurrence instead.

Example: Retrieve Filename from a File Path

#include <stdio.h>
#include <string.h>

void get_filename(const char *path, char *filename) {
    char *pos = strrchr(path, '/'); // Search for the last '/'

    if (pos != NULL) {
        strcpy(filename, pos + 1); // Copy from the position after '/'
    } else {
        strcpy(filename, path); // If '/' is not present, copy as is
    }
}

int main() {
    char path[] = "/home/user/documents/report.txt";
    char filename[50];

    get_filename(path, filename);
    printf("Filename: %s\n", filename);

    return 0;
}

Output

Filename: report.txt

Key Points

  • Using strrchr allows you to obtain the position of the last occurrence of a specific character (/).
  • By copying from pos + 1, you can retrieve only the filename.

7.2 Using strstr to Get the Text After a Specific Keyword

Using strstr, you can search for a specific string (keyword) and retrieve the text that follows it.

Basic Syntax

char *strstr(const char *haystack, const char *needle);
  • haystack: The string to be searched
  • needle: The substring to search for
strstr returns the address of the position when it finds needle.

Example: Retrieve Domain from a URL

#include <stdio.h>
#include <string.h>

void get_domain(const char *url, char *domain) {
    char *pos = strstr(url, "://"); // Search for the position of "://"

    if (pos != NULL) {
        strcpy(domain, pos + 3); // Copy from after "://"
    } else {
        strcpy(domain, url); // If "://" is not present, copy as is
    }
}

int main() {
    char url[] = "https://www.example.com/page.html";
    char domain[50];

    get_domain(url, domain);
    printf("Domain part: %s\n", domain);

    return 0;
}

Output

Domain part: www.example.com/page.html
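As the output shows, everything after "://" is returned, including the path. If only the host name is wanted, the copy can stop at the next '/'. The sketch below illustrates one way to do that; get_host is a hypothetical helper, and it deliberately ignores ports, user info, and query strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy only the host part of url into host.
   Returns 0 on success, -1 if the host does not fit in host_size bytes. */
static int get_host(const char *url, char *host, size_t host_size) {
    assert(url != NULL && host != NULL && host_size > 0);

    const char *start = strstr(url, "://");
    start = (start != NULL) ? start + 3 : url;   /* no scheme: use the whole string */

    size_t len = strcspn(start, "/");            /* stop at the first '/' or at the end */
    if (len >= host_size) {
        host[0] = '\0';
        return -1;
    }

    memcpy(host, start, len);
    host[len] = '\0';
    return 0;
}
```

For "https://www.example.com/page.html" this stores "www.example.com". Real URL parsing has many more cases, so a dedicated library is usually a better fit for production code.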

Key Points

  • Use strstr to obtain the portion after "://" in "https://" or "http://".
  • Copy from pos + 3, which is just after "://".

7.3 Using strchr to Split Text Before and After a Specific Character

By using strchr, you can split and obtain the text before and after a specific character.

Example: Separate Username and Domain from an Email Address

#include <stdio.h>
#include <string.h>

void split_email(const char *email, char *username, char *domain) {
    char *pos = strchr(email, '@'); // Search for the position of '@'

    if (pos != NULL) {
        strncpy(username, email, pos - email); // Copy the part before '@'
        username[pos - email] = '\0'; // Add null terminator
        strcpy(domain, pos + 1); // Copy the part after '@'
    } else {
        username[0] = '\0'; // No '@' found: return empty strings
        domain[0] = '\0';
    }
}

int main() {
    char email[] = "user@example.com";
    char username[50], domain[50];

    split_email(email, username, domain);
    printf("Username: %s\n", username);
    printf("Domain: %s\n", domain);

    return 0;
}

Output

Username: user
Domain: example.com

Key Points

  • Use strchr to locate the position of '@'.
  • Copy the part before '@' with strncpy and add a null terminator.
  • Copy the part after '@' using strcpy.

7.4 Advanced: Extract Specific Attributes from HTML Tags

You can also use strstr to retrieve specific attributes from HTML tags.

Example: Retrieve URL from <a href="URL">

#include <stdio.h>
#include <string.h>

void get_href(const char *html, char *url) {
    char *start = strstr(html, "href=\""); // Search for the position of href="
    if (start != NULL) {
        start +=  
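The listing above is cut off in the source. The following is a minimal sketch of how such a function could work, based only on the approach named in this article (strstr to find href=" and strchr to find the closing quote). The name get_href_bounded, the url_size parameter, and the return convention are assumptions, not the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy the value of the first href="..." attribute
   found in html into url. Returns 0 on success, -1 if no complete
   attribute is found or the value does not fit in url_size bytes. */
static int get_href_bounded(const char *html, char *url, size_t url_size) {
    assert(html != NULL && url != NULL && url_size > 0);
    static const char key[] = "href=\"";

    url[0] = '\0';                              /* default to an empty result */

    const char *start = strstr(html, key);
    if (start == NULL) {
        return -1;
    }
    start += sizeof key - 1;                    /* move past href=" */

    const char *end = strchr(start, '"');       /* closing quote of the value */
    if (end == NULL) {
        return -1;
    }

    size_t len = (size_t)(end - start);
    if (len >= url_size) {
        return -1;
    }

    memcpy(url, start, len);
    url[len] = '\0';
    return 0;
}
```

This handles only double-quoted attributes in well-formed input. Anything beyond simple cases is better served by a real HTML parser.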

8. Summary

In this article, we explained methods for extracting substrings in C in detail, from basics to advanced topics. Here, we review the key points of each section and organize the optimal methods by use case.

8.1 Article Review

Section | Content | Key Points
Basics of C strings | In C, strings are treated as char arrays, and the terminating character '\0' is important | When handling strings, don’t forget the null terminator
Extraction using the standard library | Utilize strncpy, strchr, etc. | strncpy requires manually adding the null terminator
Extraction with custom functions | Create a flexible substring function | Using malloc enables retrieving variable-length substrings
Processing per character encoding | How to handle UTF-8, Shift_JIS, EUC-JP | Use mbstowcs / wcstombs to convert to wide characters safely
String splitting methods | strtok, strtok_r, splitting with custom functions | Be careful: strtok modifies the original string
Extracting around specific characters | Data retrieval using strchr, strstr | Can be applied to file name extraction, URL parsing, HTML parsing

8.2 Optimal Methods by Use Case

1. Substring Extraction

Use Case | Optimal Method
Need to obtain a string of a fixed length | strncpy or substring()
Want a safe extraction | strncpy_s (C11 Annex K, where available)
Handling multibyte characters (UTF-8, Shift_JIS, EUC-JP) | mbstowcs / wcstombs

2. String Splitting

Use Case | Optimal Method
Want to split a string simply | strtok
Need thread-safe splitting | strtok_r
Want to split without modifying the original string | Custom function (split_string())

3. Retrieve Around Specific Characters

Use Case | Optimal Method
Get file name from file path | strrchr(path, '/')
Get domain part from URL | strstr(url, "://")
Separate username and domain from email address | strchr(email, '@')
Get attribute value from HTML tag | strstr(tag, "href=\"") + strchr(tag, '"')

8.3 C String Handling Considerations

1. Rigorously Manage Null Terminator '\0'

In C string handling, properly managing the terminating character '\0' is most important. Especially when copying with strncpy, or up to a position found with strchr, be careful to add the null character manually.

Example of Safe String Copy

#include <stdio.h>
#include <string.h>

int main() {
    char src[] = "Hello, World!";
    char dest[6];

    strncpy(dest, src, 5);
    dest[5] = ' '; // Add null terminator for safety

    printf("Substring: %s\n", dest);

    return 0;
}

When manipulating strings in C, you must implement carefully to avoid accessing memory outside array bounds. In particular, when using strncpy, keep the number of bytes copied at or below the destination size minus one, so that room remains for the terminator.

Example of Safe String Copy
#include <stdio.h>
#include <string.h>

int main(void) {
    char src[] = "Hello, World!";
    char dest[6];

    // Copy at most sizeof(dest) - 1 bytes, leaving the last byte for '\0'
    strncpy(dest, src, sizeof(dest) - 1);
    dest[sizeof(dest) - 1] = '\0'; // Explicitly add null character

    printf("Substring: %s\n", dest);
    return 0;
}

3. Use mbstowcs for Multibyte Character Handling

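When handling multibyte characters such as UTF-8 or Shift_JIS, strncpy and strlen operate on bytes rather than characters, so a byte-based cut can split a character in the middle. Therefore, when dealing with multibyte characters, it is recommended to first convert to a wide string with mbstowcs and then process it.
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    setlocale(LC_ALL, ""); // Use the environment's locale (its encoding must match the text)

    char text[] = "こんにちは、世界!"; // UTF-8
    wchar_t wtext[256];

    // Convert to wide string, leaving room for the terminator
    size_t n = mbstowcs(wtext, text, 255);
    if (n == (size_t)-1) {
        fprintf(stderr, "Invalid multibyte sequence\n");
        return 1;
    }
    wtext[n] = L'\0'; // n is at most 255, so this stays in bounds

    printf("Wide string: %ls\n", wtext);

    return 0;
}

4. Managing Buffer Sizes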

In string handling, it is important to calculate the required memory size in advance and prevent buffer overflows. Especially when allocating dynamic memory with malloc, be sure to know the exact size needed, including one extra byte for the null terminator.

8.4 For Further Learning

C string handling is an essential skill for improving program safety and readability. Building on the material presented in this article, studying the following topics will enable more advanced string processing.

Topics to Deepen Your Learning

  1. Regular expressions (regex) (available via external C libraries)
  2. File I/O (string handling with fgets, fscanf)
  3. Memory management (dynamic string handling with malloc, realloc)
  4. Data parsing (JSON, XML parsing techniques)

8.5 Summary

  1. C strings are managed as char arrays, so handling the terminator '\0' is crucial
  2. Use strncpy, substring(), malloc for substring extraction
  3. Leverage strtok / strtok_r / custom functions for string splitting
  4. When retrieving around specific characters, use strchr, strstr
  5. When handling multibyte characters (e.g., Japanese), use mbstowcs
  6. Practice safe string handling and watch out for buffer overflows
By applying the content of this article, you can achieve practical string handling in C. After mastering the basic functions, challenge yourself with custom functions and advanced techniques to write more efficient code!
