- 1. Introduction
- 2. What are strings in C? Basic concepts and the importance of the terminating character
- 3. How to Extract Substrings in C Language [Standard Library Edition]
- 4. How to Extract Substrings in C Language [Custom Function Edition]
- 5. Methods for extracting substrings by character encoding
- 6. How to Split Strings in C Language
- 7. Application Example: How to Extract the Characters Before and After a Specific Character
- 7.1 Using strchr to Retrieve the String Before a Specific Character
- 7.2 Using strstr to Retrieve the String After a Specific Keyword
- 7.3 Using strchr to Split the Sections Before and After a Specific Character
- 7.4 Advanced: Extract Specific Attributes Within HTML Tags
- 7.5 Summary
- 8. Summary
1. Introduction
String manipulation is one of the core skills in C programming. In particular, string slicing (extracting substrings) comes up constantly when processing data or converting between formats.
In this article, we explain in detail how to slice strings in C language, covering standard library functions, custom functions, multibyte characters (Japanese), and techniques for splitting strings. We also walk through practical examples and error handling, so please read through to the end.
What You’ll Learn in This Article
By reading this article, you can acquire the following skills.
- Basic concepts of C language strings and the role of the terminating character
- Substring extraction using standard library functions such as strncpy and strchr
- Substring extraction using custom functions
- Handling multibyte characters (Japanese)
- Splitting strings with strtok
- How to retrieve the characters before and after a specific character
We will explain the concepts with code examples so that even beginners can understand easily.
Why Is String Slicing Important in C Language?
C language treats strings as arrays of type char, so unlike higher-level languages such as Python or JavaScript, it has no built-in syntax for obtaining substrings. Therefore, it is important to choose the appropriate method in situations like the following.
1. Processing Input Data
For example, when analyzing log data or CSV files, you need to extract specific fields.
2. Searching for Specific Keywords
Finding a specific keyword within a string and retrieving the surrounding information is essential for search functionality and data extraction.
3. Improving Program Safety
By using functions such as strncpy appropriately, you can prevent buffer overflows (writing data beyond the buffer size). This is important for avoiding security risks.
Article Structure
In this article, we will proceed with the explanation in the following order.
- What is a string in C? Basic concepts and the importance of the terminating character
- How to extract a substring in C language [Standard Library Edition]
- How to extract a substring in C language [Custom Function Edition]
- Method of extracting strings by character encoding
- How to split strings in C
- Application example: How to extract text before and after a specific character
- Summary
- FAQ
Now, let’s first take a detailed look at “What Are Strings in C Language? Basic Concepts and the Importance of the Null Terminator.”
2. What are strings in C? Basic concepts and the importance of the terminating character
2.1 Basic concepts of strings in C
A string is an array of char
In C, strings are treated as arrays of characters (arrays of type char). For example, the following code is a basic example of defining and displaying a string.
#include <stdio.h>
int main() {
char str[] = "Hello, World!"; // Define a string literal as an array
printf("%s\n", str); // Output the string
return 0;
}
In this code, the string "Hello, World!" is stored as an array of type char and is output by printf("%s\n", str);.
Internal structure of a string
The string "Hello" is stored in memory as follows.
| Index | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| text | H | e | l | l | o | \0 |
In C, a special character that marks the end of a string (the null character '\0') is automatically appended to a string literal, so the array that stores a string needs "actual number of characters + 1" elements.
2.2 Importance of the terminating character (null character '\0')
What is a null character?
The null character ('\0') is a special character that indicates the end of a string. To handle C strings correctly, you need to understand the existence of this null character.
#include <stdio.h>
int main() {
    char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'}; // Explicitly specify the null terminator
    printf("%s\n", str); // Display correctly
    return 0;
}
In the code above, if the '\0' were missing, the end of "Hello" would not be recognized, which can lead to unintended behavior.
Problems when the null character is missing
As shown below, forgetting the terminating character can cause abnormal memory behavior.
#include <stdio.h>
int main() {
    char str[5] = {'H', 'e', 'l', 'l', 'o'}; // Does not include the null terminator
    printf("%s\n", str); // Undefined behavior: printf reads past the end of the array
    return 0;
}
Cause of error
- printf("%s\n", str); keeps outputting characters until it finds the null character '\0'.
- If '\0' is not present, other data in memory may be output.
2.3 Correct ways to define strings
Method 1: Use string literals
The most common way to define a string is to use a string literal.
char str[] = "Hello";
With this method, the C compiler automatically adds the null character '\0', so no special handling is required.
Method 2: Define the array explicitly
If you define it manually including '\0', write it as follows.
char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
- Specify a size of "number of characters + 1"; it is important to put '\0' at the end.
- If you forget to put '\0' into the array, unexpected behavior will occur.
2.4 How to check the size of a string
strlen behavior
To obtain the length of a string (number of characters), use the strlen function.
#include <stdio.h>
#include <string.h>
int main() {
    char str[] = "Hello";
    printf("Length of the string: %zu\n", strlen(str)); // Outputs 5 (does not include the null terminator)
    return 0;
}
- strlen counts the characters up to the point where the null character '\0' appears
- sizeof(str) gives the size of the whole array, including the null character
2.5 Summary
- C language strings are represented by char arrays
- The terminating character (null character '\0') indicates the end of a string, so it must always be included
- To get the length of a string, use strlen
- If you don't define strings in an appropriate way, unexpected errors may occur.
3. How to Extract Substrings in C Language [Standard Library Edition]
In C language, you can extract substrings by leveraging the standard library. In this section, we explain how to obtain parts of a string using standard library functions such as strncpy and strchr.
3.1 Obtaining Substrings Using strncpy
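strncpy is a function that copies a portion of a string to another buffer.
Basic Syntax of strncpy
char *strncpy(char *dest, const char *src, size_t n);
- dest: the destination buffer
- src: the source string
- n: the maximum number of characters to copy ('\0' is not appended when src has n or more characters)
Basic Usage Example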
#include <stdio.h>
#include <string.h>
int main() {
char src[] = "Hello, World!";
char dest[6]; // Buffer to store the substring
strncpy(dest, src, 5); // Copy the first 5 characters "Hello"
dest[5] = '\0'; // Manually add the null terminator
printf("Substring: %s\n", dest); // Output "Hello"
return 0;
}
Cautions for strncpy
- The null character '\0' needs to be added manually: when src has n or more characters, strncpy does not append '\0' automatically, so write dest[n] = '\0'; yourself.
- Beware of buffer overflow: dest must have room for at least n + 1 characters.
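3.2 Safe String Copy Using strncpy_s
strncpy_s is a version of strncpy with enhanced safety that can prevent buffer overflows.
Basic Syntax of strncpy_s
errno_t strncpy_s(char *dest, rsize_t destsz, const char *src, rsize_t n);
- dest: the destination buffer
- destsz: the size of dest
- src: the source string
- n: the maximum number of characters to copy
Example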
#define __STDC_WANT_LIB_EXT1__ 1 // Request the optional Annex K functions such as strncpy_s
#include <stdio.h>
#include <string.h>
int main() {
char src[] = "Hello, World!";
char dest[6];
if (strncpy_s(dest, sizeof(dest), src, 5) == 0) {
dest[5] = '\0'; // Add null terminator just in case
printf("Substring: %s\n", dest);
} else {
printf("Copy error\n");
}
return 0;
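}
Benefits of strncpy_s
- You specify the buffer size (destsz), so the copy can be made safely.
- If the characters to copy do not fit in destsz, the function reports an error instead of overflowing the buffer.
However, strncpy_s belongs to the optional Annex K of the C11 standard, so you need to be aware that it is unavailable in many environments (for example, glibc does not provide it).
3.3 Extracting Up to a Specific Character Using strchr
strchr allows you to find the position of a specific character and retrieve the string up to that point.
Basic Syntax of strchr
char *strchr(const char *str, int c);
- str: the string to search
- c: the character to search for (passed as an int, compared as a char)
Example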
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "Hello, World!";
char *pos = strchr(str, ','); // Find the position of ','
if (pos != NULL) {
int length = pos - str; // Calculate the number of characters up to ','
char result[20];
strncpy(result, str, length);
result[length] = '\0'; // Add the null terminator
printf("Substring: %s\n", result); // Output "Hello"
}
return 0;
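}
Key Points
- strchr returns a pointer to the first occurrence of c, or NULL if it is not found.
- pos - str gives the number of characters before c, which is then copied with strncpy.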
3.4 Keyword Search and Extraction Using strstr
strstr is useful for searching for a substring and retrieving the string that follows it.
Basic Syntax of strstr
char *strstr(const char *haystack, const char *needle);
- haystack: the string to search
- needle: the substring to search for
Example
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "Hello, World!";
char *pos = strstr(str, "World"); // Search for the position of "World"
if (pos != NULL) {
printf("Found substring: %s\n", pos);
} else {
printf("Substring not found.\n");
}
return 0;
}
Key Points
- strstr returns a pointer to the first occurrence of needle, or NULL if needle is not found.
- The returned pointer points into haystack, so printing it outputs everything from needle onward.
3.5 Summary
- strncpy can copy a substring safely, but you need to add the null character manually.
- strncpy_s lets you specify destsz, improving safety.
- strchr lets you obtain the substring up to a specific character.
- strstr lets you get the position of a specific keyword and then cut out the string from there.
By leveraging the standard library, you can implement string handling in C language in a simple and safe manner.
4. How to Extract Substrings in C Language [Custom Function Edition]
While you can perform basic substring extraction using the standard library, sometimes a more flexible approach is required. In this section, we will explain substring extraction using custom functions.
4.1 Benefits of Creating Custom Functions
Using the standard library allows copying and searching substrings, but there are issues such as the following.
- strncpy does not automatically add the null character '\0'
- strchr and strstr only locate a position; they do not extract anything by themselves
- Combining these functions for every extraction makes string manipulation more difficult
Therefore, creating a custom function that can be tailored to specific needs is effective.
4.2 Basic Substring Extraction Function
First, we create a basic function that extracts a string from a specified position.
Function Specification
- Arguments
  - const char *source: the original string
  - int start: the start position
  - int length: the number of characters to extract
  - char *dest: the buffer that receives the extracted string
- Processing content
  - Copies length characters starting at start into dest
  - Appends '\0' at the end
Implementation Code
#include <stdio.h>
#include <string.h>
void substring(const char *source, int start, int length, char *dest) {
int i;
for (i = 0; i < length && source[start + i] != '\0'; i++) {
dest[i] = source[start + i];
}
dest[i] = '\0'; // Add null terminator
}
int main() {
char text[] = "Hello, World!";
char result[10];
substring(text, 7, 5, result); // Extract "World"
printf("Substring: %s\n", result);
return 0;
}
Key Points
- The for loop copies at most length characters and stops early if it reaches '\0'.
- dest[i] = '\0'; must always place the null character at the end.
4.3 Dynamic Substring Acquisition Using malloc
In the above function, you need to allocate the size of dest in advance. However, if you can allocate the required size dynamically, the function becomes more versatile.
Function Specification
- Allocate the required memory with malloc
- Copy length characters starting at start into the allocated buffer
- The caller must free the returned memory
Implementation Code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *substring_dynamic(const char *source, int start, int length) {
char *dest = (char *)malloc(length + 1); // +1 for the null terminator
if (dest == NULL) {
return NULL; // Memory allocation failed
}
int i;
for (i = 0; i < length && source[start + i] != '\0'; i++) {
dest[i] = source[start + i];
}
dest[i] = '\0';
return dest;
}
int main() {
char text[] = "Hello, World!";
char *result = substring_dynamic(text, 7, 5);
if (result != NULL) {
printf("Substring: %s\n", result);
free(result); // Free allocated memory
} else {
printf("Memory allocation failed.\n");
}
return 0;
}
Key Points
- malloc allocates memory dynamically, so the size of the substring can be decided at run time.
- After use, you need to free the memory with free.
4.4 Multibyte Character (Japanese) Support
When handling Japanese (UTF-8 or other multibyte characters), a character is not necessarily 1 byte, so a simple substring function will not work correctly.
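Implementation Considering Multibyte Characters
- Convert the multibyte string to a wide string (wchar_t) with mbstowcs
- Extract the substring with wcsncpy
- Convert the result back to a multibyte string with wcstombs
Implementation Code (UTF-8 Compatible)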
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>
void substring_utf8(const char *source, int start, int length, char *dest) {
setlocale(LC_ALL, ""); // Set the locale
wchar_t wsource[256];
mbstowcs(wsource, source, 256); // Convert UTF-8 string to wide-character string
wchar_t wresult[256];
wcsncpy(wresult, wsource + start, length); // Extract substring in wide characters
wresult[length] = L'\0';
wcstombs(dest, wresult, 256); // Convert back to multibyte string
}
int main() {
char text[] = "こんにちは、世界!"; // UTF-8 string
char result[20];
substring_utf8(text, 6, 2, result); // Extract "世界" (characters 6 and 7; index 5 is "、")
printf("Substring: %s\n", result);
return 0;
}
Key Points
- setlocale(LC_ALL, ""); sets the locale so that the conversion functions interpret the multibyte input correctly.
- mbstowcs converts the multibyte string to a wide string, wcsncpy extracts the substring, and wcstombs converts it back.
4.5 Summary
- If you create your own substring function, you can flexibly obtain substrings.
- Using dynamic memory allocation (malloc), you can obtain variable-sized substrings.
- When handling multibyte characters (Japanese), use mbstowcs / wcstombs.
When the standard library functions such as strncpy or strchr are insufficient, creating custom functions can make C language string handling more powerful.
5. Methods for extracting substrings by character encoding
In C language, if you don't pay attention to differences in character encoding, substring extraction may not work correctly. Especially when handling multibyte characters such as Japanese (UTF-8, Shift_JIS, EUC-JP, etc.), one character is not equal to one byte, so a simple strncpy or substring function cannot handle it properly.
In this section, we will explain in detail methods for extracting substrings by character encoding.
5.1 When dealing with ASCII (1-byte characters)
Basic substring retrieval
ASCII characters are 1 character = 1 byte, so they can be easily processed with strncpy or substring functions.
Implementation example
#include <stdio.h>
#include <string.h>
void substring_ascii(const char *source, int start, int length, char *dest) {
strncpy(dest, source + start, length);
dest[length] = '\0'; // Add null terminator
}
int main() {
char text[] = "Hello, World!";
char result[6];
substring_ascii(text, 7, 5, result); // Extract "World"
printf("Substring: %s\n", result);
return 0;
}
Key points
- In the case of ASCII text (alphanumeric only), strncpy is sufficient.
- '\0' (null character) must always be added.
5.2 When dealing with UTF-8 (multibyte characters)
Characteristics of UTF-8
In UTF-8, the number of bytes per character varies from 1 to 4 bytes, so using strncpy directly may cut a character in the middle.
Correct processing method
In C language, to safely handle UTF-8, it is recommended to convert it to a wide string (wchar_t) using mbstowcs and then obtain substrings.
Substring retrieval supporting UTF-8
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
void substring_utf8(const char *source, int start, int length, char *dest) {
setlocale(LC_ALL, ""); // Set the locale
wchar_t wsource[256];
mbstowcs(wsource, source, 256); // Convert multibyte string to wide-character string
wchar_t wresult[256];
wcsncpy(wresult, wsource + start, length); // Get the substring
wresult[length] = L'\0';
wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte
}
int main() {
char text[] = "こんにちは、世界!"; // UTF-8 string
char result[20];
substring_utf8(text, 6, 2, result); // Extract "世界" (characters 6 and 7; index 5 is "、")
printf("Substring: %s\n", result);
return 0;
}
Key points
- setlocale(LC_ALL, ""); sets the locale from the environment.
- mbstowcs converts the multibyte string to wchar_t, wcsncpy extracts the substring, and wcstombs converts it back to multibyte.
5.3 When dealing with Shift_JIS (multibyte characters)
Characteristics of Shift_JIS
In Shift_JIS, a character can be 1 byte or 2 bytes, so using simple strncpy can cause garbled output.
Substring retrieval supporting Shift_JIS
For Shift_JIS, the method of converting to a wide string and processing it is also recommended.
Implementation for Shift_JIS
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
void substring_sjis(const char *source, int start, int length, char *dest) {
setlocale(LC_ALL, "Japanese"); // Set locale to handle Shift_JIS
wchar_t wsource[256];
mbstowcs(wsource, source, 256); // Convert multibyte string (Shift_JIS) to wide-character string
wchar_t wresult[256];
wcsncpy(wresult, wsource + start, length); // Extract substring
wresult[length] = L'\0';
wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte (Shift_JIS)
}
int main() {
char text[] = "こんにちは、世界!"; // Shift_JIS string (depending on environment)
char result[20];
substring_sjis(text, 6, 2, result); // Extract "世界" (characters 6 and 7; index 5 is "、")
printf("Substring: %s\n", result);
return 0;
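}
Key points
- To correctly process Shift_JIS, set setlocale(LC_ALL, "Japanese"); ("Japanese" is a Windows locale name; other systems use different names).
- Use mbstowcs / wcstombs to convert between multibyte and wide characters.
5.4 When dealing with EUC-JP (multibyte characters)
Characteristics of EUC-JP
EUC-JP, like Shift_JIS, has variable byte lengths per character, so conversion using wide characters is required.
Substring retrieval supporting EUC-JP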
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
void substring_eucjp(const char *source, int start, int length, char *dest) {
setlocale(LC_ALL, "ja_JP.eucJP"); // Set locale to handle EUC-JP
wchar_t wsource[256];
mbstowcs(wsource, source, 256); // Convert multibyte string (EUC-JP) to wide-character string
wchar_t wresult[256];
wcsncpy(wresult, wsource + start, length); // Extract substring
wresult[length] = L'\0';
wcstombs(dest, wresult, 256); // Convert wide-character string back to multibyte (EUC-JP)
}
int main() {
char text[] = "こんにちは、世界!"; // EUC-JP string (depending on environment)
char result[20];
substring_eucjp(text, 6, 2, result); // Extract "世界" (characters 6 and 7; index 5 is "、")
printf("Substring: %s\n", result);
return 0;
}
Key points
- setlocale(LC_ALL, "ja_JP.eucJP"); sets the locale to handle EUC-JP.
- Using mbstowcs / wcstombs, correctly process multibyte characters.
5.5 Summary
| Character encoding | Byte count | Recommended processing method |
|---|---|---|
| ASCII | 1 byte | strncpy |
| UTF-8 | 1-4 bytes | mbstowcs / wcstombs |
| Shift_JIS | 1 or 2 bytes | mbstowcs / wcstombs |
| EUC-JP | 1-3 bytes | mbstowcs / wcstombs |
- If the text is ASCII only, strncpy is OK
- In the case of UTF-8, Shift_JIS, or EUC-JP, use mbstowcs / wcstombs
- Set setlocale(LC_ALL, "..."); appropriately according to the environment
6. How to Split Strings in C Language
String-splitting operations are needed in many situations, such as CSV data parsing, command-line argument processing, and log data analysis. In C language, you can use the standard library function strtok, the POSIX function strtok_r, or create your own functions.
In this section, we will discuss in detail how to split strings using a specific delimiter.
6.1 String Splitting Using strtok
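strtok is a function that splits a string using the specified delimiter.
Basic Syntax
char *strtok(char *str, const char *delim);
- str: the string to split (pass NULL on the second and subsequent calls)
- delim: the delimiter characters
- Return value: a pointer to the next token, or NULL when no tokens remain
- Points to note: strtok overwrites each delimiter in the original string with '\0'
Example: Split a string with a comma ,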
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "apple,banana,orange,grape"; // String to be split
char *token = strtok(str, ","); // Get the first token
while (token != NULL) {
printf("Token: %s\n", token);
token = strtok(NULL, ","); // Get the next token
}
return 0;
}
Output
Token: apple
Token: banana
Token: orange
Token: grape
strtok Notes
- It changes the original string
  - strtok replaces each delimiter with '\0'
- It is not thread-safe
  - strtok uses a global static variable internally
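6.2 Thread-Safe String Splitting Using strtok_r
strtok_r is the thread-safe version of strtok, which stores its state in saveptr, making it safe to use in multithreaded environments. It is a POSIX function rather than part of ISO C.
Basic Syntax
char *strtok_r(char *str, const char *delim, char **saveptr);
- str: the string to split (pass NULL on the second and subsequent calls)
- delim: the delimiter characters
- saveptr: a pointer that stores the internal state between calls
Example: Split a string with a space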
#include <stdio.h>
#include <string.h>
int main() {
char str[] = "Hello World from C"; // String to be split
char *token;
char *saveptr; // Pointer to store internal state
token = strtok_r(str, " ", &saveptr); // Get the first token
while (token != NULL) {
printf("Token: %s\n", token);
token = strtok_r(NULL, " ", &saveptr); // Get the next token
}
return 0;
}
Advantages of strtok_r
- Thread-safe
- Can process multiple strings concurrently
6.3 Splitting Strings with a Custom Function (Method that does not use strtok)
strtok modifies the original string, so it is also possible to create a custom function that splits a string without altering it.
Specification of the Custom Function
- Input
  - const char *source: the original string
  - const char delim: the delimiter character
  - char tokens[][50]: the array that stores the resulting tokens
  - int *count: receives the number of tokens
- Processing
  - Splits source at each delim and stores the pieces in tokens
Implementation Code
#include <stdio.h>
#include <string.h>
void split_string(const char *source, char delim, char tokens[][50], int *count) {
int i = 0, j = 0, token_index = 0;
while (source[i] != '\0') {
if (source[i] == delim) {
tokens[token_index][j] = '\0';
token_index++;
j = 0;
} else {
tokens[token_index][j] = source[i];
j++;
}
i++;
}
tokens[token_index][j] = '\0';
*count = token_index + 1;
}
int main() {
char text[] = "dog,cat,bird,fish";
char tokens[10][50]; // Can store up to 10 words
int count;
split_string(text, ',', tokens, &count);
for (int i = 0; i < count; i++) {
printf("Token: %s\n", tokens[i]);
}
return 0;
}
Output
Token: dog
Token: cat
Token: bird
Token: fish
Key Points
- source is not modified; each token is copied into tokens.
6.4 Applications of String Splitting (Processing CSV Data)
CSV (comma-separated) data can be parsed by splitting it into lines and then into fields. Because strtok keeps only one internal state, nesting a field split inside a line split loses track of the remaining lines, so this example uses strtok_r with a separate state pointer for each level.
Example of CSV Data Parsing
#include <stdio.h>
#include <string.h>
int main() {
    char csv[] = "Alice,24,Female\nBob,30,Male\nCharlie,28,Male"; // CSV data
    char *line_state;  // State of the line-by-line split
    char *field_state; // State of the field-by-field split
    char *line = strtok_r(csv, "\n", &line_state); // Process line by line
    while (line != NULL) {
        char *name = strtok_r(line, ",", &field_state);
        char *age = strtok_r(NULL, ",", &field_state);
        char *gender = strtok_r(NULL, ",", &field_state);
        if (name != NULL && age != NULL && gender != NULL) {
            printf("Name: %s, Age: %s, Gender: %s\n", name, age, gender);
        }
        line = strtok_r(NULL, "\n", &line_state);
    }
    return 0;
}
Output
Name: Alice, Age: 24, Gender: Female
Name: Bob, Age: 30, Gender: Male
Name: Charlie, Age: 28, Gender: Male
6.5 Summary
| Method | Merit | Disadvantage |
|---|---|---|
| strtok | Splits strings easily | Changes the original string |
| strtok_r | Thread-safe | The usage is a bit more complicated |
| Custom function | Does not change the original string | The code becomes longer |
| CSV parsing | Convenient for data processing | Nested splitting runs into the limitations of strtok |
Conclusion
- For a simple split, use strtok
- If you need multithreading, use strtok_r
- If you don't want to change the original, use a custom function
- Also applicable to CSV data analysis
In the next section, we will discuss in detail the application example: "How to Extract Text Before and After a Specific Character".
7. Application Example: How to Extract the Characters Before and After a Specific Character
In string processing, operations that extract the characters before and after a specific character or keyword are often required. For example, the following cases can be considered.
- Get only the domain part from a URL
- Extract the file name from a file path
- Retrieve the strings before and after specific tags or symbols
In the C language, by using strchr and strstr, you can achieve such processing. Additionally, when more flexible handling is required, creating custom functions is also effective.
7.1 Using strchr to Retrieve the String Before a Specific Character
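Using strchr, you can identify the position of a specific character (the first one found). Its counterpart strrchr finds the last occurrence instead.
Basic Syntax
char *strchr(const char *str, int c);
- str: the string to search
- c: the character to search for (char)
strchr returns the address of c if it finds it, or NULL if it does not.
Example: Retrieve File Name from File Path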
#include <stdio.h>
#include <string.h>
void get_filename(const char *path, char *filename) {
char *pos = strrchr(path, '/'); // Search for the last '/'
if (pos != NULL) {
strcpy(filename, pos + 1); // Copy from the character after '/'
} else {
strcpy(filename, path); // If no '/', copy the whole path
}
}
int main() {
char path[] = "/home/user/documents/report.txt";
char filename[50];
get_filename(path, filename);
printf("Filename: %s\n", filename);
return 0;
}
Result
Filename: report.txt
Key Points
- strrchr obtains the position of the last occurrence of a specific character (/).
- Copying from pos + 1 yields only the file name.
7.2 Using strstr to Retrieve the String After a Specific Keyword
Using strstr, you can search for a specific string (keyword) and retrieve the substring that follows its position.
Basic Syntax
char *strstr(const char *haystack, const char *needle);
- haystack: the string to search
- needle: the keyword to search for
strstr returns the address of the position if it finds needle, or NULL if it does not.
Example: Retrieve Domain from URL
#include <stdio.h>
#include <string.h>
void get_domain(const char *url, char *domain) {
char *pos = strstr(url, "://"); // Search for the position of "://"
if (pos != NULL) {
strcpy(domain, pos + 3); // Copy from the character after "://"
} else {
strcpy(domain, url); // If "://" is not found, copy the entire string
}
}
int main() {
char url[] = "https://www.example.com/page.html";
char domain[50];
get_domain(url, domain);
printf("Domain part: %s\n", domain);
return 0;
}
Result
Domain part: www.example.com/page.html
Key Points
- strstr searches for "://", so the same code handles both "https://" and "http://".
- pos + 3 skips the three characters of "://"; everything after it, including the path, is copied.
7.3 Using strchr to Split the Sections Before and After a Specific Character
By leveraging strchr, you can split and retrieve the strings before and after a specific character.
Example: Separate Username and Domain from an Email Address
#include <stdio.h>
#include <string.h>
void split_email(const char *email, char *username, char *domain) {
char *pos = strchr(email, '@'); // Search for the position of '@'
if (pos != NULL) {
strncpy(username, email, pos - email); // Copy the part before '@'
username[pos - email] = '\0'; // Add null terminator
strcpy(domain, pos + 1); // Copy the part after '@'
}
}
int main() {
char email[] = "user@example.com";
char username[50], domain[50];
split_email(email, username, domain);
printf("Username: %s\n", username);
printf("Domain: %s\n", domain);
return 0;
}
Result
Username: user
Domain: example.com
Key Points
- strchr finds the position of '@'.
- strncpy copies the part before '@', and a null character is added after it.
- strcpy copies the part after '@'.
7.4 Advanced: Extract Specific Attributes Within HTML Tags
When extracting a specific attribute from an HTML tag, you can also leverage strstr.
Example: Retrieve the URL from <a href="URL">
#include <stdio.h>
#include <string.h>
void get_href(const char *html, char *url) {
char *start = strstr(html, "href=\""); // Search for the position of href="
if (start != NULL) {
start += 6; // Move past href="
char *end = strchr(start, '"'); // Search for the next "
if (end != NULL) {
strncpy(url, start, end - start);
url[end - start] = '\0'; // Add null terminator
}
}
}
int main() {
char html[] = "<a href=\"https://example.com\">Click Here</a>";
char url[100];
get_href(html, url);
printf("Extracted URL: %s\n", url);
return 0;
}
Result
Extracted URL: https://example.com
Key Points
- strstr finds the position of href=" in the tag.
- strchr finds the closing ", and the characters in between are the URL.
7.5 Summary
| Processing content | Function to use | Merits |
|---|---|---|
| Get the part before a specific character | strchr / strrchr | Simple and fast |
| Get the part after a specific keyword | strstr | Keyword search is possible |
| Split before and after a specific character | strchr + strncpy | Convenient for splitting usernames and domains, etc. |
| Retrieve attributes of HTML tags | strstr + strchr | Applicable to web scraping |
Conclusion
- By using strchr or strstr, you can easily obtain the text before and after a specific character or keyword
- This is useful in many situations, such as file path handling, URL parsing, and email address splitting
- It is applicable even to advanced processes such as web scraping
8. Summary
In this article, we explained methods for extracting substrings in C language in detail, from the basics to advanced use. Here, we review the key points of each section and organize the optimal methods by use case.
8.1 Article Review
| Section | Content | Key Points |
|---|---|---|
| Basics of C language strings | In C language, strings are treated as arrays, and the terminating character is important | When handling strings, do not forget the null character '\0' |
| Extraction with the standard library | strncpy, strchr | strncpy requires adding the null terminator manually |
| Extraction using a custom function | Create a flexible function | malloc makes variable-length substring retrieval possible |
| Processing per character encoding | How to support UTF-8, Shift_JIS, EUC-JP | Converting to wide characters with mbstowcs / wcstombs is safe |
| String splitting | strtok, strtok_r | Be careful: strtok changes the original string |
| Extracting text before and after a specific character | strchr, strstr | Obtaining file names, URL parsing, HTML parsing |
8.2 Optimal Methods by Use Case
1. Extracting Substrings
| Use case | Optimal method |
|---|---|
| I want to obtain a string of a fixed length | strncpy or substring() |
| I want to do a safe cut | strncpy_s |
| Handling multibyte characters (UTF-8, Shift_JIS, EUC-JP) | mbstowcs / wcstombs |
2. Splitting Strings
| Use case | Optimal method |
|---|---|
| I want to simply split a string | strtok |
| I want to do a thread-safe split | strtok_r |
| I want to split without changing the original string | Custom function (split_string()) |
3. Getting Text Before and After a Specific Character
| Use case | Optimal method |
|---|---|
| Get file name from file path | strrchr(path, '/') |
| Get the domain part from the URL | strstr(url, "://") |
| Separate the username and domain from an email address | strchr(email, '@') |
| Get attribute values from HTML tags | strstr(tag, "href=\"") + strchr(tag, '"') |
8.3 C Language String Handling Considerations
1. Rigorously Manage the Null Terminator '\0'
In C language string handling, proper management of the terminating character '\0' is most important. Especially when using strncpy or strchr, be careful to add the null character manually.
Example of Safe String Copy
#include <stdio.h>
#include <string.h>
int main() {
char src[] = "Hello, World!";
char dest[6];
strncpy(dest, src, 5);
dest[5] = '\0'; // Add null terminator for safety
printf("Substring: %s\n", dest);
return 0;
}
2. Watch Out for Buffer Overflows
When manipulating strings in C, you need to implement carefully to avoid accessing outside array bounds. Especially when using strncpy, controlling the number of bytes copied is important.
Example of Safe String Copy
#include <stdio.h>
#include <string.h>
int main() {
char src[] = "Hello, World!";
char dest[6];
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0'; // Explicitly add null terminator at the last element
printf("Substring: %s\n", dest);
return 0;
}
3. Use mbstowcs for Multibyte Character Handling
When dealing with multibyte characters such as UTF-8 or Shift_JIS, simple strncpy or strlen will not work correctly.
Therefore, when handling multibyte characters, it is recommended to first convert to a wide string using mbstowcs and then process appropriately.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
int main() {
setlocale(LC_ALL, ""); // Set the locale
char text[] = "こんにちは、世界!"; // UTF-8
wchar_t wtext[256];
mbstowcs(wtext, text, 256); // Convert to wide-character string
printf("Converted wide-character string: %ls\n", wtext);
return 0;
}
4. Managing Buffer Size
In string handling, it is important to calculate the required memory size in advance and prevent buffer overflow. Especially when using malloc to allocate dynamic memory, be sure to compute the exact size, including the byte for the terminating '\0'.
8.4 Towards Further Learning
C language string handling is an important skill that improves program safety and readability. Based on the content introduced in this article, learning the following topics will enable more advanced string handling.
Topics to Deepen Your Learning
- Regular expressions (regex)
- File operations (string processing using fgets, fscanf)
- Memory management (dynamic string processing using malloc, realloc)
- Data analysis (JSON, XML parsing methods)
8.5 Summary
- In C language, strings are managed as char arrays, so handling the terminating character '\0' is important
- To extract a substring, use strncpy, substring(), or malloc
- For splitting strings, use strtok / strtok_r / custom functions
- If you want to get the characters before and after a specific character, use strchr or strstr
- When handling multibyte characters (Japanese), use mbstowcs
- Be mindful of secure string handling, and beware of buffer overflows
If you apply the content of this article, you will be able to handle strings practically in C language. After understanding the basic functions, challenge yourself with custom functions and advanced processing to write more efficient code!


