std::regex_traits::lookup_classname

template< class ForwardIt > char_class_type lookup_classname( ForwardIt first, ForwardIt last, bool icase = false ) const;

If the character sequence [first, last) represents the name of a valid character class in the currently imbued locale (that is, the string between [: and :] in regular expressions), returns the implementation-defined value representing this character class. Otherwise, returns zero.

If the parameter icase is true, the character class ignores character case, e.g. the regex [:lower:] with std::regex_constants::icase generates a call to regex_traits<>::lookup_classname() with [first, last) indicating the string "lower" and icase == true. This call returns the same bitmask as the call generated by the regex [:alpha:] with icase == false.

The following character classes are always recognized, in both narrow and wide character forms, and the classifications returned (with icase == false) correspond to the matching classifications obtained by the std::ctype facet of the imbued locale, as follows:

character class	std::ctype classification
"alnum"	std::ctype_base::alnum
"alpha"	std::ctype_base::alpha
"blank"	std::ctype_base::blank
"cntrl"	std::ctype_base::cntrl
"digit"	std::ctype_base::digit
"graph"	std::ctype_base::graph
"lower"	std::ctype_base::lower
"print"	std::ctype_base::print
"punct"	std::ctype_base::punct
"space"	std::ctype_base::space
"upper"	std::ctype_base::upper
"xdigit"	std::ctype_base::xdigit
"d"	std::ctype_base::digit
"s"	std::ctype_base::space
"w"	std::ctype_base::alnum with '_' optionally added

The classification returned for the string "w" may be exactly the same as "alnum", in which case isctype() adds '_' explicitly.

Additional classifications such as "jdigit" or "jkanji" may be provided by system-supplied locales (in which case they are also accessible through std::wctype).

#include <iostream>
#include <locale>
#include <regex>
#include <string>
#include <cwctype>
 
// This custom regex traits uses wctype/iswctype to implement lookup_classname/isctype
struct wctype_traits : std::regex_traits<wchar_t>
{
    using char_class_type = std::wctype_t;
    template<class It>
    char_class_type lookup_classname(It first, It last, bool=false) const {
        return std::wctype(std::string(first, last).c_str());
    }
    bool isctype(wchar_t c, char_class_type f) const {
        return std::iswctype(c, f);
    }
};
 
int main()
{
    std::locale::global(std::locale("ja_JP.utf8"));
    std::wcout.sync_with_stdio(false);
    std::wcout.imbue(std::locale());
 
    std::wsmatch m;
    std::wstring in = L"風の谷のナウシカ";
    // matches all characters (they are classified as alnum)
    std::regex_search(in, m, std::wregex(L"([[:alnum:]]+)"));
    std::wcout << "alnums: " << m[1] << '\n'; // prints "風の谷のナウシカ"
    // matches only the katakana
    std::regex_search(in, m,
                      std::basic_regex<wchar_t, wctype_traits>(L"([[:jkata:]]+)"));
    std::wcout << "katakana: " << m[1] << '\n'; // prints "ナウシカ"
}

Output:

alnums: 風の谷のナウシカ
katakana: ナウシカ

See also

isctype	indicates membership in a character class (public member function)
wctype	looks up a character classification category in the current C locale (function)


Parameters

first, last	-	a pair of iterators which determines the sequence of characters that represents a name of a character class
icase	-	if true, ignores the upper/lower case distinction in the character classification
Type requirements
- `ForwardIt` must meet the requirements of `ForwardIterator`.
