Stroika Library 3.0d16
 
Stroika::Foundation::Characters::Character Class Reference

#include <Character.h>

Classes

struct  EqualsComparer
 
struct  ThreeWayComparer
 

Public Types

enum class  ASCIIOrLatin1Result
 

Public Member Functions

constexpr Character () noexcept
 
nonvirtual ASCII GetAsciiCode () const noexcept
 
constexpr char32_t GetCharacterCode () const noexcept
 Return the char32_t UNICODE code-point associated with this character.
 
constexpr operator char32_t () const noexcept
 
constexpr bool IsASCII () const noexcept
 Return true iff the given character (or all in span) is (are) in the ascii range [0..0x7f].
 
constexpr bool IsLatin1 () const noexcept
 Return true iff the given character (or all in span) is (are) in the ascii/iso-latin range [0..0xff].
 
constexpr bool IsWhitespace () const noexcept
 
nonvirtual bool IsUpperCase () const noexcept
 
nonvirtual bool IsLowerCase () const noexcept
 
constexpr bool IsControl () const noexcept
 
nonvirtual Character ToLowerCase () const noexcept
 
nonvirtual Character ToUpperCase () const noexcept
 
constexpr bool IsSurrogatePair () const
 
constexpr pair< char16_t, char16_t > GetSurrogatePair () const
 

Static Public Member Functions

template<IPossibleCharacterRepresentation CHAR_T>
static constexpr void CheckASCII (span< const CHAR_T > s)
 if not IsASCII (arg) throw RuntimeException...
 
template<IUNICODECanUnambiguouslyConvertFrom CHAR_T>
static constexpr ASCIIOrLatin1Result IsASCIIOrLatin1 (span< const CHAR_T > s) noexcept
 
template<IUNICODECanUnambiguouslyConvertFrom CHAR_T>
static void CheckLatin1 (span< const CHAR_T > s)
 if not IsLatin1 (arg) throw RuntimeException...
 
template<typename RESULT_T = string, IPossibleCharacterRepresentation CHAR_T>
requires requires (RESULT_T* into) { { into->empty () } -> same_as<bool>; { into->push_back (ASCII{0}) }; }
static bool AsASCIIQuietly (span< const CHAR_T > fromS, RESULT_T *into)
 
template<IUNICODECanUnambiguouslyConvertFrom CHAR_T, size_t E1, size_t E2>
static constexpr strong_ordering Compare (span< const CHAR_T, E1 > lhs, span< const CHAR_T, E2 > rhs, CompareOptions co) noexcept
 

Static Public Attributes

static constexpr char16_t kUNICODESurrogate_High_Start {0xD800}
 

Detailed Description

Note
Satisfies Concepts:
  • static_assert (regular<Character>);
Comparisons:
  • static_assert (totally_ordered<Character>)
  • Character::EqualsComparer and Character::ThreeWayComparer are provided, with construction parameters to allow case-insensitive compares

Definition at line 218 of file Character.h.

Member Enumeration Documentation

◆ ASCIIOrLatin1Result

Constructor & Destructor Documentation

◆ Character()

constexpr Stroika::Foundation::Characters::Character::Character ( )
constexpr noexcept

The default constructor produces a zero (null) character. The constructor taking a char32_t always produces a valid character.

The other overloads check for a valid character code-point and throw if given invalid data.

  • The overload taking a single char (ASCII) will throw if arg is not ASCII
  • The overload taking a single char16_t will throw if given a code-point not valid on its own (surrogate without pair)
  • The overload taking two char16_t values (a surrogate pair) may throw if given invalid code-points
  • The overload taking wchar_t will treat it as char16_t or char32_t constructor, depending on sizeof (wchar_t)

To avoid checking, cast 'c' to char32_t, as any code-point will be considered valid (so no need to check).

Definition at line 127 of file Character.inl.
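
The validity rules behind the checked overloads can be illustrated without Stroika. The following is a minimal sketch, assuming hypothetical helper names (IsValidAlone, IsValidSurrogatePair, IsValidASCII) that are not part of the library; it only encodes the standard UTF-16 and ASCII ranges and does not show how the constructors themselves are implemented.

```cpp
// Hypothetical helpers (not part of Stroika) illustrating the kind of checks
// the constructor overloads are documented to perform.

// A single UTF-16 code unit is a complete character only if it is not a surrogate.
constexpr bool IsValidAlone (char16_t c) noexcept
{
    return c < 0xD800 || c > 0xDFFF;
}

// A surrogate pair is well formed if a high surrogate is followed by a low surrogate.
constexpr bool IsValidSurrogatePair (char16_t hi, char16_t lo) noexcept
{
    return hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF;
}

// ASCII is the 7-bit range; 'char' may be signed, so test the unsigned value.
constexpr bool IsValidASCII (char c) noexcept
{
    return static_cast<unsigned char> (c) <= 0x7F;
}

static_assert (IsValidAlone (u'A'));
static_assert (!IsValidAlone (static_cast<char16_t> (0xD83D)));    // lone high surrogate
static_assert (IsValidSurrogatePair (0xD83D, 0xDE00));
static_assert (!IsValidSurrogatePair (0xDE00, 0xD83D));            // wrong order
```

A checked constructor would throw where these helpers return false; the char32_t path is documented to skip such checks entirely.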

Member Function Documentation

◆ GetAsciiCode()

ASCII Stroika::Foundation::Characters::Character::GetAsciiCode ( ) const
noexcept
Precondition
IsASCII()

Definition at line 183 of file Character.inl.

◆ operator char32_t()

constexpr Stroika::Foundation::Characters::Character::operator char32_t ( ) const
explicit constexpr noexcept

Explicit because an implicit conversion creates too many ambiguities in expressions like c == '\0', where conversions can go both ways.

Definition at line 192 of file Character.inl.

◆ IsASCII()

constexpr bool Stroika::Foundation::Characters::Character::IsASCII ( ) const
constexpr noexcept

Return true iff the given character (or all in span) is (are) in the ascii range [0..0x7f].

Note
Unlike other uses of CHAR_T in other methods of this class, even if CHAR_T=ASCII the code still loops and checks the range of each character. This is because ASCII == char, and you need some way to check a sequence of 'char' elements to see if they are all ASCII.

Definition at line 227 of file Character.inl.
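
The point of the note is that a 'char' buffer carries no guarantee about its contents, so every element has to be inspected. A minimal sketch of such a loop, assuming a hypothetical free function AllASCII over a std::string_view (the library's span overload is not reproduced here):

```cpp
#include <string_view>

// Hypothetical stand-in for a span-based ASCII check (not Stroika's code).
// Every element is inspected, even though the element type is plain 'char'.
constexpr bool AllASCII (std::string_view s) noexcept
{
    for (char c : s) {
        // 'char' may be signed, so compare the unsigned value against 0x7F.
        if (static_cast<unsigned char> (c) > 0x7F) {
            return false;
        }
    }
    return true;
}

static_assert (AllASCII ("hello"));
static_assert (!AllASCII ("caf\xE9"));    // 0xE9 is outside [0..0x7f]
```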

◆ IsLatin1()

constexpr bool Stroika::Foundation::Characters::Character::IsLatin1 ( ) const
constexpr noexcept

Return true iff the given character (or all in span) is (are) in the ascii/iso-latin range [0..0xff].

This refers to ASCII OR https://en.wikipedia.org/wiki/Latin-1_Supplement, so any UNICODE character code point less than or equal to U+00FF.

Note
This pays close attention to CHAR_T, and checks differently depending on it (especially for sizeof (CHAR_T) == 1). If the type is ASCII or Latin1, there is nothing to check, and so this just returns true. For CHAR_T == char8_t, we walk the sequence of characters and verify carefully that the encoded characters all fit in the ISO-Latin1 range (<= 0xff).
See also
Latin1

Definition at line 268 of file Character.inl.
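
For the UTF-8 case, one way to perform the walk is to rely on the encoding itself: code points up to U+007F are single bytes, and code points U+0080..U+00FF are exactly the two-byte sequences whose lead byte is 0xC2 or 0xC3. The sketch below assumes a hypothetical function UTF8FitsLatin1 and takes bytes as 'char' so that it compiles without char8_t; it is an illustration of the idea, not the library's implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sketch (not Stroika's code): does this UTF-8 byte sequence
// encode only code points in [0..0xff]?
constexpr bool UTF8FitsLatin1 (std::string_view utf8) noexcept
{
    for (std::size_t i = 0; i < utf8.size (); ++i) {
        unsigned char b = static_cast<unsigned char> (utf8[i]);
        if (b <= 0x7F) {
            continue;    // single byte: U+0000..U+007F
        }
        if (b == 0xC2 || b == 0xC3) {
            // Two-byte form covering exactly U+0080..U+00FF; needs one continuation byte.
            if (i + 1 >= utf8.size ()) {
                return false;
            }
            unsigned char next = static_cast<unsigned char> (utf8[i + 1]);
            if ((next & 0xC0) != 0x80) {
                return false;
            }
            ++i;    // skip the continuation byte just validated
            continue;
        }
        return false;    // lead byte of a code point above U+00FF, or malformed input
    }
    return true;
}

static_assert (UTF8FitsLatin1 ("caf\xC3\xA9"));      // U+00E9
static_assert (!UTF8FitsLatin1 ("\xC4\x80"));         // U+0100
static_assert (!UTF8FitsLatin1 ("\xE2\x82\xAC"));     // U+20AC
```

This sketch treats malformed UTF-8 as "not Latin1"; whether the library does the same is not stated in this page.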

◆ IsASCIIOrLatin1()

template<IUNICODECanUnambiguouslyConvertFrom CHAR_T>
static constexpr ASCIIOrLatin1Result Stroika::Foundation::Characters::Character::IsASCIIOrLatin1 ( span< const CHAR_T >  s)
static constexpr noexcept

Combines the checks for IsASCII and IsLatin1 in one call (for performance). Returns a flag indicating the most specific possible answer for the entire span: if all characters are ASCII, that is returned; if not, but all characters are Latin1, that is returned; otherwise none is returned.

Note
If CHAR_T == Latin1 or ASCII, then this will NEVER return none. It's equivalent to IsASCII. If CHAR_T == ASCII, we do as IsASCII() does, and actually check that the bytes are in the ASCII range, despite the ASCII designation (rationale in IsASCII).
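
The "most specific answer in one pass" behaviour can be sketched as follows. The enum and its enumerator names (Classification, eASCII, eLatin1, eNone) and the function Classify are hypothetical, since this page does not list the enumerators of ASCIIOrLatin1Result; the example works on UTF-32 code points only.

```cpp
#include <string_view>

// Hypothetical mirror of the documented result type (enumerator names are assumptions).
enum class Classification { eASCII, eLatin1, eNone };

// Single pass over UTF-32 code points, returning the most specific answer.
constexpr Classification Classify (std::u32string_view s) noexcept
{
    Classification result = Classification::eASCII;
    for (char32_t c : s) {
        if (c > 0xFF) {
            return Classification::eNone;    // nothing later can change this
        }
        if (c > 0x7F) {
            result = Classification::eLatin1;    // downgrade, but keep scanning
        }
    }
    return result;
}

static_assert (Classify (U"abc") == Classification::eASCII);
static_assert (Classify (U"caf\u00E9") == Classification::eLatin1);
static_assert (Classify (U"\u20AC") == Classification::eNone);
```

Doing both tests in one loop avoids scanning the span twice when the caller needs to choose between an ASCII and a Latin1 representation.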

◆ IsWhitespace()

constexpr bool Stroika::Foundation::Characters::Character::IsWhitespace ( ) const
constexpr noexcept

From https://en.cppreference.com/w/cpp/string/wide/iswspace: In the default (C) locale, the whitespace characters are the following:

  • space (0x20, ' ')
  • form feed (0x0c, '\f')
  • line feed (0x0a, '\n')
  • carriage return (0x0d, '\r')
  • horizontal tab (0x09, '\t')
  • vertical tab (0x0b, '\v')

... ISO 30112 defines POSIX space characters as UNICODE characters U+0009..U+000D, U+0020, U+1680, U+180E, U+2000..U+2006, U+2008..U+200A, U+2028, U+2029, U+205F, and U+3000.

Note
before Stroika v3.0d1, this just used iswspace()

Definition at line 394 of file Character.inl.
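
Because the ISO 30112 list quoted above is a fixed set of code points, a locale-independent constexpr test is straightforward. A minimal sketch, assuming a hypothetical function IsSpaceISO30112 (the library's actual implementation may differ):

```cpp
// Hypothetical sketch (not Stroika's code) of a constexpr test against the
// ISO 30112 space-character list.
constexpr bool IsSpaceISO30112 (char32_t c) noexcept
{
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x1680 || c == 0x180E
        || (c >= 0x2000 && c <= 0x2006) || (c >= 0x2008 && c <= 0x200A)
        || c == 0x2028 || c == 0x2029 || c == 0x205F || c == 0x3000;
}

static_assert (IsSpaceISO30112 (U' '));
static_assert (IsSpaceISO30112 (U'\t'));
static_assert (IsSpaceISO30112 (0x3000));     // U+3000 is in the list
static_assert (!IsSpaceISO30112 (0x2007));    // U+2007 is deliberately absent from the list
static_assert (!IsSpaceISO30112 (U'a'));
```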

◆ IsUpperCase()

bool Stroika::Foundation::Characters::Character::IsUpperCase ( ) const
noexcept

Checks if the given character is upper case. Can be called on any character. Returns false if the character is not alphabetic.

Definition at line 449 of file Character.inl.

◆ IsLowerCase()

bool Stroika::Foundation::Characters::Character::IsLowerCase ( ) const
noexcept

Checks if the given character is lower case. Can be called on any character. Returns false if the character is not alphabetic.

Definition at line 454 of file Character.inl.

◆ IsControl()

constexpr bool Stroika::Foundation::Characters::Character::IsControl ( ) const
constexpr noexcept

According to https://en.cppreference.com/w/cpp/string/wide/iswcntrl

ISO 30112 defines POSIX control characters as UNICODE characters U+0000..U+001F, U+007F..U+009F, U+2028, and U+2029 (UNICODE classes Cc, Zl, and Zp)

Definition at line 469 of file Character.inl.
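
The ranges quoted from ISO 30112 translate directly into a constexpr predicate. A minimal sketch, assuming a hypothetical function IsControlISO30112 (not the library's code):

```cpp
// Hypothetical sketch (not Stroika's code) of the ISO 30112 control-character ranges:
// U+0000..U+001F, U+007F..U+009F, U+2028 and U+2029.
constexpr bool IsControlISO30112 (char32_t c) noexcept
{
    // char32_t is unsigned, so 'c <= 0x1F' already covers U+0000.
    return c <= 0x001F || (c >= 0x007F && c <= 0x009F) || c == 0x2028 || c == 0x2029;
}

static_assert (IsControlISO30112 (U'\0'));
static_assert (IsControlISO30112 (0x007F));    // DEL
static_assert (IsControlISO30112 (0x2028));    // line separator
static_assert (!IsControlISO30112 (U' '));
static_assert (!IsControlISO30112 (0x00A0));   // just above the U+007F..U+009F range
```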

◆ ToLowerCase()

Character Stroika::Foundation::Characters::Character::ToLowerCase ( ) const
noexcept
Note that this does NOT modify the character in place, but returns the new desired character.

It is not necessary to first check if the argument character is uppercase or alphabetic. ToLowerCase () just returns the original character if there is no sensible conversion.

See also
https://en.cppreference.com/w/cpp/string/wide/towlower

Only 1:1 character mapping can be performed by this function, e.g. the Greek uppercase letter 'Σ' has two lowercase forms, depending on the position in a word: 'σ' and 'ς'. A call to std::towlower cannot be used to obtain the correct lowercase form in this case.

Definition at line 492 of file Character.inl.
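
The contract described here (a value is returned, the input is untouched, and unmapped characters come back unchanged) can be sketched with a deliberately simplified, ASCII-only mapping. The function ToLowerASCIIOnly is hypothetical; the library handles far more than ASCII.

```cpp
// Hypothetical ASCII-only sketch (not Stroika's code) of a non-mutating, 1:1 lower-casing.
constexpr char32_t ToLowerASCIIOnly (char32_t c) noexcept
{
    // Characters with no mapping in this sketch are returned as-is.
    return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;
}

static_assert (ToLowerASCIIOnly (U'Q') == U'q');
static_assert (ToLowerASCIIOnly (U'q') == U'q');    // already lower case
static_assert (ToLowerASCIIOnly (U'7') == U'7');    // not alphabetic: unchanged
```

Because the mapping is one character in, one character out, context-dependent forms such as the two lowercase sigmas cannot be selected by a function of this shape.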

◆ ToUpperCase()

Character Stroika::Foundation::Characters::Character::ToUpperCase ( ) const
noexcept
Note that this does NOT modify the character in place, but returns the new desired character.

It is not necessary to first check if the argument character is lowercase or alphabetic. ToUpperCase () just returns the original character if there is no sensible conversion.

Definition at line 501 of file Character.inl.

◆ AsASCIIQuietly()

template<typename RESULT_T , IPossibleCharacterRepresentation CHAR_T>
requires requires (RESULT_T* into) { { into->empty () } -> same_as<bool>; { into->push_back (ASCII{0}) }; }
bool Stroika::Foundation::Characters::Character::AsASCIIQuietly ( span< const CHAR_T >  fromS,
RESULT_T *  into 
)
static

Convert the source span losslessly into a standard C++ type. If the source contains any non-ASCII characters, this returns false; otherwise it returns true (with 'into' set to the converted characters).

Precondition
into->empty ()

Supported Types (RESULT_T):
  • Memory::StackBuffer<ASCII>
  • string
  • u8string

Definition at line 507 of file Character.inl.
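
The "quiet" pattern (report failure through the return value rather than by throwing, and write output through a caller-supplied container) might look like the sketch below. AsASCIIQuietlySketch is a hypothetical stand-in that accepts a std::u32string_view instead of a span, and what it leaves in 'into' on failure is a choice made for this sketch, not documented library behaviour.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in (not Stroika's implementation) for the documented contract:
// 'into' must start empty; returns false as soon as a non-ASCII element is seen.
template <typename RESULT_T = std::string>
bool AsASCIIQuietlySketch (std::u32string_view from, RESULT_T* into)
{
    assert (into != nullptr && into->empty ());    // documented precondition
    for (char32_t c : from) {
        if (c > 0x7F) {
            return false;    // quiet failure: no exception; *into may hold a partial prefix here
        }
        into->push_back (static_cast<char> (c));
    }
    return true;
}
```

A caller would typically write `std::string out; if (AsASCIIQuietlySketch (U"abc", &out)) { /* use out */ }`, which avoids the cost of exception handling on the failure path.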

◆ IsSurrogatePair()

constexpr bool Stroika::Foundation::Characters::Character::IsSurrogatePair ( ) const
constexpr

Return true iff this Character (or argument codepoints) represent a character which would be represented in UTF-16 as a surrogate pair.

Definition at line 552 of file Character.inl.

◆ GetSurrogatePair()

constexpr pair< char16_t, char16_t > Stroika::Foundation::Characters::Character::GetSurrogatePair ( ) const
constexpr
Precondition
IsSurrogatePair ()

Returns the high/low pseudo-characters of the character.

Definition at line 577 of file Character.inl.
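
The split into high and low surrogates follows the standard UTF-16 encoding rule: subtract 0x10000, then place the top 10 bits above 0xD800 and the bottom 10 bits above 0xDC00. A minimal sketch, assuming hypothetical free functions NeedsSurrogatePair and ToSurrogatePair (not the member functions themselves):

```cpp
#include <utility>

// Hypothetical free functions (not Stroika's code) following the UTF-16 encoding rule.
constexpr bool NeedsSurrogatePair (char32_t c) noexcept
{
    return c >= 0x10000 && c <= 0x10FFFF;
}

// Precondition: NeedsSurrogatePair (c)
constexpr std::pair<char16_t, char16_t> ToSurrogatePair (char32_t c) noexcept
{
    char32_t v = c - 0x10000;    // 20 significant bits remain
    return {static_cast<char16_t> (0xD800 + (v >> 10)),      // high surrogate: top 10 bits
            static_cast<char16_t> (0xDC00 + (v & 0x3FF))};   // low surrogate: bottom 10 bits
}

static_assert (!NeedsSurrogatePair (U'A'));
static_assert (NeedsSurrogatePair (0x1F600));
static_assert (ToSurrogatePair (0x1F600).first == 0xD83D);
static_assert (ToSurrogatePair (0x1F600).second == 0xDE00);
```

The constant 0xD800 used for the high half is the same value as kUNICODESurrogate_High_Start.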

◆ Compare()

template<IUNICODECanUnambiguouslyConvertFrom CHAR_T, size_t E1, size_t E2>
constexpr strong_ordering Stroika::Foundation::Characters::Character::Compare ( span< const CHAR_T, E1 >  lhs,
span< const CHAR_T, E2 >  rhs,
CompareOptions  co 
)
static constexpr noexcept

Utility to compare arrays of characters, like strcmp (), except with a parameter specifying whether the comparison is case sensitive or case insensitive.

Definition at line 539 of file Character.inl.
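
A strcmp-like comparison with a case option can be sketched as below. Everything named here is hypothetical: CaseOption stands in for CompareOptions (whose enumerators are not listed on this page), the result is an int rather than strong_ordering so that the example compiles as C++17, and the case folding is ASCII-only for brevity.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical option type; not Stroika's CompareOptions.
enum class CaseOption { eCaseSensitive, eCaseInsensitive };

constexpr char32_t FoldASCII (char32_t c) noexcept
{
    return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;    // ASCII-only folding
}

// Returns <0, 0 or >0, like strcmp ().
constexpr int CompareSketch (std::u32string_view lhs, std::u32string_view rhs, CaseOption co) noexcept
{
    std::size_t n = lhs.size () < rhs.size () ? lhs.size () : rhs.size ();
    for (std::size_t i = 0; i < n; ++i) {
        char32_t l = co == CaseOption::eCaseInsensitive ? FoldASCII (lhs[i]) : lhs[i];
        char32_t r = co == CaseOption::eCaseInsensitive ? FoldASCII (rhs[i]) : rhs[i];
        if (l != r) {
            return l < r ? -1 : 1;
        }
    }
    // Common prefix is equal: the shorter sequence orders first.
    return lhs.size () == rhs.size () ? 0 : (lhs.size () < rhs.size () ? -1 : 1);
}

static_assert (CompareSketch (U"abc", U"ABC", CaseOption::eCaseInsensitive) == 0);
static_assert (CompareSketch (U"abc", U"ABC", CaseOption::eCaseSensitive) > 0);
static_assert (CompareSketch (U"ab", U"abc", CaseOption::eCaseSensitive) < 0);
```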

Member Data Documentation

◆ kUNICODESurrogate_High_Start

constexpr char16_t Stroika::Foundation::Characters::Character::kUNICODESurrogate_High_Start {0xD800}
static constexpr

See https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates

Note
It would be nice to use DiscreteRange for these, but that is hard to do given the deadly embrace.

Definition at line 473 of file Character.h.


The documentation for this class was generated from the following files: