Stroika makes extensive use of assertions to help assure correct code, and to document how to use the Stroika library.
Nearly every API has precondition ('Require') statements and postcondition ('Ensure') statements, which make promises about the state of the object (for methods) or about return values. This provides not only documentation, but executable documentation, which makes it much easier to develop correct Stroika applications.
In Debug builds, all Stroika assertions are checked (evaluated at runtime), and a failed assertion causes the program to abort or drop into a debugger. Assertion failures are not recoverable: they are never turned into exceptions or ignored.
Debug builds are typically 2 or 3 times slower than release builds (this factor can vary a great deal depending on your program).
It is recommended that programs be developed mostly with Debug builds, transitioning to Release builds toward the end of a release cycle (but always use a mix of both).
Release builds have zero overhead from assertions. There is no runtime or space cost.
This is very important to understand, because the zero cost of assertion checking in the final delivered product encourages more use of assertions (it removes the excuse that the checks would make the program seem slow to users). It is also part of why Stroika is a very high performance framework.
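To make the Require/Ensure idea concrete without depending on Stroika itself, here is a minimal sketch using hypothetical REQUIRE_SKETCH/ENSURE_SKETCH macros built on the standard assert. These are not Stroika's actual macro definitions; they only show the shape: checked in debug builds, and compiled away entirely when NDEBUG is defined (as a typical release build does).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Require/Ensure (NOT Stroika's definitions).
// Under NDEBUG they expand to nothing, so there is no runtime or space cost.
// (assert alone would also vanish under NDEBUG; the #ifdef just makes it explicit.)
#ifdef NDEBUG
#define REQUIRE_SKETCH(c) ((void)0)
#define ENSURE_SKETCH(c) ((void)0)
#else
#define REQUIRE_SKETCH(c) assert ((c) && "precondition")
#define ENSURE_SKETCH(c) assert ((c) && "postcondition")
#endif

// The precondition documents (and, in debug builds, checks) what the caller must guarantee.
int At (const std::vector<int>& v, std::size_t i)
{
    REQUIRE_SKETCH (i < v.size ());
    return v[i];
}

// The postcondition documents (and checks) what the function promises.
void Clear (std::vector<int>& v)
{
    v.clear ();
    ENSURE_SKETCH (v.empty ());
}
```

One consequence of the zero-cost design: because the condition is not evaluated at all in release builds, assertion expressions must never have side effects.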
Objects in Stroika are overwhelmingly copy-by-value in semantics, though often copy-by-reference internally, for performance reasons.
For example
String a = L"a";
String b = a;       // logically a copy of a (internally the two may share a rep)
b += L"a";          // changes only b
Assert (a == L"a"); // a is unaffected
This principle (copy-by-value almost everywhere, except where clearly marked by names to the contrary) helps make Stroika semantics visually obvious.
There are a few kinds of objects in Stroika that violate this copy-by-value rule, because some objects only make sense to share. For example, Sockets don't make sense to copy by value: they are intrinsically associated with network endpoints that cannot be duplicated, so 'copies' are really copies of a pointer to the underlying object. To emphasize this, Stroika names such types with a Ptr suffix, or simply Ptr (for example, Thread::Ptr).
These objects often expose the methods of the thing they point to (with a variety of convenient overloads, etc.), but copying one of them logically copies only a reference (pointer) to the underlying shared object.
Examples of such things that are intrinsically (and named) 'Ptr' objects include:
Stroika also makes extensive use of Iterator objects – and these are logically copied by value, but they are also logically pointers 'into' some associated container.
A very helpful template class used internally in Stroika is SharedByValue<T>. You don't need to know about it to use Stroika, but it can help you efficiently implement by-value semantics at a performance cost close to that of by-reference semantics.
SharedByValue implements copy-on-write, fairly simply and transparently. You store your actual data in a 'rep' (the letter part of the letter-envelope pattern), and const methods simply dereference the pointer to it. Such objects can be copied for the cost of copying a shared_ptr<T>, which is fairly cheap. The only time you pay a significant cost is when you mutate one of these objects while its rep is shared: then the data behind the object is copied.
Stroika's container library, for example, uses SharedByValue, so copying a Stroika Sequence<T> is generally much cheaper than copying a vector<T>. The same is true of String objects (vs. std::string). Note, however, that while this makes copying much cheaper, it makes accessing the insides of a String or Sequence more expensive than accessing their vector/string counterparts.
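The copy-on-write idea can be sketched in a few lines of standard C++. CowHolder and IntList below are hypothetical illustrations of the technique, not SharedByValue's actual interface, and the use_count () test is a simplification of whatever sharing check a real implementation uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write holder illustrating the SharedByValue idea
// (NOT Stroika's SharedByValue<> interface).
template <typename T>
class CowHolder {
public:
    explicit CowHolder (T value)
        : fRep_{std::make_shared<T> (std::move (value))}
    {
    }
    // Read access: just dereference the (possibly shared) rep; no copying.
    const T& cget () const { return *fRep_; }
    // Write access: if another holder shares the rep, clone it first.
    T& rwget ()
    {
        if (fRep_.use_count () > 1) {
            fRep_ = std::make_shared<T> (*fRep_);
        }
        return *fRep_;
    }
    bool SharesRepWith (const CowHolder& rhs) const { return fRep_ == rhs.fRep_; }

private:
    std::shared_ptr<T> fRep_;
};

// A by-value 'envelope' type: copying an IntList copies only a shared_ptr.
class IntList {
public:
    void        Append (int i) { fData_.rwget ().push_back (i); }
    std::size_t size () const { return fData_.cget ().size (); }
    int         operator[] (std::size_t i) const { return fData_.cget ()[i]; }
    bool        SharesRepWith (const IntList& rhs) const { return fData_.SharesRepWith (rhs.fData_); }

private:
    CowHolder<std::vector<int>> fData_{std::vector<int>{}};
};
```

In use, `IntList b = a;` is as cheap as a shared_ptr copy, and the vector is duplicated only when b (or a) is first modified.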
Generally Stroika uses the idea of logical const for its objects, and freely marks fields mutable where needed to support that notion.
But there is one case where this is slightly vague and, at first glance, may appear not to be fully adhered to: Ptr objects.
Ptr objects really combine two kinds of things: a smart pointer, and shorthand accessors for the underlying object.
This follows from the C++ thread safety rules: const methods are always safe to call from multiple threads at once as long as there are no concurrent writers, and writes need synchronization. Those rules apply literally and directly only to the 'envelope' (the smart-pointer part of the object), so we use the constness of a Ptr object to refer to the pointer itself, not to the thing pointed to.
We arguably COULD get rid of Ptr objects and just use shared_ptr<T> or shared_ptr<const T>, but then we would lose the convenience of having simple interfaces for the reps, and richer (overloaded, etc.) interfaces for callers.
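A minimal sketch of this convention, using a hypothetical Counter quasi namespace and Counter::Ptr (not Stroika classes): copying a Ptr copies only the pointer, and a const Ptr can still call methods that modify the shared rep, because const applies only to which rep the Ptr refers to.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical quasi namespace with a shared rep and a Ptr (not a Stroika class).
struct Counter {
    struct Rep {
        int fValue{0};
    };
    class Ptr {
    public:
        explicit Ptr (std::shared_ptr<Rep> rep)
            : fRep_{std::move (rep)}
        {
        }
        // const: does not change WHICH rep this Ptr refers to,
        // even though it modifies the shared rep itself.
        void Increment () const { ++fRep_->fValue; }
        int  GetValue () const { return fRep_->fValue; }
        // non-const: re-seats the pointer itself.
        void Reset (std::shared_ptr<Rep> rep) { fRep_ = std::move (rep); }

    private:
        std::shared_ptr<Rep> fRep_;
    };
    static Ptr New () { return Ptr{std::make_shared<Rep> ()}; }
};
```

This mirrors the difference between a const std::shared_ptr<T> (the pointer is fixed, the pointee is mutable) and a std::shared_ptr<const T>.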
One reason it is very important to understand which objects are copy-by-value and which are copy-by-reference is thread safety.
All of Stroika is built to be 'thread safe', but automatically synchronizing all operations would create a high and almost pointless performance penalty.
Instead, Stroika mostly follows the C++ STL (Standard Template Library) thread safety convention: const methods are always safe for multiple concurrent readers, and non-const methods are safe ONLY with a single caller at a time (and no concurrent readers).
But that only goes one level deep: it covers the outer object you are accessing. For the special case of 'Ptr' objects, you must also worry about synchronizing the internal shared 'rep' objects. How this is done varies from class to class, so check the documentation of the particular 'Ptr' classes you are using. For example, Thread::Ptr internal rep objects are always internally synchronized (so the caller only needs to worry about synchronizing the Ptr object). Stream internal rep objects are, by default, not internally synchronized, but you can easily construct an internally synchronized stream with, for example, InternallySynchronizedInputStream<T>::New () (which creates a new delegating object with locks around each call).
And to synchronize any C++ object, you can always use the utility template Synchronized<T> to wrap access to the object. You can also use lock_guard<> etc., but Synchronized<> makes accessing shared data in a thread-safe way MUCH simpler and more transparent (though it only synchronizes the 'envelope', not the 'shared rep' of 'Ptr' objects).
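As an illustration of the wrapping idea only, here is a hypothetical Guarded<T>, much simpler than Synchronized<T> and not its API: every access runs under the wrapper's mutex. As with Synchronized<>, this protects only the wrapped object itself. If T were a Ptr-like type, the shared rep it points to would still need its own synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal wrapper in the spirit of Synchronized<T> (NOT its real API).
template <typename T>
class Guarded {
public:
    explicit Guarded (T initial = T{})
        : fValue_{std::move (initial)}
    {
    }
    // Runs f (value) while holding the lock, and returns whatever f returns.
    template <typename F>
    auto With (F&& f)
    {
        std::lock_guard<std::mutex> lock{fMutex_};
        return std::forward<F> (f) (fValue_);
    }
    // Returns a copy of the value, taken under the lock.
    T Load () const
    {
        std::lock_guard<std::mutex> lock{fMutex_};
        return fValue_;
    }

private:
    mutable std::mutex fMutex_;
    T                  fValue_;
};
```

Funneling all access through With () means there is no way to touch the value without the lock held, which is the main convenience such wrappers offer over a bare mutex plus lock_guard<>.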
See 'Thread Safety.md' for more details.
In several families of classes, such as Threads, Streams (InputStream, OutputStream, etc.), Sockets, and others using the letter-envelope paradigm, users must separately consider the thread safety of the letter and of the envelope.
The envelope typically follows C++-Standard-Thread-Safety, but the thread safety rules for the letter (the shared rep object) depend on how that object was created, so see the documentation of the New () method that created it.
To document, and to help ensure, that Stroika classes are used in a thread-safe manner, the helper class Debug::AssertExternallySynchronizedMutex<T> is used fairly consistently throughout Stroika to 'wrap' objects in a thread-safety-checking envelope. This has no performance cost (space or runtime) in Release builds, but has a significant cost (roughly a 2x slowdown) in Debug builds.
But it means that if your code runs correctly (without assertion errors) in Debug builds, it's probably thread safe.
This doesn't completely replace tools like thread-sanitizer, but it does provide simpler and clearer diagnostics, directly, when you run your threaded applications.
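The checking idea can be sketched as follows. OverlapChecker and Tally are hypothetical, simplified stand-ins, not AssertExternallySynchronizedMutex's actual implementation: the checker never blocks, it only asserts when a write overlaps another read or write. Unlike a real implementation, it ignores same-thread re-entrancy, and it is not compiled out in release builds.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical debug checker: detects (but does not prevent) unsynchronized overlap.
class OverlapChecker {
public:
    class ReadScope {
    public:
        explicit ReadScope (const OverlapChecker& c)
            : fC_{c}
        {
            ++fC_.fReaders_;
            assert (!fC_.fWriting_.load () && "read overlaps a write: caller must synchronize");
        }
        ~ReadScope () { --fC_.fReaders_; }

    private:
        const OverlapChecker& fC_;
    };
    class WriteScope {
    public:
        explicit WriteScope (OverlapChecker& c)
            : fC_{c}
        {
            bool wasWriting = fC_.fWriting_.exchange (true);
            (void)wasWriting; // unused when NDEBUG disables assert
            assert (!wasWriting && "two overlapping writes: caller must synchronize");
            assert (fC_.fReaders_.load () == 0 && "write overlaps a read: caller must synchronize");
        }
        ~WriteScope () { fC_.fWriting_.store (false); }

    private:
        OverlapChecker& fC_;
    };

private:
    mutable std::atomic<int>  fReaders_{0};
    mutable std::atomic<bool> fWriting_{false};
};

// A class whose callers must synchronize externally; the checker documents and verifies that.
class Tally {
public:
    void Add (int n)
    {
        OverlapChecker::WriteScope w{fChecker_};
        fTotal_ += n;
    }
    int Total () const
    {
        OverlapChecker::ReadScope r{fChecker_};
        return fTotal_;
    }

private:
    OverlapChecker fChecker_;
    int            fTotal_{0};
};
```

Because both sides publish their own state before checking the other's (with sequentially consistent atomics), an overlapping reader and writer cannot both miss each other.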
Tools like valgrind (memcheck) and the sanitizers (address, undefined behavior, and thread sanitizer) are all regularly run as part of the Stroika regression test suite, and are a sensible addition to any Stroika-based development process.
They are especially useful for catching subtle bugs that are present ONLY in Release builds and not in Debug builds (extremely rare, but it can happen).
All Stroika's regression tests are regularly run with valgrind and sanitizers.
make format-code
I'm not even slightly happy with the way this looks, but I've found no better alternative. At least it's automated and consistent. It can be configured to use astyle or clang-format, but I've found clang-format slightly less buggy.
I personally prefer the "CamelCase" style, probably because I first did object-oriented programming in Object Pascal/MacApp a few years back. Maybe there is another reason. But now it's quite convenient, providing a subtle but readable visual distinction.
All (or nearly all) Stroika classes and methods use essentially the same 'Studly Caps' naming style as MacApp, with a few minor deviations:
However, the STL / standard C++ library has its own naming convention (basically all lower case, with _ between words), plus its own vocabulary that it uses by analogy and convention throughout (e.g. begin, end, empty).
Stroika methods start with an upper case letter, EXCEPT where the method mimics or follows an existing STL pattern. If you see lower case, assume the function follows STL semantics. If you see CamelCase, you can assume it follows Stroika semantics.
For example:
String::SubString () follows Stroika semantics (asserting if the arguments are out of range).
String::substr() follows the semantics of STL's basic_string<>::substr().
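A small sketch of the two conventions using std::string. SubStringSketch is a hypothetical Stroika-flavored helper, not String::SubString, and its half-open [from, to) arguments are an assumption chosen to match the start/end convention described below. It treats bad arguments as a caller bug (an assertion), while std::string::substr clamps the count and throws std::out_of_range for a bad starting position.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// CamelCase => Stroika-style semantics (hypothetical helper): bad arguments
// are a precondition violation, checked by assert in debug builds.
std::string SubStringSketch (const std::string& s, std::size_t from, std::size_t to)
{
    assert (from <= to);      // precondition, like a 'Require'
    assert (to <= s.size ()); // precondition
    return std::string (s.data () + from, s.data () + to);
}

// lower_case => STL semantics: substr (pos, count) clamps count to the end of
// the string, and throws std::out_of_range if pos > size ().
bool SubstrThrows (const std::string& s, std::size_t pos)
{
    try {
        (void)s.substr (pos);
        return false;
    }
    catch (const std::out_of_range&) {
        return true;
    }
}
```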
Note: this 'convention' doesn't replace documentation (the behavior of each method is documented). It just gives the reader a quick, subtle visual cue about which semantics to expect, without reading the docs.
Examples of common STL methods which appear in Stroika code (with STL semantics):
Rationale
Note this slight variation from the standard C++ naming convention is helpful, in that it conveys information at practically no cost. Stroika functions tend to have slightly different behavior and guarantees than their standard C++ (STL) counterparts, and the naming convention makes it a little easier to read code with the two libraries intermingled, knowing which style is in use. (For example, STL iterators have different requirements than Stroika ones, and Stroika functions tend to use pre/postcondition assertions that the STL/standard C++ libraries don't.)
STL is reasonably consistent, with most APIs using T* start, T* end, but some APIs use length instead of end. The Stroika convention is to always use T* start, T* end.
First, this gives more consistent expectations. That's especially important for APIs that use offsets (like String), so that the meaning of integer parameters is obvious.
Second, it avoids problems with overflow. For example, if you had an API like:
To map this to an internal representation you would have to do:
char* s = m_bufPtr + _Off;
char* e = m_bufPtr + _Off + _Count;
But if _Count were numeric_limits<size_t>::max () (the value of std::string::npos, commonly passed to mean 'through the end'), then the e pointer computation would overflow. There are ways around this, but mixing the two styles creates a number of problems, both for implementations and for users.
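To make the overflow argument concrete, here is a hedged sketch with hypothetical names (CountChar, CountCharOffsetCount): the core routine takes a [start, end) pointer range, so there is no length arithmetic to overflow, and an (offset, count) adapter clamps count to the remaining length before forming any pointer, so passing std::string::npos as count is harmless.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Stroika-style: the range is given directly as [start, end) pointers.
std::size_t CountChar (const char* start, const char* end, char c)
{
    assert (start <= end);
    return static_cast<std::size_t> (std::count (start, end, c));
}

// Hypothetical adapter for (offset, count) callers: clamp count FIRST,
// so start + n can never run past the end of the buffer.
std::size_t CountCharOffsetCount (const std::string& s, std::size_t offset, std::size_t count, char c)
{
    assert (offset <= s.size ());
    std::size_t remaining = s.size () - offset;
    std::size_t n         = std::min (count, remaining);
    const char* start     = s.data () + offset;
    return CountChar (start, start + n, c);
}
```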
It often doesn't matter which you use, but semantically, postfix ++ must return the previous value, which involves an extra copy. This CAN, in PRINCIPLE, be costly. So in the increment step of for loops, and anyplace else we don't actually rely on the postfix value, just use prefix ++ preferentially. Similarly for --.
See https://stackoverflow.com/questions/24901/is-there-a-performance-difference-between-i-and-i-in-c
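The extra copy is visible in the canonical pair of operator definitions. StepCounter is a hypothetical type used only for illustration.

```cpp
#include <cassert>

// Hypothetical iterator-like type showing the usual prefix/postfix pair.
class StepCounter {
public:
    int Value () const { return fValue_; }
    // prefix: increment in place and return *this (no copy)
    StepCounter& operator++ ()
    {
        ++fValue_;
        return *this;
    }
    // postfix: must save and return the OLD value, which costs a copy
    StepCounter operator++ (int)
    {
        StepCounter old = *this;
        ++*this;
        return old;
    }

private:
    int fValue_{0};
};
```

For an int the compiler removes the difference, but for an iterator with non-trivial state the copy in the postfix form may not be optimized away.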
Stroika makes extensive use of the builtin operator"" sv, which produces more efficient String objects. (Strictly, the standard operator produces a string_view, but Stroika's String class converts a string_view to a String more efficiently, reusing the space for the characters.)
Stroika also provides operator"" _k, which does about the same thing (producing a String_Constant), but internally we use, and encourage others to use, operator"" sv (since it's standard and does about the same thing). operator"" _k is only provided as an option because there are a few cases of ambiguity where it's helpful.
operator"" _RegEx () can be used as a shortcut for defining regular expressions.
Why are Ptr objects 'struct' / 'class' instead of actual namespaces?
This is the use of a struct as a namespace. Now that C++ has namespaces, you might ask why anybody would do this. The answer is that a struct allows part of the 'namespace' to be PRIVATE, and part public. This is sometimes useful, and that is why Stroika occasionally uses this pattern.
Things like 'Stream' or 'Socket' are just logical groupings.
These logical groupings could have been implemented using actual namespaces, or with structs acting as 'quasi' namespaces.
Advantages of using 'namespace':
Advantages of using struct/class
In the end, there are no very strong arguments either way, but for now I've gone with 'struct/class' in several places.
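A minimal sketch of a struct used as a quasi namespace. Greeter is a hypothetical example, loosely following the Stream/Socket shape of a Ptr type plus a static New (). The rep type is private, which a real namespace cannot express.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical 'quasi namespace': groups a Ptr type and New (), with a private rep.
struct Greeter {
private:
    struct Rep_ {
        std::string fName;
    };

public:
    class Ptr {
    public:
        std::string Greet () const { return "hello " + fRep_->fName; }

    private:
        friend struct Greeter; // only Greeter::New () can construct a Ptr
        explicit Ptr (std::shared_ptr<Rep_> r)
            : fRep_{std::move (r)}
        {
        }
        std::shared_ptr<Rep_> fRep_;
    };
    static Ptr New (const std::string& name) { return Ptr{std::make_shared<Rep_> (Rep_{name})}; }
};
```

Client code can write Greeter::Ptr and Greeter::New (), but not Greeter::Rep_.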
In Stroika, New () is a static method that allocates an instance of some class, but returns some kind of shared_ptr/smart pointer to it, not a bare C++ pointer.
Stroika doesn't make much use of the factory pattern, but occasionally it is useful. If the type provided by the factory is exactly the type of a given class, then we generally use:
struct T_Factory {
static T New();
};
That technique is used to control the default kind of container (backend algorithm) that is created.
Similarly, for Stream classes, the 'stream quasi namespace' contains a New method to construct the actual stream, and the definition of the Ptr type (smart pointer) used to access the stream.
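A hedged sketch of the T_Factory shape above, with a replaceable default backend. IntBag, StdBackedRep, and IntBag_Factory (including its Register () function) are hypothetical; Stroika's actual container factories may differ in naming and detail. Copy-on-write is omitted, and Register () is not thread-safe in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container 'envelope' whose backend (rep) is chosen at runtime.
class IntBag {
public:
    struct IRep {
        virtual ~IRep ()                     = default;
        virtual void        Add (int i)      = 0;
        virtual std::size_t size () const    = 0;
        virtual std::string Backend () const = 0;
    };
    explicit IntBag (std::shared_ptr<IRep> rep)
        : fRep_{std::move (rep)}
    {
    }
    void        Add (int i) { fRep_->Add (i); }
    std::size_t size () const { return fRep_->size (); }
    std::string Backend () const { return fRep_->Backend (); }

private:
    std::shared_ptr<IRep> fRep_;
};

// Interchangeable backends built on standard containers.
template <typename STD_CONTAINER>
struct StdBackedRep : IntBag::IRep {
    explicit StdBackedRep (std::string name)
        : fName{std::move (name)}
    {
    }
    void          Add (int i) override { fData.push_back (i); }
    std::size_t   size () const override { return fData.size (); }
    std::string   Backend () const override { return fName; }
    STD_CONTAINER fData;
    std::string   fName;
};

// 'T_Factory' pattern: New () returns an IntBag using the current default backend,
// which can be swapped (e.g. at startup) without touching calling code.
struct IntBag_Factory {
    using Maker = std::function<IntBag ()>;
    static IntBag New () { return sMaker_ (); }
    static void   Register (Maker m) { sMaker_ = std::move (m); }

private:
    static inline Maker sMaker_ = [] { return IntBag{std::make_shared<StdBackedRep<std::vector<int>>> ("vector")}; };
};
```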
A constructor always returns an object, never an optional of that object. You could just have the constructor throw an exception when it fails to construct (due to bad arguments), but sometimes it's handy to return optional<T> in that case.
One case where this is commonly true is parsing arguments (like a date, or a URL). So for these cases, Stroika provides a static Parse () function, which returns optional<T> and acts much like a constructor, except that it returns nothing (an empty optional) when unable to 'parse' its arguments.
Another reason it's sometimes helpful to use a static Parse () function instead of a constructor is clarity: it makes plain that you aren't converting (perhaps accidentally and implicitly) a string into that object type, but explicitly parsing the string into that object type.
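A minimal sketch of the static Parse () convention, using a hypothetical PortNumber type (the validation rules are illustrative assumptions). Note the explicit constructor: a string can only become a PortNumber by being parsed, never through an implicit conversion.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical value type illustrating the static Parse () convention.
class PortNumber {
public:
    constexpr explicit PortNumber (std::uint16_t n)
        : fValue_{n}
    {
    }
    // Returns nullopt (instead of throwing) when the text isn't a valid port number.
    static std::optional<PortNumber> Parse (std::string_view s)
    {
        unsigned int value = 0;
        auto [ptr, ec]     = std::from_chars (s.data (), s.data () + s.size (), value);
        if (ec != std::errc{} || ptr != s.data () + s.size () || value > 65535) {
            return std::nullopt;
        }
        return PortNumber{static_cast<std::uint16_t> (value)};
    }
    constexpr std::uint16_t GetValue () const { return fValue_; }

private:
    std::uint16_t fValue_;
};
```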
Stroika types generally support the C++20 operator== and operator<=> semantics and operators.
Many types declare this with:
For other, templated types, comparability depends on that of the template arguments.
For example, for KeyValuePair<>:
But for many classes, for example 'set' containers, it matters whether the function argument is an equality comparer or an ordering comparer, and the C++ comparison syntax doesn't make that distinction (less and equal_to are two function objects with the same 'signature', but one 'works' in a std::set, and the other fails pretty badly).
Stroika uses a utility class template ComparisonRelationDeclaration<> and some related classes, functions and types to annotate function objects, and some concepts to filter on those annotations, so you can declare what kind of comparison relation a function implements.
NOTE: if you use 'three-way comparers', there is no need for that, as their function signature (return type) is enough to automatically detect what they are. Stroika containers and concepts for equality comparison etc. should automatically convert/handle operator<=>.
C++11 added a new typedef syntax: using T = …. This is nearly the same as typedef in terms of semantics.
Stroika code will generally use the using T = syntax in preference to typedef for two reasons:
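The list of reasons is not reproduced here, but two commonly cited advantages of using over typedef can be illustrated: it reads left to right (notably for function-pointer types), and only using supports alias templates. These are not necessarily the two reasons the list gave. A small sketch, with all names hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// The same alias, both ways.
typedef std::map<std::string, int> NameToAgeTypedef;
using NameToAge = std::map<std::string, int>;

// Function-pointer aliases: the 'using' form reads left to right.
typedef int (*BinaryOpTypedef) (int, int);
using BinaryOp = int (*) (int, int);

// Alias templates are only possible with 'using' (a typedef cannot be templated).
template <typename VALUE>
using StringMap = std::map<std::string, VALUE>;

int Add (int a, int b) { return a + b; }
```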
The Windows SDK uses the convention of appending a W to the end of a function name that uses wide characters, and an A to one that uses the current operating system locale's code page.
In C++ (and Stroika), this convention is also generally unneeded, because of overloading.
Stroika generally avoids this issue by returning String objects nearly everywhere, which are Unicode based. But because the Stroika String classes use the rest of the Stroika infrastructure (including thread interruption), it's sometimes inconvenient for some low-level code to use those String classes.
But you cannot overload on return types.
For this reason, a handful of Stroika APIs follow the convention of a suffix of:
Instead of ModuleInit<> (which Stroika used until 2.1b7), we now use 'magic statics' (function-local static initialization) in most places, and occasionally call_once () with atexit ().
This MAY not be a great idea. It's a little simpler, and will look more standard to most eyes. But the ModuleInit<> mechanism trades off performance at startup for better performance later (not having to check whether something is initialized). So I'm not sure it's a good switch.
Generally, when you have a shared bit of static content and are sure it's only accessed after main, you can store it as:
But if this may be used before main,
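A hedged sketch of the two cases, with hypothetical names: a namespace-scope constant for content used only after main, and a function-local 'magic static' for content that may be needed during static initialization. The magic static is constructed on first use, and since C++11 that initialization is guaranteed to be thread-safe. The call_once () variant mentioned above is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Accessed only after main () starts: a namespace-scope constant is fine.
// (Its dynamic initializer runs before main, but in no guaranteed order relative
// to statics in other translation units, hence 'only after main'.)
const std::vector<std::string> kWeekendDays{"Saturday", "Sunday"};

// May be needed before main (e.g. by another static initializer): a function-local
// 'magic static' is constructed on first use.
const std::vector<std::string>& GetColorNames ()
{
    static const std::vector<std::string> sNames{"red", "green", "blue"};
    return sNames;
}

// This initializer itself runs before main, and is safe only because it goes
// through GetColorNames () rather than reading a namespace-scope vector directly.
const std::size_t kNumColors = GetColorNames ().size ();
```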
Singleton objects are a common pattern. Stroika doesn't use them a ton, but it does use some. One issue to be careful about with singletons is thread safety. Stroika leverages a couple of patterns to handle this.
Where the object is intrinsically constant, we follow the pattern of
Here, since the objects are constant, thread safety is obviously not an issue. But doing this where constexpr is not possible DOES present an issue with accessing these objects before main (). Avoid this pattern (without constexpr) if the object may be needed before main (use Get () instead, described below).
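Since the exact pattern is not shown above, here is one plausible shape, with hypothetical ColorSpace and ColorSpaces types: constant 'singletons' as static constexpr members, which are initialized at compile time and so are usable before main, with no synchronization needed.

```cpp
#include <cassert>
#include <string_view>

struct ColorSpace {
    std::string_view fName;
    int              fChannels;
};

// Hypothetical holder of intrinsically constant 'singletons'.
struct ColorSpaces {
    static constexpr ColorSpace kRGB{"RGB", 3};
    static constexpr ColorSpace kGray{"Gray", 1};
};
```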
Mutable singletons are accessed by the Get() static method. By definition of a mutable singleton, it will have some non-const methods.
Variants of this pattern are safe to use before main (because the Get() method can ensure the underlying singleton object is constructed before returning it).
Within this category of singletons, sometimes we have the Get method return a mutable reference to the global object.
For example:
In this case, all mutable methods (such as Logger::SetAppender () and SignalHandlerRegistry::SetSignalHandlers ()) are internally synchronized, and so are safe to call from any thread.
This pattern is safe to use before main (because the Get() method can ensure the underlying singleton object is constructed before returning it).
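A minimal sketch of this variant, using a hypothetical MessageLog class (not Stroika's Logger): Get () returns a reference to a function-local static, so the object exists on first use (even before main), and every method locks an internal mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical mutable singleton: Get () returns a reference; methods are internally synchronized.
class MessageLog {
public:
    static MessageLog& Get ()
    {
        static MessageLog sThe_; // constructed on first use (thread-safe since C++11)
        return sThe_;
    }
    void Append (const std::string& msg)
    {
        std::lock_guard<std::mutex> lock{fMutex_};
        fMessages_.push_back (msg);
    }
    std::vector<std::string> GetMessages () const
    {
        std::lock_guard<std::mutex> lock{fMutex_};
        return fMessages_;
    }
    MessageLog (const MessageLog&)            = delete;
    MessageLog& operator= (const MessageLog&) = delete;

private:
    MessageLog () = default;
    mutable std::mutex       fMutex_;
    std::vector<std::string> fMessages_;
};
```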
Another pattern is to return a copyable managed object (either shared_ptr, or something that internally has a shared_ptr or like object).
For example,
In this case, you Get a copy of the global object, update it, and then call Set() to reset the shared/default copy of the given object.
In this case, the mutable methods need NOT be internally synchronized, but the Get/Set static functions are guaranteed to be internally synchronized.
And something like
is safe even if done in parallel with an update to the InternetMediaTypeRegistry (via Set), because the Get () call holds a temporary shared_ptr reference to the old value while the new value is being installed.
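A hedged sketch of the Get ()/Set () variant, with a hypothetical SuffixRegistry (not Stroika's InternetMediaTypeRegistry). The registry is a cheap-to-copy value holding a shared_ptr to an immutable map, only the static Get ()/Set () lock, and a copy obtained from Get () keeps the old rep alive even after a Set ().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical registry following the Get ()/Set () pattern.
class SuffixRegistry {
public:
    // Returns a COPY of the current registry (cheap: copies a shared_ptr).
    static SuffixRegistry Get ()
    {
        std::lock_guard<std::mutex> lock{sMutex_};
        return Current_ ();
    }
    // Installs r as the new current registry.
    static void Set (const SuffixRegistry& r)
    {
        std::lock_guard<std::mutex> lock{sMutex_};
        Current_ () = r;
    }
    // Ordinary value methods: NOT internally synchronized.
    std::string Lookup (const std::string& suffix) const
    {
        auto i = fRep_->find (suffix);
        return i == fRep_->end () ? std::string{} : i->second;
    }
    void Add (const std::string& suffix, const std::string& type)
    {
        auto updated       = std::make_shared<Map_> (*fRep_); // never mutate a shared rep
        (*updated)[suffix] = type;
        fRep_              = std::move (updated);
    }

private:
    using Map_ = std::map<std::string, std::string>;
    static SuffixRegistry& Current_ ()
    {
        static SuffixRegistry sCurrent; // constructed on first use, even before main
        return sCurrent;
    }
    static inline std::mutex    sMutex_; // constexpr constructor: constant-initialized
    std::shared_ptr<const Map_> fRep_ = std::make_shared<Map_> (Map_{{".txt", "text/plain"}});
};
```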
NOTE Doing
MAY NOT be safe, depending on the particular type. See the documentation of each singleton type for how to call its mutable singleton methods.
Stroika templates make substantial use of concepts to help provide documentation about expectations and better error messages.
Since concepts are largely 'interfaces', and syntactically cannot be confused with abstract class 'interfaces', Stroika uses the naming convention of starting each concept name with the prefix 'I'.
Most Stroika functions raise an exception when they fail (for example, Wait () methods and Parse () methods). But sometimes you just want the function to return an optional result (for speed and simplicity, because the failure case is common, not exceptional). To simplify these situations, many APIs have a 'Quietly' variant that does what the main function does, but instead of raising an exception when it fails, returns nullopt.
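A minimal sketch of such a pair, with hypothetical names ParseInt and ParseIntQuietly: the Quietly variant does the work and reports failure as nullopt, and the throwing variant is a thin wrapper around it.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Quietly variant: failure is an ordinary, expected outcome (nullopt).
std::optional<int> ParseIntQuietly (std::string_view s)
{
    int value{};
    auto [ptr, ec] = std::from_chars (s.data (), s.data () + s.size (), value);
    if (ec != std::errc{} || ptr != s.data () + s.size ()) {
        return std::nullopt;
    }
    return value;
}

// Main variant: same logic, but failure is reported as an exception.
int ParseInt (std::string_view s)
{
    if (auto r = ParseIntQuietly (s)) {
        return *r;
    }
    throw std::invalid_argument{"ParseInt: not an integer"};
}
```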
For many APIs (e.g. member functions), we annotate the API with a deprecated attribute, and document the new API to be used instead. We try to keep these around until the next major version switch (e.g. until we switch from 2.1b9 to 2.1rc, or from 3.1a to 3.1b, etc.).
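For example (Widget and its methods are hypothetical), the standard [[deprecated]] attribute keeps the old name compiling while making the compiler warn at each use, and the message can point to the replacement:

```cpp
#include <cassert>
#include <string>

// Hypothetical class partway through renaming GetName () to GetDisplayName ().
class Widget {
public:
    std::string GetDisplayName () const { return fName_; }
    // Old name kept until the next major version switch; each use triggers a compiler warning.
    [[deprecated ("use GetDisplayName ()")]] std::string GetName () const { return GetDisplayName (); }

private:
    std::string fName_{"widget"};
};
```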
For large sections of code which may change (e.g. the ORM or SQL APIs), we may version the entire API section with inline namespaces
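A minimal sketch of versioning with an inline namespace (SQLSketch and Quote are hypothetical): unqualified uses pick up the current version, while the old version stays reachable by its explicit name.

```cpp
#include <cassert>
#include <string>

namespace SQLSketch {
    namespace v1 {
        inline std::string Quote (const std::string& s) { return "'" + s + "'"; }
    }
    // The inline namespace is the current version: SQLSketch::Quote means v2::Quote.
    inline namespace v2 {
        inline std::string Quote (const std::string& s)
        {
            std::string r{"'"};
            for (char c : s) {
                r += c;
                if (c == '\'') {
                    r += '\''; // double embedded quotes
                }
            }
            return r + "'";
        }
    }
}
```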
For this C++, f is really 'virtual'. It would be best if the compiler warned that there was no 'override' directive, but at least some compilers don't do that.
So when you see a function declaration WITHOUT virtual or override, it's not clear whether that function is virtual or not.
Stroika uses the CONVENTION (sadly unenforced by the compilers) of using the macro nonvirtual as a hint/reminder that the given function is NOT virtual (not even implicitly because the base contains a virtual with the same signature).
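A sketch of the problem and the convention. Here nonvirtual is defined locally, assuming (as a pure hint macro would) that it expands to nothing; Base and Derived are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the 'nonvirtual' hint: expands to nothing,
// so it is documentation only.
#define nonvirtual

struct Base {
    virtual ~Base () = default;
    virtual int    f () const { return 1; }
    nonvirtual int g () const { return 10; }
};

struct Derived : Base {
    int            f () const { return 2; }  // no 'virtual'/'override', yet this IS virtual
    nonvirtual int g () const { return 20; } // hides Base::g; NOT virtual
};
```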
I really have NO IDEA what is best here. I've searched a lot and found no clear guidance. But consistency appears to be a virtue, so I've come up with a policy and documented it. It's NOT ALWAYS right, just a good default.
This form makes clear the 'c' in the loop is readonly and just examined in the loop.
Using the explicit type name sometimes makes it a little easier to see, where 'c' is used in the code, what its type is.
When the 'c' value will be modified in place, or if I KNOW size is small/basic type, I may use
I cannot see the utility - except maybe in templated code where you may want to forward values, things like
NB: auto& c won't work with Stroika Iterator<> classes, since their operator* returns only a const reference (we don't allow updating containers just by fiddling with the iterator).
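A sketch of the loop forms discussed above, using standard containers (function names are hypothetical). std::vector<bool> shows the kind of case where auto&& is genuinely useful: its iterators return a proxy object by value, so auto& would not compile, much like the const-reference limitation of Stroika iterators noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

std::size_t TotalLength (const std::vector<std::string>& names)
{
    std::size_t total = 0;
    for (const std::string& c : names) { // read-only, no copy, type visible
        total += c.size ();
    }
    return total;
}

void DoubleAll (std::vector<int>& v)
{
    for (auto& c : v) { // modified in place
        c *= 2;
    }
}

int Sum (const std::vector<int>& v)
{
    int s = 0;
    for (int c : v) { // small basic type: copy by value
        s += c;
    }
    return s;
}

void SetAll (std::vector<bool>& bits)
{
    // operator* yields a proxy by value; 'auto&&' binds to it, 'auto&' would not compile.
    for (auto&& b : bits) {
        b = true;
    }
}
```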