C++ std::map using the __m128i type
11 February, 2025
In a recent project, I encountered a performance bottleneck while using std::map with CString as the key. The keys represented file extensions, each not exceeding seven Unicode characters. The loop was performance-critical, and hashing and comparing CString keys was needless overhead for such short sequences.
To address this, I used the __m128i data type, which the compiler provides through the Streaming SIMD Extensions 2 (SSE2) intrinsics header <emmintrin.h>. It holds a 128-bit integer vector, so a file extension of up to eight 16-bit characters (16 bytes) fits in a single value.
To use __m128i as the key of a std::unordered_map, custom hash and equality functions need to be defined, because the standard library provides neither a std::hash specialization nor an equality comparison for this type.
Using this data type significantly reduced the overhead and improved the performance of the map operations within the critical loop.
Custom hash and equality functions
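// Requires <emmintrin.h> (SSE2), <cstdint>, <functional> and <unordered_map>.
// Custom hash function for __m128i.
struct Hash128i
{
    std::size_t operator()(const __m128i& key) const
    {
        // View the 128-bit key as two 64-bit halves and XOR their hashes.
        const uint64_t* data = reinterpret_cast<const uint64_t*>(&key);
        return std::hash<uint64_t>{}(data[0]) ^ std::hash<uint64_t>{}(data[1]);
    }
};
// Custom equality function for __m128i.
struct Equal128i
{
    bool operator()(const __m128i& lhs, const __m128i& rhs) const
    {
        // Compare the __m128i values lane by lane (four 32-bit integers).
        // Equal lanes become all ones, unequal lanes become all zeros.
        const __m128i result = _mm_cmpeq_epi32(lhs, rhs);
        // All 16 byte sign bits are set only if every lane compared equal.
        return _mm_movemask_epi8(result) == 0xFFFF;
    }
};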
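As a quick sanity check of how such functors plug into std::unordered_map, here is a minimal, self-contained sketch. The names KeyHash, KeyEqual and count_distinct_keys are hypothetical and exist only for this illustration. It also shows two optional variations that are assumptions on my part rather than what the project does: copying the halves out with memcpy instead of casting the pointer, and an asymmetric hash combine so that swapping the two 64-bit halves changes the hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>   // SSE2 intrinsics: __m128i, _mm_cmpeq_epi8, ...
#include <functional>
#include <unordered_map>

// Hypothetical names, used only in this sketch.
struct KeyHash
{
    std::size_t operator()(const __m128i& key) const
    {
        // Copy the two 64-bit halves out of the vector.
        std::uint64_t halves[2];
        std::memcpy(halves, &key, sizeof(halves));
        const std::size_t h0 = std::hash<std::uint64_t>{}(halves[0]);
        const std::size_t h1 = std::hash<std::uint64_t>{}(halves[1]);
        // Asymmetric combine: (a, b) and (b, a) generally hash differently.
        return h0 ^ (h1 + 0x9e3779b97f4a7c15ULL + (h0 << 6) + (h0 >> 2));
    }
};

struct KeyEqual
{
    bool operator()(const __m128i& lhs, const __m128i& rhs) const
    {
        // Byte-wise compare; the mask is 0xFFFF only if all 16 bytes match.
        return _mm_movemask_epi8(_mm_cmpeq_epi8(lhs, rhs)) == 0xFFFF;
    }
};

// Inserts key a, key b, then a again, and returns the number of stored keys.
inline std::size_t count_distinct_keys()
{
    std::unordered_map<__m128i, int, KeyHash, KeyEqual> m;
    const __m128i a = _mm_set_epi32(0, 0, 0, 1);
    const __m128i b = _mm_set_epi32(1, 0, 0, 0);
    m[a] = 10;
    m[b] = 20;
    m[a] = 30;   // same key as the first insert, so this overwrites
    return m.size();
}
```

With GCC, using __m128i as a template argument may produce an "ignoring attributes" warning; it does not affect the behaviour of the map.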
Declaration
unordered_map<__m128i, lpfnFormatGetInstanceProc, Hash128i, Equal128i> registered_format_plugins_map_m128;
The mapped value in this project is a function pointer, but it can be any type.
typedef CPictureFormat* (__stdcall* lpfnFormatGetInstanceProc)();
Map string to the __m128i data type
__m128i CRegisterFormat::str_to_m128i(const WCHAR* obj)
{
    // Converts the first 8 characters of the Unicode string obj into a __m128i.
    // The extension contains only A-Z, a-z and 0-9, is matched
    // case-insensitively, and is at most 8 characters long.
    const size_t len = wcslen(obj);
    char pointy[16] = { 0 };
    // Copy at most 16 bytes; both arguments are size_t so min() sees one type.
    memcpy(pointy, obj, min(sizeof(pointy), sizeof(WCHAR) * len));
    // Initialize __m128i with the char array.
    const __m128i ext = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pointy));
    // Case-insensitive mapping.
    // The extension data is strictly A-Z, a-z and 0-9, so lowercasing can be done
    // with one vectorized bitwise OR with 0x20. This moves A-Z to a-z and leaves
    // a-z and 0-9 unchanged, as these ranges already have this bit set. The OR
    // also sets 0x20 in the zero high bytes and in the padding, but it does so
    // identically for every key, so equal extensions still map to equal keys.
    // Create a __m128i variable with all bytes set to 0x20.
    static const __m128i mask = _mm_set1_epi8(0x20);
    // Perform bitwise OR operation on all bytes.
    return _mm_or_si128(ext, mask);
}
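To make the conversion easier to try outside the project, here is a small portable sketch of the same idea. The helpers ext_to_key and same_key are hypothetical names, and char16_t stands in for WCHAR, which is 16 bits wide on Windows but not necessarily elsewhere.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <emmintrin.h>   // SSE2 intrinsics
#include <string>

// Hypothetical helper: packs up to 8 UTF-16 code units into a 128-bit key
// and folds case by OR-ing every byte with 0x20.
inline __m128i ext_to_key(const std::u16string& ext)
{
    unsigned char bytes[16] = { 0 };
    const std::size_t n = std::min(sizeof(bytes), ext.size() * sizeof(char16_t));
    std::memcpy(bytes, ext.data(), n);
    const __m128i raw = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    return _mm_or_si128(raw, _mm_set1_epi8(0x20));
}

// Hypothetical helper: true if all 16 bytes of both keys are identical.
inline bool same_key(const __m128i& a, const __m128i& b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

For example, same_key(ext_to_key(u".JPG"), ext_to_key(u".jpg")) is true, while ".jpg" and ".png" produce different keys. The OR trick is only safe because the input alphabet is restricted: outside letters and digits it merges unrelated characters, such as '[' (0x5B) and '{' (0x7B), and anything beyond the eighth character is ignored.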
Example usage
// Adding a new file extension with the associated function pointer for the file type.
// emplace inserts only if the key is not registered yet, using a single lookup.
const __m128i key(str_to_m128i(ext));
registered_format_plugins_map_m128.emplace(key, fp);
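// Implement the format factory.
CPictureFormat* CRegisterFormat::GetInstance(const WCHAR* obj)
{
    const WCHAR* ext(wcsrchr(obj, L'.'));
    // No extension: wcsrchr returned NULL, which str_to_m128i must not receive.
    if (ext == NULL)
        return NULL;
    // find, unlike operator[], does not insert an empty entry for unknown extensions.
    const auto it = registered_format_plugins_map_m128.find(str_to_m128i(ext));
    if (it != registered_format_plugins_map_m128.end() && it->second)
        return it->second();
    return NULL;
}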
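// Compare two extensions to check whether they belong to the same group, defined by a matching function pointer.
bool CRegisterFormat::IsDifferentFormat(const WCHAR* obj1, const WCHAR* obj2)
{
    // Get the file extensions.
    const WCHAR* ext1(wcsrchr(obj1, L'.'));
    const WCHAR* ext2(wcsrchr(obj2, L'.'));
    if ((ext1 == NULL) != (ext2 == NULL))
        return true;
    // Neither name has an extension, so str_to_m128i must not be called.
    // Treating the two names as the same format here is an assumption.
    if (ext1 == NULL)
        return false;
    // Note: operator[] inserts an empty entry for an extension that is not registered.
    return registered_format_plugins_map_m128[str_to_m128i(ext1)] != registered_format_plugins_map_m128[str_to_m128i(ext2)];
}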
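A note on lookups: std::unordered_map::operator[] inserts a default-constructed value when the key is missing, whereas find leaves the map unchanged. For a registry of function pointers, that means a failed operator[] lookup leaves a null entry behind, and a later "register only if absent" check will then see the extension as already present. A minimal sketch of the difference follows. Factory, Registry, make_jpeg, lookup and size_after_bracket_lookup are hypothetical stand-ins, and std::string keys are used only to keep the example independent of SSE2.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the project's function pointer type and map.
using Factory = int (*)();
using Registry = std::unordered_map<std::string, Factory>;

inline int make_jpeg() { return 1; }

// Read-only lookup: returns nullptr for unknown keys and never inserts.
inline Factory lookup(const Registry& registry, const std::string& ext)
{
    const auto it = registry.find(ext);
    return it != registry.end() ? it->second : nullptr;
}

// Works on a copy of the registry and reports its size after an
// operator[] lookup, which inserts a null entry if ext is missing.
inline std::size_t size_after_bracket_lookup(Registry registry, const std::string& ext)
{
    (void)registry[ext];
    return registry.size();
}
```

Taking the registry by const reference in lookup also lets the compiler reject accidental use of operator[], since that operator is not available on a const map.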