How Benchmarking Saved Me From My Own "Optimizations"

May 13, 2026

I've been working on a React/Ink-based application that needs to resize its content reactively:

[Screenshot: Agentia in wide format]
[Screenshot: Agentia in normal format]
[Screenshot: Agentia in narrow format]

Ink uses string-width to estimate how many terminal columns a string will occupy. That library in turn uses get-east-asian-width (eastAsianWidth(codePoints[i])) to determine the expected relative width of each Unicode character. The lookup data is generated from EastAsianWidth.txt, the file in the Unicode specification that documents expected character widths. When I investigated the library I was surprised to see it used a simple binary search, which is normally not the most efficient approach, over the generated lookup arrays.
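To make that concrete, here is a minimal sketch of how a per-code-point width lookup composes into a string width estimate. This is not string-width's actual code, and isWide below uses only two sample ranges rather than the real EastAsianWidth.txt data:

// Illustrative only: two hard-coded sample ranges, not the generated data.
const isWide = (cp) =>
	(cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo (sample range)
	(cp >= 0x4e00 && cp <= 0x9fff); // CJK Unified Ideographs (sample range)

// Sum a width of 2 for wide characters and 1 for everything else.
const estimateWidth = (s) =>
	[...s].reduce((w, ch) => w + (isWide(ch.codePointAt(0)) ? 2 : 1), 0);

console.log(estimateWidth("abc")); // 3
console.log(estimateWidth("你好")); // 4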

get-east-asian-width utilities.js

/**
Binary search on a sorted flat array of [start, end] pairs.

@param {number[]} ranges - Flat array of inclusive [start, end] range pairs, e.g. [0, 5, 10, 20].
@param {number} codePoint - The value to search for.
@returns {boolean} Whether the value falls within any of the ranges.
*/
export const isInRange = (ranges, codePoint) => {
  let low = 0;
  let high = Math.floor(ranges.length / 2) - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const i = mid * 2;
    if (codePoint < ranges[i]) {
      high = mid - 1;
    } else if (codePoint > ranges[i + 1]) {
      low = mid + 1;
    } else {
      return true;
    }
  }

  return false;
};

Comparing Algorithms

I decided that it would be relatively simple to implement a more efficient lookup and got started comparing options:

binarySearch

This is the current library's implementation, minimally modified to take a range array and return a function that performs a code point lookup against it. The original function took both the range array and the code point, so this was an extremely simple change, and it makes it easy to swap between the other implementations.

export function binarySearch(flatRanges) {
	return (codePoint) => {
		let low = 0;
		let high = Math.floor(flatRanges.length / 2) - 1;
		while (low <= high) {
			const mid = Math.floor((low + high) / 2);
			const i = mid * 2;
			if (codePoint < flatRanges[i]) {
				high = mid - 1;
			} else if (codePoint > flatRanges[i + 1]) {
				low = mid + 1;
			} else {
				return true;
			}
		}

		return false;
	};
}

trieSearch

A trie is a search tree in which each branch represents a prefix of the search key. This implementation uses a flat, single-level trie; any range that spans more than one prefix is added under every matching branch.

const key = (codePoint) => (codePoint >>> 8) & 0xff;
export function trieSearch(flatRanges) {
	const map = new Map();
	function addRange(key, range) {
		const ranges = map.get(key) ?? map.set(key, []).get(key);
		ranges.push(range);
	}

	const min = Math.min(...flatRanges);
	const max = Math.max(...flatRanges);

	for (let i = 0; i < flatRanges.length; i += 2) {
		const range = [flatRanges[i], flatRanges[i + 1]];
		const keyStart = key(range[0]);
		const keyEnd = key(range[1]);
		for (let k = keyStart; k <= keyEnd; k++) {
			addRange(k, range);
		}
	}

	return (codePoint) => {
		if (codePoint < min || codePoint > max) {
			return false;
		}

		return (
			map
				.get(key(codePoint))
				?.find(([low, high]) => codePoint >= low && codePoint <= high) !==
			undefined
		);
	};
}
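As a worked example of the bucketing: the key is bits 8–15 of the code point, so each bucket covers a 256-code-point block, and code points above 0xFFFF fold back onto the same 256 buckets. The folding is safe because the lookup still checks the exact range bounds within a bucket; it just makes some buckets heavier.

// The bucket key is bits 8–15 of the code point.
const key = (codePoint) => (codePoint >>> 8) & 0xff;

console.log(key(0x4e00).toString(16)); // "4e" — CJK block
console.log(key(0xf600).toString(16)); // "f6"
console.log(key(0x1f600).toString(16)); // "f6" — emoji fold onto the same bucket

This folding is one reason the 0xf3–0xfa buckets in the histogram below are among the heaviest.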

This flat trie is not optimal for branches with a large number of nodes. The histogram below shows the number of ranges that fall into each branch of the trie for wide characters.

[Histogram: ranges per trie branch for wide characters — most branches hold 2–4 ranges, with hotspots at 0x26 (19), 0x27 (12), 0xf3 (11), 0xf6 (10), and 0xfa (10)]

An easy optimization would be to increase the depth of branches with a heavy concentration of ranges, limiting each node to fewer than 7 ranges. I have not pursued this because the performance metrics for the library as a whole suggest that the lookup is not the bottleneck (see benchmarking the library entrypoints).

setSearch

This is a simple solution that uses the native JavaScript Set and adds every code point within the ranges to it. We can then use set.has to determine whether a code point exists in the set.

export function setSearch(flatRanges) {
	const set = new Set();
	for (let i = 0; i < flatRanges.length; i += 2) {
		const from = flatRanges[i];
		const to = flatRanges[i + 1];
		for (let j = from; j <= to; j++) {
			set.add(j);
		}
	}

	return (codePoint) => set.has(codePoint);
}

bitflagSearch

This lookup represents the set as a Uint32Array, with each bit corresponding to a code point. The array has a bit for every code point between 0 and the maximum value across all of the ranges.

export function bitflagSearch(flatRanges) {
	const max_range = flatRanges.at(-1);
	const flags = new Uint32Array((max_range >>> 5) + 1);

	for (let i = 0, length = flatRanges.length; i < length; i += 2) {
		const from = flatRanges[i];
		const to = flatRanges[i + 1];
		for (let j = from; j <= to; j++) {
			flags[j >>> 5] |= 1 << (j & 31);
		}
	}

	return (codePoint) =>
		((flags[codePoint >>> 5] >>> (codePoint & 31)) & 1) === 1;
}
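To illustrate the bit addressing used above (the code point is chosen for illustration): each Uint32 word holds 32 code points, so the word index is cp >>> 5 and the bit position within that word is cp & 31.

// Word index = cp >>> 5, bit position = cp & 31.
const cp = 0x3042; // HIRAGANA LETTER A
const word = cp >>> 5; // 386
const bit = cp & 31; // 2

const flags = new Uint32Array(word + 1);
flags[word] |= 1 << bit;

console.log(((flags[cp >>> 5] >>> (cp & 31)) & 1) === 1); // true — bit was set
console.log(((flags[(cp + 1) >>> 5] >>> ((cp + 1) & 31)) & 1) === 1); // false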

offsetBitflagSearch

An offset bitflag search applies a small memory optimization to bitflagSearch by eliminating empty Uint32 words from the start of the array.

We achieve this by starting the Uint32Array at the smallest 32-bit word (WORD) that contains a bit corresponding to an active code point, and storing that word's offset from 0.

const WORD = 0xff_ff_ff_ff;
export function offsetBitflagSearch(flatRanges) {
	const [min, max] = [flatRanges.at(0), flatRanges.at(-1)];
	const [firstWord, lastWord] = [min >>> 5, (max >>> 5) + 1];
	const flags = new Uint32Array(lastWord - firstWord);
	const offset = firstWord << 5;

	for (let i = 0, length = flatRanges.length; i < length; i += 2) {
		const from = flatRanges[i] - offset;
		const to = flatRanges[i + 1] - offset;
		const fromWord = from >>> 5;
		const toWord = to >>> 5;

		if (fromWord === toWord) {
			flags[fromWord] |= (WORD << (from & 31)) & (WORD >>> (31 - (to & 31)));
		} else {
			flags[fromWord] |= WORD << (from & 31);
			flags.fill(WORD, fromWord + 1, toWord);
			flags[toWord] |= WORD >>> (31 - (to & 31));
		}
	}

	return (codePoint) => {
		if (codePoint < min || codePoint > max) {
			return false;
		}

		return ((flags[(codePoint - offset) >>> 5] >>> (codePoint & 31)) & 1) !== 0;
	};
}
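The single-word masking above can be sanity-checked with small numbers (illustrative values, not taken from the real ranges): setting the inclusive range [3, 5] inside one word should produce exactly bits 3–5.

const WORD = 0xff_ff_ff_ff;
const from = 3;
const to = 5;

// The lower mask keeps bits >= from; the upper mask keeps bits <= to.
const mask = (WORD << (from & 31)) & (WORD >>> (31 - (to & 31)));

console.log((mask >>> 0).toString(2)); // "111000" — bits 3, 4 and 5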

Testing the Algorithms

To confirm that the algorithms preserve behavior and do not introduce any regressions, I ran the following test:

import test from "ava";
import {
	bitflagSearch,
	binarySearch as biSearch,
	offsetBitflagSearch,
	setSearch,
	trieSearch,
} from "./lookup.js";
import {
	ambiguousRanges,
	fullwidthRanges,
	halfwidthRanges,
	narrowRanges,
	wideRanges,
} from "./lookup-data.js";

const ranges = Object.entries({
	ambiguousRanges,
	fullwidthRanges,
	halfwidthRanges,
	narrowRanges,
	wideRanges,
});
const lookups = Object.entries({
	bitflagSearch,
	offsetBitflagSearch,
	setSearch,
	trieSearch,
});

const max_range = Math.max(
	ambiguousRanges.at(-1),
	fullwidthRanges.at(-1),
	halfwidthRanges.at(-1),
	narrowRanges.at(-1),
	wideRanges.at(-1),
);
for (let [rangeName, range] of ranges) {
	const binarySearch = biSearch(range);
	for (let [lookupName, lookup] of lookups) {
		test(`${lookupName}(${rangeName}) matches binarySearch(${rangeName})`, (t) => {
			const lookupDiff = [];
			const search = lookup(range);

			for (let codePoint = 0; codePoint <= max_range + 1; codePoint++) {
				if (search(codePoint) !== binarySearch(codePoint)) {
					lookupDiff.push(
						`Expected ${lookupName}(${codePoint}) === ${binarySearch(codePoint)}`,
					);
				}
			}

			t.deepEqual(lookupDiff, [], "Expect no difference in lookup values");
		});
	}
}

Memory Usage of the Algorithms

Length of the Uint32Array used for each range:

Range            offsetBitflagSearch  bitflagSearch
ambiguousRanges  34811                34816
fullwidthRanges  1664                 2048
halfwidthRanges  1787                 2048
narrowRanges     332                  333
wideRanges       8056                 8192
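These sizes follow directly from the word-count formulas: bitflagSearch allocates (max >>> 5) + 1 words, while offsetBitflagSearch allocates (max >>> 5) + 1 - (min >>> 5). The bounds below are assumptions chosen to reproduce the wideRanges row, not values read from the data file:

// Hypothetical bounds that reproduce the wideRanges row above.
const min = 0x1100;
const max = 0x3ffff;

const bitflagWords = (max >>> 5) + 1; // bitflagSearch allocation
const offsetWords = (max >>> 5) + 1 - (min >>> 5); // offsetBitflagSearch allocation

console.log(bitflagWords); // 8192
console.log(offsetWords); // 8056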

Benchmarking the Algorithms

To compare the performance of the algorithms, I set up a benchmark with interchangeable search implementations:

lookup-benchmark.js

import { run, bench, summary } from "mitata";
import {
	bitflagSearch,
	binarySearch,
	offsetBitflagSearch,
	setSearch,
	trieSearch,
} from "./lookup.js";
import { ambiguousRanges } from "./lookup-data.js";

const searches = Object.entries({
	binarySearch,
	bitflagSearch,
	offsetBitflagSearch,
	setSearch,
	trieSearch,
});

const max_range = ambiguousRanges.at(-1);

summary(() => {
	for (const [searchName, searchFunction] of searches) {
		bench(`${searchName}`, function* () {
			const lookup = searchFunction(ambiguousRanges);

			yield () => {
				let checksum = 0;
				for (let cp = 0; cp <= max_range; cp++) {
					checksum ^= lookup(cp);
				}
				return checksum;
			};
		});
	}
});

await run();

Notice the checksum above. This is an important feature of the benchmark which I will discuss below in potential pitfalls.

The results are here:

% npm run lookup-benchmark

> get-east-asian-width@1.5.0 lookup-benchmark
> node --expose-gc lookup-benchmark.js

clk: ~4.39 GHz
cpu: Apple M4
runtime: node 25.7.0 (arm64-darwin)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
binarySearch                  28.31 ms/iter  28.33 ms  █
                      (28.23 ms … 28.63 ms)  28.42 ms  ██ █   █ █    █
                    (632.00  b …  27.51 kb)   2.04 kb █████████▁█▁█▁▁█▁▁▁▁█

bitflagSearch                  3.66 ms/iter   3.69 ms    █
                        (3.48 ms … 4.34 ms)   4.23 ms    █▂
                    (736.00  b …   6.81 kb) 768.33  b ▂▂███▇▅▃▄▂▁▂▂▂▁▁▂▁▁▁

offsetBitflagSearch            3.65 ms/iter   3.65 ms      █
                        (3.49 ms … 4.25 ms)   4.11 ms      █
                    (448.00  b …   8.72 kb) 577.84  b ▂▁▁▄▇█▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁

setSearch                     19.32 ms/iter  19.88 ms                    █▃
                      (18.58 ms … 19.96 ms)  19.94 ms    ▂▂▇▇▂   ▂       ██
                    (448.00  b …   6.53 kb) 616.65  b ▆▆▆█████▆▁▁█▆▆▆▆▁▆▁██

trieSearch                     9.64 ms/iter   9.72 ms            █
                       (9.27 ms … 10.08 ms)  10.08 ms      ▆▃    █    ▅
                    (448.00  b …   6.53 kb) 532.32  b ▄▃██▃████▃▆█▃▁▃██▁▃▃▃

summary
  offsetBitflagSearch
   1x faster than bitflagSearch
   2.64x faster than trieSearch
   5.29x faster than setSearch
   7.75x faster than binarySearch

Benchmarking the Library Entrypoints

There are some clear winners when comparing the algorithms this way, but how much faster will the library be in common use? To answer that question I developed the following performance test:

benchmark.js

import { run, bench, summary } from "mitata";
import { eastAsianWidth } from "./index.js";
import { load, binarySearch, bitflagSearch } from "./lookup.js";

const corpora = Object.entries({
	en: "Software engineering is the systematic application of engineering approaches to the development of software. It encompasses a wide range of disciplines, from requirement analysis and system design to implementation, testing, and maintenance. High-performance computing requires careful optimization of data structures, algorithms, and memory management to ensure efficient execution at scale. The quick brown fox jumps over the lazy dog repeatedly while developers analyze the underlying bytecode.",
	zh: "软件工程是将工程学的方法系统地应用于软件开发的过程。它涵盖了从需求分析、系统设计到代码实现、测试和维护的广泛领域。在高性能计算环境中，必须仔细优化数据结构、算法和内存管理，以确保在大规模运行时的执行效率。东亚字符的宽度处理在终端模拟器、文本编辑器和命令行界面中尤为关键，因为全角和半角字符在屏幕上占据的空间不同。你好，这是一个包含多种字符的测试文本。",
	es: "La ingeniería de software es la aplicación sistemática de enfoques de ingeniería para el desarrollo de software. Abarca una amplia gama de disciplinas, desde el análisis de requisitos y el diseño del sistema hasta la implementación, las pruebas y el mantenimiento. La informática de alto rendimiento requiere una cuidadosa optimización de las estructuras de datos, los algoritmos y la gestión de la memoria para garantizar una ejecución eficiente a gran escala. El pingüino vivió en el ártico; mañana será un gran día para programar.",
	emoji:
		"🚀🌟👩‍💻🏳️‍🌈🔥✨🎉🎈🍕🍔🍣🚗✈️🌍🌙☀️😎🤔😂😭❤️👍🙌🙏💪🏃‍♀️🚶‍♂️🐶🐱🐼🐒🌸🌺🌲🌵⚡️💧🌪️📱💻⌚️🕹️💡📚✂️📏👨‍👩‍👧‍👦🤦🏽‍♀️🇺🇳🥷🏿🧟‍♂️🧙‍♀️🧚‍♂️🧜‍♀️🧛‍♂️",
	mixed:
		"Agent v2.0 🚀: 欢迎使用 (Welcome to) the terminal framework. Loading configuration... [100%] ✅. 用户配置已加载。Warning: 内存使用率 (Memory usage) exceeds 80% ⚠️. Proceeding with optimization ártico.",
});
const algorithms = Object.entries({ binarySearch, bitflagSearch });

for (const [language, initialText] of corpora) {
	summary(() => {
		for (const [lookupName, lookup] of algorithms) {
			bench(`${lookupName}::${language}`, function* (state) {
				const size = state.get("size");
				const codePoints = [...initialText.repeat(size)].map((c) =>
					c.codePointAt(0),
				);
				load(lookup);

				yield () => {
					return codePoints.reduce((sum, cp) => sum + eastAsianWidth(cp), 0);
				};
			}).range("size", 256, 1024);
		}
	});
}

await run();

We must ensure the library's results are returned from the yield block so that V8 won't optimize away the unused work.
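This is the potential pitfall the checksum in the lookup benchmark guards against. A sketch of the pattern (general, not mitata-specific):

// If the benchmarked work has no observable result, the JIT is free to
// delete it entirely. Folding every result into a returned checksum keeps
// the work live.
function consumeAll(codePoints, lookup) {
	let checksum = 0;
	for (const cp of codePoints) {
		checksum ^= lookup(cp); // each result feeds the return value
	}
	return checksum;
}

console.log(consumeAll([1, 2, 3], (cp) => cp & 1)); // 1 ^ 0 ^ 1 = 0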

This benchmark shows that, in normal usage, the lookup algorithm is not a bottleneck and has a negligible effect on the performance of the library.

% npm run benchmark

> get-east-asian-width@1.5.0 benchmark
> node --expose-gc benchmark.js

clk: ~4.40 GHz
cpu: Apple M4
runtime: node 25.7.0 (arm64-darwin)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
binarySearch::en             656.08 µs/iter 660.75 µs              █▃
                    (601.54 µs … 701.79 µs) 680.71 µs             ▄██▄
                    (504.00  b …   2.09 mb)   2.57 kb ▁▁▁▁▁▁▁▁▁▁▂▃█████▄▂▂▁

binarySearch::en               2.64 ms/iter   2.66 ms       ▇▃ ▅█
                        (2.55 ms … 2.80 ms)   2.75 ms      ▅██▂██▇▃
                    (504.00  b …   2.48 mb)   9.97 kb ▂▂▂▄▅█████████▄▄▅▄▁▂▂

bitflagSearch::en            654.03 µs/iter 657.96 µs             █
                    (604.92 µs … 702.04 µs) 681.67 µs             ██▆
                    (504.00  b …   2.31 mb)   2.68 kb ▁▁▁▁▁▁▁▂▁▂▃▇███▆▃▂▂▁▁

bitflagSearch::en              2.64 ms/iter   2.66 ms         ▃▇█
                        (2.56 ms … 2.83 ms)   2.71 ms         ███▄█ ▂
                    (504.00  b …   2.39 mb)   9.67 kb ▂▂▃▃▂▃▃▄███████▅▆█▄▃▂

summary
  binarySearch::en
   1…1x faster than bitflagSearch::en

------------------------------------------- -------------------------------
binarySearch::zh             451.21 µs/iter 456.67 µs            ▆▄█▂
                    (408.88 µs … 493.58 µs) 475.04 µs            ████▃
                    (504.00  b …   1.73 mb)   1.62 kb ▂▂▂▂▁▂▂▃▂▇▆██████▆▆▃▂

binarySearch::zh               1.80 ms/iter   1.82 ms               █
                        (1.69 ms … 1.87 ms)   1.86 ms             ▇▄██▄
                    (504.00  b …   1.56 mb)   4.54 kb ▂▂▂▂▂▃▂▃▅▄▇██████▇▄▄▂

bitflagSearch::zh            453.90 µs/iter 459.58 µs              █
                    (409.21 µs … 628.54 µs) 478.92 µs            ▆▆█▂
                    (504.00  b …   1.65 mb)   1.76 kb ▂▂▁▁▁▁▂▂▂▄▇█████▇▅▄▃▂

bitflagSearch::zh              1.81 ms/iter   1.83 ms             ███▂
                        (1.69 ms … 1.88 ms)   1.87 ms            ▇█████▇
                    (504.00  b …   1.52 mb)   4.48 kb ▂▁▂▂▃▁▂▃▄▅▄████████▆▃

summary
  binarySearch::zh
   +1.01…+1.01x faster than bitflagSearch::zh

------------------------------------------- -------------------------------
binarySearch::es             729.43 µs/iter 736.25 µs          █
                    (684.96 µs … 797.29 µs) 765.67 µs         ▂█▇▆▅▃
                    (504.00  b …   2.13 mb)   2.74 kb ▁▁▁▁▁▁▂▄███████▅▄▃▂▂▁

binarySearch::es               2.91 ms/iter   2.93 ms             ▅▆█▆
                        (2.80 ms … 2.99 ms)   2.97 ms           ▅█████▇
                    (504.00  b …   1.96 mb)   8.75 kb ▃▁▂▂▄▁▃▂▄▄█████████▃▄

bitflagSearch::es            729.47 µs/iter 736.21 µs           ▆█
                    (680.67 µs … 833.13 µs) 760.75 µs           ███▄ ▃
                    (504.00  b …   2.27 mb)   3.27 kb ▁▁▁▁▁▁▂▃▄▅██████▇▄▃▂▂

bitflagSearch::es              2.91 ms/iter   2.93 ms          █▃
                        (2.82 ms … 3.07 ms)   2.99 ms         ▇██▆▂▇▂
                    (504.00  b …   1.76 mb)   7.91 kb ▂▂▂▂▂▄▅████████▃▂▅▂▂▂

summary
  bitflagSearch::es
   1…1x faster than binarySearch::es

------------------------------------------- -------------------------------
binarySearch::emoji          671.55 µs/iter 684.33 µs      ▅▄ ▇█▇▄
                    (632.33 µs … 804.17 µs) 721.96 µs  ▄ ▃████████▅▇▄
                    (120.00  b …   1.19 mb)   2.20 kb ▆██████████████▇▆▅▅▆▃

binarySearch::emoji            2.70 ms/iter   2.75 ms              ▃▃█▆
                        (2.55 ms … 2.82 ms)   2.80 ms         ▅▆█ ▄████▅
                    (120.00  b …   1.21 mb)   4.84 kb ▃▆▃▃█▄▄▄████████████▂

bitflagSearch::emoji         677.12 µs/iter 690.25 µs           ▄█▅
                    (632.71 µs … 826.08 µs) 722.71 µs       ▄▄█▆████▇
                    (120.00  b …   1.16 mb)   1.52 kb ▄█▆▆▇███████████▇▅▄▂▁

bitflagSearch::emoji           2.74 ms/iter   2.78 ms         ▃█▂
                        (2.56 ms … 2.97 ms)   2.94 ms       ▅▃███▅▅
                    (120.00  b …   1.15 mb)   4.67 kb ▃▂▃▄▆▆███████▇▇▄▂▃▃▁▃

summary
  binarySearch::emoji
   +1.01…+1.01x faster than bitflagSearch::emoji

------------------------------------------- -------------------------------
binarySearch::mixed          308.42 µs/iter 313.08 µs           █▃▂
                    (272.42 µs … 468.42 µs) 336.04 µs          ▅███▄
                    (120.00  b …   2.48 mb) 230.04 kb ▁▁▂▂▂▂▅▄▄██████▆▅▃▂▂▁

binarySearch::mixed            1.23 ms/iter   1.24 ms            █▇█
                        (1.15 ms … 1.30 ms)   1.28 ms         ▂▇▇████▆▇
                    (120.00  b …   1.65 mb)   3.06 kb ▂▁▂▂▁▃▃▆█████████▇▄▃▂

bitflagSearch::mixed         308.65 µs/iter 313.37 µs           ▂█
                    (275.29 µs … 412.12 µs) 334.42 µs          ▃███▃
                    (120.00  b …   2.48 mb) 229.96 kb ▁▁▂▁▂▄▄▅▆██████▇▅▃▂▂▁

bitflagSearch::mixed           1.24 ms/iter   1.25 ms          ▆▃█
                        (1.17 ms … 1.30 ms)   1.29 ms         ▅██████▃
                    (120.00  b …   1.78 mb)   3.31 kb ▂▂▂▂▄▄▆█████████▇▇▄▂▁

summary
  binarySearch::mixed
   +1.01…1x faster than bitflagSearch::mixed

Conclusions

It pays to verify your assumptions when trying to optimize an algorithm. I initially assumed that replacing the binary search with a more efficient lookup would result in better performance. But as the benchmarks showed, theoretical efficiency doesn't always beat practical simplicity. I'm glad I took the time to measure the actual impact before committing to a more complex solution.
