Show HN: FlashTokenizer – 10x faster C++ tokenizer for Python

github.com

5 points by springkim 4 months ago

I built a tokenizer in C++ with Python bindings that runs 10x faster than HuggingFace tokenizers on large inputs. It's optimized for low latency and minimal memory usage.

Benchmarks and a comparison are included in the README. I would love feedback or contributions!
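
For anyone who wants to check a speedup figure like this on their own data, here is a minimal timing sketch. It is not taken from the project's README and does not use FlashTokenizer's or HuggingFace's actual APIs: `best_time`, `speedup`, and the two stand-in tokenizers are hypothetical names made up for illustration. Swap in the real tokenizer callables to compare them.

```python
import time
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]


def best_time(tokenize: Tokenizer, texts: Sequence[str], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over `repeats` full passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline: Tokenizer, candidate: Tokenizer,
            texts: Sequence[str], repeats: int = 5) -> float:
    """Ratio baseline_time / candidate_time; values above 1 mean candidate is faster."""
    baseline_s = best_time(baseline, texts, repeats)
    # Guard against a zero reading on very small inputs or coarse clocks.
    candidate_s = max(best_time(candidate, texts, repeats), 1e-12)
    return baseline_s / candidate_s


# Hypothetical stand-ins so the sketch runs without any third-party package.
def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def char_tokenize(text: str) -> List[str]:
    return list(text)


if __name__ == "__main__":
    # "Large inputs": 50 documents of roughly 9,000 characters each.
    corpus = ["the quick brown fox jumps over the lazy dog " * 200] * 50
    ratio = speedup(char_tokenize, whitespace_tokenize, corpus)
    print(f"candidate is {ratio:.2f}x the speed of the baseline")
```

Taking the best of several passes reduces noise from other processes. Measuring on inputs that match your real workload matters, since the claim here is specifically about large inputs.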