I love reading stuff on the internet.
The web is messy, though. Articles get buried under navigation menus, ads, cookie banners, and whatever else modern websites throw at you. Mozilla's Reader Mode is wonderful at cutting through this clutter, and it's genuinely impressive how well it works across different sites.
The thing is, the library behind it, Readability.js, is written in JavaScript.
For Kindlepathy, I initially used Readability.js by spinning up a Bun HTTP server as a child process. It worked, but felt a bit messy for what is essentially a function call.
This time around, I'm using the embedded JavaScript runtime approach with rquickjs, packaged as a proper Rust library.
use readability_js::Readability;
let reader = Readability::new()?;
let html = std::fs::read_to_string("messy-article.html")?;
let article = reader.parse_with_url(&html, "https://example.com")?;
println!("Title: {}", article.title);
println!("Clean content: {}", article.content);
Why not use a Rust port? There are some good ones that perform better. But I wanted to stick with the battle-tested Mozilla implementation I'd already been using. At ~30ms to initialize the JavaScript engine and ~10ms per document after that, the overhead isn't too bad.
The CLI is pretty handy too. It converts the clean HTML into Markdown, which is excellent for terminal usage:
# installs the `readable` binary
cargo install readability-js-cli
# Extract from URL, output clean Markdown
readable https://egemengol.com/blog/readability/ > clean.md
# Pipeline friendly
curl -s https://news.site/story | readable | bat -l markdown
Bonus: Great for AI applications that analyze web content. By removing styling and excess markup, you can cut token usage by ~70% while keeping all the meaningful text.
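To sanity-check a number like that on your own pages, a rough sketch along these lines could work. It uses the common "about 4 characters per token" rule of thumb rather than a real tokenizer, and the `estimate_tokens` and `savings_percent` helpers plus the sample strings are made up for illustration; they are not part of readability-js.

```rust
/// Rough token estimate, assuming ~4 characters per token.
/// This is a rule of thumb for English text, not a real tokenizer.
fn estimate_tokens(text: &str) -> usize {
    // Round up so that any non-empty text counts as at least one token.
    (text.chars().count() + 3) / 4
}

/// Percentage of estimated tokens saved by going from `raw` to `clean`.
/// Returns 0.0 for empty input to avoid dividing by zero.
fn savings_percent(raw: &str, clean: &str) -> f64 {
    let raw_tokens = estimate_tokens(raw);
    if raw_tokens == 0 {
        return 0.0;
    }
    let clean_tokens = estimate_tokens(clean);
    100.0 * (1.0 - clean_tokens as f64 / raw_tokens as f64)
}

fn main() {
    // Hypothetical inputs: in practice `raw` would be the fetched page and
    // `clean` the extracted article text or Markdown.
    let raw = "<html><head><style>body{margin:0}</style></head>\
               <body><nav><a href=\"/\">Home</a></nav>\
               <article><p>Hello, world.</p></article></body></html>";
    let clean = "Hello, world.";

    println!("raw:   ~{} tokens", estimate_tokens(raw));
    println!("clean: ~{} tokens", estimate_tokens(clean));
    println!("saved: ~{:.0}%", savings_percent(raw, clean));
}
```

Real savings depend heavily on the page and on the tokenizer your model uses, so treat the output as a ballpark figure.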