Plain English
The HTML Keyless Extractor turns HTML source into a reviewable compact-data candidate without treating markup as semantic truth.
Technical summary
It separates raw parse evidence, cleaned visible text, normalization choice, registry candidates, warnings, and the final positional array so a reviewer can see what was preserved and what was compressed.
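One way to picture the separated outputs is as a single report object with one field per concern. The sketch below is illustrative only: the type names, field names, the `summarize` helper, and the `demo:greeting` identifier are all assumptions and are not taken from the prototype.

```typescript
// Hypothetical report shape; every name here is an assumption for illustration.
type NormalizationForm = "NFC" | "NFKC";

interface RegistryCandidate {
  unitIndex: number;  // position of the matched unit in the positional array
  registryId: string; // identifier in a demo registry (assumed format)
}

interface ExtractionReport {
  rawTagCounts: Record<string, number>;    // raw parse evidence: tag name -> count
  visibleText: string;                     // cleaned visible text
  normalization: NormalizationForm;        // the normalization choice that was applied
  registryCandidates: RegistryCandidate[]; // kept apart from the array itself
  warnings: string[];
  positionalArray: string[];               // the final keyless array
}

// Produce a one-line overview a reviewer could scan before reading the details.
function summarize(report: ExtractionReport): string {
  const tagTotal = Object.keys(report.rawTagCounts).reduce(
    (sum, tag) => sum + (report.rawTagCounts[tag] ?? 0),
    0,
  );
  return [
    `${tagTotal} source tags`,
    `${report.positionalArray.length} units (${report.normalization})`,
    `${report.registryCandidates.length} registry candidates`,
    `${report.warnings.length} warnings`,
  ].join(", ");
}

const exampleReport: ExtractionReport = {
  rawTagCounts: { p: 2, script: 1 },
  visibleText: "Hello world. Second unit.",
  normalization: "NFC",
  registryCandidates: [{ unitIndex: 0, registryId: "demo:greeting" }],
  warnings: ["script element removed from visible text"],
  positionalArray: ["Hello world.", "Second unit."],
};

console.log(summarize(exampleReport));
```

Keeping registry candidates in their own field, rather than inside the array, mirrors the separation described above: the array stays positional and the interpretation stays reviewable.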
Deep spec
This is a local implementation prototype. It does not replace schema validation, signed registry snapshots, production canonicalization, or UAIX.org protocol authority.
Extractor stages
- Parse pasted HTML in the browser and inventory source tags.
- Remove script-like and invisible content from the visible-text candidate.
- Normalize text with NFC or NFKC and split it into reviewable units.
- Attach demo registry hits separately from the positional keyless array.
- Warn when raw HTML evidence and cleaned text are not enough for production reuse.
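The normalization stage in the list above can be sketched with the standard `String.prototype.normalize` method. This is a minimal sketch under stated assumptions: the function name `normalizeAndSplit` is hypothetical, the input is assumed to be visible text that has already been cleaned, and the sentence-splitting rule is deliberately naive (it would also split on the period in a decimal number).

```typescript
type NormalizationForm = "NFC" | "NFKC";

// Normalize cleaned text, then split it into reviewable units.
// Splitting rule (an assumption): a unit ends at ., !, ? or a line break.
function normalizeAndSplit(text: string, form: NormalizationForm): string[] {
  const normalized = text.normalize(form);
  const rawUnits = normalized.match(/[^.!?\n]+[.!?]*/g) ?? [];
  return rawUnits
    .map((unit) => unit.replace(/\s+/g, " ").trim()) // collapse inner whitespace
    .filter((unit) => unit.length > 0);              // drop empty fragments
}

// NFKC folds compatibility characters such as the "ﬁ" ligature (U+FB01) into "fi";
// NFC only composes canonical sequences and leaves the ligature untouched.
const sample = "\uFB01ne print. Cafe\u0301 open!\nThird unit";
console.log(normalizeAndSplit(sample, "NFKC"));
console.log(normalizeAndSplit(sample, "NFC"));
```

The choice between the two forms matters for review: NFKC is more aggressive and can erase distinctions present in the source, which is one reason to record the chosen form alongside the output.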
HTML Keyless Extractor
Turn HTML into a reviewable keyless candidate
This browser-only prototype follows the intake reports: parse HTML as source evidence, clean visible text, normalize it, then emit a compact positional array that still depends on declared registry meaning.
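To illustrate why a positional array still depends on declared registry meaning, here is a small sketch in which a registry entry declares the field order and the array carries only values. The `RegistryEntry` shape, the `demo:price` identifier, and both helper functions are hypothetical and are not taken from the prototype.

```typescript
// Hypothetical registry entry: declares what each array position means.
interface RegistryEntry {
  id: string;
  fields: readonly string[];
}

// Drop the keys and keep only values, in the order the registry declares.
function toPositional(
  record: Record<string, string>,
  entry: RegistryEntry,
): (string | null)[] {
  return entry.fields.map((field) => record[field] ?? null);
}

// Recover the keys. Without the registry entry the array alone is ambiguous.
function fromPositional(
  values: readonly (string | null)[],
  entry: RegistryEntry,
): Record<string, string | null> {
  if (values.length !== entry.fields.length) {
    throw new Error(`expected ${entry.fields.length} values for ${entry.id}`);
  }
  const result: Record<string, string | null> = {};
  entry.fields.forEach((field, index) => {
    result[field] = values[index] ?? null;
  });
  return result;
}

const demoEntry: RegistryEntry = {
  id: "demo:price",
  fields: ["name", "amount", "currency"],
};

const compact = toPositional(
  { name: "Widget", amount: "42", currency: "USD" },
  demoEntry,
);
console.log(compact);
console.log(fromPositional(compact, demoEntry));
```

The array is compact precisely because the keys live elsewhere, so a reviewer needs both the array and the registry entry to confirm what each position means.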