HTML Keyless Extractor

Browser-only prototype for converting pasted HTML into raw DOM evidence, cleaned text records, and positional keyless arrays.

Record: JAI-TOOL-0024

Path: /tools/html-keyless-extractor/

Use: Canonical public record

Plain English

The HTML Keyless Extractor turns HTML source into a reviewable compact-data candidate without treating markup as semantic truth.

Technical summary

It separates raw parse evidence, cleaned visible text, normalization choice, registry candidates, warnings, and the final positional array so a reviewer can see what was preserved and what was compressed.
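This separation can be pictured as a result type. In it, registry candidates point into the positional array by index instead of living inside it. The following is a minimal TypeScript sketch. The field names, the `assembleResult` helper, and the `demo:greeting` id are hypothetical and do not describe the prototype's actual output schema.

```typescript
// Hypothetical result shape; field names are illustrative, not the tool's schema.
type NormalizationForm = "NFC" | "NFKC";

interface RegistryCandidate {
  unitIndex: number; // index into `keyless`, so the array itself stays unkeyed
  term: string; // matched text (demo registry only)
  registryId: string; // hypothetical registry identifier
}

interface ExtractionResult {
  rawEvidence: { tagCounts: Record<string, number> }; // parse inventory, kept as seen
  cleanedText: string; // visible-text candidate
  normalization: NormalizationForm; // recorded explicitly, not implied
  registryCandidates: RegistryCandidate[]; // side channel, outside the array
  warnings: string[];
  keyless: string[]; // positional units only; meaning comes from order
}

// Assemble a result (copying inputs) after checking every candidate points at a real unit.
function assembleResult(
  tagCounts: Record<string, number>,
  cleanedText: string,
  normalization: NormalizationForm,
  units: string[],
  registryCandidates: RegistryCandidate[],
  warnings: string[],
): ExtractionResult {
  for (const c of registryCandidates) {
    if (c.unitIndex < 0 || c.unitIndex >= units.length) {
      throw new RangeError(`candidate ${c.registryId} points outside the array`);
    }
  }
  return {
    rawEvidence: { tagCounts: { ...tagCounts } },
    cleanedText,
    normalization,
    registryCandidates: registryCandidates.slice(),
    warnings: warnings.slice(),
    keyless: units.slice(),
  };
}

const result = assembleResult(
  { p: 2, script: 1 },
  "Hello world. Second line.",
  "NFC",
  ["Hello world.", "Second line."],
  [{ unitIndex: 0, term: "Hello", registryId: "demo:greeting" }],
  [],
);
console.log(JSON.stringify(result.keyless)); // ["Hello world.","Second line."]
```

The registry id never appears in `keyless`. Because of this, a reviewer can compare the raw evidence, the cleaned text, and the compressed array side by side without the registry layer altering the array.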

Deep spec

This is a local implementation prototype. It does not replace schema validation, signed registry snapshots, production canonicalization, or UAIX.org protocol authority.

Extractor stages

Parse pasted HTML in the browser and inventory source tags.
Remove script-like and invisible source content from the visible-text candidate.
Normalize text with NFC or NFKC and split it into reviewable units.
Attach demo registry hits separately from the positional keyless array.
Warn when raw HTML evidence and cleaned text are not enough for production reuse.
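The five stages can be sketched end to end as follows. This is an illustration under stated assumptions, not the prototype's code. The real tool parses in the browser. This sketch uses regular expressions so that it runs anywhere, and regular expressions are not a safe general-purpose HTML parser. `extract`, `DEMO_REGISTRY`, and the `demo:keyless` id are hypothetical. Units are split at block-level closing tags. The sketch raises a warning when NFKC changes text that NFC would keep, for example when the `ﬁ` ligature becomes `fi`.

```typescript
// Simplified sketch of the extractor stages. ASSUMPTION: regexes stand in for the
// browser parser; they are NOT a safe HTML parser. Registry and ids are hypothetical.
type NormalizationForm = "NFC" | "NFKC";

interface Extraction {
  tagCounts: Record<string, number>;
  keyless: string[];
  registryCandidates: { unitIndex: number; term: string; registryId: string }[];
  warnings: string[];
}

const DEMO_REGISTRY = new Map<string, string>([["keyless", "demo:keyless"]]);

function extract(html: string, form: NormalizationForm): Extraction {
  // Stage 1: inventory opening tags by lower-cased name (raw evidence).
  const tagCounts: Record<string, number> = {};
  const tagRe = /<([a-zA-Z][a-zA-Z0-9-]*)/g;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    const name = m[1].toLowerCase();
    tagCounts[name] = (tagCounts[name] ?? 0) + 1;
  }

  // Stage 2: drop comments, script-like blocks, and elements carrying a `hidden` attribute.
  const visible = html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style|template|noscript)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<([a-z][a-z0-9-]*)\b[^>]*\shidden\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6]|tr)\s*>/gi, "\n") // block ends become unit breaks
    .replace(/<[^>]*>/g, " ")
    .replace(/&nbsp;/g, " ").replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");

  // Stage 3: normalize each line, collapse whitespace, keep non-empty units.
  let changedByForm = false;
  const keyless: string[] = [];
  for (const line of visible.split("\n")) {
    const clean = (f: NormalizationForm) => line.normalize(f).replace(/\s+/g, " ").trim();
    const unit = clean(form);
    if (unit !== clean("NFC")) changedByForm = true; // only possible when form is NFKC
    if (unit) keyless.push(unit);
  }

  // Stage 4: registry hits sit beside the array and point at units by index.
  const registryCandidates: Extraction["registryCandidates"] = [];
  keyless.forEach((unit, unitIndex) => {
    DEMO_REGISTRY.forEach((registryId, term) => {
      if (unit.toLowerCase().includes(term)) registryCandidates.push({ unitIndex, term, registryId });
    });
  });

  // Stage 5: flag gaps that block production reuse.
  const warnings: string[] = ["Not schema-validated or checked against a signed registry snapshot."];
  if (keyless.length === 0) warnings.push("No visible text survived cleaning.");
  if (changedByForm) warnings.push("NFKC changed characters that NFC would keep; review compatibility mappings.");
  if ((tagCounts["script"] ?? 0) > 0) warnings.push("Script content was removed from the visible-text candidate.");
  return { tagCounts, keyless, registryCandidates, warnings };
}

const out = extract(
  "<h1>Keyless demo</h1><script>alert(1)</script><p>The \uFB01rst unit.</p><p hidden>secret</p>",
  "NFKC",
);
console.log(JSON.stringify(out.keyless)); // ["Keyless demo","The first unit."]
```

With `"NFC"` instead, the ligature survives as `\uFB01` and the NFKC warning is not raised. Leaving the choice of form to the reviewer, and recording it, makes compatibility folding visible instead of silent.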

Extractor stages

Turn HTML into a reviewable keyless candidate