wodHtmlParser is the main parsing engine. It will Load a document
from disk (or use in-memory data assigned to the Body property) and
create a collection of wodHtmlEntity objects. Each entity contains its
Type, Text, and possibly a few attributes.
Parsing is done recursively: each entity may contain its own child
entities. All entities are listed through the main wodHtmlParser's
Parts property, and each is also accessible through its parent entity.
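The idea of a flat list of entities that is also reachable as a parent/child tree can be sketched in Python. This is only a conceptual illustration built on the standard html.parser module, not wodHtmlParser's own API: the Entity and TreeBuilder classes and the parts, roots, and children names are hypothetical and chosen for this sketch.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Entity:
    """Hypothetical stand-in for a parsed entity (not the real wodHtmlEntity)."""
    type: str
    attributes: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Collects every entity in a flat list and also links it to its parent."""

    def __init__(self):
        super().__init__()
        self.parts = []    # every entity, in document order
        self.roots = []    # top-level entities only
        self._stack = []   # entities that are currently open

    def handle_starttag(self, tag, attrs):
        entity = Entity(type=tag.upper(), attributes=dict(attrs))
        self.parts.append(entity)
        # Attach to the innermost open entity, or to the root list.
        siblings = self._stack[-1].children if self._stack else self.roots
        siblings.append(entity)
        # Note: void elements such as IMG are not special-cased in this sketch.
        self._stack.append(entity)

    def handle_endtag(self, tag):
        # Close the nearest open entity of the same type, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i].type == tag.upper():
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to the innermost open entity.
        if self._stack:
            self._stack[-1].text += data


builder = TreeBuilder()
builder.feed("<div><p>Hello <b>world</b></p></div>")
builder.close()

# The same B entity is reachable from the flat list and through its parents.
flat_b = builder.parts[2]
nested_b = builder.roots[0].children[0].children[0]
```

Here flat_b and nested_b refer to the same object, which mirrors how an entity can be found both in the complete list and under its parent.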
For example, when wodHtmlParser encounters a tag like this:
<IMG src="image.jpg" border=0>
it will create a new wodHtmlEntity object, set its type to IMG, create
two attributes (src and border), set its start and end positions, and
try to extract readable text from it.
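To make the IMG example concrete, the following sketch shows what type, attributes, and start/end positions look like for that tag. It uses Python's standard html.parser module rather than wodHtmlParser itself. The ImgInspector class and the dictionary keys are hypothetical, and the positions are character offsets that assume single-line input.

```python
from html.parser import HTMLParser


class ImgInspector(HTMLParser):
    """Records type, attributes, and position for each start tag it sees."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports (line, column) of the tag being handled;
        # for single-line input the column is the character offset.
        _line, start = self.getpos()
        raw = self.get_starttag_text()  # the tag exactly as written
        self.found.append({
            "type": tag.upper(),
            "attributes": dict(attrs),
            "start": start,
            "end": start + len(raw),
        })


source = '<IMG src="image.jpg" border=0>'
inspector = ImgInspector()
inspector.feed(source)
inspector.close()

entity = inspector.found[0]
print(entity["type"], entity["attributes"], entity["start"], entity["end"])
```

The single tag yields one record of type IMG with the two attributes src and border, starting at offset 0 and ending at the length of the tag text.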