Too many cooks spoil the DOM (getting content out of page elements)

I frequently find myself wanting to refer to the contents of an element on a web page, having got hold of that element in the DOM. However, I equally frequently forget how to access this content.

The confusion arises because there are various ways to get hold of the content of an element: nodeValue, value, data, textContent, innerText and innerHTML all seem to do the job in different circumstances. In the hope that I’m not the only one with this problem, I looked into the matter…

In a nutshell:

  • nodeValue returns the value of an attribute, text or comment node, and null for an element (read/write)
  • value returns the text content of an attribute (read)
  • data is used to access the text content of a text node (read/write)
  • textContent returns the text content of a node and its descendants (read/write)
  • innerText returns the text content of a node and its descendants (read/write)
  • innerHTML returns the HTML contained by an element (read/write)

nodeValue, counter-intuitively, does not return the text content of an element such as a paragraph; that is, it is not a W3C analogue of innerHTML. For an element it returns null, because the text lives in the element’s child text nodes. nodeValue is in the DOM Core specification and is well supported by browsers. It is a read/write attribute.

This table from developer.mozilla.org shows the return value of accessing nodeValue on various nodes:

Node type               nodeValue
Attr                    value of attribute
CDATASection            content of the CDATA Section
Comment                 content of the comment
Document                null
DocumentFragment        null
DocumentType            null
Element                 null
NamedNodeMap            null
EntityReference         null
Notation                null
ProcessingInstruction   entire content excluding the target
Text                    content of the text node
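
As a rough illustration of the table, the sketch below looks for the first text node among an element’s children and returns its nodeValue; asking the element itself would give null. The helper name firstTextValue is my own invention, not part of any DOM API. The constant 3 is the standard nodeType value for text nodes.

```javascript
// Hypothetical helper (not part of the DOM): returns the nodeValue of the
// first child text node of `el`, or null if `el` has no text node children.
var TEXT_NODE = 3; // standard nodeType value for text nodes

function firstTextValue(el) {
  var children = el.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    if (children[i].nodeType === TEXT_NODE) {
      return children[i].nodeValue;
    }
  }
  return null;
}
```

Given a paragraph such as <p id="intro">Hello</p>, calling firstTextValue(document.getElementById("intro")) should give the text of the paragraph, whereas the paragraph’s own nodeValue is null.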

value returns the text value of a node’s attribute. This does not work with text nodes. value is in the DOM Core Specification and it is very poorly supported in Windows IE, although Firefox does support it. It is a read-only attribute.
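
For comparison, here is a small sketch of reading an attribute through its attribute node. getAttributeNode is a standard DOM method; the wrapper attrValue is a made-up name for illustration only. In practice, calling getAttribute with the attribute name usually gets you the same string more directly.

```javascript
// Hypothetical helper: reads an attribute via its Attr node.
// Returns null when the element has no attribute called `name`.
function attrValue(el, name) {
  var attr = el.getAttributeNode(name);
  return attr ? attr.value : null;
}
```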

data is used to address the content of a text node. It is the same as nodeValue for that node. data is in the DOM Core Specification and it is well supported by browsers. It is a read/write attribute.

textContent and innerText return the text contained in a node and its descendants, ignoring any contained tags. textContent is in the DOM Level 3 Core Specification, but it is not really supported at all except in Firefox. innerText is well supported by browsers, the exception being buggy support in Mozilla 1.75. Both are read/write attributes.
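
One way to cope with the patchy support is to test for each property in turn and, failing both, walk the child nodes by hand. The sketch below is only that, a sketch: getText and collectText are hypothetical helper names, and checking typeof rather than truthiness is my own choice, so that an element with empty text is not mistaken for an unsupported property.

```javascript
// Hypothetical helpers, not part of the DOM.

// Concatenates the contents of all text (3) and CDATA (4) nodes under
// `node`, in document order. Comments and other node types contribute nothing.
function collectText(node) {
  if (node.nodeType === 3 || node.nodeType === 4) {
    return node.nodeValue;
  }
  var text = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// Prefers textContent, then innerText, then the manual walk above.
function getText(el) {
  if (typeof el.textContent === "string") {
    return el.textContent;
  }
  if (typeof el.innerText === "string") {
    return el.innerText;
  }
  return collectText(el);
}
```

The manual walk is there to show what “the text of a node and its descendants” amounts to; the two built-in properties do not always return exactly the same string as each other, so treat the result as approximate.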

innerHTML returns the markup contained in a node and its descendants. Whilst innerHTML is not in a W3C specification, most browsers support it and it is often faster than W3C DOM methods. It is a read/write attribute.

I’m heavily indebted to quirksmode.org and the W3C DOM specifications for this information.

5 Comments

  1. FND
    Posted September 5, 2007 at 5:38 pm

    Thanks for this; this goes straight into my WebDev tiddler…

  2. Posted September 6, 2007 at 11:38 am

    Thanks Fred!

  3. Posted May 30, 2008 at 2:58 pm

    FWIW, you’ll mostly end up using something like var text = el.innerText || el.textContent;

  4. Posted May 30, 2008 at 4:32 pm

    Nice one FND, that’s a pretty darn useful point.

  5. Posted June 11, 2008 at 12:32 pm

    Apparently, node.text works as well in IE:
    var text = el.text || el.textContent;