#String Analysis

xsl:analyze-string applies a regular expression to a string and processes the matched and unmatched portions separately. It is the XSLT equivalent of iterating over regex matches — you get fine-grained control over how each match is transformed into output.

#Contents

xsl:analyze-string
xsl:matching-substring
xsl:non-matching-substring
Capture Groups
Regex Flags
Comparison with replace()
Common Patterns

#xsl:analyze-string

The instruction takes a string and a regular expression. It splits the string into alternating matching and non-matching segments, then processes each segment with the corresponding child instruction.

<xsl:analyze-string select="$input" regex="pattern">
  <xsl:matching-substring>
    <!-- what to do with matched portions -->
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <!-- what to do with unmatched portions -->
  </xsl:non-matching-substring>
</xsl:analyze-string>

#Attributes

Attribute	Description
`select`	XPath expression that evaluates to the string to analyze
`regex`	The regular expression pattern (uses XPath/XML Schema regex syntax)
`flags`	Optional regex flags: `i`, `m`, `s`, `x`

#Basic Example

Detect and linkify email addresses in text:

<xsl:template match="description">
  <p>
    <xsl:analyze-string select="." regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{{2,}}">
      <xsl:matching-substring>
        <a href="mailto:{.}"><xsl:value-of select="."/></a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </p>
</xsl:template>

Given <description>Contact us at [email protected] or [email protected] for details.</description>, output:

<p>Contact us at <a href="mailto:[email protected]">[email protected]</a>
or <a href="mailto:[email protected]">[email protected]</a> for details.</p>

Important: Inside the regex attribute, curly braces must be doubled ({{ and }}) because the attribute is an AVT. A single { would be interpreted as the start of an XPath expression.
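One way to avoid doubling braces is to keep the pattern in a variable and reference it from the attribute. The AVT then evaluates the variable, and braces inside its value are not parsed again. A minimal sketch, in which the variable name `email-regex` is an assumption made for illustration:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical variable name. select is not an AVT, so braces are written once. -->
  <xsl:variable name="email-regex"
      select="'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'"/>

  <xsl:template match="description">
    <p>
      <!-- The AVT evaluates $email-regex; the braces in its value are not re-parsed -->
      <xsl:analyze-string select="." regex="{$email-regex}">
        <xsl:matching-substring>
          <a href="mailto:{.}"><xsl:value-of select="."/></a>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

This also lets the same pattern be shared with matches() or replace() calls elsewhere in the stylesheet.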

C# parallel:

var result = Regex.Replace(input, @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    match => $"<a href=\"mailto:{match.Value}\">{match.Value}</a>");

#xsl:matching-substring

Processes each portion of the string that matches the regex. Inside this element:

. (dot) refers to the matched substring (as a string, not a node)
regex-group(N) returns the Nth capture group

<xsl:analyze-string select="'Price: $29.99, Discount: $5.00'" regex="\$(\d+\.\d{{2}})">
  <xsl:matching-substring>
    <span class="price"><xsl:value-of select="."/></span>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>

Output:

Price: <span class="price">$29.99</span>, Discount: <span class="price">$5.00</span>

#xsl:non-matching-substring

Processes each portion of the string between matches. If omitted, non-matching text is discarded.

<!-- Extract only the numbers, ignoring everything else -->
<xsl:analyze-string select="'Order #12345, Items: 3, Total: $99.50'" regex="\d+\.?\d*">
  <xsl:matching-substring>
    <number><xsl:value-of select="."/></number>
  </xsl:matching-substring>
  <!-- non-matching-substring omitted: non-numeric text is dropped -->
</xsl:analyze-string>

Output:

<number>12345</number>
<number>3</number>
<number>99.50</number>

#Both Are Optional

Each of the two children is optional, but at least one must be present; omitting both is a static error. When both appear, xsl:matching-substring must come first, followed by xsl:non-matching-substring.
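As a sketch of the form that uses only xsl:non-matching-substring, the following matches the separators of a comma-separated string and emits only the text between them. The input string and the `items`/`item` element names are illustrative assumptions:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <items>
      <!-- The regex matches the separators; with no matching-substring child they are dropped -->
      <xsl:analyze-string select="'red, green,blue'" regex=",\s*">
        <xsl:non-matching-substring>
          <item><xsl:value-of select="."/></item>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </items>
  </xsl:template>
</xsl:stylesheet>
```

The result contains three item elements: red, green, and blue. For plain strings, tokenize('red, green,blue', ',\s*') yields the same three values without any markup.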

#Capture Groups

Parentheses in the regex define capture groups. The regex-group(N) function returns the string matched by the Nth group. Group 0 is the entire match (same as .).

#Example: Parsing a Date String

<xsl:analyze-string select="'2026-03-19'" regex="(\d{{4}})-(\d{{2}})-(\d{{2}})">
  <xsl:matching-substring>
    <date>
      <year><xsl:value-of select="regex-group(1)"/></year>
      <month><xsl:value-of select="regex-group(2)"/></month>
      <day><xsl:value-of select="regex-group(3)"/></day>
    </date>
  </xsl:matching-substring>
</xsl:analyze-string>

Output:

<date>
  <year>2026</year>
  <month>03</month>
  <day>19</day>
</date>

#Example: Parsing Key-Value Pairs

<xsl:variable name="input" select="'name=Widget;category=electronics;price=29.99'"/>
<xsl:analyze-string select="$input" regex="([^=;]+)=([^;]+)">
  <xsl:matching-substring>
    <field name="{regex-group(1)}">
      <xsl:value-of select="regex-group(2)"/>
    </field>
  </xsl:matching-substring>
</xsl:analyze-string>

Output:

<field name="name">Widget</field>
<field name="category">electronics</field>
<field name="price">29.99</field>

#Nested Groups

Groups are numbered by the position of their opening parenthesis, left to right:

<!-- regex: ((https?)://([^/]+))(/.*)? -->
<!-- Group 1: scheme and authority (https://example.com) -->
<!-- Group 2: scheme (https) -->
<!-- Group 3: host (example.com) -->
<!-- Group 4: path (/products/123) -->
<xsl:analyze-string select="$url" regex="((https?)://([^/]+))(/.*)?">
  <xsl:matching-substring>
    <url>
      <scheme><xsl:value-of select="regex-group(2)"/></scheme>
      <host><xsl:value-of select="regex-group(3)"/></host>
      <path><xsl:value-of select="regex-group(4)"/></path>
    </url>
  </xsl:matching-substring>
</xsl:analyze-string>

C# parallel: regex-group(N) is equivalent to match.Groups[N].Value:

var match = Regex.Match(url, @"((https?)://([^/]+))(/.*)?");
var scheme = match.Groups[2].Value;  // "https"
var host = match.Groups[3].Value;    // "example.com"
var path = match.Groups[4].Value;    // "/products/123"

#Regex Flags

The flags attribute accepts a string of flag characters that modify regex behavior. These are the same flags used by the XPath matches(), replace(), and tokenize() functions:

Flag	Name	Description
`i`	Case-insensitive	`a` matches both `a` and `A`
`m`	Multi-line	`^` and `$` match start/end of each line, not just the string
`s`	Dot-all	`.` matches newline characters (by default it does not)
`x`	Extended	Whitespace in the regex is ignored (for readability); use `\s` for literal whitespace

#Examples

<!-- Case-insensitive matching -->
<xsl:analyze-string select="$text" regex="error|warning|info" flags="i">
  <xsl:matching-substring>
    <span class="{lower-case(.)}"><xsl:value-of select="."/></span>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
<!-- Matches "Error", "WARNING", "Info", etc. -->

<!-- Extended mode for readable complex patterns -->
<xsl:analyze-string select="$text" flags="x"
    regex="(\d{{1,3}}) \.(\d{{1,3}}) \.(\d{{1,3}}) \.(\d{{1,3}})">
  <xsl:matching-substring>
    <ip><xsl:value-of select="."/></ip>
  </xsl:matching-substring>
</xsl:analyze-string>
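The `m` flag is not covered by the examples above, so here is a hedged sketch of it. The `headers` variable and the element names are assumptions made for illustration; the input holds three lines separated by newline characters.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Illustrative input: &#10; is a newline character -->
  <xsl:variable name="headers"
      select="'Host: example.com&#10;Accept: text/html&#10;X-Debug: 1'"/>

  <xsl:template match="/">
    <headers>
      <!-- With flags="m", ^ and $ anchor at each line rather than at the whole string -->
      <xsl:analyze-string select="$headers" regex="^([^:\n]+):\s*(.*)$" flags="m">
        <xsl:matching-substring>
          <header name="{regex-group(1)}">
            <xsl:value-of select="regex-group(2)"/>
          </header>
        </xsl:matching-substring>
        <!-- No non-matching-substring: the newlines between lines are dropped -->
      </xsl:analyze-string>
    </headers>
  </xsl:template>
</xsl:stylesheet>
```

Each line becomes one header element (Host, Accept, X-Debug). Without the `s` flag, `.` stops at the newline, so `(.*)` captures only the rest of the current line.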

#Regex Syntax Notes

XSLT uses XML Schema regular expressions (with XPath extensions), not Perl-compatible regexes. Key differences from C#'s System.Text.RegularExpressions:

Feature	C# (.NET Regex)	XSLT/XPath Regex
Anchors	`^`, `$`	`^`, `$` (match the start/end of the whole string by default, or of each line with the `m` flag; patterns are not implicitly anchored)
Lazy quantifiers	`*?`, `+?`	`*?`, `+?` (supported)
Backreferences	`\1`, `\2`	`\1`, `\2` in the pattern; `replace()` uses `$1`, `$2` in the replacement string
Lookahead/lookbehind	`(?=...)`, `(?<=...)`	Not supported
Named groups	`(?<name>...)`	Not supported
Word boundaries	`\b`, `\B`	Not supported
Character class subtraction	`[a-z-[aeiou]]`	`[a-z-[aeiou]]` (vowels removed from a-z)
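Character class subtraction is missing from many other regex dialects, so a small example may help. This sketch, with an illustrative input string and hypothetical `runs`/`run` element names, extracts runs of lowercase consonants:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <runs>
      <!-- [a-z-[aeiou]] means: the letters a to z, minus the vowels -->
      <xsl:analyze-string select="'rhythm and blues'" regex="[a-z-[aeiou]]+">
        <xsl:matching-substring>
          <run><xsl:value-of select="."/></run>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </runs>
  </xsl:template>
</xsl:stylesheet>
```

The matches are rhythm, nd, bl, and s; the vowels and spaces fall into the non-matching segments and are dropped.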

The lack of lookahead/lookbehind means some complex patterns require different approaches in XSLT — often by using xsl:analyze-string with simpler patterns and handling the logic in the matching/non-matching substring bodies.
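As a sketch of that approach, the following wraps plain numbers in a span but leaves percentages alone, a job that might otherwise call for a negative lookahead. The pattern captures an optional trailing `%` and the body decides what to do with it. The input string and the `count` class name are illustrative assumptions:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <p>
      <!-- Group 2 captures an optional trailing % so the body can branch on it -->
      <xsl:analyze-string select="'Sold 120 units, up 15% from 2024'" regex="(\d+)(%?)">
        <xsl:matching-substring>
          <xsl:choose>
            <xsl:when test="regex-group(2) = '%'">
              <!-- Percentage: copy through unchanged -->
              <xsl:value-of select="."/>
            </xsl:when>
            <xsl:otherwise>
              <span class="count"><xsl:value-of select="regex-group(1)"/></span>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Here 120 and 2024 are wrapped in span elements, while 15% passes through as plain text.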

#Comparison with replace()

XPath provides a replace() function for simple regex substitutions. When should you use each?

#replace()

Best for simple text-to-text substitutions where the output is a plain string:

<!-- Simple substitution (every occurrence; XPath regexes have no \b word-boundary escape) -->
<xsl:value-of select="replace($text, 'foo', 'bar')"/>

<!-- Using capture groups in replacement; select is not an AVT, so braces are not doubled -->
<xsl:value-of select="replace($phone, '(\d{3})(\d{3})(\d{4})', '($1) $2-$3')"/>

<!-- Remove all characters that are neither alphanumeric nor whitespace -->
<xsl:value-of select="replace($text, '[^a-zA-Z0-9\s]', '')"/>

#xsl:analyze-string

Use when:

You need to produce markup (elements, attributes) from the matches — replace() can only produce strings
Different matches need different treatment — the matching-substring body can use xsl:choose or other conditionals
You need multiple capture groups processed independently
The non-matching portions also need transformation

<!-- replace() CANNOT do this — it can only produce strings, not elements -->
<xsl:analyze-string select="$text" regex="https?://\S+">
  <xsl:matching-substring>
    <a href="{.}"><xsl:value-of select="."/></a>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>

#Decision Guide

Need	Use
Replace text with text	`replace()`
Remove characters	`replace($text, 'pattern', '')`
Wrap matches in elements	`xsl:analyze-string`
Extract structured data from strings	`xsl:analyze-string`
Simple find-and-replace	`replace()`
Different handling for matched vs. unmatched	`xsl:analyze-string`

#Common Patterns

#Syntax Highlighting

Highlight keywords in a code snippet:

<xsl:template match="code[@language='sql']">
  <pre class="code sql">
    <xsl:analyze-string select="."
        regex="(SELECT|FROM|WHERE|JOIN|ORDER\s+BY|GROUP\s+BY|ON|AND|OR|INSERT|INTO|UPDATE|DELETE|CREATE|DROP|ALTER|HAVING|LIMIT|OFFSET|AS|IN|NOT|NULL|IS|LIKE|BETWEEN|EXISTS|DISTINCT|SET|VALUES)"
        flags="i">
      <xsl:matching-substring>
        <span class="keyword"><xsl:value-of select="upper-case(.)"/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Highlight strings within non-keyword text -->
        <xsl:analyze-string select="." regex="'[^']*'">
          <xsl:matching-substring>
            <span class="string"><xsl:value-of select="."/></span>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </pre>
</xsl:template>

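This nests two xsl:analyze-string calls: the outer one highlights keywords, and the inner one highlights string literals in the remaining text. Because this regex dialect has no `\b` word-boundary escape, keywords are also matched inside longer identifiers (for example, the `or` in `sort_key`) and inside string literals, since the keyword pass runs first.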

#Link Detection in Text

Convert URLs and email addresses in plain text to clickable links:

<xsl:function name="my:linkify" as="node()*">
  <xsl:param name="text" as="xs:string"/>
  <xsl:analyze-string select="$text"
      regex="(https?://[^\s&lt;&gt;&quot;]+)|([\w.+-]+@[\w.-]+\.\w{{2,}})">
    <xsl:matching-substring>
      <xsl:choose>
        <xsl:when test="regex-group(1)">
          <a href="{.}" target="_blank"><xsl:value-of select="."/></a>
        </xsl:when>
        <xsl:when test="regex-group(2)">
          <a href="mailto:{.}"><xsl:value-of select="."/></a>
        </xsl:when>
      </xsl:choose>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>

<!-- Usage -->
<xsl:template match="comment">
  <p class="comment">
    <xsl:sequence select="my:linkify(string(.))"/>
  </p>
</xsl:template>

#Data Extraction from Formatted Strings

Parse a price string like "$1,234.56" into a number:

<xsl:function name="my:parse-price" as="xs:decimal">
  <xsl:param name="price-string" as="xs:string"/>
  <xsl:variable name="cleaned" select="replace($price-string, '[$,\s]', '')"/>
  <xsl:sequence select="xs:decimal($cleaned)"/>
</xsl:function>
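
<!-- For more complex parsing, use analyze-string (maps require XSLT 3.0). -->
<!-- Returns the first currency/amount pair, or an empty sequence if the text contains none. -->
<xsl:function name="my:parse-money" as="map(xs:string, item())?">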
  <xsl:param name="text" as="xs:string"/>
  <xsl:variable name="result" as="map(xs:string, item())*">
    <xsl:analyze-string select="$text" regex="([A-Z]{{3}}|\$|€|£)\s*([0-9,]+\.?\d*)">
      <xsl:matching-substring>
        <xsl:sequence select="map {
          'currency': regex-group(1),
          'amount': xs:decimal(replace(regex-group(2), ',', ''))
        }"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <xsl:sequence select="$result[1]"/>
</xsl:function>

#Markdown-Like Formatting

Convert simple markup conventions in plain text:

<xsl:function name="my:simple-format" as="node()*">
  <xsl:param name="text" as="xs:string"/>
  <!-- Bold: **text** -->
  <xsl:analyze-string select="$text" regex="\*\*([^*]+)\*\*">
    <xsl:matching-substring>
      <strong><xsl:value-of select="regex-group(1)"/></strong>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!-- Italic: *text* -->
      <xsl:analyze-string select="." regex="\*([^*]+)\*">
        <xsl:matching-substring>
          <em><xsl:value-of select="regex-group(1)"/></em>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Code: `text` -->
          <xsl:analyze-string select="." regex="`([^`]+)`">
            <xsl:matching-substring>
              <code><xsl:value-of select="regex-group(1)"/></code>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>

#CSV Line Parser

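Parse a single CSV line into fields, handling quoted fields. This version needs XSLT 3.0: non-capturing groups such as `(?:...)` are not part of the XPath 2.0 regex syntax, and the pattern can match an empty field, which XSLT 2.0 rejects as a zero-length match.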

<xsl:function name="my:parse-csv-line" as="xs:string*">
  <xsl:param name="line" as="xs:string"/>
  <xsl:analyze-string select="$line" regex='("([^"]*(?:""[^"]*)*)"|([^,]*))(?:,|$)'>
    <xsl:matching-substring>
      <xsl:choose>
        <xsl:when test="regex-group(2)">
          <!-- Quoted field: unescape doubled quotes -->
          <xsl:sequence select="replace(regex-group(2), '&quot;&quot;', '&quot;')"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Unquoted field -->
          <xsl:sequence select="regex-group(3)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:function>

C# parallel summary:

XSLT	C#
`xsl:analyze-string`	`Regex.Matches()` + loop, or `MatchEvaluator`
`xsl:matching-substring`	Code inside the match loop
`xsl:non-matching-substring`	Text between matches
`regex-group(N)`	`match.Groups[N].Value`
`replace($s, $pat, $rep)`	`Regex.Replace(s, pat, rep)`
`flags="i"`	`RegexOptions.IgnoreCase`
`flags="s"`	`RegexOptions.Singleline`
`flags="m"`	`RegexOptions.Multiline`
`flags="x"`	`RegexOptions.IgnorePatternWhitespace`