#String Analysis
xsl:analyze-string applies a regular expression to a string and processes the matched and unmatched portions separately. It is the XSLT equivalent of iterating over regex matches — you get fine-grained control over how each match is transformed into output.
#xsl:analyze-string
The instruction takes a string and a regular expression. It splits the string into alternating matching and non-matching segments, then processes each segment with the corresponding child instruction.
<xsl:analyze-string select="$input" regex="pattern">
  <xsl:matching-substring>
    <!-- what to do with matched portions -->
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <!-- what to do with unmatched portions -->
  </xsl:non-matching-substring>
</xsl:analyze-string>
#Attributes
| Attribute | Description |
|---|---|
| select | XPath expression that evaluates to the string to analyze |
| regex | The regular expression pattern (uses XPath/XML Schema regex syntax) |
| flags | Optional regex flags: i, m, s, x |
#Basic Example
Detect and linkify email addresses in text:
<xsl:template match="description">
  <p>
    <xsl:analyze-string select="." regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{{2,}}">
      <xsl:matching-substring>
        <a href="mailto:{.}"><xsl:value-of select="."/></a>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </p>
</xsl:template>
Given <description>Contact us at [email protected] or [email protected] for details.</description>, output:
<p>Contact us at <a href="mailto:[email protected]">[email protected]</a>
or <a href="mailto:[email protected]">[email protected]</a> for details.</p>
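Important: Inside the regex attribute, curly braces must be doubled ({{ and }}) because the attribute is an attribute value template (AVT). A single { would be interpreted as the start of an embedded XPath expression. This applies only to AVT attributes such as regex; a regex written inside a select expression, for example in a replace() call, keeps its single braces.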
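Because regex is an AVT, the pattern can also be assembled at run time. The following is a minimal sketch rather than part of the original examples: the term parameter, the sample sentence, and the b wrapper element are illustrative assumptions, and it assumes an XSLT 3.0 processor started at the initial template.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical parameter: the word to highlight -->
  <xsl:param name="term" select="'template'"/>

  <xsl:template name="xsl:initial-template">
    <p>
      <!-- {$term} is a single-brace AVT expression, replaced by the parameter value.
           {{0,1}} is a doubled-brace literal, so the effective regex is: templates{0,1} -->
      <xsl:analyze-string select="'One template, two templates.'"
                          regex="{$term}s{{0,1}}">
        <xsl:matching-substring>
          <b><xsl:value-of select="."/></b>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter value this should produce <p>One <b>template</b>, two <b>templates</b>.</p>. Any regex metacharacters in the parameter value are interpreted as part of the pattern, so a value taken from user input would need escaping first.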
C# parallel:
var result = Regex.Replace(input, @"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    match => $"<a href=\"mailto:{match.Value}\">{match.Value}</a>");
#xsl:matching-substring
Processes each portion of the string that matches the regex. Inside this element:
- . (dot) refers to the matched substring (as a string, not a node)
- regex-group(N) returns the Nth capture group
<xsl:analyze-string select="'Price: $29.99, Discount: $5.00'" regex="\$(\d+\.\d{{2}})">
  <xsl:matching-substring>
    <span class="price"><xsl:value-of select="."/></span>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
Output:
Price: <span class="price">$29.99</span>, Discount: <span class="price">$5.00</span>
#xsl:non-matching-substring
Processes each portion of the string between matches. If omitted, non-matching text is discarded.
<!-- Extract only the numbers, ignoring everything else -->
<xsl:analyze-string select="'Order #12345, Items: 3, Total: $99.50'" regex="\d+\.?\d*">
  <xsl:matching-substring>
    <number><xsl:value-of select="."/></number>
  </xsl:matching-substring>
  <!-- non-matching-substring omitted: non-numeric text is dropped -->
</xsl:analyze-string>
Output:
<number>12345</number>
<number>3</number>
<number>99.50</number>
#Both Are Optional
Each of the two child elements is optional, but at least one must be present: omitting both is a static error. When both are used, xsl:matching-substring must come before xsl:non-matching-substring, because the content model of xsl:analyze-string fixes that order.
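As a sketch of using only one child, the stylesheet below keeps just xsl:non-matching-substring, so the regex acts as a separator and the instruction behaves like a splitter. The sample string and the list and item element names are assumptions made for illustration; an XSLT 3.0 processor started at the initial template is assumed.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <list>
      <!-- The regex matches each comma plus surrounding whitespace;
           those matches are dropped because there is no matching-substring -->
      <xsl:analyze-string select="'red, green ,blue'" regex="\s*,\s*">
        <xsl:non-matching-substring>
          <item><xsl:value-of select="."/></item>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </list>
  </xsl:template>
</xsl:stylesheet>
```

For this input the result should be <list><item>red</item><item>green</item><item>blue</item></list>.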
#Capture Groups
Parentheses in the regex define capture groups. The regex-group(N) function returns the string matched by the Nth group. Group 0 is the entire match (same as .).
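The sketch below shows regex-group(0) next to numbered groups, including an optional group that does not take part in a match. In that case regex-group() returns a zero-length string rather than raising an error. The input string and the lengths, length, raw, value, and unit element names are illustrative assumptions, and an XSLT 3.0 processor started at the initial template is assumed.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <lengths>
      <!-- Group 2 (the unit) is optional: "7" matches without it -->
      <xsl:analyze-string select="'12px 3em 7'" regex="(\d+)(px|em)?">
        <xsl:matching-substring>
          <length>
            <!-- Group 0 is the whole match, the same string as "." -->
            <raw><xsl:value-of select="regex-group(0)"/></raw>
            <value><xsl:value-of select="regex-group(1)"/></value>
            <!-- Empty when the unit group did not participate -->
            <unit><xsl:value-of select="regex-group(2)"/></unit>
          </length>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </lengths>
  </xsl:template>
</xsl:stylesheet>
```

The three matches have raw values 12px, 3em, and 7; the last length element has an empty unit.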
#Example: Parsing a Date String
<xsl:analyze-string select="'2026-03-19'" regex="(\d{{4}})-(\d{{2}})-(\d{{2}})">
  <xsl:matching-substring>
    <date>
      <year><xsl:value-of select="regex-group(1)"/></year>
      <month><xsl:value-of select="regex-group(2)"/></month>
      <day><xsl:value-of select="regex-group(3)"/></day>
    </date>
  </xsl:matching-substring>
</xsl:analyze-string>
Output:
<date>
  <year>2026</year>
  <month>03</month>
  <day>19</day>
</date>
#Example: Parsing Key-Value Pairs
<xsl:variable name="input" select="'name=Widget;category=electronics;price=29.99'"/>
<xsl:analyze-string select="$input" regex="([^=;]+)=([^;]+)">
  <xsl:matching-substring>
    <field name="{regex-group(1)}">
      <xsl:value-of select="regex-group(2)"/>
    </field>
  </xsl:matching-substring>
</xsl:analyze-string>
Output:
<field name="name">Widget</field>
<field name="category">electronics</field>
<field name="price">29.99</field>
#Nested Groups
Groups are numbered by the position of their opening parenthesis, left to right:
<!-- regex: ((https?)://([^/]+))(/.*)? -->
<!-- Group 1: scheme and authority (https://example.com) -->
<!-- Group 2: scheme (https) -->
<!-- Group 3: host (example.com) -->
<!-- Group 4: path (/products/123) -->
<xsl:analyze-string select="$url" regex="((https?)://([^/]+))(/.*)?">
  <xsl:matching-substring>
    <url>
      <scheme><xsl:value-of select="regex-group(2)"/></scheme>
      <host><xsl:value-of select="regex-group(3)"/></host>
      <path><xsl:value-of select="regex-group(4)"/></path>
    </url>
  </xsl:matching-substring>
</xsl:analyze-string>
C# parallel: regex-group(N) is equivalent to match.Groups[N].Value:
var match = Regex.Match(url, @"((https?)://([^/]+))(/.*)?");
var scheme = match.Groups[2].Value; // "https"
var host = match.Groups[3].Value;   // "example.com"
var path = match.Groups[4].Value;   // "/products/123"
#Regex Flags
The flags attribute accepts a string of flag characters that modify regex behavior. These are the same flags used by the XPath matches(), replace(), and tokenize() functions:
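| Flag | Name | Description |
|---|---|---|
| i | Case-insensitive | Letters match regardless of case |
| m | Multi-line | ^ and $ match at the start and end of each line, not only of the whole string |
| s | Dot-all | . matches any character, including newline |
| x | Extended | Whitespace in the regex is ignored (for readability); use \s to match whitespace |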
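The i and x flags are the easiest to picture; the multi-line flag is less obvious, so here is a small sketch of it. The two-line input, the config and entry element names, and the key attribute are assumptions for illustration, and an XSLT 3.0 processor started at the initial template is assumed.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <config>
      <!-- The character reference in the string literal is a newline.
           With flags="m", ^ and $ anchor at the start and end of each line. -->
      <xsl:analyze-string select="'host: example.com&#10;port: 8080'"
                          regex="^(\w+): *(.*)$" flags="m">
        <xsl:matching-substring>
          <entry key="{regex-group(1)}"><xsl:value-of select="regex-group(2)"/></entry>
        </xsl:matching-substring>
        <!-- The newline between the lines is non-matching text and is dropped -->
      </xsl:analyze-string>
    </config>
  </xsl:template>
</xsl:stylesheet>
```

This should yield <config><entry key="host">example.com</entry><entry key="port">8080</entry></config>. Without the m flag the same pattern would not be expected to match this input at all, since $ would only match at the very end of the string and . does not cross the newline.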
#Examples
<!-- Case-insensitive matching -->
<xsl:analyze-string select="$text" regex="error|warning|info" flags="i">
  <xsl:matching-substring>
    <span class="{lower-case(.)}"><xsl:value-of select="."/></span>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
<!-- Matches "Error", "WARNING", "Info", etc. -->

<!-- Extended mode for readable complex patterns -->
<xsl:analyze-string select="$text" flags="x"
    regex="(\d{{1,3}}) \.(\d{{1,3}}) \.(\d{{1,3}}) \.(\d{{1,3}})">
  <xsl:matching-substring>
    <ip><xsl:value-of select="."/></ip>
  </xsl:matching-substring>
</xsl:analyze-string>
#Regex Syntax Notes
XSLT uses XML Schema regular expressions (with XPath extensions), not Perl-compatible regexes. Key differences from C#'s System.Text.RegularExpressions:
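| Feature | C# (.NET Regex) | XSLT/XPath Regex |
|---|---|---|
| Anchors | ^, $, \b, \A, \Z | ^ and $ only (no \b word boundary) |
| Lazy quantifiers | *?, +?, ?? | *?, +?, ?? |
| Backreferences | \1 in the pattern, $1 in the replacement | \1 in the pattern, $1 in the replace() replacement string |
| Lookahead/lookbehind | (?=...), (?!...), (?<=...), (?<!...) | Not supported |
| Named groups | (?<name>...) | Not supported |
| Character class subtraction | Not standard (a .NET-specific extension) | [a-z-[aeiou]] |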
The lack of lookahead/lookbehind means some complex patterns require different approaches in XSLT — often by using xsl:analyze-string with simpler patterns and handling the logic in the matching/non-matching substring bodies.
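As one sketch of that approach, a C# pattern such as \d+(?=px) (digits only when followed by px) can be approximated by capturing the digits, matching the px as part of the pattern, and writing it back out by hand. The sample text and the n element are assumptions for illustration, and an XSLT 3.0 processor started at the initial template is assumed.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <p>
      <xsl:analyze-string select="'width: 120px; height: 80px; z-index: 3'"
                          regex="(\d+)px">
        <xsl:matching-substring>
          <!-- Wrap only the digits, then re-emit the consumed "px" as plain text -->
          <n><xsl:value-of select="regex-group(1)"/></n>
          <xsl:text>px</xsl:text>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The expected result is <p>width: <n>120</n>px; height: <n>80</n>px; z-index: 3</p>. Unlike a true lookahead, the px is consumed by the match, so it cannot also be the start of the next match.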
#Comparison with replace()
XPath provides a replace() function for simple regex substitutions. When should you use each?
#replace()
Best for simple text-to-text substitutions where the output is a plain string:
<!-- Simple substitution. XPath regex has no \b word-boundary anchor,
     so this also replaces "foo" inside longer words. -->
<xsl:value-of select="replace($text, 'foo', 'bar')"/>

<!-- Using capture groups in replacement -->
<xsl:value-of select="replace($phone, '(\d{3})(\d{3})(\d{4})', '($1) $2-$3')"/>

<!-- Remove all non-alphanumeric characters -->
<xsl:value-of select="replace($text, '[^a-zA-Z0-9\s]', '')"/>
#xsl:analyze-string
Use when:
- You need to produce markup (elements, attributes) from the matches; replace() can only produce strings
- Different matches need different treatment; the matching-substring body can use xsl:choose or other conditionals
- You need multiple capture groups processed independently
- The non-matching portions also need transformation
<!-- replace() CANNOT do this: it can only produce strings, not elements -->
<xsl:analyze-string select="$text" regex="https?://\S+">
  <xsl:matching-substring>
    <a href="{.}"><xsl:value-of select="."/></a>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>
#Decision Guide
| Need | Use |
|---|---|
| Replace text with text | replace() |
| Remove characters | replace() |
| Wrap matches in elements | xsl:analyze-string |
| Extract structured data from strings | xsl:analyze-string |
| Simple find-and-replace | replace() |
| Different handling for matched vs. unmatched | xsl:analyze-string |
#Common Patterns
#Syntax Highlighting
Highlight keywords in a code snippet:
<xsl:template match="code[@language='sql']">
  <pre class="code sql">
    <!-- Longer alternatives come before their prefixes (ORDER\s+BY before OR,
         INTO before IN) because the first matching alternative wins -->
    <xsl:analyze-string select="."
        regex="(SELECT|FROM|WHERE|JOIN|ON|AND|ORDER\s+BY|OR|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|GROUP\s+BY|HAVING|LIMIT|OFFSET|AS|INTO|IN|NOT|NULL|IS|LIKE|BETWEEN|EXISTS|DISTINCT|SET|VALUES)"
        flags="i">
      <xsl:matching-substring>
        <span class="keyword"><xsl:value-of select="upper-case(.)"/></span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Highlight strings within non-keyword text -->
        <xsl:analyze-string select="." regex="'[^']*'">
          <xsl:matching-substring>
            <span class="string"><xsl:value-of select="."/></span>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </pre>
</xsl:template>
This nests two xsl:analyze-string calls: the outer one highlights keywords, and the inner one highlights string literals in the remaining text. Because XPath regexes have no word-boundary anchor, the keyword pattern also matches inside longer identifiers and inside string literals, so treat this as a starting point rather than a complete highlighter.
#Link Detection in Text
Convert URLs and email addresses in plain text to clickable links:
<xsl:function name="my:linkify" as="node()*">
  <xsl:param name="text" as="xs:string"/>
  <!-- The characters &lt; &gt; and &quot; are written as entity references
       so that the regex attribute stays well-formed -->
  <xsl:analyze-string select="$text"
      regex="(https?://[^\s&lt;&gt;&quot;]+)|([\w.+-]+@[\w.-]+\.\w{{2,}})">
    <xsl:matching-substring>
      <xsl:choose>
        <!-- Group 1 is non-empty for a URL, group 2 for an email address -->
        <xsl:when test="regex-group(1)">
          <a href="{.}" target="_blank"><xsl:value-of select="."/></a>
        </xsl:when>
        <xsl:when test="regex-group(2)">
          <a href="mailto:{.}"><xsl:value-of select="."/></a>
        </xsl:when>
      </xsl:choose>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>

<!-- Usage -->
<xsl:template match="comment">
  <p class="comment">
    <xsl:sequence select="my:linkify(string(.))"/>
  </p>
</xsl:template>
#Data Extraction from Formatted Strings
Parse a price string like "$1,234.56" into a number:
<xsl:function name="my:parse-price" as="xs:decimal">
  <xsl:param name="price-string" as="xs:string"/>
  <!-- Strip the dollar sign, thousands separators, and whitespace -->
  <xsl:variable name="cleaned" select="replace($price-string, '[$,\s]', '')"/>
  <xsl:sequence select="xs:decimal($cleaned)"/>
</xsl:function>

<!-- For more complex parsing, use analyze-string -->
<!-- Returns the first currency/amount pair, or an empty sequence if none is found
     (hence the ? on the return type) -->
<xsl:function name="my:parse-money" as="map(xs:string, item())?">
  <xsl:param name="text" as="xs:string"/>
  <xsl:variable name="result" as="map(xs:string, item())*">
    <xsl:analyze-string select="$text" regex="([A-Z]{{3}}|\$|€|£)\s*([0-9,]+\.?\d*)">
      <xsl:matching-substring>
        <xsl:sequence select="map {
          'currency': regex-group(1),
          'amount': xs:decimal(replace(regex-group(2), ',', ''))
        }"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
  <xsl:sequence select="$result[1]"/>
</xsl:function>
#Markdown-Like Formatting
Convert simple markup conventions in plain text:
<xsl:function name="my:simple-format" as="node()*">
  <xsl:param name="text" as="xs:string"/>
  <!-- Bold: **text** -->
  <xsl:analyze-string select="$text" regex="\*\*([^*]+)\*\*">
    <xsl:matching-substring>
      <strong><xsl:value-of select="regex-group(1)"/></strong>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <!-- Italic: *text* -->
      <xsl:analyze-string select="." regex="\*([^*]+)\*">
        <xsl:matching-substring>
          <em><xsl:value-of select="regex-group(1)"/></em>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Code: `text` -->
          <xsl:analyze-string select="." regex="`([^`]+)`">
            <xsl:matching-substring>
              <code><xsl:value-of select="regex-group(1)"/></code>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>
#CSV Line Parser
Parse a single CSV line into fields, handling quoted fields:
<!-- Needs an XSLT 3.0 processor: the regex uses non-capturing groups (?:...) -->
<xsl:function name="my:parse-csv-line" as="xs:string*">
  <xsl:param name="line" as="xs:string"/>
  <xsl:analyze-string select="$line" regex='("([^"]*(?:""[^"]*)*)"|([^,]*))(?:,|$)'>
    <xsl:matching-substring>
      <xsl:choose>
        <xsl:when test="regex-group(2)">
          <!-- Quoted field: unescape doubled quotes.
               The quotes are written as &quot; so the select attribute stays well-formed. -->
          <xsl:sequence select="replace(regex-group(2), '&quot;&quot;', '&quot;')"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Unquoted field -->
          <xsl:sequence select="regex-group(3)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:function>
C# parallel summary:
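| XSLT | C# |
|---|---|
| xsl:analyze-string | Regex.Matches() with a loop over the matches |
| xsl:matching-substring | Code inside the match loop |
| xsl:non-matching-substring | Text between matches |
| regex-group(N) | match.Groups[N].Value |
| . (inside xsl:matching-substring) | match.Value |
| flags="i" | RegexOptions.IgnoreCase |
| flags="m" | RegexOptions.Multiline |
| flags="s" | RegexOptions.Singleline |
| flags="x" | RegexOptions.IgnorePatternWhitespace |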