The Rich Text field type in Sitecore allows the user to create and store HTML-formatted content. Most of the time you want to display this content on the website as is, but sometimes you want to truncate it (for example in some kind of overview).

In that case you have to strip the HTML from the content first. If you truncate the raw HTML at, for example, 300 characters, you risk cutting off the closing tag of an element while keeping its opening tag. This will not look good in any browser.
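To make the risk concrete, here is a minimal sketch of what happens when raw HTML is cut at a fixed length. The class name `NaiveTruncationDemo`, the `Truncate` helper and the sample markup are made up for illustration; they are not part of Sitecore.

```csharp
using System;

// Hypothetical demo class: shows why truncating raw HTML is unsafe.
public static class NaiveTruncationDemo
{
    // Cuts the string at maxLength characters without looking at the markup.
    public static string Truncate(string value, int maxLength)
    {
        return value.Length <= maxLength ? value : value.Substring(0, maxLength);
    }

    public static void Main()
    {
        string html = "<p>Hello <strong>world</strong></p>";

        // The cut lands inside the <strong> element, so both <p> and
        // <strong> are left without their closing tags.
        Console.WriteLine(Truncate(html, 20)); // prints: <p>Hello <strong>wor
    }
}
```

The output still contains the opening `<p>` and `<strong>` tags but none of the closing ones, which is exactly the broken markup described above.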

So we are going to use a very simple regular expression to strip the HTML from our Rich Text field (you can of course use this for any type of database field that contains HTML): “<.*?>”. After that, we can take, for example, the first 300 characters and add “…” after them, a conventional visual signal that the text has been truncated.

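// Requires: using Sitecore.Data.Items; using System.Text.RegularExpressions;

// Fetch the item and read the raw (HTML) value of the Rich Text field.
Item item = Sitecore.Context.Database.GetItem("/sitecore/content/home/some item");

string richTextFieldRawValue = item["Some Rich Text Field"];

// Remove every tag; the lazy quantifier stops at the first ">" it finds.
Regex htmlRemovalRegex = new Regex("<.*?>", RegexOptions.Compiled);

string richTextFieldWithoutHtml = htmlRemovalRegex.Replace(richTextFieldRawValue, string.Empty);

// Substring throws if the text is shorter than 300 characters, so check the length first.
string truncatedRichTextField = richTextFieldWithoutHtml.Length > 300
    ? richTextFieldWithoutHtml.Substring(0, 300) + "..."
    : richTextFieldWithoutHtml;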
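If you need this in more than one place, the steps can be wrapped in a small helper. The following is only a sketch: the class `HtmlTruncator` and the method `StripAndTruncate` are hypothetical names, not part of Sitecore. It also makes two additions that go beyond the snippet above and may not fit every site: `RegexOptions.Singleline`, so that tags spanning several lines are matched, and `WebUtility.HtmlDecode`, so that entities such as `&amp;` count as one character.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class HtmlTruncator
{
    // Same pattern as in the article. Singleline is an added assumption:
    // it lets "." match newlines so multi-line tags are removed as well.
    private static readonly Regex HtmlTagRegex =
        new Regex("<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);

    public static string StripAndTruncate(string html, int maxLength)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Remove the tags, then decode entities (added step, see lead-in).
        string text = WebUtility.HtmlDecode(HtmlTagRegex.Replace(html, string.Empty)).Trim();

        // Short texts are returned unchanged, without the "..." suffix.
        if (text.Length <= maxLength)
        {
            return text;
        }

        return text.Substring(0, maxLength).TrimEnd() + "...";
    }
}
```

With this in place, the last three statements above could be replaced by a single call such as `HtmlTruncator.StripAndTruncate(item["Some Rich Text Field"], 300)`.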