Strip HTML Tags easily using Excel

Don’t want to read all about how to do this? No problem. Just download the spreadsheet with the custom user function already embedded and get right to work!

Click here to Download the Spreadsheet with the Strip HTML function built in

 

UPDATE: December, 2015: Uploaded a new version that is now Mac Compatible

 

The Problem:

When receiving a client’s data during a Salesforce migration, I often find pesky HTML tags embedded in large email and note data dumps from legacy systems. So how do you strip html tags using only Excel? I found some VBA scripts quite effective at getting rid of the tags within the ‘< >’brackets. However, this does not remove HTML special character notation such as ‘&nbsp;’ and it sometimes removes line breaks.

The Solution:

Click the button above to download the spreadsheet. It contains part of an Excel VBA script I found which stripped all tags quite effectively. However, I made some modifications to the original script to make the end result look more almost like it did when it was first written in HTML. Before it strips out all of the tags, it first replaces HTML breaks and end of paragraph tags with one and two carriage returns, respectively. I also added a whole lot of replace functions to add back the special characters. Bold, underline, and hyperlinking disappear, but the text otherwise looks almost exactly like it would have before. You can download a spreadsheet with a working example of the ‘=StripHTML’ function or you can grab the VBA code below and paste it into a new module.

Feel free to leave a comments, but if you have a question, please email me.