How to Convert HTML to Markdown: Tools, Edge Cases, and What to Watch For
Converting HTML to Markdown sounds straightforward, and it mostly is. But a few elements cause consistent problems that trip up most tools: tables, nested lists, and inline styles. This guide covers the conversion approach, the problem elements, and a workflow for migrating a full blog.
The conversion algorithm
A good HTML-to-Markdown converter works by walking the HTML DOM and replacing each element with its Markdown equivalent. Simple elements are trivial:
| HTML | Markdown |
|---|---|
| <h1>Title</h1> | # Title |
| <h2>Subtitle</h2> | ## Subtitle |
| <strong>bold</strong> | **bold** |
| <em>italic</em> | _italic_ |
| <a href='url'>text</a> | [text](url) |
| <code>snippet</code> | `snippet` |
| <hr /> | --- |
| <blockquote><p>text</p></blockquote> | > text |
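
To make the DOM walk concrete, here is a minimal sketch of a recursive converter for the mappings in the table above. The `HtmlNode` type and the `textNode`/`elementNode` helpers are hypothetical stand-ins for a real parsed DOM, not part of any library, and the blank-line handling is deliberately simplistic.

```typescript
// Hypothetical minimal node shape, standing in for a real parsed DOM.
type HtmlNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

// Convenience constructors for building small trees by hand.
function textNode(value: string): HtmlNode {
  return { kind: "text", value };
}
function elementNode(
  tag: string,
  children: HtmlNode[],
  attrs: Record<string, string> = {},
): HtmlNode {
  return { kind: "element", tag, attrs, children };
}

// Recursively convert a node to Markdown: children first, then wrap the result
// according to the element's tag.
function toMarkdown(node: HtmlNode): string {
  if (node.kind === "text") return node.value;
  const inner = node.children.map(toMarkdown).join("");
  switch (node.tag) {
    case "h1": return `# ${inner}\n\n`;
    case "h2": return `## ${inner}\n\n`;
    case "p": return `${inner}\n\n`;
    case "strong": return `**${inner}**`;
    case "em": return `_${inner}_`;
    case "code": return "`" + inner + "`";
    case "a": return `[${inner}](${node.attrs["href"] ?? ""})`;
    case "hr": return "---\n\n";
    case "blockquote":
      // Prefix every line of the already-converted content with ">".
      return inner.trim().split("\n")
        .map((line) => (line ? `> ${line}` : ">"))
        .join("\n") + "\n\n";
    default:
      return inner; // unknown tags: keep the text, drop the markup
  }
}
```

For example, `toMarkdown(elementNode("h1", [textNode("Title")]))` returns `"# Title\n\n"`. A production converter also has to handle whitespace collapsing, escaping of Markdown special characters, and lists, which this sketch leaves out.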
The problem elements
Tables: HTML tables map to Markdown pipe tables, but only if the table uses standard <thead>/<tbody>/<tr>/<th>/<td> structure. Tables used purely for layout (common in HTML emails) carry no semantic structure and can't be converted meaningfully.
<!-- HTML table → Markdown pipe table -->
<table>
  <thead><tr><th>Name</th><th>Role</th></tr></thead>
  <tbody><tr><td>Alice</td><td>Engineer</td></tr></tbody>
</table>

→

| Name | Role |
|---|---|
| Alice | Engineer |
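
As a rough illustration of the table case, the sketch below renders a pipe table once the cell text has been pulled out of the `<th>` and `<td>` elements. The `SimpleTable` shape and function names are assumptions made for this example; escaping literal pipes is one reasonable choice, not something every converter does.

```typescript
// Hypothetical input shape: cell text already extracted from <th>/<td> elements.
interface SimpleTable {
  header: string[];
  rows: string[][];
}

// Escape pipes and collapse whitespace so cell text cannot break the columns.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\s+/g, " ").trim();
}

// Render the header row, the separator row, and one line per body row.
function toPipeTable(table: SimpleTable): string {
  const line = (cells: string[]) => `| ${cells.map(escapeCell).join(" | ")} |`;
  const separator = `|${table.header.map(() => "---").join("|")}|`;
  return [line(table.header), separator, ...table.rows.map(line)].join("\n");
}
```

Calling `toPipeTable({ header: ["Name", "Role"], rows: [["Alice", "Engineer"]] })` produces the three-line pipe table shown above.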
Code blocks with language: GitHub-Flavored Markdown supports fenced code blocks with a language hint (```typescript). Most HTML code blocks use <pre><code class="language-typescript">. A good converter extracts the class name to populate the language hint.
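
One way to extract the hint, sketched under the assumption that the class follows the `language-xxx` convention described above. The function names are illustrative, and other prefixes that some highlighters use are not handled here.

```typescript
// Three backticks, built programmatically so this example can itself live in a fence.
const FENCE = "`".repeat(3);

// Pull "typescript" out of class="hljs language-typescript"; "" if no hint is present.
function languageFromClass(classAttr: string): string {
  const match = /(?:^|\s)language-([\w+#-]+)/.exec(classAttr);
  return match?.[1] ?? "";
}

// Wrap code in a fenced block, adding the language hint when one was found.
function toFencedBlock(code: string, classAttr: string): string {
  const lang = languageFromClass(classAttr);
  return FENCE + lang + "\n" + code.replace(/\n+$/, "") + "\n" + FENCE;
}
```

When the class carries no language, the result is a plain fence with no hint, which is still valid Markdown.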
Inline styles: HTML emails and CMS exports are full of style="color: red; font-size: 14px;" attributes. Markdown has no syntax for inline styles, so the usual approach is to strip them entirely and preserve only the text content. Good converters make this optional.
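
A minimal sketch of the stripping step as a preprocessing pass on the HTML string. It is regex-based for brevity, which is an assumption of this example: a real converter would more likely drop the attribute while walking the DOM, and this pattern would also match text content that happens to look like a style attribute.

```typescript
// Remove style="..." and style='...' attributes, leaving the tags and text intact.
function stripInlineStyles(html: string): string {
  return html.replace(/\s+style\s*=\s*(?:"[^"]*"|'[^']*')/gi, "");
}

// Make the behaviour optional, as the text above suggests good converters do.
function preprocess(html: string, options: { stripStyles: boolean }): string {
  return options.stripStyles ? stripInlineStyles(html) : html;
}
```

For instance, `<p style="color: red; font-size: 14px;">Hi</p>` becomes `<p>Hi</p>`, which then converts to plain paragraph text.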
Rowspan and colspan: merged table cells have no Markdown equivalent. Converters typically simplify these by ignoring the span and rendering the cell value in its first position. Complex tables may need manual cleanup.
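
The simplification described above can be sketched as expanding each merged cell into a rectangular grid, with the value in its first position and empty strings in the rest of the span. The `SpanCell` shape is hypothetical, and actual converters may handle spans differently.

```typescript
// Hypothetical cell shape: text plus the optional span attributes from the HTML.
interface SpanCell {
  text: string;
  colspan?: number;
  rowspan?: number;
}

// Expand merged cells into a rectangular grid of strings.
function flattenSpans(rows: SpanCell[][]): string[][] {
  const grid: string[][] = [];
  rows.forEach((row, r) => {
    const out = (grid[r] = grid[r] ?? []);
    let col = 0;
    for (const cell of row) {
      // Skip positions already reserved by a rowspan from an earlier row.
      while (out[col] !== undefined) col++;
      const width = cell.colspan ?? 1;
      // Clamp so a rowspan cannot create rows beyond the end of the table.
      const height = Math.min(cell.rowspan ?? 1, rows.length - r);
      for (let dr = 0; dr < height; dr++) {
        const target = (grid[r + dr] = grid[r + dr] ?? []);
        for (let dc = 0; dc < width; dc++) {
          // Only the top-left position of the span keeps the cell value.
          target[col + dc] = dr === 0 && dc === 0 ? cell.text : "";
        }
      }
      col += width;
    }
  });
  // Pad ragged rows so every row has the same number of columns.
  const columns = Math.max(0, ...grid.map((row) => row.length));
  return grid.map((row) => Array.from({ length: columns }, (_, i) => row[i] ?? ""));
}
```

The resulting grid is rectangular, so it can be rendered as an ordinary pipe table. The merge itself is lost, which is why complex tables still need a manual pass.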
Migrating a full blog to Markdown
If you're migrating from WordPress, Ghost, or a CMS to a static site generator (Hugo, Astro, Jekyll, Eleventy), the workflow is:
1. Export posts as HTML from your CMS (WordPress: Tools → Export → Posts)
2. Parse the XML export to extract post HTML
3. Run each post through the HTML-to-Markdown converter
4. Save as slug.md with frontmatter (title, date, tags)
5. Spot-check a sample, especially tables and code blocks
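
The "save as slug.md with frontmatter" step might look roughly like the sketch below. The `PostMeta` shape, the slug rules, and the YAML layout are assumptions for illustration; check which frontmatter fields and date format your static site generator expects.

```typescript
// Hypothetical metadata shape, matching the fields named in the workflow above.
interface PostMeta {
  title: string;
  date: string; // e.g. an ISO date string such as "2024-01-15"
  tags: string[];
}

// Lowercase the title, replace runs of non-alphanumerics with "-", trim stray dashes.
// Note: this ASCII-only rule yields an empty slug for titles with no a-z or 0-9.
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Build the file contents: YAML frontmatter followed by the converted Markdown.
// JSON.stringify is used for quoting because JSON strings are valid YAML scalars.
function withFrontmatter(meta: PostMeta, markdownBody: string): string {
  return [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `date: ${meta.date}`,
    `tags: [${meta.tags.map((tag) => JSON.stringify(tag)).join(", ")}]`,
    "---",
    "",
    markdownBody.trim(),
    "",
  ].join("\n");
}
```

The file name would then be `slugify(meta.title) + ".md"`, written with whatever file API your script uses.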
For WordPress specifically, the wp2md CLI tool automates steps 1 to 4. For other CMSes, a short script using Turndown (Node.js) or markdownify (Python) handles the conversion.
The Turndown library (Node.js)
import TurndownService from "turndown"

// Configure output style: ATX headings (#), fenced code blocks, "-" bullets.
const td = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
  fence: "```",
  bulletListMarker: "-",
})

const markdown = td.turndown("<h1>Hello</h1><p>World</p>")
// → "# Hello\n\nWorld"

Convert any HTML to clean Markdown, with tables, code blocks, and inline style handling, using the HTML → Markdown Converter. Paste HTML and get formatted Markdown instantly, 100% in your browser.
