- Pandoc: convert between markup languages
- MarkItDown: convert various files to markdown
Pandoc
Pandoc is a document converter that can convert files from one markup format into another.
This can be useful when switching Static Site Generators that use different markup languages. The following Markdown flavours are supported:
markdown_phpextra(PHP Markdown Extra)markdown_github(deprecated GitHub-Flavored Markdown)markdown_mmd(MultiMarkdown)markdown_strict(Markdown.pl)commonmark(CommonMark)gfm(Github-Flavored Markdown)commonmark_x(CommonMark with many pandoc extensions)
Caution
Markdown syntax extensions or SSG-specific syntax is not supported!
MarkItDown
MarkItDown is an open source tool provided by Microsoft to convert files to Markdown. The following file types are supported:
- PowerPoint
- Word
- Excel
- Images (EXIF metadata and OCR)
- Audio (EXIF metadata and speech transcription)
- HTML
- Text-based formats (CSV, JSON, XML)
- ZIP files (iterates over contents)
Installation using pip:
pip install markitdownBasic usage in Python:
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert(“test.xlsx”)
print(result.text_content)For more information, see Microsoft Open Sourced MarkItDown: An AI Tool to Convert All Files into Markdown for Seamless Integration and Analysis.