by compressedgas on 11/5/2023, 3:20:41 PM
That's neater as a collection of awesome lists. For testing Markdown formatting, though, I really think one needs a test suite that provides both the input and the expected HTML output, such as the one provided by CommonMark.
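To make the input/expected-output idea concrete, here is a minimal sketch of such a harness. It assumes each test case is a dict with `markdown` and `html` fields, which is the shape CommonMark uses when it dumps its spec examples as JSON. The names `run_suite` and `toy_render`, and the two sample cases, are made up for illustration; `toy_render` is a deliberately naive stand-in for whatever parser is under test.

```python
def run_suite(cases, render):
    """Return the cases whose rendered HTML differs from the expected HTML."""
    failures = []
    for case in cases:
        actual = render(case["markdown"])
        if actual != case["html"]:
            # Keep the actual output next to the expectation for inspection.
            failures.append({**case, "actual": actual})
    return failures


def toy_render(markdown):
    # Hypothetical stand-in parser: wraps every non-empty line in <p>.
    # Swap in the real renderer you want to test.
    return "".join(
        f"<p>{line}</p>\n" for line in markdown.splitlines() if line.strip()
    )


# Hypothetical cases in the assumed {"markdown", "html"} shape.
CASES = [
    {"example": 1, "markdown": "hello\n", "html": "<p>hello</p>\n"},
    {"example": 2, "markdown": "# Title\n", "html": "<h1>Title</h1>\n"},
]

failures = run_suite(CASES, toy_render)
for failure in failures:
    print(f"example {failure['example']}: expected {failure['html']!r}, "
          f"got {failure['actual']!r}")
```

The toy renderer has no notion of headings, so the second case is reported as a failure. A plain corpus of documents cannot catch this kind of error by itself, because it carries no expected output to compare against.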
Hi. Markdown has taken the world by storm and can be found all over the place. With so many different parsers, extensions, and flavours, it is also bound to spawn a lot of inconsistencies. Some arise between the different flavours, of course, but others come from bugs or edge cases in parsers that users learn to work around. I was wondering if anyone had collected a corpus of Markdown documents that could be used for analysis, testing new parsers, and other related tasks. I came across https://github.com/tcr/markdown-corpus, which is the biggest corpus I've found, but since it only pulls from GitHub, its documents are all bound to be shaped by GitHub-flavoured Markdown and its parser.
In case no corpus of this kind exists, I've set up one based on the one mentioned above (with a couple of extra documents). It can be found here: https://github.com/PMunch/markdown-corpus. Feel free to contribute!
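One way a corpus like this could be used, sketched under assumptions: render every document with two parsers and list the files where the output differs. Everything named here is hypothetical (`load_corpus`, `find_disagreements`, and the two toy renderers), and the sketch assumes the corpus is a directory tree of `.md` files. The toy renderers differ only in whether a heading needs a space after `#`, which is one of the places where real flavours are known to disagree.

```python
from pathlib import Path


def load_corpus(root):
    """Map each .md file under root (relative path) to its text."""
    root = Path(root)
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8", errors="replace")
        for path in root.rglob("*.md")
    }


def find_disagreements(documents, render_a, render_b):
    """Return the sorted names of documents the two renderers render differently."""
    return sorted(
        name for name, text in documents.items()
        if render_a(text) != render_b(text)
    )


def render_strict(text):
    # Toy renderer: a heading requires a space after '#'.
    out = []
    for line in text.splitlines():
        if line.startswith("# "):
            out.append(f"<h1>{line[2:].strip()}</h1>")
        elif line.strip():
            out.append(f"<p>{line.strip()}</p>")
    return "\n".join(out)


def render_lenient(text):
    # Toy renderer: any line starting with '#' is a heading.
    out = []
    for line in text.splitlines():
        if line.startswith("#"):
            out.append(f"<h1>{line[1:].strip()}</h1>")
        elif line.strip():
            out.append(f"<p>{line.strip()}</p>")
    return "\n".join(out)


# In-memory stand-in for load_corpus("path/to/corpus").
documents = {
    "a.md": "# Title\ntext\n",
    "b.md": "#Title\ntext\n",
}
disagreements = find_disagreements(documents, render_strict, render_lenient)
print(disagreements)
```

A disagreement only says that the two parsers differ on that file, not which one is right. The flagged files are still useful as a shortlist of edge cases worth reading by hand.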