Hacker News Clone

Show HN: Convert HTML to markdown with python

by gaojiuli on 5/27/2017, 8:51:35 AM with 14 comments

by krstf13 on 5/29/2017, 10:56:44 AM
You should use html.parser from the standard library and focus on the conversion to markdown. The way you parse HTML in the convert function is very inefficient and can easily produce incorrect results with valid HTML (e.g. <p class="some>stuff">some text</p> ).
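A rough sketch of what that could look like, using the standard library's HTMLParser so that tokenizing is left to the parser and only the markdown mapping is hand-written. The class name MarkdownSketch and the helper to_markdown are made up for illustration, and only paragraphs plus bold/italic tags are handled; this is not the project's actual code.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical minimal converter: handles only <p>, <em>/<i>, <strong>/<b>."""

    # Assumed mapping from inline HTML tags to markdown markers.
    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are already parsed, so a ">" inside a quoted value is harmless.
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "p":
            # End of a paragraph: emit a blank line.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


print(to_markdown('<p class="some>stuff">some text</p>'))
```

The point of the design is that the converter never looks for "<" or ">" itself, so the attribute example above does not confuse it.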
by williamstein on 5/28/2017, 4:51:35 PM
Since this is "Show HN", what is the motivation you have for doing this? There's no motivation or background in the linked README.
by Bino on 5/28/2017, 5:10:43 PM
Wouldn't it make sense to escape markdown syntax that appears in the HTML? In the HTML you're converting from, * has no special meaning.
<p>foo</p>
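One possible way to do that escaping, as a sketch: backslash-escape markdown-significant characters in text nodes before emitting them. The function name escape_markdown and the character set are assumptions; which characters really need escaping depends on the markdown flavor and on where they appear in a line.

```python
import re

# Assumed, deliberately small set of characters to escape:
# backslash, backtick, asterisk, underscore, and square brackets.
_MD_SPECIAL = re.compile(r"([\\`*_\[\]])")


def escape_markdown(text):
    """Prefix each markdown-significant character with a backslash."""
    return _MD_SPECIAL.sub(r"\\\1", text)


print(escape_markdown("*foo*"))
```

This would be applied only to text content coming out of the HTML, not to the markers the converter itself generates.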
by jwilk on 5/28/2017, 5:18:15 PM
How is it better than https://alir3z4.github.io/html2text/ ?
by roryisok on 5/28/2017, 6:54:58 PM
Nice to see projects that convert HTML to markdown. Projects to convert in the other direction (md to html) are far more common. Parsing HTML is tough.
by confounded on 5/28/2017, 6:56:27 PM
Pandoc may be of interest.