Markdown to HTML converter

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Markdown to HTML converter

#1 Post by siberia-man » 11 Sep 2021 15:37

Earlier, jfl implemented his own batch script that converts markdown to HTML by sending requests to GitHub or another API with curl. His script is published in the thread "A dual Batch+CSS script to convert Markdown to HTML". Since then I have hoped to combine both approaches: converting locally with pandoc and requesting the Git API using curl or wget.

I have finally finished my own implementation: a set of scripts (shell, batch and Perl) that convert markdown to HTML. Initially there were two, shell and batch, both using pandoc. Now there are three scripts that aim to produce almost the same result.

The Perl script uses Text::Markdown for local conversion and HTTP::Tiny to query the API.

The shell and batch scripts invoke pandoc for local use and curl or wget to convert remotely (via the API).

If a server requires a token it can be provided with the -t TOKEN-FILE or -T TOKEN options.
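To illustrate the two token options, here is a minimal POSIX shell sketch of reading a token from a file, as -t would need to do. The function name read_token and the assumption that the token sits on the first line of the file are mine for this example; the published scripts may handle it differently.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of a token file,
# with carriage returns and surrounding whitespace removed.
read_token() {
	[ -r "$1" ] || { echo "Token file not found: $1" >&2; return 1; }
	# sed -n 1p keeps only the first line; tr drops CR from CRLF files
	sed -n '1p' "$1" | tr -d '\r' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Example: store a dummy token in a temporary file and read it back
tmp="${TMPDIR:-/tmp}/token.$$"
printf '  dummy-token-123\r\n' > "$tmp"
API_TOKEN=$(read_token "$tmp")
rm -f "$tmp"
echo "token: $API_TOKEN"
```

Keeping the token in a file rather than passing it with -T keeps it out of the command line and the shell history.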

Convert locally:

git-md-html README.md

Convert remotely using git API:

git-md-html -u README.md

All scripts are located on my GitHub at https://github.com/ildar-shaimordanov/git-markdown-html


Re: Markdown to HTML converter

#2 Post by siberia-man » 12 Sep 2021 00:40

A few more words about the particular implementations.

Because of its limitations, the batch script accepts the -u/-U and -t/-T options in exactly this order. It could be changed to accept an arbitrary order, but that would make the script more complicated. Right now I don't want to do this work.
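For illustration, order-independent parsing is straightforward where getopts is available. The following POSIX shell sketch accepts the same four options in any order. The function and variable names are assumptions for this example and are not taken from the published scripts.

```shell
#!/bin/sh
# Sketch only: parse -u, -U URL, -t TOKEN-FILE and -T TOKEN in any order.
# All names here are illustrative.
parse_options() {
	API_URL= API_TOKEN= TOKEN_FILE=
	OPTIND=1
	while getopts 'uU:t:T:' opt; do
		case "$opt" in
		u) API_URL='https://api.github.com' ;;
		U) API_URL=$OPTARG ;;
		t) TOKEN_FILE=$OPTARG ;;
		T) API_TOKEN=$OPTARG ;;
		*) return 2 ;;
		esac
	done
	# Whatever is left after the options is the input file ("-" is stdin)
	shift $((OPTIND - 1))
	SRCFILE=${1:--}
}

parse_options -T secret -u README.md
echo "$API_URL $API_TOKEN $SRCFILE"
```

Resetting OPTIND to 1 lets the function be called more than once in the same shell.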

The shell version doesn't have this limitation, and it is POSIX compatible. But in some cases it fails:
-- wget from BusyBox 1.34 doesn't support the --post-file option
-- wget 1.11.4 from Gow (the lightweight alternative to Cygwin) fails with SSL connections
-- wget 1.8.2 from UnxUtils supports neither the --post-data nor the --no-check-certificate option
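One way to spot such a limited wget up front is to look for the option in its help text. This is only a heuristic sketch: the function name wget_supports is hypothetical, and it assumes the help output lists the long options, which may not hold for every build.

```shell
#!/bin/sh
# Hypothetical check: succeed if the local wget advertises a given option.
# Some builds print their help to stderr, hence the 2>&1.
wget_supports() {
	wget --help 2>&1 | grep -qF -e "$1"
}

for opt in --post-file --post-data --no-check-certificate; do
	if wget_supports "$opt"; then
		echo "$opt: advertised"
	else
		echo "$opt: not advertised"
	fi
done
```

A failing SSL handshake, as with the Gow build, cannot be detected this way; that would need an actual test request.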

Fortunately, wget from Cygwin works fine!

In some cases both curl and wget fail when working with other hosts over HTTPS. To work around this issue I hard-coded their invocation as curl --insecure and wget --no-check-certificate.

The Perl script requires Text::Markdown and HTTP::Tiny (and also IO::Socket::SSL and Net::SSLeay). If they are not installed yet, you have to install them on your own before using the Perl script. So far I haven't used Text::Markdown much and don't know its limitations. At least it's compatible with the original markdown, and it likely doesn't support GFM (GitHub Flavored Markdown).
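A sketch of what such hard-coded invocations might look like in a POSIX shell. The /markdown/raw endpoint and the text/plain content type follow the batch listing later in this thread. The function name post_markdown and the API_URL default are assumptions for this example.

```shell
#!/bin/sh
# Sketch: POST a markdown file to a GitHub-style API and print the HTML.
# API_URL and the function name are assumptions for this example.
API_URL=${API_URL:-https://api.github.com}

post_markdown() {
	if command -v curl >/dev/null 2>&1; then
		# --insecure skips certificate verification
		curl --insecure -s "$API_URL/markdown/raw" \
			--request POST --data-binary "@$1" \
			--header 'Content-Type: text/plain'
	elif command -v wget >/dev/null 2>&1; then
		# --no-check-certificate is the wget counterpart
		wget --no-check-certificate -qO - "$API_URL/markdown/raw" \
			--post-file="$1" \
			--header 'Content-Type: text/plain'
	else
		echo 'curl or wget not found' >&2
		return 1
	fi
}

# Usage (needs network access): post_markdown README.md > README.html
```

Both flags turn off certificate verification, so a token sent this way is not protected against a man-in-the-middle. Making them optional would be safer where the certificates can be verified.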


Re: Markdown to HTML converter

#3 Post by siberia-man » 02 Mar 2022 15:58

I'm glad to announce a new version of the script.

Changes since the last publication

- add the new option -r to display the raw HTML (no head, no CSS; HTML body only)
- arbitrary order of options (batch)
- improve reading a token from the token file (perl, batch)
- POSIX compliant and BusyBox compatible (shell)
- some internal changes to unify all scripts
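The effect of -r can be sketched in a few lines of POSIX shell: the converted body is either passed through untouched or wrapped in a page skeleton. The names HTML_RAW and emit_page are illustrative, and the embedded style sheet is left out to keep the sketch short.

```shell
#!/bin/sh
# Sketch: wrap converted HTML in a page skeleton unless raw output is wanted.
# HTML_RAW and emit_page are illustrative names; no CSS is embedded here.
emit_page() {
	if [ -n "$HTML_RAW" ]; then
		cat
		return
	fi
	printf '%s\n' '<!DOCTYPE html>' '<html>' '<head>' \
		'<meta charset="utf-8" />' "<title>$1</title>" '</head>' '<body>'
	cat
	printf '%s\n' '</body>' '</html>'
}

HTML_RAW=1
echo '<p>text</p>' | emit_page demo    # fragment only
HTML_RAW=
echo '<p>text</p>' | emit_page demo    # full page
```

The raw form is handy when the fragment is going to be embedded into an existing page.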

BusyBox now supports the --post-file option and allows overriding the Content-Type request header. So with BusyBox it's possible to render markdown to HTML by sending a request to the GitHub API with BusyBox wget as well.

Examples

Render using pandoc:

$ echo "* text" | git-md-html -r
<ul>
<li>text</li>
</ul>

Render using GitHub API:

$ echo "* text" | git-md-html -r -u
<ul>
<li>text</li>
</ul>

Here is the pure, standalone batch version. This one and the others are on my GitHub (see the link above).

:::NAME
:::
:::    git-md-html - convert markdown to HTML
:::
:::SYNOPSIS
:::
:::    git-md-html [OPTIONS] [FILENAME]
:::
:::DESCRIPTION
:::
:::git-md-html treats its input as markdown text and converts it to
:::HTML. Data can be read from a file, pipe or redirection, and results
:::default to the standard output. You can redirect the output to another
:::file or another process (if needed).
:::
:::If one of the options -u/-U is specified, the conversion is performed
:::with the help of the GitHub API (defaults to the public GitHub API).
:::
:::In this case curl or wget (the first found) is used to communicate
:::with the selected Git host.
:::
:::If neither option is specified, the script tries to complete this
:::action by invoking pandoc, the cool many-to-many offline converter.
:::
:::OPTIONS
:::
:::  -u             Use the public GitHub API at https://api.github.com
:::  -U URL         Use another GitHub API URL
:::  -t TOKEN-FILE  Specify a filename to read a token from
:::  -T TOKEN       Specify the token
:::  -r             Raw output (no head, no CSS; html body only)
:::
:::SEE ALSO
:::
:::Pandoc home page
:::https://pandoc.org
:::
:::The idea to develop this script was inspired by JFL's script:
:::https://github.com/JFLarvoire/SysToolsLib/blob/master/Batch/md2h.bat
:::
:::Like his script, this one uses the GitHub Markdown stylesheet:
:::https://gist.github.com/tuzz/3331384
:::
:::Yet another Perl-written GitHub API based converter (one of many):
:::https://github.com/brxfork/md2html
:::
:::Another GitLab API based converter (one of many):
:::https://gitlab.triumf.ca/-/snippets/53
:::
:::COPYRIGHT
:::
:::Copyright (c) 2019-2022 Ildar Shaimordanov. All rights reserved.
:::
:::  MIT License

@echo off

setlocal

set "GITHUB_API_URL=https://api.github.com"

set "SRCFILE=-"
set "PAGE_TITLE={STDIN}"

set "API_URL="
set "API_TOKEN="
set "HTML_RAW="

:: ========================================================================

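:: Show the usage if there are no arguments and no redirected input.
:: "timeout" fails when stdin is redirected, so the check passes only
:: when the script is run interactively.
timeout /t 0 >nul 2>&1 && if "%~1" == "" (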
	call :print_usage >&2
	goto :EOF
)

:parse_options

if "%~1" == "-u" (
	set "API_URL=%GITHUB_API_URL%"
	shift /1
	goto :parse_options
)

if "%~1" == "-U" (
	set "API_URL=%~2"
	shift /1
	shift /1
	goto :parse_options
)

if "%~1" == "-t" (
	if not exist "%~2" (
		call :die "Token file not found: %~2"
		goto :EOF
	)
	for /f "usebackq tokens=*" %%f in ( "%~2" ) do set "API_TOKEN=%%f"
	shift /1
	shift /1
	goto :parse_options
)

if "%~1" == "-T" (
	set "API_TOKEN=%~2"
	shift /1
	shift /1
	goto :parse_options
)

if "%~1" == "-r" (
	set "HTML_RAW=1"
	shift /1
	goto :parse_options
)

if "%~1" == "--" shift /1

if not "%~1" == "" (
	set "SRCFILE=%~1"
	set "PAGE_TITLE=%~nx1 (%~f1)"
)

:: ========================================================================

if defined API_URL (
	call :conv_online
) else (
	call :conv_offline
)

goto :EOF

:: ========================================================================

:conv_online
set "URL=%API_URL%/markdown/raw"

set "AUTH_HEADER="
if defined API_TOKEN set "AUTH_HEADER=--header "Authorization: token %API_TOKEN%""

for %%f in ( curl.exe wget.exe ) do if not "%%~$PATH:f" == "" (
	if not defined HTML_RAW (
		call :html_begin
		call :html_css
	)
	call :dl_%%~nf "%SRCFILE%"
	if not defined HTML_RAW (
		call :html_end
	)
	goto :EOF
)

call :die "curl or wget not found"
goto :EOF


:dl_curl
curl --insecure -s "%URL%" ^
	--request POST --data-binary "@%~1" ^
	--header "Content-Type: text/plain" ^
	%AUTH_HEADER%

goto :EOF


:dl_wget
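if not "%~1" == "-" goto :dl_wget_post

:: Input comes from stdin: buffer it in a temporary file for --post-file.
:: TEMPFILE is set outside of a parenthesized block; inside a block
:: %TEMPFILE% would be expanded before the assignment takes effect.
set "TEMPFILE=%TEMP%\%~n0.%RANDOM%"
more > "%TEMPFILE%"
call :dl_wget "%TEMPFILE%"
del /q "%TEMPFILE%"
goto :EOF

:dl_wget_post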

wget --no-check-certificate -qO - "%URL%" ^
	--post-file="%~1" ^
	--header "Content-Type: text/plain" ^
	%AUTH_HEADER%

goto :EOF

:: ========================================================================

:conv_offline
for %%f in ( pandoc.exe ) do if "%%~$PATH:f" == "" (
	call :die "pandoc not found. Try with -u/-U to request git API"
	goto :EOF
)

if not defined HTML_RAW (
	call :html_begin
	call :html_css
)

pandoc --from=gfm --to=html "%SRCFILE%"

if not defined HTML_RAW (
	call :html_end
)

goto :EOF

:: ========================================================================

:html_begin
echo:^<!DOCTYPE html^>
echo:^<html xmlns="http://www.w3.org/1999/xhtml" lang xml:lang^>
echo:^<head^>
echo:^<meta charset="utf-8" /^>
echo:^<title^>%PAGE_TITLE%^</title^>
echo:^</head^>
echo:^<body^>
goto :EOF

:html_css
echo:^<style type="text/css"^>
set "DATA_FOUND="
for /f "usebackq tokens=* delims=" %%s in ( "%~f0" ) do (
	if defined DATA_FOUND echo:%%s
	if "%%~s" == "__DATA__" set DATA_FOUND=1
)
echo:^</style^>
goto :EOF

:html_end
echo:^</body^>
echo:^</html^>
goto :EOF

:: ========================================================================

:die
call :warn "%~1"
exit /b 255

:warn
echo:%~1>&2
goto :EOF

:print_usage
for /f "tokens=* delims=:" %%s in ( 'findstr "^:::" "%~f0"' ) do echo:%%s
goto :EOF

:: ========================================================================

:: EOF

:: Style sheet from https://gist.github.com/tuzz/3331384
__DATA__
/*
Copyright (c) 2017 Chris Patuzzo
https://twitter.com/chrispatuzzo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

body {
  font-family: Helvetica, arial, sans-serif;
  font-size: 14px;
  line-height: 1.6;
  padding-top: 10px;
  padding-bottom: 10px;
  background-color: white;
  padding: 30px;
  color: #333;
}

body > *:first-child {
  margin-top: 0 !important;
}

body > *:last-child {
  margin-bottom: 0 !important;
}

a {
  color: #4183C4;
  text-decoration: none;
}

a.absent {
  color: #cc0000;
}

a.anchor {
  display: block;
  padding-left: 30px;
  margin-left: -30px;
  cursor: pointer;
  position: absolute;
  top: 0;
  left: 0;
  bottom: 0;
}

h1, h2, h3, h4, h5, h6 {
  margin: 20px 0 10px;
  padding: 0;
  font-weight: bold;
  -webkit-font-smoothing: antialiased;
  cursor: text;
  position: relative;
}

h2:first-child, h1:first-child, h1:first-child + h2, h3:first-child, h4:first-child, h5:first-child, h6:first-child {
  margin-top: 0;
  padding-top: 0;
}

h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor, h5:hover a.anchor, h6:hover a.anchor {
  text-decoration: none;
}

h1 tt, h1 code {
  font-size: inherit;
}

h2 tt, h2 code {
  font-size: inherit;
}

h3 tt, h3 code {
  font-size: inherit;
}

h4 tt, h4 code {
  font-size: inherit;
}

h5 tt, h5 code {
  font-size: inherit;
}

h6 tt, h6 code {
  font-size: inherit;
}

h1 {
  font-size: 28px;
  color: black;
}

h2 {
  font-size: 24px;
  border-bottom: 1px solid #cccccc;
  color: black;
}

h3 {
  font-size: 18px;
}

h4 {
  font-size: 16px;
}

h5 {
  font-size: 14px;
}

h6 {
  color: #777777;
  font-size: 14px;
}

p, blockquote, ul, ol, dl, li, table, pre {
  margin: 15px 0;
}

hr {
  border: 0 none;
  color: #cccccc;
  height: 4px;
  padding: 0;
}

body > h2:first-child {
  margin-top: 0;
  padding-top: 0;
}

body > h1:first-child {
  margin-top: 0;
  padding-top: 0;
}

body > h1:first-child + h2 {
  margin-top: 0;
  padding-top: 0;
}

body > h3:first-child, body > h4:first-child, body > h5:first-child, body > h6:first-child {
  margin-top: 0;
  padding-top: 0;
}

a:first-child h1, a:first-child h2, a:first-child h3, a:first-child h4, a:first-child h5, a:first-child h6 {
  margin-top: 0;
  padding-top: 0;
}

h1 p, h2 p, h3 p, h4 p, h5 p, h6 p {
  margin-top: 0;
}

li p.first {
  display: inline-block;
}

ul, ol {
  padding-left: 30px;
}

ul :first-child, ol :first-child {
  margin-top: 0;
}

ul :last-child, ol :last-child {
  margin-bottom: 0;
}

dl {
  padding: 0;
}

dl dt {
  font-size: 14px;
  font-weight: bold;
  font-style: italic;
  padding: 0;
  margin: 15px 0 5px;
}

dl dt:first-child {
  padding: 0;
}

dl dt > :first-child {
  margin-top: 0;
}

dl dt > :last-child {
  margin-bottom: 0;
}

dl dd {
  margin: 0 0 15px;
  padding: 0 15px;
}

dl dd > :first-child {
  margin-top: 0;
}

dl dd > :last-child {
  margin-bottom: 0;
}

blockquote {
  border-left: 4px solid #dddddd;
  padding: 0 15px;
  color: #777777;
}

blockquote > :first-child {
  margin-top: 0;
}

blockquote > :last-child {
  margin-bottom: 0;
}

table {
  padding: 0;
}
table tr {
  border-top: 1px solid #cccccc;
  background-color: white;
  margin: 0;
  padding: 0;
}

table tr:nth-child(2n) {
  background-color: #f8f8f8;
}

table tr th {
  font-weight: bold;
  border: 1px solid #cccccc;
  text-align: left;
  margin: 0;
  padding: 6px 13px;
}

table tr td {
  border: 1px solid #cccccc;
  text-align: left;
  margin: 0;
  padding: 6px 13px;
}

table tr th :first-child, table tr td :first-child {
  margin-top: 0;
}

table tr th :last-child, table tr td :last-child {
  margin-bottom: 0;
}

img {
  max-width: 100%;
}

span.frame {
  display: block;
  overflow: hidden;
}

span.frame > span {
  border: 1px solid #dddddd;
  display: block;
  float: left;
  overflow: hidden;
  margin: 13px 0 0;
  padding: 7px;
  width: auto;
}

span.frame span img {
  display: block;
  float: left;
}

span.frame span span {
  clear: both;
  color: #333333;
  display: block;
  padding: 5px 0 0;
}

span.align-center {
  display: block;
  overflow: hidden;
  clear: both;
}

span.align-center > span {
  display: block;
  overflow: hidden;
  margin: 13px auto 0;
  text-align: center;
}

span.align-center span img {
  margin: 0 auto;
  text-align: center;
}

span.align-right {
  display: block;
  overflow: hidden;
  clear: both;
}

span.align-right > span {
  display: block;
  overflow: hidden;
  margin: 13px 0 0;
  text-align: right;
}

span.align-right span img {
  margin: 0;
  text-align: right;
}

span.float-left {
  display: block;
  margin-right: 13px;
  overflow: hidden;
  float: left;
}

span.float-left span {
  margin: 13px 0 0;
}

span.float-right {
  display: block;
  margin-left: 13px;
  overflow: hidden;
  float: right;
}

span.float-right > span {
  display: block;
  overflow: hidden;
  margin: 13px auto 0;
  text-align: right;
}

code, tt {
  margin: 0 2px;
  padding: 0 5px;
  white-space: nowrap;
  border: 1px solid #eaeaea;
  background-color: #f8f8f8;
  border-radius: 3px;
}

pre code {
  margin: 0;
  padding: 0;
  white-space: pre;
  border: none;
  background: transparent;
}

.highlight pre {
  background-color: #f8f8f8;
  border: 1px solid #cccccc;
  font-size: 13px;
  line-height: 19px;
  overflow: auto;
  padding: 6px 10px;
  border-radius: 3px;
}

pre {
  background-color: #f8f8f8;
  border: 1px solid #cccccc;
  font-size: 13px;
  line-height: 19px;
  overflow: auto;
  padding: 6px 10px;
  border-radius: 3px;
}

pre code, pre tt {
  background-color: transparent;
  border: none;
}
Last edited by siberia-man on 04 Mar 2022 04:22, edited 1 time in total.

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Markdown to HTML converter

#4 Post by Aacini » 04 Mar 2022 01:40

This application reminds me of my old TextToHtml.bat conversion program. It is a similar converter, but it takes a text file with extended BBCode tags (like the ones in Wikipedia) instead of Markdown and converts it to a fully working HTML file. My converter is a pure, standalone Batch file.

Antonio


Re: Markdown to HTML converter

#5 Post by siberia-man » 04 Mar 2022 04:33

Aacini wrote: (04 Mar 2022 01:40)
My converter is a pure, standalone Batch file.

The script I posted above is a pure, standalone batch version as well. I edited the last message to make it clearer. I am not sure whether it's possible to implement an md-to-html converter in pure batch without any external tools (like pandoc, wget and curl). The only way is to use an online converter, sending requests to the GitHub API from WSH (JScript or VBScript), which ships with Windows by default.
