How to convert HTML to PDF preserving CSS

Converting a web page to PDF becomes difficult when you want to preserve the page's CSS. There are two ways to approach the task:

  1. Client Side
  2. Server Side

Client-side solutions are JavaScript libraries such as jsPDF, but their output tends to be unimpressive. Server-side solutions are mostly APIs, and most of them are not free. One free alternative is Apache FOP, a server-side tool that converts HTML/XML into PDF by way of an XSL stylesheet and offers a lot of formatting control over the result. In this article, I will show you how to create a PDF using FOP.

What is Apache FOP?

Formatting Objects Processor, abbreviated as FOP, is a Java application that converts XSL Formatting Objects (XSL-FO) documents to PDF or other printable formats. At the time of writing, its latest available version is 1.1.

Running FOP

Download FOP from the following link.

Extract FOP into a folder. You can run it with the following command:

./path_to_fop_directory/fop

Once FOP runs successfully, download the following rar file from the link. The archive contains two files and an image folder. The files are listed below.

  * index.html
  * index.xsl

index.html is the HTML page that will be converted into PDF, and index.xsl is the stylesheet for it. On the command line, change to the directory where you saved these two files and run the following command.

./fop -xml index.html -xsl index.xsl -pdf index.pdf

This command runs FOP with three arguments: the XML input (index.html), the XSL stylesheet (index.xsl) and the PDF file to write (index.pdf). The output will look like this, with no headers or footers.

Output 1

Now open the xsl file and start reading through it. Every section has a comment describing its purpose, but I will explain some of them.
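Before going section by section, it may help to see the overall shape such a stylesheet usually has. The skeleton below is only a rough sketch and is not copied from the downloaded index.xsl. The page size, margins and the single p template are my own assumptions, and it assumes the HTML has no XML namespace. Only the region names rb-page and ra-page are taken from the header and footer section of this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical skeleton; the real index.xsl is longer and may differ. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- The root <html> element becomes the root of an XSL-FO document. -->
  <xsl:template match="html">
    <fo:root>
      <fo:layout-master-set>
        <!-- Page geometry (assumed A4) with named header and footer regions. -->
        <fo:simple-page-master master-name="page"
            page-width="210mm" page-height="297mm"
            margin-top="15mm" margin-bottom="15mm"
            margin-left="15mm" margin-right="15mm">
          <fo:region-body margin-top="15mm" margin-bottom="15mm"/>
          <fo:region-before region-name="rb-page" extent="10mm"/>
          <fo:region-after region-name="ra-page" extent="10mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="page">
        <!-- fo:static-content blocks for header and footer would go here. -->
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="body"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each HTML tag gets its own template; <p> is shown as an example. -->
  <xsl:template match="p">
    <fo:block space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The idea is that the stylesheet first rewrites the HTML into XSL-FO, and FOP then lays out that XSL-FO as PDF pages. Every tag you use in the HTML needs a matching template, which is why the real file is much longer.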

Look at the section below in the xsl file. It sets the header and footer of your PDF. Edit it as shown here.

<fo:static-content flow-name="rb-page">
  <fo:block font-size="10pt">
    <fo:table table-layout="fixed" inline-progression-dimension="100%">
      <fo:table-column column-width="50%"/>
      <fo:table-column column-width="50%"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell>
            <fo:block text-align="start">
              Test Pdf file Generated by FOP
            </fo:block>
          </fo:table-cell>
          <fo:table-cell>
            <fo:block text-align="end" font-weight="bold"
              font-family="monospace">
              M Hassan Siddiqui
            </fo:block>
          </fo:table-cell>
        </fo:table-row>
      </fo:table-body>
    </fo:table>
  </fo:block>
</fo:static-content>
<fo:static-content flow-name="ra-page">
  <fo:block font-size="10pt">
    <fo:table table-layout="fixed" inline-progression-dimension="100%">
      <fo:table-column column-width="50%"/>
      <fo:table-column column-width="50%"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell>
            <fo:block text-align="start">
              Footer Text
            </fo:block>
          </fo:table-cell>
          <fo:table-cell>
            <fo:block text-align="end">Page
               of

            </fo:block>
          </fo:table-cell>
        </fo:table-row>
      </fo:table-body>
    </fo:table>
  </fo:block>
</fo:static-content>
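The "Page ... of ..." cell in the footer is normally filled in with XSL-FO's page number elements. Below is a small sketch of the usual idiom. The id last-page is a name I made up for illustration, and the downloaded stylesheet may use a different one or do this differently.

```xml
<!-- Footer cell: current page number and total page count. -->
<fo:block text-align="end">
  Page <fo:page-number/> of <fo:page-number-citation ref-id="last-page"/>
</fo:block>

<!-- Hypothetical marker: an empty block carrying the cited id, placed as
     the last child of the body fo:flow so that it lands on the final page. -->
<fo:block id="last-page"/>
```

fo:page-number prints the number of the page it appears on, while fo:page-number-citation prints the number of the page that contains the element with the given id. Citing a block at the very end of the document therefore gives the total page count.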

Now run the above command again to generate the PDF, and you will see the following header and footer. If the PDF file is open while the command runs, it will fail with an error on Windows but work fine on Ubuntu. On Windows, close the file and run the command again.

Header

Footer

You can also use an image as the header or footer. Put the following line in place of 'Footer Text'.

<fo:external-graphic width="200pt" height="200pt" content-width="150pt" content-height="150pt" src="images/fop.jpg"/>

Header

Styling for a specific element

The xsl file defines the behavior of every tag that will be used in your HTML. For example, the following section defines the <b> tag.

<xsl:template match="b">
  <fo:block font-weight="bold">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

If you want to give it some other style, such as a color, add the xsl:use-attribute-sets attribute to the fo:block as shown here.

<xsl:template match="b">
  <fo:block font-weight="bold" xsl:use-attribute-sets="boldTextStyles">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

boldTextStyles works like the id you would give to a div. Now add the following lines at the bottom of the file, but before the closing </xsl:stylesheet> tag.

<xsl:attribute-set name="boldTextStyles">
  <xsl:attribute name="color">#FF0000</xsl:attribute>
</xsl:attribute-set>

Now when you generate the PDF, the bold text will be red.

FOP actually converts XML into PDF. XML and HTML look alike, but in HTML, elements of the same type are usually told apart by their ids. For example, if you have two kinds of bold text, one red and one green, you give each tag an id and assign the colors in CSS. When the file is converted into PDF, however, FOP gives every occurrence of a tag the same style: when it finds a tag in the page, it looks up that tag's definition in the xsl file and does whatever the definition says.

If you want two bold sections with different colors, you therefore have to write a different tag for each. For example, if one kind of bold text is red and the other is green, define another tag in the xsl file as shown in the section below and give it the style you need. Copy that code before the closing </xsl:stylesheet> tag and generate the PDF. This section defines another tag whose definition is the same as for <b> but with a different style.

<xsl:template match="bold_green">
  <fo:block font-weight="bold" xsl:use-attribute-sets="boldGreenTextStyles">
    <xsl:attribute name="id">
      <xsl:choose>
        <xsl:when test="@id">
          <xsl:value-of select="@id"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="generate-id()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>

<!-- ============================================
    Styling for boldGreenTextStyles
    =============================================== -->

<xsl:attribute-set name="boldGreenTextStyles">
  <xsl:attribute name="color">#00FF00</xsl:attribute>
</xsl:attribute-set>

Add this line to index.html:

<bold_green> This is bold green text </bold_green>

The output PDF will now look like this:

Output 5
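As a possible alternative to inventing a new tag, XSLT match patterns can also test attributes, so the same b tag could be told apart by an attribute value. The sketch below is not taken from the downloaded stylesheet. The class value green is a made-up example, and for brevity the color is set directly on the block instead of through an attribute set.

```xml
<!-- Sketch: matches <b class="green"> only. A plain <b> still falls
     through to the match="b" template, because a pattern with a predicate
     has a higher default priority than a bare element name. -->
<xsl:template match="b[@class='green']">
  <fo:block font-weight="bold" color="#00FF00">
    <xsl:apply-templates select="*|text()"/>
  </fo:block>
</xsl:template>
```

With this template in place, the HTML would contain something like <b class="green">This is bold green text</b>, which also keeps the page valid HTML for the browser.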

Conclusion

To use FOP and get the PDF you want, you first have to turn your HTML into well-formed XML, which is then combined with the xsl file to generate the PDF. You will have to translate all of your CSS into xsl for the PDF to look exactly like your web page. When you write an HTML page and leave out a closing tag such as </b> or </p>, the browser simply ignores the mistake and shows you the output. That is not the case when you generate a PDF with FOP: it reports an error whenever a tag has no closing tag. Even writing <br> instead of <br/> causes an error. So you have to write your code very carefully.
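To make the difference concrete, here is a small made-up fragment, first as a browser would tolerate it and then in the well-formed shape FOP needs. The text content is only an illustration.

```xml
<!-- Accepted by browsers, rejected by an XML parser:
       <p>First line<br>Second <b>bold line
-->

<!-- Well-formed equivalent: every tag is closed, and the empty <br>
     element is written in its self-closing form. -->
<p>First line<br/>Second <b>bold</b> line</p>
```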