VFP's Editor Code RTF2HTML (Part 2)

Version: 2.00.01 - last Update: Thursday, April 27, 2008, 23:59:00



This thread is all about how to get VFP’s syntax-coloured code into an HTML-formatted blog like this one.

Intro

Today let us talk about some of the most important parts of the Rich Text Format (RTF) version 1.5 specification, which you can find at this link: http://www.biblioscape.com/rtf15_spec.htm
We need a basic understanding of RTF tags to successfully interpret VFP's RTF output. Keep in mind that I am not going to repeat the contents of the site linked above. It is up to you to take a closer look at the general definitions made there first!

VFP Goes RTF

Okay, as you know by now, an RTF file consists of unformatted text, control words, groups, and control symbols. We will encounter text, control words and groups in VFP's RTF output, too!

An RTF formatted text stream starts with the RTF header, and the document body follows the header. Header and body are wrapped together in one all-enclosing group, so every RTF stream/file must begin with an opening brace ({) and must end with a closing brace (})!

Here is our first RTF draft that we are going to refine step by step. I am using green bold text for RTF tags. All other plain text uses standard formatting (black, non-bold):

{\rtf This is my first RTF formatted document}

Just copy the above line and paste it into Notepad. Next, save it to a file with an RTF extension (I'm using test1.rtf here). Now open it with your standard word processor (I'm using Microsoft Word here) and you will see something like the line below:

This is my first RTF formatted document

BTW: notice that there is NO leading space in your Word document! This is because the space between the "{\rtf" tag and the first plain word "This" still belongs to the "\rtf" tag itself. It only marks where the control word ends. Let us prove this: save the following RTF text with Notepad, then reopen it with Word:

{\rtf  This is my first RTF formatted document}

Notice that there are now two spaces between the opening RTF tag and the first plain word "This". As you can see, Microsoft Word now adds a leading space to your text!
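The delimiter rule is easier to see in code. Here is a minimal sketch of a plain-text extractor written in Python, not VFP, and not part of the converter this series builds. The helper name `plain_text` and its regular expression are my own assumptions for illustration. A control word is treated as a backslash, followed by letters, an optional numeric parameter, and one optional space. That space is consumed as the delimiter. The sketch also shows what happens when the space is missing entirely.

```python
import re

# Hypothetical tokenizer (illustration only). Alternatives, in order:
#   1. control word: backslash + letters + optional signed number + ONE optional space
#   2. control symbol: backslash + any other single character (e.g. \{ \} \\)
#   3. group brace
#   4. run of plain text
_TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)


def plain_text(rtf: str) -> str:
    """Return only the plain text of a small RTF fragment."""
    out = []
    for _word, _number, symbol, _brace, text in _TOKEN.findall(rtf):
        if symbol in ("\\", "{", "}"):
            out.append(symbol)  # escaped literal characters
        elif text:
            # Raw CR/LF characters in RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
        # Control words, other control symbols and braces produce no text.
    return "".join(out)


print(repr(plain_text(r"{\rtf This is my first RTF formatted document}")))
# 'This is my first RTF formatted document'
print(repr(plain_text(r"{\rtf  This is my first RTF formatted document}")))
# ' This is my first RTF formatted document'
print(repr(plain_text(r"{\rtfThis is my first RTF formatted document}")))
# 'is my first RTF formatted document'
```

The third call loses the word "This". The letters run straight into the control word name, so the tokenizer sees the single control word "\rtfThis".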

Okay, next question is: "Are the above lines stored with an RTF extension to your disk already valid RTF files?" The answer is (as always): "It depends...". In fact, it really depends on how "smart" your RTF reader is, which means how forgivingly your application is able to parse missing and/or invalid RTF tokens. As I already mentioned in my Intro part, Microsoft's Word is one of the more tolerant ones.

But hey, why shouldn't we create our own VFP-based RTF reader on the fly? Maybe you don't want to use Word (or don't own it at all). Let's go and do it (it's a five-minute "baby-easy job" :-))

  • Open VFP and create a new SCX based form. Give it a caption like "RTF Evaluator" or so.
  • Add a VFP EditBox on top of your form and name it "oEdit".
  • Add a Microsoft Rich-Text OLE-Control below your EditBox and name it "oRTF" (You should know how to do that).
  • Add a VFP CommandButton at the bottom of your form and name it "oBtnRefreshRTF".
  • Add the following code to your command button's Click event: THISFORM.oRTF.OBJECT.TextRTF = THISFORM.oEdit.VALUE
  • You may wish to add the Anchor-property values 75, 30 and 6 to your controls to support form resizing.

Now you should see something like figure #1 below.

Figure #1: A quick & dirty RTF-Evaluator (design mode)

Now we are able to test all of our RTF formatting directly without making a detour. Let's see how our own RTF evaluator parses our first RTF lines above. Figures #2 and #3 show the first results. Well, it seems that Microsoft's Rich Text ActiveX control also acts very forgivingly!

Figure #2: One space between 1st tag and text

Figure #3: Two spaces between 1st tag and text

Even if we "forget" the first space, our RTF control does not complain, but...

Figure #4: No space between 1st tag and text

... drops our first plain text word. This looks pretty strange, but if you read the RTF documentation carefully you will see that this is not a fault. The name of a control word is made of letters, so without a delimiting space the reader sees one single, unknown control word "\rtfThis" and ignores it. Now, let us drop the closing brace at the end of our RTF text. As you can see in figure #5, this is too much for our RTF control: it fails silently.

Figure #5: Too much of a good thing (the RTF Evaluator fails)
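A missing closing brace is enough to make the control give up. A quick sanity check before you hand text to any RTF reader can therefore help. The following is a minimal Python sketch of my own, and the name `braces_balanced` is hypothetical. It only checks that unescaped braces pair up, skipping escaped characters such as \{ and \}. It does not validate anything else about the RTF.

```python
def braces_balanced(rtf: str) -> bool:
    """Return True if every unescaped '{' has a matching '}'."""
    depth = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character after it (\{, \}, \\ ...)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' closed a group that was never opened
                return False
        i += 1
    return depth == 0


print(braces_balanced(r"{\rtf1 This is my first RTF formatted document}"))  # True
print(braces_balanced(r"{\rtf1 This is my first RTF formatted document"))   # False
```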

Let's have a short look at our RTF reference. There we can read that every RTF header should start with at least the "\rtf" tag, extended by a number that tells us the RTF version ("\rtf1"). In addition, this first tag should be followed by a character set control word such as "\ansi"! Let's test this as shown in figure #6 below:

Figure #6: Correct minimal RTF header

As you can see, the output does not differ from our earlier tests. When required RTF tags are omitted, an RTF reader is expected to fall back to the defaults recommended by the RTF specification. In our case "1" is the RTF version and "\ansi" is the default character set. As we will soon see, VFP always omits the "\ansi" control word completely but writes the "\rtf1" control word correctly!

Okay, by now we are able to test all the RTF tags we will need in the future. One final word about our new "Mini RTF Evaluator": later in the game, when we explore more sophisticated RTF formatting, you might be forced to close and restart your form. The reason is that all RTF formatting the ActiveX control has parsed so far stays active until it is reset. You can reset it either by feeding the appropriate RTF tag into the reader or by resetting the whole OLE control. Let's proceed to the next important RTF header entries.

RTF Tables

Visual FoxPro uses two of the possible RTF tables: the font table and the colour table. Our RTF reference says that the first table to follow the "\rtf1" control word should be the font table (if any). Obviously this isn't a must, because VFP does it the other way round and writes the colour table first, followed by the font table. No RTF reader I have tested so far has ever complained about VFP's "special" output order. Therefore, let's stick to VFP's ordering and talk about the colour table first.

The Colour Table

The RTF colour table is a group of its own, so it has to be enclosed in curly braces ({}), too. Within the colour table group we specify the colours we will need in our document body later on. The colour table starts with an opening brace and the control word "\colortbl" ("{\colortbl"), immediately followed by the first colour definition. Each colour definition is terminated by a semicolon (;). After the last colour definition, the group has to be closed with the closing curly brace (}). Each colour definition is made of the control words "\red", "\green" and "\blue" (always written in this sequence). Each of these control words is extended by its colour value (0 to 255). Let's look at the following example:

{\colortbl\red0\green255\blue0;}

This is a complete and valid colour table that defines only one colour (green). Let's have a look at a more complex colour table:

{\colortbl\red0\green128\blue0;\red0\green0\blue255;\red0\green128\blue128;\red0\green0\blue0;\red128\green128\blue128;\red255\green255\blue0;\red255\green0\blue0;}

Here we have a colour table that defines seven colours. Seven is the number of colours VFP writes to the clipboard when it generates its own RTF colour table. Of course, the colour values vary depending on how you set up your own syntax colouring scheme. BTW: all colours defined in an RTF colour table are numbered starting with <0> (zero-based indexing)! We have to keep this in mind when referencing the defined colours within our document body.
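Colours are referenced by their zero-based position, so a converter has to read the colour table into an indexed list. Here is a minimal Python sketch under simple assumptions. The hypothetical function `parse_colour_table` looks only at the first "{\colortbl ...}" group. It treats an entry without any colour components as the reader's default colour, which is how the specification describes an omitted definition. VFP's output, as shown above, contains no such entry.

```python
import re
from typing import List, Optional, Tuple

Rgb = Tuple[int, int, int]


def parse_colour_table(rtf: str) -> List[Optional[Rgb]]:
    """Return the colour table entries in order: list index == RTF colour index."""
    match = re.search(r"\{\\colortbl([^}]*)\}", rtf)
    if match is None:
        return []
    colours: List[Optional[Rgb]] = []
    # Every entry ends with ';', so the piece after the last ';' is not an entry.
    for entry in match.group(1).split(";")[:-1]:
        parts = dict(re.findall(r"\\(red|green|blue)(\d+)", entry))
        if not parts:
            colours.append(None)  # omitted definition: reader's default colour
            continue
        # Assumption: a missing component counts as 0.
        colours.append(tuple(int(parts.get(name, 0)) for name in ("red", "green", "blue")))
    return colours


VFP_TABLE = (r"{\colortbl\red0\green128\blue0;\red0\green0\blue255;"
             r"\red0\green128\blue128;\red0\green0\blue0;\red128\green128\blue128;"
             r"\red255\green255\blue0;\red255\green0\blue0;}")

table = parse_colour_table(VFP_TABLE)
print(len(table))  # 7
print(table[1])    # (0, 0, 255), referenced later as \cf1
```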

Okay, let's add the above colour table to our first RTF formatted text and test it with our Evaluator:

{\rtf1{\colortbl\red0\green128\blue0;\red0\green0\blue255;\red0\green128\blue128;\red0\green0\blue0;\red128\green128\blue128;\red255\green255\blue0;\red255\green0\blue0;}This is my first RTF formatted document}

Notice that there is no space between the closing curly brace of the colour table group and the plain text of the document body! Figure #7 shows our test drive. Nothing new to see in the lower RTF control but no errors were thrown as well :-)

Figure #7: Correct RTF colour table
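A colour table like the one we just tested can also be generated from a list of RGB values. In this Python sketch, `build_colour_table` is a name of my own choosing. It writes each entry as \redR\greenG\blueB; in the order used above, and it rejects components outside the range 0 to 255.

```python
from typing import Iterable, Tuple


def build_colour_table(colours: Iterable[Tuple[int, int, int]]) -> str:
    """Build an RTF colour table group; entry N is later referenced as \\cfN."""
    entries = []
    for red, green, blue in colours:
        for value in (red, green, blue):
            if not 0 <= value <= 255:
                raise ValueError(f"colour component out of range: {value}")
        entries.append(f"\\red{red}\\green{green}\\blue{blue};")
    return "{\\colortbl" + "".join(entries) + "}"


print(build_colour_table([(0, 255, 0)]))
# {\colortbl\red0\green255\blue0;}
```

Prefixing the result with "{\rtf1" and appending the body text plus a closing brace gives a document like the one in figure #7.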

Referencing Colours

Now the time has come to assign some colour attributes to our plain document text. This is pretty easy to achieve. According to our RTF reference (version 1.5), two control words assign a (pre)defined colour to any part of our document's body text: "\cf" sets the foreground colour and "\cb" sets the background colour. Each of these control words is extended by a number, which is the index into the RTF colour table defined in the RTF header. It's time to remember what I said about index numbering: "All colours defined in an RTF colour table are numbered starting with <0> (zero-based indexing)". Unfortunately, the background colour control word "\cb" apparently isn't part of the RTF 1.0 specification. That's why most RTF readers don't support it natively. This is also true for our own RTF Evaluator's ActiveX control. It is the number one reason for VFP's inability to output syntax colours for text backgrounds other than the default white one.

Using our example colour table from above, we can now assign one of the predefined colours to our document text with the control words listed in table #1 below. The background column (\cb) is listed for reference only, because VFP does not support that control word!

Control word   Control word   RTF colour definition         Colour
(foreground)   (background,
               not used by
               VFP)
\cf0           \cb0           \red0\green128\blue0;         green
\cf1           \cb1           \red0\green0\blue255;         blue
\cf2           \cb2           \red0\green128\blue128;       turquoise
\cf3           \cb3           \red0\green0\blue0;           black
\cf4           \cb4           \red128\green128\blue128;     gray
\cf5           \cb5           \red255\green255\blue0;       yellow
\cf6           \cb6           \red255\green0\blue0;         red

Table #1: foreground (background) colour attributes

Wait! There is one more thing we have to know about assigning colour attributes to our plain document body. This hint applies to many RTF attributes we can assign to plain vanilla text, and it is called attribute inheritance. An attribute like the foreground colour, once set within plain text, stays in effect until another foreground colour assignment occurs! Text inside a nested group inherits the settings that are in effect when the group opens. According to the specification, however, formatting changed inside a group applies only to the text within that group. When the group closes, the settings that were in effect before it opened are restored.
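These inheritance rules can be modelled with a stack: save the current settings when a group opens, and restore them when it closes. The following Python sketch does this for the foreground colour only. The function name `coloured_runs` and the starting index 0 are assumptions for illustration. Destination groups such as the colour table are not handled, and escaped characters are simply skipped.

```python
import re
from typing import List, Tuple

# Control word | control symbol | brace | plain text (same idea as a tiny RTF lexer)
_TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)


def coloured_runs(rtf: str) -> List[Tuple[int, str]]:
    """Split RTF body text into (colour index, text) runs, honouring groups."""
    colour = 0  # assumption: index 0 until the first \cf
    saved: List[int] = []
    runs: List[Tuple[int, str]] = []
    for word, number, _symbol, brace, text in _TOKEN.findall(rtf):
        if brace == "{":
            saved.append(colour)  # entering a group: remember the current setting
        elif brace == "}":
            if saved:
                colour = saved.pop()  # leaving a group: restore the previous setting
        elif word == "cf":
            colour = int(number or 0)
        elif text:
            runs.append((colour, text))
    return runs


print(coloured_runs(r"{\cf1 red {\cf2 blue} red again}"))
# [(1, 'red '), (2, 'blue'), (1, ' red again')]
```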

Let's give it a try and carefully add some rainbow foreground colours to our RTF source like shown below.

{\rtf1{\colortbl\red0\green128\blue0;\red0\green0\blue255;\red0\green128\blue128;\red0\green0\blue0;\red128\green128\blue128;\red255\green255\blue0;\red255\green0\blue0;}\cf0 This \cf1 is \cf2 my \cf3 first \cf4 RTF \cf5 formatted \cf6 document}

Figure #8 below shows our RTF Evaluator after having parsed the new input.

Figure #8: Assigning foreground colours

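Pay attention to the fact that each foreground colour assignment is followed by a space! This space delimits the control word (it tells the reader where the colour index ends). The reader consumes it, so it never shows up in the rendered text.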
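Our goal is HTML, so here is one way these colour assignments could be translated. In this Python sketch, the function name `body_to_html` is hypothetical. It turns each text run into a `<span>` with an inline CSS colour, which it looks up in a zero-based list of RGB tuples. For brevity it ignores group scoping and every control word except \cf.

```python
import html
import re
from typing import List, Optional, Tuple

_TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)


def body_to_html(body: str, colours: List[Tuple[int, int, int]]) -> str:
    """Convert \\cfN-coloured RTF body text into HTML <span> elements."""
    colour: Optional[int] = None  # no \cf seen yet: emit uncoloured text
    parts = []
    for word, number, symbol, _brace, text in _TOKEN.findall(body):
        if word == "cf":
            colour = int(number or 0)
            continue
        if symbol in ("\\", "{", "}"):
            text = symbol  # escaped literal character
        chunk = html.escape(text.replace("\r", "").replace("\n", ""))
        if not chunk:
            continue  # other control words, braces, line breaks in the source
        if colour is None:
            parts.append(chunk)
        else:
            red, green, blue = colours[colour]
            parts.append(f'<span style="color:#{red:02x}{green:02x}{blue:02x}">{chunk}</span>')
    return "".join(parts)


VFP_COLOURS = [(0, 128, 0), (0, 0, 255), (0, 128, 128), (0, 0, 0),
               (128, 128, 128), (255, 255, 0), (255, 0, 0)]
print(body_to_html(r"\cf0 This \cf1 is \cf6 document", VFP_COLOURS))
```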

The Font Table

The RTF font table is also a group of its own, so it has to be enclosed in curly braces ({}) as well. Within the font table group we specify the fonts we will need in our document body later on. The font table starts with an opening brace and the control word "\fonttbl" ("{\fonttbl"), immediately followed by the first font-info definition. Each font-info definition can be treated like a group of its own. It is enclosed in curly braces and has to be terminated with a semicolon (;). Each font-info definition starts with the control word "\fN", where N is the zero-based font table index of the font. This first entry is called the font number. There are two other mandatory entries, called the font family and the font name. VFP only writes the font number and the font name entries, as shown below:

{\fonttbl {\f0 Courier New;}}

Notice that VFP also puts a space between the "\fonttbl" control word and the opening brace of the font-info. Both tables belong to the RTF header and must appear before any plain text. So, let's refine our RTF source text now and check it with the help of our RTF Evaluator:

{\rtf1{\colortbl\red0\green128\blue0;\red0\green0\blue255;\red0\green128\blue128;\red0\green0\blue0;\red128\green128\blue128;\red255\green255\blue0;\red255\green0\blue0;}{\fonttbl {\f0 Courier New;}}\cf0 This \cf1 is \cf2 my \cf3 first \cf4 RTF \cf5 formatted \cf6 document}

Figure #9: Now with correct RTF font table
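Reading the font table back works much like reading the colour table. The Python sketch below maps font numbers to font names, and the name `parse_font_table` is my own. It assumes simple font-info groups like VFP's: "\fN", optionally followed by further control words such as a font family, then the name and the terminating semicolon.

```python
import re
from typing import Dict

# {\fN [more control words] name;}  -> captures N and name
_FONT_INFO = re.compile(r"\{\\f(\d+)(?:\\[a-zA-Z]+-?\d* ?)*\s*([^;{}\\]*);\}")


def parse_font_table(rtf: str) -> Dict[int, str]:
    """Map each font number to its font name."""
    return {int(number): name.strip() for number, name in _FONT_INFO.findall(rtf)}


print(parse_font_table(r"{\fonttbl {\f0 Courier New;}}"))  # {0: 'Courier New'}
```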

Of course, we also need a control word to assign the defined font to our document body text. We will address this issue in the next chapter. Only one more thing is worth mentioning: a VFP editor instance can only have one font assigned, so VFP needs to create only one font-info entry in its entire font table when it creates the RTF clipboard content!

Other Control Words Used by VFP

There are only a handful of RTF control words left to discuss - luckily!

The \fN Control Word

The "\f" control word is used to assign a font definition to plain text within the document body, where N has to be replaced with the font-info index number. As we said already, VFP only defines one font, so we only ever have to deal with the "\f0" control word. Once a font is assigned to the body text, it stays active until the RTF reader encounters another font assignment.

The \fsN Control Word

The "\fs" control word is used to assign a new font size to the document body text. N is the font size in half points! Yes, I'm serious - no kidding! So if you select a 12 point font in your VFP editor, you will find a "\fs24" control word in VFP's RTF code!
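Converting this size to CSS is a simple division by two. Here is a minimal Python sketch, with helper names of my own choosing. It finds every \fsN control word and returns the matching CSS font sizes.

```python
import re
from typing import List


def half_points_to_css(half_points: int) -> str:
    """\\fsN gives the size in half points; CSS wants points."""
    return f"{half_points / 2:g}pt"


def css_font_sizes(rtf: str) -> List[str]:
    """Return a CSS font size for every \\fsN control word, in order."""
    return [half_points_to_css(int(n)) for n in re.findall(r"\\fs(\d+)", rtf)]


print(css_font_sizes(r"{\rtf1\f0\fs24 twelve points \fs21 ten and a half}"))
# ['12pt', '10.5pt']
```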

Let's check out both control words as shown in figure #10 below:

Figure #10: 24pt Font "DaytonItal" assigned to document body

The \b Control Word

The "\b" control word is used to assign the bold font attribute to the document body text. To revert the assignment one has to use the "\b0" control word. One trailing space belongs to both versions of this control word!

The \i Control Word

The "\i" control word is used to assign the italic font attribute to the document body text. To revert the assignment one has to use the "\i0" control word. One trailing space belongs to both versions of this control word!
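Both toggles map naturally onto HTML. The following Python sketch tracks the bold and italic state and wraps each text run in <b> and/or <i>. The name `toggles_to_html` is hypothetical. Wrapping run by run keeps the HTML well-formed, however the toggles interleave. Group scoping is ignored for brevity.

```python
import html
import re

_TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)


def toggles_to_html(body: str) -> str:
    """Translate \\b/\\b0 and \\i/\\i0 into <b> and <i> tags."""
    bold = italic = False
    out = []
    for word, number, symbol, _brace, text in _TOKEN.findall(body):
        if word in ("b", "i"):
            on = int(number or 1) != 0  # \b or \b1 -> on, \b0 -> off
            if word == "b":
                bold = on
            else:
                italic = on
            continue
        if symbol in ("\\", "{", "}"):
            text = symbol  # escaped literal character
        chunk = html.escape(text.replace("\r", "").replace("\n", ""))
        if not chunk:
            continue
        if italic:
            chunk = f"<i>{chunk}</i>"
        if bold:
            chunk = f"<b>{chunk}</b>"
        out.append(chunk)
    return "".join(out)


print(toggles_to_html(r"\b bold\b0  plain \i italic\i0"))
# <b>bold</b> plain <i>italic</i>
```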

The \par Control Word

The "\par" control word marks the end of a paragraph, which creates a line break in the output. One trailing space belongs to each appearance of this control word! Figure #11 below demonstrates the use of the bold, italic and line break control words. Please notice that I've added many CRLFs in the upper VFP EditBox manually. There is even a CRLF after the word "my" in the RTF source, but it doesn't show up in the rendered output. Only the "\par" control word creates a line break in the output!

Figure #11: Bold and italic font attributes and line break
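For HTML output, raw CR/LF characters in the RTF source have to be dropped. Only \par becomes a line break, along with a backslash directly followed by a CR or LF, which the specification treats like \par. Here is a minimal Python sketch, with `par_to_html` as a hypothetical helper name, that emits <br> for each paragraph mark.

```python
import html
import re

_TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.S)


def par_to_html(body: str) -> str:
    """Turn paragraph marks into <br> and ignore CR/LF typed into the source."""
    out = []
    for word, _number, symbol, _brace, text in _TOKEN.findall(body):
        if word == "par" or symbol in ("\r", "\n"):
            out.append("<br>\n")  # paragraph mark -> visible line break
        elif symbol in ("\\", "{", "}"):
            out.append(html.escape(symbol))
        elif text:
            # CR/LF in the RTF source are not content: drop them.
            out.append(html.escape(text.replace("\r", "").replace("\n", "")))
    return "".join(out)


print(repr(par_to_html("This\\par is \r\nmy\\par first")))
# 'This<br>\nis my<br>\nfirst'
```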

This is all we have to know about RTF control words and groups by now, because VFP doesn't utilize more than the ones we've talked about. In the next (final) chapter of this part 2 we have to look at some oddities and bugs related to VFP's RTF writing capabilities and how to work around them.

VFP Bugs, Oddities and Workarounds

There is a real RTF-writer bug that hit me while I was playing with VFP's RTF clipboard output. I reported it on Microsoft's bug report site without high hopes, because it isn't a real show stopper. The bottom line is: a NOTE keyword within a code block causes unpredictable corruption of the RTF clipboard data! All source code that appears before the first NOTE keyword is formatted correctly and gets transferred into the RTF clipboard data. The source code that follows the first NOTE keyword gets lost and/or corrupted (not correctly formatted)!
There is no workaround for this bug today. The only thing you can do to avoid it is not to use the NOTE keyword as a remark token if you need to copy the code out of VFP into another application.

I must admit that using NOTE instead of a simple * is one of my more harmless whims :-) Sometimes I use the NOTE keyword to write cool-looking "inline remarks" like the following:

IF m.llOk
   NOTE that the following section still is under construction
   FOR m.lnLoop = 1 TO 1000
      * some other code goes here
   NEXT
ENDIF

Well, as long as you are copying and pasting between VFP sessions, this bug won't appear. The same is true as long as you are pasting into an application that doesn't (or isn't able to) retrieve the RTF clipboard content. Pasting into Word with the Ctrl+V shortcut, however, always picks the most advanced (in terms of formatting) clipboard content, which in this case is VFP's RTF clipboard data - et voilà!

Another RTF-related VFP oddity is the missing support for coloured text backgrounds in the RTF clipboard data. We can solve this with a workaround. As we know by now, VFP generates an RTF colour table containing 7 different colour definitions. The seven colours correspond to the seven syntax areas you can select from the Area dropdown shown in figure #12 below:

Figure #12: Seven colours <-> seven syntax areas

It is pretty easy to create a mapping table for our converter tool that maps each of VFP's RTF foreground colours to a foreground/background colour combination in our own HTML output.
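Such a mapping table could be as simple as a dictionary. In this Python sketch the keys are the RGB values VFP writes into its colour table. Keying by colour rather than by index avoids assuming the order in which VFP lists the seven syntax areas. The entries in `STYLE_MAP` and the name `css_for` are made-up examples, not VFP defaults. Colours without an entry fall back to a plain foreground colour.

```python
from typing import Dict, Optional, Tuple

Rgb = Tuple[int, int, int]

# Hypothetical example entries: RTF foreground colour -> (HTML foreground, HTML background)
STYLE_MAP: Dict[Rgb, Tuple[str, str]] = {
    (0, 128, 0): ("#008000", "#f0fff0"),    # e.g. green text on a pale green background
    (255, 255, 0): ("#000000", "#ffff00"),  # e.g. yellow turned into a highlight
}


def css_for(rgb: Rgb, style_map: Optional[Dict[Rgb, Tuple[str, str]]] = None) -> str:
    """Return an inline CSS declaration for one RTF foreground colour."""
    if style_map is None:
        style_map = STYLE_MAP
    if rgb in style_map:
        foreground, background = style_map[rgb]
        return f"color:{foreground};background-color:{background}"
    red, green, blue = rgb
    return f"color:#{red:02x}{green:02x}{blue:02x}"


print(css_for((255, 255, 0)))  # color:#000000;background-color:#ffff00
print(css_for((0, 0, 255)))    # color:#0000ff
```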

Finally, another oddity is the somewhat "non-standard" RTF code VFP sometimes produces. We have already seen two examples: the colour table written before the font table, and the missing "\ansi" control word. This is nothing serious, as we should be able to deal with it gracefully, but we have to keep an eye on it when creating our own RTF reader. Well, okay folks, that's it for the moment. Maybe I will have to revisit this place sooner or later to make some corrections or additions, but for now I'm done with it.

Preview

In part 3 of this thread we will talk about how to access Windows’ clipboard using Windows’ API calls.


