MohammadTaghi Hajiaghayi,
Department of Computer Science,
University of Waterloo,
mhajiaghayi@math.uwateroo.ca
Available On-line at:
http://www.cs.uwaterloo.ca/~mhajiaghayi/mathml/mathml1.html
Abstract
1. Introduction
Mathematical Markup Language (MathML) is an XML application for defining and expressing mathematical documents on the Web, much as HTML does for text documents. Its ultimate goal is to provide a standard notation that other Web software and applications can read and render easily.
Because there has been no standard notation for representing mathematics on the Web, the usual way to include a mathematical expression is to embed it as a GIF or another kind of image. Images are a poor fit for this purpose. Suppose you want to copy and paste a formula stored as a GIF: you have to copy and paste the actual image file. The same difficulty arises when you want to change the formula slightly. With MathML, by contrast, you can copy, paste, or edit a formula just like any other HTML text. MathML also helps people with visual disabilities, because MathML documents can easily be converted into alternative media such as speech or Braille. Finally, MathML is a more efficient way to transfer formulas over the Web, since image files are much larger than the markup that describes the same formula.
At present, W3C, the main sponsor of MathML, has released three versions of the MathML specification: the MathML 1.0 Recommendation, released on 7 April 1998; the MathML 1.01 Recommendation, released on 7 July 1999 [1]; and the MathML 2.0 Proposed Recommendation, published on 8 January 2001 [2].
Several vendors now offer plug-ins and applets that can render MathML. However, there are still only a few translators and equation editors capable of generating HTML pages with embedded MathML. So far, W3C's Amaya browser supports MathML, as does E-Lite by ICEsoft. Netscape's new, unreleased browser, titled Gecko, is planned to support MathML fully and may include a WYSIWYG editor for mathematical equations [3]. It is hoped that MathML will also be available through plug-ins for browsers that cannot yet implement it fully.
Nevertheless, many organizations have expressed support for MathML.
Some of them are as follows: IBM, makers of the Techexplorer Scientific
Browser; Wolfram Research, makers of Mathematica 3.0; Waterloo Maple, makers
of Maple V; Hewlett-Packard, makers of the EzMath plug-in; the American
Mathematical Society, developers of a LaTeX to MathML translator; and
Design Science, makers of the MathType equation editor and of the WebEQ
suite of MathML tools [3]. There are also conferences and workshops, e.g.
the MathML International Conference 2000 [4], and discussion sites, e.g.
[7], concerning MathML. In addition, there are many brief papers that
introduce MathML or describe its features, such as [5, 6, 8].
2. MathML Overview
2.1. Presentation and Content
MathML supports both content encoding and presentation encoding. Presentation encoding describes the visual layout of a formula, while content encoding describes its mathematical meaning. You can choose between them depending on which is easier and more appropriate for the formula at hand, and you can even mix the two and use a hybrid of both.
MathML has about 28 elements, with about 50 attributes, for presentation encoding [1]. Most of these elements represent templates or patterns for laying out subexpressions. For example, mfrac is an element that expresses a fraction of two subexpressions by putting one over the other with a line in between. When you use elements like this, a renderer or other MathML software displays or prints them in an appropriate form, and you do not have to supply any further information about the particular software or hardware medium. You can still control details such as the thickness of the fraction line by adding attributes (the default value is used if you do not specify them explicitly). Presentation encoding therefore provides a middle level for expressing mathematical formulas.
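As a minimal sketch of the mfrac template described above, the fragment below encodes the fraction 1/(x + 1). The formula itself is chosen only for illustration and does not come from the original text; linethickness is the mfrac attribute that controls the fraction line, and omitting it leaves the renderer's default in place.

```xml
<math>
  <!-- mfrac takes exactly two children: the numerator, then the denominator -->
  <mfrac linethickness="thick">
    <mn>1</mn>
    <!-- mrow groups x + 1 into a single subexpression for the denominator -->
    <mrow>
      <mi>x</mi>
      <mo>+</mo>
      <mn>1</mn>
    </mrow>
  </mfrac>
</math>
```

Nothing in the markup says how many pixels or points the line should occupy; that decision is left to the renderer, which is what makes the encoding media independent.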
For content encoding there are about 75 elements with only about 12 attributes [1]. The number of attributes is much smaller than for presentation encoding, mainly because content encoding rarely deals with attributes that concern appearance. Most of these elements represent mathematical operations and functions, such as plus and sin, or other mathematical concepts such as set and vector. These elements provide a high-level way of expressing formulas.
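To illustrate how content elements name operations rather than layout, here is a small sketch that encodes sin(x) + 1 with the plus and sin elements mentioned above. The formula is an assumption picked for illustration only.

```xml
<math>
  <!-- apply: the first child is the operator, the rest are its operands -->
  <apply>
    <plus/>
    <apply>
      <sin/>
      <ci>x</ci>   <!-- ci marks an identifier -->
    </apply>
    <cn>1</cn>     <!-- cn marks a number -->
  </apply>
</math>
```

No attribute is needed anywhere in this fragment, which matches the observation that content encoding gets by with very few attributes.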
Together, the presentation and content elements of MathML provide a rich set of means for representing mathematical expressions without dealing with very low-level, media-dependent attributes.
Now, let us consider an example. Suppose you want to show this expression:
(a * b)^2
The presentation encoding for this formula is as follows:
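One plausible presentation encoding of (a * b)^2 is sketched below. It assumes that the parentheses are produced with mfenced and that the product is written with a visible * operator, mirroring the plain-text notation above; other equally valid encodings exist.

```xml
<math>
  <!-- msup: the first child is the base, the second is the superscript -->
  <msup>
    <!-- mfenced surrounds its content with parentheses by default -->
    <mfenced>
      <mrow>
        <mi>a</mi>
        <mo>*</mo>
        <mi>b</mi>
      </mrow>
    </mfenced>
    <mn>2</mn>
  </msup>
</math>
```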
Now, let us show the same expression using content encoding:
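A content encoding of (a * b)^2 can be sketched as follows, assuming the power and times elements are used for exponentiation and multiplication. Each operator comes first inside its apply element, followed by its operands.

```xml
<math>
  <apply>
    <power/>
    <!-- first operand of power: the product a * b -->
    <apply>
      <times/>
      <ci>a</ci>
      <ci>b</ci>
    </apply>
    <!-- second operand of power: the exponent -->
    <cn>2</cn>
  </apply>
</math>
```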
As you can see, the structure of this example is very similar to the
structure of the presentation encoding, but the elements used are content
elements instead of presentation elements. Each subexpression is
enclosed between <apply> and </apply> tags and is written in prefix form.
This form is common in content encoding, because MathML content encoding
treats every operation as a function applied to some operands. Also, we
have used <ci> and <cn> tags instead of <mi> and <mn> tags.
In addition, we can mix content and presentation encoding and obtain
a hybrid encoding of the above expression.
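A hybrid encoding of (a * b)^2 might look like the sketch below, assuming the power and the parentheses are given as presentation markup while the product inside them is given as content markup. This particular split is only one illustrative choice.

```xml
<math>
  <!-- presentation markup: layout of the base and the superscript -->
  <msup>
    <mfenced>
      <!-- content markup: the meaning of the product a * b -->
      <apply>
        <times/>
        <ci>a</ci>
        <ci>b</ci>
      </apply>
    </mfenced>
    <mn>2</mn>
  </msup>
</math>
```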
Most expressions in MathML are nested, and the best way to express
these nested formulas and to keep a clear picture of them in mind,
especially when a formula is complicated, is to use expression trees. In
an expression tree each node represents an operation or a particular layout
schema, and its children represent the subexpressions. This description
is useful not only for understanding expressions better but also for
working out how the MathML tags should be nested and laid out on the
screen or on the printer. We discuss this issue further after introducing
layout boxes.
The figure below, taken from [3], shows a mathematical expression and
its tree structure:
In fact, layout boxes correspond to the nodes of the expression tree, and these two models, the expression tree and the layout boxes, together provide an abstract model for representing, evaluating and rendering mathematical formulas. This abstract model helps people understand expressions better, lets a renderer find the size and dimensions of the boxes, and gives a basis for evaluating an expression, especially in computer algebra systems.
It is worth mentioning that although simple layout boxes are the smallest substructures of MathML presentation encoding, they are still media independent and remain middle-level models.
You can see the layout boxes and their corresponding nodes in the above
figure. For example, to evaluate the dimensions of the root of this tree,
you must first compute the dimensions of the simple layout box 3, then
recursively compute the size of the compound layout box (x + 2), and
finally take into account the thickness of the fraction line. Using all
this information you can compute the size of the whole box of 3/(x + 2).
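The correspondence between markup nesting, tree nodes and layout boxes can be sketched for 3/(x + 2) as follows. The comments labelling each box are illustrative annotations, not part of any specification.

```xml
<math>
  <mfrac>          <!-- root node: the box for the whole fraction -->
    <mn>3</mn>     <!-- simple layout box: the numerator 3 -->
    <mrow>         <!-- compound layout box: the denominator x + 2 -->
      <mi>x</mi>   <!-- simple layout box -->
      <mo>+</mo>   <!-- simple layout box -->
      <mn>2</mn>   <!-- simple layout box -->
    </mrow>
  </mfrac>
</math>
```

Reading the fragment from the innermost elements outward follows the same order as the recursive size computation described above: the boxes for x, + and 2 determine the mrow box, and the mrow box together with the box for 3 determines the mfrac box.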
The style of markup in MathML is very similar to that of HTML. However,
because a mathematical formula contains far more notation than regular
text, the number of start and end tags in MathML is naturally very high.
The syntax for attaching attributes to tags is also similar to HTML. More
precisely, MathML has two kinds of elements. Most elements have start and
end tags of the form:
<element_name> ... </element_name>
where other elements can be nested between the start tag and the end tag. MathML also has empty elements of the form:
<element_name/>
which consist of a single tag that looks like a hybrid of a start tag and an end tag.
Some MathML elements accept only a few attributes, while others accept a dozen or more. As in HTML documents, each attribute provides additional information about a specific tag. Each attribute has a name and a value and is written in the start tag, between the element name and the final '>'. For an empty element, attributes are written between the element name and the final '/>'. The value must be enclosed in single or double quotes, so in general an attribute takes the form attribute_name="attribute_value".
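The attribute rules above can be illustrated with a short sketch. It assumes the open, close and separators attributes of mfenced and the width attribute of mspace; the bracketed pair [a, b] followed by c is made up for the example.

```xml
<math>
  <mrow>
    <!-- start tag with three attributes; single or double quotes both work -->
    <mfenced open="[" close=']' separators=",">
      <mi>a</mi>
      <mi>b</mi>
    </mfenced>
    <!-- empty element: the attribute sits between the name and the final "/>" -->
    <mspace width="1em"/>
    <mi>c</mi>
  </mrow>
</math>
```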
1: Sets (Generic form: <set> [<elt1> <elt2> ... | <condition>] </set>)
A set represents a collection of data items that must all be of the same
type. There are two ways to describe a set: list its elements explicitly,
or state a condition such as "all real y such that 1 < y < 2". Such a
condition can also be expressed with the interval container element
introduced below.
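Both ways of describing a set can be sketched as follows. The element names a, b and c in the first fragment are made up for illustration, and the second fragment assumes that the condition 1 < y < 2 is written with the n-ary lt relation and that y is declared real through the type attribute of ci.

```xml
<!-- Fragment 1: a set given by listing its elements -->
<math>
  <set>
    <ci>a</ci>
    <ci>b</ci>
    <ci>c</ci>
  </set>
</math>

<!-- Fragment 2: a set given by a condition, all real y with 1 < y < 2 -->
<math>
  <set>
    <bvar><ci type="real">y</ci></bvar>
    <condition>
      <apply>
        <lt/>
        <cn>1</cn>
        <ci>y</ci>
        <cn>2</cn>
      </apply>
    </condition>
  </set>
</math>
```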
2: Intervals (Generic form: <interval> <pt1> <pt2> </interval>)
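As a brief sketch of the generic form, the open interval from 1 to 2 could be written as shown here. The closure attribute is assumed to select whether the endpoints are included; the endpoints themselves are chosen to match the earlier condition 1 < y < 2.

```xml
<math>
  <!-- closure="open" excludes both endpoints, i.e. 1 < y < 2 -->
  <interval closure="open">
    <cn>1</cn>
    <cn>2</cn>
  </interval>
</math>
```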
These are the main container elements of MathML. There are other kinds of container elements as well, such as list, which are described in [1, 2].
4.2. Operators and Functions
For example, consider the expression below:
You can find a wide variety of examples of this kind in references [1, 2, 3].
In this presentation we have introduced MathML and briefly discussed its
main points. We said that MathML is a very good standard for using and
displaying mathematics on the Web. The problem is that MathML code is not
very readable, so we must rely on other software and interfaces to create,
manipulate and display it. This raises a question: would it not be better
to use a notation that already exists in software such as Mathematica or
Maple? For example, HTML could use only a start tag like
<mathematics use=Mathematica> and an end tag </mathematics> (with other
possible values, such as Maple). This may be expensive in the short run
(because of copyright), but might be much cheaper and more beneficial in
the longer term. The copyright problem is also not so important, because
the interfaces eventually developed for MathML will not necessarily be
free either, so the same problem will appear again. In addition, using
such software would increase the portability of mathematics on the Web
and in computing software. In the meantime, the MathML proposal in the
draft paper could be published, not as a new standard but as a
contribution to the discussion.
I believe that the answer to the above question is 'no', mainly for the
reasons given in the Introduction section, but I think the question
deserves a deeper answer. You can find further discussion of this issue
in [7].
6. References:
[1]: Mathematical Markup Language (MathML™) 1.01 Specification, W3C Recommendation, revision of 7 July 1999
[2]: Mathematical Markup Language (MathML) Version 2.0, W3C Proposed Recommendation, 8 January 2001
[3]: Gentle Introduction to MathML
[4]: MathML International Conference 2000
[5]: The Interchange of Mathematics in XML: MathML, OpenMath and their Application
[6]: MathML - What's in it for us?
[7]: The Disappointment and Embarrassment of MathML - update: Including Reactions and Answers
[8]: Putting Mathematical Notation on the Web