Wiki source code of Sample Import: 73234XLS

Last modified by Vincent Massol on 2024/02/26 17:53

== 73234XLS ==

The following are the noticeable importer defects observed when importing [[attach:73284.xls]]:

* **The sizing of content**: In general, the resulting wiki page appears considerably larger than the original document. There are several reasons for this:
** //**Stripping of style tags**//: The original HTML content defines the following style tag in its HTML header:

{{code}}
<STYLE>
<!--
BODY,DIV,TABLE,THEAD,TBODY,TFOOT,TR,TH,TD,P { font-family:"Arial"; font-size:x-small }
-->
</STYLE>
{{/code}}

Since the Office Importer does not process these style tags, the x-small font size they set is lost and the imported content is displayed with noticeably larger fonts.

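To see what is dropped, the header rule can be applied by hand. The following is a minimal sketch, not how the Office Importer works: it assumes the converted page is well-formed XHTML that Python's ElementTree can parse and that the STYLE tag holds only simple rules like the one above. The helper names (parse_simple_rules, local_name, inline_simple_style) are hypothetical. It copies each rule's declarations into the style attribute of every matching element, so the x-small Arial setting would survive a cleaner that discards STYLE tags.

{{code language="python"}}
import re
import xml.etree.ElementTree as ET

# Matches "SELECTORS { DECLARATIONS }". Enough for the single rule shown above;
# this is not a general CSS parser (no @-rules, combinators or nested blocks).
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_simple_rules(css):
    """Return (set of lower-case tag names, declarations) pairs from a STYLE body."""
    # OO wraps the rules in an HTML comment; drop the markers first.
    css = css.replace("<!--", "").replace("-->", "")
    rules = []
    for selectors, body in RULE.findall(css):
        tags = {s.strip().lower() for s in selectors.split(",") if s.strip()}
        rules.append((tags, body.strip().rstrip(";").strip()))
    return rules


def local_name(tag):
    """Lower-case tag name without an ElementTree namespace prefix."""
    return tag.rsplit("}", 1)[-1].lower()


def inline_simple_style(xhtml, css):
    """Hypothetical helper: copy simple header rules into matching elements' style attribute."""
    root = ET.fromstring(xhtml)
    rules = parse_simple_rules(css)
    for element in root.iter():
        # Walk the rules backwards and prepend each one, so the final order is
        # rule 1; rule 2; ...; existing inline declarations. Later declarations
        # win inside a style attribute, so inline styles keep precedence.
        for tags, declarations in reversed(rules):
            if local_name(element.tag) in tags:
                existing = element.get("style")
                element.set("style", declarations + "; " + existing if existing else declarations)
    return ET.tostring(root, encoding="unicode")
{{/code}}

Whether the XHTML parser would then keep such inline styles is not established here and would need its own rendering test.
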
{{html clean="false" wiki="true"}}
** //**<colgroup> elements used by OO**//: The OpenOffice (OO) server used to convert office documents into HTML generates tables like the following:

{{code}}
<table>
  <colgroup>
    <col width="87" />
    <col width="152" />
    <col width="50" />
  </colgroup>
  <tbody>
    <tr>
      <td/>
      <td/>
      <td/>
    </tr>
  </tbody>
</table>
{{/code}}

Even though this is valid XHTML, our XHTML parser currently ignores these <colgroup> elements. The following rendering test case, which passes, illustrates this:

{{code}}
.#-----------------------------------------------------
.input|xhtml/1.0
.#-----------------------------------------------------
<table><colgroup><col width="87" /><col width="152" /><col width="50" /></colgroup><tbody><tr><td/><td/><td/></tr></tbody></table>
.#-----------------------------------------------------
.expect|event/1.0
.#-----------------------------------------------------
beginDocument
beginTable
beginTableRow
beginTableCell
endTableCell
beginTableCell
endTableCell
beginTableCell
endTableCell
endTableRow
endTable
endDocument
.#-----------------------------------------------------
.expect|xwiki/2.0
.#-----------------------------------------------------
|||
{{/code}}

As a result, the column width information is lost and the resulting table looks noticeably wider than in the original document.

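The widths are still present in the OO output, so they could in principle be carried over before parsing. A minimal sketch, assuming Python's ElementTree, a single table without colspan, rowspan or col span attributes, and hypothetical helper names (column_widths, push_widths_to_cells): it copies the width of each ##col## element onto the cells of the corresponding column and then drops the ##colgroup##. Whether the XHTML parser keeps width attributes on cells is an assumption that a rendering test like the one above would have to confirm.

{{code language="python"}}
import xml.etree.ElementTree as ET


def column_widths(table):
    """Widths declared by the <col> elements of a table's <colgroup>s, in column order."""
    return [col.get("width")
            for colgroup in table.findall("colgroup")
            for col in colgroup.findall("col")]


def push_widths_to_cells(xhtml):
    """Hypothetical cleaning step: copy each <col width> onto the cells of its column."""
    table = ET.fromstring(xhtml)
    widths = column_widths(table)
    for row in table.iter("tr"):
        cells = [child for child in row if child.tag in ("td", "th")]
        # Assumes no colspan/rowspan and no <col span>, so the n-th cell of a
        # row lies in the n-th column.
        for cell, width in zip(cells, widths):
            if width is not None and cell.get("width") is None:
                cell.set("width", width)
    # The parser ignores <colgroup> anyway; remove it once its data is copied.
    for colgroup in table.findall("colgroup"):
        table.remove(colgroup)
    return ET.tostring(table, encoding="unicode")
{{/code}}

For the table above, the three cells would receive the widths 87, 152 and 50.
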
** //**Placement of images**//: In the original document, non-textual components such as charts are not placed inside the spreadsheet's table cells but float over the sheet. The generated HTML, however, embeds these images inside table cells according to their position, which also causes sizing and alignment problems.
* **Spurious Table Cells**: In the wiki result, the second table appears to have several table cells out of order. This happens with spreadsheets containing many images (such as charts): because the OO server tries to embed the images into the tables, the HTML it generates contains misshapen tables with many spurious table cells.
* **Horizontal Lines**: The HTML generated by the OO server for spreadsheets contains <hr/> elements used to separate multiple sheets. {{warning}}Open Question: Should these horizontal lines be filtered out of spreadsheets during cleaning?{{/warning}}
* **Italic Headings**: The headings generated by the OO server look like this:

{{code}}<H1>Sheet 1: <EM>Survey sheet</EM></H1>{{/code}}

Thus the sheet names in the resulting headings are rendered in italics. {{warning}}Open Question: Should these headings be corrected in spreadsheets during cleaning?{{/warning}}

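If both open questions above were answered yes, each fix would be a small tree edit. A minimal sketch, assuming the OO output is parsed as well-formed XHTML with Python's ElementTree; clean_spreadsheet_html, unwrap and the other helpers are hypothetical names, not Office Importer API. It removes every ##hr## element and unwraps each ##em## that is a direct child of a heading (##h1## to ##h6##), keeping the text so that only the italics disappear. Tag names are compared case-insensitively because OO emits upper-case tags such as ##H1## and ##EM##.

{{code language="python"}}
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def local_name(tag):
    """Lower-case tag name without an ElementTree namespace prefix."""
    return tag.rsplit("}", 1)[-1].lower()


def _append_before(parent, index, text):
    """Append text at the position just before parent[index]."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def unwrap(parent, child):
    """Remove child from parent but keep its text, sub-elements and tail in place."""
    index = list(parent).index(child)
    grandchildren = list(child)
    _append_before(parent, index, child.text)
    if grandchildren:
        last = grandchildren[-1]
        last.tail = (last.tail or "") + (child.tail or "")
    else:
        _append_before(parent, index, child.tail)
    parent.remove(child)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(index + offset, grandchild)


def clean_spreadsheet_html(root):
    """Hypothetical cleaning step: drop sheet separators (<hr/>) and remove
    italics from headings by unwrapping <em> elements directly inside <h1>..<h6>."""
    for parent in list(root.iter()):
        for child in list(parent):
            name = local_name(child.tag)
            if name == "hr" or (name == "em" and local_name(parent.tag) in HEADINGS):
                unwrap(parent, child)
    return root
{{/code}}

Emphasis elsewhere in the sheet, for example inside a paragraph, is left untouched.
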
* **Missing vertical orientation**: The second table of the original document has some cells whose text is vertically oriented, but this orientation is completely lost in the resulting wiki page because the OO server cannot produce HTML that renders vertically oriented text.
<p/>

{{/html}}
