{{warning}}
This document appears to be outdated and will be updated as time permits.
{{/warning}}

{{toc start="2"/}}

This page holds all the information about my [[Google Summer of Code project>>http://code.google.com/soc/2008/]], the [[Office Converter>>http://dev.xwiki.org/xwiki/bin/view/GoogleSummerOfCode/OfficeImport2008]]. The goal is to create an XWiki plugin that converts office documents (MS Word, MS Excel, OpenOffice ODT and so on) to XWiki syntax and inserts the result into an XWiki page. The intermediate step of the conversion turns the office document into clean, tidy HTML; an XHTML parser then converts that HTML to XWiki syntax.

{{warning}}
This project is unfinished and unstable, and this page is meant for development. The target release depends on XE 1.6, which has not been released yet. To try the newest version, build the latest XWiki from svn, download the [[officeconverter for XE 1.6>>attach:[email protected]rXE1.6.zip]] and follow the README. To try the office importer on XE 1.5, download the version for XE 1.5 and follow the README in the zip file.
{{/warning}}

== Proposal ==

=== Introduction ===

* Use the OpenOffice runtime as a server to convert documents to HTML code
* Clean the HTML code
* Parse the HTML into XWiki syntax
* Integrate these features into XWiki; see below [[||anchor="mock-up"]]

=== Integration mock-up ===

The features below are usable only when the Office Converter plugin is installed. After discussion with Vincent, we decided that the integration of the office converter will be plugin + application. That is:

* an XWiki plugin for converting office documents to many document formats, such as PDF, HTML and XWiki syntax
* an office import application that lets users import office documents into XWiki pages

The office import application should look like this:

[[image:OfficeImporterPage.png]]

{{warning}}The two features below will not be supported in this release, as they are related to other XWiki modules and I do not have enough time before the GSoC deadline. I will discuss them on the dev list and implement them in the future.{{/warning}}

* Import from WYSIWYG
** mock-up demo: attach:OfficeImporterWYSIWYG.png
* Preview Office document
** mock-up demo: attach:OfficeImporterPreview.png

== Current State ==

=== Features ===

* Convert an office document to HTML code and save the HTML code to an XWiki page
* Handle XWiki syntax in the HTML content and escape special characters in the HTML content
* Supported document types: doc, xls, ppt, odt, odp, ods
* Convert ppt and odp files to a zip file and display the zip in an iframe in an XWiki page
* Handle the images in office documents: pictures are uploaded to the XWiki page as attachments
* Integrate into XWiki as an XWiki plugin
* Provide an XWiki application for importing office documents, in which the user can choose between convert2html and convert2xwikisyntax
* An unfinished convert2xwikisyntax feature, to be finished in the next version

== Quick Start ==

=== Install ===

* The latest XE 1.6 from svn trunk is required.
* Install OpenOffice (>= 2.3) on the computer where XWiki will run. See [[http://www.openoffice.org]].
* Copy all the libraries listed below to XWIKI_WEB_HOME/WEB-INF/lib/
** All the required libraries can be downloaded [[here>>attach:libs.zip]]. They include:
*** slf4j-api-1.4.3.jar
*** slf4j-jdk14-1.4.3.jar
*** jodconverter-2.2.1.jar [[http://sourceforge.net/project/showfiles.php?group_id=91849]]
*** jurt-2.3.0.jar
*** juh-2.3.0.jar
*** ridl-2.3.0.jar
*** unoil-2.3.0.jar
*** htmlcleaner-2.0.jar [[http://htmlcleaner.sourceforge.net]]
* Copy the [[office importer plugin lib>>attach:xwiki-plugin-officeconverter-0.0.3.jar]] to XWIKI_WEB_HOME/WEB-INF/lib/
* Add the office converter plugin to xwiki.cfg
** Edit your WEB-INF/xwiki.cfg file as follows:

{{code}}
xwiki.plugins=[...], com.xpn.xwiki.plugin.officeconverter.OfficeConverterPlugin
{{/code}}

=== Start Server ===

* Start XWiki as you always do.
* Start OpenOffice as a server on the same computer, as described below for your operating system.

On Windows this is a little more complicated; see http://www.artofsolving.com/node/11 for details. Alternatively, find the soffice executable (often in c:/program files/openoffice-2.3), change to that directory on the command line and run:

{{code}}
soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
{{/code}}

On Linux, the simplest way is to start it from the command line with the following options:

{{code}}
soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
{{/code}}
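
Before trying an import, it can help to confirm that something is actually listening on the host and port given in the `-accept` option. The following is a minimal sketch in Python, not part of the plugin; the function name `office_server_reachable` is made up for illustration, and it only checks that a TCP connection can be opened, not that the listener really is OpenOffice.

```python
import socket


def office_server_reachable(host="127.0.0.1", port=8100, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    The defaults mirror the -accept option shown above. This does not
    speak the UNO protocol; it only proves that the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    print("OpenOffice listener reachable:", office_server_reachable())
```

If this reports `False`, the conversion cannot work either, so it is worth checking the soffice command line first.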

=== Use the plugin in XWiki ===

* Import [[the office import application>>attach:officeimporter-application0.0.3.xar]] into XWiki.
* Go to Import.WebHome to convert an office document.
* Select the source file and enter the space and page name of the target XWiki page.
* Select convert2xhtml or convert2xwiki.
* Click the "convert" button.
* On success, click the "result" link to see the new page.

Warning

* The source file should have a normal filename with the correct extension.
* The target XWiki page should not exist yet. Otherwise, you will be told that you are "not allowed to view the page".
* If you do not have edit rights on the target page, you will also be told that you are "not allowed to view the page".
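
The first warning can be checked before uploading anything. Below is a small Python sketch, assuming that "correct extension" means one of the document types from the feature list on this page; the function name `check_source_filename` is hypothetical and is not how the plugin itself validates input.

```python
import os

# Extensions taken from the feature list on this page.
SUPPORTED_EXTENSIONS = {"doc", "xls", "ppt", "odt", "odp", "ods"}


def check_source_filename(filename):
    """Return the lower-cased extension, or raise ValueError if unsupported."""
    base = os.path.basename(filename)
    stem, ext = os.path.splitext(base)
    ext = ext.lstrip(".").lower()
    # Reject names without a stem (".doc") or with an unknown extension.
    if not stem or ext not in SUPPORTED_EXTENSIONS:
        raise ValueError("unsupported or missing extension: %r" % filename)
    return ext
```

For example, `check_source_filename("report.DOC")` returns `"doc"`, while a name such as `notes.txt` raises `ValueError`.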

== ToDo List and plan ==

=== Use htmlcleaner instead of jdom filters to clean the HTML ===

Time: 10 hours
Predict Begin: 2008.08.16
Predict End: 2008.08.17
Task:

{{html clean="false" wiki="true"}}
* <del>Clean the HTML code so that it is well formed</del>
* <del>Remove the head tag</del> The head tag can be handled by the XHTML parser.
* Replace the <img> tag with {image}
* <del>Remove empty links <a/></del>
* Replace deprecated XHTML tags (if possible)
** Problem: HtmlCleaner cannot simply replace a tag, so this is a little hard.
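
To make the tasks above concrete, here is a rough Python sketch of the three rewrites (image replacement, empty link removal, deprecated tag replacement). It is only an illustration built on regular expressions, not the HtmlCleaner based code of the plugin. The `{image:...}` output form and the tag mapping in `DEPRECATED_TAGS` are assumptions, and attributes on renamed tags are simply dropped.

```python
import re

# Assumed replacements for deprecated presentational tags (illustrative only).
DEPRECATED_TAGS = {"u": "ins", "s": "del", "strike": "del"}


def clean_html(html):
    """Apply three simple clean-up passes to an HTML string."""
    # <img src="pic.png" ...> becomes {image:pic.png}
    html = re.sub(r'<img\b[^>]*?\bsrc="([^"]*)"[^>]*>', r"{image:\1}",
                  html, flags=re.I)

    # Drop anchors that have no content: <a .../> and <a ...></a>
    html = re.sub(r"<a\b[^>]*/>|<a\b[^>]*>\s*</a>", "", html, flags=re.I)

    # Rename deprecated tags, keeping opening and closing tags consistent.
    def rename(match):
        slash, name = match.group(1), match.group(2).lower()
        return "<%s%s>" % (slash, DEPRECATED_TAGS[name])

    return re.sub(r"<(/?)(u|s|strike)\b[^>]*>", rename, html, flags=re.I)
```

Regular expressions are fragile on real office output, so a proper HTML parser or cleaner remains the better tool; the sketch only shows what each task is meant to achieve.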

=== Write test cases for the conversion ===

Time: 10 hours
Predict Begin: 2008.08.17
Predict End: 2008.08.18
Task:

* <del>Refactor the test framework of the office converter test cases</del>
* <del>Make small test input files (MS Word, Excel, PowerPoint and OpenOffice) and verify the output</del>
* Test the HtmlCleaner (I have to implement the filter and fix some bugs in htmlcleaner, so this is behind schedule)
* <del>Test the typeformat, util, and other classes</del>

=== Insert task <ins>convert2html</ins> ===

See [[here>>http://xwiki.markmail.org/search/?q=#query:+page:1+mid:2u2to6ywsqqcx42b+state:results]].

* <del>Implement a convert2html feature</del>
* <del>Clean the code</del>
* <del>Write javadoc</del>
* <del>Write readme</del>
** <del>Feature list</del>
** <del>Quick start for how to use it</del>
<p/>
Time: 5 hours
Predict Begin: 2008.08.19
Predict End: 2008.08.19

=== Convert XHTML to XWiki syntax 2.0 ===

Main Task:

* Write test cases for WikimodelXHTMLParser. Consider all the basic tags in XHTML.
* Submit patches to [[wikimodel>>http://code.google.com/p/wikimodel]] and xwiki-core-rendering to make WikimodelXHTMLParser + XWikiSyntaxRendering work well for all the test cases.
<p/>
Time: about 8 days
Predict Begin: 2008.08.18
Predict End: 2008.08.26
Detailed plan for this:

|=Name|=Predict time|=Predict begin|=Predict end|=Test cases|=Problems
|<del>Base text format</del>|About 1 day|2008.08.18|2008.08.19|{{code}}
<b>
<strong>
<i>
<u>
<s>
<strike>
<em>
<del>
<ins>
<sup>
<sub>
<p> (existed)
title or section level(existed)
<hr>
<br>
{{/code}}|If a tag is deprecated in XHTML, like <u>, how should it be handled? That is the role of the HTML cleaner, so I need to do it in the "html clean" step. Add a TagHandler in wikimodel's XhtmlHandler and add blocks and a parser method in xwiki-core-rendering.
|List|About 2 days|2008.08.19|2008.08.21|{{code}}
<html>
<ol>
  <li>Item 1
    <ol>
      <li>Item 2
        <ul class="star">
          <li>Item 3</li>
        </ul>
      </li>
      <li>Item 4</li>
    </ol>
  </li>
  <li>Item 5</li>
</ol>
<ul class="star">
  <li>Item 1
    <ul class="star">
      <li>Item 2
        <ul class="star">
          <li>Item 3</li>
        </ul>
      </li>
      <li>Item 4</li>
    </ul>
  </li>
  <li>Item 5</li>
  <li>Item 6</li>
</ul>
</html>
{{/code}}|This is hard to fix. I need to see what happens in wikimodel's xhtmlparser.
|<del>Links</del>|About 2 days|2008.08.21|2008.08.23|{{code}}<a href="http://www.xwiki.org">xwiki</a>{{/code}}|This is hard too. If it cannot be solved in the parser, I will use a filter to replace links with XWiki syntax when cleaning the HTML.
|Table|About 2 days|2008.08.23|2008.08.25|{{code}}
<html>
<body>
<table>
  <tr>
    <th>1.1</th>
    <th>1.2</th>
  </tr>
  <tr>
    <th>2.1</th>
    <th>2.2</th>
  </tr>
</table>
</body>
</html>
{{/code}}|Even harder, because tables are handled by a macro in the new rendering. Can I just add a simple temporary tableblock solution?
|Image|5 hours|2008.08.25|2008.08.25|{{code}}<img src="imgurl"/>{{/code}}|Just ignore it, as I replace <img> with {image}.
|attribute|10 hours|2008.08.25|2008.08.26|{{code}}<p align="center" color="red">middle</p>{{/code}}|Use the style, but how? I need to find out.
|class| | | |{{code}}<span class="underline">test</span>{{/code}}|Maybe ignore it and treat the element as if it had no class.
|font| | | |{{code}}<font size="1" style="font-size: 8pt">test</font>{{/code}}|Ignore? Or something else.
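
As a rough illustration of the "Base text format" and "Links" rows, the Python sketch below maps a few inline XHTML tags to XWiki 2.0 markers using the standard `html.parser` module. It is not the WikimodelXHTMLParser approach described in the plan, the class and function names are made up, and lists, tables, images, attributes and classes are deliberately ignored.

```python
from html.parser import HTMLParser

# XWiki 2.0 inline markers for the basic text-format tags in the table.
INLINE_MARKERS = {
    "b": "**", "strong": "**",
    "i": "//", "em": "//",
    "u": "__", "ins": "__",
    "s": "--", "strike": "--", "del": "--",
    "sup": "^^", "sub": ",,",
}


class InlineToXWiki(HTMLParser):
    """Collect XWiki 2.0 text for a small subset of inline XHTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # finished output pieces
        self.href = None   # target of the link being read, if any
        self.label = []    # text collected inside the current <a>

    def _emit(self, text):
        # While inside a link, text belongs to the link label.
        (self.label if self.href is not None else self.out).append(text)

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MARKERS:
            self._emit(INLINE_MARKERS[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.label = []

    def handle_endtag(self, tag):
        if tag in INLINE_MARKERS:
            self._emit(INLINE_MARKERS[tag])
        elif tag == "a" and self.href is not None:
            href, label = self.href, "".join(self.label)
            self.href = None
            self._emit("[[%s>>%s]]" % (label, href) if href else label)
        elif tag == "p":
            self._emit("\n\n")  # blank line between paragraphs

    def handle_data(self, data):
        self._emit(data)


def xhtml_to_xwiki(fragment):
    parser = InlineToXWiki()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out).strip()
```

With this sketch, `xhtml_to_xwiki('<a href="http://www.xwiki.org">xwiki</a>')` gives `[[xwiki>>http://www.xwiki.org]]`. Nested lists and tables need real block handling, which is why they are marked as hard in the table.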

=== Make ppt and odp work ===

<del>ppt and odp generate multiple HTML pages, so how can they be assembled into one XWiki page?</del>
Time: about 1 day
Predict Begin: 2008.08.27
Predict End: 2008.08.28

=== Test the project on Windows ===

<del>As I develop on Linux, I need to test whether the project works well on Windows.</del>
Time: 5 hours
Predict Begin: 2008.08.28
Predict End: 2008.08.29
If you are using Windows, maybe you can help me test it. Thanks.

=== Documents and package ===

<del>Javadoc and Readme.</del>
Time: 5 hours
Predict Begin: 2008.08.29
Predict End: 2008.08.30

== <del>Old Plan</del> ==

{{info}}These are the function points of the office import. I will soon give a detailed plan and timeline for every section. The plan is still a draft; any suggestions and discussion are very much appreciated.{{/info}}

=== Core of plugin July 8 - July 12 ===

Actually, this work will last until the end of the project, as the core code needs to change to meet the high-level API.

* Todo
** Clean up the code. Provide a low-level API and a high-level API, so that the plugin can be used both in XWiki pages and in other parts of XWiki (to be detailed).
** Handle conflicts with XWiki syntax (maybe that is the job of the XHTML parser).

=== Integration with XWiki ===

==== Develop an application July 13 - July 15 ====

* Upload a file
* Select the target page
* Convert the document to the page
<p/>
Problems

* <del>How to upload a file using fileuploadplugin and get the byte[] of the file.</del>
* <del>Create a new page, or insert into an existing page?</del>

== Source Code ==

This project has just started and has only produced the initial code. Any suggestions are appreciated; please add comments to this page to discuss.

* svn for the office converter plugin: [[https://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter]]
* svn for the office import application: [[http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-application-officeimporter]]

== Build ==

This project uses Maven 2 as its project management tool. Get the source code and run "mvn install" to build the plugin package.
However, as it depends on some libraries that have not been released yet, you need to build those dependencies yourself if you want to try the latest version.

* Get the latest code from svn for the libraries below:
** xwiki-core
** xwiki-core-rendering
** xwiki-core-xml
** org.wikimodel.wem
* Patch them according to these issues:
** http://code.google.com/p/wikimodel/issues/detail?id=34
** http://jira.xwiki.org/jira/browse/XWIKI-2568
* Install the libraries above into your Maven repository.
* If you want to test the project with "mvn test" or "mvn install", start OpenOffice as a server first.
* To build without running the tests, run "mvn install -Dmaven.test.skip=true".

== POM File ==

Please see [[pom.xml>>http://svn.xwiki.org/svnroot/xwiki/sandbox/xwiki-plugin-officeimporter/pom.xml]].

== Reference Libraries ==

Libraries that Office Importer depends on:

* [[JODConverter>>http://www.artofsolving.com/opensource/jodconverter]]
* [[HtmlCleaner>>http://htmlcleaner.sourceforge.net/]]
* [[OpenOffice>>http://www.openoffice.com]]

== Sample Imports (Known issues) ==

This section presents some of the results we obtained with Office Importer, along with the original documents for comparison. The purpose of this comparison is to identify the shortcomings and difficulties of importing office documents into wiki pages. Each of the following pages analyses the result of a single document import operation. Please note that all the documents mentioned below were taken from [[Scribd>>http://www.scribd.com/]] and that all of them were imported with the filtering mode set to **None**.

* [[73234.xls>>73234XLS]]
* [[1622132.xls>>1622132XLS]]
* [[5927315.doc>>5927315DOC]]

== Support ==

For any questions or problems, please send an email to [email protected] (subscription required) or to me at daning106(at)gmail.com.

{{/html}}
