
Ms Word To Html Convertor


8 replies to this topic

#1 rohgan03

rohgan03

    HR 6

  • Active Members
  • 944 posts

Posted 14 January 2005 - 03:22 PM

Hi all,

I have some MS Word files that I need to convert to HTML. When I save them as HTML from Word, the markup comes out very complicated. I don't want any font tags or the like; all I want is a simple conversion using <p> (new paragraph), <b> (bold) and <li> (bullets).

No font tags or other extra tags, since all of that styling is picked up on my site from the stylesheet.

Any ideas on the best way to do this?

#2 mcanerin

mcanerin

    HR 7

  • Active Members
  • 2,242 posts
  • Location:Calgary, Alberta, Canada

Posted 14 January 2005 - 03:26 PM

I know that FrontPage 2000 and later can clean up Word HTML, and there is also a specific option in HTML Tidy that does so:

http://tidy.sourcefo....html#word-2000

Since HTML Tidy can be used in batch mode, that might be a good start.
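As a rough illustration of the batch idea, here is a minimal Python sketch that runs Tidy over every exported file in a folder. It assumes the `tidy` executable is installed and on the PATH; the function names and the `*.htm*` file pattern are made up for this example. The `word-2000` option is the one the link above refers to, and `-modify` makes Tidy overwrite each file, so work on copies.

```python
import subprocess
from pathlib import Path


def tidy_command(path):
    """Build a Tidy command line that cleans one Word-exported file in place."""
    return [
        "tidy",
        "-modify",              # write the cleaned markup back to the same file
        "-quiet",               # suppress non-essential output
        "--word-2000", "yes",   # strip Word 2000 specific markup
        str(path),
    ]


def tidy_folder(folder, pattern="*.htm*"):
    """Run Tidy on every matching file; return the files Tidy reported errors for."""
    failed = []
    for path in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(tidy_command(path))
        # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
        if result.returncode >= 2:
            failed.append(path)
    return failed
```

Calling `tidy_folder("exported")` on a hypothetical folder of saved pages would clean each one and hand back the list of files that still need attention by hand.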

Ian

#3 Googlewhacked

Googlewhacked

    Got geek?

  • Active Members
  • 348 posts
  • Location:Florida: The Plywood State

Posted 14 January 2005 - 03:32 PM

rohgan03,

I don't know what version of Word you are using, but this plug-in from MS will go a long way towards cleaning up your Word-generated HTML files:

The only catch: it is for Word 2000...

Phil

#4 Wired Paul

Wired Paul

    HR 2

  • Members
  • 16 posts

Posted 14 January 2005 - 03:47 PM

FrontPage does this very well, especially FrontPage 2003.

If you can get your hands on Macromedia Dreamweaver, there's a command called "Clean Up Word HTML".

Also, when you choose to save the document as a web page from the "File" menu in Word, the resulting dialog lets you choose the file type, which you should set to "Web Page, Filtered" (or something like that; I am translating from my non-English Office).

Another possible workaround would be to save your document from Word as HTML, view it in Internet Explorer, then select the text and copy it into an HTML editor. Some editors preserve the formatting but rid the code of garbage. No guarantee here: it works in Dreamweaver and may work in HomeSite, so just try it if nothing else helps.

#5 rohgan03

rohgan03

    HR 6

  • Active Members
  • 944 posts

Posted 14 January 2005 - 07:41 PM

Looks like this will rid the document of Word-specific markup... but what about the font tags etc.?

The document has a large number of bulleted lists with sub-bullets. I need something that would give me simple HTML using <ul> and <li> tags.

Any suggestions?

#6 Hyperformance

Hyperformance

    HR 6

  • Active Members
  • 634 posts
  • Location:Chicago, Illinois

Posted 14 January 2005 - 08:50 PM

I have had to do this myself when clients provide me with Word documents...

It's a simple solution. You lose all formatting, but you also lose all that MS Word stuff that's just not needed.

I paste the text into Notepad, save it, and then copy it into my HTML editor... that removes everything. Doesn't cost a dime.

#7 Ron Carnell

Ron Carnell

    HR 6

  • Moderator
  • 959 posts
  • Location:Michigan USA

Posted 15 January 2005 - 03:41 AM

QUOTE
Looks like this will rid the document of Word-specific markup... but what about the font tags etc.?

In FrontPage: Ctrl-A (select all), then Ctrl-Shift-Z (remove formatting).

That should remove all inline formatting, such as bold, italic and fonts, but still leave the block level formatting, like a list or H1, intact.
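For anyone without FrontPage, the same idea (drop inline formatting, keep block-level structure) can be sketched in Python with the standard library's `html.parser`. This is only an illustration, not what FrontPage does internally: the `TagFilter` class, the `strip_formatting` function and the `KEEP` whitelist are names invented for this example, and real Word output may need extra handling.

```python
from html import escape
from html.parser import HTMLParser

# Block-level tags to keep; everything else (font, span, b, i, ...) is dropped.
KEEP = frozenset({"p", "ul", "ol", "li", "h1", "h2", "h3", "h4", "h5", "h6"})
# Elements whose text content should be discarded entirely.
SKIP_CONTENT = frozenset({"style", "script"})


class TagFilter(HTMLParser):
    """Re-emit whitelisted tags without attributes; pass text through."""

    def __init__(self, keep=KEEP):
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.skip_depth = 0
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in self.keep:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.keep:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def strip_formatting(html, keep=KEEP):
    """Return html with only the tags in keep left, all without attributes."""
    parser = TagFilter(keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To keep bold as well, as the original question asks, pass a wider whitelist, for example `strip_formatting(source, KEEP | {"b"})`.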

And, believe it or not, the IE DOM actually has a way to accomplish that through JavaScript. Run a Google search on execCommand RemoveFormat for more information.

#8 Wired Paul

Wired Paul

    HR 2

  • Members
  • 16 posts

Posted 15 January 2005 - 12:46 PM

QUOTE(rohgan03 @ Jan 14 2005, 08:41 PM)
Looks like this will rid the document of Word-specific markup... but what about the font tags etc.?

The document has a large number of bulleted lists with sub-bullets. I need something that would give me simple HTML using <ul> and <li> tags.

Any suggestions?


It shouldn't; <font>, <ul> et al. will get carried over. Have you tried any of the methods? Did any of them work?

#9 lyn

lyn

    HR 6

  • Active Members
  • 940 posts
  • Location:London, Ontario

Posted 15 January 2005 - 05:27 PM

If you can get access to Dreamweaver, you can use the Word clean-up command to get rid of most of the MS code, then use Replace on the source to strip away any other codes, such as font tags.
Did you try Ian's suggestion of using Tidy?
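If Dreamweaver is not available, the Replace step on its own can be approximated with a regular expression. The following is a small Python sketch; `strip_font_tags` is a name made up for this example, and a regex like this is only reasonable for simple, flat tags such as font.

```python
import re

# Matches opening and closing font tags, whatever their attributes or case.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(source):
    """Remove <font ...> and </font> tags, leaving the text between them."""
    return FONT_TAG.sub("", source)
```

The text inside the tags is left alone, so the paragraphs and lists around it keep their structure.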

L.



