Import XML data file as text

pbaxter-disable · ‎Feb 18, 2010

The attached XML data file is output from a program. When rendered in a web browser the text must be manually copied & pasted into a text file to strip out the tags and non-rendering information.

What I need is some method of reading the DATA (i.e., text only) into Mathcad, where it will get used later on. Any ideas from the programmers out there?

{Update: I just realized the XML file is too large to add to this message. If someone wants a "real" example I'll be happy to email it to you. But the question is the same ... is there some way to transform a general XML file to be just plain text in MC?}

Thanks,

Preston

StuartBruff · ‎Feb 18, 2010

On 2/18/2010 12:55:56 PM, pbaxter wrote:
== {Update: I just realized the XML file is too large to add to this message. If someone wants a "real" example I'll be happy to email it to you. But the question is the same ... is there some way to transform a general XML file to be just plain text in MC?}

Probably. I've got a few functions that can help parse files and I've got a worksheet somewhere that can aid in debugging a corrupted xmcd file.

Even though the whole xml file may be long, can you zip it or post just a representative example (one that gives the structure of the file and some typical values)?

Stuart

pbaxter-disable · ‎Feb 18, 2010

ZIPPED

StuartBruff · ‎Feb 19, 2010

On 2/18/2010 1:37:46 PM, pbaxter wrote:
>ZIPPED

Written whilst munching breakfast, so it's just a quick play - I gather that the main sections of "data" are separated by the "PREFORMATTED TEXT" tag but I don't know the rules for further breaking down the data.

The worksheet first scans the xml file to get each "PREFORMATTED TEXT" section. A function exists that then converts a specified section into individual lines.

Stuart
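For readers without the worksheet, the two steps described above (collect every tagged section, then split a chosen section into lines) can be sketched outside Mathcad. This is only a Python illustration, not the worksheet's code: the names `read_sections` and `section_lines` are hypothetical, the `page` tag in the sample stands in for whatever element actually holds the preformatted text, and the file is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def read_sections(xml_text, tag):
    """Return the text content of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() gathers the element's own text plus any text in nested children
    return ["".join(elem.itertext()) for elem in root.iter(tag)]


def section_lines(sections, n):
    """Split section n (0-based) into its individual lines."""
    return sections[n].splitlines()


# Made-up sample data; the real file's tag names and layout are not known here.
sample = "<report><page>line 1\nline 2</page><page>line 3</page></report>"
sections = read_sections(sample, "page")
first_page = section_lines(sections, 0)
```

Because `read_sections` returns a list, the number of sections is available as `len(sections)` without knowing it ahead of time.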

pbaxter-disable · ‎Feb 19, 2010

Stuart,

Thanks, this looks like it can be made to work! What's the best way to simply put together ALL of the XML data into a single file? Looks like it is currently set up to extract one page at a time. Since I don't know ahead of time how many pages I'll have (there will be hundreds of these data files ... hence the need for automation) I can't simply go from 1 to x.

Thanks,
Preston

pbaxter-disable · ‎Feb 19, 2010

This "program" only returns the value of xmllines(0) ... none of the other xmllines(n) values get added. Is it likely that I'm using the ON ERROR and BREAK commands incorrectly?

This will be a good programming technique for me once I find out what I'm doing wrong. :+)

Preston

RichardJ · ‎Feb 19, 2010

The "on error" statement has always seemed to me to be backwards. It evaluates the statement to the right of "on error". If that does not generate an error, that's what you get (in this case, it breaks every time, since that will never generate an error). If the statement to the right generates an error it uses the statement to the left to generate the return value. So switch the statements.

Richard
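To make the ordering concrete: in Mathcad, "A on error B" tries B first and falls back to A only if B fails. A rough Python analogue of the intended loop is sketched below. It is not Mathcad code; `get_section` is a hypothetical stand-in for xmllines(n), and it is assumed to raise an error once n runs past the last section.

```python
def collect_all(get_section):
    """Stack sections 0, 1, 2, ... until get_section(n) raises an error."""
    pages = []
    n = 0
    while True:
        try:
            # Right-hand side of "on error": the normal action, tried first
            pages.append(get_section(n))
        except IndexError:
            # Left-hand side of "on error": runs only when the normal action fails
            break
        n += 1
    return pages


# Made-up data standing in for the sections of one XML file
data = ["page 0", "page 1", "page 2"]
result = collect_all(lambda n: data[n])
```

The loop never needs to know the page count in advance; the first failed lookup ends it.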

pbaxter-disable · ‎Feb 19, 2010

Switch the statements?

If the right side of ON ERROR is what I want to happen if there is NOT an error, then I put my stack out to the right. Since I want the loop to break, that command goes on the left side? This still doesn't work:

Preston

RichardJ · ‎Feb 19, 2010

You have somehow introduced an error in your loop definition that wasn't there before.

Richard

pbaxter-disable · ‎Feb 19, 2010

Just realized that the xmllines function results in a double-spaced text file. Maybe in post-processing there's a way to simply ignore every other line, since I don't want to delete ALL the blank lines.

I'm gathering that there isn't a simple way (via script or program) to open the XML - rendered as text - then copy the text and write it to another file? Currently we're doing this manually, but would MUCH prefer to find an automated method.

Thanks again!
Preston

StuartBruff · ‎Feb 19, 2010

On 2/19/2010 4:02:30 PM, pbaxter wrote:
== Just realized that the xmllines function results in a double-spaced text file. Maybe in post-processing there's a way to simply ignore every other line, since I don't want to delete ALL the blank lines.

I don't think all of it is double spaced.

You could certainly write a simple algorithm to scan for and remove triple lines. You could manually add a character to the 'empty' lines you wanted to keep, delete the rest and then (automatically) replace the holding characters with "".
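One automatic variant of that idea, sketched in Python rather than Mathcad: drop at most one blank line after each line that is kept. Under the assumption that the extra spacing is exactly one inserted blank line per real line, a run of three blanks collapses to the single intended blank, and any part of the file that is not double-spaced passes through unchanged. The function name `undouble` is hypothetical.

```python
def undouble(lines):
    """Remove one blank line after each kept line, preserving intended blanks."""
    out = []
    skip_next_blank = False
    for line in lines:
        if skip_next_blank and line.strip() == "":
            # This blank is treated as the inserted spacer: drop it once
            skip_next_blank = False
            continue
        out.append(line)
        skip_next_blank = True
    return out


# Made-up example: "A", an intended blank line, then "B", all double-spaced
spaced = ["A", "", "", "", "B", ""]
cleaned = undouble(spaced)
```

If the real files insert spacing by some other rule, the skip condition would need to change accordingly.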

== I'm gathering that there isn't a simple way (via script or program) to open the XML - rendered as text - then copy the text and write it to another file? Currently we're doing this manually, but would MUCH prefer to find an automated method.

I'm not quite sure what it is exactly that you want to do. The method presented is sufficient to do that. Once you've got an array of the 'text' element of the xml file (which your loop should do - see attached if it isn't), then you can create a simple program to output it to a file.

Do you want to retain page numbers and heading? It looks like the structure is the same for all pages except the first, so a general header removal function should also be straightforward.

Stuart
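If a script outside Mathcad is acceptable, the "open the XML, keep only the text, write it to another file" step can be sketched in a few lines of Python. This is an alternative to the worksheet approach, not a description of it. The function names `xml_to_text` and `save_text` are hypothetical, and the sketch assumes the data files are well-formed XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_to_text(xml_string):
    """Return all text content of an XML document with the tags stripped."""
    root = ET.fromstring(xml_string)
    # itertext() walks the whole tree and yields text in document order
    return "".join(root.itertext())


def save_text(text, path):
    """Write the extracted text to a plain text file."""
    Path(path).write_text(text, encoding="utf-8")


# Made-up sample; the real files' structure is not known here.
plain = xml_to_text("<a>one<b>two</b>three</a>")
```

The resulting text file could then be read into Mathcad like any other plain text data, and the same two calls can be repeated in a loop over a folder of files.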

pbaxter-disable · ‎Feb 21, 2010