read txt from multiple html files???

05 June 2003, 08:29 AM
#1
shunty
Scooby Regular
Thread Starter
Join Date: Aug 2001
Location: wakefield
Posts: 2,082

We have 600 HTML files containing text that needs to be exported into one readable file, in any format.
Does anyone know how this can be done?

cheers

shunty
05 June 2003, 08:44 AM
#2
JackClark
Scooby Senior
Join Date: Dec 2000
Location: Overdosed on LCD
Posts: 20,852

Is the text in the same place in all the files? If so record a macro in Word.
05 June 2003, 11:33 AM
#3
shunty

just going to give that a go Jack.
cheers

shunty
05 June 2003, 12:44 PM
#4
David_Wallis
Scooby Regular
Join Date: Nov 2001
Location: Leeds - It was 562.4bhp@28psi on Optimax, How much closer to 600 with race fuel and a bigger turbo?
Posts: 15,239

easy with a script..

mail me if you want something doing.

David
05 June 2003, 02:02 PM
#5
stevencotton
Scooby Regular
Join Date: Jan 2001
Location: behind twin turbos
Posts: 2,710

You need Perl.
05 June 2003, 03:44 PM
#6
shunty

knit one perl one
Hello Steve, have you got something already written?
Here's one you made earlier - perl2exe.
I'm sure I'd have something to offer in exchange.

shunty
05 June 2003, 08:44 PM
#7
stevencotton

From HTML::TokeParser::Simple (search.cpan.org), assuming 5.6.x or earlier:

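#!/usr/local/bin/perl -w

use strict;
use HTML::TokeParser::Simple;

# Path to the HTML file, taken from the command line
my $somefile = shift @ARGV or die "Usage: $0 file.html\n";

my $p = HTML::TokeParser::Simple->new( $somefile )
    or die "Cannot open $somefile: $!\n";

while ( my $token = $p->get_token ) {
    # This prints all text in an HTML doc (i.e., it strips the HTML)
    next unless $token->is_text;
    print $token->as_is;
}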

If that's too high a level you can use HTML::TokeParser. 'Course you'll have to modify this to iterate over all your files and put each filename in $somefile, which needs to be a relative or absolute path to the file.
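
For anyone wanting the loop spelled out, here is a minimal sketch of that modification. It is not from the original post. It assumes all the HTML files sit in one directory (with no spaces in its path) and that one combined text file is the goal. The sub names html_to_text and combine, the "====" separator line, and the directory and output file passed on the command line are made up for illustration.

```perl
#!/usr/local/bin/perl -w

use strict;
use HTML::TokeParser::Simple;

# Return the text of one HTML file with the tags stripped.
sub html_to_text {
    my ($file) = @_;
    my $p = HTML::TokeParser::Simple->new($file)
        or die "Cannot open $file: $!\n";
    my $text = '';
    while ( my $token = $p->get_token ) {
        next unless $token->is_text;
        $text .= $token->as_is;
    }
    return $text;
}

# Write the text of every listed file into one output file,
# each preceded by a separator line naming the source file.
sub combine {
    my ( $outfile, @files ) = @_;
    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for my $file (@files) {
        print $out "==== $file ====\n";
        print $out html_to_text($file), "\n";
    }
    close $out or die "Cannot close $outfile: $!\n";
}

# Hypothetical usage: perl combine.pl <directory> <output file>
my ( $dir, $outfile ) = @ARGV;
if ( defined $outfile ) {
    my @files = sort { $a cmp $b } glob("$dir/*.html");
    combine( $outfile, @files );
}
```

Run as, say, "perl combine.pl pages combined.txt" (both names are placeholders). Note that as_is hands back the text as it appears in the source, so entities such as &amp; should stay encoded; decode_entities from HTML::Entities is one way to convert them if that matters.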

(aren't there any [CODE] tags?)

[Edited by stevencotton - 6/5/2003 8:47:18 PM]
06 June 2003, 08:49 AM
#8
shunty

Thanks, I'll see if I can get me head in gear this morning. Well, it is Friday.

If there's anything you need, mail me as per my profile.

cheers

shunty
06 June 2003, 09:30 AM
#9
stevencotton

A brand spanking new 996 Turbo with factory performance upgrade please
06 June 2003, 10:58 AM
#10
shunty



shunty

All times are GMT +1.