Summary: Changing text file record lengths

From: Pidgeon, Phillip <PPIDGEON_at_omc.otis.utc.com>
Date: Mon, 16 Sep 1996 23:26:00 -0700 (PDT)

As usual the people on this list deliver with lightning speed; thank you very
much to all who replied:

BigRedDog ckrieger_at_latrade.COM
Brian Sherwood sherwood_at_esu.EDU
Woody Lee woody_at_gergu3.tamu.edu
David Hinz dhinz_at_dna406.dna.mci.COM
Ken Teh teh_at_sun0.phy.anl.GOV
Pat O'Brien pobrien_at_draco.harvard.EDU
Lucio Chiappetti LUCIO_at_IFCTR.MI.CNR.IT
Peter Beerli beerli_at_genetics.washington.edu
Serge Munhoven MUNHOVEN_at_OLIVE.MSM.ULG.AC.BE
Ruben Zelwer ruben_at_garnet.berkeley.edu
? rioux_at_ip6480nl.ce.utexas.EDU
Dr. Thomas P Blinn tpb_at_zk3.dec.com
Tom Webster webster_at_ssdgwy.mdc.com

My Original Question:

>Hi,
>I am looking for a unix script or utility that can reformat text files that
>don't have end of line characters (crlf), into fixed length records of a
>user defined length say 160chrs capable of running on Dec UNIX V3.2c.
>
>Any help would be appreciated, I have tried writing an awk script but the
>text file is longer than 3000 characters, which seems to be an upper record
>size limit. I have tonnes of these files and not a lot of time to reformat
>them.

Summary answers in order of receipt (there seem to be a lot of ways to do
it, and while I have not tested all options I will keep them for reference):

1/ man split <-- this did exactly what I wanted, i.e.
   split -w172 original.txt > new.txt
2/ man fold
3/ man fmt
4/ man dd
5/ man col
6/ perl scripts
7/ c programs
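
Whichever of the options above is used, the result can be checked by
tallying the line lengths of the output. The following is only a small
sketch, assuming a POSIX awk and sort; the helper name count_lengths and
the sample input are made up for illustration. Because the reformatted
lines are short, the awk record-size limit mentioned in the question should
not get in the way here.

```shell
#!/bin/sh
# count_lengths: read lines on stdin and print "LENGTH COUNT" for each
# distinct line length, smallest length first.
count_lengths() {
    awk '{ n[length($0)]++ } END { for (len in n) print len, n[len] }' |
        sort -n
}

# Example: two 4-character records followed by a short final record.
printf 'abcd\nefgh\nij\n' | count_lengths
```

For a real file this would be something like "count_lengths < new.txt"; a
correctly reformatted file shows one dominant length plus, at most, one
shorter final record.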

The following are the complete responses received from:
 ----------------------------------------------------------------------------
 -----------------------------------------
1)
I think the command you are looking for is fold. I have never used it in
this context, but I think it will work. You might also check out the man
page on split. Good luck.

 -cliff
BigRedDog
ckrieger_at_latrade.COM
 ----------------------------------------------------------------------------
 -----------------------------------------
2) Try "man fmt"

Brian
Brian Sherwood
sherwood_at_esu.EDU
 ----------------------------------------------------------------------------
 -----------------------------------------
3) check out the man page for dd.

 --
 -=-=- woody_at_gergu3.tamu.edu -=|=- http://gergu3.tamu.edu/~woody -=-=-
Woody Lee, Research Associate | "That's the whole problem with science.
GERG - Texas A&M University | You've got a bunch of empiricists trying
727 Graham Road MS 3149 | to describe things of unimaginable wonder."
College Station, TX 77845 | - Calvin (& Hobbes)
 -=-=-Phone: (409) 862-2323x122-=|=- FAX: (409) 862-1347 -=-=-
 ----------------------------------------------------------------------------
 -----------------------------------------
4) Here is a quick and dirty perl script that does what you need. The part
that does the real work is the "open, while (read...), print, close"
loop. The rest is just for getting the record length.

dave.


#! /usr/local/bin/perl

use Getopt::Std;

getopts('r:h');

if (defined $opt_h)
{
  &usage;
}

if (defined $opt_r)
{
  $byte_len = $opt_r;
}
else
{
  &usage;
}

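# After getopts, the last remaining argument is the input file.
&usage unless @ARGV;
open(FP, "$ARGV[$#ARGV]") || die "Could not open $ARGV[$#ARGV]: $!";
# Read $byte_len bytes at a time and print each chunk as one record.
while (read(FP, $buf, $byte_len))
{
  print "$buf\n";
}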
close(FP);


sub usage
{
  print "Usage: $0 [-h] -r <record length> <input-filename>\n";
  print "where -h Prints this message\n";
  print "      -r <record length> in bytes\n";
  exit;
}
David Hinz
dhinz_at_dna406.dna.mci.COM
 ----------------------------------------------------------------------------
 -----------------------------------------
5) I suggest using dd.
Ken Teh
teh_at_sun0.phy.anl.GOV
 ----------------------------------------------------------------------------
 -----------------------------------------
6) Maybe the following C thingy helps:

snip it out (e.g. -> chop.c) and compile it with cc chop.c -o chop.
it should do what you expect (syntax chop columns < file > outfile)

/* snip here--------------------*/
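#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
        long col=0, columns;
        int ch;         /* int, not char, so EOF can be told apart from data */
        if(argc!=2) {
                fprintf(stderr,"Syntax: chop columns < file\n");
                exit(1);
        }
        columns = atol(argv[1]);
        if(columns<=0) {
                fprintf(stderr,"chop: columns must be a positive number\n");
                exit(1);
        }
        /* copy stdin to stdout, adding a newline after every columns-th byte */
        while((ch=getc(stdin))!=EOF){
                putchar(ch);
                if(++col % columns == 0)
                        putchar('\n');
        }
        return 0;
}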
/* snip here--------------------*/

cheers,
Peter

P.S.
I just wrote and tried it just once, so try it first before you throw
away your source files.

 -----------------------------------------------------------
Peter Beerli <beerli_at_genetics.washington.edu>
University of Washington, GENETICS, Box 357360, Seattle,
WA 98195-7360, USA. Work:(206) 543-8751,
Home:(206) 527-9906, Fax:(206) 543-0754; GMT+0800.
WWW://evolution.genetics.washington.edu/PBhtmls/beerli.html
 ----------------------------------------------------------------------------
 -----------------------------------------
7) I think fmt is what you want.

 -Pat

fmt(1)
                                                                fmt(1)

NAME
  fmt - Formats mail messages prior to sending

SYNOPSIS

  fmt [-width] file ...

DESCRIPTION

  The fmt command reads the input file or files, or standard input if no
  files are specified, and writes to standard output a version of the input
  with lines of a length as close as possible to width bytes.

pobrien_at_draco.harvard.EDU
Systems Administrator
Harvard-Smithsonian Center for Astrophysics
60 Garden Street
Cambridge, MA 02138
 ----------------------------------------------------------------------------
 -----------------------------------------
8) Have a look at man dd.
  Try playing with the conv option, I use something like

dd if=$t of=$f ibs=$b cbs=$r conv=unblock

# $b is a blocksize
# $r is a record length

This does not do *exactly* what you want, i.e. it takes the input blocks,
splits them into records of $r, adds a linefeed, BUT strips trailing
blanks (the result is that records ending in blanks will be shorter). To
prevent this (we use this command to decode tape data) we asked the
supplier of our data to put nulls at the end of each record.
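
The behaviour described above can be tried on a small sample. This is a
sketch only, assuming a dd that implements the POSIX cbs= and conv=unblock
operands; the function name unblock_records, the sample data, and the
8-byte record length are invented for illustration.

```shell
#!/bin/sh
# unblock_records RECLEN < infile > outfile
# Treat stdin as fixed-length records of RECLEN bytes; each record is
# written as one line, with trailing blanks removed by conv=unblock.
unblock_records() {
    dd cbs="$1" conv=unblock 2>/dev/null
}

# Example: two 8-byte records with no newlines between them.
# The first record is "AB" padded with six blanks.
printf 'AB      CDEFGHIJ' | unblock_records 8
```

In this sample the padded first record comes out as a two-character line,
which is the shortening of blank-terminated records mentioned above.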

 ----------------------------------------------------------------------------
       A member of G.ASS : Group for Astronomical Software Support
 ----------------------------------------------------------------------------
Lucio Chiappetti - IFCTR/CNR | Ma te' vugl' da' quost avis a ti' Orsign
via Bassini 15 - I-20133 Milano | Buttet rabios intant te se' pisnign
Internet: LUCIO_at_IFCTR.MI.CNR.IT | (Rabisch, II 46, 119-120)
 ----------------------------------------------------------------------------
For more info : http://www.ifctr.mi.cnr.it/~lucio/personal.html
 ----------------------------------------------------------------------------
9) Hi,

did you try :

fold -160

I hope there is no buffer limit for that one, since it seems to do what
you'd like.
Check the manpage fold(1) for details.
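
A minimal sketch of the fold approach, assuming a POSIX-style fold that
accepts -w (width) and -b (count bytes rather than display columns). The
function name reflow and the sample input are made up for illustration,
and the behaviour on Digital UNIX V3.2c has not been checked here.

```shell
#!/bin/sh
# reflow WIDTH < infile > outfile
# Break an unterminated stream into lines of at most WIDTH bytes.
# -b makes fold count bytes, so tabs and backspaces get no special
# column handling.
reflow() {
    fold -b -w "$1"
}

# Example: a 10-byte input with no newline, folded into 4-byte records.
printf 'abcdefghij' | reflow 4
```

For the 160-character case this would be "reflow 160 < original.txt >
new.txt". The last record is left short if the file length is not a
multiple of the width.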

Good luck,

    Serge

 Serge Munhoven                           Internet: MUNHOVEN_at_OLIVE.MSM.ULG.AC.BE
 Univ. of Liege, Department MSM (C2),     Phone: ++32-4-3669337
 Quai Banning, 6, B-4000 LIEGE (Belgium)  Fax: ++32-4-2530978

                              "Virtual reality has nothing on Calvin."
                               Susie (Calvin & Hobbes by Bill Watterson)
 ----------------------------------------------------------------------------
 -----------------------------------------
10) You might want to experiment with the col command.

 --Ruben
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ruben Zelwer
University of California
Information Systems and Technology ruben_at_garnet.berkeley.edu
225 Evans Hall 510-642-5359
Berkeley, CA 94720-3804
 ----------------------------------------------------------------------------
 -----------------------------------------
11) Have you looked at "man dd"?
rioux_at_ip6480nl.ce.utexas.EDU
 ----------------------------------------------------------------------------
 -----------------------------------------
12) I believe the utility you need is "dd". Check the reference page.

Tom

 Dr. Thomas P. Blinn, UNIX Software Group, Digital Equipment Corporation
  110 Spit Brook Road, MS ZKO3-2/U20 Nashua, New Hampshire 03062-2698
   Technology Partnership Engineering Phone: (603) 881-0646
    Internet: tpb_at_zk3.dec.com Digital's Easynet: alpha::tpb

  Worry kills more people than work because more people worry than work.

     My favorite palindrome is: Satan, oscillate my metallic sonatas.
                                         -- Phil Agre, pagre_at_ucsd.edu

  Opinions expressed herein are my own, and do not necessarily represent
  those of my employer or anyone else, living or dead, real or imagined.
 ----------------------------------------------------------------------------
 -----------------------------------------
13) I'm pretty sure someone else has sent you this answer before me, but
just in case....

I think Perl will take good care of you. One of Larry's design goals
with perl was to try to eliminate the semi-documented limits that a lot
of the standard UNIX commands have. The syntax is something like a cross
between C, BASIC, sed, and awk. (Easy to learn enough to get the job done,
deep enough to keep you interested for years.)

I can think of a couple of ways of doing it, depending on what resources
are available, i.e. memory vs disk space (for temp files), and just plain
ugliness. :->

Perl 5.003 (the current version) compiles cleanly under 3.2d and 3.2g, so
hopefully it will be OK with 3.2c as well.

Tom
 --
+--------------------------------+------------------------------+
| Tom Webster | "Funny, I've never seen it |
| webster_at_kaiwan.com (home) | do THAT before...." |
| webster_at_ssdgwy.mdc.com (work) | - Any user support person |
+--------------------------------+------------------------------+
| finger -l webster_at_kaiwan.com to get my PGP Public Key. |
+---------------------------------------------------------------+
 ----------------------------------------------------------------------------
 -----------------------------------------

Regards,
Phil Pidgeon
OTIS Engineering Center
ppidgeon_at_omc.otis.utc.com
Received on Tue Sep 17 1996 - 06:02:07 NZST
