Greetings,
I have inherited ETL jobs from a former team which process CSV files. I use a combination of shell scripts and Perl on Ubuntu. The CSV files are very large; they arrive as zipped archives. Unzipped, many are more than 30 GB - yes, really.
The legacy process is a batch job that runs from cron. It completely unzips every file, reads its first line, records an entry in a config file, and then loads the whole file. The full unzip alone takes several hours of processing time for very little benefit.
Can you suggest a method to remove the first line (or the first few lines) from each file inside the zipped archive without completely unpacking the archives?
The unzip command-line utility has a -p option which dumps a member file to standard output, so you can process it in a pipe without ever writing the whole extracted file to disk.
Alternatively:
my ($status, $bufferRef);
my $member = $zip->memberNamed('xyz.txt');
$member->desiredCompressionMethod(COMPRESSION_STORED);
$status = $member->rewindData();
die "error $status" unless $status == AZ_OK;
while (!$member->readIsDone()) {
    ($bufferRef, $status) = $member->readChunk();
    die "error $status" if $status != AZ_OK && $status != AZ_STREAM_END;
    # do something with $bufferRef:
    print $$bufferRef;
}
$member->endRead();
Modify to suit, i.e. loop over the file list from $zip->memberNames(), and read only the first few lines of each member.