Shell script used to extract zip files


exec 1>/tmp/dbProcess.${$}.log;
exec 2>/tmp/dbProcess.${$}.debug;

7z e VDPECMclaim_fy2004q1.zip;
7z e VDPECMclaim_fy2004q2.zip;

rm VDPECMclaim_fy2004q1.zip;
rm VDPECMclaim_fy2004q2.zip;

tail VDPECMclaim_fy2004q1.txt > out2004Q01.txt;
tail VDPECMclaim_fy2004q2.txt > out2004Q02.txt;

wc -l out2004Q01.txt;
wc -l out2004Q02.txt;

wc -l VDPECMclaim_fy2004q1.txt;
wc -l VDPECMclaim_fy2004q2.txt;

rm VDPECMclaim_fy2004q1.txt;
rm VDPECMclaim_fy2004q2.txt;

# second part
7z e VDPECMclaim_fy2004q3.zip;
7z e VDPECMclaim_fy2004q4.zip;

rm VDPECMclaim_fy2004q3.zip;
rm VDPECMclaim_fy2004q4.zip;

tail VDPECMclaim_fy2004q3.txt > out2004Q03.txt;
tail VDPECMclaim_fy2004q4.txt > out2004Q04.txt;

wc -l out2004Q03.txt;
wc -l out2004Q04.txt;

wc -l VDPECMclaim_fy2004q3.txt;
wc -l VDPECMclaim_fy2004q4.txt;

rm VDPECMclaim_fy2004q3.txt;
rm VDPECMclaim_fy2004q4.txt;

# third part
7z e VDPECMclaim_fy2005q1.zip;
7z e VDPECMclaim_fy2005q2.zip;

rm VDPECMclaim_fy2005q1.zip;
rm VDPECMclaim_fy2005q2.zip;

tail VDPECMclaim_fy2005q1.txt > out2005Q01.txt;
tail VDPECMclaim_fy2005q2.txt > out2005Q02.txt;

wc -l out2005Q01.txt;
wc -l out2005Q02.txt;

wc -l VDPECMclaim_fy2005q1.txt;
wc -l VDPECMclaim_fy2005q2.txt;

rm VDPECMclaim_fy2005q1.txt;
rm VDPECMclaim_fy2005q2.txt;
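
The script above repeats the same five steps for each quarterly archive, so it can also be written as a small function plus a loop. The following is only a sketch under a few assumptions: the function names process_quarter and process_all are made up for illustration, the archives follow the VDPECMclaim_fy<year>q<quarter>.zip naming shown above, and each archive extracts to a .txt file with the same base name. Unlike the original, it stops before deleting anything if 7z reports a failure.

```shell
# Hypothetical helper: run the extract / tail / count / clean-up steps for one quarter.
process_quarter() {
    year=$1
    quarter=$2
    base="VDPECMclaim_fy${year}q${quarter}"

    # Stop here if extraction fails, so the archive is not deleted by mistake.
    7z e "${base}.zip" || return 1
    rm "${base}.zip"

    # Keep the last 10 lines (tail's default) as a sample, then count both files.
    tail "${base}.txt" > "out${year}Q0${quarter}.txt"
    wc -l "out${year}Q0${quarter}.txt"
    wc -l "${base}.txt"

    rm "${base}.txt"
}

# Hypothetical driver: the six archives handled by the script above.
process_all() {
    for spec in "2004 1" "2004 2" "2004 3" "2004 4" "2005 1" "2005 2"; do
        # Deliberately unquoted so "2004 1" splits into year and quarter.
        process_quarter $spec || return 1
    done
}
```

Calling process_all from the directory that holds the archives would run all six in order. The exec redirections at the top of the original script can be kept in front of it unchanged.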


I am going to give some background on why I am writing about this error.  If you only need to know how to fix it, skip to the section titled "Using p7zip".

The reason I began using a Unix emulator (Cygwin) was so that I could run row counts on large data files.  The counts needed to be completed before the files were uploaded to a database.

The files I was given were large (3-4 GB on average).  They were provided to me on a DVD as compressed files, sometimes as .zip, sometimes as .gz or .rar.

Cygwin comes equipped with gunzip by default, at least I don’t remember having to install it.  So for the gzip files, I would write:

gunzip -d archive_name.gz

This will not only extract your file, but it will also remove the .gz file and leave only the extracted contents.
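
If you only need the row count and want to keep the .gz file, gunzip can write the decompressed data to standard output instead of to disk. A minimal sketch, where the function name count_gz_lines is just an illustrative choice:

```shell
# Hypothetical helper: count the lines in a gzip-compressed file
# without removing the .gz and without writing the extracted file to disk.
count_gz_lines() {
    # -c sends the decompressed data to standard output.
    gunzip -c "$1" | wc -l
}
```

Running count_gz_lines archive_name.gz prints the line count and leaves archive_name.gz in place.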

For the initial zip files that I encountered, I used:

unzip zip_file_name.zip

This did the trick until I received the following error: skipping: file.txt  need PK compat. v4.5 (can do v2.1), where file.txt was the name of my file.  Version 4.5 of the PKZIP format is the one that introduced the ZIP64 extensions for very large files, so this message most likely means the archive uses ZIP64 and the installed unzip is too old to read it.

I knew that a utility called PKZIP existed, so I thought that if I went back to my Cygwin installer I could look for the package, download it, and use it to extract my .zip archive.  I looked for what seemed like a long while before I gave up.  I did not find a PKZIP package for Cygwin, but I did find p7zip.


To get p7zip under cygwin

  1. Run your original setup.exe file for Cygwin.
  2. Follow the wizard the same way you did the first time.
  3. When you get to the "Select Packages" section, expand the Archive category of packages.
  4. Select the one that says p7zip.
  5. Complete the wizard.
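
Once the wizard finishes, one quick way to confirm the package is available is to ask the shell where 7z lives. This is a small sketch; the function name check_7z is made up for this example, and it assumes the p7zip package provides a command named 7z, which is the command used in the rest of this post.

```shell
# Hypothetical helper: report whether a 7z command can be found by the shell.
check_7z() {
    if command -v 7z >/dev/null 2>&1; then
        echo "7z found at: $(command -v 7z)"
    else
        echo "7z not found; re-run setup.exe and select p7zip" >&2
        return 1
    fi
}
```

Type check_7z at the Cygwin prompt after defining it. A "not found" message suggests the package was not selected in the wizard.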

Using p7zip

If you skipped to this section, it is probably because you received an error (skipping: file.txt  need PK compat. v4.5 (can do v2.1)) when extracting a zip file with the Unix unzip command.

Instead of using:

unzip zip_file_name.zip

use your p7zip utility to extract the contents.  Type the following to extract the archived file.

7z e zip_file_name.zip

A question that you might have had when looking for this solution is: does p7zip handle PKZIP archives?  Fortunately, the answer is yes.  This allowed me to script all my steps.  Below is a sample shell script that I wrote to execute what I would otherwise have done manually.

# Beginning of shell script
# This shell script will
#    1.) Unzip the contents of the zip archive
#    2.) Remove the zip archive after the contents are extracted
#    3.) Count the number of lines in the extracted file (usually a text file).  You should know the name of the text file that will be extracted.
#    4.) Delete the file after the rows have been counted.

7z e zip_file_name.zip;

rm zip_file_name.zip;

wc -l contents_inside_zip_file.txt;

rm contents_inside_zip_file.txt;

# end of the program
# if you don't know the name of the file to be extracted
# or you will be extracting many files from that .zip file
# replace the last two lines with
# wc -l *
# this will count the lines in every file in the current directory
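
When the names of the extracted files are not known in advance, one option is to extract into an empty scratch directory and count everything found there, so that files already sitting in the current directory are not counted. The sketch below assumes mktemp is available in your Cygwin install and uses 7z's -o switch to set the output directory; the function name count_archive_lines is hypothetical. It leaves the original archive in place.

```shell
# Hypothetical helper: extract an archive into a scratch directory,
# count the lines of every extracted file, then remove the scratch directory.
count_archive_lines() {
    archive=$1
    workdir=$(mktemp -d) || return 1

    # -o<dir> (no space) tells 7z where to put the extracted files.
    if 7z e "$archive" "-o$workdir" >/dev/null; then
        wc -l "$workdir"/*
        rc=0
    else
        echo "extraction failed: $archive" >&2
        rc=1
    fi

    rm -rf "$workdir"
    return $rc
}
```

For example, count_archive_lines zip_file_name.zip prints one line count per extracted file. With more than one file, wc also prints a total.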

This is a blog post so post questions if you have any.


As of this writing, I am pretty new to the Linux/Unix environment.  At some point in college I installed Mandrake Linux on one of my computers.  The most I ever did on it was spoof text e-mails (to friends and family) and maybe use OpenOffice.  Since then I had not really touched a Linux variant of any kind, until last Friday.

A former co-worker of mine recommended that I check out Cygwin if I wanted to get comfortable with a Unix environment.  Furthermore, the awk and sed utilities in Unix provide a good deal of commands for working with text files (without having to write utilities in C# or a scripting language).

So, I switched jobs, and this is my first opportunity to give the Unix environment and its features a test drive via Cygwin.  Installing it was fairly straightforward, and opening it is easy with the batch file they provide.

Running basic commands is much like in any Linux/Unix terminal, with one exception: the "cls" or "clear" command.

If I type that, nothing happens.  So here is what you do:

On your screen type: alias clear='cmd /c cls'

Once you type that, your cursor will go to a new line.

Now anytime you want to clear your screen, just type clear.  This will clear the entire contents of the DOS prompt window that you are using for Cygwin.   If you don’t want to clear the entire contents of your screen and prefer to just scroll so that the most recent line is at the top with empty space below it, press [CTRL] + [L] at the same time (that’s control + l).
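
An alias typed at the prompt lasts only for the current session. To have it available every time Cygwin starts, the same line can go in your shell startup file. This assumes your Cygwin shell is bash and that it reads ~/.bashrc at startup:

```shell
# Add to ~/.bashrc (assumed startup file for the Cygwin bash shell).
# Note the plain, straight single quotes around the command.
alias clear='cmd /c cls'
```

After saving the file, open a new Cygwin window (or run source ~/.bashrc) and clear should work without retyping the alias.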