Manifest/description of some of the files on this site
Required (used) modules for most of these scripts/programs
MARC::Record and its associated modules.
Time::HiRes
MARC::BBMARC (available on this site--if the link doesn't work, try browsing in /bryanmodules/ for the latest version)
For MARC::Lintadditions, Business::ISBN and Business::ISSN (from CPAN) are required (used).
Some scripts also require List::Compare (from CPAN).
Some scripts also use my MARC::QBI::Misc module for file handling. This is not yet available on my site. I am considering moving the file handling code to MARC::BBMARC, but haven't gotten around to it yet.
Extensions on files are either .txt, .pl/.pm, or a combination of the two.
All are plain text. Most are .pl scripts (the exceptions are the modules BBMARC.pm, Lintadditions.pm, and Errorchecks.pm), and the file names end with .txt so that they can be saved/downloaded.
File names may not match the names listed here, but they should be close.
Not all files are covered here, and the individual files have better descriptions.
See also the index.htm list of Changes.
/bryanmodules/*.*
MARC::BBMARC -- module containing various subroutines that I have found useful.
MARC::Lintadditions -- module containing additional check_XXX functions for MARC::Lint of the MARC::Record distribution (see http://marcpm.sourceforge.net/ for more information).
MARC::Errorchecks -- module containing additional MARC record validation checks not easily performed in MARC::Lint and MARC::Lintadditions.
MARC::Lint::CodeData -- module containing MARC Code List data for Languages, Countries, and Geographic Areas. Used by checks in Lintadditions and Errorchecks.
There are tar.gz files for each of these.
At same level as index.htm:
Template for reading records: marcreadingstarttemplate.txt
/fullrecscripts/*.*
Cleanup full recs
- 003cleanupscript.txt--Matches 001 with 003 and fixes mismatches. Reports errors for unmatched records.
- 007cleanupscript.txt--Validates 007 bytes, outputting cleaned records plus a separate set of records for manual inspection.
- 010cleanupscript.txt--Fixes spacing in 010 subfield 'a'
- cleantrailingspaces.txt--Removes trailing spaces from fields above 010; 016 fields are also skipped.
Code list cleanup
- countrycodelistclean.txt--Used to create the DATA for country code validation. Requires the ASCII version of the MARC Code List for Countries.
- gaccleanupscript.txt--Used to create the DATA for geographic area code validation. Requires the ASCII version of the MARC Geographic Areas Codes.
- languagecodelistclean.txt--Used to create the DATA for language code validation. Requires the ASCII version of the MARC Language Codes.
Counting
- comparemerge.txt (Compare merge)--Tells how many records would be updated and how many records are in a file (code based on mermarcfiles.pl and hasbeenupdated.pl (below)).
- countrecords.txt--Reports number of records in a file of MARC records.
- countrecsbytype.txt--Counts records and outputs counts by record type: nonbook (all but 'a', 'e', or 'o'), book, LCCIP upgrade, PCIP, and original/PCIP-upgrade.
- errreptcount.txt--Takes the output of lintallchecks.pl, removes the control number and title from the start of each line, and outputs two files listing each error on its own line with the count of occurrences of that error.
- fieldsubfieldcounts.txt--Reports totals for each tag and subfield for a file of records. Currently limited to field/tag count.
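As a rough illustration of the simplest of the counting scripts: every MARC 21 record ends with the record terminator byte 0x1D, so a record count can be sketched in core Perl by reading a file one terminator-delimited chunk at a time. This is not the code in countrecords.txt (the scripts on this site rely on MARC::Record); the subroutine name count_marc_records and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts records on an open filehandle by
# reading up to each MARC 21 record terminator (0x1D).
sub count_marc_records {
    my ($fh) = @_;
    local $/ = "\x1D";    # each <$fh> now returns one raw record
    my $count = 0;
    while (my $raw = <$fh>) {
        # Skip trailing bytes (e.g. a final newline) that lack a terminator.
        next unless $raw =~ /\x1D\z/;
        $count++;
    }
    return $count;
}

# Made-up stand-in data: three terminator-delimited chunks, not valid MARC.
my $sample = "rec1\x1D" . "rec2\x1D" . "rec3\x1D" . "\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
binmode($fh);
print count_marc_records($fh), "\n";
close($fh);
```

For a real file, open it by name instead of by scalar reference and keep the binmode call so the bytes are read unchanged.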
deleteSHandDDC.txt--Script for removing designated subject headings and Dewey numbers. Relies upon MARC::QBI::Misc for file handling and prompting.
EAN_ISBNconverter.txt--Reads EANs (13-digit ISBNs) from the command prompt and prints the corresponding 10-digit ISBNs to the screen.
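The conversion itself is standard ISBN arithmetic: drop the 978 prefix and the EAN check digit, then recompute the ISBN-10 check digit over the remaining nine digits. A minimal core-Perl sketch follows. It is not the code in EAN_ISBNconverter.txt, the subroutine name ean_to_isbn10 is hypothetical, and it does not verify the EAN's own check digit. EANs with a 979 prefix have no 10-digit form, so the sketch returns undef for them.

```perl
use strict;
use warnings;

# Hypothetical helper: converts a 978-prefixed EAN to a 10-digit ISBN.
# Returns undef for input that is not a 13-digit number starting with 978.
sub ean_to_isbn10 {
    my ($ean) = @_;
    $ean =~ s/[-\s]//g;    # drop hyphens and spaces
    return undef unless $ean =~ /^978(\d{9})\d$/;
    my $core = $1;         # the nine digits before the ISBN-10 check digit

    # ISBN-10 check digit: weight the digits 10 down to 2, then pick
    # the value that brings the weighted sum to a multiple of 11.
    my $sum    = 0;
    my $weight = 10;
    $sum += $_ * $weight-- for split //, $core;
    my $check = (11 - $sum % 11) % 11;
    return $core . ($check == 10 ? 'X' : $check);
}

print ean_to_isbn10('9780306406157'), "\n";    # prints 0306406152
```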
Extraction
The field extraction scripts are essentially the same, with minor modifications.
- extractbycontrolno.txt--Given a file of control numbers, extracts records with those numbers from a file of MARC records.
- extractbycontrolnoignrspace.txt--Same as Extract by Control Number, but ignores leading or trailing spaces in the numbers.
- extractbyisbn.txt--Given a file of ISBNs (tab separated), and a file of MARC records, exports any MARC records with one of the matching ISBNs (020a or 020z). Also exports a separate file of non-matches (for further searching or handling).
- extractbystockorisbn.txt--Similar to extractbyisbn.pl.txt, but takes a file of stock numbers (for the 037 field) and ISBNs, tab separated.
- extracterrorsfrommodules.txt--Outputs POD info, comments, and lines with an error or warning statement from a module/script.
- extractnonbookby008date.txt--Extracts all nonbook records (those without 'a' in LDR/06) between two dates, based on 008 creation dates (008/0-5).
- extractpcip.txt--Extracts records coded as CIP-level (LDR/17 eq '8')
- extractspecsubfield.txt--Extract Specific Subfield is based on Field Extraction and prompts for a subfield to extract after the field and indicator prompts.
- fieldextraction.txt--Generalized version, allowing the user to select the field number to extract, and desired indicators for that field.
- fieldextractionwithregex.txt--Preliminary version of a script to extract fields by keyword. This version requires the programmer to hard-code the regex. Future versions should allow users to input search terms/regexes.
- fieldextraction3.txt--Field Extraction 3 is a more basic (earlier) version of the first, and was limited to fields with indicators and second indicator of '0'.
- fieldextractioncleanspaces.txt
- fieldextractionnocontrols.txt--Field Extraction No Controls differs from Field Extraction only in that it doesn't output the control numbers it has stored for each extracted field.
findmultiplefields.txt--Scans through a file of records and outputs the control numbers of records having multiple occurrences of a specified field.
hasbeenupdated.txt (Has been updated?)--Compares two files, and outputs a list of the records that have been updated.
Linting
- lintallchecks.txt--Incorporates all checks in MARC::Lint, MARC::Lintadditions, and MARC::Errorchecks.
- lintcheck2.txt--Older version of lintallchecks.pl
- linttest.txt--Older version of lintallchecks.pl
- lintwithadditions.txt--Older version of lintallchecks.pl (added Lintadditions to Lint checking)
- lintwithadditionsselective.txt--Used to test individual MARC::Lintadditions methods (to turn checks on and off as needed).
mermarcfiles.txt (Merge MARC files)--Merges two files of MARC records into one, and removes any records with a control number matching one in a file of Deleted control numbers. The updated records are tacked on to the end of the base record file.
outputchangestogether.txt--Uses the MARC::BBMARC::updated_record_hash() subroutine to match each control number in the updated file (1st file) with the control number in the base file (2nd file), and outputs the two records as raw MARC, one right after the other.
printrecordasformatted.txt--Uses MARC::BBMARC's functions to output each field of a record in human readable form, with tabs separating each subfield, and @ as subfield indicator. This will need to be modified to work on other systems, as it uses MARC::QBI::Misc for file handling.
rawanddecodedscan.txt--Relies upon warnings generated in MARC::File::USMARC when decoding a record to take note of invalid indicators. Reports when indicators have been forced to blanks so that those records can be corrected without losing the indicators.
splitmarcfile.pl.txt--Splits a file of MARC records into multiple files, based on a specified (hard-coded presently) number of records.
Tests for Errorchecks
Most of these are previous versions/script versions of the subroutines in MARC::Errorchecks.
- 008checker.txt--Identifies records with bad bytes in the 008 (Likely to fail, since it refers to MARC::BBMARC::validate008 instead of MARC::Errorchecks::validate008).
- 008illvs300.txt
- 008matchvsotherfields.txt
- checkcipforstockno.txt
- Errorchecks.t.txt--Test script for the MARC::Errorchecks distribution (in progress).
- findemptysubfields.txt
- findlongrecords.txt
- findmultiperiodsafter010.txt
- findmultiplefields.txt
- findmultispacesafter010.txt
- findunderscoredollarinfield.txt
- ldrvalidatescript.txt
- pubdatecomparisons.txt
- testgetdate.txt
- testnewerrorchecks.txt--Allows individual subroutines to be turned on and off more easily than in lintallchecks (using commenting/uncommenting of subroutine calls).
- viddvdvsvhs.txt
Tests for Lintadditions
Most of these are previous versions/script versions of the subroutines in MARC::Lintadditions.
- check022script.txt
- isbnvalidatescript.txt
- lintwithadditionsselective.txt
- validate007.t.txt
/cleanupscripts/*.*
Various scripts to clean output of fieldextraction.txt or other scripts.
See individual files for description of each.
Template for cleanupscripts: findregexinfieldextract.pl.txt
/inprocess/*.*
LCSHchangesparserpl.txt
Finds changed LCSH in the LCSH weekly lists. Outputs Tag [tab] Old heading [tab] New heading.
This is not yet working properly, but it does create a file of changed headings for each week, given a folder containing weekly lists. Actual file name will contain the version number.
It also compiles a single file of all changes in the specified folder's files.
/marc-marcmaker0.02/*.*
Preliminary working version of MARC::File::MARCMaker, a module based on MARC::File::USMARC and ::MicroLIF, using code from the MARC.pm module. Once done, the module will allow one to work with MARCMaker format files.
COPYRIGHT AND LICENSE
This software is free software and may be distributed under the same terms as Perl itself.
Copyright (C) 2003-2005
Bryan Baldus
eijabb@cpan.org
Last updated July 16, 2005.