A. SELECTION.

These represent three very crudely 'discovered' collections,
as follows.

1. DRAMA

Strategy:

Find Martin Mueller's list. Found it.

Grep TCP XML files (phase 1) for Dramatis Personae
  This adds 55 to Martin's list, but many of them
  are not properly drama, or are in Latin. Add
  them anyway.
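
A rough sketch of this grep in Python, for anyone without grep to hand.
It assumes the Phase 1 XML files sit together in one folder and are
named by TCP ID; the folder layout and the function names are
illustrative assumptions, not a record of what was actually run.

```python
import re
from pathlib import Path

# Case-insensitive; \s+ tolerates a line break between the two words,
# and the alternation allows the ligature spelling "Personæ".
PATTERN = re.compile(r"dramatis\s+person(?:ae|æ)", re.IGNORECASE)


def has_dramatis_personae(text):
    """Return True if the text contains the phrase 'Dramatis Personae'."""
    return PATTERN.search(text) is not None


def files_with_dramatis_personae(folder):
    """Return the file stems (assumed to be TCP IDs) of matching XML files."""
    hits = []
    for path in sorted(Path(folder).glob("*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if has_dramatis_personae(text):
            hits.append(path.stem)
    return hits
```

As noted above, a hit only means the phrase occurs somewhere in the
file; it does not show that the item is a play.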

Grep for items with more than 200 <SP> tags.
  This would add more than 300 to Martin's list,
  but most of them are not plays (they are disputations,
  etc.), and I have therefore not included them.
  If you want them added, I still have the list
  and can pull them easily enough, or you can.
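
The <SP> count could be approximated as below. This is only a sketch:
it counts opening <SP> tags with a regular expression rather than
parsing the XML, and the function names are made up for illustration.

```python
import re

# Opening tags only: "<SP>" or "<SP ...>". The character class keeps
# "<SPEAKER>" from matching, and "</SP>" never matches because of the slash.
SP_OPEN = re.compile(r"<SP[\s>]")


def count_sp_tags(xml_text):
    """Count opening <SP> tags in one file's text."""
    return len(SP_OPEN.findall(xml_text))


def looks_speech_heavy(xml_text, threshold=200):
    """True if the text has more than `threshold` <SP> tags."""
    return count_sp_tags(xml_text) > threshold
```

A high count flags dialogue-like markup generally, which is why
disputations turn up alongside plays.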

  Could also add other keyword/title searches, e.g.
  "acted by" "acted at" "comedy" "masque" "tragedy".
  I've made a list of those too and could easily add
  them if wanted. But most of them are not
  drama.
  


2. ALCHEMY


Search stage 12 MARC for subject headings
'alchemy' and 'chemistry'

Grep titles for ???

Grep full text for ??? antimony sal armoniac


3. NEWSBOOKS

a. Find Coranto (Dahl) list.

  It's at C:\tcp\EEBO\mdata\dahl
  Extract list of TCP IDs
  Turn it into a Perl script to pull the corresponding lines from EEBOdat
  Remove lines that belong to Phase 1
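
The "pull corresponding lines" step was done with a Perl script; the
same idea in Python might look like the sketch below. The EEBOdat
layout is not described here, so the sketch ASSUMES one record per line
with the TCP ID as the first tab-separated field. Adjust id_field and
sep to match the real file.

```python
def select_lines(lines, wanted_ids, id_field=0, sep="\t"):
    """Keep only the lines whose ID field is in wanted_ids.

    ASSUMPTION: one record per line, fields separated by `sep`, with the
    TCP ID in position `id_field`. The real EEBOdat layout may differ.
    """
    wanted = set(wanted_ids)
    selected = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > id_field and fields[id_field] in wanted:
            selected.append(line)
    return selected
```

Input order is preserved, and IDs with no matching line are simply
skipped without complaint.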
  
b. Find EEBOdat lines for the few Thomason serials

  Search EEBOdat for a few characteristic title phrases
  (newes from, nevves from, true account, wonderful?)
  Remove anything not in Phase 1
  Remove anything longer than 29 images
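
The three filters in b. can be sketched as follows. The records here
are plain dictionaries with keys "title", "phase" and "images"; those
keys are hypothetical stand-ins for whatever EEBOdat fields hold the
title, the phase and the image count.

```python
# Phrases from the list above; "wonderful" was marked as uncertain there.
TITLE_PHRASES = ("newes from", "nevves from", "true account", "wonderful")
MAX_IMAGES = 29


def title_matches(title):
    """True if the title contains any characteristic phrase, ignoring case."""
    lowered = title.lower()
    return any(phrase in lowered for phrase in TITLE_PHRASES)


def filter_candidates(records):
    """Keep Phase 1 records with a matching title and at most 29 images.

    ASSUMPTION: each record is a dict with hypothetical keys
    "title" (str), "phase" (int) and "images" (int).
    """
    kept = []
    for rec in records:
        if not title_matches(rec["title"]):
            continue
        if rec["phase"] != 1:
            continue
        if rec["images"] > MAX_IMAGES:
            continue
        kept.append(rec)
    return kept
```

Substring matching of this kind is what lets in items that are not
really newsbooks.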

c. Combine a and b. This makes a list of merely 178 items,
not all of which are actually newsbooks even in the broadest
sense.



B. DOCUMENTATION.

I have included in this zip file the corresponding extracts
from EEBOdat.


C. GETTING THE ACTUAL FILES.

I have included only three batch files (assuming that
they will be run on Windows -- but they should run
equally well on a Unix box or, for all I know,
a Mac), rather than the files themselves. The batch files
consist of a series of "wget" commands that should
pull the raw P5 XML files from their homes on GitHub.
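
If you want to rebuild or extend a batch file from a list of TCP IDs,
something like the sketch below would do it. The URL template is
supplied by the caller: the actual GitHub address pattern is not given
in this note, so the example.org address used further down is a
placeholder only, and the function names are illustrative.

```python
def wget_lines(tcp_ids, url_template):
    """Build one 'wget <url>' command per TCP ID.

    `url_template` must contain the placeholder {tcp_id}. The caller
    supplies the real address pattern; none is assumed here.
    """
    return ["wget " + url_template.format(tcp_id=tcp_id) for tcp_id in tcp_ids]


def write_batch_file(path, tcp_ids, url_template):
    """Write the wget commands to a batch file, one per line."""
    with open(path, "w", encoding="ascii") as handle:
        for line in wget_lines(tcp_ids, url_template):
            handle.write(line + "\n")
```

For example, wget_lines(["A00001"], "https://example.org/{tcp_id}.xml")
gives the single line "wget https://example.org/A00001.xml" (both the
ID and the address are invented).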

If you prefer to use the TCP XML files, I will have to do
that separately and pull the relevant files from their
homes on the Linux servers here. To do that I'll need to
know not only the TCP ID number but also the delivery
batch date (e.g. 200812 or whatever -- available from the
EEBOdat snippets mentioned under B above).


If you don't have wget on your machine, you should do.
It's free, useful, and requires no installation.

(If it's on your machine but not on the default path,
either change the path or add the full path to each line
in the batch file.)
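
A quick way to check whether wget is on the path, sketched in Python.
The function name and the fallback argument are illustrative only.

```python
import shutil


def find_tool(name="wget", fallback_path=None):
    """Return the location of `name` on the path, or None.

    If the tool is not on the path and `fallback_path` (a full path to
    the executable) is given and usable, that path is returned instead.
    """
    found = shutil.which(name)
    if found:
        return found
    if fallback_path and shutil.which(fallback_path):
        return fallback_path
    return None
```

If find_tool() returns None, wget is neither on the path nor at the
fallback location, and the batch files will fail as they stand.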

Each batch file should be run in whatever folder is
to hold (at least temporarily) the transcription
XML files.

pfs

