Information about the TCP transcriptions will appear first and primarily on the TCP home site, https://textcreationpartnership.org

All of the files created by (or in a few cases received and modified by) the Text Creation Partnership are, as of 1 August 2020, free of all restrictions on use, re-use, modification, or distribution.

Though distribution via Dropbox is not a perfect or permanent solution, it will serve for the moment. You do not require an invitation to download; having the link is sufficient.

Evans bulk files: https://umich.box.com/s/vael0mzdioctraixuglh
ECCO bulk files: https://umich.box.com/s/7dc9b3b0f859a6b36bc2
EEBO bulk files: https://umich.box.com/s/f3mphvepm20akwloqna2

The raw transcripts for EEBO Phases 1 and 2, ECCO, and Evans are all available for bulk download as zipped files for those wishing to do text mining or similar projects. These are substantial file transfers and should not be undertaken casually!

We are in the process of reorganizing these downloadable files, doing away with the old arrangement by date-of-release (since that no longer matters), and rearranging the files strictly by ID number.

Public downloads are available from these Dropbox.com folders:

Evans TCP bulk files: https://www.dropbox.com/sh/inyy253jvytoxcu/AAAT9nMPo0sa5aXEgs5SC5aGa?dl=0
ECCO TCP bulk files: https://www.dropbox.com/sh/inhwjphw682i2gf/AAC8NixNye8Gp0smYBTly2Y9a?dl=0
EEBO Phase 1 and 2 (eventually also Phase 3) bulk files: https://www.dropbox.com/sh/pfx619wnjdck2lj/AAAeQjd_dv29oPymNoKJWfEYa?dl=0

If for some reason these links fail, a backup copy is mirrored on Box.com:

Evans TCP bulk files: https://app.box.com/s/zj7pzfokxde4glrhebxsavbbxzyr3ogz
ECCO TCP bulk files: https://app.box.com/s/6jbuf443i145f97c4t56z3garu3u2j09
EEBO Phase 1 and 2 (eventually also Phase 3) bulk files: https://app.box.com/s/jjzmnrx98dkvanipopz3nxkvymnjccht

Most of the files are available in three forms:

1. TCP (P4) XML.
This, the version that we generally recommend, uses UTF-8 character encoding and TEI bibliographic headers based on MARC catalog records. These files are converted from the SGML source, and their character inventory, converted from SGML SDATA character entities, consists mostly of corresponding single or composite Unicode characters, where those exist and are widely supported by fonts. In a few cases 'lookalike' characters are substituted. And in fewer still the SGML entities are converted instead to text strings within {curly braces}. The intent of this variety of character transformations is to supply a text that will be readily displayable, and therefore human-readable.

The markup is slightly more sophisticated than that in the SGML, in that the character-based kludges "^" "_" and "~" have been removed in favor of actual markup (e.g. a SUP element in place of the "^" character). Though the TCP schema includes customizations, some of which anticipate developments in TEI P5, this is still better thought of as an essentially TEI P4 version, and its semantics should be readily recognizable.

Unlike the SGML files, these XML files contain not only the transcription but also a basic TEI bibliographic header, derived almost entirely from a library catalogue record by a standard transformation of MARC to TEI through a MARCXML intermediary.

As they stand, the XML files invoke an extremely crude CSS stylesheet sufficient to allow them to be displayed, albeit in a rather garish way, in a modern web browser. The CSS is actually that used internally for diagnostic purposes, but should suffice to provide a minimally displayable text. Those wishing for a more sophisticated display are very welcome to do their own styling or transformation.

2. TCP (P3) SGML.
The original SGML files, as produced by the TCP keyers and editors, also remain available.
These files use 7-bit character encoding with named (‘mnemonic’) SDATA character entities and minimal headers (consisting mostly of ID numbers) but are otherwise very similar to TEI P3.

The SGML files also use a few character-based markup kludges:
"^" indicates that the following character is superscripted;
"_" indicates that the following character is a decorated or historiated initial;
"~" indicates that the preceding character is topped by a horizontal stroke of some kind, most commonly a nasal suspension (not, technically, a macron, but usually displayed as such).

Only users of tools that cannot accept multi-byte character encodings, or those who desire utter losslessness, are likely to want to look at this version.

3. P5 TEI XML.
For those who use TEI-based tools or need compatibility with other TEI corpora, we also make available (thanks to Sebastian Rahtz, Lou Burnard, and James Cummings, all at the time at Oxford) an XML version conformant to TEI P5, largely but not completely lossless relative to the underlying SGML. Such few losses as occur in the transformation are the result in part of SGML remnants in the TCP files (e.g. the native TCP files allow multiple values for the lang attribute, whereas the xml:lang attribute does not), and in part of divergence between the TCP and TEI schemas, especially with regard to milestones, the relatively loose treatment of attribute values in TCP, and some differences in the content models of figure, closer, and opener. This version features TEI headers (again, based for the most part on underlying MARC) and UTF-8 character encoding, solving the character issues by heavy use of the TEI <g> element.
The "Oxford P5" version is available for all of the Evans-TCP files, all of the released ECCO-TCP files (but not for a number of unedited files that are available here but were never formally released because of quality issues), all of the EEBO phase 1 files, and most of the EEBO phase 2 files (but not yet for those released by TCP after the creation of the P5 instance). The native home for these files is on GitHub, but copies are provided here for convenience' sake. In the case of Evans, ECCO, and EEBO-1, these are simply 'snapshots' (downloads) of the GitHub texts; in the case of EEBO-2, the files were obtained from the Oxford Text Archive and have not yet been posted to a GitHub repository.

Please let us know (tcp-info@umich.edu) if you have any questions about or problems with the files.

For more about the project, its history, goals, policies, etc., visit https://textcreationpartnership.org/

To search the texts online, visit the TCP site at the University of Michigan Library:

EEBO: https://quod.lib.umich.edu/e/eebogroup/
Evans: https://quod.lib.umich.edu/e/evans/
ECCO: https://quod.lib.umich.edu/e/ecco/

The EEBO texts are also available for searching, for those with access, on the JISC historical books portal (UK only) and the ProQuest EEBO site (subscribers only), https://www.proquest.com/eebo

All of the TCP texts are available for searching and download at the Oxford Text Archive ( https://ota.bodleian.ox.ac.uk/repository/xmlui/ ), with the Oxford-generated P5 version as basis.

File names and conventions

The marked-up transcriptions use the TCP ID number as the basis for their filenames. Those in SGML have a simple .sgm extension. The P4 version interposes a 'p4' (*.p4.xml), and the P5 version a 'p5' (*.p5.xml). They are placed in folders based on the beginning of the filename, e.g. "N12345.sgm" will be in a folder named "N1".

All files conform to the schemas referenced at their head.
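
The naming and folder convention described above can be sketched in a few lines of Python. This is only an illustration: the function name tcp_relative_path is hypothetical, and the two-character folder prefix is an assumption inferred from the single "N12345.sgm" in "N1" example, so it may not hold for every collection.

```python
from pathlib import PurePosixPath

# File extensions for each markup flavor, as described in this README.
EXTENSIONS = {"sgml": ".sgm", "p4": ".p4.xml", "p5": ".p5.xml"}


def tcp_relative_path(tcp_id: str, flavor: str = "p4", prefix_len: int = 2) -> PurePosixPath:
    """Build the expected folder/filename for a TCP ID.

    prefix_len=2 is an assumption based on the README's one example
    ("N12345.sgm" lives in folder "N1"); adjust it if a download differs.
    """
    if flavor not in EXTENSIONS:
        raise ValueError(f"unknown flavor {flavor!r}; expected one of {sorted(EXTENSIONS)}")
    if len(tcp_id) < prefix_len:
        raise ValueError(f"TCP ID {tcp_id!r} is shorter than the folder prefix")
    folder = tcp_id[:prefix_len]
    return PurePosixPath(folder) / f"{tcp_id}{EXTENSIONS[flavor]}"


print(tcp_relative_path("N12345", "sgml"))  # N1/N12345.sgm
print(tcp_relative_path("N12345", "p5"))    # N1/N12345.p5.xml
```

Joining the result onto the directory where a bulk archive was unzipped gives a candidate location for a given text; checking that the file actually exists is still advisable, since the archives are being reorganized.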
Files using TCP markup may be validated against either (SGML) eebo2prf.dtd or (XML) eebo2prf.xml.dtd. The SGML character entities are listed in charmap.ent. The map between SGML character entities and their targets in the XML derivative is contained in the file called 'charmap.sgm'. More human-readable versions of this can be found in either charmap.htm or TCP_chars.html (the latter in alphabetical order by entity). A very crude stylesheet allowing intelligible display of the P4 versions in a browser can be found in pfs.css, though it is ugly.

Files with *dat* in their filenames contain SGML-encoded metadata concerning the files, usually basic title information, ID numbers, and processing steps and dates, with one line per text (i.e., they are 'newline-delimited' files). Examples include 'eebodat?.sgm', 'evnsdat?.sgm' and 'eccodat?.sgm'. These cover not only the texts that were transcribed but all the texts from which selection was made. Subsets containing only 'done' or 'partly done' texts are also occasionally posted, e.g. 'evnsdat_unfinished.sgm', with names that are perhaps self-explanatory. The fields and format of these files are not always obvious. Anyone wishing to use them is welcome to ask for details.

Any files with an *.mtxt extension contain MARC records, usually modified for TCP purposes, in MARCbreaker text format.

Finally, a few derivative lists and hashes may be posted to the repositories occasionally, usually at user request.
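
For those who want to read the *.mtxt files programmatically, the following Python sketch shows one way to split MARCbreaker-style text into records. It assumes the usual mnemonic layout (each field on a line beginning with "=", a three-character tag, two spaces, then the data; records separated by a blank line). Because the TCP records are "usually modified for TCP purposes", that layout is an assumption to check against the real files. The function names and the sample record are hypothetical and are not taken from any TCP file.

```python
from typing import Dict, Iterator, List, Tuple


def parse_marcbreaker(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per record, mapping a tag (e.g. '245') to its raw field strings."""
    record: Dict[str, List[str]] = {}
    last_tag = None
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            # Assumed record separator: a blank line.
            if record:
                yield record
            record, last_tag = {}, None
            continue
        if line.startswith("="):
            last_tag = line[1:4]
            value = line[4:]
            if value.startswith("  "):
                value = value[2:]  # drop the two-space separator after the tag
            record.setdefault(last_tag, []).append(value)
        elif last_tag is not None:
            # Treat any other non-blank line as a continuation of the previous field.
            record[last_tag][-1] += " " + line.strip()
    if record:
        yield record


def subfields(field: str) -> List[Tuple[str, str]]:
    """Split a data field such as '10$aTitle :$brest' into (code, value) pairs."""
    return [(part[0], part[1:]) for part in field.split("$")[1:] if part]


# Hypothetical sample, invented for illustration only.
SAMPLE = """=LDR  00000nam a2200000 a 4500
=001  N12345
=245  10$aAn example title :$ba hypothetical record.

=001  N12346
=245  10$aAnother example.
"""

records = list(parse_marcbreaker(SAMPLE))
print(len(records))
print(records[0]["001"])
print(subfields(records[0]["245"][0]))
```

Keeping the raw field strings and splitting subfields only on demand avoids losing information if the TCP modifications introduce fields that do not follow the standard pattern.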