286 Commits

Author SHA1 Message Date
Simon Kornblith
a5ff752509 -add blackwell synergy translator
-add DOI support to RIS translator
2006-12-16 04:41:08 +00:00
Simon Kornblith
62255639ee - added Ovid translator
- fixed bug with scraping multiple items from within scaffold
2006-12-16 03:29:55 +00:00
Simon Kornblith
17d7b9fe88 - support browse mode in ScienceDirect 2006-12-15 23:52:33 +00:00
Simon Kornblith
baa21e1a6f - added ScienceDirect translator
- fixed a bug in scaffold involving debug logging
2006-12-15 22:46:27 +00:00
Simon Kornblith
875ceea852 closes #449, use library domain in repository field 2006-12-15 20:25:25 +00:00
Simon Kornblith
9192478dd8 addresses #427, add abstract support to translators
adds abstract support to JSTOR and ProQuest translators
adds abstract support to RIS translator
uses dcterms "abstract" property in Zotero RDF for abstract export
2006-12-15 19:47:21 +00:00
Simon Kornblith
3982e1aabf - changed Zotero.Utilities.debug() to Zotero.debug(), for consistency and for the upcoming in-browser translator development tool
- various other preparations
2006-12-15 08:54:31 +00:00
Simon Kornblith
91492910ad addresses #427, Add abstract support to translators.
-translators may now attach abstracts simply by assigning a value to item.abstractNote
-added abstract support to PubMed translator
2006-12-14 22:58:00 +00:00
Simon Kornblith
31535b6d1d oops, need to commit this too. 2006-12-13 05:57:02 +00:00
Simon Kornblith
c6d4cdd57b - closes #391, Second export to same location with attached files fails
- removes extraneous debug code from Zotero RDF export
2006-12-13 05:23:39 +00:00
Simon Kornblith
857f0a907c closes #369, scrapers should store Repository field. the label is automatically used as the repository field, unless a translator explicitly sets the item's repository property to a value. if a translator sets the item's repository property to "false," no value is stored. 2006-12-13 05:05:03 +00:00
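The fallback rule described in 857f0a907c can be sketched roughly as follows. This is only an illustration: `resolveRepository` and its parameters are hypothetical names rather than Zotero functions, and the sketch assumes the opt-out value is the boolean `false` rather than the string "false".

```javascript
// Hypothetical helper illustrating the repository fallback rule.
// Neither the function name nor its signature comes from the Zotero source.
function resolveRepository(translatorLabel, item) {
  // Assumption: the opt-out is the boolean false, not the string "false".
  if (item.repository === false) {
    return undefined; // nothing is stored
  }
  // An explicit value set by the translator wins over the label.
  if (typeof item.repository === "string" && item.repository !== "") {
    return item.repository;
  }
  // Otherwise the translator's label is used automatically.
  return translatorLabel;
}
```

Under these assumptions, an item that never touches `repository` ends up with the translator's label, while `item.repository = false` suppresses the field entirely.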
Simon Kornblith
6c2c33fc6d - closes #391, second export to same location with attached files fails (I think)
- improves RDF error handling
2006-12-13 03:37:58 +00:00
Simon Kornblith
986fea0b03 -closes #365, update import/export/bibliography to handle new item types. any fields I couldn't find in an existing RDF ontology use the Zotero namespace. we still have to decide if primary creators should be mapped to "author" in the RDF, and translated back out later (currently they aren't).
-adds Zotero.Utilities.getCreatorsForType to Zotero utilities
-makes CiteBase search translator error more gracefully
2006-12-13 03:18:57 +00:00
Simon Kornblith
6e84f20de3 closes #428, line endings missing in imported RIS notes 2006-12-12 17:05:29 +00:00
Simon Kornblith
c5ec016ed9 - closes #327, scrapers should either take snapshots or use URL field
- closes #351, scrapers with PDF downloads should use downloadAssociatedFiles instead of automaticSnapshots

there are some problems with snapshot titles. see bug #436.
2006-12-12 00:28:49 +00:00
Simon Kornblith
0c2ee5d449 closes #406, Incorrect "et al" handling for APA style 2006-12-11 20:59:40 +00:00
Simon Kornblith
6c80c879da - closes #407, error in EBSCOhost translator
- closes #430, Amazon translator causing utilities.js to throw exception
- officially deprecated Zotero.Utilities.getNodeString() (use doc.evaluate and nodeValue or textContent instead, or access attributes directly; these options take nearly the same amount of code, should be faster, and don't unnecessarily bloat our utilities)
- updated word integration to the latest version
2006-12-11 20:54:22 +00:00
Dan Stillman
58466ef656 BibTeX patch from Patrick Wagstrom on dev list
His note:

1. adds the conference paper item type (currently only exported to BibTeX as inproceedings)
2. Fixes bug with editor names in BibTeX export
3. Provides more intelligent naming for entities in BibTeX exports.  Previously items would be named something like Wagstrom2006, Wagstrom2006-1, etc.  However, I noticed that this ordering could get changed around pretty easily in the export process, resulting in bad references in articles.  We can't really be having that now can we?  The keys now take the first word of the title, stripping out a few common words.  For example, if I had a paper called "Zotero's impact on time to author scholarly papers", it would have a key of "wagstrom_zotero_2006", which is much more constant.


There was still an editor field bug after Patrick's patch that I corrected, and author and editor fields seem to be handled properly now.


Also addresses #384, option to prevent escaping of curly brackets in BibTeX output

I believe this patch actually now prevents escaping of curly braces by default, however (according to Simon) it should still be based on a pref or option of some kind
2006-12-09 23:09:14 +00:00
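The key-naming scheme from point 3 of the patch note might look something like the sketch below. The function name, the stop-word list, and the choice to split the title on non-alphanumeric characters are all assumptions made for illustration; the patch itself is not shown in this log. The example title is written with an apostrophe ("Zotero's") so that the first word comes out as "zotero".

```javascript
// Illustrative only: the stop-word list and function name are assumptions,
// not taken from the actual BibTeX translator.
const COMMON_WORDS = new Set(["a", "an", "the", "on", "of", "in", "and", "to"]);

function bibtexKeySketch(lastName, title, year) {
  // Split the lowercased title on anything that is not a letter or digit.
  const words = title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  // Take the first word that is not in the common-word list.
  const firstWord = words.find((w) => !COMMON_WORDS.has(w)) || "";
  // Join the non-empty pieces as author_word_year.
  return [lastName.toLowerCase(), firstWord, String(year)]
    .filter(Boolean)
    .join("_");
}

const exampleKey = bibtexKeySketch(
  "Wagstrom",
  "Zotero's impact on time to author scholarly papers",
  2006
);
console.log(exampleKey); // wagstrom_zotero_2006
```

Because the key depends only on the author, title, and year, it does not change when the export order changes, which is the stability the note is after.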
Sean Takats
507efb4758 Replaces SIRSI -2003 and SIRSI 2003+ translators with single SIRSI translator. Handles WebCat, iLink, and iBistro interfaces. Ready for Emory test XPI but needs some refinement to handle other library view preferences as noted in #381 2006-12-07 04:22:57 +00:00
Sean Takats
e0d955afba closes #373 page numbers not captured in PubMed/HubMed 2006-11-29 16:42:09 +00:00
Dan Stillman
9a7c18ed5e Had to repush some scrapers, since apparently phpMyAdmin on the server replaces \n with \r\n on edits, pushing some over the b2.r2 limit 2006-11-27 22:58:36 +00:00
Sean Takats
e391844acd closes #387 year in date field is truncated. 2006-11-27 14:15:27 +00:00
Dan Stillman
f545e6a884 Setting minVersion for Google Scholar and Embedded RDF to 1.0.0b3.r1 2006-11-26 23:53:47 +00:00
Dan Stillman
24ae82b07f Aleph/arXiv/CrossRef/CiteBase pushed to repo 2006-11-26 23:50:58 +00:00
Dan Stillman
361a1e4bc6 Add minVersion/maxVersion to translators schema and schema update mechanisms (local and remote) -- these aren't really necessary on the client but let us use the same SQL to update the repo, and we probably should include them in error reports (instead of relying on different timestamps to differentiate versions)
Added minVersion and maxVersion times to existing scrapers, setting 1.0.0b3.r1 as minVersion for any >4096 characters; these could theoretically now be added back to the repository without problems, but there's not really much reason to test that theory at the moment
2006-11-26 09:19:07 +00:00
Dan Stillman
c8cecf4b7e Pushed updated NYT and Google Books translators to repo
Refs #409, Google Books translator broken after site update
Refs #380, Archived New York Times articles accessed via TimesSelect aren't detected
2006-11-25 19:59:45 +00:00
Sean Takats
88d8f19ece closes #409, google books translator broken after site update 2006-11-25 19:22:33 +00:00
Sean Takats
fc2be5bf21 closes #380 by updating translator regex to run against select.times.com. note that the example article in #380 still will not display the zotero icon or scrape, since that article does not contain the standard meta tags that we use to scrape nytimes articles. other timesselect content now does scrape, however. 2006-11-25 04:07:50 +00:00
Simon Kornblith
38531da9fa closes #396, accents are lost when scraping multiple items (with InnoPAC) 2006-11-25 03:41:13 +00:00
Simon Kornblith
5caf0d2803 made arXiv/eprintweb translator work with lists of recent articles, etc. 2006-11-25 03:16:33 +00:00
Simon Kornblith
e201c3b580 made arXiv translator work with eprintweb as well 2006-11-25 02:53:38 +00:00
Simon Kornblith
05b3cd8566 - added arXiv.org translator
- added CiteBase OpenURL search translator (although CiteBase COinS still won't work, because you can't look most of them up with the CiteBase resolver; ugh)
- fixed Amazon translator type ID (12 -> 4)
2006-11-25 02:13:17 +00:00
Simon Kornblith
94302bbe1c closes #403, Aleph translator not working
i modified the XPath the Aleph translator uses to something that should work in nearly every case.
2006-11-25 00:01:24 +00:00
Sean Takats
6ff2168729 Amazon scraper now supports international Amazon sites and retrieves data from Amazon's API 2006-11-21 21:56:13 +00:00
Simon Kornblith
445ff98277 - made doGet handle multiple urls, with processor/done style interface (as in processDocuments). this should be backwards compatible
- beginnings of mapping for new item types
- fixes for Word integration (because i was using it to write a paper)
2006-11-21 07:14:27 +00:00
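A processor/done style interface over several URLs, as mentioned in 445ff98277, could be structured roughly like this. The sketch is not the actual doGet implementation: `doGetSketch` and the injected `fetchText(url, callback)` function are hypothetical stand-ins so the example runs without a browser.

```javascript
// Sketch of a processor/done interface over one or more URLs.
// fetchText is a hypothetical stand-in for the real HTTP layer:
// it takes (url, callback) and calls callback(responseText).
function doGetSketch(urls, processor, done, fetchText) {
  // Accept a single URL string as well as an array, so older
  // single-URL callers keep working.
  const queue = Array.isArray(urls) ? urls.slice() : [urls];

  function next() {
    if (queue.length === 0) {
      // All URLs handled: signal completion once.
      if (done) done();
      return;
    }
    const url = queue.shift();
    fetchText(url, (text) => {
      processor(text, url); // called once per URL, in order
      next();
    });
  }

  next();
}
```

Accepting either a string or an array is one way to keep the call backwards compatible; the real code may handle this differently.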
Simon Kornblith
a1269146b7 - fixed XML issues with PubMed scraper (although probably not the issue that everyone seems to be experiencing)
- unfinished support for new item types
2006-11-02 00:33:50 +00:00
Dan Stillman
7a3be3e306 Updating SIRSI scraper to last time from repo
(The current repo system is a bit flawed in that translators need to be inserted with CURRENT_TIMESTAMP but scrapers.sql can't be, so scrapers.sql needs to be updated with the repo timestamp after the fact to prevent new installs from unnecessarily grabbing the changed scrapers (or they need to be post-dated to a timestamp after the UTC time of their repository insert but preferably not by more than 24 hours). Suffice it to say, we'll have a more automated solution for this in the future.)
2006-10-25 19:07:11 +00:00
Sean Takats
48659542d3 Updated SIRSI translator to handle author field (not just personal author). 2006-10-25 17:53:17 +00:00
Simon Kornblith
666831748e closes #358, APA style doesn't properly handle references with editors and no authors
closes #348, OpenURL should use only relevant parts of dates
closes #354, Error saving History Cooperative article
closes #356, Embedded Dublin Core scraper incorrectly saves web pages as item type "book"
closes #355, PubMed translator problem
closes #368, RIS/Endnote export hijack doesn't go into active collection
fixes an issue with quotation marks in bibliographies exported as RTF
fixes an issue with bibliographies and non-English locales
2006-10-23 07:34:34 +00:00
Dan Stillman
fab65f743c Eek--bump the scraper version after clearing the tables for upgraders 2006-10-06 15:26:04 +00:00
Dan Stillman
7712a24434 Moved translators and CSL CREATE TABLE statements to userdata.sql, since those are the two tables that we actually _want_ users to modify (without them being wiped on every update) 2006-10-05 23:50:29 +00:00
Dan Stillman
73149b86c7 Add ECL license block to scrapers.sql 2006-10-05 17:29:03 +00:00
Simon Kornblith
cbe7c086e1 closes #336, Some metadata fields are not exported with notes and attachments
closes #165, verify import/export can carry all data for all fields and item types
closes #168, make sure MODS import works with files from external sources
2006-10-05 08:45:44 +00:00
Dan Stillman
cd26267afe Closes #340, Change isInstitution to fieldMode everywhere
Including in the DB, which it turns out isn't really all that bad (thanks, among other things, to SQLite's ability to DROP tables within transactions without autocommitting (which MySQL can't do))
2006-10-05 00:59:26 +00:00
Simon Kornblith
92620afa52 fix a couple of rather inconsequential small bugs 2006-10-04 00:31:29 +00:00
Simon Kornblith
ac50ab16a2 Scholar -> Zotero (thanks Dan S.) 2006-10-04 00:10:35 +00:00
Simon Kornblith
56e77619c4 closes #334, Washington Post scraper shouldn't include " - washingtonpost.com" in title
closes #313, Blacklist known ad sites from scraper detection
closes #306, some New York Times ads prevent page from being recognized
closes #308, attachment import bug

currently, the ad site blacklist is located at the top of ingester/browser.js. at some point, we may want to switch this to a database table.
2006-10-03 22:13:49 +00:00
Simon Kornblith
96ccf85aba - improve CSL
- tag institutional authors appropriately
2006-10-03 21:08:02 +00:00
Dan Stillman
1cd51be497 Sorry, it was now or never, and now is better:
Changed "Scholar" to "Zotero", everywhere

Apologies to anyone with working copy changes, but there are probably fewer at this moment than there will be again.

Hopefully this won't break anything, though existing prefs will be lost. I avoided scholar.google.com--if you know any other legitimate "scholar"s in the code, be sure to fix them once I'm done here.

This is a multi-commit change--there's at least one more coming. *Do not update to this version! It won't work!*
2006-10-02 23:15:27 +00:00
Dan Stillman
eccc2159c1 Oops--CSL table needs to be defined in scrapers.sql too.
(The problem with the current system is that any local translators or styles will be wiped out on upgrades (though not auto-updates), but the solution for that is probably to just offer an SQL file that the user can put custom SQL statements in to be run on upgrades (sorta the same idea as user.js in Firefox). Will deal with that at a later date, though.)
2006-10-02 21:25:47 +00:00
Dan Stillman
508b35f6d1 1) By "Scrapers don't save metadata properly" in my last commit, I meant only URL and accessDate, though on second thought they probably will work.
1b) However, I also did, in fact, break scraping completely, so my previous statement was actually correct. Fix for that coming right up.

2) Fixed problem with translators table getting wiped out completely whenever system.sql was updated (from r671, I believe). Right. Moved the DROP and CREATE statements for translators into translators.sql.
2006-10-02 01:07:56 +00:00
Dan Stillman
b684e97366 Closes #252, Metadata not displaying for page snapshots
Closes #304, change references to "website" to "web page"

More changes as per discussions with Dan:

- Linked URLs have been given a second chance at life, though they still shouldn't be used for (most, if any) scrapers (which should use snapshots or the URL field instead)
- Renamed the "website" item type to "webpage"
- Removed "web page" from the New Item menu
- Added Save Link To Current Page toolbar button
- Added toolbar separator between New Item buttons and link/attachment/note to differentiate
- Added limited metadata (URL and accessDate) for attachments
- URL for attachments now stored in itemData (itemAttachments.originalPath is no longer used, but I'm probably not gonna worry about it and just wait for SQLite to support dropping columns with ALTER TABLE) -- getURL() removed in favor of getField('url')
- Snapshots now say "View Snapshot"
- Added Show File button to file attachments to show in filesystem
- Added timed note field to attachments for single notes and adjusted Item.updateNote(), etc. to work with attachments
- Fixed bug with manually bound params in fulltext indexer and Item.save() (execute() vs. executeStep()) -- any recently added items probably aren't in the fulltext index because of this


Known bugs/issues:

- Attachment metadata and notes probably aren't properly imported/exported now (and accessDate definitely isn't)
- Scrapers don't save metadata properly
- Attachment title should be editable
- File attachments could probably use some more metadata (#275, more or less, though they won't be getting tabs)
2006-10-02 00:00:50 +00:00
Simon Kornblith
7c3e054ebc addresses #301, COinS bugs/enhancements; remaining issue blocked by #3 (add as many item types as possible) 2006-09-11 22:34:39 +00:00
Simon Kornblith
3dfca25879 - closes #277, disambiguation and notifier updates for Word integration
- closes #217, ability to exclude notes/attachments from select items window
- closes #244, ability to quick search from select items window
- fixes a bug with footnotes in Word integration
- fixes a bug in InnoPAC translator where items would sometimes appear twice
2006-09-10 17:38:17 +00:00
Simon Kornblith
d5bc6cbe4b - fixes a bug in capitalizeTitle
- better feedback for search translator errors
2006-09-09 22:45:03 +00:00
Simon Kornblith
14c5c40a50 - closes #279, Refer/EndNote translator
- fixes a bug in text handling that was previously masked by another
2006-09-09 22:00:04 +00:00
Simon Kornblith
67f6ae3ed2 - closes #69, notification system for broken scrapers
- don't put "Page" before page in WaPo scraper
2006-09-09 19:47:47 +00:00
Simon Kornblith
d4576d3d55 addresses #69, notification system for broken scrapers
thanks to Dan for his help on the repository side of things
2006-09-09 00:12:09 +00:00
Simon Kornblith
60422e032e - closes #261, work around content-disposition: attachment on endnote links. this workaround is far from the most elegant, but it seemed nicer than writing a stream converter component that didn't really convert streams
- fixes bugs in RIS import
2006-09-08 22:26:59 +00:00
Simon Kornblith
7b7d3d85e3 - added Washington Post translator
- translation works properly even when a user has switched to a different page
2006-09-08 05:47:47 +00:00
Simon Kornblith
b8ddba3a67 CiteSeer translator 2006-09-08 01:59:22 +00:00
Simon Kornblith
5028880d38 closes #280, BibTeX translator
- fixes date bugs
- fixes (again) an issue that would cause the "unresponsive script" dialog to appear when importing or exporting
2006-09-07 22:10:26 +00:00
Simon Kornblith
cf8dc232b1 - new translators: New York Review of Books, Chronicle of Higher Education
- more useful errors in utilities
- fixes minor bugs in citation styling
2006-09-07 01:23:13 +00:00
Simon Kornblith
89cf0c7235 closes #276, fix RIS bugs
- import translators no longer fail when trying to import an item with no name
- the T2/BT field becomes the publication title when no JO/JF field is available (fixes newspaper issues)
- Y2 is now treated as part of the date if and only if it is improperly formatted (seriously, why can't Thomson get their own specs straight?)
- work around EndNote's strange behavior of putting article titles into notes for no apparent reason
- RIS export gives dates as per specification
- fixed a bug that could have (potentially) caused problems formatting "January"
- allow translators to access strToDate function
2006-09-06 04:45:19 +00:00
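The T2/BT fallback described in 89cf0c7235 amounts to a simple precedence rule. A minimal sketch, assuming the RIS record has already been parsed into a plain object keyed by tag; the function name and the relative order within each pair (JO before JF, T2 before BT) are assumptions, not taken from the translator:

```javascript
// fields is a plain object keyed by RIS tag, e.g. { T2: "The Washington Post" }.
// The order of JO vs. JF and of T2 vs. BT is an assumption of this sketch.
function publicationTitleSketch(fields) {
  // Prefer the journal tags; fall back to T2/BT only when both are missing.
  return fields.JO || fields.JF || fields.T2 || fields.BT || undefined;
}
```

For a newspaper record that carries only a T2 field, this returns the T2 value, which matches the newspaper case the commit mentions.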
Simon Kornblith
b3bb6b9013 remove unnecessary debug code 2006-09-05 07:59:25 +00:00
Simon Kornblith
045780d9ac closes #250, figure out proper text encodings for import/export
MODS uses the encoding as specified in the <?xml tag, or else UTF-8
RIS uses IBM850, since the spec says "IBM Extended Character Set" and it's the only code page Mozilla supports. (should I do this? or just use unicode?)
MARC uses UTF-8, since I don't think there's any way to get full MARC-8 support, and UTF-8 is now the preferred encoding anyway
2006-09-05 07:51:55 +00:00
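The MODS rule in 045780d9ac (use the encoding named in the `<?xml` declaration, otherwise UTF-8) can be illustrated with a small detection function. This is a sketch under the assumption that the start of the file is available as a string; `modsEncodingSketch` is a hypothetical name and the regular expression is a simplification of the XML declaration grammar.

```javascript
// Returns the encoding named in the XML declaration, or "UTF-8" if none.
// head is assumed to be the first line (or first few hundred characters)
// of the file, already read as text.
function modsEncodingSketch(head) {
  const match = head.match(/^<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/);
  return match ? match[1] : "UTF-8";
}

console.log(modsEncodingSketch('<?xml version="1.0" encoding="ISO-8859-1"?>')); // ISO-8859-1
console.log(modsEncodingSketch('<?xml version="1.0"?>')); // UTF-8
```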
Simon Kornblith
cec35d7566 closes #272, problems with Library of Congress ingest 2006-09-05 03:06:22 +00:00
Simon Kornblith
e0f6f023d8 various fixes to citation formatting (mostly Chicago Manual of Style) 2006-09-05 01:09:04 +00:00
Simon Kornblith
7d93903e2d closes #239, fix embedded RDF translator 2006-09-04 21:43:23 +00:00
Simon Kornblith
370fe48388 - remove extraneous debug code
- update scrapers.sql version (do not put into the repository)
2006-09-04 20:21:38 +00:00
Simon Kornblith
aa6e2cfab1 closes #264, UMich lib catalog doesn't work on Windows; other issues related to Mirlyn
positions "saving item" window in a slightly better place on Windows

the UMich bug was actually bigger than I thought. as it turns out, the HiddenDOMWindow in Windows is not a chrome window, so i had to modify createHiddenBrowser() to attach the hidden browser object to an existing browser window. i don't believe this should have any adverse effects for snapshots, etc., but Dan, correct me if i'm wrong. it would be nice to be able to create a real chrome instance instead of a XUL element, but all of my attempts at doing so have failed.
2006-09-04 20:19:38 +00:00
Simon Kornblith
2b0bebe7a4 closes #258, MARC translator should capitalize titles 2006-09-04 18:16:50 +00:00
Simon Kornblith
e5404f4938 closes #269, For some COinS pages "could not save item" error 2006-09-04 17:37:07 +00:00
Simon Kornblith
0ab9e8b36c references #268, occasional problems with ingest of pages with multiple references
i've fixed the Amazon.com bug (i think) and made the translator show a "Could Not Save Item" prompt rather than show an empty list, but if you see any other pages where this happens, let me know
2006-09-04 17:09:44 +00:00
Simon Kornblith
ed6650c4e7 closes #218, Windows support for Word integration. this solution seems to work with both Word 2003 and Word 2007. i have not tested with earlier versions. Zotero.dot is the Windows version; Zotero.dot.dmg is the Mac version. the only difference is the function call used to perform SOAP requests.
to get this to work right, you'll need the SOAP toolkit from http://www.microsoft.com/downloads/details.aspx?FamilyID=ba611554-5943-444c-b53c-c0a450b7013c&DisplayLang=en
I may replace the SOAP object with a simple XMLHTTP object, since that page says that the SOAP toolkit is deprecated.
2006-09-04 08:06:04 +00:00
Simon Kornblith
10f4b28c63 closes #214, add footnote support to word integration
closes #215, allow user to select desired citation style and change citation styles on the fly
2006-09-04 04:13:12 +00:00
Simon Kornblith
59a1628e5b fixes #254, NY Times scraper fails (thanks Sean) 2006-09-01 02:45:31 +00:00
Simon Kornblith
6f885c9cb0 make Amazon.com translator work on book pages linked from other book pages 2006-08-31 22:36:05 +00:00
Simon Kornblith
438ff82955 - replace storage streams with plain old strings for translate IO. there's not much of a reason to use storage streams now, and it was screwing up non-ASCII characters.
- make EBSCO scraper work better through a proxy
- shorten Accession Number -> Accession No, Journal Abbreviation -> Journal Abbr, Publication Title -> Publication. it does look a bit stranger, but it also makes the interface more functional (especially for those of us without giant widescreen LCDs ;-)
2006-08-31 07:45:03 +00:00
Simon Kornblith
4b756d700b fixed an issue that could prevent MARC fields below 100 (ISBN and call number) from appearing in records 2006-08-31 05:21:41 +00:00
Simon Kornblith
1c8e3fcb02 closes #239, fix embedded RDF translator
modifies scrapers to use dates in the format that comes out of the page, rather than converting to SQL
adds Scholar.Date.formatDate() to provide a pretty representation of dates
2006-08-31 00:04:11 +00:00
Simon Kornblith
0cd3021cf3 closes #241, improved date handling
- Scholar.strToDate() accepts a string date and returns an object containing year, month, day, and part
- capture access date whenever URL is captured
- updated Zotero.dot to use new namespaces
2006-08-30 21:56:52 +00:00
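The shape of the object returned by Scholar.strToDate() (year, month, day, and part) can be illustrated with a deliberately small parser. This is not the real implementation: `strToDateSketch` is a hypothetical name, it only handles dates with a spelled-out English month, the zero-based month index is an assumption, and "part" is taken to mean whatever text is left over.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];
const MONTH_RE = new RegExp("\\b(" + MONTHS.join("|") + ")[a-z]*\\b", "i");

// Simplified sketch: numeric formats such as 2006-08-30 are NOT handled.
function strToDateSketch(str) {
  const date = {};
  let rest = str;

  // A four-digit number is taken as the year.
  const year = rest.match(/\b\d{4}\b/);
  if (year) {
    date.year = Number(year[0]);
    rest = rest.replace(year[0], "");
  }

  // A word starting with a month abbreviation is taken as the month.
  const month = rest.match(MONTH_RE);
  if (month) {
    date.month = MONTHS.indexOf(month[1].toLowerCase()); // zero-based (assumption)
    rest = rest.replace(month[0], "");
  }

  // A remaining one- or two-digit number is taken as the day.
  const day = rest.match(/\b\d{1,2}\b/);
  if (day) {
    date.day = Number(day[0]);
    rest = rest.replace(day[0], "");
  }

  // Anything left over (for example "Spring") is kept as the part.
  const part = rest.replace(/[\s,.]+/g, " ").trim();
  if (part) date.part = part;

  return date;
}
```

With this sketch, "September 5, 2006" yields a year, month, and day with no part, while "Spring 2006" yields a year plus the part "Spring".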
Simon Kornblith
27617ee152 closes #236, Export windows should offer a default filename with extension
closes #238, present dialog if no import translator is available for a file
closes #240, change XML namespaces
2006-08-30 19:57:23 +00:00
Simon Kornblith
a75c5df70c - add MLA style
- put text-only version of bibliography on the clipboard, in addition to HTML version (Windows-only).
2006-08-30 06:12:26 +00:00
Simon Kornblith
1c21bddbfc - modifications to citation engine's handling of localized strings
- added the missing integrationDocPrefs.xul file
2006-08-30 04:00:19 +00:00
Simon Kornblith
68c480b7b5 - closes #232, University of Michigan library site does not work
- improved handling of scraper errors (hopefully, the hanging should be gone)
2006-08-30 01:41:51 +00:00
Simon Kornblith
d8171f775c closes #223, citing the same item multiple times should produce only one bibliography entry 2006-08-29 17:29:35 +00:00
Simon Kornblith
0c24beee3f closes #213, add in-text citation support to citation engine
fixes date and et al. handling bugs in citation engine
permits citation of multiple items in Word integration
2006-08-29 04:24:11 +00:00
Simon Kornblith
7385ba4df2 ABC-CLIO fixes 2006-08-26 21:57:02 +00:00
Simon Kornblith
d3fc9866b9 - add ABC-CLIO (America: History and Life) translator
- fix a potential issue with COinS support
2006-08-26 21:36:49 +00:00
Simon Kornblith
a457cdb493 added New York Times translator 2006-08-26 07:27:02 +00:00
Simon Kornblith
ddb754839c make google scholar translator turn on export automatically 2006-08-26 06:04:29 +00:00
Simon Kornblith
72da6e412e add Google Scholar translator 2006-08-26 05:51:41 +00:00
Simon Kornblith
53aae7751c support FirstSearch databases besides WorldCat 2006-08-26 04:59:30 +00:00
Simon Kornblith
f07cb5a5bc adds an InfoTrac OneFile translator
fixes a bug in ingester progress window handling
2006-08-26 03:50:15 +00:00
Simon Kornblith
0e63958f96 - make proquest work better behind proxies
- improved frame support
2006-08-24 18:00:48 +00:00
Simon Kornblith
04d05548b2 closes #103, figure out how to store captured pages in native export format
fixes ampersands in citation COinS
fixes tags and seeAlso in import/export (should now work for all items)
2006-08-20 04:35:04 +00:00
Simon Kornblith
a55b035761 update scrapers.sql version (oops) 2006-08-19 23:15:38 +00:00
Simon Kornblith
94bd2415da adds short roles to CSL (Ed. instead of Editor)
adds COinS to exported HTML
uses real lists in HTML output
fixes other small citation style issues
2006-08-19 23:14:27 +00:00
Simon Kornblith
26668a6e73 closes #194, EBSCO translator
closes #160, cache regular expressions
closes #188, rewrite MARC handling functions

MARC-based translators should now produce item types besides "book." right now, artwork, film, and manuscript are available. MARC also has codes for various types of audio (speech, music, etc.) and maps.
the EBSCO translator does not yet produce attachments. i sent them an email because their RIS export is invalid (the URLs come after the "end of record" field) and i'm waiting to see if they'll fix it before i try to fix it myself.
the EBSCO translator is unfortunately a bit slow, because it has to make 5 requests in order to get RIS export. the alternative (scraping individual item pages) would be even slower.
regular expression caching can be turned off by disabling extensions.scholar.cacheTranslatorData in about:config. if you leave it on, you'll have to restart Firefox after updating translators.
2006-08-19 18:58:09 +00:00
Simon Kornblith
20486d5053 addresses #103, figure out how to store captured pages in native export format
import/export of file data should work for all file types _except_ snapshots (in this situation, export is working, but import is not yet complete; see #193)
also, fixes a potential security issue that could have allowed malicious web translators to post local data to remote sites (although, given we maintain the central repository and there's no easy way to install a translator, the risk would have been minimal to begin with).
2006-08-18 05:58:14 +00:00
Simon Kornblith
10ba568ee8 closes #39, auto-ingest of associated files (as recognizable)
closes #3, Overflow metadata dumps into "extra" field

add "extra" data where such data is useful and conveniently accessible (not available for XML-based export or MARC formats yet)
add links to permanent URLs
download associated files from full text sources (if extensions.scholar.downloadAssociatedFiles preference is enabled)
fix WorldCat translator
improve InnoPAC translator (it now works on Georgetown search results pages, albeit slowly, because it must first realize the catalog is misconfigured)
tag items from SIRSI and WorldCat
return to putting the full lengths of books into "pages," because some citation styles require it
fix COinS (broken a few revisions ago)
2006-08-17 07:56:01 +00:00
Simon Kornblith
51108446e3 closes #187, make berkeley's library work
closes #186, stop translators from hanging

when a document loads inside a frameset, we now check whether we can scrape each individual frame.
all functions involving tabs have been vastly simplified, because in the process of figuring this out, i discovered Firefox 2's new tab events.
if a translator throws an exception inside loadDocument(), doGet(), doPost(), or processDocuments(), a translate error message will appear, and the translator will not hang
2006-08-15 19:46:42 +00:00
Simon Kornblith
dac5bbb3f3 closes #183, export bibliography to RTF 2006-08-14 21:54:45 +00:00
Simon Kornblith
feff0aa531 closes #53, export to footnote or bibliography
closes #180, make all contextual menu export/create bibliography options work right

also:
- add Chicago Note style output
- unregister RDF data sources from cache after import
2006-08-14 20:34:13 +00:00
Simon Kornblith
bb07710b34 okay, i actually tested it this time, so i'm pretty sure i got all the bugs out. no, i am not just cheating to get closer to the elusive r500. 2006-08-14 05:23:39 +00:00
Simon Kornblith
45f84fb31f actually fix the bug properly this time 2006-08-14 05:15:52 +00:00
Simon Kornblith
a67b8c8b95 oops. saw a bug looking over my diff. 2006-08-14 05:14:16 +00:00
Simon Kornblith
3195a1c382 closes #112, ingested items should be automatically added to selected project
references #178, changes to various date fields

- updates CSL to work with the latest schema. we can now (almost) generate completely valid APA style. the only issue is that there's no syntax for specifying short forms for page and creator type labels.
- updates scrapers to use date field rather than year field.
- removes now-unnecessary translation engine code pertaining to year field.
2006-08-14 05:12:28 +00:00
Simon Kornblith
05edc2a08b rewrote citation support to support new version of CSL schema. bibliographic output is much improved. 2006-08-12 23:23:56 +00:00
Simon Kornblith
4284132db5 update scrapers.sql version 2006-08-12 04:29:59 +00:00
Simon Kornblith
ddb4fc872c remove the doStatus argument from Scholar.Utilities.HTTP 2006-08-12 04:27:49 +00:00
Simon Kornblith
36a402713c rename Scholar.Utilities.Ingester.HTTPUtilities to Scholar.Utilities.Ingester.HTTP for consistency 2006-08-11 16:34:22 +00:00
Simon Kornblith
064ecd17db removes unnecessary pieces of piggy bank API from utilities and updates translators to abide by current translator guidelines 2006-08-11 15:28:18 +00:00
Simon Kornblith
6efd6d2cc4 closes #99, add options for export 2006-08-08 23:00:33 +00:00
Simon Kornblith
3edb6e0286 closes #86, steal EndNote download links
Scholar should now attempt to process citation information from EndNote download links (MIME types application/x-endnote-refer and application/x-research-info-systems). in situations where Scholar cannot process the information, a standard helper app dialog will appear. this behavior is controlled by the preference extensions.scholar.parseEndNoteMIMETypes.
2006-08-08 21:17:07 +00:00
Simon Kornblith
504ebf8996 closes #162, do sniffing for import formats
import should now work regardless of file extensions. this should make #86 (steal EndNote download links) fairly easy to implement.
2006-08-08 02:46:52 +00:00
Simon Kornblith
216f0c7581 closes #83, figure out how to implement OpenURL
closes #76, implement extensible search/retrieval architecture for obtaining metadata

OpenURL COinS lookup is now implemented using a real search architecture system. at the moment, it works with Open WorldCat for books, CrossRef for journal articles (provided the COinS object contains a DOI or an ISSN), and PubMed when a PMID is available.
2006-08-08 01:06:33 +00:00
Simon Kornblith
6626eba844 addresses #83, figure out how to implement OpenURL
OpenURL lookup now works for books. this means that all that's necessary to add scrapable book metadata to a page is an ISBN, as shown below:

<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info:ofi/fmt:kev:mtx:book&amp;rft.isbn=1579550088"></span>

also, we can now scrape Open WorldCat and Wikipedia Book Sources pages with no specialized code involved.

i'm still looking for a better way of looking up journal article metadata. it's currently implemented with CrossRef, but CrossRef simply will not work without a DOI, and is also incomplete (only holds the last name of the first author).
2006-08-07 05:15:30 +00:00
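To show what a consumer of the COinS span in 6626eba844 has to do, here is a minimal sketch that pulls the ISBN out of the `title` attribute. The function name and the returned field names are hypothetical. The sketch uses the standard `URLSearchParams` class and assumes the attribute may still contain `&amp;` entities when read as raw text.

```javascript
// Parses a COinS title attribute (an OpenURL key/value string).
// parseCoinsSketch and the returned property names are illustrative only.
function parseCoinsSketch(title) {
  // Undo HTML entity escaping in case the raw attribute text is passed in.
  const query = title.replace(/&amp;/g, "&");
  const params = new URLSearchParams(query);
  return {
    version: params.get("ctx_ver"),
    format: params.get("rft_val_fmt"),
    isbn: params.get("rft.isbn"),
  };
}

const coins = parseCoinsSketch(
  "ctx_ver=Z39.88-2004&amp;rft_val_fmt=info:ofi/fmt:kev:mtx:book&amp;rft.isbn=1579550088"
);
console.log(coins.isbn); // 1579550088
```

Once the ISBN is extracted, it is the only piece of information needed for the book lookup the commit describes.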
Simon Kornblith
e3d062a819 fix inappropriately truncated field values in InnoPAC 2006-08-07 01:49:56 +00:00
Simon Kornblith
2b5b65f4dd addresses #83, figure out how to implement OpenURL
adds preliminary support for COinS microformat data. does not yet support COinS where there is only a DOI or ISBN.
2006-08-07 00:30:36 +00:00
Simon Kornblith
c0bab22016 bring scrapers into sync with updated database schema 2006-08-06 17:34:41 +00:00
Simon Kornblith
fc589a37cf closes #131, make import/export symmetrical
all 4 import/export formats currently supported (MODS, Hybrid RDF, Unqualified Dublin Core, and RIS) now work as both import and export translators
2006-08-06 09:34:51 +00:00
Simon Kornblith
9144b56772 addresses #131, make import/export symmetrical
closes #163, make translator API allow creator types besides author

import and export in the multi-ontology RDF format should now work properly. collections, notes, and see also are all preserved. more extensive testing will be necessary later.
2006-08-05 20:58:45 +00:00
Simon Kornblith
b4c8dbe700 closes #157, add database infrastructure for different CSL styles
CSL is stored in a new "csl" table. only metadata relevant to updates and selection (ID, date updated, and title) is stored in columns.
2006-08-03 04:54:16 +00:00
Simon Kornblith
6305e4cada closes #55, export bibliography to printable version
closes #4, Make printable version

- moves functions for creating and deleting hidden browser objects to scholar.js (from ingester.js), since these are necessary for printing as well
- allows saving bibliography in HTML or printing bibliography. style support is not yet complete (pending finalization of 0.9 version of CSL specification).
2006-07-27 23:01:55 +00:00
Simon Kornblith
c64e5c841f closes #78, figure out import/export architecture
closes #100, migrate ingester to Scholar.Translate
closes #88, migrate scrapers away from RDF
closes #9, pull out LC subject heading tags
references #87, add fromArray() and toArray() methods to item objects

API changes:
all translation (import/export/web) now goes through Scholar.Translate
all Scholar-specific functions in scrapers start with "Scholar." rather than the jumbled up piggy bank un-namespaced confusion
scrapers no longer specify items through RDF (the beginning of an item.fromArray()-like function exists in Scholar.Translate.prototype._itemDone())
scrapers can be any combination of import, export, and web (type is the sum of 1/2/4 respectively)
scrapers now contain functions (doImport, doExport, doWeb) rather than loose code
scrapers can call functions in other scrapers or just call the function to translate itself
export accesses items item-by-item, rather than accepting a huge array of items
MARC functions are now in the MARC import translator, and accessed by the web translators
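
a minimal sketch of how the summed type value in the list above can be read back, assuming import = 1, export = 2, and web = 4 as stated. the constant and function names are illustrative only and do not come from the Scholar source.

```javascript
// Flag values follow the 1/2/4 scheme described above; names are hypothetical.
const TRANSLATOR_IMPORT = 1;
const TRANSLATOR_EXPORT = 2;
const TRANSLATOR_WEB = 4;

// Report which capabilities a translator's type value includes.
function describeTranslatorType(type) {
  return {
    import: (type & TRANSLATOR_IMPORT) !== 0,
    export: (type & TRANSLATOR_EXPORT) !== 0,
    web: (type & TRANSLATOR_WEB) !== 0,
  };
}

// A translator that handles both import and export has type 1 + 2 = 3.
const importExportType = TRANSLATOR_IMPORT + TRANSLATOR_EXPORT;
console.log(describeTranslatorType(importExportType));
```

because each flag is a distinct power of two, the sum is unambiguous and any combination of the three modes fits in one integer column.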

new features:
import now works
rudimentary RDF (unqualified dublin core only), RIS, and MARC import translators are implemented (although they are a little picky with respect to file extensions at the moment)
items appear as they are scraped
MARC import translator pulls out tags, although this seems to slow things down
no icon appears next to the URL when Scholar hasn't detected metadata, since this seemed somewhat confusing

apologies for the size of this diff. i figured if i was going to re-write the API, i might as well do it all at once and get everything working right.
2006-07-17 04:06:58 +00:00
Simon Kornblith
d65328c830 adds Biblio/DC/FOAF/PRISM/VCard RDF export type. Bruce D'Arcus, author of CiteProc and co-lead on the OpenOffice bibliographic project, is currently using this as his ontology, and we can unambiguously encode all of our metadata with it.
caveats:
- it's not human readable. mozilla doesn't nest blank nodes, so everything's scattered throughout the file. it would be relatively easy to do post-processing with E4X or even regexps to correct this.
- there's no generic callNumber field, so all callNumbers are encoded as LCC.

adds container creation routines to dataMode rdf

changes Dublin Core export to Unqualified Dublin Core, and removes DC Terms qualifiers
2006-07-07 18:41:21 +00:00
Simon Kornblith
c02666fcd3 add an API for Mozilla's RDF data source, so that import/export translators will be able to create and parse RDF with minimal effort
convert Dublin Core export to new API
2006-07-06 21:55:46 +00:00
Simon Kornblith
b7124bd8c1 ack, update scrapers.sql version info 2006-07-06 03:41:18 +00:00
Simon Kornblith
2d8ed16d88 adds export of tags to MODS.
adds export of seeAlso info and project hierarchy to RDF. for now, this is embedded in the modsCollection root element.

uses nodeIDs for Dublin Core RDF.
2006-07-06 03:39:32 +00:00
Simon Kornblith
c0251085a9 Add export filters for RIS and Dublin Core RDF 2006-07-05 21:44:01 +00:00
Simon Kornblith
8b4a44be0f fixes a bug that made the Google Books translator not appear
adjusts the Google Books translator to work with the latest revision of the site

renames the MODS translator to just MODS, because "Metadata Object Description Schema (MODS)" was too long for the export dialog
2006-06-30 19:21:36 +00:00
Simon Kornblith
77282c3edc - fixes a bug that could result in scrapers using utilities.processDocuments malfunctioning
- fixes a bug that could result in the Scrape Progress chrome thingy sticking around forever
- makes chrome thingy disappear when URL changes or when tabs are switched
2006-06-29 03:22:10 +00:00
Simon Kornblith
cd25ecc034 I swear I've fixed this bug before, but make multiple item ingest work right for InnoPAC 2006-06-29 02:54:37 +00:00
Simon Kornblith
45b9234996 addresses #78, figure out import/export architecture
- changes scrapers table to translators table; all import/export/web translators now belong in this table
- adds Scholar.Translate to handle translation issues. eventually, Scholar.Ingester.Document will become part of this interface
- adds Scholar_File_Interface (in fileInterface.js) to handle UI for export and eventually import. (David, when you have time, please connect Scholar_File_Interface.exportFile to a button.)
- adds an export translator for MODS. all of our metadata, but not our hierarchy (projects, etc.) translates directly and unambiguously into valid MODS. eventually, we can use RDF or another format to handle hierarchy.
- adds utilities.getVersion() and utilities.inArray() for simplified scraper coding
- fixes minor interface issues with the nifty chrome scraping status window
2006-06-29 00:56:50 +00:00
Simon Kornblith
19504e6746 - closes #73, use chrome for "Scraping Progress..." indicator
- multiple and book icons were swapped for Voyager scraper
2006-06-27 02:03:10 +00:00
Simon Kornblith
f1cc809f76 Add a generic scraper that will scrape any website, although it may not always find very much information. It looks at META tags, both Dublin Core and otherwise.
When tags are ready, we can pull out META keywords.
2006-06-26 20:44:45 +00:00
Simon Kornblith
4242c62b1b - Fix redundancy in utilities.js (I accidentally copied and pasted a much larger block of code than i meant to)
- Move processDocuments, a function for loading a DOM representation of a document or set of documents, to Scholar.Utilities.HTTP
- Add Scholar.Ingester.ingestURL, a simplified function to scrape a URL (closes #33)
2006-06-26 20:02:30 +00:00
Simon Kornblith
4535b220db Closes #84, make type icon in toolbar match item about to be scraped. It's not perfect, since to get everything right, we'd need to scrape the page as soon as it appears, but it provides a pretty good indication. Multiple items get the folder icon. If there's a better icon out there, it's pretty straightforward to implement. 2006-06-26 18:05:23 +00:00
Simon Kornblith
a33b119dff grab ISBN from SIRSI 2003+ catalogs 2006-06-26 01:17:29 +00:00
Simon Kornblith
303c6ee68d closes #41, get library call number 2006-06-26 01:08:59 +00:00
Simon Kornblith
d73127b1b3 update modification times 2006-06-25 22:01:04 +00:00
Simon Kornblith
f6b0d9a541 search results scraping for InfoTrac. closes #15 2006-06-25 22:00:20 +00:00
Simon Kornblith
1ec834cef2 Search results scraping for Project MUSE 2006-06-25 21:12:14 +00:00
Simon Kornblith
6a627fad0a Search results scraping for LexisNexis 2006-06-25 20:09:27 +00:00
Simon Kornblith
a48ea7dabf Search results scraping for ProQuest 2006-06-25 19:32:49 +00:00
Simon Kornblith
7402577806 Add search results scraping for History Cooperative 2006-06-25 18:34:23 +00:00
Simon Kornblith
a9c79f6110 Search results scraping for JSTOR 2006-06-25 18:17:00 +00:00
Simon Kornblith
5e73dcdd2e - Search results scraping for WorldCat.
- Make scraperJavaScript run on reload again, because it makes debugging easier
- There's not actually a memory leak in the proxyMonitor code.
2006-06-25 16:13:47 +00:00