171 Commits

Author SHA1 Message Date
Simon Kornblith
0c2ee5d449 closes #406, Incorrect "et al" handling for APA style 2006-12-11 20:59:40 +00:00
Simon Kornblith
6c80c879da - closes #407, error in EBSCOhost translator
- closes #430, Amazon translator causing utilities.js to throw exception
- officially deprecated Zotero.Utilities.getNodeString() (use doc.evaluate and nodeValue or textContent instead, or access attributes directly; these options take nearly the same amount of code, should be faster, and don't unnecessarily bloat our utilities)
- updated word integration to the latest version
2006-12-11 20:54:22 +00:00
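The replacement suggested for the deprecated Zotero.Utilities.getNodeString() might look roughly like the sketch below. This is an illustration only: the helper name firstNodeText is hypothetical and not part of Zotero, while doc.evaluate, singleNodeValue, and textContent are the standard DOM XPath interfaces.

```javascript
// Numeric value of the DOM constant XPathResult.FIRST_ORDERED_NODE_TYPE,
// spelled out so the sketch does not depend on a browser global.
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper (not a Zotero API): return the text of the first node
// matching `xpath` in `doc`, or null when nothing matches.
function firstNodeText(doc, xpath) {
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  const node = result.singleNodeValue;
  return node ? node.textContent : null;
}
```

In a translator, a call such as firstNodeText(doc, '//h1') would stand in for the old getNodeString() call; the XPath shown is only an example.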
Dan Stillman
58466ef656 BibTeX patch from Patrick Wagstrom on dev list
His note:

1. adds the conference paper item type (currently only exported to BibTeX as inproceedings)
2. Fixes bug with editor names in BibTeX export
3. Provides more intelligent naming for entities in BibTeX exports. Previously items would be named something like Wagstrom2006, Wagstrom2006-1, etc. However, I noticed that this ordering could get changed around pretty easily in the export process, resulting in bad references in articles. We can't really be having that now can we? The keys now take the first word of the title, stripping out a few common words. For example, if I had a paper called "Zoteros impact on time to author scholarly papers", it would have a key of "wagstrom_zotero_2006", which is much more constant.


There was still an editor field bug after Patrick's patch that I corrected, and author and editor fields seem to be handled properly now.


Also addresses #384, option to prevent escaping of curly brackets in BibTeX output

I believe this patch actually now prevents escaping of curly braces by default; however, according to Simon, it should still be based on a pref or option of some kind.
2006-12-09 23:09:14 +00:00
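The key naming scheme described in item 3 of Patrick Wagstrom's note could be sketched as follows. This is not the patch's code: the function name buildCiteKey and the stop-word list are assumptions, and the lastname_word_year pattern is inferred from the single example key in the note.

```javascript
// Assumed stop-word list; the note does not say which common words the patch strips.
const COMMON_WORDS = ["a", "an", "the", "on", "of", "to", "in", "and"];

// Hypothetical helper: build "<lastname>_<first significant title word>_<year>".
// Characters outside a-z and 0-9 are dropped, which is a simplification.
function buildCiteKey(lastName, title, year) {
  const words = title
    .toLowerCase()
    .split(/\s+/)
    .map((w) => w.replace(/[^a-z0-9]/g, ""))
    .filter((w) => w.length > 0 && !COMMON_WORDS.includes(w));
  const firstWord = words.length > 0 ? words[0] : "";
  const name = lastName.toLowerCase().replace(/[^a-z0-9]/g, "");
  // Skip empty parts so a missing title does not leave a doubled underscore.
  return [name, firstWord, String(year)].filter((p) => p.length > 0).join("_");
}

console.log(buildCiteKey("Wagstrom", "The impact of Zotero on authoring", 2006));
// prints "wagstrom_impact_2006"
```

Because the key depends only on the item's own metadata, it does not change when the export order changes, which is the problem the note describes with the numbered Wagstrom2006-1 style.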
Sean Takats
507efb4758 Replaces SIRSI -2003 and SIRSI 2003+ translators with single SIRSI translator. Handles WebCat, iLink, and iBistro interfaces. Ready for Emory test XPI but needs some refinement to handle other library view preferences as noted in #381 2006-12-07 04:22:57 +00:00
Sean Takats
e0d955afba closes #373 page numbers not captured in PubMed/HubMed 2006-11-29 16:42:09 +00:00
Dan Stillman
9a7c18ed5e Had to repush some scrapers, since apparently phpMyAdmin on the server replaces \n with \r\n on edits, pushing some over the b2.r2 limit 2006-11-27 22:58:36 +00:00
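The size problem described in this commit can be illustrated with a short sketch. The function names are hypothetical, and the 4096-character figure is borrowed from the ">4096 characters" threshold in the minVersion commit further down this log; that it is the same limit as the "b2.r2 limit" is an assumption.

```javascript
// Convert CRLF line endings back to LF.
function normalizeLineEndings(code) {
  return code.replace(/\r\n/g, "\n");
}

// True when the code fits within `limit` characters.
function fitsLimit(code, limit) {
  return code.length <= limit;
}

// Assumed limit; see the note above.
const ASSUMED_LIMIT = 4096;

// 100 lines of 39 characters each: 4000 characters with LF, 4100 with CRLF.
const withLf = ("x".repeat(39) + "\n").repeat(100);
const withCrlf = withLf.replace(/\n/g, "\r\n");

console.log(fitsLimit(withCrlf, ASSUMED_LIMIT)); // false
console.log(fitsLimit(normalizeLineEndings(withCrlf), ASSUMED_LIMIT)); // true
```

Each line gains one character when \n becomes \r\n, so a scraper close to the limit can exceed it after an edit that changes nothing else.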
Sean Takats
e391844acd closes #387 year in date field is truncated. 2006-11-27 14:15:27 +00:00
Dan Stillman
f545e6a884 Setting minVersion for Google Scholar and Embedded RDF to 1.0.0b3.r1 2006-11-26 23:53:47 +00:00
Dan Stillman
24ae82b07f Aleph/arXiv/CrossRef/CiteBase pushed to repo 2006-11-26 23:50:58 +00:00
Dan Stillman
361a1e4bc6 Add minVersion/maxVersion to translators schema and schema update mechanisms (local and remote) -- these aren't really necessary on the client but let us use the same SQL to update the repo, and we probably should include them in error reports (instead of relying on different timestamps to differentiate versions)
Added minVersion and maxVersion times to existing scrapers, setting 1.0.0b3.r1 as minVersion for any >4096 characters; these could theoretically now be added back to the repository without problems, but there's not really much reason to test that theory at the moment
2006-11-26 09:19:07 +00:00
Dan Stillman
c8cecf4b7e Pushed updated NYT and Google Books translators to repo
Refs #409, Google Books translator broken after site update
Refs #380, Archived New York Times articles accessed via TimesSelect aren't detected
2006-11-25 19:59:45 +00:00
Sean Takats
88d8f19ece closes #409, google books translator broken after site update 2006-11-25 19:22:33 +00:00
Sean Takats
fc2be5bf21 closes #380 by updating translator regex to run against select.times.com. note that the example article in #380 still will not display the zotero icon or scrape, since that article does not contain the standard meta tags that we use to scrape nytimes articles. other timesselect content now does scrape, however. 2006-11-25 04:07:50 +00:00
Simon Kornblith
38531da9fa closes #396, accents are lost when scraping multiple items (with InnoPAC) 2006-11-25 03:41:13 +00:00
Simon Kornblith
5caf0d2803 made arXiv/eprintweb translator work with lists of recent articles, etc. 2006-11-25 03:16:33 +00:00
Simon Kornblith
e201c3b580 made arXiv translator work with eprintweb as well 2006-11-25 02:53:38 +00:00
Simon Kornblith
05b3cd8566 - added arXiv.org translator
- added CiteBase OpenURL search translator (although CiteBase COinS still won't work, because you can't look most of them up with the CiteBase resolver; ugh)
- fixed Amazon translator type ID (12 -> 4)
2006-11-25 02:13:17 +00:00
Simon Kornblith
94302bbe1c closes #403, Aleph translator not working
i modified the XPath the Aleph translator uses to something that should work in nearly every case.
2006-11-25 00:01:24 +00:00
Sean Takats
6ff2168729 Amazon scraper now supports international Amazon sites and retrieves data from Amazon's API 2006-11-21 21:56:13 +00:00
Simon Kornblith
445ff98277 - made doGet handle multiple urls, with processor/done style interface (as in processDocuments). this should be backwards compatible
- beginnings of mapping for new item types
- fixes for Word integration (because i was using it to write a paper)
2006-11-21 07:14:27 +00:00
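A processor/done style interface over multiple URLs, as mentioned for doGet in this commit, might be shaped like the sketch below. This is not Zotero's doGet: the name getAll, the injected fetchText(url, callback) function, and the reading of "backwards compatible" as "a single URL string is still accepted" are all assumptions.

```javascript
// Hypothetical sketch: fetch each URL in turn, hand each response to
// `processor`, then call `done` once after the last one.
// `fetchText(url, callback)` is an injected stand-in for the real HTTP call.
function getAll(urls, fetchText, processor, done) {
  // Accept a single URL string as well as an array, so older callers still work.
  const queue = typeof urls === "string" ? [urls] : urls.slice();

  function next() {
    if (queue.length === 0) {
      if (done) done();
      return;
    }
    const url = queue.shift();
    fetchText(url, function (text) {
      processor(text, url);
      next();
    });
  }

  next();
}
```

Processing the URLs one at a time keeps the processor calls in the same order as the input list, at the cost of not fetching in parallel.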
Simon Kornblith
a1269146b7 - fixed XML issues with PubMed scraper (although probably not the issue that everyone seems to be experiencing)
- unfinished support for new item types
2006-11-02 00:33:50 +00:00
Dan Stillman
7a3be3e306 Updating SIRSI scraper to last time from repo
(The current repo system is a bit flawed in that translators need to be inserted with CURRENT_TIMESTAMP but scrapers.sql can't be, so scrapers.sql needs to be updated with the repo timestamp after the fact to prevent new installs from unnecessarily grabbing the changed scrapers (or they need to be post-dated to a timestamp after the UTC time of their repository insert but preferably not by more than 24 hours). Suffice it to say, we'll have a more automated solution for this in the future.)
2006-10-25 19:07:11 +00:00
Sean Takats
48659542d3 Updated SIRSI translator to handle author field (not just personal author). 2006-10-25 17:53:17 +00:00
Simon Kornblith
666831748e closes #358, APA style doesn't properly handle references with editors and no authors
closes #348, OpenURL should use only relevant parts of dates
closes #354, Error saving History Cooperative article
closes #356, Embedded Dublin Core scraper incorrectly saves web pages as item type "book"
closes #355, PubMed translator problem
closes #368, RIS/Endnote export hijack doesn't go into active collection
fixes an issue with quotation marks in bibliographies exported as RTF
fixes an issue with bibliographies and non-English locales
2006-10-23 07:34:34 +00:00
Dan Stillman
fab65f743c Eek--bump the scraper version after clearing the tables for upgraders 2006-10-06 15:26:04 +00:00
Dan Stillman
7712a24434 Moved translators and CSL CREATE TABLE statements to userdata.sql, since those are the two tables that we actually _want_ users to modify (without them being wiped on every update) 2006-10-05 23:50:29 +00:00
Dan Stillman
73149b86c7 Add ECL license block to scrapers.sql 2006-10-05 17:29:03 +00:00
Simon Kornblith
cbe7c086e1 closes #336, Some metadata fields are not exported with notes and attachments
closes #165, verify import/export can carry all data for all fields and item types
closes #168, make sure MODS import works with files from external sources
2006-10-05 08:45:44 +00:00
Dan Stillman
cd26267afe Closes #340, Change isInstitution to fieldMode everywhere
Including in the DB, which it turns out isn't really all that bad (thanks, among other things, to SQLite's ability to DROP tables within transactions without autocommitting (which MySQL can't do))
2006-10-05 00:59:26 +00:00
Simon Kornblith
92620afa52 fix a couple of rather inconsequential small bugs 2006-10-04 00:31:29 +00:00
Simon Kornblith
ac50ab16a2 Scholar -> Zotero (thanks Dan S.) 2006-10-04 00:10:35 +00:00
Simon Kornblith
56e77619c4 closes #334, Washington Post scraper shouldn't include " - washingtonpost.com" in title
closes #313, Blacklist known ad sites from scraper detection
closes #306, some New York Times ads prevent page from being recognized
closes #308, attachment import bug

currently, the ad site blacklist is located at the top of ingester/browser.js. at some point, we may want to switch this to a database table.
2006-10-03 22:13:49 +00:00
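An ad site blacklist of the kind described in this commit could be as simple as the sketch below. The patterns and the names AD_SITE_BLACKLIST and isBlacklisted are hypothetical; the actual list at the top of ingester/browser.js is not shown in this log.

```javascript
// Hypothetical entries; placeholders only, not the real blacklist.
const AD_SITE_BLACKLIST = [
  /^https?:\/\/ads\.example\.com\//,
  /^https?:\/\/[^\/]*\.adserver\.example\//
];

// True when `url` matches any blacklisted pattern, in which case
// scraper detection would be skipped for that frame or page.
function isBlacklisted(url) {
  return AD_SITE_BLACKLIST.some((pattern) => pattern.test(url));
}

console.log(isBlacklisted("http://ads.example.com/banner")); // true
console.log(isBlacklisted("http://www.nytimes.com/article")); // false
```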
Simon Kornblith
96ccf85aba - improve CSL
- tag institutional authors appropriately
2006-10-03 21:08:02 +00:00
Dan Stillman
1cd51be497 Sorry, it was now or never, and now is better:
Changed "Scholar" to "Zotero", everywhere

Apologies to anyone with working copy changes, but there are probably fewer at this moment than there will be again.

Hopefully this won't break anything, though existing prefs will be lost. I avoided scholar.google.com--if you know any other legitimate "scholar"s in the code, be sure to fix them once I'm done here.

This is a multi-commit change--there's at least one more coming. *Do not update to this version! It won't work!*
2006-10-02 23:15:27 +00:00
Dan Stillman
eccc2159c1 Oops--CSL table needs to be defined in scrapers.sql too.
(The problem with the current system is that any local translators or styles will be wiped out on upgrades (though not auto-updates), but the solution for that is probably to just offer an SQL file that the user can put custom SQL statements in to be run on upgrades (sorta the same idea as user.js in Firefox). Will deal with that at a later date, though.)
2006-10-02 21:25:47 +00:00
Dan Stillman
508b35f6d1 1) By "Scrapers don't save metadata properly" in my last commit, I meant only URL and accessDate, though on second thought they probably will work.
1b) However, I also did, in fact, break scraping completely, so my previous statement was actually correct. Fix for that coming right up.

2) Fixed problem with translators table getting wiped out completely whenever system.sql was updated (from r671, I believe). Right. Moved the DROP and CREATE statements for translators into translators.sql.
2006-10-02 01:07:56 +00:00
Dan Stillman
b684e97366 Closes #252, Metadata not displaying for page snapshots
Closes #304, change references to "website" to "web page"

More changes as per discussions with Dan:

- Linked URLs have been given a second chance at life, though they still shouldn't be used for (most, if any) scrapers (which should use snapshots or the URL field instead)
- Renamed the "website" item type to "webpage"
- Removed "web page" from the New Item menu
- Added Save Link To Current Page toolbar button
- Added toolbar separator between New Item buttons and link/attachment/note to differentiate
- Added limited metadata (URL and accessDate) for attachments
- URL for attachments now stored in itemData (itemAttachments.originalPath is no longer used, but I'm probably not gonna worry about it and just wait for SQLite to support dropping columns with ALTER TABLE) -- getURL() removed in favor of getField('url')
- Snapshots now say "View Snapshot"
- Added Show File button to file attachments to show in filesystem
- Added timed note field to attachments for single notes and adjusted Item.updateNote(), etc. to work with attachments
- Fixed bug with manually bound params in fulltext indexer and Item.save() (execute() vs. executeStep()) -- any recently added items probably aren't in the fulltext index because of this


Known bugs/issues:

- Attachment metadata and notes probably aren't properly imported/exported now (and accessDate definitely isn't)
- Scrapers don't save metadata properly
- Attachment title should be editable
- File attachments could probably use some more metadata (#275, more or less, though they won't be getting tabs)
2006-10-02 00:00:50 +00:00
Simon Kornblith
7c3e054ebc addresses #301, COinS bugs/enhancements; remaining issue blocked by #3 (add as many item types as possible) 2006-09-11 22:34:39 +00:00
Simon Kornblith
3dfca25879 - closes #277, disambiguation and notifier updates for Word integration
- closes #217, ability to exclude notes/attachments from select items window
- closes #244, ability to quick search from select items window
- fixes a bug with footnotes in Word integration
- fixes a bug in InnoPAC translator where items would sometimes appear twice
2006-09-10 17:38:17 +00:00
Simon Kornblith
d5bc6cbe4b - fixes a bug in capitalizeTitle
- better feedback for search translator errors
2006-09-09 22:45:03 +00:00
Simon Kornblith
14c5c40a50 - closes #279, Refer/EndNote translator
- fixes a bug in text handling that was previously masked by another
2006-09-09 22:00:04 +00:00
Simon Kornblith
67f6ae3ed2 - closes #69, notification system for broken scrapers
- don't put "Page" before page in WaPo scraper
2006-09-09 19:47:47 +00:00
Simon Kornblith
d4576d3d55 addresses #69, notification system for broken scrapers
thanks to Dan for his help on the repository side of things
2006-09-09 00:12:09 +00:00
Simon Kornblith
60422e032e - closes #261, work around content-disposition: attachment on endnote links. this workaround is far from the most elegant, but it seemed nicer than writing a stream converter component that didn't really convert streams
- fixes bugs in RIS import
2006-09-08 22:26:59 +00:00
Simon Kornblith
7b7d3d85e3 - added Washington Post translator
- translation works properly even when a user has switched to a different page
2006-09-08 05:47:47 +00:00
Simon Kornblith
b8ddba3a67 CiteSeer translator 2006-09-08 01:59:22 +00:00
Simon Kornblith
5028880d38 closes #280, BibTeX translator
- fixes date bugs
- fixes (again) an issue that would cause the "unresponsive script" dialog to appear when importing or exporting
2006-09-07 22:10:26 +00:00
Simon Kornblith
cf8dc232b1 - new translators: New York Review of Books, Chronicle of Higher Education
- more useful errors in utilities
- fixes minor bugs in citation styling
2006-09-07 01:23:13 +00:00
Simon Kornblith
89cf0c7235 closes #276, fix RIS bugs
- import translators no longer fail when trying to import an item with no name
- the T2/BT field becomes the publication title when no JO/JF field is available (fixes newspaper issues)
- Y2 is now treated as part of the date if and only if it is improperly formatted (seriously, why can't Thomson get their own specs straight?)
- work around EndNote's strange behavior of putting article titles into notes for no apparent reason
- RIS export gives dates as per specification
- fixed a bug that could have (potentially) caused problems formatting "January"
- allow translators to access strToDate function
2006-09-06 04:45:19 +00:00
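The publication title fallback described in this commit (T2/BT used when no JO/JF field is available) might be sketched as follows. The function name pickPublicationTitle and the shape of the fields object are hypothetical, and the preference order within each pair of tags is an assumption.

```javascript
// Hypothetical helper: `fields` maps RIS tags to their values.
function pickPublicationTitle(fields) {
  // JO/JF come first; T2/BT are only used when neither is present.
  const order = ["JO", "JF", "T2", "BT"];
  for (const tag of order) {
    if (fields[tag]) return fields[tag];
  }
  return null;
}

console.log(pickPublicationTitle({ T2: "The Washington Post" }));
// prints "The Washington Post"
```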
Simon Kornblith
b3bb6b9013 remove unnecessary debug code 2006-09-05 07:59:25 +00:00