Ticket #891 (new enhancement)

Opened 4 years ago

Last modified 3 years ago

genericize article delete - expect the article zip filename to match the doi suffix, rather than removing the string "journal"

Reported by: russ
Assigned to: rich
Priority: high
Milestone:
Component: ambra
Version: 0.8.2.1
Keywords:
Cc:
Blocking:
Blocked By:

Description

hoping this is a minor tweak to look for a file that starts with "proc.". we can probably push this out of 0.8.2.2 if it's at all complicated, though it will be annoying for production (and me)

Attachments

image.pntd.v02.i10.zip (1.5 MB) - added by russ on 10/29/08 10:10:00.

Change History

04/03/08 10:39:17 changed by rich

  • priority changed from medium to high.

ArticleUtil.delete currently uses the name of the DOI xml file as the base for the zip file name (e.g. "info_doi_10_1371_journal_pctr_0010039.xml" should move "pcbi.1000039.zip"). The new filename for processed files has a "proc." prefix (e.g. "proc.pcbi.1000039.zip").

This function needs to be extended to handle both filenames and/or detect the zip file that exists and delete that file.
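
A minimal sketch of the "detect the zip file that exists" idea, assuming the only two candidates are the plain name and the "proc."-prefixed name mentioned above. The class and method names are hypothetical and not part of Ambra:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not Ambra code: finds whichever zip variant exists.
final class ZipLocator {
    private ZipLocator() {}

    // Tries "<base>.zip" first, then "proc.<base>.zip", and returns the first
    // candidate that exists as a regular file in the given directory.
    static Optional<Path> findExisting(Path dir, String baseName) {
        List<String> candidates =
            Arrays.asList(baseName + ".zip", "proc." + baseName + ".zip");
        for (String name : candidates) {
            Path candidate = dir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The caller would then move or delete the returned path, and treat an empty result as "no zip found" rather than failing on a guessed name.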

04/03/08 10:43:44 changed by rich

This function has a hardcoded substring offset tied to the PLoS DOI prefix "10.1371":

String fname = article.substring(25) + ".zip";

What happens if another pubApp needs a different length DOI number?!?
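
To make the concern concrete: assuming article holds a URI such as "info:doi/10.1371/journal.pcbi.1000039" (an assumption; the ticket does not show the value), the offset 25 is exactly the length of "info:doi/10.1371/journal.". A rough sketch of deriving the offset from a configurable prefix instead of a literal number follows. The class, constant, and method names are hypothetical and not part of Ambra:

```java
// Hypothetical helper, not Ambra code: builds the zip name from an article URI.
final class ZipNames {
    private ZipNames() {}

    // Assumed PLoS-style URI prefix; its length is 25, the offset hardcoded above.
    static final String PLOS_PREFIX = "info:doi/10.1371/journal.";

    // Strips the given prefix from the article URI and appends ".zip".
    // Throws if the URI does not start with the prefix, instead of silently
    // cutting the string at the wrong position.
    static String zipNameFor(String articleUri, String prefix) {
        if (!articleUri.startsWith(prefix)) {
            throw new IllegalArgumentException(
                "article URI " + articleUri + " does not start with " + prefix);
        }
        return articleUri.substring(prefix.length()) + ".zip";
    }
}
```

With a different prefix passed in, the same method would serve a DOI of a different length; a URI that does not match the prefix is rejected rather than mangled.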

04/03/08 15:24:45 changed by jsuttor

  • status changed from new to closed.
  • resolution set to fixed.

(In [5323]) fixes #891, when moving files, be more flexible in Article.zip file name

07/16/08 11:01:33 changed by

  • milestone deleted.

Milestone pubApp_0.8.2.2 deleted

09/15/08 09:12:21 changed by russ

  • status changed from closed to reopened.
  • resolution deleted.
  • blocking changed.
  • blockedby changed.
  • summary changed from 'on article delete, proc article zip is not moved from ingested queue to ingestion-queue' to 'on article delete, image article zip is not moved from ingested queue to ingestion-queue'.

this bug has regressed in 0.9

09/15/08 09:12:31 changed by russ

  • owner changed from jsuttor to rich.
  • status changed from reopened to new.

09/20/08 02:24:19 changed by ronald

r5323 was never merged into head, but I don't think it's necessary either. Since no file renaming happens currently in head (i.e. we only move files from one directory to another), I don't see why this should not be working for all article types.

Please provide a sample article that exhibits this problem.

10/29/08 10:06:56 changed by russ

this happens with any article that doesn't follow the normal plos article naming conventions (e.g. pone.1234567.zip, corresponding to a 10.1371/journal.pone.1234567 doi)

our issue image articles are named like image.pone.v01.i01.zip, corresponding to a 10.1371/image.pone.v01.i01 doi.

i'm not sure why you can't see the problem here, but i'll upload an example file.

this is a pretty large example of plos-specific code - we seem to be requiring that the article zip filename is a suffix of the doi, AND that the string "journal." precedes it in the doi, which is really silly.

in the past, we brought up this problem in the context of prepare-sip and were told that prepare-sip was plos specific by design and we should create alternate methods to prepare if we had non-standard dois for any article.

now we have the same problem with the admin pages.

would it really be that hard to decouple the filename from the doi in both prepare-sip AND the admin pages?

10/29/08 10:10:00 changed by russ

  • attachment image.pntd.v02.i10.zip added.

10/29/08 15:07:28 changed by ronald

Thanks for the zip. I see the problem: it's the ambra.services.documentManagement.documentPrefix that contains the 'journal' and is messing things up.

Re decoupling things, yes, but the problem with delete is that we have to somehow guess what file originally contained the article. Yes, we could in theory open up all 2000 files in the ingested directory and check whether each one contains the article, but I'm sure you won't be pleased with the wait time. So, instead the app tries to guess ("reverse compute") the filename from the article name, and this is where the naming conventions come into play (this problem does not exist on the ingest side of course, just for delete). I now understand the hack in r5323 a bit better, and while it sorta worked for this case it was still prone to failure if names were sufficiently different.

One solution may be to put a marker file in the ingested directory that is named like the uri of the article and that contains the name of the sip file that the article came from.
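
A minimal sketch of the marker-file idea, under stated assumptions: the marker is named after the URL-encoded article URI with a ".marker" suffix, and its content is just the sip file name. Both conventions and all class and method names are hypothetical, not Ambra code:

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Ambra code: remembers which sip file an article came from.
final class IngestMarkers {
    private IngestMarkers() {}

    // Marker file name: URL-encoded article URI plus ".marker" (assumed convention).
    // Encoding turns ':' and '/' into "%3A" and "%2F", so the result is a plain file name.
    static Path markerPath(Path ingestedDir, String articleUri) throws IOException {
        return ingestedDir.resolve(URLEncoder.encode(articleUri, "UTF-8") + ".marker");
    }

    // At ingest time: record the sip file name for this article.
    static void record(Path ingestedDir, String articleUri, String sipName) throws IOException {
        Files.write(markerPath(ingestedDir, articleUri),
                    sipName.getBytes(StandardCharsets.UTF_8));
    }

    // At delete time: return the recorded sip file name, or null if no marker exists.
    static String lookup(Path ingestedDir, String articleUri) throws IOException {
        Path marker = markerPath(ingestedDir, articleUri);
        if (!Files.exists(marker)) {
            return null;
        }
        return new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
    }
}
```

Delete would then read one small file instead of reverse-computing the name, at the cost of keeping the markers in sync with the ingested directory.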

10/30/08 10:24:41 changed by russ

maybe it would be simpler to require the article zip to be named after the portion of the doi following the slash?

that way there's no config issue and plos would be required to rename their article zips to eg. journal.pone.1234567.zip which would be very easy to do.

otherwise we need to add complexity somehow, either by storing the doi->filename relationship somewhere as you suggest, or making config more complicated...
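
The naming rule proposed here could be sketched as follows, assuming the input is either a bare doi or an "info:doi/" URI (the URI form is an assumption). The class and method names are hypothetical, not Ambra code:

```java
// Hypothetical helper, not Ambra code: names the zip after the part of the doi
// that follows the first slash.
final class DoiZipNames {
    private DoiZipNames() {}

    static String zipNameFromDoi(String doi) {
        // Accept either a bare doi or an "info:doi/" URI.
        String scheme = "info:doi/";
        String bare = doi.startsWith(scheme) ? doi.substring(scheme.length()) : doi;
        int slash = bare.indexOf('/');
        if (slash < 0 || slash == bare.length() - 1) {
            throw new IllegalArgumentException("not a prefix/suffix doi: " + doi);
        }
        return bare.substring(slash + 1) + ".zip";
    }
}
```

Under this rule 10.1371/journal.pone.1234567 maps to journal.pone.1234567.zip and 10.1371/image.pone.v01.i01 maps to image.pone.v01.i01.zip, with no documentPrefix configuration involved.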

02/18/09 15:28:47 changed by russ

  • summary changed from 'on article delete, image article zip is not moved from ingested queue to ingestion-queue' to 'genericize article delete - expect the article zip filename to match the doi suffix, rather than removing the string "journal"'.