Ticket #921 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

Remove CitationInfo and use Citation instead

Reported by: ronald
Assigned to: russ
Priority: high
Milestone:
Component: ambra
Version: 0.9-SNAPSHOT
Keywords:
Cc:
Blocking:
Blocked By:

Description (Last modified by pradeep)

Citations are still being generated by loading the article xml, transforming that into an xml description of CitationInfo, and deserializing that xml description. Almost all of CitationInfo's info is available already in Citation - what's missing (is this really needed?) is the concept of collaborative authors and the marking of primary vs other authors.

Please see #742 for additional description. It is marked as a duplicate of this.

Change History

05/07/08 19:18:26 changed by amit

Clean up citation-info caching. Please see r5681.

07/31/08 23:22:50 changed by amit

  • milestone set to 0.9.1.

08/05/08 09:54:44 changed by pradeep

  • description changed.

09/09/08 10:34:33 changed by amit

  • owner changed from ronald to rich.
  • component changed from topaz to ambra.

09/09/08 16:07:16 changed by amit

  • owner changed from rich to ronald.

09/10/08 17:02:17 changed by amit

  • type changed from unassigned to defect.

10/24/08 14:26:06 changed by amit

  • owner changed from ronald to dragisak.
  • priority changed from unassigned to high.

Reassigning...

11/11/08 11:24:43 changed by dragisak

  • status changed from new to assigned.

11/11/08 16:24:12 changed by dragisak

Citation class is missing some properties that CitationInfo has:

  1. Title formatting: Foo <italic>bar</italic> needs to be converted to Foo <i>bar</i> the way it is done in XSL (Same issue is in Article class). It can be done either during ingestion or on the fly, during the display.
  2. Journal short name (as in XML: article/front/journal-meta/journal-id[@journal-id-type='nlm-ta']). Journal property seems to be always null.
  3. Authors are listed as UserProfile objects, not as Author objects. UserProfile is missing suffix property.
  4. startPage (article/front/article-meta/elocation-id) is missing in Citation

This will require data migration.
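For reference, a minimal sketch of how the two XML-backed values from items 2 and 4 could be read with the standard javax.xml.xpath API, using the paths quoted above. The class and method names (CitationFields, journalShortName, startPage) are hypothetical and are not part of Ambra; real article XML usually carries a DOCTYPE, so a production version would also need an entity resolver.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not an Ambra class).
final class CitationFields {
    private CitationFields() {}

    /** Journal short name, read from the nlm-ta journal-id element. */
    static String journalShortName(String articleXml) {
        return evaluate(articleXml,
            "/article/front/journal-meta/journal-id[@journal-id-type='nlm-ta']");
    }

    /** Value that the ticket calls startPage, read from elocation-id. */
    static String startPage(String articleXml) {
        return evaluate(articleXml, "/article/front/article-meta/elocation-id");
    }

    // Evaluates the expression against the XML text and returns the trimmed
    // string value of the first matching node ("" when nothing matches).
    private static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Cannot evaluate " + expression, e);
        }
    }
}
```

Both methods return an empty string when the element is absent, so a caller would have to decide whether that should become null in Citation.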

11/12/08 15:12:55 changed by amit

The change from <italic>bar</italic> to <i>bar</i> should be done on the fly at display time.
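A rough sketch of what the display-time conversion could look like, assuming the stored title contains only plain `<italic>` open and close tags without attributes. TitleMarkup and toDisplayHtml are made-up names for illustration, not existing Ambra code.

```java
// Hypothetical display-time helper (not an Ambra class).
final class TitleMarkup {
    private TitleMarkup() {}

    /**
     * Rewrites <italic> tags in a stored title to HTML <i> tags.
     * All other text is returned unchanged; a null title stays null.
     */
    static String toDisplayHtml(String title) {
        if (title == null) {
            return null;
        }
        return title.replace("<italic>", "<i>").replace("</italic>", "</i>");
    }
}
```

Because the stored value is left untouched, this piece of the change would not by itself need a data migration.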

11/17/08 12:31:35 changed by dragisak

(In [6692]) Remove CitationInfo that was generated from article XML by XSLT transformation and use Citation entity from triple store. Modify article ingestion so it includes new properties in Citation and UserProfile. Will require data migration. Addresses #921

11/17/08 14:45:59 changed by amit

  • owner changed from dragisak to ronald.
  • status changed from assigned to new.

Reassigning to Ronald to do migration as part of search also.

11/17/08 15:18:34 changed by amit

  • owner changed from ronald to pradeep.

Assigning migration to Pradeep.

11/21/08 12:25:16 changed by dragisak

(In [6741]) Should have been done in r6692. Addresses #921

12/18/08 15:22:38 changed by pradeep

(In [6953]) Migrator for ambra v0.9 Citations. See the applicationContext.xml for tuning information. By default this runs in a background thread, allowing web traffic to go through. The migrations are cache aware, so there is no need to restart peers after the migrations are done.

If 'background' strategy is chosen, tune the txn timeouts and blobThrottle to adjust maximum concurrency. The migration requires a write txn from mulgara, so ingests etc. should not be attempted until all migrations are complete. Also note that a mulgara txn related error will terminate the migration operation, and a restart of ambra is required to continue the rest of the migrations.

If 'background' strategy is not chosen, ambra start up will succeed only when all Citations in the database are migrated.
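The difference between the two strategies can be sketched roughly as follows. This is not the migrator from [6953]: MigrationRunner, start and the thread name are hypothetical, and txn timeouts, blobThrottle and error handling are left out.

```java
// Hypothetical illustration of the two startup strategies (not Ambra code).
final class MigrationRunner {
    private MigrationRunner() {}

    /**
     * Runs the migration task.
     * background = true:  starts it on a daemon thread and returns that thread
     *                     immediately, so startup continues while it runs.
     * background = false: runs it on the calling thread and returns null only
     *                     after the task has finished, so startup waits.
     */
    static Thread start(Runnable migration, boolean background) {
        if (!background) {
            migration.run();
            return null;
        }
        Thread worker = new Thread(migration, "citation-migrator");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

With background set to false, an exception thrown by the task propagates to the caller, which matches the idea that startup succeeds only once every Citation has been migrated.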

Addresses #921

12/18/08 15:26:41 changed by amit

  • owner changed from pradeep to russ.

Re-assigning for testing.

(follow-up: ↓ 18 ) 12/22/08 17:49:35 changed by russ

  • owner changed from russ to pradeep.

citation migrator is failing on many articles (21/44 on branch).

pradeep explained on the list that this is due to authors being out of order in mulgara, and that the solution is to reingest by hand.

i don't think it will be possible to reingest 50% of our articles by hand.

perhaps this belongs to dragisa?

(in reply to: ↑ 17 ; follow-up: ↓ 19 ) 12/22/08 23:31:29 changed by pradeep

  • status changed from new to assigned.

Replying to russ:

citation migrator is failing on many articles (21/44 on branch). pradeep explained on the list that this is due to authors being out of order in mulgara, and that the solution is to reingest by hand. i don't think it will be possible to reingest 50% of our articles by hand.

Hmm. Something is strange then. The serverbackup.gz that I used for running a couple of rounds of migration tests succeeded on moody.topazproject.org (barring just one article that failed because of a duplicate citation key). My understanding was that this was a backup of the production data from a few weeks back. But if that is not the case and you suspect 50% of the articles in production have the author order wrong, that can also be addressed during this migration.

Not sure how 50% of articles have the author order all wrong. Were the article XMLs on production edited after ingestion to correct some of these things, without a re-ingestion being done?

The current version of Migration is flagging this as an error to bring attention to this so that it can be fixed by the admin - mainly by doing a re-ingest. But if the mismatch is as high as 50% as you say, then the Migrator can be modified to take care of this easily.

(in reply to: ↑ 18 ) 12/23/08 02:11:00 changed by pradeep

  • owner changed from pradeep to russ.
  • status changed from assigned to new.

Replying to pradeep:

Realized what the confusion was with the 50% number. (See the message on the mailing list) Since this is not a migration issue, assigning back to you to figure out what is going on with the data used in the test.

01/26/09 12:29:14 changed by russ

  • status changed from new to closed.
  • resolution set to fixed.

pretty much confirmed that branch has corrupt data. migrations on stage corpus and clean small corpus did not reproduce this issue.

02/25/09 14:46:46 changed by

  • milestone deleted.

Milestone 0.9.1 deleted