Provenance and standards for figure metadata

Question

asked Aug 8, 2015 in Open Science by Ian (40 points)

I'm interested in linking the figures (png, svg, pdf) that I embed in my papers, talks, and webpages to the scripts, code, and data that produce the figures in the first place. The hope is that, given just the figure and a simple script, the entire environment that produced the results can be recreated (probably via a VM, despite their issues) and the process for replicating the results made clear (or as clear as the code and pipeline are).

To do this I need to embed metadata in the figure files. That's easy enough to do, but there are many standards for what metadata to include and how. The more standards-compliant this metadata is, the more likely it is to be preserved if/when the figures are processed by other tools.

What's the best standard(s) to follow when embedding provenance-related metadata in figure files?

This post has been migrated from the Open Science private beta at StackExchange (A51.SE)

commented Aug 17, 2015 by Alexander Konovalov (155 points)

2 Answers

answered Aug 12, 2015 by Robin Berjon (75 points)

Best answer

If your metadata must be embedded inside the files themselves (as opposed to residing in, for instance, an external manifest), then the only option I am aware of that will work with the range of document types you list is XMP.

I am not up to date on the level and quality of tooling available for it, but I believe it is relatively okay. At the very least, if memory serves, it is rather simple to implement, because you basically need to look for a standard marker, which works even in arbitrary binary streams (in PNG I believe it goes in a text chunk and in PDF in a metadata stream; in SVG I reckon you can just embed it in a <metadata> element).
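To illustrate the "look for a standard marker" idea, here is a minimal Python sketch that scans raw bytes for XMP packets by their `<?xpacket begin=... ?>` and `<?xpacket end=... ?>` wrappers. The function name `find_xmp_packets` is made up for this example, and the sketch assumes the packet is stored uncompressed in a single-byte-compatible encoding such as UTF-8; a packet inside a compressed chunk or stream would not be found this way.

```python
import re

# An XMP packet is wrapped in a pair of processing instructions, which is
# what makes it findable without understanding the host file format.
_PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return the body of every uncompressed XMP packet found in `data`.

    Hypothetical helper: it does not parse PNG, PDF, or SVG structure,
    it only searches for the packet wrapper.
    """
    return [match.group(1).strip() for match in _PACKET_RE.finditer(data)]
```

You would call it on the bytes of any figure file, e.g. `find_xmp_packets(open("figure.png", "rb").read())`, and hand the result to an XML/RDF parser.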

commented Aug 17, 2015 by Ian (40 points)

Rex Kerr · Answer 1 · 2015-08-15T19:14:31+0000

answered Aug 15, 2015 by Rex Kerr (165 points)

I think you're going about this backwards because it doesn't scale to all use cases. Suppose you do an analysis of 500 days of mouse behavioral video. Do you really want to embed your entire analysis code plus the entire video in your figure? I don't think so.

Instead, you should consider embedding in the figure a reference, such as a DOI or URL, that identifies what created it. Most image formats have comment fields that can easily hold something like this.

You might worry about the figure and analysis getting out of date, but that can be easily enough verified by binary comparison (or other more sophisticated image comparison methods, if e.g. you're using lossy compression at different levels).
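The binary comparison can be as simple as comparing digests of the published figure and a freshly regenerated one. This is a minimal sketch with function names I made up; it assumes the plotting pipeline is deterministic, since anything like an embedded creation timestamp will make the bytes differ even when the picture is identical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def figure_is_current(published: bytes, regenerated: bytes) -> bool:
    """True if the regenerated figure is byte-for-byte the published one."""
    return digest(published) == digest(regenerated)
```

In practice you would read both files with `pathlib.Path(...).read_bytes()` and could store the digest alongside the DOI or URL, so the check does not need the original file at hand.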

commented Aug 17, 2015 by Ian (40 points)

commented Aug 17, 2015 by Rex Kerr (165 points)

commented Aug 17, 2015 by Ian (40 points)

commented Aug 17, 2015 by Rex Kerr (165 points)

commented Aug 17, 2015 by Ian (40 points)
