Proprietary data formats aren't the issue: the ease of access to the data they hold is.
In simple terms, a file format is anything that stores data in such a way that it can be used for some purpose. How can storing data that way be an issue? It isn't. The real issue is the ability to access the data contained within the file format.
Let's take a look at what our trusty Open Definition says: (Emphasis mine)
An Open Format for Data - Definition 2
An Open Format is a format that, “can be processed with at least one free/libre/open-source software tool”.
The concern that people have with proprietary formats is that there are obstacles to accessing the data stored in the format. For the sake of argument, let's say that I create a data format, the .ziz format. It's a fantastic format, capable of storing hundreds of rows of data across multiple categories, and it compresses the data losslessly, so it doesn't take up loads of space on a computer. I create a program to access the data, the Ziz Reader, and I put it on sale for 50 dollars.
What's the issue with this? That people need to pay in order to access the data stored in the .ziz format. The key issue is not the format itself, but the lack of a suitable, readily available program to access the data contained in the format.
While the lack of free, readily available programs is an issue, don't blame the format. There is a connection between proprietary formats and ease of access, but the problem is access. My friend Bill has to pay 50 dollars just to see what the answer to 1 plus 1 is.
Should I discourage the use of proprietary formats?
If there is an open format that does what you need, then by all means use it, but don't go crazy about it. If the best way of storing and sharing the data is a proprietary format, it's not that big of an issue. Generally, if you need to use a proprietary format, try your best to use one that satisfies the second definition of an open format, as quoted above: the data can be accessed with a free/libre/open-source software tool.
This post has been migrated from the Open Science private beta at StackExchange (A51.SE)