I don't know of a ranking that would answer your question, but I would be interested in seeing one, so off we go.
I agree that the platforms mentioned by Ross are amongst the most likely contenders. I recently read "Analyzing data citation practices using the Data Citation Index", which found that the repository with the most citations is specialized in crystallography (Crystallography Open Database), followed by the Protein Data Bank (Biochemistry & Molecular Biology) and the Inter-university Consortium for Political and Social Research (Social Sciences, Interdisciplinary).
Other widely used resources include PubMed/MEDLINE (which are not open in the sense of the Open Definition but are useful for comparison), the Sloan Digital Sky Survey and the Database of Genotypes and Phenotypes.
So, to get a first rough idea of the scale of use we are talking about here, let's throw all of these (either using complete titles or abbreviations) into Google Scholar and record the number of hits:
- `"PubMed"`: 5,430,000
- `"MEDLINE"`: 1,250,000
- `"Genbank"`: 720,000
- `"Protein Data Bank"`: 182,000
- `protein PDB`: 142,000
- `"Human Genome Project"`: 86,600
- `crystallography COD`: 49,800
- `"UniProt"`: 47,400
- `dbGaP`: 44,200
- `"database of Genotypes and Phenotypes"`: 40,700
- `"Sloan Digital Sky Survey"`: 38,000
- `"Inter-university Consortium for Political and Social Research"`: 23,800
- `ICPSR data`: 20,800
- `HGP genome`: 11,900
- `"Crystallography Open Database"`: 596
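If you want to rerun these searches yourself, the small sketch below turns each query into a Google Scholar search link. It assumes the `https://scholar.google.com/scholar?q=` URL form. The hit count still has to be read off each results page by hand, and the `QUERIES` list and `scholar_url` helper are just illustrative names, not part of any official tool.

```python
from urllib.parse import urlencode

# The queries from the list above, written exactly as typed. Double quotes
# ask Google Scholar to match the enclosed words as an exact phrase.
QUERIES = [
    '"PubMed"',
    '"MEDLINE"',
    '"Genbank"',
    '"Protein Data Bank"',
    "protein PDB",
    '"Human Genome Project"',
    "crystallography COD",
    '"UniProt"',
    "dbGaP",
    '"database of Genotypes and Phenotypes"',
    '"Sloan Digital Sky Survey"',
    '"Inter-university Consortium for Political and Social Research"',
    "ICPSR data",
    "HGP genome",
    '"Crystallography Open Database"',
]

# Assumed search endpoint; hypothetical helper below only builds links.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(query: str) -> str:
    """Return a Google Scholar search URL for one query string."""
    # urlencode percent-escapes the double quotes and turns spaces into '+'.
    return f"{SCHOLAR_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for q in QUERIES:
        print(scholar_url(q))
```

Opening the printed links one by one keeps the process manual, which matches how the counts above were gathered.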
Of the hit counts listed above, the PubMed one is an outlier, because many articles simply carry a PubMed ID rather than actually using PubMed. The MEDLINE results include things like papers by people named Medline, but most are probably legitimate for our purposes. By contrast, the gap between crystallography COD and "Crystallography Open Database" (49,800 vs. 596 hits, a factor of about 80) illustrates that this method may well be off target by nearly two orders of magnitude.
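To see how far the abbreviated and full-name queries diverge for each resource, one can take the base-10 logarithm of the ratio of their hit counts. The sketch below does that for the five resources searched both ways. The pairing of queries and the `orders_of_magnitude` helper are my own illustrative choices; the counts are the ones recorded in the list.

```python
import math

# Google Scholar hit counts from the list, as
# (abbreviated query, full-name query) for each resource.
PAIRS = {
    "Protein Data Bank": (142_000, 182_000),  # protein PDB vs "Protein Data Bank"
    "dbGaP": (44_200, 40_700),  # dbGaP vs "database of Genotypes and Phenotypes"
    "ICPSR": (20_800, 23_800),  # ICPSR data vs the full consortium name
    "Human Genome Project": (11_900, 86_600),  # HGP genome vs "Human Genome Project"
    "Crystallography Open Database": (49_800, 596),  # crystallography COD vs full name
}


def orders_of_magnitude(abbrev_hits: int, full_hits: int) -> float:
    """Return log10 of the abbreviation/full-name hit ratio.

    Positive values mean the abbreviated query returned more hits,
    negative values mean it returned fewer.
    """
    return math.log10(abbrev_hits / full_hits)


if __name__ == "__main__":
    for name, (abbrev, full) in PAIRS.items():
        print(f"{name:32s} {orders_of_magnitude(abbrev, full):+.2f}")
```

Only the COD pair approaches two orders of magnitude (about +1.92). The Human Genome Project pair differs by a factor of about 7, and the remaining pairs agree to within 30%.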
This post has been migrated from the Open Science private beta at Stack Exchange (A51.SE).