Questions tagged [best-practice]
Best-practice questions generally involve a short guide or tutorial related to an outstanding example of the use or implementation of Open Data principles or practices.
57 questions
1
vote
0
answers
41
views
How can I sufficiently rigorously create a list from news reports?
For a research paper I'm currently working on, I need to compile a dataset based on various news reports. Obviously, there's no way I can be sure that my list is exhaustive, but how can I compile data ...
1
vote
0
answers
30
views
Preserving user-privacy whilst enabling strangers to perform data analysis on some form of private data
Context
This question is about the intent to analyse taskwarrior data of people to improve automatic schedulers of tasks for individual taskwarrior users. This intent is accompanied by two desires. ...
1
vote
1
answer
92
views
Patient Workflow Steps in MIMIC-III
I'm trying to build a workflow model for subjects in the MIMIC-III data set. There's clearly no shortage of activities in here but if I were trying to track the major steps in the process (treatment ...
2
votes
0
answers
56
views
How to avoid duplication in a data lifecycle?
Here is an example.
I have a 1 TB CSV file; I need to preserve the raw file and then sanitize it.
After sanitization, nearly 10% of the entries have changed.
Both files need to be stored for years.
So,...
3
votes
1
answer
42
views
reference to share on best practices [closed]
I'd like to find a best practices document to share with the developer of an internal DB interface. Specifically I'd like to show that the approach that has been used for 15+ years results in output ...
3
votes
1
answer
67
views
How strictly are "universal design" standards enforced in open data?
"Universal design" is a U.S. government standard that forces items, in this case metadata, to be accessible to people with disabilities.
Apparently universal design standards were promulgated some ...
3
votes
0
answers
48
views
Examples and experiences of text and data mining use of open data?
I am currently working on a research project that looks at the barriers and best practices when it comes to text and data mining.
www.FutureTDM.eu
Do you have any experience with this in relation to ...
3
votes
0
answers
95
views
How does open data differ in quality depending on the source?
Let's talk about government data, e.g. data.gov, first. Putting aside "outliers," are most government data sets guided by e.g. "mandates" by the government? If not, how does the fact that say weather, ...
3
votes
1
answer
110
views
Is it possible to request an api key for each one of my users, using their registered email?
I am using the USDA food database in an app; however, I will not be able to use it in the live version because of the API request limitations. One workaround would be to give each registered user ...
1
vote
1
answer
60
views
Should the testing of my learning algorithm be restricted to standard datasets, or can I use any dataset to publish my results?
I've trained a speech recognition model using the LibriSpeech database which isn't as widely used as other datasets like TIMIT or MNIST. Does that factor in any way or can I publish the results ...
4
votes
1
answer
111
views
methods to build your own data set from public domain data sources
Data sets can originate from government sources, corporations, or individuals.
When an individual collects data, the exercise of collecting data falls into at least two categories:
The individual ...
6
votes
1
answer
111
views
How to collect hand-writing data
I'm working on a machine learning project that requires that I collect hand-writing data of alphabets from an uncommon language (I'm 100% sure there is no available data out there). My question is how ...
1
vote
2
answers
65
views
To whom does "author" refer when using schema.org's "MusicAlbum" schema?
I am trying to encode reviews of music albums. If I understand the "MusicAlbum" schema correctly, the reviewer is the "author" of the "review" content? Can anyone point me to examples of well-...
5
votes
3
answers
312
views
How should US SSNs be anonymized?
I have just gotten access to a US government dataset. It is not open, but could eventually be made open. The dataset includes hashed US SSNs. It looks like they used some general hashing function that ...
8
votes
6
answers
401
views
Cases where open data has been removed?
I'm looking for cases where data (including software) that once was open is not anymore, or was not for periods of several months or more.
I can imagine several scenarios for this, including
Technical ...