Backup File format: JSON compliant + Only includes set attributes + One line per record #75

porg · 2022-11-10T10:35:01Z

.osxmeta JSON v0.99.37

✅ One record per row
✅ Only contains attributes whose value is set
❌ Malformed JSON — QuickLook plugins joked on it was filed as:
Validators say osxmetadata.json is malformed JSON #57

.osxmeta JSON v1.0.0 — This update brought 1 improvement but 2 disadvantages

✅ Compliant JSON — Properly read/rendered in all apps
❌ Each record contains ALL 188 attributes, mostly only a handful set, the big majority are unset ("null")
❌ All attributes broken on single lines, tag attribute saved as array, each item on own row.

Which together cause these downsides:

Increases file size tremendously (bad in LAN use cases)
Terrible human readability:
- Hundreds of rows with just "null" attributes
- Cannot compare items to each other line by line as each items takes ca 180-250 lines

Proposed next JSON format

✅ Compliant JSON while
✅ Only contains attributes whose value is set
✅ One record takes only one row, also arrays fit in there without any line breaks

RhetTbull · 2022-11-10T14:34:25Z

JSON is a format for interoperability with computers, not humans. There are many tools such as jq (brew install jq) and json_pp for working for json files. I'm using the json library from the standard python library to output the json using standard JSON indentation rules. The backup file is currently output with an indentation of 2 spaces to make it at least human inspectable (but the goal is machine readability, not human). There are tools such as jd (brew install jd) for doing structural diff of JSON already. I do not plan to change the format. Similar to the discussion on #73, I believe that command line tools should make use of other specialized command line tools rather than implement all functionality in a single tool

The suggestion to backup only attributes that are non-null is worth considering. Some file formats such as TOML do this by default. I'll take a look at adding a filter for null values before outputting the JSON but I need to first ensure there are no 2nd order effects on the restore process from doing so.

Removed null values from backup JSON #75

RhetTbull · 2022-11-11T05:22:11Z

Version v1.1.0 strips null values from the backup JSON file. This fixes the issue of lots of null fields and improves readability somewhat. As stated above, I don't intend to fix the other formatting issues as I think there are other ways to address this (using 3rd party tools) and I'm content to use the python standard JSON library.

porg · 2022-11-11T09:32:07Z

At least the size optimization was achieved then.
On the issue of formatting as human readible as possible if it comes at almost no extra development/maintenance/performance cost, I don't agree, but I respect.
- Ofc there are tools for many things (thanks for your hints on jd and jq) but nothing beats having native good readability.
- Quick check of backup file for every average user with no reliance on extra software is value-able!
  - Ascertains a user "Did the backup work? Were all files included by my shell pattern?"
  - Especially for users not dealing with furthermore extra specialized software.
- And especially if the development/maintenance/performance is negligible, because I suspect most serializing frameworks have formatting/beautifying options, so it's just a matter of setting a few flags.
  - But if the python standard JSON library doesn't and it's not worth to go the extra mile, then I very much respect that ofc, you are the expert!

RhetTbull · 2022-11-11T15:35:35Z

On the issue of formatting as human readible as possible if it comes at almost no extra development/maintenance/performance cost, I don't agree, but I respect.

From your original issue, I think what you want is every record on a single line but this is not human readable. The only alternative is to use JSON indentation, which osxmetadata now does, but that also means that arrays are broken across lines -- this is standard formatting for JSON. So I don't really understand the alternative you want (other than writing a custom JSON parser that doesn't follow JSON conventions).

for every average user with no reliance on extra software

Average user shouldn't be inspecting the backup file. It's an internal thing used by osxmetadata and the format is subject to change. I considered using sqlite as the format which would have been even more difficult for average user to inspect. But my point is that the backup file is designed for the use of osxmetadata not the user. The fact that it's JSON and in #57 I made it well-formed so it could be inspected easily, is a bonus, not a feature.

Ascertains a user "Did the backup work? Were all files included by my shell pattern?"

You can use osxmetadata with the --verbose flag which prints out the name of each file being processed.

You could do one of the following:

grep _filepath .osxmetadata.json

of slightly more pretty:

jq '.[] | ._filepath' .osxmetadata.json

Both of these print the names (with full path) of the files in the backup file.

The .[] is needed in jq to tell jq to operate on the values of the array that are in the backup file (the backup file is actually one JSON object (array) with a bunch of values representing each file). I changed this in #57 to make the JSON well formed (though the previous "malformed" format was much easier to work with in osxmetadata).

If you wanted to inspect the filename and a specific key, for example, kMDItemFinderComment which is the Finder comment, you could do this:

jq '.[] | ._filepath, .kMDItemFinderComment' .osxmetadata.json | paste -d, - -

This command extracts the _filepath and kMDItemFinderComment keys and then prints them as comma separated values (CSV) that could then easily be imported into some other tool. The - - in paste tells it to take two values at a time then join them with , (specified by -d = delimiter).

most serializing frameworks have formatting/beautifying options,

The python JSON library allows you to set the indentation, or no indentation. I use indentation of 2 for osxmetadata. The alternative would be using no indentation and that results in one file record per line (this is what earlier versions of osxmetadata did) but that is completely unreadable without using a 3rd party tool. For my purposes, I use the bat command (a replacement for cat) to view JSON files and many other things because it uses colorized output and automatic paging. This makes it easy to quickly inspect the backup file which would not be possible if no indentation was used.

porg · 2022-11-12T15:12:49Z

Ascertain user that backup goes / went well

You convinces me that these are indeed sufficient:

--verbose during backup
or grep _filepath .osxmetadata.json for inspecting a backup file in retrospect

Readability

One file record per line is what I meant and deem as very well human readable
- If the filepaths and the tags are somehow of similar length
- Then one can read that almost like a spreadsheet

Excerpt .osxmetadata v0.99.37

{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Apple iPhone 4 (2010).mp4", "_filename": "Apple iPhone 4 (2010)", "com.apple.FinderInfo": {"color": 2, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Technology", 0], ["Entertainment", 0], ["\u2022Done", 2]], "com.apple.metadata:kMDItemFinderComment": ""}
{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Benneton (1998).avi", "_filename": "Benneton (1998).avi", "com.apple.FinderInfo": {"color": 4, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Clothing", 0], ["Anti-Racism", 0], ["Peace", 0], ["Cooperation", 0], ["\u2022Done but unimportant", 4]], "com.apple.metadata:kMDItemFinderComment": ""}
{"_version": "0.99.37", "_filepath": "/Volumes/Shared/Videos/Creative-Ads/Hugo Boss (1995).mp4", "_filename": "Hugo Boss (1995).mp4", "com.apple.FinderInfo": {"color": 2, "stationarypad": false}, "com.apple.metadata:_kMDItemUserTags": [["Clothing", 0], ["Youth", 0], ["\u2022Done", 2]], "com.apple.metadata:kMDItemFinderComment": ""}

For human readable JSON this was already as close as it gets
If only possible in a way which would allow this one record per row while staying standard compatible (I dunno the syntax too well to reshape that v0.99.37 sample accordingly)

As I see it TSV would be quite efficient in terms of storage

The keys which come with values most probably filled can be leftmost, and those less likely filled more on the right.
That way you have the "essence" on the left, and to the right the tendency to emptiness (only one TAB per unfilled attribute, or even just a linebreak right away if empty until the rightmost column.
QuickLookCSV.qlgenerator shows TSVs very nicely like a spreadsheet, with sticky header, then you can totally inspect it like a spreadsheet.
But yeah, the application then is not soo much a backup/restore, but more an offline catalog (keeping the osxmeta file of a remote directory or offline disk locally for quick lookup). But I just wanted to bring TSV into the discussion, maybe worth considering.

RhetTbull · 2022-11-12T16:24:53Z

I personally don't find the excerpt from v0.99.37 very readable because I don't like horizontal scrolling. However, the point is moot. That format is not valid JSON (each line is valid JSON, but the file is not...and there was an issue (#57) to request the file be valid JSON hence the current format. Valid JSON does not support one record per line in the form that osxmetadata needs the data. It would be one line with all the records or the current format.

TSV might be more human readable but the backup format is meant to read by osxmetadata and TSV would require more work. As it is, I can read the backup file and get it into the format osxmetadata needs in 3 lines of code:

with open(backup_file, mode="r") as fp:
    backup_records = json.load(fp)
backup_data = {data["_filename"]: data for data in backup_records}

I think you're trying to use the backup file for something it's not intended for. A better solution would be to add an option to osxmetadata -list to output the data in json, tsv, or csv and then you could create whatever report you wanted in whatever format. I'll open a separate issue for this feature.

porg · 2022-11-12T16:27:26Z

osxmetadata -list output to json, tsv, or csv. Yes! Fine!

RhetTbull added the enhancement New feature or request label Nov 10, 2022

RhetTbull added a commit that referenced this issue Nov 11, 2022

Merge pull request #77 from RhetTbull/feature-remove-json-null-75

6e173fa

Removed null values from backup JSON #75

RhetTbull closed this as completed Nov 11, 2022

RhetTbull mentioned this issue Nov 12, 2022

Add --format option to --list #78

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backup File format: JSON compliant + Only includes set attributes + One line per record #75

Backup File format: JSON compliant + Only includes set attributes + One line per record #75

porg commented Nov 10, 2022

RhetTbull commented Nov 10, 2022

RhetTbull commented Nov 11, 2022

porg commented Nov 11, 2022 •

edited

Loading

RhetTbull commented Nov 11, 2022

porg commented Nov 12, 2022

RhetTbull commented Nov 12, 2022 •

edited

Loading

porg commented Nov 12, 2022

Backup File format: JSON compliant + Only includes set attributes + One line per record #75

Backup File format: JSON compliant + Only includes set attributes + One line per record #75

Comments

porg commented Nov 10, 2022

.osxmeta JSON v0.99.37

.osxmeta JSON v1.0.0 — This update brought 1 improvement but 2 disadvantages

Proposed next JSON format

RhetTbull commented Nov 10, 2022

RhetTbull commented Nov 11, 2022

porg commented Nov 11, 2022 • edited Loading

RhetTbull commented Nov 11, 2022

porg commented Nov 12, 2022

Ascertain user that backup goes / went well

Readability

Excerpt .osxmetadata v0.99.37

As I see it TSV would be quite efficient in terms of storage

RhetTbull commented Nov 12, 2022 • edited Loading

porg commented Nov 12, 2022

porg commented Nov 11, 2022 •

edited

Loading

RhetTbull commented Nov 12, 2022 •

edited

Loading