Select columns by name #25

Closed
Llammissar opened this issue Feb 2, 2017 · 7 comments

@Llammissar

Another day, another feature request. ;)

I was doing some ad hoc data mangling late yesterday and kept thinking it would be very useful to be able to refer to columns by name. Part of it was the constant checking of which data had which column number. Especially once I started using multiple tools in pipelines, it would have saved time to be able to just call things by name. A simple example:
tsv-join -k 1,2 -f transaction-rate.tsv -a 3 average-crossover-times.tsv | tsv-select -f 1,2,7 --rest last
I'd find it more readable (and thus better for use in larger scripts) to be able to do something like this:
tsv-join -k cores,threads -f transaction-rate.tsv -a 't/m' average-crossover-times.tsv | tsv-select -f cores,threads,'t/m' --rest last
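As a rough illustration of the requested behavior (not tsv-utils code), the Python sketch below moves the named columns of a tab-separated row to the front and appends the remaining columns in their original order, which is the effect `--rest last` has for numbered fields. The function name `select_by_name` and the sample data are hypothetical.

```python
def select_by_name(header, row, names):
    """Return (header, row) with the named columns first and the rest after.

    Hypothetical helper for illustration; it is not part of tsv-utils.
    """
    # Map each header name to its 0-based position.
    position = {name: i for i, name in enumerate(header)}
    missing = [n for n in names if n not in position]
    if missing:
        raise KeyError(f"unknown column(s): {', '.join(missing)}")

    chosen = [position[n] for n in names]
    chosen_set = set(chosen)
    # Columns that were not named keep their original relative order.
    rest = [i for i in range(len(header)) if i not in chosen_set]
    order = chosen + rest
    return [header[i] for i in order], [row[i] for i in order]


# Made-up sample data, used only to exercise the helper.
header = "id\tcores\tthreads\tt/m".split("\t")
row = "a1\t4\t8\t120.5".split("\t")
new_header, new_row = select_by_name(header, row, ["cores", "threads", "t/m"])
print("\t".join(new_header))
print("\t".join(new_row))
```

The point of the sketch is that the script no longer depends on which position each column happens to occupy in a given file.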

@jondegenhardt
Contributor

Yup, agreed. It's on the list.

@nickray

nickray commented Jul 27, 2017

Is there any progress on this issue? If not, I would like to start implementing this, working towards a pull request. Are there any code guidelines besides https://ebay.github.io/tsv-utils-dlang/docs/AboutTheCode.html?

@jondegenhardt
Contributor

@nickray No, I haven't had time recently to do this. It'd be great if you wanted to do it. Code guidelines are the same as Phobos (see D Style). The other key things are consistency between all the tools, primarily command line argument consistency, and unit tests. It won't be hard, but several of the tools will require some restructuring at the top-level, tsv-summarize especially. Helper functions/classes would probably make this simpler. Still, it won't be a trivial amount of work.

If you take this on, I suggest starting with one or two tools and submitting a pull request before proceeding. That'll give me a chance to provide feedback.

@wavefancy

Is there any progress on this issue? Very useful feature. :-)

@jondegenhardt
Contributor

@wavefancy I haven't done any further work on specifying columns via names. Not sure when I'll get to it; I don't have time for it currently. Though not hard, it's a reasonable amount of work, because at present all the tools assume the column numbers are known at the conclusion of command line argument processing. Column names can't be mapped to numbers until the header line has been read, so there's a bit of design work involved to change this so it works smoothly and consistently across all the tools.
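To make the design issue concrete, here is a minimal Python sketch (the tools themselves are written in D) of deferring field resolution: argument parsing keeps the raw field list, and names are converted to 1-based column numbers only after the header line has been read. `resolve_fields` is a hypothetical name, and treating an all-digit entry as a column number is an assumption made for this sketch, not documented tsv-utils behavior.

```python
def resolve_fields(field_list, header_line, delim="\t"):
    """Convert a comma-separated field list to 1-based column numbers.

    Hypothetical helper: entries made only of digits are treated as column
    numbers (an assumption); anything else is looked up in the header.
    """
    header = header_line.rstrip("\n").split(delim)
    numbers = []
    for entry in field_list.split(","):
        if entry.isdecimal():
            number = int(entry)
            if not 1 <= number <= len(header):
                raise ValueError(f"field number out of range: {number}")
        else:
            try:
                number = header.index(entry) + 1
            except ValueError:
                raise ValueError(
                    f"field name not found in header: {entry!r}"
                ) from None
        numbers.append(number)
    return numbers


# Phase 1: after command line parsing, only the raw text is known.
raw_fields = "cores,threads,t/m"

# Phase 2: once the first input line is available, resolve to numbers.
header_line = "cores\tthreads\trate\tt/m\n"  # made-up header for illustration
print(resolve_fields(raw_fields, header_line))
```

Any setup a tool currently performs right after argument parsing would have to move to after this second phase, which is why the change touches the top-level structure of each tool.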

@jondegenhardt
Contributor

Did a review of this, and some of the key internal building blocks are now available in the code. In particular, the "fields list" processing code and BufferedByLine in utils.d provide some of the key abstractions. It'll still be a reasonable bit of work, and I don't know when I'll get to it, but it's perhaps not as far off as I once thought.

@jondegenhardt
Contributor

jondegenhardt commented Jul 11, 2020

Enhancement complete as part of release v2.0.0. Yah!

5 participants