Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

istream >> json --- 1st character skipped in stream #976

Closed
us-er-name opened this issue Feb 21, 2018 · 15 comments
Closed

istream >> json --- 1st character skipped in stream #976

us-er-name opened this issue Feb 21, 2018 · 15 comments
Labels
platform: visual studio related to MSVC solution: invalid the issue is not related to the library

Comments

@us-er-name
Copy link

BUG: The first character in the stream file is skipped.

I took the single include source and used it in some test code.
Source code posted about Feb 13
https://github.com/nlohmann/json/blob/develop/single_include/nlohmann/json.hpp

What I did amounted to:

// Extracted from my code
json j;
std::ifstream is("filename");
is >> j;

Tracing thru the source, I found this behaviour is the result of the code looking for a BOM at the start of the UTF-8 stream:
It fetches the first character. Since it is not part of the BOM, it performs:
Line 1669 is.unget(); // no byte order mark; process as usual

Which is what it should do BUT ...
when get() is called shortly after:

Line 3024    token_type scan()
			{
				// read next character and ignore whitespace
				do
				{
					get();
			...

the variable 'current' contains the second char, as if unget() did not happen.

Not sure as to why. Repeated get() do get all subsequent chars one by one.

Since this was just a test, the workaround is to add in a return and start text on line two. Then it works fine. Actually, to verify the bug, placing ANY character in the first position shows that character is ignored.

General settings:
Using MSVC C++ V17 (v141)
MFC in shared DLL
EXE w/Unicode support. no CLR support.

Text file loaded, edited, saved in MSVC IDE

Tested on source uploaded 8 days ago and also reproduced in a version before that.

@nlohmann
Copy link
Owner

I cannot reproduce this issue. We have a lot of test cases reading from files, and all these tests are executed on numerous compilers, including MSVC 2017 (see the end of section https://github.com/nlohmann/json#supported-compilers).

I tried to add a character a to the beginning of a valid JSON file and got the (expected) exception:

libc++abi.dylib: terminating with uncaught exception of type nlohmann::detail::parse_error:
[json.exception.parse_error.101] parse error at 1: syntax error - invalid literal; last read: 'a'

(using Xcode Version 9.2 (9C40b))

I would need more information on this.

@nlohmann nlohmann added platform: visual studio related to MSVC state: needs more info the author of the issue needs to provide more details labels Feb 21, 2018
@us-er-name
Copy link
Author

Ok. Now I feel stupid. I kinda thought this "bug" could not have slipped thru.

I went thru my original code and figured out what causes unget() to fail.
The result is strange (to me) so I figured I'd post this result for reference.

unget() fails silently, even though I read that unget() should set failbit or badbit if unsuccessful (which it didn't).

The simplified code (below) verifies the issue and was created as a Windows Console App:

// JsonTest.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <fstream>
#include "json.hpp"
using json = nlohmann::json;

static const char * fname = "test.ini";

int main()
{
	json j;


	// If the stream is opened twice (like shown below) then the json stream read
	// will skip the 1st char.
	// If opened properly (once) then the file is read in properly.

	std::ifstream is(fname);  // <*** BAD. File will be opened again (below).
//	std::ifstream is; // Ok. File is opened below

	is.open(fname, std::ifstream::in); // If the stream is already open then this messes up the json load
	if (!is.is_open()) return -1;

	is >> j; // If opened twice, the 1st char is missed.

    return 0;
}

It was only when I totally exhausted my list of runtime checks and did all my tracing that I hesitantly posted this in the first place - unget() claimed success even though it failed.

Sorry about wasting your time.

@nlohmann
Copy link
Owner

No worries! Thanks for reporting, and also for following up!

@nlohmann nlohmann added solution: invalid the issue is not related to the library and removed state: needs more info the author of the issue needs to provide more details labels Feb 21, 2018
@abolz
Copy link
Contributor

abolz commented Feb 22, 2018

Hi,

IMHO this is actually a real bug. In the example above the input stream is has the failbit set (right before is >> j) and the library should not extract a valid json from it. For example, if the input file contains the number 10, the current behavior of the code is:

  • Extract a valid json from is containing the number 0 (instead of 10 !!!)
  • Clear the failbit of is
  • No exception is thrown; i.e. from the user's point of view, everything is ok and j is valid.

This is... unfortunate. After the call is >> j; there is no way for the user to check if something went wrong.

The input_stream_adapter implementation should really use the istream interface instead of using the underlying streambuf directly (and not clear any stream flags). And never succeed to get a valid json object out of the stream if the stream is actually invalid. I understand that using the streambuf directly is much faster, but the istream interface does much more than just extracting characters from the underlying streambuf. E.g. setting the failbit, eofbit etc. and if is.exceptions(...) has been called throws exceptions, etc...

The library could provide a input_streambuf_adapter which uses the streambuf directly (and works more or less exactly like the current input_stream_adapter) and could be used like

std::istream is(fname);
json j = json::parse(is.rdbuf());

The user would be responsible for testing/setting the stream flags and probably guarding the above code by first constructing an istream::sentry object.

@nlohmann nlohmann reopened this Feb 22, 2018
@nlohmann nlohmann added state: please discuss please discuss the issue or vote for your favorite option and removed solution: invalid the issue is not related to the library labels Feb 22, 2018
@abolz
Copy link
Contributor

abolz commented Feb 22, 2018

Just an idea, I haven't read through the source very carefully yet, but

I think that stream parsing can be made more conforming by switching from get/unget to peek/consume/at_end in input_adapter_protocol. (The only part which currently requires unget is skipping the BOM mark (?), but there is actually no need for that: If the sequence starts with \xEF is must be a BOM, no valid JSON can start with this byte.)

I don't expect using istream::peek/get will break too much (any?) user code. I'm quite confident that code like

std::istream in(fname);
is >> j;

will continue to work (if the stream is valid) as before. (This is 100% of my use cases actually).

But I really need to write some code before commenting further, to see if it works out ;-)

@nlohmann
Copy link
Owner

unget is also required when parsing numbers.

@abolz
Copy link
Contributor

abolz commented Feb 26, 2018

Thanks @nlohmann. I have implemented get_character in terms of is.peek/ignore so that no code in the (binary-)lexer needs to be adjusted and I think I got it working now. I'll try to submit a PR later today.

@stale
Copy link

stale bot commented Mar 28, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Mar 28, 2018
@nlohmann
Copy link
Owner

nlohmann commented Apr 2, 2018

In #834 we discussed that the unget_character function in the input adapter protocol could be removed, because it is only used in one situation in the JSON lexer, and even there can be simulated by adding a state variable. In aa89c5e I added a patch to the SAX2 branch which also removes the BOM handling from the input stream adapter.

Maybe this can also simplify PR #984.

@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Apr 2, 2018
@stale
Copy link

stale bot commented May 2, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 2, 2018
@stale stale bot closed this as completed May 9, 2018
@xgdgsc
Copy link

xgdgsc commented May 12, 2018

I have a program that opens the same json file with this stream approach twice in two functions:

std::ifstream i(jsonFIle);
json j_conf;
i >> j_conf;

It would through the exception of parse error on the second function:

[json.exception.parse_error.101] parse error at 2: syntax error - invalid literal; last read: '[C'

Is it related to this issue? Is there any workaround for now?

@nlohmann nlohmann reopened this May 12, 2018
@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label May 12, 2018
@gregmarr
Copy link
Contributor

Is the same file open twice at the same time, or does one function open, read, and close the function before the second one opens it?

@xgdgsc
Copy link

xgdgsc commented May 12, 2018

The first function should be closed/returned before the second one opens it.

@xgdgsc
Copy link

xgdgsc commented May 13, 2018

Oh. I found the reason, it' s my fault. 😅

@stale
Copy link

stale bot commented Jun 12, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jun 12, 2018
@stale stale bot closed this as completed Jun 19, 2018
@nlohmann nlohmann added solution: invalid the issue is not related to the library and removed state: please discuss please discuss the issue or vote for your favorite option state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated labels Dec 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform: visual studio related to MSVC solution: invalid the issue is not related to the library
Projects
None yet
Development

No branches or pull requests

5 participants