Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some result operators (e.g. Distinct) after selecting a property generate incorrect SQL if query also contains optional navigation or explicit GroupJoin call #7348

Closed
kierenj opened this issue Jan 4, 2017 · 4 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Milestone

Comments

@kierenj
Copy link

kierenj commented Jan 4, 2017

Same model as #6937, #6938, #7102, #7230.

This code:

            var identities = _context.Set<WidgetManufacturedIdentity>();
            var result = identities.Where(i => i.Manufacturer.Id == id && !i.Manufacturer.Retired.HasValue)
                .Select(i => i.RangeName)
                .Distinct()
                .ToArray()
                .Where(r => !string.IsNullOrEmpty(r))
                .Select(r => new { name = r });

Should generate a distinct set of RangeName property values. Instead, it retrieves all values.

Log output:

info: Microsoft.EntityFrameworkCore.Storage.IRelationalCommandBuilderFactory[1]
      Executed DbCommand (33ms) [Parameters=[@__id_0='?'], CommandType='Text', CommandTimeout='30']
      SELECT DISTINCT [i].[WidgetId], [i].[BrandName], [i].[ManufacturerId], [i].[ProductName], [i].[ProductRef], [i].[RangeName], [i.Manufacturer].[Id], [i.Manufacturer].[Added], [i.Manufacturer].[ListOrder], [i.Manufacturer].[Modified], [i.Manufacturer].[Name], [i.Manufacturer].[Retired]
      FROM [WidgetManufacturedIdentity] AS [i]
      LEFT JOIN [WidgetManufacturer] AS [i.Manufacturer] ON [i].[ManufacturerId] = [i.Manufacturer].[Id]
      WHERE ([i.Manufacturer].[Id] = @__id_0) AND [i.Manufacturer].[Retired] IS NULL

As you can see, the DISTINCT is there, but the argument list to the SELECT is not limited to the select list I specified.

Further technical details

EF Core version: 1.2.0-preview1-22878
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows 10.0.14393
IDE: VS2015, commandline, Azure App Service

@maumar
Copy link
Contributor

maumar commented Jan 5, 2017

This happens because #6647 (left join on WidgetManufacturer)

However for this case we now produce incorrect results. Before the change, Distinct was done on the client, based on only one column and therefore produced the correct results. We should aim to fix this in 1.1.1 @divega

@divega
Copy link
Contributor

divega commented Jan 5, 2017

@maumar sounds good. Feel free to assign it.

@maumar maumar added this to the 1.1.1 milestone Jan 5, 2017
@maumar maumar self-assigned this Jan 5, 2017
@maumar maumar added the type-bug label Jan 5, 2017
@maumar maumar changed the title Distinct() after Select()ing a string property doesn't generate correct SQL Distinct() after Select()ing a string property doesn't generate correct SQL if query also contains optional navigation Jan 6, 2017
@maumar maumar changed the title Distinct() after Select()ing a string property doesn't generate correct SQL if query also contains optional navigation Distinct() after Select()ing a property doesn't generate correct SQL if query also contains optional navigation Jan 6, 2017
maumar added a commit that referenced this issue Jan 10, 2017
…nerate correct SQL if query also contains optional navigation

Problem was that optional navigation is translated to GroupJoin which in turn forces materialization on entire entity. This causes some result operators (e.g. Distinct) to be applied on all columns of that entity, even if the customer specified a subset of columns to perform Distinct on.

Fix is to evaluate result operators on the client if their result depends on whether it was applied to a single column or entire row, and the query also contains GroupJoin.
Operators that can still be safely translated to SQL: Skip, Take, FirstOrDefault, Count, All, Any.
maumar added a commit that referenced this issue Jan 11, 2017
…nerate correct SQL if query also contains optional navigation

Problem was that optional navigation is translated to GroupJoin which in turn forces materialization on entire entity. This causes some result operators (e.g. Distinct) to be applied on all columns of that entity, even if the customer specified a subset of columns to perform Distinct on.

Fix is to evaluate result operators on the client if their result depends on whether it was applied to a single column or entire row, and the query also contains GroupJoin.
Operators that can still be safely translated to SQL: Skip, Take, FirstOrDefault, Count, All, Any.
maumar added a commit that referenced this issue Jan 12, 2017
…nerate correct SQL if query also contains optional navigation

Problem was that optional navigation is translated to GroupJoin which in turn forces materialization on entire entity. This causes some result operators (e.g. Distinct) to be applied on all columns of that entity, even if the customer specified a subset of columns to perform Distinct on.

Fix is to evaluate result operators on the client if their result depends on whether it was applied to a single column or entire row, and the query also contains GroupJoin.
Operators that can still be safely translated to SQL: Skip, Take, FirstOrDefault, Count, All, Any.
@maumar
Copy link
Contributor

maumar commented Jan 12, 2017

Fixed in 858a34f

@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 12, 2017
@maumar maumar changed the title Distinct() after Select()ing a property doesn't generate correct SQL if query also contains optional navigation Some result operators (e.g. Distinct) after selecting a property generate incorrect SQL if query also contains optional navigation or explicit groupjoin call Jan 12, 2017
@maumar maumar changed the title Some result operators (e.g. Distinct) after selecting a property generate incorrect SQL if query also contains optional navigation or explicit groupjoin call Some result operators (e.g. Distinct) after selecting a property generate incorrect SQL if query also contains optional navigation or explicit GroupJoin call Jan 12, 2017
@Eilon
Copy link
Member

Eilon commented Jan 19, 2017

This patch bug is approved. Please use the normal code review process w/ a PR and make sure the fix is in the correct branch, then close the bug and mark it as done.

@maumar maumar closed this as completed Jan 19, 2017
maumar added a commit that referenced this issue Mar 10, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 11, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 14, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
maumar added a commit that referenced this issue Mar 15, 2017
…ulting in unnecessary data pulling

Problem was that for GroupJoin we would always force materialization on participating query sources.
We were doing this because we are not always able to correctly divide outer elements of the GroupJoin into correct groups.
However, if the GroupJoin clause is wrapped around SelectMany clause the groups don't matter because they are getting flattened by SelectMany anyway.

Fix is to recognize those scenarios and only force materialization when the correct grouping actually matters.
We can avoid materialization if the GroupJoin is followed by SelectMany clause (that references the grouping) and that the grouping itself is not present anywhere else in the query.
This addresses optional navigations, which is the 80% case. Manually created GroupJoins that are not modeling LeftOuterJoins still require additional materialization, but this can be addressed later as the priority is not nearly as high.

Other bugs fixed alongside the main change:

#7722 - Query : error during compilation for queries with navigation properties and First/Single/client method operators inside a subquery.

Problem here was that for some queries we don't know how to properly bind to a value buffer (when the result of binding is subquery, and not qsre).
Fix/mitigation is to recognize those scenarios and force materialization on the subqueries. This can be properly addressed in later commit (i.e. by improving the binding logic)

#7497 - Error calling Count() after GroupJoin()

(and removed the temporary fix to #7348 and implemented a proper one)
This turned out to be a chicken-and-egg problem - we had discrepancy between query sources that we marked for materialization in case of GroupJoin + result operator, and the actual client side operation that had to be performed.
We had to fix the group join materialization to allow for translation of those result operators, but without fixing this issue also we would get invalid queries in some cases.
Proper fix is to force client result operator only if the GroupJoin cannot be flattened. We also now do this for all result operators, not just a subset because it was possible to generate invalid queries with operaors like Count().
This is now consistent with the behavior of RequiresMaterializationExpressionVisitor.

Also fixed several minor issues that were encountered once no longer do extensive materialization.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Projects
None yet
Development

No branches or pull requests

4 participants