Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Fix type widening with char/varchar columns #3744

Merged

Conversation

johanl-db
Copy link
Collaborator

Description

Using type widening on a table that contains a char/varchar column causes the following reads to fail with DELTA_UNSUPPORTED_TYPE_CHANGE_IN_SCHEMA:

CREATE TABLE t (a VARCHAR(10), b INT);
ALTER TABLE t SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');
ALTER TABLE t ALTER COLUMN b TYPE LONG;

SELECT * FROM t;
[DELTA_UNSUPPORTED_TYPE_CHANGE_IN_SCHEMA] Unable to operate on this table because an unsupported type change was applied. Field cut was changed from VARCHAR(10) to STRING`

Type changes are recorded in the table metadata and a check on read ensures that all type changes are supported by the current implementation as attempting to read data after an unsupported type change could lead to incorrect results.
CHAR/VARCHAR columns are sometimes stripped down to STRING internally, for that reason, ALTER TABLE incorrectly identify that column a type changed to STRING and records it in the type widening metadata.

The read check in turn doesn't recognize that type change as one of the supported widening type changes (which doesn't include changes to string columns).

Fix:

  1. Never record char/varchar/string type changes in the type widening metadata
  2. Never record unsupported type changes in the type widening metadata and log an assertion instead.
  3. Don't fail on char/varchar/string type changes in the type widening metadata if such type change slips through 1. This will prevent failing in case a non-compliant implementation still record a char/varchar/string type change.
  4. Provide a table property to bypass the check if a similar issue happens again in the future.

Copy link
Contributor

@cstavr cstavr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@johanl-db johanl-db self-assigned this Oct 2, 2024
@scottsand-db scottsand-db merged commit e70ca04 into delta-io:master Oct 2, 2024
16 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants