Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Golang []byte in vertica? #87

Closed
ghost opened this issue Jun 18, 2020 · 9 comments
Closed

Golang []byte in vertica? #87

ghost opened this issue Jun 18, 2020 · 9 comments
Labels
question Further information is requested

Comments

@ghost
Copy link

ghost commented Jun 18, 2020

What is the vertica data type for golang []byte?

@sitingren sitingren added the question Further information is requested label Jun 18, 2020
@sitingren
Copy link
Member

I think the answer is BINARY, VARBINARY, LONG VARBINARY. But your question is unclear if you are reading from/writing to a table with vertica-sql-go.

@watercraft
Copy link

The Vertica type varbinary and friends makes sense to me but this driver encodes them as hex on the way out which makes it hard to write Scan and Value function that will also work with the Postgres driver. Here is a patch I'm working on to address this:

diff --git a/vendor/github.com/watercraft/vertica-sql-go/msgs/febindmsg.go b/vendor/github.com/watercraft/vertica-sql-go/msgs/febindmsg.go
index 15375018..2115e5c3 100644
--- a/vendor/github.com/watercraft/vertica-sql-go/msgs/febindmsg.go
+++ b/vendor/github.com/watercraft/vertica-sql-go/msgs/febindmsg.go
@@ -36,6 +36,8 @@ import (
        "database/sql"
        "database/sql/driver"
        "fmt"
+       "log"
+       "reflect"
        "time"
 )
 
@@ -68,6 +70,7 @@ func (m *FEBindMsg) Flatten() ([]byte, byte) {
        var strVal string
 
        for _, arg := range m.NamedArgs {
+               log.Printf("ALAN type %s", reflect.TypeOf(arg.Value))
                switch v := arg.Value.(type) {
                case int64, float64:
                        strVal = fmt.Sprintf("%v", v)
@@ -84,6 +87,9 @@ func (m *FEBindMsg) Flatten() ([]byte, byte) {
                        continue
                case time.Time:
                        strVal = v.Format("2006-01-02T15:04:05.999999Z07:00")
+               case []uint8:
+                       strVal = string(v)
+                       log.Printf("ALAN strVal %s", strVal)
                default:
                        strVal = "??HELP??"
                }
diff --git a/vendor/github.com/watercraft/vertica-sql-go/rows.go b/vendor/github.com/watercraft/vertica-sql-go/rows.go
index 4db0516b..cd2cea11 100644
--- a/vendor/github.com/watercraft/vertica-sql-go/rows.go
+++ b/vendor/github.com/watercraft/vertica-sql-go/rows.go
@@ -111,8 +111,11 @@ func (r *rows) Next(dest []driver.Value) error {
                        dest[idx], _ = parseTimestampTZColumn(string(colVal) + r.tzOffset)
                case common.ColTypeTimestampTZ:
                        dest[idx], _ = parseTimestampTZColumn(string(colVal))
-               case common.ColTypeVarBinary, common.ColTypeLongVarBinary, common.ColTypeBinary: // to []byte - this one's easy
+                       //case common.ColTypeVarBinary, common.ColTypeLongVarBinary, common.ColTypeBinary: // to []byte - this one's easy
+               case common.ColTypeLongVarBinary, common.ColTypeBinary: // to []byte - this one's easy
                        dest[idx] = hex.EncodeToString(colVal)
+               case common.ColTypeVarBinary:
+                       dest[idx] = colVal
                default:
                        dest[idx] = string(colVal)
                }

@watercraft
Copy link

Example of Value()/Scan() that works with Postgres's JSONB type. I'd like to use these same functions for Vertica's VARBINARY.

type StringSlice []string

func (s StringSlice) Value() (driver.Value, error) {
	j, err := json.Marshal(s)
	return j, err
}

func (s *StringSlice) Scan(src interface{}) error {
	val, ok := src.([]byte)
	if !ok {
		return fmt.Errorf("StringSlice: Scan of inappropriate type")
	}
	err := json.Unmarshal(val, s)
	if err != nil {
		return err
	}
	return nil
}

@watercraft
Copy link

The strategy above is working great for JSON because it is all printable. It seems that the Vertica on the wire protocol is encoding non-printable binary values in some multi-byte sequence. For example, I can clearly see that a varbinary column has a length of 691 using a select in vsql. However, the slice returned from bedatarowmsg.Chunk() has a length of 1982. Does anyone know what this encoding is and how to decode it in Go?

@watercraft
Copy link

watercraft commented Jun 12, 2021

This seems to do the trick. The varbinary data is coming out of bedatarowmsg.Chunk() with non-printables escaped as backslash octals with backslash itself escaped as double backslash.

index 7f6ffcb..455b4d0 100644
--- a/rows.go
+++ b/rows.go
@@ -110,8 +110,25 @@ func (r *rows) Next(dest []driver.Value) error {
                        dest[idx], _ = parseTimestampTZColumn(string(colVal) + r.tzOffset)
                case common.ColTypeTimestampTZ:
                        dest[idx], _ = parseTimestampTZColumn(string(colVal))
-               case common.ColTypeVarBinary, common.ColTypeLongVarBinary, common.ColTypeBinary: // to []byte - this one's easy
-                       dest[idx] = colVal
+               case common.ColTypeVarBinary, common.ColTypeLongVarBinary, common.ColTypeBinary:
+                       // to []byte; convert escaped octal (eg \261) into byte with \\ for \
+                       var out []byte
+                       for len(colVal) > 0 {
+                               c := colVal[0]
+                               if c == '\\' {
+                                       if colVal[1] == '\\' {
+                                               colVal = colVal[2:]
+                                       } else {
+                                               x, _ := strconv.ParseInt(string(colVal[1:4]), 8, 32)
+                                               c = byte(x)
+                                               colVal = colVal[4:]
+                                       }
+                               } else {
+                                       colVal = colVal[1:]
+                               }
+                               out = append(out, c)
+                       }
+                       dest[idx] = out
                default:
                        dest[idx] = string(colVal)
                }

@watercraft
Copy link

The challenge with the above solution is writing the varbinary values from []byte in Go using database/sql. Using param substitution with jmoiron/sqlx I got an error from Vertica. I was able to get it working by using the hex input syntax X'' and converting my []byte field to hex in the client code: fmt.Sprintf("update table1 set col1=X'%s'", hex.EncodeToString(field1))

@watercraft
Copy link

Again, examining the python driver I found a solution:

index 33ee85b..e2a373f 100644
--- a/msgs/febindmsg.go
+++ b/msgs/febindmsg.go
@@ -33,6 +33,7 @@ package msgs
 // THE SOFTWARE.
 
 import (
+       "bytes"
        "database/sql"
        "database/sql/driver"
        "fmt"
@@ -85,7 +86,10 @@ func (m *FEBindMsg) Flatten() ([]byte, byte) {
                case time.Time:
                        strVal = v.Format("2006-01-02T15:04:05.999999Z07:00")
                case []uint8:
-                       strVal = string(v)
+                       v = bytes.ReplaceAll(v, []byte("\\"), []byte("\\134"))
+                       buf.appendUint32(uint32(len(v)))
+                       buf.appendBytes(v)
+                       continue
                default:
                        strVal = "??HELP??"
                }

@sitingren
Copy link
Member

@watercraft Thanks for the discussion. There are two directions of [var]binary data transfer:

  • from client to server: The server's default format for binary data allows you to escape non-printable ASCII values as a \xxx octal string. The client has to encode binary data as a UTF-8 string. The server is smart enough to handle non-printable bytes, except for \ itself. So the cilent needs to escape the byte value \ with "\134"(octal for backslash) -- 3 extra characters. You are right to refer to the python client.
  • from server to client: This is a known issue for both python and go client. The server sends a byte array encoded as text, the client has to convert it back. The team is discussing whether changing the server behavior or adding a fix for python and go client.

@sitingren
Copy link
Member

@watercraft UPDATE: Our team discussed the second issue in my above comment. We decide to change the server to send binary data directly, in a future release. We also have to adding a fix for python and go client because of backwards compatibility for old servers, which can be done right now.

It would be a big help if you can create a pull request for the last two code snippets you provide and add some tests. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants