Borbin the 🐱

chatGPT result encoding

14 February, 2023


chatGPT returns the result as a UTF-8 byte sequence in text form: every byte of the UTF-8 encoding shows up as a separate character. Anything beyond 7-bit ASCII, for example accented characters or languages with other scripts, therefore turns into unreadable text.


For example, a result returned for the Spanish language:

Â¿QuÃ© habitaciones tienen disponibles?

Expected result:

¿Qué habitaciones tienen disponibles?


Result returned for the Japanese language:

どの部屋が利用可能ですか?  

Expected result:

どの部屋が利用可能ですか? 


To fix this, convert the result text back to bytes with the ISO-8859-1 encoding and decode those bytes as UTF-8.
For example, 'é' is encoded in UTF-8 as the byte sequence 0xC3 0xA9. Read as ISO-8859-1, 0xC3 is 'Ã' and 0xA9 is '©'.
So instead of 'é', chatGPT sends 'Ã©', which is the raw UTF-8 byte sequence.
The string 'Ã©' holds one character per byte of the sequence 0xC3 0xA9. To get the correct Unicode string, each character needs to be mapped back to its byte value.
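To see where 'Ã©' comes from, the following sketch encodes 'é' as UTF-8 and then reads those bytes back as ISO-8859-1, which reproduces the mis-decoded result. The character is built from its code point (U+00E9) so the script does not depend on how the .ps1 file itself is saved; the variable names are only illustrative.

```powershell
# Build 'é' from its code point so the script file's own encoding does not matter.
$original = [string][char]0x00E9

$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")

# Encode as UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
[byte[]]$bytes = $utf8.GetBytes($original)
$hex = ($bytes | ForEach-Object { '0x{0:X2}' -f $_ }) -join ' '

# Reading the same bytes as ISO-8859-1 gives one character per byte: 'Ã' and '©'.
$misread = $latin1.GetString($bytes)

"UTF-8 bytes : $hex"
"Misread text: $misread"
```

Whether 'Ã©' displays correctly depends on the console encoding, but the string itself always holds the two characters U+00C3 and U+00A9.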

[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

This is done with the ISO-8859-1 encoding. It maps every character from U+0000 to U+00FF to the byte with the same value, so each character turns back into its original 8-bit value. The resulting bytes can then be correctly decoded as UTF-8 into a Unicode string:


# Run chatGPT query.
$result = (Invoke-RestMethod @RestMethodParameter)

[string]$resultText = $result.choices[0].text
[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

# Decode the bytes as UTF-8 to get the correct text.
[string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
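The two conversion lines above can be checked offline by simulating the damage. This sketch assumes the damage is exactly what is described above, UTF-8 bytes read as ISO-8859-1; `Repair-Utf8Text` and `ConvertTo-MisreadText` are hypothetical helper names, not part of PowerShell. Because ISO-8859-1 maps all 256 byte values to characters and back, the repair also works for the Japanese example, where each character takes three bytes.

```powershell
# Hypothetical helper: map each character back to its byte (ISO-8859-1),
# then decode the bytes as UTF-8. Same logic as the two lines above.
function Repair-Utf8Text([string]$Text) {
    [byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($Text)
    return [System.Text.Encoding]::UTF8.GetString($byteContent)
}

# Hypothetical helper that simulates the damage: UTF-8 bytes read as ISO-8859-1.
function ConvertTo-MisreadText([string]$Text) {
    [byte[]]$bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    return [System.Text.Encoding]::GetEncoding("iso-8859-1").GetString($bytes)
}

# Sample strings built from code points so the script file's encoding does not matter.
$spanish  = [string][char]0x00BF + 'Qu' + [char]0x00E9 + ' habitaciones tienen disponibles?'
$jaChars  = [char[]](0x3069, 0x306E, 0x90E8, 0x5C4B, 0x304C, 0x5229,
                     0x7528, 0x53EF, 0x80FD, 0x3067, 0x3059, 0x304B)
$japanese = (-join $jaChars) + '?'

foreach ($sample in @($spanish, $japanese)) {
    $misread  = ConvertTo-MisreadText $sample
    $repaired = Repair-Utf8Text $misread
    "{0} -> {1}" -f $misread, $repaired
}
```

The misread Japanese text contains C1 control characters (for example U+0081), so how it looks on screen depends on the console.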


Here is a full example of how to use chatGPT in PowerShell:


# https://platform.openai.com/account/api-keys
$apikey = "sk-...."

<#
– Model [Required]
ChatGPT has multiple models. Each model has its own features, strengths, and use cases. You need to select one model when building the request. The models are:

text-davinci-003    Most capable GPT-3 model. It can do any task the other models can do, often with higher quality, longer output, and better instruction-following. It also supports inserting completions within the text.
text-curie-001      Very capable, but faster and lower cost than Davinci.
text-babbage-001    Capable of straightforward tasks, very fast, and lower cost.
text-ada-001        Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost.
#>

$requestBody = @{
    prompt      = "What is the capital of Germany?"
    model       = "text-ada-001"
    temperature = 1
    stop        = "."
} | ConvertTo-Json

$header = @{ 
    Authorization = "Bearer $apikey"
}

$restMethodParameter = @{
    Method      = 'Post'
    Uri         = 'https://api.openai.com/v1/completions'
    body        = $requestBody
    Headers     = $header
    ContentType = 'application/json'
}

# Run chatGPT query.
$result = (Invoke-RestMethod @restMethodParameter)

[string]$resultText = $result.choices[0].text
[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

# Decode the bytes as UTF-8 to get the correct text.
[string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
$text
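The conversion assumes the result text really is mis-decoded. Applied to text that is already correct, it damages it: ISO-8859-1 has no byte for characters such as 部, so they are lost, and a lone 'é' becomes the single byte 0xE9, which is not valid UTF-8 on its own. A minimal guarded sketch, assuming the input is either correct text or UTF-8 bytes read as ISO-8859-1, only converts when both steps are lossless. `Repair-Utf8TextSafe` is a hypothetical name.

```powershell
# Hypothetical helper: repair UTF-8 text that was read as ISO-8859-1,
# but return the input unchanged when it does not look mis-decoded.
function Repair-Utf8TextSafe([string]$Text) {
    # A character above U+00FF cannot come from a single byte,
    # so the text was not produced by reading bytes as ISO-8859-1.
    foreach ($c in $Text.ToCharArray()) {
        if ([int]$c -gt 0xFF) { return $Text }
    }

    [byte[]]$bytes = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($Text)

    # Strict decoder (no BOM, throwOnInvalidBytes = $true): invalid UTF-8
    # raises an exception instead of silently producing U+FFFD.
    $strictUtf8 = [System.Text.UTF8Encoding]::new($false, $true)
    try {
        return $strictUtf8.GetString($bytes)
    }
    catch {
        # Not a valid UTF-8 byte sequence: keep the original text.
        return $Text
    }
}
```

In the full example, `[string]$text = Repair-Utf8TextSafe $resultText` could take the place of the two conversion lines. Text that legitimately contains sequences like 'Ã©' would still be converted, which is an inherent limit of this heuristic.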