Borbin the 🐱

ChatGPT result encoding

14 February, 2023


ChatGPT returns the result as a UTF-8 byte sequence in text form, so each byte arrives as a separate character. Anything outside 7-bit ASCII, for example accented characters or languages with other scripts, results in unreadable text.


For example a result returned for the Spanish language:

Â¿QuÃ© habitaciones tienen disponibles?

Expected result:

¿Qué habitaciones tienen disponibles?


Result returned for the Japanese language (bytes that map to non-printable control characters are not shown):

ã©ã®é¨å±ãå©ç¨å¯è½ã§ãã?

Expected result:

どの部屋が利用可能ですか? 


You need to read the result with the iso-8859-1 encoding and decode it as UTF-8.
For example, 'é' is encoded in UTF-8 as the byte sequence 0xc3 0xa9. Read as iso-8859-1, these two bytes are the characters 'Ã' (0xc3) and '©' (0xa9).
So instead of 'é', ChatGPT sends 'Ã©', which is the raw UTF-8 byte sequence with each byte shown as one character.
To get the correct Unicode string, each character of the string needs to be mapped back to its byte value.

[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

This is done with the iso-8859-1 encoding, which converts each char into its 8-bit value. The resulting bytes can then be decoded correctly as UTF-8 into a Unicode string:


# Run chatGPT query.
$result = (Invoke-RestMethod @RestMethodParameter)

[string]$resultText = $result.choices[0].text
[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

# Get the encoded result.
[string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
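The effect can be reproduced locally without calling the API. The following is a minimal sketch: the sample word and the variable names are chosen for illustration, and the non-ASCII character is built from its code point so the script does not depend on its own file encoding.

```powershell
# Simulate the garbled result: UTF-8 bytes misread as iso-8859-1 characters.
$latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")

# "Qu" followed by 'é' (U+00E9), built from its code point.
[string]$expected = "Qu" + [char]0xE9

# Encode as UTF-8 (0x51 0x75 0xc3 0xa9), then misread each byte as one char.
[byte[]]$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($expected)
[string]$garbled = $latin1.GetString($utf8Bytes)

# Repair: map each char back to its byte and decode the bytes as UTF-8.
[string]$repaired = [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($garbled))

"garbled length:  $($garbled.Length)"
"repaired length: $($repaired.Length)"
"round trip ok:   $($repaired -ceq $expected)"
```

The garbled string is one char longer than the expected one, because the two UTF-8 bytes of 'é' show up as two separate chars.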


Here is a full example of how to use ChatGPT in PowerShell:


# https://platform.openai.com/account/api-keys
$apikey = "sk-...."

<#
– Model [Required]
ChatGPT offers multiple models. Each model has its own features, strengths, and use cases. You need to select one model when building the request. The models are:

text-davinci-003    Most capable GPT-3 model. It can do any task the other models can do, often with higher quality, longer output, and better instruction-following. It also supports inserting completions within the text.
text-curie-001      Very capable, but faster and lower cost than Davinci.
text-babbage-001    Capable of straightforward tasks, very fast, and lower cost.
text-ada-001        Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost
#>

$requestBody = @{
    prompt      = "What is the capital of Germany?"
    model       = "text-ada-001"
    temperature = 1
    stop        = "."
} | ConvertTo-Json

$header = @{ 
    Authorization = "Bearer $apikey"
}

$restMethodParameter = @{
    Method      = 'Post'
    Uri         = 'https://api.openai.com/v1/completions'
    body        = $requestBody
    Headers     = $header
    ContentType = 'application/json'
}

# Run chatGPT query.
$result = (Invoke-RestMethod @restMethodParameter)

[string]$resultText = $result.choices[0].text
[byte[]]$byteContent = [System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($resultText)

# Get the encoded result.
[string]$text = [System.Text.Encoding]::UTF8.GetString($byteContent)
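The conversion above assumes the text really is garbled. If the text is already correct Unicode, the same conversion would damage it. A possible guard is sketched below. The function name Repair-Utf8Text is hypothetical and not part of any module. The sketch leaves the text unchanged when it contains chars above 0xFF, or when the bytes are not valid UTF-8.

```powershell
# Hypothetical helper: repair text only when it looks like misread UTF-8.
function Repair-Utf8Text {
    param([string]$Text)

    # Chars above 0xFF cannot come from a byte-per-char misread; leave such text alone.
    foreach ($ch in $Text.ToCharArray()) {
        if ([int]$ch -gt 0xFF) { return $Text }
    }

    $latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")
    # Strict decoder: throws on invalid UTF-8 instead of inserting replacement chars.
    $strictUtf8 = [System.Text.UTF8Encoding]::new($false, $true)

    try {
        return $strictUtf8.GetString($latin1.GetBytes($Text))
    }
    catch {
        # Not a valid UTF-8 byte sequence, so the text was most likely fine already.
        return $Text
    }
}

# Garbled input: 'Q', 'u', 0xC3, 0xA9 as four separate chars.
[string]$garbledSample = "Qu" + [char]0xC3 + [char]0xA9
# Already correct input: 'Q', 'u', 'é' (U+00E9).
[string]$correctSample = "Qu" + [char]0xE9

[string]$fromGarbled = Repair-Utf8Text $garbledSample
[string]$fromCorrect = Repair-Utf8Text $correctSample
"both inputs give the same text: $($fromGarbled -ceq $fromCorrect)"
```

The guard is a heuristic: a short, correct string whose chars happen to form a valid UTF-8 sequence would still be converted.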

Scan text with regex in PowerShell

24 April, 2022


The named group capture (?<name>exp) in a regex is an easy way to scan content. This example gets the text enclosed in quotes in a string. This is how it is done in PowerShell:

# Get the text enclosed in quotes.
[string]$text = 'This is an "example text".'
[string]$textRegex = '\"(?<Text>.*?)\"'

if ($text -match $textRegex) {
    $matches['Text']
}

This outputs
example text


Or split a formatted string into parts, for example the assignment structure 'id=value':

# Parse the id and value of the text.
[string]$text = '  id123 = abc  '
[string]$idValueRegex = "^\s*(?<id>\w+?)\s*=\s*`"?(?<value>.+?)`"?\s*$"

if ($text -match $idValueRegex) {
    "id=$($matches['id']), value=$($matches['value'])"
}

This outputs
id=id123, value=abc


Or parse a pattern, for example the content of each bracket in " abc { 123 } { def } 456 {xyz}"

[string]$text = " abc { 123 } { def } 456 {xyz}"
[string]$bracketRegex = "[{]\s*(?<Text>.*?)\s*[}]"

([regex]$bracketRegex).Matches($text) | % {
    [System.Text.RegularExpressions.Match]$match = $_
    [string]$value = $match.Groups["Text"].Value

    $value
}

This outputs
123
def
xyz
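The 'id=value' pattern can also be applied to several lines at once, collecting the named groups into a dictionary. This is a small sketch; the input lines and the variable name $settings are made up for illustration.

```powershell
# Sample input: two assignments and one line that does not match.
[string[]]$lines = @('name = "Borbin"', '  count=3  ', 'not an assignment')
[string]$idValueRegex = '^\s*(?<id>\w+?)\s*=\s*"?(?<value>.+?)"?\s*$'

# Ordered dictionary keeps the ids in the order they were found.
$settings = [ordered]@{}

foreach ($line in $lines) {
    # $matches is only valid right after a successful -match.
    if ($line -match $idValueRegex) {
        $settings[$matches['id']] = $matches['value']
    }
}

$settings.Keys | % {
    "$_=$($settings[$_])"
}
```

This outputs
name=Borbin
count=3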


Using List in PowerShell

24 April, 2022


PowerShell has good support for arrays and lists, but arrays have a fixed size: adding an element with += creates a new array and copies all existing elements, which is inefficient for large lists.
The simplest solution is to use the .NET List class:

    [System.Collections.Generic.List[string]]$content = [System.Collections.Generic.List[string]]::new()

    $content.Add("line1")
    $content.Add("line2")
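A slightly longer sketch of the same idea, with illustrative variable names: the list grows in place, several items can be added at once, and the result can be converted to an array when another API needs one.

```powershell
$content = [System.Collections.Generic.List[string]]::new()

# Add grows the list in place, no new collection is created per item.
foreach ($i in 1..1000) {
    $content.Add("line$i")
}

# Add several items at once.
$content.AddRange([string[]]@("tail1", "tail2"))

# Convert to a fixed-size array if needed.
[string[]]$asArray = $content.ToArray()

"count: $($content.Count)"
"first: $($content[0])"
"last:  $($content[$content.Count - 1])"
```

This outputs
count: 1002
first: line1
last:  tail2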

Text file encoding with PowerShell

24 April, 2022


Text files contain text in a certain encoding. The usual symbols fit in one byte and are stored as such in the file. But extended chars and other glyphs need more than one byte. The standard for this is Unicode.


Common Unicode encodings are utf-8 and utf-16.
utf-8 encodes 7-bit chars as they are and is one of the most used formats because it results in small files, as most text is 7-bit anyway. All non-7-bit chars are encoded as a sequence of two to four bytes.
utf-16 uses 2 bytes per char in most cases and uses surrogate pairs to encode code points outside the Basic Multilingual Plane. In Windows and .NET it is also known as 'Unicode', with the option for big or little endian byte order. The .NET string class also uses utf-16. As with the file format, don't assume that each character is exactly one 2-byte char.
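The sizes can be checked directly. In this short sketch, the sample string is an assumption chosen to cover all three cases: a 7-bit char, an extended char and a code point outside the Basic Multilingual Plane.

```powershell
# 'a' (7-bit), 'é' (U+00E9) and the cat emoji (U+1F63A, outside the basic plane).
[string]$text = "a" + [char]0xE9 + [char]::ConvertFromUtf32(0x1F63A)

# The .NET string holds utf-16 code units; the emoji takes two of them.
"chars in string: $($text.Length)"

# utf-8: 1 + 2 + 4 bytes.
"utf-8 bytes:     $([System.Text.Encoding]::UTF8.GetByteCount($text))"

# utf-16: 2 + 2 + 4 bytes.
"utf-16 bytes:    $([System.Text.Encoding]::Unicode.GetByteCount($text))"
```

This outputs
chars in string: 4
utf-8 bytes:     7
utf-16 bytes:    8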


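The PowerShell functions Get-Content and Set-Content need an encoding to properly read and write the file.
Without the -Encoding parameter, Windows PowerShell reads a file that has no BOM with the system's default code page, not utf-8 (PowerShell 6 and later default to utf-8). Each byte of a multi-byte sequence then becomes a separate char, so the loop variable holds the byte parts of the original encoding, which is not very useful.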

# No encoding.
Get-Content $textFile | % { 
    $_
} | Set-Content $textFileOut


If the encoding is missing when the file is read, the original text content in utf-8:
😺abcパワーシェル
will be stored like this instead (here as decoded with the Windows-1252 code page):
ðŸ˜ºabcãƒ‘ãƒ¯ãƒ¼ã‚·ã‚§ãƒ«

# Encoding missing, wrong content in output file.
Get-Content $textFile | % {
    $_
} | Set-Content -Encoding UTF8 $textFileOut


The encoding is needed to properly read the chars in a text file:

# Read utf-8 file and write as utf-8.
Get-Content -Encoding UTF8 $textFile | % {
    $_
} | Set-Content -Encoding UTF8 $textFileOut


# Read utf-8 file and write as Unicode (utf-16).
Get-Content -Encoding UTF8 $textFile | % {
    $_
} | Set-Content -Encoding Unicode $textFileOut


Note: Get-Content will read a Unicode (utf-16) file even when the utf-8 encoding is specified, but it won't read a utf-8 file when the Unicode encoding is specified. Do not rely on this.
When the encoding is not known, it is difficult to use Get-Content. A more robust approach is the ReadLines API from .NET, which detects the encoding from a BOM and otherwise assumes utf-8:

# Read any file encoding and write as utf-8.
[System.IO.File]::ReadLines($textFile) | % { 
    $_
} | Set-Content -Encoding UTF8 $textFileOut
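The behavior of ReadLines can be checked with two temporary files. This is a sketch with assumed sample text and variable names: one file is written as utf-16 with a BOM, the other as utf-8 without a BOM, and both are read back without naming an encoding.

```powershell
# "abc" followed by the katakana char U+30D1.
[string]$sample = "abc" + [char]0x30D1

$utf16File = [System.IO.Path]::GetTempFileName()
$utf8File = [System.IO.Path]::GetTempFileName()

try {
    # utf-16 little endian, written with a BOM.
    [System.IO.File]::WriteAllText($utf16File, $sample, [System.Text.Encoding]::Unicode)
    # utf-8 without a BOM.
    [System.IO.File]::WriteAllText($utf8File, $sample, [System.Text.UTF8Encoding]::new($false))

    # No encoding is specified for reading.
    [string[]]$fromUtf16 = [System.IO.File]::ReadLines($utf16File)
    [string[]]$fromUtf8 = [System.IO.File]::ReadLines($utf8File)

    "utf-16 with BOM read correctly:   $($fromUtf16[0] -ceq $sample)"
    "utf-8 without BOM read correctly: $($fromUtf8[0] -ceq $sample)"
}
finally {
    # Clean up the temporary files.
    Remove-Item $utf16File, $utf8File
}
```

A file in a legacy code page without a BOM is not detected this way and still needs an explicit encoding.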


If the Byte Order Mark (BOM) is not needed, use this to write utf-8 without a BOM:

# Read any file encoding and write as utf-8 without BOM.
[string[]]$contentLines = [System.IO.File]::ReadLines($textFile)
[Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllLines($textFileOut, $contentLines, $encoding)


The ReadLines API does not load all content into memory at once and allows very large files to be processed line by line. If you need the file in one string, use this:

# Read text as one string with any file encoding and write as utf-8 without BOM.
[string]$content = [System.IO.File]::ReadAllText($textFile)
[Text.UTF8Encoding]$encoding = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText($textFileOut, $content, $encoding)
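To verify whether an output file starts with a utf-8 BOM, the first three bytes can be compared with 0xEF 0xBB 0xBF. The function name Test-Utf8Bom below is hypothetical; it is a sketch for checking the results of the snippets above, not an existing cmdlet.

```powershell
# Hypothetical helper: returns $true when the file starts with the utf-8 BOM.
function Test-Utf8Bom {
    param([string]$Path)

    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # Only the first three bytes are needed.
        $buffer = [byte[]]::new(3)
        $read = $stream.Read($buffer, 0, 3)
    }
    finally {
        $stream.Dispose()
    }

    return ($read -eq 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF)
}

$bomTestFile = [System.IO.Path]::GetTempFileName()

# Write with BOM, then without BOM, and check each result.
[System.IO.File]::WriteAllText($bomTestFile, "abc", [System.Text.UTF8Encoding]::new($true))
$withBom = Test-Utf8Bom $bomTestFile

[System.IO.File]::WriteAllText($bomTestFile, "abc", [System.Text.UTF8Encoding]::new($false))
$withoutBom = Test-Utf8Bom $bomTestFile

Remove-Item $bomTestFile

"with BOM:    $withBom"
"without BOM: $withoutBom"
```

This outputs
with BOM:    True
without BOM: False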


XML files are also text files with an encoding. Most XML files use utf-8, but if the encoding is different, this commonly used code no longer works:

# Do not use.
[xml]$xml = Get-Content -Encoding UTF8 $xmlFile


Use this instead:

# Read XML file.
[xml]$xml = New-Object xml
$xml.Load($xmlFile)
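The Load method reads the encoding from the XML declaration. The following sketch shows this with a temporary file written as iso-8859-1; the element and attribute names are made up for the example.

```powershell
$latin1 = [System.Text.Encoding]::GetEncoding("iso-8859-1")
$latin1XmlFile = [System.IO.Path]::GetTempFileName()

# XML declared and stored as iso-8859-1; 'é' (U+00E9) is a single byte 0xE9 in the file.
[string]$xmlText = '<?xml version="1.0" encoding="iso-8859-1"?><room name="Caf' + [char]0xE9 + '"/>'
[System.IO.File]::WriteAllText($latin1XmlFile, $xmlText, $latin1)

# Load uses the encoding from the XML declaration.
$latin1Xml = New-Object xml
$latin1Xml.Load($latin1XmlFile)
Remove-Item $latin1XmlFile

[string]$name = $latin1Xml.DocumentElement.GetAttribute("name")
"attribute read correctly: $($name -ceq ('Caf' + [char]0xE9))"
```

Reading the same file with Get-Content -Encoding UTF8 would decode the 0xE9 byte as invalid utf-8 and lose the character.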


The default output file encoding is utf-8 with a BOM:

# Save xml as utf-8 with signature (BOM).
$xml.Save($xmlFileOut)


To not write a BOM, use this:

# Save xml as utf-8 without BOM.
$encoding = [System.Text.UTF8Encoding]::new($false)
$writer = [System.IO.StreamWriter]::new($xmlFileOut, $false, $encoding)
$xml.Save($writer)
$writer.Dispose()

AI Code calculus examples

11 October, 2021


Example scripts to calculate the integral and zero points of functions using the AI code programmable calculator for Android.

import."mathlib"

// Math calculus examples

// Store the function in a variable 'fx1' .
// -e^(x-2)
{ 2 - e swap ^ -1 * } sto.fx1

// Integral
0    // start
2    // end
0,001    // precision
rcl.fx1  // f(x) as lambda
integral

// Zero point, uses the function inline.
-3    // start
{ sto.x rcl.x dup dup * * rcl.x dup * + rcl.x 4 * - 4 - }    // x^3+x^2-4x-4
nullstelle
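For checking the calculator output, the two examples can also be worked out by hand. This is only a reference calculation; which zero point the solver reports for the start value -3 depends on its method and is not stated here.

```latex
\int_0^2 -e^{x-2}\,dx = \left[-e^{x-2}\right]_0^2 = -1 + e^{-2} \approx -0.8647

x^3 + x^2 - 4x - 4 = x^2(x+1) - 4(x+1) = (x+1)(x-2)(x+2)
\quad\Rightarrow\quad x \in \{-2,\ -1,\ 2\}
```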

See the preinstalled scripts for more examples.

AI code at Google Play


Project AI Code update

11 October, 2021


AI code is a programmable calculator for Android.
AI code supports Forth-style definitions, variables, code import and lambda expressions. The latest update adds Vector/Matrix operations and statistics. Along with the color-coded editor, it is a new concept for using a programmable calculator.

// Vector, Matrix in variable
[ 1 2 3 ] "Vector " +
sto.v1
[ [ 3 2 1 ]
  [ 1 0 2 ] ] "Matrix " +
sto.m1

rcl.v1
rcl.m1

// Matrix multiplication
[ [ 1 2 ]
  [ 0 1 ]
  [ 4 0 ] ]

[ [ 3 2 1 ]
  [ 1 0 4 ] ]

*

// solve system of equations
[ [ 1 2 3 ]
  [ 1 1 1 ]
  [ 3 3 1 ] ]

[ 2 2 0 ]

solve

// Matrix invert
[ [ 1 2 0 ]
  [ 2 4 1 ]
  [ 2 1 0 ] ] 

inv
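For reference, the matrix multiplication and the system of equations can be worked out by hand. This assumes that the matrix entered first is the left operand and that solve takes the coefficient matrix followed by the right-hand side; neither is confirmed by the text above.

```latex
\begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 4 & 0 \end{pmatrix}
\begin{pmatrix} 3 & 2 & 1 \\ 1 & 0 & 4 \end{pmatrix}
=
\begin{pmatrix} 5 & 2 & 9 \\ 1 & 0 & 4 \\ 12 & 8 & 4 \end{pmatrix}

\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 3 & 3 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}
\quad\Rightarrow\quad
(x,\ y,\ z) = (5,\ -6,\ 3)
```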

See the preinstalled scripts for more examples.

AI code at Google Play

