Koans on Circuit Breaker Results

Adversarial Probes for NSFW Detection in Language Models: Theory and Implementation 1. Introduction & Project Motivation 1.1 Context and Goals Primary goal: Create models resistant to red-teaming attacks (even with 1000+ dedicated hours). Method: Detect the specific concepts LLMs use in a robust way, so that "turning off" the concept makes the attack impossible. Practical application: Prevent NSFW completions. Introduction to adversarial probing and soft-prompt mechanics. 1.2 Impact Areas Long-term alignment: Understanding and controlling model behavior. Short-term business value: Deploying safer models. Overview of the Circuit Breakers (CB) approach as a comparative baseline. 2....

November 22, 2024

[koans] Bolt: Usefully Faster Matrix Mults by Approximation

This project used the Bolt algorithm, fixed its C++ implementation, and wrote Python bindings that are usefully faster than the default numpy matrix multiplications. It can be 10-20x faster with 0.1-2% classification error. The compressed text outline is LLM-expandable. How Mithral Works Data access pattern: https://github.com/dblalock/bolt/blob/master/assets/blalock-maddness-poster.png Like Optimized Product Quantization, but: Bolt increases the number of subspaces (more expressive at the same runtime), uses smaller LUTs so they fit in SIMD registers rather than L1 cache, and linearly approximates and rescales to get the correct answer. They quickly sum 8-bit ints by averaging pairs of values in a tree structure....
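The averaging trick in that last sentence can be sketched in a few lines: instead of summing 8-bit values directly (which overflows), repeatedly average pairs so intermediates stay in 8 bits, then rescale by the number of levels at the end. This is a toy numpy illustration of the idea, not Bolt's actual SIMD code.

```python
import numpy as np

def average_tree_sum(vals: np.ndarray) -> float:
    """Approximate the sum of uint8 values by averaging pairs in a
    tree, staying within 8 bits, then rescaling by 2^levels at the end.
    Assumes len(vals) is a power of two. Toy sketch only."""
    x = vals.astype(np.uint8)
    levels = 0
    while len(x) > 1:
        # pairwise average with rounding; result still fits in uint8
        x = ((x[0::2].astype(np.uint16) + x[1::2] + 1) // 2).astype(np.uint8)
        levels += 1
    return float(x[0]) * (2 ** levels)

vals = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=np.uint8)
print(average_tree_sum(vals))  # 360.0 -- exact here; in general off by a small rounding error
```

Each averaging level can halve-and-round, so the result is approximate in general, which is one source of Bolt's 0.1-2% error.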

March 25, 2024

Notes on Linear Algebra Done Right by Axler

Change of Basis The major oversight of the book, I thought, was a lack of focus on change of basis. This is the best way to understand matrix similarity in general, and what PCA, SVD, and Product Quantization (https://www.pinecone.io/learn/series/faiss/product-quantization/) do in particular. I’ll go over the idea of a basis in the abstract and then give a concrete example. Basis to Representation in the Abstract Not all vector spaces represent numbers along a set of dimensions, nor do all operators represent coefficient multiplication on the underlying variables....
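The change-of-basis view of similarity can be shown concretely: if P's columns are a new basis written in standard coordinates, then B = P⁻¹AP is the same operator expressed in the new basis, and applying A directly agrees with round-tripping through B. A minimal numpy sketch (the matrices here are illustrative, not from the book):

```python
import numpy as np

# A acts in the standard basis; the columns of P are a new basis
# expressed in standard coordinates. The same linear operator in the
# new basis is B = P^{-1} A P, i.e. A and B are similar matrices.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # scale x by 2, y by 3
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # new basis vectors: e1 and e1 + e2
B = np.linalg.inv(P) @ A @ P

# Applying A directly equals: convert to new coordinates, apply B,
# convert back -- the operator is the same, only its matrix changed.
v = np.array([1.0, 2.0])
lhs = A @ v
rhs = P @ (B @ (np.linalg.inv(P) @ v))
print(np.allclose(lhs, rhs))  # True
```

PCA and SVD fit this frame: both pick a basis (principal directions, singular vectors) in which the operator or data matrix has a simpler representation.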

March 25, 2024

OpenAI LLMs: Attacks and Test-set Filtering

See https://github.com/CLARKBENHAM/sep_finetune_llm/blob/main/README.md for updated results. This details flaws found with harmful LLM completions, fine-tuning filtering (a few examples can cause models to generate NSFW content), moderation-endpoint accuracy, and a metric for cheaply filtering datasets of harmful examples for the cases most confusing to the model. At the bottom are token attacks that exploit the difference between training and test environments. Separator Tokens Feb 27, 2024 Inserting separator tokens is the process of adding a fixed default token after every encoded token in a string....
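The separator-token insertion described above is mechanically simple: interleave one fixed token id after every id the tokenizer produced. A minimal sketch, assuming token ids are already available (the ids and the separator value here are illustrative, not from the repo):

```python
def insert_separators(token_ids: list[int], sep_id: int) -> list[int]:
    """Return a new sequence with a fixed separator token id
    inserted after every original token id."""
    out: list[int] = []
    for tid in token_ids:
        out.append(tid)
        out.append(sep_id)  # the fixed "default" token after each token
    return out

print(insert_separators([101, 102, 103], sep_id=0))
# [101, 0, 102, 0, 103, 0]
```

Because the model rarely sees text tokenized this way during training, the separated string can slip past filters while remaining reconstructable.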

March 20, 2024