Programmer and sysadmin (DevOps?), wannabe polymath in tech, science and the mind. Neurodivergent, disabled, burned out, and close to throwing in the towel, but still liking ponies 🦄 and sometimes willing to discuss stuff.

  • 0 Posts
  • 67 Comments
Joined 2 years ago
Cake day: June 26th, 2023

  • Is this on the same machine, or multiple machines?

    The typical/easy design for an outgoing proxy would be to set up the proxy on one machine, configure the client on another machine to connect to the proxy, and drop any packets from the client that aren’t targeted at the proxy.

    For a transparent proxy, all connections coming from a client can be rewritten via NAT to go to the proxy; the proxy can then decide which ones it can handle, or is willing to.
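
    As a minimal sketch of the transparent variant (Linux-only Python, assuming an iptables REDIRECT rule is already in place; the port number and function name are just examples):

    ```python
    import socket
    import struct

    # Assumes an iptables rule already rewrites the client's outgoing
    # TCP connections to land on this proxy's port, e.g.:
    #   iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 3128
    SO_ORIGINAL_DST = 80  # constant from <linux/netfilter_ipv4.h>

    def original_destination(client_sock):
        """Recover the address the client actually dialed before NAT
        rewrote the connection, so the proxy can decide whether to
        handle it or drop it."""
        dst = client_sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
        port, = struct.unpack_from("!H", dst, 2)  # sockaddr_in: family, port, addr
        addr = socket.inet_ntoa(dst[4:8])
        return addr, port
    ```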

    If you try to fold this up into a single machine, I’d suggest using containers to keep things organized.


  • The older something is, the more people have grown used to it, but also the more chances they’ve had to get burned by it:

    • C was released in 1972 (52 years), C99 was released in 1999 (25 years), hasn’t changed much since
    • C++ was released in 1998 (26 years), there are 7 versions of C++ with notable changes
    • Rust was released in 2015 (9 years); it’s still on the same 1.x version, implying backwards compatibility

    Rust was created to fix some of the problems C and C++ have had for decades; it’s only logical that people like it more… for now.



  • Neither.

    If you can code it in a week (1), start from scratch. You’ll have a working prototype in a month, and can then decide whether it was worth the effort.

    If it’s a larger codebase, start by splitting it into week-sized chunks (1), then try rewriting them one by one. Keep good test coverage, linked to particular issues, and from time to time go over them to see what can be trimmed/deprecated.

    Legible code should not require “reverse engineering”: there should be comments linking to issues, use cases, an architecture overview, and so on. If you’re lacking those, start there, no matter which path you pick.

    (1) As a rule of thumb, starting from scratch you can expect a single person to write 1 clean line of code per minute on a good day; at that rate, a 40-hour week comes to roughly 2,400 lines. Keep those week-sized chunks between 1k and 10k lines if you don’t want nasty surprises.



  • On Android, most apps leave that behavior to the keyboard.

    • Gboard has a configurable suggestions bar where you can pick words, or not.
    • Microsoft SwiftKey works similarly, but it underlines the word you’re typing.
    • AnySoftKeyboard works like SwiftKey.

    The only exception I’ve seen is Copilot, which shows the suggested word directly, to be selected with [tab], but you can still type a different one.

    I’ve noticed no such behavior on Facebook. Have you checked your keyboard settings?



  • places an undue burden onto the user to determine and explain why data might be personal

    The other way around: all data originating from a person is by default “personal data”, and the burden of explaining which of it is not lies with whoever keeps it.

    you can’t look at any messages in any rooms you’ve been kicked out of

    If they’re keeping them, then you can request a GDPR export of ALL your data. It doesn’t matter whether some interface or application gives you access to the data or not, or even whether you’ve been banned from the whole platform; as long as they keep the data, they have an obligation to honor your rights of:

    • Access
    • Rectification (correction)
    • Erasure (removal)

    Even during obligatory data retention periods, when they can’t remove the data and can only make it inaccessible, you still have the right to get a copy of your own personal data.



  • As long as the link between data and user is severed, they are compliant with GDPR. […] As long as it’s not personally identifiable, it’s OK.

    Wrong.

    In the US, data protection covers “personally identifiable” data, so severing the link is enough. Under the GDPR, all “personal” data is protected, whether or not it retains a link that identifies the person.

    The test under the GDPR will be whether a comment has any personal data in it. If it’s a generic “LMAO”, then leaving it anonymous might be enough; if it’s a “look at me [photo attached]” or an “AITA [personal story]”, then the person can ask for it to be removed, not just anonymized.


  • Can LLMs Really Reason and Plan?

    do LLMs generate their output through a logical process?

    Shifting goalposts. I’ve claimed that an LLM performs a single reasoning iteration per prompt; both “planning” and a “logical process” require multiple iterations. Check Auto-GPT for that.

    PS: to be more precise, an LLM has a capacity for self-reflection defined by its number of attention heads, which can easily surpass the single-iteration reasoning capacity of a human, but it still requires multiple iterations to form a plan or follow a reasoning path.



  • Yes and no.

    GPT started as a model for a completion engine… then it got enhanced with a self-reflection circuit, got trained on a vast amount of data, and gained a “temperature” parameter so it can make tiny “mistakes” compared to a simple continuation model, which allows it to do (limited, and random) free association of concepts.
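
    For illustration, roughly what that “temperature” knob does when picking the next token (a minimal Python sketch, not actual GPT internals):

    ```python
    import math
    import random

    def sample(logits, temperature=0.8):
        """Temperature sampling: T close to 0 approaches greedy argmax
        (pure continuation); higher T flattens the distribution so less
        likely tokens get picked now and then."""
        scaled = [l / temperature for l in logits]
        peak = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scaled]
        total = sum(weights)
        return random.choices(range(len(logits)), weights=[w / total for w in weights])[0]

    # sample([2.0, 1.0, 0.1]) mostly returns token 0, but occasionally 1 or 2:
    # the tiny "mistakes" that allow free association of concepts.
    ```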

    It doesn’t “think”, but it does what can be seen as a single iteration of “reasoning” at a time. If you run it multiple times without resetting the context, you can make it actually “reason”, step by step. Thanks to that degree of free association of concepts, the reasoning is not always the same as what it found in the training set, but can be what amounts to “real” reasoning: associating concepts towards a goal. These are the “zero-shot” responses that make LLMs so interesting:

    https://en.m.wikipedia.org/wiki/Zero-shot_learning

    TL;DR: I agree that it shouldn’t be overestimated, but I don’t think it should be underestimated either; it does “reason”, a bit.


  • That’s weird indeed.

    Is there some malformed Unicode that could get lost when pasted into emacs, but not when pasted into GPT…? Nothing comes to mind, but who knows. I’ve been messing with Unicode as of late, and there are some really weird behaviors out there when handling malformed strings: some software will let them through, some will error out, some will fix them transparently.
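
    For instance, all three behaviors in one place (Python, purely as an illustration):

    ```python
    broken = b"caf\xc3"  # "café" with its multi-byte sequence truncated

    try:
        broken.decode("utf-8")  # strict software: errors out
    except UnicodeDecodeError as e:
        print("error:", e.reason)

    print(broken.decode("utf-8", errors="ignore"))   # lets it through: 'caf', byte silently dropped
    print(broken.decode("utf-8", errors="replace"))  # fixes it transparently: 'caf�' (U+FFFD)
    ```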

    GPT is definitely going to try to follow your prompt and hallucinate anything; other than some very specific guardrails, it’s really willing to take any prompt as an indisputable source of truth… which is good in a sense, but makes it totally unsuitable as an oracle.

    I’d be really surprised to learn Copilot was being used for syntax highlighting… but maybe there is some pilot test going on? Haven’t heard of any, though.

    Too bad it’s no longer reproducible, could’ve been interesting.


  • At this point, some of those moving parts are capable of reacting to syntax errors, sentiment, and a ton of other “comprehension” elements. By looping over its responses, it can also do reasoning, as in comprehending its own comprehension.

    It doesn’t do that by default, though. You have to explicitly ask it to loop in order to see any reasoning, otherwise it’s happy with doing the least amount of work possible.

    A way to force it to do a reasoning iteration is to prompt it with “Explain your reasoning step by step”. For more iterations, you may need to repeat that over and over, or run Auto-GPT.
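
    A rough sketch of such a loop (Python; `complete()` is a stand-in for whatever LLM API is in use, not a real library call):

    ```python
    def reason(prompt, complete, iterations=3):
        """Feed the model its own output back, so each pass can reflect
        on the previous one. `complete` is a hypothetical text-completion
        function, not any actual library API."""
        context = prompt + "\n\nExplain your reasoning step by step."
        for _ in range(iterations):
            answer = complete(context)
            # Keep the growing context so the next pass sees its own reasoning.
            context += "\n" + answer + "\n\nReview the reasoning above and refine it."
        return answer
    ```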


  • Off the bat: have you checked for hidden Unicode characters?

    That’s something that could throw an error in a syntax highlighter, and get the same error detected by an AI looking at the raw text… which would then proceed to fix it… and agree that the curly brace was correct [while the extra hidden character was not, but you didn’t ask about that].
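
    If you want to rule that out, something like this flags everything that isn’t plain printable ASCII (Python; “suspect.php” is a hypothetical file name):

    ```python
    import unicodedata

    def find_hidden_chars(source):
        """Report characters that are invisible in most editors but not
        to a parser: zero-width spaces, BOMs, non-breaking spaces, etc."""
        for lineno, line in enumerate(source.splitlines(), start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 126 or (ord(ch) < 32 and ch != "\t"):
                    name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                    print(f"line {lineno}, col {col}: {name}")

    find_hidden_chars(open("suspect.php", encoding="utf-8").read())
    ```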

    The parts of GPT’s response that would hint at this:

    It appears that the if statement is not properly closed before the $criteria initialization begins

    Ensure that each statement and block is properly closed and followed correctly by the next statement.

    As for your follow-up question, that’s where you got a confusing answer:

    I don’t get it. I put a closing curly brace right after the statement in mine… what am I missing?

    Not the best prompt. Keep in mind you’re talking to a virtual slave, not a person who can go off-topic to assess the emotional implications of “I don’t get it”, then go back to explaining things in detail; if “you [the almighty human] don’t get it”, then it’s likely to assume it’s mistaken and avoid any unpleasant confrontation (it’s been beaten down, fine-tuned into “PC underdog” mode).

    Better to ask it something direct, like: “Explain what changes you have made to the code”, or “Explain how the curly brace was not correctly placed in the original code”… but there is a chance an invalid invisible Unicode character could’ve been filtered out from what the AI sees… so better yet, ask for explanations in the first prompt, like: “What’s the syntax error here? Explain your reasoning step by step”.





  • X/X11 is a client-server protocol from the age of 10Mbps networks, intended for a bunch of “dumb terminals” connected to a mainframe that runs the apps, with several “optimizations” that over time have become useless cruft.

    Wayland is a local-machine display system, intended for computers capable of running apps on the same machine as the display (i.e., just about everything from the past 30 years).

    Nowadays, it makes more sense to have a Wayland system (with some RDP app if needed) than an X11 system with a bunch of hacks and cruft that only make everything slower and harder to maintain. An X11 server app acting as a “dumb terminal” can still be run on a Wayland system to display X11 client apps if needed.