• Hacker News
  • new|
  • comments|
  • show|
  • ask|
  • jobs|
  • pixeyo 20 hours

    The actual cost depends almost entirely on context window size and task frequency, not the hosting tier.

      Light usage (a few conversations a day, no cron jobs) typically lands $5-20/month in API tokens. The trap is scheduled tasks or       
      heartbeat loops running against Opus — that compound fast. Switching the default model to Sonnet cuts costs ~5x for most workloads with
      no real quality difference for non-coding tasks.
    
      A few things that actually move the needle:
      - Run openclaw models list to see what's configured, then set a cheaper default for routine tasks
      - Set a token budget in any cron job skill config before running it overnight
      - Keep MEMORY.md trimmed — long memory files add to every request
    
      I put together a cost calculator at openclawcheatsheet.com that lets you model different usage patterns (message frequency, cron jobs,
      context size) and get a realistic monthly estimate. Helped me stop being surprised by my Anthropic bill.

  • usplusAI 6 hours

    usplus.ai has similar functionality without the hardware rental. Agentic AI in Org Chart form. Try our new Ralp Mode today for free for fully autonomous OpenClaw level Agent

  • jimmySixDOF 1 days

    Nice turn key solution I like that it comes with it's own email and you don't need to add anything .... I was a fan of this VPS setup service for a beads agent system up from end to end but you need to BYO everything still it's free as in open source so got to thank Sir Dicklesworthstone for putting it together --

    https://agent-flywheel.com/

  • skybrian 1 days

    Not a fan of OpenClaw but if you're going to do that, why not on a service that gives you multiple VM's? https://exe.dev/docs/use-case-openclaw

  • simple10 1 days

    Klaus looks great! It's definitely looks like a step up from the one-click VPS deploys that are terribly insecure.

    I spent the past month hacking on openclaw to play nice in a docker container for my own VPS use.

    This project has a lot of useful debugging tools for running multiple claws on a single VPS:

    https://github.com/simple10/openclaw-stack

    For average users, Klaus is a much better fit.

  • sam_chenard 11 hours

    on the prompt injection via email problem — model choice helps but it's not the right layer to defend. you want to scan at ingestion, before the content ever hits context.

    we built LobsterMail (lobstermail.ai) specifically for this. we're an email security team behind (palisade.email) and have been really obsessed with this problem for the last 6 months.

    every inbound email gets scanned for 6 injection categories (boundary manipulation, role hijacking, data exfiltration attempts, obfuscated payloads, etc.) before it reaches the agent. the SDK exposes `email.isInjectionRisk` and `safeBodyForLLM()` which wraps untrusted content in boundary markers with a metadata header. the agent can make an informed decision rather than blindly consuming whatever lands in its inbox.

    it's also agent-native — the agent self-provisions its own `@lobstermail.ai` address, no oauth app needed, no borrowing the user's gmail. big respect for agentmail too but give a shot to lobstermail if youre interested!

  • tristanwaddell 1 days

    That's a cool idea, i'll be sure to check it out

  • vzaliva 1 days

    VM hosting is good. But I want to go step further and have a local model in this VM.

  • rockmanzheng 1 days

    [dead]

  • _pdp_ 1 days

    Focusing on where the agent runs instead of what it can do is basically the wrong strategy. Hosting these agents is hardly the problem and frankly AWS is not the most cost effective or secure path forward.

    What is more important is making them do actual useful things that are net positive and right now the use-case are pretty limited.

  • rcarmo 1 days

    Hmm. OK, I guess. But I are you going to stick to just OpenClaw, or look into variants? (I created https://github.com/rcarmo/piclaw, which is a leaner, less hype-driven and more utilitarian thing)

  • august- 1 days

    OpenClaw indeed breaks with every update haha, nicely done

  • briandoll 1 days

    The biggest value IMHO of OpenClaw is that it's in the Apple ecosystem, so it leverages Reminders, iCloud sync for Obsidian values, etc., so not having a Mac option is pretty limiting for anyone who's relying on those integrations currently.

  • brtkwr 1 days

    I imagine this would be quite a good fit for people who don't want to manage their own OpenClaw instance on their home network which is near zero cost especially if you use it with Gemini free tier + a low power arm board.

  • gostsamo 1 days

    For a product that supposedly handles the most private bits of one's personal life, I would've expected much stronger wording in the privacy section. Instead, privacy and security are meshed up in one soup, there is no mention of internal access controls, and no promise that this info won't be shared under no shape or form or derivative beyond providing the functions necessary for the service. CCPA is mentioned but only for California residents. Generally, use at your own risk.

  • _joel 1 days

    "The week after our launch we spent 20+ hours fixing broken machines by hand."

    oh fuck yea, sounds great.

    Hard pass on this (and OpenClaw) thanks.

  • Yash16 1 days

    [dead]

  • webpolis 1 days

    [dead]

  • baileywickham 1 days

    [dead]

  • nonameiguess 1 days

    Acknowledging the reality of history and business here that there's a 99% chance you don't exist in a few years, I would encourage you nonetheless to break EC2 and AWS in every single way you can possibly imagine and in ways you can't, obviously not in your customer account, but in a separate one. I was doing consulting services for a machine learning company that sold pre-configured EC2s and associated data infra to third-party researchers at a markup and basically stood up and ran their whole environment for about two years. Networking is probably the most frustrating thing you'll ever encounter and beware when they change their APIs and parameters that used to default to null no longer do. It's especially fun when the Linux kernel on the hypervisors you can't see messes with your packets.

  • 1 days

  • octoclaw 1 days

    [dead]

  • Serginusa 1 days

    [dead]

  • nullcathedral 1 days

    Do you run a dedicated "AI SRE" instance for each customer or how do you ensure there is no potential for cross-contamination or data leakage across customers?

    Basically how do you make sure your "AI SRE" does not deviate from it's task and cause mayhem in the VM, or worse. Exfiltrates secrets, or other nasty things? :)

    webpolis 1 days

    [dead]

    baileywickham 1 days

    We run a dedicated AI SRE for each instance with scoped creds for just their instance. OpenClaw by nature has security risks so we want to limit those as much as possible. We only provision integrations the user has explicitly configured.

  • hasa 1 days

    I get impression that this is automation tool for sales people. Does it do robotic phone calls to try to book meetings with customers?

    robthompson2018 1 days

    We certainly have customers who work in sales, but that's not the only use case.

    OpenClaw is capable of using ElevenLabs or other providers to make phone calls, but I personally haven't done this and as far as I know none of our customers have either. Is AI good enough at cold calling yet for this to work? I personally would never entertain such a call.

  • jdeng 19 hours

    For openclaw to become helpful, you have to connect it to your personal email, access to your file etc. All of these requires user's manual setup right?. I do not get the point of "batteries included". Installing it is not the bottleneck right? The official docs has detail procedures for all deployment options.

    Lalabadie 17 hours

    Right, whether it runs in a sandbox is the least of my concerns if the point is to give that sandbox a way to spend or communicate in my name.

  • Myzel394 1 days

    Sounds like a perfect data leak any% speedrun to me... :P

    baileywickham 1 days

    You're right that security is a major risk. Our perspective here is that by defaulting to an EC2 instance, you're in control of what data is at risk. If you connect Google Workspace, you are exposing yourself to some security risk risk there, but tons of users do email through AgentMail which doesn't have access to your personal data. Also no risk of filesystem access/Apple ID access by default.

  • ericlevine 1 days

    > Connecting your email is still a risk.

    > If you’ve built something agents want, please let us know. Comments welcome!

    I'll bite! I've built a self-hosted open source tool that's intended to solve this problem specifically. It allows you to approve an agent purpose rather than specific scopes. An LLM then makes sure that all requests fit that purpose, and only inject the credentials if they're in line with the approved purpose. I (and my early users) have found substantially reduces the likelihood of agent drift or injection attacks.

    https://github.com/clawvisor/clawvisor

    robthompson2018 1 days

    Would love to see any evals you've run of this system

  • rid 1 days

    What does the VM consist of? Is the image available?

    baileywickham 1 days

    It's an Amazon Linux image on an EC2 instance. We install some custom packages too.

  • ar_lan 1 days

    I tried this service a few weeks ago, and I commend the goal - but there were a few issues I ran into:

    1. There are many interactions I just could not get to work. I may have done something wrong, but in general, I have the perspective that most products should "just work" if it's as simple as clicking a button or directing something. In this case, I'm tangibly talking about the Browser feature, and the Canvas feature. In my account, I tried many times to have OpenClaw use the Browser to access a website and send me a screenshot, and it regularly reported the Browser was inaccessible, even though I had enabled it via Klaus UI. Secondly, I asked it to write certain reports to the Canvas as HTML pages that I could review - the entries would show up as files I could click on, but the files themselves were always empty. 2. OpenClaw with tokens is insanely expensive - I blew through the $15 tokens in a matter of a day.

    For the first, my guess is I misconfigured something, but it's really difficult to identify what is wrong. My expectation was that I could prompt via Telegram to configure anything and everything, but some link was missing. Although I am a technical person, my expectation was that I would not need to muck around via `ssh` to figure out where my files ended up.

    For the latter - and more broadly - OpenClaw is not well understood for most, and I think they will be caught off guard just how expensive it is. $15 in tokens is not a lot with how inefficient OpenClaw can be. My suggestion would be:

    1. Pre-configure OpenClaw with already extremely memory-efficient rules and skills. 2. Provide clear guidance/documentation on ideal agent setup with different models as necessary. I think OpenRouter attempts to achieve this pretty well, but you are providing a layer on top of OpenRouter that may not be obvious to less-well-versed people. 3. Batteries-included options should "just work" - I felt I wasted a lot of tokens just figuring out how to get the thing to do simple tasks for me.

    ---

    A lot of the notes I made are less about your product and what you've achieved, and more to do with OpenClaw. However, you've achieved one major milestone - which is the one-click setup of OpenClaw. But if your target demographic is the less technically inclined folks that want to be able to play with the bleeding edge of AI practices, I think your platform needs to guide users to how to actually use this thing, and become useful right away.

    It may even be beneficial to showcase extremely clear workflows for users to get started and sell why they even want OpenClaw.

    ---

    Anyway, kudos on the release! It is not easy to ship and you've done that hard bit! I bid you good luck on the next phase!

    baileywickham 1 days

    Thanks for the feedback here, this matches a bunch of patterns we have seen.

    One of the fundamental problems is OpenClaw is tech for nerds. It's hard to use, it breaks all the time, it's built on LLMs, etc. We'd like to be the one to bridge the gap but that will take a ton of work. It's something we spend all day thinking about. Some issues like the one you hit with canvas are likely some mix of our problem and the model doing something unexpected like putting the file in the wrong directory which is constantly a problem.

    Also agree on the cost being a huge issue. We give $15 up front and it just disappears so quickly for many users. Some users switch to smaller models but often this just ends up with people being more unhappy because the performance is bad. Opus is the least likely to make mistakes but also the most expensive.

    Thanks for the advice, it's great to hear you believe in it too! At a personal level, it means a ton to me. Just got to keep writing code.

  • docybo 1 days

    Feels like most agent security discussions focus on where the agent runs (VMs, sandboxes, etc), but not whether the action itself should execute.

    Even in a locked-down VM the agent can still send emails, spin up infra, hit APIs, burn tokens.

    A pattern we've been experimenting with is putting an authorization boundary between the runtime and the tools it calls. The runtime proposes an action, a policy evaluates it, and the action only runs if authorization verifies.

    Curious if others building agent runtimes are exploring similar patterns.

    1 days

    lombasihir 1 days

    agree, maybe use threadlocker-like mode? confirm any action before it ran, but then it defeat the purpose of autonomous agents.

  • ilovesamaltman 1 days

    [flagged]

    baileywickham 1 days

    Go for it!

  • sealthedeal 1 days

    Is this not just Claude Code? Genuinely hoping someone could spell it out for me

    gavinray 1 days

    We're all asking the same thing. It's basically Claude Code, AFAICT

    https://news.ycombinator.com/item?id=47327474

    throwatdem12311 1 days

    Claude Desktop app had scheduled tasks now for both Code and Cowork. For what I would use OpenClaw for it’s basically obsolete now.

    baileywickham 1 days

    Claude Code is awesome, I use it all day, every day. OpenClaw is similar but not the same. I think if all you do is write code, CC is probably best for you.

    OpenClaw is interesting because it does a lot of things ok, but it was the first to do so. It will chat with you in Telegram/messages which is small but surprisingly interesting. It handles scheduled tasks. The open source community is huge, clawhub is very useful for out of the box skills. It's self building and self modifying.

    throwaway314155 1 days

    It all runs on commands like imsg that Claude would be excellent at running given a suitable CLAUDE.md. Scheduled tasks are literally just cron, no problem for Claude.

  • Frannky 1 days

    I found ZeroClaw plus Hetzner to be a good option. I've been using it for a week, and it's stable and robust.

    Complex abilities unlocked calling a FastAPI server with one skill for each endpoint

    WA 1 days

    FastAPI server on the same Hetzner box? The endpoints are written by ZeroClaw?

    Frannky 17 hours

    I use another vps for FastAPI. I assume the vps with Zeroclaw will become compromised.

    I don't use other people's skills. Just use Claude Code for adding endpoints and the relative skill.

    Communication via Telegram bot, calendar via synced CalDAV server. Emails not connected; I don't need it personally.

    LLM API is OpenCode Go Minimax. $10/month capped, I have never hit the limits.

  • Tharre 1 days

    I don't get it. The point of OpenClaw is it's supposed to be an assistant, helping you with whatever random tasks you happen to have, in natural language. But for that to work, it has to have access to your personal data, your calendar, your emails, your credit card, etc., no?

    Are there other tasks that people commonly want to run, that don't require this, that I'm not aware of? If so I'd love to hear about them.

    The ClawBert thing makes a lot more sense to me, but implementing this with just a Claude Code instance again seems like a really easy way to get pwned. Without a human in the loop and heavy sandboxing, a agent can just get prompt injected by some user-controlled log or database entry and leak your entire database and whatever else it has access to.

    lifis 1 days

    You can solve that by requiring confirmation for anything except reading information from trusted sites. Web visits can be done without confirmation by reading a cached copy and not executing any JavaScript on it with network access (otherwise visiting arbitrary sites can leak information via the URLs sent to arbitrary servers)

    jascha_eng 1 days

    Yes and even now if you tell the LLM any private information inside the sandbox it can now leak that if it gets misdirected/prompt injected.

    So there isn't really a way to avoid this trade-off you can either have a useless agent with no info and no access. Or a useful agent that then is incredibly risky to use as it might go rogue any moment.

    Sure you can slightly choose where on the scale you want to be but any usefulness inherently means it's also risky if you run LLMs async without supervision.

    The only absolutely safe way to give access and info to an agent is with manual approvals for anything it does. Which gives you review fatigue in minutes.

    robthompson2018 1 days

    I don't follow your argument about getting pwned.

    A user could leave malicious instructions in their instance, but Clawbert only has access to that user's info in the database, so you only pwned yourself.

    A user could leave malicious instructions in someone else's instance and then rely on Clawbert to execute them. But Clawbert seems like a worse attack vector than just getting OpenClaw itself to execute the malicious instructions. OpenClaw already has root access.

    Re other use cases that don't rely on personal data: we have users doing research and sending reports from an AgentMail account to the personal account, maintaining sandboxing. Another user set up this diving conditions website, which requires no personal data: https://www.diveprosd.com/

    Tharre 1 days

    > But Clawbert seems like a worse attack vector than just getting OpenClaw itself to execute the malicious instructions. OpenClaw already has root access.

    Well the assumption was that you could secure OpenClaw or at least limit the damage it can do. I was also thinking more about the general usecase of a AI SRE, so not necessarily tied to OpenClaw, but for general self hosting. But yeah probably doesn't make much of a different in your case then.

  • Mooshux 1 days

    [dead]

    otterley 1 days

    The DNS record for apistronghold.com doesn't resolve for me. (NXDOMAIN)

    Mooshux 1 days

    Guess I don't have the naked domain set up yet. Ill fix that up. You should be able to go to www.apistronghold.com

  • orsorna 1 days

    Does the claw in the VM have proven capability (verified by your team) to track changes it makes to itself and persist across reboots? What about rollback capability?

    baileywickham 1 days

    We allow you to backup to a private Github repo you own so if you want to version control your setup that way you can. Otherwise most changes are tracked in the chat history and the LLM has some ability to repair itself or validate changes before they are made.

    0x008 1 days

    Why not use something like Temporal to recover state?

    baileywickham 1 days

    OpenClaw doesn't play well with SDKs like that. It expects to be able to run on a full machine (or container), to execute commands, to write files to disk. If we wanted we could fork and run something like this but we want to stay as close to the OSS as possible.

  • scosman 1 days

    What's the best "docker with openclaw" currently available? I have my own computers to run it on (I don't need a server). I want to play around, but containerized to avoid the security risk of MacOS app.

    There seem to be about 20 options, and new ones every day. Any consensus on the best few are, and their tradeoffs?

    clawguy 1 days

    I'm working on KubeClaw: https://kubeclaw.ai - a bit more sophisticated then all the open source cloud native implementations I found in my research.

    scosman 16 hours

    update: I did a standard openclaw install in docker and it works great.

    Their docs are confusing. It read like the gateway is in docker, and you'll need a connected computer. However the gateway can run agents/web_search/etc. The tools you'd expect to work in a CLI environment. Even headless browsers.

    Docs: https://docs.openclaw.ai/install/docker

    raizer88 1 days

    I am still searching for a compose up -d to this day, but without success. And the other poster want me to create a k8s cluster for a bot?!?!

    scosman 1 days

    right? From what I can tell it really needs MacOS, so alts are really parallel implementations (nanoClaw, etc).

    stavros 1 days

    Try mine:

    https://github.com/skorokithakis/stavrobot

    It does indeed only need compose up -d.

    raizer88 1 days

    Does it support local llm, on openai standard, and out of the box web browsing?

    stavros 23 hours

    What's web browsing? Yes on the rest.

    raizer88 23 hours

    Web browsing as give the bot access to the web with an headless chrome install.

    stavros 23 hours

    Oh, no, you'd need a plugin for that. Out of the box it has curl and web searches.

  • ndnichols 1 days

    This sounds awesome and exactly like the easy and safe on-ramp to OpenClaw that I've been looking for! I want to believe.

    Two questions as a potential user who knows the gist of OpenClaw but has been afraid to try it: 1. I don't understand how the two consumption credits play into the total cost of ownership. E.g. how long will $20 of Orthogonal credits last me? I have no idea what it will actually cost to use Klaus/OpenClaw for a month. 2. Batteries included sounds great, but what are those batteries? I've never heard of Apollo or Hunter.io so I don't know the value of them being included.

    In general, a lot of your copy sounds like it's written for people already deep into OpenClaw. Since you're not targeting those folks, I would steer more towards e.g. articulating use cases that work ootb and a TCO estimate for less technical folks. Good luck, and I'm eager to try it!

    somewhatrandom9 1 days

    You may want to also look into AWS's OpenClaw offering (I was surprised to see this): https://aws.amazon.com/blogs/aws/introducing-openclaw-on-ama...

    xienze 1 days

    > safe on-ramp to OpenClaw

    IMO I don't think the "OpenClaw has root access to your machine" angle is the thing you should worry that much about. You can put your OpenClaw on a VM, behind a firewall and three VPNs but if it's got your Google, AWS, GitHub, etc. credentials you've still got a lot to worry about. And honestly, I think malicious actors are much more interested in those credentials than wiping out your machine.

    I'm honestly kind of surprised everyone neglects to think about that aspect and is instead more concerned with "what if it can delete my files."

    baileywickham 1 days

    I think I agree here but for us it's more of a defense in depth thing. If you want to give it access to your email you are opening yourself up to attacks, but it doesn't have that access by default. We have an integration to give the agent it's own inbox instead of requiring access to your gmail for this reason. Similarly, if you want to only use Klaus for coding there is no risk to your personal data, even if your Klaus instance is hacked.

    necrodome 1 days

    Because no one has a reliable solution to that problem. The file deletion angle is easier to advertise. "runs in a sandbox, can't touch your system" fits on a landing page, even if it's not the more important problem.

    robthompson2018 1 days

    Our average user spends $50 a month all-in (tokens and subscription). If you're budget conscious you can use a cheap model (eg Gemini Flash) or even a free one. I confess I am a snob and only use Claude Opus, but even using OpenClaw all day every day I only spend about $500 a month on tokens.

    Orthogonal credits are used more frequently by power users. For everyday tasks they'll last a very long time, I don't think any of our users have run out.

    Some example Orthogonal user cases:

    * customers in sales uses Apollo to get contact info for leads

    * I use Exa search to help me prepare for calls by getting background info on customers and businesses

    * I used SearchAPI to help find AirBnbs.

    Point taken on the copy! We made this writing more technical for the HackerNews audience and try to use less jargon on other platforms.

    iJohnDoe 1 days

    Thanks for giving real-world examples of your usage.

    Do you think it’s worth $500 a month? Also, maybe tough to answer, does it seem like the token usage ($500 a month) would be equivalent if you did the same things using Claude or GPT directly?

    My reason for asking is because I tried OpenClaw and a quick one-line test question used 10,000 tokens. I immediately deleted the whole thing.

    _joel 1 days

    Your average user spends £50 a month? How long have you been running, just wondering since OpenClaw was only released (as openclaw) a month ago.

    robthompson2018 1 days

    We have been live since Feb 7.

    Maybe $50 a month is an underestimate because our average user has been live for less than a month.

    TheDong 1 days

    The cost of ownership for an OpenClaw, and how many credits you'll use, is really hard to estimate since it depends so wildly on what you do.

    I can give you an openclaw instruction that will burn over $20k worth of credits in a matter of hours.

    You could also not talk to your claw at all for the entire month, setup no crons / reoccurring activities / webhooks / etc, and get a bill of under $1 for token usage.

    My usage of OpenClaw ends up costing on the order of $200/mo in tokens with the claude code max plan (which you're technically not allowed to use with OpenClaw anymore), or over $2000 if I were using API credits I think (which Klause is I believe, based on their FAQ mentioning OpenRouter).

    So yeah, what I consider fairly light and normal usage of OpenClaw can quite easily hit $2000/mo, but it's also very possible to hit only $5/mo.

    Most of my tokens are eaten up by having it write small pieces of code, and doing a good amount of web browser orchestration. I've had 2 sentence prompts that result in it spinning up subagents to browse and summarize thousands of webpages, which really eats a lot of tokens.

    I've also given my OpenClaw access to its own AWS account, and it's capable of spinning up lambdas, ec2 instances, writing to s3, etc, and so it also right now has an AWS bill of around $100/mo (which I only expect to go up).

    I haven't given it access to my credit card directly yet, so it hasn't managed to buy gift cards for any of the friendly nigerian princes that email it to chat, but I assume that's only a matter of time.

    giancarlostoro 1 days

    Just have to know... What the heck are you building?

    multidude 19 hours

    The model choice matters a lot for cost. I've been running a production NLP pipeline on OpenClaw using Claude Haiku exclusively — it's roughly 25x cheaper than Opus for inference tasks where you don't need the full reasoning power. For most "read this text, classify it" tasks Haiku is more than sufficient.

    The hard part for a new user who knows about VMs isn't the VM setup — it's knowing which model to reach for. Opus for complex reasoning, Sonnet for balanced tasks, Haiku for high-volume classification or anything where you're calling the API repeatedly in a loop. Getting that wrong is where bills explode.

    A sensible default for a hosted product like Klaus would be Sonnet with Haiku available for bulk operations. Opus should require an explicit opt-in with a cost warning.

    grim_io 1 days

    Absolute madman :)

    Giving an agent access to AWS is effectively giving it your credit card.

    At the max, I would give it ssh access to a Hetzner VM with its own user, capable of running rootles podman containers.

    TheDong 1 days

    I am using an AWS Organization managed sub-account, so it's all pretty self-contained to that one account, and I can easily enough terminate that single sub-account.

    There's infamously no way to set a max bill amount for an account in AWS, so it indeed has unlimited spending, but I'm okay with a couple hundred bucks a month.

    > Hetzner VM with its own user, capable of running rootles podman containers

    Why not give it root on the full VM, and not use the VM for anything else? Giving it a user, and presumably also running your own stuff as a different user, sounds like a very weak security boundary to me compared to giving it a dedicated machine.

    If you're not doing multi-tenancy, there's no reason to not give it root, and if you are doing multi-tenancy, then your security boundary is worse than mine is, so you can't call me a madman for it.

    haolez 1 days

    Not at all. AWS IAM policy is a complex maze, but incredibly powerful. It solves this exact problem very well.

    wiether 1 days

    Do you honestly believe that they made the effort of setting the appropriate roles and policies, though?

    baq 22 hours

    you tell your clanker to do it obviously

    jimbob45 1 days

    Would having a locally-hosted model offset any of these costs?

    robthompson2018 1 days

    Our starter plan gives you a machine with 2GB of RAM. You will not be able to run a local LLM. OpenRouter has free models (eg Z.ai: GLM 4.5 Air), I recommend those.

    TheDong 1 days

    Generally the benefit you get out of claws involves untrusted input, i.e. it using the browser tool to scrape websites, etc.

    Claude 4.6 is at least a bit resilient to prompt injection, but local models are much worse at that, so using a local model massively increases your chance of getting pwned via a prompt injection, in my estimation.

    You're kinda forced to use one of the better proprietary models imo, unless you've constrained your claw usage down to a small trusted subset of inputs.

    kennywinker 1 days

    Yes, but that comes at the cost of using a dumber llm. The state of the art ones are only available via commercial api, and the best self-hostable models require $10,000+ gpus.

    This is a problem for coding as smarter really has an impact there, but there are so so so many tasks that an 8b model that runs on a $200 gpu can handle nicely. Scrape this page and dump json? Yeah that’s gonna be fine.

    This is my conclusion based on a week or so of using ollama + qwen3.5:3b self hosted on a ~10 year old dell optiplex with only the built-in gpu. You don’t need state of the art to do simple tasks.

    gbro3n 1 days

    I saw that the Hetzner matrix like has GPU servers < £300 per month (plus set up fee). I haven't tried it but I think if I was getting up to that sort of spend I'd be setting up Ollama on one of those with a larger Qwen3 max model (which I hear is on par with Opus 4.5?? - I haven't been able to try Qwen yet though so that could be b*****ks).

    hhh 1 days

    I have tried most of the major open source models now and they all feel okay, but i’d prefer Sonnet or something any day over them. Not even close in capability for general tasks in my experience.

    TheDong 1 days

    > Scrape this page and dump json? Yeah that’s gonna be fine.

    Only gonna be fine on a trusted page, an 8b model can be prompt injected incredibly trivially compared to larger ones.

    kennywinker 1 days

    Relying on the model to protect you seems like a bad idea…

    TheDong 1 days

    I mean, clawbots are inherently insecure. Using a better model is defense in depth.

    Obviously you should also take precautions, like never instructing it to invoke the browser tool on untrusted sites, avoiding feeding it untrusted inputs where possible in other places, giving it dedicated and locked-down credentials where possible....

    But yeah, at this point it's inherent to LLMs that we cannot do something like SQL prepared statements where "tainted" strings are isolated. There is no perfect solution, but using the best model we can is at least a good precaution to stack on top of all our other half-measures.