During the session, Emad shared plans and perspectives on a wide range of topics, an invaluable contribution for anybody tracking the evolution of the generative AI market. If you are a venture capital firm (and somehow you didn’t pay attention to this yet), an industry analysis firm, an end-user organization, a research team, an artist, or simply an artificial intelligence enthusiast, you might want to review this content.
The recording of that session has been released by the Stability AI staff and it’s available here.
I was interested in testing the accuracy and performance of a new multi-language speech recognition AI model released by OpenAI called Whisper. Hence, I used Whisper to produce the transcription of the Q&A session below.
I tested Whisper on an iMac 2021 with M1 CPU and 16GB of RAM. On this system, the full transcription of the 1h and 18 minutes recording took 18 hours to complete.
Warning
I did not edit the content in any way other than paragraph formatting to show the accuracy of the transcription, and highlighting the questions.
While the accuracy of the transcription is truly remarkable, you should always verify the recording before sharing any of the content below. During the recording, on at least a couple of occasions, Emad spoke too far from the microphone, resulting in his voice being hard to understand even for a human listener. This and/or flaws in the Whisper model might have produced inaccurate transcribed sentences. A missing “not” makes a huge difference.
Use this content responsibly.
1.5 isn’t that big an improvement over 1.4, but it’s still an improvement.
And as we go into version 3 and the image and models that are trailing away now, which is like we have a 4.3 billion parameter one and others, we’re considering what is the best data for that, what’s the best system for that, to avoid extreme edge cases. Because there’s always people who want to spoil the party.
This has caused the developers themselves, and again, kind of, I haven’t done a big push here, it has been from the developers, to ask for a bit more time to consult and come up with a proper roadmap for releasing this particular class of model.
They will be released for research and other purposes, and again, I don’t think the license is going to change from the OpenRailM license.
It’s just that they want to make sure that all the boxes are ticked rather than rushing them out, given, you know, some of these edge cases of danger here.
The other part is the movement of the repository and the taking over from Conviz, which is an academic research lab, again, who had full independence, relatively speaking, over the creation of the decisions around the model, to Stability AI itself.
Now this may seem like just hitting a fork button, but, you know, we’ve taken in legal counsel and a whole bunch of other things, just making sure that we are doing the right thing and are fully protected around releasing some of these models in this way. I believe that that process is nearly complete.
It will certainly cost us a lot of money, but, you know, it will either be ourselves or an independent charity maintaining that particular repository and releasing more of these generative models.
Stability itself and, again, kind of our associated entities have been releasing over half a dozen models in the last weeks, so a model a week, effectively. And in the next couple of days, we will be making three releases, so the Discord bot will be open sourced.
There is a diffusion-based upscaler that is really quite snazzy that will be released as well.
And then finally, there will be new decoder architecture that Rivers Have Wings has been working on for better human faces and other elements trained on the aesthetic and humans thing.
The core models themselves are still a little bit longer while we sort out some of these edge cases. But once that’s in place, hopefully we should be able to release them as fast as our other models, such as, for example, the open clip model that we released.
And there will be our clip guidance instructions released soon that will enable you to have mid-journey level results utilizing those two. Which took 1.2 million A100 hours, so like almost eight times as much as stable diffusion itself.
Similarly, we released our language models and other things, and those are pretty straightforward
if they are MIT.
It’s just, again, this particular class of models needs to be released properly and responsibly, otherwise it’s going to get very messy.
Some of you will have seen a kind of Congresswoman issue coming out and directly attacking us and asking us to be classified as dual-use technology and be banned by the NSA. There is European Parliament actions and others because they just think the technology is too powerful. We are working hard to avoid that, and again, we’ll continue from there.
Okay, next question.
Oh, wait. You’ve been pinning questions. Thank you, mods.
Okay.
The next question was interested in hearing SD’s views on artistic freedom versus sensitive models. So that’s Kerwin.
Yeah, so my view is basically if it’s legal, then it should be allowed. If it’s illegal, then we should at least take some steps to try and adjust things around that.
Now, that’s obviously a very complicated thing because legal is different in a lot of different countries, but there are certain things that you can look up the law that it’s illegal to create anywhere.
I’m in favor of more permissiveness and leaving it up to localized ethics and morality because the reality is that that varies dramatically across many areas, and I think it’s our place to kind of police that.
Similarly, as you’ve seen with Dreambooth and all these other extensions on stable diffusion, these models are actually quite easy to train. So if something’s not in the dataset, you can train it back in.
If it doesn’t fit in with the legal area of where we ourselves release from. So I think, again, what’s legal is legal, ethical varies, et cetera.
The main thing that we want to try and do is that the model produces what you want it to produce. I think that’s an important thing.
I think you guys saw at the start before we had all the filters in place that stable diffusion trained on the snapshot of the internet, as it was, it’s just whenever you type in a woman, it had kind of toplessness for a lot of any type of artistic thing because there’s a lot of topless women in art, even though art is less than like 0.5% of the dataset.
That’s not what people wanted, and again, we’re trying to make it so that it produces what you want as long as it is legal.
I think that’s probably the core thing here.
Sirius asks, any update on the updated credit pricing model that was mentioned a couple of hours ago? As in, is it getting much cheaper?
Yes, next week there will be a credit pricing adjustment from our side. There have been lots of innovations around inference and a whole bunch of other things, and the team has been testing it in staging and hosting.
You’ve seen this as well in the Diffusers library and other things, Facebook recently came out with some really interesting fast attention kind of elements, and we’ll be passing on all of those savings.
The way that it’ll probably be is that credits will remain as is, but you will be able to do a lot more with your credits as opposed to the credits being changed in price because I don’t think that’s fair to anyone if we change the price of the credits.
Can we get an official statement on why automatic was banned and why Novel AI used his code?
The official statement is as follows.
I don’t particularly like discussing individual user bans and things like that, but this was escalated to me because it’s a very special case, and it comes at a time, again, of increased notice on the community and all of these other things.
We’ve been working very hard around this.
Automatic created a wonderful web UI that increased the accessibility of stable diffusion to a lot of different people. You can see that by the styles and other things.
It’s not open source. I don’t believe there is a copyright on it, but still, again, worked super hard.
A lot of people helped out with that, and it was great to see.
However, we do have a very particular stance on community as to what’s acceptable and what’s not.
I think it’s important to first take a step back and understand what stability is and what stable diffusion is and what this community is.
Stability AI is a company that’s trying to do good. We don’t have profit as our main thing. We are completely independent.
It does come a lot from me and me trying to do my best as I try to figure out governance structures to fit things, but I do listen to the devs. I do listen to my team members and other things.
Obviously, we have a profit model and all of that, but to be honest, we don’t really care about making revenue at the moment because it’s more about the deep tech that we do.
We don’t just do image. We do protein folding. We release language models, code models, the whole gamut of things.
In fact, we are the only multimodal AI company other than OpenAI, and we release just about everything with the exception of generative models until we figure out the processes for doing that MIT open sourced.
What does that mean? It means that literally everything is open sourced.
Against that, we come under attack. Our model weights when we released it for academia were leaked.
We collaborate with a lot of entities, so novel AI is one of them. Their engineers have hit with various code-based things, and I think we’ve helped as well. They are very talented engineers, and you’ll see they just released a list of all the things that they did to improve stable diffusion because they were actually going to open source it very soon.
I believe it was next week before the code was stolen from their system.
We have a very strict no-support policy for stolen code because this is a very sensitive area for us.
We do not have a commercial partnership with novel AI. We do not pay them. They do not pay us.
They’re just members of the community like any other, but when you see these things like if someone stole our code and released it and it was dangerous, I wouldn’t find that right.
If someone stole their code, if someone stole other codes, I don’t believe that’s right either in terms of releasing.
Now in this particular case, what happened is that the community member in person was contacted and there was a conversation made. He made some messages public. Other messages were not made public.
I looked at all the facts, I decided that this was a bannable offense on the community.
I’m not a stupid person. I am technical. I do understand a lot of things and I put everyone there to kind of make this as a clear point.
Stable diffusion community itself is one of community of stability AI and it’s one community of stable diffusion.
Stable diffusion is a model that’s available to the whole world and you can build your own communities and take this in a million different ways.
It is not healthy if stability AI is at the center of everything that we do and that’s not what we’re trying to create.
We’re trying to create a multiplicity of different areas that you can discuss and take things forward and communities that you feel you yourself are a stable part of.
Now this particular one is regulated and it is not a free for all. It does have specific rules and there are specific things within it.
Again, it doesn’t mean that you can’t go elsewhere to have these discussions. We didn’t take it down off GitHub or things like that, we leave it up to them. But the manner in which this was done and there are other things that aren’t made public, I did not feel it was appropriate and so I approved the banning and the buck stops with me there.
If the individual in question wants to be unbanned and rejoin the community, there is a process for appealing bans.
We have not received anything on that side and I’d be willing to hear other stuff if maybe I didn’t have the full picture. But as it is, that’s where it stands.
And again, like I said, we cannot support any illegal theft as direct theft in there.
With regards to the specific code point, you can ask AAI themselves what happened there. I believe that there was AGPL code copied over and then they rescinded it as soon as it was notified and they apologized.
That did not happen in this case.
And again, we cannot support any leaked models and we cannot support that because again, the safety issues around this and the fact that if you start using leaked and stolen code, there are some very dangerous liability concerns that we wish to protect the community from.
So we cannot support that particular code base at the moment and we can’t support that individual being a member of the community. Also I would like to say that a lot of insulting things were said and we let it slide this
once.
Don’t be mean, man. Just talk responsibly.
Again, we’re happy to have considered and thought out discussions offline and online. If you do start insulting other members, then please flag it to moderators and there will be timeouts and bans because again, what is this community meant to be?
It’s meant to be quite a broad but core and stable community that is our private community as StabilityAI.
But like I said, the beauty of open source is that if this is not a community you’re comfortable with, you can go to other communities, you can set up your own communities. You can set up your own notebooks and others.
In fact, when you look at it, just about every single web UI has a member of Stability contributing.
From Pharma Psychotic at D Forum through to Dango on Majesty through to Gendamu at Disco, we have been trying to push open source front ends with no real expectations of our own because we believe in the ability for people to remix and build their own communities around that.
Stability has no presence in these other communities because those are not our communities. This one is.
So again, like I said, if Automatic does want to have a discussion, my inbox is open. And if anyone feels that they’re unjustly timed out or banned, they can appeal them. Again, there is a process for that.
That hasn’t happened in this case and again, it’s a call that I made looking at some publicly available information and some other publicly available information and I wish them all the best.
I think that’s it.
Will Stability provide fund a model to create new medicines?
We’re currently working on DNA diffusion that will be announced next week for some of the DNA expression things in our OpenBioML community. Feel free to join that. It’s about two and a half thousand members. And currently, I believe it’s been announced LibraFold with Sergei Ovchinnikov’s lab at Harvard and UCL.
So that’s probably going to be the most advanced protein folding model in the world, more advanced than AlphaFold.
It’s just currently undergoing ablations.
Developing of medicines and discovery of new medicines is something that’s very close to my heart.
And many of you may know that basically the origins of Stability were leading and architecting and running the United Nations AI initiative against COVID-19.
So I was the lead architect of that to try and get a lot of this knowledge coordinated around that. So we made all the code research in the world free and then helped organize it with the backing of the UNESCO World Bank and others.
So that’s one of the genesis is alongside education.
For myself as well, if you’ve listened to some of my podcasts, I quit being a hedge fund manager of five years to work on repurposing drugs for my son doing AI based lit review and repurposing of drugs through neurotransmitter analysis. So taking things like Clonazepam and others to treat the symptoms of ASD.
The papers around that will be published and we have several initiatives in that area.
Again, to try and just catalyze it going forward, because that’s all we are, we’re a catalyst. Community should take up what we do and run forward with that.
Okay.
RM, RF, we’re removing everything.
Do you think the new AI models push us closer to a post-copyright world?
I don’t know. I think that’s a very good question. It might.
To be honest, no one knows what the copyright is around some of these things.
Like at what point does free use stop and start and derivative works? It hasn’t been tested. It will be tested.
I’m pretty sure there will be all sorts of lawsuits and other things soon. Again, that’s something we’re preparing for.
But I think one of the first AI pieces of art was recently granted a copyright.
Any way to create anything is an interesting one as well, because again, it makes content more valuable. So in an abundant scarcity is there.
But I’m not exactly sure how this will play out.
I do think you’ll be able to create anything you want for yourselves. It just becomes, what happens when you put that into a social context and start selling that?
It goes down to the personal agency side of the models that we build as well. You’re responsible for the inputs and the outputs that result from that. This is where I think copyright law will be tested the most, because people usually did
not have the means of creation. Whereas now you have literally the means of creation.
Trextel asks, prompt engineering may well become an elective class in schools over the next decade. With extremely fast-paced development, what do you foresee as the biggest barriers of entries? Why introduce reluctancy to adoption, death of the concept artist, and the dangers outweighing the benefits?
Well, the interesting thing here is that a large part of life is the ability to prompt. Prompting humans is the key thing. My wife tries to prompt me all the time and she’s not very successful, but she’s been working on it for 16 years.
I think that a lot of the technologies that you’re seeing right now from AI, because it understands these latent spaces or hidden meanings, it also includes the hidden meanings in prompts.
I think what you see is you have these generalized models like stable diffusion and stable video fusion and dance diffusion and all these other things. It pushes intelligence to the edge, but what you’ve done is you compressed 100,000 gigabytes of images into a two-gigabyte file of knowledge that understands all those contextualities.
The next step is adapting that to your local context. That’s what you guys do when you use Dreambooth or when you do textual inversion. You’re injecting a bit yourself into that model so it understands your prompts better.
I think a combination of multiple models doing that will mean that prompt engineering isn’t really the thing. It’s just understanding how to chain these tools together for more context-specific stuff.
This is why we’re partnered with Example for Repl.it so that people can build dynamic systems and we’ve got some very interesting things on the way there. I think the barriers to entry will drop dramatically.
Do you really need a class on that? For the next few years, yeah, but then soon it will not require that.
Ammonite says, how long does it usually take to train?
That’s a piece of string, it depends.
We have models, so stable diffusion is 150,000 A100 hours and A100 hours is about $4 on Amazon, which you need for the interconnect.
Open clip was 1.2 million hours. That’s literally hours of compute.
For stable diffusion, can someone in the chat do this? It’s 256 A100s over 150,000 hours, so divide one by the other.
What’s the number? Who can get it quick? Quickest? No one? Oh, man, you guys kind of calculate slow.
24 days, says Ninjaside. There we go. That’s about how long it took to train the model.
To do the tests and other stuff, it took a lot longer.
And the bigger models, again, it depends because it doesn’t really need any scale. So it’s not that you chuck 512 and it’s more efficient. It is really a lot of the heavy lifting is done by the supercomputer.
So what happens is that we’re doing all this work up front, and then we release the model to everyone.
And then, as Joe said, Dreambooth takes about 15 minutes on an A100 to then fine tune. Because all the work of those years of knowledge, the thousands of gigabytes, are all done for you. And that’s why you can take it and extend it and kind of do what you want with it.
That’s the beauty of this model over the old school internet, which was always computing all the time. We can push intelligence to the edges.
All right.
So Mr. John Fingers is asking, how close do you feel you might be able to show a full motion video model like Google or Meta showed off recently?
We’ll have it by the end of the year. But better.
Reflyn Wolf asks, when do you think we will be able to talk to an AI about the image? Like can he fix his nose a little bit or make her hair longer and stuff like that?
To be honest, I’m kind of disappointed in the community’s not built that yet. It’s not complicated. All you have to do is whack Whisper on the front end. Thank you, OpenAI. You know, obviously, you know, that was a great benefit. And then have that input into StyleClip or kind of FitBase thing.
So if you look up, Max Wolf has this wonderful thing on StyleClip that you can see how to create various scary Zuckerbergs as if he wasn’t scary himself. And so I’m putting that into the pipeline basically allows you to do what it says there with a bit of targeting.
So there’s some StyleClip right there in the stage chat.
And again, with the new clip models that we have and a bunch of the other bit models that Google have released recently, you should be able to do that literally now when you combine that with Whisper.
All right.
And Ivy Dory, how do you feel about the use of generative technology being used by surveillance capitalists to further profit-aimed goals? What can StabilityAI do about this?
The only thing we can really do is offer alternatives. Like do you really want to be in a meta, what do they call it, horizonverse where you’ve got no legs or genitals? Not really, you know, like legs are good, genitals good. And so by providing open alternatives, we can basically out-compete the rest.
Like look at the amount of innovation that’s happened on the back of stable diffusion.
And again, you know, acknowledge our place in that. We don’t police it. We don’t control it. You know, like people can take it and extend it. If you want to use our services, great. If you don’t, it’s fine.
We’re creating a brand new ecosystem that will out-compete the legacy guys because thousands, millions of people will be building and developing on this.
Like we are sponsoring the faster AI course on stable diffusion so that anyone who’s a
developer can rapidly learn to be a stable diffusion developer. And you know, this isn’t just kind of interfaces and things like that, it’s actually you’ll be able to build your own models.
And how crazy is that? Let’s make it accessible to everyone.
And again, that’s why we’re working with gradios and others on that.
All right, what have we got?
David, how realistic do you think dynamically creating realistic 3D content with enough fidelity in a VR setting would be? And what would you say the timeline on something like that is?
You know, unless you’re Elon Musk, self-driving cars have always been five years away. Always, always, always.
And you know, $100 billion has been spent on self-driving cars and the research. And it’s to me, it’s not that much closer.
The dream of photorealistic VR though is very different with generative AI.
Like again, look at the 24 frames per second image and video. Look at the, pardon, look at the long fanarchy video as well, and then consider Unreal Engine 5.
What’s Unreal Engine 6 going to look like? Well, it’ll be photorealistic, right? And it’ll be powered by Nerf technology, the same as Apple is pioneering for use on the neural engine chips that make up 16.8% of your MacBook M1 GPU.
It’s going to come within four to five years, fully high res, 2K in each eye resolution VR, even 4K or 8K actually.
It just needs an M2 chip with the specialist transformer architecture in there. And that will be available to a lot, a lot of people.
But then like I said, Unreal Engine 6 will also be out in about four or five years. And so that will also up the ante.
There’s a lot of amazing compression and customized stuff you can do around this.
And so I think it’s just going to be insane when you can create entire worlds. And hopefully it’ll be built on the type of architectures that we help catalyze, whether it’s built by ourselves or others.
So we have a metric shit ton, I believe is the appropriate term of partnerships that we’ll be announcing over the next few months, where we’re converting closed source AI companies into open source AI companies, because, you know, it’s better to work together.
And again, we shouldn’t be at the center of all this with everything laying on our shoulders
but it should be a team work initiative, because this is cool technology that will help a lot
of people.
All right.
What guarantees, this is BitFortress 2, what guarantees does the community have that stability
AI won’t go down the same path as open AI? That one day you won’t develop a good enough model, you decide to close things after benefiting from all the work of the community and the visibility generated by it?
That’s a good question. I mean, it kind of sucks what happened with open AI, right?
You can say it’s safety, you can say it’s commercials, like whatever.
The R&D team and the developers have in their contracts, except for one person that we need to send it to, that they can release any model that they work on open source. So legally, we can’t stop them.
Well, I think that’s a pretty good kind of thing. I don’t think there’s any AI company in the world that does that.
And again, if you look at it, the only thing that we haven’t instantly released is this particular class of generative models, because it’s not straightforward. And because you have freaking congresswomen petitioning to ban us by the NSA, and a lot more stuff behind that.
Look, we’re going to get B Corp status soon, which puts in our official documents that we are mission focused, not profit focused. But at the same time, I’m going to build $100 billion company that helps a billion people.
We have some other things around governance that we’ll be introducing as well. But currently, the governance structure is simple, yet not ideal, which is that I personally have control of board, ordinary common, everything. And so a lot is resting on my shoulders, which are not sustainable.
As soon as we figure that out and how to maintain the independence and how to maintain it so that we are dedicated to open, which I think is a superior business model, a lot of people agree with, we’ll implement that post-haste.
Any suggestions, please do send them our way.
But like I said, one core thing is if we stop being open source and go down the open AI route, there’s nothing we can do to stop the developers from releasing the code. And without developers, what are we?
We’re a nice front end company that does a bit of model deployment, so it’d be killing ourselves.
Any plans for stability to tackle open source alternatives to AI code generators, like CodePilot
and AlphaGo?
Yeah, you can go over to kappa.ai and see our code generation model that’s training right now.
We released one of the FID-based language models that will be core to that, plus our Instruct framework, so that you can have the ideal complement to that.
So I think by Q1 of next year, we will have better code models than CodePilot.
And there’s some very interesting things in the works there, you just look at our partners
and other things. And again, they’ll be open source, available to everyone.
Sunbury, will support be added for training at sizes other than 512 by default?
Training? I suppose you meant inference.
There are things like that already.
If you look at the recently released novel AI improvements to stable diffusion, you’ll see there are details there as to how to implement arbitrary resolutions, similar to something like mid-journey.
I just posted it there.
The model itself, like I said, enables that, it’s just that the kind of code wasn’t there.
It was part of our expected upgrades, and again, different models are being trained at different sizes.
So we have a 768 model, a 512 model, et cetera, so a 1024 model, et cetera, coming in the pipeline.
I mean, again, I think that not many people have actually tried to train models yet. They’re probably just getting to grips with it, but you can train and extend this.
Again, view it as a base of knowledge onto which you can adjust a bunch of other stuff.
Crack ass, do you have any plans to improve the model in terms of face, limbs, and hand
generation? Is it possible to improve on specifics on this checkpoint?
Yep, 100%.
So I think in the next day or so, we’ll be releasing a new fine-tuned decoder that’s just a drop-in for any latent diffusion or stable diffusion model that is fine-tuned on the face Lyon dataset, and that makes better faces.
Then as well, you can train it on Hagrid, which is the hand dataset to create better hands, et cetera.
Some of this architecture is known as a VAE architecture for doing that, and again, that’s discussed a bit in the novel AI thing because they do have better hands, and again, this knowledge will proliferate around that.
What is the next question? There’s a lot of questions today.
Sawyer Partnership, they are a grant with Nat and Daniel. If you guys would support startups in case they aren’t selected by them, any way startups can connect with you folks to get mentorship or guidance.
We are building a grant program and more. It’s just that we’re currently hiring people to come and run it.
That’s the same as Bruce.Codes’ question.
In the next couple of weeks, there will be competitions and all sorts of grants announced to stimulate the growth of some essential parts of infrastructure in the community, and we’re going to try and get more community involvement in that so people who do great things for the community are appropriately awarded.
There’s a lot of work being done there.
Ivy Dory, is Stability Eye considering working on climate crisis via models in some way?
Yes, and this will be announced in November. I can’t announce it just yet. They want to do a big grant thing.
We’re doing that. We’re supporting several entities that are doing climate forecasting functions and working with a few governments on weather patterns using transformer-based technologies as well.
There’s that.
Okay. What else we got?
We have Reflyn Wolf. Which jobs do you think are most dangerous being taken by AI?
I don’t know, man. It’s a complex one.
I think that the probably most dangerous ones are call center workers and anything that involves human-to-human interaction.
I don’t know if you guys have tried character.ai. I don’t know if they’ve stopped it because you could create some questionable entities.
The… It’s very good. And it will just get better because I think you look at some of the voice models we have coming up.
You can basically do emotionally accurate voices and all sorts of stuff and voice-to-voice. So you won’t know it’s a call center worker, but that goes to a lot of different things.
I think that’s probably the first for disruption before anything else.
I don’t think that artists get disrupted that much, to be honest, by what’s going on here. Unless you’re a bad artist, in which case you can use this technology to become a great artist. And the great artist will become even greater.
So I think that’s probably my take on that.
Liquid right now has a question in two parts. What work is being done to improve the attention mechanism of stable diffusion to better handle and interpret composition while preserving artistic style? There are natural language limitations when it comes to interpreting physics from simple statements. Artistic style further deforms and challenges this kind of interpretation. Is stability.ai working on high-level compositional language for use of generative models?
The answer is yes.
This is why we spent millions of dollars releasing the new Clip. Clip is at the core of these models.
There’s a generative component, and there is a guidance component, and when you infuse the two together, you get models like they are right now.
The guidance component, we used Clip L, which was Clip Large, which was the first one, the largest one that OpenAI released. They had two more, H and G, which I believe are huge and gigantic.
We released H and the first version of G, which literally take like a million A100 hours to do, and that improves compositional quality so that as that gets integrated into a new version of stable diffusion, it will be at the level of Dali 2, just even with a small size.
There are some problems around this in that the model learns from both things. It learns from the stuff the generative thing is fine-tuned on and from the Clip models.
We’ve been spending a lot of time over the last few weeks, and there’s another reason for the delay, seeing what exactly does this thing know? Because even if an artist isn’t in our training data set, it somehow knows about it, and it
turns out it was Clip all along.
So we really wanted to output what we think it outputs and not output what it shouldn’t output, so we’ve been doing a lot of work around that.
Similarly what we found is that embedding pure language models like T5XXL and we tried UL2 and some of these other models, these are like pure language models like GPT-3 improves the understanding of these models, which is kind of crazy.
And so there’s some work being done around that for compositional accuracy, and again, you can look at the blog by novel AI, where they extended the context window so that it can accept three times the amount of input from this.
So your prompts get longer from, I think, like 74 to 225 or something like that.
And there are various things you can do once you do proper latest place exploration, which I think is probably another month away, to really hone down on this.
I think, again, a lot of these other interfaces from the ones that we support to others have already introduced negative prompting and all sorts of other stuff.
You should have kind of some vector-based initialization, et cetera, coming soon.
All right.
We’ve got Maeve, what are the technical limitations around recreating SD with a 1.0.2.4 data set rather than 5.1.2, and why not have varying resolutions for the data set? Is the new model going to be a ton bigger?
So version three right now is 1.4 billion parameters, we’ve got a 4.3 billion parameter imaging training and 900 million parameter imaging training.
We’ve got a lot of models training, we’re just waiting to get these things right before we can start releasing them one after the other.
The main limitation is the lack of 1.0.2.4 images in the training data set. Like Lion doesn’t have a lot of high-resolution images, and this is one of the things why what we’ve been working on the last few weeks is to basically negotiate and license amazing data sets that we can then put out to the world so that you can have much better models.
And we’re going to pay a crap load for that, but again, release it for free and open source to everyone. And I think that should do well.
This is also why the upscaler that you’re going to see is a two-times upscaler, that’s good.
Four-times upscaling is a bit difficult for us to do, like it’s still decent, because we’re just waiting on the licensing of those images.
All right. What’s next?
Any plans for creating a worthy open source alternative, something like AI Dungeon or Character AI?
Well, a lot of the Carper AI team’s work around abstract models and contrastive learning should enable Carper Character AI type systems on chatbots. And you know, from narrative construction to others, again, it will be ideal there.
The open source versions of Novel AI and AI Dungeon, I believe the leading one is Cobold AI, so you might want to check that out. I haven’t seen what the case has been with that recently.
All right, we’ve got Joe Rogan. When will we be able to create full-on movies with AI?
I don’t know, like five years, again? I’m just sticking that out there. If I was Elon Musk, I’d say one year.
I mean, it depends what you mean by a feature like movies. So like animated movies, when you combine stable diffusion with some of the language models and some of the code models, you should be able to create those.
Maybe not in UFO Table or Studio Bone style within two years, I’d say. But I’d say a five-year timeframe for being able to create those in high quality, like super high res, is reasonable, because that’s the time it will take to create these high res dynamic VR kind of things.
To create fully photorealistic proper people movies, I mean, you can look at EBSynth and some of these other kind of pathway analyses, it shouldn’t be that long, to be honest. It depends on how much budget and how quick you want to do it.
Real time is difficult. But you’re going to see some really amazing real time stuff in the next year, TouchWood.
After lining it up, it’s going to blow everyone’s socks away. That’s going to require a freaking supercomputer. But it’s not movie length, it’s something a bit different.
Query on water. Did you read the distillation of guided diffusion models paper? Do you have any thoughts on it? Like if it will improve things on consumer level hardware or just the high VRM data centers?
I mean, distillation and instruction in these models is awesome. And the step counts they have for kind of reaching cohesion are kind of crazy.
Those have wings have done a lot of work on a kind of DDPM fast solvent, but already reduced
the number of steps required to get to those stages.
And again, like, I keep telling everyone, once you start chaining these models together, you’re going to get down really sub one second and further.
Because I think you guys have seen image to image works so much better if you just even
give it a basic sketch, then text to image. Why don’t you change together different models, different modalities to kind of get there.
And I think it’ll be easier once we release our various model resolution sizes plus upscalers so you can dynamically switch between models.
If you look at the dream studio kind of teaser that I posted six weeks ago, that’s why we’ve got model chaining integrated right in there.
All right.
Rafflin Wolf. Who do you think should own the copyright of an image video made by an AI or do you think there shouldn’t be an owner?
I think that if it isn’t based on copyrighted content, it should be owned by the prompter of the AI if the AI is a public model and not owned by someone else. Otherwise, it is almost like a code creation type of thing. But I’m not a lawyer, and I think this will be tested severely very soon.
Question by Prue Prue. Updates on more paying methods for dream studio.
I think we’ll be introducing some alternate ones soon. The one that we won’t introduce is PayPal. No, no PayPal, because that’s just crazy what’s going on there.
Jason the Artist. With Stable Diffusion having been publicly released for over a month now and with the release of version 5 around the corner, what is the most impressive implementation you’ve seen someone create out of the application so far?
Actually, I really love the dream booth stuff. I mean, come on, that shit’s crazy.
You know, even though some of you fine-tuned me into kind of weird poses, I think it was
pretty good.
I don’t think we would get that level of quality. I thought we’d be at textual and version level quality.
Beyond that, I think that, you know, there’s been this well of creativity, like, you’re
starting to see some of the 3D stuff come out. And again, I don’t think we’d get quite there, even with the chaining.
I think that’s pretty darn impressive.
Okay, so what is next? Okay, so I’ve just been going through all of these chat things. Do-do-do-do-do-do.
Notepad, are there any areas of the industry that is currently overlooked that you’ll be
excited to see the effects of diffusion-based AI being used?
Again, like, I can’t get away from this PowerPoint thing. Like, it’s such a straightforward thing that causes so much real annoyance that I think we could kind of get it out there.
I think it just requires kind of a few fine-tuned models, plus a code model, plus a language
model to kind of kick it together.
I mean, diffusion is all about denoising, and information is about noise. So our brains filter out noise and denoise all the time. So these models can be used in a ridiculous number of scenarios.
Like I said, we’ve got a DNA diffusion model going on in open by mail, that’s shit crazy,
right?
But I think right now I really want to see some of these practical high-impact use cases, like the PowerPoint kind of thing.
All right.
We’ve got S1, S2, do you have any plans to release a speech since this model, like script
overdone voices?
Yes, we have a plan to release a speech-to-speech model soon, and some other ones around that.
I think Audio LM by Google was super interesting recently. For those who don’t know, that’s basically you give it a snippet of a voice or of music or something, it just extends it.
It’s kind of crazy.
But I think we get the arbitrary kind of length thing there, combined with some other models,
that could be really interesting.
All right, maybe Dory, do you have any thoughts on increasing the awareness of generative
models? Is this something you see as important? How long do you think until the mass-glow population becomes aware of these models?
I think I can’t keep up as it is, and I don’t want to die. But more realistically, we have a B2B2C model.
We’re partnering with the leading brands in the world and content creators to both get their content so we can build better open models, and to get this technology out to just everyone.
Similarly, on a country basis, we have country-level models coming out very soon.
On the language side of things, you can see we released Polyglot, which is the best Korean language model, for example, via Eleuther AI, and our support of them recently.
I think you will see a lot of models coming soon, and a lot of different kind of elements
around that.
Okey-dokey.
Error broken again, will we always be limited by the hardware cost to run AI? Do you expect something to change?
Yeah, I mean, this will run on the edge, it’ll run on your iPhone in a year. Stable diffusion will run on an iPhone in probably seconds, that level of quality. That’s again a bit crazy.
All right.
A zero-shin, well this is a long one. I’m unsure how to release licensed images based on SD output. Some suggest creative commons, zero is fine. Some say raw output, wording of license suggests reality. Oh, sorry, that’s just a really long question, my brain’s a bit fried.
Okey, so if someone takes a CCO out image and violates a license, then something can be done around that.
I always suggest that if you’re worried about some of this stuff, CCO licensing, and again I am not a lawyer, please consult with a lawyer, does not preclude copyright.
And there’s a transformational element that incorporates that.
If you look at artists like Necro13 and Kaiselva and others, you will see that their outputs usually aren’t one-shot, they are multi-sasic. And then that means that this becomes one part of that, a CCO licensed part, that’s
part of your process.
Like even if you use GFPGAN or upscaling or something like that, again I am not a lawyer, please consult with one, I think that should be sufficiently transformative that you can assert full copyright over the output of your work.
Kingping, is Stability Eye going to give commissions to artists?
We have some very exciting in-house artists coming online soon. Some very interesting ones, I’m afraid that’s all I can say right now. But yeah, we will have more art programs and things like that as part of our community engagement.
But right now it’s been a struggle even to keep Discord and other things going and growing the team.
Like we’re just over a hundred people now, God knows how many we actually need. I think we probably need to hire another hundred more.
All right, RMRF, a text-to-speech model too?
Yep.
I couldn’t release it just yet as my sister-in-law was running Senantic, but now that she’s been
absorbed by Spotify, we can release emotional text-to-speech. Not soon though, I think that we want to do some extra work around that.
I’m going to build that up.
All right, Anisham, is it possible to get vector images like an SVG file from stable diffusion or related systems?
Not at the moment.
You can actually do that with a language model, as you’ll find out probably in the next month. But right now I would say just use a converter, and that’s probably going to be the best way to do that.
All right, Rufflin Wolf, is there a place to find all stable AI-made models in one place?
No, there is not, because we are disorganized. We barely have a careers page up, and we’re not really keeping a track of everything. We are employing someone as an AI librarian to come and help coordinate the community
and some of these other things.
Again, there’s a one-stop shop there, but yeah.
Also there’s this collaborative thing where we’re involved in a lot of stuff. There’s a blurring line between what we need and what we don’t need. We’re just going to want to be the catalyst for all of this. I think the best models go viral anyway.
All right, Infinite Monkey, where do you see stability AI in five years?
Hopefully with someone else leading the damn thing so I can finish Elden Ring.
No, I mean, our aim is basically to build AI subsidiaries in every single country so
that there’s localized models for every country and race that are all open and to basically
be the biggest best company in the world that’s actually aligned with you rather than trying
to suck up your attention to serve you ads.
I really don’t like ads, honestly, unless they’re artistic, I like artistic ads.
The aim is to build a big company, to list, and to give it back to the people so ultimately
it’s all owned by the people.
For myself, my main aim is to ramp this up and spread as much profit as possible into
Imagine Worldwide, our education arm run by our co-founder, which currently is teaching
kids literacy and numeracy in refugee camps in 13 months and one hour a day.
We’ve just been given the remit to extend this and incorporate AI to teach tens of millions
of kids around the world that will be open source hosted at the UN.
One laptop per child, but really one AI per child. That’s one of my main focuses because I think I did a podcast about this.
A lot of people talk about human rights and ethics and morals and things like that. One of the frames I found really interesting from Vinay Gupta, who’s a bit of a crazy guy but a great thinker, was that we should think about human rights in terms of the rights of children because they don’t have any agency and they can’t control things.
What is their right to have a climate? What is their right to food and education and other things?
We should really provide for them and I’m going to use this technology to provide for
them so there’s literally no child left behind. They have access to all the tools and technology they need.
That’s why creativity was a core component of that and communication, education, and
healthcare.
Again, it’s not just us, all we are is the catalyst and it’s the community that comes
and helps and extends that.
As Zeroshin, my question was about whether I have to pass down the rare license limitations
when licensing SD-based images or I can release as good.
Yes, you don’t have to do rare license, you can release as is. It’s only if you are running the model or distributing the model to other people that you have to do that.
If you’d like to learn more about our education initiative, they’re at Magic Worldwide. It’s more on that soon as we scale up to tens of millions of kids.
We have, Chuck Still, as a composer and audio engineer myself, I cannot imagine AI will approach the emotional intricacies and depths of complexity found in music by what class of musicians? At least not anytime soon. That said, I’m interested in AI as a tool, would love to explore how it can be used to help in this production process.
Are we involved in this? Yes, we are.
I think someone just linked it to Harm and I play and we will be releasing a whole suite of tools soon to extend the capability of musicians and make more people into musicians.
It’s one of the interesting ones, these models, they pay attention to the important parts of any media.
There’s always this question about expressivity and humanity. They are trained on humanity and so they resonate and I think that’s something that you have to acknowledge.
Then it’s about aesthetics have been solved to a degree by this type of AI.
Music can be aesthetically pleasing, but aesthetics are not enough. If you are an artist, a musician or otherwise, I’d say a coder, it’s largely about narrative and story.
What does that look like around all of this? Things don’t exist in a vacuum. It can be a beautiful thing or a piece of music, but you remember it because you’re driving a car when you’re 18 with your best friends or it was at your wedding or something like that.
Context and story matters for music, for art, for other things as well like that.
All right. One second. I’m going to have a drink of tea.
We’ve got G.H.P. Kishore. Are you guys working on LLMs as well, something to compete with OpenAI GPT-3?
Yes.
We recently released from the CarpaLab the Instruct framework and we are training Chinchilla optimal models which outperform GPT-3 on a fraction of the parameters. They will get better and better and better and then as we create localized data sets and the education data sets, those are ideal for training foundation models at ridiculous power relative to the parameters.
I think that it will be pretty great to say the least as we kind of focus on that.
In fact, LuthorAI, which was the first community that we properly supported and a number of stability employees helped lead that community, the focus was GPT Neo and GPT-J which were the open source implementations of GPT-3 but on a smaller parameter scale which have been downloaded 25 million times by developers which I think is a lot more use than GPT-3
has got.
GPT-3 is fantastic or Instruct GPT which it really is. I think this Instruct model that took it down 100 times.
Again, if you’re technical, you can look at the Carpa community and you can see the framework around that.
All right. What is the next question here? I tapped the wrong thing. I’ve lost the questions. I have found them. Yes.
Gimmick from the FAQ. In the future for other models, we are building an opt-in and opt-out system for artists and others that will lead to use in partnerships leading organizations. This model has some principles. The outputs are not direct for any single piece. What initiatives are motioned with regards to this?
There will be announcements next week about this. And various entities that we’re bringing in place for that.
That’s all I can say because I’m not allowed to spoil announcements but we’ve been working super hard on this.
I think there’s two or maybe three announcements. It will be 17th and 18th will be the dates of those.
Aha! Through the questions, I think.
Okay. I think now go back to center stage. I do not know how there are no requests so I can’t do requests so are there any other questions from anyone?
Okay. As the mod team are not posting, I’m going to look in the chat.
Peter asks, when will stability and Luther be able to translate geist speech in real time?
I think the honking models are very complicated. Actually, it’s very interesting.
People have been using diffusion models to translate animal speech and understand it. If you look at something like Whisper, it might be in reach.
Whisper by OpenAI, they open sourced it kindly, I wonder what caused them to do that. It’s a fantastic speech to text model.
One of the interesting things about it is you can change the language you’re speaking in the middle of a sentence and it will still pick that up. If you train it enough, then you’ll be able to do that.
One of the entities we’re talking with wants to train based on whale song to understand whales. This sounds a bit like Star Trek, but that’s okay, I like Star Trek.
We’ll see how that goes.
Will Dream Studio be open sourced so it can be used on local GPUs?
I do not believe there’s any plans for that at the moment because Dream Studio is a ProSumerend kind of thing, but you’ll see more and more local GPU usage.
You’ve got Visions of Chaos at the moment on Windows machines by Softology, who’s fantastic, where you can run just about any of these notebooks like D4 and others, or HLKY or whatever.
I think that’s kind of a good step. Similarly, if you look at the work being done on the Photoshop plugin, it will have local inference in a week or two. You can use that directly from Photoshop and soon many other plugins.
What do you think of the situation where a Google engineer believed that AI Chatbot achieved sentience?
It did not. He was stupid.
Unless you have a very low bar of sentience, I suppose you could. Some people are barely sentient, it must be said, especially when they’re arguing on the internet. Never win an argument on the internet.
That’s another thing. Facts don’t really work on the internet. A lot of people have preconceived notions. Instead, you should try to be as open-minded as possible and let people agree to disagree.
Andy Cochran says, thoughts on getting seamless equirectangular 360-degree and 180-degree in HDR outputs in one shot for image-to-text and text-to-image?
I mean, you could use things like, I think I called it StreamFusion, which was DreamFusion’s
stable diffusion combined. There are a bunch of datasets that we’re working on to enable this kind of thing, especially
from GoPro and others. But I think it’ll probably be a year or two away still.
Funky McShot. Man, is there any plans for text-to-3D fusion models?
Yes, there are, and they are in the works.
Malcontender with some of the recent backlash from artists. Is there anything you wish that SD did differently in the earlier stages that would have changed the framing around image synthesis?
I don’t know, really. I mean, the point is that these things can be fine-tuned anyways. I think people have attacked fine-tuning. I mean, ultimately, it’s like, I understand the fear. This is threatening to their jobs and the thing, because anyone can kind of do it. But it’s not ethically correct for them to say, actually, we don’t want everyone to be
artists.
Instead, they focus on, it’s taken my art and trained on my art, and it’s impossible for this to work without my art.
Not really. It’s been trained on ImageNet, and it can still create just about any composition.
Again, part of the problem was having the clip model embedded in there, because the clip model knows a lot of stuff.
We don’t know what’s in the OpenAI dataset, actually, we do kind of, and it’s interesting.
I think that all we can do is kind of learn from the feedback from the people that aren’t shouting at us, or like, you know, members of the team have received death threats and other things, which are completely over the line.
This is, again, a reason why I think caution is the better part of what we’re doing right now.
Like, you know, we have put ourselves in our way, like, my inbox does look a bit ugly in certain places.
To try and calm things down and really listen to the calmer voices there and try and build systems so people can be represented appropriately.
It’s not an easy question, but again, like, I think it’s incumbent on us to try and help facilitate this conversation, because it’s an important question.
All right, let’s see. What’s next?
Aldras, are you looking to decentralize GPU AI compute?
Yeah, we’ve got kind of models that enable that. Hivemind that you’ll see on the decentralized learning side is an example, whereby we can train a distributed GPUs, actually models.
I think that the best version of that is on reinforcement learning models, I think those are deep learning models, especially when considering things like community models, etc. Because as those proliferate, they will create their own custom models, fine-tuned or dream loop or others, there’s no way that centralized systems can keep up, but I think decentralized compute is pretty cheap there.
All right, so, oops, did I kind of disappear there for a second? Testing, testing. All right, I’m back. Can you hear me? All right. Sorry.
Okay, are we going to do NERF type models, yes.
I think NERFs are going to be the big thing. They are going to be supported by Apple and Apple hardware, so I think you’ll see lots of NERF type models there.
Do you guys hate it when there’s a lack of battery? I think it’s so stupid. I can’t remember if it was a TV show or if it was in real life, but there was like this app called, like, I’m dying or something like that, that you could only use to message people when your battery life was like below 5% or something like that.
I think that’s a great idea if it doesn’t exist for someone to create an actual life.
Like, you know, feeling a solidarity for that tension that occurs, you know, I think makes you realize the fragility of the human condition.
All right. Wait, sorry. Am I meant to be doing center stage? Well, there’s nobody who can help me. Can’t figure out how to get lab people up on the stage.
So back to the questions.
Will AI lead to UBI, Casey Edwin?
Maybe. Will AI lead to UBI and Utopia or Panopticon that we can never escape from? Because the models that were previously used to focus our attention and service ads will be used to control our brains instead. And they’re really good at that.
So, you know, no big deal. Just two forks in the road. That’s the way we kind of do.
Let’s see. Who’s next?
Joe Rogan. When will we be able to generate games with AI?
You can already generate games with AI.
So the code models allow you to create basic games. But then we’ve had generative games for many years already.
So I’m just trying to figure out how to get people on stage or do this. Maybe we don’t.
Okay.
Marz says, how does your faith influence your mission?
I mean, it’s just like all faiths are the same. Do unto others as you’d have done unto yourself, right?
The golden rule for all the stuff around there. I think people forget that we are just trying to do our best. Like it can lead to bad things, though.
So Chief Rabbi Jonathan Sacks, sadly past, very smart guy, had this concept of altruistic evil where people who try to do good can do the worst evil because they believe they’re doing good. No one wants to be an asshole and bad. Even if we have our arguments and it makes us forget our humanity.
So I think again, what I really want to focus on is this idea of public interest and bring this technology to the masses because I don’t want to have this world where I look to the future and this is AI God that is controlled by a private enterprise.
Like that enterprise would be more powerful than any nation unelected and in control of everything. And that’s not a future that I want for my children, I think. Because again, I would not want that done unto me.
And I think it should be made available for people who have different viewpoints to me as well.
This is why, like I said, look, I know that there was a lot of tension over the weekend and everything on the community, but we really shouldn’t be the only community for this and we don’t want to be the sole arbiter of everything here.
We’re not open AI or DeepMind or anyone like that.
We’re really trying to just be the catalyst to build ecosystems where you can find your own place, whether you agree with us or disagree with us.
Having said that, I mean, like the Stable Diffusion hashtag has been taken over by Waifu Diffusion. If you like big boobs, it’s fine. Maybe just stick to the Waifu Diffusion tag because it’s harder for me to find the Stable
Diffusion pictures in my own media now.
So yeah, I think that also it’ll be nice when people of other faiths or no faith can actually talk together reasonably.
That’s one of the reasons that we accelerated arinfaith.org.
Again, you don’t have to agree with it, but just realize these are some of the stories that people subscribe to. And everyone’s got their own faith in something or other, literally not.
Waifu says, have you looked at training speed cost on TPUs versus A100s or the cost of switching TensorFlow from PyTorch to Grape?
We have code that works on both. And we have had great results on TPU v4s. The horizontal and vertical scaling works really nicely. And gosh, there is something called a v5 coming soon.
Would that be interesting?
You will see models trained across a variety of different architectures, and we’re trying just about all the top ones there.
And Lindsay says, does Stability AI have plans to take on investors at any point, or have they already?
We have taken on investors. There will be an announcement on that. We have given up zero control, and we will not give up any control. I am very good at this.
As I mentioned previously, the original Stable Diffusion model was financed by some of the leading AI artists in the world and collectors. And so we’ve been kind of community focused.
I wish that we could do a token sale or an IPL or something and be community focused, but that just doesn’t fit with regulations right now.
So the only thing that I can say is that we will and will always be independent. No one’s going to tell us what to do, because otherwise we can’t pivot to WiFus if it turns out that WiFu diffusion is the next big thing.
All right. And what have we got now?
We’ve got a notepad. How much of an impact do you think AI will impact neural implant cybernetics? It appears one of the limiting facts of cybernetics is the input method, not necessarily the hardware.
I don’t know. I have no idea. To be honest, I’ve never thought about that.
Yeah, like, I think that it’s probably required for the interface layer. The way that you should look at this technology is that you’ve got the highest structure to the unstructured world, right? And this acts as a bridge between it.
So like with Stable Diffusion, you can communicate in images that you couldn’t do otherwise.
Cybernetics is about the kind of interface layer between humans and computers. And again, you’re moving that in one direction, and the cybernetics allow you to move in the other direction. So you’re going to have much better information flow.
So I think it will have a massive impact from these foundation devices.
All right.
Overmayer, AI cannot make Cyberpunk 2077 not broken.
No.
I was the largest investor in CD Projekt at one point, and it is a crying shame what
happened there. I have a lot of viewpoints on that one.
But, you know, we can create like cyberpunk worlds of our own in, what did I say, five
years? Yeah, not Elon Musk in there.
So that’s going to be pretty exciting.
What is next?
Are you guys planning on creating any hardware devices, something consumer oriented, one
which has AI as OS?
We have been looking into customized ones. So some of the kind of edge architecture, but it won’t be for a few years.
On the AI side, actually, no, it will be. It’ll probably be towards the end of next year, because we’ve got that on our tablets.
So we’ve got basically a fully integrated stack for tablets for education, health care, and others.
And again, we’re trying to open source as much as possible. So we’re looking to risk five on alternative architectures there. Probably an announcement there in Q1, I think.
Blisky says, anything specific you’d like to see out of the community I’m at?
I just like people to be nice to each other, right? Communities are hard.
It’s hard to scale community.
Humans are designed for one to 150. And what happens is that as we scale communities bigger than that, this dark monster of our being, Moloch, kind of comes out. People get really angsty, and there’s always going to be education, there’s always going to be drama.
Happy communities, do you know, there aren’t drama.
Just consider what your aunts do, and they chat all the time. It’s all kind of drama.
I like to focus on being positive and constructive as much as possible, and acknowledging that everyone is bored humans.
But again, sometimes you make tough decisions, I made a tough decision this weekend, it might be right, it might be wrong. But you know, it’s what I thought was best for the community.
We wanted to have checks and balances and things, but it’s a work in progress.
I don’t know how many people we got in the community right now, like 60,000 or something
like that. That’s a lot of people.
And you know, I think it’s 78,000, that’s a lot of freaking people, that’s like a small
town in the US, or like a city in Finland or something like that, right?
So yeah, I just like people to be excellent to each other.
And Mr. M says, how are you Ahmad?
I’m a bit tired.
Back in London for the first time in a long time, I was traveling trying to get the education thing set up, I got stability Africa set up as well. There’s some work that we’re doing in Lebanon, which unfortunately is really bad.
I said stability does a lot more than image. And it’s just been a bit of a stretch even now with 100 people.
But the reason that we’re doing everything so aggressively is because you kind of have to, because there’s just a lot of unfortunateness in the world. And I think you’d feel worse about yourself if you don’t have to.
And there’s an interesting piece I read recently, it’s like I know Simon Fried, FTX.
You know, he’s got this thing about effective altruism. He talks about this thing of expected utility.
How much impact can you make on the world and you have to make big bets.
So I made some really big bets.
I put all my money into freaking GPUs, I really created together a team, I got government
international backing and a lot of stuff because I think everyone has agency and you have to
figure out where you can add the most agency and accelerate things up there.
We have to bring in the best systems and we’ve built this multivariate system of multiple communities and now we’re doing joint ventures in every single country because we think there is a whole new world.
So again, like there’s another great piece Sequoia did recently about generative AI being a whole new world that will create trillions. We’re at this tipping point right now.
And so I think unfortunately you’ve got to work hard to do that because it’s a once in a lifetime opportunity.
Just like everyone in this community here has a once in a lifetime opportunity.
You know about this technology that how many people in your community know about now. Everyone in the world, everyone that you know will be using this in a few years and no one knows the way it’s going to go.
Forced to feel in communities, what’s a good way to handle possible tribalism extremism?
So if you Google me, me, my name, you’ll see me writing in the Wall Street Journal and
the Reuters and all sorts of places about counter extremism.
It’s one of my expert topics and unfortunately it’s difficult with social media echo changers
to kind of get out of that and you find people going in loops because sometimes things aren’t
fair.
Like, you know, again, let’s take our community for example.
This weekend actions were taken, you know, at the banning that were considered unfair and again, that’s understandable because it’s not a cut and dry easy decision. You had kind of the discussions going on loop.
We had people saying some really unpleasant things, you know, some of the stuff made me kind of sad because I was exhausted and you know, people questioning my motivations and things like that.
Again, it’s your prerogative, but as a community member myself, it made me feel bad.
I think the only way that you can really fight extremism and some things like that is to have checks and balances and processes in place. There are a lot of people working super hard on that.
I think this community has been really well behaved.
Like you know, it was super difficult and some of the community members got really burned
out during the beta because they had to put up with a lot of shit, to put it quite simply. But getting people on the same page, getting a common mission and kind of having a degree of psychological safety where people can say what they want, which is really difficult in a community where you don’t know where everyone is.
That’s the only way that you can get around some of this extremism and some of this hate element.
Again, I think common mission is the main thing. I think everyone here is in a common mission to build cool shit, create cool shit and you know, like I said, the tagline kind of create, don’t hate, right?
People said, Emad, in real meetup for us members, yeah, we’re going to have little stability societies all over the place and agathons. We’re just putting an events team together to really make sure they’re well-organized and not our usual disorganized shambles, but you know, feel free to do it yourselves, you know?
Like we’re happy to amplify it when community members take that forward and the things we’re trying to encourage are going to be like artistic oriented things, get into the real world, go and see galleries, go and understand things, go and painting lessons, et cetera, as well as hackathons and all this more techie stuff, techie kind of stuff.
You can be part of the events team by messaging careers at stability.ai. Again, we will have a careers page up soon with all the roles. We’ll probably go to like 250 people in the next few months and yeah, it’s going very fast.
Procher says, any collaboration in China yet, can we use Chinese clip to guide the current one or do we need to retrain the model, embed the language clip into the model?
I think you’ll see a Chinese variant of Stable Diffusion coming out very soon. Can’t remember what the current status is.
We do have a lot of plans in China. We’re talking to some of the coolest entities there.
As you know, it’s difficult due to sanctions and the Chinese market, but it’s been heartening to see the community expand in China so quickly and again, as it’s open source, it didn’t need us to go in there to kind of do that.
I’d say that on the community side, we’re going to try and accelerate a lot of the engagement things.
I think the Dr. Fusions one’s ongoing, shout out to Dwight White for Nerf Gun and Almost AT for kind of the really amazing kind of output there.
I don’t think we do enough to appreciate the things that you guys post up and simplify them and I really hope we can do better in the future.
The mod team are doing as much as they can right now and again, we’ll really try to amplify the voices of the artistic members of our community as well more and more and give support through grants, credits, events, and other things as we go forward.
All right, who’s next?
We’ve got Almark. Is there going to be a time when we have AI friends we create ourselves, personal companions speaking to us via our monitor, much of the same way a webcam call is done, high quality, et cetera?
Yes, you will have her from Joaquin Phoenix’s movie Her with Scarlett Johansson, whispering in your ear.
Hopefully she won’t dub you at the end, but you can’t guarantee that.
If you look at some of the text to speech being emotionally resonant, then, you know, it’s kind of creepy, but it’s very immersive. So I think voice will definitely be there first.
Again, try talking to a character.ai model and you’ll see how good some of these chatbots can be. There are much better ones coming.
You’ve seen this already with Xiaoxi in China, so Alice, which a lot of people use for mental health support and then Lisa in Iran. So millions of people use these right now as their friends and again, it’s good to have
friends.
Again, we recommend 7cups.com if you want to have someone to talk to, but it’s not the
same person each time. Or, you know, like just going out and making friends, but it’s not easy.
I think this will help a lot of people with their mental health, etc.
Glinski says, how early do you think we are on this AI wave that’s emerging? How fast it’s changing? Sometimes it’s hard to feel FOMO.
It is actually literally exponential.
So like when you do a log normal return of the number of AI papers that are coming out, it’s a straight line. So it’s literally an exponential kind of curve. Like I can’t keep up with it. No one can keep up with it.
We have no idea what’s going on. And the technology advances, like there’s that meme, like one hour here is seven years
on earth. Like from interstellar, that’s how life kind of feels like.
I was on top of it for a few years and now it’s like, I don’t even know what’s happening.
Here we go. It’s a doubling rate of 24 months. It’s a bit insane. So yeah.
As Squonky says, any comments on Harmony AI? How close do you think we are to having music sound AI with the same accessibility afforded by stable diffusion?
Harmony has done a slightly different model of releasing dance diffusion gradually. We’re putting it out there as we license more and more data sets.
Yeah, I’m kind of bound with some of the own next and other work that’s going on. I mean, basically considering you’re at the VQGAN moment right now, if you guys can remember that from all of a year ago or 18 months ago, it’ll go exponential again because the amount of stuff here is going to go crazy.
Like generative AI, look at that Sequoia link I posted, is going to be the biggest investment theme of the next few years. And literally tens of billions of dollars are going to be deployed like probably next year alone into this sector. And most of it will go to stupid stuff. Some of it will go to good stuff. Most of it will go to stupid stuff. But a decent amount will go to forwarding music in particular because the interesting thing about musicians is that they’re already digitally intermediated versus artists who are not.
So artists, some of them use Procreate and Photoshop, a lot of them don’t. But musicians use synthesizers and DSPs and software all the time. So it’s a lot easier to introduce some of these things to their workflow and then make it accessible to the people.
Yeah, musicians just want more snares. You see the drum and bass guy there.
Safety mark, when do we launch the full Dream Studio and will it be able to do animations? If so, do you think it’ll be more cost effective than using Colab?
Very soon, yes, and yes.
There we go. Keep an eye here. The next announcements won’t be hopefully quite so controversial but instead very exciting, shall we say.
I’m running out of energy. So I think we’re going to take three more questions and then I’m going to be done. And then I’m going to go and have a nap.
Do you think an AI therapist could be something to address the lack of access to qualified mental health experts, RacerX?
I would rather have volunteers augmented by that.
So again, with 7cups.com, we have 480,000 volunteers helping 78 million people each month trained on active listening that hopefully will augment by AI as we help them build their models.
AI can only go so far but the edge cases and the failure cases I think are too strong. And I think again, a lot of care needs to be taken around that because people’s mental health is super important.
At the same time, we’re trialing art therapy with stable diffusion as a mental health adjunct in various settings from survivors of domestic violence to veterans and others. And I think it will have amazing results because there’s nothing quite like the magic of using this technology.
I think again, magic is kind of the operative word here that we have. That’s how you know technology is cool. There’s a nice article on magic.
A few more questions.
Ardisco, what are your thoughts on Buckminster Fuller’s work and his thoughts on how to build a world that doesn’t destroy himself?
To be honest, I’m not familiar with it. But I think the world is destroying itself at the moment and we’ve got to do everything we can to stop it.
Again, I mentioned earlier, one of the nice frames I’ve thought about this is really thinking about the rights of children because they can’t defend themselves and are we doing our big actions with a view to the rights of those children?
I think that children have a right to this technology and that’s every child, not just ones in the West. And that’s why I think we need to create personalized systems for them and infrastructure so they can go up and kind of get out.
All right, Ira, how will generative models and unlimited custom tailored content to an audience of one impact how we value content? The paradox of choice is more options tend to make people more anxious. And we get infinite choice right now. How do we get adapted to our new godlike powers of this hedonic treadmill? And it’s a net positive for humanity. How much consideration are we given to potential bad outcomes?
I think this is kind of one of those interesting things whereby, like I was talking to Alexander Wang at scale about this and he posted something on everyone being in their own echo chambers as you basically get hedonic to death, entertained to death. Kind of like this WALL-E, the fat guys with their VR headsets, yeah, kind of like that.
I don’t think that’s the case.
I think people use this to create stories because we’re prosocial narrative creatures and the N equals one echo chambers are a result of the existing internet without intelligence on the edge.
We want to communicate unless you have Asperger’s like me and social communication disorder, in which case communicating is actually quite hard, but we learn how to do it.
And I think, yeah, we’re prosocial creatures that love seeing people listen to what we do. This is why you click on likes and you know you’ve got this kind of hook model where you input something, you’re triggered, and then you wait for verification and validation.
I think actually this will allow us to create our stories better and create a more egalitarian internet because right now the internet itself is this intelligence amplifier that means that some of the voices are more heard than others because some people learn how to use the internet and they drown out those who do not and a lot of people don’t even have access to this.
So yeah, all righty, I am going to answer one more question because I’m tired now.
Ivy Dory, when do you think multi models will emerge, combining language, video and image?
I think they will be here by Q1 of next year and they’ll be good. I think that by 2024 they will be truly excellent.
You can look at the DeepMind Gato paper on the auto regression of different modalities on reinforcement learning to see some of the potential on this.
So Gato is just a 1.3 billion parameter model that is a generalist agent.
As we’ve kind of showed by merging image and others, these things can cross learn just like humans and I think that’s fascinating and that’s why we have to create models for every culture, for every country, for every individual so we can learn from the diversity and plurality of humanity to create models that align with us and work for us instead
of against us.
And I think that’s much better than stacking more layers and build giant freaking supercomputers to train models to serve ads or whatever.
So with that, I bid you adieu.
I apologize that I didn’t bring anyone to the stage. Our team’s kind of busy right now and yeah, I am not good at technology right now in my brain dead state. But hopefully it won’t be too long until we kind of connect again.
There’ll be a lot more community events coming up and engagement.
I think it’s been seven weeks, feels like seven years or seven minutes, I’m not even sure anymore.
I think we made a time machine. But hopefully we can start building stuff a lot more structured.
So thanks all and you know, stay cool, rock on, bye.