I have an idea for a new web app: basically a quick lookup for whether a given chemical is soluble in a given solvent (or range of solvents).
Obviously, the hard part is going to be getting the large amount of solubility data. I think getting the solubility data of [pretty much anything] in
water will be easy since Wikipedia usually displays the water solubility in its own row with the first cell containing "Solubility in water", then the
next sibling cell has the solubility in a standardized format:
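For what it's worth, here is a rough sketch of that "find the label cell, take the next sibling cell" idea using only Python's standard-library `html.parser`. The class name `SolubilityRowParser`, the helper `water_solubility`, and the `SAMPLE_ROW` markup are all made up for illustration; the sample is a hand-written row, not copied from Wikipedia, and real infobox markup may be nested differently.

```python
from html.parser import HTMLParser


class SolubilityRowParser(HTMLParser):
    """Collect the text of the cell that follows a 'Solubility in water' cell."""

    def __init__(self):
        super().__init__()
        self.in_cell = False      # currently inside a <td> or <th>
        self.cell_text = []       # text fragments of the current cell
        self.label_seen = False   # the previous cell in this row was the label
        self.result = None        # first matching value found, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.cell_text = []
        elif tag == "tr":
            self.label_seen = False  # the label must be in the same row

    def handle_data(self, data):
        if self.in_cell:
            self.cell_text.append(data)

    def handle_endtag(self, tag):
        if tag not in ("td", "th") or not self.in_cell:
            return
        self.in_cell = False
        # Join fragments and collapse runs of whitespace.
        text = " ".join("".join(self.cell_text).split())
        if self.label_seen and self.result is None:
            self.result = text
        self.label_seen = text.lower().startswith("solubility in water")


# Hand-written sample row (assumed structure, for illustration only).
SAMPLE_ROW = """
<table>
  <tr><th>Molar mass</th><td>58.44 g/mol</td></tr>
  <tr><td><a href="#">Solubility in water</a></td><td>360 g/L (25 °C)</td></tr>
</table>
"""


def water_solubility(html):
    """Return the raw text of the water-solubility cell, or None if absent."""
    parser = SolubilityRowParser()
    parser.feed(html)
    parser.close()
    return parser.result


print(water_solubility(SAMPLE_ROW))
```

This only pulls out the raw cell text; turning that text into a number and a unit is a separate step.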
So that kinda takes care of solubility data in water. But does anyone know of a source that has useful solubility data in a format that I could use?
The trick is to find an online resource that:
Has a lot of data
Makes it possible to search that data
Can easily be crawled with a script
Shows the solubility data in a standardized format that can be programmatically targeted using something like Regex
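To make the last requirement concrete, here is a minimal sketch of what "programmatically targeted using regex" could look like. It assumes the source writes values like `360 g/L (25 °C)` or `2 g/100 mL`; the pattern, the unit list, and the function name `parse_solubility` are my own assumptions and would need adjusting for whatever format a real source uses.

```python
import re

# Number, then a unit, then an optional "(NN °C)" temperature.
SOLUBILITY_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>g/L|g/100\s*mL|mg/mL)"
    r"(?:\s*\(\s*(?P<temp>-?\d+(?:\.\d+)?)\s*°C\s*\))?"
)

# Multipliers to convert each supported unit to g/L.
TO_G_PER_L = {"g/L": 1.0, "g/100mL": 10.0, "mg/mL": 1.0}


def parse_solubility(text):
    """Return (grams_per_litre, temp_celsius_or_None), or None if no match."""
    match = SOLUBILITY_RE.search(text)
    if match is None:
        return None
    unit = re.sub(r"\s+", "", match.group("unit"))  # "g/100 mL" -> "g/100mL"
    grams_per_litre = float(match.group("value")) * TO_G_PER_L[unit]
    temp = match.group("temp")
    return grams_per_litre, (float(temp) if temp is not None else None)


print(parse_solubility("360 g/L (25 °C)"))   # (360.0, 25.0)
print(parse_solubility("2 g/100 mL"))        # (20.0, None)
print(parse_solubility("Very soluble"))      # None
```

Normalizing everything to g/L up front would let the search compare values from different sources directly.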
Unfortunately, a lot of the resources I know of that present solubility "data" in a standardized, easy-to-crawl format
don't actually list specific values.
For example, take a look at this:
The data is just displayed as:
Very soluble
Easily soluble
Soluble
Sparingly soluble
Slightly soluble
Very slightly soluble
Practically insoluble
Which honestly isn't very specific. Even though each of those terms is assigned a solubility range, it's just not precise enough.
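To show how coarse those bins are, here is a small sketch that maps each term to a range. It assumes the page follows the USP-style convention (parts of solvent needed per part of solute), that "Easily soluble" corresponds to USP's "freely soluble", and that 1 part of solute in 1 part of solvent means roughly 1 g per 1 mL. The table and the function name `approx_range_g_per_l` are illustrative, not taken from the linked page.

```python
import math

# (min_parts, max_parts) of solvent needed per 1 part of solute (USP-style, assumed).
PARTS_OF_SOLVENT = {
    "very soluble": (0, 1),
    "easily soluble": (1, 10),   # assumed equivalent to USP "freely soluble"
    "soluble": (10, 30),
    "sparingly soluble": (30, 100),
    "slightly soluble": (100, 1000),
    "very slightly soluble": (1000, 10000),
    "practically insoluble": (10000, math.inf),
}


def approx_range_g_per_l(term):
    """Return (low, high) in g/L, assuming 1 part = 1 g solute per 1 mL solvent."""
    min_parts, max_parts = PARTS_OF_SOLVENT[term.strip().lower()]
    # More solvent needed means lower solubility, so the bounds swap.
    low = 0.0 if math.isinf(max_parts) else 1000.0 / max_parts
    high = math.inf if min_parts == 0 else 1000.0 / min_parts
    return low, high


print(approx_range_g_per_l("Slightly soluble"))  # (1.0, 10.0)
```

Under these assumptions, "Slightly soluble" spans a full factor of ten, so two compounds with the same label can still differ a lot.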
I think crawling PubChem would be a nightmare. Every time I look for solubility data in PubChem, it looks like they literally just copy/paste
whatever content is on the source page, so the format varies a lot depending on where it was pulled from.
Any input or advice would be greatly appreciated.
Thanks!
-J
[Edited on 13-1-2023 by SuperOxide]

DraconicAcid - 12-1-2023 at 21:14