Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Looking for ideas for a new Synthesis Open Access Platform
Hey everyone!
I am seeking ideas for a potential new open access platform for synthesis and related resources. It would capture syntheses from journals, researchers, and individual contributors, and cross-reference reactions, reagents, products, etc. to provide a searchable database. I would eventually like to have educational content in the form of a collaboratively developed chemistry textbook.
Right now, it seems the easiest way to create a repository of chemicals would be to ingest data from ChemSpider and other data sources through APIs into a structured format using Wikibase, and to bring in chemical data cartridges, such as Bingo, OpenBabel, etc., so that compounds can be interpreted by structure and functional group. This would allow for easy search, comparison, and navigation between them, and would be a starting point for building an ontology.
Integration of visual chemical editors like Ketcher, ChemDoodle, MarvinJS, would allow for the visual construction of compounds and reaction sequences
that can be interpreted and further processed for correlation with other data in the repository, as well as using them to allow visual search for
chemicals based on exact structure, substructure, and similar structures. This is similar to other systems currently in use.
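A rough sketch of how the "similar structures" search could rank candidates, assuming each compound already has a fingerprint (a set of bit positions) computed by a cheminformatics toolkit. The compound names and fingerprints below are invented for illustration; only the Tanimoto coefficient itself is standard.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints; a real system would get these from a toolkit.
FINGERPRINTS: dict[str, set[int]] = {
    "compound_a": {1, 4, 7, 9},
    "compound_b": {1, 4, 7, 12},
    "compound_c": {2, 3, 5},
}


def most_similar(
    query: set[int],
    fingerprints: dict[str, set[int]],
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in fingerprints.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Exact and substructure search need real graph matching, which is what the cartridges mentioned earlier would provide; this only covers the similarity ranking step.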
It would also be useful to ingest PDFs from journals and other data sources, run OCR on them to extract text, use IMAGO OCR to extract and identify chemical structures from drawings, and then further enrich the repository with this information. The PDFs could also be converted into wiki pages, or something along those lines, or a PDF viewer could be integrated to show the original PDF, though that might be problematic because of copyright. Once these PDFs have been ingested, you would be able to navigate either from a chemical to all of its data sources, or from an article that mentions the chemical to its properties, reactions, alternatives, syntheses, articles, suppliers, etc.
There is a lot that could be done here, and it would be a lot of work. However, I have been thinking about this for the last several weeks and really like the idea.
The overall idea is to ingest or allow the development of chemical and synthesis content that could then be processed and linked with the other
content in the repository. If you are looking at a synthesis, and need a specific reagent, you could navigate from that synthesis to the reagent to
find its preparation, or possibly, alternatives based on similar reagents and reaction products.
What other features do you think would be useful? General comments are welcome too.
[Edited on 9-2-2024 by Loptr]
"Question everything generally thought to be obvious." - Dieter Rams
|
|
Sulaiman
International Hazard
Posts: 3694
Registered: 8-2-2015
Location: 3rd rock from the sun
Member Is Offline
|
|
Might be 'better' to add to existing repositories such as orgsyn.org?
CAUTION : Hobby Chemist, not Professional or even Amateur
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
How do you suggest I go about that? It's managed by an organization and grants.
Also, how would you go about contributing to it? I highly doubt they would accept anything that you submit. It's a repository of validated syntheses by professional chemists.
I was hoping to also extend this to the amateur community.
|
|
j_sum1
Administrator
Posts: 6320
Registered: 4-10-2014
Location: At home
Member Is Offline
Mood: Most of the ducks are in a row
|
|
Sounds like you want to reproduce prepchem but add a bot that automatically harvests synths from journals and reliable sources.
I don't know how you would do this. Seems like a significant coding challenge. And distinguishing between reliable/reproducible and bonkers-conjecture will be quite a feat to pull off.
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by j_sum1 | Sounds like you want to reproduce prepchem but add a bot that automatically harvests synths from journals and reliable sources.
I don't know how you would do this. Seems like a significant coding challenge. And distinguishing between reliable/reproducible and bonkers-conjecture will be quite a feat to pull off. |
I am trying to take what's in my head and capture it in words. I will take some time tonight and try to lay it out better.
It wouldn't be duplicating prepchem. That is not a community. This would hopefully grow a community around it with collaborative features with
accounts and roles, and allow the community to curate the information, and as information is added to it, the community could tag various things in it
to allow interlinking the content.
For instance, say you have sodium ferrocyanide and want to know what reactions you could use it for. You would be able to click on sodium ferrocyanide and find every reaction in the repository where it has been tagged.
The other features I mentioned earlier would be down the road.
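A minimal sketch of the tagging idea, assuming reactions are identified by simple string IDs and compounds by name. The class name `TagIndex` and the reaction IDs are hypothetical; a real platform would more likely key on a structure identifier than on a name.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index from a compound name to the reactions tagged with it."""

    def __init__(self) -> None:
        self._by_compound: dict[str, set[str]] = defaultdict(set)

    def tag(self, reaction_id: str, compound: str) -> None:
        # Names are lowercased so "Toluene" and "toluene" share one entry.
        self._by_compound[compound.lower()].add(reaction_id)

    def reactions_for(self, compound: str) -> list[str]:
        # .get() avoids creating empty entries for unknown compounds.
        return sorted(self._by_compound.get(compound.lower(), set()))


index = TagIndex()
index.tag("rxn-002", "Sodium ferrocyanide")
index.tag("rxn-001", "sodium ferrocyanide")
index.tag("rxn-003", "toluene")
print(index.reactions_for("sodium ferrocyanide"))
```

Clicking on a compound page would then amount to one `reactions_for` lookup.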
|
|
Texium
Administrator
Posts: 4580
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: PhD candidate!
|
|
It sounds like you want to make SciFinder without the paywall
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
I have never had access to SciFinder, so can't say for sure.
@Texium, what are some of the features of SciFinder? What can it do?
[Edited on 10-2-2024 by Loptr]
|
|
Texium
Administrator
Posts: 4580
Registered: 11-1-2014
Location: Salt Lake City
Member Is Offline
Mood: PhD candidate!
|
|
It does pretty much everything you described in the OP!
Quote: Originally posted by Loptr | I am seeking ideas for a potential new open access platform for synthesis and related resources that could provide the ability to capture syntheses from journals, researchers, individual contributors, and provide cross-referencing between reactions, reagents, products, etc. to provide a searchable database. |
It does this with patents and journals automatically. Pretty much as soon as a paper is published you can find it and all the reactions that it contains, including from the supplementary information.
Quote: Originally posted by Loptr | Integration of visual chemical editors like Ketcher, ChemDoodle, MarvinJS, would allow for the visual construction of compounds and reaction sequences that can be interpreted and further processed for correlation with other data in the repository, as well as using them to allow visual search for chemicals based on exact structure, substructure, and similar structures. This is similar to other systems currently in use. |
It has this functionality as well. You can search for a structure and it will pull up every reported reaction that it is found in. You can further limit it to reactant or product, and apply a myriad of other filters to find relevant results. You can also draw out a whole reaction scheme and search for any matches, and use variables in the structure drawings to get more general results. Substructure and similarity are options as well. Plus, it can also list suppliers that sell the chemicals you search for and the quantity and price they sell it in, though that isn’t as useful for home chemistry since most of them don’t sell to individuals.
Quote: Originally posted by Loptr | Also, the ability to ingest PDFs from journals and other data sources, run OCR on it to extract text, as well as use IMAGO OCR to extract and identify chemical structures from drawings, and then further enrich the repository with this information... Once these PDFs have been ingested you would be able to either navigate from chemical to all data sources, or from an article with the mention of the chemical to its properties, reactions, alternatives, synthesis, articles, suppliers, etc. |
I don’t know if it’s the same mechanism that you describe, but it does this too! When you view reaction search results, it’s clear that the conditions were automatically scraped from the publications, including their SIs, almost always quite accurately. When you view an article you have the option to “get reactions” and see all the schemes from the paper. Likewise, when you view a chemical, you can “get reactions” or “get references.” It’ll also directly provide you with spectroscopic data and physical properties of compounds if they are published.
Quote: Originally posted by Loptr | The overall idea is to ingest or allow the development of chemical and synthesis content that could then be processed and linked with the other content in the repository. If you are looking at a synthesis, and need a specific reagent, you could navigate from that synthesis to the reagent to find its preparation, or possibly, alternatives based on similar reagents and reaction products. |
Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be a colossal undertaking.
|
|
bnull
Hazard to Others
Posts: 433
Registered: 15-1-2024
Location: South of the border, wherever the border is.
Member Is Offline
Mood: Dazed and confused.
|
|
Quote: Originally posted by Texium | Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s
going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be
a colossal undertaking. |
You lucky bastard... I got a glimpse of it the other day. Dammit. An open-access version would be amazing.
@Loptr, why don't you try contacting researchers from the universities closest to you? I think that if you discuss it with them, they'll offer advice and suggestions. It would be as useful to them as to any amateur chemist. If there were a free alternative almost as powerful, they would gladly ditch SciFinder the way some libraries did with those expensively useless journal subscriptions.
Even so, it would take at least a couple of years to make it run smoothly, and you can't do it alone.
Quod scripsi, scripsi.
B. N. Ull
P.S.: Did you know that we have a Library?
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by bnull | Quote: Originally posted by Texium | Yeah, that is exactly what it is. Honestly, it’s such a powerful tool that I’ve been spoiled to have access to the last several years. It’s
going to be hard to go back to not having it, so I would certainly support an endeavor to create an open-access alternative, though it would indeed be
a colossal undertaking. |
You lucky bastard... I got a glimpse of it the other day. Dammit. An open-access version would be amazing.
@Loptr, why don't you try contacting researchers from the universities closest to you? I think that if you discuss it with them, they'll offer advice and suggestions. It would be as useful to them as to any amateur chemist. If there were a free alternative almost as powerful, they would gladly ditch SciFinder the way some libraries did with those expensively useless journal subscriptions.
Even so, it would take at least a couple of years to make it run smoothly, and you can't do it alone. |
Yeah, I am well aware of that. I run a software development organization that contracts, and has commercial and IR&D projects as well.
I was thinking about starting small and trying to use as much of the available open source software as possible. There is quite a bit from what I can find.
I was reaching out to you all to see what else could be put on the wish list because I was mostly focusing on the technology, rather than the use case
of what it would actually do. Most of my posts have been general statements about interlinking content and ingestion because that's what I am most
familiar with professionally, and had an idea, but was trying to understand what was already being done by the other existing systems.
I think a combination of Wikibase, ingestion from ChemSpider to create a listing of a bunch of reagents, and other APIs to get additional details for each chemical would be a good first start. That way you have pages that can be linked to within reactions for reference. The next step would be adding a visual editor with the ability to import reagents from the repository. At that point you have the beginnings of a collaborative platform that could allow individuals to contribute reactions, with the platform able to understand (somewhat) the reactants, solvents, etc. for linkage. Allow some members to curate the content using annotations and notices for references needed, or bogus content, and then you have the basis for the Wikipedia business model.
The article ingestion can be done but would require fine-tuning and experimentation, with constant adjustment as article formats change. The idea would be to extract text, determine paragraphs and their possible titles, extract images with their relation to the paragraphs, and then convert it into wikitext with basic formatting. You don't have to duplicate the article exactly. Then, once the content has been ingested, you process it for named entity resolution to other content within the repository for linkage.
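The last two steps (wikitext conversion and entity linking) could be sketched roughly as below, assuming the text and titles have already been extracted. The function name `to_wikitext` and the simple name-matching approach are assumptions for illustration; real named entity resolution for chemistry would need synonym handling and much more. The `== Title ==` and `[[target|label]]` forms are ordinary MediaWiki syntax.

```python
import re


def to_wikitext(sections: list[tuple[str, str]], known_entities: list[str]) -> str:
    """Render (title, paragraph) pairs as wikitext, linking known entity names."""
    canonical = {name.lower(): name for name in known_entities}
    # Longest names first so a longer name wins over a shorter one it contains.
    names = sorted(known_entities, key=len, reverse=True)
    pattern = None
    if names:
        alternatives = "|".join(re.escape(name) for name in names)
        pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def link(match: re.Match) -> str:
        shown = match.group(0)
        target = canonical[shown.lower()]
        # Use a piped link when the text differs from the page name in case.
        return f"[[{target}]]" if shown == target else f"[[{target}|{shown}]]"

    parts: list[str] = []
    for title, text in sections:
        if pattern is not None:
            text = pattern.sub(link, text)
        if title:
            parts.append(f"== {title} ==")
        parts.append(text)
    return "\n\n".join(parts)


print(to_wikitext(
    [("Procedure", "Toluene was added to sulfuric acid.")],
    ["toluene", "sulfuric acid"],
))
```

The hard parts (layout detection, figure association, ambiguous names) are exactly the ones this sketch skips.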
Sounds easy, right?
|
|
digga
Harmless
Posts: 43
Registered: 11-6-2018
Member Is Offline
|
|
Allow the user to maintain a stock list. Then show the user what can be made from it and what skills/equipment are needed. Assign each reaction keywords for searching. Allow users to add reactions to the list. For example:
Benzenesulfonic Acid. Materials: purified toluene, concentrated sulfuric acid, bicarbonate. Gear: Dean-Stark apparatus. Skills: distillation, refluxing, filtration. Keywords: moderate, catalyst, useful.
Show a reaction if the stock list contains the ingredients AND the user wants easy to moderate.
Have a flag that also shows reactions for which you are missing some of the requirements, highlighting what you are missing.
Forum members can add reactions.
This is a small project which will provide immediate reward by helping suggest projects.
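A small sketch of the stock-list idea, using the example entry above as data. The `Reaction` fields mirror the Materials/Gear/Skills/Keywords labels; the function name `suggest` and the `allow_missing` flag are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    name: str
    materials: set[str]
    gear: set[str] = field(default_factory=set)
    skills: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)


def suggest(
    reactions: list[Reaction],
    stock: set[str],
    wanted_keywords: set[str],
    allow_missing: bool = False,
) -> list[tuple[str, list[str]]]:
    """Return (reaction name, missing materials) for matching reactions."""
    results = []
    for rxn in reactions:
        if not (rxn.keywords & wanted_keywords):
            continue  # none of the wanted keywords (e.g. difficulty) match
        missing = rxn.materials - stock
        if missing and not allow_missing:
            continue  # hide incomplete reactions unless the flag is set
        results.append((rxn.name, sorted(missing)))
    return results


EXAMPLE = Reaction(
    name="Benzenesulfonic Acid",
    materials={"purified toluene", "concentrated sulfuric acid", "bicarbonate"},
    gear={"Dean-Stark apparatus"},
    skills={"distillation", "refluxing", "filtration"},
    keywords={"moderate", "catalyst", "useful"},
)

stock = {"purified toluene", "concentrated sulfuric acid"}
print(suggest([EXAMPLE], stock, {"easy", "moderate"}, allow_missing=True))
```

Gear and skills could be checked the same way as materials; they are left out of the filter here to keep the sketch short.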
Howzaboutit?
[Edited on 10-2-2024 by digga]
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Assuming that a given reaction has been captured with the list of chemicals needed, then yes, it would be fairly easy to identify products that you could make. At that point it's just a matter of finding the reactions whose reactants are all in your stock, i.e. the reactant set is a subset of the stock. It could also list the reactions that would become possible if more reactants were acquired, sorted by distance, where distance is the number of additional reactants needed.
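As a rough illustration of that ranking, assuming each reaction is stored as a set of reactant names (the reaction IDs and single-letter reactants below are placeholders):

```python
def rank_by_missing(
    reactions: dict[str, set[str]],
    stock: set[str],
) -> list[tuple[int, str, list[str]]]:
    """Sort reactions by how many reactants are absent from the stock.

    Each result is (distance, reaction id, missing reactants); distance 0
    means every reactant is already in stock.
    """
    ranked = []
    for reaction_id, reactants in reactions.items():
        missing = sorted(reactants - stock)
        ranked.append((len(missing), reaction_id, missing))
    return sorted(ranked)


reactions = {
    "rxn-1": {"a", "b"},
    "rxn-2": {"a", "c", "d"},
    "rxn-3": {"a"},
}
for distance, reaction_id, missing in rank_by_missing(reactions, {"a", "b"}):
    print(distance, reaction_id, missing)
```

Set difference does all the work here, so this should scale to a modest repository without anything cleverer.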
Would that be considered a generally beneficial feature? It sounds like it would be great for amateur chemists, but not something that would greatly
benefit academics or industry, which isn't necessarily a requirement. Maybe part of an exploratory mode.
|
|
clearly_not_atara
International Hazard
Posts: 2787
Registered: 3-11-2013
Member Is Offline
Mood: Big
|
|
I think I would avoid trying to be all things to all people. One thing I've thought would be useful would be a repository of published reactions
(papers or patents) that have been successfully replicated by amateurs.
|
|
Loptr
International Hazard
Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline
Mood: Grateful
|
|
Quote: Originally posted by clearly_not_atara | I think I would avoid trying to be all things to all people. One thing I've thought would be useful would be a repository of published reactions
(papers or patents) that have been successfully replicated by amateurs. |
That's a very wise point.
If you try to be good at everything, you will end up not being good at anything.
I am mostly just talking through ideas with the community at this point. All suggestions welcome.
|
|