Sciencemadness Discussion Board
Subject: Lengthy, time-consuming overhaul of forum software
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 17-5-2018 at 19:14


Okay, well one of the issues is that spambots come here all the time, and while the admins do remove them, whenever you report a post, that sends U2U messages to EVERY admin and moderator. The admin tools kind of suck, really. Also, the only thing preventing this site from getting hacked and getting every U2U message stolen is the fact that most people don't actually care. We actually had a hacker break into our site and make himself an admin and try to demand ransom, but the mods just kept banning him until he got tired of making messes and left. But his coding skills were really bad, which was the main thing that limited what he could do. If someone with better coding skills wanted to do some real damage though, they'd certainly be able to.

I've been working on this for a while, and think it might help to write up a "diagnosis" or something, where I lay out the different pieces that would have to be reverse-engineered and reassembled. It's doable for sure, and it's not like I'm too busy to do it, it's just hard to stay motivated. Also, my internet connection can be kind of crappy, which makes it hard to remote into the web server I set up. But I have been on it.




The first step in the process of learning something is admitting that you don't know it already.

I'm givin' the spam shields max power at full warp, but they just dinna have the power! We're gonna have to evacuate to new forum software!
CouchHatter
Hazard to Others
***




Posts: 152
Registered: 28-10-2017
Location: Oklahoma
Member Is Offline

Mood: 76 elements taken!

posted on 17-5-2018 at 20:56


The site has zero mobile optimization, which seems to be affecting a portion of the total users. I would like to see a migration to a more popular forum software, based on others' assessments of its current state.
Quote: Originally posted by Polverone  
I appreciate the thought, but I am well-funded from my day job nowadays. I don't need money. What I am short of is time.

Our forum software is ancient. It was one of maybe two "good" open source options back in 2002. It's not good any more, security-wise. (I actually rather like its minimalism apart from security issues.) Transitioning to another forum has been oft-discussed but it's a monumental undertaking.

In the meantime, if you are familiar with PHP, I would gladly welcome security audits and enhancements in the form of pull requests against this GitHub repository:

https://github.com/mattbernst/xmbforum


It sounds to me like Melgar is doing the Lord's work.
Tsjerk
International Hazard
*****




Posts: 3032
Registered: 20-4-2005
Location: Netherlands
Member Is Offline

Mood: Mood

posted on 18-5-2018 at 14:54


Maybe all members could mail Melgar a token of appreciation? It doesn't have to be much and should be non-suspicious to customs, but I, for example, have quite a few chemicals that are not exactly everyday, and I would happily send some to the hero who is getting our forum fixed.

Anything to help Melgar get reminded that what he does is appreciated.

Without a transfer, sooner or later someone is going to do a lot of damage.
streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 18-5-2018 at 17:35


Quote: Originally posted by Melgar  
I've been working on this for a while, and think it might help to write up a "diagnosis" or something, where I lay out the different pieces that would have to be reverse-engineered and reassembled.


Absolutely!

This type of thing interests me and I could easily see it sucking me in. No point needlessly repeating your work though.

Perhaps put it up as a github repository to facilitate collaboration?
JJay
International Hazard
*****




Posts: 3440
Registered: 15-10-2015
Member Is Offline


posted on 19-5-2018 at 11:19


My major concern with this project is that it shouldn't have taken more than a weekend to complete, and there are also serious concerns about the security practices and individuals involved. A github repository is pretty much the minimum measure that should be taken to ensure that no one is secretly inserting backdoors or encrypting new passwords with MD5, etc.



streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 19-5-2018 at 16:42


Quote: Originally posted by JJay  
My major concern with this project is that it shouldn't have taken more than a weekend to complete


To me, that seems a little ambitious, but I'm a scientist who programs as necessary rather than as my full-time profession.

Quote: Originally posted by JJay  
there are also serious concerns about the security practices and individuals involved. A github repository is pretty much the minimum measure that should be taken


From reading this thread and a few other posts on the subject, I think the intention is that a guide would be created so that Polverone can migrate the site without having to solve every problem himself. Addressing each issue will take a lot of work, but once it's working it should be a straightforward process. It would be very difficult to hide a backdoor in the directions. If the forum needs to be customized at all, it would be simple to diff it against the official phpBB release.

Quote: Originally posted by JJay  
encrypting new passwords with MD5, etc.


That is already how passwords are stored now. For example, the code for updating a password uses plain MD5; it isn't even salted.

How to handle this going forward is an interesting question. The options seem to be:

  1. Continue using unsalted md5 hashes
  2. Silently migrate to something better and update passwords as users log in
  3. Migrate to something better and delete all existing passwords forcing everyone to go through the forgot password process


For something better, phpBB supports several options. Bcrypt is by far the best, although the default work factor seems a little low.
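A minimal sketch of option 2 (silent migration on login), assuming legacy rows hold a bare, unsalted MD5 hex digest. PBKDF2 from Python's standard library stands in for bcrypt here only so the example has no third-party dependencies; the pbkdf2$iterations$salt$hash storage format and all function names are hypothetical, not phpBB's.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Hypothetical storage format: "pbkdf2$<iterations>$<salt hex>$<hash hex>".
# PBKDF2-HMAC-SHA256 stands in for bcrypt so this runs on the standard library alone.
ITERATIONS = 100_000


def hash_new(password: str) -> str:
    """Hash a password with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_new(password: str, stored: str) -> bool:
    """Check a password against a hash produced by hash_new()."""
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


def is_legacy_md5(stored: str) -> bool:
    """Assumed legacy format: 32 hex characters, no salt."""
    return len(stored) == 32 and all(c in "0123456789abcdef" for c in stored.lower())


def log_in(password: str, stored: str) -> Tuple[bool, str]:
    """Return (success, hash to keep in the database).

    A correct password against a legacy MD5 hash is re-hashed immediately,
    since login is the only moment the plaintext is available.
    """
    if is_legacy_md5(stored):
        legacy = hashlib.md5(password.encode("utf-8")).hexdigest()
        if hmac.compare_digest(legacy, stored.lower()):
            return True, hash_new(password)
        return False, stored
    return verify_new(password, stored), stored
```

The caller would write the returned hash back whenever it differs from the stored one. Members who never log in again keep their MD5 hash, which is the trade-off against option 3.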
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 20-5-2018 at 04:53


Lol @ JJay.

I basically have to reverse-engineer both forum database schemas (though phpBB's is at least well documented, I guess), then write scripts and a guide for Polverone to do the transition himself. Also, one of the problems is that he deleted all the U2U messages in the test database, making it a lot harder for me to reverse-engineer that table. I'd been meaning to ask him if he could release a version of the test database where instead of deleting all the U2U messages, he just replaced the text of them with "hello!" Then at least I could work on that and be reasonably confident that it'll work on the real data if it works on the test data.

Also, I need to make sure Polverone is okay with AWS as a host. I've been charged about $7 a month for an Ubuntu server with 100GB of SSD primary storage, (Needed to be able to run the virtual machine) so I think if we did go that route, there would be huge savings on hosting. Also not sure if I'd set up the server with test data, transfer it to Polverone, then he'd import the real data. I don't foresee any problem doing it this way, as it'd eliminate the need for Polverone to set up a server from scratch. I mean, if I'd have wanted everyone's data, I'd have already gotten it. :P (I don't though.)

The trouble with github is that there isn't going to be much coding involved. The hard part is going to be all the SQL commands that would be needed to transform the XMB data such that it works with phpBB, and that will involve a lot of trial and error. However, I could start a github repo for the SQL I've written so far, mostly to export the data from the test server as CSV, create a similar database structure in Postgres, then import all the CSV data into Postgres. Also, we'd probably have to modify the base phpBB installation in order to minimize broken links and such.

@streety: Already figured that part out! phpBB actually checks for several different formats of hashed passwords, including MD5. I guess way back in the day, phpBB used MD5 too. However, if it detects an MD5 hash, it will automatically rehash the password with bcrypt when that user next logs in. (I think)

I'm in the process of writing up a proposal where I break down all the different parts that will need to be done. I know it doesn't look like I've done much yet, but I've gotten the XMB databases into Postgres, which was tricky, and I've also done a lot of research into what needs to be done. Still writing my plan of action, mainly. I'll be posting that soon, so you guys can tell me what you think of it, propose changes, etc.




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 20-5-2018 at 09:06


Quote: Originally posted by Melgar  
I'd been meaning to ask him if he could release a version of the test database where instead of deleting all the U2U messages, he just replaced the text of them with "hello!" Then at least I could work on that and be reasonably confident that it'll work on the real data if it works on the test data.


I haven't looked at the virtual image, but another option would be to make a copy of it, get it running, and populate it with new dummy data. Then you could test the migration from that copy.

Quote: Originally posted by Melgar  
Also, I need to make sure Polverone is okay with AWS as a host. I've been charged about $7 a month for an Ubuntu server with 100GB of SSD primary storage, (Needed to be able to run the virtual machine) so I think if we did go that route, there would be huge savings on hosting.


This is a real apples-to-oranges comparison. I think you would be in for a nasty surprise. $7/month is probably a t2.nano or a t2.micro, with at best a single CPU core and 1 GB of memory. That's fine for a small test server, but handling the higher traffic of the live site probably needs something with a little more power. That's not the surprise, though. The price of an EC2 instance does not include traffic; you pay extra for it. In that way it is quite different from a more traditional host.

I don't know what traffic the site gets so I'm just making up numbers - 10GB/month in and 500GB/month out. This would push up the price to $54.90/month.
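The arithmetic behind an estimate like this is easy to sketch. The default rates below are assumptions for illustration (free inbound transfer and roughly $0.09/GB outbound after a 1 GB free allowance, approximately what AWS listed for EC2 at the time), not quoted prices, so the result won't necessarily match the calculator's figure. Both function names are hypothetical.

```python
def monthly_cost(base: float, gb_in: float, gb_out: float,
                 out_rate: float = 0.09, in_rate: float = 0.0,
                 free_out_gb: float = 1.0) -> float:
    """Flat instance/storage cost plus metered data transfer, in dollars.

    The default rates are assumptions for illustration, not quoted prices.
    """
    billable_out = max(0.0, gb_out - free_out_gb)
    return round(base + gb_in * in_rate + billable_out * out_rate, 2)


def break_even_gb_out(base: float, flat_price: float,
                      out_rate: float = 0.09, free_out_gb: float = 1.0) -> float:
    """Outbound GB/month at which the metered bill equals a flat-rate plan."""
    return free_out_gb + (flat_price - base) / out_rate
```

Under these assumptions a $7 instance sending 500 GB/month out comes to about $51.91, and a flat $20/month plan becomes the cheaper option once outbound traffic passes roughly 145 GB/month.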

Currently, I believe the site is hosted on Linode. A server there with an 80GB SSD has 2 CPU cores, 4GB of memory, and 4TB of data transfer, all included for just $20/month. If 80GB is too small, you can get 160GB for $40/month, and the CPUs and memory double as well.


Quote: Originally posted by Melgar  
Also not sure if I'd set up the server with test data, transfer it to Polverone, then he'd import the real data. I don't foresee any problem doing it this way, as it'd eliminate the need for Polverone to set up a server from scratch. I mean, if I'd have wanted everyone's data, I'd have already gotten it. :P (I don't though.)


This seems quite fragile and messy. A set of steps including every task would make the process easily repeated and testable. It would also allay any security concerns if the process started from a default version of ubuntu and phpBB.

Quote: Originally posted by Melgar  
The trouble with github is that there isn't going to be much coding involved. The hard part is going to be all the SQL commands that would be needed to transform the XMB data such that it works with phpBB, and that will involve a lot of trial and error. However, I could start a github repo for the SQL I've written so far, mostly to export the data from the test server as CSV, create a similar database structure in Postgres, then import all the CSV data into Postgres. Also, we'd probably have to modify the base phpBB installation in order to minimize broken links and such.


Doesn't matter if it isn't going to be computer code. You could almost think of Polverone as the human "computer" your instructions will run on. :D

The advantage of something like git is you can go through multiple iterations and keep track of where you have been. It's perfect for trial and error.

Great to read you are already thinking about the password hashes.
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 20-5-2018 at 18:26


Yeah, I've used git extensively. The trouble is that the process I'd use for migrating the tables isn't exactly the sort of process git helps with. It's more of a trial-and-error type of process where you look at a lot of data and try to ascertain how it fits into the larger process, based on the patterns you see in it. After a while, it gets to be like in The Matrix, where the guy's looking at a screen full of changing numbers, and describing the people they represent. Fortunately, it's pretty rare to lose data permanently if you're not stupid about making backups and copies.

I will start at least two git repos though. One for the list of commands/scripts to migrate the data from XMB/MySQL to phpBB/PostgreSQL, and another for a custom version of phpBB.

As far as AWS, there's a reason half the internet runs on it. Although Linode gives you a flat rate, they can only do this because only a fraction of customers ever approach hitting it. AWS, on the other hand, charges only what customers actually use. Say you're off by an order of magnitude in your estimation? After all, this site mainly hosts text-based content. And a bulletin board with the traffic that this one gets is hardly at the high end of what might tax a server. Another advantage of AWS is that you can start off with a low-end instance, and can easily add services or upgrade services when you need them, rather than guessing what you might end up using eventually, and going with that. There are other nice things too, like you never have to worry about running out of disk space from attachments and other static files, since you'd just move them to S3.

Thinking about it further, I uploaded the entire virtual machine image to S3, which adds 40GB to the total space I'm using. And I logged in and downloaded huge amounts of data, including the virtual machine multiple times if I'm not mistaken. I should see if I can find some breakdown of the charges on my account, and see what it comes down to, just out of curiosity.

One thing about github is that you have to have a paid account to have a private repo. Probably no need though, right?




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 21-5-2018 at 16:34


I think a public repo should be fine but there are other free options if private is important. I think bitbucket is probably the largest.

For the type of scaling you are talking about, any provider would be sufficient. For the setup being discussed here, adding an extra server and making use of it, whether on AWS, Linode, or anywhere else, is mostly going to mean changing the application. AWS would only have a big advantage if we used its different services from the beginning: S3 for file hosting, RDS instead of a self-managed PostgreSQL server, etc.

The traffic figure was just a guess run through the calculator, so it could well be very wrong. The traffic may well be relatively low and the costs better than the alternatives. There is no upper limit, though. If there is a spike in traffic for any reason, the costs could go up dramatically. With conventional hosts you know what you are spending ahead of time.

Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 21-5-2018 at 19:08


When I looked at the contents of the virtual machine, 9 GB of the 12.2 GB it takes up was attachments stored in the MySQL database. Without attachments, the MySQL database is only 273 MB. Of the $7 a month I'm paying for hosting, almost all of it is for the 100 GB SSD I'm using. Of course, the only reason the SSD is so big is that it had to accommodate both its own software and the virtual machines. I had to host the original virtual machine file, the working virtual machine, the data extracted from the virtual machine as SQL and CSV files (both inside and outside the virtual machine), and the working data in the host database (in both XMB and phpBB format), all at once. When everything goes live, that will no longer be the case.

I've used Linode, DigitalOcean, Rackspace, and AWS, and AWS seemed like it would require me to pay the least amount of my own money out-of-pocket while I tinkered around with the data trying to get everything live. That, and S3 is nice for static file hosting, which is maybe 70% of the data in the virtual machine.

I set up a github repo containing the SQL code to set up a postgres database with tables that can store the data from XMB CSV dumps. CSV dumps were MUCH easier to work with than SQL dumps, mainly because XMB hails from an era when UTF-8 wasn't standard. Although there are comments indicating it was automatically dumped from MySQL, I basically had to go through the whole thing and rewrite it by hand anyway, which is the main reason I thought it was important enough to include:

https://github.com/toldani/sm-transition

I'll add more files and comments once I'm more sure of the process myself, obviously. I'm using Ubuntu 16.04 LTS, PHP 7.0, PostgreSQL 9.5, and phpBB 3.2.2.
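For anyone following along, here is a minimal sketch of the kind of clean-up the encoding problem calls for, assuming the XMB CSV dump mixes UTF-8 with Windows-1252 (cp1252) bytes; that assumption needs checking against the real dump. It decodes each line as UTF-8 when it can, falls back to cp1252 otherwise, and rewrites the rows as clean UTF-8 CSV. The function names are hypothetical.

```python
import csv
import io


def decode_line(raw: bytes) -> str:
    """Decode one physical line: UTF-8 if valid, otherwise Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined; replace those instead of failing.
        return raw.decode("cp1252", errors="replace")


def reencode_csv(src: bytes) -> str:
    """Return the CSV dump as UTF-8-safe text, re-quoted by the csv module."""
    # Newline bytes never occur inside a multi-byte UTF-8 sequence, so decoding
    # line by line cannot split a character, even inside quoted multi-line fields.
    text = "".join(decode_line(line) for line in src.splitlines(keepends=True))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow(row)
    return out.getvalue()
```

After writing the result to a file with UTF-8 encoding, it could be loaded with psql's \copy, e.g. \copy xmb_posts FROM 'posts_utf8.csv' WITH (FORMAT csv, HEADER), where the table and file names are hypothetical.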




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 22-5-2018 at 04:33


Awesome! I've starred the repository and will take a look/give it a run in the next few days. I have the same username on github as here.

AWS is perfect for getting something like this up and running, my concern was more for the future when it is in the harsh glare of the larger world.

Are you thinking of going to S3 for hosting attachments from the beginning? Wasabi also looks interesting: faster than S3, but at Glacier-like prices, and data egress is free.
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 14-6-2018 at 19:23


If anyone wants the .pem file for accessing the AWS server, PM me with your email. This is just a test server anyway, I suppose, and so there's no need to be too careful about it.

I've been thinking about how to store the attachments such that their actual filenames are used, despite the fact that there can be duplicates. XMB seems to use some redirect mechanism that ensures that the filename is correct when downloading attachments, and that would need to be carried over somehow with the S3 attachment system. There's a phpBB plugin that allows S3 to work as the attachment system, so I'd kind of like to continue using that route, because it seems like there would be the least resistance.
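One common way to get the "real filename on download" behaviour without needing unique storage names is to store each file under a unique key and attach a Content-Disposition value to it; S3 and Google Cloud Storage both let you set that as object metadata. A minimal sketch of building the header value in the RFC 6266 style, with an ASCII fallback plus a UTF-8 filename* parameter (the function name is hypothetical):

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value that preserves the original filename.

    Browsers that understand filename* (RFC 5987 encoding) use the UTF-8 name;
    older ones fall back to the plain ASCII filename parameter.
    """
    # ASCII fallback: non-ASCII characters become "?", quotes and backslashes "_".
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "_").replace('"', "_")
    # Percent-encode everything except letters, digits and "_.-~".
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

With boto3, a value like this would go in the ContentDisposition argument of put_object; other storage clients have an equivalent metadata field.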

I registered the "sciencemadness" S3 subdomain, so for the attachments, I'd probably just figure out how to transfer that whole system over to a different account, if Polverone wants to keep it under his name (which I assume he does). The attachments aren't exactly a secret anyway, I'd just need to maintain a consistent system for storing the attachments.




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 15-6-2018 at 15:32


Quote: Originally posted by Melgar  
If anyone wants the .pem file for accessing the AWS server, PM me with your email. This is just a test server anyway, I suppose, and so there's no need to be too careful about it.


This seems like a bad idea.

If anyone wants access I suggest you ask for their public key and then add that to the list of authorized keys. Your private key should be just that ... private.
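Granting someone access then just means appending the public key they send you to ~/.ssh/authorized_keys. A minimal Python sketch of doing that safely; the function name and the set of accepted key types are my own choices, not anything standard:

```python
import os
from pathlib import Path

# Key types accepted here are a deliberate, non-exhaustive choice.
ACCEPTED_TYPES = {
    "ssh-ed25519", "ssh-rsa",
    "ecdsa-sha2-nistp256", "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521",
}


def add_public_key(home: Path, pubkey_line: str) -> bool:
    """Append an OpenSSH *public* key line to home/.ssh/authorized_keys.

    Returns False if the same key (type + base64 blob) is already listed.
    Existing lines that start with options (e.g. from="...") are not matched.
    """
    line = pubkey_line.strip()
    parts = line.split()
    if len(parts) < 2 or parts[0] not in ACCEPTED_TYPES:
        raise ValueError("not an OpenSSH public key line (never accept a private key)")

    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""

    if any(existing.split()[:2] == parts[:2] for existing in text.splitlines()):
        return False

    with auth_file.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the new key onto an unterminated last line
        f.write(line + "\n")
    os.chmod(auth_file, 0o600)  # sshd refuses group/world-writable key files
    return True
```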

Quote: Originally posted by Melgar  
I registered the "sciencemadness" S3 subdomain, so for the attachments, I'd probably just figure out how to transfer that whole system over to a different account, if Polverone wants to keep it under his name (which I assume he does). The attachments aren't exactly a secret anyway, I'd just need to maintain a consistent system for storing the attachments.


Do you mean you got the bucket with the name sciencemadness? Another way to handle this would be to alias a subdomain of sciencemadness.com to the s3 bucket. I'm not sure how either option works with using https though.

I downloaded the backup image and set up Apache, phpBB, and MySQL using docker on my laptop. I started writing a conversion script for the users and then started looking at the forums. Then I realized you intended to change from MySQL to postgresql. Should be easy to change what I have though.

XMB is quite simple. There are some peculiarities but easy enough to grasp. phpBB seems like the polar opposite, it's very complex. I wonder whether it is actually the best option or whether there is a simpler alternative that also has ongoing support.
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 16-6-2018 at 09:28


Quote: Originally posted by streety  
This seems like a bad idea.

If anyone wants access I suggest you ask for their public key and then add that to the list of authorized keys. Your private key should be just that ... private.

Yeah, that is a better idea. Initially I was thinking it didn't matter because the server wouldn't ultimately be used anyway, and I don't use that key or AWS account for anything else. But you're right that it's bad practice, and it would be a lot easier to just add public keys to the authorized_keys file.
Quote:
Do you mean you got the bucket with the name sciencemadness? Another way to handle this would be to alias a subdomain of sciencemadness.com to the s3 bucket. I'm not sure how either option works with using https though.

Yes, that is what I mean. For attachments, I think it's best to just stick to the S3 bucket, with that URL. The URLs would be automatically generated anyway, and there's a phpBB extension that allows attachments to use S3 for storage.

Quote:
I downloaded the backup image and set up Apache, phpBB, and MySQL using docker on my laptop. I started writing a conversion script for the users and then started looking at the forums. Then I realized you intended to change from MySQL to postgresql. Should be easy to change what I have though.

XMB is quite simple. There are some peculiarities but easy enough to grasp. phpBB seems like the polar opposite, it's very complex. I wonder whether it is actually the best option or whether there is a simpler alternative that also has ongoing support.

That's the problem I got stuck on too. This seems to be caused by all the new features phpBB has added over the years.

The best solution I could come up with is to install phpBB 2.X, transition the XMB data to that (which would be a lot easier), then upgrade phpBB. I hadn't decided whether to try and go direct to phpBB 3.2.2 anyway or do it in two steps, but what you're saying reinforces that it'd be better to do the transition with an older version of phpBB that would have a much more similar database structure.
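As a rough illustration of why the 2.X route is attractive, the core of a user migration can be little more than a column mapping. The column names on both sides below are assumptions to be checked against the actual XMB and phpBB 2 schemas. phpBB 2 stored passwords as unsalted MD5 as well, so, assuming XMB's hashes are plain MD5 of the password, they could be copied across unchanged.

```python
# Hypothetical column mapping from XMB's members table to phpBB 2's users table.
# Verify every name against the real schemas before relying on this.
FIELD_MAP = {
    "uid": "user_id",
    "username": "username",
    "password": "user_password",  # assumed unsalted MD5 hex on both sides
    "email": "user_email",
    "regdate": "user_regdate",    # assumed Unix timestamp on both sides
    "postnum": "user_posts",
}

INT_FIELDS = {"user_id", "user_regdate", "user_posts"}


def map_member(xmb_row: dict, uid_offset: int = 0) -> dict:
    """Translate one XMB member row (e.g. from a CSV dump) into a phpBB 2 user row.

    uid_offset lets the caller shift ids if they would collide with accounts
    created by the phpBB 2 installer. Columns not in FIELD_MAP are dropped.
    """
    row = {new: xmb_row[old] for old, new in FIELD_MAP.items()}
    for field in INT_FIELDS:
        row[field] = int(row[field])
    row["user_id"] += uid_offset
    row["user_active"] = 1
    return row
```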

As far as using PostgreSQL or MySQL, I have a lot of experience doing things like regex find/replace with PostgreSQL, which seems like it'd be necessary to make sure all the bbcode tags convert correctly, all the links work, etc. It seems like it's inevitable that there will be various formatting problems caused by different tag behavior, so I anticipate there will be a lot of global tag-fixing needed.
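A sketch of what such global fixes might look like as an ordered list of regex rules. The two rules shown rewrite old XMB thread and forum links into phpBB's viewtopic.php?t= and viewforum.php?f= form, and they assume thread and forum ids are preserved by the migration, which is an assumption, not a given. The same patterns, with \1-style backreferences, should also work in PostgreSQL's regexp_replace with the 'g' flag.

```python
import re

# Hypothetical rule list, applied in order. Each rule keeps the scheme and host
# of the original link and only swaps the XMB script/parameter for phpBB's.
# Extra query parameters (e.g. &page=2) are left as-is and would need their own rule.
RULES = [
    (re.compile(r"(https?://(?:www\.)?sciencemadness\.org/talk/)viewthread\.php\?tid=(\d+)"),
     r"\1viewtopic.php?t=\2"),
    (re.compile(r"(https?://(?:www\.)?sciencemadness\.org/talk/)forumdisplay\.php\?fid=(\d+)"),
     r"\1viewforum.php?f=\2"),
]


def apply_rules(text: str) -> str:
    """Run every rewrite rule over one post body."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```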

If anyone DOES want their public key added, let me know and I'll add you.

[Edited on 6/16/18 by Melgar]




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 16-6-2018 at 14:21


Quote: Originally posted by Melgar  

That's the problem I got stuck on too. This seems to be caused by all the new features phpBB has added over the years.

The best solution I could come up with is to install phpBB 2.X, transition the XMB data to that (which would be a lot easier), then upgrade phpBB. I hadn't decided whether to try and go direct to phpBB 3.2.2 anyway or do it in two steps, but what you're saying reinforces that it'd be better to do the transition with an older version of phpBB that would have a much more similar database structure.


That's a brilliant idea.

It looks like the oldest version is 2.0.12 on an "official" phpBB page - https://sourceforge.net/projects/phpbb/files/phpBB%202/

There are other sites with earlier copies but not sure I would trust them.


Quote: Originally posted by Melgar  
As far as using PostgreSQL or MySQL, I have a lot of experience doing things like regex find/replace with PostgreSQL, which seems like it'd be necessary to make sure all the bbcode tags convert correctly, all the links work, etc. It seems like it's inevitable that there will be various formatting problems caused by different tag behavior, so I anticipate there will be a lot of global tag-fixing needed.


:o

You want to do all that in SQL? I'm perfectly comfortable with the basics but that is way beyond my level of familiarity.

There are some complex transformations needed for the users table, but I used Python instead. I'm more familiar with it, and there is a greater variety of packages available.

I had intended to include a link to what I had in my previous post but forgot about it. It's up on github now.

User migration: https://github.com/streety/new-sm-forum/blob/master/migrate_... - needs testing but I think this is close.

Forum and post migration: https://github.com/streety/new-sm-forum/blob/master/migrate_... - only just started, I still don't understand the structure fully.
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 16-6-2018 at 19:33


I've written C extensions and PL/pgSQL functions for Postgres, so I'm pretty comfortable with optimizing it. I'm also used to working with 20+ GB databases, and SM's database is only 273 MB if you subtract the file attachments. So it'd be really simple to do the sort of DB manipulation that'd be needed.

My experience is mainly as a Rails backend dev, so my go-to language is Ruby, but I'm pretty comfortable writing complex SQL queries too. Not much experience with Python, but the use cases are similar to Ruby, from what I've seen. I have used Python a little for something to do with Sublime extensions, but I'm more comfortable with Ruby, mainly due to the nicer console and the more forgiving syntax. But if you just comment your Python code, I'm sure I'd be able to figure it out.

I don't think it should be necessary to go as far back as possible with phpBB. The jump from 2.X to 3.X was the major one, and I don't think that there are significant increases in complexity between the various minor versions of phpBB 2.X. Also, if the goal is to eventually upgrade to phpBB 3.X, then it would be better to have one of the later 2.X versions as a target, since upgrading to 3.X would then be easier.




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


posted on 18-6-2018 at 17:22


It looks like the latest version in the 2.0.x series is 2.0.23. I doubt I'll have time during the week but I can take a look at the weekend.
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

posted on 21-8-2018 at 14:12


Sorry to leave everyone hanging. The spam problem certainly seems to have become too serious to ignore anymore. My development laptop (a MacBook) broke, and it was going to cost $700 to fix, so I picked up this crappy little Lenovo IdeaPad for $150 just to hold me over until I can afford something better. It came with Windows 10, which it ran incredibly poorly, but wiping everything and replacing it with Linux really improved its performance. So I spent yesterday getting it set up for writing code, which it's actually been halfway decent at. I do most development via a remote terminal anyway.

Anyway, I've been wanting to try out GCP (Google Cloud / Google's version of Amazon Web Services) for a while, since I've always wondered what AWS would be like if it was scrapped and rebuilt by people who were trying to be helpful. And it really is pretty nice. They had the benefit of learning from Amazon's mistakes, plus Google is head and shoulders above Amazon when it comes to developing software that people actually want to use. Rather than force people to learn their own flavor of SQL, Google has graciously used MySQL and Postgres as the engines powering its relational database service. You even have your choice of which one to use! And Google's documentation is not only up-to-date, but has helpful links that take you to the appropriate part of the control panel scattered throughout it. So yeah, I really like what Google's doing, and I feel way better about supporting their innovation than the sort of bean-counting that characterizes Amazon.

What was I talking about again? Oh right, the SM database. My first order of business now that I'm working with GCP is going to be converting the attachment table into files, then storing them in GCP's static file storage service (Google Cloud Storage). Then I can reduce the database size from roughly 9 GB to 273 MB, and the data will be much easier to work with.
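Here's a minimal sketch of that conversion. The function only needs a DB-API connection, so it should work with a MySQL driver or, for testing, Python's built-in sqlite3. The table and column names (xmb_attachments with aid, tid, filename and an attachment BLOB) and the <tid>/<aid>/<filename> directory layout are assumptions for illustration; putting the attachment id in the path is one way to keep duplicate filenames apart.

```python
from pathlib import Path
from urllib.parse import quote_plus


def safe_name(filename: str) -> str:
    """Escape a filename so it is a single, harmless path component."""
    escaped = quote_plus(filename)  # "/" becomes %2F, spaces become "+"
    return "_" + escaped if escaped in ("", ".", "..") else escaped


def export_attachments(conn, out_dir: Path) -> int:
    """Write every attachment BLOB to out_dir/<tid>/<aid>/<escaped filename>.

    Returns the number of files written.
    """
    cur = conn.cursor()
    cur.execute("SELECT aid, tid, filename, attachment FROM xmb_attachments ORDER BY aid")
    count = 0
    for aid, tid, filename, blob in cur:
        target = out_dir / str(tid) / str(aid) / safe_name(filename)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(bytes(blob))
        count += 1
    return count
```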

IP address for the main GCP server is 35.196.251.155. I'd be glad to add your public keys to the list, if anyone feels they have anything to contribute.




The first step in the process of learning something is admitting that you don't know it already.

I'm givin' the spam shields max power at full warp, but they just dinna have the power! We're gonna have to evacuate to new forum software!
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

[*] posted on 26-8-2018 at 18:02


Now I have the virtual machine running on a Google Cloud instance, and I set up port forwarding so you can access it here:

http://35.185.63.230:8080/smtalk/

I uploaded all the attachments in the database to Google Cloud Storage. Here's an example of one of them:

https://storage.googleapis.com/sciencemadness/attachments/10...

If you want to check whether they're all there, look at any of the attachment links in this forum. Each link contains two numbers: one is the thread id and the other is the attachment id. You can tell which one is the thread id because it also appears in the URL of the page you're on. The link text should be the filename. The only catch is that I had to sanitize filenames containing special characters (I used Ruby's CGI.escape method), so the path is easier to figure out for files without spaces, commas, or parentheses in their names.
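For anyone trying to work out a stored filename from a link's text, the escaping step can be reproduced with Ruby's standard library. This is a minimal sketch, not the migration script itself: `escaped_filename` and `original_filename` are hypothetical helper names, and the sample filenames are made up. Note that `CGI.escape` turns spaces into `+` rather than `%20`, and encodes commas and parentheses as `%2C`, `%28` and `%29`.

```ruby
require 'cgi'

# Hypothetical helper: escapes a filename the way the migration did, using
# Ruby's CGI.escape (spaces become "+", other special characters become %XX).
def escaped_filename(name)
  CGI.escape(name)
end

# Hypothetical helper: reverses the escaping to recover the original name.
def original_filename(escaped)
  CGI.unescape(escaped)
end

# Made-up example names: a plain one, and ones with spaces, commas and parentheses.
["nitration.pdf", "my setup (2).jpg", "TLC plate, day 3.png"].each do |name|
  puts "#{name} -> #{escaped_filename(name)}"
end
# nitration.pdf -> nitration.pdf
# my setup (2).jpg -> my+setup+%282%29.jpg
# TLC plate, day 3.png -> TLC+plate%2C+day+3.png
```

Because the stored names themselves contain `%` characters, a browser URL for one of these objects may need those percent signs encoded a second time (as `%25`); this has not been checked against the bucket.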




streety
Hazard to Others
***




Posts: 110
Registered: 14-5-2018
Member Is Offline


[*] posted on 27-8-2018 at 04:30


I'm glad to see you're open to alternative hosting options. You had seemed quite set on AWS.

At the risk of throwing a spanner in the works, I had been wondering whether phpBB is actually the best target for this effort. Implementing the spam captcha on this forum was somewhat challenging, largely because the code was

a) in a language I had not used in many years
b) poorly organized
c) lacking separation between data, logic, and presentation
d) not written with modification in mind

phpBB is an improvement but far from a complete solution. I fear writing each of the changes we would need would be a lengthy and difficult process even after the database has been migrated across.

Contrast that with working on the spam monitoring script, where I'm using a language I know (Python) and can separate functionality across organized files. There have been times when I've thought it would be easier to re-implement the forum than to migrate it to phpBB.

How long would it take you to re-implement the forum in a language you know (Ruby) using a modern framework, the MVC pattern, an ORM, etc.? How long would it take you to make changes later on? It's not obvious to me that it would take any longer than migrating to phpBB.

I'm not suggesting we switch to writing our own forum software now, but we should get a better idea of how much work it will take to make the necessary modifications to phpBB. You have the attachments in Google Cloud Storage. I don't know what your next intended step was, but I suggest modifying phpBB to store attachments there and then display them.
JJay
International Hazard
*****




Posts: 3440
Registered: 15-10-2015
Member Is Offline


[*] posted on 27-8-2018 at 04:48


I use PHP all the time, mainly on Zend (I could use Laravel or something, but it wouldn't be my first choice). I have a strong bias against Ruby, but Python is cool. Simple Machines is pretty popular....

I've worked on forum software before... I don't think it would be very hard to implement a forum, but if I were doing it for myself, I would be tempted to use something like bbPress.




Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

[*] posted on 27-8-2018 at 10:26


My thought was that phpBB has been going strong for a very long time and will probably continue to be maintained. So even if we modify the code in places and later move on like Polverone has, our successors here won't be stuck with a huge mess on their hands.

The main reason I thought phpBB would be a good choice had more to do with its similarity to the XMB interface. I went around to all the major open-source bulletin board software sites and tried to rank them by how hard they might be to get used to for people who don't seem to want things to change very much. Here's how I ranked at least some of them:

1. phpBB https://www.phpbb.com/community/
2. Simple Machines https://www.simplemachines.org/community/
3. myBB https://community.mybb.com/
4. vBulletin https://www.vbulletin.com/forum/
5. Discourse https://meta.discourse.org/

I assumed the consensus among long-time contributors would be that the less "web 2.0" we go, the better. Correct me if I'm wrong. I've used bbPress and don't have any complaints about it, but I'd assume there would be some hesitation about tying this site to the WordPress ecosystem. When I was a system administrator, I cut our routing errors in half just by setting up a route at the path where the WordPress login page would be if we were running WordPress. WordPress security isn't necessarily bad, but it's definitely where hackers are focusing their energies.
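As a rough illustration of that kind of route, here is a minimal sketch of Rack-style middleware in Ruby. It assumes a Rack-based Ruby stack, which the post doesn't say was the case; the class name and the list of probed paths are hypothetical. Requests for WordPress login paths get an immediate plain 404 instead of falling through to the application's router and being logged as routing errors.

```ruby
# Hypothetical middleware: answers requests for WordPress login paths with a
# cheap 404 before they reach the application's router (and its error log).
class WordPressProbeFilter
  # Illustrative list only; adjust to whatever paths the bots actually probe.
  PROBE_PATHS = ["/wp-login.php", "/wp-admin"].freeze

  def initialize(app)
    @app = app # the next Rack application in the chain
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    if PROBE_PATHS.any? { |p| path == p || path.start_with?("#{p}/") }
      [404, { "content-type" => "text/plain" }, ["Not Found\n"]]
    else
      @app.call(env)
    end
  end
end

# Example: wrap a trivial stand-in application.
forum = ->(_env) { [200, { "content-type" => "text/plain" }, ["forum page\n"]] }
filtered = WordPressProbeFilter.new(forum)
puts filtered.call("PATH_INFO" => "/wp-login.php").first # 404
puts filtered.call("PATH_INFO" => "/forum/index").first  # 200
```

In a Rack application this would be added with `use WordPressProbeFilter` in `config.ru`; in front of a PHP forum, the same idea would more naturally live in the web server's configuration.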

And yes, thank you to whoever pushed me in the direction of Google Cloud. I'd incorrectly assumed that since AWS was a huge mess to figure out, the same would hold true of Google's offerings. Unlike Amazon, though, Google actually seems to prioritize ease of use and good documentation.

As an aside, I think this has a lot to do with their corporate cultures. Amazon is notoriously competitive, even internally, so they don't spend a lot of time thinking about how to improve things for their users. Google seems to put a lot more priority on making things useful and helpful, even if they aren't immediately profitable.

Once things are set up, you still spend most of your time using Linux through a terminal over ssh in any case. Google also seems to use a less confusing system of authentication between services. For whatever reason, AWS made you copy and paste a modified JSON policy document into a text box just to make your S3 bucket allow public downloads. With Google, not only can you turn that on from the console, but they also provide some really powerful and useful command-line utilities that are far more intuitive for authenticating connections between services. It makes a lot more sense to log into your secured Linux server and use a command-line utility there to establish your credentials on that side; that way, you only need to look up the credentials for the other side.

As for Ruby, I was only planning to use it for converting data between the databases. It's really good for writing "glue code", where you have to make several different applications process each other's data in ways they aren't quite set up to do. It fills a similar role to Python, but is more forgiving of syntax. I also use shells a lot, and Ruby works really well with them. For the record, here's the Ruby script I used to pull out all the attachment data and save it as files. After that, I used Google's command-line tool to transfer the whole attachment directory tree into Cloud Storage:

https://github.com/toldani/sm-transition/blob/master/parse_a...

I've been making it a point to comment my code too, so even if Ruby isn't your language, it should be pretty obvious how it works.
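The linked script isn't reproduced here, but the general approach (reading each attachment row and writing its binary data out to its own file) can be sketched as follows. The `Attachment` struct stands in for a database row; its field names and the `<thread id>/<attachment id>/<escaped filename>` directory layout are assumptions for illustration, not the forum's actual schema or the bucket's actual layout.

```ruby
require 'cgi'
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for one row of the attachment table.
Attachment = Struct.new(:thread_id, :attachment_id, :filename, :data)

# Writes each attachment's binary data to
#   <root>/<thread_id>/<attachment_id>/<CGI-escaped filename>
# (an assumed layout) and returns the paths written.
def export_attachments(rows, root)
  rows.map do |att|
    dir = File.join(root, att.thread_id.to_s, att.attachment_id.to_s)
    FileUtils.mkdir_p(dir)
    path = File.join(dir, CGI.escape(att.filename))
    File.binwrite(path, att.data) # binary mode: bytes are written exactly as stored
    path
  end
end

# Example run in a throwaway directory with a made-up row.
Dir.mktmpdir do |root|
  rows = [Attachment.new(42, 7, "my setup (2).jpg", "\xFF\xD8\xFF".b)]
  export_attachments(rows, root).each { |p| puts p.delete_prefix(root) }
  # prints /42/7/my+setup+%282%29.jpg
end
```

Using `File.binwrite` matters because attachments are arbitrary binary data, and a text-mode write could alter line-ending bytes on some platforms.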




Loptr
International Hazard
*****




Posts: 1348
Registered: 20-5-2014
Location: USA
Member Is Offline

Mood: Grateful

[*] posted on 27-8-2018 at 10:33


Quote: Originally posted by Melgar  
My thought was that phpBB has been going strong for a very long time, and will probably continue to be maintained in the future. So even if we make changes to the code in places, then if we move on like Polverone has, our successors here won't be stuck with a huge mess on their hands.



Polverone hasn't moved on; he's here every so often.




"Question everything generally thought to be obvious." - Dieter Rams
Melgar
Anti-Spam Agent
*****




Posts: 2004
Registered: 23-2-2010
Location: Connecticut
Member Is Offline

Mood: Estrified

[*] posted on 27-8-2018 at 11:26


Well, I just meant he's moved on to having other priorities in his life. And if we do the same thing, we want to make sure that most of the software we implement is going to be supported in the future.

phpBB has a nice system for developing extensions, and my impression was that it would offer the most room for the site to evolve. I know there hadn't been any sort of consensus reached yet, but the security and spam problems have gotten so bad that it was clear somebody needed to just grab the bull by the horns. And phpBB was one of the top contenders already, so it seemed like as good a choice as any.

Initially, my plan was to import all the MySQL data into Postgres, then use SQL functions to convert the data. But since I don't know what kinds of problems I'd run into doing it that way, I'm now taking a different approach: using something like Ruby or Python to pull the data in from one database, modify it as needed, then write it to the other database. Considering that a Ruby script extracted 9GB of binary attachment data and saved it as files in a few hours, I think flexibility in our data conversion tools should take precedence. For these one-off scripts, the language isn't important and can be left to whoever writes them. We'd just have to make sure to comment them well enough that it's obvious how they work.
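A minimal sketch of that read-modify-write loop, assuming rows arrive from the source database as Ruby hashes and that the destination can accept one batch at a time. The `migrate` method, the `writer` callable, and the field names in the example are all hypothetical; with real databases, the writer would typically issue one multi-row INSERT per batch.

```ruby
# Hypothetical helper: reads rows in batches, converts each one with the given
# block (returning nil or false skips a row), and hands each converted batch to
# `writer`. Returns the number of rows written.
def migrate(rows, writer, batch_size: 500)
  written = 0
  rows.each_slice(batch_size) do |batch|
    converted = batch.filter_map { |row| yield(row) }
    next if converted.empty?
    writer.call(converted) # e.g. one multi-row INSERT per batch
    written += converted.size
  end
  written
end

# Example with made-up field names: skip rows flagged as spam, rename the rest.
old_posts = [
  { "pid" => 1, "author" => "alice", "message" => "Hello", "spam" => false },
  { "pid" => 2, "author" => "bot",   "message" => "Buy!",  "spam" => true },
  { "pid" => 3, "author" => "bob",   "message" => "Hi",    "spam" => false }
]
new_posts = []
count = migrate(old_posts, ->(batch) { new_posts.concat(batch) }, batch_size: 2) do |row|
  next nil if row["spam"]
  { post_id: row["pid"], poster: row["author"], post_text: row["message"] }
end
puts count # 2
```

Batching keeps memory use bounded and cuts down on round trips; the block is where the per-table conversions, such as renamed columns, would go.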

I was originally set on using Postgres as the database, since I would have needed to write SQL functions to do the conversion. But now that plans have changed, the choice of database software isn't critical anymore. I've also been using MySQL more, so if someone had a compelling reason to use MySQL going forward, I wouldn't be opposed.

Apache, though, is a whole other story. After getting lost in a maze of configuration files, trying to figure out which one was responsible for the weird caching behavior that was happening, my vote is for Nginx. Your configuration only needs to be complicated if your setup is! Imagine that!

Whatever bulletin board software we go with, though, I think we should disable any features that let you "like" or "rate" people's posts. Those seem to almost universally divide people into factions, and I like that we don't have that here.



