How to speed up drupal's slow batch processing
Note committed on 10 August 2011, tagged drupal, drupal 7
I have a task to import a pretty huge Drupal 5 site into Drupal 7. I decided to write a dedicated import module that would import the old site's content item by item. Drupal's Batch API seemed to be the perfect candidate for the job.
What I discovered later is that for huge tasks like mine (importing several thousand items) the Batch API is very slow, probably because of the overhead of the Drupal bootstrap. The Batch API builds a list of functions and their parameters up front and then executes them one by one. Each step includes a Drupal bootstrap with all the regular page processing, and that is what makes the overall batch processing slow. I measured that one call of my import function takes 0.8 to 1 s, while the overhead is about 5 s for almost every call.
If only it could run a series of my import function calls in one request.
Digging into the Batch API, I found this code (file includes/batch.inc, function _batch_process()):
if ($batch['progressive'] && timer_read('batch_processing') > 1000) {
  // Record elapsed wall clock time.
  $current_set['elapsed'] = round((microtime(TRUE) - $current_set['start']) * 1000, 2);
  break;
}
After I changed the 1000 milliseconds in the 'if' condition to something like 30000 ms, the import became ~5 times faster! The source code is pretty self-explanatory: once the current request has been processing for longer than 1 second, it stops executing operations and leaves the rest for the next request. I will probably reset this value when the site goes into production, but for now it is what I was looking for.
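To get a feeling for why the threshold matters so much, here is a rough back-of-the-envelope model in Python. It is only a sketch, not Drupal code: the function names are made up for illustration, the figures (0.9 s per item, 5 s overhead per request) are rounded from the measurements above, and it assumes the timer covers nothing but the import calls.

```python
def items_per_request(item_ms, threshold_ms):
    """Count how many items one request handles before the timer check stops it.

    Mirrors the loop in _batch_process(): the elapsed time is checked after
    each item, and the request ends once it exceeds the threshold.
    """
    if item_ms <= 0:
        raise ValueError("item_ms must be positive")
    elapsed_ms = 0
    items = 0
    while True:
        elapsed_ms += item_ms
        items += 1
        if elapsed_ms > threshold_ms:
            return items


def ms_per_item(item_ms, overhead_ms, threshold_ms):
    """Average wall-clock cost of one item, bootstrap overhead included."""
    items = items_per_request(item_ms, threshold_ms)
    return (overhead_ms + items * item_ms) / items


# Assumed figures: 900 ms per import call, 5000 ms overhead per request.
before = ms_per_item(900, 5000, 1000)    # threshold as shipped
after = ms_per_item(900, 5000, 30000)    # threshold raised to 30 s
print(round(before), round(after))
```

With these assumed figures the model prints 3400 and 1047, i.e. the per-item cost drops from about 3.4 s to about 1.05 s. That points in the same direction as the speed-up reported above; the exact factor depends on how many items really fit under the threshold in each request.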
=======
Wikidpad synchronization
Note committed on 29 May 2011, tagged Wikidpad
Problem:
Part of my work includes talking to customers, writing meeting minutes and project documentation, noting ideas, and referring to and updating these records while I actually code. This is trivial if you do everything on one laptop that you carry around. But I have 3 to 4 computers that I constantly switch between, depending on the situation.
This is where you start looking for a kind of software which could be
- a central point where you put all your notes
- synchronized between all your computers
- free/opensource
Googling didn't yield any acceptable results. Most of the findings were online solutions, which I am a bit paranoid about (send your private data to someone you don't know?). Others had a single database with no way to synchronize it. The rest were not free.
So I went back to good old WikidPad and looked at it from a different angle.
First of all I must say it is the most convenient, most geeky-looking, and most simple yet powerful offline wiki database I've ever seen. It has a few storage options, one of them is "original_sqlite". It is the most convenient way to store all your notes. Yet it doesn't have a built-in possibility to synchronize.
Today I have come up with a Wikidpad synchronization solution.
It is very simple: git + sqlite cli + wikidpad database.
For those who have similar needs, here is what I've done to achieve that:
- download wikidpad, msysgit, sqlite cli
- create your wikidpad database using the original_sqlite data storage option.
- copy the sqlite3.exe to the root folder of your wikidpad database
- create two files, one for dumping the db to text file, another one for reading it back:
dbdump.bat:
dbimport.bat:
cat dbdump.sql | sqlite3.exe data/wikiovw.sli
- execute both to see they really work
- add everything to source control except data/wikiovw.* (create a .gitignore file that lists the exceptions)
- push it to the central repo
With this you will have your notes synchronized across all of your locations, you'll have total control over the change history and, last but not least, you'll be able to easily resolve the conflicts that will inevitably appear from time to time.
I am using git, but I am sure this can be done with any revision control system. With git you can go further and add those dbdump/dbimport commands to hooks (I haven't tried this yet) so that everything is totally automated.
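The whole trick rests on one property: an SQLite database can be turned into plain SQL text and rebuilt from that text without losing anything, and text is what git can diff and merge. The sketch below shows that round trip with Python's built-in sqlite3 module instead of the sqlite3.exe command line tool (whose .dump command produces the same kind of text). The table and the note are invented for the demo and have nothing to do with WikidPad's real schema.

```python
import sqlite3


def dump_to_sql(conn):
    """Return the whole database as SQL text that can be committed to git."""
    return "\n".join(conn.iterdump()) + "\n"


def restore_from_sql(sql_text):
    """Rebuild a fresh in-memory database from a SQL text dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


# Hypothetical demo data, not the real WikidPad tables.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (word TEXT PRIMARY KEY, content TEXT)")
source.execute(
    "INSERT INTO notes VALUES (?, ?)",
    ("MeetingMinutes", "Discussed the import module."),
)
source.commit()

sql_text = dump_to_sql(source)          # this text is what goes into the repo
restored = restore_from_sql(sql_text)   # this is what the other computer does
rows = restored.execute("SELECT word, content FROM notes").fetchall()
print(rows)
```

The restored database contains the same single note as the source, so a dump on one computer and an import on another is enough to carry the notes across.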
=======
Equal Height Columns with Bottom Border and Margin
Note committed on 16 April 2011, tagged css, jquery
Like probably all web developers, I was stuck on this problem for quite a while: how to make two columns equal in height (to draw a line between them, color them, whatever).
First I used the most popular margin:-value, padding:+value technique, but quickly discovered that internal anchors stop working with it. Further research on the internet turned up solutions so huge and so hacky (pure CSS abuse) that I just wrote these few lines of jQuery code to make both columns equal in height. All of the problems (bottom border, anchors, margin/padding weirdness) disappeared instantly.
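$(document).ready(function() {
  // Measure both columns and stretch the shorter one to match the taller one.
  var c1 = $("#left-column").height();
  var c2 = $("#right-column").height();
  if (c1 > c2) $("#right-column").height(c1);
  else $("#left-column").height(c2);
});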
=======
Imagefield: automatically transfer file on selection without clicking upload button
Note committed on 03 April 2011, tagged drupal, drupal 6, js
I wanted a file to be uploaded automatically as soon as it is selected, without having to press the Upload button.
Here is my solution to this problem, which seems to work better than the one in the filefield issue queue on d.o:
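// Drupal 6 behavior wrapper; the name "filefieldAutoUpload" is only a placeholder.
Drupal.behaviors.filefieldAutoUpload = function(context) {
  // jQuery takes the selector first and the context second.
  $('.filefield-upload input.form-file', context).change(function(event) {
    // Press the matching Upload button once; the "au" class marks buttons already triggered.
    $(event.target).parent().find('input.form-submit:not(.au)').addClass("au").mousedown();
  });
};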
=======
Search and process all numbers in a word document (vba word macro)
Note committed on 30 January 2011, tagged vb
My wife had a mind-blowingly tedious task at her job: to lower all the prices by 10% in a Word file (some sort of tourist agency price list). The document was about 70 pages long with numbers scattered all over it, in paragraphs as well as in tables, so exporting it to Excel was effectively not an option.
What gave me the idea of trying VBA macros was that the numbers to be cut by 10% all had the format XXXX.XX, like 500.00, so I decided to google it and put together a VB macro that could be of some (future) use too.
When the document was processed in only a couple of seconds, it was one of those moments when I felt that what I did was some kind of magic: "How nice it is to have a programmer by my side!" :) This piece of code saved her a day or two of her life.
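Sub LowerPricesByTenPercent() ' the macro name is only a placeholder
    Set objWdDoc = ActiveDocument
    Set objWdRange = ActiveDocument.Content
    Do
        With objWdRange.Find
            .MatchWildcards = True
            ' 1 to 5 digits, a comma, 2 digits. The ";" inside {1;5} is the list separator
            ' from the regional settings; adjust it and the decimal sign to your locale.
            .Text = "[0-9]{1;5},[0-9]{2}"
            bFound = .Execute
            If Not bFound Then
                Exit Do
            End If
            ' Replace the match with 90% of its value, rounded to 2 decimals.
            .Execute Replace:=wdReplaceOne, replacewith:=Format(Round(objWdRange.Text * 0.9 + 0.0000001, 2), "Standard")
            ' Continue searching after the number that was just replaced.
            Set objWdRange = objWdDoc.Range(objWdRange.Start + 6, objWdDoc.Range.End)
        End With
    Loop
End Sub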
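For readers without Word at hand, the same idea (find every number of the form XXXX.XX and replace it with 90% of its value) can be sketched in a few lines of Python working on plain text. This is not a port of the macro: the function name and the sample sentence are made up, it uses a regular expression instead of Word's wildcard search, it expects a dot as the decimal separator, and unlike the "Standard" format in VBA it does not insert thousands separators.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# 1 to 5 digits, a dot, exactly 2 digits, e.g. 500.00
PRICE_PATTERN = re.compile(r"\b\d{1,5}\.\d{2}\b")


def lower_prices(text, factor=Decimal("0.9")):
    """Return text with every matching price multiplied by factor."""

    def replace(match):
        reduced = Decimal(match.group()) * factor
        # Round to whole cents; halves are rounded up.
        return str(reduced.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

    return PRICE_PATTERN.sub(replace, text)


print(lower_prices("Room 500.00 per night, tour 49.95"))
# prints: Room 450.00 per night, tour 44.96
```

Decimal is used instead of floats so that a value like 44.955 is rounded predictably, which is what the small 0.0000001 correction in the macro is there for.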
=======
How to insert Javascript into PDF using PdfSharp library (adding javascript for automatic PDF self-printing functionality)
Note committed on 21 December 2010, tagged .net
A couple of days ago we needed a tool that can: 1) merge PDFs, 2) insert JavaScript so that the PDF automatically sends itself to the printer after opening and 3) has a license suitable for inclusion in a commercial closed-source app.
iTextSharp can do the first two, but its commercial license costs way too much for us.
PdfSharp is another awesome library: merging several PDFs into one was a breeze with it, and it is free even for commercial use.
The only thing it lacked was a built-in ability to inject some JavaScript.
So when I found that nobody on the internet had solved this problem yet, I downloaded the PdfSharp source code and went through it with a debugger.
Here is the piece of code that I came up with after a day or two and that does just that. Maybe it will save you some time too.
// ... [add pages to the PDF] ...
// Insert self-printing javascript
PdfDictionary dictJS = new PdfDictionary();
dictJS.Elements["/S"] = new PdfName("/JavaScript");
dictJS.Elements["/JS"] = new PdfStringObject(merged, "print(true);");
merged.Internals.AddObject(dictJS);
PdfDictionary dict = new PdfDictionary();
PdfArray array = new PdfArray();
dict.Elements["/Names"] = array;
array.Elements.Add(new PdfString("EmbeddedJS"));
array.Elements.Add(PdfInternals.GetReference(dictJS));
merged.Internals.AddObject(dict);
PdfDictionary group = new PdfDictionary();
group.Elements["/JavaScript"] = PdfInternals.GetReference(dict);
merged.Internals.Catalog.Elements["/Names"] = group;
string localFile = ""; // path of PDF file to be created
merged.Save(localFile);
As you can see, the code isn't hacky at all, which is good (special thanks to the authors of PdfSharp for making the library really extensible). Basically, it implements the method described here: http://www.fpdf.de/downloads/addons/36/ by means of PdfSharp.
=======
How to configure a private proxy on your VPS
Note committed on 15 November 2007, tagged hosting, proxy, vps
VPS hosting is very configurable. You can do things that were unimaginable before. One such cool thing is your own private proxy server.
Maybe you need to do some things anonymously on the internet, or you want to see how the landing page of an affiliate marketing offer looks when it is accessible only from a country you don't live in, but your VPS server is located there...
This is exactly the situation I found myself in over the last couple of weeks. I was using web proxy pages like hacksurfing.com, but they don't always work correctly. Imagine that the affiliate link uses a slightly more complicated redirect done in JavaScript, which is far too much for hacksurfing to parse, and there is absolutely nothing you can do about it (if you don't have a VPS, of course).
Lately I've been trying to find out how to configure a proxy on my JaguarPC VPS.
I found some articles on the internet, but they were too general for me.
Here is a complete solution for you...
Configuration of apache private proxy on VPS
On a JaguarPC VPS we have the Apache httpd service, and I think this private proxy configuration will work for anyone with Apache. I have a Plesk setup, but I guess that doesn't matter: the private proxy is configured at the Apache level.
Open your Apache configuration file in an editor: vi /etc/httpd/conf/httpd.conf
Find the proxy section, which starts with <IfModule mod_proxy.c>. If you are reading this article, then that whole section is probably commented out. Uncomment it. That's it for the first step.
But...
It would not be wise to leave this proxy open to everyone, because it will immediately be used by hackers to hide their identity and carry out their evil plans :). Who will be guilty in that case? You, of course!
So we have to add this line to that section: "Allow from 127.0.0.1". It means that traffic is allowed from the local host only.
Restart the httpd daemon: "apachectl -k restart"
Here is how that private proxy configuration section should look at the end:
<IfModule mod_proxy.c>
  ProxyRequests On
  <Proxy *>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
  </Proxy>
  ProxyVia Block
</IfModule>
Note that I used ProxyVia Block, which tells the private proxy server not to reveal that it is a proxy. I don't really know whether it's important, but I like things to look real :)
We are going to access the proxy through an SSH tunnel, using an SSH program that supports tunnels. I always recommend PuTTY; it really has everything!
So, open PuTTY, type your VPS server's name in the host field and go to the Tunnels section of PuTTY's interface. Configure the tunnel so that it maps a local port, let's say 8080, to the remote HTTP port, 80.
It should read:
source port: 8080
destination: localhost:80
When you connect to your VPS server with these settings, PuTTY creates a secure data tunnel, which we will use to access the private proxy configured on the VPS.
In your web browser, go to the proxy server settings and enter localhost:8080 as the proxy server.
That's it!
When you visit a page, your browser's request goes through the secure tunnel between PuTTY and the VPS and comes out of the tunnel at remote port 80, where it is handled by the private proxy we configured before!
=======
Choosing the right VPS plan: Jaguar PC hosting, VPS review (VPS discovery plan)
Note committed on 24 September 2007, tagged hosting, vps
A few words about hosting again, this time about JaguarPC.
During the last two years I have been jumping from one hosting company to another, and I've actually got pretty tired of that. Throughout this time my own progress in the field of making money on the internet was steady, especially during the last half a year. The sites I am dealing with are finally able to pay for themselves, so I decided that they can afford something better.
Cyber Ultra Network was the best hosting so far and still is. Their price/performance ratio was superb, so I still recommend them to everybody who is looking for good, inexpensive shared hosting... It has a lot of features and is still not expensive ($3.95/month).
As I needed more control and also near-perfect uptime (say, 99.99% a year), the idea of some good VPS hosting was always hovering over me, so during the last three months I was actively researching this area.
That wasn't an easy task: the market is full of VPS hosting providers!
Another thing is that I had no idea what kind of difficulties I could run into after transferring my sites to a VPS, but man, sooner or later it had to happen! So after a lot of research there was a winner.
And the winner was the JaguarPC Hosting Company and their VPS Discovery plan.
They are affordable, they had mostly good recommendations and, what I liked the most, they have their own help community! This is the only web hosting company I have seen that has this feature.
What finally tipped the scales for me was the promotion they ran during the last week of August, which I almost missed: there were only two days left when I found it.
Originally, for $19.97 per month you get a fully managed service, 10 GB of disk space, 150 GB of bandwidth, 128 MB of RAM and a 100% uptime guarantee.
The promotional package was a "little" better: 192 MB of RAM, 20 GB of disk space and two free months. Impressive? I thought so too, so I decided to try it; if something goes wrong, they have a 30-day money-back guarantee.
For the first two weeks I was playing around a bit, trying the shiny new VPS, exploring the Plesk panel and checking whether I could do at least the same things I could do in my shared hosting account.
The overall impression is that you own a server somewhere on the other side of the world (I live in Europe). You can do whatever you like, and nobody hosted on the same physical server can affect you, whether through memory, bandwidth or even processor use. The response time compared with shared hosting is superb. Database speed is unbelievable. I am hosting a number of Drupal sites including this one, and I can say they are flying now!
By default JaguarPC gives you 3 dedicated IPs, and you can do anything with them! Typically one IP is for hosting and the other two are for your own private name servers. That was something new for me. With private name servers you can do any DNS operation, including setting all DNS records (A, MX, etc.) and even changing their lifetime, which is very useful when you plan some big change!
That is power that I personally think I can't live without anymore!..
I have moved all my sites to this VPS hosting, and so far the functionality, the quality and also the support team have been very good. If I exclude the downtime caused by my own lack of DNS knowledge, the sites have had 100% uptime.
If you liked this VPS review and want to try yourself in "god mode" :), you have my recommendation! You'll definitely like it!
Updated. They have a promotion again! It is even better than the package I currently have. It includes: 25 GB diskspace, 300 MB bandwidth and double the guaranteed RAM (256 MB) at 10% off the price ($17.97/month). (I am talking about the beginner's package; there are more.) The promotion expires pretty soon, so try it now, especially if you're new to VPS! (Use JaguarPC coupon code "VPS10YRS")
Please feel free to ask questions, as a person who learned something a month ago will answer them more enthusiastically than one who has been using a VPS for years! :)
=======
Cyber Ultra Positive Hosting Review
Note committed on 24 August 2007, tagged hosting
Since I left Servage, I have looked through a lot of different hosting services.
The key points I was looking for:
- relatively cheap hosting: we are a non-profit organization and I am not a guru in building web pages;
- ssh access: that is a must if you want to get something done quickly and with the least pain. FTP is not OK anymore.
- rich statistics: awstats or similar is OK
- guaranteed uptime: if it is stated to be 99.98%, it should be guaranteed.
I didn't need overwhelming amounts of space and bandwidth (which, as you can see, are a big lie at some "nice" hostings), just a reasonable number I can really live with.
I finally found such a company, one that doesn't advertise itself too much on the internet and has all the things I want from a hosting company: it is Cyber Ultra! My projects have been living with them for about half a year and I am completely satisfied, with both uptime and price.
Their guaranteed uptime is 99% (which is more or less OK for me), but that 1% of possible downtime is reflected in their price: it is just 3 EUR/month. Their actual uptime was 100% during the past two months, and slightly below 100% before that.
CyberUltra has SSH, cPanel, and therefore complete control over everything.
This is my third hosting company and I think they are simply the best!
P.S. Pity they don't have an affiliate program; this post could have brought me some additional credits :)
=======
Enhancing the drupal statistics: ip geolocation
Note committed on 20 August 2007, tagged drupal, module
Drupal statistics logs every access to a page. It is kind of useful when the user is logged in, but there are times when you want to see where the user physically comes from. For that you can go into "details" and see their IP address, then copy and paste it into some online ip2location service and see the results.
This code shortens that cycle to just one click.
Here we will enhance the statistics page by adding a link to a free ip2location service.
Open statistics.module and find the statistics_recent_hits function. The two changes in the code are marked with comments.
....
// Change 1: ", a.hostname" is added to the SELECT list.
$sql = 'SELECT a.aid, a.path, a.title, a.uid, u.name, a.timestamp, a.hostname FROM {accesslog} a LEFT JOIN {users} u ON u.uid = a.uid' . tablesort_sql($header);
$result = pager_query($sql, 30);
while ($log = db_fetch_object($result)) {
  $rows[] = array(
    array('data' => format_date($log->timestamp, 'small'), 'class' => 'nowrap'),
    _statistics_format_item($log->title, $log->path),
    // Change 2: a link to ip2location is appended after the username.
    theme('username', $log) . ", " . l(check_plain($log->hostname), "http://www.ip2location.com/$log->hostname"),
    l(t('details'), "admin/logs/access/$log->aid"));
}
...
After that you will have a clickable IP address in the username column.
=======
