Tuesday, September 12, 2023

Japan after the Pandemic

I have first cousins, second cousins, and lots of extended family in Japan, so I used to go every year.  Then the pandemic happened, and Japan closed its borders until October 2022.  I finally made it back in August 2023.  Some of my favorite places had gone downhill, but some were as good as ever.  So here is my brain dump on how to have a good time in Japan.

Before You Arrive

If you plan to travel around Japan, get a JR Pass.  You buy a voucher from the Japan Travel Bureau, and exchange it for the actual ticket once you get to Japan.  Make sure the name on the voucher matches the name on your passport.  In Tokyo, you can exchange the voucher at the JR Travel Service Center in Tokyo Station or Ueno Station.

Be sure to enroll in the Visit Japan Web app.  Each traveler needs two separate QR codes: one for passport control and one for the customs declaration.

Arriving in Tokyo Narita

When you walk out of customs, look for the escalator down to the train station.  I usually take the Narita Express, a one-hour express train to Tokyo station that runs twice an hour for about 4000 yen.  You can use credit cards in the JR Ticket Office.

Tokyo

If you are in Tokyo, you can use a JR Pass to ride the JR Trains.  The JR Yamanote line runs in a circle around the city.  Most cool places are near one of the Yamanote line stops.  The JR Chuo line runs through the middle of the city from Tokyo station to the fashionable western suburbs.

In addition to the JR Trains, there is also the subway system.  If you are a tourist, you can buy a 72 hour unlimited subway pass for 1500 yen, which is one of the biggest bargains in Japan.  The subway is newer than the JR Lines and much cooler in the summer, since the subway stations are underground.

You could spend months in Tokyo and never run out of things to do. My favorites:

Asakusa

In Asakusa, everyone goes to Sensoji temple and the little Shinto shrine next to it.  You can visit the temple anytime, but the shops on the street leading up to the temple are good too.

There are great restaurants in Asakusa.

Akihabara

The Akihabara Electric Town is a famous place to buy electronics, anime stuff, comic books, etc.  Because of all the nerds who make pilgrimages here, it also has a ton of ramen shops and maid cafes.

The two big electronics chains are Bic Camera and Yodobashi Camera, and they both have flagship stores in Akihabara.  They sell everything from liquor to electronics to toys.  If you need a SIM, this is the place to go.  Also, the toy / model kit / computer gaming sections are awesome.  Don Quijote ("Donki") is another Japanese retail chain with a 24-hour flagship store in Akihabara.  Donki is famous for its costume section for cosplayers.  They also have a decent luggage section.

If you are into vintage toys, go to Mandarake Complex.  It's like a toy museum where everything is for sale.  You can also just wander around and walk into any store that looks interesting.

On our last trip, we went to the Owl Cafe Fukurou.  They don't actually serve any food; you just pet owls.  It was really cool.  Note: you must book in advance since space is limited.

Okachimachi / Ameyoko

One stop on the Yamanote line from Akihabara is Okachimachi.  Between Okachimachi and Ueno Park is the Ameyoko open air market.  Ameyoko is on the west side of the train tracks north of Okachimachi.

This is one of my favorite places.  Go there in the evening to do some shopping and then find a seat at an open-air cafe for dinner.  The tables are small and the people sitting next to you might become your best friends once everyone gets liquored up.  The Japanese are like that.

Yurakucho

Yurakucho is a cluster of small eating and drinking places south of Tokyo Station and east of the palace.  It is most famous for the yakitori bars in the little tunnels under the JR tracks.  But pretty much all the food is good here since their target market is Japanese salarymen.

Shibuya

The main sights here are the scramble crossing and the Hachiko statue.  But as Akihabara is to the nerd, so Shibuya is to the fashionista because of the Shibuya 109 shopping center.  There are a lot of stores there and I've spent a lot of time holding Tracy's purse while she tried on various things.

The only thing that can get her out of Shibuya 109 is a sushi bar. My favorite in Shibuya is a standing sushi bar. They had to downsize during the pandemic, and they didn't have whale (kujira) when I went, but the selection is decent and the fish is fresh because of the high turnover.  Pay attention to what the OGs order and order the same if it looks good.

Harajuku / Omotesando

Two other fashionista centers are Takeshita Street in Harajuku for young fashion, and Omotesando for designer stuff.  Our favorite vintage kimono store, Gallery Kawano, was off Omotesando, but it closed during the pandemic.  Harajuku is also near Meiji Shrine.

Sushi in Tokyo

Sushi, as we know it, was invented in Tokyo.  The sushi bar market is so competitive that any meal at a Tokyo sushi bar for 5000 yen will be better than anything you can get in North America at five times the price.  So don't be sad if you can't get a seat at Jiro.

The fish market has moved from Tsukiji to Toyosu but there are still lots of sushi bars there.  Sushi Zanmai is a reliable chain with excellent fish.  They have several branches in Tsukiji, as well as other places. The owner of Sushi Zanmai always buys the first tuna of the year as a PR stunt.  

One of my business colleagues loves Sharaku Akasakaten so we eat there whenever we meet.  They had the best whale of the trip and are as good as they were before the pandemic, if not better.

If you really want to splash out, get a private room at Kyubey Ginza, and tell me how it was.   I had an amazing business dinner there in the early 2000s and the sushi chef told us how Bill Clinton ate there during a state visit in the 1990s and how the secret service had to shut down Ginza for Clinton.

Thursday, September 19, 2019

How to bulk load 1.6 billion records into AWS RDS Postgres

The nightmare scenario.  We had a star schema on AWS Aurora Postgres (compatible with Postgres 9.6).  I designed the dimension tables with 32-bit autoincrement keys (the Postgres SERIAL data type).  Everything was fine for a few months.  Then the data loads started failing: the sequences for the 32-bit autoincrement keys had hit their max values.

How could this have happened?  The dimension tables all had record counts under 100,000.  It turns out that we loaded the dimension tables using simple INSERT statements:

  INSERT INTO dim_table (...) VALUES (...) ON CONFLICT DO NOTHING;

'ON CONFLICT DO NOTHING' is a cool Postgres feature.  It means that we don't have to select before inserting; just do batch inserts and let the database ignore any record that is already there.  The problem is that when you have an insert conflict, YOU STILL INCREMENT THE SEQUENCE, which is how we ran out of numbers.
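To see how tables with under 100,000 rows can still burn through a 32-bit sequence, here is a back-of-envelope sketch (the daily insert volume is purely illustrative, not our actual numbers):

```shell
# Every attempted insert draws from the sequence, conflict or not.
# A SERIAL column's sequence tops out at 2^31 - 1 = 2147483647.
serial_max=2147483647

# Illustrative: a load job that re-attempts 10 million rows a day,
# almost all of which conflict and insert nothing.
attempts_per_day=10000000

# Days until the sequence is exhausted, even if the table barely grows:
echo $(( serial_max / attempts_per_day ))   # prints 214
```

At that rate you hit the wall in about seven months, which roughly matches the "fine for a few months" timeline above.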

Our fact table had about 1.6 billion records.  We looked at changing the data types in place, but it seemed like it would take too long.  So we made a new database with a modified version of the star schema that uses 64-bit keys (the BIGSERIAL data type) and started migrating the data from the old database to the new one.  We were able to use pg_dump / pg_restore to migrate the dimension tables.  But we ran into a number of problems with the big fact table.

We initially tried to use ETL tools to migrate the facts in hourly and daily tranches.  These loads would start out fast.  I would walk away optimistically, only to come back a few hours later to discover that the data loads had slowed to a crawl.  This was caused by RDS IOPS throttling.  Unless you are willing to pay for provisioned IOPS, RDS gives you a baseline of 3 IOPS per GB, with bursts up to 3000 IOPS for roughly 30 minutes at a time, and throttles your I/O performance once the burst credits run out.  We had provisioned the DB with 100GB, intending to let it automatically grow.  But the IOPS throttle means you want to provision 1000GB at the outset to get a sustained 3000 IOPS allowance.  (More info on this can be found here under 'I/O Credits and Burst Performance'.)

Then we ran into timeout problems with pg_dump / pg_restore.  Long-running jobs were vulnerable to network disconnects.  So we had to set up .pgpass files so that we could run pg_dump with nohup (without needing to manually type a password).  We ran pg_dump using the directory format for the single fact table.  This got us a large compressed tab-delimited file with the 1.6 billion facts.  We tried to load this in the new schema using pg_restore, but it would consistently fail after running for a few hours.
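For reference, the setup looked roughly like this (hostnames, database names, and credentials here are made up; the key detail is that libpq ignores a .pgpass file unless it is chmod 600):

```shell
# ~/.pgpass format is one line per server:
#   hostname:port:database:username:password
echo 'olddb.example.rds.amazonaws.com:5432:stardb:loader:s3cret' >> ~/.pgpass
chmod 600 ~/.pgpass

# Dump just the fact table in directory format (-Fd).  With the
# password coming from .pgpass, nohup lets the multi-hour job
# survive a dropped SSH session.
nohup pg_dump -h olddb.example.rds.amazonaws.com -U loader \
      -Fd -t fact_table -f fact_dump stardb > pg_dump.log 2>&1 &
```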

So after some intensive web searches, we came up with the following solution:

1. Split the big file into smaller files with a million records each:

    gunzip -c 3942.dat.gz | split -l 1000000

    This gave us about 1600 files to load: xaa, xab, xac, etc.

2. Use psql to run the following script:

    CREATE TEMP TABLE temp_fact AS SELECT * FROM fact_table WITH NO DATA; 
    \copy temp_fact from 'xaa'
    INSERT INTO fact_table SELECT * FROM temp_fact ON CONFLICT DO NOTHING;
    TRUNCATE temp_fact;
    \copy temp_fact from 'xab'
    INSERT INTO fact_table SELECT * FROM temp_fact ON CONFLICT DO NOTHING;
    TRUNCATE temp_fact;
    etc.

3. We dropped foreign key constraints on the fact table before doing the loads, but retained a unique index that was required to detect duplicate records during the inserts.  It took 8-12 hours to load all the data files using this approach, and maybe another hour to restore the foreign key constraints once the loads had completed.
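Rather than typing 1600 copy/insert/truncate stanzas by hand, the script in step 2 can be generated.  A sketch (table and file names match the example above):

```shell
# Emit the load script for every split file (xaa, xab, ...) in the
# current directory.  Each stanza copies one file into a temp table,
# then inserts into the fact table with ON CONFLICT DO NOTHING.
{
  echo "CREATE TEMP TABLE temp_fact AS SELECT * FROM fact_table WITH NO DATA;"
  for f in x??; do
    echo "\\copy temp_fact from '$f'"
    echo "INSERT INTO fact_table SELECT * FROM temp_fact ON CONFLICT DO NOTHING;"
    echo "TRUNCATE temp_fact;"
  done
} > load_facts.sql
# then run it with psql, e.g.:  nohup psql -h host -U user -d newdb -f load_facts.sql &
```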


Tuesday, September 19, 2017

Gradle 4: Copy War to Directory

This took me 90 minutes to figure out and the internet was no help.

I was trying to copy the output of my gradle war project to a Tomcat webapps directory so I could run it in the debugger.

I defined "tomcathome" in gradle.properties.  So far so good.

Then after reading a bunch of confusing stuff about how to define gradle tasks, I came up with this:

    task deployTomcat (dependsOn: 'war') {
        doLast {
            mkdir "${tomcathome}/webapps/${project.name}"
            copy {
                from zipTree(war)
                into "${tomcathome}/webapps/${project.name}"
            }
        }
    }


This task got an error:

    Execution failed for task ':deployTomcat'.
    > Cannot convert the provided notation to a File or URI: task ':war'.
      The following types/formats are supported:
        - A String or CharSequence path, for example 'src/main/java' or '/usr/include'.
        - A String or CharSequence URI, for example 'file:/usr/include'.
        - A File instance.
        - A Path instance.
        - A URI or URL instance.

So it doesn't like 'zipTree(war)'.  I modified the code to print 'war' and 'war.outputs'.  'war.outputs' is an instance of org.gradle.api.internal.tasks.DefaultTaskOutputs.

So I googled this, found the source on GitHub, found that DefaultTaskOutputs has a 'files' property of type FileCollection, and that FileCollection has a property called 'singleFile'.  So I added more prints and found that war.outputs.files.singleFile gives me the name of the war file.

So this is the working task:

    task deployTomcat (dependsOn: 'war') {
        doLast {
            mkdir "${tomcathome}/webapps/${project.name}"
            copy {
                from zipTree(war.outputs.files.singleFile)
                into "${tomcathome}/webapps/${project.name}"
            }
        }
    }

Yay.

Friday, February 3, 2017

Solution for missing /vagrant file system.

If you are messing around with Vagrant and VirtualBox, you will sometimes end up with a VM where the /vagrant shared file system does not mount properly.  Vagrant complains that the vboxsf kernel module is not loaded properly and gives you a mount command to run manually.  When you try to mount it manually with

    sudo mount -t vboxsf -o uid=500,gid=500 vagrant /vagrant

you get this error:

    /sbin/mount.vboxsf: mounting failed with the error: No such device

According to this page, you have to reinstall the VBox guest additions with this command:

    cd /opt/VBoxGuestAdditions-*/init
    sudo ./vboxadd setup

However, for a generic box, this fails because gcc and the kernel sources are not installed.  So when you run into this problem on a CentOS 6 box (specifically, a VM based on the bento/centos-6.8 vagrant box), here is the exact sequence of commands that fixes the problem:

    cd /opt/VBoxGuestAdditions-*/init
    sudo yum install gcc make kernel-devel
    sudo ./vboxadd setup

Then rerun the special mount command to mount the /vagrant file system.

Thursday, September 1, 2016

Changing Custom Field Types in Redmine

At Brick Street, we use Redmine to manage engineering and product management. We host it ourselves on an AWS instance because we're cheap and because we want to integrate it with our other line-of-business systems. We recently upgraded from Redmine 2.1 to 3.3. In Redmine 2.1, we had some custom fields on Issue records that held text to include in our release notes. (We have a program that scans subversion logs for Issues, and then generates release notes by pulling the Redmine text for the issues fixed in the branch.)

In Redmine 2.1, these custom fields are 'text' type, which shows up in the Redmine UI as a single line (an HTML input element). In Redmine 3.3, they added a 'Long Text' type which shows up in the UI as a resizable multiline control (an HTML textarea element).

We wanted to convert the 'text' custom fields to 'long text'.  So I played with our preproduction instance and reverse engineered the schema.  The custom field definitions are in the CUSTOM_FIELDS table:

mysql> desc custom_fields;
+-----------------+--------------+------+-----+---------+----------------+
| Field           | Type         | Null | Key | Default | Extra          |
+-----------------+--------------+------+-----+---------+----------------+
| id              | int(11)      | NO   | PRI | NULL    | auto_increment |
| type            | varchar(30)  | NO   |     |         |                |
| name            | varchar(30)  | NO   |     |         |                |
| field_format    | varchar(30)  | NO   |     |         |                |
| possible_values | mediumtext   | YES  |     | NULL    |                |
| regexp          | varchar(255) | YES  |     |         |                |
| min_length      | int(11)      | YES  |     | NULL    |                |
| max_length      | int(11)      | YES  |     | NULL    |                |
| is_required     | tinyint(1)   | NO   |     | 0       |                |
| is_for_all      | tinyint(1)   | NO   |     | 0       |                |
| is_filter       | tinyint(1)   | NO   |     | 0       |                |
| position        | int(11)      | YES  |     | NULL    |                |
| searchable      | tinyint(1)   | YES  |     | 0       |                |
| default_value   | mediumtext   | YES  |     | NULL    |                |
| editable        | tinyint(1)   | YES  |     | 1       |                |
| visible         | tinyint(1)   | NO   |     | 1       |                |
| multiple        | tinyint(1)   | YES  |     | 0       |                |
| format_store    | text         | YES  |     | NULL    |                |
| description     | text         | YES  |     | NULL    |                |
+-----------------+--------------+------+-----+---------+----------------+
19 rows in set (0.00 sec)

The field_format value for text fields is 'string'. The field_format value for long text fields is 'text'.

The actual values of the custom fields are stored in the CUSTOM_VALUES table:

mysql> desc custom_values;
+-----------------+-------------+------+-----+---------+----------------+
| Field           | Type        | Null | Key | Default | Extra          |
+-----------------+-------------+------+-----+---------+----------------+
| id              | int(11)     | NO   | PRI | NULL    | auto_increment |
| customized_type | varchar(30) | NO   | MUL |         |                |
| customized_id   | int(11)     | NO   |     | 0       |                |
| custom_field_id | int(11)     | NO   | MUL | 0       |                |
| value           | mediumtext  | YES  |     | NULL    |                |
+-----------------+-------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

The customized_id field is a foreign key to the record in the ISSUES table. The custom_field_id field is a foreign key to the CUSTOM_FIELDS record. The value field stores the custom field value. This query will show you all the custom field values for Issue 2001:

select * from custom_values where customized_id = 2001;

So it turns out that you can change the type of a custom field value by updating the custom_field_id field of the CUSTOM_VALUES record. In our Redmine, we changed the text fields to long text by following this procedure:
  1. Define a new long text field in the UI.
  2. Look in the CUSTOM_FIELDS table to get the id of the text field (e.g. id 5) and the new long text field (e.g. id 15).
  3. Run this update query to change the types of all the custom field values:
    update custom_values set custom_field_id = 15 where custom_field_id = 5;
  4. Delete the custom field definition of the old field (id 5) in the UI.
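Before doing step 4, it's worth a quick sanity check that nothing is still pointing at the old field (a sketch, using the example ids 5 and 15 from above):

    -- After the update, the old id should have no values left;
    -- the new id should hold everything that was moved.
    select custom_field_id, count(*)
    from custom_values
    where custom_field_id in (5, 15)
    group by custom_field_id;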
This approach works great for Text custom fields.  List custom fields are harder because the custom field definition includes a list of possible values (in CUSTOM_FIELDS.possible_values).  We have a number of multiselect fields that contain lists of release versions.  My next task is to convert these to the new Version custom field type.

Monday, June 24, 2013

Why @OntarioTCU should investigate @BitmakerLabs

BitmakerLabs is a private training school that does a 9 week, $9000 intensive course in Ruby on Rails and related technologies. In Ontario, so-called Private Career Colleges are regulated by the Ontario Ministry of Training, Colleges, and Universities (MTCU) under the Private Career Colleges Act, 2005. This law is basically a consumer protection law that says the MTCU must review and approve Private Career Colleges before they can operate in Ontario. Today, Bitmaker announced they are closing (temporarily) in response to an investigation from the MTCU, and that they will be registering as a Private Career College in order to comply with the 2005 law.

Almost everyone in Toronto with a twitter account seems to be enraged by this blatant attempt to enforce consumer protection laws. But consider this:

BitmakerLabs claims their 9 week program will turn a non-programmer into a web developer. As someone who writes checks to companies who might hire BitmakerLabs' graduates, I am skeptical, to say the least. Apart from my skepticism, however, I wish BitmakerLabs well; if they are doing good work, they should have no problem registering with MTCU. But it's kind of silly to blame MTCU for doing their job.

Furthermore, compare this program with the Humber College Web Development Program, which is full time for three semesters for $8662 and no HST.  Or the Udacity / Georgia Tech / AT&T online Computer Science master's program, which promises a fully accredited master's degree for about $6,000.  Your money may be better spent on longer programs that provide more time to internalize the skill set.

The background here is that the traditional model of higher education (low volumes at extremely high costs) is ripe for disruption. So we are going to be seeing a lot of interesting experiments in the next few years; experiments like BitmakerLabs, MOOCs, and a general disaggregation of the teaching, evaluation, and certification functions that have traditionally been provided by colleges and universities. What this means for consumers is that they will have to be a lot more careful until this disruption sorts itself out.

Notes:
  1. Interesting background info on rationale and prior enforcement of the Private Career Colleges Act, 2005: Ontario Ombudsman Report.
  2. The United States has similar laws; e.g. NH Regs for Career Schools

Thursday, March 28, 2013