Monday, September 10, 2018

Using Github to help manage a large batch of Koha SQL reports

When I started managing the Next Kansas 51 library shared catalog in 2016 I inherited a collection of 1200+ SQL reports that had been created by seven or eight different employees over a seven or eight year period working with Koha.

Some reports have names that describe what the report does, but some have names like "pcm records," or "Problem hold," or (my personal favorite) "Not sure?"  Some reports have notes that describe what the report is intended to do, but many do not have notes ("Not sure?" does not have any notes).  Many are categorized into groups and sub-groups.  And sometimes the groups and sub-groups actually make sense.  But still, we had 1200+ reports.  Essentially, I inherited a SQL reports black hole.

For staff at our member libraries, finding a report went something like this:

1. A staff member from Library A would go to "Home > Reports > Guided reports wizard > Saved reports" and search for a report that does A, B, and C using the keyword search tools on the page to search for A, B, or C or some variations on that search.

2. If the searcher could not find a report that appeared to do what they wanted, they would phone or e-mail NEKLS saying "I'm looking for a report that does A, B, and C.  Do we have a report that does A, B, and C?"

3. NEKLS would respond with "Well, I think we have a report that does A, B, and C, but I can't find it, so I wrote a new report that does A, B, and C.  It's report #4656 and I named it "Report for Library A".  Let me know if it works."

4. A month later a staff member from a different library would go to "Home > Reports > Guided reports wizard > Saved reports" and search for and search for a report that does A, B, and C using the keyword search tools on the page to search for A, B, or C or some variations on that search.

5. If the new searcher could not find a report that appeared to do A, B, and C, they would phone or e-mail NEKLS saying "I'm looking for a report that does A, B, and C.  Do we have a report that does this?"

6. NEKLS would respond with "Well, I think we have a report that does that, but I can't find it, so I wrote a new report that does A, B, and C.  It's report #4799 and I named it "Report for Staff Member".  Let me know if it works."

One variation on this is when a staff member from a library calls and says "I found report 4656 that does A, B, and C, but I need a report that does A, B, C, and D.  Can you write me a report that does A, B, C, and D."  And then we end up with report "A, B, and C" and report "A, B, C, and D" and neither one of them has a name that says "This report does _____."

The result has been that, over time, we ended up with a lot of reports that were either duplicates, or minor variations on other reports written by many people over a long period of time and they are very difficult to navigate.

They are an SQL report black hole.

My solution to the too-many-reports problem was to start deleting old reports that didn't make any sense to me.

But this led to another problem.

About two months after my first pass at deleting reports that didn't make sense,  I got a phone call from one of our libraries saying "But I needed that report.  That was my special report that I only needed to run once every 163 days.  Liz wrote it for me in 2011."

Generally speaking, for every 10-15 reports I got rid of, I would get one phone call asking "Where's my super special report?"

Since re-creating these reports is a hassle, I started using an SQL report-of-reports, Microsoft Excel with a VBA macro, Atom, and a Github repository to make regular backups of our SQL library so that I could track the changes I was making to our reports.

This is the process:

I created a repository on Github.  In my case I called it nexpress.sql.  Then I cloned that repository to my local computer.  (https://github.com/will1410/nexpress.sql)


I created a folder on the computers I run this process from called C:\git

Then I run this SQL:

-----
SELECT
  Concat("R.", LPad(saved_sql.id, 6, 0)) AS FILE_NAME,
  Concat(
    Concat("R.", LPad(saved_sql.id, 6, 0)), Char(13), Char(10), Char(13), Char(10),
    Concat("----------"), Char(13), Char(10), Char(13), Char(10),
    Concat("Name: ", Coalesce(saved_sql.report_name, "-")), Char(13), Char(10),
    Concat("Created by: ", If(Coalesce(borrowers.borrowernumber, 0) = 0, "-", Concat(borrowers.firstname, " ", borrowers.surname))), Char(13), Char(10), Char(13), Char(10),
    Concat("----------"), Char(13), Char(10), Char(13), Char(10),
    Concat("Group: ", Coalesce(reportgroups.lib, "-")), Char(13), Char(10),
    Concat("     ", Coalesce(reportsubgroups.lib, "-")), Char(13), Char(10), Char(13), Char(10),
    Concat("Created on: ", Coalesce(saved_sql.date_created, "-")), Char(13), Char(10),
    Concat("Modified on: ", Coalesce(saved_sql.last_modified, "-")), Char(13), Char(10),
    Concat("Date last run: ", Coalesce(saved_sql.last_run, "-")), Char(13), Char(10), Char(13), Char(10),
    Concat("----------"), Char(13), Char(10), Char(13), Char(10),
    Concat("Public: ", Coalesce(saved_sql.public, "-")), Char(13), Char(10),
    Concat("Expiry: ", Coalesce(saved_sql.cache_expiry, "-")), Char(13), Char(10), Char(13), Char(10),
    Concat("----------"), Char(13), Char(10), Char(13), Char(10),
    Concat(Coalesce(saved_sql.notes, "-")), Char(13), Char(10), Char(13), Char(10),
    Concat("----------"), Char(13), Char(10), Char(13), Char(10),
    Concat(IF(Length(saved_sql.savedsql) > 32766, "Too large to process", saved_sql.savedsql)), Char(13), Char(10), Char(13), Char(10)
  ) AS CONTENTS
FROM
  saved_sql
  LEFT JOIN borrowers ON saved_sql.borrowernumber = borrowers.borrowernumber
  LEFT JOIN (SELECT
        authorised_values.id,
        authorised_values.category,
        authorised_values.authorised_value,
        authorised_values.lib,
        authorised_values.imageurl,
        authorised_values.lib_opac
      FROM
        authorised_values
      WHERE
        authorised_values.category = 'REPORT_GROUP') reportgroups ON
    saved_sql.report_group = reportgroups.authorised_value
  LEFT JOIN (SELECT
        authorised_values.id,
        authorised_values.category,
        authorised_values.authorised_value,
        authorised_values.lib,
        authorised_values.imageurl,
        authorised_values.lib_opac
      FROM
        authorised_values
      WHERE
        authorised_values.category = 'REPORT_SUBGROUP') reportsubgroups ON saved_sql.report_subgroup =
    reportsubgroups.authorised_value
GROUP BY
  saved_sql.id
ORDER BY
  saved_sql.id
-----

Once I run this report I download the results as a CSV file and open it in Excel.

I make sure that c:\git is empty then I run the following VBA macro:

-----
Sub WriteTotxtSQL()

Const forReading = 1, forAppending = 3, fsoForWriting = 2
Dim fs, objTextStream, sText As String
Dim lLastRow As Long, lRowLoop As Long, lLastCol As Long, lColLoop As Long

lLastRow = Cells(Rows.Count, 1).End(xlUp).Row

For lRowLoop = 1 To lLastRow

    Set fs = CreateObject("Scripting.FileSystemObject")
    Set objTextStream = fs.opentextfile("c:\git\" & Cells(lRowLoop, 1) & ".txt", fsoForWriting, True)

    sText = ""

    For lColLoop = 2 To 2
        sText = sText & Cells(lRowLoop, lColLoop) & Chr(10) & Chr(10)
    Next lColLoop

    objTextStream.writeline (Left(sText, Len(sText) - 1))


    objTextStream.Close
    Set objTextStream = Nothing
    Set fs = Nothing

Next lRowLoop

End Sub

-----

This will take each row in the CSV file and output it to a separate text file in the c:\git folder.  Each SQL report will have a filename formatted as "R.xxxxxx.txt" where the "Xs" represent the report number.  The exception will be any situations where the output is too big for Excel to process (more than 32,767 characters).  Since those reports can't be processed in Excel, placeholder data gets output into a file that starts with an X instead of an R.

Next I go to the nexpress.sql folder on my local computer and delete all of the files that start with an "R." or an "X."

Then I go to c:\git and cut all of these files and paste them to my nexpress.sql folder on my computer.

Next I run this report:

-----
 SELECT
  Concat(
    LPad(saved_sql.id, 5, 0),
    "<br /><br />",
    Coalesce(saved_sql.report_name, "-"),
    "<br /><br />",
    Concat(Coalesce(groups.lib, "-"),
    "<br />",
    Coalesce(subgroups.lib, "-")),
    "<br /><br />",
    Concat("Created by:<br />", If(borrowers.borrowernumber IS NULL, "-", Concat(borrowers.firstname, " ", borrowers.surname)))
  ) AS NAME,
  Coalesce(saved_sql.notes, "-") AS NOTES
FROM
  saved_sql
  LEFT JOIN borrowers ON saved_sql.borrowernumber = borrowers.borrowernumber
  LEFT JOIN (SELECT
        authorised_values.id,
        authorised_values.category,
        authorised_values.authorised_value,
        authorised_values.lib,
        authorised_values.imageurl,
        authorised_values.lib_opac
      FROM
        authorised_values
      WHERE
        authorised_values.category = 'REPORT_GROUP') groups ON saved_sql.report_group = groups.authorised_value
  LEFT JOIN (SELECT
        authorised_values.id,
        authorised_values.category,
        authorised_values.authorised_value,
        authorised_values.lib,
        authorised_values.imageurl,
        authorised_values.lib_opac
      FROM
        authorised_values) subgroups ON saved_sql.report_subgroup = subgroups.authorised_value
GROUP BY
  saved_sql.id
ORDER BY
  saved_sql.id

LIMIT 10000
-----

And let the results display on the screen.

Then I view the source of this page and cut out everything between the <table></table> tags and paste that into my "report_index.html" file using Atom where the old table used to be.

Next I look for any files that start with an X and I manually copy the SQL from the reports interface in Koha into the area for the SQL in that text file in Atom and rename the file to start with an "R."

Then I use Atom to push all of the updated files to my repository.

The end result is that I have a text file with the original SQL, name, and description for every saved SQL report in our system starting with an "R," I have an index of those files on Githup pages (https://will1410.github.io/nexpress.sql/report_index.html), and several other files related to my SQL reports where I can track changes to these reports over time.

No comments:

Post a Comment