================================================================
BUGFIXES

================================================================
FUNDAM:

* synctool alias/flag handling ...

----------------------------------------------------------------
? bare ./configure ?

! dirname/basename functions !

* sort fcn / issue77

airable:

!! --gen:
* olh/mld
* mlrval.?/mvfuncs.? -> lib; then support mv_t @ gen-reader
* w/ multiple file names it gens twice. abend unless #infile==0.
  -> also '$z=FILENAME' says '(stdin)' :(

! stdout fflushes ..

! mld/mlh stats1 regex cookbook/10min examples

! termcvt -I
! aux-list -> main help; dox too

* UT unhex

! faqent/cookbook/more:
  mlr termcvt --cr2lf foo.csv.cr > foo.csv

* reg_test/run --mlrexec flag

* across-all/regex x stats1 / many verbs
  k stats1 functionality
  k UT
  k stats1 --help
  - merge-fields -x
  - how many others for incremental release? stats2?

* more json handling: topmost/second-topmost array, put nidx-style as with dkvp.

?? JIT conversion (#150)? makes a mess of the disposition matrices; bleah. but the outcome would be nice.

----------------------------------------------------------------
non-airable:

! appveyor !
  ! 10min etc re single quotes vs. triple-doubles
  - augment msys2 notes w/ the frombelows
  - pcreposix.dll sadpanda
  - https://github.com/msys2/msys2/wiki
  - https://github.com/Alexpux/MINGW-packages
  - https://github.com/Alexpux/MINGW-packages/wiki
  - see also bash on windows: https://msdn.microsoft.com/en-us/commandline/wsl/about
  - https://help.github.com/articles/dealing-with-line-endings/#platform-windows
  - big dev readme about .gitattributes

? bash completion script ?

? into moreutils?

! fix autoreconf timestamp wtf

* valgrinds

! ASAN build @ travis !
  -> doesn't look quite right at travis
  -> to release notes if successful

* --barred & autodetect notes @ the long issue

! rm temp-junk files !

! pokipush

! write-temp-file abend on negfd etc

* rh/fedora/centos

* ship ./configure, w/ clean-target mods

* travis clang build w/ asan ?

----------------------------------------------------------------
other:

? coroutines (r/w pipes)?
  o maybe not now: need notion of # lines in/out which, simplest case, are 1-1,
    but in the more general case might not be.
  o coexec as well as shellout; stderr caveats x both

================================================================
MAPVAR CHECKLIST:

* MT_ERROR vs. fatal vs. skip assign. iron it out.

* more generally, audit error-handling for consistency:
  - absent/skip
  - '(error)'
  - abend
  - coerce
  -> doc smoothouts for 5.0.0 incompat notes
  - e.g. non-string =~ regex: coerce LHS to string?

* feels weird: '$*=3' is a syntax error but 'o=3;$*=o' is not. also '@*=3' is not a syntax error.

* death knell for mv-only evaluators: this is a syntax error:
  mlr put -q 'o = haskey(NR==4 ? {"a": NF} : {"b": NF}, "a"); print "NR=".NR.",haskeya=".o'

* gross: $*=splitkv(...) results in string keys always. strengthen the typed overlay for filter/put?
  lhmsmv -> mlhmmv?? need to be careful with int-key gets ... char* field_name in many places ...

* debt-reduction
  ! naming conventions to clarify transfer/copy semantics
  - copy/ref of keys vs. values in mlhmmv, local_stack, and callees -- ?!?

* clarify ownership semantics in localstack & mlhmmv via function names, & top-of-file comments

================================================================
5.4.0 ideas:

----------------------------------------------------------------
! multi-field x many verbs: -f/-r field-name-spec opportunities throughout

which verbs:
  k stats1 (done in 5.2.0)
  * stats2
  * merge-fields -x
  - count-distinct

  - sec2gmt/seg2gmtdate
  - step
  - top

  ? uniq -g
  ? sample -g
  ? head -g
  ? tail -g

  ? reorder
  ? bar
  ? histogram
  ? label
  ? nest
  ? sort

  ? decimate -g
  ? join

----------------------------------------------------------------

? weighted stats?

* xxxes

* mt_error/fatal/skip-assign audit

! more than 'syntax error' from lemon-parse failures ... lemon API research

? some sort of debug/single-step/trace ... include the AST in the CST & reverse-generate ?

* seqgen as input lrec "reader"? the seqgen verb doesn't (and shouldn't) step NR.

? free vs. free_contents names ?

* emph override wording: from rel notes.
  "Likewise --ofs, --ors, --ops, --jvstack, and all other output-formatting
  options from the main help at mlr -h and/or man mlr default to the main
  command-line options, and may be overridden with flags supplied to mlr put and
  mlr tee."

? some help for keyword clashes? e.g. 'func E(x) { ... }' ?

* bargv0 throughout, replacing "mlr" and argv[0] both. (former against name-changes; latter against /usr/bin/mlr.)

* prune unused functions

? mlr step -a shift --by {n}

? --imd ?

* bool_t:uchar typedef & perf check?

COOK:
* use: https://github.com/petewarden/dstkdata
  mlr --from ../data/big.dkvp put -q '
    #weight = $y;
    #weight = $y * $y;
    weight = $i;

    @sumwx[$a] += weight * $i;
    @sumw[$a] += weight;

    @sumx[$a] += $i;
    @sumn[$a] += 1;

    end {
      map wmean = {};
      map mean  = {};
      for (a in @sumwx) {
        wmean[a] = @sumwx[a] / @sumw[a]
      }
      for (a in @sumx) {
        mean[a] = @sumx[a] / @sumn[a]
      }
      emit wmean, "a";
      emit mean, "a";
    }'

COMPARES:
* https://github.com/shenwei356/csvtk
* http://csvkit.readthedocs.io/en/540/
* https://github.com/BurntSushi/xsv
* https://github.com/eBay/tsv-utils-dlang
* https://www.gnu.org/software/datamash/manual/datamash.html

================================================================
5.4.0 IDEAS:

!! coroutines: line-oriented; operate on field(s); r/w popen

* json_encode / json_decode functions? (map->string and string->map)
  - array handling ...

* option to inhibit dquot of int map keys on dump
  ! fix for srec keys !

* expose strcmp & strcasecmp for lhm sorting
* expose movefirst/movelast funcs
* mlr sort --fold!
* mlr paste (sjackman)
* dump @records after @records[NR]=$*: json "1" rather than 1 indices are b04k3d. make a cliopt.

* regexes for IPS/IFS/IRS; also in mlr nest etc.

!!?? disallow implicit assignment? force w/ var at least??
  - w/  implicit: for (...) {sum += x} FUBAR
  - w/o implicit: sad panda cannot type 'x=1'
  -> at least make an mld:ref note about it.

* perf profile

! sortk sortv sortf
  -> how to compare map-valued keys on sortv? recurse?
  -> string keys before/after int?
  -> keys: int before string before map
  -> values?
     o definitely intermix int & float
     o definitely intermix empty & string
     o definitely num before string.
     o definitely maps after that.
     o what about boolean?
     o what about error/absent?
  -> how to incorporate numeric & lexical flags -- defer all complexity to sortf?

? compile-time checks between locals, fcnargs, fcnretvals -- if unsatisfiable by typing (irrespective
  of later stream data) then throw at compile time.

* poki-run no '$ ' for comment lines

? maybe some sort of collections-API logic to complement for-loops? important, or just fun to code up?

? map-literal grammar as separate parser?
  - replace existing json parser?
  - use this for streaming parser?
  - implement byte-getter with fgetc to keep reading until closing bracket?

? 'null' JSON syntax -- what syntactic meaning for miller? mv_absent or mv_empty, neither would render back as 'null'
  on output. yet i also don't want another null-type.

? support JSON arrays:
  - allow them in map-literal input
  - allow them in JSON record input
  - add mlhmmv array type, complementing map type
  - negative-index read semantics (backwards index)
  - out-of-bounds read semantics (mv_absent)
  - -out-of-bounds write semantics (array extend)
  - for srec-to-mlhmmv: loop over level & if all integer keys & all in sequence, treat as array

? array/list types? [a,b,c]
? set types? {a,b,c} (vs. map {a:1,b:2,c:3})
? maybe overkill (map covers array & set) but with those there would be solid collections coverage.

----------------------------------------------------------------
EXECUTION TRACE:

! missing operators
  $ mlr --from ../c/s put -T 'int a=1;var b=a[2]'
  TRACE (int a 1)
  TRACE (var b (a 2))

! trace *assignments* (mutations of pvars state)

* work through the grammar & neaten up pnode->text instances, now that these names are user-visible.
* mld. note it's an AST dump, not source-level.
* maybe go through and replace (...)-style with AST-to-source formatting ...
* UT
! this is all very cute tracing at source-level but would be significantly more useful with values shown.

----------------------------------------------------------------
SYNTAX ERROR IMPROVEMENT:

* try to get some source-level info in there.
  http://archive.oreilly.com/pub/a/linux/excerpts/9780596155971/error-reporting-recovery.html

----------------------------------------------------------------
? double-quote-aware (non-lite) nidx & dkvp options?
? comma-number -- using locale?

* MLRPATH

----------------------------------------------------------------
? narrative/slides doc ...
? stdin-then-tail-f

? CONVFMT? IGNORECASE? MLRPATH?

* lrec get followed by put/remove: getext variant returning node for unlink, valpoke, or null == append to avoid
  double-searching.
* 64-bit lengths for containers. test with 5-billion-integer-seq data.

----------------------------------------------------------------
MORE

* look into 'perf'
* linreg in terms of stats only ... check py book
* cat/cut langcomps (w/ gh links) -> perf page

* pipe-viewer-like feature to stderr?

* stats1/stats2 sliding-window feature? and/or with ewma-coefficients (much easier)
  - mean/stddev/var; skew/kurt?
  - linregs; corr/cov?
  ? also, option of weighted stats w/ explicit weights field?
  ? maybe just EWMA with well-known sumw followed by then-chaining. write up the weights if so?

* tbin/ok -> cookbook
* debian screenshot
* ruby @ optextdep @ mld; poki+mkman
* stdin filename keyword for read-from-file-then-tail-f mode (e.g. mlr etc)
  - needs refactor for lrec_reader_alloc callsite
* perf page: (1) redo; (2) note GNU/etc; (3) compare to mawk (http://invisible-island.net/mawk/)
* EOS comments thruout
* valgrind note @ new dev page/section
* join: final sllv_free in destructor (lo-pri)
* anim ref https://github.com/edi9999/path-extractor

* benchmark lua per se
* benchmark lua-jit

* bug or feature?
  near-overflow case:
  $ mlr sort -n x reg_test/input/near-ovf.dkvp 
  x=9223372036854775807,y=-9223372036854775803
  x=9223372036854775804,y=-9223372036854775804
  x=9223372036854775801,y=-9223372036854775801
  x=9223372036854775806,y=-9223372036854775805
  x=9223372036854775803,y=-9223372036854775807
  x=9223372036854775802,y=-9223372036854775806
  x=9223372036854775805,y=-9223372036854775802

----------------------------------------------------------------
COOKBOOK/FAQ/ETC.:

* cookbook:
  - eval stuff from https://github.com/johnkerl/miller/issues/88

    $ mlr --csvlite stats2 -a linreg-pca  -f x,y x
    x_y_pca_m,x_y_pca_b,x_y_pca_n,x_y_pca_quality
    1.030300,0.949250,4,0.999859

    $ mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x
    x_y_pca_m=1.030300;x_y_pca_b=0.949250;x_y_pca_n=4;x_y_pca_quality=0.999859

    $ eval $(mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x)

    $ echo $x_y_pca_m
    1.030300

  - hold-and-fit regressor doc: 'then put' for residuals; note avoids two-pass &
    the saving of fit parameters

  - histo w/ min/max is effectively 2-pass (unless you have prior knowledge about the data).
    note count-distinct w/ int() func.

  - two-pass lin/logi reg vs. hold-and-fit.

  - very specific R/mysql/etc inouts

  - polyglottal dkvp/etc production.

  - very specific examples of sed/grep/etc preprocessing to structurize semi-structured data (e.g. logs)

  - checku.dash -> cookbook

* faqents:
  - rsum as proxy for per-record/agg-only mixed output

* other doc besides cookbook & faq:
  - R doc:
    ! xref @ covers x 2
    ! be very clear streaming vs. dataframe -- each has things the other can't do
    ! emph mlr has light stats but for heavyweight analysis use R et al.

* --mmap @ mlr -h

================================================================
* ect feature?
  -> maybe better in cookbook ...
  - in1 optional: t (epoch seconds); default systime()
  - in2: nleft
  - in3 optional: target #/field name
  - in optional: -s flag or not
  - out1: etchours
  - out2: etcstamp

  o expose mapper_stats2_alloc
  o expose mapper_cut_alloc
  o encapsulate the following:
    mlr put '$t=systime()' \
      then filter 'NR>4' \
      then  put '$nleft=$target-$n' \
      then stats2 -s -a linreg-pca -f t,nleft \
      then put '$etc= -$t_n_pca_b/$t_n_pca_m; $etcstamp=sec2gmt($etc); $etchours=($etc-systime())/3600.0'

* mlr step -a from-first -f t \
    then cut -o -f t_from_first,ntodo \
    then step -a ewma -d 0.005,0.01,0.1 -o a,b,c -f ntodo \
    then stats2 -s -a linreg-pca -f \
      t_from_first,ntodo,t_from_first,ntodo_ewma_a,t_from_first,ntodo_ewma_b,t_from_first,ntodo_ewma_c \
    then put '
      $ect0 = -$t_from_first_ntodo_pca_b/$t_from_first_ntodo_pca_m;
      $ecta = -$t_from_first_ntodo_ewma_a_pca_b/$t_from_first_ntodo_ewma_a_pca_m;
      $ectb = -$t_from_first_ntodo_ewma_b_pca_b/$t_from_first_ntodo_ewma_b_pca_m;
      $ectc = -$t_from_first_ntodo_ewma_c_pca_b/$t_from_first_ntodo_ewma_c_pca_m
    ' \
    then cut -o -f t_from_first,ect0,ecta,ectb,ectc

----------------------------------------------------------------
* introduce a fourth, padding separator for all formats? (for leading/trailing strip/skip.)
  o allows 'x = 10' in DKVP
  o allows right-justified keys in XTAB

? wiki quickselect ?

* hash-collision ifdef instrumentation -> maybe find a better hash function out there
* lemon in-dir -- cf wiz note
* gprof link with -lc on FreeBSD -- ?

================================================================
DOC

* Note that PCA is better than OLS for roundoff error (sum of squares ...):
  grep red data/multicountdown.txt | head -n 13 | mlr --opprint stats2 -a linreg-ols -f t,count
  grep red data/multicountdown.txt | head -n 14 | mlr --opprint stats2 -a linreg-ols -f t,count

================================================================
IMPROVEMENTS

* run go/d/etc on sprax & include #'s in perf pg, and/or rm xref in the latter & just post xlang perf #'s there
* link to gh/jk/m xlang impls ... and/or cardify their sources :) ... or maybe just link to gh/jk/m xlang dir
* ack c impl has been repeatedly optimized but even the original version (also cutc.c ...) outperforms

* update t1.rb including numeric sort; fix appropriateness of -t=

* more use of restrict pointers ... ?

================================================================
PYTHON
* pgr + stats_m same I/O modules??

================================================================
FYI

shell: mlr put '$z=$x+$y'
lldb:  run put "\$z=\$x+\$y"

http://include-what-you-use.org/

----------------------------------------------------------------
https://fedoraproject.org/wiki/How_to_create_an_RPM_package
https://wiki.centos.org/HowTos/Packages/ContributeYourRPMs
https://lists.centos.org/pipermail/centos/2012-September/129227.html
https://fedoraproject.org/wiki/Join_the_package_collection_maintainers
https://fedoraproject.org/wiki/How_to_get_sponsored_into_the_packager_group
https://fedoraproject.org/wiki/Package_Review_Process
https://docs.fedoraproject.org/ro/Fedora_Draft_Documentation/0.1/html/RPM_Guide/ch11s03.html
http://wiki.networksecuritytoolkit.org/nstwiki/index.php/HowTo_Create_A_Patch_File_For_A_RPM

----------------------------------------------------------------
Squash commits by:
  brew update
  git checkout $YOUR_BRANCH
  git rebase --interactive origin/master
  mark each commit other than the first as "squash" or "fixup"
  git push -f

http://codeinthehole.com/writing/pull-requests-and-other-good-practices-for-teams-using-github/
